Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Sep 15, 2024
Date Accepted: Jun 16, 2025

The final, peer-reviewed published version of this preprint can be found here:

Improving Large Language Models’ Summarization Accuracy by Adding Highlights to Discharge Notes: Comparative Evaluation

Koohi Habibi Dehkordi M, Perl Y, Deek FP, He Z, Keloth VK, Liu H, Elhanan G, Einstein AJ

Improving Large Language Models’ Summarization Accuracy by Adding Highlights to Discharge Notes: Comparative Evaluation

JMIR Med Inform 2025;13:e66476

DOI: 10.2196/66476

PMID: 40705416

PMCID: 12332456

Improving Large Language Models' Summarization by Highlighting Discharge Notes: A Comparative Evaluation

  • Mahshad Koohi Habibi Dehkordi; 
  • Yehoshua Perl; 
  • Fadi P Deek; 
  • Zhe He; 
  • Vipina K Keloth; 
  • Hao Liu; 
  • Gai Elhanan; 
  • Andrew J Einstein

ABSTRACT

Background:

The American Medical Association recommends that electronic health record (EHR) notes, which are often dense and written in nuanced clinical language, be made readable for patients and laypeople, a practice we refer to as the simplification of EHR notes. Our approach simplifies EHR notes through a series of incremental steps; in this paper we present the first step of that process. Large language models (LLMs) have demonstrated considerable success in text summarization, and LLM-generated summaries can re-present the content of EHR notes in easier-to-read language. However, such summaries can also introduce inaccuracies.

Objective:

Our objective is to obtain more accurate summaries of EHR notes. To this end, we aim to test the hypothesis that LLM-generated summaries of highlighted EHR notes are more accurate than such summaries of the original, unhighlighted notes.

Methods:

To test our hypothesis, we randomly sampled 15 EHR notes from the MIMIC-III database and highlighted them. Highlighting is performed automatically using an interface technology we previously designed with machine learning techniques. To calibrate the LLM summaries for our simplification goal, we chose GPT-4o and used prompt engineering to ensure high-quality prompts and to address output inconsistency and prompt sensitivity. We provided both the highlighted and unhighlighted versions of each EHR note, along with their corresponding prompts, to GPT-4o. Each generated summary was manually evaluated for quality using three metrics: completeness, correctness, and structural integrity.
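The comparative setup described above can be sketched as follows. This is a minimal, hypothetical illustration only: the prompt wording, function names, and the `**...**` highlight markers are assumptions for the sketch, not the authors' actual prompts or highlighting format.

```python
# Hypothetical sketch of the comparative setup: the same summarization
# instruction is paired with either the original (unhighlighted) or the
# highlighted version of a discharge note, and each is sent to GPT-4o.

def build_prompt(note_text: str) -> str:
    """Combine a fixed summarization instruction with one EHR note.

    The instruction text here is illustrative, not the study's prompt.
    """
    instruction = (
        "Summarize the following discharge note for a layperson. "
        "Organize the summary under clear section headers and do not "
        "add information that is not present in the note."
    )
    return f"{instruction}\n\nNOTE:\n{note_text}"

def summarize(note_text: str, model: str = "gpt-4o") -> str:
    """Send one prompt to the model and return the generated summary."""
    from openai import OpenAI  # requires the `openai` package and an API key
    client = OpenAI()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": build_prompt(note_text)}],
    )
    return response.choices[0].message.content

# Toy note pair; **...** marks highlighted spans in this sketch.
original = "Pt admitted with CHF exacerbation. Furosemide 40 mg IV given."
highlighted = "Pt admitted with **CHF exacerbation**. **Furosemide 40 mg IV** given."

u_prompt = build_prompt(original)     # input for the U-summary
h_prompt = build_prompt(highlighted)  # input for the H-summary
```

In the study, each resulting H-summary/U-summary pair was then scored manually on completeness, correctness, and structural integrity.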

Results:

On average, summaries of highlighted notes (H-summaries) achieved 96% completeness, 8% higher than summaries of unhighlighted notes (U-summaries). H-summaries also demonstrated better correctness, with fewer instances of erroneous information, and contained fewer structural errors, such as improper headers and misplaced information. These findings support the hypothesis that summarizing highlighted EHR notes improves accuracy.

Conclusions:

Feeding highlighted EHR notes to the LLM, combined with prompt engineering, generates higher-quality summaries in terms of correctness, completeness, and structural integrity compared with unhighlighted EHR notes. The summaries generated with this approach will later be used to further simplify EHR notes for patients and laypeople, as recommended by the NIH.



© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft other than for review purposes.