Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Nov 20, 2024
Date Accepted: Mar 12, 2025

The final, peer-reviewed published version of this preprint can be found here:

Scientific Evidence for Clinical Text Summarization Using Large Language Models: Scoping Review

Bednarczyk L, Reichenpfader D, Gaudet-Blavignac C, Ette AK, Zaghir J, Zheng Y, Bensahla A, Bjelogrlic M, Lovis C

Scientific Evidence for Clinical Text Summarization Using Large Language Models: Scoping Review

J Med Internet Res 2025;27:e68998

DOI: 10.2196/68998

PMID: 40371947

PMCID: 12123242

Beyond enthusiasm. Scientific evidence for clinical text summarization using large language models: A scoping review

  • Lydie Bednarczyk; 
  • Daniel Reichenpfader; 
  • Christophe Gaudet-Blavignac; 
  • Amon Kenna Ette; 
  • Jamil Zaghir; 
  • Yuanyuan Zheng; 
  • Adel Bensahla; 
  • Mina Bjelogrlic; 
  • Christian Lovis

ABSTRACT

Background:

Information overload in electronic health records (EHRs) requires effective solutions to alleviate clinicians' administrative burden. Automatic summarization of clinical text has gained significant attention with the rise of large language models (LLMs). While individual studies express strong optimism, a structured overview of the state of research is currently lacking.

Objective:

We aim to present the current state of the art in clinical text summarization using large language models, evaluate the level of evidence in current research, and assess the reliability of reported performance findings for clinical application.

Methods:

This scoping review follows the PRISMA-ScR guidelines. Literature published between January 1, 2019, and June 18, 2024, is identified from five databases: PubMed, Embase, Web of Science, IEEE Xplore, and the ACM Digital Library. Data related to experimental design, evaluation methods, and other relevant factors are systematically collected and analyzed independently by three authors.

Results:

A total of 30 original studies are included in the analysis. The research landscape demonstrates a narrow focus, predominantly centered on summarizing chest x-ray reports (26.7%), primarily involving patients in intensive care units (50%) and data originating from US-based institutions (63.3%). This focus aligns with the frequent reliance on the open-source MIMIC dataset (50%). While summarization methodologies vary, significant underreporting exists regarding data input structure (50%), input source count (80%), summarization technique (33.3%), and deployment environment (83.3%). Heterogeneous evaluation frameworks hinder research integration, and the reported evaluation strategies may fail to capture the models' translational value. In addition, ethical considerations are largely overlooked: bias analysis is entirely absent, and only one study (3.3%) addresses risk analysis.

Conclusions:

While enthusiasm regarding large language models is warranted, our review highlights the importance of maintaining a measured, clear-sighted, and evidence-based approach. Scientific evidence remains limited owing to underreported experimental designs and heterogeneous evaluation frameworks across studies. Prudent and carefully monitored use of these models in clinical settings is therefore crucial. To advance the field, future research should emphasize transparency to enable research integration and allow others to build on prior work. Moreover, evaluation frameworks must prioritize the translational value of these models to more effectively assess their performance, applicability, and alignment with ethical standards in clinical settings.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.