Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Nov 20, 2024
Date Accepted: Mar 12, 2025

The final, peer-reviewed published version of this preprint can be found here:

Scientific Evidence for Clinical Text Summarization Using Large Language Models: Scoping Review

Bednarczyk L, Reichenpfader D, Gaudet-Blavignac C, Ette AK, Zaghir J, Zheng Y, Bensahla A, Bjelogrlic M, Lovis C

Scientific Evidence for Clinical Text Summarization Using Large Language Models: Scoping Review

J Med Internet Res 2025;27:e68998

DOI: 10.2196/68998

PMID: 40371947

PMCID: 12123242

Beyond enthusiasm. Scientific evidence for clinical text summarization using large language models: A scoping review

  • Lydie Bednarczyk; 
  • Daniel Reichenpfader; 
  • Christophe Gaudet-Blavignac; 
  • Amon Kenna Ette; 
  • Jamil Zaghir; 
  • Yuanyuan Zheng; 
  • Adel Bensahla; 
  • Mina Bjelogrlic; 
  • Christian Lovis

ABSTRACT

Background:

Information overload in electronic health records (EHRs) requires effective solutions to alleviate clinicians' administrative burden. Automatic summarization of clinical text has gained significant attention with the rise of large language models (LLMs). While individual studies express strong optimism, a structured overview of the state of research is currently lacking.

Objective:

We aim to present the current state of the art in clinical text summarization using large language models, evaluate the level of evidence in current research, and assess the reliability of reported performance findings for clinical application.

Methods:

This scoping review follows the PRISMA-ScR guidelines. Literature published between January 1, 2019, and June 18, 2024, is identified from five databases: PubMed, Embase, Web of Science, IEEE Xplore, and the ACM Digital Library. Data related to experimental design, evaluation methods, and other relevant factors are systematically collected and analyzed independently by three authors.

Results:

A total of 30 original studies are included in the analysis. The research landscape demonstrates a narrow focus, predominantly centered on summarizing chest x-ray reports (26.7%), primarily involving patients in intensive care units (50%) and data originating from US-based institutions (63.3%). This focus aligns with the frequent reliance on the open-source MIMIC dataset (50%). While summarization methodologies vary, significant underreporting exists regarding data input structure (50%), input source count (80%), summarization technique (33.3%), and deployment environment (83.3%). Heterogeneous evaluation frameworks hinder research integration, and the reported evaluation strategies may fail to capture the models' translational value. In addition, ethical considerations are largely overlooked: bias analysis is entirely absent, and only one study (3.3%) addresses risk analysis.

Conclusions:

While enthusiasm regarding large language models is warranted, our review highlights the importance of maintaining a measured, clear-sighted, and evidence-based approach. Scientific evidence remains limited owing to underreported experimental designs and heterogeneous evaluation frameworks across studies. Prudent and carefully monitored use of these models in clinical settings is therefore crucial. To advance the field, future research should emphasize transparency to enable research integration and allow others to build on prior work. Moreover, evaluation frameworks must prioritize the translational value of these models to more effectively assess their performance, applicability, and alignment with ethical standards in clinical settings.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.