
Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Feb 2, 2025
Open Peer Review Period: Feb 2, 2025 - Mar 30, 2025
Date Accepted: Apr 21, 2025

The final, peer-reviewed published version of this preprint can be found here:

Large Language Models in Medical Diagnostics: Scoping Review With Bibliometric Analysis

Su H, Sun Y, Li R, Zhang A, Yang Y, Xiao F, Duan Z, Chen J, Hu Q, Yang T, Xu B, Zhang Q, Zhao J, Li Y, Li H

J Med Internet Res 2025;27:e72062

DOI: 10.2196/72062

PMID: 40489764

PMCID: 12186007

Large language models in medical diagnostics: A scoping review with bibliometric analysis

  • Hankun Su; 
  • Yuanyuan Sun; 
  • Ruiting Li; 
  • Aozhe Zhang; 
  • Yuemeng Yang; 
  • Fen Xiao; 
  • Zhiying Duan; 
  • Jingjing Chen; 
  • Qin Hu; 
  • Tianli Yang; 
  • Bin Xu; 
  • Qiong Zhang; 
  • Jing Zhao; 
  • Yanping Li; 
  • Hui Li

ABSTRACT

Background:

In healthcare, effective medical decision-making and accurate diagnosis are essential for managing medical conditions. Large language models (LLMs) are sophisticated AI systems that can autonomously generate responses to queries and engage in interactive dialogue. Their potential to enhance many facets of healthcare, especially diagnostics, has attracted significant interest. However, no comprehensive analysis has yet examined how research on LLMs in medical diagnosis has evolved and how these models have been evaluated.

Objective:

This scoping review aims to provide an overview of the current state of research on the use of LLMs in medical diagnostics. It addresses three primary subquestions: (1) Which LLMs are commonly used, and how are they assessed in diagnosis? (2) What is the current performance of LLMs in diagnosing diseases? (3) Which medical domains are currently investigating the application of LLMs?

Methods:

This scoping review was conducted according to the JBI Manual for Evidence Synthesis and adheres to the PRISMA extension for scoping reviews (PRISMA-ScR). Relevant literature published from 2022 to 2025 was searched in the Web of Science, PubMed, Embase, IEEE Xplore, and ACM Digital Library databases. Articles were screened and selected based on predefined inclusion and exclusion criteria. Bibliometric analysis was performed using VOSviewer to identify major research clusters and trends. Data extraction covered LLM types, application domains, and performance metrics.

Results:

A total of 95 articles were included in the final review. The bibliometric analysis identified three major clusters: (1) the assessment of LLMs in medical diagnosis, (2) the application of LLMs in medical diagnosis, and (3) the impacts of using LLMs in medical diagnosis. GPT-4 and its variants were the most commonly used LLMs, employed in 73.7% of studies. LLMs showed significant potential in disease classification, medical question answering, and generating high-quality diagnostic content. However, the included studies also raised concerns about bias and ethical issues. LLMs were most frequently applied in radiology (17.1%) and psychiatry, with promising results in improving diagnostic accuracy and facilitating clinical decision-making.

Conclusions:

LLMs, particularly GPT-4 and its variants, have shown substantial promise in enhancing diagnostic accuracy and facilitating clinical decision-making across various medical specialties. However, critical challenges remain, including the potential for bias and the complexity of real-world clinical scenarios. Future research should focus on refining LLMs to address these challenges, with particular emphasis on eliminating bias and protecting patient privacy. Practical implementation studies evaluating the impact of LLMs on patient outcomes and healthcare workflows are also needed.



© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.