Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Jul 31, 2024
Date Accepted: Mar 25, 2025
Comparing Diagnostic Accuracy of Clinical Professionals and Large Language Models: Systematic Review
ABSTRACT
Background:
In the era of healthcare big data, integrating artificial intelligence with clinical decision support systems has become a significant trend. Although many experts have investigated the application of specialized AI and software tools in clinical diagnosis, the performance of large language models in this area remains underexplored.
Objective:
This study systematically reviewed the accuracy of large language models in clinical diagnosis to provide a reference for further clinical application.
Methods:
We searched CNKI, VIP Database, SinoMed, PubMed, Web of Science, Embase, and CINAHL for records from January 1, 2017, to the present. Two reviewers independently screened the literature and extracted relevant information. The risk of bias was assessed using the Prediction Model Risk of Bias Assessment Tool (PROBAST), which evaluates both the risk of bias and the applicability of included studies.
Results:
Twenty studies involving seven large language models and a total of 2787 cases were included. Quality assessment indicated that the included studies generally had a low risk of bias. For the optimal model, the accuracy of the primary diagnosis ranged from 25% to 97.8%, while the triage accuracy ranged from 66.7% to 98%.
Conclusions:
Large language models have demonstrated some diagnostic capability and significant potential for application across a variety of clinical cases. Further high-quality research with larger sample sizes and multicenter collaboration is necessary to fully explore the diagnostic performance of these models.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.