Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Jan 6, 2024
Date Accepted: May 8, 2024
ChatGPT-4 outperforms emergency department physicians in diagnostic accuracy: retrospective analysis
ABSTRACT
Background:
OpenAI's Chat Generative Pre-trained Transformer (ChatGPT) is a pioneering artificial intelligence (AI) system for natural language processing and offers significant potential in medicine, for example for treatment advice. Recent studies also show promising results using ChatGPT for emergency medicine triage; however, its diagnostic accuracy in the emergency department has not been evaluated.
Objective:
This study compares the diagnostic accuracy of ChatGPT versions 3.5 and 4 against primary treating resident physicians in an emergency room (ER) setting.
Methods:
In 100 adults admitted to our ER in January 2023 for internal medicine issues, diagnostic accuracy was assessed by comparing the diagnoses of ER resident physicians and ChatGPT versions 3.5 and 4 against the final hospital discharge diagnosis, using a point system to grade accuracy.
Results:
The 100 enrolled patients, with a median age of 72 years, were admitted to our internal medicine emergency department, primarily for cardiovascular, endocrine or gastrointestinal, and infectious diseases. ChatGPT-4 outperformed both ChatGPT-3.5 (p < 0.001) and ER resident physicians (p = 0.012) in diagnostic accuracy for internal medicine emergencies. ChatGPT-4 also consistently outperformed ChatGPT-3.5 and resident physicians across disease subgroups, with significant superiority in cardiovascular (ChatGPT-4 vs. ER physicians: p = 0.029) and endocrine or gastrointestinal diseases (ChatGPT-4 vs. ChatGPT-3.5: p = 0.014); in the other categories, the differences were not statistically significant.
Conclusions:
In this study comparing the diagnostic accuracy of ChatGPT-3.5, ChatGPT-4, and ER resident physicians against a discharge diagnosis gold standard, ChatGPT-4 outperformed both the resident physicians and its predecessor, ChatGPT-3.5. Despite the study's retrospective design and limited sample size, the results underscore AI's potential as a supportive diagnostic tool in ER settings.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.