Previously submitted to: JMIR AI (no longer under consideration since Aug 14, 2023)
Date Submitted: Apr 27, 2023
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Diagnostic accuracy of ChatGPT and physicians in patients with abdominal pain: a cohort study
ABSTRACT
Background:
Economic growth has increased the demand for healthcare resources, but it has also brought challenges such as long appointment waiting times and a shortage of medical professionals. The uneven distribution of medical infrastructure in some regions has left healthcare services limited in rural or impoverished areas. ChatGPT-3.5, a widely used conversational artificial intelligence (AI), has demonstrated potential for providing real-time health information and alleviating the burden on healthcare workers. While ChatGPT has performed well on medical knowledge examinations, its capabilities in clinical decision-making remain uncertain.
Objective:
To evaluate the potential value of ChatGPT in medical diagnosis.
Methods:
The diagnostic accuracy of ChatGPT was compared among three groups: patients, questionnaire respondents, and physicians.
Results:
Diagnostic accuracy was lowest in the patient group (True: 19.1%, False: 80.9%), highest in the physician group (True: 59.6%, False: 39.6%), and intermediate in the questionnaire group (True: 51.1%, False: 48.9%). The difference between the patient group and the other two groups was statistically significant (p<0.05). Across all disease categories, diagnostic accuracy was highest for appendicitis and pancreatitis, while gastrointestinal tumors were difficult to diagnose accurately in all groups.
Conclusions:
This study reveals that ChatGPT demonstrates promising diagnostic accuracy in abdominal pain-related diseases when provided with detailed information. However, limitations in patient self-expression, information-gathering, and humanistic care prevent it from fully replacing doctors. Further development and research are needed to enhance AI's role in assisting medical professionals and providing medical consultation services to patients. Clinical Trial: none
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft other than for review purposes.