Previously submitted to: JMIR AI (no longer under consideration since Aug 14, 2023)

Date Submitted: Apr 27, 2023

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Diagnostic accuracy of ChatGPT and physicians in patients with abdominal pain: a cohort study

  • Weijian Lun
  • Canhua Luo
  • Yongjia Liu
  • Huang Wei Chen
  • Guoyin Li

ABSTRACT

Background:

Economic growth has increased the demand for healthcare resources, but has also led to challenges such as lengthy appointment waiting times and a shortage of medical professionals. The uneven distribution of medical infrastructure in some regions has resulted in limited healthcare services in rural or impoverished areas. ChatGPT-3.5, the latest and most popular conversational artificial intelligence (AI), has demonstrated its potential in providing real-time health information and alleviating the burden on healthcare workers. While ChatGPT has performed well in medical knowledge examinations, its capabilities in clinical decision-making remain uncertain.

Objective:

To evaluate the potential value of ChatGPT in medical diagnosis.

Methods:

The diagnostic accuracy of ChatGPT was compared among three groups: patients, questionnaire respondents, and physicians.

Results:

Across the three groups (patients, questionnaire respondents, and physicians), the diagnostic accuracy of ChatGPT was lowest in the patient group (True: 19.1%, False: 80.9%), highest in the physician group (True: 59.6%, False: 39.6%), and moderate in the questionnaire group (True: 51.1%, False: 48.9%). The difference between the patient group and each of the other groups was statistically significant (p<0.05). Among all disease categories, diagnostic accuracy was highest for appendicitis and pancreatitis, while gastrointestinal tumors were difficult to diagnose accurately in every group.
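As a rough illustration of how a between-group difference in accuracy of this kind can be checked, the sketch below runs a Pearson chi-square test on a 2x2 table of correct and incorrect diagnoses. The abstract does not state which statistical test was used or how many cases each group contained, so both the choice of test and the counts (100 cases per group) are assumptions made purely for illustration, and the function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(correct_a, total_a, correct_b, total_b):
    """Pearson chi-square test (no continuity correction) for two proportions.

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    wrong_a = total_a - correct_a
    wrong_b = total_b - correct_b
    n = total_a + total_b
    correct_all = correct_a + correct_b
    wrong_all = wrong_a + wrong_b
    if min(total_a, total_b, correct_all, wrong_all) == 0:
        raise ValueError("every row and column total must be non-zero")
    # Shortcut formula for a 2x2 table: n * (ad - bc)^2 / product of marginals.
    numerator = n * (correct_a * wrong_b - wrong_a * correct_b) ** 2
    denominator = total_a * total_b * correct_all * wrong_all
    statistic = numerator / denominator
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# ASSUMPTION: hypothetical counts, not the study's data. The group size of
# 100 is a placeholder chosen only so the proportions resemble those reported.
n_cases = 100
patient_correct = 19
physician_correct = 60

stat, p = chi_square_2x2(patient_correct, n_cases, physician_correct, n_cases)
print(f"chi-square = {stat:.2f}, p = {p:.2g}")
```

With real data, the per-group counts of correct and incorrect diagnoses would replace the placeholder values; for small groups, an exact test may be more appropriate than the chi-square approximation.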

Conclusions:

This study reveals that ChatGPT demonstrates promising diagnostic accuracy in abdominal pain-related diseases when provided with detailed information. However, limitations in patient self-expression, information-gathering, and humanistic care prevent it from fully replacing doctors. Further development and research are needed to enhance AI's role in assisting medical professionals and providing medical consultation services to patients.

Clinical Trial: none


 Citation

Please cite as:

Lun W, Luo C, Liu Y, Chen HW, Li G

Diagnostic accuracy of ChatGPT and physicians in patients with abdominal pain: a cohort study

JMIR Preprints. 27/04/2023:48540

DOI: 10.2196/preprints.48540

URL: https://preprints.jmir.org/preprint/48540

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.