Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Nov 29, 2023
Open Peer Review Period: Nov 29, 2023 - Jan 24, 2024
Date Accepted: Jul 24, 2024

The final, peer-reviewed published version of this preprint can be found here:

Liu J, Liang X, Fang D, Zheng J, Yin C, Xie H, Li Y, Sun X, Tong Y, Che H, Hu P, Yang F, Wang B, Chen Y, Cheng G, Zhang J

The Diagnostic Ability of GPT-3.5 and GPT-4.0 in Surgery: Comparative Analysis

J Med Internet Res 2024;26:e54985

DOI: 10.2196/54985

PMID: 39255016

PMCID: 11422746

The Diagnostic Ability of GPT-3.5, GPT-4.0 in Surgery: A Comparative Analysis

  • Jiayu Liu
  • Xiuting Liang
  • Dandong Fang
  • Jiqi Zheng
  • Chengliang Yin
  • Hui Xie
  • Yanteng Li
  • Xiaochun Sun
  • Yue Tong
  • Hebin Che
  • Ping Hu
  • Fan Yang
  • Bingxian Wang
  • Yuanyuan Chen
  • Gang Cheng
  • Jianning Zhang

ABSTRACT

Background:

ChatGPT has shown great potential in clinical diagnosis and could become an excellent auxiliary tool in clinical practice. In this study, we evaluate ChatGPT's diagnostic capabilities by comparing the performance of two model iterations, GPT-3.5 and GPT-4.0.

Objective:

This study aims to evaluate the precise diagnostic ability of ChatGPT for colon cancer and its potential as an auxiliary diagnostic tool for surgeons.

Methods:

We conducted a prospective study to investigate whether GPT-4.0 or GPT-3.5 can provide an accurate diagnosis based on clinical information from a public case database. The results were analyzed by the specialty classification of the items and by error pattern (medical histories, symptoms, physical signs, examination results, and intraoperative findings).

Results:

For primary diagnoses, the accuracy of GPT-4.0 was significantly higher than that of GPT-3.5 (0.972±0.137 vs. 0.855±0.335, p<0.001). For secondary diagnoses, the accuracy of GPT-4.0 was also significantly higher than that of GPT-3.5 (0.908±0.159 vs. 0.617±0.349, p<0.001). GPT-3.5 has limitations in processing patient history, symptom presentation, laboratory tests, and imaging data effectively. Classification and analysis of the causes of misdiagnosis showed that GPT-4.0, while improving upon GPT-3.5, still has limitations in identifying symptoms and laboratory test data. For both primary and secondary diagnoses, accuracy did not differ significantly by age, sex, or system group for either GPT-4.0 or GPT-3.5 (p>0.05).
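The accuracy figures above are reported as mean ± standard deviation with a p-value for the comparison between models. The abstract does not state which statistical test was used, so the following is only a minimal sketch of how such a summary could be computed: it assumes each case receives a per-case score between 0 and 1 from both models, and it swaps in a paired sign-flip permutation test as the comparison. The function names and the toy scores are hypothetical and are not taken from the study.

```python
import random
import statistics


def summarize(scores):
    """Return (mean, sample standard deviation) of per-case accuracy scores."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_permutation_p(a, b, n_iter=10_000, seed=0):
    """Two-sided sign-flip permutation test on paired per-case differences.

    Assumption: a[i] and b[i] are scores of two models on the same case.
    """
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        # Randomly flip the sign of each paired difference.
        total = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(total) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_iter + 1)


# Hypothetical toy scores, NOT study data
# (1 = correct, 0.5 = partially correct, 0 = incorrect).
model_a = [1, 1, 1, 1, 0.5, 1, 1, 1, 1, 1]
model_b = [1, 0, 1, 0.5, 0, 1, 1, 0, 1, 0.5]

if __name__ == "__main__":
    for name, scores in (("model_a", model_a), ("model_b", model_b)):
        mean, sd = summarize(scores)
        print(f"{name}: {mean:.3f} ± {sd:.3f}")
    print("p =", paired_permutation_p(model_a, model_b))
```

A paired test is used in this sketch because both models would be scored on the same cases; if the scores were not paired, an unpaired comparison would be the more appropriate choice.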

Conclusions:

The findings of this study show that ChatGPT has potential in the field of medical diagnosis. The diagnostic accuracy of GPT-4.0 is better than that of GPT-3.5, but GPT-4.0 still has limitations in recognizing patient symptoms and laboratory data. Further studies in dynamic clinical practice environments are needed.

Per the author's request the PDF is not available.

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.