JMIR Preprints #54985: The Diagnostic Ability of GPT-3.5, GPT-4.0 in Surgery: A Comparative Analysis


The Diagnostic Ability of GPT-3.5, GPT-4.0 in Surgery: A Comparative Analysis

Jiayu Liu;
Xiuting Liang;
Dandong Fang;
Jiqi Zheng;
Chengliang Yin;
Hui Xie;
Yanteng Li;
Xiaochun Sun;
Yue Tong;
Hebin Che;
Ping Hu;
Fan Yang;
Bingxian Wang;
Yuanyuan Chen;
Gang Cheng;
Jianning Zhang

ABSTRACT

Background:

ChatGPT has shown great potential in clinical diagnosis and could become an excellent auxiliary tool in clinical practice. This study evaluates the diagnostic capabilities of ChatGPT by comparing the performance of two model iterations, GPT-3.5 and GPT-4.0.

Objective:

This study aims to evaluate the precise diagnostic ability of ChatGPT for colon cancer and its potential as an auxiliary diagnostic tool for surgeons.

Methods:

We conducted a prospective study to investigate whether GPT-4.0 or GPT-3.5 can provide an accurate diagnosis based on clinical information from a public case database. The results were analyzed by the specialty classification of the items and by error pattern (medical histories, symptoms, physical signs, examination results, and intraoperative findings).

Results:

For primary diagnoses, the accuracy rates of GPT-4.0 were significantly higher than those of GPT-3.5 (0.972±0.137 vs. 0.855±0.335, p<0.001). For secondary diagnoses, the accuracy rates of GPT-4.0 were also significantly higher than those of GPT-3.5 (0.908±0.159 vs. 0.617±0.349, p<0.001). GPT-3.5 has limitations in processing patient history, symptom presentation, laboratory tests, and imaging data effectively. Classifying and analyzing the causes of misdiagnosis showed that GPT-4.0, while improving upon GPT-3.5, still has limitations in identifying symptoms and laboratory test data. For both primary and secondary diagnoses, there were no significant differences between age, sex, and system groups for either GPT-4.0 or GPT-3.5 (p>0.05).

Conclusions:

The findings of this study indicate that ChatGPT has potential in the field of medical diagnosis. The diagnostic accuracy of GPT-4.0 is better than that of GPT-3.5, but GPT-4.0 still has limitations in recognizing patient symptoms and laboratory data. Further studies in dynamic clinical practice environments are needed.

Citation

Please cite as:

Liu J, Liang X, Fang D, Zheng J, Yin C, Xie H, Li Y, Sun X, Tong Y, Che H, Hu P, Yang F, Wang B, Chen Y, Cheng G, Zhang J

The Diagnostic Ability of GPT-3.5 and GPT-4.0 in Surgery: Comparative Analysis

J Med Internet Res 2024;26:e54985

DOI: 10.2196/54985

PMID: 39255016

PMCID: 11422746

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Nov 29, 2023

Open Peer Review Period: Nov 29, 2023 - Jan 24, 2024

Date Accepted: Jul 24, 2024


Per the author's request the PDF is not available.
