JMIR Preprints #60226: Performance of the Generative Artificial Intelligence Chatbot in Ophthalmic Registration and Clinical Diagnosis: a Cross-sectional Study

Performance of the Generative Artificial Intelligence Chatbot in Ophthalmic Registration and Clinical Diagnosis: a Cross-sectional Study

Shuai Ming;
Xi Yao;
Xiaohong Guo;
Qingge Guo;
Kunpeng Xie;
Dandan Chen;
Bo Lei

ABSTRACT

Background:

Artificial intelligence (AI) chatbots such as ChatGPT are expected to impact vision healthcare significantly. Their potential to optimize the consultation process, and their diagnostic capabilities across the range of ophthalmic sub-specialties, remain to be fully explored.

Objective:

To investigate the performance of AI chatbots in recommending ophthalmic outpatient registration and in diagnosing eye diseases within clinical case profiles.

Methods:

This cross-sectional study used clinical cases from the Chinese Standardized Resident Training (SRT) - Ophthalmology (2nd Edition). For each case, two profiles were created: “Patient with History” (Hx) and “Patient with History and Examination” (Hx + Ex). These profiles were submitted as independent queries to ChatGPT-3.5 and ChatGPT-4.0 (accessed March 5-18, 2024). The same profiles were presented to three ophthalmic residents in questionnaire format. The accuracy of recommending ophthalmic sub-specialty registration was primarily evaluated using “Hx” profiles. The accuracy of the top-ranked diagnosis, and the accuracy of diagnosis within the top three suggestions (do-not-miss diagnosis), were assessed using “Hx + Ex” profiles. The gold standard for judgment was the published official diagnosis. Characteristics of incorrect diagnoses by ChatGPT were also analyzed.
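The two diagnostic metrics described above (top-ranked accuracy and accuracy within the top three suggestions) can be illustrated with a minimal Python sketch. It is not the authors' evaluation procedure: in the study, responses were judged against the published official diagnosis, whereas this sketch assumes a simple case-insensitive string match. The function name `top_k_accuracy` and all example diagnoses are hypothetical.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    gold: Sequence[str],
    k: int,
) -> float:
    """Fraction of cases whose gold diagnosis appears in the first k suggestions.

    Assumption: a suggestion counts as correct on an exact, case-insensitive
    string match. The study relied on judgment against the published official
    diagnosis, which this sketch does not reproduce.
    """
    if len(ranked_predictions) != len(gold):
        raise ValueError("each case needs exactly one gold diagnosis")
    if not gold:
        raise ValueError("at least one case is required")
    hits = 0
    for suggestions, answer in zip(ranked_predictions, gold):
        top_k = [s.strip().lower() for s in suggestions[:k]]
        if answer.strip().lower() in top_k:
            hits += 1
    return hits / len(gold)


# Hypothetical ranked suggestions for four made-up cases (not study data).
example_predictions = [
    ["primary angle-closure glaucoma", "uveitis", "keratitis"],
    ["conjunctivitis", "keratitis", "uveitis"],
    ["cataract", "glaucoma", "retinal detachment"],
    ["dry eye", "blepharitis", "conjunctivitis"],
]
example_gold = [
    "primary angle-closure glaucoma",
    "keratitis",
    "retinal detachment",
    "optic neuritis",
]

top1 = top_k_accuracy(example_predictions, example_gold, k=1)
top3 = top_k_accuracy(example_predictions, example_gold, k=3)
```

With `k=1` the function gives the top-ranked diagnostic accuracy, and with `k=3` it gives the "do-not-miss" accuracy, so the second value can never be lower than the first for the same set of responses.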

Results:

A total of 208 clinical profiles from 12 ophthalmic sub-specialties were analyzed (104 “Hx” and 104 “Hx + Ex”). For “Hx” cases, GPT-3.5, GPT-4.0, and residents showed comparable accuracy in registration suggestions (63.5%, 77.9%, and 69.2%, respectively, P = 0.073), with ocular trauma, retinal diseases, and strabismus and amblyopia achieving the three highest accuracies. For “Hx + Ex” cases, both GPT-4.0 and residents demonstrated higher diagnostic accuracy than GPT-3.5 (59.6% and 60.6% vs. 39.4%, P = 0.003 and P = 0.001). Accuracy for “do-not-miss” diagnoses was likewise higher for GPT-4.0 and residents than for GPT-3.5 (76.0% and 65.4% vs. 49.0%, P < 0.001 and P = 0.015). The highest diagnostic accuracies were observed in glaucoma, lens diseases, and eyelid/lacrimal/orbital diseases. GPT-4.0 recorded fewer incorrect top-3 diagnoses (59.5% vs. 84.1%, P = 0.005) and more partially correct diagnoses (50% vs. 11.1%, P < 0.001) than GPT-3.5, while GPT-3.5 had more completely incorrect (42.9% vs. 16.7%, P = 0.005) and less precise diagnoses (34.9% vs. 11.9%, P = 0.009).
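As a reading aid, the reported percentages for the two ChatGPT versions can be converted back into approximate case counts. The sketch below assumes a denominator of 104 cases per profile type and one response per case for each model; the resulting counts are inferred by arithmetic and are not reported in the abstract. Residents are left out because the abstract does not state whether their accuracy was pooled across the three residents. The names `REPORTED_PERCENT` and `implied_count` are hypothetical.

```python
N_CASES = 104  # cases per profile type, as stated in the Results

# Percentages copied from the abstract; keys are labels chosen for this sketch.
REPORTED_PERCENT = {
    "GPT-3.5": {"registration": 63.5, "top1": 39.4, "top3": 49.0},
    "GPT-4.0": {"registration": 77.9, "top1": 59.6, "top3": 76.0},
}


def implied_count(percent: float, n: int = N_CASES) -> int:
    """Nearest whole number of correct cases consistent with a percentage.

    Assumption: the percentage was computed as correct / n with n = 104.
    """
    return round(percent * n / 100)


# Inferred counts (not reported values) for each model and metric.
implied = {
    model: {metric: implied_count(pct) for metric, pct in metrics.items()}
    for model, metrics in REPORTED_PERCENT.items()
}

# Round-trip check: each inferred count reproduces the reported percentage
# when rounded to one decimal place.
round_trip_ok = all(
    round(100 * implied[model][metric] / N_CASES, 1) == pct
    for model, metrics in REPORTED_PERCENT.items()
    for metric, pct in metrics.items()
)
```

Under these assumptions, GPT-4.0's top-ranked diagnostic accuracy of 59.6% corresponds to 62 of 104 cases and GPT-3.5's 39.4% to 41 of 104.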

Conclusions:

GPT-3.5 and GPT-4.0 showed intermediate performance in recommending ophthalmic sub-specialties for registration. While GPT-3.5 underperformed, GPT-4.0 approached and numerically surpassed residents in differential diagnosis. AI chatbots show promise in facilitating ophthalmic patient registration. However, their integration into diagnostic decision-making requires further validation.

Citation

Please cite as:

Ming S, Yao X, Guo X, Guo Q, Xie K, Chen D, Lei B

Performance of ChatGPT in Ophthalmic Registration and Clinical Diagnosis: Cross-Sectional Study

J Med Internet Res 2024;26:e60226

DOI: 10.2196/60226

PMID: 39541581

PMCID: 11605262

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: May 5, 2024

Date Accepted: Oct 15, 2024
