Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Dec 18, 2023
Open Peer Review Period: Dec 25, 2023 - Feb 19, 2024
Date Accepted: Mar 13, 2024
Evaluating ChatGPT-4's Diagnostic Accuracy: Impact of Visual Data Integration
ABSTRACT
Background:
Several multimodal generative artificial intelligence (AI) systems, including ChatGPT-4 with vision (also known as ChatGPT-4V or ChatGPT-4Vision), accept image data alongside text data. However, how adding image data changes the diagnostic accuracy of ChatGPT-4 is unknown.
Objective:
We compared the diagnostic accuracy of ChatGPT-4 with vision, given text and images (intervention), against that of ChatGPT-4 without vision, given text only (control), for case descriptions derived from case reports.
Methods:
We used a dataset of case descriptions and final diagnoses derived from the American Journal of Case Reports, published from January 2022 to March 2023. We also extracted the figures and tables mentioned in the case descriptions as image data. We excluded nondiagnostic case reports, pediatric case reports, and case reports without figures or tables in their case descriptions. From the case descriptions and images, ChatGPT-4 with vision generated differential-diagnosis lists. We compared its diagnostic accuracy with that of ChatGPT-4 without vision, which received the same case descriptions without images. Two physicians independently evaluated whether the final diagnosis was included in each list; discrepancies were resolved by a third physician.
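The outcome measure above is whether the final diagnosis appears in a model-generated differential-diagnosis list. In the study this judgment was made by physicians, not by software; the sketch below is only a naive string-matching illustration of the inclusion check, with made-up diagnoses.

```python
# Hypothetical sketch of the evaluation step: does the final diagnosis
# appear in a generated differential-diagnosis list? The study used two
# independent physician raters for this judgment; naive case-insensitive
# substring matching here is for illustration only.

def diagnosis_in_list(final_diagnosis: str, differential_list: list[str]) -> bool:
    """Return True if the final diagnosis matches any list entry."""
    target = final_diagnosis.strip().lower()
    return any(target in candidate.lower() for candidate in differential_list)

# Example with invented data (not from the study):
differentials = ["Acute appendicitis", "Mesenteric adenitis", "Ovarian torsion"]
print(diagnosis_in_list("acute appendicitis", differentials))  # True
```

Real clinical adjudication must handle synonyms and differing granularity (e.g., "MI" vs. "myocardial infarction"), which is why the study relied on physician evaluation rather than automated matching.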
Results:
A total of 363 case descriptions were included. The rate of final diagnoses appearing within the top 10 differential-diagnosis lists generated by ChatGPT-4 with vision was 85.1% (309/363), not significantly different from the 87.9% (319/363) achieved by ChatGPT-4 without vision (P=.33). The rate of final diagnoses ranked as the top diagnosis by ChatGPT-4 with vision was 44.4% (161/363), inferior to the 55.9% (203/363) achieved by ChatGPT-4 without vision (P=.002).
Conclusions:
The rates of final diagnoses within the differential-diagnosis lists generated by ChatGPT-4 with vision did not improve compared with those generated without vision. The rate of final diagnoses as the top diagnosis generated by ChatGPT-4 with vision was inferior to that without vision. These results suggest that ChatGPT-4 with vision, a multimodal generative AI system, relies mainly on text data when generating differentials, even though it accepts image data. Multimodal generative AI systems should be further developed to better integrate clinical data and improve diagnostic performance before being used in medicine. Clinical Trial: Not applicable
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have granted JMIR Publications an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft other than for review purposes.