JMIR Preprints #67621: Evaluating ChatGPT's Efficacy in Pediatric Pneumonia Detection from Chest X-rays: A Comparative Analysis with Specialized AI Models

Evaluating ChatGPT's Efficacy in Pediatric Pneumonia Detection from Chest X-rays: A Comparative Analysis with Specialized AI Models

Nitin Chetla;
Mihir Tandon;
Joseph Chang;
Kunal Sukhija;
Romil Patel;
Ramon Sanchez

ABSTRACT

Background:

Machine learning and artificial intelligence (AI) have had a large impact on medicine, especially radiology. One such impact is the potential to help diagnose disease by applying machine-learning image recognition to patient diagnostic images. The importance of AI in medical imaging can be seen in its use to increase radiology efficiency, reduce diagnostic errors, and even fill the role of a radiologist when one is not available.

Objective:

Machine learning and artificial intelligence (AI) have shown promise in medicine, particularly in radiology. Although AI, especially through image recognition, has demonstrated the ability to assist in diagnosing diseases from patient diagnostic images, its reliability and long-term role are far from certain. The goal of this study is to assess its diagnostic capability and reliability in detecting pneumonia on pediatric chest x-rays.

Methods:

To better understand how effective artificial intelligence would be at diagnostic imaging, ChatGPT was asked to categorize 1000 pediatric lung x-ray images as either normal or pneumonia, and its accuracy was measured against the confirmed diagnoses. The study did not require Institutional Review Board (IRB) approval because it did not involve patient consent or direct interaction with individuals.

Results:

The results show that although ChatGPT-4 Turbo and ChatGPT-4o have high specificity and high sensitivity, respectively, the accuracy of these large language models (LLMs) is poor compared with that of a pneumonia-specific machine-learning algorithm.
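For readers less familiar with the metrics named here, the following is a minimal Python sketch of how sensitivity, specificity, and accuracy can be computed when a model's labels are compared with confirmed diagnoses. It is not the authors' code: the function name `evaluate`, the `BinaryMetrics` container, and the eight toy labels are hypothetical and chosen only for illustration, and the numbers it prints are not results from this study.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    sensitivity: float  # TP / (TP + FN): share of pneumonia cases detected
    specificity: float  # TN / (TN + FP): share of normal cases recognized
    accuracy: float     # (TP + TN) / all cases


def evaluate(predicted, confirmed, positive="pneumonia"):
    """Compare predicted labels with confirmed diagnoses (hypothetical helper)."""
    if len(predicted) != len(confirmed):
        raise ValueError("label lists must have the same length")

    tp = fp = tn = fn = 0
    for pred, truth in zip(predicted, confirmed):
        if truth == positive:
            if pred == positive:
                tp += 1  # pneumonia correctly flagged
            else:
                fn += 1  # pneumonia missed
        else:
            if pred == positive:
                fp += 1  # normal image flagged as pneumonia
            else:
                tn += 1  # normal image correctly cleared

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must appear in the confirmed labels")

    return BinaryMetrics(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        accuracy=(tp + tn) / len(confirmed),
    )


# Toy labels, invented for illustration only (NOT study data).
confirmed = ["pneumonia"] * 4 + ["normal"] * 4
predicted = ["pneumonia", "normal", "normal", "normal",
             "normal", "normal", "normal", "pneumonia"]

metrics = evaluate(predicted, confirmed)
print(metrics)
```

In this toy case the classifier leans toward "normal", so specificity is high while sensitivity is low. This shows why a single high metric does not by itself imply good overall accuracy.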

Conclusions:

As this research shows, ChatGPT-4 has limitations when used to diagnose pneumonia from chest x-rays. The model's strong bias towards a non-pneumonia diagnosis, limited ability to distinguish between the two classes, need for extensive prompt engineering, and lack of specialized medical knowledge suggest that it may not be suitable for clinical use in its current form.

Citation

Please cite as:

Chetla N, Tandon M, Chang J, Sukhija K, Patel R, Sanchez R

Evaluating ChatGPT’s Efficacy in Pediatric Pneumonia Detection From Chest X-Rays: Comparative Analysis of Specialized AI Models

JMIR AI 2025;4:e67621

DOI: 10.2196/67621

PMID: 39793007

PMCID: 11759907

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR AI

Date Submitted: Oct 16, 2024

Date Accepted: Dec 4, 2024
