Accepted for/Published in: JMIR AI

Date Submitted: Oct 16, 2024
Date Accepted: Dec 4, 2024

The final, peer-reviewed published version of this preprint can be found here:

Evaluating ChatGPT’s Efficacy in Pediatric Pneumonia Detection From Chest X-Rays: Comparative Analysis of Specialized AI Models

Chetla N, Tandon M, Chang J, Sukhija K, Patel R, Sanchez R

JMIR AI 2025;4:e67621

DOI: 10.2196/67621

PMID: 39793007

PMCID: 11759907

Evaluating ChatGPT's Efficacy in Pediatric Pneumonia Detection from Chest X-rays: A Comparative Analysis with Specialized AI Models

  • Nitin Chetla
  • Mihir Tandon
  • Joseph Chang
  • Kunal Sukhija
  • Romil Patel
  • Ramon Sanchez

ABSTRACT

Background:

Machine learning and artificial intelligence (AI) have had a large impact on medicine, especially radiology. One such impact is the potential to help diagnose disease by applying machine-learning image recognition to patient diagnostic images. The importance of AI in medical imaging can be seen in its use to increase radiology efficiency, reduce diagnostic errors, and even fill the role of a radiologist when one is not available.

Objective:

Machine learning and artificial intelligence (AI) have shown promise in medicine, particularly in radiology. While AI, especially through image recognition, has demonstrated the ability to assist in diagnosing diseases from patient diagnostic images, its reliability and long-term role are far from certain. The goal of this study is to assess ChatGPT's diagnostic capability and reliability on pediatric pneumonia chest X-rays.

Methods:

To better understand how effective AI would be at diagnostic imaging, ChatGPT was asked to categorize 1000 pediatric lung X-ray images as either normal or pneumonia, and its accuracy was measured against the confirmed diagnoses. The study did not require Institutional Review Board (IRB) approval because it did not involve patient consent or direct interaction with individuals.

Results:

The results show that while ChatGPT-4 Turbo has high specificity and ChatGPT-4o has high sensitivity, the accuracy of these large language models (LLMs) is poor compared with that of a pneumonia-specific machine-learning algorithm.
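
The abstract does not state how sensitivity, specificity, and accuracy were computed. As a minimal sketch, assuming the standard definitions for a two-class task with pneumonia as the positive class, the metrics could be derived from model outputs and confirmed diagnoses as follows. The function name `evaluate`, the `BinaryMetrics` container, and the toy labels are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    sensitivity: float  # share of confirmed pneumonia cases labeled pneumonia
    specificity: float  # share of confirmed normal cases labeled normal
    accuracy: float     # share of all images labeled correctly


def evaluate(predicted: Sequence[str], confirmed: Sequence[str],
             positive: str = "pneumonia") -> BinaryMetrics:
    """Compare predicted labels with confirmed diagnoses (hypothetical helper)."""
    if len(predicted) != len(confirmed):
        raise ValueError("predicted and confirmed must have the same length")

    tp = fp = tn = fn = 0
    for pred, truth in zip(predicted, confirmed):
        if truth == positive:
            if pred == positive:
                tp += 1  # pneumonia correctly detected
            else:
                fn += 1  # pneumonia missed
        else:
            if pred == positive:
                fp += 1  # normal image flagged as pneumonia
            else:
                tn += 1  # normal image correctly cleared

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must appear in the confirmed diagnoses")

    return BinaryMetrics(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        accuracy=(tp + tn) / len(confirmed),
    )


# Toy labels invented for illustration only; these are NOT data from the study.
confirmed = ["pneumonia"] * 4 + ["normal"] * 4
predicted = ["pneumonia", "normal", "normal", "normal"] + ["normal"] * 4
metrics = evaluate(predicted, confirmed)
print(metrics)
```

In this toy case the classifier leans heavily toward "normal", so specificity is perfect while sensitivity and overall accuracy stay low. This is the kind of pattern in which one strong metric can coexist with poor accuracy.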

Conclusions:

This research shows that ChatGPT-4 has limitations when used to diagnose pneumonia from chest X-rays. The model's strong bias toward a non-pneumonia diagnosis, limited ability to distinguish between the two classes, need for extensive prompt engineering, and lack of specialized medical knowledge suggest that it may not be suitable for clinical use in its current form.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.