Evaluating ChatGPT's Efficacy in Pediatric Pneumonia Detection from Chest X-rays: A Comparative Analysis with Specialized AI Models
ABSTRACT
Background:
Machine learning and artificial intelligence (AI) have had a large impact on medicine, especially radiology. One such impact is the potential to help diagnose disease by applying machine-learning image recognition to patient diagnostic images. AI in medical imaging could increase radiology efficiency, reduce diagnostic errors, or even fill the role of a radiologist when one is not available.
Objective:
AI, particularly image recognition, has shown promise in assisting diagnosis from patient diagnostic images, but its reliability and long-term role remain uncertain. The goal of this study is to assess the diagnostic capability and reliability of ChatGPT in classifying pediatric chest X-rays as normal or pneumonia.
Methods:
To assess how effective AI would be at diagnostic imaging, ChatGPT was asked to categorize 1000 pediatric chest X-ray images as either normal or pneumonia, and its accuracy was measured against the confirmed diagnoses. The study did not require Institutional Review Board (IRB) approval because it did not involve patient consent or direct interaction with individuals.
Results:
The results show that while ChatGPT-4 Turbo has high specificity and ChatGPT-4o has high sensitivity, the accuracy of these large language models (LLMs) is poor compared with a pneumonia-specific machine-learning algorithm.
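For readers unfamiliar with the metrics named above, the following is a minimal sketch of how sensitivity, specificity, and accuracy can be computed from a model's predictions and the confirmed diagnoses. The function name `binary_metrics`, the label strings, and the four-image toy input are illustrative assumptions and are not the study's code or data.

```python
from typing import Dict, Sequence


def binary_metrics(
    confirmed: Sequence[str],
    predicted: Sequence[str],
    positive: str = "pneumonia",  # assumed label for the disease class
) -> Dict[str, float]:
    """Compute sensitivity, specificity, and accuracy for a two-class task."""
    if len(confirmed) != len(predicted):
        raise ValueError("confirmed and predicted must have the same length")

    tp = fp = tn = fn = 0
    for truth, guess in zip(confirmed, predicted):
        if truth == positive and guess == positive:
            tp += 1  # pneumonia correctly identified
        elif truth == positive:
            fn += 1  # pneumonia missed
        elif guess == positive:
            fp += 1  # normal image flagged as pneumonia
        else:
            tn += 1  # normal image correctly identified

    def ratio(numerator: int, denominator: int) -> float:
        # Return NaN rather than raising when a class is absent.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


# Toy input for illustration only; these are not results from the study.
toy_confirmed = ["pneumonia", "pneumonia", "normal", "normal"]
toy_predicted = ["pneumonia", "normal", "normal", "normal"]
toy_metrics = binary_metrics(toy_confirmed, toy_predicted)
```

In this toy input, a model that leans toward "normal" scores perfectly on specificity while missing half of the pneumonia cases, which is why a single high rate can coexist with weak overall accuracy.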
Conclusions:
This research shows that ChatGPT-4 has limitations when used to diagnose pneumonia from chest radiographs. The model's strong bias towards a non-pneumonia diagnosis, limited ability to distinguish between the two classes, need for extensive prompt engineering, and lack of specialized medical knowledge suggest that it may not be suitable for clinical use in its current form.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.