JMIR Preprints #76800: Evaluating Large Multimodal Models in COVID-19 Pneumonia Detection: A Case Study Using Chest X-Rays

Evaluating Large Multimodal Models in COVID-19 Pneumonia Detection: A Case Study Using Chest X-Rays

Nitin Chetla;
Tamer Hage;
Tamer Hage;
Swapna Vaja;
Harshita Kacham;
Yasmeen Abisaab;
Rahul Reddy;
Sai Samayamanthula;
Varun Raja;
Kunal Sukhija

ABSTRACT

Background:

Recent advances in large language models (LLMs) have enabled the development of multimodal systems capable of interpreting both text and medical images. These models show promise in automating clinical tasks such as diagnostic image review. However, their real-world performance, especially in high-stakes scenarios like detecting COVID-19 pneumonia on chest X-rays (CXRs), remains underexplored.

Objective:

To assess the diagnostic accuracy of Gemini 2.0, a state-of-the-art multimodal LLM, in detecting COVID-19 pneumonia from CXRs and compare its performance to prior evaluations of ChatGPT-4 Turbo and ChatGPT-4o on the same dataset.

Methods:

We used the publicly available COVIDx CXR-4 dataset (n=20,000), equally divided between pneumonia-positive and negative cases. Each image was submitted to Gemini 2.0 via its API with a standardized diagnostic prompt. Output responses were analyzed to calculate accuracy, precision, recall, and F1-score. Results were compared with prior benchmark evaluations using ChatGPT models.
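The four metrics named above can be computed from paired ground-truth and predicted labels. The following is a minimal sketch, not the authors' analysis code: the function name `per_class_metrics`, the label strings, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred, positive):
    """Return overall accuracy plus precision, recall, and F1 for `positive`."""
    # Count each (true label, predicted label) pair once.
    counts = Counter(zip(y_true, y_pred))
    tp = sum(n for (t, p), n in counts.items() if t == positive and p == positive)
    fp = sum(n for (t, p), n in counts.items() if t != positive and p == positive)
    fn = sum(n for (t, p), n in counts.items() if t == positive and p != positive)
    correct = sum(n for (t, p), n in counts.items() if t == p)

    # Guard against empty denominators (e.g. the class is never predicted).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels (hypothetical, not taken from the dataset).
y_true = ["positive", "positive", "positive", "negative", "negative", "negative"]
y_pred = ["positive", "negative", "negative", "negative", "negative", "positive"]
metrics = per_class_metrics(y_true, y_pred, positive="positive")
print(metrics)
```

Calling the function a second time with `positive="negative"` gives the metrics for the pneumonia-negative class, which is how per-class figures like those in the Results would be obtained.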

Results:

Gemini 2.0 achieved an overall diagnostic accuracy of 45%. Precision and recall for pneumonia-positive cases were 34% and 11%, respectively. For pneumonia-negative cases, precision was 47% and recall 79%. Compared to ChatGPT-4 Turbo (54.1%) and ChatGPT-4o (61.2%), Gemini 2.0 demonstrated inferior performance on the same dataset.
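As a consistency check, the reported figures can be reproduced from an approximate confusion matrix. The sketch below rests on assumptions not stated in the abstract: that the rounded recall values are exact, that the 20,000 images split into exactly 10,000 per class, and that every image received a definite positive or negative label. The F1 values it derives are illustrative only and are not reported by the authors.

```python
# Assumed class sizes: n=20,000 split equally (per the Methods).
N_POS = 10_000
N_NEG = 10_000

# Reported recalls, treated here as exact (an assumption; they are rounded).
RECALL_POS = 0.11
RECALL_NEG = 0.79

# Reconstruct the confusion-matrix counts from the recalls.
tp = round(RECALL_POS * N_POS)  # positives correctly flagged
fn = N_POS - tp                 # positives missed
tn = round(RECALL_NEG * N_NEG)  # negatives correctly cleared
fp = N_NEG - tn                 # negatives wrongly flagged

accuracy = (tp + tn) / (N_POS + N_NEG)
precision_pos = tp / (tp + fp)
precision_neg = tn / (tn + fn)

# Derived, not reported: F1 = 2*TP / (2*TP + FP + FN) for each class.
f1_pos = 2 * tp / (2 * tp + fp + fn)
f1_neg = 2 * tn / (2 * tn + fn + fp)

print(f"accuracy={accuracy:.2%}")
print(f"precision_pos={precision_pos:.1%} precision_neg={precision_neg:.1%}")
print(f"f1_pos={f1_pos:.3f} f1_neg={f1_neg:.3f}")
```

Under these assumptions the reconstructed accuracy (45%) and precisions (about 34% and 47%) match the reported values, which suggests the low overall accuracy is driven mainly by missed positive cases rather than by false alarms.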

Conclusions:

Despite its multimodal capabilities, Gemini 2.0 underperformed compared to other LLMs in detecting COVID-19 pneumonia from CXRs, particularly in sensitivity. These findings underscore the limitations of current multimodal AI systems for clinical imaging and highlight the need for further development and validation prior to deployment in diagnostic settings.

Clinical Trial:

N/A

Citation

Please cite as:

Chetla N, Hage T, Hage T, Vaja S, Kacham H, Abisaab Y, Reddy R, Samayamanthula S, Raja V, Sukhija K

Evaluating Large Multimodal Models in COVID-19 Pneumonia Detection: A Case Study Using Chest X-Rays

JMIR Preprints. 30/04/2025:76800

DOI: 10.2196/preprints.76800

URL: https://preprints.jmir.org/preprint/76800

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Previously submitted to: JMIR Biomedical Engineering (no longer under consideration since Jul 08, 2025)

Date Submitted: Apr 30, 2025

Open Peer Review Period: Jun 2, 2025 - Jul 28, 2025

NOTE: This is an unreviewed Preprint
