JMIR Preprints #86974: Supporting Radiology Resident Education and Clinical Decision-Making with Large Language Models: Comparative Study of Reasoning Models DeepSeek-R1 and ChatGPT-o1

Semil Eminovic;
Robin Schmidt;
Bogdan Levita;
Maximilian Lindholz;
Anna-Maria Haack;
Alina Burdenski;
Maurice Bui;
Isabel Theresa Schobert;
Andrea Dell’Orco;
Jawed Nawabi;
Tobias Penzkofer

ABSTRACT

Background:

Radiology trainees require efficient, accurate, and accessible resources to master complex imaging techniques and identify the findings that guide clinical decision-making. Large language models (LLMs) are emerging as promising tools for medical education and clinical workflows, with the potential to enhance learning through instant feedback, support for diagnostic accuracy, and personalized learning experiences. However, systematic comparisons of LLMs for radiology education and clinical support remain limited, particularly regarding differences across subspecialties and resident experience levels.

Objective:

This study aimed to evaluate and compare the response quality of two state-of-the-art reasoning LLMs, DeepSeek-R1 and ChatGPT-o1, as support tools for clinical practice and radiology residency, assessing performance across clinical and didactic dimensions for both text- and image-based responses.

Methods:

Twenty-seven radiology questions covering nine radiological subspecialties were answered by both LLMs. Six additional image-based questions were presented only to ChatGPT-o1 because of its image processing capabilities. Seven radiology residents (postgraduate years 2-5) independently rated the responses on a 5-point Likert scale across nine criteria grouped into three dimensions (factual accuracy, clinical practicality, and didactic value). Statistical analyses compared the two LLMs, rater experience levels, and, for ChatGPT-o1, text-based versus image-based responses.
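The abstract does not name the statistical tests used, so the following is only a minimal sketch of how ratings with this design could be aggregated and compared. It assumes each rater's scores for one response are stored as a dictionary of criterion names to 1-5 Likert values, and it swaps in an exact paired sign-flip permutation test as a stand-in for whatever test the authors applied. The function names, criterion keys, and toy scores are all hypothetical.

```python
from itertools import product
from statistics import mean


def dimension_score(ratings, criteria):
    """Mean 1-5 Likert rating over the given criteria, pooled across raters.

    `ratings` is a list with one dict per rater (hypothetical layout).
    """
    return mean(r[c] for r in ratings for c in criteria)


def paired_sign_flip_p(scores_a, scores_b):
    """Exact two-sided sign-flip permutation p-value for paired scores.

    Enumerates all 2**n sign patterns, so it is practical only for small n;
    a larger question set would need random sampling of sign patterns.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    hits = total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        permuted = abs(mean(s * d for s, d in zip(signs, diffs)))
        if permuted >= observed - 1e-12:  # tolerance for float round-off
            hits += 1
    return hits / total


# Toy data, invented for illustration only (not study results).
one_response = [
    {"c1": 5, "c2": 4, "c3": 4},  # rater 1
    {"c1": 4, "c2": 4, "c3": 3},  # rater 2
]
model_a = [4.5, 4.0, 4.2, 3.9]  # per-question mean scores, hypothetical
model_b = [3.5, 3.8, 3.6, 3.4]

if __name__ == "__main__":
    print(dimension_score(one_response, ["c1", "c2", "c3"]))
    print(paired_sign_flip_p(model_a, model_b))
```

Pairing by question reflects the fact that both LLMs answered the same 27 text-based questions; the six image-based questions, answered by ChatGPT-o1 only, would need an unpaired comparison instead.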

Results:

DeepSeek-R1 consistently outperformed ChatGPT-o1 across all rating dimensions, with highly significant differences in all criteria (P<.001). DeepSeek-R1 also descriptively outperformed ChatGPT-o1 in every subspecialty. With both LLMs pooled, junior residents tended to rate slightly higher than senior residents in seven of nine criteria, although the differences were not statistically significant. For ChatGPT-o1 alone, however, junior residents gave significantly higher overall scores across all criteria (P=.017). Image-based responses by ChatGPT-o1 scored significantly lower than its text-based responses (P=.007), particularly in factual accuracy (P<.001) and clinical practicality (P=.025).

Conclusions:

Both DeepSeek-R1 and ChatGPT-o1 demonstrate promising potential in enhancing radiology education, with DeepSeek-R1 outperforming ChatGPT-o1 in all evaluated criteria. The results emphasize the value of open-source LLMs in radiology training, offering significant insights into how these models can be integrated into educational and clinical environments. Future research should explore the refinement of these models, particularly in the domain of image-based responses, to further optimize their role in resident training and clinical support.

Citation

Please cite as:

Eminovic S, Schmidt R, Levita B, Lindholz M, Haack AM, Burdenski A, Bui M, Schobert IT, Dell’Orco A, Nawabi J, Penzkofer T. Supporting Radiology Resident Education and Clinical Decision-Making With Large Language Models: Comparative Study of Reasoning Models DeepSeek-R1 and ChatGPT-o1. JMIR AI 2026;5:e86974

DOI: 10.2196/86974

PMID: 42361338

PMCID: 13309062

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: JMIR AI

Date Submitted: Nov 2, 2025

Date Accepted: Apr 30, 2026
