
Previously submitted to: JMIR Medical Education (no longer under consideration since Dec 01, 2025)

Date Submitted: Jun 26, 2025
Open Peer Review Period: Sep 29, 2025 - Nov 24, 2025

NOTE: This is an unreviewed Preprint

Warning: This is an unreviewed preprint (What is a preprint?). Readers are warned that this document has not been peer-reviewed by expert or patient reviewers or an academic editor, may contain misleading claims, is likely to change before final publication if accepted, and may have been rejected or withdrawn (in which case a "no longer under consideration" note will appear above).

Peer review me: Readers with relevant interest and expertise are encouraged to sign up as peer reviewers if the paper is within an open peer-review period (in that case, a "Peer Review Me" button to sign up as a reviewer is displayed above). All preprints currently open for review are listed here. Outside of the formal open peer-review period, we encourage you to tweet about the preprint.

Citation: Please cite this preprint only for review purposes or for grant applications and CVs (if you are the author).

Final version: If our system detects a final peer-reviewed "version of record" (VoR) published in any journal, a link to that VoR will appear below. Readers are then encouraged to cite the VoR instead of this preprint.

Settings: If you are the author, you can log in and change the preprint display settings. However, the preprint URL/DOI is intended to be stable and citable, so it should not be removed once posted.

Submit: To post your own preprint, simply submit to any JMIR journal and choose the appropriate settings to expose your submitted version as a preprint.

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Large Language Models Enhance Diagnostic Reasoning of Medical Students in Rheumatology: Randomized Controlled Trial

  • Anna Roemer; 
  • Nadine Schlicker; 
  • Anna Kernder; 
  • Benedikt Albe; 
  • Juliana Hack; 
  • Martin Hirsch; 
  • Andreas Mayr; 
  • Sebastian Kuhn; 
  • Johannes Knitza

ABSTRACT

Background:

Although large language models (LLMs) have demonstrated promising diagnostic performance, it is uncertain whether their use improves diagnostic reasoning of medical students.

Objective:

To investigate the impact of an LLM on medical students’ diagnostic performance in rheumatology compared with traditional resources.

Methods:

This randomized controlled trial was conducted from January 7 to March 30, 2025, and recruited medical students from the University of Marburg, Germany. Participants provided a main diagnosis with corresponding diagnostic confidence and up to four additional differential diagnoses for three rheumatology case vignettes. Participants were randomized to use either the LLM in addition to traditional diagnostic resources or traditional resources only. The primary outcome was the proportion of cases with a correct top diagnosis. Secondary outcomes included the proportion of cases with a correct diagnosis among the top 5 suggestions, a cumulative diagnostic score, diagnostic confidence, and case completion time. Diagnostic suggestions were rated by blinded expert consensus.

Results:

A total of 68 participants (mean [SD] age, 24.8 [2.6] years) were randomized. Participants using the LLM identified the correct top diagnosis significantly more often than those in the control group (77.5% vs 32.4%), corresponding to an adjusted odds ratio of 7.0 (95% CI: [3.8, 14.4], P<.001), and also outperformed the LLM alone (77.5% vs 71.6%). Mean cumulative diagnostic scores were significantly higher in the LLM group (mean [SD], 12.3 [12.3]) than in the control group (6.7 [3.2]; Welch t₆₀.₂₂ = 8.1; P<.001). Diagnostic confidence was greater in the LLM group (mean 7.0 [SD 1.3]) than in the control group (mean 6.1 [SD 1.2]; P<.001). Case completion time was significantly longer in the LLM group (mean 505 seconds [SD 131]) than in the control group (mean 287 seconds [SD 106]; P<.001).
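The headline effect sizes above follow from standard formulas. As an illustrative sketch (not the authors' analysis code), the crude odds ratio can be recomputed from the reported proportions, and Welch's t statistic from group means, SDs, and sizes; note the paper's OR of 7.0 is covariate-adjusted, so the crude value is only expected to be close, not identical, and the group sizes used below are assumptions for illustration.

```python
import math

def odds(p: float) -> float:
    """Convert a proportion to odds."""
    return p / (1.0 - p)

def crude_odds_ratio(p_treat: float, p_control: float) -> float:
    """Unadjusted odds ratio for a binary outcome."""
    return odds(p_treat) / odds(p_control)

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two groups with unequal variances."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Correct top diagnosis: 77.5% (LLM group) vs 32.4% (control group).
print(round(crude_odds_ratio(0.775, 0.324), 1))  # ~7.2, near the adjusted OR of 7.0
```

The Welch test is the natural choice here because the two groups' score variances clearly differ; its degrees of freedom are fractional (60.22 in the paper), which is a hallmark of the Welch-Satterthwaite approximation.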

Conclusions:

In this randomized clinical trial, medical students using an LLM achieved significantly higher diagnostic accuracy than those using conventional resources. Students assisted by the LLM also outperformed the model alone, highlighting the potential of human-AI collaboration. These findings suggest that LLMs may help improve clinical reasoning in complex fields such as rheumatology. Clinical Trial: ClinicalTrials.gov Identifier: NCT06748170


 Citation

Please cite as:

Roemer A, Schlicker N, Kernder A, Albe B, Hack J, Hirsch M, Mayr A, Kuhn S, Knitza J

Large Language Models Enhance Diagnostic Reasoning of Medical Students in Rheumatology: Randomized Controlled Trial

JMIR Preprints. 26/06/2025:79716

DOI: 10.2196/preprints.79716

URL: https://preprints.jmir.org/preprint/79716


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.