
Previously submitted to: JMIR Medical Education (no longer under consideration since Dec 01, 2025)

Date Submitted: Jun 26, 2025
Open Peer Review Period: Sep 29, 2025 - Nov 24, 2025

NOTE: This is an unreviewed Preprint

Warning: This is an unreviewed preprint (What is a preprint?). Readers are warned that this document has not been peer-reviewed by expert or patient reviewers or an academic editor, may contain misleading claims, is likely to change before final publication if accepted, and may have been rejected or withdrawn (in which case a "no longer under consideration" note will appear above).

Peer review me: Readers with relevant interest and expertise are encouraged to sign up as peer reviewers if the paper is within an open peer-review period (in that case, a "Peer Review Me" button to sign up as a reviewer is displayed above). All preprints currently open for review are listed here. Outside of the formal open peer-review period, we encourage you to tweet about the preprint.

Citation: Please cite this preprint only for review purposes or for grant applications and CVs (if you are the author).

Final version: If our system detects a final peer-reviewed "version of record" (VoR) published in any journal, a link to that VoR will appear below. Readers are then encouraged to cite the VoR instead of this preprint.

Settings: If you are the author, you can log in and change the preprint display settings. However, the preprint URL/DOI is intended to be stable and citable, so it should not be removed once posted.

Submit: To post your own preprint, simply submit to any JMIR journal and choose the appropriate settings to expose your submitted version as a preprint.

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Large Language Models Enhance Diagnostic Reasoning of Medical Students in Rheumatology: Randomized Controlled Trial

  • Anna Roemer; 
  • Nadine Schlicker; 
  • Anna Kernder; 
  • Benedikt Albe; 
  • Juliana Hack; 
  • Martin Hirsch; 
  • Andreas Mayr; 
  • Sebastian Kuhn; 
  • Johannes Knitza

ABSTRACT

Background:

Although large language models (LLMs) have demonstrated promising diagnostic performance, it is uncertain whether their use improves diagnostic reasoning of medical students.

Objective:

To investigate the impact of an LLM on medical students’ diagnostic performance in rheumatology compared with traditional resources.

Methods:

This randomized controlled trial was conducted from January 7 to March 30, 2025, and recruited medical students from the University of Marburg, Germany. Participants provided a main diagnosis with corresponding diagnostic confidence and up to four additional differential diagnoses for three rheumatology case vignettes. Participants were randomized to use either the LLM in addition to traditional diagnostic resources or traditional resources only. The primary outcome was the proportion of cases with a correct top diagnosis. Secondary outcomes included the proportion of cases with a correct diagnosis among the top 5 suggestions, a cumulative diagnostic score, diagnostic confidence, and case completion time. Diagnostic suggestions were rated by blinded expert consensus.

Results:

A total of 68 participants (mean [SD] age, 24.8 [2.6] years) were randomized. Participants using the LLM identified the correct top diagnosis significantly more often than those in the control group (77.5% vs 32.4%), corresponding to an adjusted odds ratio of 7.0 (95% CI: [3.8, 14.4], P<.001), and also outperformed the LLM alone (77.5% vs 71.6%). Mean cumulative diagnostic scores were significantly higher in the LLM group (mean [SD], 12.3 [12.3]) than in the control group (6.7 [3.2]; Welch t₆₀.₂₂ = 8.1; P<.001). Diagnostic confidence was greater in the LLM group (mean 7.0 [SD 1.3]) than in the control group (mean 6.1 [SD 1.2]; P<.001). Case completion time was significantly longer in the LLM group (mean 505 seconds [SD 131]) than in the control group (mean 287 seconds [SD 106]; P<.001).
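The headline effect sizes above follow from standard formulas. As an illustrative sketch (not the authors' analysis code), the crude odds ratio can be recomputed from the reported proportions, and Welch's t statistic from group means, SDs, and sizes; note the paper's OR of 7.0 is covariate-adjusted, so the crude value is only expected to be close, not identical, and the group sizes used below are assumptions for illustration.

```python
import math

def odds(p: float) -> float:
    """Convert a proportion to odds."""
    return p / (1.0 - p)

def crude_odds_ratio(p_treat: float, p_control: float) -> float:
    """Unadjusted odds ratio for a binary outcome."""
    return odds(p_treat) / odds(p_control)

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two groups with unequal variances."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Correct top diagnosis: 77.5% (LLM group) vs 32.4% (control group).
print(round(crude_odds_ratio(0.775, 0.324), 1))  # ~7.2, near the adjusted OR of 7.0
```

The Welch test is the natural choice here because the two groups' score variances clearly differ; its degrees of freedom are fractional (60.22 in the paper), which is a hallmark of the Welch-Satterthwaite approximation.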

Conclusions:

In this randomized clinical trial, medical students using an LLM achieved significantly higher diagnostic accuracy than those using conventional resources. Students assisted by the LLM also outperformed the model alone, highlighting the potential of human-AI collaboration. These findings suggest that LLMs may help improve clinical reasoning in complex fields such as rheumatology. Clinical Trial: ClinicalTrials.gov Identifier: NCT06748170


 Citation

Please cite as:

Roemer A, Schlicker N, Kernder A, Albe B, Hack J, Hirsch M, Mayr A, Kuhn S, Knitza J

Large Language Models Enhance Diagnostic Reasoning of Medical Students in Rheumatology: Randomized Controlled Trial

JMIR Preprints. 26/06/2025:79716

DOI: 10.2196/preprints.79716

URL: https://preprints.jmir.org/preprint/79716


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.