JMIR Preprints #69752: A multicentric study comparing a medical LLM's performance with clinical experts in radiation oncology

A multicentric study comparing a medical LLM's performance with clinical experts in radiation oncology

Fabio Dennstädt;
Max Schmerder;
Elena Riggenbach;
Lucas Mose;
Katarina Bryjova;
Nicolas Bachmann;
Paul-Henry Mackeprang;
Maiwand Ahmadsei;
Dubravko Sinovcic;
Paul Windisch;
Daniel Zwahlen;
Susanne Rogers;
Oliver Riesterer;
Martin Maffei;
Eleni Gkika;
Hathal Haddad;
Jan Peeken;
Paul Martin Putora;
Markus Glatzer;
Florian Putz;
Daniel Hoefler;
Sebastian Christ;
Irina Filchenko;
Janna Hastings;
Roberto Gaio;
Lawrence Chiang;
Daniel Aebersold;
Nikola Cihoric

ABSTRACT

Background:

Large language models (LLMs) hold promise for supporting clinical tasks, particularly in technical fields like radiation oncology. Prior evaluations have focused on exam-style settings, so LLM performance in real-life clinical scenarios remains unclear.

Objective:

This study aimed to assess the ability of a state-of-the-art medical LLM to answer real-world clinical questions in radiation oncology compared with that of clinical experts.

Methods:

Physicians from 10 departments collected routine clinical questions. Fifty of these questions were answered by three senior radiation oncology experts and the LLM Llama3-OpenBioLLM-70B. In a blinded review, physicians rated answer quality on a 5-point Likert scale, assessed safety, and determined if responses were from the LLM or an expert (recognizability). Comparisons were made for quality, harmfulness, and recognizability.

Results:

There was no significant difference in answer quality between the LLM and the clinical experts (mean scores 3.38 vs. 3.63; median 4.00, interquartile range (IQR) [3.00, 4.00] vs. median 3.67, IQR [3.33, 4.00]; p=0.263). The LLM's answers were deemed potentially harmful in 16% of cases versus 13% for the clinical experts (p=0.633). Physicians correctly identified whether an answer was provided by an LLM or a clinician in 72% and 78% of cases, respectively.

Conclusions:

The quality of the LLM's answers seems similar to that of the clinical experts' answers. Although great caution is recommended when using LLMs in clinical practice, their ability to answer real-life clinical questions is satisfactory, including in highly specialized domains like radiation oncology.

Citation

Please cite as:

Dennstädt F, Schmerder M, Riggenbach E, Mose L, Bryjova K, Bachmann N, Mackeprang PH, Ahmadsei M, Sinovcic D, Windisch P, Zwahlen D, Rogers S, Riesterer O, Maffei M, Gkika E, Haddad H, Peeken J, Putora PM, Glatzer M, Putz F, Hoefler D, Christ S, Filchenko I, Hastings J, Gaio R, Chiang L, Aebersold D, Cihoric N

Comparative Evaluation of a Medical Large Language Model in Answering Real-World Radiation Oncology Questions: Multicenter Observational Study

J Med Internet Res 2025;27:e69752

DOI: 10.2196/69752

PMID: 40986858

PMCID: 12504895


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.


Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Dec 10, 2024

Date Accepted: Mar 31, 2025
