Currently accepted at: JMIR Formative Research
Date Submitted: Nov 5, 2025
Date Accepted: Apr 3, 2026
This paper has been accepted and is currently in production.
It will appear shortly under DOI 10.2196/87163.
Symptom-Only Localization of Brainstem Ischemia: Large Language Models vs. Neurologists in 109 Diffusion-Weighted Imaging–Positive Cases: A Retrospective Study
ABSTRACT
Background:
Localizing brainstem ischemic lesions based solely on neurological symptoms is challenging due to the region's complex anatomy and variable symptom presentation. Large language models (LLMs) are taking on an emerging role in medical diagnostics by identifying patterns within clinical narratives.
Objective:
This study evaluates the diagnostic accuracy of LLMs compared with that of experienced neurologists in localizing acute brainstem ischemia from clinical symptoms alone.
Methods:
We retrospectively analyzed 109 patients with diffusion-weighted imaging (DWI)-confirmed acute brainstem ischemia. Three neurologists and six LLMs (GPT-5, GPT-4, GPT-4.1, GPT-4o, o3, o3 pro) predicted lesion localization (midbrain, pons, medulla) and laterality (left/right) based on clinical symptoms alone. Accuracy, Cohen's κ, regional performance, and correlations with symptom count were assessed; pairwise chi-square (χ²) tests with false discovery rate (FDR) correction were performed to compare model performances.
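The pairwise comparison procedure described above can be sketched as follows. This is a minimal illustration, not the study's analysis code: the per-rater correct/total counts are hypothetical placeholders, and the `fdr_bh` helper is a hand-rolled Benjamini-Hochberg step-up adjustment (one common FDR correction; the paper does not specify which variant was used).

```python
from itertools import combinations
from scipy.stats import chi2_contingency

def fdr_bh(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adj = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, taking a cumulative minimum
    # of p * m / rank to enforce monotonicity of the adjusted values.
    for offset, i in enumerate(reversed(order)):
        rank = m - offset
        running_min = min(running_min, pvals[i] * m / rank)
        adj[i] = running_min
    return adj

# Hypothetical correct/total counts per rater (illustrative only,
# NOT the study's actual per-rater data)
results = {
    "GPT-4": (61, 109),
    "GPT-4o": (61, 109),
    "Neurologist A": (38, 109),
}

pairs = list(combinations(results, 2))
pvals = []
for a, b in pairs:
    correct_a, n_a = results[a]
    correct_b, n_b = results[b]
    # 2x2 contingency table: correct vs. incorrect answers per rater
    table = [[correct_a, n_a - correct_a], [correct_b, n_b - correct_b]]
    _, p, _, _ = chi2_contingency(table)
    pvals.append(p)

p_adj = fdr_bh(pvals)
for (a, b), p_raw, p_corr in zip(pairs, pvals, p_adj):
    print(f"{a} vs {b}: p = {p_raw:.4f}, FDR-adjusted p = {p_corr:.4f}")
```

With these placeholder counts, the two identically scoring raters show no significant difference after correction, while each differs significantly from the lower-scoring rater.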
Results:
GPT-4 and GPT-4o achieved the highest overall accuracy (56.0%, 95% CI 46.1–65.5), significantly outperforming all neurologists (χ² = 7.4–20.1, p < 0.01) and reasoning-based models. No significant differences were observed among GPT-4, GPT-4o, GPT-4.1, and GPT-5 (p > 0.05). In regional analysis, significant effects were restricted to pontine infarcts, where GPT-4 (74%) and GPT-4o (69%) exceeded all neurologists (χ² = 6.4–18.3, p < 0.01). For mesencephalic and medullary lesions, accuracies did not differ significantly (p > 0.05). o3 pro performed worst overall (10%, p < 0.001). Cohen's κ reached 0.29 for GPT-4o, and accuracy correlated with symptom count (r = 0.28, p < 0.01).
Conclusions:
GPT-4 and GPT-4o outperformed experienced neurologists in this constrained diagnostic task. Accuracy remained modest, particularly for non-pontine lesions, and reasoning-augmented models provided no additional benefit. These findings highlight both the potential and the current limitations of LLMs in clinical reasoning, reinforcing the need for multimodal input and prospective validation.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.