Currently accepted at: JMIR Formative Research

Date Submitted: Nov 5, 2025
Date Accepted: Apr 3, 2026

This paper has been accepted and is currently in production.

It will appear shortly on 10.2196/87163

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Symptom-Only Localization of Brainstem Ischemia: Large Language Models vs. Neurologists in 109 Diffusion-Weighted Imaging–Positive Cases: A Retrospective Study

  • Nedim Beste; 
  • Thomas Dratsch; 
  • Jonathan Kottlors; 
  • Pia Floßdrof; 
  • Agni-Maria Konitsioti; 
  • Lukas Volz; 
  • Uta Hanning; 
  • Daniel Pinto dos Santos; 
  • Lukas Goertz; 
  • David Zopfs; 
  • Christoph Kabbasch; 
  • Marc Schlamann; 
  • Kai Laukamp; 
  • Michael Schönfeld

ABSTRACT

Background:

Localizing brainstem ischemic lesions based solely on neurological symptoms is challenging because of the complex anatomy and variable symptom presentation. Large language models (LLMs) are taking on an emerging role in medical diagnostics by identifying patterns within clinical narratives.

Objective:

This study evaluates the diagnostic accuracy of LLMs compared with neurologists in localizing acute brainstem ischemic lesions from clinical symptoms alone.

Methods:

We retrospectively analyzed 109 patients with diffusion-weighted imaging (DWI)-confirmed acute brainstem ischemia. Three neurologists and six LLMs (GPT-5, GPT-4, GPT-4.1, GPT-4o, o3, o3 pro) predicted lesion localization (midbrain, pons, medulla) and laterality (left/right) based on clinical symptoms alone. We assessed accuracy, Cohen’s κ, regional performance, and correlations with symptom count, and compared model performance using pairwise χ² tests with false discovery rate (FDR) correction.
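The metrics named above can be illustrated with a minimal sketch. It is not the authors' analysis code: the label strings (such as "pons-L"), the function names, and the choice of the Benjamini-Hochberg procedure for the FDR correction are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter


def accuracy(pred, truth):
    """Fraction of cases where the predicted label equals the reference label."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def cohens_kappa(a, b):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from the product of the marginal label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in set(count_a) | set(count_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences use a single identical label
    return (observed - expected) / (1 - expected)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed FDR method), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical toy labels, not study data.
truth = ["pons-L", "pons-R", "medulla-L", "midbrain-R"]
pred = ["pons-L", "pons-L", "medulla-L", "pons-R"]
print(accuracy(pred, truth))
print(cohens_kappa(pred, truth))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

In practice, library routines such as scikit-learn's `cohen_kappa_score` or statsmodels' `multipletests` would typically replace these hand-written helpers.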

Results:

GPT-4 and GPT-4o achieved the highest overall accuracy (56.0%, 95% CI 46.1–65.5), significantly outperforming all neurologists (χ² = 7.4–20.1, p < 0.01) and reasoning-based models. No significant differences were observed among GPT-4, GPT-4o, GPT-4.1, and GPT-5 (p > 0.05). In regional analysis, significant effects were restricted to pontine infarcts, where GPT-4 (74%) and GPT-4o (69%) exceeded all neurologists (χ² = 6.4–18.3, p < 0.01). For mesencephalic and medullary lesions, accuracies did not differ significantly (p > 0.05). The o3 pro model performed worst overall (10%, p < 0.001). Cohen’s κ reached 0.29 for GPT-4o, and accuracy correlated with symptom count (r = 0.28, p < 0.01).
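As a rough check on the headline figure, 56.0% of 109 cases corresponds to 61 correct predictions (61/109 ≈ 0.5596). The abstract does not state which interval method was used, so the sketch below assumes a Wilson score interval. Its bounds may differ slightly from the reported 46.1–65.5, for example if an exact binomial interval was used instead. The function name is hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (assumed method)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# 61 of 109 is inferred from the reported 56.0%; it is not stated in the abstract.
low, high = wilson_interval(61, 109)
print(f"{61 / 109:.3f} ({low:.3f} to {high:.3f})")
```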

Conclusions:

GPT-4 and GPT-4o outperformed experienced neurologists in this constrained diagnostic task. Accuracy remained modest, particularly for non-pontine lesions, and reasoning-augmented models did not provide additional benefit. These findings highlight both the potential and current limitations of LLMs in clinical reasoning, reinforcing the need for multimodal input and prospective validation.


 Citation

Please cite as:

Beste N, Dratsch T, Kottlors J, Floßdrof P, Konitsioti AM, Volz L, Hanning U, Pinto dos Santos D, Goertz L, Zopfs D, Kabbasch C, Schlamann M, Laukamp K, Schönfeld M

Symptom-Only Localization of Brainstem Ischemia: Large Language Models vs. Neurologists in 109 Diffusion-Weighted Imaging–Positive Cases: A Retrospective Study

JMIR Preprints. 05/11/2025:87163

DOI: 10.2196/preprints.87163

URL: https://preprints.jmir.org/preprint/87163

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.