
Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Feb 27, 2025
Date Accepted: Jun 17, 2025

The final, peer-reviewed published version of this preprint can be found here:

Large Language Models in Neurological Practice: Real-World Study

Maiorana NV, Marceglia S, Treddenti M, Tosi M, Guidetti M, Creta MF, Bocci T, Oliveri S, Martinelli Boneschi F, Priori A


J Med Internet Res 2025;27:e73212

DOI: 10.2196/73212

PMID: 40982758

PMCID: 12453287

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Is it time for the neurologist to use Large Language Models in everyday practice?

  • Natale Vincenzo Maiorana
  • Sara Marceglia
  • Mauro Treddenti
  • Mattia Tosi
  • Matteo Guidetti
  • Maria Francesca Creta
  • Tommaso Bocci
  • Serena Oliveri
  • Filippo Martinelli Boneschi
  • Alberto Priori

ABSTRACT

Background:

Large Language Models (LLMs) such as ChatGPT and Gemini are increasingly explored for their potential in medical diagnostics, including neurology. Their real-world applicability remains inadequately assessed, particularly in clinical workflows where nuanced decision-making is required.

Objective:

To evaluate the diagnostic accuracy and the appropriateness of clinical recommendations provided by ChatGPT and Gemini, compared with those of neurologists, using real-world clinical cases.

Methods:

This study used a two-phase approach: (1) a systematic review of the literature on LLMs in neurological diagnosis, to assess whether the methodologies applied were adequate for clinical translation, and (2) an experimental evaluation in which real-world neurology cases were presented to ChatGPT and Gemini and their diagnostic performance was compared with that of clinical neurologists. To ensure a real-world clinical context, the experimental phase simulated a first visit using information from anonymized patient records from the neurology department of the ASST Santi Paolo e Carlo Hospital (Milan, Italy). A cohort of 28 anonymized patient cases was selected from routine neurology consultations; these cases covered a range of neurological conditions and diagnostic complexities representative of daily clinical practice. The primary outcome was the diagnostic accuracy of both neurologists and LLMs, defined as concordance with the discharge diagnosis. Secondary outcomes included the appropriateness of recommended diagnostic tests and the extent of additional prompting required to obtain accurate responses.
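The primary outcome reduces to a simple proportion: concordant cases divided by all cases scored. The minimal Python sketch below illustrates that calculation. The `CaseResult` schema, the helper names, and the use of normalized exact string matching as a proxy for concordance are all hypothetical simplifications; the abstract does not describe how concordance with the discharge diagnosis was adjudicated.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """One scored case (hypothetical schema): a rater's diagnosis and the discharge diagnosis."""

    case_id: str
    proposed_diagnosis: str
    discharge_diagnosis: str


def normalize(label: str) -> str:
    # Simplification: lower-case and collapse whitespace before comparing labels.
    return " ".join(label.lower().split())


def is_concordant(result: CaseResult) -> bool:
    # Hypothetical stand-in for the study's concordance judgment, which the abstract does not specify.
    return normalize(result.proposed_diagnosis) == normalize(result.discharge_diagnosis)


def diagnostic_accuracy(results: list[CaseResult]) -> float:
    """Share of cases whose proposed diagnosis is concordant with the discharge diagnosis."""
    if not results:
        raise ValueError("no cases to score")
    concordant = sum(is_concordant(r) for r in results)
    return concordant / len(results)


# Made-up cases for illustration only; they are not data from the study.
cases = [
    CaseResult("c1", "Migraine without aura", "migraine without aura"),
    CaseResult("c2", "Tension-type headache", "Migraine without aura"),
    CaseResult("c3", "Essential tremor", "Essential  tremor"),
]
print(f"{diagnostic_accuracy(cases):.0%}")  # prints 67%
```

Two of the three illustrative cases match after normalization, giving 67%. In a real evaluation, a clinician's case-by-case judgment could take the place of the string comparison in `is_concordant` without changing how the proportion itself is computed.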

Results:

Of the 24 studies identified in the literature review, most used heterogeneous methodologies based on structured prompts designed specifically for interaction with LLMs, and lacked evaluation on real-world cases. In the experimental phase, neurologists achieved a diagnostic accuracy of 75%, outperforming ChatGPT (54%) and Gemini (46%). Both LLMs showed limitations in nuanced clinical reasoning and over-prescribed diagnostic tests in 17–25% of cases. In addition, complex or ambiguous cases required further prompting to refine the AI-generated responses.
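Assuming each accuracy figure is taken over all 28 cases and rounded to the nearest whole percent (the abstract does not state either point explicitly), the percentages can be converted back into approximate case counts. The helper `candidate_counts` below is a hypothetical name used only for this sketch.

```python
def candidate_counts(percent: int, n_cases: int) -> list[int]:
    """Integer counts k in 0..n_cases whose share 100*k/n_cases rounds to `percent`.

    Assumes the reported figure was rounded to the nearest whole percent.
    """
    return [k for k in range(n_cases + 1) if abs(100 * k / n_cases - percent) < 0.5]


N_CASES = 28  # cohort size stated in the Methods

# Reported diagnostic accuracies from the Results.
for rater, pct in [("Neurologists", 75), ("ChatGPT", 54), ("Gemini", 46)]:
    print(rater, pct, candidate_counts(pct, N_CASES))
```

Under those assumptions, each figure maps to a single count: 21, 15, and 13 of 28 cases for neurologists, ChatGPT, and Gemini, respectively.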

Conclusions:

While LLMs show potential as supportive tools in neurology, they currently lack the depth required for independent clinical decision-making. Future research should focus on refining LLM capabilities and developing evaluation methodologies that reflect the complexities of real-world neurological practice, thus ensuring effective, responsible, and safe use of such promising technologies.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.