
Accepted for/Published in: JMIR AI

Date Submitted: Aug 18, 2025
Date Accepted: Jan 18, 2026
Date Submitted to PubMed: Jan 20, 2026

The final, peer-reviewed published version of this preprint can be found here:

Assessment of the Modified Rankin Scale in Electronic Health Records With a Fine-Tuned Large Language Model: Development and Internal Validation

Silva L, Milani M, Bindra S, Ikramuddin S, Tessmer M, Frederickson K, Datta A, Ergen H, Stangebye A, Cooper D, Kumar K, Yeung J, Lakshminarayan K, Streib C

JMIR AI 2026;5:e82607

DOI: 10.2196/82607

PMID: 41740162

PMCID: 12935414

Assessment of the Modified Rankin Scale in Electronic Health Records with a Fine-tuned Large Language Model

  • Luis Silva
  • Marcus Milani
  • Sohum Bindra
  • Salman Ikramuddin
  • Megan Tessmer
  • Kaylee Frederickson
  • Abhigyan Datta
  • Halil Ergen
  • Alex Stangebye
  • Dawson Cooper
  • Kompa Kumar
  • Jeremy Yeung
  • Kamakshi Lakshminarayan
  • Christopher Streib

ABSTRACT

Background:

The modified Rankin scale (mRS) is an important metric in stroke research, often used as a primary outcome in clinical trials and observational studies. The mRS can be assessed retrospectively from electronic health records (EHR), though this process is labor-intensive and prone to inter-rater variability. Large language models (LLMs) have shown potential for automating text classification.

Objective:

We aimed to develop a fine-tuned LLM that analyzes EHR text and classifies mRS scores for clinical and research applications.

Methods:

We performed a retrospective cohort study of patients admitted to a specialist stroke neurology service at a large academic hospital system between August 2020 and June 2023. Each patient’s medical record was reviewed at two time points: (1) hospital discharge and (2) approximately 90 days post-discharge. Two independent researchers assigned an mRS score at each time point. Two separate models were trained on EHR passages with corresponding mRS scores as labeled outcomes: (1) a multiclass model to classify all seven mRS scores and (2) a binary model to classify functional independence (mRS 0–2) versus non-independence (mRS 3–6). Four-fold cross-validation was conducted, using accuracy and Cohen's kappa as model performance metrics.
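The binary outcome and the two metrics named in the Methods can be made concrete with a short sketch. The Python below is illustrative only and is not the authors' code: every function name is hypothetical, it uses only the standard library, and it assumes reference and predicted labels are integer mRS scores from 0 to 6. It collapses scores to the functional-independence label (mRS 0–2 versus 3–6), assigns passages to four folds, and computes accuracy and unweighted Cohen's kappa. The shuffled round-robin fold assignment is an assumption; the abstract does not say how folds were formed.

```python
import random
from collections import Counter
from typing import Sequence


def to_independent(mrs: int) -> int:
    """Hypothetical helper: 1 = functional independence (mRS 0-2), 0 = not (mRS 3-6)."""
    if not 0 <= mrs <= 6:
        raise ValueError(f"mRS score must be in 0..6, got {mrs}")
    return 1 if mrs <= 2 else 0


def assign_folds(n_items: int, k: int = 4, seed: int = 0) -> list[int]:
    """Hypothetical fold assignment (an assumption, not the study's method):
    shuffle item indices with a fixed seed, then deal them round-robin so
    fold sizes differ by at most one."""
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    folds = [0] * n_items
    for position, index in enumerate(order):
        folds[index] = position % k
    return folds


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of items whose predicted label equals the reference label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted Cohen's kappa, (p_o - p_e) / (1 - p_e), where p_e is the
    agreement expected by chance from the two label distributions."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)  # also validates the inputs
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Toy labels for illustration, not study data.
    reference = [0, 1, 2, 3, 4, 5, 6, 2]
    predicted = [0, 1, 3, 3, 4, 5, 6, 2]
    ref_bin = [to_independent(s) for s in reference]
    pred_bin = [to_independent(s) for s in predicted]
    print(accuracy(ref_bin, pred_bin))     # 0.875
    print(cohen_kappa(ref_bin, pred_bin))  # 0.75
    print(assign_folds(len(reference)))    # fold index (0-3) for each passage
```

Because each patient contributes passages at two time points, a patient-level split, which keeps a patient's discharge and 90-day passages in the same fold, would avoid leakage between training and test folds; whether the study did this is not stated in the abstract.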

Results:

A total of 2,290 EHR passages with corresponding mRS scores were included in model training. The multiclass model, which considered all seven mRS scores, attained an accuracy of 77% and a weighted Cohen's kappa of 0.92. Class-specific accuracy was highest for mRS 4 (90%) and lowest for mRS 2 (28%). The binary model, which considered only functional independence versus non-independence, attained an accuracy of 92% and a Cohen's kappa of 0.84.
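The multiclass results pair exact-match accuracy with a weighted kappa. Because weighted kappa gives partial credit to near misses on an ordinal scale such as the mRS, it can sit well above exact accuracy, which is one likely reason 77% accuracy coexists with a kappa of 0.92. The abstract does not state the weighting scheme, so the sketch below assumes quadratic weights by default and allows linear weights instead. It also computes class-specific accuracy, interpreted here (an assumption) as the fraction of passages with a given reference mRS score that were labeled correctly. Function names are hypothetical and the code is illustrative, not the authors' implementation.

```python
from typing import Sequence

N_CLASSES = 7  # mRS scores 0 through 6


def _check_labels(y_true: Sequence[int], y_pred: Sequence[int]) -> None:
    """Reject empty or mismatched inputs and labels outside mRS 0..6."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(not 0 <= s < N_CLASSES for s in list(y_true) + list(y_pred)):
        raise ValueError("labels must be mRS scores 0..6")


def weighted_kappa(y_true: Sequence[int], y_pred: Sequence[int],
                   weights: str = "quadratic") -> float:
    """Weighted Cohen's kappa, 1 - sum(w * O) / sum(w * E), where O is the observed
    confusion matrix, E is the matrix expected from its row and column totals,
    and w_ij = |i - j| / 6 (linear) or its square (quadratic)."""
    _check_labels(y_true, y_pred)
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    n = len(y_true)
    observed = [[0] * N_CLASSES for _ in range(N_CLASSES)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(N_CLASSES)) for j in range(N_CLASSES)]
    disagreement = expected = 0.0
    for i in range(N_CLASSES):
        for j in range(N_CLASSES):
            d = abs(i - j) / (N_CLASSES - 1)
            w = d * d if weights == "quadratic" else d
            disagreement += w * observed[i][j]
            expected += w * row_totals[i] * col_totals[j] / n
    if expected == 0.0:
        raise ValueError("weighted kappa is undefined when expected disagreement is 0")
    return 1.0 - disagreement / expected


def class_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[int, float]:
    """For each reference mRS score present, the fraction of its passages predicted correctly."""
    _check_labels(y_true, y_pred)
    totals = [0] * N_CLASSES
    hits = [0] * N_CLASSES
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return {c: hits[c] / totals[c] for c in range(N_CLASSES) if totals[c]}


if __name__ == "__main__":
    # Toy labels for illustration, not study data: one near miss (2 predicted as 3).
    reference = [0, 1, 2, 3, 4, 5, 6, 2]
    predicted = [0, 1, 3, 3, 4, 5, 6, 2]
    print(weighted_kappa(reference, predicted))            # about 0.982 (quadratic)
    print(weighted_kappa(reference, predicted, "linear"))  # about 0.942
    print(class_accuracy(reference, predicted)[2])         # 0.5
```

In this toy example exact accuracy is 7/8, yet quadratic weighted kappa is about 0.98, because the single error is off by only one mRS point.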

Conclusions:

Our findings demonstrate that LLMs can be trained to determine mRS scores through EHR text analysis. With further advancements, fully automated LLMs could scale across large clinical datasets, enabling data-driven public health strategies and optimized resource allocation.

Keywords: stroke; modified Rankin scale; artificial intelligence; large language model; machine learning; electronic health records



© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.