JMIR Preprints #81245: Automated Glasgow Coma Scale Score Extraction: Mining Unstructured Electronic Health Records

Automated Glasgow Coma Scale Score Extraction: Mining Unstructured Electronic Health Records

Marta Fernandes;
Niels Turley;
Haoqi Sun;
Shibani S. Mukerji;
Lidia M. V. R. Moura;
M. Brandon Westover;
Sahar F. Zafar

ABSTRACT

Background:

Multicenter electronic health records (EHR) can support quality improvement and comparative effectiveness research in critical care. However, EHR-based research is limited by the difficulty of abstracting key clinical variables, such as a patient’s level of consciousness.

Objective:

The objective of our study was to develop a natural language processing (NLP) model to predict Glasgow Coma Scale (GCS) scores from daily EHR notes.

Methods:

The study included adult patients (≥18 years) admitted to Massachusetts General Brigham (MGB) hospitals (2017-2024) and patients from the Medical Information Mart for Intensive Care III (MIMIC-III) database, version 1.4 (2001-2012). A dataset containing the daily notes, age, sex, and admission type of all patients from both institutions was split into training and hold-out test sets (70%/30%). We trained an ordinal regression model, “ordinalNet,” with an elastic net penalty to predict the lowest daily GCS score as one of three levels: severe (GCS 3-8), moderate (GCS 9-12), and mild (GCS 13-15). Model performance was assessed in the hold-out test set (MGB + MIMIC) using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC).
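
The three-level target described above can be illustrated with a minimal sketch. It assumes total GCS scores are available as (patient, day, score) records; the record layout, the function names, and the toy values are hypothetical and are not taken from the study's pipeline.

```python
def gcs_level(score: int) -> str:
    """Map a total GCS score (3-15) to the three levels named in the abstract."""
    if not 3 <= score <= 15:
        raise ValueError(f"GCS total must be between 3 and 15, got {score}")
    if score <= 8:
        return "severe"    # GCS 3-8
    if score <= 12:
        return "moderate"  # GCS 9-12
    return "mild"          # GCS 13-15


def lowest_daily_level(records):
    """Return {(patient_id, day): level} based on the lowest score of each day.

    `records` is an iterable of (patient_id, day, gcs_score) tuples
    (a hypothetical layout chosen for this illustration).
    """
    lowest = {}
    for patient_id, day, score in records:
        key = (patient_id, day)
        if key not in lowest or score < lowest[key]:
            lowest[key] = score
    return {key: gcs_level(score) for key, score in lowest.items()}


# Toy records, invented for illustration only.
toy_records = [
    ("p1", "2020-01-01", 14),
    ("p1", "2020-01-01", 7),
    ("p1", "2020-01-02", 11),
    ("p2", "2020-01-01", 15),
]
labels = lowest_daily_level(toy_records)
```

In this sketch, a patient-day with several recorded scores is labeled by its lowest one, so the first toy day for patient p1 falls in the severe level even though a score of 14 was also recorded.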

Results:

Our modeling cohort included 55,285 patients (MGB = 36,696; MIMIC = 18,589) with 122,010 days of hospitalization; the average age was 64 [SD 17] years, 56% were male, and 76% were White. The ordinalNet model achieved the following AUROC and AUPRC values [95% CI]: MGB + MIMIC, 0.91 [0.91-0.91] and 0.84 [0.83-0.84]; MGB, 0.91 [0.90-0.91] and 0.83 [0.82-0.84]; MIMIC, 0.91 [0.90-0.91] and 0.83 [0.83-0.84]. For the severe level (GCS 3-8), the model achieved an AUROC of 0.97 [0.97-0.97] and an AUPRC of 0.94 [0.93-0.94].
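
As a rough illustration of how an AUROC for a single level such as severe could be obtained, the sketch below treats that level as the positive class against the other two (one-vs-rest). The abstract does not state how the per-level metric was computed, so the one-vs-rest framing is an assumption, and the labels and predicted probabilities are invented toy values, not study data.

```python
def auroc(labels, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as 0.5. `labels` holds 1 for positives and 0 for negatives.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy values, invented for illustration only.
toy_levels = ["severe", "mild", "severe", "moderate", "mild"]
toy_p_severe = [0.9, 0.2, 0.6, 0.7, 0.1]  # hypothetical predicted P(severe)

# One-vs-rest: severe is the positive class, moderate and mild are negatives.
is_severe = [1 if level == "severe" else 0 for level in toy_levels]
toy_auc = auroc(is_severe, toy_p_severe)
```

The pairwise definition used here is quadratic in the number of examples; it is meant to make the meaning of the metric explicit rather than to scale to a cohort of this size.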

Conclusions:

Our NLP-based model can enable large-scale phenotyping of neurological assessments and critical care research studies.

Citation

Please cite as:

Fernandes M, Turley N, Sun H, Mukerji SS, Moura LMVR, Westover MB, Zafar SF

Automated Prediction of Glasgow Coma Scale Scores From Unstructured Electronic Health Records Using Natural Language Processing: Development and Validation Study

J Med Internet Res 2026;28:e81245

DOI: 10.2196/81245

PMID: 42372230

PMCID: 13313573

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Jul 24, 2025

Date Accepted: Apr 20, 2026
