Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Oct 20, 2020
Date Accepted: Nov 19, 2021

The final, peer-reviewed version of this preprint was published as:

Assessment of Natural Language Processing Methods for Ascertaining the Expanded Disability Status Scale Score From the Electronic Health Records of Patients With Multiple Sclerosis: Algorithm Development and Validation Study

Yang Z, Pou-Prom C, Jones A, Banning M, Dai D, Mamdani M, Oh J, Antoniou T

JMIR Med Inform 2022;10(1):e25157

DOI: 10.2196/25157

PMID: 35019849

PMCID: 8792771

Assessment of natural language processing methods for ascertaining the Expanded Disability Status Scale score from electronic health records of multiple sclerosis patients

  • Zhen Yang
  • Chloé Pou-Prom
  • Ashley Jones
  • Michaelia Banning
  • David Dai
  • Muhammad Mamdani
  • Jiwon Oh
  • Tony Antoniou

ABSTRACT

Background:

The Expanded Disability Status Scale (EDSS) score is a widely used measure to monitor disability progression in people with multiple sclerosis (MS). However, extracting and deriving the EDSS from unstructured electronic health records can be time-consuming.

Objective:

To compare rule-based and deep learning natural language processing algorithms for detecting and predicting the total EDSS and EDSS functional system subscores from the electronic health records of patients with MS.

Methods:

We studied 17,452 electronic health records of 4,906 patients with MS followed at one of Canada’s largest MS clinics between June 2015 and July 2019. We randomly divided the records into training (80%) and test (20%) data sets and compared the performance characteristics of three natural language processing models. First, we applied a rule-based approach, extracting the EDSS score from sentences containing the keyword ‘EDSS’. Next, we trained a convolutional neural network (CNN) to classify each record into one of the nineteen half-step increments of the EDSS score. Finally, we used a combined rule-based/CNN model. For each approach, we determined the accuracy, precision, recall, and F-score compared with the reference standard: the manually labelled EDSS scores in the clinic database.
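The keyword rule and the combined model can be illustrated with a short Python sketch. The abstract does not give the authors' extraction patterns or say exactly how the two models were combined, so everything below is an assumption: the regular expression `EDSS_PATTERN`, the naive sentence splitter, the validity check (any half-point value from 0 to 10), and the fallback strategy in the hypothetical `predict_edss_combined`, which uses the rule's score when one is found and otherwise defers to a classifier such as the CNN, passed in as a plain function.

```python
import re
from typing import Callable, Optional

# Hypothetical keyword rule: "EDSS", up to 20 non-digit characters, then a number
# such as "3", "3.5" or "10" (e.g. "EDSS 3.5", "EDSS: 4.0", "EDSS score of 6").
EDSS_PATTERN = re.compile(r"\bEDSS\b[^0-9.]{0,20}?(\d{1,2}(?:\.\d)?)", re.IGNORECASE)


def split_sentences(note: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", note) if s]


def is_valid_edss(score: float) -> bool:
    """Assumed check: accept any half-point value from 0 to 10."""
    return 0.0 <= score <= 10.0 and (score * 2).is_integer()


def extract_edss_rule(note: str) -> Optional[float]:
    """Return the first plausible EDSS score found in a sentence mentioning 'EDSS'."""
    for sentence in split_sentences(note):
        if "edss" not in sentence.lower():
            continue  # keyword filter: skip sentences without 'EDSS'
        for match in EDSS_PATTERN.finditer(sentence):
            score = float(match.group(1))
            if is_valid_edss(score):
                return score
    return None  # the rule did not fire


def predict_edss_combined(note: str, classifier: Callable[[str], float]) -> float:
    """Hypothetical combination: use the rule when it fires, else fall back to the classifier."""
    score = extract_edss_rule(note)
    return score if score is not None else classifier(note)


if __name__ == "__main__":
    note = "Seen in clinic. Her EDSS today is 3.5. Gait stable."
    print(extract_edss_rule(note))  # 3.5
    # Stand-in for a trained CNN: always predicts 2.0, for illustration only.
    print(predict_edss_combined("No score documented.", lambda text: 2.0))  # 2.0
```

Trying the rule first is one plausible design given the pattern reported in the Results, where the rule-based model alone had high precision but lower recall; the classifier then covers the notes where the rule finds nothing.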

Results:

Overall, the combined keyword/CNN model demonstrated the best performance, with accuracy, precision, recall, and F-score of 0.90, 0.83, 0.83, and 0.83, respectively. The corresponding values were 0.57, 0.91, 0.65, and 0.70 for the rule-based model alone and 0.86, 0.70, 0.70, and 0.70 for the CNN model alone. Because of missing data, model performance for the EDSS subscores was lower than that for the total EDSS score; performance improved when only notes with known subscore values were considered.
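The four metrics can be computed for a multiclass problem as in the sketch below. It is not the study's evaluation code: the abstract does not say how per-class scores were averaged, so macro-averaging (an unweighted mean over classes) is assumed, and `None` stands in for a missing reference label. With averaged per-class scores, the aggregate F-score need not equal 2PR/(P + R) computed from the aggregate precision and recall. The hypothetical `known_only` flag mimics restricting the evaluation to notes with known values.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def classification_metrics(
    y_true: Sequence[Optional[Hashable]],
    y_pred: Sequence[Optional[Hashable]],
    known_only: bool = False,
) -> dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1 (assumed averaging).

    None marks a missing reference label; known_only=True drops those notes.
    """
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if not (known_only and t is None)]
    if not pairs:
        raise ValueError("no notes left to score")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in pairs:
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # the predicted class received a wrong label
            fn[t] += 1  # the true class was missed
    classes = {t for t, _ in pairs} | {p for _, p in pairs}
    precisions, recalls, f1s = [], [], []
    for c in classes:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
        precisions.append(prec)
        recalls.append(rec)
    n = len(classes)
    return {
        "accuracy": sum(tp.values()) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    truth = [3.5, 3.5, 6.0, None]  # toy reference labels; None = not documented
    preds = [3.5, 6.0, 6.0, 2.0]   # toy model output
    print(classification_metrics(truth, preds))
    print(classification_metrics(truth, preds, known_only=True))
```

On these made-up inputs (not study data), accuracy is 0.5 and macro F1 is about 0.33; with `known_only=True` both rise to about 0.67, echoing the direction of the effect described above.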

Conclusions:

A combined keyword/CNN natural language processing model can extract and accurately predict EDSS scores from patient records. This approach can be automated for efficient information extraction in clinical and research settings.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.