JMIR Preprints #25157: Assessment of natural language processing methods for ascertaining the Expanded Disability Status Scale score from electronic health records of multiple sclerosis patients



Zhen Yang;
Chloé Pou-Prom;
Ashley Jones;
Michaelia Banning;
David Dai;
Muhammad Mamdani;
Jiwon Oh;
Tony Antoniou

ABSTRACT

Background:

The Expanded Disability Status Scale (EDSS) score is a widely used measure to monitor disability progression in people with multiple sclerosis (MS). However, extracting and deriving the EDSS from unstructured electronic health records can be time-consuming.

Objective:

To compare rule-based and deep learning natural language processing algorithms for detecting and predicting the total EDSS and EDSS functional system subscores from the electronic health records of patients with MS.

Methods:

We studied 17,452 electronic health records of 4,906 MS patients followed at one of Canada’s largest MS clinics between June 2015 and July 2019. We randomly divided the records into training (80%) and test (20%) data sets and compared the performance of three natural language processing models. First, we applied a rule-based approach, extracting the EDSS score from sentences containing the keyword ‘EDSS’. Next, we trained a convolutional neural network (CNN) model to predict the nineteen half-step increments of the EDSS score. Finally, we used a combined rule-based and CNN model. For each approach, we determined the accuracy, precision, recall, and F-score against the reference standard: the manually labelled EDSS scores recorded in the clinic database.
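The keyword step and one way of combining it with a classifier can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the regular expression, the naive sentence splitter, the set of valid EDSS values, and the fallback rule (use the keyword result when one is found, otherwise defer to the classifier) are all assumptions, and `cnn_predict` is a hypothetical stand-in for a trained CNN.

```python
import re
from typing import Callable, Optional

# Assumed label set: 0.0, then 1.0 to 10.0 in half-step increments.
VALID_EDSS = {0.0} | {x / 2 for x in range(2, 21)}

# Hypothetical pattern: the keyword "EDSS", up to 20 non-digit characters
# (e.g. " score today is "), then a number such as 4, 3.5, or 10.0.
EDSS_PATTERN = re.compile(r"\bEDSS\b[^0-9]{0,20}?(\d{1,2}(?:\.\d)?)", re.IGNORECASE)


def split_sentences(note: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", note)


def rule_based_edss(note: str) -> Optional[float]:
    """Return the first valid EDSS value in a sentence mentioning EDSS, else None."""
    for sentence in split_sentences(note):
        if "edss" not in sentence.lower():
            continue  # only sentences containing the keyword are searched
        for match in EDSS_PATTERN.finditer(sentence):
            value = float(match.group(1))
            if value in VALID_EDSS:  # discard numbers that are not EDSS steps
                return value
    return None


def combined_edss(note: str, cnn_predict: Callable[[str], float]) -> float:
    """Hypothetical combination: keyword rule first, classifier as a fallback."""
    value = rule_based_edss(note)
    return value if value is not None else cnn_predict(note)


if __name__ == "__main__":
    stand_in_cnn = lambda note: 2.0  # placeholder for a trained CNN classifier
    print(combined_edss("Neuro exam stable. EDSS score today is 3.5.", stand_in_cnn))  # 3.5
    print(combined_edss("Gait mildly ataxic; walks unaided.", stand_in_cnn))  # 2.0
```

Filtering matches against the valid EDSS steps is one simple guard against picking up unrelated numbers (dates, doses) that happen to follow the keyword; a real rule set would likely need more patterns than this single expression.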

Results:

Overall, the combined keyword-CNN model demonstrated the best performance, with accuracy, precision, recall, and F-score of 0.90, 0.83, 0.83, and 0.83, respectively. The corresponding figures were 0.57, 0.91, 0.65, and 0.70 for the rule-based model alone and 0.86, 0.70, 0.70, and 0.70 for the CNN model alone. Because of missing data, model performance for the EDSS subscores was lower than that for the total EDSS score. Performance improved when the analysis was restricted to notes with known values of the EDSS subscores.
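The abstract reports accuracy, precision, recall, and F-score for a multiclass problem but does not state how per-class values were averaged. A minimal sketch, assuming macro-averaging over every label that appears in either the reference or the predictions (an assumption, not a detail confirmed by the study), computes all four figures from paired label lists:

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def macro_metrics(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F-score over all labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed
    labels = set(y_true) | set(y_pred)
    precisions, recalls, f_scores = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f_scores.append(f)
    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_score": sum(f_scores) / n,  # mean of per-class F-scores
    }


if __name__ == "__main__":
    truth = [3.5, 3.5, 6.0, 2.0]  # toy labels, not study data
    preds = [3.5, 2.0, 6.0, 2.0]
    m = macro_metrics(truth, preds)
    print({k: round(v, 3) for k, v in m.items()})
    # {'accuracy': 0.75, 'precision': 0.833, 'recall': 0.833, 'f_score': 0.778}
```

Under macro-averaging, the F-score is the mean of the per-class F-scores, so it need not equal the harmonic mean of the averaged precision and recall: in the toy example both averages are 0.833 while the F-score is 0.778.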

Conclusions:

A combined keyword/CNN natural language processing model can extract and accurately predict EDSS scores from patient records. This approach can be automated for efficient information extraction in clinical and research settings.

Citation

Please cite as:

Yang Z, Pou-Prom C, Jones A, Banning M, Dai D, Mamdani M, Oh J, Antoniou T

Assessment of Natural Language Processing Methods for Ascertaining the Expanded Disability Status Scale Score From the Electronic Health Records of Patients With Multiple Sclerosis: Algorithm Development and Validation Study

JMIR Med Inform 2022;10(1):e25157

DOI: 10.2196/25157

PMID: 35019849

PMCID: 8792771


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Oct 20, 2020

Date Accepted: Nov 19, 2021
