JMIR Preprints #24381: Automating stroke data extraction from free-text radiology reports using natural language processing: an instrument validation study

Automating stroke data extraction from free-text radiology reports using natural language processing: an instrument validation study

Amy Y X Yu;
Zhongyu A Liu;
Chloe Pou-Prom;
Kaitlyn Lopes;
Moira K Kapral;
Richard I Aviv;
Muhammad Mamdani

ABSTRACT

Background:

Diagnostic neurovascular imaging data are important in stroke research, but obtaining these data typically requires laborious manual chart reviews.

Objective:

We aimed to determine the accuracy of a natural language processing (NLP) approach to extract information on the presence and location of vascular occlusions as well as other stroke-related attributes from free-text reports.

Methods:

From the full reports of 1,320 consecutive computed tomography (CT), CT angiography, and CT perfusion scans of the head and neck performed at a tertiary stroke centre between October 2017 and January 2019, we manually extracted data on the presence of proximal large vessel occlusion (our primary outcome) and several secondary outcomes, including distal vessel occlusion, ischemia, hemorrhage, Alberta Stroke Program Early CT Score (ASPECTS), and collateral status. Reports were randomly split into training (n=921) and validation (n=399) sets, and attributes were extracted using rule-based NLP. We report the sensitivity, specificity, positive and negative predictive values (PPV and NPV), and overall accuracy of the NLP approach relative to the manually extracted data.
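For readers unfamiliar with the five reported measures, the following is a minimal sketch of how they can be derived by comparing NLP output against manually extracted labels for a single binary attribute (for example, presence of large vessel occlusion). The names `ConfusionCounts`, `tally`, and `metrics` and the toy labels are illustrative assumptions, not the authors' code or data.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int = 0  # NLP positive, manual positive
    fp: int = 0  # NLP positive, manual negative
    fn: int = 0  # NLP negative, manual positive
    tn: int = 0  # NLP negative, manual negative


def tally(manual, nlp):
    """Cross-tabulate manual (reference) labels against NLP labels."""
    counts = ConfusionCounts()
    for truth, pred in zip(manual, nlp):
        if truth and pred:
            counts.tp += 1
        elif pred:
            counts.fp += 1
        elif truth:
            counts.fn += 1
        else:
            counts.tn += 1
    return counts


def ratio(numerator, denominator):
    """Return NaN instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def metrics(c):
    """Compute the five measures named in the Methods from the counts."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "sensitivity": ratio(c.tp, c.tp + c.fn),
        "specificity": ratio(c.tn, c.tn + c.fp),
        "ppv": ratio(c.tp, c.tp + c.fp),
        "npv": ratio(c.tn, c.tn + c.fn),
        "accuracy": ratio(c.tp + c.tn, total),
    }


if __name__ == "__main__":
    # Hypothetical labels for eight reports; not taken from the study.
    manual = [True, True, False, False, False, True, False, False]
    nlp = [True, False, False, True, False, True, False, False]
    for name, value in metrics(tally(manual, nlp)).items():
        print(f"{name}: {value:.2f}")
```

In this framing the manual chart review serves as the reference standard, so every disagreement is counted as an NLP error.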

Results:

The prevalence of large vessel occlusion was 12%. In the training sample, the NLP approach identified this attribute with an overall accuracy of 97% (sensitivity 96%, specificity 98%, PPV 84%, NPV 99%). In the validation set, the overall accuracy was 95% (sensitivity 90%, specificity 97%, PPV 76%, NPV 99%). The accuracy of identifying distal or basilar occlusion as well as hemorrhage was also high, but there were limitations in identifying cerebral ischemia, ASPECTS, and collateral status.
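The gap between the high specificity and the lower PPV follows from the low prevalence of the attribute. The sketch below applies the standard relationship between sensitivity, specificity, prevalence, and predictive values; the function name `predictive_values` is an assumption introduced for illustration. Because the percentages above are rounded and each set has its own prevalence, plugging them in gives only approximate values and need not reproduce the reported PPV exactly.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) expected for a test applied at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # Rounded validation-set figures from the abstract, overall prevalence 12%.
    ppv, npv = predictive_values(sensitivity=0.90, specificity=0.97, prevalence=0.12)
    print(f"PPV ~ {ppv:.2f}, NPV ~ {npv:.2f}")
```

At a prevalence near 12%, even a false-positive rate of a few percent among the large negative group produces a noticeable share of all positive calls, which is why PPV falls well below specificity while NPV stays high.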

Conclusions:

NLP may improve the efficiency of large-scale imaging data collection for stroke surveillance and research.

Clinical Trial:

Not applicable

Citation

Please cite as:

Yu AYX, Liu ZA, Pou-Prom C, Lopes K, Kapral MK, Aviv RI, Mamdani M

Automating Stroke Data Extraction From Free-Text Radiology Reports Using Natural Language Processing: Instrument Validation Study

JMIR Med Inform 2021;9(5):e24381

DOI: 10.2196/24381

PMID: 33944791

PMCID: 8132979

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Sep 16, 2020

Date Accepted: Apr 16, 2021
