Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Sep 16, 2020
Date Accepted: Apr 16, 2021

The final, peer-reviewed published version of this preprint can be found here:

Yu AYX, Liu ZA, Pou-Prom C, Lopes K, Kapral MK, Aviv RI, Mamdani M

Automating Stroke Data Extraction From Free-Text Radiology Reports Using Natural Language Processing: Instrument Validation Study

JMIR Med Inform 2021;9(5):e24381

DOI: 10.2196/24381

PMID: 33944791

PMCID: 8132979

Automating stroke data extraction from free-text radiology reports using natural language processing: an instrument validation study

  • Amy Y X Yu
  • Zhongyu A Liu
  • Chloe Pou-Prom
  • Kaitlyn Lopes
  • Moira K Kapral
  • Richard I Aviv
  • Muhammad Mamdani

ABSTRACT

Background:

Diagnostic neurovascular imaging data are important in stroke research, but obtaining these data typically requires laborious manual chart reviews.

Objective:

We aimed to determine the accuracy of a natural language processing (NLP) approach to extract information on the presence and location of vascular occlusions as well as other stroke-related attributes from free-text reports.

Methods:

From the full reports of 1,320 consecutive computed tomography (CT), CT angiography, and CT perfusion scans of the head and neck performed at a tertiary stroke centre between October 2017 and January 2019, we manually extracted data on the presence of proximal large vessel occlusion (the primary outcome) and several secondary outcomes: distal vessel occlusion, ischemia, hemorrhage, Alberta Stroke Program Early CT Score (ASPECTS), and collateral status. Reports were randomly split into a training set (n=921) and a validation set (n=399), and attributes were extracted using rule-based NLP. We report the sensitivity, specificity, positive and negative predictive values (PPV and NPV), and overall accuracy of the NLP approach relative to the manually extracted data.
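To make the idea of rule-based extraction concrete, the following is a minimal sketch of how a single attribute (proximal large vessel occlusion) might be flagged in a report. It is not the rule set used in the study: the vessel terms, occlusion terms, negation cues, and the function name `has_large_vessel_occlusion` are all assumptions chosen for illustration.

```python
import re

# All three patterns are hypothetical examples, not the study's rules.
VESSEL = re.compile(r"\b(?:ICA|internal carotid|M1)\b", re.IGNORECASE)
OCCLUSION = re.compile(r"\b(?:occlu\w+|thrombus)\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(?:no|not|without|negative for)\b", re.IGNORECASE)


def has_large_vessel_occlusion(report: str) -> bool:
    """Flag a report if any sentence names an assumed proximal vessel and an
    occlusion term, and contains no negation cue."""
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", report):
        if (
            VESSEL.search(sentence)
            and OCCLUSION.search(sentence)
            and not NEGATION.search(sentence)
        ):
            return True
    return False


if __name__ == "__main__":
    print(has_large_vessel_occlusion("There is occlusion of the left M1 segment."))
    print(has_large_vessel_occlusion("No occlusion of the ICA or M1 segments."))
```

Sentence-level negation of this kind is deliberately crude; a real rule set would need to handle negation scope, laterality, and chronic versus acute findings.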

Results:

The prevalence of large vessel occlusion was 12%. In the training sample, the NLP approach identified this attribute with an overall accuracy of 97% (sensitivity 96%, specificity 98%, PPV 84%, NPV 99%). In the validation set, the overall accuracy was 95% (sensitivity 90%, specificity 97%, PPV 76%, NPV 99%). The accuracy of identifying distal or basilar occlusion as well as hemorrhage was also high, but there were limitations in identifying cerebral ischemia, ASPECTS, and collateral status.
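The reported measures all derive from a 2x2 comparison of NLP output against manual extraction. The sketch below shows the standard definitions; the counts in the usage example are hypothetical and are not the study's data, and the names `Counts` and `diagnostic_metrics` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Cells of a 2x2 table: NLP result versus manual reference standard."""
    tp: int  # NLP positive, manual positive
    fp: int  # NLP positive, manual negative
    fn: int  # NLP negative, manual positive
    tn: int  # NLP negative, manual negative


def diagnostic_metrics(c: Counts) -> dict[str, float]:
    """Compute standard diagnostic accuracy measures from the 2x2 counts."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "accuracy": (c.tp + c.tn) / total,
    }


if __name__ == "__main__":
    # Hypothetical counts for 100 reports with 10% prevalence.
    example = Counts(tp=9, fp=3, fn=1, tn=87)
    for name, value in diagnostic_metrics(example).items():
        print(f"{name}: {value:.2f}")
```

When the attribute is uncommon, a small number of false positives can lower the PPV noticeably even if specificity remains high, which is one general reason PPV may sit below the other measures.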

Conclusions:

NLP may improve the efficiency of large-scale imaging data collection for stroke surveillance and research.

Clinical Trial:

Not applicable



© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.