Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Sep 5, 2018
Open Peer Review Period: Sep 9, 2018 - Nov 4, 2018
Date Accepted: Mar 30, 2019

The final, peer-reviewed published version of this preprint can be found here:

Fu S, Leung LY, Wang Y, Raulli AO, Kallmes DF, Kinsman KA, Nelson KB, Clark MS, Luetmer PH, Kingsbury PR, Kent DM, Liu H

Natural Language Processing for the Identification of Silent Brain Infarcts From Neuroimaging Reports

JMIR Med Inform 2019;7(2):e12109

DOI: 10.2196/12109

PMID: 31066686

PMCID: 6524454

Natural Language Processing for the Identification of Silent Brain Infarcts from Neuroimaging Reports

  • Sunyang Fu
  • Lester Y Leung
  • Yanshan Wang
  • Anne-Olivia Raulli
  • David F Kallmes
  • Kristin A Kinsman
  • Kristoff B Nelson
  • Michael S Clark
  • Patrick H Luetmer
  • Paul R Kingsbury
  • David M Kent
  • Hongfang Liu

ABSTRACT

Background:

Silent brain infarction (SBI) is defined as the presence of one or more brain lesions, presumed to be due to vascular occlusion, found by neuroimaging (magnetic resonance imaging or computed tomography) in patients without clinical manifestations of stroke. It is more common than stroke and can be detected in 20% of healthy elderly people. Early detection of SBI may mitigate the risk of stroke by enabling preventive treatment. Natural language processing (NLP) techniques offer an opportunity to systematically identify SBI cases in electronic health records (EHRs) by extracting, normalizing, and classifying SBI-related incidental findings interpreted by radiologists from neuroimaging reports.

Objective:

This study aimed to develop NLP systems to identify individuals with incidentally discovered SBIs from neuroimaging reports at two sites: Mayo Clinic and Tufts Medical Center.

Methods:

A screening protocol using diagnosis codes and problem lists was developed to identify index neuroimaging reports (the patient’s first MRI or CT in the EHR) for individual patients without clinically evident stroke, transient ischemic attack (TIA), or dementia at any time before or up to 30 days after the index imaging exam. A total of 500 post-screened radiology reports were retrieved from each of the two sites, and 400 of the 1000 reports were randomly selected to create a duplication set for calculating inter-annotator agreement (IAA). The reports were evenly distributed to 4 radiology residents (2 from each site), who manually annotated SBI-related findings. Both rule-based and machine learning approaches were adopted in developing the NLP system. The rule-based system was implemented using MedTagger, an open-source NLP pipeline developed by Mayo Clinic. Features for the rule-based system, including significant words and patterns related to SBI, were generated using pointwise mutual information. The machine learning models were a convolutional neural network (CNN), random forest, support vector machine, and logistic regression.
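The pointwise mutual information step can be illustrated with a minimal sketch. It assumes whitespace tokenization, document-level word counts, and a binary SBI label per report; the function name `pmi_scores` and the toy reports are hypothetical and are not taken from the study or from MedTagger.

```python
import math
from collections import Counter


def pmi_scores(docs, labels):
    """Score each word by PMI with the positive class.

    PMI(word, pos) = log2( P(word, pos) / (P(word) * P(pos)) ),
    where probabilities are estimated from document frequencies.
    Words that never occur in a positive report are skipped,
    because their PMI is undefined (log of zero).
    """
    n_docs = len(docs)
    n_pos = sum(labels)
    word_df = Counter()      # number of reports containing the word
    word_pos_df = Counter()  # number of positive reports containing the word
    for text, label in zip(docs, labels):
        tokens = set(text.lower().split())
        word_df.update(tokens)
        if label:
            word_pos_df.update(tokens)

    scores = {}
    for word, df in word_df.items():
        joint = word_pos_df[word]
        if joint == 0:
            continue
        p_joint = joint / n_docs
        p_word = df / n_docs
        p_pos = n_pos / n_docs
        scores[word] = math.log2(p_joint / (p_word * p_pos))
    return scores


# Hypothetical toy reports (1 = SBI-related finding annotated, 0 = none).
reports = [
    "old lacunar infarct noted",
    "no acute infarct",
    "normal study",
    "chronic lacunar infarct",
]
labels = [1, 0, 0, 1]
scores = pmi_scores(reports, labels)
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word}\t{score:.3f}")
```

Words with high scores are candidates for rule keywords. In this toy data, "lacunar" scores higher than "infarct" because "infarct" also appears in a negated, negative report, which hints at why rules typically need context handling in addition to keyword lists.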

Results:

Five reports were removed due to invalid scan types. The IAAs across Mayo and Tufts neuroimaging reports were 0.87 and 0.91, respectively. The rule-based system yielded the best performance in predicting SBI, with an accuracy, sensitivity, specificity, positive predictive value, and negative predictive value of 0.991, 0.925, 1.000, 1.000, and 0.990, respectively. The CNN achieved the best score in predicting white matter disease (WMD), with an accuracy, sensitivity, specificity, positive predictive value, and negative predictive value of 0.994, 0.994, 0.994, 0.994, and 0.994, respectively. Overall, the rule-based model required a longer development time because the features and rules were iteratively refined during multiple training phases using manual chart review as the gold standard.
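For readers less familiar with these five statistics, the following sketch shows how each is derived from a 2x2 confusion matrix. The function name `binary_metrics` and the counts in the usage example are hypothetical; the abstract does not report the underlying confusion matrices.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard validation statistics from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives,
    and false negatives. A ratio with a zero denominator is returned
    as NaN rather than raising an error.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # recall on positive reports
        "specificity": ratio(tn, tn + fp),   # recall on negative reports
        "ppv": ratio(tp, tp + fp),           # positive predictive value
        "npv": ratio(tn, tn + fn),           # negative predictive value
    }


# Hypothetical counts, chosen only to illustrate the formulas.
example = binary_metrics(tp=9, fp=0, tn=90, fn=1)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

With zero false positives, as in this example, specificity and positive predictive value are both exactly 1, which is the pattern a reader would expect behind reported values of 1.000 for those two statistics.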

Conclusions:

We adopted a standardized data abstraction and modeling process to develop NLP techniques (rule-based and machine learning) to detect incidental SBIs and WMDs from annotated neuroimaging reports. Validation statistics suggested a high feasibility of detecting SBIs and WMDs from EHRs using NLP.



© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.