Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Sep 5, 2018
Open Peer Review Period: Sep 9, 2018 - Nov 4, 2018
Date Accepted: Mar 30, 2019
Natural Language Processing for the Identification of Silent Brain Infarcts from Neuroimaging Reports
ABSTRACT
Background:
Silent brain infarction (SBI) is defined as the presence of one or more brain lesions, presumed to be due to vascular occlusion, found by neuroimaging (magnetic resonance imaging [MRI] or computed tomography [CT]) in patients without clinical manifestations of stroke. It is more common than stroke and can be detected in 20% of healthy elderly people. Early detection of SBI may mitigate the risk of stroke by enabling preventative treatment plans. Natural language processing (NLP) techniques offer an opportunity to systematically identify SBI cases in electronic health records (EHRs) by extracting, normalizing, and classifying SBI-related incidental findings interpreted by radiologists from neuroimaging reports.
Objective:
This study aimed to develop NLP systems to identify individuals with incidentally discovered SBIs from neuroimaging reports at two sites: Mayo Clinic and Tufts Medical Center.
Methods:
A screening protocol using diagnosis codes and problem lists was developed to identify index neuroimaging reports (the patient’s first MRI or CT in the EHR) for individual patients without clinically evident stroke, transient ischemic attack (TIA), or dementia at any time before or up to 30 days after the index imaging exam. A total of 1000 post-screened radiology reports were retrieved (500 from each of the two sites), and 400 of the 1000 were randomly selected to create a duplication set for calculating inter-annotator agreement (IAA). The reports were evenly distributed among 4 radiology residents (2 from each site), who manually annotated SBI-related findings. Both rule-based and machine learning approaches were adopted in developing the NLP system. The rule-based system was implemented using the open-source NLP pipeline MedTagger, developed by Mayo Clinic. Features for the rule-based system, including significant words and patterns related to SBI, were generated using pointwise mutual information. The machine learning models were a convolutional neural network (CNN), random forest, support vector machine, and logistic regression.
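To illustrate how pointwise mutual information can surface candidate keywords for a rule-based system, the following is a minimal sketch rather than the study's implementation. It assumes whitespace tokenization, document-level co-occurrence counts, and a base-2 logarithm; the function name `pmi_scores` and the toy reports are hypothetical.

```python
import math
from collections import Counter


def pmi_scores(reports, labels, target=1):
    """Document-level PMI between each token and the target label.

    PMI(token, label) = log2( P(token, label) / (P(token) * P(label)) ).
    Assumption: tokens are lowercased and split on whitespace; the
    abstract does not describe the tokenization actually used.
    """
    n_docs = len(reports)
    n_target = sum(1 for y in labels if y == target)
    doc_freq = Counter()    # number of reports containing each token
    joint_freq = Counter()  # number of target-labeled reports containing it

    for text, y in zip(reports, labels):
        tokens = set(text.lower().split())
        doc_freq.update(tokens)
        if y == target:
            joint_freq.update(tokens)

    p_label = n_target / n_docs
    scores = {}
    # Tokens that never co-occur with the target label are skipped,
    # because their PMI would be negative infinity.
    for token, joint in joint_freq.items():
        p_joint = joint / n_docs
        p_token = doc_freq[token] / n_docs
        scores[token] = math.log2(p_joint / (p_token * p_label))
    return scores


# Hypothetical toy data: 1 = SBI-related finding annotated, 0 = none.
toy_reports = [
    "old lacunar infarct in left thalamus",
    "no acute infarct",
    "normal study",
    "chronic lacunar infarct",
]
toy_labels = [1, 0, 0, 1]

for token, score in sorted(pmi_scores(toy_reports, toy_labels).items(),
                           key=lambda kv: kv[1], reverse=True):
    print(f"{token}\t{score:.3f}")
```

Tokens with high scores, such as "lacunar" in the toy data, appear disproportionately in positive reports and would be candidates for manual review before being turned into rules.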
Results:
Five reports were removed due to invalid scan types. The IAAs across Mayo and Tufts neuroimaging reports were 0.87 and 0.91, respectively. The rule-based system yielded the best performance in predicting SBI, with an accuracy, sensitivity, specificity, positive predictive value, and negative predictive value of 0.991, 0.925, 1.000, 1.000, and 0.990, respectively. The CNN achieved the best score in predicting white matter disease (WMD), with an accuracy, sensitivity, specificity, positive predictive value, and negative predictive value of 0.994, 0.994, 0.994, 0.994, and 0.994, respectively. Overall, the rule-based model required a longer development time because the features and rules were iteratively refined during multiple training phases, using manual chart review as the gold standard.
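For readers less familiar with the five validation statistics reported here, this short sketch shows how each is derived from confusion-matrix counts. The function name `diagnostic_metrics` and the counts in the usage example are hypothetical and are not the study's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard binary classification statistics.

    tp/fp/tn/fn are true-positive, false-positive, true-negative, and
    false-negative counts against the gold standard.
    """
    def ratio(numerator, denominator):
        # Return NaN instead of raising when a denominator is zero.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),  # recall on positive reports
        "specificity": ratio(tn, tn + fp),  # recall on negative reports
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
    }


# Hypothetical counts for illustration only.
for name, value in diagnostic_metrics(tp=9, fp=0, tn=90, fn=1).items():
    print(f"{name}: {value:.3f}")
```

A system with no false positives, as in this toy example, has a specificity and positive predictive value of 1.000 regardless of how many positive cases it misses, which is why sensitivity and negative predictive value are reported alongside them.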
Conclusions:
We adopted a standardized data abstraction and modeling process to develop NLP techniques (rule-based and machine learning) to detect incidental SBIs and WMDs from annotated neuroimaging reports. Validation statistics suggested that detecting SBIs and WMDs from EHRs using NLP is highly feasible.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.