JMIR Preprints #12109: Natural Language Processing for the Identification of Silent Brain Infarcts from Neuroimaging Reports

Natural Language Processing for the Identification of Silent Brain Infarcts from Neuroimaging Reports

Sunyang Fu;
Lester Y Leung;
Yanshan Wang;
Anne-Olivia Raulli;
David F Kallmes;
Kristin A Kinsman;
Kristoff B Nelson;
Michael S Clark;
Patrick H Luetmer;
Paul R Kingsbury;
David M Kent;
Hongfang Liu

ABSTRACT

Background:

Silent brain infarction (SBI) is defined as the presence of one or more brain lesions, presumed to be due to vascular occlusion, found by neuroimaging (magnetic resonance imaging or computed tomography) in patients without clinical manifestations of stroke. It is more common than stroke and can be detected in 20% of healthy elderly people. Early detection of SBI may mitigate the risk of stroke by enabling preventive treatment plans. Natural language processing (NLP) techniques offer an opportunity to systematically identify SBI cases in electronic health records (EHRs) by extracting, normalizing, and classifying SBI-related incidental findings interpreted by radiologists from neuroimaging reports.

Objective:

This study aimed to develop NLP systems to identify individuals with incidentally discovered SBIs from neuroimaging reports at two sites: Mayo Clinic and Tufts Medical Center.

Methods:

A screening protocol using diagnosis codes and problem lists was developed to identify index neuroimaging reports (the patient’s first magnetic resonance imaging [MRI] or computed tomography [CT] scan in the electronic health record [EHR]) for individual patients without clinically evident stroke, transient ischemic attack (TIA), or dementia at any time before or up to 30 days after the index imaging exam. A total of 500 post-screening radiology reports were retrieved from each of the two sites, and 400 of the resulting 1000 were randomly selected to create a double-annotated set for calculating inter-annotator agreement (IAA). The reports were evenly distributed to 4 radiology residents (2 from each site), who manually annotated SBI-related findings. Both rule-based and machine learning approaches were adopted in developing the NLP system. The rule-based system was implemented using the open-source NLP pipeline MedTagger, developed by Mayo Clinic. Features for the rule-based system, including significant words and patterns related to SBI, were generated using pointwise mutual information. The machine learning models comprised a convolutional neural network (CNN), random forest, support vector machine, and logistic regression.
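As an illustration of the pointwise mutual information step, the following is a minimal sketch, not the authors' implementation. The abstract does not state how the statistic was computed, so the function name `pmi_scores`, the whitespace tokenization, the document-level counting, and the toy reports are all assumptions made for this example.

```python
import math
from collections import Counter


def pmi_scores(docs, labels, positive=1):
    """Score each word by its PMI with the positive label.

    Hypothetical helper: probabilities are estimated from document-level
    counts, i.e. PMI(w, pos) = log2( P(w, pos) / (P(w) * P(pos)) ).
    Words that never occur in a positive document are omitted.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == positive)
    word_docs = Counter()  # number of documents containing each word
    word_pos = Counter()   # number of positive documents containing each word
    for text, y in zip(docs, labels):
        words = set(text.lower().split())  # naive tokenization (assumption)
        word_docs.update(words)
        if y == positive:
            word_pos.update(words)
    scores = {}
    for word, joint in word_pos.items():
        p_joint = joint / n
        p_word = word_docs[word] / n
        p_pos = n_pos / n
        scores[word] = math.log2(p_joint / (p_word * p_pos))
    return scores


# Toy, invented report snippets (1 = SBI-related finding annotated).
toy_docs = [
    "old infarct seen",
    "no acute findings",
    "chronic infarct noted",
    "normal study",
]
toy_labels = [1, 0, 1, 0]
toy_scores = pmi_scores(toy_docs, toy_labels)
```

Words with high scores in such a ranking would be candidates for manual review before being turned into rules; a real pipeline would also need proper tokenization and a minimum frequency cutoff, since PMI tends to overrate rare words.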

Results:

Five reports were removed due to invalid scan types. The IAAs across Mayo and Tufts neuroimaging reports were 0.87 and 0.91, respectively. The rule-based system yielded the best performance in predicting SBI, with an accuracy, sensitivity, specificity, positive predictive value, and negative predictive value of 0.991, 0.925, 1.000, 1.000, and 0.990, respectively. The CNN achieved the best score in predicting white matter disease (WMD), with an accuracy, sensitivity, specificity, positive predictive value, and negative predictive value of 0.994, 0.994, 0.994, 0.994, and 0.994, respectively. Overall, the rule-based model required a longer development time because the features and rules were iteratively refined during multiple training phases, using manual chart review as the gold standard.
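For readers less familiar with the five validation statistics reported above, the sketch below shows how they are conventionally derived from the counts of a binary confusion matrix. The function name `binary_metrics` and the example counts are hypothetical; the counts are not taken from the study, and the function assumes every denominator is nonzero.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard validation statistics from binary confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives,
    and false negatives. Assumes no denominator is zero.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # recall on positive cases
        "specificity": tn / (tn + fp),   # recall on negative cases
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Hypothetical counts for illustration only.
example = binary_metrics(tp=8, fp=2, tn=88, fn=2)
```

With these invented counts, accuracy is 0.96 and sensitivity and positive predictive value are both 0.8. A specificity and positive predictive value of 1.000, as reported for the rule-based system, correspond to zero false positives in the evaluated reports.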

Conclusions:

We adopted a standardized data abstraction and modeling process to develop NLP techniques (rule-based and machine learning) to detect incidental SBIs and WMDs from annotated neuroimaging reports. Validation statistics suggest that detecting SBIs and WMDs from EHRs using NLP is highly feasible.

Citation

Please cite as:

Fu S, Leung LY, Wang Y, Raulli AO, Kallmes DF, Kinsman KA, Nelson KB, Clark MS, Luetmer PH, Kingsbury PR, Kent DM, Liu H

Natural Language Processing for the Identification of Silent Brain Infarcts From Neuroimaging Reports

JMIR Med Inform 2019;7(2):e12109

DOI: 10.2196/12109

PMID: 31066686

PMCID: 6524454

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Sep 5, 2018

Open Peer Review Period: Sep 9, 2018 - Nov 4, 2018

Date Accepted: Mar 30, 2019
