Accepted for/Published in: JMIR Bioinformatics and Biotechnology
Date Submitted: Jan 31, 2022
Date Accepted: Jul 21, 2022
Exploring the applicability of using natural language processing to support nationwide venous thromboembolism surveillance
ABSTRACT
Background:
Conducting public health surveillance for venous thromboembolism (VTE) at a national scale is important for measuring the disease burden and the impact of prevention measures. Integrating natural language processing (NLP) into VTE surveillance may be an efficient and accurate option for establishing a sustainable and cost-effective national surveillance system.
Objective:
We evaluated the performance of the VTE identification instance of IDEAL-X, an NLP tool, in automatically classifying cases of VTE by “reading” unstructured text in diagnostic imaging records.
Methods:
Using imaging records from 2012–2014 from pilot surveillance systems for VTE at Duke University and the University of Oklahoma Health Sciences Center (OUHSC), we applied a VTE identification model of IDEAL-X to classify cases of VTE that had previously been manually classified according to predefined criteria. We calculated accuracy, sensitivity, specificity, and positive and negative predictive values, each with a 95% confidence interval (CI).
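As an illustration of how these performance measures relate to a 2×2 comparison of NLP classifications against manual classifications, the following minimal Python sketch computes each measure with a 95% CI. The abstract does not state which interval method was used, so the Wilson score interval here is an assumption; the function names and the example counts are hypothetical and are not taken from the study.

```python
from math import sqrt


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion.

    Assumption: the interval method is chosen for illustration only;
    the abstract does not say how its 95% CIs were computed.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def performance(tp, fp, tn, fn):
    """Return each measure as (estimate, (ci_low, ci_high)).

    tp/fp/tn/fn count NLP results against the manual classification,
    which is treated as the reference standard.
    """
    pairs = {
        "accuracy": (tp + tn, tp + fp + tn + fn),
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
    }
    return {name: (k / n, wilson_ci(k, n)) for name, (k, n) in pairs.items()}


# Hypothetical counts for demonstration, not data from the study.
results = performance(tp=90, fp=10, tn=180, fn=20)
for name, (est, (lo, hi)) in results.items():
    print(f"{name}: {est:.1%} (95% CI: {lo:.1%}-{hi:.1%})")
```

Sensitivity and specificity are conditioned on the manual classification, whereas the predictive values are conditioned on the NLP output, so the predictive values depend on how common VTE is among the records reviewed.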
Results:
The VTE model of IDEAL-X “read” 1591 records from Duke University and 1487 records from OUHSC for a total of 3078 records. The combined performance measures were 93.7% accuracy (95% CI: 93.7%–93.8%), 96.3% sensitivity (95% CI: 96.2%–96.4%), 92.0% specificity (95% CI: 91.9%–92.0%), 89.1% positive predictive value (95% CI: 89.0%–89.2%), and 97.3% negative predictive value (95% CI: 97.3%–97.4%). The sensitivity was higher at Duke University (97.9%, 95% CI: 97.8%–98.0%) than at OUHSC (93.8%, 95% CI: 93.5%–94.0%), but the specificity was higher at OUHSC (96.3%, 95% CI: 96.2%–96.3%) than at Duke University (90.4%, 95% CI: 90.1%–90.5%).
Conclusions:
The VTE model of IDEAL-X accurately classified cases of VTE from pilot surveillance systems from 2 separate states and health systems. NLP is a promising tool in the design and implementation of an automated national surveillance system for VTE.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.