Accepted for/Published in: JMIR Bioinformatics and Biotechnology

Date Submitted: Jan 31, 2022
Date Accepted: Jul 21, 2022

The final, peer-reviewed published version of this preprint can be found here:

Exploring the Applicability of Using Natural Language Processing to Support Nationwide Venous Thromboembolism Surveillance: Model Evaluation Study

Wendelboe A, Saber I, Dvorak J, Adamski A, Feland N, Reyes N, Abe K, Ortel T, Raskob G

JMIR Bioinform Biotech 2022;3(1):e36877

DOI: 10.2196/36877

PMID: 37206160

PMCID: 10193259

Warning: This is an author submission that is not peer-reviewed or edited. Preprints (unless they show as "accepted") should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Exploring the applicability of using natural language processing to support nationwide venous thromboembolism surveillance

  • Aaron Wendelboe; 
  • Ibrahim Saber; 
  • Justin Dvorak; 
  • Alys Adamski; 
  • Natalie Feland; 
  • Nimia Reyes; 
  • Karon Abe; 
  • Thomas Ortel; 
  • Gary Raskob

ABSTRACT

Background:

Conducting public health surveillance for venous thromboembolism (VTE) at a national scale is important for measuring the disease burden and the impact of prevention measures. Integrating natural language processing (NLP) into VTE surveillance may be an efficient and accurate option for establishing a sustainable and cost-effective national surveillance system.

Objective:

We evaluated the performance of the VTE identification instance of IDEAL-X, an NLP tool, at automatically classifying cases of VTE by “reading” unstructured text from diagnostic imaging records.

Methods:

Using imaging records from the pilot VTE surveillance systems at Duke University and the University of Oklahoma Health Sciences Center (OUHSC) during 2012–2014, we applied a VTE identification model of IDEAL-X to classify cases of VTE that had previously been classified manually according to predefined criteria. We calculated accuracy, sensitivity, specificity, and positive and negative predictive values, each with a 95% confidence interval (CI).
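All five performance measures derive from a 2×2 table that cross-tabulates the IDEAL-X label for each record against the manual reference classification. The Python sketch below illustrates that arithmetic under stated assumptions; it is not the authors' code. The names `ConfusionCounts`, `wilson_ci`, and `performance` are hypothetical, and the Wilson score interval is an assumed choice, since the abstract does not state how the 95% CIs were derived.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class ConfusionCounts:
    """Hypothetical container for NLP labels vs. manual reference labels."""
    tp: int  # NLP: VTE,    manual review: VTE
    fp: int  # NLP: VTE,    manual review: no VTE
    tn: int  # NLP: no VTE, manual review: no VTE
    fn: int  # NLP: no VTE, manual review: VTE


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    ASSUMPTION: chosen for illustration only; the source does not say
    which CI method the study used.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def performance(c: ConfusionCounts) -> dict[str, tuple[float, float, float]]:
    """Return {measure: (point estimate, CI lower, CI upper)}."""
    # Each measure is a proportion: (numerator, denominator).
    measures = {
        "accuracy": (c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "sensitivity": (c.tp, c.tp + c.fn),  # true VTE cases flagged
        "specificity": (c.tn, c.tn + c.fp),  # non-cases correctly cleared
        "ppv": (c.tp, c.tp + c.fp),          # flagged records that are true VTE
        "npv": (c.tn, c.tn + c.fn),          # cleared records that are truly negative
    }
    return {name: (k / n, *wilson_ci(k, n)) for name, (k, n) in measures.items()}
```

For example, with invented counts, `performance(ConfusionCounts(tp=90, fp=15, tn=85, fn=10))` gives an accuracy of 0.875 (175 of 200 records) and a sensitivity of 0.90 (90 of 100 reference VTE cases).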

Results:

The VTE model of IDEAL-X “read” 1591 records from Duke University and 1487 records from OUHSC for a total of 3078 records. The combined performance measures were 93.7% accuracy (95% CI: 93.7%–93.8%), 96.3% sensitivity (95% CI: 96.2%–96.4%), 92.0% specificity (95% CI: 91.9%–92.0%), 89.1% positive predictive value (95% CI: 89.0%–89.2%), and 97.3% negative predictive value (95% CI: 97.3%–97.4%). The sensitivity was higher at Duke University (97.9%, 95% CI: 97.8%–98.0%) than at OUHSC (93.8%, 95% CI: 93.5%–94.0%), but the specificity was higher at OUHSC (96.3%, 95% CI: 96.2%–96.3%) than at Duke University (90.4%, 95% CI: 90.1%–90.5%).

Conclusions:

The VTE model of IDEAL-X accurately classified cases of VTE from pilot surveillance systems in 2 separate states and health systems. NLP is a promising tool in the design and implementation of an automated national surveillance system for VTE.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.