Accepted for/Published in: JMIR Public Health and Surveillance
Date Submitted: Dec 23, 2020
Date Accepted: Feb 12, 2021
Automated Travel History Extraction from Clinical Notes: Algorithm Development and Validation for Emergent Infectious Disease Events
ABSTRACT
Background:
Patient travel history can be crucial in evaluating evolving infectious disease events. Such information can be challenging to acquire in electronic health records as it is often available only in unstructured text.
Objective:
To assess the feasibility of annotating and automatically extracting travel history mentions from unstructured clinical documents in the Department of Veterans Affairs (VA), across disparate healthcare facilities and among millions of patients. Information about travel exposure augments existing surveillance applications, increasing preparedness to respond quickly to public health threats.
Methods:
Clinical documents related to arboviral disease were selected using a semi-automated bootstrapping process and then annotated. Using the annotated instances as training data, models were developed to extract any mention of affirmed travel locations outside of the continental United States from unstructured clinical text. Automated text processing models, including machine learning and neural language models, were evaluated for extraction accuracy.
Results:
Among annotated instances, 2,659 (58%) contained an affirmed mention of travel history while 347 (7.6%) were negated. Inter-annotator agreement resulted in a document-level Cohen’s kappa (Κc) of 0.776. Automated text processing accuracy (F1=85.6; 95% CI: 82.5 to 87.9) and computational burden were acceptable such that the system can provide a rapid screen for public health events.
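For readers unfamiliar with the two statistics reported above, the following is a minimal sketch of how a document-level Cohen's kappa and an F1 score can be computed. It is not the authors' evaluation code: the function names, the label values ("travel", "none"), and the toy annotations and counts are hypothetical and chosen only for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same documents."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of documents given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1.0 - expected)


def f1_score(tp, fp, fn):
    """F1 (harmonic mean of precision and recall) from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical document-level labels from two annotators (toy data).
annotator_a = ["travel", "travel", "none", "none", "travel",
               "none", "travel", "none", "none", "travel"]
annotator_b = ["travel", "none", "none", "none", "travel",
               "none", "travel", "travel", "none", "travel"]

kappa = cohens_kappa(annotator_a, annotator_b)
f1 = f1_score(tp=8, fp=2, fn=1)  # hypothetical extraction counts
print(round(kappa, 3))
print(round(f1, 3))
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is generally preferred over simple percent agreement when reporting annotation reliability.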
Conclusions:
Automated extraction of patient travel history from clinical documents is feasible and can enhance passive public health surveillance systems. Without such a system, it would usually be necessary to manually review charts to identify recent travel or lack of travel, use an electronic health record that enforces travel history documentation, or ignore this potential source of information altogether. The development of this tool was initially motivated by emergent arboviral diseases. More recently, the system was used in the early phases of the response to COVID-19 in the United States, although its utility was limited to a relatively brief window because of the rapid domestic spread of the virus. Such systems may aid future efforts to prevent and contain the spread of infectious diseases.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.