JMIR Preprints #45246: Using Natural Language Processing to Predict Fatal Drug Overdose from Autopsy Narrative Text: Algorithm Development and Validation

Using Natural Language Processing to Predict Fatal Drug Overdose from Autopsy Narrative Text: Algorithm Development and Validation

Leigh Anne Tang;
Jessica Korona-Bailey;
Dimitrios Zaras;
Allison Roberts;
Sutapa Mukhopadhyay;
Stephen Espy;
Colin G Walsh

ABSTRACT

Background:

Fatal drug overdose surveillance informs prevention but is often delayed due to autopsy report processing and death certificate coding. Autopsy reports contain narrative text describing scene evidence and medical history (similar to preliminary death scene investigation reports) and may serve as early data sources for identifying fatal drug overdoses. To facilitate more timely fatal overdose reporting, natural language processing (NLP) was applied to narrative text from autopsies.

Objective:

This study aimed to develop an NLP-based model predicting the likelihood that an autopsy report narrative describes an accidental or undetermined fatal drug overdose.

Methods:

Autopsies for all manners of death (2019-2021) were obtained from the Tennessee Office of the State Chief Medical Examiner. Text was extracted from autopsy reports (portable document format files) using optical character recognition. Three common narrative text sections were identified, concatenated, and preprocessed into a bag-of-words representation with term frequency-inverse document frequency scoring. Logistic regression, support vector machine (SVM), random forest, and gradient boosted trees classifiers were developed and validated. Autopsies from 2019-2020 were used for training (95%) and calibration (5%), and autopsies from 2021 were used for testing. Model discrimination was evaluated using the area under the receiver operating characteristic curve (AUROC), precision, recall, F1 score, and F2 score (which prioritizes recall over precision). Calibration was performed using logistic regression (Platt scaling) and evaluated using the Spiegelhalter z-test. Shapley Additive exPlanations (SHAP) values were generated for models compatible with the method. In a post-hoc subgroup analysis of the random forest classifier, model discrimination was evaluated by forensic center, race, and age at death.
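The bag-of-words preprocessing with term frequency-inverse document frequency (TF-IDF) scoring described above can be illustrated with a minimal standard-library sketch. It is not the authors' code. The tokenizer, the smoothed IDF formula, the L2 normalization, and the example narratives are all assumptions made for illustration, since the abstract does not specify a TF-IDF variant. Fitting the IDF weights on the training narratives only and reusing them for later reports is likewise an assumption, chosen to match the temporal split.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(documents):
    """Compute smoothed IDF weights: ln((1 + n) / (1 + df)) + 1 (assumed variant)."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))  # count each term once per document
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0
            for term, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """Return an L2-normalized sparse TF-IDF vector; out-of-vocabulary terms are dropped."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


# Hypothetical narratives, invented for illustration only (not study data).
train_narratives = [
    "decedent found unresponsive with drug paraphernalia nearby",
    "decedent had a history of heart disease",
    "decedent found at home with history of substance use",
]

idf = fit_idf(train_narratives)  # vocabulary and IDF come from training text only
test_vector = tfidf_vector("Decedent found with fentanyl patch", idf)

for term, weight in sorted(test_vector.items()):
    print(f"{term}: {weight:.3f}")
```

In this sketch, terms that appear in every training narrative receive the lowest weight, and terms absent from the training vocabulary are ignored when scoring a new report.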

Results:

A total of 17,342 autopsies (34% cases, 66% controls) were used for model development and validation. The training set included 10,215 autopsies (33% cases, 67% controls), the calibration set had 538 autopsies (34% cases, 66% controls), and the test set had 6,589 autopsies (37% cases, 63% controls). The vocabulary set contained 4,002 terms. All models showed excellent performance (AUROC ≥0.95, precision ≥0.94, recall ≥0.92, F1 ≥0.94, and F2 ≥0.92). The SVM and random forest classifiers achieved the highest F2 scores (SVM F2=0.948; random forest F2=0.947). The logistic regression and random forest classifiers were calibrated (P=.95 and P=.85, respectively), while the SVM and gradient boosted trees classifiers were miscalibrated (P=.029 and P<.001, respectively). “Fentanyl” and “accident” had the highest SHAP values. Post-hoc subgroup analyses revealed lower F2 scores for autopsy reports from forensic centers D and E. Lower F2 scores were also observed for the American Indian, Asian, age ≤14 years, and age ≥65 years subgroups, but larger sample sizes are needed to validate these findings.
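The F2 score and the Spiegelhalter z-test reported above can be written out as a short standard-library sketch. The function names and the toy labels and probabilities are hypothetical and are not study data. The F-beta formula and the Spiegelhalter z statistic follow their standard definitions, with a two-sided P value taken from the standard normal distribution.

```python
import math


def f_beta(precision, recall, beta=2.0):
    """F-beta score; beta=2 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def spiegelhalter_z(y_true, y_prob):
    """Spiegelhalter z statistic and two-sided P value for calibration.

    z = sum((y - p) * (1 - 2p)) / sqrt(sum((1 - 2p)^2 * p * (1 - p)))
    A small P value suggests the predicted probabilities are miscalibrated.
    """
    numerator = sum((y - p) * (1 - 2 * p) for y, p in zip(y_true, y_prob))
    variance = sum((1 - 2 * p) ** 2 * p * (1 - p) for p in y_prob)
    if variance == 0.0:
        raise ValueError("statistic is undefined when every probability is 0, 0.5, or 1")
    z = numerator / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p_value


# Arithmetic illustration: plugging in precision 0.94 and recall 0.92
# (the reported lower bounds, not one specific model) gives F2 of about 0.924.
print(round(f_beta(0.94, 0.92), 3))

# Toy labels and predicted probabilities, invented for illustration only.
toy_labels = [1, 0, 1, 0]
toy_probs = [0.8, 0.2, 0.7, 0.3]
z_stat, p_val = spiegelhalter_z(toy_labels, toy_probs)
print(f"z={z_stat:.3f}, P={p_val:.3f}")
```

Under this test the null hypothesis is that the model is calibrated, which is why the larger P values reported for the logistic regression and random forest classifiers are read as evidence of adequate calibration.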

Conclusions:

The random forest classifier may be suitable for identifying potential accidental and undetermined fatal overdose autopsies. Operationalizing this classifier could enable the early detection of accidental and undetermined fatal drug overdoses.

Citation

Please cite as:

Tang LA, Korona-Bailey J, Zaras D, Roberts A, Mukhopadhyay S, Espy S, Walsh CG

Using Natural Language Processing to Predict Fatal Drug Overdose From Autopsy Narrative Text: Algorithm Development and Validation Study

JMIR Public Health Surveill 2023;9:e45246

DOI: 10.2196/45246

PMID: 37204824

PMCID: 10238956

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: JMIR Public Health and Surveillance

Date Submitted: Dec 21, 2022

Date Accepted: Mar 7, 2023
