Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Jan 13, 2020
Date Accepted: May 21, 2020
Identifying and Predicting Intentional Self-harm in Electronic Health Records Clinical Notes: A Deep Learning Approach
ABSTRACT
Background:
Suicide is an important public health concern in the United States and around the world. Substantial work has examined machine learning approaches to identifying and predicting intentional self-harm and suicide using existing datasets. With recent advances in computing, deep learning applications in health care are gaining momentum.
Objective:
The aims of this study were to leverage the information in clinical notes to: 1) improve the identification of patients treated for intentional self-harm and 2) predict future self-harm events.
Methods:
We extracted clinical text notes from the electronic health records (EHRs) of 835 patients with International Classification of Diseases (ICD) codes for intentional self-harm (ISH) and 1670 matched controls who never had any ISH ICD codes. The data were divided into a training set and a held-out test set. Using the training set, we tested a number of algorithms on clinical notes associated with the ISH codes, including several traditional bag-of-words-based models and two word embedding-based models with deep convolutional neural networks (CNNs). After establishing the superior performance of the CNNs in cross-validation on the training set, we evaluated these models on the held-out test set. We also evaluated the predictive performance of the CNNs on a subset of patients who had clinical notes 1 to 6 months before their first ISH event.
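The abstract does not describe the CNN architecture. Purely as an illustration of how a convolutional classifier over word embeddings can score a clinical note, the following is a minimal plain-Python sketch of the forward pass of a generic one-layer text CNN: embedding lookup, convolution filters sliding over windows of consecutive words, ReLU, max-over-time pooling, and a logistic output. All function names, the filter widths, and the randomly initialized parameters are hypothetical assumptions; in a real system the embeddings would typically be pretrained and all weights learned by backpropagation in a deep learning framework.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def text_cnn_probability(tokens, vocab, embeddings, filters, biases,
                         out_weights, out_bias):
    """Forward pass of a hypothetical one-layer text CNN with max-over-time pooling.

    tokens      -- words of one note (already tokenized)
    vocab       -- dict word -> row of `embeddings`; row 0 is the unknown word
    embeddings  -- list of d-dimensional word vectors
    filters     -- each filter is a list of k weight vectors of dimension d
    biases      -- one bias per filter
    out_weights -- one output-layer weight per filter
    """
    d = len(embeddings[0])
    # Map each token to its embedding; out-of-vocabulary words use row 0.
    seq = [embeddings[vocab.get(t, 0)] for t in tokens]
    features = []
    for filt, b in zip(filters, biases):
        k = len(filt)
        # Zero-pad short notes so the filter fits at least once.
        padded = seq + [[0.0] * d] * max(0, k - len(seq))
        # Slide the filter over every window of k consecutive words, then ReLU.
        acts = [max(0.0, b + sum(dot(filt[j], padded[i + j]) for j in range(k)))
                for i in range(len(padded) - k + 1)]
        # Max-over-time pooling keeps the strongest response anywhere in the note.
        features.append(max(acts))
    # Logistic output layer: probability that the note documents ISH.
    return sigmoid(out_bias + dot(out_weights, features))


def random_model(vocab_size, d, widths, seed=0):
    """Untrained, randomly initialized parameters with one filter per width."""
    rng = random.Random(seed)

    def vec(n):
        return [rng.gauss(0.0, 0.1) for _ in range(n)]

    embeddings = [vec(d) for _ in range(vocab_size)]
    filters = [[vec(d) for _ in range(k)] for k in widths]
    return embeddings, filters, [0.0] * len(widths), vec(len(widths)), 0.0


if __name__ == "__main__":
    vocab = {"<unk>": 0, "intentional": 1, "overdose": 2, "denies": 3}
    model = random_model(len(vocab), d=8, widths=[2, 3, 4])
    note = "patient took intentional overdose".split()
    # Untrained weights, so the probability is meaningless; this only shows the data flow.
    print(text_cnn_probability(note, vocab, *model))
```

Max-over-time pooling makes the note-level score depend on the strongest local match (for example, a phrase such as "intentional overdose") regardless of where it appears in a long note, which is one reason this design is commonly used for document classification. Practical models use many filters per width rather than the single filter per width shown here.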
Results:
The area under the receiver operating characteristic curve (AUC) for the CNN on the detection of ISH in clinical notes concurrent with the events approached 1.00, with an F1-score of 0.985. On the predictive task, the AUCs for the CNNs ranged from 0.861 to 0.885, with F1-scores of 0.776 to 0.778.
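The abstract reports AUC and F1-score without detailing how they were computed. As a hedged reference for readers, the following plain-Python sketch computes both metrics from binary labels and model scores: the AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative (ties counted as one half), and the F1-score at a fixed decision threshold. The function names, the toy data, and the 0.5 threshold are assumptions; the abstract does not state which threshold was used.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative).

    Ties count as one half, which equals the Mann-Whitney U statistic
    divided by (number of positives * number of negatives).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(labels, scores, threshold=0.5):
    """F1-score (harmonic mean of precision and recall) at a fixed threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # 2TP / (2TP + FP + FN) is algebraically equal to 2PR / (P + R).
    return 2 * tp / denom if denom else 0.0


labels = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7]
print(round(roc_auc(labels, scores), 3))   # 0.778
print(round(f1_score(labels, scores), 3))  # 0.667
```

The distinction matters when reading the results: the AUC is threshold-free and summarizes ranking quality across all cutoffs, whereas the F1-score depends on the chosen threshold. The pairwise loop is quadratic in the number of patients, which is fine for illustration; rank-based implementations are preferred for large datasets.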
Conclusions:
The strong performance on the first task, the detection of concurrent ISH events, suggests that such models could be used effectively for surveillance of ISH in clinical text. Although performance on the predictive task was more modest, the results obtained using clinical text alone are competitive with other reports in the literature that use risk factors from structured EHR data.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.