Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Aug 21, 2018
Open Peer Review Period: Aug 26, 2018 - Oct 11, 2018
Date Accepted: Feb 10, 2019
Detecting Hypoglycemia Incidents Reported in Patients’ Secure Messages: Using Cost-sensitive Learning and Oversampling to Reduce Data Imbalance
ABSTRACT
Background:
Improper dosing of medications such as insulin can cause hypoglycemic episodes, which may lead to severe morbidity or even death. Although secure messaging was designed for exchanging non-urgent messages, patients sometimes use it to report hypoglycemic events. Detecting these patient-reported adverse events may help alert clinical teams and enable early corrective action to improve patient safety.
Objective:
We aimed to develop a natural language processing system, called HypoDetect (Hypoglycemia Detector), to automatically identify hypoglycemia incidents reported in patients’ secure messages.
Methods:
An expert in public health annotated 3,000 secure message threads as either containing or not containing patient-reported hypoglycemia incidents. To assess inter-annotator agreement, a physician independently annotated 100 threads randomly selected from this dataset. We used this dataset to develop and evaluate HypoDetect. HypoDetect incorporates three machine learning algorithms widely used for text classification: Linear Support Vector Machines, Random Forest, and Logistic Regression. We explored different learning features, including new knowledge-driven features. Because only 114 (3.8%) threads were annotated as positive, we investigated cost-sensitive learning and oversampling methods to mitigate the challenge of imbalanced data.
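The two imbalance remedies named above work at different levels: cost-sensitive learning reweights errors on the rare class in the training loss, while oversampling adds synthetic minority examples to the training data. The abstract does not state how misclassification costs were chosen or which SMOTE variant was ensembled, so the following is only a minimal sketch under stated assumptions: the class weights use the common "balanced" inverse-frequency heuristic, the loss is a plain class-weighted logistic loss, and `smote` is the basic SMOTE interpolation on dense feature vectors. The function names `balanced_class_weights`, `weighted_log_loss`, and `smote` are hypothetical and are not taken from HypoDetect.

```python
import math
import random
from typing import Dict, List, Sequence


def balanced_class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class).

    Assumption: this 'balanced' heuristic is one common way to set
    misclassification costs; the abstract does not say which costs were used.
    """
    counts: Dict[int, int] = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_log_loss(labels: Sequence[int], probs: Sequence[float],
                      weights: Dict[int, float]) -> float:
    """Cost-sensitive logistic loss: each example's loss is scaled by its class weight."""
    total, norm = 0.0, 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # clip to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += weights[y] * loss
        norm += weights[y]
    return total / norm


def smote(minority: List[List[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Plain SMOTE: synthesize points on segments between a minority sample
    and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)

    def sq_dist(a: List[float], b: List[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic: List[List[float]] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # indices of the k nearest minority neighbours, excluding the sample itself
        neighbours = sorted((j for j in range(len(minority)) if j != i),
                            key=lambda j: sq_dist(base, minority[j]))[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Class balance reported in the abstract: 114 positive threads out of 3,000.
labels = [1] * 114 + [0] * (3000 - 114)
weights = balanced_class_weights(labels)
# weights[1] = 3000 / (2 * 114), about 13.16; weights[0] = 3000 / (2 * 2886), about 0.52
```

With these weights, each misclassified positive thread costs roughly 25 times as much as a misclassified negative one, which pushes the classifier toward higher sensitivity at some cost in specificity. SMOTE reaches a similar effect by changing the data rather than the loss; in practice it is applied only to the training folds so that synthetic points never leak into evaluation.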
Results:
Inter-annotator agreement was high (Cohen’s kappa = 0.976). In cross-validation, Logistic Regression with cost-sensitive learning achieved the best performance (area under the ROC curve = 0.954, sensitivity = 0.693, specificity = 0.974, F1 = 0.590). Cost-sensitive learning and the ensembled Synthetic Minority Over-sampling Technique (SMOTE) substantially improved the sensitivity of the baseline systems, with absolute gains ranging from 0.123 to 0.728. Our results show that a variety of features contributed to HypoDetect’s best performance.
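Why is F1 only 0.590 when specificity is 0.974? With a positive rate of 3.8%, even a 2.6% false-positive rate on the 2,886 negative threads produces roughly as many false alarms as true detections. The sketch below uses the standard metric definitions to back out the implied confusion-matrix counts. It assumes the reported rates apply to all 3,000 threads pooled together, which the abstract does not confirm (the figures may instead be averages across folds). It also includes a standard two-rater Cohen's kappa for reference. The helper names are hypothetical.

```python
from typing import Dict, Hashable, Sequence, Tuple


def implied_counts(n_pos: int, n_neg: int, sensitivity: float,
                   specificity: float) -> Tuple[float, float, float, float]:
    """Back out expected confusion-matrix counts (tp, fp, fn, tn) from class sizes and rates."""
    tp = sensitivity * n_pos
    fn = n_pos - tp
    fp = (1.0 - specificity) * n_neg
    tn = n_neg - fp
    return tp, fp, fn, tn


def metrics_from_counts(tp: float, fp: float, fn: float, tn: float) -> Dict[str, float]:
    """Standard definitions of the metrics reported in the abstract."""
    return {
        "sensitivity": tp / (tp + fn),  # recall on the positive class
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),  # harmonic mean of precision and recall
    }


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters: observed agreement corrected for chance agreement."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    p_e = sum((list(a).count(c) / n) * (list(b).count(c) / n)
              for c in set(a) | set(b))
    # If chance agreement is already perfect, kappa is undefined; treat as full agreement.
    return 1.0 if p_e == 1.0 else (p_o - p_e) / (1.0 - p_e)


# Best reported model: sensitivity 0.693, specificity 0.974; 114 positives, 2,886 negatives.
m = metrics_from_counts(*implied_counts(114, 3000 - 114, 0.693, 0.974))
# about 79 true positives vs. 75 false positives, so precision is about 0.513 and F1 about 0.589
```

The implied F1 of about 0.589 agrees with the reported 0.590 up to rounding, which illustrates that on data this imbalanced, precision rather than specificity is the limiting factor for F1.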
Conclusions:
Despite the challenge of data imbalance, HypoDetect achieved promising results for the task of detecting hypoglycemia incidents in secure messages. The system has great potential to facilitate early detection and treatment of hypoglycemia.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.