Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Aug 21, 2018
Open Peer Review Period: Aug 26, 2018 - Oct 11, 2018
Date Accepted: Feb 10, 2019

The final, peer-reviewed published version of this preprint can be found here:

Chen J, Lalor J, Liu W, Druhl E, Granillo E, Vimalananda VG, Yu H

Detecting Hypoglycemia Incidents Reported in Patients’ Secure Messages: Using Cost-Sensitive Learning and Oversampling to Reduce Data Imbalance

J Med Internet Res 2019;21(3):e11990

DOI: 10.2196/11990

PMID: 30855231

PMCID: 6431826

Detecting Hypoglycemia Incidents Reported in Patients’ Secure Messages: Using Cost-sensitive Learning and Oversampling to Reduce Data Imbalance

  • Jinying Chen
  • John Lalor
  • Weisong Liu
  • Emily Druhl
  • Edgard Granillo
  • Varsha G. Vimalananda
  • Hong Yu

ABSTRACT

Background:

Improper dosing of medications such as insulin can cause hypoglycemic episodes, which may lead to severe morbidity or even death. Although secure messaging was designed for exchanging non-urgent messages, patients sometimes use it to report hypoglycemia events. Detecting these patient-reported adverse events may help alert clinical teams and enable early corrective actions to improve patient safety.

Objective:

We aimed to develop a natural language processing system, called HypoDetect (Hypoglycemia Detector), to automatically identify hypoglycemia incidents reported in patients’ secure messages.

Methods:

An expert in public health annotated 3,000 secure message threads as containing patient-reported hypoglycemia incidents or not. A physician independently annotated 100 threads randomly selected from this dataset to measure inter-annotator agreement. We used this dataset to develop and evaluate HypoDetect. HypoDetect incorporates three machine learning algorithms widely used for text classification: Linear Support Vector Machines, Random Forest, and Logistic Regression. We explored different learning features, including new knowledge-driven features. Because only 114 messages (3.8%) were annotated as positive, we investigated cost-sensitive learning and oversampling methods to mitigate the challenge of imbalanced data.
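
The two imbalance remedies named above can be illustrated with a minimal sketch. The abstract does not say how misclassification costs were set or how the oversampling was configured, so everything below is an assumption: the weights use the common "inversely proportional to class frequency" rule, the loss is a generic class-weighted log loss, and `smote_point` shows only the basic interpolation step of the Synthetic Minority Over-sampling Technique, not the ensembled variant the study evaluated. All function names are hypothetical.

```python
import math
import random


def balanced_class_weights(n_pos, n_neg):
    """Assumed weighting rule: n / (2 * n_class), so the rarer class weighs more."""
    n = n_pos + n_neg
    return {1: n / (2 * n_pos), 0: n / (2 * n_neg)}


def weighted_log_loss(y_true, p_pred, weights):
    """Class-weighted average of the binary negative log-likelihood."""
    total = 0.0
    weight_sum = 0.0
    for y, p in zip(y_true, p_pred):
        w = weights[y]
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum


def smote_point(x, neighbor, rng):
    """Basic SMOTE step: a random point on the segment from x to a minority-class neighbor."""
    u = rng.random()
    return [xi + u * (ni - xi) for xi, ni in zip(x, neighbor)]


# Class counts taken from the abstract: 114 positives out of 3,000.
weights = balanced_class_weights(n_pos=114, n_neg=3000 - 114)

# Same-sized probability error, once on the positive and once on the negative.
loss_missed_positive = weighted_log_loss([1, 0], [0.1, 0.1], weights)
loss_false_alarm = weighted_log_loss([1, 0], [0.9, 0.9], weights)

# Hypothetical 2-dimensional feature vectors for two minority-class examples.
synthetic = smote_point([0.0, 0.0], [2.0, 4.0], random.Random(0))

print(weights, loss_missed_positive, loss_false_alarm, synthetic)
```

Under this weighting, an error on a positive example costs roughly 25 times as much as the same error on a negative one (2,886 / 114), which pushes a classifier toward higher sensitivity at some cost in specificity.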

Results:

The inter-annotator agreement, measured by Cohen’s kappa, was 0.976. Using cross-validation, Logistic Regression with cost-sensitive learning achieved the best performance (area under the ROC curve=0.954, sensitivity=0.693, specificity=0.974, F1=0.590). Cost-sensitive learning and the ensembled Synthetic Minority Over-sampling Technique substantially improved the sensitivity of the baseline systems (absolute gains of 0.123 to 0.728). Our results show that a variety of features contributed to the best performance of HypoDetect.
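
For readers less familiar with these metrics, the sketch below shows how sensitivity, specificity, F1, and Cohen's kappa are computed from counts. The abstract reports no confusion matrix, so the counts are illustrative assumptions: the first set was chosen so that a single pooled matrix over 3,000 items with 114 positives reproduces the rounded figures above, and the kappa table is a generic example unrelated to the study's annotation data. Function names are hypothetical.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, precision, and F1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true positives that were detected
    specificity = tn / (tn + fp)  # share of true negatives that were left alone
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


def cohens_kappa(both_yes, only_a_yes, only_b_yes, both_no):
    """Cohen's kappa for two annotators making a binary judgment."""
    n = both_yes + only_a_yes + only_b_yes + both_no
    observed = (both_yes + both_no) / n
    a_yes = (both_yes + only_a_yes) / n
    b_yes = (both_yes + only_b_yes) / n
    # Agreement expected by chance if the annotators labeled independently.
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    return (observed - expected) / (1 - expected)


# Illustrative counts (an assumption, not reported in the abstract).
metrics = confusion_metrics(tp=79, fp=75, tn=2811, fn=35)

# Generic agreement table, unrelated to the study's 100 double-annotated threads.
kappa_example = cohens_kappa(both_yes=20, only_a_yes=5, only_b_yes=10, both_no=15)

print({name: round(value, 3) for name, value in metrics.items()})
print(round(kappa_example, 3))
```

With these assumed counts, precision is only about 0.51, which is why F1 stays near 0.59 even though specificity is 0.974: when positives are this rare, a small false-positive rate still yields about as many false alarms as true detections.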

Conclusions:

Despite the challenge of data imbalance, HypoDetect achieved promising results for the task of detecting hypoglycemia incidents from secure messages. The system has great potential to facilitate early detection and treatment of hypoglycemia.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.