A Machine Learning Model for Risk Stratification of Post-diagnosis Diabetic Ketoacidosis Hospitalization in Pediatric Type 1 Diabetes: A Retrospective Study
ABSTRACT
Background:
Diabetic ketoacidosis (DKA) is the leading cause of morbidity and mortality among pediatric Type 1 Diabetes (T1D) patients. DKA occurs in 20-30% of T1D patients, with an economic cost of over $90 million/year in the US. Risk of post-diagnosis DKA is not spread uniformly over T1D patients: 80% of post-diagnosis DKA is experienced by fewer than 20% of patients.
Objective:
To investigate whether machine learning models can predict the risk of post-diagnosis DKA in children with T1D using routinely collected electronic health record (EHR) data. Such models could enable early therapeutic intervention.
Methods:
A retrospective case-control study was conducted using EHR data from 1,787 pediatric T1D patients treated at a large tertiary-care pediatric health system in the US from January 2010 to June 2018. Patients were included if they were initially diagnosed and subsequently followed up at the health system, had an onset date on or after January 1, 2010, were aged 0 to 21 years at diagnosis, had at least one positive diabetes antibody titer at diagnosis, and had a clinical diagnosis of T1D. A state-of-the-art gradient-boosted ensemble of decision trees was built to predict post-diagnosis DKA from 44 routinely collected EHR variables. Model performance, measured by AUC, weighted F1 score, precision, and recall, was evaluated using five-fold cross-validation.
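For readers unfamiliar with the evaluation metrics named above, the following is a minimal illustrative sketch of how AUC and support-weighted precision, recall, and F1 can be computed for a binary outcome. It is not the authors' code: the function names `roc_auc` and `weighted_scores`, the toy labels and scores, and the 0.5 decision cutoff are all assumptions made for illustration only.

```python
from collections import Counter


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def weighted_scores(y_true, y_pred):
    """Per-class precision, recall, and F1, averaged with weights equal
    to each class's share of the true labels (its support)."""
    support = Counter(y_true)
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = count / total
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy data (hypothetical): 1 = post-diagnosis DKA, 0 = no DKA.
y_true = [0, 0, 0, 0, 1, 1]
risk_scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.9]
# Hypothetical decision cutoff of 0.5 on the predicted risk score.
y_pred = [1 if s >= 0.5 else 0 for s in risk_scores]

auc = roc_auc(y_true, risk_scores)
metrics = weighted_scores(y_true, y_pred)
print(f"AUC={auc:.3f}", {k: round(v, 3) for k, v in metrics.items()})
```

In a five-fold cross-validation, these quantities would be computed on each held-out fold and then summarized as a mean and standard deviation across the five folds.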
Results:
The model predicted post-diagnosis DKA risk with an AUC of 0.80 ± 0.04, a weighted F1 score of 0.78 ± 0.04, and weighted precision and recall of 0.83 ± 0.03 and 0.76 ± 0.05, respectively. At the cohort level, the model stratified the population into three risk groups, identifying critical thresholds on diabetes age and HbA1c levels for optimal clinical intervention. At the individual level, it generated personalized risk scores and identified key risk factors to help direct individualized intervention.
Conclusions:
We have built a predictive model that can be integrated into the clinical workflow to risk-stratify pediatric T1D patients and to direct individual clinical interventions at critical time points.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.