Treatment Recommendations for Clinical Deterioration on the Wards: Development and Validation of Machine Learning Models
ABSTRACT
Background:
Clinical deterioration in general ward patients is associated with increased morbidity and mortality. Early and appropriate treatments can improve outcomes for such patients. While machine learning tools have proven successful in the early identification of clinical deterioration risk, little work has explored their effectiveness in providing data-driven treatment recommendations to clinicians for high-risk patients.
Objective:
This study established machine learning performance benchmarks for predicting the need for 10 common clinical deterioration interventions and compared the performance of several machine learning models to identify which approaches are well suited to these prediction tasks.
Methods:
We relied on a chart-reviewed, multicenter dataset of general ward patients experiencing clinical deterioration (n=2480 encounters), who were identified as high risk using a Food and Drug Administration-cleared early warning score (eCART). Manual chart review assigned each encounter gold-standard labels for lifesaving treatments. We trained elastic net logistic regression, gradient boosted machines, long short-term memory, and stacking ensemble models to predict the need for 10 common deterioration interventions at the time of the deterioration early warning score. Models were trained on encounters from 3 health systems and externally validated on encounters from a fourth health system. Discrimination, assessed by the area under the receiver operating characteristic curve (AUC), was the primary evaluation metric.
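The evaluation design described above (training on 3 health systems, testing on a held-out fourth, and scoring discrimination by AUC) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the encounter tuple layout, the site identifiers "A" to "D", the function names `split_by_site` and `auc`, and the toy scores are all hypothetical. The AUC is computed with the rank-based (Mann-Whitney) formulation, which equals the area under the ROC curve.

```python
from typing import List, Sequence, Tuple

# Hypothetical encounter record: (health_system_id, features, treatment_label)
Encounter = Tuple[str, List[float], int]


def split_by_site(encounters: Sequence[Encounter], holdout_site: str):
    """Train on every health system except `holdout_site`; validate externally on it."""
    train = [e for e in encounters if e[0] != holdout_site]
    test = [e for e in encounters if e[0] == holdout_site]
    return train, test


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data with hypothetical sites; "D" plays the role of the external site.
EXAMPLE = [("A", [0.1], 0), ("B", [0.9], 1), ("C", [0.4], 0),
           ("D", [0.8], 1), ("D", [0.3], 0), ("D", [0.6], 0)]
train, test = split_by_site(EXAMPLE, holdout_site="D")
# Hypothetical "model": use the single feature directly as the risk score.
test_auc = auc([y for _, _, y in test], [x[0] for _, x, _ in test])
```

Because the held-out site never contributes to training, this mirrors external validation; a real pipeline would typically also fit any preprocessing (such as imputation or scaling) on the training sites only.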
Results:
Discriminative performance varied widely by model and prediction task, with AUCs typically ranging from 0.7 to 0.9. Across all models, antiarrhythmics were the easiest treatment to predict (mean AUC 0.866), whereas anticoagulants were the hardest (mean AUC 0.660). Although no single modeling approach outperformed the others across all tasks, the gradient boosted machines tended to show the best individual performance. In addition, the stacking ensemble, which combined predictions from all models, typically matched or outperformed the best-performing individual model for each task. We also found that a sizeable fraction of patients in our evaluation cohort were untreated at the time of the high-risk early warning flag, highlighting an opportunity to use machine learning tools to decrease treatment latency.
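The abstract does not state which meta-learner the stacking ensemble uses. One common choice, assumed here purely for illustration, is a logistic regression fit on the base models' predicted probabilities. The sketch below trains such a meta-learner with plain batch gradient descent; `fit_stacker`, `predict_stacker`, the learning rate, the epoch count, and the toy inputs are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_stacker(w: Sequence[float], x: Sequence[float]) -> float:
    """Combine base-model probabilities x using weights w = [bias, w_1, ..., w_K]."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_stacker(base_preds: Sequence[Sequence[float]], labels: Sequence[int],
                lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Fit a logistic-regression meta-learner (assumed, not from the source).

    base_preds[i][k] is base model k's predicted probability for encounter i,
    ideally produced out-of-fold. Returns [bias, w_1, ..., w_K].
    """
    k = len(base_preds[0])
    n = len(labels)
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for x, y in zip(base_preds, labels):
            err = predict_stacker(w, x) - y  # gradient of log loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        # Batch gradient descent step on the mean log loss.
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


# Toy out-of-fold probabilities from two hypothetical base models
# (for example, column 0 = gradient boosted machine, column 1 = LSTM).
BASE = [[0.9, 0.5], [0.8, 0.4], [0.2, 0.6], [0.1, 0.5]]
LABELS = [1, 1, 0, 0]
weights = fit_stacker(BASE, LABELS)
```

In practice the base-model probabilities fed to the meta-learner are usually generated out-of-fold (for example, by cross-validation on the training sites) so that the stacker does not learn from predictions the base models made on their own training data. A learned weighting like this is one way an ensemble can match or exceed its strongest member, since it can lean on whichever base model is most informative for a given task.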
Conclusions:
We found variability in the discrimination of machine learning models across tasks and model approaches for predicting lifesaving treatments in patients with clinical deterioration. Overall performance was high, and these models could be paired with early warning scores to provide clinicians with timely and actionable treatment recommendations to improve patient care.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.