Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Apr 28, 2021
Date Accepted: Mar 6, 2022
Machine Learning Approach for Predicting Sepsis Mortality in a Population-based National Database
ABSTRACT
Background:
While machine learning (ML) algorithms have been applied to point-of-care sepsis prognostication, ML has not been used to predict sepsis mortality in administrative databases. We therefore examined the performance of common ML algorithms in predicting mortality among adult patients with sepsis and compared it with that of the conventional context knowledge-based logistic regression approach.
Objective:
To examine the performance of common machine learning (ML) algorithms in predicting mortality among adult patients with sepsis and to compare it with that of the conventional context knowledge-based logistic regression approach.
Methods:
We examined inpatient admissions for sepsis in the US National Inpatient Sample (NIS), using hospitalizations in 2010-2013 as the training dataset. We developed four machine learning models to predict in-hospital mortality: (1) logistic regression with Lasso regularization, (2) random forest, (3) gradient boosted decision tree, and (4) deep neural network. To benchmark their performance, we compared our models with the Super Learner model. Using hospitalizations in 2014 as the testing dataset, we examined the models’ area under the receiver operating characteristic curve (AUC), confusion matrix results, and net reclassification improvement.
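Two of the evaluation measures named above, AUC and net reclassification improvement (NRI), can be illustrated with a minimal sketch. This is not the authors' code: the function names `auc` and `nri`, the toy labels and scores, and the use of a two-category NRI on already-thresholded predictions are assumptions made for illustration, since the abstract does not state which NRI variant or risk categories were used.

```python
from typing import Sequence


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the Mann-Whitney statistic.

    Returns the probability that a randomly chosen death (label 1) has a
    higher predicted risk than a randomly chosen survivor (label 0).
    Ties count as 0.5. Quadratic time, so suitable for toy data only.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def nri(y_true: Sequence[int], ref_pred: Sequence[int],
        new_pred: Sequence[int]) -> float:
    """Two-category NRI of new versus reference binary predictions.

    NRI = (P(up | event) - P(down | event))
        + (P(down | non-event) - P(up | non-event)),
    where "up" means the new model predicts 1 and the reference predicts 0.
    """
    up_e = down_e = up_n = down_n = 0
    n_events = n_nonevents = 0
    for y, r, n in zip(y_true, ref_pred, new_pred):
        if y == 1:
            n_events += 1
            up_e += int(n > r)
            down_e += int(n < r)
        else:
            n_nonevents += 1
            up_n += int(n > r)
            down_n += int(n < r)
    if n_events == 0 or n_nonevents == 0:
        raise ValueError("need at least one event and one non-event")
    return (up_e - down_e) / n_events + (down_n - up_n) / n_nonevents


# Hypothetical toy data: 1 = in-hospital death, 0 = survived.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(toy_labels, toy_scores)

# Hypothetical thresholded predictions from a reference and a new model.
toy_nri = nri([1, 1, 0, 0], ref_pred=[0, 1, 1, 0], new_pred=[1, 1, 0, 0])
```

In the toy NRI example the new model correctly moves one death up and one survivor down relative to the reference, so both components are positive. On a dataset the size of the NIS, a rank-based or library implementation of AUC would be preferable to the pairwise loop shown here.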
Results:
Hospitalizations for 923,759 adults were included in the analysis. Compared with the reference logistic regression (AUC: 0.786, 95% confidence interval [CI]: 0.783 - 0.788), all ML models showed superior discriminative ability (p < 0.001), including logistic regression with Lasso regularization (AUC: 0.878, 95% CI: 0.876 - 0.879), random forest (AUC: 0.878, 95% CI: 0.877 - 0.880), XGBoost (AUC: 0.888, 95% CI: 0.886 - 0.889), and neural network (AUC: 0.893, 95% CI: 0.891 - 0.895). All four ML models showed higher sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) than the reference logistic regression model (p < 0.001). We obtained similar results from the Super Learner model (AUC: 0.883, 95% CI: 0.881 - 0.885).
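For readers less familiar with the confusion matrix measures reported above, the following minimal sketch shows how sensitivity, specificity, PPV, and NPV follow from the four cell counts. The function name `confusion_metrics`, the `ConfusionMetrics` container, and the toy labels are hypothetical; the abstract does not report the classification threshold or the cell counts used in the study.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ConfusionMetrics:
    sensitivity: float  # TP / (TP + FN): deaths correctly flagged
    specificity: float  # TN / (TN + FP): survivors correctly cleared
    ppv: float          # TP / (TP + FP): flagged patients who died
    npv: float          # TN / (TN + FN): cleared patients who survived


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def confusion_metrics(y_true: Sequence[int],
                      y_pred: Sequence[int]) -> ConfusionMetrics:
    """Compute the four measures from binary labels and binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    return ConfusionMetrics(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
    )


# Hypothetical toy data: 1 = in-hospital death, 0 = survived.
toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 1, 0, 0, 0, 0]
toy_metrics = confusion_metrics(toy_true, toy_pred)
```

Unlike AUC, all four measures depend on the threshold used to turn predicted risks into binary predictions, so comparisons between models are meaningful only when the thresholds are chosen in a comparable way.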
Conclusions:
ML approaches can improve sensitivity, specificity, PPV, NPV, discrimination, and calibration in predicting in-hospital mortality for patients hospitalized with sepsis in the United States. These models need further validation and could be applied in developing more accurate models to compare risk-standardized mortality rates across hospitals and geographic regions, paving the way for research and policy initiatives studying disparities in sepsis care.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.