Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Jul 20, 2025
Open Peer Review Period: Jul 20, 2025 - Sep 14, 2025
Date Accepted: Dec 29, 2025
Machine learning algorithms to predict venous thromboembolism in patients with sepsis in the intensive care unit: A multicenter retrospective study
ABSTRACT
Background:
Venous thromboembolism (VTE) is a common and severe complication in intensive care unit (ICU) patients with sepsis. Conventional risk stratification tools lack sepsis-specific features and may inadequately capture complex, nonlinear interactions among clinical variables.
Objective:
This study aimed to develop and validate an interpretable machine learning (ML) model for the early prediction of VTE in septic ICU patients.
Methods:
This multicenter retrospective study used data from the Medical Information Mart for Intensive Care (MIMIC-IV) database for model development and internal validation, and an independent cohort from Changshu Hospital for external validation. Candidate predictors were selected through univariate analysis, followed by least absolute shrinkage and selection operator (LASSO) regression. Variables retained by LASSO were entered into multivariable logistic regression to identify independent predictors, which were then used to develop nine ML models: categorical boosting (CatBoost), decision tree (DT), k-nearest neighbor (KNN), light gradient boosting machine (LGBM), logistic regression (LR), multilayer perceptron (MLP), naive Bayes (NB), random forest (RF), and support vector machine (SVM). Model performance was evaluated by discrimination (area under the receiver operating characteristic curve, AUC), calibration, and clinical utility (decision curve analysis, DCA). Model interpretability was assessed using SHapley Additive exPlanations (SHAP) to quantify the contribution of individual features to the predicted risk.
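The two evaluation metrics named above can be illustrated with a minimal sketch in plain Python. It is not the authors' code: the function names, the ten toy labels and scores, and the 0.25 threshold are all assumptions chosen for illustration. AUC is computed as the probability that a randomly chosen VTE case receives a higher predicted risk than a randomly chosen non-case, and net benefit follows the standard DCA definition, TP/n - (FP/n) x pt/(1 - pt), where pt is the threshold probability.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(labels, scores, threshold):
    """Net benefit of treating everyone whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy in DCA: treat every patient regardless of the model."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data, invented for illustration only (not from MIMIC-IV or Changshu Hospital).
labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.02, 0.05, 0.08, 0.10, 0.15, 0.20, 0.30, 0.40, 0.60, 0.90]

if __name__ == "__main__":
    print("AUC:", auc(labels, scores))
    for pt in (0.10, 0.25, 0.50):
        print(pt, net_benefit(labels, scores, pt), net_benefit_treat_all(labels, pt))
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both the treat-all reference and zero (the treat-none reference). The pairwise AUC loop is quadratic in cohort size, so a rank-based or library implementation would be the practical choice for a cohort of this size.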
Results:
A total of 25,197 patients from the MIMIC-IV cohort and 328 patients from the external cohort were included, with VTE incidences of 3.35% and 9.15%, respectively. The LGBM model demonstrated the best performance, achieving an AUC of 0.956 in internal validation and 0.786 in external validation. Calibration curves indicated strong agreement between predicted and observed outcomes, and DCA showed superior net benefit across clinically relevant thresholds. SHAP analysis identified central venous catheterization, serum chloride and bicarbonate levels, arterial catheterization, and prolonged partial thromboplastin time (PTT) as the most influential predictors. Partial dependence plots revealed both linear and nonlinear associations between these variables and VTE risk. Individual-level force plots further enhanced interpretability by visualizing personalized risk profiles.
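To make the SHAP attributions concrete, the sketch below computes exact Shapley values for a toy risk function by averaging each feature's marginal contribution over all feature orderings. Everything in it is hypothetical: the three feature names are borrowed from the predictors listed above only as labels, and the baseline risk, effect sizes, and interaction term are invented for illustration and are not estimates from the study. The SHAP library approximates this same quantity efficiently for real models; this brute-force version is only practical for a handful of features.

```python
from itertools import permutations


def shapley_values(value_fn, features):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = set()
        for f in order:
            before = value_fn(frozenset(present))
            present.add(f)
            after = value_fn(frozenset(present))
            totals[f] += after - before
    return {f: total / len(orderings) for f, total in totals.items()}


def toy_risk(known):
    """Hypothetical predicted risk when only the features in `known` are revealed.

    All numbers are made up for illustration; they are NOT results from the paper.
    """
    risk = 0.03  # invented baseline risk
    if "cvc" in known:
        risk += 0.10
    if "chloride" in known:
        risk += 0.04
    if "ptt" in known:
        risk += 0.02
    if "cvc" in known and "ptt" in known:
        risk += 0.06  # invented interaction, split evenly by the Shapley rule
    return risk


features = ["cvc", "chloride", "ptt"]
phi = shapley_values(toy_risk, features)

if __name__ == "__main__":
    for name, value in phi.items():
        print(name, round(value, 4))
```

The attributions sum to the difference between the full prediction and the baseline, which is the additivity property that force plots rely on when they display an individual patient's risk as a baseline plus per-feature contributions.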
Conclusions:
We developed a high-performing and interpretable ML model for predicting VTE in ICU patients with sepsis. By integrating diverse clinical data and leveraging SHAP for transparent explanations, this tool may support personalized prophylaxis and early diagnostic strategies to reduce VTE-related morbidity and mortality in septic ICU populations.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.