Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Jun 5, 2024
Date Accepted: Apr 7, 2025
Date Submitted to PubMed: Apr 9, 2025
An explainable machine-learning model for predicting persistent sepsis-associated acute kidney injury: development, validation, and comparison with CCL14
ABSTRACT
Background:
Persistent sepsis-associated acute kidney injury (SA-AKI) portends worse clinical outcomes and remains a therapeutic challenge for clinicians. Early identification and prediction of persistent SA-AKI are crucial.
Objective:
The aim of this study was to develop and validate an interpretable machine learning (ML) model that predicts persistent SA-AKI, and to compare its diagnostic performance with CCL14 in a prospective cohort.
Methods:
Four retrospective cohorts and one prospective cohort were used for model derivation and validation. The derivation cohort was drawn from the MIMIC-IV database and randomly split into 80% for model construction and 20% for internal validation. External validation was conducted using subsets of the MIMIC-III dataset, the e-ICU dataset, and a retrospective cohort from the ICU of Northern Jiangsu People's Hospital. Prospective data from the same ICU were used for validation and for comparison with urinary CCL14 biomarker measurements. AKI was defined based on serum creatinine and urine output, using the Kidney Disease: Improving Global Outcomes (KDIGO) criteria. Routine clinical data from the first 24 hours of ICU admission were collected, and eight ML algorithms were used to construct the prediction model. Multiple evaluation metrics, including the area under the receiver operating characteristic curve (AUC), were used to compare predictive performance. Feature importance was ranked using SHapley Additive exPlanations (SHAP), and the final model was explained accordingly. In addition, the model was developed into a web-based application using the Streamlit framework to facilitate its clinical use.
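The random 80/20 split and the AUC metric described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `split_80_20` and `auc` are hypothetical, the labels and scores are toy values, and the AUC is computed by brute-force pairwise comparison (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half) rather than with any particular library.

```python
import random


def split_80_20(records, seed=0):
    """Shuffle a copy of `records` and return (train, test) in an 80/20 ratio."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = (len(shuffled) * 4) // 5  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    train, test = split_80_20(range(100), seed=1)
    print(len(train), len(test))
    # Toy labels (1 = persistent SA-AKI) and toy predicted probabilities.
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise form is quadratic in the number of cases, so it suits small illustrations only; rank-based implementations give the same value more efficiently on cohorts of the size reported here.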
Results:
A total of 46,097 patients with sepsis from multiple cohorts were included in the analysis. Among the 17,928 patients with sepsis in the derivation cohort, 8,081 (45.1%) developed persistent SA-AKI. Among the eight ML models, the Gradient Boosting Machine (GBM) model demonstrated the best discriminative ability. Following feature importance ranking, a final interpretable GBM model comprising twelve features (AKI stage, Δcreatinine, urine output, furosemide dose, BMI, SOFA score, KRT, mechanical ventilation, lactate, BUN, PT, and age) was established. The final model accurately predicted the occurrence of persistent SA-AKI in both the internal validation cohort (AUC = 0.870) and the external validation cohorts (MIMIC-III subset: AUC = 0.891; e-ICU dataset: AUC = 0.932; Northern Jiangsu People's Hospital retrospective cohort: AUC = 0.983). In the prospective cohort, the GBM model outperformed urinary CCL14 in predicting persistent SA-AKI (GBM AUC = 0.852 vs. CCL14 AUC = 0.821). Additionally, the model was deployed as an online clinical tool to facilitate its use in clinical settings.
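To make the idea behind SHAP-based feature ranking concrete, the sketch below computes exact Shapley values for a single patient by enumerating feature subsets. Everything in it is an assumption for illustration: `toy_risk` is a hypothetical linear score over three of the twelve listed features, the patient and baseline values are invented, and brute-force enumeration stands in for the efficient algorithms that SHAP tooling applies to tree models such as GBM.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one instance; absent features take baseline values."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                coalition = set(s)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def toy_risk(z):
    """Hypothetical linear risk score; NOT the model from the study."""
    aki_stage, delta_creatinine, urine_output = z
    return 2.0 * aki_stage + 1.0 * delta_creatinine - 0.5 * urine_output


if __name__ == "__main__":
    names = ["AKI stage", "delta creatinine", "urine output"]
    patient = [3, 1.0, 0.4]    # invented values
    baseline = [1, 0.0, 1.0]   # invented reference values
    phi = shapley_values(toy_risk, patient, baseline)
    # Rank features by absolute contribution, largest first.
    for idx in sorted(range(len(phi)), key=lambda i: abs(phi[i]), reverse=True):
        print(names[idx], round(phi[idx], 3))
```

The contributions sum to the difference between the patient's score and the baseline score, which is the additive property that makes per-feature explanations of this kind interpretable.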
Conclusions:
The interpretable GBM model accurately predicted the occurrence of persistent SA-AKI, with good predictive performance in both internal and external validation cohorts. Furthermore, the model outperformed the biomarker CCL14 in the prospective validation cohort.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.