Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Jul 23, 2025
Date Accepted: May 16, 2026
Predicting End-Stage Renal Disease and Mortality in Chronic Kidney Disease Using Machine Learning: A Retrospective Cohort Study
ABSTRACT
Background:
Chronic kidney disease (CKD) is a global health burden characterized by heterogeneous progression trajectories. Without timely and appropriate management, CKD can lead to increased morbidity, mortality, and reduced quality of life. Therefore, early identification of patients at high risk of developing end-stage renal disease (ESRD) or mortality is essential to facilitate timely intervention and improve patient outcomes.
Objective:
This study aimed to develop and validate machine learning models to predict ESRD and all-cause mortality in patients with CKD.
Methods:
We developed and validated machine learning models using data from patients with CKD and an estimated glomerular filtration rate (eGFR) of <60 mL/min/1.73 m² who were treated at Taipei Veterans General Hospital between 2011 and 2021. Predictors included 69 routinely available demographic, clinical, medication, laboratory, and echocardiographic variables. The outcomes were ESRD and all-cause mortality. The cohort was randomly divided into training (80%) and testing (20%) sets. Evaluated models included XGBoost, LightGBM, CatBoost, Random Forest, and a Stacking Classifier. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC), the area under the precision–recall curve (AUPRC), calibration, and decision-curve analysis. Supplementary time-to-event analyses were performed using the Kidney Failure Risk Equation and survival-based machine learning models.
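To make the evaluation metrics named above concrete, the following is a minimal Python sketch of how AUROC, a step-wise AUPRC (average precision), and the net benefit used in decision-curve analysis can be computed from labels and predicted risks. It is an illustration only and not the authors' code: the function names, the toy labels and risks, and the 0.3 threshold are all assumptions made here. Net benefit at a threshold probability p_t is taken as TP/n − (FP/n) × p_t/(1 − p_t).

```python
from typing import Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPRC: mean precision at the rank of each positive.

    Assumes no tied scores; tied scores would need threshold-level grouping.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            total += true_pos / rank
    return total / n_pos


def net_benefit(y_true: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Net benefit of treating everyone whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Made-up labels and predicted risks, for illustration only (not study data).
toy_labels = [0, 0, 1, 0, 1, 1, 0, 0]
toy_risks = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70, 0.50, 0.05]

print("AUROC:", round(auroc(toy_labels, toy_risks), 3))
print("AUPRC:", round(average_precision(toy_labels, toy_risks), 3))
print("Net benefit at 0.3:", round(net_benefit(toy_labels, toy_risks, 0.3), 3))
```

In practice, library implementations such as those in scikit-learn would normally be used; the explicit loops here are only meant to show what each metric measures.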
Results:
A total of 29,677 patients were included in the study. The median age was 79 years, and 16,359 (55.1%) were male. Among these patients, 14,993 (50.5%) had hypertension, 7,908 (26.6%) had diabetes mellitus, and 1,768 (6.0%) had cancer. During follow-up, 649 patients (2.2%) developed ESRD, and 3,631 (12.2%) died. The models demonstrated high predictive performance for ESRD, with AUROCs ranging from 0.839 to 0.894. For all-cause mortality, the predictive performance was more modest, with AUROCs ranging from 0.752 to 0.774. Given the low incidence of ESRD in this cohort, model performance was additionally evaluated using precision–recall curves. The area under the precision–recall curve ranged from 0.172 to 0.216 for ESRD prediction and from 0.330 to 0.356 for all-cause mortality across models. Calibration and decision-curve analyses supported model reliability and clinical utility.
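One way to read the AUPRC values above is against the outcome prevalence, since a classifier that ranks patients at random has an expected AUPRC equal to the proportion of positives. The short Python sketch below works through that arithmetic using only the counts and AUPRC ranges reported in this abstract. It assumes the whole-cohort event rates approximate those in the 20% test set, which the abstract does not state, so the ratios are indicative rather than exact; the variable names are illustrative.

```python
# Counts and AUPRC ranges as reported in the abstract.
N_PATIENTS = 29_677
EVENTS = {"ESRD": 649, "all-cause mortality": 3_631}
AUPRC_RANGE = {"ESRD": (0.172, 0.216), "all-cause mortality": (0.330, 0.356)}

# Expected AUPRC of a random ranking equals the outcome prevalence.
# Assumption: whole-cohort prevalence stands in for test-set prevalence.
baseline = {name: count / N_PATIENTS for name, count in EVENTS.items()}

# Ratio of each reported AUPRC bound to the random-ranking baseline.
lift = {
    name: (low / baseline[name], high / baseline[name])
    for name, (low, high) in AUPRC_RANGE.items()
}

for name in EVENTS:
    low, high = lift[name]
    print(f"{name}: baseline {baseline[name]:.3f}, "
          f"AUPRC is {low:.1f}x to {high:.1f}x baseline")
```

Under this assumption, the ESRD models sit at roughly 8 to 10 times the random-ranking baseline and the mortality models at roughly 2.7 to 2.9 times, which explains why the absolute AUPRC is lower for ESRD even though its AUROC is higher.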
Conclusions:
Machine learning algorithms may serve as useful tools for risk stratification of ESRD and all-cause mortality in patients with CKD, with potential to support more individualized clinical management.
Keywords: Chronic kidney disease; End-stage renal disease; Machine learning; Mortality; SHapley Additive exPlanations (SHAP)
Clinical Trial: none
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.