Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Jul 9, 2025
Date Accepted: Feb 17, 2026

The final, peer-reviewed published version of this preprint can be found here:

Machine Learning Prediction of Progression to Dialysis in Patients With Polycystic Kidney Disease: Population-Based Retrospective Cohort Study

Chang CH, Chen M, Tsai MH, Huang YC, Liou HH, Shia BC, Liang C, Fang YW

JMIR Med Inform 2026;14:e80343

DOI: 10.2196/80343

PMID: 41838891

Predicting the Risk of Progression to Dialysis in Patients with Polycystic Kidney Disease: A Population-based Machine Learning Study

  • Cheng-Hao Chang; 
  • Mingchih Chen; 
  • Ming-Hsien Tsai; 
  • Yen-Chun Huang; 
  • Hung-Hsiang Liou; 
  • Ben-Chang Shia; 
  • Chingying Liang; 
  • Yu-Wei Fang

ABSTRACT

Background:

Autosomal dominant polycystic kidney disease (ADPKD), characterized by progressive cyst growth and declining renal function, is the leading genetic cause of end-stage renal disease.

Objective:

To develop and validate machine learning models for predicting the risk of progression to dialysis in patients with ADPKD using a nationwide administrative database. Early identification of high-risk patients is critical for timely monitoring.

Methods:

This retrospective cohort study used data from Taiwan's National Health Insurance Research Database (2007–2018) to identify patients with newly diagnosed ADPKD. We employed six machine learning algorithms, including logistic regression, random forest, and eXtreme Gradient Boosting (XGBoost), to predict progression to dialysis. Models were developed using 10-fold cross-validation, with the Synthetic Minority Over-sampling Technique (SMOTE) applied only within the training folds to address class imbalance. An ensemble-based feature selection strategy was used to identify the most robust predictors and optimize final model performance. Final performance was evaluated on a held-out test set defined by a strict temporal split.
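The abstract does not describe the implementation, so the following is only a minimal sketch, in plain Python without third-party libraries, of the two safeguards named above: a temporal split by index date, and oversampling confined to the training portion of each cross-validation fold so that no synthetic record reaches a validation fold. The cutoff date, the toy records, the field names (`index_date`, `x`, `y`) and every helper function are hypothetical. The `smote` helper is a simplified re-implementation of the SMOTE interpolation idea, not the authors' code and not the `imbalanced-learn` API.

```python
import math
import random
from datetime import date


def temporal_split(records, cutoff):
    """Earlier index dates form the development set; later ones the test set."""
    train = [r for r in records if r["index_date"] < cutoff]
    test = [r for r in records if r["index_date"] >= cutoff]
    return train, test


def stratified_kfold(labels, k, seed=0):
    """Deal each class's shuffled indices round-robin into k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def smote(minority, n_new, k_neighbors=5, seed=0):
    """Simplified SMOTE: interpolate between a random minority sample and one
    of its k nearest minority-class neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = sorted((m for m in minority if m is not base),
                        key=lambda m: math.dist(base, m))
        neighbour = rng.choice(others[:k_neighbors])
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def oversample_training_fold(X, y, train_idx, seed=0):
    """Balance one training fold; the matching validation fold is untouched."""
    X_tr = [X[i] for i in train_idx]
    y_tr = [y[i] for i in train_idx]
    minority = [x for x, lab in zip(X_tr, y_tr) if lab == 1]
    n_new = y_tr.count(0) - len(minority)
    new = smote(minority, n_new, seed=seed) if n_new > 0 else []
    return X_tr + new, y_tr + [1] * len(new)


def demo(n=200, k=10, seed=42):
    """Toy cohort: hypothetical index dates 2007-2018, two features, ~14% events."""
    rng = random.Random(seed)
    records = []
    for i in range(n):
        label = 1 if i % 7 == 0 else 0
        records.append({
            "index_date": date(2007 + i % 12, 1 + i % 12, 1),
            "x": [rng.gauss(label, 1.0), rng.gauss(0.0, 1.0)],
            "y": label,
        })
    train, test = temporal_split(records, cutoff=date(2016, 1, 1))
    X = [r["x"] for r in train]
    y = [r["y"] for r in train]
    folds = stratified_kfold(y, k, seed=seed)
    fold_counts = []
    for f, val_idx in enumerate(folds):
        val_set = set(val_idx)
        train_idx = [i for i in range(len(y)) if i not in val_set]
        X_bal, y_bal = oversample_training_fold(X, y, train_idx, seed=seed + f)
        # A model would be fit on (X_bal, y_bal) and scored on the original
        # rows in val_idx; the temporal test set is used only once, at the end.
        fold_counts.append((y_bal.count(0), y_bal.count(1)))
    return train, test, folds, fold_counts


if __name__ == "__main__":
    train, test, folds, fold_counts = demo()
    print(len(train), len(test), fold_counts)
```

Oversampling the whole development set before splitting would place interpolated neighbours of validation patients into the training data, which tends to inflate cross-validated scores; generating synthetic records inside each training fold avoids that leakage.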

Results:

The study included 1,856 patients with ADPKD, of whom 302 (16.27%) progressed to dialysis. Multivariable Cox regression identified several significant risk factors, including age ≥66 years (hazard ratio [HR] 4.63, 95% CI 2.71-7.92; P<.001), anemia (HR 4.33, 95% CI 3.25-5.78; P<.001), congestive heart failure (CHF) (HR 1.81, 95% CI 1.29-2.54; P<.001), and acute kidney injury (AKI) (HR 1.69, 95% CI 1.19-2.41; P=.003). Among the machine learning models, the XGBoost model using an optimized set of 27 features achieved the highest predictive performance on the held-out temporal test set (accuracy 98.3%; AUC 0.955; F1 score 0.800; Brier score 0.022). The top predictors in the XGBoost model were largely related to age, comorbidity burden, anemia, and cardiovascular disease markers; medication use (e.g., anticoagulants, loop diuretics, febuxostat) was also among the most influential predictors. Importantly, medication-related predictors should be interpreted as proxies for disease complexity rather than direct risk modulators.
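For readers less familiar with the reported metrics, the sketch below shows how accuracy, F1 score, Brier score and AUC are computed from predicted probabilities. It is a plain-Python illustration on made-up values and assumes a 0.5 decision threshold (the threshold used in the study is not stated in the abstract). AUC is computed as the Mann-Whitney probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case; the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, F1, Brier score and ROC AUC for binary labels (1 = dialysis)."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    n = len(y_true)
    accuracy = (tp + tn) / n
    # F1 is undefined with no positives and no positive predictions; use 0.0 by convention
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    # Brier score: mean squared error of the predicted probabilities (lower is better)
    brier = sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / n
    # AUC as the Mann-Whitney statistic: P(score_pos > score_neg), ties count 1/2
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    auc = wins / (len(pos) * len(neg))
    return {"accuracy": accuracy, "f1": f1, "brier": brier, "auc": auc}


if __name__ == "__main__":
    y_true = [0, 0, 1, 1, 0, 1]
    y_prob = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]
    print(classification_metrics(y_true, y_prob))
```

With only 16.27% of the cohort progressing to dialysis, labeling every patient as non-progressing would already be about 83.7% accurate at that event rate, which is why threshold-free (AUC) and probability-based (Brier) measures are useful alongside accuracy.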

Conclusions:

This study demonstrates that machine learning models can predict dialysis risk in ADPKD patients using administrative data with temporal validation. This approach may support risk stratification by helping identify individuals at higher predicted risk who may warrant closer monitoring and further specialist evaluation.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.