JMIR Preprints #80343: Predicting the Risk of Progression to Dialysis in Patients with Polycystic Kidney Disease: A Population-based Machine Learning Study


Predicting the Risk of Progression to Dialysis in Patients with Polycystic Kidney Disease: A Population-based Machine Learning Study

Cheng-Hao Chang;
Mingchih Chen;
Ming-Hsien Tsai;
Yen-Chun Huang;
Hung-Hsiang Liou;
Ben-Chang Shia;
Chingying Liang;
Yu-Wei Fang

ABSTRACT

Background:

Autosomal dominant polycystic kidney disease (ADPKD), characterized by progressive cyst growth and renal decline, is the leading genetic cause of end‐stage renal disease.

Objective:

To develop and validate machine learning models for predicting the risk of progression to dialysis in patients with ADPKD using a nationwide administrative database. Early identification of high-risk patients is critical for timely monitoring.

Methods:

This retrospective cohort study utilized data from Taiwan's National Health Insurance Research Database (2007–2018) to identify newly diagnosed ADPKD patients. We employed six machine learning algorithms, including Logistic Regression, Random Forest, and eXtreme Gradient Boosting (XGBoost), to predict progression to dialysis. Models were developed using 10-fold cross-validation, with Synthetic Minority Over-sampling Technique applied within training folds to address class imbalance. An ensemble-based feature selection strategy was implemented to identify the most robust predictors and optimize final model performance. Model evaluation was conducted using a strict temporal split.
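The point of applying oversampling "within training folds" is that synthetic minority samples never leak into the data used for validation. The sketch below illustrates that idea with a simplified, standard-library-only SMOTE-style interpolation and an unstratified 10-fold split. The function names (`smote`, `kfold_indices`, `folds_with_smote`), the neighbour count `k=5`, and the choice to balance classes fully are illustrative assumptions, not details taken from the study.

```python
import random

def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [m for m in minority if m is not x]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from x to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic

def kfold_indices(n, n_folds, rng):
    """Shuffle indices 0..n-1 and deal them into n_folds validation folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]

def folds_with_smote(X, y, n_folds=10, seed=0):
    """Yield (X_train, y_train, X_val, y_val) per fold.
    Only the training part is oversampled; validation data stay untouched."""
    rng = random.Random(seed)
    for val_idx in kfold_indices(len(X), n_folds, rng):
        val = set(val_idx)
        X_tr = [X[i] for i in range(len(X)) if i not in val]
        y_tr = [y[i] for i in range(len(X)) if i not in val]
        minority = [x for x, label in zip(X_tr, y_tr) if label == 1]
        # Assumption: oversample until both classes are the same size.
        n_new = (len(y_tr) - len(minority)) - len(minority)
        if len(minority) >= 2 and n_new > 0:
            new = smote(minority, n_new, rng=rng)
            X_tr, y_tr = X_tr + new, y_tr + [1] * len(new)
        yield X_tr, y_tr, [X[i] for i in val_idx], [y[i] for i in val_idx]
```

Each yielded training set can be passed to any classifier, and the matching validation set contains only original, unmodified records.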

Results:

The study included 1,856 patients with ADPKD, of whom 302 (16.27%) progressed to dialysis. A multivariable Cox regression identified several significant risk factors, including age ≥66 years (Hazard Ratio [HR] 4.63, 95% CI 2.71-7.92; P<.001), anemia (HR 4.33, 95% CI 3.25-5.78; P<.001), congestive heart failure (CHF) (HR 1.81, 95% CI 1.29-2.54; P<.001), and acute kidney injury (AKI) (HR 1.69, 95% CI 1.19-2.41; P=.003). Among the machine learning models developed, the XGBoost model, using an optimized set of 27 features, demonstrated the highest predictive performance on the held-out temporal test set (accuracy 98.3%; AUC 0.955; F1 score 0.800; Brier score 0.022). The most influential predictors in the XGBoost model included age, comorbidity burden, anemia, cardiovascular disease markers, and medication use (e.g., anticoagulants, loop diuretics, febuxostat). Importantly, medication-related predictors should be interpreted as proxies for disease complexity rather than direct risk modulators.
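The four reported test-set metrics measure different things: accuracy and F1 depend on a classification threshold, AUC measures ranking quality, and the Brier score measures how close predicted probabilities are to observed outcomes. A minimal standard-library sketch of how each can be computed from true labels and predicted probabilities follows. The function names, the 0.5 threshold, and the toy data are assumptions for illustration and are not drawn from the study.

```python
def _predict(y_prob, threshold):
    """Convert probabilities to 0/1 predictions at the given threshold."""
    return [int(p >= threshold) for p in y_prob]

def accuracy(y_true, y_prob, threshold=0.5):
    preds = _predict(y_prob, threshold)
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)

def f1_score(y_true, y_prob, threshold=0.5):
    preds = _predict(y_prob, threshold)
    tp = sum(p == 1 and t == 1 for p, t in zip(preds, y_true))
    fp = sum(p == 1 and t == 0 for p, t in zip(preds, y_true))
    fn = sum(p == 0 and t == 1 for p, t in zip(preds, y_true))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for p, t in zip(y_prob, y_true)) / len(y_true)

def roc_auc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative
    (ties count as half). Requires at least one positive and one negative."""
    pos = [p for p, t in zip(y_prob, y_true) if t == 1]
    neg = [p for p, t in zip(y_prob, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy example (hypothetical labels and probabilities).
y_true = [0, 0, 0, 0, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.6, 0.4, 0.9]
print(accuracy(y_true, y_prob), roc_auc(y_true, y_prob),
      f1_score(y_true, y_prob), brier_score(y_true, y_prob))
```

With an event rate near 16%, a model that always predicts "no dialysis" would already reach roughly 84% accuracy, which is why accuracy is usually read alongside AUC, F1, and the Brier score.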

Conclusions:

This study demonstrates that machine learning models can predict dialysis risk in ADPKD patients using administrative data with temporal validation. This approach may support risk stratification by helping identify individuals at higher predicted risk who may warrant closer monitoring and further specialist evaluation.

Citation

Please cite as:

Chang CH, Chen M, Tsai MH, Huang YC, Liou HH, Shia BC, Liang C, Fang YW

Machine Learning Prediction of Progression to Dialysis in Patients With Polycystic Kidney Disease: Population-Based Retrospective Cohort Study

JMIR Med Inform 2026;14:e80343

DOI: 10.2196/80343

PMID: 41838891

PMCID: 12991194

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Jul 9, 2025

Date Accepted: Feb 17, 2026
