Currently accepted at: JMIR Medical Informatics

Date Submitted: Jul 9, 2025
Open Peer Review Period: Jul 21, 2025 - Sep 15, 2025
Date Accepted: May 16, 2026

This paper has been accepted and is currently in production.

It will appear shortly on 10.2196/80377

The final accepted version (not copyedited yet) is in this tab.

Quantifying the Predictive Power of Social Determinants of Health in Cardiovascular Disease and Type 2 Diabetes Progression Using XGBoost: A Retrospective Cohort Study

  • Hielke Muizelaar; 
  • Marcel Haas; 
  • Maarten van Aken; 
  • Rimke Vos; 
  • Marco Spruit

ABSTRACT

Background:

Cardiometabolic diseases such as type 2 diabetes (DM2) and cardiovascular disease (CVD) are influenced not only by biomedical risk factors but also by social determinants of health (SDOH). While the inclusion of SDOH in predictive models is increasingly advocated, few studies have quantified their specific contribution in a high-risk clinical cohort using robust statistical and machine learning approaches.

Objective:

This study aims to quantify the added predictive value of SDOH in predicting 5-year, 10-year and overall risk of cardiometabolic disease onset among individuals already at elevated risk, and to compare this added value across multiple modelling setups and frameworks.

Methods:

We used a large, linked dataset of 160,000 inclusion events from the ELAN data warehouse in the Netherlands, combining structured, coded diagnosis and medication records from general practitioners (GPs) with individual-level socioeconomic data from Statistics Netherlands. Individuals aged 30 years or older without prior DM2 or CVD were followed to assess disease progression. We trained Cox proportional hazards and XGBoost models to predict progression to DM2 or CVD within 5 years, within 10 years, and over the full follow-up period. All analyses were performed using the R programming language. Experiments included comparisons of SCORE2, Cox, and XGBoost models; evaluation of time-bound and survival-based formulations; and quantification of SDOH impact using XGBoost models trained on feature subsets and gain-based feature importance.
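As an illustration of the time-bound formulation and of the AUC used to compare models, the following is a minimal sketch in Python (the study itself used R; this is not the authors' code). The function names `horizon_label` and `auc` are hypothetical, and the handling of individuals censored before the horizon (excluded here by returning `None`) is an assumption, since the abstract does not describe how censoring was treated.

```python
from typing import Optional, Sequence


def horizon_label(time_years: float, event: bool, horizon: float) -> Optional[int]:
    """Binary label for 'event within `horizon` years'.

    Assumption (not from the study): individuals censored before the
    horizon have an unknown outcome and are returned as None.
    """
    if event and time_years <= horizon:
        return 1  # event observed within the horizon
    if time_years >= horizon:
        return 0  # known to be event-free at the horizon
    return None  # censored before the horizon: label unknown


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as 0.5. This pairwise form is O(n_pos * n_neg) and is
    meant for illustration on small inputs only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy follow-up records (made-up values): (years of follow-up, event observed)
toy_followup = [(3.0, True), (7.0, True), (2.0, False), (6.0, False)]
toy_labels_5y = [horizon_label(t, e, 5.0) for t, e in toy_followup]
```

A survival-based formulation would instead keep the `(time, event)` pairs as they are and model the hazard directly, which avoids discarding individuals censored before the horizon.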

Results:

For 10-year CVD prediction, the XGBoost binary model outperformed both Cox proportional hazards (AUC = 0.748 vs 0.731) and SCORE2 (AUC = 0.648; P < .001). In overall event prediction, XGBoost also achieved the highest AUC (0.731), significantly better than Cox (AUC = 0.697; P < .001). For 5-year prediction, the combined XGBoost model (medical + social features) reached an AUC of 0.734, significantly higher than both the medical-only model (AUC = 0.725; P < .001) and the social-only model (AUC = 0.679; P < .001). Income-related variables were among the top features in the combined model, with gains comparable to those of core biomedical predictors. Feature gain analysis showed that medical features contributed more overall (total gain = 0.6066), while social features, particularly income variables, added complementary value (total gain = 0.2649).
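The total gain per feature family reported above can be read as a sum of normalised per-feature gains. The sketch below shows one way such totals could be computed, assuming per-feature gain values are available as a plain mapping. It is written in Python for illustration (the study used R), and the function name `group_gain_shares`, the feature names, and all numbers are hypothetical toy values, not data from the study.

```python
from collections import defaultdict
from typing import Dict


def group_gain_shares(
    gain: Dict[str, float], group_of: Dict[str, str]
) -> Dict[str, float]:
    """Sum normalised gain per feature family.

    `gain` maps feature name -> raw gain; `group_of` maps feature name ->
    family label. Features without a family are collected under "other".
    """
    total = sum(gain.values())
    if total <= 0:
        raise ValueError("total gain must be positive")
    shares: Dict[str, float] = defaultdict(float)
    for feature, value in gain.items():
        shares[group_of.get(feature, "other")] += value / total
    return dict(shares)


# Hypothetical toy values, chosen only to make the arithmetic easy to follow.
toy_gain = {
    "systolic_bp": 30.0,
    "hba1c": 30.0,
    "household_income": 25.0,
    "age": 15.0,
}
toy_groups = {
    "systolic_bp": "medical",
    "hba1c": "medical",
    "household_income": "social",
}
toy_shares = group_gain_shares(toy_gain, toy_groups)
```

Because the shares are normalised, they sum to 1 across all families, so a family's total indicates its relative contribution to the model's splits rather than its effect on AUC.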

Conclusions:

This study quantifies the added value of SDOH in predicting cardiometabolic disease progression. Using linked medical and socioeconomic data, we show that while biomedical factors dominate, income-related SDOH significantly enhance predictive performance, highlighting their complementary role in personalised risk assessment and model development.

Clinical Trial:

Not applicable. This study did not involve a randomized controlled trial.


 Citation

Please cite as:

Muizelaar H, Haas M, van Aken M, Vos R, Spruit M

Quantifying the Predictive Power of Social Determinants of Health in Cardiovascular Disease and Type 2 Diabetes Progression Using XGBoost: A Retrospective Cohort Study

JMIR Medical Informatics. 16/05/2026:80377 (forthcoming/in press)

DOI: 10.2196/80377

URL: https://preprints.jmir.org/preprint/80377


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.