Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Mar 24, 2025
Open Peer Review Period: Apr 3, 2025 - May 29, 2025
Date Accepted: Jun 16, 2025

The final, peer-reviewed published version of this preprint can be found here:

Machine Learning–Based Analysis of Lifestyle Risk Factors for Atherosclerotic Cardiovascular Disease: Retrospective Case-Control Study

Kim HJ, Choi H, Ahn HJ, Shin SH, Kim C, Lee SH, Sohn JH, Lee JJ

JMIR Med Inform 2025;13:e74415

DOI: 10.2196/74415

PMID: 40773657

PMCID: 12330983

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Machine Learning-Based Analysis of Lifestyle Risk Factors in Atherosclerotic Cardiovascular Disease Risk

  • Hye-Jin Kim; 
  • Heeji Choi; 
  • Hyo-Jung Ahn; 
  • Seung-Ho Shin; 
  • Chulho Kim; 
  • Sang-Hwa Lee; 
  • Jong-Hee Sohn; 
  • Jae-Jun Lee

ABSTRACT

Background:

The risk of developing atherosclerotic cardiovascular disease (ASCVD) varies among individuals and is related to a variety of lifestyle factors in addition to the presence of chronic diseases.

Objective:

We aimed to assess the predictive accuracy of machine learning (ML) models that incorporate lifestyle risk behaviors in identifying high ASCVD risk, using a Korean nationwide database.

Methods:

Using data from the Korea National Health and Nutrition Examination Survey, we employed five ML algorithms to predict high ASCVD risk: logistic regression, support vector machine, random forest, extreme gradient boosting (XGB), and light gradient boosting (LGB). ASCVD risk was assessed using the Pooled Cohort Equations, with high risk defined as a 10-year risk of ≥7.5%. Among the 8,573 participants aged 40–79 years, propensity score matching (PSM) was used to adjust for demographic confounders. We divided the dataset into training and test sets in an 8:2 ratio. We also used bootstrapping in training the ML models and evaluated their performance with the area under the receiver operating characteristic curve (AUROC). Shapley additive explanations (SHAP) were used to identify the variables most important to the models in assessing high ASCVD risk. In a sensitivity analysis, we additionally performed binary logistic regression to check whether the ML models' results were consistent with those of a conventional statistical model.
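The Methods describe three data-preparation steps: labeling participants as high risk when their Pooled Cohort Equations (PCE) 10-year risk is at least 7.5%, 1:1 propensity score matching, and an 8:2 train/test split. The following is a minimal stdlib-only sketch of those steps under stated assumptions: propensity scores are taken as already estimated (for example, by a logistic regression of risk group on age and sex), and the `Participant` class, the greedy nearest-neighbour strategy, and the 0.05 caliper are hypothetical choices for illustration, not the study's reported procedure. The PCE themselves are not reproduced; `pce_risk` is assumed to be precomputed.

```python
import random
from dataclasses import dataclass

# A PCE 10-year risk at or above 7.5% is labelled "high ASCVD risk".
HIGH_RISK_THRESHOLD = 0.075


@dataclass
class Participant:
    pid: int
    pce_risk: float    # 10-year ASCVD risk from the Pooled Cohort Equations, as a fraction
    propensity: float  # hypothetical, precomputed propensity score (e.g. from age and sex)


def is_high_risk(p: Participant) -> bool:
    return p.pce_risk >= HIGH_RISK_THRESHOLD


def greedy_match(participants, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on propensity score, without replacement.

    Each high-risk case is paired with the closest unused low-risk control whose
    score lies within `caliper`; cases with no such control are dropped.
    """
    cases = [p for p in participants if is_high_risk(p)]
    controls = [p for p in participants if not is_high_risk(p)]
    used = set()
    pairs = []
    for case in sorted(cases, key=lambda p: p.propensity):
        best, best_dist = None, None
        for ctrl in controls:
            if ctrl.pid in used:
                continue
            dist = abs(ctrl.propensity - case.propensity)
            if dist <= caliper and (best_dist is None or dist < best_dist):
                best, best_dist = ctrl, dist
        if best is not None:
            used.add(best.pid)
            pairs.append((case, best))
    return pairs


def train_test_split(items, test_fraction=0.2, seed=42):
    """Shuffle and split into (train, test); test_fraction=0.2 gives an 8:2 ratio."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    cohort = [
        Participant(1, 0.10, 0.60),
        Participant(2, 0.02, 0.58),
        Participant(3, 0.09, 0.30),
        Participant(4, 0.03, 0.31),
        Participant(5, 0.01, 0.90),
    ]
    pairs = greedy_match(cohort)
    matched = [p for pair in pairs for p in pair]
    train, test = train_test_split(matched)
    print([(a.pid, b.pid) for a, b in pairs], len(train), len(test))  # prints [(3, 4), (1, 2)] 3 1
```

Greedy matching is order dependent (here cases are processed in ascending propensity order); optimal matching or matching with replacement are common alternatives, and the abstract does not say which variant was used.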

Results:

Of the 8,573 participants, 41.7% had high ASCVD risk. Before PSM, age and sex differed significantly between groups. PSM (1:1) yielded 1,976 participants with balanced demographics. After PSM, the high ASCVD risk group had higher alcohol and tobacco use, lower omega-3 intake, higher BMI, and less physical activity, and spent less time sitting. Among the 5 ML models, the XGB model showed high AUROC values, while the LGB model performed best in accuracy, recall, and F1 score. Variable importance analysis using Shapley additive explanations identified smoking and age as the strongest predictors, while BMI, sodium or omega-3 intake, and LDL cholesterol were also important variables. A sensitivity analysis using multivariable logistic regression confirmed these findings, showing that smoking, BMI, and LDL cholesterol were associated with higher ASCVD risk, whereas omega-3 intake and physical activity were associated with lower risk.
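The models are compared by AUROC, accuracy, recall, and F1 score, and bootstrapping is mentioned alongside AUROC. As a hedged illustration of what these metrics compute, here is a stdlib-only sketch. The 0/1 label coding, the 0.5 decision threshold in the usage example, the percentile bootstrap on a fixed test set, and all function names are assumptions; the abstract does not say exactly how the authors applied bootstrapping. In practice, scikit-learn's `roc_auc_score`, `accuracy_score`, `recall_score`, and `f1_score` compute the same quantities.

```python
import random


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded 1 = high risk, 0 = not."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy_recall_f1(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, recall, f1


def auroc(y_true, scores):
    """Rank-based AUROC: P(random positive outscores random negative), ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC requires both classes")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for AUROC on a fixed test set."""
    if len(set(y_true)) < 2:
        raise ValueError("AUROC requires both classes")
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:  # skip resamples that contain only one class
            stats.append(auroc(yb, [scores[i] for i in idx]))
    stats.sort()
    lo_idx = int(n_boot * alpha / 2)
    hi_idx = n_boot - 1 - lo_idx
    return stats[lo_idx], stats[hi_idx]


if __name__ == "__main__":
    y_true = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(accuracy_recall_f1(y_true, y_pred))  # (0.5, 0.5, 0.5)
    print(auroc(y_true, scores))  # 0.75
    print(bootstrap_auroc_ci(y_true, scores, n_boot=200))
```

AUROC is threshold free, whereas accuracy, recall, and F1 depend on the chosen cut-off, which is one reason a model can rank best on AUROC while another leads on the threshold-based metrics.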

Conclusions:

Analyzing lifestyle behavioral factors with ML improves the prediction of ASCVD risk compared with traditional models. Personalized prevention strategies tailored to an individual's lifestyle can effectively reduce ASCVD risk.

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.