Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Jul 31, 2024
Date Accepted: Dec 25, 2024
Machine learning-based risk factor analysis and prediction model construction for the occurrence of chronic heart failure: a health ecologic research
ABSTRACT
Background:
Chronic heart failure is a serious threat to human health: its high morbidity and mortality impose a heavy burden on healthcare systems and society. The growing availability of medical data and the rapid development of machine learning provide new opportunities to investigate the mechanisms of chronic heart failure in depth and to construct predictive models. A health ecology approach allows risk factors for chronic heart failure to be examined comprehensively across environmental, social, and individual levels. This helps identify high-risk groups at an early stage and provides a scientific basis for precise prevention and intervention strategies.
Objective:
This study aims to use machine learning (ML) to construct a model that predicts the risk of chronic heart failure (CHF) onset and to analyze CHF risk factors from a health ecology perspective.
Methods:
This retrospective cohort study is based on the Jackson Heart Study. It included 2,553 patients who did not have heart failure at baseline, and the outcome measure was the occurrence of chronic heart failure during a 10-year follow-up period. Within the machine learning workflow, the data were first cleaned; chi-square tests and principal component analysis (PCA) were then used to select and interpret features. Finally, four models were constructed from the selected features: a decision tree model, a random forest model, an XGBoost model, and a stacked model.
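The chi-square screening step described above can be illustrated with a minimal sketch. It is not the authors' code: the function names `chi_square_statistic` and `rank_features`, the feature labels, and all counts are hypothetical, and the sketch assumes each candidate feature has already been cross-tabulated against the binary CHF outcome with no empty rows or columns.

```python
from typing import Dict, List, Sequence


def chi_square_statistic(table: Sequence[Sequence[float]]) -> float:
    """Pearson chi-square statistic for an r x c contingency table.

    Assumes every row and column total is positive, so that all
    expected counts are non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of feature and outcome.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def rank_features(tables: Dict[str, Sequence[Sequence[float]]]) -> List[str]:
    """Order feature names from largest to smallest chi-square statistic."""
    scores = {name: chi_square_statistic(t) for name, t in tables.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical 2 x 2 tables: rows = feature level, columns = (CHF, no CHF).
example_tables = {
    "feature_a": [[10, 20], [20, 10]],  # associated with the outcome
    "feature_b": [[10, 10], [10, 10]],  # independent of the outcome
}
ranking = rank_features(example_tables)
```

Ranking by the raw statistic is only comparable between tables of the same shape. When features have different numbers of categories, the p-value, which accounts for the degrees of freedom, is the more appropriate criterion.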
Results:
Feature selection yielded 20 risk factors: age, alcohol drinking, systolic blood pressure, glycosylated hemoglobin, high-sensitivity C-reactive protein, heart rate, insurance type, income, education, the proportion of the population living in poverty in the region, neighborhood problems, favorable food stores (3 mile kernel), sportindex, activeindex, usual medical institution visited, ever awakened by trouble breathing, ever had swelling of feet or ankles, marital status, ratio of mv_peake to ma_peaka, and history of cardiovascular diseases. The best-performing model was XGBoost, with an accuracy of 0.889, a sensitivity of 0.919, and an F1 score of 0.859.
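For readers unfamiliar with the reported metrics, the following minimal sketch shows how accuracy, sensitivity, and the F1 score are derived from a binary confusion matrix. The function name `classification_metrics` and the counts are hypothetical and are not taken from the study.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute accuracy, sensitivity, precision, and F1 from confusion-matrix counts.

    tp/fn count true CHF cases predicted positive/negative;
    fp/tn count non-cases predicted positive/negative.
    Assumes at least one true case and at least one positive prediction.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    sensitivity = tp / (tp + fn)  # also called recall
    precision = tp / (tp + fp)
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)
```

Sensitivity depends only on the true cases, whereas F1 also penalizes false positives through precision, so a model can have high sensitivity and a lower F1 score at the same time.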
Conclusions:
This study proposes an ML-based risk prediction model for the development of chronic heart failure that uses chi-square tests and PCA for feature selection and interprets the selected features in the context of health ecology. XGBoost outperformed the random forest (RF) and decision tree (DT) models and can predict disease onset accurately and rapidly, offering new ideas for clinical diagnosis and disease progression as well as an effective real-time risk assessment and intervention tool for patients with chronic heart failure.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.