Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Sep 15, 2023
Date Accepted: Sep 17, 2024
Machine learning-based prediction for incident hypertension based on regular health checkup data: derivation and validation in two independent nationwide cohorts in South Korea and Japan
ABSTRACT
Background:
Cardiovascular diseases (CVDs) are the leading cause of death worldwide, and hypertension is a key contributor. In 2019, CVDs caused 17.9 million deaths, a figure projected to reach 23 million by 2030.
Objective:
This study presents a new method for predicting hypertension from demographic and health checkup data, using six machine learning models to improve reliability and applicability. The goal is to harness artificial intelligence (AI) for early and accurate prediction of hypertension across diverse populations.
Methods:
We used data from two nationwide cohorts. The NHIS-NSC cohort (South Korea, n=244,814; 2002-2013) was used to train and test machine learning models that predict incident hypertension within five years of a health checkup in adults aged ≥20 years, and the JMDC cohort (Japan, n=1,296,649) was used for external validation. An ensemble drawn from six diverse machine learning models was used, and a feature importance analysis identified the five features contributing most to hypertension and confirmed the contribution of each feature.
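As a rough illustration of the kind of pipeline described above, the following is a minimal sketch, not the authors' implementation, of an ensemble that combines AdaBoost and logistic regression (the pair reported in the Results) and then ranks features by importance. It assumes scikit-learn, replaces the NHIS-NSC cohort with synthetic data from make_classification, and assumes soft voting (averaging predicted probabilities) as the combination rule. The feature names, hyperparameters, and the use of permutation importance are hypothetical choices made for the sketch.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, VotingClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical generic column names for SYNTHETIC data; on real checkup data these
# would be variables such as age, blood pressure, BMI, and fasting glucose.
FEATURES = [f"feature_{i}" for i in range(8)]


def build_ensemble(seed=0):
    """Soft-voting ensemble of AdaBoost and logistic regression (assumed combination rule)."""
    # Logistic regression is scaled first; AdaBoost on trees does not need scaling.
    logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    ada = AdaBoostClassifier(n_estimators=100, random_state=seed)
    return VotingClassifier(estimators=[("ada", ada), ("logreg", logreg)], voting="soft")


def run_demo(seed=0, top_k=5):
    # Synthetic, imbalanced binary outcome standing in for "incident hypertension".
    X, y = make_classification(
        n_samples=2000, n_features=len(FEATURES), n_informative=5,
        weights=[0.8, 0.2], random_state=seed,
    )
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    model = build_ensemble(seed).fit(X_tr, y_tr)

    # Evaluate on the held-out split with the two metrics named in the abstract.
    prob = model.predict_proba(X_te)[:, 1]
    metrics = {
        "balanced_accuracy": balanced_accuracy_score(y_te, model.predict(X_te)),
        "auroc": roc_auc_score(y_te, prob),
    }

    # Permutation importance: drop in AUROC when each column is shuffled.
    imp = permutation_importance(
        model, X_te, y_te, scoring="roc_auc", n_repeats=5, random_state=seed
    )
    order = np.argsort(imp.importances_mean)[::-1][:top_k]
    top = [FEATURES[i] for i in order]
    return metrics, top


if __name__ == "__main__":
    metrics, top = run_demo()
    print(metrics)
    print("Top features:", top)
```

The importance ranking is computed on the held-out split so that it reflects how much each variable helps the fitted ensemble generalize, rather than how well it fits the training data; on synthetic data the ranked names carry no clinical meaning.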
Results:
The AdaBoost and logistic regression ensemble achieved the highest balanced accuracy (0.812; sensitivity, 0.806; specificity, 0.818; area under the receiver operating characteristic curve [AUROC], 0.901). The five key predictors of hypertension were age, diastolic blood pressure, body mass index, systolic blood pressure, and fasting blood glucose. The JMDC dataset (external validation set) corroborated these findings (balanced accuracy, 0.741; AUROC, 0.824). The ensemble model has been integrated into a public web portal (http://ai-wm.khu.ac.kr/Hypertension/) that predicts hypertension onset from health checkup data.
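Balanced accuracy is the mean of sensitivity and specificity, which makes it less affected than plain accuracy by the imbalance between incident cases and non-cases. Assuming this standard definition, a short check reproduces the reported 0.812 from the reported sensitivity and specificity. The helper names and the confusion-matrix counts below are hypothetical, chosen only to illustrate the calculation.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of sensitivity (true positive rate) and specificity (true negative rate)."""
    return (sensitivity + specificity) / 2


def balanced_accuracy_from_counts(tp: int, fn: int, tn: int, fp: int) -> float:
    """Same metric computed from hypothetical confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return balanced_accuracy(sensitivity, specificity)


# Sensitivity and specificity reported for the AdaBoost + logistic regression ensemble.
print(round(balanced_accuracy(0.806, 0.818), 3))  # prints 0.812
```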
Conclusions:
Comparing our machine learning models with classical statistical models across two independent cohorts showed that the machine learning models offered greater stability, generalizability, and reproducibility in predicting hypertension onset.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.