Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Mar 1, 2023
Date Accepted: Aug 16, 2023
Predicting the 5-year Risk of NAFLD Based on Machine Learning Models: Prospective Cohort Study
ABSTRACT
Background:
Non-alcoholic fatty liver disease (NAFLD) has emerged as a worldwide public health issue. Identifying populations at high risk of developing NAFLD within 5 years, and intervening to prevent it, can help reduce and delay adverse hepatic prognostic events.
Objective:
This study aimed to investigate the 5-year incidence of NAFLD in a Chinese population and to develop and validate a machine learning model for predicting 5-year NAFLD risk.
Methods:
The study population was derived from a 5-year prospective cohort study. A total of 6,196 individuals without NAFLD who underwent health checkups at Zhenhai Lianhua Hospital in Ningbo, China, in 2010 were enrolled. Extreme gradient boosting (XGBoost)-based recursive feature elimination, combined with the least absolute shrinkage and selection operator (LASSO), was used to screen characteristic predictors. Six machine learning models, namely logistic regression, decision tree, support vector machine, random forest, categorical boosting (CatBoost), and XGBoost, were used to construct a 5-year NAFLD risk model. Hyperparameters were optimized in the training set, and model performance was further evaluated in the internal and external validation sets.
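The two-stage predictor screening described in the Methods can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: scikit-learn's `GradientBoostingClassifier` stands in for XGBoost inside recursive feature elimination (an `xgboost.XGBClassifier` also exposes `feature_importances_` and could be swapped in), LASSO is implemented as L1-penalized logistic regression because the outcome is binary, the two selections are combined by intersection (the abstract does not state the combination rule), and the data are synthetic, with roughly 19% positives only to echo the reported incidence. The helper `select_predictors` and its parameters are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler


def select_predictors(X, y, n_rfe=15, random_state=0):
    """Hypothetical helper: keep features chosen by both tree-based RFE and LASSO.

    Assumptions (not from the source): gradient boosting replaces XGBoost,
    LASSO means L1-penalized logistic regression, and the two sets are intersected.
    """
    # Stage 1: recursive feature elimination, dropping the least important
    # feature (by tree-ensemble importance) one at a time until n_rfe remain.
    rfe = RFE(GradientBoostingClassifier(random_state=random_state),
              n_features_to_select=n_rfe, step=1)
    rfe.fit(X, y)
    rfe_keep = set(np.flatnonzero(rfe.support_).tolist())

    # Stage 2: L1-penalized logistic regression with the penalty strength chosen
    # by 5-fold cross-validated AUC; features with nonzero coefficients survive.
    # Standardize first because the L1 penalty is scale-sensitive.
    Xs = StandardScaler().fit_transform(X)
    lasso = LogisticRegressionCV(Cs=10, cv=5, penalty="l1", solver="liblinear",
                                 scoring="roc_auc", random_state=random_state)
    lasso.fit(Xs, y)
    lasso_keep = set(np.flatnonzero(lasso.coef_.ravel() != 0).tolist())

    # Stage 3: combine by intersection (one possible rule among several).
    return sorted(rfe_keep & lasso_keep)


# Synthetic stand-in data: 25 candidate predictors, about 19% positive outcomes.
X, y = make_classification(n_samples=600, n_features=25, n_informative=8,
                           n_redundant=4, weights=[0.81], random_state=0)
selected = select_predictors(X, y)
print("selected feature indices:", selected)
```

Intersection is the conservative choice: a predictor survives only if both a nonlinear, importance-based ranking and a sparse linear model agree on it. A union or a sequential scheme (LASSO applied to the RFE survivors) would be equally plausible readings of "combined with".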
Results:
The 5-year incidence of NAFLD was 18.64% in the study population. Eleven predictors were selected for constructing the risk prediction model. After hyperparameter optimization, CatBoost showed the best predictive performance in the training set, with an area under the receiver operating characteristic curve (AUC) of 0.810 (0.768-0.852); logistic regression showed the best predictive performance in the internal (AUC 0.778, 0.759-0.794) and external (AUC 0.806, 0.788-0.821) validation sets. Web-based calculators were developed to improve the clinical feasibility of the risk prediction model.
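The AUCs above are reported with intervals, but the abstract does not say how those intervals were obtained. Assuming a percentile bootstrap over the evaluation set (an assumption the source does not confirm), the computation could look like the dependency-free sketch below. `auc` and `bootstrap_auc_ci` are hypothetical helper names, and the toy labels and scores are invented for illustration.

```python
import random


def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. Pairwise O(n_pos * n_neg); fine for a sketch,
    but a rank-based formula is preferable for large validation sets.
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus a percentile-bootstrap interval (assumed method, not the source's)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        # Resample subjects with replacement; skip resamples with only one class.
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:
            stats.append(auc(yb, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auc(y_true, scores), lo, hi


# Toy example: 2 of 4 subjects develop the outcome.
y_val = [0, 0, 1, 1]
scores_val = [0.1, 0.4, 0.35, 0.8]
print(auc(y_val, scores_val))  # 0.75: 3 of the 4 positive/negative pairs are ranked correctly
print(bootstrap_auc_ci(y_val, scores_val, n_boot=200))
```

Comparing models on the same validation set with this approach would reuse the same bootstrap indices for each model so that the intervals reflect the same resampled populations.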
Conclusions:
The developed and validated predictive models can help identify populations at high 5-year risk of NAFLD, which could help delay and reduce the occurrence of adverse liver prognostic events.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.