JMIR Preprints #46891: Predicting the 5-year Risk of NAFLD based on Machine Learning Models: Prospective Cohort Study

Predicting the 5-year Risk of NAFLD based on Machine Learning Models: Prospective Cohort Study

Guoqing Huang;
Qiankai Jin;
Yushan Mao

ABSTRACT

Background:

Non-alcoholic fatty liver disease (NAFLD) has emerged as a worldwide public health issue. Identifying populations at high 5-year risk of NAFLD and intervening early can help reduce and delay adverse hepatic prognostic events.

Objective:

This study aimed to investigate the 5-year incidence of NAFLD in a Chinese population and to establish and validate a machine learning model for predicting 5-year NAFLD risk.

Methods:

The study population was derived from a 5-year prospective cohort study. A total of 6,196 individuals without NAFLD who underwent health checkups at Zhenhai Lianhua Hospital in Ningbo, China, in 2010 were enrolled in the study. Extreme gradient boosting (XGBoost)-recursive feature elimination, combined with the least absolute shrinkage and selection operator (LASSO), was used to screen characteristic predictors. Six machine learning models (logistic regression, decision trees, support vector machines, random forests, categorical boosting [CatBoost], and XGBoost) were used to construct a 5-year risk model for NAFLD. Hyperparameters of the predictive models were optimized in the training set, and model performance was further evaluated in the internal and external validation sets.
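The two-stage predictor screening described in the Methods can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, scikit-learn's `GradientBoostingClassifier` stands in for XGBoost, an L1-penalized logistic regression stands in for LASSO, and taking the intersection of the two selections is an assumption, since the abstract does not say how the two methods were combined. All hyperparameter values are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for checkup data (assumption: the cohort data are not used here).
X, y = make_classification(n_samples=600, n_features=20, n_informative=6,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Standardize using training-set statistics only, to avoid leaking validation data.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_valid_s = scaler.transform(X_valid)

# Stage 1: recursive feature elimination driven by a boosted-tree model
# (stand-in for XGBoost; the target of 12 features is arbitrary).
rfe = RFE(GradientBoostingClassifier(n_estimators=50, random_state=0),
          n_features_to_select=12, step=1)
rfe.fit(X_train_s, y_train)
rfe_mask = rfe.support_

# Stage 2: L1-penalized logistic regression; features with non-zero
# coefficients survive the LASSO-style shrinkage.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1, random_state=0)
lasso.fit(X_train_s, y_train)
lasso_mask = lasso.coef_.ravel() != 0

# Combine the two screens by intersection (assumption), falling back to the
# RFE selection if the intersection happens to be empty.
selected = np.flatnonzero(rfe_mask & lasso_mask)
if selected.size == 0:
    selected = np.flatnonzero(rfe_mask)

# Fit one of the six model families (logistic regression) on the screened
# predictors and score it on the held-out split.
model = LogisticRegression(max_iter=1000)
model.fit(X_train_s[:, selected], y_train)
auc = roc_auc_score(y_valid, model.predict_proba(X_valid_s[:, selected])[:, 1])
print(f"{selected.size} predictors kept, validation AUC = {auc:.3f}")
```

Fitting the scaler and both selectors on the training split only keeps the validation AUC an honest estimate; selecting features on the full data set would bias it upward.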

Results:

The 5-year incidence of NAFLD in the study population was 18.64%. Eleven predictors were selected for constructing the risk prediction model. After hyperparameter optimization, CatBoost showed the best prediction performance in the training set, with an area under the receiver operating characteristic curve (AUC) of 0.810 (0.768 - 0.852), whereas logistic regression showed the best prediction performance in the internal [AUC: 0.778 (0.759 - 0.794)] and external validation sets [AUC: 0.806 (0.788 - 0.821)]. The development of web-based calculators has enhanced the clinical feasibility of the risk prediction model.
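The abstract does not state how the ranges reported alongside each AUC were obtained. Assuming they are confidence intervals, one common way to produce such an interval is a percentile bootstrap, sketched below with the standard library only. The function names, the toy labels and scores, and the choice of 500 resamples are all hypothetical.

```python
import random


def auc_score(labels, scores):
    """AUC as the probability that a random positive case is scored above
    a random negative case (ties count as one half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC; resamples that contain
    only one class are redrawn."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        boot_labels = [labels[i] for i in idx]
        if 0 < sum(boot_labels) < n:
            stats.append(auc_score(boot_labels, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data (hypothetical): 1 = developed NAFLD within 5 years, 0 = did not.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90, 0.20, 0.60]

point = auc_score(labels, scores)
lower, upper = bootstrap_auc_ci(labels, scores)
print(f"AUC = {point:.3f} ({lower:.3f} - {upper:.3f})")
```

The pairwise AUC is quadratic in sample size, which is fine for a sketch; a rank-based formula would be preferable for a cohort of several thousand people.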

Conclusions:

The development and validation of predictive models facilitate the identification of populations at high 5-year risk of NAFLD, which could help delay and reduce the occurrence of adverse liver prognostic events.

Citation

Please cite as:

Huang G, Jin Q, Mao Y

Predicting the 5-Year Risk of Nonalcoholic Fatty Liver Disease Using Machine Learning Models: Prospective Cohort Study

J Med Internet Res 2023;25:e46891

DOI: 10.2196/46891

PMID: 37698911

PMCID: 10523217

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Mar 1, 2023

Date Accepted: Aug 16, 2023

Per the author's request the PDF is not available.