Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Mar 1, 2023
Date Accepted: Aug 16, 2023

The final, peer-reviewed published version of this preprint can be found here:

Predicting the 5-Year Risk of Nonalcoholic Fatty Liver Disease Using Machine Learning Models: Prospective Cohort Study

Huang G, Jin Q, Mao Y

J Med Internet Res 2023;25:e46891

DOI: 10.2196/46891

PMID: 37698911

PMCID: 10523217

Predicting the 5-year Risk of NAFLD based on Machine Learning Models: Prospective Cohort Study

  • Guoqing Huang
  • Qiankai Jin
  • Yushan Mao

ABSTRACT

Background:

Non-alcoholic fatty liver disease (NAFLD) has emerged as a worldwide public health issue. Identifying populations at high 5-year risk of NAFLD and intervening early can help reduce and delay adverse hepatic prognostic events.

Objective:

This study aimed to investigate the 5-year incidence of NAFLD in a Chinese population and to establish and validate a machine learning model for predicting 5-year NAFLD risk.

Methods:

The study population was derived from a 5-year prospective cohort study. A total of 6,196 individuals without NAFLD who underwent health checkups at Zhenhai Lianhua Hospital in Ningbo, China, in 2010 were enrolled. Extreme gradient boosting (XGBoost) with recursive feature elimination, combined with the least absolute shrinkage and selection operator (LASSO), was used to screen characteristic predictors. Six machine learning models (logistic regression, decision trees, support vector machines, random forests, categorical boosting [CatBoost], and XGBoost) were used to construct a 5-year risk model for NAFLD. Hyperparameters of the predictive models were optimized in the training set, and model performance was further evaluated in the internal and external validation sets.
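To make the recursive feature elimination step concrete, here is a minimal sketch of the elimination loop only. It is not the authors' pipeline: a small hand-written logistic regression stands in for XGBoost as the ranking model, the LASSO step is not reproduced, and the data are synthetic. The function names (fit_logistic, recursive_feature_elimination) and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def fit_logistic(X, y, cols, epochs=200, lr=0.5):
    """Fit logistic regression on the chosen columns by batch gradient descent.

    Stand-in ranking model (assumption): the study used XGBoost here.
    Returns one weight per entry of `cols`; features are assumed standardized.
    """
    w = [0.0] * len(cols)
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(cols)
        grad_b = 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * row[c] for wj, c in zip(w, cols))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            err = 1.0 / (1.0 + math.exp(-z)) - label
            grad_b += err
            for j, c in enumerate(cols):
                grad_w[j] += err * row[c]
        b -= lr * grad_b / n
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
    return w


def recursive_feature_elimination(X, y, n_keep):
    """Refit the model repeatedly, dropping the weakest feature each round."""
    cols = list(range(len(X[0])))
    while len(cols) > n_keep:
        w = fit_logistic(X, y, cols)
        weakest = min(range(len(cols)), key=lambda j: abs(w[j]))
        cols.pop(weakest)
    return cols


# Synthetic data (not study data): columns 0 and 1 drive the label,
# columns 2-4 are pure noise.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(300)]
y = [1 if row[0] + row[1] > 0 else 0 for row in X]

selected = recursive_feature_elimination(X, y, n_keep=2)
print("retained feature indices:", selected)
```

The key design point is that the model is refit after every removal, so a feature's rank reflects its contribution given the features still in play rather than a single up-front ranking.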

Results:

The 5-year incidence of NAFLD was 18.64% in the study population. Eleven predictors were selected for constructing the risk prediction model. After hyperparameter optimization, CatBoost demonstrated the best prediction performance in the training set, with an area under the receiver operating characteristic curve (AUC) of 0.810 (0.768 - 0.852); logistic regression showed the best prediction performance in the internal validation set [AUC: 0.778 (0.759 - 0.794)] and the external validation set [AUC: 0.806 (0.788 - 0.821)]. Web-based calculators were developed to improve the clinical feasibility of the risk prediction model.
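The AUC values above are reported with intervals in parentheses. The abstract does not say how those intervals were computed; a percentile bootstrap is one common approach and is assumed in the sketch below. The function names (auc, bootstrap_auc_ci), the number of resamples, and the synthetic scores are illustrative assumptions, not taken from the study.

```python
import random


def auc(labels, scores):
    """AUC as the chance a random positive scores above a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=300, alpha=0.05, seed=0):
    """Point AUC plus a percentile-bootstrap interval (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auc(labels, scores), lower, upper


# Synthetic predictions (not study data): positives score higher on average.
rng = random.Random(1)
labels = [i % 2 for i in range(120)]
scores = [rng.gauss(1.0 if l else 0.0, 1.0) for l in labels]

point, lower, upper = bootstrap_auc_ci(labels, scores)
print(f"AUC {point:.3f} ({lower:.3f} - {upper:.3f})")
```

Resampling individuals with replacement and recomputing the AUC each time gives an empirical distribution whose 2.5th and 97.5th percentiles serve as the interval bounds.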

Conclusions:

The developed and validated predictive models make it possible to identify populations at high 5-year risk of NAFLD, which could help delay and reduce the occurrence of adverse liver prognostic events.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.