Accepted for/Published in: JMIR Public Health and Surveillance
Date Submitted: Mar 7, 2023
Open Peer Review Period: Mar 7, 2023 - May 2, 2023
Date Accepted: Jul 25, 2023
Warning: This is an author submission that is not peer-reviewed or edited. Preprints, unless they show as "accepted", should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Combination use of machine learning and logistic regression improves carotid plaque risk prediction in individuals with fatty liver disease: a study involving 5.4 million health check-up adults
ABSTRACT
Background:
Carotid plaque can progress to cardiovascular events such as stroke and myocardial infarction, which are the leading causes of death globally. Evidence shows that the incidence of carotid plaque is significantly higher in patients with fatty liver disease. However, although fatty liver disease is frequently detected, carotid plaque is not yet routinely screened for in the asymptomatic population because of cost-effectiveness concerns.
Objective:
This study aimed to combine the advantages of machine learning and logistic regression to develop a straightforward prediction model that identifies individuals at risk of carotid plaque among people with fatty liver disease.
Methods:
A total of 5,420,640 participants with fatty liver from Meinian Healthcare Center were included in the study. Three machine learning algorithms (random forest, elastic net, and XGBoost) were used to select important features from the potential predictors. The features selected by all three models were then entered into a logistic regression analysis to develop a carotid plaque prediction model for the population with fatty liver. Model performance was evaluated with the area under the receiver operating characteristic curve, calibration curves, the Brier score, and decision curve analysis. This evaluation was carried out in both a randomly split internal validation dataset and an external validation dataset from MJ Health Check-up Center. Risk cutoff points for carotid plaque risk assessment were determined in the large development dataset and verified in the external validation dataset.
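Two steps in the Methods are easy to state in code. The first keeps only the predictors chosen by every feature-selection model. The second scores a fitted model's predicted probabilities on a validation set.

The following is a minimal pure-Python sketch of those steps, not the authors' implementation. It rests on several assumptions:

- Each algorithm's selection has already been reduced to a set of feature names. How the random forest, elastic net, and XGBoost results were thresholded is not stated here.
- The function names, feature names, and toy data are all hypothetical.

The metrics are computed as follows:

- AUC is the concordance probability over positive/negative pairs.
- The Brier score is the mean squared error of the predicted probabilities.
- Decision-curve net benefit at threshold p_t is TP/n − (FP/n)·p_t/(1 − p_t).

```python
from typing import Dict, List, Sequence, Set


def consensus_features(candidates: Sequence[str],
                       selections: Dict[str, Set[str]]) -> List[str]:
    """Return the candidates chosen by every model, preserving candidate order."""
    common = set(candidates)
    for chosen in selections.values():
        common &= chosen  # keep only features this model also selected
    return [name for name in candidates if name in common]


def auc(y_true: Sequence[int], prob: Sequence[float]) -> float:
    """ROC AUC: chance a random positive scores above a random negative (ties = 0.5).

    Pairwise O(n_pos * n_neg) loop, for illustration only; a rank-based
    formula is needed at the scale of millions of records.
    """
    pos = [p for y, p in zip(y_true, prob) if y == 1]
    neg = [p for y, p in zip(y_true, prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, prob)) / len(y_true)


def net_benefit(y_true: Sequence[int], prob: Sequence[float], threshold: float) -> float:
    """Decision-curve net benefit when everyone with prob >= threshold is flagged."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


if __name__ == "__main__":
    # Hypothetical toy inputs, not data from the study.
    candidates = ["age", "sbp", "ldl_c", "bmi", "alt"]
    selections = {
        "random_forest": {"age", "sbp", "ldl_c", "bmi"},
        "elastic_net": {"age", "sbp", "ldl_c", "alt"},
        "xgboost": {"age", "ldl_c", "sbp", "bmi"},
    }
    print(consensus_features(candidates, selections))  # ['age', 'sbp', 'ldl_c']

    y = [0, 0, 1, 1]
    prob = [0.10, 0.40, 0.35, 0.80]
    print(auc(y, prob))  # 0.75
    print(brier_score(y, prob))
    print(net_benefit(y, prob, threshold=0.30))
```

Intersecting the sets means that a single model's idiosyncratic choice cannot bring a predictor into the final logistic regression. This is why the resulting model stays small and interpretable.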
Results:
Among the participants, 1,421,970 (26.23%) were diagnosed with carotid plaque. Out of 27 candidate predictors, six features were selected by all three machine learning models: age, systolic blood pressure, low-density lipoprotein cholesterol, total cholesterol, fasting blood glucose, and hepatic steatosis index. The logistic regression model built on these six predictors achieved an area under the curve (AUC) of 0.831 in the internal validation dataset and 0.801 in the external validation dataset, and it showed good calibration on the calibration plots. Across the evaluation metrics, its predictive performance was competitive with that of logistic regression or the machine learning algorithms used alone. Predicted probabilities of 25% and 65% were determined as the cutoff points separating low, intermediate, and high risk of carotid plaque.
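The cutoff points turn a predicted probability into one of three risk tiers. The following is a minimal sketch under stated assumptions:

- The abstract does not report the fitted intercept or coefficients. The hypothetical `predicted_probability` helper therefore takes them as inputs rather than supplying values.
- The abstract does not say which tier a probability of exactly 25% or 65% belongs to. The hypothetical `risk_tier` helper assumes lower bounds are inclusive: p < 0.25 is low, 0.25 ≤ p < 0.65 is intermediate, and p ≥ 0.65 is high.

```python
import math
from typing import Dict


def predicted_probability(intercept: float,
                          coefficients: Dict[str, float],
                          values: Dict[str, float]) -> float:
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(beta * values[name] for name, beta in coefficients.items())
    # Two algebraically equal forms, chosen so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def risk_tier(probability: float, low_cut: float = 0.25, high_cut: float = 0.65) -> str:
    """Map a predicted carotid plaque probability to a tier.

    Assumed boundary handling: p < low_cut -> "low",
    low_cut <= p < high_cut -> "intermediate", p >= high_cut -> "high".
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability < low_cut:
        return "low"
    if probability < high_cut:
        return "intermediate"
    return "high"


if __name__ == "__main__":
    print([risk_tier(p) for p in (0.12, 0.40, 0.70)])  # ['low', 'intermediate', 'high']
```

Keeping the cutoffs as parameters, rather than hard-coding them, makes it easy to check how tier assignment would shift if different thresholds were validated in another population.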
Conclusions:
The combination of machine learning and logistic regression outperformed either approach used alone in establishing a straightforward and practical carotid plaque prediction model. It is of great value for the early identification and risk assessment of carotid plaque in the population with fatty liver.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.