Accepted for/Published in: Asian/Pacific Island Nursing Journal
Date Submitted: Apr 21, 2023
Open Peer Review Period: Apr 3, 2024 - May 29, 2024
Date Accepted: Apr 16, 2024
Random Forest Algorithm for Assessing Risk Factors Associated With Chronic Kidney Disease
ABSTRACT
Background:
Chronic kidney disease (CKD) is a chronic structural and functional disorder of the kidney with a variety of causes, and it is a major global health concern. Studies suggest that the mortality rate caused by CKD increased by an average of 3.4% per year from 1990 to 2015 and that the current global prevalence is 14.3%; the mortality rate of CKD is expected to be about 14 deaths per 100,000 by 2030. In addition, the economic burden of CKD represents 31.4% of the global annual burden of living with disability and is growing continuously at 1% per year. In China, the prevalence of CKD among people over 18 years old is 10.8%, corresponding to approximately 120 million patients, or about 1 in every 10 people. In Shanghai, the prevalence is even higher at 11.8%, or 1 in every 8-9 people, and only 12.5% of patients are aware of their disease.
Objective:
The aim of this study was to investigate the value of the Random Forest (RF) algorithm for assessing risk factors associated with CKD.
Methods:
A population of 40,686 individuals with CKD was identified from those who underwent screening between 1 January 2015 and 22 December 2020 in Jing'an District, Shanghai, China. We divided the individuals with CKD into those who required management and those who did not, based on glomerular filtration rate (GFR) staging and albuminuria grouping. Using a logistic regression (LR) model, we analyzed the relationship between CKD and risk factors. The RF algorithm, a machine learning method, was used to score the predictor variables, rank them by importance, and construct a prediction model.
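The split by GFR staging and albuminuria grouping can be illustrated with a minimal Python sketch. The category cut-offs follow the widely used KDIGO G1-G5 and A1-A3 definitions; the abstract does not state the exact rule the authors applied, so the `requires_management` threshold (risk level "high" or "very high") and all function names here are assumptions made for illustration only.

```python
def gfr_category(egfr: float) -> str:
    """Map eGFR (mL/min/1.73 m^2) to a KDIGO GFR category."""
    if egfr >= 90:
        return "G1"
    if egfr >= 60:
        return "G2"
    if egfr >= 45:
        return "G3a"
    if egfr >= 30:
        return "G3b"
    if egfr >= 15:
        return "G4"
    return "G5"


def albuminuria_category(acr_mg_per_g: float) -> str:
    """Map urinary albumin-to-creatinine ratio (mg/g) to a KDIGO category."""
    if acr_mg_per_g < 30:
        return "A1"
    if acr_mg_per_g <= 300:
        return "A2"
    return "A3"


# KDIGO-style risk grid: rows are GFR categories, columns are A1, A2, A3.
RISK_GRID = {
    "G1": ("low", "moderate", "high"),
    "G2": ("low", "moderate", "high"),
    "G3a": ("moderate", "high", "very high"),
    "G3b": ("high", "very high", "very high"),
    "G4": ("very high", "very high", "very high"),
    "G5": ("very high", "very high", "very high"),
}


def kdigo_risk(egfr: float, acr_mg_per_g: float) -> str:
    """Look up the risk level for one individual in the grid."""
    column = {"A1": 0, "A2": 1, "A3": 2}[albuminuria_category(acr_mg_per_g)]
    return RISK_GRID[gfr_category(egfr)][column]


def requires_management(egfr: float, acr_mg_per_g: float) -> bool:
    """ASSUMED rule (not taken from the study): manage high or very high risk."""
    return kdigo_risk(egfr, acr_mg_per_g) in {"high", "very high"}


if __name__ == "__main__":
    for egfr, acr in [(75, 10), (50, 150), (20, 10)]:
        print(egfr, acr, kdigo_risk(egfr, acr), requires_management(egfr, acr))
```

A different management threshold (for example, including the "moderate" level) would only require changing the set in `requires_management`.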
Results:
The LR model indicated that women had a lower risk of CKD than men; that the risk of CKD increased with age; that CKD risk was higher in individuals whose BMI exceeded the normal range; and that those with an abnormal estimated GFR (eGFR) index had a higher risk of CKD. Furthermore, those who were retired had a higher risk of CKD than others, and those with urban employees' medical insurance had a higher risk of CKD than those with other types of medical insurance. According to the RF model, the risk factors for CKD ranked in the following order of importance: age, albuminuria, occupation, urinary albumin-to-creatinine ratio, type of health insurance, eGFR index, urinary routine protein index, BMI, gender, history of hypertension, and blood creatinine index.
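How a forest-based importance ranking of this kind is produced can be sketched in plain Python. The example below is a deliberately simplified toy, not the authors' implementation: each "tree" is a single-split stump fitted on a bootstrap sample with a random feature subset, importance is the normalized mean decrease in Gini impurity, and the data are synthetic (only `age` drives the outcome by construction). All names and parameters are assumptions for illustration; in practice a library implementation such as scikit-learn's `RandomForestClassifier` and its `feature_importances_` attribute would normally be used.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels, features):
    """Return (feature, threshold, impurity decrease) of the best single split."""
    parent = gini(labels)
    n = len(rows)
    best = (None, None, 0.0)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            if parent - child > best[2]:
                best = (f, t, parent - child)
    return best


def forest_importance(rows, labels, n_features, n_trees=50, seed=0):
    """Normalized mean-decrease-in-impurity importance from bagged stumps."""
    rng = random.Random(seed)
    n = len(rows)
    k = max(1, int(n_features ** 0.5))  # features tried per stump
    totals = [0.0] * n_features
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        feats = rng.sample(range(n_features), k)  # random feature subset
        f, _, gain = best_split(sample_rows, sample_labels, feats)
        if f is not None:
            totals[f] += gain
    total = sum(totals)
    return [t / total for t in totals] if total else totals


# Synthetic cohort (assumption): the outcome depends on age only.
data_rng = random.Random(42)
names = ["age", "bmi", "sex"]
rows, labels = [], []
for _ in range(300):
    age = data_rng.randint(30, 89)
    bmi = data_rng.randint(18, 35)
    sex = data_rng.randint(0, 1)
    rows.append([age, bmi, sex])
    labels.append(1 if age + data_rng.gauss(0, 5) > 60 else 0)

importances = forest_importance(rows, labels, n_features=len(names))
ranking = sorted(zip(names, importances), key=lambda p: p[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because the synthetic outcome is generated from age alone, `age` should come out on top; with real screening data the ranking reflects whatever structure the data contain, and impurity-based importance is known to favor variables with many distinct values.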
Conclusions:
Our findings suggest that the RF algorithm has significant predictive value for assessing risk factors associated with CKD. Moreover, older age, abnormal urine biomarkers, and BMI were identified as primary risk factors for CKD. The RF algorithm has the benefits of high accuracy, stability, and ease of operation. Additionally, it helps avoid overfitting in classification and prediction.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.