Predicting Cognitive Decline in the Elderly Using Machine Learning: Insights from the Chinese Longitudinal Healthy Longevity Survey
ABSTRACT
Background:
Cognitive impairment, a hallmark of Alzheimer's disease and other forms of dementia, substantially reduces the quality of life of older adults and imposes considerable burdens on families and healthcare systems worldwide. A convenient and rapid method for identifying individuals at risk of cognitive impairment early is crucial for implementing timely interventions.
Objective:
The objective of this study was to explore the use of machine learning (ML) to integrate blood biomarkers, lifestyle behaviors, and disease history to predict cognitive decline.
Methods:
This study used data from the Chinese Longitudinal Healthy Longevity Survey (CLHLS). A total of 2,688 participants aged 65 years or older from the 2008–2009, 2011–2012, and 2014 CLHLS waves were included, and cognitive impairment was defined as a Mini-Mental State Examination (MMSE) score below 18. Participants with a baseline MMSE score below 18 were excluded, so that the outcome reflected new-onset impairment during follow-up. The dataset was divided into a training set (n = 1,331), an internal test set (n = 333), and a prospective validation set (n = 1,024). We developed ML models that integrate demographic information, health behaviors, disease history, and blood biomarkers to predict cognitive function at the three-year follow-up, specifically to identify individuals whose MMSE score would decline to below 18 by the end of the follow-up period. Model performance was evaluated using accuracy, sensitivity, and the area under the receiver operating characteristic curve (AUC).
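The cohort rule and the evaluation metrics described above can be made concrete with a short sketch. It assumes each participant is reduced to a baseline and a follow-up MMSE score; the `Participant` record, the function names, and the choice to drop participants without a follow-up score are hypothetical illustrations, not the authors' pipeline. AUC is computed through its pairwise-ranking interpretation, which is equivalent to the area under the ROC curve.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

MMSE_CUTOFF = 18  # impairment defined as MMSE < 18 (from the abstract)


@dataclass
class Participant:
    baseline_mmse: float
    # Hypothetical: None marks a participant with no three-year follow-up score.
    followup_mmse: Optional[float]


def build_cohort(people: Sequence[Participant]) -> List[Tuple[Participant, int]]:
    """Apply the baseline exclusion and attach a binary decline label."""
    cohort = []
    for p in people:
        if p.baseline_mmse < MMSE_CUTOFF:
            continue  # already impaired at baseline: excluded
        if p.followup_mmse is None:
            continue  # hypothetical rule: no outcome, so no label
        label = 1 if p.followup_mmse < MMSE_CUTOFF else 0
        cohort.append((p, label))
    return cohort


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True-positive rate: share of truly declining participants that are flagged."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(positive scored above negative); tied pairs count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    people = [Participant(25, 26), Participant(20, 15),
              Participant(17, 10), Participant(22, None)]
    print([label for _, label in build_cohort(people)])  # [0, 1]
```

In the toy usage, the third participant is excluded because of a baseline score below 18, and the fourth is dropped under the hypothetical missing-outcome rule. Accuracy and sensitivity need hard 0/1 predictions, whereas AUC uses the model's continuous risk scores, which is why a model can rank well on one metric and poorly on another.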
Results:
All ML models outperformed the MMSE alone. The Balanced Random Forest achieved the highest accuracy (88.5% in the internal test set and 88.7% in the prospective validation set), albeit with lower sensitivity, whereas Logistic Regression achieved the highest sensitivity. SHAP (SHapley Additive exPlanations) analysis identified instrumental activities of daily living (IADL), age, and baseline MMSE score as the most influential predictors of cognitive impairment.
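The idea behind a SHAP-based ranking of predictors can be illustrated with exact Shapley values for a model with only a few features. This is a minimal sketch under stated assumptions: "absent" features are replaced by a single baseline vector, and global importance is taken as the mean absolute Shapley value per feature. The functions and feature names are hypothetical; the shap library typically averages over a background dataset and uses faster algorithms such as TreeSHAP for tree ensembles rather than enumerating orderings.

```python
from itertools import permutations
from math import factorial
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def shapley_values(f: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x, with absent features set to baseline values.

    Enumerates every feature ordering, so it is only practical for a few features.
    """
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        z = list(baseline)
        prev = f(z)
        for j in order:
            z[j] = x[j]            # add feature j to the coalition
            cur = f(z)
            phi[j] += cur - prev   # marginal contribution of j in this ordering
            prev = cur
    n_orders = factorial(n)
    return [v / n_orders for v in phi]


def mean_abs_shap(f: Model, rows: Sequence[Sequence[float]],
                  baseline: Sequence[float]) -> List[float]:
    """Global importance: average |Shapley value| of each feature over rows."""
    totals = [0.0] * len(baseline)
    for x in rows:
        for j, v in enumerate(shapley_values(f, x, baseline)):
            totals[j] += abs(v)
    return [t / len(rows) for t in totals]


def rank_features(names: Sequence[str], importance: Sequence[float]) -> List[str]:
    """Feature names sorted from most to least influential (stable on ties)."""
    return [name for name, _ in sorted(zip(names, importance), key=lambda p: -p[1])]
```

For a toy linear model such as `f(z) = 2*z[0] - z[1] + 0.5*z[2] + 3` with a zero baseline, each Shapley value reduces to weight times feature value, and the values always sum to `f(x) - f(baseline)`. The weights here are arbitrary and carry no information about the study's fitted models.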
Conclusions:
Incorporating blood biomarkers together with demographic, lifestyle, and disease-history data into ML models offers a convenient, rapid, and accurate approach for the early identification of older adults at risk of cognitive impairment. This approach provides a valuable tool for healthcare professionals to facilitate timely interventions and underscores the importance of integrating diverse data types in predictive health models.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.