Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Apr 11, 2024
Date Accepted: Oct 19, 2024
Development and Evaluation of a Multivariable Prediction Model for Mild Cognitive Impairment and Dementia: A Comparison of Machine Learning and Traditional Methodologies
ABSTRACT
Background:
Mild cognitive impairment (MCI) poses significant challenges in early diagnosis and timely intervention. Underdiagnosis, coupled with the economic and social burden of dementia, necessitates more precise detection methods. Machine learning (ML) algorithms show promise in managing complex data for MCI and dementia prediction.
Objective:
This study assessed the predictive accuracy of ML models versus traditional logistic regression in identifying the onset of MCI and dementia using the Korean Longitudinal Study of Aging (KLoSA) dataset.
Methods:
This study used 2018 to 2020 data from the KLoSA, a comprehensive biennial survey that tracks the demographic, health, and socioeconomic aspects of middle-aged and older Koreans. From the 6,171 initial households, 4,975 eligible participants aged 60 years or older were selected after exclusions for age and missing data. The identification of MCI and dementia relied on self-reported diagnoses, with sociodemographic and health-related variables serving as key covariates. The dataset was split into training and test sets, and MCI and dementia were predicted using models including logistic regression, light gradient-boosting machine (LightGBM), XGBoost, CatBoost, and Random Forest. Performance was assessed using the area under the receiver operating characteristic curve (AUC). Hyperparameter tuning was conducted to improve the models' predictive accuracy. Shapley additive explanation (SHAP) values were used to determine the contribution of each feature to the predictions.
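To make the evaluation metric concrete: the AUC equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The following is a minimal sketch of that definition on synthetic toy values (not KLoSA data); the function name `auc_score` is hypothetical and is not taken from the study's code.

```python
def auc_score(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # tie contributes half a win
    return wins / (len(pos) * len(neg))


# Synthetic example: 1 = condition present, 0 = absent (illustrative only).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc_score(toy_labels, toy_scores)
print(toy_auc)  # 0.75: three of the four positive/negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is the scale on which the model AUCs in this abstract are reported.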
Results:
Among the 4,975 participants, logistic regression excelled in predicting dementia onset, whereas XGBoost performed better for MCI (XGBoost AUC: 0.7271, Random Forest AUC: 0.6741). Educational attainment, assets, and daily activities were associated with MCI and dementia. Lower education (OR: 36.31, 95% CI: 3.71–355.84) was associated with increased MCI risk. Increased assets were associated with 75% lower odds of MCI (OR: 0.25, 95% CI: 0.08–0.82), and higher activities of daily living (ADL) scores significantly elevated it (OR: 21.39, 95% CI: 1.22–375.41). Females had 3.60 times the odds of dementia compared with males (OR: 3.60, 95% CI: 1.25–10.37). Age and instrumental activities of daily living (IADL) were associated with dementia (age: OR: 1.13, 95% CI: 1.05–1.21; IADL: OR: 1.26, 95% CI: 1.01–1.57). Alcohol consumption was associated with a lower likelihood of developing dementia (OR: 0.15, 95% CI: 0.03–0.76). The Shapley values highlighted pain in daily life and lower education levels as predictors of MCI and dementia, respectively.
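When reading the odds ratios above, note that an OR describes a multiplicative change in the odds, so the percentage change in odds is (OR − 1) × 100. The short sketch below applies this arithmetic to three ORs quoted in this abstract; the helper name `pct_change_in_odds` is hypothetical and used only for illustration.

```python
def pct_change_in_odds(odds_ratio):
    """Percentage change in odds implied by an odds ratio (negative = reduction)."""
    if odds_ratio <= 0:
        raise ValueError("an odds ratio must be positive")
    return (odds_ratio - 1.0) * 100.0


# Odds ratios as reported in the Results paragraph.
reported = {"assets (MCI)": 0.25, "age (dementia)": 1.13, "alcohol (dementia)": 0.15}
changes = {name: round(pct_change_in_odds(value), 1) for name, value in reported.items()}
for name, change in changes.items():
    print(f"{name}: {change:+.1f}% change in odds")
```

Under this reading, an OR of 0.25 corresponds to 75% lower odds and an OR of 1.13 to 13% higher odds; these are changes in odds, not in absolute risk.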
Conclusions:
ML algorithms, especially XGBoost, showed potential for predicting MCI onset using KLoSA data. However, no model demonstrated robust accuracy in predicting MCI and dementia. Sociodemographic and health-related factors play an important role in the onset of cognitive conditions, emphasizing the need for multifaceted predictive models for early identification and intervention. These findings underscore both the potential and the limitations of ML in predicting cognitive impairment in community-dwelling older adults.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.