Identifying Frailty in Older Adults Receiving Home Care Assessment Using Machine Learning: Examining the Role of Classifier, Feature Selection, and Sample Size
ABSTRACT
Background:
Machine learning techniques are increasingly applied to healthcare datasets to identify frail persons who may benefit from interventions. However, evidence on how machine learning techniques perform in comparison to conventional regression approaches is mixed. It is also unclear which methodological and database factors are associated with their performance.
Objective:
We aimed to compare the mortality prediction accuracy of various machine learning classifiers for identifying frail older adults across different scenarios.
Methods:
We used de-identified data collected from older adults (aged 65+) who were assessed with the interRAI-Home Care (interRAI-HC) instrument in New Zealand between January 1, 2012, and December 31, 2016. A total of 138 interRAI assessment items were used to predict 6-month and 12-month mortality, using three machine learning classifiers (random forest [RF], extreme gradient boosting [XGBoost], and multilayer perceptron [MLP]) and regularized logistic regression. We conducted a simulation study to compare the performance of the machine learning models with that of logistic regression and of the interRAI Home Care Frailty Scale. The effects of sample size, number of features, and train-test split ratio were examined.
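The scenario design described above can be sketched as a grid over sample size, number of features, and train-test split ratio. This is a minimal illustration, not the authors' code: the grid levels, the function names, and the use of random feature subsampling are assumptions made for the example, since the abstract does not list the levels examined or the feature selection method. Only the 138-item feature set and the cohort size are taken from the text.

```python
import itertools
import random

# Hypothetical grid levels; the abstract does not state the levels actually used.
SAMPLE_SIZES = [1_000, 4_000, 16_000]
FEATURE_COUNTS = [20, 80, 138]  # 138 is the full interRAI item set named in the Methods
TEST_FRACTIONS = [0.2, 0.3]


def make_scenarios():
    """Enumerate every (sample size, feature count, test fraction) combination."""
    return list(itertools.product(SAMPLE_SIZES, FEATURE_COUNTS, TEST_FRACTIONS))


def draw_scenario(n_records, n_features, scenario, seed=0):
    """Pick row and column indices for one scenario.

    Returns (train_rows, test_rows, feature_columns). Features are chosen at
    random here as a stand-in for whatever selection method a study applies.
    """
    n_samples, k_features, test_fraction = scenario
    rng = random.Random(seed)
    rows = rng.sample(range(n_records), n_samples)          # subsample the cohort
    cols = sorted(rng.sample(range(n_features), k_features))  # subsample the items
    n_test = round(n_samples * test_fraction)
    return rows[n_test:], rows[:n_test], cols


scenarios = make_scenarios()
train_rows, test_rows, feature_cols = draw_scenario(95_042, 138, (16_000, 80, 0.2))
```

Each classifier would then be fitted on `train_rows` and scored on `test_rows` using only `feature_cols`, with the draw repeated under different seeds to average out sampling noise.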
Results:
A total of 95,042 older adults (median age 82.66 years; 39.42% male) receiving home care were analyzed in this study. The average areas under the curve (AUCs) and sensitivities for 6-month mortality prediction showed that the machine learning classifiers did not outperform regularized logistic regression. In terms of AUC, regularized logistic regression performed better than XGBoost, MLP, and RF when the number of features was ≤80 and the sample size was ≤16,000; XGBoost slightly outperformed regularized logistic regression as the number of features and the sample size increased. In terms of sensitivity, regularized logistic regression substantially outperformed the machine learning classifiers in all scenarios. However, the machine learning classifiers had higher specificity than regularized logistic regression in all scenarios.
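The three metrics reported above measure different things: AUC summarizes ranking quality across all thresholds, whereas sensitivity and specificity depend on the cut-off applied to the predicted risk. The sketch below, written from scratch on made-up toy scores, shows how each is computed and how moving the threshold trades one against the other. The function names, the toy data, and the 0.5 default threshold are assumptions for illustration; the abstract does not state which threshold was used.

```python
def auc(y_true, scores):
    """AUC as the probability that a random positive case is scored above
    a random negative case, counting ties as one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(y_true, scores, threshold=0.5):
    """Sensitivity (true positive rate) and specificity (true negative rate)
    when scores at or above the threshold are classified as positive."""
    preds = [int(s >= threshold) for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, y_true))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, y_true))
    tn = sum(p == 0 and y == 0 for p, y in zip(preds, y_true))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, y_true))
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (not from the study): 1 = died within the follow-up window.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1]

toy_auc = auc(y_true, scores)
sens_default, spec_default = sensitivity_specificity(y_true, scores)
sens_low, spec_low = sensitivity_specificity(y_true, scores, threshold=0.25)
```

On this toy data, lowering the threshold from 0.5 to 0.25 raises sensitivity and lowers specificity while the AUC is unchanged, which is one way two models with similar AUCs can differ markedly on the other two metrics.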
Conclusions:
When the number of features and the sample size were not overly large, regularized logistic regression performed sufficiently well for identifying frail older adults receiving home care. Machine learning classifiers improved predictive accuracy only slightly, and only when the number of features and the sample size were large.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.