Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Jan 8, 2025
Date Accepted: Aug 23, 2025
The Potential of AI in Nursing Care: A Multi–Center Evaluation in Fall Risk Assessment
ABSTRACT
Background:
Falls affect 28%-35% of individuals aged 65 and older and are the second leading cause of unintentional injury-related deaths globally. Limited availability of clinical staff often impedes timely detection and prevention of potential falls. Advances in artificial intelligence (AI) could complement existing fall risk assessment and help to better allocate nursing care resources. Yet many studies are based on small datasets from a single institution, which can restrict the generalizability of the resulting models, and they do not investigate important aspects of AI model development such as fairness across demographic groups.
Objective:
This study aims to provide a comprehensive empirical evaluation of the potential of AI in nursing care, focusing on the case of fall risk prediction. To account for demographic and contextual differences in fall incidences, we analyze data from a university and a geriatric hospital in Germany. To the best of our knowledge, these are the largest datasets for fall risk prediction to date with heterogeneous data distributions. We focus on three key objectives: Does AI help in improving fall risk prediction? Which approaches should be considered and how can AI models be trained safely across different hospitals? Are these models fair?
Methods:
This study used two datasets for fall risk prediction: one from a university hospital with 931,912 subjects, 3,351 of whom experienced falls, and another from a geriatric hospital with 12,773 subjects, 1,728 of whom experienced falls. State-of-the-art AI models were used within three experimental approaches. First, separate models were trained on the data from each hospital; second, models were retrained on the respective other dataset; and third, Federated Learning (FL) was applied to both datasets for collaborative learning. The performance of these models was compared to that of rule-based systems for fall risk prediction. Additional analysis was conducted to test for model fairness.
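The abstract does not state which aggregation algorithm was used for FL. As a minimal sketch, assuming federated averaging (FedAvg) with sample-size weighting, the server-side aggregation step could look like the following. The function name and the two parameter vectors are hypothetical; only the two cohort sizes are taken from the text above.

```python
from typing import List, Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Sample-size-weighted mean of client parameter vectors (FedAvg aggregation)."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one non-empty weight vector per client size")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all clients must share the same parameter shape")
    total = sum(client_sizes)
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Hypothetical local model parameters after one round of local training.
university_params = [0.20, -1.00]
geriatric_params = [0.80, 0.50]

# Cohort sizes as reported in the Methods paragraph.
global_params = federated_average(
    [university_params, geriatric_params],
    [931_912, 12_773],
)
print(global_params)
```

Under this weighting the much larger university cohort dominates the aggregated parameters, so the global model stays close to the university model. Whether the study weighted clients this way is an assumption of the sketch.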
Results:
Our findings demonstrate that AI models consistently outperform rule-based systems across all experimental setups, with an area under the receiver operating characteristic curve (AUROC) of 0.735 (90% CI 0.727-0.744) for the geriatric hospital and 0.93 (90% CI 0.928-0.934) for the university hospital. FL did not improve fall risk prediction in this setting. Our fairness analysis ruled out disparities in model performance between different gender groups, but we found fairness infringements in age-based performance.
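The abstract reports AUROC values with 90% confidence intervals but does not say how the intervals were obtained. As an illustration only, the sketch below computes AUROC from pairwise comparisons and a percentile bootstrap interval, which is one common choice and an assumption here. The labels and scores are made-up toy values, not study data.

```python
import random
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random faller is scored above a random non-faller (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    level: float = 0.90,
    n_boot: int = 1000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for AUROC; resamples lacking a class are redrawn."""
    if len(set(labels)) < 2:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    alpha = (1.0 - level) / 2.0
    lower = stats[int(alpha * n_boot)]
    upper = stats[min(n_boot - 1, int((1.0 - alpha) * n_boot))]
    return lower, upper


# Toy data: 1 = fall, 0 = no fall.
toy_labels = [0, 0, 1, 1, 0, 1, 0, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.6]
point = auroc(toy_labels, toy_scores)
low, high = bootstrap_ci(toy_labels, toy_scores)
print(point, low, high)
```

The pairwise AUROC is quadratic in the sample size, so for cohorts of the size described here a rank-based implementation would be the practical choice.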
Conclusions:
This study demonstrates that AI models consistently outperform traditional rule-based systems across heterogeneous datasets in predicting fall risk. However, it also reveals challenges related to demographic shifts and label distribution imbalances, which limited the FL models’ ability to generalize. While the fairness analysis indicated promising predictive parity and equal opportunity across gender subgroups, age-related disparities emerged. Addressing data imbalances and ensuring broader representation across demographic groups will be crucial for developing fairer and more generalizable models.
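The two fairness criteria named above have standard definitions: equal opportunity compares the true positive rate (TPR) across subgroups, and predictive parity compares the positive predictive value (PPV). A minimal sketch of such a per-group comparison follows; the function name, the age bands, and the toy predictions are hypothetical and do not reproduce the study's analysis.

```python
from collections import defaultdict
from typing import Dict, Optional, Sequence


def group_rates(
    labels: Sequence[int],
    preds: Sequence[int],
    groups: Sequence[str],
) -> Dict[str, Dict[str, Optional[float]]]:
    """Per-group TPR (equal opportunity) and PPV (predictive parity)."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for y, p, g in zip(labels, preds, groups):
        c = counts[g]  # touch the entry so groups with only true negatives still appear
        if y == 1 and p == 1:
            c["tp"] += 1
        elif y == 0 and p == 1:
            c["fp"] += 1
        elif y == 1 and p == 0:
            c["fn"] += 1
    rates: Dict[str, Dict[str, Optional[float]]] = {}
    for g, c in counts.items():
        actual_pos = c["tp"] + c["fn"]
        flagged = c["tp"] + c["fp"]
        rates[g] = {
            "tpr": c["tp"] / actual_pos if actual_pos else None,
            "ppv": c["tp"] / flagged if flagged else None,
        }
    return rates


# Toy data: 1 = fall (label) or flagged as high risk (prediction).
toy_y = [1, 1, 0, 0, 1, 1, 0, 0]
toy_pred = [1, 0, 1, 0, 1, 1, 0, 0]
toy_group = ["under_65"] * 4 + ["65_plus"] * 4
fairness = group_rates(toy_y, toy_pred, toy_group)
print(fairness)
```

A large gap in TPR or PPV between subgroups would indicate a disparity of the kind reported for age; what counts as an acceptable gap is a design decision that this sketch leaves open.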
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.