Currently submitted to: JMIR Medical Informatics
Date Submitted: Mar 24, 2026
Open Peer Review Period: Apr 13, 2026 - Jun 8, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Development and validation of machine-learning models for sarcopenia risk prediction in older adults based on evidence-driven variable selection
ABSTRACT
Background:
Sarcopenia has a high incidence among older adults and serious adverse consequences, yet existing risk prediction tools often lack systematic variable screening and have limited generalizability. A more reliable risk prediction model is therefore needed.
Objective:
To develop and validate sarcopenia risk prediction models in older adults by integrating evidence-driven variable selection with machine learning for early screening and risk stratification.
Methods:
Candidate risk factors identified through a systematic meta-analysis were extracted from the China Health and Retirement Longitudinal Study. Participants (N=2530; prevalence 15.5%) were divided into training and test sets in a 7:3 ratio. Least absolute shrinkage and selection operator (LASSO) regression was used to select predictors, and 10 machine learning models were trained. Model performance was evaluated using cross-validation, area under the curve (AUC), Brier score, calibration, and decision curve analysis. External validation used an independent cohort (n=191; incidence rate 16.2%). SHapley Additive exPlanations (SHAP) analysis was used to quantify variable contributions.
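As an illustration of two of the evaluation metrics named above, the following is a minimal sketch using only the Python standard library. It is not the authors' code: the function names and the toy data are hypothetical, and the AUC is computed with the pairwise (Mann-Whitney) formulation, which may differ from the software the study used.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the probability that a random case outranks a random non-case.

    Ties count as half a win (Mann-Whitney formulation).
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
print("AUC:", auc(y_true, y_prob))
print("Brier score:", brier_score(y_true, y_prob))
```

A higher AUC indicates better discrimination, while a lower Brier score indicates predicted risks that sit closer to the observed outcomes.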
Results:
Elastic net, logistic regression, and ridge regression all showed strong discrimination in the test set, with no significant differences observed among them. Baseline calibration error was improved through model adjustment. External validation showed stable model performance and positive net benefit across different thresholds. SHAP analysis showed that age and body mass index were the most influential factors, while weakness, cognitive function, and depressive symptoms also contributed independently.
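For readers unfamiliar with the net benefit reported in decision curve analysis, the sketch below shows the standard definition, net benefit = TP/N - (FP/N) × pt/(1 - pt), where pt is the risk threshold. This is an illustrative assumption about how the quantity is usually computed, not the authors' implementation; the function names and toy data are hypothetical.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of flagging everyone whose predicted risk is >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy: flag every participant regardless of predicted risk."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical toy data, not taken from the study.
y_true = [0, 0, 1, 1, 0]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.05]
for pt in (0.1, 0.2, 0.3):
    print(pt, net_benefit(y_true, y_prob, pt), net_benefit_treat_all(y_true, pt))
```

A model is usually considered clinically useful at a given threshold when its net benefit exceeds both the treat-all reference and zero (the treat-none reference).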
Conclusions:
Elastic Net, Logistic Regression and Ridge Regression showed strong discrimination, calibration and clinical utility, supporting noninvasive, cost-effective early sarcopenia detection and risk stratification. Clinical Trial: PROSPERO (CRD420251083240), https://www.crd.york.ac.uk/prospero/
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.