Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Jan 22, 2025
Date Accepted: Jun 20, 2025
Estimating Biological Age by Integrating Morbidity and Mortality Data Using Unsupervised and Self-Supervised Deep Learning
ABSTRACT
Background:
Biological age (BA) is increasingly recognized as a valuable alternative to chronological age (CA) for assessing an individual’s health and aging status. However, existing models are based on limited clinical parameters and have not thoroughly integrated morbidity and mortality data.
Objective:
This study aimed to develop and validate a novel transformer-based model, referred to as the BA–CA gap model, for BA estimation that incorporates morbidity and mortality information to improve predictive accuracy and enhance clinical utility for early identification of the risk of age-related diseases.
Methods:
We retrospectively analyzed data from 151,281 adults aged 18 years or older who underwent routine health checkups between 2003 and 2020. Participants were classified into normal, pre-disease, and disease groups based on comorbidities (diabetes, hypertension, and dyslipidemia) to evaluate the model’s ability to discriminate health status along a clinically relevant spectrum. For variables with less than 50% missingness, missing values were imputed using the mean, while features with 50% or greater missingness were excluded. A custom transformer architecture was developed to learn multiple objectives simultaneously, including input feature reconstruction, BA and CA alignment, health status discrimination, and mortality prediction. Model training employed unsupervised and self-supervised strategies. We compared our model’s performance to conventional BA estimation approaches, including the Klemera and Doubal method, a CA cluster-based model, and a deep neural network, by examining BA gap distributions, health status stratification, and mortality prediction.
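The missing-data rule described above (exclude features with 50% or greater missingness, mean-impute the rest) can be illustrated with a minimal sketch. This is not the authors' code: the function name `impute_or_drop`, the row-of-dicts layout, and the feature names in the usage note are assumptions made for illustration only.

```python
from statistics import mean
from typing import Optional


def impute_or_drop(
    rows: list[dict[str, Optional[float]]],
    max_missing: float = 0.5,
) -> list[dict[str, float]]:
    """Drop features whose missing fraction is >= max_missing and
    mean-impute the remaining features (hypothetical helper)."""
    if not rows:
        return []
    n = len(rows)
    fill: dict[str, float] = {}
    for feature in rows[0]:
        observed = [r[feature] for r in rows if r[feature] is not None]
        missing_fraction = (n - len(observed)) / n
        # Keep only features with strictly less than 50% missing values.
        if missing_fraction < max_missing:
            fill[feature] = mean(observed)
    # Rebuild each row with kept features only, filling gaps with the mean.
    return [
        {f: (r[f] if r[f] is not None else fill[f]) for f in fill}
        for r in rows
    ]
```

With four hypothetical records in which `sbp` and `hdl` are each missing once and `crp` is missing three times, `crp` would be excluded and the single gaps in `sbp` and `hdl` would be filled with the column means. In practice the means would presumably be computed on the training split only, although the abstract does not say so.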
Results:
The proposed BA–CA gap model provided a more accurate reflection of health status and superior stratification of mortality risk compared to existing methods. The model effectively distinguished among normal, pre-disease, and disease groups, with a clear gradient of BA gap values. Kaplan-Meier analyses demonstrated stronger discrimination of future mortality in men, while a similar trend, though not statistically significant, was observed in women. Sensitivity analyses across multiple random splits and training subsets confirmed the robustness of the model’s performance.
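For readers unfamiliar with the Kaplan-Meier analyses mentioned above, the product-limit estimator can be sketched as follows. This is a generic textbook illustration, not the study's analysis pipeline; the function name `kaplan_meier` and the input layout (follow-up times plus a 1/0 death indicator) are assumptions, and how participants were grouped for comparison is not specified in the abstract.

```python
def kaplan_meier(
    durations: list[float], events: list[int]
) -> list[tuple[float, float]]:
    """Return (time, survival probability) at each distinct death time.

    events[i] is 1 if death was observed at durations[i], 0 if censored.
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    curve: list[tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored participants both leave the risk set.
        n_at_risk -= leaving
    return curve
```

Computing one such curve per group (for example, groups defined by a hypothetical BA gap threshold) and comparing them with a log-rank test is the usual way mortality discrimination of this kind is assessed.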
Conclusions:
By integrating morbidity and mortality information within a transformer-based framework, the BA–CA gap model offers a more granular and clinically meaningful assessment of aging and health status than CA alone. This approach supports the potential for personalized health management and risk stratification, although external validation in diverse populations is warranted to further confirm its generalizability.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.