Estimating a Physiological Lung Function Score and Biological Sex Using Pulmonary Function Tests and Machine Learning: A Retrospective Cohort Study
ABSTRACT
Background:
Sex and age have long been known to affect lung function. Several biological variables and anatomical factors may contribute to sex- and age-related differences in pulmonary metrics.
Objective:
We hypothesized that a machine learning model could be trained to predict a person’s lung age and self-reported sex using pulmonary function test (PFT) data.
Methods:
We retrospectively analyzed complete PFTs from 6,392 healthy adults across three Mayo Clinic regions. Four models of increasing complexity were trained using gradient-boosted machines to predict chronological age and biological sex. Model interpretability was assessed using SHAP values and partial dependence plots. Quantile regression was used to estimate reference percentiles for predicted lung age.
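The abstract does not say how the quantile regression was specified. The sketch below assumes a linear model of predicted lung age on chronological age with a single predictor. Under that assumption, each percentile line can be fit exactly by minimizing the pinball (check) loss. The data, the function names (`pinball_loss`, `fit_linear_quantile`), and the choice of the 5th, 50th and 95th percentiles are hypothetical illustrations. They are not the study's code or its reference ranges.

```python
from itertools import combinations


def pinball_loss(residuals, tau):
    """Quantile (pinball) loss: tau * r if r >= 0, else (tau - 1) * r."""
    return sum(tau * r if r >= 0 else (tau - 1) * r for r in residuals)


def fit_linear_quantile(x, y, tau):
    """Fit the tau-th quantile line y = intercept + slope * x exactly.

    Some optimal line passes through two observations with distinct x,
    so scanning every such pair and keeping the lowest pinball loss is
    exact; it costs O(n^3), which is only acceptable for small samples.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # the pair would define a vertical line
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        residuals = [yk - (intercept + slope * xk) for xk, yk in zip(x, y)]
        loss = pinball_loss(residuals, tau)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


# Hypothetical data (not from the study): chronological age and a
# model-predicted lung age with deterministic scatter of -5..+5 years.
ages = list(range(20, 70))
predicted = [a + (a * 7) % 11 - 5 for a in ages]

# Hypothetical reference percentiles: 5th, 50th and 95th.
bands = {tau: fit_linear_quantile(ages, predicted, tau) for tau in (0.05, 0.5, 0.95)}
for tau, (intercept, slope) in bands.items():
    print(f"{tau:.0%} line: lung age = {intercept:.2f} + {slope:.3f} * age")
```

Scanning pairs is exact for one predictor, but it scales poorly. For a cohort of thousands, an LP-based solver such as statsmodels' `QuantReg` would be the practical choice.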
Results:
The best-performing age model achieved an RMSE of 7.01 years, while the sex classification model reached an AUC of 0.981, with sensitivity and specificity exceeding 91%. Key predictors for lung age included residual volume as a percentage of total lung capacity, FEV₁, and alveolar volume. For sex classification, peak expiratory flow, height, and age were among the most influential features. Model predictions generalized across age and race subgroups. Predicted lung age increased linearly with chronological age, and quantile regression provided normative reference ranges.
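The sketch below shows how RMSE, AUC, sensitivity and specificity are conventionally computed, for readers less familiar with these metrics. It assumes sex is encoded as a hypothetical binary label (1 or 0) with a model score in [0, 1] and a fixed 0.5 threshold. The abstract does not state the encoding or the operating threshold that was used, and the toy values are invented for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error, in the target's units (years for lung age)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def roc_auc(labels, scores):
    """AUC as P(random positive outranks random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Invented toy values, not study data.
true_age = [30, 45, 60, 72]
pred_age = [33, 41, 60, 72]
sex = [1, 1, 1, 0, 0, 0]          # hypothetical binary coding
score = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]

print(f"RMSE = {rmse(true_age, pred_age):.2f} years")  # RMSE = 2.50 years
print(f"AUC = {roc_auc(sex, score):.3f}")  # AUC = 0.889
sens, spec = sensitivity_specificity(sex, score)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")  # sensitivity = 0.667, specificity = 0.667
```

The AUC here uses the rank (Mann-Whitney) formulation, which equals the area under the ROC curve. scikit-learn's `roc_auc_score` computes the same quantity.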
Conclusions:
Applying artificial intelligence to pulmonary function data allows prediction of a patient’s sex and estimation of lung age. With further validation, physiological lung age estimated by an artificial intelligence algorithm may serve as a measure of overall respiratory health.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.