Accepted for/Published in: JMIR AI

Date Submitted: Dec 5, 2025
Date Accepted: May 7, 2026

The final, peer-reviewed published version of this preprint can be found here:

Johnson PW, Quicksall ZS, Lee J, Lee AS, Lim KG, Ortega VE, Arunachalam SP, Helgeson SA

Estimating a Physiological Lung Function Score and Biological Sex Using Pulmonary Function Tests and Machine Learning: Retrospective Study

JMIR AI 2026;5:e89060

DOI: 10.2196/89060

PMID: 42224015

Estimating a Physiological Lung Function Score and Biological Sex Using Pulmonary Function Tests and Machine Learning: a Retrospective Cohort Study

  • Patrick W Johnson
  • Zachary S Quicksall
  • Jieun Lee
  • Augustine S Lee
  • Kaiser G Lim
  • Victor E Ortega
  • Shivaram Poigai Arunachalam
  • Scott A Helgeson

ABSTRACT

Background:

Sex and age have long been known to affect lung function. Several biological variables and anatomical factors may contribute to sex- and age-related differences in pulmonary metrics.

Objective:

We hypothesized that a machine learning model could be trained to predict a person’s lung age and self-reported sex using pulmonary function test (PFT) data.

Methods:

We retrospectively analyzed complete PFTs from 6,392 healthy adults across three Mayo Clinic regions. Four models of increasing complexity were trained using gradient-boosted machines to predict chronological age and biological sex. Model interpretability was assessed using SHAP values and partial dependence plots. Quantile regression was used to estimate reference percentiles for predicted lung age.
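To illustrate the idea behind the quantile regression step, the following is a minimal sketch of the pinball (quantile) loss that such models minimize. It is not the authors' code: the abstract does not state the software or model form used, and the function names and toy values here are hypothetical. The sketch only shows that the constant minimizing the pinball loss at level q is the corresponding sample quantile, which is why fitting this loss yields reference percentiles.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball (quantile) loss for a quantile level q in (0, 1)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        diff = t - p
        # Under-predictions are weighted by q, over-predictions by (1 - q).
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)


def best_constant(y, q):
    """Observed value that minimizes the pinball loss as a constant prediction."""
    candidates = sorted(set(y))
    return min(candidates, key=lambda c: pinball_loss(y, [c] * len(y), q))


# Hypothetical predicted lung ages (years) for people of similar chronological age.
predicted_lung_age = [48, 50, 52, 55, 57, 60, 63, 66, 70, 74, 79]

median = best_constant(predicted_lung_age, 0.5)  # 50th percentile reference
p90 = best_constant(predicted_lung_age, 0.9)     # 90th percentile reference
print(median, p90)
```

In a full quantile regression the constant would be replaced by a function of chronological age, fitted separately for each percentile of interest.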

Results:

The best-performing age model achieved an RMSE of 7.01 years, while the sex classification model reached an AUC of 0.981, with sensitivity and specificity exceeding 91%. Key predictors for lung age included residual volume as a percentage of total lung capacity, FEV₁, and alveolar volume. For sex classification, peak expiratory flow, height, and age were among the most influential features. Model predictions generalized across age and race subgroups. Predicted lung age increased linearly with chronological age, and quantile regression provided normative reference ranges.
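For readers less familiar with the metrics reported above, the following minimal sketch shows how RMSE, AUC, sensitivity, and specificity are defined. It is an illustration only, assuming a 0.5 decision threshold and using hypothetical toy data; it does not reproduce the study's models, data, or thresholds.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def auc(labels, scores):
    """AUC as the chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s >= threshold)
    fn = sum(1 for l, s in zip(labels, scores) if l == 1 and s < threshold)
    tn = sum(1 for l, s in zip(labels, scores) if l == 0 and s < threshold)
    fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data, not taken from the study.
ages_true = [30, 45, 60, 75]
ages_pred = [33, 41, 60, 80]
labels = [1, 1, 1, 0, 0, 0]          # 1 = one sex class, 0 = the other
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]

age_rmse = rmse(ages_true, ages_pred)
sex_auc = auc(labels, scores)
sens, spec = sensitivity_specificity(labels, scores)
print(age_rmse, sex_auc, sens, spec)
```

AUC does not depend on the threshold, whereas sensitivity and specificity do, so the two kinds of figures answer slightly different questions about the classifier.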

Conclusions:

Applying artificial intelligence to pulmonary function data allows prediction of a patient’s sex and estimation of lung age. The ability of an artificial intelligence algorithm to determine physiological lung age, with further validation, may serve as a measure of overall respiratory health.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.