Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Oct 5, 2020
Date Accepted: Apr 11, 2021

The final, peer-reviewed published version of this preprint can be found here:

Machine Learning Methods for the Diagnosis of Chronic Obstructive Pulmonary Disease in Healthy Subjects: Retrospective Observational Cohort Study

Muro S, Ishida M, Horie Y, Takeuchi W, Nakagawa S, Ban H, Nakagawa T, Kitamura T

JMIR Med Inform 2021;9(7):e24796

DOI: 10.2196/24796

PMID: 34255684

PMCID: 8293159

Warning: This is an author submission that is not peer-reviewed or edited. Preprints, unless they show as "accepted", should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Machine learning methods for diagnosis of chronic obstructive pulmonary disease in healthy subjects: Analysis of Risk factors To DEtect COPD (ARTDECO)

  • Shigeo Muro; 
  • Masato Ishida; 
  • Yoshiharu Horie; 
  • Wataru Takeuchi; 
  • Shunki Nakagawa; 
  • Hideyuki Ban; 
  • Tohru Nakagawa; 
  • Tetsuhisa Kitamura

ABSTRACT

Background:

Airflow limitation is a critical physiological feature of chronic obstructive pulmonary disease (COPD), for which long-term exposure to noxious substances, including tobacco smoke, is an established risk factor. However, not all long-term smokers develop COPD, suggesting that other risk factors exist.

Objective:

To identify risk factors that predict a COPD diagnosis by applying machine learning to an annual medical check-up database.

Methods:

In this retrospective, observational cohort study (Analysis of Risk factors To DEtect COPD [ARTDECO]), annual medical check-up records for all Hitachi Ltd. employees in Japan collected from April 1998 to March 2019 were analyzed. Employees who provided informed consent via an opt-out model were screened, and those aged 30–75 years without a prior diagnosis of COPD or asthma and without a history of cancer were included. The database included clinical measurements (e.g., pulmonary function tests) and questionnaire responses. To identify risk factors predicting a COPD diagnosis within a 3-year period, the gradient boosting decision tree method XGBoost was applied as the primary approach, with logistic regression as a secondary method. COPD was diagnosed when the ratio of pre-bronchodilator forced expiratory volume in 1 second (FEV1) to pre-bronchodilator forced vital capacity (FVC) was <0.7 at two consecutive examinations.
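The inclusion criteria and outcome definition above can be made concrete in code. The following is a minimal sketch, not the authors' pipeline. It assumes each employee's check-ups are available as hypothetical `Exam` records holding pre-bronchodilator FEV1 and FVC, and it assumes the 3-year window runs from a baseline exam to the second of two consecutive follow-up exams with FEV1/FVC <0.7. The abstract does not say how the window is anchored. The names `Exam`, `is_eligible`, `obstructed`, `copd_within_window`, and `WINDOW_DAYS` are illustrative only.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Exam:
    """Hypothetical record for one annual check-up (not the study's schema)."""
    exam_date: date
    fev1: float  # pre-bronchodilator FEV1, liters
    fvc: float   # pre-bronchodilator FVC, liters

RATIO_THRESHOLD = 0.7   # FEV1/FVC cut-off stated in the abstract
WINDOW_DAYS = 3 * 365   # assumed approximation of the 3-year prediction window

def is_eligible(age: int, prior_copd: bool, prior_asthma: bool,
                cancer_history: bool) -> bool:
    """Inclusion: aged 30-75 with no prior COPD, asthma, or cancer history."""
    return 30 <= age <= 75 and not (prior_copd or prior_asthma or cancer_history)

def obstructed(exam: Exam) -> bool:
    """True when the pre-bronchodilator FEV1/FVC ratio is below 0.7."""
    return exam.fev1 / exam.fvc < RATIO_THRESHOLD

def copd_within_window(baseline: Exam, follow_ups: list[Exam]) -> bool:
    """Label a baseline exam positive if two consecutive follow-up exams
    both show FEV1/FVC < 0.7 and the second one falls within the window."""
    # Only exams after the baseline count, in chronological order.
    exams = sorted((e for e in follow_ups if e.exam_date > baseline.exam_date),
                   key=lambda e: e.exam_date)
    for prev, curr in zip(exams, exams[1:]):
        # Assumption: the diagnosis date is the second of the two exams.
        if (curr.exam_date - baseline.exam_date).days > WINDOW_DAYS:
            break  # later pairs are even further from baseline
        if obstructed(prev) and obstructed(curr):
            return True
    return False

baseline = Exam(date(2010, 4, 1), 3.0, 4.0)        # ratio 0.75
follow_ups = [Exam(date(2011, 4, 1), 2.7, 4.0),    # ratio 0.675
              Exam(date(2012, 4, 1), 2.6, 4.0)]    # ratio 0.65
print(copd_within_window(baseline, follow_ups))    # True
```

The abstract requires two consecutive qualifying exams, so in this sketch a single low reading does not produce a positive label by itself.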

Results:

Of the 26,101 individuals screened, 1,213 met the exclusion criteria, and 24,815 individuals were included in the analysis. The top 10 predictors of a COPD diagnosis were FEV1/FVC, smoking status, allergic symptoms, cough, pack-years, hemoglobin A1c, serum albumin, mean corpuscular volume, percent predicted vital capacity, and percent predicted FEV1. The areas under the receiver operating characteristic curve (AUROC) for the XGBoost and logistic regression models were 0.956 and 0.943, respectively.
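An AUROC summarizes how well a model ranks individuals who go on to develop COPD above those who do not. As a minimal illustration, not the authors' evaluation code, the AUROC can be computed from predicted risk scores as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half (the Mann–Whitney formulation). The function name `auroc` and the toy scores and labels are made up for illustration.

```python
def auroc(scores: list[float], labels: list[int]) -> float:
    """AUROC as the Mann-Whitney probability P(score_pos > score_neg),
    with ties counted as 0.5. Quadratic in the number of cases."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie contributes half credit
    return wins / (len(pos) * len(neg))

# Toy example: 2 positives, 3 negatives; 5 of 6 pairs are ranked correctly.
print(auroc([0.9, 0.8, 0.4, 0.3, 0.2], [1, 0, 1, 0, 0]))  # 0.8333333333333334
```

In practice a library routine such as scikit-learn's `sklearn.metrics.roc_auc_score` computes the same quantity more efficiently. The pairwise loop here only makes the definition explicit.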

Conclusions:

Using machine learning on this longitudinal database, we identified multiple parameters other than smoking exposure and lung function as risk factors that may help general practitioners and occupational health physicians predict the development of COPD. Further research is warranted to confirm these results, as our analysis used a database limited to Japan.

Clinical Trial: Not applicable.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.