Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Aug 17, 2020
Date Accepted: Feb 3, 2021
Date Submitted to PubMed: Mar 12, 2021

The final, peer-reviewed published version of this preprint can be found here:

Using Automated Machine Learning to Predict the Mortality of Patients With COVID-19: Prediction Model Development Study

Ikemura K, Goldstein D, Szymanski J, Bellin E, Stahl L, Yagi Y, Saada M, Simone K, Reyes Gil M, Billett H

J Med Internet Res 2021;23(2):e23458

DOI: 10.2196/23458

PMID: 33539308

PMCID: 7919846

Using Automated-Machine Learning to Predict COVID-19 Patient Mortality

  • Kenji Ikemura; 
  • D.Y. Goldstein; 
  • James Szymanski; 
  • Eran Bellin; 
  • Lindsay Stahl; 
  • Yukako Yagi; 
  • Mahmoud Saada; 
  • Katelyn Simone; 
  • Morayma Reyes Gil; 
  • Henny Billett

ABSTRACT

Background:

In a pandemic, it is important for clinicians to stratify patients and decide who receives limited medical resources.

Objective:

In this study, we used automated machine learning (autoML) to develop and compare multiple machine learning (ML) models that predict the chance of patient survival from COVID-19 infection, and we identified the best-performing model. In addition, we investigated which biomarkers are the most influential in generating an accurate model. We believe an ML model such as this could be a useful tool for clinicians stratifying hospitalized patients with SARS-CoV-2 infection.

Methods:

Data were retrospectively collected from Clinical Looking Glass (CLG) for all patients at our institution who tested positive for COVID-19 by real-time RT-PCR of a nasopharyngeal specimen and were admitted between March 1 and July 3, 2020 (4376 patients). We collected 47 biomarkers from each patient within 36 hours before or after the index time (the time of RT-PCR positivity) and tracked whether the patient survived for one month following this time. We used the autoML from H2O.ai, an open-source package for the R language. The autoML generated 20 ML models and ranked them by area under the precision-recall curve (AUCPR) on the test set. We selected the best model (model_var_47) and chose the threshold probability that maximized the F2 score to make a binary classifier: dead or alive. Subsequently, we ranked the relative importance of the variables that generated model_var_47 and chose the 10 most influential variables. Next, we reran the autoML with these 10 variables and likewise selected the model with the best AUCPR on the test set (model_var_10). Again, the threshold probability that maximized the F2 score for model_var_10 was chosen to make a binary classifier. We calculated and compared the sensitivity, specificity, and positive predictive value (PPV) for model_var_10 and model_var_47.
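The threshold selection step described above can be illustrated with a minimal Python sketch. This is not the authors' code (the study used H2O autoML in R); the function names and the toy labels and probabilities below are hypothetical and chosen only for illustration. The sketch scans each predicted probability as a candidate threshold, keeps the one with the highest F2 score (which weights sensitivity more heavily than PPV), and then reports sensitivity, specificity, and PPV at that threshold.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_prob: Sequence[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), predicting death (1) when prob >= threshold."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta score from counts; beta=2 gives the F2 score."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def best_f2_threshold(y_true: Sequence[int],
                      y_prob: Sequence[float]) -> Tuple[float, float]:
    """Try every distinct predicted probability as a threshold; keep the best F2."""
    candidates = sorted(set(y_prob))
    best_t, best_score = candidates[0], -1.0
    for t in candidates:
        tp, fp, _tn, fn = confusion_counts(y_true, y_prob, t)
        score = f_beta(tp, fp, fn, beta=2.0)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


def classifier_metrics(y_true: Sequence[int], y_prob: Sequence[float],
                       threshold: float) -> Dict[str, float]:
    """Sensitivity, specificity, and PPV of the binary classifier at a threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "ppv": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Synthetic toy data (1 = died, 0 = survived); NOT taken from the study.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_probs = [0.9, 0.8, 0.35, 0.4, 0.2, 0.1, 0.6, 0.7]

threshold, f2 = best_f2_threshold(toy_labels, toy_probs)
print(threshold, round(f2, 3), classifier_metrics(toy_labels, toy_probs, threshold))
```

Because F2 penalizes missed deaths more than false alarms, the selected threshold tends to sit low, trading specificity and PPV for sensitivity.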

Results:

The best model that autoML generated using all 47 variables was the stacked ensemble model of all models (AUCPR = 0.836). The most influential variables were systolic and diastolic blood pressure, age, respiratory rate, pulse oximetry, blood urea nitrogen, lactate dehydrogenase, d-dimer, troponin, and glucose. When the autoML was retrained with only these 10 most important variables, performance did not change significantly (AUCPR = 0.82). For the binary classifiers, the sensitivity, specificity, and PPV of model_var_47 were 83.5%, 87.7%, and 69.8%, respectively, while for model_var_10 they were 90.1%, 71.1%, and 51.8%, respectively.
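As a back-of-the-envelope consistency check on the reported figures, the following Python sketch derives two quantities that the abstract does not state: the F2 score implied by each classifier's sensitivity and PPV, and the mortality prevalence implied by its sensitivity, specificity, and PPV (obtained by solving the standard PPV formula for prevalence). The function names are hypothetical, and because the inputs are rounded percentages, the outputs are approximations rather than results reported by the authors.

```python
def f_beta_from_rates(ppv: float, sensitivity: float, beta: float = 2.0) -> float:
    """F-beta from precision (PPV) and recall (sensitivity); beta=2 gives F2."""
    b2 = beta ** 2
    return (1 + b2) * ppv * sensitivity / (b2 * ppv + sensitivity)


def implied_prevalence(sensitivity: float, specificity: float, ppv: float) -> float:
    """Solve PPV = s*p / (s*p + (1 - spec)*(1 - p)) for the prevalence p."""
    false_positive_rate = 1.0 - specificity
    numerator = ppv * false_positive_rate
    return numerator / (sensitivity * (1.0 - ppv) + numerator)


# Sensitivity, specificity, and PPV as reported in the Results above.
reported = {
    "model_var_47": {"sensitivity": 0.835, "specificity": 0.877, "ppv": 0.698},
    "model_var_10": {"sensitivity": 0.901, "specificity": 0.711, "ppv": 0.518},
}

for name, r in reported.items():
    f2 = f_beta_from_rates(r["ppv"], r["sensitivity"])
    prevalence = implied_prevalence(r["sensitivity"], r["specificity"], r["ppv"])
    print(f"{name}: implied F2 = {f2:.3f}, implied prevalence = {prevalence:.3f}")
```

If both classifiers were evaluated on the same test set, the two implied prevalence values should roughly agree, which makes this a quick sanity check on the reported percentages.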

Conclusions:

By using autoML, we developed high-performing models that predict patient mortality from COVID-19 infection. In addition, we identified the most important biomarkers correlated with mortality. This ML model can be used as a decision-support tool for medical practitioners to efficiently triage patients infected with COVID-19. Based on our literature review, this is the largest COVID-19 patient cohort used to train ML models and the first study to use autoML.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.