JMIR Preprints #30157: COVID-19 mortality prediction from deep learning in a large multistate EHR and LIS dataset: algorithm development and validation

COVID-19 mortality prediction from deep learning in a large multistate EHR and LIS dataset: algorithm development and validation

Saranya Sankaranarayanan;
Jagadheshwar Balan;
Jesse R. Walsh;
Yanhong Wu;
Sara J. Minnich;
Amy L. Piazza;
Collin Osborne;
Gavin R. Oliver;
Jessica L. Lesko;
Kathy L. Bates;
Kia Khezeli;
Darci R. Block;
Margaret A. DiGuardo;
Justin Kreuter;
John C. O’Horo;
Iman J. Kalantari;
Eric W. Klee;
Mohamed E. Salama;
Benjamin R. Kipp;
William G. Morice II;
Garrett Jenkinson

ABSTRACT

Background:

COVID-19 is caused by the SARS-CoV-2 virus and has strikingly heterogeneous clinical manifestations, with most individuals contracting mild disease but a substantial minority experiencing fulminant cardiopulmonary symptoms or death. The clinical covariates and the laboratory tests performed on a patient provide robust statistics to guide clinical treatment. Deep learning approaches applied to a dataset of this nature enable patient stratification and provide methods to guide clinical treatment.

Objective:

Here we report on the development and prospective validation of a state-of-the-art machine learning model to provide mortality prediction shortly after confirmation of SARS-CoV-2 infection in the Mayo Clinic patient population.

Methods:

We constructed one of the largest reported and most geographically diverse laboratory information system (LIS) and electronic health record (EHR) COVID-19 datasets in the published literature, which included 11,808 patients with residence in 41 states, treated at medical sites across five states in three time zones. The data were split by date into training and prospective testing cohorts in an 80/20 ratio. In the training data, model selection and evaluation were performed using stratified 10-fold cross-validation. Traditional machine learning models were evaluated independently as well as in a stacked learner approach using AutoGluon, and various recurrent neural network architectures were considered. We trained these models to operate solely using routine laboratory measurements and clinical covariates available within 72 hours of a patient’s first positive COVID-19 nucleic acid test.
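
As an illustration of the splitting scheme described above, the following is a minimal sketch in plain Python, not the authors' pipeline. The record fields (`first_positive`, `died`), the helper names, and the synthetic cohort are all assumptions made for the example. It orders patients by date so that the earliest 80% form the training cohort and the latest 20% form the prospective cohort, then assigns training patients to 10 folds while keeping the outcome proportions similar across folds.

```python
import random
from datetime import date, timedelta


def split_by_date(records, train_fraction=0.8):
    """Sort chronologically; the earliest fraction trains, the rest is held out."""
    ordered = sorted(records, key=lambda r: r["first_positive"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def stratified_folds(labels, k=10, seed=0):
    """Assign each index to one of k folds, spreading each class evenly."""
    rng = random.Random(seed)
    folds = [None] * len(labels)
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled members of this class round-robin into the folds.
        for pos, i in enumerate(idx):
            folds[i] = pos % k
    return folds


# Synthetic toy cohort (hypothetical, NOT the study data): 200 patients, 10% deaths.
start = date(2020, 3, 1)
cohort = [
    {"id": i, "first_positive": start + timedelta(days=i), "died": int(i % 10 == 0)}
    for i in range(200)
]

train, test = split_by_date(cohort)
folds = stratified_folds([r["died"] for r in train])
deaths_per_fold = [
    sum(r["died"] for r, f in zip(train, folds) if f == fold) for fold in range(10)
]
```

Splitting by date rather than at random means every patient in the held-out cohort tested positive after every training patient, which is what makes the evaluation prospective. Stratifying the folds matters here because deaths are a small minority of outcomes, and an unstratified split could leave some folds with very few positive cases.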

Results:

The GRU-D recurrent neural network achieved peak cross-validation performance with 0.938±0.004 AUROC. In cross-validation, this model provided an accuracy of 89% (95% CI: [88,90]), a recall of 80% (95% CI: [74,85]), a precision of 17% (95% CI: [15,19]), a negative predictive value (NPV) of 99% (95% CI: [99,100]), and statistically significant stratification in our Cox proportional hazards survival model (risk 18.9, P<.001). The model retained strong performance when the follow-up time was reduced to 12 hours (0.916±0.005 AUROC), and leave-one-out feature importance analysis indicated that the most independently valuable features were age, Charlson score, minimum oxygen saturation, fibrinogen, and serum iron level. In the prospective testing cohort, this model provided an AUROC of 0.901, an accuracy of 78% (95% CI: [76,79]), a recall of 85% (95% CI: [77,91]), a precision of 14% (95% CI: [12,17]), an NPV of 99% (95% CI: [99,100]), and a statistically significant difference in survival (P<.001, hazard ratio for those predicted to survive: 95% CI [0.043,0.106]).
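
The metrics reported above all derive from the four cells of a confusion matrix. The sketch below shows the standard definitions. The function name and the counts are hypothetical, chosen only to illustrate a low-prevalence outcome, and are not taken from the study. It may help explain how a high recall and a very high NPV can coexist with a low precision when deaths are rare.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, recall, precision, and NPV from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Return NaN rather than raising when a denominator is empty.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "recall": ratio(tp, tp + fn),       # share of deaths that were flagged
        "precision": ratio(tp, tp + fp),    # share of flagged patients who died
        "npv": ratio(tn, tn + fn),          # share of unflagged patients who survived
    }


# Hypothetical counts (NOT study data): 1,000 patients, 10 deaths (1% prevalence).
metrics = classification_metrics(tp=8, fp=40, tn=950, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, 8 of the 10 deaths are flagged (recall 0.8), yet only 8 of the 48 flagged patients die (precision about 0.17), because false alarms among the many survivors outnumber the few true positives. Almost every unflagged patient survives, so the NPV stays close to 1.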

Conclusions:

Our deep learning approach using GRU-D provides an alert system to flag mortality risk in COVID-19-positive patients, using clinical covariates and laboratory values within a 72-hour window after the first positive nucleic acid test.

Citation

Please cite as:

Sankaranarayanan S, Balan J, Walsh JR, Wu Y, Minnich SJ, Piazza AL, Osborne C, Oliver GR, Lesko JL, Bates KL, Khezeli K, Block DR, DiGuardo MA, Kreuter J, O’Horo JC, Kalantari IJ, Klee EW, Salama ME, Kipp BR, Morice WG II, Jenkinson G

COVID-19 Mortality Prediction From Deep Learning in a Large Multistate Electronic Health Record and Laboratory Information System Data Set: Algorithm Development and Validation

J Med Internet Res 2021;23(9):e30157

DOI: 10.2196/30157

PMID: 34449401

PMCID: 8480399

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: May 3, 2021

Date Accepted: Aug 11, 2021

Date Submitted to PubMed: Aug 27, 2021
