Accepted for/Published in: JMIRx Med
Date Submitted: Apr 5, 2021
Date Accepted: Sep 14, 2021
Date Submitted to PubMed: Aug 4, 2023
Prediction of COVID-19 Mortality with Limited Attributes to Expedite Patient Prognosis and Triage: Retrospective Observational Study
ABSTRACT
Background:
The onset and development of the COVID-19 pandemic have tested the limits of hospital resources and staff across the world. The resulting shortages can be partly alleviated by integrating predictive modelling into prognosis- and triage-related decision-making.
Objective:
The objective of this study is to assess the performance of predictive modelling for the early flagging of mortality risk in hospital-admitted COVID-19 patients, in support of timely triage. Additionally, the study addresses three gaps in the relevant literature: firstly, by examining the effect of extreme dimensionality reduction on performance, to expedite decision-making; secondly, by employing a highly geographically and demographically diverse sample; and finally, by thoroughly assessing the limitations of the widely cited, publicly available dataset from which this study draws.
Methods:
Two machine learning classifiers, Random Forest and Logistic Regression, are employed to predict an outcome of either death or recovery in patients, training on balanced data and testing on imbalanced, real-world data. Their performance is assessed on the basis of accuracy, sensitivity, specificity and the area under the receiver operating characteristic (ROC) curve (AUC). Additionally, Mutual Information is used as a dimensionality reduction technique to compare performance at two attribute set sizes, one with 25 attributes and one with 7, testing whether efficacy is retained with the smaller set. These methods are tested first on a small sample of 212 densely populated entries (demographics, symptoms and comorbidities) and then on a larger sample of 5,121 entries containing only patient age, to verify the standalone predictive performance of this attribute.
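As a minimal sketch of the evaluation protocol described above, the plain-Python code below computes accuracy, sensitivity, specificity and ROC AUC from true labels, hard predictions and risk scores, treating death as the positive class. It also shows one possible way to obtain a balanced training set, random undersampling of the majority class. The study states only that training data were balanced, so the undersampling step, the function names (`undersample`, `classification_metrics`, `roc_auc`) and the toy labels are illustrative assumptions, not the authors' code or data.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Convention assumed here: 1 = death (positive class), 0 = recovery.


def undersample(rows: Sequence, labels: Sequence[int], seed: int = 0) -> Tuple[List, List[int]]:
    """Hypothetical balancing step: randomly drop majority-class rows until both classes match in size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = sorted(minority + rng.sample(majority, len(minority)))  # preserve original row order
    return [rows[i] for i in keep], [labels[i] for i in keep]


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity from hard 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # share of deaths correctly flagged
        "specificity": tn / (tn + fp),  # share of recoveries correctly cleared
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random death outscores a random recovery (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy, imbalanced test set (3 deaths, 5 recoveries), invented for illustration only.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1, 0.6, 0.35]
    print(classification_metrics(y_true, y_pred))
    print(round(roc_auc(y_true, scores), 4))
    print(undersample(list("abcdefgh"), y_true))
```

Because the test data are imbalanced, accuracy alone can be dominated by the majority (recovery) class; reporting sensitivity and specificity separately shows how well each outcome is detected, while AUC summarises ranking quality independently of any single decision threshold.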
Results:
On a semi-evenly balanced sample of 212 patients, the models achieved a high mortality detection accuracy of 92.5%, with strong specificity and sensitivity. Performance on a larger sample of 5,121 patients, with only age and mortality information, was assessed as a measure of baseline discriminatory ability. Tree-based (Random Forest) and linear (Logistic Regression) methods were applied, both achieving moderately strong performance, with 77.4% to 79.3% sensitivity and 71.4% to 72.6% accuracy, highlighting predictive power even on the basis of a single attribute. Mutual Information was employed as a dimensionality reduction technique, greatly improving performance and showing how a small number of easily retrievable attributes can provide timely and accurate predictions, with applications to datasets containing slowly available variables, such as laboratory results.
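The Mutual Information ranking mentioned above can be illustrated with a plug-in estimate for discrete attributes: each attribute is scored by how much knowing it reduces uncertainty about the outcome, and only the top k are kept (for example, reducing 25 attributes to 7). This is a sketch under stated assumptions: the source does not specify the estimator or implementation used, continuous attributes such as age would first need binning in this version, and the function names and attribute values below are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def mutual_information(x: Sequence, y: Sequence) -> float:
    """Plug-in estimate of I(X;Y) in nats for two discrete variables of equal length."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # Sum over observed (a, b) pairs of p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts.
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def select_top_k(columns: Dict[str, Sequence], y: Sequence, k: int) -> List[str]:
    """Rank attributes by mutual information with the outcome and keep the k highest."""
    scores = {name: mutual_information(col, y) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical attributes; outcome 1 = death, 0 = recovery.
    outcome = [1, 1, 0, 0]
    attributes = {
        "fever": [1, 1, 0, 0],                       # perfectly aligned with the outcome
        "cough": [1, 0, 1, 0],                       # independent of the outcome
        "age_band": ["old", "old", "old", "young"],  # partially informative
    }
    print(select_top_k(attributes, outcome, k=2))  # -> ['fever', 'age_band']
```

Because each attribute is scored on its own, this univariate ranking ignores redundancy between attributes: two highly correlated symptoms can both rank near the top.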
Conclusions:
Predictive statistical models show promising performance in the early prediction of death among COVID-19 patients, with applications for improved prognosis and triage in hospital settings. While a large number of predictors is informative, reduced datasets performed even better, decreasing the time and resources required to log relevant patient data prior to decision-making. Finally, age alone was found to be an extremely good baseline predictor, providing moderately strong predictions even on a standalone basis.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.