Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Aug 2, 2020
Date Accepted: Oct 8, 2020
Date Submitted to PubMed: Oct 9, 2020
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Prognostic Assessment of COVID-19 in ICU by Machine Learning Methods: A Retrospective Study
ABSTRACT
Background:
Patients with coronavirus disease 2019 (COVID-19) in the intensive care unit (ICU) have a high mortality rate, so early assessment of their prognosis and precise treatment are of great significance.
Objective:
To use machine learning to construct a model for the analysis of risk factors and prediction of death among ICU patients with COVID-19.
Methods:
In this retrospective study, data from 123 patients with COVID-19 in the ICU were randomly divided into a training set (n=98) and a test set (n=25) at a 4:1 ratio. Significance tests, correlation analysis, and factor analysis were used to screen 100 potential risk factors individually. Conventional logistic regression and four machine learning algorithms were used to construct risk prediction models for patients with COVID-19 in the ICU. Model performance was measured by the area under the receiver operating characteristic curve (AUC). The risk prediction model was then interpreted and evaluated to verify its stability and reliability.
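The random 4:1 split and the AUC metric described above can be illustrated with a minimal, standard-library Python sketch. It assumes a simple (unstratified) random split with a hypothetical fixed seed, since the abstract does not say whether the split was stratified by outcome. AUC is computed through its rank interpretation: the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The function names are illustrative and are not taken from the study.

```python
import random


def train_test_split(ids, test_fraction=0.2, seed=42):
    """Randomly split patient IDs into training and test sets (4:1 by default).

    The seed is a hypothetical choice that only makes the split reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    train, test = train_test_split(range(123))
    print(len(train), len(test))  # 98 25
    # label 1 = died, 0 = survived; scores are made-up predicted risks
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

With 123 patients and a 0.2 test fraction, the split reproduces the 98/25 sizes reported above. The pairwise AUC loop is quadratic in the number of patients, which is negligible at this cohort size.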
Results:
Layer-by-layer screening of the 100 potential risk factors identified 8 important risk factors, which were included in the risk prediction model: lymphocyte percentage (LYM%), prothrombin time (PT), lactate dehydrogenase (LDH), total bilirubin (T-Bil), eosinophil percentage (EOS%), creatinine (Cr), neutrophil percentage (NEUT%), and albumin (ALB). The eXtreme Gradient Boosting (XGBoost) model built on these 8 risk factors showed the best discriminative ability, both in 5-fold cross-validation on the training set (AUC=0.86) and in the validation cohort (AUC=0.92). The calibration curve showed good agreement between the risk predicted by the model and the observed risk. In addition, the SHapley Additive exPlanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME) algorithms were used to explain the feature contributions and individual sample predictions of the black-box XGBoost model. The model has been deployed as an online risk calculator that is freely available to the public (http://114.251.235.51:1226/index).
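To make the SHAP step above more concrete, here is a brute-force Python sketch of exact Shapley values, the quantity that SHAP estimates. It is a simplification under stated assumptions: features absent from a coalition are replaced by a single hypothetical baseline input rather than averaged over a background dataset, and the model is any callable mapping a feature list to a risk score. The SHAP library itself uses faster, model-specific algorithms (for example TreeSHAP for tree ensembles such as XGBoost), so this code only illustrates the definition and is not the study's implementation. LIME, which fits a local surrogate model around a single sample, is not shown.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to a single baseline input.

    Features outside a coalition take their baseline value, so
    sum(phi) == model(x) - model(baseline) (the efficiency property).
    Cost is O(2^n) model calls, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        # Features in the coalition keep their observed value; the rest use the baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Weighted marginal contribution of feature i to coalition s.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


if __name__ == "__main__":
    # Hypothetical toy risk score with an interaction term (not the study's model).
    def toy_model(z):
        return 0.5 * z[0] + 1.0 * z[1] * z[2]

    phi = shapley_values(toy_model, x=[2.0, 1.0, 3.0], baseline=[0.0, 0.0, 0.0])
    print([round(v, 6) for v in phi])  # [1.0, 1.5, 1.5]
```

In the toy example the additive term is credited entirely to the first feature, while the interaction term is split equally between the two features that form it, and the attributions sum to the difference between the prediction and the baseline prediction. With 8 features, the exact computation needs a few hundred model evaluations per feature, which is why practical tools rely on faster approximations or tree-specific algorithms.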
Conclusions:
The XGBoost model predicts the risk of death in ICU patients with COVID-19 well, and the 8 risk factors contribute substantially to its predictive performance. After algorithmic validation, the model shows preliminary evidence of stability and can be used effectively to predict the prognosis of ICU patients with COVID-19. Clinical Trial: N/A