Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Nov 5, 2020
Date Accepted: Dec 17, 2020
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Accurately Differentiating COVID-19, Other Viral Infection, and Healthy Individuals: a Multimodal Late Fusion Learning Approach
ABSTRACT
Background:
Effectively identifying COVID-19 patients using non-PCR biomedical data is critical for optimal clinical outcomes. Currently, there is a lack of comprehensive understanding of the various biomedical features and of the appropriate analytical approaches that would enable early detection and effective diagnosis of COVID-19 patients.
Objective:
We aim to combine low-dimensional clinical and lab testing data with high-dimensional CT imaging data to accurately differentiate healthy individuals, COVID-19 patients, and non-COVID viral pneumonia patients, especially at an early stage of infection.
Methods:
In this study, we recruited 214 non-severe (NS) and 148 severe (S) COVID-19 patients, 198 non-infected healthy (H) participants, and 129 non-COVID viral pneumonia (V) patients. The participants’ clinical information (23 features), lab testing results (10 features), and CT scans upon admission were acquired as three input feature modalities. To enable late fusion of the multimodal features, we constructed a deep learning model to extract a 10-feature high-level representation of the CT scans. Three machine learning models (k-nearest neighbor [kNN], random forest [RF], and support vector machine [SVM]) were then developed on the 43 features combined from all three modalities to differentiate the four classes: NS, S, H, and V.
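The fusion step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the CT representation is a random stand-in for the output of the deep learning model, and only a plain kNN classifier is shown. The helper names (`fuse`, `make_synthetic_participant`, `knn_predict`) and the value `k=5` are hypothetical. Only the feature counts (23, 10, 10) and the class labels come from the text.

```python
import random
from collections import Counter

CLASSES = ["NS", "S", "H", "V"]
N_CLINICAL, N_LAB, N_CT = 23, 10, 10  # feature counts stated in the Methods


def fuse(clinical, lab, ct_repr):
    """Concatenate the three per-modality vectors into one 43-feature vector."""
    assert len(clinical) == N_CLINICAL
    assert len(lab) == N_LAB
    assert len(ct_repr) == N_CT
    return list(clinical) + list(lab) + list(ct_repr)


def make_synthetic_participant(label, rng):
    """Hypothetical generator: each class is a Gaussian with a shifted mean."""
    shift = CLASSES.index(label) * 3.0
    clinical = [rng.gauss(shift, 1.0) for _ in range(N_CLINICAL)]
    lab = [rng.gauss(shift, 1.0) for _ in range(N_LAB)]
    # Stand-in for the 10-feature CT representation from the deep model.
    ct_repr = [rng.gauss(shift, 1.0) for _ in range(N_CT)]
    return fuse(clinical, lab, ct_repr), label


def knn_predict(train, x, k=5):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


rng = random.Random(0)
train = [make_synthetic_participant(c, rng) for c in CLASSES for _ in range(30)]
test = [make_synthetic_participant(c, rng) for c in CLASSES for _ in range(10)]

correct = sum(knn_predict(train, x) == y for x, y in test)
accuracy = correct / len(test)
print(f"fused feature length: {len(train[0][0])}")
print(f"synthetic 4-class accuracy: {accuracy:.3f}")
```

The accuracy printed here reflects only the artificially well-separated synthetic classes and says nothing about the study's results. In practice the RF and SVM models named above would be trained on the same fused vectors, typically with an established machine learning library.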
Results:
Multimodal features provided a substantial performance gain over any single feature modality. All three machine learning models differentiated the four classes with high overall accuracy (95.4%-97.7%) and high class-specific accuracy (90.6%-99.9%) on the prediction set.
Conclusions:
Compared with existing binary classification benchmarks, which often focus on a single feature modality, this study provides a novel and effective approach for clinical applications. The findings, drawn from a relatively large sample, and the analytical workflow can supplement and assist clinical decision support for current COVID-19 care and for other clinical applications with high-dimensional multimodal biomedical features.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.