Accepted for/Published in: JMIR Public Health and Surveillance
Date Submitted: Apr 17, 2020
Date Accepted: Jul 24, 2020
Date Submitted to PubMed: Aug 13, 2020
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Early Stage Prediction of US County Vulnerability to the COVID-19 Pandemic
ABSTRACT
Background:
The rapid spread of COVID-19 means that governments and health service providers have little time to plan and design effective response policies. It is therefore important to rapidly provide accurate predictions of how vulnerable geographic regions, such as counties, are to its spread.
Objective:
To develop a county-level model that predicts near-future COVID-19 occurrences using publicly available data.
Methods:
We estimated county-level COVID-19 occurrences for March 14-31, 2020, using data fused from multiple publicly available sources, including health statistics, demographics, and geographical features. We developed a 3-stage model. First, an XGBoost classifier quantifies the probability of COVID-19 occurrence in currently unaffected counties. Second, an XGBoost regression model estimates the number of potential occurrences in a county. Third, these results are combined to compute a county-level risk, which is then used as an estimate of the county's vulnerability five days ahead.
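How the stage 1 and stage 2 outputs are combined in the third stage is not specified here. A minimal sketch, assuming the risk is the expected number of occurrences (the probability of occurrence multiplied by the predicted count), is shown below. The class `CountyPrediction`, the functions `county_risk` and `rank_by_risk`, and the combination rule itself are hypothetical; the two inputs stand in for stage 1 and stage 2 outputs, which in practice could come from the xgboost scikit-learn interface (`XGBClassifier.predict_proba` and `XGBRegressor.predict`).

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class CountyPrediction:
    """Hypothetical container for one county's stage 1 and stage 2 outputs."""
    fips: str               # county identifier (illustrative)
    p_occurrence: float     # stage 1: classifier probability of occurrence
    predicted_count: float  # stage 2: regression estimate of occurrences


def county_risk(pred: CountyPrediction) -> float:
    """Stage 3 (assumed rule, not confirmed by the source): P(occurrence) * predicted count."""
    if not 0.0 <= pred.p_occurrence <= 1.0:
        raise ValueError("p_occurrence must lie in [0, 1]")
    # Negative regression outputs are clipped to zero, since counts cannot be negative.
    return pred.p_occurrence * max(pred.predicted_count, 0.0)


def rank_by_risk(preds: Iterable[CountyPrediction]) -> List[Tuple[str, float]]:
    """Return (fips, risk) pairs, most vulnerable county first."""
    scored = [(p.fips, county_risk(p)) for p in preds]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Illustrative inputs only; these are not values from the study.
counties = [
    CountyPrediction("A", 0.9, 12.0),
    CountyPrediction("B", 0.2, 40.0),
    CountyPrediction("C", 0.5, 4.0),
]
for fips, risk in rank_by_risk(counties):
    print(fips, round(risk, 2))
```

The product weighs both stages together: a county with a large predicted count but a low occurrence probability (county B) can still rank below one that is very likely to be affected (county A). Other combination rules, such as thresholding the probability before using the count, are equally plausible under the description above.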
Results:
Using data from March 14-31, 2020, the model achieved a sensitivity above 71.5% and a specificity above 94%. We found that population, population density, the percentage of people aged 70 years or older, and the prevalence of comorbidities play an important role in predicting COVID-19 occurrences. We also found a positive association between affected counties and urban counties, and between less vulnerable counties and rural counties.
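Sensitivity and specificity follow the standard definitions, TP/(TP+FN) and TN/(TN+FP), where a positive is a county with COVID-19 occurrences. The short sketch below computes both from binary labels; the function names and the labels are made up for illustration and are not the study's data.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Count (TP, FN, TN, FP), where True means 'county has occurrences'."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    return tp, fn, tn, fp


def sensitivity(tp: int, fn: int) -> float:
    """True-positive rate: share of affected counties the model flags."""
    return tp / (tp + fn) if tp + fn else float("nan")


def specificity(tn: int, fp: int) -> float:
    """True-negative rate: share of unaffected counties the model clears."""
    return tn / (tn + fp) if tn + fp else float("nan")


# Made-up example: 4 affected and 6 unaffected counties.
actual = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, False, False, False, False, False, True]
tp, fn, tn, fp = confusion_counts(actual, predicted)
print(f"sensitivity={sensitivity(tp, fn):.2f}, specificity={specificity(tn, fp):.3f}")
# prints "sensitivity=0.75, specificity=0.833"
```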
Conclusions:
The developed model can be used to identify vulnerable counties and potential data discrepancies. Limited testing facilities and delayed results introduce significant variation in reported cases, which biases the model.
Clinical Trial: Not Applicable
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.