Accepted for/Published in: JMIR Public Health and Surveillance

Date Submitted: Apr 17, 2020
Date Accepted: Jul 24, 2020
Date Submitted to PubMed: Aug 13, 2020

The final, peer-reviewed published version of this preprint can be found here:

Early Stage Machine Learning–Based Prediction of US County Vulnerability to the COVID-19 Pandemic: Machine Learning Approach

Mehta M, Julaiti J, Griffin P, Kumara S

JMIR Public Health Surveill 2020;6(3):e19446

DOI: 10.2196/19446

PMID: 32784193

PMCID: 7490002

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Early Stage Prediction of US County Vulnerability to the COVID-19 Pandemic

  • Mehir Mehta
  • Juxihong Julaiti
  • Paul Griffin
  • Soundar Kumara

ABSTRACT

Background:

The rapid spread of COVID-19 means that governments and health service providers have little time to plan and design effective response policies. It is therefore important to rapidly provide accurate predictions of how vulnerable geographic regions, such as counties, are to the spread.

Objective:

To develop county-level predictions of near-future COVID-19 occurrences using publicly available data.

Methods:

We estimated county-level COVID-19 occurrences using data from March 14-31, 2020, fused from multiple publicly available sources, including health statistics, demographics, and geographical features. We developed a 3-stage model. First, an XGBoost classifier estimates the probability of COVID-19 occurrence for unaffected counties. Second, an XGBoost regression estimates the number of potential occurrences in a county. Third, these two results are combined to compute the county-level risk. This risk is then used as the estimated vulnerability of the county five days ahead.
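The third stage can be illustrated with a minimal sketch. The abstract does not state how the two model outputs are combined, so the product used here (probability of occurrence times predicted count) is an assumption, as are the names `CountyPrediction`, `county_risk`, and `rank_by_risk` and the made-up county identifiers and values. The stage 1 and stage 2 outputs are supplied as plain numbers rather than produced by XGBoost.

```python
from dataclasses import dataclass


@dataclass
class CountyPrediction:
    fips: str              # county identifier (hypothetical field name)
    p_occurrence: float    # stage 1 output: probability of any occurrence
    expected_cases: float  # stage 2 output: predicted number of occurrences


def county_risk(pred: CountyPrediction) -> float:
    """Stage 3 (assumed form): probability-weighted predicted count."""
    if not 0.0 <= pred.p_occurrence <= 1.0:
        raise ValueError("p_occurrence must be in [0, 1]")
    # Clip negative regression outputs, since a count cannot be negative.
    return pred.p_occurrence * max(pred.expected_cases, 0.0)


def rank_by_risk(preds):
    """Return predictions sorted from highest to lowest risk."""
    return sorted(preds, key=county_risk, reverse=True)


# Made-up illustrative values, not results from the paper.
preds = [
    CountyPrediction("00001", 0.90, 40.0),
    CountyPrediction("00002", 0.20, 5.0),
    CountyPrediction("00003", 0.60, 100.0),
]
ranking = [p.fips for p in rank_by_risk(preds)]
```

Under this assumed rule, a county with a moderate probability but a large predicted count can rank above a county that is almost certain to report only a few occurrences.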

Results:

Using data from March 14-31, 2020, the model achieved a sensitivity above 71.5% and a specificity above 94%. Population, population density, the percentage of people aged 70 years or older, and the prevalence of comorbidities were important predictors of COVID-19 occurrences. Affected counties were positively associated with urban counties, and less vulnerable counties with rural counties.
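For reference, sensitivity and specificity follow the standard definitions TP / (TP + FN) and TN / (TN + FP). The sketch below computes both from binary labels, where 1 marks an affected county. The function name and the toy labels are illustrative assumptions and are not data from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = affected)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

With these toy labels, three of the four affected counties and five of the six unaffected counties are classified correctly.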

Conclusions:

The developed model can be used to identify vulnerable counties and potential data discrepancies. Limited testing facilities and delayed results introduce significant variation in reported cases and produce a bias in the model.

Clinical Trial: Not Applicable



© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.