Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: May 14, 2020
Date Accepted: Jul 24, 2020
Date Submitted to PubMed: Sep 22, 2020

The final, peer-reviewed published version of this preprint can be found here:

Real-Time Forecasting of the COVID-19 Outbreak in Chinese Provinces: Machine Learning Approach Using Novel Digital Data and Estimates From Mechanistic Models

Poirier C, Liu D, Clemente L, Ding X, Chinazzi M, Davis J, Vespignani A, Santillana M

J Med Internet Res 2020;22(8):e20285

DOI: 10.2196/20285

PMID: 32730217

PMCID: 7459435

A machine learning method to forecast in real-time the COVID-19 outbreak in Chinese provinces using novel digital data and estimates from mechanistic models.

  • Canelle Poirier
  • Dianbo Liu
  • Leonardo Clemente
  • Xiyu Ding
  • Matteo Chinazzi
  • Jessica Davis
  • Alessandro Vespignani
  • Mauricio Santillana

ABSTRACT

Background:

The inherent difficulty of identifying and monitoring emerging outbreaks caused by novel pathogens can lead to their rapid spread; if left unchecked, they may become major public health threats to the planet. The ongoing COVID-19 outbreak, which has infected over 2,300,000 individuals and caused over 150,000 deaths, is one such catastrophic event.

Methods:

We present a timely and novel methodology that combines disease estimates from mechanistic models with digital traces, via interpretable machine learning methodologies, to reliably forecast COVID-19 activity in Chinese provinces in real time. Specifically, our method uses as inputs (a) official health reports, (b) COVID-19-related internet search activity, (c) news media activity, and (d) daily forecasts of COVID-19 activity from a metapopulation mechanistic model. Our machine learning methodology uses a clustering technique that enables the exploitation of geospatial synchronicities of COVID-19 activity across Chinese provinces, and a data augmentation technique to deal with the small number of historical disease observations characteristic of emerging outbreaks.
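The abstract does not say which clustering or data augmentation techniques are used, so the following is only a minimal sketch of the two ideas under stated assumptions: regions are grouped greedily by the Pearson correlation of their case-count series, and a small training set is enlarged by bootstrap resampling with multiplicative noise. Both techniques are stand-ins chosen for illustration; the function names, the threshold, the noise level, and the toy numbers are all hypothetical and are not taken from the paper.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no shape information
    return cov / (var_x * var_y) ** 0.5


def cluster_by_correlation(series, threshold=0.9):
    """Greedy grouping (an assumed stand-in, not the paper's method).

    A region joins the first cluster whose seed series it correlates
    with at or above `threshold`; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster is a list of names; element 0 is the seed
    for name, values in series.items():
        for members in clusters:
            if pearson(values, series[members[0]]) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def augment(rows, n_extra, noise=0.05, seed=0):
    """Bootstrap (features, target) rows and jitter the features.

    Each extra row is drawn with replacement from `rows`, and every
    feature is scaled by a random factor in [1 - noise, 1 + noise].
    """
    rng = random.Random(seed)
    extra = []
    for _ in range(n_extra):
        features, target = rng.choice(rows)
        jittered = [f * (1 + rng.uniform(-noise, noise)) for f in features]
        extra.append((jittered, target))
    return rows + extra


# Toy, made-up series (not real case counts): A and B rise together, C falls.
toy_series = {
    "A": [1, 2, 4, 8, 16],
    "B": [2, 4, 8, 16, 32],
    "C": [16, 8, 4, 2, 1],
}
toy_clusters = cluster_by_correlation(toy_series)

# Toy training rows: ([feature_1, feature_2], target).
toy_rows = [([1.0, 2.0], 3.0), ([2.0, 3.0], 5.0)]
toy_augmented = augment(toy_rows, n_extra=4)
```

In a setup like this, a forecasting model for one region could be trained on the pooled, augmented rows of every region in its cluster, which is one way to compensate for the short history available early in an outbreak.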

Results:

Our model is able to produce stable and accurate forecasts two days ahead of the current time, and outperforms a collection of baseline models in 27 of the 32 Chinese provinces.

Conclusions:

Our methodology could be easily extended to other geographies currently affected by the COVID-19 outbreak to help decision makers.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.