Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Jan 11, 2021
Date Accepted: Apr 17, 2021
Date Submitted to PubMed: Apr 21, 2021

The final, peer-reviewed published version of this preprint can be found here:

Mining and Validating Social Media Data for COVID-19–Related Human Behaviors Between January and July 2020: Infodemiology Study

Daughton AR, Shelley CD, Barnard M, Gerts D, Watson Ross C, Crooker I, Nadiga G, Mukundan N, Vaquera Chavez NY, Fairchild G

J Med Internet Res 2021;23(5):e27059

DOI: 10.2196/27059

PMID: 33882015

PMCID: 8153035

Mining and Validating Social Media for COVID-19-Related Human Behaviors between January and July 2020: An Infodemiology Study

  • Ashlynn R. Daughton
  • Courtney D. Shelley
  • Martha Barnard
  • Dax Gerts
  • Chrysm Watson Ross
  • Isabel Crooker
  • Gopal Nadiga
  • Nilesh Mukundan
  • Nidia Yadria Vaquera Chavez
  • Geoffrey Fairchild

ABSTRACT

Background:

Health authorities can minimize the impact of an emergent infectious disease outbreak through effective and timely risk communication, which can build trust and promote adherence to subsequent behavioral messaging. Monitoring the psychological impacts of an outbreak, as well as public adherence to such messaging, is also important for minimizing the long-term effects of an outbreak.

Objective:

We used social media data to identify human behaviors relevant to COVID-19 transmission, and the perceived impacts of COVID-19 on individuals, as a first step toward real-time monitoring of public perceptions to inform public health communications.

Methods:

We develop a coding schema with 6 categories and 11 subcategories that covers a wide range of behaviors as well as codes focused on the impacts of the pandemic (e.g., economic and mental health impacts). We use this schema to label training data and build supervised learning classifiers for classes with sufficient labels. Classifiers that perform adequately are applied to our remaining corpus, and temporal and geospatial trends are assessed. We compare the classified patterns to ground truth mobility data and confirmed COVID-19 cases to assess the signal obtained.
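As an illustration of the comparison step, the following is a minimal sketch of a Pearson correlation between a daily Twitter signal and a mobility index. The function name `pearson`, the variable names, and all data values are assumptions made up for this example; the abstract does not state which correlation measure or data layout the authors used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical values, not taken from the study:
# share of tweets classified as social distancing per day ...
tweet_share = [0.01, 0.02, 0.05, 0.08, 0.07, 0.06]
# ... and a mobility index relative to a baseline of 1.0 on the same days.
mobility_index = [1.00, 0.98, 0.80, 0.60, 0.62, 0.70]

r = pearson(tweet_share, mobility_index)
print(f"correlation: {r:.2f}")
```

A negative value here would mean that more social distancing discussion coincides with less movement, which is the direction of the relationship the abstract reports.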

Results:

We apply our labeling schema to ~7200 tweets. The worst-performing classifiers, which try to identify tweets about monitoring symptoms and testing, have F1 scores of only 0.18-0.28. Classifiers for social distancing, however, are much stronger, with F1 scores of 0.64-0.66. We apply the social distancing classifiers to over 228 million tweets. We show temporal patterns consistent with real-world events and correlations of up to -0.5 between social distancing signals on Twitter and ground truth mobility throughout the United States.
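For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below computes it from confusion counts; the function name `f1_score` and the counts are hypothetical, chosen only so that the result lands in the range reported for the social distancing classifiers.

```python
def f1_score(true_pos, false_pos, false_neg):
    """F1 score (harmonic mean of precision and recall) from confusion counts."""
    if true_pos == 0:
        # No correct positive predictions: F1 is conventionally taken as 0.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, not taken from the study.
example_f1 = f1_score(true_pos=66, false_pos=34, false_neg=34)
print(f"F1: {example_f1:.2f}")
```

Because F1 ignores true negatives, it is a common choice when the class of interest is rare, as is typical for tweets about one specific behavior.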

Conclusions:

Behaviors discussed on Twitter are exceptionally varied. Twitter can provide useful information for parameterizing models that incorporate human behavior, and it can inform public health communication strategies by describing awareness of and compliance with suggested behaviors.



© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.