Accepted for/Published in: JMIR Infodemiology
Date Submitted: Jan 31, 2022
Date Accepted: Sep 15, 2022
Using Reddit data to investigate perspectives on the COVID-19 pandemic using natural language processing: a comparative study of the US, the UK, Canada and Australia
ABSTRACT
Background:
Since the World Health Organization (WHO) declared COVID-19 a pandemic on March 11, 2020, the disease has had an unprecedented impact worldwide, with more than 276 million confirmed cases and 5.3 million deaths as of December 21, 2021 [1]. Social media platforms such as Reddit can serve as a resource for enhancing situational awareness, particularly for monitoring public attitudes and behavior during the crisis. The insights gained can then be used to support communication and health promotion messaging.
Objective:
With this work, we compare public attitudes towards the 2020/2021 COVID-19 pandemic across four predominantly English-speaking countries (the United States, the United Kingdom, Canada, and Australia) using data derived from the social media platform Reddit.
Methods:
We utilized a natural language processing method called topic modeling (more specifically, latent Dirichlet allocation). Topic modeling is a popular unsupervised learning technique that can be used to automatically infer topics (i.e., semantically related categories) from a large corpus of text. We derived our data from six country-specific, COVID-19-related subreddits (r/CoronavirusAustralia, r/CoronavirusDownunder, r/CoronavirusCanada, r/CanadaCoronavirus, r/CoronavirusUK, r/coronavirusus). We then used topic modeling to investigate and compare the topics of concern for each country.
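As an illustration of how latent Dirichlet allocation infers topics from text, the following is a minimal sketch using collapsed Gibbs sampling in plain Python. It is not the authors' pipeline: the abstract does not state which implementation, number of topics, hyperparameters, or preprocessing steps were used, so the function names (`lda_gibbs`, `top_words`), the values of `alpha` and `beta`, and the toy posts below are all assumptions made for illustration.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; docs is a list of token lists.

    alpha and beta are illustrative Dirichlet hyperparameters (assumed values).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]                 # tokens in doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # times word w is assigned to topic k
    topic_total = [0] * num_topics                               # total tokens assigned to topic k

    # Assign every token to a random topic to start.
    assignments = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Resample the topic and restore the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    return vocab, doc_topic, topic_word


def top_words(vocab, topic_word, n=3):
    """Return the n most frequently assigned words for each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]


# Hypothetical, already-tokenized posts (not taken from the study data).
toy_posts = [
    "lockdown rules extended lockdown restrictions".split(),
    "vaccine rollout vaccine appointment booking".split(),
    "restrictions eased lockdown ends rules".split(),
    "booking vaccine appointment rollout delayed".split(),
]

vocab, doc_topic, topic_word = lda_gibbs(toy_posts, num_topics=2)
for k, words in enumerate(top_words(vocab, topic_word)):
    print(f"topic {k}: {words}")
```

In practice, a study of this kind would more likely rely on an established library implementation and would choose the number of topics empirically. The sketch only shows the core idea: each word in each post is repeatedly reassigned to a topic in proportion to how common that topic is in the post and how common the word is in the topic.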
Results:
From the Reddit data, we found that (1) the volume of posting declined consistently across all four countries during the study period (February 2020 to November 2020); (2) the volume of posts peaked during lockdown events; and (3) the UK and Australian subreddits contained much more policy discussion, and less conspiratorial content, than the US or Canadian subreddits.
Conclusions:
This work demonstrated that (a) there were key differences between salient topics discussed across the four countries, and (b) Reddit data has the potential to provide insights not readily apparent in survey-based approaches.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.