Accepted for/Published in: JMIR Public Health and Surveillance
Date Submitted: Mar 12, 2020
Date Accepted: Apr 14, 2020
Date Submitted to PubMed: Apr 15, 2020
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Data Mining and Content Analysis of Chinese Social Media Platform Weibo During Early COVID-19 Outbreak: A Retrospective Observational Infoveillance Study
ABSTRACT
Background:
Coronavirus disease 2019 (COVID-19), which originated in Wuhan, China, in December 2019, is a rapidly spreading outbreak with over 100,000 cases globally as of early March 2020. Infoveillance approaches using social media can help characterize disease distribution and public knowledge, attitudes, and behaviors during outbreaks.
Objective:
To evaluate the association between the number of Chinese social media posts and the number of cases reported in Wuhan City during the early stages of the COVID-19 outbreak.
Methods:
Chinese-language messages from Wuhan were collected on the Chinese microblogging site Weibo over 39 days, from December 23, 2019, to January 30, 2020. Total daily cases of COVID-19 in China were obtained from the Chinese National Health Commission. Linear regression was used to assess whether the number of social media posts could predict the number of reported cases. A qualitative review of the posts was conducted to identify the predominant COVID-19-related user-generated themes.
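As an illustration of the regression step described in the Methods, the following is a minimal sketch of an ordinary least squares fit of daily reported cases against daily post counts. The abstract does not specify the exact model or software the authors used, so the function name `fit_line` and all values below are assumptions chosen for illustration; the numbers are synthetic and are not the study's data.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Hypothetical helper for illustration only; not the authors' code.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic placeholder values, NOT the study's data.
daily_posts = [0, 100, 200, 300, 400]
daily_cases = [10, 60, 110, 160, 210]

slope, intercept = fit_line(daily_posts, daily_cases)
print(f"cases per additional post: {slope:.2f}, intercept: {intercept:.1f}")
```

In this framing the slope is read as the expected change in reported cases per additional post; a significance test on the slope would need a statistics package and is not shown here.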
Results:
A total of 115,299 Weibo posts were obtained, with an average of 2,956 posts per day (min 0; max 13,587). Regression showed a significant positive relationship between the number of posts and the number of reported cases, both within China and within Hubei province, with approximately 10 more COVID-19 cases per 25 social media posts (p < 0.001) and 10 more cases per 40 social media posts (p < 0.001), respectively. Early outbreak themes were characterized by public uncertainty regarding the risks posed by COVID-19, including posts exhibiting both protective and higher-risk behavior.
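The figures reported in the Results can be restated as per-post rates with simple arithmetic. The short sketch below only rearranges numbers given in the abstract; the variable names are illustrative, and treating "10 more cases per 25 posts" as a regression slope of 10/25 is an interpretation rather than something the abstract states outright.

```python
# Figures taken from the abstract.
total_posts = 115_299
collection_days = 39

# Average posts per day, as reported (about 2,956).
mean_posts_per_day = total_posts / collection_days

# "10 more cases per 25 posts" and "10 more cases per 40 posts",
# read here as slopes in cases per additional post (an interpretation).
slope_china = 10 / 25
slope_hubei = 10 / 40

print(f"mean posts per day: {mean_posts_per_day:.0f}")
print(f"China: {slope_china} cases per post")
print(f"Hubei: {slope_hubei} cases per post")
```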
Conclusions:
The results of this study provide initial insight into the origins of the COVID-19 outbreak based on quantitative and qualitative analysis of Chinese social media data. Future studies should continue to explore the utility of social media data for predicting COVID-19 disease severity, public reaction, and the effectiveness of outbreak communication.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.