Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Dec 31, 2024
Open Peer Review Period: Dec 31, 2024 - Feb 25, 2025
Date Accepted: Jul 15, 2025
Analyzing Reddit Social Media Content in the United States Related to H5N1: A Sentiment and Topic Modeling Study
ABSTRACT
Background:
The H5N1 avian influenza A virus represents a serious threat to both animal and human health, with the potential to escalate into a global pandemic. Effective monitoring of social media during H5N1 avian influenza outbreaks could offer critical insights to guide public health strategies. Social media platforms like Reddit, with their diverse and region-specific communities, provide a rich source of data that can reveal collective attitudes, concerns, and behavioral trends in real time.
Objective:
This study aims to analyze Reddit posts from state-specific Subreddits in the United States during the most recent outbreak period (2022 to 2024) to (1) assess the sentiments expressed as the H5N1 outbreak progressed, (2) identify the predominant topics discussed, particularly those corresponding to negative sentiments, and (3) explore correlations between these sentiments or topics and the severity and spread of the outbreak in the respective regions.
Methods:
Over 2,000 Reddit posts from 160 Subreddits across 11 highly impacted states from February 2022 to July 2024 were collected. Outbreak data comprising almost 600 entries were obtained from the USDA database. Sentiment classification was performed using a fine-tuned BERT Base model, and posts were categorized into six emotions: anger, fear, joy, love, sadness, and surprise, with a seventh “neutral” category added for low-confidence classifications. Topic modeling was conducted using BERTopic and LDA models. Statistical analyses included calculating correlations between sentiment intensity and outbreak severity levels, and applying the Mann-Whitney U test to assess differences between sentiment categories.
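The low-confidence "neutral" fallback described above can be illustrated with a minimal sketch. It assumes the classifier returns one probability per emotion; the function name `label_with_neutral` and the 0.5 cutoff are hypothetical, since the abstract does not state the threshold that was used.

```python
# Hypothetical sketch: the threshold value and function name are assumptions,
# not details taken from the study.
EMOTIONS = ("anger", "fear", "joy", "love", "sadness", "surprise")


def label_with_neutral(scores, threshold=0.5):
    """Return the top-scoring emotion, or "neutral" if its score is too low.

    scores: dict mapping each emotion label to a probability.
    threshold: assumed confidence cutoff below which a post is "neutral".
    """
    unknown = set(scores) - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    if not scores:
        raise ValueError("scores must not be empty")
    # Pick the label with the highest probability.
    emotion, confidence = max(scores.items(), key=lambda item: item[1])
    return emotion if confidence >= threshold else "neutral"


confident = {"anger": 0.05, "fear": 0.80, "joy": 0.02,
             "love": 0.01, "sadness": 0.10, "surprise": 0.02}
uncertain = {"anger": 0.20, "fear": 0.25, "joy": 0.15,
             "love": 0.10, "sadness": 0.20, "surprise": 0.10}
print(label_with_neutral(confident))
print(label_with_neutral(uncertain))
```

A higher cutoff would move more posts into the "neutral" category, so the reported share of each emotion depends on this choice.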
Results:
The findings showed that H5N1 outbreaks occurred in waves, with significant surges followed by lulls. States such as Minnesota and Iowa were most affected, exhibiting high case counts and numerous outbreaks over time. Sentiment analysis revealed that negative emotions (“sadness,” “anger,” and “fear”) dominated discussions, comprising about 90% of posts. When accounting for a three-week delay in reactions to outbreak severity changes, weak positive correlations emerged between outbreak severity and the intensity levels of “anger,” “sadness,” and “joy” sentiments. The sentiment of “fear” showed a modest immediate correlation without temporal adjustment. Topic modeling highlighted concerns about the virus spreading among bird populations, rising egg prices due to poultry shortages, and the economic impact of culling policies.
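The three-week delay mentioned above amounts to a lagged correlation: outbreak severity in week t is paired with sentiment intensity in week t + 3. The sketch below is illustrative only. It assumes both series are aggregated weekly and uses the Pearson coefficient; the abstract specifies neither, and the function names and sample numbers are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(severity, sentiment, lag_weeks=3):
    """Correlate severity in week t with sentiment in week t + lag_weeks."""
    if lag_weeks < 0:
        raise ValueError("lag_weeks must be non-negative")
    if lag_weeks == 0:
        return pearson(severity, sentiment)
    # Drop the last lag_weeks of severity and the first lag_weeks of sentiment
    # so that each remaining pair is offset by exactly lag_weeks.
    return pearson(severity[:-lag_weeks], sentiment[lag_weeks:])


# Made-up weekly values in which sentiment follows severity three weeks later.
severity = [1, 4, 2, 8, 5, 7, 3, 6]
sentiment = [0, 0, 0] + [2 * s + 1 for s in severity[:-3]]
r_lagged = lagged_correlation(severity, sentiment, lag_weeks=3)
r_immediate = lagged_correlation(severity, sentiment, lag_weeks=0)
print(round(r_lagged, 3), round(r_immediate, 3))
```

Comparing the lagged value with the unlagged one (`lag_weeks=0`) mirrors the contrast drawn in the results between delayed and immediate reactions.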
Conclusions:
Overall, these results underscore the critical role of social media analysis in understanding public reactions, including prevalent themes and sentiments, and guiding timely, targeted public health interventions during the H5N1 outbreak.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.