Accepted for/Published in: JMIR Infodemiology
Date Submitted: Oct 8, 2024
Open Peer Review Period: Oct 10, 2024 - Dec 5, 2024
Date Accepted: May 27, 2025
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Differential Analysis of Age, Gender, Race, Sentiment, and Emotion in Substance Use Discourse on Twitter during the COVID-19 Pandemic: An NLP Approach
ABSTRACT
Background:
User demographics are often hidden in social media data because of privacy concerns. However, demographic information on substance use can provide valuable insights, allowing public health policymakers to focus on specific cohorts and develop efficient prevention strategies, especially during global crises such as COVID-19.
Objective:
Our study aims to analyze substance use trends at the user level across demographic dimensions such as age, gender, and race/ethnicity, with a focus on the COVID-19 pandemic. The study also establishes a baseline for substance use trends using social media data.
Methods:
The study was carried out on large-scale English-language Twitter data spanning 3 years (2019, 2020, and 2021) and comprising 1.05 billion posts. Following preprocessing, substance use posts were identified using our custom-trained deep learning model (RoBERTa), which yielded 9 million substance use posts. Demographic attributes (user type, age, gender, and race/ethnicity), along with the sentiment and emotions associated with each post, were then extracted using a collection of natural language processing modules. Finally, various qualitative analyses were performed to gain insight into user behavior across demographic groups.
Results:
The highest level of usership in substance use (SU) discussions was observed in 2020: it was 22.18% higher than in 2019 and 25.24% higher than in 2021. Throughout the study period, male users and teenagers increasingly dominated discussions of all substances. During the pandemic, usership among female users was high for prescription medication compared with other substances. Additionally, alcohol usership increased by 80% within 2 weeks of the global pandemic declaration in 2020.
Conclusions:
Our study presents a large-scale, fine-grained analysis of substance use in social media data by age, gender, and race/ethnicity before, during, and after the COVID-19 pandemic. Overall, our analysis of social media data provides a new baseline for substance use that can help make prevention efforts more efficient.
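The year-over-year usership comparisons reported in the Results can be read as relative differences in the number of distinct users per year. The following is a minimal sketch of that arithmetic, not the authors' pipeline: the function names, the (user_id, year) record layout, and the toy records are all assumptions made for illustration, and none of the numbers come from the study.

```python
from collections import defaultdict


def distinct_users_per_year(posts):
    """Count unique user IDs per year from (user_id, year) pairs."""
    users = defaultdict(set)
    for user_id, year in posts:
        users[year].add(user_id)  # a set ignores repeat posts by one user
    return {year: len(ids) for year, ids in users.items()}


def pct_higher(value, baseline):
    """Percentage by which `value` exceeds `baseline`."""
    return 100.0 * (value - baseline) / baseline


# Toy records invented for illustration only (not study data).
posts = [
    ("u1", 2019), ("u2", 2019), ("u2", 2019), ("u3", 2019), ("u4", 2019),
    ("u1", 2020), ("u2", 2020), ("u3", 2020), ("u5", 2020), ("u6", 2020),
    ("u1", 2021), ("u5", 2021), ("u5", 2021), ("u6", 2021), ("u7", 2021),
]

counts = distinct_users_per_year(posts)
vs_2019 = pct_higher(counts[2020], counts[2019])
vs_2021 = pct_higher(counts[2020], counts[2021])
print(f"2020 vs 2019: {vs_2019:.2f}% higher")
print(f"2020 vs 2021: {vs_2021:.2f}% higher")
```

Counting distinct users rather than posts keeps a few highly active accounts from inflating a year's figure, which matches the user-level framing of the Objective.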
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.