Accepted for/Published in: JMIR Formative Research

Date Submitted: Nov 9, 2023
Date Accepted: Apr 1, 2024

The final, peer-reviewed published version of this preprint can be found here:

Yuan Y, Kasson E, Taylor J, Cavazos-Rehg P, De Choudhury M, Aledavood T

Examining the Gateway Hypothesis and Mapping Substance Use Pathways on Social Media: Machine Learning Approach

JMIR Form Res 2024;8:e54433

DOI: 10.2196/54433

PMID: 38713904

PMCID: 11109860

Examining the Gateway Hypothesis and Mapping Substance Use Pathways on Social Media: A Machine Learning Approach

  • Yunhao Yuan
  • Erin Kasson
  • Jordan Taylor
  • Patricia Cavazos-Rehg
  • Munmun De Choudhury
  • Talayeh Aledavood

ABSTRACT

Background:

Substance misuse presents significant global public health challenges. Understanding transitions between substance types and the timing of shifts to polysubstance use is vital for targeted prevention, harm reduction, and recovery strategies. The longstanding gateway hypothesis suggests that high-risk substance use is preceded by lower-risk substance use. However, the source of this correlation is hotly contested: while some claim that low-risk substance use causes subsequent, riskier substance use, most users of low-risk substances do not escalate to higher-risk substances. Social media data hold the potential to shed light on the factors contributing to substance use transitions.

Objective:

Using social media data, this study aims to gain a better understanding of substance use pathways. By identifying and analyzing individuals' transitions between different risk levels of substance use, we seek specific linguistic cues in their social media posts that may indicate escalating or de-escalating patterns of substance use.

Methods:

We conducted a large-scale analysis of Reddit data collected between 2015 and 2019, consisting of over 2.29 million posts and approximately 29.37 million comments by around 1.4 million users of substance use subreddits. From these data, we created a risk transition dataset reflecting the substance use behaviors of these users. We deployed deep learning and machine learning techniques, including fine-tuned BERT and RoBERTa models, to predict escalation or de-escalation in risk levels based on the initial transition phases documented in posts and comments. Additionally, we conducted an extensive linguistic analysis of the language patterns associated with transitions in substance use, emphasizing the role of n-gram features in predicting future risk trajectories.
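As a rough illustration of the two ingredients named in the Methods, the sketch below labels risk transitions in one user's posting timeline and counts n-grams in a post. It is not the authors' pipeline: the subreddit names, the `RISK_LEVEL` mapping, and both function names are hypothetical, and the abstract does not state how risk levels were assigned or how text was tokenized.

```python
from collections import Counter

# Hypothetical mapping from subreddit to risk level (illustrative only;
# the abstract does not give the mapping used in the study).
RISK_LEVEL = {"low_risk_sub": 1, "mid_risk_sub": 2, "high_risk_sub": 3}


def label_transitions(timeline):
    """Label each change of risk level in a user's timeline.

    timeline: iterable of (timestamp, subreddit) pairs in any order.
    Returns a list of "escalation" / "de-escalation" labels, one per change.
    Subreddits missing from RISK_LEVEL are ignored.
    """
    labels = []
    previous = None
    for _, subreddit in sorted(timeline):
        level = RISK_LEVEL.get(subreddit)
        if level is None:
            continue
        if previous is not None and level != previous:
            labels.append("escalation" if level > previous else "de-escalation")
        previous = level
    return labels


def ngram_counts(text, n=2):
    """Count word n-grams using naive lowercase whitespace tokenization."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


timeline = [(3, "high_risk_sub"), (1, "low_risk_sub"), (2, "low_risk_sub"),
            (4, "mid_risk_sub"), (5, "unrelated_sub")]
print(label_transitions(timeline))  # ['escalation', 'de-escalation']
```

In a setup like this, the text a user posted before a transition would supply the n-gram features, and the transition label would be the prediction target.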

Results:

Our models showed promise in predicting escalation or de-escalation in risk levels from the historical data that Reddit users created during initial transition phases across drug-related subreddits, achieving an accuracy of 78.48% and an F1-score of 79.20%. We highlighted vital predictive features, such as specific substance names and tools, that are indicative of future risk escalations. Our linguistic analysis showed that terms linked with harm reduction strategies were instrumental in signaling de-escalation, whereas descriptors of frequent substance use were characteristic of escalating transitions.
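For readers unfamiliar with the two reported metrics, the following minimal sketch computes accuracy and a binary F1-score from true and predicted transition labels. Treating "escalation" as the positive class is an assumption made here for illustration; the abstract does not say which class was positive or whether the F1-score was averaged across classes. The function name and label strings are likewise hypothetical.

```python
def accuracy_and_f1(y_true, y_pred, positive="escalation"):
    """Return (accuracy, F1) for binary labels, with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, f1


y_true = ["escalation", "escalation", "de-escalation", "de-escalation"]
y_pred = ["escalation", "de-escalation", "de-escalation", "escalation"]
print(accuracy_and_f1(y_true, y_pred))  # (0.5, 0.5)
```

Accuracy counts all correct predictions equally, while F1 balances precision and recall for the positive class, which is why the two figures can differ on imbalanced data.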

Conclusions:

This study sheds light on the complexities surrounding the gateway hypothesis of substance use through an examination of online behavior on Reddit. While certain findings validate the hypothesis, indicating a progression from lower-risk substances like marijuana to higher-risk ones, a significant number of individuals did not show this transition. The research underscores the potential of using machine learning in conjunction with social media analysis for predicting substance use transitions. Our results emphasize the role of linguistic features as predictors and the importance of timely interventions.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.