Accepted for/Published in: JMIR Infodemiology

Date Submitted: Jul 23, 2023
Date Accepted: Jun 18, 2024

The final, peer-reviewed published version of this preprint can be found here:

The Use of Natural Language Processing Methods in Reddit to Investigate Opioid Use: Scoping Review

Almeida A, Patton T, Conway M, Gupta A, Strathdee SA, Bórquez A

The Use of Natural Language Processing Methods in Reddit to Investigate Opioid Use: Scoping Review

JMIR Infodemiology 2024;4:e51156

DOI: 10.2196/51156

PMID: 39269743

PMCID: 11437337

The use of natural language processing methods in Reddit to investigate opioid use: a scoping review

  • Alexandra Almeida
  • Thomas Patton
  • Mike Conway
  • Amarnath Gupta
  • Steffanie A Strathdee
  • Annick Bórquez

ABSTRACT

Background:

The growing availability of big data spontaneously generated on social media platforms makes natural language processing (NLP) methods valuable tools for understanding the opioid crisis.

Objective:

We aimed to understand how NLP has been applied to Reddit data to study opioid use.

Methods:

We systematically searched for peer-reviewed studies and conference abstracts in the PubMed, Scopus, PsycINFO, ACL Anthology, IEEE, and ACM data repositories through July 19, 2022. Studies were included if they (i) investigated opioid use, (ii) used NLP techniques to analyze the textual corpora, and (iii) used Reddit as the social media data source. We were specifically interested in mapping the studies’ (a) overarching goals and findings, (b) methodologies and software used, and (c) main limitations.

Results:

Thirty studies were included and classified into four non-mutually exclusive “overarching goal” categories: methodological (six studies), infodemiology (twenty-two studies), infoveillance (seven studies), and pharmacovigilance (three studies). NLP methods were used to identify content relevant to opioid use among vast quantities of textual data, to establish potential relationships between opioid use patterns/profiles and contextual factors or comorbidities, and to anticipate individuals’ transitions between different opioid-related subreddits, likely revealing progression through opioid use stages. The most common approaches, which were not mutually exclusive, were embedding techniques (12/30), prediction/classification (12/30), topic modeling (9/30), and sentiment analysis (6/30). The most frequently used programming languages were Python (20/30) and R (2/30). The most cited limitation was the inability to verify whether posts originated from the population of interest (i.e., people using opioids). The papers were very recent (28/30 were published between 2019 and 2022), with authors from a range of disciplines.
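As a rough illustration of the first application mentioned above (identifying content relevant to opioid use within a larger body of text), the following minimal Python sketch flags posts that mention terms from a small lexicon. It is not the method of any reviewed study: the term list, the function names, the `min_hits` threshold, and the example posts are all hypothetical and chosen only for illustration. The reviewed studies more often relied on embeddings or trained classifiers than on a fixed keyword list.

```python
import re
from collections import Counter

# Hypothetical term list, for illustration only; not taken from the review.
OPIOID_TERMS = {"oxycodone", "heroin", "fentanyl", "methadone", "naloxone"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def opioid_term_counts(text: str) -> Counter:
    """Count how often each term from OPIOID_TERMS occurs in the text."""
    return Counter(tok for tok in tokenize(text) if tok in OPIOID_TERMS)


def is_opioid_relevant(text: str, min_hits: int = 1) -> bool:
    """Flag a post as relevant if it contains at least min_hits lexicon terms."""
    return sum(opioid_term_counts(text).values()) >= min_hits


# Invented example posts (not real Reddit data).
posts = [
    "Day 3 of tapering off methadone and it is rough.",
    "Anyone have a good sourdough recipe?",
    "Carry naloxone. Fentanyl is showing up everywhere.",
]

relevant = [post for post in posts if is_opioid_relevant(post)]
for post in relevant:
    print(post)
```

A keyword filter like this cannot tell whether the author of a post actually uses opioids, which mirrors the most cited limitation noted above.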

Conclusions:

This scoping review identified a wide variety of NLP techniques and applications used to support surveillance and social media interventions addressing the opioid crisis. Despite the clear potential of these methods to enable the identification of opioid-relevant content in Reddit and its analysis, there are limits to the degree of interpretive meaning that they can provide. Moreover, we identified the need for standardized ethical guidelines to govern the utilization of Reddit data to safeguard the anonymity and privacy of people using these forums.

Clinical Trial:

https://osf.io/ftqj3




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.