Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: May 9, 2021
Open Peer Review Period: May 7, 2021 - Jul 2, 2021
Date Accepted: Nov 21, 2021

The final, peer-reviewed published version of this preprint can be found here:

Identifying Electronic Nicotine Delivery System Brands and Flavors on Instagram: Natural Language Processing Analysis

Chew R, Wenger M, Guillory J, Nonnemaker J, Kim A

J Med Internet Res 2022;24(1):e30257

DOI: 10.2196/30257

PMID: 35040793

PMCID: 8808345

Identifying Electronic Nicotine Delivery Systems Brands and Flavors on Instagram: A Natural Language Processing Analysis

  • Rob Chew
  • Michael Wenger
  • Jamie Guillory
  • James Nonnemaker
  • Annice Kim

ABSTRACT

Background:

Electronic nicotine delivery systems (ENDS) brands, such as JUUL, used social media as a key component of their marketing strategy, which led to massive sales growth from 2015 to 2018. During this time, ENDS use rapidly increased among youth and young adults, with flavored products being particularly popular among these groups.

Objective:

The objective of our study was to develop a named entity recognition (NER) model to identify potential emerging vaping brands and flavors from Instagram post text. NER is a natural language processing task for identifying specific types of words (entities) in text, based on characteristics of the entity and surrounding words.
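As a rough illustration of what an NER model outputs, the sketch below decodes per-token labels into brand and flavor entities. The BIO tagging scheme (B- begins an entity, I- continues it, O is outside any entity), the function name `extract_entities`, and the example post with the made-up brand "CloudCo" are all assumptions for illustration; the abstract does not state which labeling scheme the authors used.

```python
from typing import List, Optional, Tuple


def extract_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity_type, entity_text) pairs from BIO-tagged tokens."""
    entities: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; flush any entity still open.
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
        else:
            # "O" tag (or a stray I- tag): close any open entity.
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []

    if current_tokens:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


# Hypothetical post text and labels, not taken from the study data.
tokens = ["Loving", "the", "new", "CloudCo", "mango", "ice", "pods"]
tags = ["O", "O", "O", "B-BRAND", "B-FLAVOR", "I-FLAVOR", "O"]
entities = extract_entities(tokens, tags)
```

Here the model's job is to predict the `tags` sequence from the tokens and their context; the decoding step simply groups adjacent tagged tokens into entity strings.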

Methods:

NER models were trained on a labeled data set of 2,272 Instagram posts coded for ENDS brands and flavors. We employed two types of NER models, conditional random fields (CRF) and a residual convolutional neural network (RCNN), to identify brands and flavors in Instagram posts, with precision, recall, and F1 score as the key model outcomes. We used data from Nielsen scanner sales and Wikipedia to create benchmark ENDS brand lists and determine whether brands from these established lists were mentioned in the Instagram posts in our sample. To guard against overfitting, we performed 5-fold cross-validation and report the mean and standard deviation of the model validation metrics across the folds.
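The evaluation described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes entities are compared as exact (post, text) matches, that the benchmark is a case-insensitive whole-word lookup against a brand list, and that the reported SD is the sample standard deviation. All names, posts, brands, and fold scores below are made up.

```python
import re
from statistics import mean, stdev
from typing import Dict, Iterable, List, Set, Tuple

Entity = Tuple[int, str]  # (post_id, lowercased entity text); an assumed format


def match_brand_list(post_id: int, text: str, brand_list: Iterable[str]) -> Set[Entity]:
    """Benchmark: flag every listed brand that appears as a whole word in the post."""
    found: Set[Entity] = set()
    for brand in brand_list:
        pattern = r"\b" + re.escape(brand) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add((post_id, brand.lower()))
    return found


def precision_recall_f1(predicted: Set[Entity], gold: Set[Entity]) -> Dict[str, float]:
    """Exact-match precision, recall, and F1 over sets of entities."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def summarize_folds(fold_scores: List[Dict[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Mean and sample standard deviation of each metric across folds."""
    return {
        metric: (
            mean([scores[metric] for scores in fold_scores]),
            stdev([scores[metric] for scores in fold_scores]),
        )
        for metric in fold_scores[0]
    }


# Hypothetical posts, brand list, and gold labels.
posts = {0: "CloudCo mango ice is back", 1: "trying vapeworks today"}
brand_list = ["VapeWorks", "Mistify"]
gold = {(0, "cloudco"), (1, "vapeworks")}

predicted: Set[Entity] = set()
for post_id, text in posts.items():
    predicted |= match_brand_list(post_id, text, brand_list)
baseline = precision_recall_f1(predicted, gold)

# Hypothetical per-fold scores standing in for 5-fold cross-validation results.
fold_scores = [{"precision": p} for p in (0.5, 0.6, 0.7, 0.8, 0.9)]
summary = summarize_folds(fold_scores)
```

In this toy case the list finds one of the two gold brands and makes no false matches, which shows how a fixed list can miss brands it does not contain. Scores here are fractions between 0 and 1.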

Results:

The RCNN exhibited the highest mean precision (79.7), and the CRF exhibited the highest mean recall (49.6). NER models outperformed the benchmark brand list matching on mean precision, recall, and F1. However, there was greater variation in precision in the NER flavor models (RCNN: SD=23.2; CRF: SD=20.1) than in Nielsen data matching (scanner: SD=10.2). Comparing the benchmark brand lists, the Wikipedia list outperformed the Nielsen list in both precision (Nielsen: mean=8.2; Wikipedia: mean=22.4) and recall (Nielsen: mean=2.2; Wikipedia: mean=10.2).

Conclusions:

Findings suggest that NER models correctly identified ENDS brands and flavors in Instagram posts at rates comparable to those reported in the published literature. Identified brands showed little overlap with those in Nielsen scanner data, suggesting NER models may be capturing emerging brands with limited sales and distribution. NER models address challenges of manual brand identification (e.g., it is time-consuming and difficult without pre-existing brand lists). Brands identified on social media should be cross-validated with Nielsen and other data sources to differentiate emerging brands that become established from those with limited sales and distribution.



© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.