Accepted for/Published in: JMIR Formative Research
Date Submitted: Nov 17, 2021
Date Accepted: Jan 21, 2022
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Identifying health-related discussions of cannabis use on Twitter: a proof-of-concept study
ABSTRACT
Background:
The cannabis product and regulatory landscape is changing in the United States. Against the backdrop of these changes, there have been increasing reports of health-related motives for cannabis use and of adverse events arising from its use. Monitoring cannabis-related health conversations with social media data may be useful to state- and federal-level regulatory agencies as they work to identify cannabis safety signals in a comprehensive and scalable fashion.
Objective:
This study attempted to determine the extent to which a medical dictionary, the Unified Medical Language System (UMLS) Consumer Health Vocabulary (CHV), could identify motivations for cannabis use and the health consequences of its use, as discussed on Twitter in 2020.
Methods:
Twitter posts containing cannabis-related terms were collected from January 1 to August 31, 2020. Using a rule-based classifier, each post in the sample (n=353,353) was classified into at least one of 17 a priori categories of common health-related topics, with each category defined by terms from the medical dictionary. A subsample of posts (n=1094) was then manually annotated to help validate the rule-based classifier and to determine whether each post pertained to health-related motivations for cannabis use, perceived adverse health effects from its use, or neither.
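The rule-based step can be illustrated with a minimal dictionary-matching sketch. It assumes each category is a list of lexicon terms matched case-insensitively on word boundaries; the category names and terms below are hypothetical placeholders, not the study's actual CHV-derived dictionary, and `compile_patterns` and `classify` are illustrative helpers, not the authors' code.

```python
import re
from typing import Dict, List, Set

# Hypothetical, heavily abridged lexicon. In the study each of the 17
# categories was defined by UMLS CHV terms; these are placeholders only.
CATEGORY_TERMS: Dict[str, List[str]] = {
    "respiratory": ["cough", "asthma", "shortness of breath", "lung"],
    "stress": ["stress", "anxiety", "anxious", "panic"],
    "gastrointestinal": ["nausea", "vomiting", "stomach ache", "ibs"],
    "immune": ["immune system", "immunity", "infection"],
}


def compile_patterns(lexicon: Dict[str, List[str]]) -> Dict[str, re.Pattern]:
    """Build one case-insensitive, word-bounded regex per category."""
    patterns = {}
    for category, terms in lexicon.items():
        # Longest terms first so multi-word phrases win in the alternation.
        ordered = sorted(terms, key=len, reverse=True)
        alternation = "|".join(re.escape(t) for t in ordered)
        patterns[category] = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return patterns


def classify(post: str, patterns: Dict[str, re.Pattern]) -> Set[str]:
    """Return every category whose terms appear in the post (may be empty)."""
    return {cat for cat, pat in patterns.items() if pat.search(post)}


patterns = compile_patterns(CATEGORY_TERMS)
print(sorted(classify("Smoking weed helps my anxiety but I can't stop the cough", patterns)))
```

A post can match several categories at once, which mirrors the multi-label assignment described above; in this sketch a post with no matching term simply receives an empty set.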
Results:
The validation process suggested that the medical dictionary could identify health-related conversations in 31.2% of posts. Specifically, 20.4% of posts were accurately identified as relating to a health-related motivation for cannabis use, while 10.8% of posts were accurately identified as relating to a health-related consequence from cannabis use. Potential health-related conversations around cannabis use ranged from issues with the respiratory system and stress to the immune system and gastrointestinal problems, among other health topics.
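The headline figure is the sum of the two category shares (20.4% + 10.8% = 31.2%). That addition holds if each annotated post receives exactly one label. The sketch below assumes such a single-label scheme ("motivation", "consequence", or "neither"); the label names and the `label_shares` helper are hypothetical, and the toy input is illustrative only, not study data.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical single-label annotation scheme for the manually coded subsample.
LABELS = ("motivation", "consequence", "neither")


def label_shares(labels: Iterable[str]) -> Dict[str, float]:
    """Return the share of annotated posts per label, plus the combined
    health-related share (motivation + consequence)."""
    counts = Counter(labels)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no annotated posts")
    shares = {label: counts[label] / total for label in LABELS}
    # Additive only because each post carries exactly one label.
    shares["health_related"] = shares["motivation"] + shares["consequence"]
    return shares


# Toy example: 10 annotated posts (not study data).
toy = ["motivation"] * 2 + ["consequence"] + ["neither"] * 7
print({k: round(v, 3) for k, v in label_shares(toy).items()})
```

If annotators could assign both labels to one post, the combined share would instead need to count posts with either label, not sum the two shares.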
Conclusions:
The mining of social media data may prove helpful in improving surveillance of cannabis products and their adverse health effects. However, future research needs to develop and validate a dictionary and codebook that captures cannabis use-specific health conversations on Twitter.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.