JMIR Preprints #38390: Detecting Tweets Containing Cannabidiol-Related COVID-19 Misinformation Using Transformer Language Models and FDA Warning Letters

Detecting Tweets Containing Cannabidiol-Related COVID-19 Misinformation Using Transformer Language Models and FDA Warning Letters

Jason Turner;
Mehmed Kantardzic;
Rachel Vickers-Smith;
Andrew Brown

ABSTRACT

Background:

The COVID-19 pandemic introduced yet another medical condition that online sellers of loosely regulated substances, such as cannabidiol (CBD), could exploit to promote sales with false claims. As a result, new ways of identifying such instances of misinformation have become necessary.

Objective:

We used transformer-based language models to identify COVID-19 misinformation as it relates to the sales and/or promotion of CBD, by finding tweets that are semantically similar to quotes taken from known instances of misinformation, specifically the publicly available FDA warning letters.

Methods:

We collected tweets using CBD- and COVID-19-related terms. Using a previously trained model, we extracted the tweets indicating commercialization or sales of CBD and annotated those containing COVID-19 misinformation according to the FDA’s definitions. We encoded the collection of tweets and the misinformation quotes into sentence vectors and then calculated the cosine similarity between each quote and each tweet. This allowed us to establish a threshold for identifying tweets that make false claims regarding CBD and COVID-19 while minimizing the number of false positives.
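The quote-to-tweet comparison described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy three-dimensional vectors stand in for transformer sentence embeddings, the threshold of 0.9 is a hypothetical value, and the choice to score each tweet by its best-matching quote is an assumption; `cosine_similarity` and `flag_tweets` are illustrative names.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def flag_tweets(quote_vectors, tweet_vectors, threshold):
    """Return (tweet_index, best_similarity) for tweets at or above the threshold.

    Assumption: a tweet is scored by its most similar quote.
    """
    flagged = []
    for i, tweet_vec in enumerate(tweet_vectors):
        best = max(cosine_similarity(q, tweet_vec) for q in quote_vectors)
        if best >= threshold:
            flagged.append((i, best))
    return flagged


# Toy vectors standing in for sentence embeddings (hypothetical data).
quote_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
tweet_vectors = [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.1, 0.8, 0.1]]

# Hypothetical threshold; the source does not state the value used.
flagged = flag_tweets(quote_vectors, tweet_vectors, threshold=0.9)
flagged_indices = [i for i, _ in flagged]
```

In practice, the threshold would be tuned against the annotated tweets so that as few tweets without misinformation as possible are flagged.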

Results:

We demonstrated that quotes taken from FDA warning letters issued for known offenses can be used to identify semantically similar tweets that contain similar misinformation. This is done by establishing a cosine distance threshold between the sentence vectors of the warning letter quotes and the sentence vectors of the tweets.
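The Methods refer to cosine similarity and the Results to cosine distance. Under the common convention that distance is one minus similarity (an assumption here, as the abstract does not define either term), a threshold on one is equivalent to a threshold on the other. The symbols $q$ (quote vector), $t$ (tweet vector), and $\tau$ (distance threshold) are introduced only for illustration.

```latex
\[
\operatorname{sim}(q, t) = \frac{q \cdot t}{\lVert q \rVert \, \lVert t \rVert},
\qquad
d(q, t) = 1 - \operatorname{sim}(q, t)
\]
\[
d(q, t) \le \tau \iff \operatorname{sim}(q, t) \ge 1 - \tau
\]
```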

Conclusions:

Our framework shows that commercial CBD/COVID-19 misinformation can potentially be identified, and consequently curbed, by using transformer-based language models and known prior instances of misinformation. Our approach functions without the need for labeled data, potentially reducing the time needed to identify misinformation. Our proposed framework shows promise in being easily adapted to identify other forms of misinformation related to loosely regulated substances, such as misinformation related to autism, dementia, and Alzheimer’s disease.

Citation

Please cite as:

Turner J, Kantardzic M, Vickers-Smith R, Brown A

Detecting Tweets Containing Cannabidiol-Related COVID-19 Misinformation Using Transformer Language Models and Warning Letters From Food and Drug Administration: Content Analysis and Identification

JMIR Infodemiology 2023;3:e38390

DOI: 10.2196/38390

PMID: 36844029

PMCID: 9941900

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Infodemiology

Date Submitted: Mar 30, 2022

Date Accepted: Nov 30, 2022
