Accepted for/Published in: JMIR Infodemiology

Date Submitted: Mar 30, 2022
Date Accepted: Nov 30, 2022

The final, peer-reviewed published version of this preprint can be found here:

Detecting Tweets Containing Cannabidiol-Related COVID-19 Misinformation Using Transformer Language Models and Warning Letters From Food and Drug Administration: Content Analysis and Identification

Turner J, Kantardzic M, Vickers-Smith R, Brown A

JMIR Infodemiology 2023;3:e38390

DOI: 10.2196/38390

PMID: 36844029

PMCID: 9941900

Detecting Tweets Containing Cannabidiol-Related COVID-19 Misinformation Using Transformer Language Models and FDA Warning Letters

  • Jason Turner; 
  • Mehmed Kantardzic; 
  • Rachel Vickers-Smith; 
  • Andrew Brown

ABSTRACT

Background:

The COVID-19 pandemic gave online sellers of loosely regulated substances, such as cannabidiol (CBD), yet another medical condition around which to make false claims that promote sales. As a result, new methods are needed to identify such instances of misinformation.

Objective:

We used transformer-based language models to identify COVID-19 misinformation related to the sale or promotion of CBD by finding tweets that are semantically similar to quotes taken from known instances of misinformation, specifically the publicly available warning letters issued by the US Food and Drug Administration (FDA).

Methods:

We collected tweets using CBD- and COVID-19-related terms. Using a previously trained model, we extracted the tweets indicating commercialization or sale of CBD and annotated those containing COVID-19 misinformation according to the FDA’s definitions. We encoded the tweets and the misinformation quotes into sentence vectors and then calculated the cosine similarity between each quote and each tweet. This allowed us to establish a threshold for identifying tweets that make false claims regarding CBD and COVID-19 while minimizing the number of false positives.
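The similarity-and-threshold step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: a toy bag-of-words encoder stands in for the transformer sentence encoder so that the example runs with the standard library alone, and the function names, example texts, and threshold value are all hypothetical.

```python
import math
from collections import Counter

def encode(text):
    """Toy stand-in for a transformer sentence encoder (assumption):
    maps a text to a sparse vector of lowercase token counts."""
    return Counter(text.lower().split())

def cosine_similarity(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[token] * v[token] for token in u if token in v)
    norm_u = math.sqrt(sum(count * count for count in u.values()))
    norm_v = math.sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_u * norm_v)

def flag_tweets(quotes, tweets, threshold):
    """Return (tweet, best_score) pairs whose highest similarity to any
    quote is at or above the threshold."""
    quote_vectors = [encode(quote) for quote in quotes]
    flagged = []
    for tweet in tweets:
        tweet_vector = encode(tweet)
        best = max(
            (cosine_similarity(tweet_vector, qv) for qv in quote_vectors),
            default=0.0,
        )
        if best >= threshold:
            flagged.append((tweet, best))
    return flagged

# Invented placeholder texts, not real FDA quotes or real tweets.
quotes = ["cbd oil can cure covid-19"]
tweets = [
    "our cbd oil can cure covid-19 buy now",
    "new study on cbd and sleep quality",
]
THRESHOLD = 0.5  # arbitrary illustrative value, not taken from the paper

for tweet, score in flag_tweets(quotes, tweets, THRESHOLD):
    print(f"{score:.3f}  {tweet}")
```

Raising the threshold trades recall for fewer false positives, which is the balance the Methods text refers to. In practice the `encode` function would be replaced by a sentence-embedding model and the threshold tuned on annotated tweets.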

Results:

We demonstrated that quotes taken from FDA warning letters for known offenses can be used to identify semantically similar tweets that contain similar misinformation. This is done by establishing a cosine distance threshold between the sentence vectors of the warning letter quotes and the sentence vectors of the tweets.
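For reference, under the usual definitions a threshold on cosine distance is equivalent to a threshold on cosine similarity. The symbols below (quote vector q, tweet vector t, distance threshold τ) are notation introduced here for illustration and are not taken from the paper.

```latex
\[
\operatorname{sim}(\mathbf{q}, \mathbf{t})
  = \frac{\mathbf{q} \cdot \mathbf{t}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{t} \rVert},
\qquad
d(\mathbf{q}, \mathbf{t}) = 1 - \operatorname{sim}(\mathbf{q}, \mathbf{t}),
\qquad
d(\mathbf{q}, \mathbf{t}) \le \tau
  \iff
\operatorname{sim}(\mathbf{q}, \mathbf{t}) \ge 1 - \tau
\]
```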

Conclusions:

Our framework shows that commercial CBD/COVID-19 misinformation can potentially be identified, and consequently curbed, by using transformer-based language models and known prior instances of misinformation. Our approach functions without the need for labeled data, potentially reducing the time needed to identify misinformation. The framework also shows promise for adaptation to other forms of misinformation involving loosely regulated substances, such as claims related to autism, dementia, and Alzheimer’s disease.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.