Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Dec 13, 2020
Date Accepted: May 6, 2021
Date Submitted to PubMed: Aug 12, 2021

The final, peer-reviewed published version of this preprint can be found here:

Using Machine Learning–Based Approaches for the Detection and Classification of Human Papillomavirus Vaccine Misinformation: Infodemiology Study of Reddit Discussions

Du J, Preston S, Sun H, Shegog R, Cunningham R, Boom J, Savas L, Amith M, Tao C

J Med Internet Res 2021;23(8):e26478

DOI: 10.2196/26478

PMID: 34383667

PMCID: 8380585

Utilizing machine learning-based approaches for the detection and classification of human papillomavirus (HPV) vaccine misinformation: Infodemiology Study of Reddit Discussions

  • Jingcheng Du
  • Sharice Preston
  • Hanxiao Sun
  • Ross Shegog
  • Rachel Cunningham
  • Julie Boom
  • Lara Savas
  • Muhammad Amith
  • Cui Tao

ABSTRACT

Background:

Vaccine misinformation shared on social media poses a substantial threat to community safety.

Objective:

To develop and evaluate an intelligent, automated protocol for identifying and classifying HPV vaccine misinformation on social media using machine learning (ML)-based methods.

Methods:

Reddit posts containing human papillomavirus (HPV) vaccine-related keywords were compiled (2007–2017, N = 28,121). A two-step pipeline was proposed for misinformation identification and classification. A random subset (N = 2,200) was manually labeled for misinformation and used to train and evaluate ML algorithms (e.g., convolutional neural network [CNN]) for misinformation identification. The trained CNN model was then applied to identify misinformation in the unlabeled posts. Finally, topic modeling was applied to the posts inferred to contain misinformation in order to identify the major categories (i.e., classification) of HPV vaccine misinformation.
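The two-step flow described above can be sketched as follows. This is a minimal illustration of the control flow only, not the authors' implementation: `keyword_score` is a hypothetical stand-in for the trained CNN, `keyword_topic` is a hypothetical stand-in for topic modeling, and the cue words, threshold, and sample posts are all invented for the example.

```python
from collections import Counter

# Hypothetical cue words; the study used a trained CNN, not a keyword list.
MISINFO_CUES = {"poison", "infertility", "coverup", "toxic"}

# Hypothetical category keywords; the study derived categories with topic modeling.
TOPIC_CUES = {
    "safety": {"poison", "toxic", "infertility"},
    "conspiracy": {"coverup"},
}


def keyword_score(post):
    """Stand-in classifier: fraction of cue words that appear in the post."""
    words = set(post.lower().split())
    return len(words & MISINFO_CUES) / len(MISINFO_CUES)


def keyword_topic(post):
    """Stand-in topic model: category sharing the most words with the post."""
    words = set(post.lower().split())
    best = max(TOPIC_CUES, key=lambda topic: len(words & TOPIC_CUES[topic]))
    return best if words & TOPIC_CUES[best] else "other"


def two_step_pipeline(posts, score, categorize, threshold=0.2):
    """Step 1: keep posts scored at or above the threshold.
    Step 2: assign each kept post to a category and count the categories."""
    flagged = [post for post in posts if score(post) >= threshold]
    return flagged, Counter(categorize(post) for post in flagged)


# Invented sample posts, for illustration only.
posts = [
    "the hpv vaccine is toxic poison",
    "got my second hpv shot today",
    "there is a coverup about the vaccine",
]
flagged, categories = two_step_pipeline(posts, keyword_score, keyword_topic)
print(len(flagged), dict(categories))
```

Passing the scorer and the categorizer as arguments keeps the two steps independent, so either stand-in could be replaced by a trained model without changing the pipeline function.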

Results:

The CNN model achieved the highest area under the receiver operating characteristic curve (AUC), 0.7943, in the identification of misinformation. Of 28,121 Reddit posts, 7,207 (25.63%) were identified as containing misinformation. Topic modeling then classified these posts into major misinformation categories, with general safety issues identified as the leading category (37%).
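For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it by pairwise comparison on invented labels and scores; it is illustrative only and is not the evaluation code used in the study. It also re-derives the reported share of flagged posts from the two counts given above.

```python
def auc(labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly,
    counting ties as half a correct ranking."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Invented labels (1 = misinformation) and classifier scores, for illustration only.
toy_labels = [1, 0, 1, 0, 1]
toy_scores = [0.9, 0.4, 0.35, 0.2, 0.8]
toy_auc = auc(toy_labels, toy_scores)

# Share of posts flagged as misinformation, from the counts reported in the abstract.
flagged_share = round(100 * 7207 / 28121, 2)
print(toy_auc, flagged_share)
```

The pairwise form is quadratic in the number of examples, which is fine for a small labeled subset; a rank-based computation would be the usual choice for larger sets.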

Conclusions:

ML-based approaches are effective in the identification and classification of HPV vaccine misinformation from Reddit and may be generalizable to other social media platforms. ML-based methods may provide the capacity and utility to meet the challenge of intelligent, automated monitoring and classification of public health misinformation in social media networks.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.