JMIR Preprints #26478: Utilizing machine learning-based approaches for the detection and classification of human papillomavirus (HPV) vaccine misinformation: Infodemiology Study of Reddit Discussions


Utilizing machine learning-based approaches for the detection and classification of human papillomavirus (HPV) vaccine misinformation: Infodemiology Study of Reddit Discussions

Jingcheng Du;
Sharice Preston;
Hanxiao Sun;
Ross Shegog;
Rachel Cunningham;
Julie Boom;
Lara Savas;
Muhammad Amith;
Cui Tao

ABSTRACT

Background:

Vaccine misinformation shared on social media poses a substantial threat to community safety.

Objective:

To develop and evaluate an intelligent, automated protocol to identify and classify HPV vaccine misinformation on social media, using machine learning (ML)-based methods.

Methods:

Reddit posts (2007–2017, N = 28,121) containing human papillomavirus (HPV) vaccine-related keywords were compiled. A two-step pipeline was proposed for misinformation identification and classification. A random subset (N = 2,200) was manually labeled for misinformation and used to train and evaluate ML algorithms (e.g., convolutional neural network [CNN]) for misinformation identification. The trained CNN model was then applied to identify misinformation in the unlabeled posts. For the posts inferred to contain misinformation, topic modeling was applied to identify the major categories (i.e., classification) associated with HPV vaccine misinformation.
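The two-step flow described above (identify, then categorize) can be sketched as follows. This is a minimal illustration of the control flow only, not the authors' implementation: `toy_score` is a hypothetical keyword-based stand-in for the trained CNN, `toy_topic` is a hypothetical stand-in for the topic model, and the 0.5 threshold, cue words, and example posts are all invented for illustration.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def two_step_pipeline(
    posts: Iterable[str],
    score_post: Callable[[str], float],
    assign_topic: Callable[[str], str],
    threshold: float = 0.5,  # assumed cutoff; the source does not state one
) -> Tuple[List[str], Counter]:
    """Step 1: keep posts whose score meets the threshold.
    Step 2: tally one topic label per kept post."""
    flagged = [p for p in posts if score_post(p) >= threshold]
    topic_counts = Counter(assign_topic(p) for p in flagged)
    return flagged, topic_counts


# Hypothetical stand-ins so the sketch runs without any ML library.
CUES = ("causes", "dangerous", "cover-up")


def toy_score(post: str) -> float:
    """Placeholder for a classifier probability: share of cue words found."""
    text = post.lower()
    hits = sum(cue in text for cue in CUES)
    return min(1.0, hits / 2)


def toy_topic(post: str) -> str:
    """Placeholder for a topic model: one hand-written rule."""
    return "safety" if "dangerous" in post.lower() else "other"


posts = [
    "The HPV vaccine is dangerous and causes infertility",
    "Got my second HPV shot today, arm is a bit sore",
    "There is a cover-up, and it causes harm",
]
flagged, topic_counts = two_step_pipeline(posts, toy_score, toy_topic)
```

Passing the scorer and the topic assigner as callables keeps the two steps independent, so a real classifier and a real topic model could be substituted without changing the pipeline function.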

Results:

The CNN model achieved the highest area under the receiver operating characteristic curve (AUC), 0.7943, in the identification of misinformation. Of 28,121 Reddit posts, 7,207 (25.63%) were identified as containing misinformation. Topic modeling then classified these posts into major misinformation categories, including general safety issues, which was the leading type of misinformed post (37%).
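For readers unfamiliar with the metric, the AUC reported above can be read as the probability that a randomly chosen misinformation post receives a higher score than a randomly chosen non-misinformation post. The sketch below computes it by direct pairwise comparison; the labels and scores are invented toy values, not data from the study, and the function name `roc_auc` is an illustrative choice. The last line reproduces the reported share of flagged posts from the two counts given in the text.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half a correct pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy values for illustration only (1 = misinformation, 0 = not).
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]
auc = roc_auc(labels, scores)

# Share of posts flagged, from the counts stated in the abstract.
share_percent = 100 * 7207 / 28121
```

The pairwise form is quadratic in the number of posts, which is acceptable for a small labeled subset; a rank-based formula gives the same value more efficiently on larger sets.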

Conclusions:

ML-based approaches are effective in the identification and classification of HPV vaccine misinformation from Reddit and may be generalizable to other social media platforms. ML-based methods may provide the capacity and utility to meet the challenge of intelligent, automated monitoring and classification of public health misinformation in social media networks.

Citation

Please cite as:

Du J, Preston S, Sun H, Shegog R, Cunningham R, Boom J, Savas L, Amith M, Tao C

Using Machine Learning–Based Approaches for the Detection and Classification of Human Papillomavirus Vaccine Misinformation: Infodemiology Study of Reddit Discussions

J Med Internet Res 2021;23(8):e26478

DOI: 10.2196/26478

PMID: 34383667

PMCID: 8380585


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Dec 13, 2020

Date Accepted: May 6, 2021

Date Submitted to PubMed: Aug 12, 2021
