Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Apr 9, 2025
Date Accepted: Oct 13, 2025
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
A Methodology Framework for Analyzing Health Misinformation to Develop Inoculation Interventions Using Large Language Models: A Case Study on COVID-19
ABSTRACT
Background:
The rapid growth of social media as an information channel has enabled the swift spread of inaccurate or false health information, significantly impacting public health. This widespread dissemination of misinformation has caused confusion, eroded trust in health authorities, led to noncompliance with health guidelines, and encouraged risky health behaviors. Understanding the dynamics of misinformation on social media is essential for devising effective public health communication strategies.
Objective:
This study aims to present a comprehensive and automated approach that leverages Large Language Models (LLMs) and Machine Learning (ML) techniques to detect misinformation on social media, uncover the underlying causes and themes, and generate refutation arguments, facilitating control of its spread and promoting public health outcomes by inoculating people against health misinformation.
Methods:
We use two datasets to train three transformer-based language models, namely BERT, T5, and GPT-2, to classify documents into two categories: misinformation and non-misinformation. We then employ a separate dataset to identify misinformation topics, applying three topic modeling algorithms: Latent Dirichlet Allocation (LDA), Top2Vec, and BERTopic. We select the best-performing model based on three evaluation metrics. Using a prompting approach, we extract sentence-level representations of the topics to uncover their underlying themes. Finally, we design a prompt capable of identifying misinformation themes effectively.
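One of the metrics used to compare the topic models, Normalized Pointwise Mutual Information (NPMI), can be illustrated with a minimal sketch. The function below estimates word probabilities from document co-occurrence and averages NPMI over all pairs of a topic's top words; the toy corpus and topic words are illustrative assumptions, not data from the study, and real evaluations (e.g., via gensim's coherence tooling) use a reference corpus and sliding windows.

```python
import math
from itertools import combinations

def npmi_coherence(topic_words, docs, eps=1e-12):
    """Average NPMI over all pairs of a topic's top words.

    Probabilities are estimated from document co-occurrence:
    p(w) = fraction of docs containing w; p(wi, wj) = fraction
    containing both. NPMI ranges from -1 to 1; higher means the
    topic's words co-occur more than chance, i.e., more coherent.
    """
    n = len(docs)
    doc_sets = [set(d.lower().split()) for d in docs]

    def p(*words):
        return sum(all(w in s for w in words) for s in doc_sets) / n

    scores = []
    for wi, wj in combinations(topic_words, 2):
        p_ij = p(wi, wj)
        if p_ij == 0:
            scores.append(-1.0)  # words never co-occur: minimum NPMI
            continue
        pmi = math.log(p_ij / (p(wi) * p(wj)))
        scores.append(pmi / (-math.log(p_ij) + eps))
    return sum(scores) / len(scores)

# Toy corpus (hypothetical): a coherent pair scores high,
# a pair that never co-occurs scores -1.
docs = [
    "vaccine causes illness claim",
    "vaccine illness rumor spreads",
    "mask mandate debate continues",
    "weather sunny today",
]
print(round(npmi_coherence(["vaccine", "illness"], docs), 3))  # → 1.0
```

Averaging this score over a model's topics gives a single coherence figure per model, which is how a metric-based selection among LDA, Top2Vec, and BERTopic can be made.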
Results:
The trained BERT model demonstrated exceptional performance, achieving 98% accuracy in classifying misinformation and non-misinformation, with a 44% reduction in the false positive rate for AI-generated misinformation. Among the three topic modeling approaches, BERTopic performed best, achieving a Coherence Value (CV) of 0.41, Normalized Pointwise Mutual Information (NPMI) of -0.086, and Inverted Rank-Biased Overlap (IRBO) of 0.99. To handle documents left unclassified by the topic model, we developed an algorithm that assigns each such document to its closest topic. We also proposed a novel prompt engineering method for generating sentence-level representations of each topic; 99.6% of these representations were rated "appropriate" or "somewhat appropriate" by three independent raters. Finally, we designed one prompt to identify the themes of misinformation topics and another that detects misinformation themes with 80% accuracy.
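The assignment of unclassified (outlier) documents to their closest topic can be sketched as a nearest-centroid lookup in embedding space. The vectors and topic centroids below are toy assumptions for illustration; in practice the document vector would come from the same embedding model the topic model uses, and BERTopic also offers its own outlier-reduction utilities.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def assign_to_closest_topic(doc_vec, topic_centroids):
    """Return the id of the topic whose centroid is most similar
    to the document's embedding."""
    return max(topic_centroids, key=lambda t: cosine(doc_vec, topic_centroids[t]))

# Hypothetical 3-dimensional centroids for two topics.
centroids = {
    0: [0.9, 0.1, 0.0],  # e.g., a "vaccine" topic
    1: [0.1, 0.8, 0.2],  # e.g., a "treatment" topic
}
outlier_doc = [0.7, 0.3, 0.1]  # embedding of an unclassified document
print(assign_to_closest_topic(outlier_doc, centroids))  # → 0
```

With this fallback, every document receives a topic label, so downstream theme extraction operates over the full corpus rather than only the documents the topic model confidently clustered.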
Conclusions:
This study presents a comprehensive and automated approach to addressing health misinformation on social media using advanced machine learning and natural language processing techniques. By leveraging large language models (LLMs) and prompt engineering, the system effectively detects misinformation, identifies underlying themes, and provides explanatory responses to combat its spread.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.