
Accepted for/Published in: JMIR Formative Research

Date Submitted: Jul 24, 2024
Open Peer Review Period: Jul 24, 2024 - Sep 18, 2024
Date Accepted: Aug 17, 2025

The final, peer-reviewed published version of this preprint can be found here:

Evaluating Large Language Models for Sentiment Analysis and Hesitancy Analysis on Vaccine Posts From Social Media: Qualitative Study

Annan A, Eiden AL, Wang D, Du J, Rastegar-Mojarad M, Nomula VK, Wang X

JMIR Form Res 2025;9:e64723

DOI: 10.2196/64723

PMID: 41092067

PMCID: 12526656

Evaluating Large Language Models for Sentiment Analysis and Hesitancy Analysis on Vaccine Posts from Social Media

  • Augustine Annan
  • Amanda L. Eiden
  • Dong Wang
  • Jingcheng Du
  • Majid Rastegar-Mojarad
  • Varun Kumar Nomula
  • Xiaoyan Wang

ABSTRACT

Background:

In the digital age, social media has become a crucial platform for public discourse on diverse health-related topics, including vaccines. Efficient sentiment analysis and hesitancy detection are essential for understanding public opinions and concerns. Large language models (LLMs) offer advanced capabilities for processing complex linguistic patterns, potentially providing valuable insights into vaccine-related discourse.

Objective:

To evaluate the performance of various LLMs in sentiment analysis and hesitancy detection related to vaccine discussions on social media and identify the most efficient, accurate, and cost-effective model for detecting vaccine-related public sentiment and hesitancy trends.

Methods:

We employed several LLMs (GPT-3.5, GPT-4, Claude-3 Sonnet, and Llama 2) to process and classify complex linguistic data related to human papillomavirus (HPV); measles, mumps, and rubella (MMR); and vaccines overall from X (formerly known as Twitter), Reddit, and YouTube. The models were tested across three learning paradigms (zero-shot, one-shot, and few-shot) to determine their adaptability and learning efficiency with varying amounts of training data. We evaluated model performance using accuracy, F1-score, precision, and recall. Additionally, we conducted a cost analysis based on token usage to assess the computational efficiency of each approach.
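The learning paradigms above differ only in how many labeled demonstrations are placed in the prompt. A minimal sketch of how such prompts might be assembled (the prompt wording, function name, and example posts are illustrative, not the study's actual prompts):

```python
def build_prompt(post, examples=()):
    """Assemble a sentiment-classification prompt for an LLM.

    With no examples this is a zero-shot prompt; one example makes it
    one-shot; several make it few-shot. Each added example lengthens
    the prompt, increasing token usage and hence per-request cost.
    """
    lines = ["Classify the vaccine-related post as positive, negative, or neutral."]
    for text, label in examples:  # labeled demonstration pairs
        lines.append(f"Post: {text}\nSentiment: {label}")
    lines.append(f"Post: {post}\nSentiment:")
    return "\n\n".join(lines)

zero_shot = build_prompt("Booked my HPV shot today, quick and easy.")
few_shot = build_prompt(
    "Booked my HPV shot today, quick and easy.",
    examples=[("Vaccines are a scam.", "negative"),
              ("The clinic opens at 9.", "neutral")],
)
print(len(few_shot) > len(zero_shot))  # prints True: few-shot prompts cost more tokens
```

The same comparison underlies the paper's token-based cost analysis: the few-shot prompt repeats the demonstrations on every request, so its marginal accuracy gain must be weighed against its higher token count.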

Results:

GPT-4 (F1-score=0.85, accuracy=0.83) outperformed GPT-3.5, Llama 2, and Claude-3 Sonnet across various metrics, regardless of sentiment type or learning paradigm. Few-shot learning did not significantly enhance performance over the zero-shot paradigm, and its increased computational cost and token usage were not justified by the marginal improvement in model performance. The analysis highlighted challenges in classifying neutral sentiment and the convenience hesitancy category, correctly interpreting sarcasm, and accurately identifying indirect expressions of vaccine hesitancy, emphasizing the need for model refinement.
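The reported metrics (accuracy plus per-class precision, recall, and F1) can be computed directly from paired true and predicted labels. A self-contained sketch using toy labels, not the study's data:

```python
def classification_metrics(y_true, y_pred):
    """Per-class precision/recall/F1 plus overall accuracy."""
    labels = sorted(set(y_true) | set(y_pred))
    report = {}
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        report[lab] = {"precision": prec, "recall": rec, "f1": f1}
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return report, accuracy

true = ["positive", "negative", "neutral", "negative", "positive"]
pred = ["positive", "negative", "positive", "negative", "neutral"]
report, acc = classification_metrics(true, pred)
print(acc)  # prints 0.6: three of five toy labels match
```

Note how the neutral class drags down the toy scores here, mirroring the paper's observation that neutral posts are the hardest to classify.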

Conclusions:

GPT-4 emerged as the most accurate model, excelling in sentiment and hesitancy analysis. Performance differences between learning paradigms were minimal, making zero-shot learning preferable for its balance of accuracy and computational efficiency. However, the zero-shot GPT-4 model is not the most cost-effective compared to traditional machine learning. A hybrid approach, using LLMs for initial annotation and traditional models for training, could optimize cost and performance. Despite reliance on specific LLM versions and a limited focus on certain vaccine types and platforms, our findings underscore the capabilities and limitations of LLMs in vaccine sentiment and hesitancy analysis, highlighting the need for ongoing evaluation and adaptation in public health communication strategies.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.