JMIR Preprints #63631: Assessing A Large Language Model's Ability to Emulate Human Experts in Sentiment Evaluation of Social Media Discussions about Heated Tobacco Products: Evaluation Study

Assessing A Large Language Model's Ability to Emulate Human Experts in Sentiment Evaluation of Social Media Discussions about Heated Tobacco Products: Evaluation Study

Kwanho Kim; Soojong Kim

ABSTRACT

Background:

Sentiment analysis of alternative tobacco products discussed on social media is a crucial area in tobacco control research. Large Language Models (LLMs) may hold the potential to streamline the time-consuming and labor-intensive process of human sentiment analysis.

Objective:

The accuracy of a language model in replicating human sentiment labeling of social media messages relevant to heated tobacco products (HTPs) was examined.

Methods:

ChatGPT was employed to classify 500 Facebook and 500 Twitter messages. Each set consisted of 200 anti-HTP, 200 pro-HTP, and 100 neutral messages, as labeled by human evaluators. The model evaluated each message up to 20 times, generating multiple response instances that reported its labeling decisions. The majority label across these responses was assigned as the model’s decision for each message. The model’s labeling decisions were then compared to those of the human evaluators.
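The majority-label step described above can be illustrated with a minimal sketch. This is not the authors' code: the label names, the function names `majority_label` and `accuracy`, and the tie-breaking rule (the label that appears first among the tied labels wins) are assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter


def majority_label(responses):
    """Return the most frequent label among one message's response instances.

    Assumption: ties are broken in favor of the label seen first, which is
    how Counter.most_common orders equal counts. The abstract does not say
    how ties were handled.
    """
    if not responses:
        raise ValueError("at least one response instance is required")
    return Counter(responses).most_common(1)[0][0]


def accuracy(model_labels, human_labels):
    """Share of messages where the model's label matches the human label."""
    if len(model_labels) != len(human_labels):
        raise ValueError("label lists must have the same length")
    matches = sum(m == h for m, h in zip(model_labels, human_labels))
    return matches / len(human_labels)


# Hypothetical example: three messages, each with a few response instances.
responses_per_message = [
    ["anti", "anti", "neutral"],
    ["pro", "neutral", "neutral"],
    ["irrelevant", "pro", "pro"],
]
human = ["anti", "pro", "pro"]
model = [majority_label(r) for r in responses_per_message]
print(model, accuracy(model, human))
```

In this toy example the second message is decided as neutral by majority, so it counts as a mismatch against the human pro label.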

Results:

ChatGPT accurately replicated human sentiment labeling in 61.2% of Facebook messages and 57.0% of Twitter messages. Increasing the number of responses from the model improved accuracy, with a single response yielding at least 83% of the accuracy achieved with 20 responses. The model’s accuracy was higher for human-labeled anti-HTP messages than for human-labeled pro-HTP and neutral messages. Most of the misclassified human-labeled anti- and pro-HTP messages were labeled by ChatGPT as either neutral or irrelevant to HTPs.
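The per-label accuracy and the breakdown of misclassified messages reported above could be tabulated along the following lines. This is a sketch under stated assumptions, not the study's analysis code: the function names, the label strings, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def per_class_accuracy(human_labels, model_labels):
    """Accuracy computed separately for each human-assigned label."""
    totals = Counter(human_labels)
    correct = Counter(h for h, m in zip(human_labels, model_labels) if h == m)
    return {label: correct[label] / totals[label] for label in totals}


def misclassification_breakdown(human_labels, model_labels):
    """For each human label, count which labels the model assigned instead."""
    breakdown = defaultdict(Counter)
    for h, m in zip(human_labels, model_labels):
        if h != m:
            breakdown[h][m] += 1
    return {h: dict(counts) for h, counts in breakdown.items()}


# Hypothetical toy data, not taken from the study.
human = ["anti", "anti", "pro", "pro", "neutral"]
model = ["anti", "neutral", "pro", "irrelevant", "neutral"]
print(per_class_accuracy(human, model))
print(misclassification_breakdown(human, model))
```

Splitting the errors by the label the model chose instead makes it visible when misclassified anti- or pro-HTP messages end up as neutral or irrelevant.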

Conclusions:

LLMs could be utilized to analyze sentiment in social media messages about HTPs. A potential challenge for using LLMs to analyze discourses related to HTPs on social media could be the underrepresentation of messages that express positive attitudes towards these products.

Citation

Please cite as:

Kim K, Kim S

Large Language Models’ Accuracy in Emulating Human Experts’ Evaluation of Public Sentiments about Heated Tobacco Products on Social Media: Evaluation Study

J Med Internet Res 2025;27:e63631

DOI: 10.2196/63631

PMID: 40053746

PMCID: 11920658

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Jun 29, 2024

Date Accepted: Jan 19, 2025
