Accepted for/Published in: JMIR Mental Health
Date Submitted: Jun 21, 2023
Open Peer Review Period: Jun 20, 2023 - Aug 15, 2023
Date Accepted: Nov 17, 2023
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Sentiment Analysis of COVID-19 Survey Data: A Comparison of ChatGPT and Fine-tuned OPT Against Widely Used Sentiment Analysis Tools
ABSTRACT
Background:
Health care providers and health-related researchers face significant challenges when applying sentiment analysis tools to health-related free-text survey data. Most state-of-the-art tools were developed for domains such as social media, and their performance in the health care context remains relatively unknown. Moreover, existing studies indicate that these tools often lack accuracy and produce inconsistent results.
Objective:
This study addresses the lack of comparative analyses of sentiment analysis tools applied to health-related free-text survey data in the context of COVID-19. The objective is to automatically predict sentence-level sentiment for two independent COVID-19 survey datasets, one from the National Institutes of Health (NIH) and one from Stanford University.
Methods:
Gold-standard labels were created for a subset of each dataset by a panel of human raters. We compared eight state-of-the-art sentiment analysis tools on both datasets to evaluate variability and disagreement across tools. Additionally, we explored few-shot learning by fine-tuning OPT (a large language model [LLM] with publicly available weights) on a small annotated subset, and zero-shot learning with ChatGPT (an LLM whose weights are not publicly available).
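The two steps named in the Methods, building gold-standard labels from a panel of raters and measuring disagreement across tools, can be sketched as follows. This is a minimal illustration, not the authors' procedure: the abstract does not say how rater votes were combined or which agreement statistic was used, so majority voting (with ties left unresolved), simple pairwise percent agreement, and the tool names and label strings below are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def majority_label(ratings):
    """Return the most frequent rater label, or None on a tie.

    Assumption: majority vote with ties left unresolved (hypothetical rule;
    the abstract does not describe how the panel's labels were combined).
    """
    counts = Counter(ratings).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def pairwise_agreement(predictions):
    """Fraction of sentences on which each pair of tools assigns the same label.

    `predictions` maps a (hypothetical) tool name to its list of labels,
    one per sentence, in the same sentence order for every tool.
    """
    result = {}
    for a, b in combinations(sorted(predictions), 2):
        pa, pb = predictions[a], predictions[b]
        same = sum(x == y for x, y in zip(pa, pb))
        result[(a, b)] = same / len(pa)
    return result


if __name__ == "__main__":
    # Hypothetical ratings from three raters for three sentences.
    raters = [
        ["positive", "positive", "neutral"],
        ["negative", "neutral", "negative"],
        ["neutral", "positive", "negative"],
    ]
    print([majority_label(r) for r in raters])

    # Hypothetical outputs of three sentiment tools on the same sentences.
    predictions = {
        "tool_a": ["positive", "negative", "neutral"],
        "tool_b": ["positive", "neutral", "neutral"],
        "tool_c": ["negative", "negative", "positive"],
    }
    print(pairwise_agreement(predictions))
```

Low pairwise agreement between tools on the same sentences is one simple way to surface the kind of cross-tool disagreement the study reports; chance-corrected statistics such as Cohen's or Fleiss' kappa are common alternatives.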
Results:
The comparison revealed high variability and disagreement across the evaluated sentiment analysis tools when applied to health-related survey data. OPT and ChatGPT outperformed all other sentiment analysis tools. ChatGPT in turn outperformed OPT by 6% in accuracy and by 4% to 7% in F-score.
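The Results compare models by accuracy and F-score against the gold-standard labels. A minimal sketch of both metrics follows, assuming a macro-averaged F-score (the unweighted mean of per-class F1). The abstract does not state which averaging was used, and the short label strings are hypothetical.

```python
def accuracy(gold, pred):
    """Fraction of sentences whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 scores (assumed macro averaging)."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical gold labels and model predictions for four sentences.
    gold = ["pos", "neg", "neu", "pos"]
    pred = ["pos", "neg", "pos", "pos"]
    print(accuracy(gold, pred))   # 0.75
    print(macro_f1(gold, pred))   # about 0.6: per-class F1 is neg 1.0, neu 0.0, pos 0.8
```

Macro averaging gives each sentiment class equal weight, so a model that never predicts a minority class (here "neu") is penalized even when its overall accuracy looks high.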
Conclusions:
The findings suggest that LLMs are a viable method for predicting sentiment in health surveys. The comparative analysis highlights their potential to reduce the human labor needed for dataset annotation, or to redirect that labor toward quality control of LLM predictions. The study demonstrates the effectiveness of LLMs, particularly with few-shot and zero-shot learning approaches, for sentiment analysis of health-related survey data. These results have implications for saving human labor and improving efficiency in sentiment analysis tasks, contributing to advances in automated sentiment analysis.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.