Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Aug 29, 2022
Open Peer Review Period: Aug 29, 2022 - Oct 24, 2022
Date Accepted: Sep 29, 2022
Date Submitted to PubMed: Oct 27, 2022
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Uncovering the Reasons behind COVID-19 Vaccine Hesitancy in Serbia: Sentiment-Based Topic Modeling
ABSTRACT
Background:
Since the first COVID-19 vaccine appeared, there has been a growing tendency to determine public attitudes toward it automatically. In particular, it has been important to find the reasons for vaccine hesitancy, since hesitancy was directly correlated with the protraction of the pandemic. Natural language processing (NLP) and public health researchers have turned to social media (Twitter, Reddit, and Facebook) for user-created content from which they could gauge public opinion on vaccination. To process such content automatically, they use a number of NLP techniques, most notably topic modeling. Topic modeling enables the automatic uncovering and grouping of hidden topics in text. When applied to content that expresses negative sentiment toward vaccination, it can give direct insight into the reasons for vaccine hesitancy.
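As a rough illustration of how topic modeling groups words that tend to occur together, the following minimal sketch factorizes a tiny document-term count matrix with non-negative matrix factorization (NMF) using multiplicative updates. The toy English documents, the choice of two topics, and all function names are assumptions made for illustration only; they are not the study's data or implementation.

```python
import random

def build_counts(docs):
    """Return (vocabulary, document-term count matrix) for whitespace-tokenized docs."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: j for j, w in enumerate(vocab)}
    counts = [[0.0] * len(vocab) for _ in docs]
    for i, d in enumerate(docs):
        for w in d.split():
            counts[i][index[w]] += 1.0
    return vocab, counts

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(r) for r in zip(*a)]

def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factorize V (docs x terms) into non-negative W (docs x k) and H (k x terms)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # Multiplicative update: H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # Multiplicative update: W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return w, h

def top_words(h, vocab, n=3):
    """List the n highest-weighted vocabulary words for each topic row of H."""
    return [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n]]
            for row in h]

# Hypothetical toy documents standing in for preprocessed negative tweets.
docs = [
    "vaccine side effects fever",
    "side effects after vaccine",
    "government lies about vaccine",
    "government lies again",
]
vocab, V = build_counts(docs)
W, H = nmf(V, k=2)
for t, words in enumerate(top_words(H, vocab)):
    print(f"topic {t}: {words}")
```

Each row of `H` weights the vocabulary for one topic, and each row of `W` says how strongly a document expresses each topic. In practice, a library implementation and a TF-IDF weighting would typically replace this hand-written version.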
Objective:
This study applies NLP methods to classify vaccination-related tweets by sentiment polarity and to uncover the reasons for vaccine hesitancy among negative tweets in the Serbian language.
Methods:
To study the attitudes and beliefs behind vaccine hesitancy, we collected two batches of tweets that mention some aspect of COVID-19 vaccination. A total of 8,817 tweets were manually annotated as either relevant or irrelevant with respect to COVID-19 vaccination sentiment, and the relevant ones were then annotated as positive, negative, or neutral. To augment this initial dataset, we used the annotated tweets to train a sequential BERT-based classifier for two tweet classification tasks. The first classifier distinguishes between relevant and irrelevant tweets. The second classifier takes the relevant tweets and classifies them as negative, positive, or neutral. This sequential classifier was used to annotate the second batch of tweets. The combined datasets yielded 3,286 tweets with negative sentiment: 1,770 from the manually annotated dataset and 1,516 from automatic classification. Topic modeling methods (latent Dirichlet allocation [LDA] and non-negative matrix factorization [NMF]) were applied to the 3,286 preprocessed tweets to detect reasons for vaccine hesitancy.
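The two-stage labeling described above can be sketched as a relevance filter followed by a polarity classifier. In this minimal sketch, the keyword-based `toy_relevance` and `toy_polarity` functions are hypothetical stand-ins for the two BERT-based classifiers, used only so the example runs without model weights; the example tweets are invented as well.

```python
from typing import Callable, Iterable, List, Optional, Tuple

def classify_sequentially(
    tweets: Iterable[str],
    is_relevant: Callable[[str], bool],
    polarity: Callable[[str], str],
) -> List[Tuple[str, Optional[str]]]:
    """Stage 1 filters out irrelevant tweets; stage 2 labels the rest by sentiment."""
    results: List[Tuple[str, Optional[str]]] = []
    for text in tweets:
        if not is_relevant(text):
            results.append((text, None))  # irrelevant: no sentiment label assigned
        else:
            results.append((text, polarity(text)))
    return results

# Hypothetical stand-in for the relevance classifier (assumption, not the study's model).
def toy_relevance(text: str) -> bool:
    return "vaccine" in text.lower()

# Hypothetical stand-in for the sentiment polarity classifier (assumption as well).
def toy_polarity(text: str) -> str:
    lowered = text.lower()
    if any(word in lowered for word in ("dangerous", "untested", "refuse")):
        return "negative"
    if any(word in lowered for word in ("safe", "grateful", "protected")):
        return "positive"
    return "neutral"

# Invented examples playing the role of the unlabeled second batch.
second_batch = [
    "The vaccine is untested and dangerous",
    "Got my vaccine today, feeling protected",
    "Vaccine appointments open on Monday",
    "Great football match last night",
]
labelled = classify_sequentially(second_batch, toy_relevance, toy_polarity)

# Only automatically labeled negative tweets would be added to the topic modeling input.
auto_negative = [text for text, label in labelled if label == "negative"]
print(auto_negative)
```

Keeping the two stages as separate callables mirrors the sequential design: the polarity model never sees tweets that the relevance model has rejected.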
Results:
The relevance classifier achieved F-scores of 0.91 and 0.96 for relevant and irrelevant tweets, respectively. The sentiment polarity classifier achieved F-scores of 0.87, 0.85, and 0.85 for negative, neutral, and positive sentiment, respectively. By summarizing the topics obtained with both models, we extracted five main groups of reasons for vaccine hesitancy: concern over vaccine side effects, concern over vaccine effectiveness, concern over insufficiently tested vaccines, mistrust of authorities, and conspiracy theories.
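Per-class F-scores such as those reported above are the harmonic mean of precision and recall, computed separately for each label. The following minimal sketch shows the calculation; the gold labels and predictions are made up for illustration and are not the study's data, and `f1_per_class` is a hypothetical helper name.

```python
def f1_per_class(gold, predicted):
    """Return {label: F1} computed one-vs-rest for every label that occurs."""
    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        pairs = list(zip(gold, predicted))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores

# Invented labels for six tweets (assumption, for illustration only).
gold = ["negative", "negative", "neutral", "positive", "positive", "neutral"]
pred = ["negative", "neutral", "neutral", "positive", "negative", "neutral"]

scores = f1_per_class(gold, pred)
for label, value in scores.items():
    print(f"{label}: {value:.2f}")
```

Reporting one score per class, rather than a single accuracy figure, shows whether a classifier performs unevenly across labels.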
Conclusions:
This paper presents a combination of NLP methods applied to find the reasons for vaccine hesitancy in Serbia. These reasons make it possible to better understand people's concerns about the vaccination process.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.