Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Mar 8, 2022
Date Accepted: May 4, 2022
Date Submitted to PubMed: May 5, 2022

The final, peer-reviewed published version of this preprint can be found here:

Sauvayre R, Vernier J, Chauvière C

An Analysis of French-Language Tweets About COVID-19 Vaccines: Supervised Learning Approach

JMIR Med Inform 2022;10(5):e37831

DOI: 10.2196/37831

PMID: 35512274

PMCID: 9116457

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Using supervised learning to analyze the French vaccine debate on Twitter

  • Romy Sauvayre
  • Jessica Vernier
  • Cédric Chauvière

ABSTRACT

Background:

As the pandemic progressed, disinformation, fake news, and conspiracy theories spread through many parts of society. According to the literature, the disinformation spreading through social media is one of the causes of increased COVID-19 vaccine hesitancy. In this context, the analysis of social media is particularly important, but the large volume of data exchanged on social networks requires specific methods. This is why machine learning and natural language processing (NLP) models are increasingly applied to social media data.

Objective:

The aim of this study is to examine how accurately the CamemBERT French language model can predict a set of elaborated categories, given that tweets about vaccination are often ambiguous, sarcastic, or irrelevant to the studied topic.

Methods:

A total of 901,908 unique French-language tweets related to vaccination, published between July 12, 2021, and August 11, 2021, were extracted through the Twitter API v2. Approximately 2000 randomly selected tweets were labeled according to two types of categorization: (1) arguments for (“pros”) or against (“cons”) vaccination (sanitary measures included) and (2) the type of content of the tweet (“scientific”, “political”, “social”, or “vaccination status”). The CamemBERT model was fine-tuned and tested on the classification of French tweets. Model performance was assessed by computing the F1-score, and confusion matrices were obtained.
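As an illustration of the evaluation step described above, the following is a minimal sketch of how a confusion matrix, per-class F1-scores, and accuracy can be computed from true and predicted labels. It uses only the Python standard library; the function names and the six toy labels are hypothetical and are not taken from the study's data or code.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Return a nested dict: matrix[true_label][predicted_label] -> count."""
    counts = Counter(zip(y_true, y_pred))
    return {t: {p: counts[(t, p)] for p in labels} for t in labels}


def f1_per_class(matrix, labels):
    """Compute the F1-score of each class as 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for label in labels:
        tp = matrix[label][label]
        fp = sum(matrix[t][label] for t in labels if t != label)
        fn = sum(matrix[label][p] for p in labels if p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def accuracy(matrix, labels):
    """Share of examples on the diagonal of the confusion matrix."""
    total = sum(matrix[t][p] for t in labels for p in labels)
    correct = sum(matrix[label][label] for label in labels)
    return correct / total if total else 0.0


# Hypothetical toy labels, for illustration only (not study data).
labels = ["pros", "cons"]
y_true = ["pros", "pros", "cons", "cons", "cons", "pros"]
y_pred = ["pros", "cons", "cons", "cons", "pros", "pros"]

matrix = confusion_matrix(y_true, y_pred, labels)
f1 = f1_per_class(matrix, labels)
macro_f1 = sum(f1.values()) / len(f1)  # unweighted mean over classes
acc = accuracy(matrix, labels)
```

The same functions apply unchanged to the four-class content categorization by passing a longer `labels` list.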

Results:

The accuracy of the fine-tuned model reached up to 70.6% for the first classification (“pros” and “cons” tweets) and up to 90.0% for the second classification (“scientific” and “political” tweets). Furthermore, a tweet had 1.86 times higher odds of being incorrectly classified by the model if it contained fewer than 170 characters (odds ratio 1.86; 95% confidence interval 1.20-2.86).
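To make the odds ratio above concrete, here is a minimal sketch of how an odds ratio and its 95% confidence interval can be derived from a 2x2 table of tweet length (shorter than 170 characters or not) against classification outcome (misclassified or not). It assumes the standard log-odds (Woolf) interval, which may differ from the method the authors used, and the cell counts are hypothetical placeholders rather than the study's data.

```python
import math

LENGTH_THRESHOLD = 170  # character cutoff reported in the abstract


def is_short(tweet_text):
    """True if the tweet has fewer characters than the threshold."""
    return len(tweet_text) < LENGTH_THRESHOLD


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-odds) confidence interval for a 2x2 table.

    a: short tweets, misclassified    b: short tweets, correctly classified
    c: long tweets, misclassified     d: long tweets, correctly classified
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this method")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(odds ratio)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only (not study data).
or_value, ci_low, ci_high = odds_ratio_ci(a=40, b=60, c=30, d=90)
```

An interval whose lower bound stays above 1, as in the reported result, indicates that the association between short tweets and misclassification is unlikely to be due to chance alone at the 5% level.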

Conclusions:

The accuracy is affected by the classification chosen and the topic of the message examined. When the vaccine debate is disrupted by contested political decisions, tweet content becomes so heterogeneous that the accuracy of the models drops for less differentiated classes. However, our tests also showed that the accuracy of the model can be improved by selecting tweets with a new method based on tweet size (number of characters).



© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.