Accepted for/Published in: JMIR Formative Research

Date Submitted: Jul 17, 2022
Date Accepted: May 23, 2023
Date Submitted to PubMed: May 23, 2023

The final, peer-reviewed published version of this preprint can be found here:

Interdisciplinary Approach to Identify and Characterize COVID-19 Misinformation on Twitter: Mixed Methods Study

Isip Tan IT, Cleofas JV, Solano GA, Pillejera JGA, Catapang JK

JMIR Form Res 2023;7:e41134

DOI: 10.2196/41134

PMID: 37220196

PMCID: 10337476

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Topics, Formats and Discursive Strategies of Tweets with COVID-19 Misinformation in the Philippines: Natural Language Processing and Qualitative Analysis

  • Iris Thiele Isip Tan
  • Jerome V Cleofas
  • Geoffrey A Solano
  • Jeanne Genevive A Pillejera
  • Jasper Kyle Catapang

ABSTRACT

Background:

The World Health Organization introduced the term “infodemic” to refer to the high volume of health information made available to the public, which may include misinformation. Although the infodemic is not confined to online spaces, it is most evident on social media. Because Twitter is a popular platform in the Philippines, this study examined tweets containing COVID-19 misinformation posted early in the pandemic.

Objective:

This study aimed to identify tweets containing COVID-19 misinformation and to characterize their topics, formats, and discursive strategies.

Methods:

Tweets geolocated around the Philippine National Capital Region, posted from 1 January to 21 March 2020, and containing the words coronavirus, covid, and ncov were mined using the GetOldTweets3 Python library. This primary corpus (N=12,631) was subjected to biterm topic modeling (BTM). In parallel, the corpus was processed using qualitative analysis and natural language processing (NLP). Key informant interviews (KIIs) were conducted to elicit examples of COVID-19 misinformation circulating within the study period and to determine keywords. Subcorpus A (n=5,881) was identified in NVivo through a combination of word frequency queries and text searches for the keywords derived from the KIIs, and it was manually coded to identify misinformation. Constant comparative, iterative, and consensual analysis was used to identify topics, formats, and discursive strategies. Tweets containing the KII keywords were extracted from the primary corpus to constitute subcorpus B (n=4,634), which contained 506 tweets with COVID-19 misinformation; these were manually labeled.
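Two of the corpus operations described in the Methods can be illustrated with a minimal Python sketch. This is not the authors' pipeline: `tokenize`, `extract_biterms`, and `build_subcorpus` are hypothetical helpers, the keywords and tweets are invented placeholders rather than the KII keywords or study data, and the tokenizer is a crude stand-in for real preprocessing. `extract_biterms` shows the input BTM works on, namely every unordered word pair co-occurring in one short text (here the whole tweet is treated as the co-occurrence window). `build_subcorpus` mimics extracting the tweets that contain any keyword as a whole word, ignoring case.

```python
import re
from itertools import combinations


def tokenize(text):
    # Crude stand-in for real preprocessing: lowercase, keep runs of word characters.
    return re.findall(r"\w+", text.lower())


def extract_biterms(tokens):
    # A biterm is an unordered pair of words co-occurring in the same short text.
    # The whole tweet is the co-occurrence window; each pair is sorted so that
    # (a, b) and (b, a) are represented identically.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def build_subcorpus(tweets, keywords):
    # Keep tweets containing at least one keyword as a whole word, ignoring case.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return [tweet for tweet in tweets if pattern.search(tweet)]


if __name__ == "__main__":
    # Invented example tweets and placeholder keywords (not from the study).
    primary_corpus = [
        "Is there a cure for ncov?",
        "Stay home and wash your hands #covid",
        "May gamot na ba sa coronavirus?",
    ]
    subcorpus = build_subcorpus(primary_corpus, ["cure", "gamot"])
    print(len(subcorpus), "of", len(primary_corpus), "tweets matched")
    print(extract_biterms(tokenize("Wash your hands")))
```

Whole-word matching is deliberately strict; it will miss affixed Filipino word forms and spelling variants common in code-switched tweets, which is one reason manual coding remained necessary.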

Results:

BTM of the primary corpus revealed 15 topics: uncertainty, lawmakers’ response, safety measures, testing, loved ones, health standards, panic buying, tragedies other than COVID-19, economy, COVID-19 statistics, precautions, health measures, international issues, adherence to guidelines, and frontliners. These were grouped into four major topics: nature of COVID, contexts and consequences, people and agents of COVID, and COVID prevention and management. Manual coding of subcorpus A identified 398 tweets containing COVID-19 misinformation. The tweet formats identified were false connection (n=53), false context (n=42), misleading content (n=179), satire and/or parody (n=77), and conspiracy (n=47). Seven discursive strategies were evident: over-positivity (n=32), humor (n=109), marketing (n=27), performing credibility (n=45), anger and disgust (n=59), political commentaries (n=59), and fear mongering (n=67). The most common format and discursive strategy were misleading content and humor, respectively. NLP flagged 165 tweets as containing COVID-19 misinformation; however, manual review showed that 115 of these (69.7%) did not contain misinformation.
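The reported tallies can be cross-checked with a short Python sketch. The dictionaries simply transcribe the counts given in the Results; `percent` and `most_common` are hypothetical helpers introduced for illustration. Both the format counts and the discursive strategy counts sum to the 398 misinformation tweets, and 115 of the 165 NLP-flagged tweets gives the 69.7% figure, which leaves 50 flagged tweets (30.3%) confirmed on manual review.

```python
# Counts transcribed from the Results for the 398 misinformation tweets in subcorpus A.
FORMATS = {
    "false connection": 53,
    "false context": 42,
    "misleading content": 179,
    "satire and/or parody": 77,
    "conspiracy": 47,
}
STRATEGIES = {
    "over-positivity": 32,
    "humor": 109,
    "marketing": 27,
    "performing credibility": 45,
    "anger and disgust": 59,
    "political commentaries": 59,
    "fear mongering": 67,
}

NLP_FLAGGED = 165      # tweets the NLP step labeled as misinformation
NLP_NOT_MISINFO = 115  # of those, tweets manual review found not to be misinformation


def percent(part, whole, digits=1):
    # Share of `whole` represented by `part`, as a rounded percentage.
    return round(100 * part / whole, digits)


def most_common(counts):
    # Category with the largest count (the first one listed wins on ties).
    return max(counts, key=counts.get)


if __name__ == "__main__":
    print("formats total:", sum(FORMATS.values()))
    print("strategies total:", sum(STRATEGIES.values()))
    print("most common format:", most_common(FORMATS))
    print("most common strategy:", most_common(STRATEGIES))
    print("flagged but not misinformation (%):", percent(NLP_NOT_MISINFO, NLP_FLAGGED))
    print("flagged and confirmed (%):", percent(NLP_FLAGGED - NLP_NOT_MISINFO, NLP_FLAGGED))
```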

Conclusions:

An interdisciplinary approach was needed because tweets written in Filipino, or code-switching between Filipino and English, are difficult to analyze with a purely computational approach. Identifying the formats and discursive strategies of the tweets required iterative, manual, and emergent coding by investigators with experiential and cultural knowledge of Twitter. On manual checking, more than two-thirds of the tweets that NLP labeled as containing COVID-19 misinformation turned out not to contain it. Hence, combining computational and qualitative methods was necessary to gain a better understanding of COVID-19 misinformation on Twitter.


Per the author's request the PDF is not available.