JMIR Preprints #44356: Tweeting for Health: An Infodemics Data Ecosystem for Real-Time Mining and AI-Based Analytics for Twitter


Tweeting for Health: An Infodemics Data Ecosystem for Real-Time Mining and AI-Based Analytics for Twitter

Irfhana Zakir Hussain;
Jasleen Kaur;
Matheus Lotto;
Zahid Ahmed Butt;
Plinio Pelegrini Morita

ABSTRACT

Background:

Digital misinformation, primarily on social media, has led to harmful and costly beliefs in the general population. Notably, these beliefs have resulted in public health crises to the detriment of governments around the world and their citizens. However, public health officials currently lack access to a comprehensive system capable of mining and analyzing large volumes of social media data in real time.

Objective:

The aim of this study was to design and develop a big data pipeline and ecosystem, the UbiLab Infodemics Analysis System (U-IAS), for the identification and analysis of false information disseminated via social media on a certain topic or set of related topics.

Methods:

U-IAS is a platform-independent ecosystem developed in Python that leverages the Twitter V2 API and the Elastic Stack. The U-IAS expert system has 5 major components: a) Data Extraction Framework; b) Latent Dirichlet Allocation (LDA) Topic Model; c) Sentiment Analyzer; d) Information Disorder Classification Model; e) Elastic Cloud Deployment (Indexing of data and visualizations). The Data Extraction Framework queries data through the Twitter V2 API, with queries identified by public health experts. The LDA Topic Model, Sentiment Analyzer, and Information Disorder Classification Model are independently trained using a small, expert-validated subset of the extracted data. These models are then incorporated into U-IAS to analyze and classify the remaining data. Finally, the analyzed data is loaded into an index in the Elastic Cloud deployment and can then be presented in dashboards with advanced visualizations and analytics pertinent to infodemics analysis.
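The flow described above (extract, analyze with the three models, then index) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `analyze` and `to_bulk_ndjson`, the field names, and the index name are assumptions made for illustration, and the three model callables are placeholders for the trained LDA, sentiment, and information disorder models. The only external convention relied on is the newline-delimited JSON body that the Elasticsearch bulk API accepts (an action line followed by a document line, with a trailing newline).

```python
import json
from typing import Callable, Dict, Iterable, List


def analyze(
    tweet: Dict,
    topic_fn: Callable[[str], int],
    sentiment_fn: Callable[[str], float],
    label_fn: Callable[[str], str],
) -> Dict:
    """Attach topic, sentiment, and information disorder label to one tweet.

    The three callables are hypothetical stand-ins for the trained models.
    """
    text = tweet["text"]
    return {
        **tweet,
        "topic": topic_fn(text),
        "sentiment": sentiment_fn(text),
        "information_disorder": label_fn(text),
    }


def to_bulk_ndjson(docs: Iterable[Dict], index_name: str) -> str:
    """Serialize analyzed tweets as an Elasticsearch bulk request body.

    Each document becomes two lines: an "index" action, then the document.
    The body ends with a newline, as the bulk API requires.
    """
    lines: List[str] = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy input; real tweets would come from the Data Extraction Framework.
    tweets = [{"id": "1", "text": "example tweet text"}]
    docs = [
        analyze(t, lambda s: 0, lambda s: 0.0, lambda s: "unclassified")
        for t in tweets
    ]
    print(to_bulk_ndjson(docs, "u-ias-demo"))
```

Keeping the models as injected callables mirrors the statement that each model is trained independently and only later incorporated into the pipeline.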

Results:

Each component of the system performs as expected. The data extraction framework handles large volumes of data within short periods of time. The LDA topic models achieved relatively high coherence values (0.54), and the predicted topics are accurate and fit the data well. The sentiment analyzer achieves a correlation coefficient of 0.61 but could be improved in further iterations. The information disorder classifier attained a satisfactory correlation coefficient of 0.76 against the expert-validated data. Moreover, the Elastic Cloud deployment is efficient in its storage of data and comprehensive in its visualization and analytics capabilities. Investigators have successfully used the system to extract important public health insights.
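The abstract does not state which correlation coefficient was used to compare model output with the expert-validated data. As an illustration only, the sketch below assumes Pearson's r between numeric model scores and numeric expert scores; the function name `pearson_r` and the sample values are hypothetical and are not the study's data.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length numeric sequences.

    Assumption: this is one plausible reading of "correlation coefficient";
    the study may have used a different measure.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Made-up scores for illustration, not results from the paper.
    model_scores = [0.9, 0.1, 0.4, 0.8]
    expert_scores = [1.0, 0.0, 0.0, 1.0]
    print(round(pearson_r(model_scores, expert_scores), 2))
```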

Conclusions:

The novel U-IAS pipeline has the potential to detect and analyze misleading information related to a particular topic or set of related topics. Furthermore, this approach can be extended by integrating social media data from multiple sources into dashboards for multiplatform analysis and by testing the ecosystem on other public health use cases.

Citation

Please cite as:

Zakir Hussain I, Kaur J, Lotto M, Butt ZA, Morita PP

Tweeting for Health Using Real-time Mining and Artificial Intelligence–Based Analytics: Design and Development of a Big Data Ecosystem for Detecting and Analyzing Misinformation on Twitter

J Med Internet Res 2023;25:e44356

DOI: 10.2196/44356

PMID: 37294603

PMCID: 10337356


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Nov 18, 2022

Date Accepted: Mar 14, 2023
