Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Mar 7, 2025
Date Accepted: Apr 17, 2025

The final, peer-reviewed published version of this preprint can be found here:

Identifying Disinformation on the Extended Impacts of COVID-19: Methodological Investigation Using a Fuzzy Ranking Ensemble of Natural Language Processing Models

Chen JA, Chung WC, Hung CL, Wu CY

J Med Internet Res 2025;27:e73601

DOI: 10.2196/73601

PMID: 40397945

PMCID: 12138316

Identifying Disinformation on the Extended Impacts of COVID-19: A Methodological Investigation Using a Fuzzy Ranking Ensemble of NLP Models

  • Jian-An Chen
  • Wu-Chun Chung
  • Che-Lun Hung
  • Chun-Ying Wu

ABSTRACT

Background:

During the COVID-19 pandemic, the continuous spread of misinformation on the internet posed an ongoing threat to public trust in, and understanding of, epidemic prevention policies. Even with the pandemic under control, information regarding the risks of long-term COVID-19 effects and reinfection still needs to be integrated into COVID-19 policies.

Objective:

The study introduces a deep learning approach combining language models with a fuzzy rank-based ensemble method for detecting misinformation concerning the long-term impacts of COVID-19.

Methods:

The dataset, comprising 566 genuine and 2361 fake samples, was collected from reliable open sources and refined using data processing techniques. Deep learning models, including HAN, BERT, and XLNet, were then trained on this dataset to detect misinformation about the long-term impacts of COVID-19. A fuzzy rank-based ensemble technique was used to combine the individual models and further improve performance.
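To illustrate how a fuzzy rank-based ensemble can combine several classifiers, the following is a minimal sketch and not the authors' implementation. It assumes one commonly used formulation in which each model's class probability p is mapped to a fuzzy rank by a re-parameterized Gompertz function, 1 - exp(-exp(-2p)), and the class with the lowest product of summed ranks and complement of mean confidence is chosen. The function names, the constant 2.0, and the example probabilities are all hypothetical; the abstract does not specify them.

```python
import math


def gompertz_rank(p: float) -> float:
    """Map a confidence score p in [0, 1] to a fuzzy rank (lower is better).

    Assumed form: 1 - exp(-exp(-2p)); the abstract does not give the formula.
    """
    return 1.0 - math.exp(-math.exp(-2.0 * p))


def fuzzy_rank_ensemble(model_probs):
    """Fuse per-model class probabilities into one prediction.

    model_probs: list with one probability list per model,
                 each holding one probability per class.
    Returns (predicted_class_index, fused_scores).
    """
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    scores = []
    for k in range(n_classes):
        # Sum of fuzzy ranks that the models assign to class k.
        rank_sum = sum(gompertz_rank(probs[k]) for probs in model_probs)
        # Complement of the mean confidence the models give class k.
        complement_conf = 1.0 - sum(probs[k] for probs in model_probs) / n_models
        scores.append(rank_sum * complement_conf)
    # The class with the smallest fused score wins.
    best = min(range(n_classes), key=lambda k: scores[k])
    return best, scores


# Hypothetical outputs [P(genuine), P(fake)] from three models
# (standing in for HAN, BERT, and XLNet).
probs_by_model = [
    [0.30, 0.70],
    [0.55, 0.45],
    [0.20, 0.80],
]
label, scores = fuzzy_rank_ensemble(probs_by_model)
print(label, scores)
```

In this made-up example the second model leans toward "genuine", but the fused score still selects index 1 ("fake") because the other two models assign it high confidence.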

Results:

After training on the dataset, various classification methods were evaluated on the test set, including the fuzzy rank-based method and state-of-the-art large language models. The fuzzy rank-based ensemble method, which combines multiple language models, achieved an F1-score of 96.03%.
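For readers unfamiliar with the metric, the sketch below shows how an F1-score is computed from confusion-matrix counts and why it is more informative than plain accuracy on an imbalanced dataset such as this one (566 genuine vs 2361 fake samples). The counts passed to `f1_score` are hypothetical and are not results from the study; the abstract also does not state which class or averaging scheme the reported F1-score refers to.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for one class."""
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Share of the majority ("fake") class in the full dataset from the abstract.
# A classifier that always answers "fake" would reach this accuracy on the
# full dataset without learning anything.
majority_share = 2361 / (566 + 2361)
print(f"majority-class share: {majority_share:.4f}")

# Hypothetical counts, for illustration only.
example_f1 = f1_score(tp=90, fp=10, fn=10)
print(f"example F1: {example_f1:.4f}")
```

Because roughly 81% of the samples are fake, accuracy alone can look high for a trivial classifier, whereas F1 penalizes missed or spurious detections of the class being scored.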

Conclusions:

Combining ensemble learning over pretrained language models (PLMs) with a fuzzy rank-based methodology built on the Gompertz function yields a novel prediction approach with the potential to improve accuracy and reliability. Additionally, the experimental results suggest that training solely on textual content can yield high prediction accuracy.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.