Accepted for/Published in: JMIR Infodemiology

Date Submitted: Jan 27, 2024
Date Accepted: Dec 24, 2024
Date Submitted to PubMed: Jan 15, 2025

The final, peer-reviewed published version of this preprint can be found here:

Transformer-Based Tool for Automated Fact-Checking of Online Health Information: Development Study

Bayani A, Ayotte A, Nikiema JN

JMIR Infodemiology 2025;5:e56831

DOI: 10.2196/56831

PMID: 39812653

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Automated fact-checking of online health-related information: a novel approach

  • Azadeh Bayani
  • Alexandre Ayotte
  • Jean Noel Nikiema

ABSTRACT

Background:

Many people seek health-related information online. The potential dangers of misinformation have made the importance of reliable information particularly evident, yet distinguishing true, reliable information from false information has become increasingly challenging.

Objective:

This study introduces a novel approach to automating the fact-checking process, using PubMed resources as a source of truth and natural language processing (NLP) transformer models to enhance the process.

Methods:

A total of 538 health-related webpages covering seven disease subjects were manually selected by Factually Health Company. The process comprised the following steps: (i) the contents of the webpages were classified into three thematic categories (semiology, epidemiology, and management) using a Bidirectional Encoder Representations from Transformers (BERT) model; (ii) for each category in a webpage, a PubMed query was automatically generated using a combination of the “WellcomeBertMesh” and “KeyBERT” models; (iii) the 20 most relevant articles were automatically retrieved from PubMed; and (iv) cosine similarity and Jaccard distance were applied to compare the content of the retrieved literature with that of the webpages.
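The two measures named in step (iv) can be illustrated with a minimal sketch. It assumes simple term-count vectors as a stand-in for the BERT sentence vectors used in the study, and the example sentences and all function names are hypothetical rather than taken from the authors' code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * vec_b.get(term, 0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_distance(tokens_a, tokens_b):
    """One minus the ratio of shared tokens to all distinct tokens."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


# Hypothetical example sentences, not taken from the study data.
web_sentence = "Regular exercise lowers blood pressure in adults"
paper_sentence = "Regular exercise reduces blood pressure in adults"

web_tokens = tokenize(web_sentence)
paper_tokens = tokenize(paper_sentence)

# Term-count vectors are an assumption; the study vectorized sentences with BERT.
cos = cosine_similarity(Counter(web_tokens), Counter(paper_tokens))
jac = jaccard_distance(web_tokens, paper_tokens)
print(f"cosine similarity: {cos:.3f}")
print(f"Jaccard distance:  {jac:.3f}")
```

Cosine similarity is higher for more similar sentences, whereas Jaccard distance is lower, so the two scores point in opposite directions and should not be compared against the same cutoff.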

Results:

The BERT model for categorizing webpage contents performed well, with F1-scores and recall of 93% and 94% for semiology and epidemiology, respectively, and 96% for both recall and F1-score for management. For each of the three categories in a webpage, one PubMed query was generated, and for each query the 20 most relevant open-access articles classified as systematic reviews or meta-analyses were retrieved. Fewer than 10% of the retrieved articles were irrelevant; these were removed. On average, 23% of the sentences in each webpage were found to be very similar to the literature. During the evaluation, cosine similarity outperformed Jaccard distance when comparing sentences from webpages and academic papers vectorized by BERT. However, false positives were a significant issue among the retrieved sentences: some sentence pairs had a similarity score exceeding 80% but could not be considered similar.
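A per-webpage share of matched sentences, such as the 23% average reported above, could be computed along the following lines. This is a sketch under stated assumptions: the scores are hypothetical, the function name is illustrative, and using 0.80 as the cutoff is an assumption drawn from the "exceeding 80%" remark rather than a documented setting of the study.

```python
def share_of_matched_sentences(score_rows, threshold=0.80):
    """Fraction of webpage sentences whose best literature match exceeds the threshold.

    score_rows[i] holds the similarity scores between webpage sentence i and
    every candidate sentence from the retrieved articles.
    """
    if not score_rows:
        return 0.0
    matched = sum(1 for row in score_rows if row and max(row) > threshold)
    return matched / len(score_rows)


# Hypothetical scores: four webpage sentences, three literature sentences each.
scores = [
    [0.91, 0.40, 0.35],
    [0.55, 0.62, 0.48],
    [0.30, 0.79, 0.80],
    [0.20, 0.25, 0.10],
]

share = share_of_matched_sentences(scores)
print(f"{share:.0%} of sentences exceed the threshold")
```

A score above the cutoff only shows that two vectors are close, not that the sentences agree in meaning, which is how the false positives described above can enter the count.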

Conclusions:

This study proposes an approach to automating the fact-checking of health-related online information. Incorporating content from PubMed or other scientific article databases as trustworthy resources can automate the discovery of similarly credible information in the health domain.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.