Accepted for/Published in: JMIR Formative Research

Date Submitted: Sep 21, 2023
Date Accepted: Dec 13, 2023

The final, peer-reviewed published version of this preprint can be found here:

Automated Credibility Assessment of Web-Based Health Information Considering Health on the Net Foundation Code of Conduct (HONcode): Model Development and Validation Study

Bayani A, Ayotte A, Nikiema JN

JMIR Form Res 2023;7:e52995

DOI: 10.2196/52995

PMID: 38133919

PMCID: 10770789

Automated Credibility Assessment of Online Health Information Considering HONcode: Model Development and Validation Study

  • Azadeh Bayani
  • Alexandre Ayotte
  • Jean Noel Nikiema

ABSTRACT

Background:

An increasing number of users are turning to online sources, which have become a significant source of patient information, for health care guidance. Trustworthy sources of information should therefore be automatically identifiable using objective criteria.

Objective:

The purpose of this study was to automate the assessment of the HONcode criteria, enhancing our ability to pinpoint trustworthy health information sources.

Methods:

A dataset of 538 webpages displaying health content was collected from 43 health-related websites. The HONcode criteria were split into two levels: webpage and website. For the website-level criteria (confidentiality, transparency, financial disclosure, and advertising policy), a bag of keywords was generated for each criterion and validated by consensus. For the webpage-level criteria (authority, complementarity, justifiability, and attribution), a machine learning (ML) approach was used, and 200 webpages were manually annotated. Three ML models (Random Forest [RF], Support Vector Machine [SVM], and BERT) were trained on the initial annotated data and evaluated. A second training step was implemented for the complementarity criterion, using the BERT model for multiclass classification of the complementarity sentences obtained by annotation and data augmentation (positive, negative, and noncommittal complementarity sentences). Finally, the remaining webpages were classified using the selected model, and 100 sentences were randomly selected for manual review.
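To make the website-level step concrete, the bag-of-keywords check can be sketched as follows. This is a minimal illustration only: the keyword lists in `CRITERION_KEYWORDS` and the function name `assess_website` are hypothetical, since the abstract does not give the study's validated word lists or its matching rules.

```python
import re

# Hypothetical keyword lists, for illustration only. The study's
# consensus-validated lists are not given in the abstract.
CRITERION_KEYWORDS = {
    "confidentiality": ["privacy policy", "personal data", "confidential"],
    "transparency": ["contact us", "about us", "email"],
    "financial disclosure": ["funded by", "sponsor", "funding"],
    "advertising policy": ["advertising policy", "advertisement", "ads"],
}


def assess_website(pages):
    """Return {criterion: bool} for one website, given the text of its pages.

    A criterion counts as met if any of its keywords occurs as a whole
    word or phrase anywhere in the site's pages (case-insensitive).
    """
    text = " ".join(pages).lower()
    result = {}
    for criterion, keywords in CRITERION_KEYWORDS.items():
        result[criterion] = any(
            re.search(r"\b" + re.escape(keyword) + r"\b", text)
            for keyword in keywords
        )
    return result


pages = ["Read our Privacy Policy.", "Contact us by phone."]
flags = assess_website(pages)
```

A simple presence test like this is easy to audit, but it cannot tell a real disclosure from an incidental mention of the same word.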

Results:

The website-level criteria were assessed effectively with the bag-of-words approach for confidentiality and transparency, but this method was less effective for the other website-level criteria. For the webpage-level criteria, the RF model performed well for the attribution criterion but poorly for the others, whereas BERT and SVM performed consistently across the criteria. BERT had the highest area under the curve (AUC) for neutral sentences, justifiability, and attribution (0.96, 0.98, and 1.00, respectively). SVM had the best overall performance for the classification of complementarity, with an AUC of 0.98. SVM and BERT had an equal AUC of 0.98 for the authority criterion. For the identification of website-level criteria, the model retrieved webpages with an accuracy of 97% for confidentiality, 82% for transparency, 51% for financial disclosure, and 51% for advertising policy. In the final manual review of the sentences, precision was 0.88 and the agreement between reviewers was 0.82.
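As a rough illustration of how the two manual-review figures could be derived, the sketch below computes precision from reviewer judgments and an agreement score between two reviewers. The abstract does not say which agreement statistic was used; Cohen's kappa is assumed here purely for illustration, and the function names and sample data are hypothetical.

```python
from collections import Counter


def review_precision(judgments):
    """Share of model-labelled sentences that reviewers judged correct.

    `judgments` is a list of booleans, one per reviewed sentence.
    """
    return sum(judgments) / len(judgments) if judgments else 0.0


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    Assumption: the abstract does not name its agreement statistic.
    """
    n = len(rater_a)
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Agreement expected by chance, from each rater's label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # Both raters used a single identical label throughout.
    return (observed - expected) / (1 - expected)


# Hypothetical toy data, not taken from the study.
judgments = [True, True, True, False]
reviewer_1 = [1, 1, 0, 0]
reviewer_2 = [1, 1, 0, 1]

toy_precision = review_precision(judgments)
toy_kappa = cohen_kappa(reviewer_1, reviewer_2)
```

Kappa corrects raw agreement for chance, so it is usually lower than the simple proportion of matching labels.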

Conclusions:

Our results showed the potential of a BERT model for automating the classification of HONcode criteria in web content. It achieved higher performance than the traditional approaches using RF and SVM. This approach could be used with different types of pretrained models to accelerate text annotation and classification and to improve performance in low-resource cases.



© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.