Accepted for/Published in: JMIR Formative Research

Date Submitted: Jun 12, 2024
Date Accepted: Oct 10, 2024

The final, peer-reviewed published version of this preprint can be found here:

Comparing the Accuracy of Two Generated Large Language Models in Identifying Health-Related Rumors or Misconceptions and the Applicability in Health Science Popularization: Proof-of-Concept Study

Luo Y, Miao Y, Zhao Y, Li J, Chen Y, Yue Y, Wu Y

JMIR Form Res 2024;8:e63188

DOI: 10.2196/63188

PMID: 39622076

PMCID: 11627524

Comparing the Accuracy of Two Generated Large Language Models in Identifying Health-Related Rumors or Misconceptions and the Applicability in Health Science Popularization: A Proof of Concept

  • Yuan Luo; 
  • Yiqun Miao; 
  • Yuhan Zhao; 
  • Jiawei Li; 
  • Yuling Chen; 
  • Yuexue Yue; 
  • Ying Wu

ABSTRACT

Background:

Health-related rumors and misconceptions are spreading at an increasing rate as use of the internet grows.

Objective:

To investigate the accuracy of two large language models, GPT-4 and ERNIE Bot 4.0, in identifying health-related rumors or misconceptions and their applicability in health science popularization.

Methods:

Twenty health-related rumors or misconceptions, along with 10 health truths, were input in random order into GPT-4 and ERNIE Bot 4.0.

Results:

For the health-related rumors or misconceptions, GPT-4 and ERNIE Bot 4.0 both achieved a 100% (20/20) accuracy rate. For the health truths, their accuracy rates were 70% (7/10) and 100% (10/10), respectively. Both models mostly provided widely recognized viewpoints without obvious errors. The mean readability score of the generated health essays was 2.92 (SD 0.85) for GPT-4 and 3.02 (SD 0.84) for ERNIE Bot 4.0 (P=0.647). Except for content and cultural appropriateness, significant differences between the two models were observed in the total score and in the scores of all other dimensions (P<0.05).
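The accuracy rates reported above are simple proportions of correct verdicts over the items tested. A minimal sketch (not the authors' code; labels and verdicts below are hypothetical, mirroring the 7/10 figure) of how such a rate can be computed:

```python
def accuracy(verdicts, truths):
    """Fraction of items a model classified correctly."""
    correct = sum(v == t for v, t in zip(verdicts, truths))
    return correct / len(truths)

# Hypothetical example: 10 true health statements, a model that
# wrongly flags 3 of them as rumors.
truth_labels = [True] * 10
model_verdicts = [True] * 7 + [False] * 3
print(accuracy(model_verdicts, truth_labels))  # 0.7
```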

Conclusions:

Both models offer widely accepted viewpoints. The health essays they generate are linguistically strong but require expert review before release. ERNIE Bot 4.0 aligns better with Chinese expression habits. Clinical Trial: No application.


Per the author's request the PDF is not available.

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.