Comparing the Accuracy of Two Generated Large Language Models in Identifying Health-Related Rumors or Misconceptions and the Applicability in Health Science Popularization: Proof-of-Concept Study
Yuan Luo;
Yiqun Miao;
Yuhan Zhao;
Jiawei Li;
Yuling Chen;
Yuexue Yue;
Ying Wu
ABSTRACT
Background:
Health-related rumors and misconceptions are spreading increasingly fast as the internet develops.
Objective:
To investigate the accuracy of GPT-4 and ERNIE Bot 4.0 in identifying health-related rumors or misconceptions and the applicability of their generated content to health science popularization.
Methods:
Twenty health rumors and misconceptions, along with 10 health truths, were input in random order into GPT-4 and ERNIE Bot 4.0.
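The evaluation described above amounts to comparing each model's verdicts against ground-truth labels and computing an accuracy rate. The sketch below illustrates that comparison; the statements, labels, model outputs, and the `accuracy` helper are all hypothetical stand-ins, not the study's actual data or pipeline.

```python
from fractions import Fraction

# Hypothetical ground-truth labels: True = rumor/misconception, False = truth.
# The statements below are invented for illustration only.
statements = {
    "Vitamin C megadoses cure the common cold": True,
    "Microwaving food destroys all its nutrients": True,
    "Regular handwashing reduces infection risk": False,
}

def accuracy(verdicts, labels):
    """Fraction of statements the model classified correctly."""
    correct = sum(verdicts[s] == labels[s] for s in labels)
    return Fraction(correct, len(labels))

# Mocked model outputs (stand-ins for GPT-4 / ERNIE Bot 4.0 responses).
model_a = {s: True for s in statements}  # flags every statement as a rumor
model_b = dict(statements)               # matches the labels exactly

print(accuracy(model_a, statements))  # 2/3: misclassifies the one true statement
print(accuracy(model_b, statements))  # 1: all statements classified correctly
```

Using `Fraction` keeps the rates in the same n/N form reported in the abstract (for example, 7/10) rather than as rounded decimals.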
Results:
For health-related rumors or misconceptions, both models achieved 100% (20/20) accuracy. For health truths, accuracy was 70% (7/10) for GPT-4 and 100% (10/10) for ERNIE Bot 4.0. Both models mostly provided widely recognized viewpoints without obvious errors. The mean readability score of the generated health essays was 2.92 (SD 0.85) for GPT-4 and 3.02 (SD 0.84) for ERNIE Bot 4.0 (P=0.647). Except for content and cultural appropriateness, significant differences between the two models were observed in the total score and in the scores of all other dimensions (P<0.05).
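The readability comparison above can be reproduced from the reported summary statistics as a Welch two-sample t statistic. In this sketch, the group size n=20 per model is an assumed value for illustration, since the number of essays per model is not stated in the abstract.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic for two independent samples, from summary stats."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)  # standard error of the difference
    return (mean1 - mean2) / se

# Readability scores from the abstract; n=20 per group is an assumption.
t = welch_t(2.92, 0.85, 20, 3.02, 0.84, 20)
print(round(t, 3))  # a small |t|, consistent with the nonsignificant P=0.647
```

A |t| well below the critical value for any reasonable degrees of freedom is what yields the large P value reported, i.e., no detectable readability difference between the two models.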
Conclusions:
Both models offered widely accepted viewpoints. The generated health essays were linguistically strong but require expert review before dissemination. ERNIE Bot 4.0 aligned better with Chinese expression habits. Clinical Trial: Not applicable.