Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Dec 31, 2024
Open Peer Review Period: Dec 31, 2024 - Feb 25, 2025
Date Accepted: Apr 13, 2025
Identification of Online Health Information Using Large Pre-trained Language Models: An Effectiveness Assessment
ABSTRACT
Background:
Online health information is widely available, but a significant portion of it is inaccurate or misleading, including exaggerated, incomplete, or unverified claims. Such misinformation can strongly influence public health decisions and pose serious challenges to healthcare systems. With advances in artificial intelligence and natural language processing, large pre-trained language models (LLMs) have shown promise in identifying and distinguishing misleading health information, though their effectiveness in this area remains underexplored.
Objective:
This paper aims to evaluate the performance of four mainstream LLMs (ChatGPT-3.5, ChatGPT-4.0, ERNIE Bot, and iFLYTEK Spark) in the identification of online health information, providing empirical evidence for their practical application in this field.
Methods:
Web scraping was used to collect data from online rumor-refuting websites, yielding 2,708 samples of online health information comprising both true and false claims. The four LLMs' APIs were employed for authenticity verification, with expert assessments serving as the benchmark. Model performance was evaluated using semantic similarity, accuracy, recall, F1 score, and content analysis.
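The classification metrics above can be computed directly from paired expert and model verdicts. A minimal sketch follows; the binary label encoding (1 = claim judged true, 0 = claim judged false/misleading) and the sample label lists are illustrative assumptions, not values drawn from the study's data.

```python
# Sketch of accuracy, recall, and F1 computed against expert benchmark labels.
# Labels here are hypothetical: 1 = claim judged true, 0 = false/misleading.

def classification_metrics(expert, predicted):
    """Return (accuracy, recall, f1), treating label 1 as the positive class."""
    tp = sum(1 for e, p in zip(expert, predicted) if e == 1 and p == 1)
    fp = sum(1 for e, p in zip(expert, predicted) if e == 0 and p == 1)
    fn = sum(1 for e, p in zip(expert, predicted) if e == 1 and p == 0)
    correct = sum(1 for e, p in zip(expert, predicted) if e == p)

    accuracy = correct / len(expert)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, recall, f1

# Illustrative expert benchmark vs. one model's verdicts (not study data).
expert_labels = [1, 0, 1, 1, 0, 1, 0, 0]
model_labels  = [1, 0, 1, 0, 0, 1, 1, 0]
acc, rec, f1 = classification_metrics(expert_labels, model_labels)
```

In practice such metrics would be computed per model over all 2,708 samples; reporting F1 alongside accuracy matters here because the true/false classes in rumor-refutation corpora are rarely balanced.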
Results:
The study found that the four models performed well in identifying online health information. Among them, ChatGPT-4.0 achieved the highest accuracy at 87.27%, followed by ERNIE Bot at 87.25%, iFLYTEK Spark at 87.00%, and ChatGPT-3.5 at 81.82%. Furthermore, text length and semantic similarity analysis showed that ERNIE Bot had the highest similarity to expert texts, while ChatGPT-4.0 showed good overall consistency in its explanations. Further analysis suggested that the primary areas of misjudgment included child health, drug nutrition, disease prevention, lifestyle, food safety, and diet. Overall, the research suggests that LLMs have potential in online health information identification; however, their understanding of certain specialized health topics may require further improvement.
Conclusions:
The results demonstrate that while these models show potential as assistive tools, their performance varies significantly in accuracy, semantic understanding, and cultural adaptability. Key findings highlight the models' ability to generate accessible and context-aware explanations; however, they fall short in areas requiring specialized medical knowledge or updated data, particularly for emerging health issues and context-sensitive scenarios. Significant discrepancies were observed in the models' ability to distinguish scientifically verified knowledge from popular misconceptions and in their stability when processing complex linguistic and cultural contexts. These challenges underscore the importance of refining training methodologies to improve the models' reliability and adaptability. Future research should focus on enhancing the models' capability to manage nuanced health topics and diverse cultural and linguistic contexts, thereby facilitating their broader adoption as reliable tools for online health information identification.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.