JMIR Preprints #72524: Performance of Large Language Models (LLMs) in the Cognitive Analysis of Misinformation: Evaluation Study

Performance of Large Language Models (LLMs) in the Cognitive Analysis of Misinformation: Evaluation Study

Dominika Nadia Wojtczak;
Ryan McConville;
Cheryl McQuire;
Luisa Zuccolo;
Claudia Peersman

ABSTRACT

Background:

The rapid spread of misinformation on social media platforms significantly affects public discourse. Human moderators can perform well, but their work is difficult to scale. While Large Language Models (LLMs) show great potential across various language tasks, their capacity for the cognitive and contextual analysis needed to detect and interpret misinformation remains less explored.

Objective:

This study evaluates the effectiveness of LLMs in detecting and interpreting misinformation compared to human annotators, focusing on tasks requiring cognitive analysis and complex judgment. Additionally, we analyse the influence of different prompt engineering strategies on model performance and discuss ethical considerations for using LLMs in content moderation systems.

Methods:

We compared four OpenAI models with a panel of human annotators using a subset of posts from the MuMiN dataset. Each model and human annotator answered structured questions about misinformation, following an established cognitive framework, and provided a score indicating how confident they were in each response. We used several prompting strategies, including zero-shot, few-shot, and chain-of-thought prompting, and evaluated performance using precision, recall, F1 score, and accuracy. We used statistical tests, including McNemar's test, to quantitatively assess differences between LLM and human ratings of misinformation.
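The evaluation measures named above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the 0/1 label encoding (1 = misinformation), and the choice of the exact (binomial) form of McNemar's test are assumptions made for illustration, since the abstract does not say which variant of the test was used.

```python
from math import comb


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = misinformation)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test comparing two raters on the same posts.

    Only discordant pairs matter: b counts posts where rater A is correct
    and rater B is wrong, c counts the reverse.
    """
    b = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a == t and o != t)
    c = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a != t and o == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no disagreement, nothing to test
    k = min(b, c)
    # Two-sided binomial tail under H0: discordant pairs split 50/50.
    p_value = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n)
    return b, c, p_value


# Hypothetical toy labels, not data from the study.
truth = [1, 1, 1, 1, 0, 0, 1, 0, 1, 1]
llm = [1, 1, 0, 1, 1, 0, 1, 0, 1, 0]
human = [1, 1, 1, 1, 0, 0, 1, 1, 1, 0]

print(binary_metrics(truth, llm))
print(binary_metrics(truth, human))
print(mcnemar_exact(truth, llm, human))
```

McNemar's test suits this design because the LLM and the human annotators rate the same posts, so the comparison is paired rather than between independent samples.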

Results:

GPT-4 Turbo with chain-of-thought prompting achieved the highest performance of all LLMs for detecting misinformation, with an accuracy of 67.2% and an F1 score of 78.3%, but was outperformed by human annotators, who achieved 70.1% accuracy and an F1 score of 81.0%. LLMs performed well in tasks involving logical reasoning and straightforward misinformation detection but struggled with complex judgments, including detecting sarcasm, understanding misinformation, and analysing user intent. LLM confidence scores correlated positively with accuracy in simpler tasks (ρ = 0.72, p < 0.01) but were less reliable in subjective and complex contextual evaluations.
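The abstract does not state which correlation statistic underlies the reported relationship between confidence and accuracy. Assuming a rank correlation (Spearman) was used, the following sketch shows how such a coefficient could be computed from per-item confidence scores and correctness flags. The helper names and toy values are hypothetical and do not come from the study.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: model confidence and whether each answer was correct.
confidence = [0.95, 0.90, 0.60, 0.85, 0.40, 0.70, 0.55, 0.99]
correct = [1, 1, 0, 1, 0, 1, 0, 1]

print(round(spearman(confidence, correct), 3))
```

A coefficient near 1 would indicate that higher confidence tends to accompany correct answers, which is the pattern the abstract reports for the simpler tasks.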

Conclusions:

LLMs show significant potential for automating misinformation detection. However, their limitations in understanding and interpreting these posts highlight the current necessity of human oversight. A hybrid framework combining LLMs for preliminary screening with human moderators for more complex evaluation presents a promising future direction. Future research could prioritise the fine-tuning of LLMs using datasets that emphasise cognitive and emotional linguistic features, alongside the development of advanced prompting techniques.
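The hybrid framework proposed above could take many forms. One possible shape is sketched below in Python, assuming the LLM returns a label and a confidence score on a 0 to 1 scale. The `Screening` record, the `needs_context` flag, and the 0.8 threshold are all hypothetical choices for illustration, not details from the study.

```python
from dataclasses import dataclass


@dataclass
class Screening:
    """Result of the LLM's preliminary pass over one post (hypothetical schema)."""
    post_id: str
    llm_label: str        # "misinformation" or "not_misinformation"
    confidence: float     # assumed to lie in [0, 1]
    needs_context: bool   # e.g. possible sarcasm or unclear user intent


def route(screening, threshold=0.8):
    """Send a post to a human moderator unless the LLM result is clear-cut.

    Posts flagged as needing contextual judgment always go to a human,
    because the abstract reports that LLMs struggle with those tasks.
    """
    if screening.needs_context or screening.confidence < threshold:
        return "human_review"
    return "auto_" + screening.llm_label


# Hypothetical examples.
queue = [
    Screening("p1", "misinformation", 0.93, False),
    Screening("p2", "not_misinformation", 0.55, False),
    Screening("p3", "misinformation", 0.97, True),
]
for item in queue:
    print(item.post_id, route(item))
```

In practice the threshold would need to be calibrated per task, since the reported link between confidence and accuracy was weaker for subjective and contextual evaluations.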

Citation

Please cite as:

Wojtczak DN, McConville R, McQuire C, Zuccolo L, Peersman C

Performance of Large Language Models in the Cognitive Analysis of Misinformation: Evaluation Study

JMIR Infodemiology 2026;6:e72524

DOI: 10.2196/72524

PMID: 42149639

PMCID: 13227085

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Infodemiology

Date Submitted: Feb 11, 2025

Date Accepted: Jul 29, 2025
