Currently accepted at: Journal of Medical Internet Research
Date Submitted: Dec 1, 2025
Date Accepted: Mar 28, 2026
This paper has been accepted and is currently in production.
It will appear shortly on 10.2196/88766
Performance of AI tools in citing retracted literature
ABSTRACT
Background:
Artificial intelligence is increasingly used in scientific research to generate, refine, and summarize literature. Its ability to process large datasets promises greater efficiency in evidence synthesis and review. However, generative AI tools often produce inaccurate results and may cite retracted or unreliable studies without warning, posing risks to research integrity. Whether these systems can reliably detect and exclude retracted publications remains unclear.
Objective:
In this pragmatic trial, nine freely available generative AI tools were tested for their ability to answer questions without citing retracted literature.
Methods:
Each generative AI tool was asked five standardized questions about 15 different retracted articles. The articles were chosen from the Retraction Watch database and included the most cited and most recently retracted articles. All questions were repeated twice to assess consistency, and answers were rated for accuracy and reliability.
Results:
None of the nine AI tools consistently identified or excluded retracted articles. ChatGPT-5 performed best (8/15, 53.3% correct), while SciSpace, ScienceOS, and Consensus showed no fully correct results. Microsoft Copilot achieved the highest topic-overview accuracy (87%), and ChatGPT-4 showed the greatest consistency (97.2%). OpenEvidence performed reliably within the medical literature but reached perfect accuracy in only 2 of 13 (15.4%) cases.
Conclusions:
No free generative AI tool can reliably detect or exclude retracted studies. Even the best-performing systems missed a substantial proportion of retracted articles. Until retraction-aware verification is integrated, independent source checking remains essential to preserve research integrity. Trial Registration: https://doi.org/10.17605/OSF.IO/B6J2W
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.