Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Sep 19, 2023
Date Accepted: Mar 12, 2024
Citations and References in Scholarly Writing: A Cross-Disciplinary Evaluation of Large Language Model Performance and Reliability
ABSTRACT
Background:
Recent advancements in natural language processing have given rise to Large Language Models (LLMs), such as ChatGPT (GPT-3.5), capable of generating scholarly content, including citations and references. Assessing the accuracy of these AI-generated citations is imperative for maintaining scholarly rigor.
Objective:
The aim of this study was to assess the accuracy of citations and references generated by ChatGPT (GPT-3.5) in two distinct academic domains: Natural Sciences and Humanities.
Methods:
Two researchers independently prompted ChatGPT to write an introduction section for a manuscript and to include citations; they then evaluated the accuracy of the citations and their DOIs. Results were compared between the two disciplines.
Results:
Ten topics were included, 5 in natural sciences and 5 in humanities. A total of 102 citations were generated, 55 in natural sciences and 47 in humanities. Of these, 40 citations (72.7%) in natural sciences and 36 (76.6%) in humanities were real (P = 0.415). Significant disparities were found in DOI presence (Natural Sciences: 70.9% vs. Humanities: 38.3%) and DOI accuracy (32.7% vs. 8.5%). DOI hallucination was more prevalent in the Humanities (89.4%). The Levenshtein distance was significantly higher in the Humanities, indicating lower DOI accuracy.
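The abstract does not say how the Levenshtein distance was computed. As an illustration only, the following minimal sketch assumes the standard edit distance with unit costs, applied character by character to a generated DOI and a reference DOI. The function name `levenshtein` and both DOI strings are hypothetical and are not taken from the study.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the first i-1 characters
    # of a and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


# Hypothetical DOIs, made up for illustration; not from the study data.
generated_doi = "10.1000/example.2020.123"
actual_doi = "10.1000/example.2021.128"
print(levenshtein(generated_doi, actual_doi))
```

A distance of 0 means the generated DOI matches the reference exactly; larger values mean more character edits separate the two strings, which is the sense in which a higher distance indicates lower DOI accuracy.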
Conclusions:
ChatGPT's performance in generating citations and references varies across disciplines. Differences in DOI standards and disciplinary nuances contribute to performance variations. Researchers should consider AI writing tools' strengths and limitations in citation accuracy. Domain-specific models may enhance accuracy.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.