
Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Jul 24, 2023
Open Peer Review Period: Jul 24, 2023 - Sep 18, 2023
Date Accepted: Apr 4, 2024

The final, peer-reviewed published version of this preprint can be found here:

The Use of Generative AI for Scientific Literature Searches for Systematic Reviews: ChatGPT and Microsoft Bing AI Performance Evaluation

Gwon YN, Kim JH, Chung HS, Jung EJ, Chun J, Lee S, Shim SR

The Use of Generative AI for Scientific Literature Searches for Systematic Reviews: ChatGPT and Microsoft Bing AI Performance Evaluation

JMIR Med Inform 2024;12:e51187

DOI: 10.2196/51187

PMID: 38771247

PMCID: 11107769

Can ChatGPT Perform Scientific Literature Searches Competitively?

  • Yong Nam Gwon
  • Jae Heon Kim
  • Hyun Soo Chung
  • Eun Jee Jung
  • Joey Chun
  • Serin Lee
  • Sung Ryul Shim

ABSTRACT

Background:

A large language model (LLM) is a type of artificial intelligence (AI) model that opens up great possibilities for healthcare practice, research, and education, although scholars have highlighted the need to proactively address current issues with its use. One of the best-known LLMs is ChatGPT.

Objective:

This study aims to explore the potential of ChatGPT as a real-time literature search tool for systematic reviews and clinical decision support systems (CDSSs).

Methods:

The search results of a systematic review on the treatment of Peyronie's disease, published by human experts, were selected as the benchmark, and that study's literature search formula was applied to ChatGPT and Microsoft Bing for comparison with the human researchers. To determine the accuracy of the retrieved literature, we graded each result as A, B, C, or F, considering only cases where the cited literature actually exists.

Results:

The benchmark human search identified 24 randomized controlled trials. ChatGPT retrieved 1287 records across 639 queries, 7 of which exactly matched the human search results; Microsoft Bing retrieved 48 records across 223 queries, 19 of which exactly matched.
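For illustration, the reported counts can be expressed as exact-match rates against the 24-trial human benchmark. This is a minimal sketch of that arithmetic only; the recall/precision framing here is our own, not the paper's A–F grading scheme:

```python
# Exact-match rates against the 24-trial human benchmark,
# using the counts reported in the Results section.
benchmark_total = 24

tools = {
    "ChatGPT": {"retrieved": 1287, "queries": 639, "matched": 7},
    "Microsoft Bing": {"retrieved": 48, "queries": 223, "matched": 19},
}

for name, t in tools.items():
    recall = t["matched"] / benchmark_total    # share of the benchmark found
    precision = t["matched"] / t["retrieved"]  # share of retrieved records that match
    print(f"{name}: recall={recall:.1%}, precision={precision:.1%}")
```

Under this framing, ChatGPT recovers roughly 29% of the benchmark trials at well under 1% precision, while Bing recovers roughly 79% at about 40% precision, consistent with the raw counts above.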

Conclusions:

This is the first study to compare artificial intelligence (AI) with conventional human systematic review methods as a real-time literature collection tool for evidence-based medicine. The results suggest that using ChatGPT as a tool for real-time evidence generation is not yet accurate or feasible. Therefore, researchers should be cautious about using such AI tools.


 Citation

Please cite as:

Gwon YN, Kim JH, Chung HS, Jung EJ, Chun J, Lee S, Shim SR

The Use of Generative AI for Scientific Literature Searches for Systematic Reviews: ChatGPT and Microsoft Bing AI Performance Evaluation

JMIR Med Inform 2024;12:e51187

DOI: 10.2196/51187

PMID: 38771247

PMCID: 11107769




© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.