
Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Jul 24, 2023
Open Peer Review Period: Jul 24, 2023 - Sep 18, 2023
Date Accepted: Apr 4, 2024

The final, peer-reviewed published version of this preprint can be found here:

The Use of Generative AI for Scientific Literature Searches for Systematic Reviews: ChatGPT and Microsoft Bing AI Performance Evaluation

Gwon YN, Kim JH, Chung HS, Jung EJ, Chun J, Lee S, Shim SR

The Use of Generative AI for Scientific Literature Searches for Systematic Reviews: ChatGPT and Microsoft Bing AI Performance Evaluation

JMIR Med Inform 2024;12:e51187

DOI: 10.2196/51187

PMID: 38771247

PMCID: 11107769

Can ChatGPT Perform Scientific Literature Searches Competitively?

  • Yong Nam Gwon
  • Jae Heon Kim
  • Hyun Soo Chung
  • Eun Jee Jung
  • Joey Chun
  • Serin Lee
  • Sung Ryul Shim

ABSTRACT

Background:

A large language model (LLM) is a type of artificial intelligence (AI) model that opens up great possibilities for healthcare practice, research, and education, although scholars have highlighted the need to proactively address current issues with its use. One of the best-known LLMs is ChatGPT.

Objective:

This study aims to explore the potential of ChatGPT as a real-time literature search tool for systematic reviews and clinical decision support systems (CDSSs).

Methods:

The search results of a systematic review on the treatment of Peyronie's disease, published by human experts, were selected as the benchmark, and that study's literature search formula was applied to ChatGPT and Microsoft Bing for comparison with the human researchers. To determine the accuracy of the retrieved literature, we graded each result as A, B, C, or F, considering only cases where the cited literature actually exists.

Results:

The benchmark human search identified 24 randomized controlled trials. ChatGPT retrieved 1287 records across 639 queries, 7 of which exactly matched the human search results; Microsoft Bing retrieved 48 records across 223 queries, 19 of which exactly matched.
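For illustration, the reported counts can be expressed as exact-match rates against the 24-trial human benchmark. This is a minimal sketch of that arithmetic only; the recall/precision framing here is our own, not the paper's A–F grading scheme:

```python
# Exact-match rates against the 24-trial human benchmark,
# using the counts reported in the Results section.
benchmark_total = 24

tools = {
    "ChatGPT": {"retrieved": 1287, "queries": 639, "matched": 7},
    "Microsoft Bing": {"retrieved": 48, "queries": 223, "matched": 19},
}

for name, t in tools.items():
    recall = t["matched"] / benchmark_total    # share of the benchmark found
    precision = t["matched"] / t["retrieved"]  # share of retrieved records that match
    print(f"{name}: recall={recall:.1%}, precision={precision:.1%}")
```

Under this framing, ChatGPT recovers roughly 29% of the benchmark trials at well under 1% precision, while Bing recovers roughly 79% at about 40% precision, consistent with the raw counts above.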

Conclusions:

This is the first study to compare artificial intelligence (AI) with conventional human systematic review methods as a real-time literature collection tool for evidence-based medicine. The results suggest that using ChatGPT as a tool for real-time evidence generation is not yet accurate or feasible. Therefore, researchers should be cautious about using such AI tools.


 Citation

Please cite as:

Gwon YN, Kim JH, Chung HS, Jung EJ, Chun J, Lee S, Shim SR

The Use of Generative AI for Scientific Literature Searches for Systematic Reviews: ChatGPT and Microsoft Bing AI Performance Evaluation

JMIR Med Inform 2024;12:e51187

DOI: 10.2196/51187

PMID: 38771247

PMCID: 11107769




© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.