Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Apr 7, 2024
Open Peer Review Period: Apr 20, 2024 - Jun 20, 2024
Date Accepted: Jul 5, 2024
Evaluating the Medical Article Understanding Capabilities of Generative Artificial Intelligence Tools
ABSTRACT
Background:
Reading medical articles is a challenging and time-consuming task for doctors, especially when the articles are long and complex. There is a need for tools that can help doctors process and understand medical articles more efficiently and accurately. Generative artificial intelligence (AI) tools can assist doctors in analyzing medical articles, but no research has yet evaluated the medical article understanding capabilities of the new generative AI tools.
Objective:
This study aims to critically assess and compare the comprehension capabilities of large language models (LLMs) in accurately and efficiently understanding medical research articles, using the STROBE checklist.
Methods:
This is a methodological study evaluating the medical article understanding capabilities of new generative AI tools. We designed a novel benchmark pipeline that can process PubMed articles of any length using various generative AI tools. Using this pipeline, we compared the answers of several generative AI tools (GPT-3.5-turbo, GPT-4, PaLM 2, Claude v1, and Gemini Pro) against a gold standard for 50 medical research articles from PubMed; an experienced medical professor's answers to the questions served as the gold standard. Each LLM was evaluated on 15 questions from the STROBE checklist covering the main sections of a scholarly article: title and abstract, methods, results, and discussion.
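The abstract does not include the pipeline's implementation; the following is a minimal sketch of the evaluation loop it describes, in which the chunking size, the `ask_model` stub, the exact-match scorer, and the question wording are illustrative assumptions rather than the authors' actual code.

```python
# Hypothetical sketch of the benchmark pipeline described above. The chunking
# size, the `ask_model` stub, and the question wording are assumptions; they
# are not the authors' published implementation.

from dataclasses import dataclass

@dataclass
class Article:
    pmid: str
    full_text: str

# Illustrative subset of the 15 STROBE checklist questions.
STROBE_QUESTIONS = [
    "Does the title or abstract indicate the study design?",
    "Are the eligibility criteria for participants described in Methods?",
    "Are the key results summarized with reference to the study objectives?",
]

def chunk_text(text: str, max_chars: int = 8000) -> list[str]:
    """Split long articles so each piece fits a model's context window."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def ask_model(model: str, context: str, question: str) -> str:
    """Stub for a call to one of the evaluated LLM APIs (GPT, PaLM, Claude, Gemini)."""
    return "yes"  # placeholder; a real pipeline would call the provider's API

def evaluate(models: list[str], articles: list[Article],
             gold: dict[tuple[str, str], str]) -> dict[str, float]:
    """Score each model's answers against the professor's gold-standard answers."""
    correct = {m: 0 for m in models}
    total = 0
    for article in articles:
        context = " ".join(chunk_text(article.full_text))
        for question in STROBE_QUESTIONS:
            total += 1
            for model in models:
                answer = ask_model(model, context, question)
                if answer.strip().lower() == gold[(article.pmid, question)].strip().lower():
                    correct[model] += 1
    return {m: correct[m] / total for m in models}
```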
Results:
Among the LLMs, GPT-3.5 gave the most correct answers (66.9%), followed by GPT-4 (1106 version; 65.6%), PaLM 2 (62.1%), Claude v1 (58.3%), Gemini Pro (49.2%), and GPT-4 (0613 version; 44.1%). The LLMs showed distinct performance on each question across the different parts of a scholarly article, with models such as PaLM 2 and GPT-3.5 showing notable versatility and depth of understanding.
Conclusions:
To our knowledge, this is the first study to evaluate the ability of different LLMs to understand medical articles when the documents are supplied via the retrieval-augmented generation (RAG) method.
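For readers unfamiliar with RAG, a minimal illustration of the idea follows: only the article chunks most relevant to a given STROBE question are retrieved and passed to the model. The toy embedding, the cosine-similarity retriever, and the prompt format are assumptions for the sketch; the abstract does not describe the study's actual retrieval stack.

```python
# Hypothetical RAG sketch: retrieve the article chunks most relevant to a
# STROBE question and pass only those to the model. The embedding and prompt
# format are assumptions, not the study's actual implementation.

import math

def embed(text: str) -> list[float]:
    """Toy bag-of-letters embedding so the sketch runs end to end;
    a real pipeline would call an embedding model."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(chunks: list[str], question: str, k: int = 3) -> list[str]:
    """Return the k article chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)[:k]

def build_prompt(chunks: list[str], question: str) -> str:
    """Assemble a prompt containing only the retrieved excerpts."""
    context = "\n\n".join(retrieve(chunks, question))
    return (f"Using only the excerpts below, answer the question.\n\n"
            f"{context}\n\nQuestion: {question}")
```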
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.