JMIR Preprints #47479: Evaluation of ChatGPT-4 Provided Information on Hepato Pancreatico Biliary Conditions Using the Ensuring Quality Information for Patients Tool and Current Guidelines: A Systematic Evaluation

Harriet Louise Walker;
Shahi Ghani;
Christoph Kümmerli;
Christian Nebiker;
Beat Müler;
Dimitri Aristotle Raptis;
Sebastian Manuel Staubli

ABSTRACT

Background:

ChatGPT-4 is the latest release of a novel AI chatbot able to answer freely formulated, complex questions. It could become the new standard for healthcare professionals and patients to access medical information in the near future. However, little is known about the quality of the medical information provided by the AI.

Objective:

To analyse the quality of medical information provided by ChatGPT.

Methods:

Medical information provided by ChatGPT-4 on the five Hepato-Pancreatico-Biliary (HPB) conditions with the highest global disease burden (GBD) was assessed with the 36-item Ensuring Quality Information for Patients (EQIP) tool. Five guideline recommendations per analysed condition were rephrased as questions and input to ChatGPT, and agreement between the guidelines and the AI answer was rated by two authors independently. All queries were repeated three times to measure the internal consistency of ChatGPT.

Results:

Five conditions were identified (gallstone disease, pancreatitis, liver cirrhosis, pancreatic cancer and hepatocellular carcinoma). The median (IQR) EQIP score across all conditions was 16 (14.5-18) out of a total of 36. Divided by subsection, median (IQR) scores for content, identification and structure data were 10 (9.5-12.5), 1 (1-1), and 4 (4-5), respectively. Agreement between guideline recommendations and answers provided by ChatGPT was 60% (15/25). Inter-rater agreement as measured by Cohen's kappa was 0.83 (95% confidence interval: 0.61-1.05), indicating a very high level of agreement. Internal consistency of the answers provided by ChatGPT was complete (100%).
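As an illustration of the inter-rater statistic reported above, the following is a minimal sketch of how Cohen's kappa and a normal-approximation (Wald) 95% confidence interval can be computed for two raters. The function names and the ratings are hypothetical; the ratings were chosen only so that the output lands near the reported figures, and they are not the study's data. The standard error used here is a common simplified approximation, which may differ from the method the authors used.

```python
import math


def cohens_kappa(rater_a, rater_b):
    """Return (kappa, observed agreement, chance agreement) for two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions per label.
    p_e = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e), p_o, p_e


def wald_interval(kappa, p_o, p_e, n, z=1.96):
    """Approximate confidence interval; note that it is not clipped to [-1, 1]."""
    se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    return kappa - z * se, kappa + z * se


# Hypothetical ratings for 25 question/answer pairs (True = "agrees with guideline").
# These values are illustrative assumptions, NOT the data from the study.
rater_a = [True] * 15 + [False] * 10
rater_b = [True] * 14 + [False] * 1 + [False] * 9 + [True] * 1

kappa, p_o, p_e = cohens_kappa(rater_a, rater_b)
low, high = wald_interval(kappa, p_o, p_e, len(rater_a))
print(f"kappa = {kappa:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

Because the Wald interval is symmetric around the estimate and unbounded, its upper limit can exceed 1 when kappa is high and the sample is small, as it does for these hypothetical ratings. This is one possible explanation for a reported upper bound above the theoretical maximum of 1.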

Conclusions:

ChatGPT provides medical information of comparable quality to available static internet information. Although currently of limited quality, larger language models could become the future standard for patients and healthcare professionals to gather medical information.

Clinical Trial: None

Citation

Please cite as:

Walker HL, Ghani S, Kümmerli C, Nebiker C, Müler B, Raptis DA, Staubli SM

Reliability of Medical Information Provided by ChatGPT: Assessment Against Clinical Guidelines and Patient Information Quality Instrument

J Med Internet Res 2023;25:e47479

DOI: 10.2196/47479

PMID: 37389908

PMCID: 10365578

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Apr 7, 2023

Open Peer Review Period: Apr 7, 2023 - Jun 2, 2023

Date Accepted: Jun 15, 2023
