Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Apr 7, 2023
Open Peer Review Period: Apr 7, 2023 - Jun 2, 2023
Date Accepted: Jun 15, 2023

The final, peer-reviewed published version of this preprint can be found here:

Reliability of Medical Information Provided by ChatGPT: Assessment Against Clinical Guidelines and Patient Information Quality Instrument

Walker HL, Ghani S, Kümmerli C, Nebiker C, Müler B, Raptis DA, Staubli SM

Reliability of Medical Information Provided by ChatGPT: Assessment Against Clinical Guidelines and Patient Information Quality Instrument

J Med Internet Res 2023;25:e47479

DOI: 10.2196/47479

PMID: 37389908

PMCID: 10365578

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Evaluation of ChatGPT-4 Provided Information on Hepato-Pancreatico-Biliary Conditions Using the Ensuring Quality Information for Patients Tool and Current Guidelines: A Systematic Evaluation

  • Harriet Louise Walker; 
  • Shahi Ghani; 
  • Christoph Kümmerli; 
  • Christian Nebiker; 
  • Beat Müler; 
  • Dimitri Aristotle Raptis; 
  • Sebastian Manuel Staubli

ABSTRACT

Background:

ChatGPT-4 is the latest release of a novel AI chatbot able to answer freely formulated complex questions. It could become the new standard for healthcare professionals and patients to access medical information in the near future. However, little is known about the quality of medical information provided by the AI.

Objective:

To analyse the quality of medical information provided by ChatGPT.

Methods:

Medical information provided by ChatGPT-4 on the five Hepato-Pancreatico-Biliary (HPB) conditions with the highest global disease burden (GBD) was measured with the 36-item Ensuring Quality Information for Patients (EQIP) tool. Five guideline recommendations per analysed condition were each rephrased as a question and input to ChatGPT, and agreement between the guidelines and the AI answer was measured by two authors independently. All queries were repeated three times to measure the internal consistency of ChatGPT.
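As a rough illustration of the two tallies described in the methods, the sketch below computes a guideline agreement rate and an internal consistency rate. The function names and the placeholder verdicts are assumptions made for this example; they are not the authors' data or code.

```python
def agreement_rate(verdicts):
    """Share of questions whose answer was judged to agree with the guideline."""
    return sum(verdicts) / len(verdicts)


def consistency_rate(repeats_per_question):
    """Share of questions whose repeated runs all received the same verdict."""
    consistent = [len(set(runs)) == 1 for runs in repeats_per_question]
    return sum(consistent) / len(consistent)


# Placeholder verdicts (hypothetical, for illustration only):
# 5 conditions x 5 recommendations = 25 questions, each asked three times.
# True means "answer agrees with the guideline recommendation".
repeats = [[True] * 3 for _ in range(15)] + [[False] * 3 for _ in range(10)]
first_run = [runs[0] for runs in repeats]

print(f"guideline agreement: {agreement_rate(first_run):.0%}")
print(f"internal consistency: {consistency_rate(repeats):.0%}")
```

Treating each verdict as a yes/no judgement is a simplification; a real rating sheet may use more categories.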

Results:

Five conditions were identified (gallstone disease, pancreatitis, liver cirrhosis, pancreatic cancer and hepatocellular carcinoma). The median (IQR) EQIP score across all conditions was 16 (14.5-18) from a total of 36. Divided by subsection, median (IQR) scores for content, identification and structure data were 10 (9.5-12.5), 1 (1-1), and 4 (4-5), respectively. Agreement between guideline recommendations and answers provided by ChatGPT was 60% (15/25). Inter-rater agreement as measured by Cohen's kappa was 0.83 (95% confidence interval: 0.61-1.05), indicating a very high level of agreement. Internal consistency of the answers provided by ChatGPT was complete (100%).
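For readers unfamiliar with Cohen's kappa, the sketch below shows one common way to compute it with a normal-approximation 95% confidence interval. The 2x2 table is hypothetical and was chosen only because it gives values close to those reported; the abstract does not give the authors' actual rating table or state which standard error formula they used. The example also shows why an upper bound above 1 can appear: the simple normal approximation is not constrained to the valid range of kappa.

```python
import math


def cohens_kappa(table):
    """Cohen's kappa with a simple normal-approximation 95% CI.

    table[i][j] = number of items rater A placed in category i
    and rater B placed in category j.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    p_o = sum(table[i][i] for i in range(k)) / n            # observed agreement
    row_m = [sum(table[i]) / n for i in range(k)]           # rater A marginals
    col_m = [sum(table[i][j] for i in range(k)) / n for j in range(k)]  # rater B
    p_e = sum(row_m[i] * col_m[i] for i in range(k))        # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)
    # Simple large-sample standard error (an assumption of this sketch).
    se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    return kappa, (kappa - 1.96 * se, kappa + 1.96 * se)


# Hypothetical table for 25 questions: rows = rater A, columns = rater B,
# categories = [agrees with guideline, does not agree].
hypothetical_table = [[14, 1],
                      [1, 9]]

kappa, (ci_low, ci_high) = cohens_kappa(hypothetical_table)
print(f"kappa = {kappa:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

Because kappa cannot exceed 1, an interval such as this is usually read as truncated at 1.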

Conclusions:

ChatGPT provides medical information of comparable quality to available static internet information. Although currently of limited quality, larger language models could become the future standard for patients and healthcare professionals to gather medical information.

Clinical Trial: None


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.