Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Apr 7, 2023
Open Peer Review Period: Apr 7, 2023 - Jun 2, 2023
Date Accepted: Jun 15, 2023
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Evaluation of ChatGPT-4 Provided Information on Hepato-Pancreatico-Biliary Conditions Using the Ensuring Quality Information for Patients Tool and Current Guidelines: A Systematic Evaluation
ABSTRACT
Background:
ChatGPT-4 is the latest release of a novel AI chatbot able to answer freely formulated complex questions. It could become the new standard for healthcare professionals and patients to access medical information in the near future. However, little is known about the quality of the medical information provided by the AI.
Objective:
To analyse the quality of medical information provided by ChatGPT.
Methods:
Medical information provided by ChatGPT-4 on the five Hepato-Pancreatico-Biliary (HPB) conditions with the highest global disease burden (GBD) was measured with the 36-item Ensuring Quality Information for Patients (EQIP) tool. Five guideline recommendations per analysed condition were rephrased as questions and entered into ChatGPT, and agreement between the guidelines and the AI answers was assessed by two authors independently. All queries were repeated three times to measure the internal consistency of ChatGPT.
Results:
Five conditions were identified (gallstone disease, pancreatitis, liver cirrhosis, pancreatic cancer and hepatocellular carcinoma). The median (IQR) EQIP score across all conditions was 16 (14.5-18) from a total of 36. Divided by subsection, median (IQR) scores for content, identification and structure data were 10 (9.5-12.5), 1 (1-1), and 4 (4-5), respectively. Agreement between guideline recommendations and answers provided by ChatGPT was 60% (15/25). Inter-rater agreement as measured by Cohen's kappa was 0.83 (95% confidence interval: 0.61-1.05), indicating a very high level of agreement. Internal consistency of the answers provided by ChatGPT was complete (100%).
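As an illustration of the inter-rater statistic reported above, the following is a minimal Python sketch of Cohen's kappa with a normal-approximation (Wald) confidence interval. The abstract does not state how the interval was computed, so the standard-error formula here is an assumption, and the two rating vectors are invented for illustration; they are not the study data. The function names `cohen_kappa` and `wald_interval` are hypothetical.

```python
from math import sqrt


def cohen_kappa(rater_a, rater_b):
    """Return (kappa, observed agreement, chance agreement) for two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty item list")
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # Observed agreement: share of items on which both raters gave the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over labels.
    p_e = sum((rater_a.count(lab) / n) * (rater_b.count(lab) / n) for lab in labels)
    return (p_o - p_e) / (1 - p_e), p_o, p_e


def wald_interval(kappa, p_o, p_e, n, z=1.96):
    """Approximate interval using a simple large-sample standard error (assumed)."""
    se = sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    return kappa - z * se, kappa + z * se


# Invented ratings for 25 recommendations (1 = answer agrees with the guideline).
rater_a = [1] * 15 + [0] * 10
rater_b = [1] * 14 + [0] * 10 + [1]

kappa, p_o, p_e = cohen_kappa(rater_a, rater_b)
low, high = wald_interval(kappa, p_o, p_e, len(rater_a))
print(f"kappa = {kappa:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

Kappa itself cannot exceed 1, so an upper bound above 1 is a by-product of this kind of symmetric approximation on a small sample rather than an attainable value.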
Conclusions:
ChatGPT provides medical information of comparable quality to available static internet information. Although currently of limited quality, larger language models could become the future standard for patients and healthcare professionals to gather medical information.
Clinical Trial: None
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.