Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Jan 5, 2024
Date Accepted: May 12, 2024
Evaluating the Efficacy of ChatGPT as a Patient Education Tool in Prostate Cancer: A Multi-Metric Assessment
ABSTRACT
Background:
Artificial intelligence (AI) chatbots such as ChatGPT have made significant progress and are increasingly popular among both health care professionals and patients, transforming patient education and the disease experience through personalized information. They may be especially beneficial for populations such as men with prostate cancer concerns. Accurate, timely patient education is crucial for informed decision-making, particularly regarding prostate-specific antigen (PSA) screening and treatment options. AI chatbots could help address the gap in quality prostate cancer information and reach wider demographics, including remote communities. However, the accuracy and reliability of the medical information AI chatbots provide must be rigorously evaluated. Studies testing ChatGPT's prostate cancer knowledge are emerging, but ongoing evaluation is needed to ensure the quality and safety of the information provided to patients.
Objective:
To evaluate the quality, accuracy, and readability of ChatGPT-4's responses to common prostate cancer questions posed by patients.
Methods:
Eight questions were formulated using an inductive approach, based on the information topics searched for and desired by patients with prostate cancer, as identified in the peer-reviewed literature and Google Trends data. The eight AI outputs were judged by seven expert urologists using an assessment framework developed for this study (NLAT-AI) to assess accuracy, safety, appropriateness, actionability, and effectiveness. Adapted versions of the Patient Education Materials Assessment Tool (PEMAT-AI), Global Quality Score (GQS), and DISCERN (DISCERN-AI) were used by four independent reviewers to assess the quality of the AI responses. Readability of the AI responses was assessed using established algorithms (Flesch Reading Ease score, Gunning Fog Index, Flesch-Kincaid Grade Level, Coleman-Liau Index, and SMOG Index). A brief tool (REF-AI) was developed to analyze the references provided in the AI outputs, assessing for reference hallucination, relevance, and reference quality.
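For context, the five readability algorithms named above are all implemented in the open-source textstat Python package. The sketch below is illustrative only, not the authors' actual analysis pipeline; the `responses` list is a hypothetical placeholder standing in for the eight ChatGPT-4 outputs, and the mean/SD reporting simply mirrors the format used in the Results section.

```python
# Illustrative sketch: scoring responses with the five readability algorithms
# named in Methods, via the open-source `textstat` package (pip install textstat).
from statistics import mean, stdev

import textstat

# Hypothetical placeholders; the study would use the eight ChatGPT-4 outputs.
responses = [
    "Prostate-specific antigen (PSA) is a protein made by the prostate gland...",
    "Active surveillance involves closely monitoring low-risk prostate cancer...",
    # ... plus the remaining six outputs
]

# Each metric maps a plain-text string to a readability score.
metrics = {
    "Flesch Reading Ease": textstat.flesch_reading_ease,
    "Gunning Fog Index": textstat.gunning_fog,
    "Flesch-Kincaid Grade Level": textstat.flesch_kincaid_grade,
    "Coleman-Liau Index": textstat.coleman_liau_index,
    "SMOG Index": textstat.smog_index,
}

for name, fn in metrics.items():
    scores = [fn(text) for text in responses]
    # stdev() requires at least two responses; output mirrors the abstract's mean (SD) style.
    print(f"{name}: mean {mean(scores):.2f}, SD {stdev(scores):.2f}")
```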
Results:
The PEMAT-AI understandability score was very good (mean 79.44%, SD 10.44), the GQS was rated as high (mean 4.46/5, SD 0.50), and the DISCERN-AI rating was of moderate quality (mean 13.88, SD 0.93). NLAT-AI pooled means (SD) were: accuracy 3.96 (0.91), safety 4.32 (0.86), appropriateness 4.45 (0.81), actionability 4.05 (1.15), and effectiveness 4.09 (0.98). The readability algorithm consensus was “difficult to read” (Flesch Reading Ease score mean 45.97, SD 8.69; Gunning Fog Index mean 14.55, SD 4.79), averaging a grade 11 reading level, equivalent to that of a 15- to 17-year-old (Flesch-Kincaid Grade Level mean 12.12, SD 4.34; Coleman-Liau Index mean 12.75, SD 1.98; SMOG Index mean 11.06, SD 3.20). REF-AI identified two reference hallucinations, while the majority of references appropriately supplemented the text. Most references were from reputable government organizations, while a handful were direct citations from the scientific literature.
Conclusions:
Our analysis found that ChatGPT-4 provides generally good responses to common prostate cancer queries, making it a potentially valuable tool for patient education in prostate cancer care. Objective quality assessment tools indicated that the natural language processing (NLP) outputs were generally reliable and appropriate, but there is room for improvement.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.