Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Jan 5, 2024
Date Accepted: May 12, 2024
Evaluating the Efficacy of ChatGPT as a Patient Education Tool in Prostate Cancer: A Multi-Metric Assessment
ABSTRACT
Background:
Artificial intelligence (AI) chatbots such as ChatGPT have made significant progress and are increasingly popular among both health care professionals and patients, transforming patient education and the disease experience through personalized information. They may be especially beneficial for populations such as men with prostate cancer concerns. Accurate, timely patient education is crucial for informed decision-making, particularly regarding prostate-specific antigen (PSA) screening and treatment options. AI chatbots could help address the gap in quality prostate cancer information and reach wider demographics, including remote communities. However, the accuracy and reliability of the medical information AI chatbots provide must be rigorously evaluated. Studies testing ChatGPT's prostate cancer knowledge are emerging, but ongoing evaluation is needed to ensure the quality and safety of the information provided to patients.
Objective:
To evaluate the quality, accuracy, and readability of ChatGPT-4's responses to common prostate cancer questions posed by patients.
Methods:
Eight questions were formulated using an inductive approach, based on the information topics searched for and desired by patients with prostate cancer, as identified in the peer-reviewed literature and Google Trends data. The eight AI outputs were judged by seven expert urologists using an assessment framework developed for this study (NLAT-AI) to assess accuracy, safety, appropriateness, actionability, and effectiveness. Adapted versions of the Patient Education Materials Assessment Tool (PEMAT-AI), Global Quality Score (GQS), and DISCERN (DISCERN-AI) were used by four independent reviewers to assess the quality of the AI responses. Readability of the AI responses was assessed using established algorithms (Flesch Reading Ease score, Gunning Fog Index, Flesch-Kincaid Grade Level, Coleman-Liau Index, and SMOG Index). A brief tool (REF-AI) was developed to analyze the references provided in the AI outputs, assessing for reference hallucination, relevance, and reference quality.
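For context, the five readability algorithms named above are all implemented in the open-source textstat Python package. The sketch below is illustrative only, not the authors' actual analysis pipeline; the `responses` list is a hypothetical placeholder standing in for the eight ChatGPT-4 outputs, and the mean/SD reporting simply mirrors the format used in the Results section.

```python
# Illustrative sketch: scoring responses with the five readability algorithms
# named in Methods, via the open-source `textstat` package (pip install textstat).
from statistics import mean, stdev

import textstat

# Hypothetical placeholders; the study would use the eight ChatGPT-4 outputs.
responses = [
    "Prostate-specific antigen (PSA) is a protein made by the prostate gland...",
    "Active surveillance involves closely monitoring low-risk prostate cancer...",
    # ... plus the remaining six outputs
]

# Each metric maps a plain-text string to a readability score.
metrics = {
    "Flesch Reading Ease": textstat.flesch_reading_ease,
    "Gunning Fog Index": textstat.gunning_fog,
    "Flesch-Kincaid Grade Level": textstat.flesch_kincaid_grade,
    "Coleman-Liau Index": textstat.coleman_liau_index,
    "SMOG Index": textstat.smog_index,
}

for name, fn in metrics.items():
    scores = [fn(text) for text in responses]
    # stdev() requires at least two responses; output mirrors the abstract's mean (SD) style.
    print(f"{name}: mean {mean(scores):.2f}, SD {stdev(scores):.2f}")
```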
Results:
The PEMAT-AI understandability score was very good (mean 79.44%, SD 10.44), the GQS was rated as high (mean 4.46/5, SD 0.50), and the DISCERN-AI rating was of moderate quality (mean 13.88, SD 0.93). NLAT-AI pooled means (SD) were: accuracy 3.96 (0.91), safety 4.32 (0.86), appropriateness 4.45 (0.81), actionability 4.05 (1.15), and effectiveness 4.09 (0.98). The readability algorithm consensus was “difficult to read” (Flesch Reading Ease score mean 45.97, SD 8.69; Gunning Fog Index mean 14.55, SD 4.79), averaging a grade 11 reading level, equivalent to that of a 15- to 17-year-old (Flesch-Kincaid Grade Level mean 12.12, SD 4.34; Coleman-Liau Index mean 12.75, SD 1.98; SMOG Index mean 11.06, SD 3.20). REF-AI identified two reference hallucinations, while the majority of references appropriately supplemented the text. Most references were from reputable government organizations, while a handful were direct citations from the scientific literature.
Conclusions:
Our analysis found that ChatGPT-4 provides generally good responses to common prostate cancer queries, making it a potentially valuable tool for patient education in prostate cancer care. Objective quality assessment tools indicated that the natural language processing (NLP) outputs were generally reliable and appropriate, but there is room for improvement.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.