
Previously submitted to: Journal of Medical Internet Research (no longer under consideration since Mar 22, 2024)

Date Submitted: Sep 24, 2023

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Readability and Presentation Suitability of ChatGPT's Medical Responses to Patient Questions: Cross-Sectional Study

  • Chan Woong Jang; 
  • Myungeun Yoo; 
  • Yoon Ghil Park

ABSTRACT

Background:

Online sources of medical information, including artificial intelligence chatbots such as ChatGPT, are increasingly important for patients making health decisions. However, many patients have limited health literacy and struggle with such content. To help them, we need to ensure that the information is easily readable by the average adult. To date, no research has examined how well ChatGPT delivers medical information in text form.

Objective:

To assess the readability and presentation suitability of ChatGPT responses to the most commonly asked patient questions, as well as ChatGPT's ability to improve readability.

Methods:

This study comprised two phases. First, we evaluated ChatGPT's medical responses for readability and presentation suitability using 30 knee osteoarthritis (OA)-related questions on March 20, 2023. We applied the Flesch-Kincaid Grade Level (FKGL) and Simple Measure of Gobbledygook (SMOG) readability formulas. Additionally, we used three evaluation tools: the Suitability Assessment of Materials (SAM) for presentation scores, and the Ensuring Quality Information for Patients (EQIP) and modified DISCERN (mDISCERN) instruments for overall quality scores. Second, we assessed readability improvement for answers to 50 stroke-related questions by providing both detailed and simple rewriting instructions to ChatGPT, again applying the FKGL and SMOG readability tests.
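The two readability formulas named above have standard published definitions: FKGL = 0.39 × (words/sentences) + 11.8 × (syllables/word) − 15.59, and SMOG = 1.0430 × √(polysyllables × 30/sentences) + 3.1291. A minimal sketch of both, assuming a crude vowel-group syllable counter (the study likely used established software, not this heuristic):

```python
import math
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count vowel groups; drop one for a trailing silent "e".
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def _split(text: str):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return sentences, words

def fkgl(text: str) -> float:
    # Flesch-Kincaid Grade Level: higher score = harder text.
    sentences, words = _split(text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

def smog(text: str) -> float:
    # SMOG counts words of 3+ syllables ("polysyllables") per 30 sentences.
    sentences, words = _split(text)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291
```

Both functions return a US reading-grade estimate, so the study's target of a sixth-grade level corresponds to scores near 6.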

Results:

In the readability assessment, the mean (standard deviation, SD) scores for the 30 responses regarding knee OA were as follows: FKGL, 13.65 (1.80) reading grade; SMOG, 15.62 (1.55) reading grade, both significantly higher than the recommended sixth-grade reading level (P < 0.001). In the presentation suitability assessment, the mean SAM score across all answers was 55.00 (10.64), which is considered “adequate.” The mean EQIP and mDISCERN scores were 43.72 (5.78) and 2.83 (0.59), respectively, and none of the responses was rated as high quality. After applying both detailed and simple instructions to the 50 responses regarding stroke, one-way ANOVA indicated statistically significant differences in mean readability scores among the three groups: pre-intervention, post-intervention with detailed instructions, and post-intervention with simple instructions (P < 0.001). Post-hoc analysis revealed that the pre-intervention group differed significantly from both post-intervention groups on both readability measures (P < 0.001 for each). However, there was no significant difference between the two post-intervention groups (P = 0.96 for FKGL and P = 0.86 for SMOG).
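The analysis described above (one-way ANOVA across three groups, then pairwise post-hoc comparisons) can be sketched as follows. The group means, SDs, and the post-hoc method here are illustrative assumptions, not the study's raw data or stated procedure:

```python
import numpy as np
from scipy.stats import f_oneway, ttest_ind

rng = np.random.default_rng(0)

# Hypothetical FKGL scores (reading-grade level) for 50 responses per group;
# values loosely modeled on the abstract, not the study's actual measurements.
pre      = rng.normal(13.65, 1.80, 50)  # original ChatGPT responses
detailed = rng.normal(11.00, 1.50, 50)  # rewritten with detailed instructions
simple   = rng.normal(11.10, 1.50, 50)  # rewritten with simple instructions

# One-way ANOVA across the three groups.
f_stat, p_anova = f_oneway(pre, detailed, simple)

# Post-hoc pairwise comparisons (Bonferroni-corrected t tests); the abstract
# does not name its post-hoc method, so this choice is an assumption.
pairs = {
    "pre_vs_detailed": (pre, detailed),
    "pre_vs_simple": (pre, simple),
    "detailed_vs_simple": (detailed, simple),
}
posthoc = {name: min(ttest_ind(a, b).pvalue * 3, 1.0)
           for name, (a, b) in pairs.items()}
```

With clearly separated group means, the ANOVA p-value is far below 0.001 for the pre- vs post-intervention contrasts, mirroring the pattern of results reported in the abstract.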

Conclusions:

This study found that, despite adequately presenting medical information, ChatGPT's responses are difficult to read and of low quality, which may inconvenience patients. Furthermore, ChatGPT lacks the ability to improve the readability of medical information. As technology advances, enhancing the readability and user-friendliness of ChatGPT will increase its usefulness for patients. Clinical Trial: Not applicable.


Citation

Please cite as:

Jang CW, Yoo M, Park YG

Readability and Presentation Suitability of ChatGPT's Medical Responses to Patient Questions: Cross-Sectional Study

JMIR Preprints. 24/09/2023:53046

DOI: 10.2196/preprints.53046

URL: https://preprints.jmir.org/preprint/53046


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.