Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Apr 27, 2025
Date Accepted: Sep 22, 2025
Advancing Patient-Centered Communication with Generative AI: A Large-Scale Evaluation of ChatGPT-4 for Lay Summarization in Prostate Cancer Research
ABSTRACT
Background:
As the volume and complexity of biomedical literature continue to grow, translating scientific knowledge into accessible language for patients and the public has become an increasingly urgent challenge. Lay summaries, now encouraged or required by many journals, seek to address this gap. However, their clarity, readability, and consistency often fall short of the expected standards. Generative large language models, such as ChatGPT-4, offer a novel opportunity to support this translational effort. Yet, their effectiveness in producing high-quality lay summaries has not been evaluated systematically on a large scale within a specific medical domain.
Objective:
This study aimed to assess the performance of ChatGPT-4 in generating lay summaries for peer-reviewed articles in prostate cancer research. A secondary objective was to evaluate the influence of prompt design on the quality of AI-generated outputs, providing practical guidance for optimizing generative tools in health communication.
Methods:
We selected 204 consecutive articles on prostate cancer published in a leading oncology journal that mandates lay summaries. For each article, the abstract was processed with ChatGPT-4 using two distinct prompt strategies: one prompt was based directly on the journal’s author guidelines, while the other adhered to these guidelines and was additionally refined through iteration to meet international readability standards. The AI-generated summaries and the original lay summaries were evaluated across four domains: lexical readability (using validated indices), factual accuracy, adherence to journal instructions, and overall quality. All summaries were assessed blinded to authorship and prompt type.
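As an illustrative sketch only (the study does not specify the software it used), validated readability indices of the kind referred to above, such as Flesch Reading Ease, Flesch-Kincaid Grade Level, SMOG, and Gunning Fog, can be computed for a summary text with the open-source Python package textstat; the summary string below is a placeholder, not study data.

    # Illustrative sketch: scoring a lay summary with common readability indices.
    # Assumes the third-party "textstat" package (pip install textstat);
    # this is not the study's actual analysis pipeline.
    import textstat

    summary_text = "Prostate cancer is a common cancer in men. This study looked at ..."  # placeholder

    scores = {
        "flesch_reading_ease": textstat.flesch_reading_ease(summary_text),    # higher = easier to read
        "flesch_kincaid_grade": textstat.flesch_kincaid_grade(summary_text),  # US school-grade level
        "smog_index": textstat.smog_index(summary_text),
        "gunning_fog": textstat.gunning_fog(summary_text),
    }

    for name, value in scores.items():
        print(f"{name}: {value:.1f}")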
Results:
ChatGPT-4 consistently produced summaries with superior readability compared with the human-written counterparts, achieving significantly better scores on all readability indices (p < .001). Factual accuracy was comparable between the two ChatGPT-4-generated summaries (p = .40) and exceeded that of the original lay summaries (p < .001 for each comparison). In the overall evaluation, which considered readability, factual accuracy, and adherence to journal guidelines, the extended prompt yielded flawless lay summaries in 79% of cases, compared with 55% for the simple prompt and 5.4% for the original lay summaries (all p < .001).
Conclusions:
With an appropriately designed prompt, ChatGPT-4 can produce lay summaries that are easier to comprehend, more accurate, and stylistically more consistent than conventional human-authored summaries. These findings underscore the potential of generative AI to enhance patient-centered communication and make scientific knowledge more accessible. The study also offers a practical framework for integrating AI tools into editorial processes and public health communication. Future research should explore how patients perceive and engage with such summaries, how these tools may influence health behavior, and how their utility translates across a broader spectrum of clinical domains.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.