
Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Apr 27, 2025
Date Accepted: Sep 22, 2025

The final, peer-reviewed published version of this preprint can be found here:

Using ChatGPT-4 for Lay Summarization in Prostate Cancer Research to Advance Patient-Centered Communication: Large-Scale Generative AI Performance Evaluation

Rinderknecht E, Engelmann S, Saberi V, Kirschner C, Kravchuk AP, Schmelzer A, Breyer J, Goßler C, Mayr R, Gilfrich C, Burger M, von Winning D, Borgmann H, Wülfing C, Merseburger AS, Haas M, May M


J Med Internet Res 2025;27:e76598

DOI: 10.2196/76598

PMID: 41259712

PMCID: 12629520

Advancing Patient-Centered Communication with Generative AI: A Large-Scale Evaluation of ChatGPT-4 for Lay Summarization in Prostate Cancer Research

  • Emily Rinderknecht; 
  • Simon Engelmann; 
  • Veronika Saberi; 
  • Clemens Kirschner; 
  • Anton P. Kravchuk; 
  • Anna Schmelzer; 
  • Johannes Breyer; 
  • Christopher Goßler; 
  • Roman Mayr; 
  • Christian Gilfrich; 
  • Maximilian Burger; 
  • Dominik von Winning; 
  • Hendrik Borgmann; 
  • Christian Wülfing; 
  • Axel S. Merseburger; 
  • Maximilian Haas; 
  • Matthias May

ABSTRACT

Background:

As the volume and complexity of biomedical literature continue to grow, translating scientific knowledge into accessible language for patients and the public has become an increasingly urgent challenge. Lay summaries, now encouraged or required by many journals, seek to address this gap. However, their clarity, readability, and consistency often fall short of the expected standards. Generative large language models, such as ChatGPT-4, offer a novel opportunity to support this translational effort. Yet, their effectiveness in producing high-quality lay summaries has not been evaluated systematically on a large scale within a specific medical domain.

Objective:

This study aimed to assess the performance of ChatGPT-4 in generating lay summaries for peer-reviewed articles in prostate cancer research. A secondary objective was to evaluate the influence of prompt design on the quality of AI-generated outputs, providing practical guidance for optimizing generative tools in health communication.

Methods:

We selected 204 consecutive articles on prostate cancer published in a leading oncology journal that mandates lay summaries. For each article, the abstract was processed with ChatGPT-4 using two distinct prompt strategies: one prompt was based directly on the journal's author guidelines, while the other followed these guidelines and was also iteratively refined to meet international readability standards. The resulting AI-generated summaries and the original lay summaries were evaluated across four domains: lexical readability (using validated indices), factual accuracy, adherence to journal instructions, and overall quality. Assessors were blinded to authorship and prompt type.
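The abstract does not name the "validated indices" used for lexical readability. As an illustration only, the sketch below computes the Flesch Reading Ease score, one widely used readability index: 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word). This is an assumption about the kind of metric involved, not the study's actual pipeline; the function names are hypothetical, and the vowel-group syllable counter is a rough heuristic rather than a dictionary-based count.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (rough heuristic, not dictionary-based)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent 'e' (as in "rate") usually adds no syllable; keep "-le" endings (as in "little").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate text that is easier to read."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    simple = "The cat sat on the mat."
    complex_text = ("Comprehensive multidisciplinary evaluation necessitates "
                    "considerable interdisciplinary coordination.")
    print(round(flesch_reading_ease(simple), 3))
    print(round(flesch_reading_ease(complex_text), 3))
```

Short sentences built from short words score higher, which is the direction in which a lay summary would be expected to move relative to a technical abstract.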

Results:

ChatGPT-4 consistently produced summaries with better readability than the human-written counterparts, achieving significantly better scores on all readability metrics (P<.001). Factual accuracy did not differ significantly between the two sets of ChatGPT-4-generated summaries (P=.4) but exceeded that of the original lay summaries (P<.001 for each comparison). In the overall evaluation, which considered readability, factual accuracy, and adherence to journal guidelines, the extended prompt yielded flawless lay summaries in 79% of cases, compared with 55% for the simple prompt and 5.4% for the original lay summaries (all P<.001).

Conclusions:

With an appropriate prompt, ChatGPT-4 is capable of producing lay summaries that are easier to comprehend, more accurate, and stylistically more consistent than conventional human-authored summaries. These findings underscore the potential of generative AI to enhance patient-centered communication and make scientific knowledge more accessible. The study also offers a practical framework for integrating AI tools into editorial processes and public health communication. Future research should explore how patients perceive and engage with such summaries, how these tools may influence health behavior, and how well their utility carries over to a broader spectrum of clinical domains.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.