Accepted for/Published in: JMIR Formative Research
Date Submitted: Apr 16, 2025
Date Accepted: Jun 27, 2025
Evaluating Quality and Understandability of Radiology Report Summaries Generated by ChatGPT: A Pilot Study
ABSTRACT
Background:
Radiology reports play a crucial role in conveying medical information to healthcare providers and patients. However, they are often difficult for patients to comprehend, which can cause confusion and anxiety and limit patient engagement in healthcare decisions. Large language models (LLMs) such as ChatGPT have the potential to create simplified, patient-friendly report summaries that increase accessibility. However, summarization also carries a risk of introducing errors.
Objective:
The objectives of this study were to: (1) evaluate the accuracy and clarity of ChatGPT-generated summaries compared to original radiology reports as assessed by radiologists; (2) assess patients' understanding and satisfaction with the summaries compared to the original reports; and (3) compare the readability of the original reports and summaries using validated readability metrics.
Methods:
We anonymized 30 radiology reports created by neuroradiologists at our institution (6 MR brain, 6 CT brain, 6 CTA head/neck, 6 CT neck, and 6 CT spine). These anonymized reports were processed by ChatGPT to produce patient-centric summaries. Four board-certified neuroradiologists evaluated the paired original reports and ChatGPT-generated summaries for quality and accuracy, and four patient volunteers evaluated the pairs for perceived understandability and satisfaction. Readability was assessed using word count and validated readability scales.
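The abstract does not name which validated readability scales were applied. As an illustrative sketch only (not the study's actual pipeline), one widely used scale, the Flesch Reading Ease score, can be computed from sentence, word, and syllable counts; the syllable counter below is a rough vowel-group heuristic, not a dictionary-based method:

```python
import re

def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups, with a silent-'e' adjustment."""
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    n = len(groups)
    if word.endswith("e") and n > 1:
        n -= 1  # treat a trailing silent 'e' as non-syllabic
    return max(n, 1)

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words).

    Higher scores indicate easier text; patient-friendly prose typically
    scores far higher than jargon-dense radiology reporting language.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Applied to a jargon-heavy report sentence versus a plain-language rewording, the plain version scores markedly higher, which is the kind of contrast a readability scale would capture between original reports and summaries.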
Results:
Patient confidence in understanding improved substantially after reading the summary (98% vs 26%), as did satisfaction with the level of jargon/terminology (91% vs 8%) and with the time required to understand the content (97% vs 23%). In 92% of responses, the summary clarified questions patients had about the report, and in 98%, patients indicated they would use the summary if available; 67% of responses indicated patients would want access to both the report and the summary, while 26% indicated wanting only the summary. Among radiologist responses, 83% indicated the summary represented the original report "extremely" or "very" well, and only 5% indicated it did so "slightly" or "not well at all." Relevant medical information was missing from the summary in 5% of responses, nonsignificant findings were overemphasized in 12%, and significant findings were underemphasized in 18%. No fabricated findings were identified. Overall, 83% of responses indicated that the summary would definitely or probably not lead patients to incorrect conclusions about the original report, whereas 10% indicated that it might.
Conclusions:
These findings demonstrate that ChatGPT-generated summaries can significantly improve perceived comprehension and satisfaction while accurately reflecting most key information from the original radiology reports. Minor omissions and instances of under- or overemphasis were noted in some summaries, underscoring the need for ongoing validation and oversight. Overall, these patient-centric summaries hold promise for enhancing patient-centered communication in radiology.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.