Currently submitted to: JMIR Rehabilitation and Assistive Technologies
Date Submitted: Feb 7, 2026
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Enhancing Readability of Consumer Health Information on Spinal Cord Injury through a Self-Reflection Rewriting Method
ABSTRACT
Background:
Spinal Cord Injury (SCI) is a life-altering condition that creates a critical need for accurate and accessible patient education materials (PEMs). Retrieval-Augmented Generation (RAG) with Large Language Models (LLMs) holds promise for answering consumer health questions (CHQs) using existing PEMs. However, the readability of generated answers remains a concern. Our preliminary findings indicate that RAG-generated answers frequently exceed the commonly recommended 8th-grade readability level, thereby widening accessibility gaps for patients and family caregivers seeking SCI-related consumer health information (CHI).
Objective:
This study aimed to evaluate strategies for enhancing the readability of RAG-generated answers to SCI-related CHQs with a focus on comparing a self-reflection approach to existing prompt-based approaches.
Methods:
A curated corpus of 172 SCI-related PEMs was collected and categorized into three groups (FAQs, self-management guidelines, and insurance/legal documents), forming the foundation of a RAG knowledge base. From these PEMs, synthetic CHQs were generated and answered using a RAG framework across multiple LLMs, including OpenAI GPT-4o, Anthropic Claude 3.5 Sonnet, Google Gemini 1.5 Pro, Meta Llama 3.3 70B Instruct, and OpenAI GPT-OSS 20B. The generated answers were rewritten using four strategies: (1) few-shot prompting, (2) explicit readability prompting, (3) post-generation rewriting, and (4) self-reflection and refinement (READ). Readability was assessed using the Flesch-Kincaid Grade Level (FKGL), and semantic similarity was evaluated with embedding-based cosine similarity.
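The abstract does not report implementation details, so the following is only an illustrative sketch of how a self-reflection rewriting loop paired with an FKGL check and an embedding-based cosine similarity check could be structured. The call_llm and embed helpers, the prompts, the grade target, the similarity threshold, and the round budget are assumptions for illustration and are not the authors' implementation; FKGL is computed here with the textstat package.

import textstat   # provides flesch_kincaid_grade()
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def read_rewrite(answer, embed, call_llm,
                 grade_target=8.0, min_similarity=0.85, max_rounds=3):
    # Hypothetical self-reflection loop: the model critiques its own answer,
    # then rewrites it, until the FKGL target is met or rounds run out.
    original_vec = embed(answer)
    current = answer
    for _ in range(max_rounds):
        if textstat.flesch_kincaid_grade(current) <= grade_target:
            break
        critique = call_llm(
            "Identify sentences in this health answer that a reader below an "
            f"8th-grade level may find hard to understand:\n\n{current}"
        )
        current = call_llm(
            "Rewrite the answer at or below an 8th-grade reading level, keeping "
            f"all medical facts. Critique to address:\n{critique}\n\nAnswer:\n{current}"
        )
    # Guard against meaning drift: keep the original if similarity drops too far.
    if cosine_similarity(original_vec, embed(current)) < min_similarity:
        return answer
    return current

In practice, the similarity guard makes the readability-versus-fidelity trade-off noted in the Results explicit: rewrites that simplify the text but drift too far from the source answer are rejected.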
Results:
Baseline RAG answers consistently exceeded recommended readability levels, scoring at higher grade levels than the PEMs from which they were derived. Few-shot prompting yielded only minimal gains and no significant improvement in readability. Both explicit readability prompting and post-generation rewriting produced significant improvements but still failed to consistently achieve the 8th-grade readability target. In contrast, the READ approach significantly improved readability across all tested LLMs, with nearly all outputs written at or below the 8th-grade level. However, improvements in readability were accompanied by a decline in semantic similarity, highlighting a potential trade-off between accessibility and semantic fidelity.
Conclusions:
The READ approach provides an effective and robust framework for improving the readability of RAG-generated answers to SCI-related CHQs, outperforming prompt-based strategies and demonstrating strong potential to enhance the accessibility of patient-facing health information. Future studies should incorporate expert evaluation and real-world testing to ensure clinical accuracy, usability, and successful integration into clinical settings.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.