Currently submitted to: JMIR Formative Research
Date Submitted: Jun 6, 2026
Open Peer Review Period: Jun 8, 2026 - Aug 3, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Artificial Intelligence (AI)-Assisted Optimization of Online Gastrointestinal Patient Education Materials: A Cross-Sectional Study
ABSTRACT
Background:
Gastroenterology patient education materials (PEMs) are often written above the recommended sixth-grade reading level, which limits their accessibility. Large language models (LLMs) may improve the readability of PEMs, but their effectiveness for this purpose remains to be evaluated.
Objective:
This study aims to determine whether LLMs can improve the readability and understandability of gastroenterology (GI)-focused PEMs to a more accessible level, and whether different models perform differently.
Methods:
A cross-sectional review was performed on 60 PEMs that were randomly sampled from three GI-focused websites (American Cancer Society [ACS], American College of Gastroenterology [ACG], and American Gastroenterological Association [AGA]). PEMs were rewritten by four LLMs (ChatGPT, Gemini, Claude, and Perplexity) with a standardized fifth-grade translation prompt. Readability was assessed with the Flesch Reading Ease, Flesch-Kincaid Grade Level, Gunning Fog Index, and Simple Measure of Gobbledygook (SMOG) Index. Understandability was assessed with the Patient Education Materials Assessment Tool-Understandability (PEMAT-U). Accuracy was checked by two physicians who independently reviewed the simplified materials, with discrepancies verified using ChatGPT.
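The four readability indices named above are standard published formulas, and a minimal sketch of how they could be computed is given here for illustration. It is not the authors' scoring pipeline, which the abstract does not describe. The syllable counter is a rough vowel-group heuristic, the Gunning Fog "complex word" rule is simplified to any word of three or more syllables, and SMOG is applied to texts of any length although it was defined on 30-sentence samples. The function names and sample sentences are hypothetical.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups, discount a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return the four readability indices for a block of English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = [count_syllables(w) for w in words]
    n_sent, n_words = len(sentences), len(words)
    # Simplification: every word with 3+ syllables counts as "complex".
    polysyllables = sum(1 for s in syllables if s >= 3)
    words_per_sentence = n_words / n_sent
    syllables_per_word = sum(syllables) / n_words

    return {
        # Higher is easier to read.
        "flesch_reading_ease": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        # The remaining three approximate a US school grade; lower is easier.
        "flesch_kincaid_grade": 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
        "gunning_fog": 0.4 * (words_per_sentence + 100 * polysyllables / n_words),
        "smog": 1.0430 * math.sqrt(polysyllables * 30 / n_sent) + 3.1291,
    }


if __name__ == "__main__":
    # Hypothetical sentences, not taken from the study's PEMs.
    original = ("Colonoscopy facilitates visualization of gastrointestinal mucosal "
                "abnormalities. Preparation necessitates comprehensive bowel evacuation beforehand.")
    simplified = ("A doctor looks inside your gut with a small camera. "
                  "You must clean out your gut first.")
    for label, text in (("original", original), ("simplified", simplified)):
        scores = readability(text)
        print(label, {name: round(value, 1) for name, value in scores.items()})
```

Note the opposite directions of the scales: a successful simplification raises Flesch Reading Ease and lowers the three grade-level indices. On very short texts the scores are unstable, so they are best computed over a full PEM.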
Results:
PEMs from the original websites were written above the National Institutes of Health (NIH)-recommended sixth-grade level, with those from the ACG reaching postgraduate levels. While all LLMs improved PEM readability and understandability, their performance and accuracy varied. Gemini had the largest impact on readability but also produced the highest inaccuracy rate (11.6%). Claude introduced inaccuracies at a rate of 5%, while ChatGPT and Perplexity produced no errors. The accuracy review showed that errors were concentrated in the most complex source materials and included oversimplification and omission of risk qualifiers.
Conclusions:
LLMs show the potential to increase the accessibility of PEMs related to gastroenterology but vary in their performance, indicating the importance of human review. Gemini was the most effective of those included, but the inconsistency in performance and accuracy across different models suggests that AI output cannot be blindly trusted. A combined approach using LLMs and expert review can help improve patient understanding and health literacy.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.