Currently submitted to: JMIR mHealth and uHealth
Date Submitted: Apr 7, 2026
Open Peer Review Period: Apr 8, 2026 - Jun 3, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Effects of a Personalized Retrieval-Augmented Generation Chatbot on Information Needs in Chronic Kidney Disease Patients: Mixed Methods Randomized Controlled Trial
ABSTRACT
Background:
Patients with chronic kidney disease (CKD) who are on dialysis face complex self-management challenges, which create ongoing information needs that traditional healthcare often fails to meet. Retrieval-augmented generation (RAG) improves the factual accuracy of large language models, but it remains unclear whether personalized RAG reduces these information needs more effectively than general RAG.
Objective:
This study aimed to assess whether personalized constraint RAG (C-RAG) is more effective than open-domain general RAG (O-RAG) and standard care in reducing information needs among dialysis patients. Additionally, the study sought to identify factors that predict engagement with the chatbot.
Methods:
A sequential two-phase mixed-methods study was conducted. Phase 1 evaluated four large language models (Claude 3.5 Sonnet, Gemini 1.5 Flash, CLOVA X, ChatGPT-4o) on 60 patient-generated questions, with responses assessed by 15 clinical experts. Phase 2 was a three-arm parallel pilot randomized controlled trial (October–December 2025) at three tertiary hospitals in Seoul, South Korea. Participants were randomly assigned in a 1:1:1 ratio to the C-RAG, O-RAG, or standard care control group. The primary outcome was the change in information needs across medication, dietary, and diagnostic domains, measured on 5-point Likert scales at baseline and after four weeks and analyzed using ANCOVA. Semi-structured interviews with nine participants provided insights into the mechanisms behind the quantitative outcomes.
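The abstract does not state how the 1:1:1 allocation was generated. One common way to keep three arms balanced during enrollment is permuted-block randomization; the sketch below illustrates that general technique only and is not a description of the trial's actual procedure. The function name, the block size of three, and the seed value are all assumptions made for illustration.

```python
import random

ARMS = ("C-RAG", "O-RAG", "Control")

def permuted_block_allocation(n_participants, arms=ARMS, seed=None):
    """Return one arm label per participant, in enrollment order.

    Each block contains every arm exactly once in shuffled order, so arm
    sizes never differ by more than one at any point during enrollment.
    """
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n_participants:
        block = list(arms)
        rng.shuffle(block)          # random order within the block
        allocation.extend(block)
    return allocation[:n_participants]

# 45 enrolled participants, as reported; the seed is a hypothetical value.
schedule = permuted_block_allocation(45, seed=2025)
counts = {arm: schedule.count(arm) for arm in ARMS}
print(counts)  # {'C-RAG': 15, 'O-RAG': 15, 'Control': 15}
```

Because 45 is a multiple of the block size, every arm receives exactly 15 participants regardless of the seed.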
Results:
Phase 1 identified ChatGPT-4o as the optimal model, with the highest scores for clinical accuracy (mean 3.60) and safety (mean 3.67), both rated on a 5-point scale, and the best readability (Flesch-Kincaid Grade Level 9.8, compared to 13.2-14.5 for the other models). In Phase 2, 45 participants were enrolled and 42 completed the study (retention rate of 42/45, 93%; 14 participants in each group). ANCOVA revealed significant group differences in medication information needs (F2,38=4.574, P=.017) and dietary information needs (F2,38=4.232, P=.022). Bonferroni-corrected pairwise comparisons indicated that the C-RAG group had significantly lower information needs than the O-RAG group in both the medication domain (observed means 2.93 vs 3.86; Cohen d=-0.86, P=.017) and the dietary domain (observed means 2.93 vs 3.86; Cohen d=-0.86, P=.018). No significant differences were found between either RAG group and the control group (all P>.12). Secondary outcomes showed no significant group differences (all ANCOVA P>.40). Dietary management was the most frequent query topic, accounting for 29.3% of all queries (205 of 691). Participants on dialysis for 1-3 years showed higher engagement than those on dialysis for 4 years (90% vs 27.8%; Fisher exact P=.006), while age, digital proficiency, and prior chatbot experience showed no notable associations. Qualitative analysis revealed five themes focused on the value of disease-specific access, dietary management as the primary need, and dialysis duration as a moderator of utility.
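To make the reported effect size concrete, the sketch below computes Cohen d with a pooled standard deviation, which is the conventional two-group definition. The abstract does not say whether d was derived from observed or ANCOVA-adjusted means, so the back-calculation at the end is only a rough reading of the reported numbers. The function name and the Likert scores are hypothetical and are not trial data.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen d = (mean_a - mean_b) / pooled SD, using sample variances."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Hypothetical 5-point Likert scores for two groups of 14 (NOT the trial data).
c_rag = [2, 3, 3, 4, 2, 3, 4, 3, 2, 3, 4, 3, 2, 3]
o_rag = [4, 3, 5, 4, 4, 3, 4, 5, 3, 4, 4, 3, 4, 4]
d = cohens_d(c_rag, o_rag)
print(round(d, 2))  # negative: the first group reports lower needs

# Assumption: if d=-0.86 was computed from the observed means 2.93 and 3.86,
# the implied pooled SD is the mean difference divided by d.
implied_sd = (2.93 - 3.86) / -0.86
print(round(implied_sd, 2))  # 1.08
```

A negative d here means lower information needs in the first group, matching the sign convention used in the Results.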
Conclusions:
Personalized RAG significantly outperforms general RAG, demonstrating large effect sizes (Cohen d=-0.86) that substantially exceed digital health benchmarks. The medium-to-large effects observed against standard care (Cohen d=-0.54 to -0.64) support the need for a fully powered confirmatory trial, with an estimated sample size of 50-55 participants per group, to evaluate long-term clinical outcomes.
Clinical Trial:
KCT0011656; https://cris.nih.go.kr/cris/search/detailSearch.do?seq=32626
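The sample size estimate in the Conclusions can be sanity-checked with the standard normal-approximation formula for comparing two means, n per group = 2(z_{1-α/2} + z_{1-β})² / d². The abstract does not report the α, power, or method behind the 50-55 figure, so the two-sided α=.05 and 80% power used below are assumptions, and the authors' calculation may differ (for example, through a t-distribution correction, an attrition allowance, or a multiplicity adjustment).

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Participants per group for a two-sided, two-sample comparison of means.

    Normal approximation; alpha and power defaults are assumed values,
    not parameters reported in the abstract.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = .05
    z_beta = z.inv_cdf(power)            # about 0.84 for 80% power
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

# Effect sizes against standard care reported in the Conclusions (absolute values).
for d in (0.54, 0.64):
    print(d, n_per_group(d))
# 0.54 54
# 0.64 39
```

Under these assumptions, the smaller effect (|d|=0.54) requires about 54 participants per group, which falls within the stated 50-55 range, while the larger effect would require fewer.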
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.