Currently submitted to: JMIR mHealth and uHealth

Date Submitted: Apr 7, 2026
Open Peer Review Period: Apr 8, 2026 - Jun 3, 2026
(currently open for review)

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Effects of a Personalized Retrieval-Augmented Generation Chatbot on Information Needs in Chronic Kidney Disease Patients: Mixed Methods Randomized Controlled Trial

  • Hee Jeong Hwang; 
  • Junghwan Kim; 
  • Dongseok Heo; 
  • Jeong Yun You; 
  • Jeonghwan Lee; 
  • Soie Kwon; 
  • Kyung Don Yoo; 
  • Jung Han Yoon Park; 
  • Bongwon Suh; 
  • Jung Pyo Lee; 
  • Ki Won Lee

ABSTRACT

Background:

Patients with chronic kidney disease (CKD) who are on dialysis face complex self-management challenges, creating ongoing information needs that traditional healthcare often fails to meet. Retrieval-augmented generation (RAG) enhances the factual accuracy of large language models, but it remains unclear whether personalized RAG is more effective than general RAG in reducing these information needs.

Objective:

This study aimed to assess whether personalized constraint RAG (C-RAG) is more effective than open-domain general RAG (O-RAG) and standard care in reducing information needs among dialysis patients. Additionally, the study sought to identify factors that predict engagement with the chatbot.

Methods:

A sequential two-phase mixed-methods study was conducted. Phase 1 involved the evaluation of four large language models (Claude 3.5 Sonnet, Gemini 1.5 Flash, CLOVA X, ChatGPT-4o) using 60 patient-generated questions, assessed by 15 clinical experts. Phase 2 was a three-arm parallel pilot randomized controlled trial (October–December 2025) at three tertiary hospitals in Seoul, South Korea. Participants were randomly assigned in a 1:1:1 ratio to either the C-RAG, O-RAG, or standard care control group. The primary outcome was the change in information needs across medication, dietary, and diagnostic domains using 5-point Likert scales at baseline and after four weeks, which were analyzed using ANCOVA. Semi-structured interviews with nine participants provided insights into the mechanisms behind the quantitative outcomes.

Results:

Phase 1 identified ChatGPT-4o as the optimal model, with the highest scores for clinical accuracy (mean 3.60) and safety (mean 3.67), both rated on a 5-point scale, and the most accessible readability (Flesch-Kincaid Grade Level 9.8 vs 13.2-14.5 for the other models). In Phase 2, 45 participants were enrolled, with 42 completing the study (retention rate of 42/45, 93%; 14 participants in each group). ANCOVA revealed significant group differences in medication information needs (F2,38=4.574, P=.017) and dietary information needs (F2,38=4.232, P=.022). Bonferroni-corrected pairwise comparisons indicated that the C-RAG group had significantly lower information needs than the O-RAG group for both medication (observed means 2.93 vs 3.86; Cohen d=-0.86, P=.017) and dietary domains (observed means 2.93 vs 3.86; Cohen d=-0.86, P=.018). No significant differences were found between either RAG group and the control group (all P>.12). Secondary outcomes demonstrated no significant group differences (all ANCOVA P>.40). Dietary management emerged as the most frequent query topic, accounting for 29.3% of total queries (205 out of 691). Participants on dialysis for 1-3 years showed higher engagement (90% for 1-3 years vs 27.8% for 4 years; Fisher exact P=.006), while age, digital proficiency, and prior chatbot experience showed no notable associations. Qualitative analysis revealed five themes focused on the value of disease-specific access, dietary management as the primary need, and dialysis duration as a moderator of utility.
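For readers unfamiliar with the effect size reported above, the following is a minimal sketch of how Cohen d is commonly computed from two independent groups using a pooled standard deviation. The abstract does not state which variant the authors used, so the formula choice is an assumption, and the function name `cohen_d` and the scores are hypothetical illustration values, not study data.

```python
from math import sqrt
from statistics import mean, variance


def cohen_d(group_a, group_b):
    """Cohen d with a pooled SD: (mean_a - mean_b) / pooled_sd.

    A negative value means group_a scored lower than group_b.
    """
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical 5-point Likert scores, invented purely for illustration.
personalized = [2, 3, 3, 4]
general = [3, 4, 4, 5]
d = cohen_d(personalized, general)
print(round(d, 2))
```

If the reported d were computed directly from the observed means (an assumption; it may instead be based on ANCOVA-adjusted means), a difference of 2.93 - 3.86 = -0.93 with d=-0.86 would imply a pooled SD of roughly 1.08.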

Conclusions:

Personalized RAG significantly outperformed general RAG in reducing medication and dietary information needs, with large effect sizes (Cohen d=-0.86) that substantially exceed digital health benchmarks. The medium-to-large effects observed against standard care (Cohen d=-0.54 to -0.64) support the need for a fully powered confirmatory trial, with an estimated sample size of 50-55 participants per group, to evaluate long-term clinical outcomes.

Clinical Trial:

KCT0011656; https://cris.nih.go.kr/cris/search/detailSearch.do?seq=32626
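The sample size estimate given in the Conclusions can be related to the reported effect sizes with a standard back-of-the-envelope calculation. The sketch below uses the normal approximation for a two-sided, two-group comparison of means; the significance level (0.05), power (0.80), and the formula itself are assumptions, since the abstract does not describe how the authors derived their estimate. The function name `n_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants per group to detect effect size d.

    Normal-approximation formula for two equal-sized groups:
        n = 2 * (z_{1 - alpha/2} + z_{power})^2 / d^2
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


# Effect sizes against standard care as reported in the abstract.
for effect in (-0.54, -0.64):
    print(effect, n_per_group(effect))
```

Under these assumed settings, the smaller effect (d=-0.54) requires about 54 participants per group, which falls within the stated range of 50-55; the larger effect (d=-0.64) would require fewer. An exact t-based calculation or an allowance for attrition would give somewhat different numbers.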


 Citation

Please cite as:

Hwang HJ, Kim J, Heo D, You JY, Lee J, Kwon S, Yoo KD, Yoon Park JH, Suh B, Lee JP, Lee KW

Effects of a Personalized Retrieval-Augmented Generation Chatbot on Information Needs in Chronic Kidney Disease Patients: Mixed Methods Randomized Controlled Trial

JMIR Preprints. 07/04/2026:96501

DOI: 10.2196/preprints.96501

URL: https://preprints.jmir.org/preprint/96501


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.