Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Dec 16, 2024
Date Accepted: Mar 25, 2025

The final, peer-reviewed published version of this preprint can be found here:

Kelly A, van de Ven P, Noctor E

The Effectiveness of a Custom AI Chatbot for Type 2 Diabetes Mellitus Health Literacy: Development and Evaluation Study

J Med Internet Res 2025;27:e70131

DOI: 10.2196/70131

PMID: 40324160

PMCID: 12089868

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

The effectiveness of a custom AI chatbot for T2DM health literacy: An evaluation study

  • Anthony Kelly
  • Pepijn van de Ven
  • Eoin Noctor

ABSTRACT

Background:

People living with chronic diseases are increasingly seeking health information online. For individuals with diabetes, traditional educational materials often lack reliability and fail to engage or empower them effectively. Innovative approaches, such as Retrieval-Augmented Generation (RAG) powered by large language models (LLMs), have the potential to enhance health literacy by delivering interactive, medically accurate, and user-focused resources.

Objective:

To evaluate the effectiveness of a custom RAG-based AI chatbot designed to improve health literacy in type 2 diabetes mellitus (T2DM) by sourcing information from validated reference documents and attributing sources.

Methods:

A T2DM chatbot was developed using a fixed prompt and a set of reference documents. Two evaluations were performed: (1) a curated set of 44 questions, with each response assessed by a specialist for appropriateness (appropriate, partly appropriate, or inappropriate) and for source attribution (matched, partly matched, unmatched, or general knowledge); and (2) a simulated consultation of 16 queries reflecting a typical patient’s concerns.
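The combination of a fixed prompt, reference documents, and source attribution can be illustrated with a minimal sketch. It is not the authors' implementation: the passages, the prompt wording, the names `Passage`, `retrieve`, and `build_prompt`, and the keyword-overlap retrieval are all assumptions made for illustration, standing in for whatever retrieval method and documents the study actually used. No LLM is called; the sketch only assembles the prompt and records which sources would be attributed.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    source: str  # hypothetical document title
    text: str


# Placeholder passages for illustration only; not the study's reference documents.
REFERENCE_PASSAGES = [
    Passage("Example leaflet A", "Regular physical activity helps lower blood glucose levels."),
    Passage("Example leaflet B", "Metformin is usually taken with meals to reduce stomach upset."),
]

# Assumed wording; the study's fixed prompt is not given in the abstract.
FIXED_PROMPT = (
    "Answer using only the numbered passages below and cite their sources. "
    "If no passage is relevant, say that the answer is based on general knowledge."
)


def tokenize(text):
    """Lowercase the text and split it into a set of words without punctuation."""
    return {w.strip(".,?!").lower() for w in text.split()}


def retrieve(question, passages, min_overlap=2):
    """Toy retrieval: keep passages sharing at least min_overlap words with the question."""
    q = tokenize(question)
    scored = [(len(q & tokenize(p.text)), p) for p in passages]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [p for score, p in ranked if score >= min_overlap]


def build_prompt(question, passages):
    """Return the assembled prompt and the list of sources to attribute."""
    hits = retrieve(question, passages)
    if hits:
        context = "\n".join(f"[{i}] ({p.source}) {p.text}" for i, p in enumerate(hits, 1))
        attribution = [p.source for p in hits]
    else:
        # Nothing relevant was retrieved, so the answer is flagged as general knowledge.
        context = "(no relevant passage found)"
        attribution = ["general knowledge"]
    prompt = f"{FIXED_PROMPT}\n\n{context}\n\nQuestion: {question}"
    return prompt, attribution
```

Calling `build_prompt("Should metformin be taken with meals?", REFERENCE_PASSAGES)` attributes the answer to the second placeholder leaflet, while a question with no overlapping passage falls back to the "general knowledge" label, mirroring the attribution categories used in the evaluation.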

Results:

Of the 44 evaluated questions, 32 responses cited reference documents and 12 were attributed to general knowledge. Among the sourced responses, 30 (94%) were deemed fully appropriate, with the remaining 2 partly appropriate. Of the 12 general knowledge responses, 1 was inappropriate. In the 16-question simulated consultation, all responses were fully appropriate and sourced from the reference documents.
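The reported counts can be checked with a few lines of arithmetic. The sketch below uses only the figures stated in the Results; the variable names and the `pct` helper are illustrative choices, and the appropriateness breakdown of the remaining general knowledge responses is not given in the abstract, so it is left out.

```python
from fractions import Fraction

# Counts taken directly from the Results paragraph.
total_questions = 44
sourced = 32             # responses that cited reference documents
general_knowledge = 12   # responses attributed to general knowledge
sourced_fully = 30       # sourced responses rated fully appropriate
sourced_partly = 2       # sourced responses rated partly appropriate
gk_inappropriate = 1     # general knowledge responses rated inappropriate

# Consistency checks: the subgroups must add up to their totals.
assert sourced + general_knowledge == total_questions
assert sourced_fully + sourced_partly == sourced


def pct(part, whole):
    """Percentage rounded to the nearest integer, computed exactly."""
    return round(100 * Fraction(part, whole))


print(f"Sourced responses: {sourced}/{total_questions} ({pct(sourced, total_questions)}%)")
print(f"Fully appropriate among sourced: {sourced_fully}/{sourced} ({pct(sourced_fully, sourced)}%)")
```

The second line reproduces the 94% quoted in the abstract (30 of 32 is 93.75%), and the first shows that about 73% of the 44 curated questions were answered from the reference documents.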

Conclusions:

A RAG-based LLM chatbot can deliver contextually appropriate, empathetic, and clinically credible responses for T2DM queries. By consistently citing trusted sources and notifying users when relying on general knowledge, this approach enhances transparency and trust. The findings have relevance for health educators, highlighting that patient-centric reference documents, structured to address frequent patient questions, are particularly effective. Moreover, instances where the chatbot draws on general knowledge can signal opportunities for health educators to refine and expand their materials, so that more future queries are answered from trusted sources. The findings suggest that such chatbots may support patient education, promote self-management, and be readily adapted to other health contexts.

Clinical Trial: N/A


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.