Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Evidence-Informed Guidance on Cannabidiol Use in Older Adults: Development and Evaluation of Retrieval-Augmented Large Language Models
ABSTRACT
Background:
Older adults often experience chronic conditions such as pain and sleep disturbances, leading many to explore cannabidiol (CBD) for symptom relief. Safe use requires appropriate dosing, careful titration, and awareness of potential interactions; however, the stigma surrounding CBD use and limited health literacy can constrain comprehension. Conversational AI systems built on large language models (LLMs) and retrieval-augmented generation (RAG) may support CBD education, but their safety and reliability remain under-evaluated.
Objective:
This study aimed to (1) design a retrieval-augmented LLM framework integrating structured prompts and evidence retrieval from curated CBD resources to generate safe, coherent, context-aware guidance for older adults, and (2) systematically evaluate leading LLMs and RAG systems using an automated, annotation-free framework in the absence of standardized benchmarks.
Methods:
A structured parametric scenario generation framework produced sixty-four diverse profiles by varying symptom goals, administration preferences, cognitive status, demographics, health parameters, comorbidities, medication regimens, cannabis history, and caregiver support. These scenarios, combined with advanced prompt engineering, were used to test OpenAI GPT 5.1, Google Gemini 2.5 Pro, Mistral AI Medium 3, Anthropic Claude Sonnet 4.5, xAI Grok 4, and DeepSeek V3.2-Exp. Retrieval-augmented variants of GPT 5.1 and Gemini 2.5 Pro incorporated thirty-two curated CBD guidelines. A novel ensemble retrieval architecture was also designed, combining two independent RAG systems with a third RAG system that acts as a tiebreaker. Model outputs were evaluated using three automated, annotation-free methods proposed in this study: statistical consensus analysis, feature-aligned directional checks, and LLM-as-a-judge rubric scoring.
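The ensemble arrangement described above can be illustrated with a minimal sketch. The abstract does not specify how the tiebreaker is invoked, so the sketch assumes the simplest reading: the third system is consulted only when the two independent systems disagree. The function name `ensemble_recommend`, the `RagSystem` type, and the use of exact string equality as the agreement test are all hypothetical and are not taken from the study.

```python
from typing import Callable

# Hypothetical type: each RAG system maps a scenario prompt to a
# recommendation label. Real systems would return richer structured output.
RagSystem = Callable[[str], str]


def ensemble_recommend(
    prompt: str,
    rag_a: RagSystem,
    rag_b: RagSystem,
    rag_tiebreaker: RagSystem,
) -> str:
    """Combine two independent RAG systems with a third tiebreaker.

    Assumption (not confirmed by the source): agreement means the two
    answers are identical, and the tiebreaker's answer is final whenever
    the first two systems disagree.
    """
    first = rag_a(prompt)
    second = rag_b(prompt)
    if first == second:
        # Both independent systems agree, so the tiebreaker is not called.
        return first
    # Disagreement: defer to the third system.
    return rag_tiebreaker(prompt)
```

In practice the agreement test would likely compare structured fields rather than raw strings, since two free-text answers rarely match exactly; that comparison is left out here because the abstract does not describe it.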
Results:
Across all three evaluation strategies, the ensemble RAG configuration produced the most cautious, clinically grounded recommendations, closely followed by Gemini 2.5 Pro RAG and GPT 5.1 RAG. Standalone GPT 5.1 and Gemini 2.5 Pro performed reliably but were less consistently cautious than their retrieval-augmented counterparts. DeepSeek V3.2-Exp and Grok 4 generated higher, more variable dosing patterns, while Medium 3 showed intermediate behavior with moderate safety and alignment, closely matching guideline-based dosage and titration.
Conclusions:
This study introduces a reproducible, annotation-free framework for benchmarking LLM-based CBD education and shows that retrieval-augmented models provide more adaptive guidance for older adults with diverse cognitive and clinical needs. The findings highlight the potential of structured retrieval to improve the reliability of AI-driven evidence-informed educational tools used in sensitive health contexts.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.