Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Oct 17, 2024
Date Accepted: Feb 25, 2025
Improving Dietary Supplement Information Retrieval: Development of a Retrieval-Augmented Generation (RAG) System with Large Language Models
ABSTRACT
Background:
Dietary supplements (DS) are widely used to add nutritional value to the diet and contain vitamins, minerals, herbs, amino acids, and other substances. Despite their popularity, DS are regulated less stringently than prescription drugs, leading to challenges related to efficacy, safety, and misinformation. There is a critical need for accurate resources to support DS-related decision-making for consumers and healthcare providers.
Objective:
To improve the accuracy and reliability of DS question answering by integrating a novel Retrieval-Augmented Generation (RAG) system with an enhanced DS knowledge base and a user-friendly interface.
Methods:
We developed iDISK2.0, a DS knowledge system, by integrating updated data from multiple trusted sources, including NMCD, MSKCC, DSLD, and NHPD, using advanced strategies to reduce data noise. iDISK2.0 was implemented with a RAG system, combining the capabilities of large language models (LLMs) and a biomedical knowledge graph (BKG) to mitigate hallucination issues found in standalone LLMs. The RAG system utilizes GPT-4 to retrieve contextually relevant subgraphs from the BKG based on entities identified in the user query. A user-friendly interface was developed to facilitate easy access to DS knowledge through conversational inputs.
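The retrieval flow described above (identify entities in the query, pull the related subgraph, and ground the model's answer in it) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the iDISK2.0-RAG implementation: the toy triples, the relation names, and the functions `find_entities`, `retrieve_subgraph`, and `build_prompt` are all hypothetical, and naive substring matching stands in for the GPT-4-based entity identification. The call to the language model itself is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One edge of a knowledge graph: head --relation--> tail."""
    head: str
    relation: str
    tail: str


# Toy graph with placeholder names; these are NOT iDISK2.0 data.
TOY_GRAPH = [
    Triple("supplement_a", "is_effective_for", "condition_x"),
    Triple("supplement_a", "interacts_with", "drug_y"),
    Triple("supplement_b", "interacts_with", "drug_y"),
    Triple("supplement_b", "has_adverse_effect", "symptom_z"),
]


def find_entities(query, graph):
    """Return graph entities mentioned in the query.

    Assumption: naive substring matching replaces the LLM-based
    entity identification described in the text.
    """
    text = query.lower()
    names = {t.head for t in graph} | {t.tail for t in graph}
    return sorted(name for name in names if name in text)


def retrieve_subgraph(entities, graph):
    """Return every triple that touches at least one query entity (1 hop)."""
    wanted = set(entities)
    return [t for t in graph if t.head in wanted or t.tail in wanted]


def build_prompt(query, subgraph):
    """Assemble a prompt that restricts the model to the retrieved facts."""
    facts = "\n".join(f"- {t.head} {t.relation} {t.tail}" for t in subgraph)
    return (
        "Answer using only the facts below.\n"
        f"Facts:\n{facts}\n"
        f"Question: {query}"
    )


query = "Does supplement_a help with condition_x?"
entities = find_entities(query, TOY_GRAPH)
subgraph = retrieve_subgraph(entities, TOY_GRAPH)
prompt = build_prompt(query, subgraph)
```

Restricting the prompt to retrieved triples is the step intended to reduce hallucination: the model is asked to answer from the supplied facts rather than from its parametric memory alone.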
Results:
iDISK2.0 contains 174,317 entities across seven types, six relationship types, and 471,063 attributes. The iDISK2.0-RAG system demonstrated significant improvements in DS information retrieval accuracy. Evaluation results indicated over 95% accuracy in answering True/False and multiple-choice DS-related questions, outperforming standalone LLMs. The user interface enabled efficient interaction, supporting free-form text input and providing accurate responses. Integration strategies minimized data noise, ensuring access to up-to-date DS information.
Conclusions:
The integration of iDISK2.0 with a RAG system effectively addresses limitations of standalone LLMs, resulting in a reliable solution for DS information retrieval. This study highlights the importance of combining structured knowledge graphs with advanced language models to enhance the accuracy of information retrieval systems, ultimately supporting better-informed decision-making in DS research and healthcare.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.