Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Integrating Knowledge Graph with Retrieval-Augmented Generation in Medical Question Answering: Development and Usability Study with MEDQA
ABSTRACT
Background:
Large language models (LLMs) have demonstrated strong performance and are widely applied across various domains. However, LLMs face challenges such as outdated knowledge, insufficient knowledge, and hallucinations, particularly in specialized fields such as medicine.
Objective:
Our study aims to address these challenges by designing MEDQA, a medical question answering model based on multimodal knowledge fusion and logic enhancement. By combining a medical knowledge base, a knowledge graph, and retrieval augmentation, the model is intended to express professional knowledge accurately and reason over it reliably.
Methods:
We constructed a semi-structured knowledge retrieval system in which a semantic block technique transforms the semi-structured text into high-dimensional vector representations, forming a vector base that supports fast retrieval. We also constructed a knowledge graph based on a medical ontology. We propose a Text2SPARQL method combined with chain of thought (CoT) that improves the accuracy of retrieval results by converting users' natural language questions into a structured query language (SPARQL). The vector base and the knowledge graph are queried in parallel to provide a richer prompt for the LLM.
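The parallel retrieval step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy passages, the toy triples, the bag-of-words `embed` function (a stand-in for a learned embedding model), and the names `search_vectors`, `search_graph`, and `build_prompt` are all hypothetical. The graph lookup is a simple subject match that stands in for executing a generated SPARQL query against a real triple store.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Toy stand-ins (assumptions): the real corpus, embedding model,
# and ontology-based knowledge graph are not reproduced here.
CHUNKS = [
    "Metformin is a first-line drug for type 2 diabetes.",
    "Hypertension is often treated with ACE inhibitors.",
    "Aspirin reduces the risk of blood clots.",
]
TRIPLES = [
    ("metformin", "treats", "type 2 diabetes"),
    ("lisinopril", "treats", "hypertension"),
    ("metformin", "has_side_effect", "nausea"),
]

def embed(text):
    """Bag-of-words count vector; a placeholder for a learned embedding."""
    cleaned = text.lower().replace(".", "").replace("?", "")
    return Counter(cleaned.split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search_vectors(question, k=1):
    """Return the k passages most similar to the question."""
    q = embed(question)
    ranked = sorted(CHUNKS, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def search_graph(question):
    """Return triples whose subject is mentioned in the question.
    Stands in for running a generated SPARQL query."""
    q = question.lower()
    return [t for t in TRIPLES if t[0] in q]

def build_prompt(question):
    """Query both sources in parallel and merge the results into one prompt."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        text_future = pool.submit(search_vectors, question)
        graph_future = pool.submit(search_graph, question)
        passages = text_future.result()
        facts = graph_future.result()
    lines = ["Context passages:"] + [f"- {p}" for p in passages]
    lines += ["Graph facts:"] + [f"- {s} {p} {o}" for s, p, o in facts]
    lines.append(f"Question: {question}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(build_prompt("What does metformin treat?"))
```

Keeping the two retrievers behind separate functions means either one could be swapped for a real vector database or SPARQL endpoint without changing how the prompt is assembled.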
Results:
Our medical knowledge retrieval system comprises 12,184 concept-related knowledge items, and the knowledge graph contains 73,000 entities and more than 350,000 triples. With MEDQA, question answering accuracy was 95% and recall was 94.7%, significantly better than using the LLM, the knowledge base, or the knowledge graph alone.
Conclusions:
MEDQA moves beyond the single knowledge model of the traditional retrieval-augmented generation architecture and forms a multi-level knowledge processing pipeline of text representation, graph retrieval, and traceability. This enhances the adaptability of LLMs and provides a cost-effective solution to challenges in the medical field.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.