JMIR Preprints #92241: Integrating Knowledge Graph with Retrieval-Augmented Generation in Medical Question Answering: Development and Usability Study with MEDQA

Integrating Knowledge Graph with Retrieval-Augmented Generation in Medical Question Answering: Development and Usability Study with MEDQA

Lou Pei;
Hu Jiahui;
Zhao Wanqing;
Wang Qian;
Fang An

ABSTRACT

Background:

Large language models (LLMs) have demonstrated strong performance and are widely applied across various domains. However, LLMs face challenges such as outdated knowledge, insufficient knowledge, and hallucinations, particularly in specialized fields such as medicine.

Objective:

Our study aims to address these challenges by designing a medical question answering model, MEDQA, which is based on multimodal knowledge fusion and logic enhancement. By combining a medical knowledge base, a knowledge graph, and retrieval augmentation, MEDQA aims to express professional knowledge accurately and reason over it reliably.

Methods:

We constructed a semi-structured knowledge retrieval system in which semantic chunking transforms the semi-structured text into high-dimensional vector representations, forming a vector base that supports fast retrieval. We also constructed a knowledge graph based on a medical ontology. We propose a Text2SPARQL method combined with chain of thought (CoT) that converts users' natural language questions into structured queries to improve the accuracy of retrieval results. The vector base and the knowledge graph are queried in parallel to provide a richer prompt for the LLM.
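
The parallel retrieval step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy passages and triples, the bag-of-words `embed` function (standing in for a learned embedding model), and the helper names `vector_search`, `graph_search`, and `build_prompt` are all hypothetical. In particular, `graph_search` replaces the Text2SPARQL and CoT step with a simple subject-name match, because the abstract does not give the query templates.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy data; the paper's knowledge base and graph are not reproduced here.
CHUNKS = [
    "Metformin is a first-line oral drug for type 2 diabetes.",
    "Aspirin inhibits platelet aggregation and reduces fever.",
    "Hypertension is persistently elevated arterial blood pressure.",
]
TRIPLES = [
    ("Metformin", "treats", "Type 2 diabetes"),
    ("Aspirin", "treats", "Fever"),
    ("Metformin", "has_side_effect", "Lactic acidosis"),
]


def embed(text):
    """Bag-of-words counts; an assumption standing in for a real embedding model."""
    cleaned = text.lower().replace(".", " ").replace("?", " ")
    return Counter(cleaned.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_search(question, k=1):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(CHUNKS, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def graph_search(question):
    """Stand-in for Text2SPARQL: keep triples whose subject is named in the question."""
    q = question.lower()
    return [t for t in TRIPLES if t[0].lower() in q]


def build_prompt(question):
    """Query both stores in parallel and merge the results into one prompt."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        text_future = pool.submit(vector_search, question)
        graph_future = pool.submit(graph_search, question)
        passages = text_future.result()
        triples = graph_future.result()
    facts = [f"{s} {p} {o}" for s, p, o in triples]
    lines = ["Context passages:", *passages, "Graph facts:", *facts,
             f"Question: {question}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("What does metformin treat?"))
```

In this sketch the two retrievers run on separate threads and their results are concatenated into a single prompt. A production system would presumably replace `embed` with a trained encoder and `graph_search` with a generated SPARQL query executed against a triple store.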

Results:

Our medical knowledge retrieval system comprises 12,184 concept-related knowledge entries, and the knowledge graph contains 73,000 entities and more than 350,000 triples. With MEDQA, QA accuracy was 95% and recall was 94.7%, significantly better than using the LLM, the knowledge base, or the knowledge graph alone.

Conclusions:

MEDQA moves beyond the single knowledge mode of the traditional Retrieval-Augmented Generation architecture and forms a multi-level knowledge processing pipeline of text representation, graph retrieval, and traceability, which enhances the adaptability of LLMs and provides a cost-effective solution to challenges in the medical field.

Citation

Please cite as:

Pei L, Jiahui H, Wanqing Z, Qian W, An F

Integrating Knowledge Graph with Retrieval-Augmented Generation in Medical Question Answering: Development and Usability Study with MEDQA

JMIR Preprints. 27/01/2026:92241

DOI: 10.2196/preprints.92241

URL: https://preprints.jmir.org/preprint/92241

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Currently submitted to: JMIR AI

Date Submitted: Jan 27, 2026
