Currently submitted to: JMIR AI

Date Submitted: Jan 27, 2026

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Integrating Knowledge Graph with Retrieval-Augmented Generation in Medical Question Answering: Development and Usability Study with MEDQA

  • Lou Pei; 
  • Hu Jiahui; 
  • Zhao Wanqing; 
  • Wang Qian; 
  • Fang An

ABSTRACT

Background:

Large language models (LLMs) have demonstrated strong performance and are widely applied across many domains. However, LLMs face challenges such as outdated knowledge, insufficient knowledge, and hallucinations, particularly in specialized fields such as medicine.

Objective:

Our study aims to address these challenges by designing a medical question answering model, MEDQA, which is based on multimodal knowledge fusion and logic enhancement. By combining a medical knowledge base, a knowledge graph, and retrieval-augmented generation, the model aims to express professional knowledge accurately and reason over it reliably.

Methods:

A semi-structured knowledge retrieval system was constructed, and semantic chunking was used to transform the semi-structured text into high-dimensional vector representations, forming a vector base that supports fast retrieval. A knowledge graph was constructed based on a medical ontology. A Text2SPARQL method combined with chain of thought (CoT) is proposed to improve the accuracy of retrieval results by converting the user's natural language questions into a structured query language. The vector base and the knowledge graph are queried in parallel to provide a richer prompt for the LLM.
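The parallel retrieval step described in the Methods can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the manuscript: the toy passages and triples, the bag-of-words `embed` function (standing in for a neural encoder), the subject-matching `search_knowledge_graph` function (standing in for executing a SPARQL query generated by the Text2SPARQL step), and the prompt layout.

```python
import math
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy data; the real chunks and triples are not given in the abstract.
CHUNKS = [
    "Metformin is a first-line drug for type 2 diabetes.",
    "Aspirin is used to reduce fever and relieve mild pain.",
    "Hypertension is persistently elevated arterial blood pressure.",
]
TRIPLES = [
    ("metformin", "treats", "type 2 diabetes"),
    ("aspirin", "treats", "fever"),
    ("metformin", "has_side_effect", "nausea"),
]


def embed(text):
    """Stand-in embedding: a bag-of-words count vector (assumption, not the authors' encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search_vector_base(question, k=1):
    """Return the k passages most similar to the question."""
    q = embed(question)
    ranked = sorted(CHUNKS, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def search_knowledge_graph(question):
    """Stand-in for a SPARQL query: return triples whose subject is named in the question."""
    q = question.lower()
    return [t for t in TRIPLES if t[0] in q]


def retrieve_parallel(question):
    """Query the vector base and the knowledge graph concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        text_future = pool.submit(search_vector_base, question)
        graph_future = pool.submit(search_knowledge_graph, question)
        return text_future.result(), graph_future.result()


def build_prompt(question):
    """Merge both retrieval results into one prompt for the LLM."""
    passages, triples = retrieve_parallel(question)
    context = "\n".join(passages)
    facts = "\n".join(f"{s} {p} {o}" for s, p, o in triples)
    return f"Passages:\n{context}\n\nGraph facts:\n{facts}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("What does metformin treat?"))
```

In this sketch the two retrievers are independent, so running them in separate threads keeps total latency close to that of the slower one; the merged prompt carries both free-text context and structured facts.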

Results:

Our medical knowledge retrieval system comprises 12,184 concept-related knowledge items, and the knowledge graph contains 73,000 entities and more than 350,000 triples. With MEDQA, question answering accuracy was 95% and recall was 94.7%, significantly better than using the LLM, the knowledge base, or the knowledge graph alone.

Conclusions:

MEDQA moves beyond the single knowledge source of the traditional Retrieval-Augmented Generation architecture and forms a multi-level knowledge processing pipeline of text representation, graph retrieval, and traceability. This enhances the adaptability of the LLM and provides a cost-effective solution to challenges in the medical field.


 Citation

Please cite as:

Pei L, Jiahui H, Wanqing Z, Qian W, An F

Integrating Knowledge Graph with Retrieval-Augmented Generation in Medical Question Answering: Development and Usability Study with MEDQA

JMIR Preprints. 27/01/2026:92241

DOI: 10.2196/preprints.92241

URL: https://preprints.jmir.org/preprint/92241


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.