Currently submitted to: JMIR Medical Informatics

Date Submitted: Jan 12, 2026
Open Peer Review Period: Jan 21, 2026 - Mar 18, 2026
(currently open for review)

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Multimodal Radiology Knowledge Graph Generation Using Vision Language Models

  • Abdullah Abdullah; 
  • Seong Tae Kim

ABSTRACT

Background:

Knowledge graphs are increasingly important in radiology for representing factual clinical information and supporting downstream applications such as decision support, information retrieval, and structured reporting. However, generating radiology-specific knowledge graphs remains challenging due to the specialized vocabulary used in radiology reports, the scarcity of domain-annotated datasets, and the predominance of unimodal approaches that rely solely on text.

Objective:

To develop and evaluate a multimodal vision–language model (VLM) framework capable of generating radiology knowledge graphs from both radiographic images and their corresponding reports.

Methods:

We designed a VLM-based knowledge graph generation framework that integrates radiology images and free-text reports through instruction tuning and visual instruction tuning. The model is optimized for long-context radiology reports and structured triplet extraction. Its performance was compared with existing unimodal baselines on benchmark datasets.
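To make "structured triplet extraction" concrete, the sketch below shows what a single instruction-tuning record pairing an image, a report, and target triplets might look like. The field names, relation labels, and file name are illustrative assumptions, not the authors' actual schema.

```python
import json

# Hypothetical instruction-tuning record (field names and relation labels
# are illustrative, not the authors' schema): the model receives an image
# plus the free-text report and is trained to emit serialized
# (head entity, relation, tail entity) triplets.
record = {
    "image": "chest_xray_0001.png",
    "instruction": "Extract clinical triplets from the report.",
    "report": "Small left pleural effusion. No pneumothorax.",
    "triplets": [
        ["effusion", "located_at", "left pleural space"],
        ["effusion", "modifier", "small"],
        ["pneumothorax", "presence", "absent"],
    ],
}
print(json.dumps(record["triplets"][0]))  # → ["effusion", "located_at", "left pleural space"]
```

Serializing triplets as text in this way lets a generative model be scored with n-gram overlap metrics such as BLEU and ROUGE, as in the Results below.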

Results:

Our multimodal VLM-KG (MIMIC) demonstrated the strongest overall performance across standard NLG metrics, achieving the highest BLEU scores (BLEU-1: 54.98, BLEU-2: 49.65, BLEU-3: 46.12, BLEU-4: 43.29) and substantially outperforming all unimodal baselines, including the BERT-based DyGIE++ model. This improvement highlights the effectiveness of multimodal learning, in which the integration of visual and linguistic information enhances contextual understanding during text generation. Although DyGIE++ achieved a comparable ROUGE-L score (56.49), VLM-KG (MIMIC) provided markedly higher BLEU scores, indicating stronger n-gram overlap and more accurate triplet generation. VLM-KG (MIMIC) also achieved a competitive ROUGE-L score of 54.69, slightly lower than that of LLM-KG (MIMIC) (56.53), suggesting that while multimodal features improve precision, they may introduce minor variability in generated outputs. Additionally, LLM-KG (MIMIC) consistently outperformed LLM-KG (IU) across all metrics (e.g., BLEU-3: 35.96 vs. 18.02), underscoring the advantage of training on a large-scale, domain-specific dataset.
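The BLEU-n scores reported above measure clipped n-gram overlap between generated and reference triplet sequences. As a minimal, self-contained illustration of the core computation (modified n-gram precision; this is not the authors' evaluation code, and the example triplet strings are hypothetical):

```python
from collections import Counter

def ngram_precision(hypothesis: str, reference: str, n: int) -> float:
    """Modified n-gram precision: the fraction of hypothesis n-grams that
    also appear in the reference, with counts clipped by the reference."""
    hyp = hypothesis.split()
    ref = reference.split()
    hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    if not hyp_ngrams:
        return 0.0
    clipped = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
    return clipped / sum(hyp_ngrams.values())

# Hypothetical serialized triplets (illustrative only)
ref = "effusion located_at left pleural space"
hyp = "effusion located_at pleural space"
print(ngram_precision(hyp, ref, 1))  # → 1.0 (all unigrams match)
print(round(ngram_precision(hyp, ref, 2), 2))  # → 0.67 (2 of 3 bigrams match)
```

Full BLEU additionally combines these precisions geometrically (typically n = 1..4) and applies a brevity penalty for short hypotheses.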

Conclusions:

This study presents the first multimodal VLM-driven approach for radiology knowledge graph generation. By leveraging both images and reports, the framework overcomes limitations of previous text-only systems and provides a more comprehensive foundation for medical knowledge representation and downstream radiology informatics applications.

Keywords: Vision Language Models; Large Language Models; Knowledge Graph; Radiology; Multimodal AI; Medical NLP


 Citation

Please cite as:

Abdullah A, Kim ST

Multimodal Radiology Knowledge Graph Generation Using Vision Language Models

JMIR Preprints. 12/01/2026:91301

DOI: 10.2196/preprints.91301

URL: https://preprints.jmir.org/preprint/91301


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft other than for review purposes.