Accepted for/Published in: JMIR Formative Research
Date Submitted: Oct 17, 2023
Date Accepted: Jan 10, 2024
A Vision-Language Model for Generating Textual Descriptions from Clinical Images: Model Development and Validation
ABSTRACT
Background:
Automatic generation of radiology reports, which seeks to create a free-text description from a clinical radiograph, is emerging as a pivotal intersection between clinical medicine and artificial intelligence. Leveraging natural language processing technologies can accelerate report creation, enhancing healthcare quality and standardization. However, most existing studies have not yet fully tapped into the combined potential of advanced language and vision models.
Objective:
The purpose of this study was to explore the integration of pretrained vision-language models (VLMs) into radiology report generation, enabling a VLM to automatically convert clinical images into high-quality textual reports.
Methods:
We introduced a radiology report generation model named ClinicalBLIP, which builds on the foundational InstructBLIP model and refines it using clinical image-to-text datasets. A multi-stage finetuning approach based on low-rank adaptation (LoRA) was proposed to deepen the semantic comprehension of clinical imagery by both the visual encoder and the large language model. Furthermore, prior knowledge was integrated through prompt learning to enhance the precision of the generated reports. Experiments were conducted on both the IU X-RAY and MIMIC-CXR datasets, and ClinicalBLIP was compared with several leading methods.
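As a minimal illustration of the LoRA idea named in the Methods (a sketch, not the ClinicalBLIP implementation), the following Python example keeps a weight matrix W frozen and learns only a low-rank update, so that the effective weight is W + (alpha / r) * B A. The class name LoRALinear, the helper matmul, and the rank and alpha values are hypothetical choices made for illustration; the ranks, target layers, and stage schedule used in the study are not stated in this abstract.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A.

    Hypothetical illustration only: names and defaults are assumptions.
    """

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # frozen, shape (d_out, d_in)
        d_out, d_in = len(weight), len(weight[0])
        self.scale = alpha / rank
        # A starts with small random values; B starts at zero, so the
        # adapted layer initially behaves exactly like the frozen layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def effective_weight(self):
        """Return W + scale * (B @ A) as a list of rows."""
        delta = matmul(self.B, self.A)
        return [
            [w + self.scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(self.weight, delta)
        ]

    def forward(self, x):
        """Apply the adapted layer to a single input vector of length d_in."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]

    def trainable_params(self):
        """Count only the adapter parameters (A and B); W stays frozen."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

For a 4 x 6 weight and rank 2, the adapter holds 2 x 6 + 4 x 2 = 20 trainable values against 24 in the full matrix. The saving is far larger for the weight matrices of a visual encoder or large language model, which is what makes staged finetuning of both components affordable.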
Results:
Experimental results revealed that ClinicalBLIP obtained METEOR scores of 0.570/0.365 and ROUGE scores of 0.534/0.313 on the IU X-RAY/MIMIC-CXR test sets. This performance notably surpassed that of existing state-of-the-art methods. Further evaluations confirmed that both the multi-stage finetuning and the integration of prior information contributed substantial improvements.
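To make the ROUGE figures easier to interpret, the sketch below computes one common variant, ROUGE-L, as a balanced F-measure over the longest common subsequence (LCS) of whitespace tokens. This abstract does not say which ROUGE variant, tokenizer, or F-measure weighting was used, so the function names lcs_length and rouge_l_f1 and all of these choices are assumptions; scores from this sketch are not expected to match the reported values exactly.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """Balanced ROUGE-L F-measure on lowercased whitespace tokens (assumed setup)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of generated tokens that are in the LCS
    recall = lcs / len(ref)      # share of reference tokens that are in the LCS
    return 2 * precision * recall / (precision + recall)
```

For example, with the reference "no acute cardiopulmonary abnormality" and the generated text "no acute abnormality", the LCS has 3 tokens, precision is 3/3, recall is 3/4, and the F-measure is about 0.857.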
Conclusions:
The proposed ClinicalBLIP demonstrated robustness and effectiveness in enhancing clinical radiology report generation, suggesting significant promise for real-world clinical applications.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.