
Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Jan 22, 2024
Open Peer Review Period: Feb 5, 2024 - Apr 1, 2024
Date Accepted: May 4, 2024

The final, peer-reviewed published version of this preprint can be found here:

Advancing Accuracy in Multimodal Medical Tasks Through Bootstrapped Language-Image Pretraining (BioMedBLIP): Performance Evaluation Study

Naseem U, Thapa S, Masood A

JMIR Med Inform 2024;12:e56627

DOI: 10.2196/56627

PMID: 39102281

PMCID: 11333867

BioMedBLIP: Advancing Accuracy in Multimodal Medical Tasks through Bootstrapped Language-Image Pretraining

  • Usman Naseem; 
  • Surendrabikram Thapa; 
  • Anum Masood

ABSTRACT

Background:

Medical image analysis, particularly in the context of Visual Question Answering (VQA) and image captioning, is crucial for accurate diagnosis and educational purposes.

Objective:

Our study introduces BioMedBLIP models, pretrained on specialized medical datasets such as ROCO and MIMIC-CXR and fine-tuned for VQA tasks, and evaluates their performance against the state-of-the-art (SOTA) original BLIP model.

Methods:

We present nine versions of BioMedBLIP across three downstream tasks on various datasets, trained for varying numbers of epochs. We first pretrained BLIP on medical datasets, producing an adapted BLIP model tailored for medical applications, and from it built the BioMedBLIP VQA generation model, VQA classification model, and image captioning model. The findings indicate strong overall performance of our models.

Results:

In VQA generation tasks, BioMedBLIP models outperformed the SOTA on the SLAKE, VQA-RAD, and ImageCLEF datasets. In VQA classification, our models consistently surpassed the SOTA on SLAKE and showed competitive performance on the VQA-RAD and PathVQA datasets. Similarly, for image captioning tasks, our model beat the SOTA, underscoring the importance of pretraining with medical datasets. Overall, across 20 dataset and task combinations, BioMedBLIP set a new state of the art in 15 of 20 tasks (75%), and our responses were rated higher in all 20 tasks (P<.005) compared with SOTA models.

Conclusions:

Our BioMedBLIP models show promising performance and suggest that incorporating medical knowledge through pretraining on domain-specific medical datasets helps models achieve higher performance. Our models thus demonstrate their potential to advance medical image analysis, with impact on diagnosis, medical education, and research. However, data quality, task-specific variability, computational resources, and ethical considerations must be carefully addressed. In conclusion, our models represent a contribution toward the synergy of artificial intelligence and medicine. We have made BioMedBLIP freely available, which will help further advance research in multimodal medical tasks.



© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft other than for review purposes.