Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Dec 14, 2024
Open Peer Review Period: Dec 15, 2024 - Feb 9, 2025
Date Accepted: May 12, 2025
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
One LLM is not Enough: Harnessing the Power of Ensemble Learning for Medical Question Answering
ABSTRACT
Background:
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language processing tasks, including medical question-answering (QA). However, individual LLMs often exhibit varying performance across different medical QA datasets, highlighting the need for strategies that can harness their collective strengths. Ensemble learning methods, which combine multiple models to improve overall accuracy and reliability, offer a promising approach to address this challenge. In this study, we introduce the LLM-Synergy framework, employing two ensemble methods—Boosting-based Weighted Majority Vote and Cluster-based Dynamic Model Selection—to enhance performance across diverse medical QA tasks.
Objective:
To enhance the accuracy and reliability of medical QA tasks by developing efficient ensemble learning approaches built on LLM technologies. We focus on improving performance across diverse medical QA datasets through innovative ensemble strategies.
Methods:
Our study employs three medical QA datasets: PubMedQA, MedQA-USMLE, and MedMCQA, each presenting unique challenges in biomedical QA. The proposed LLM-Synergy framework, which relies exclusively on zero-shot LLMs, incorporates two primary ensemble methods. The first is a Boosting-based Weighted Majority Vote ensemble, in which a boosting algorithm assigns variable weights to different LLMs to expedite and refine decision-making. The second is Cluster-based Dynamic Model Selection, which uses a clustering technique to dynamically select the most suitable LLM votes for each query based on the characteristics of the question context.
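The two ensemble strategies described above can be illustrated with a minimal sketch. This is not the authors' implementation, which is not specified in this abstract; all function names, weight values, and data structures below are hypothetical, and the clustering/boosting steps that would learn the weights and cluster assignments are assumed to have already run.

```python
# Illustrative sketch of the two ensemble ideas; not the LLM-Synergy code.
from collections import Counter

def weighted_majority_vote(answers, weights):
    """Combine per-model answers using (e.g., boosting-learned) weights.

    answers: list of answer labels, one per LLM.
    weights: list of floats, one per LLM.
    Returns the answer with the highest total weight.
    """
    totals = Counter()
    for ans, w in zip(answers, weights):
        totals[ans] += w
    return totals.most_common(1)[0][0]

def dynamic_model_selection(question_vec, centroids, best_model_per_cluster,
                            model_answers):
    """Pick the answer from the model that performs best on the
    question cluster nearest to this question's embedding.

    question_vec: embedding of the incoming question.
    centroids: list of cluster-centroid vectors (from prior clustering).
    best_model_per_cluster: index of the best LLM for each cluster.
    model_answers: this question's answers, one per LLM.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = min(range(len(centroids)),
                  key=lambda i: sq_dist(question_vec, centroids[i]))
    return model_answers[best_model_per_cluster[nearest]]

# Example: three LLMs answer one multiple-choice question.
print(weighted_majority_vote(["B", "C", "B"], [0.5, 0.9, 0.7]))  # "B"
```

In the weighted vote, a single strong model (weight 0.9) can be outvoted by two weaker models that agree; in dynamic selection, only the model expected to be best for this type of question contributes its answer.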
Results:
Both the Weighted Majority Vote and Dynamic Model Selection methods outperform individual LLMs across the three medical QA datasets. Specifically, the Weighted Majority Vote achieves accuracies of 35.84%, 96.21%, and 37.26% on MedMCQA, PubMedQA, and MedQA-USMLE, respectively, while Dynamic Model Selection yields slightly higher accuracies of 38.01% on MedMCQA, 96.36% on PubMedQA, and 38.13% on MedQA-USMLE.
Conclusions:
The LLM-Synergy framework, with its two ensemble methods, represents a significant advancement in leveraging LLMs for medical QA tasks. The framework provides an innovative and efficient approach to utilizing LLM technologies, enabling customization for both current and future challenges in biomedical and health informatics research.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.