JMIR Preprints #69286: A Weighted Voting Approach for Traditional Chinese Medicine Formula Classification Using Large Language Models: Development of a Prediction Method


A Weighted Voting Approach for Traditional Chinese Medicine Formula Classification Using Large Language Models: Development of a Prediction Method

Zhe Wang;
Keqian Li;
Suyuan Peng;
Lihong Liu;
Xiaolin Yang;
Keyu Yao;
Heinrich Herre;
Yan Zhu

ABSTRACT

Background:

Clinical cases and experiments have demonstrated the effectiveness of traditional Chinese medicine (TCM) formulas in treating and preventing diseases. These formulas encapsulate critical information regarding their ingredients, efficacy, and indications. Classifying TCM formulas based on this information can standardize their management, support clinical and research applications, and promote the modernization and scientific use of TCM. TCM formulas can be classified using various approaches, including manual classification, machine learning, and deep learning. Additionally, large language models (LLMs) are gaining prominence in the biomedical field. Integrating LLMs into TCM research could enhance and accelerate the discovery of TCM knowledge by leveraging their advanced linguistic comprehension and contextual awareness.

Objective:

The objective of this study is to assess the accuracy of different LLMs in the TCM formula classification task. Additionally, our study aims to improve accuracy on this task by combining multiple fine-tuned LLMs through ensemble learning.

Methods:

The TCM formula data were manually refined and cleaned. We selected ten LLMs that support Chinese for fine-tuning. We then applied an ensemble learning approach, combining the model outputs through both hard voting and weighted voting, with each model's weight determined by its average accuracy. Finally, we applied weighted voting to two subsets of models: the top five most effective models from each series of LLMs (Top 5), and the three most accurate of the ten models (Top 3).
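The difference between hard voting, accuracy-weighted voting, and voting over a top-k subset can be illustrated with a minimal Python sketch. This is not the authors' implementation: the model names, category labels, and accuracy values below are hypothetical placeholders, and the tie-breaking rule (first label encountered) is an assumption, since the abstract does not specify one.

```python
from collections import Counter, defaultdict


def hard_vote(predictions):
    """Return the label predicted by the most models (ties: first seen)."""
    return Counter(predictions.values()).most_common(1)[0][0]


def weighted_vote(predictions, weights):
    """Sum each model's weight onto its predicted label; return the top label."""
    scores = defaultdict(float)
    for model, label in predictions.items():
        scores[label] += weights[model]
    return max(scores, key=scores.get)


def top_k(weights, k):
    """Keep only the k models with the highest weight (average accuracy)."""
    ranked = sorted(weights, key=weights.get, reverse=True)[:k]
    return {model: weights[model] for model in ranked}


# Hypothetical average accuracies, used as voting weights.
weights = {
    "model_a": 0.50,
    "model_b": 0.75,
    "model_c": 0.74,
    "model_d": 0.45,
    "model_e": 0.40,
}

# Hypothetical predictions of each fine-tuned model for one formula.
predictions = {
    "model_a": "tonifying",
    "model_b": "heat-clearing",
    "model_c": "heat-clearing",
    "model_d": "tonifying",
    "model_e": "tonifying",
}

hard_result = hard_vote(predictions)
weighted_result = weighted_vote(predictions, weights)

# Weighted voting restricted to the three most accurate models.
top3_weights = top_k(weights, 3)
top3_predictions = {m: predictions[m] for m in top3_weights}
top3_result = weighted_vote(top3_predictions, top3_weights)

print(hard_result, weighted_result, top3_result)
```

In this toy case three weaker models outvote two stronger ones under hard voting, while weighted voting lets the two more accurate models prevail. Restricting the vote to the most accurate models removes the weaker voters altogether.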

Results:

A total of 2,441 formulas were curated manually from various sources, including the Coding Rules for Chinese Medicinal Formulas and Their Codes, the Chinese National Medical Insurance Catalog for proprietary Chinese medicines, textbooks of formulas of Chinese medicine, and TCM literature. The training and testing sets consisted of 1,999 and 442 TCM formulas, respectively. The testing results showed that Qwen-14B achieved the highest accuracy of 75.32% among the single models. The accuracy rates for hard voting, weighted voting, weighted voting (Top 5), and weighted voting (Top 3) were 75.79%, 76.71%, 75.57%, and 77.15%, respectively.

Conclusions:

The primary objective of this study was to explore the effectiveness of LLMs in the TCM formula classification task. To this end, we propose an ensemble learning method that integrates multiple fine-tuned LLMs through a voting mechanism. This approach not only improves accuracy but also supports refinement of the existing classification system of TCM formula efficacy.

Citation

Please cite as:

Wang Z, Li K, Peng S, Liu L, Yang X, Yao K, Herre H, Zhu Y

A Weighted Voting Approach for Traditional Chinese Medicine Formula Classification Using Large Language Models: Algorithm Development and Validation Study

JMIR Med Inform 2025;13:e69286

DOI: 10.2196/69286

PMID: 40705933

PMCID: 12292024


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Nov 26, 2024

Date Accepted: May 23, 2025
