Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Mar 28, 2025
Date Accepted: May 24, 2025
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Evaluating and Improving Syndrome Differentiation Thinking Ability in Large Language Models: Methods Study
ABSTRACT
Background:
Large language models (LLMs) provide new opportunities to advance the intelligent development of Traditional Chinese medicine (TCM). Syndrome differentiation thinking is an essential part of TCM, and equipping LLMs with this capability is a crucial step toward more effective clinical applications of TCM. However, given the complexity of TCM syndrome differentiation thinking, acquiring this ability is a considerable challenge for these models.
Objective:
This study aims to evaluate the syndrome differentiation thinking ability of LLMs and to design a method that effectively enhances their performance in this area.
Methods:
We decomposed the process of TCM syndrome differentiation thinking into three core tasks: pathogenesis inference, syndrome inference, and diagnostic suggestion. To evaluate the performance of LLMs on these tasks, we constructed a high-quality evaluation dataset, providing a reliable foundation for the quantitative assessment of their capabilities. Furthermore, we developed a methodology for generating instruction data based on the idea of an "open-book exam": we customized three data templates, dynamically retrieved task-relevant professional knowledge, and inserted it into predefined positions within the templates. This approach effectively generates high-quality instruction data that aligns with the unique characteristics of TCM syndrome differentiation thinking. Leveraging this instruction data, we fine-tuned the base model, enhancing the syndrome differentiation thinking ability of the LLMs.
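The "open-book exam" instruction-generation step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the template texts, the keyword-overlap retriever, and the function names (`retrieve_knowledge`, `build_instruction`) are all hypothetical stand-ins for the per-task templates and the dynamic knowledge retrieval the Methods section describes.

```python
# Hypothetical templates for the three core tasks; the {knowledge} slot marks
# the predefined position where retrieved TCM reference text is inserted.
TEMPLATES = {
    "pathogenesis_inference": (
        "Reference knowledge:\n{knowledge}\n\n"
        "Case record:\n{case}\n\n"
        "Infer the pathogenesis underlying this case."
    ),
    "syndrome_inference": (
        "Reference knowledge:\n{knowledge}\n\n"
        "Case record:\n{case}\n\n"
        "Identify the TCM syndrome(s) presented."
    ),
    "diagnostic_suggestion": (
        "Reference knowledge:\n{knowledge}\n\n"
        "Case record:\n{case}\n\n"
        "Provide a diagnostic suggestion consistent with the syndrome."
    ),
}

def retrieve_knowledge(case: str, corpus: dict[str, str], top_k: int = 2) -> str:
    """Toy retriever: rank knowledge-base entries by keyword overlap with the case.
    A real system would use a proper retriever over a professional TCM corpus."""
    case_terms = set(case.split())
    scored = sorted(
        corpus.items(),
        key=lambda kv: len(case_terms & set(kv[1].split())),
        reverse=True,
    )
    return "\n".join(text for _, text in scored[:top_k])

def build_instruction(task: str, case: str, corpus: dict[str, str]) -> str:
    """Fill the task-specific template with the case and retrieved knowledge,
    yielding one instruction-tuning example."""
    knowledge = retrieve_knowledge(case, corpus)
    return TEMPLATES[task].format(knowledge=knowledge, case=case)
```

The resulting instruction strings would then serve as fine-tuning data, pairing each templated prompt with an expert-derived target answer.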
Results:
We collected 200 medical cases for the evaluation dataset and standardized them into three types of task questions. We tested general-purpose and TCM-specific LLMs, comparing their performance with our proposed solution. The results demonstrate that our method significantly enhances LLMs' syndrome differentiation thinking ability. Our model achieved 85.7% and 81.2% accuracy on Tasks 1 and 2, respectively, surpassing the best-performing TCM and general LLMs by 26.3% and 15.8%. On Task 3, our model scored 84.3, indicating that its recommendations closely align with those given by experts.
Conclusions:
Existing general LLMs and TCM LLMs still have significant limitations in the core tasks of syndrome differentiation thinking. Our research shows that fine-tuning LLMs with professionally designed instruction templates and high-quality generated instruction data can significantly improve their performance on these tasks. The optimized LLMs produce reasoning results highly similar to the opinions of domain experts, indicating that they can simulate syndrome differentiation thinking to a certain extent. This has important theoretical and practical significance for interpreting the complexity of the TCM clinical diagnosis and treatment process.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.