Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Sep 28, 2025
Date Accepted: Apr 20, 2026
An LLM-Powered Multi-Agent Framework Emulating Standardized Patients in Clinical Communication Skills Training: Development and Evaluation
ABSTRACT
Background:
Effective clinical communication is essential for medical practice. Standardized patients (SPs) are a reliable and standard training method, although their use is limited by resource constraints. While large language models (LLMs) show strong role-playing abilities, current virtual patients (VPs) based on single LLMs face fidelity and scalability challenges. Recent advances in multi-agent frameworks, which have demonstrated considerable potential in handling complex tasks, offer a new perspective for creating VPs in medical education.
Objective:
The aim of the study is to develop and evaluate a novel multi-agent VP framework that simulates SPs through collaborative agent design, thereby enhancing scalability, instructional utility, and human-like fidelity in clinical communication training.
Methods:
Our multi-agent framework comprises five specialized sub-agents, modeled on the functional partitioning of brain regions, that collaboratively simulate the entire process from case reception to interactive consultation with medical students. To enhance the medical accuracy and scalability of patient responses, we incorporate retrieval augmentation, while deep character reasoning is employed to improve response richness and realism. We evaluated the framework in a two-phase experiment in which the same metrics (response quality, role-playing performance, and instructional utility) were applied in both phases: first to compare different base models, and then to benchmark the complete framework against single-LLM approaches.
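The abstract does not specify how retrieval is implemented, so the following is only a toy sketch of the general idea: before a patient response is generated, the case facts most relevant to the student's question are selected and placed in the prompt. Simple word overlap stands in for whatever retriever the authors actually use, and the function names, the case record, and the prompt wording are all hypothetical.

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, case_facts, k=2):
    """Return up to k case facts ranked by word overlap with the question."""
    q = tokenize(question)
    scored = [(len(q & tokenize(fact)), fact) for fact in case_facts]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Keep only facts that share at least one word with the question.
    return [fact for score, fact in ranked if score > 0][:k]

# Hypothetical case record, invented for illustration only (not from the study).
case_facts = [
    "Chest pain started two days ago after climbing stairs.",
    "No known drug allergies.",
    "Father had a heart attack at age 55.",
]
context = retrieve("When did the chest pain start?", case_facts)

# Hypothetical prompt wording; the retrieved facts constrain the model's answer.
prompt = "Answer as the patient, using only these facts:\n" + "\n".join(context)
```

Restricting the prompt to retrieved facts is one common way to keep a simulated patient from inventing history that the case does not contain.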
Results:
The Qwen3-32B-based multi-agent framework achieved the best performance, attaining the highest factual consistency (mean=0.769), perfect instructional utility (100%), and superior role-playing ability (39.67/40). These scores were significantly higher than those of both GPT-4o and single-LLM approaches (p<0.05). The framework effectively minimizes hallucinations (<5% misleading rate) and maintains strong scalability (CV=4.7%) across different clinical departments, confirming its robustness in diverse case scenarios.
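Assuming CV here denotes the coefficient of variation (standard deviation divided by the mean, expressed as a percentage), a low value indicates that performance varies little from one department to another. The sketch below shows the calculation on made-up scores. The department names and values are hypothetical, and the abstract does not say whether the population or sample standard deviation was used; the population form is assumed here.

```python
from statistics import mean, pstdev

def coefficient_of_variation(scores):
    """Return the CV (population standard deviation / mean) as a percentage."""
    m = mean(scores)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(scores) / m * 100

# Hypothetical per-department scores, for illustration only (not from the study).
department_scores = {"cardiology": 0.80, "neurology": 0.75, "pediatrics": 0.70}

cv = coefficient_of_variation(list(department_scores.values()))
print(f"CV = {cv:.1f}%")
```

Because the CV is normalized by the mean, it allows variability to be compared across metrics that are measured on different scales.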
Conclusions:
The multi-agent framework offers a viable simulation of SPs through the coordinated interaction of multiple LLM-based agents. This approach enhances the performance and scalability of VP simulation, providing a customizable and scalable solution for medical communication training, without compromising patient confidentiality. The framework holds substantial potential for advancing medical education approaches.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.