Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Sep 28, 2025
Date Accepted: Apr 20, 2026

The final, peer-reviewed published version of this preprint can be found here:

A Large Language Model–Powered Multiagent Framework Emulating Standardized Patients in Clinical Communication Skills Training: Development and Evaluation Study

Qu Y, Xu X, Long Y, Wang Y, Li J, Lv X

J Med Internet Res 2026;28:e84747

DOI: 10.2196/84747

PMID: 42241338

An LLM-Powered Multi-Agent Framework Emulating Standardized Patients in Clinical Communication Skills Training: Development and Evaluation

  • Yufei Qu
  • Xiaowei Xu
  • Yunzi Long
  • Yijie Wang
  • Jiao Li
  • Xudong Lv

ABSTRACT

Background:

Effective clinical communication is essential to medical practice, and training with standardized patients (SPs) is a reliable, established method, although it is limited by resource constraints. While large language models (LLMs) show strong role-playing abilities, current virtual patients (VPs) based on a single LLM face fidelity and scalability challenges. Multi-agent frameworks, which have shown considerable potential for handling complex tasks, offer a new approach to creating VPs for medical education.

Objective:

This study aimed to develop and evaluate a novel multi-agent VP framework that simulates SPs through collaborative agent design, thereby enhancing scalability, instructional utility, and human-like fidelity in clinical communication training.

Methods:

Our multi-agent framework comprises five specialized sub-agents, designed by analogy with the functional partitioning of brain regions, that together simulate the entire process from case reception to interactive consultation with medical students. To improve the medical accuracy and scalability of patient responses, we incorporated retrieval-augmented techniques; deep character reasoning was used to make responses richer and more realistic. We evaluated the framework in a two-phase experiment that applied the same metrics (response quality, role-playing performance, and instructional utility) in both phases: first to compare different base models, and then to benchmark the complete framework against single-LLM approaches.
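As a rough illustration of the architecture described above, the following Python sketch chains a retrieval step with two further stages that share one turn state. It is not the authors' implementation: the abstract does not name the five sub-agents, so the three stage functions, the `TurnState` fields, and the word-overlap retriever are all hypothetical placeholders, and the final stage returns a template string where a real system would call an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TurnState:
    """Shared state passed from one agent stage to the next (hypothetical)."""
    question: str
    retrieved: List[str] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)
    reply: str = ""


def retrieve(case_facts: List[str], question: str, k: int = 2) -> List[str]:
    """Rank case facts by word overlap with the question.

    A stand-in for a real retriever (for example, embedding search).
    """
    q_words = set(question.lower().split())
    ranked = sorted(
        case_facts,
        key=lambda fact: len(q_words & set(fact.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def make_retrieval_agent(case_facts: List[str]) -> Callable[[TurnState], TurnState]:
    """Build a stage that grounds the turn in facts from the case record."""
    def agent(state: TurnState) -> TurnState:
        state.retrieved = retrieve(case_facts, state.question)
        return state
    return agent


def persona_agent(state: TurnState) -> TurnState:
    """Placeholder for character reasoning: records how the patient should sound."""
    state.notes.append("answer in lay language, first person")
    return state


def response_agent(state: TurnState) -> TurnState:
    """Placeholder for the LLM call: builds the reply from retrieved facts only."""
    state.reply = "As the patient: " + "; ".join(state.retrieved)
    return state


def run_turn(case_facts: List[str], question: str) -> TurnState:
    """Run one student question through the stages in order."""
    state = TurnState(question=question)
    for stage in (make_retrieval_agent(case_facts), persona_agent, response_agent):
        state = stage(state)
    return state


# Hypothetical case record, not taken from the study.
facts = [
    "chest pain started two days ago",
    "no known drug allergies",
    "father had a heart attack at 60",
]
state = run_turn(facts, "when did the chest pain start")
print(state.reply)
```

Restricting the reply stage to retrieved facts is one simple way to keep answers tied to the case record; whether the framework enforces grounding this way is not stated in the abstract.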

Results:

The Qwen3-32B-based multi-agent framework performed best, achieving the highest factual consistency (mean=0.769), an instructional utility of 100%, and the highest role-playing score (39.67/40). These results were significantly better than those of both GPT-4o and single-LLM approaches (p<0.05). The framework kept the misleading-response rate below 5%, limiting hallucinations, and performed consistently across clinical departments (coefficient of variation [CV]=4.7%), supporting its robustness across diverse case scenarios.
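The CV reported above is the coefficient of variation, that is, the standard deviation divided by the mean, expressed as a percentage; a lower value means scores vary less from one department to another. The short Python sketch below shows the calculation. The department names and scores are hypothetical and are not data from the study, and the abstract does not say whether a population or sample standard deviation was used, so the population form here is an assumption.

```python
from statistics import mean, pstdev
from typing import Sequence


def coefficient_of_variation(scores: Sequence[float]) -> float:
    """Return the CV (population standard deviation / mean) as a percentage."""
    m = mean(scores)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * pstdev(scores) / m


# Hypothetical per-department scores, for illustration only.
department_scores = {"cardiology": 0.80, "neurology": 0.75, "pediatrics": 0.70}
cv = coefficient_of_variation(list(department_scores.values()))
print(f"CV = {cv:.1f}%")
```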

Conclusions:

The multi-agent framework offers a viable simulation of SPs through the coordinated interaction of multiple LLM-based agents. This approach improves the performance and scalability of VP simulation, providing a customizable solution for medical communication training without compromising patient confidentiality. The framework holds substantial potential for advancing medical education.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.