Currently submitted to: JMIR Formative Research

Date Submitted: Jul 2, 2026
Open Peer Review Period: Jul 3, 2026 - Aug 28, 2026
(currently open for review)

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

A Multi-Agent Orchestration Framework for an LLM-Based Virtual Patient in Intestinal Obstruction: Process Analytics and Safety Benchmarking

  • Weng Xiaoyuan; 
  • Yanqi Tang; 
  • Yixin Lin; 
  • Zhaofeng Huang; 
  • Xiyin Huang; 
  • Jianya Cai

ABSTRACT

Background:

Multi-agent large language model (LLM) systems can separate virtual-patient functions into coordinated agents, but educational deployment requires evidence of how agents interact, how safety rules behave under stress, and whether process logs provide meaningful visibility into learner-system interaction.

Objective:

This study developed and evaluated an OpenMAIC-DeepSeek multi-agent orchestration framework for an intestinal obstruction virtual patient. The aims were to describe the execution architecture, analyze process-level interaction data, and benchmark output safety against a DeepSeek-V3-only condition.

Methods:

We conducted a single-center informatics framework evaluation beginning on April 5, 2026. The system coordinated five agents (Virtual Patient, Examiner, Tutor, Knowledge Graph, and Safety and Rule) using a locally configured instance of OpenMAIC v0.1.0 (released March 26, 2026) and a locally deployed DeepSeek-V3 model. The evaluation focused on event-level process logs, agent handoffs, rule-interception analysis, a 150-prompt safety benchmark with prespecified risk strata, and agreement between automated and human formative scoring. Student-reported usability and immediate test scores were collected only as non-evidentiary implementation feedback.
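
The abstract does not describe the orchestration code itself. The following is a minimal Python sketch, under stated assumptions, of how a rule-interception step placed between two agents could produce an event-level log of outputs, interceptions, teacher-review escalations, and handoffs. The event schema (`Event`, `EventLog`), the gate function `safety_gate`, and both rules are hypothetical illustrations; they are not OpenMAIC or DeepSeek APIs, and the study's real rules and log format may differ.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


# Hypothetical event schema; the study's actual log format is not described.
@dataclass
class Event:
    session_id: str
    seq: int
    agent: str
    kind: str  # "output", "intercept", "teacher_review", or "handoff"
    detail: str = ""


@dataclass
class EventLog:
    events: List[Event] = field(default_factory=list)

    def record(self, session_id: str, agent: str, kind: str, detail: str = "") -> None:
        self.events.append(Event(session_id, len(self.events), agent, kind, detail))

    def count(self, kind: Optional[str] = None, agent: Optional[str] = None) -> int:
        return sum(
            1
            for e in self.events
            if (kind is None or e.kind == kind) and (agent is None or e.agent == agent)
        )


# A rule returns (triggered, text); text is None when the draft cannot be auto-corrected.
Rule = Callable[[str], Tuple[bool, Optional[str]]]


def soften_definitive_diagnosis(text: str) -> Tuple[bool, Optional[str]]:
    """Hypothetical rule: replace a stated diagnosis with a reasoning prompt."""
    if "you have" in text.lower():
        return True, "What findings would support your working diagnosis?"
    return False, text


def block_explicit_dosing(text: str) -> Tuple[bool, Optional[str]]:
    """Hypothetical rule: any explicit milligram dose is escalated to a teacher."""
    if re.search(r"\d+\s*mg\b", text, flags=re.IGNORECASE):
        return True, None
    return False, text


def safety_gate(log: EventLog, session_id: str, source: str, target: str,
                draft: str, rules: List[Rule]) -> Optional[str]:
    """Check a draft from `source` against each rule, then hand it off to `target`."""
    log.record(session_id, source, "output", draft)
    text = draft
    for rule in rules:
        triggered, corrected = rule(text)
        if not triggered:
            continue
        log.record(session_id, "SafetyRule", "intercept", rule.__name__)
        if corrected is None:
            # Nothing reaches the learner; the original draft is queued for a teacher.
            log.record(session_id, "SafetyRule", "teacher_review", draft)
            return None
        text = corrected
    log.record(session_id, source, "handoff", f"{source}->{target}")
    return text


if __name__ == "__main__":
    log = EventLog()
    rules: List[Rule] = [soften_definitive_diagnosis, block_explicit_dosing]
    safety_gate(log, "s01", "VirtualPatient", "Tutor", "My belly hurts and I keep vomiting.", rules)
    safety_gate(log, "s01", "Tutor", "Examiner", "You have a small bowel obstruction.", rules)
    safety_gate(log, "s01", "Tutor", "Examiner", "Give 4 mg of an antiemetic IV now.", rules)
    print(log.count(kind="handoff"), log.count(kind="intercept"), log.count(kind="teacher_review"))
```

Run as a script, this prints `2 2 1`: the first draft passes unchanged, the second is intercepted and corrected before handoff, and the third is intercepted and withheld for teacher review. Returning `None` instead of raising an exception keeps every escalation in the same append-only log, which is what makes per-session counts of events, handoffs, and interceptions straightforward to compute.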

Results:

Across 60 sessions, the system generated 1186 logged events and 353 successful agent handoffs. The Virtual Patient Agent (449 events) and the Tutor Agent (248 events) were the most active. The Safety and Rule Agent intercepted 55 outputs; 50/55 (90.9%) were successfully corrected, and 7/55 (12.7%) required teacher review. In the safety benchmark, multi-agent outputs had lower hallucination-associated rates (2.00% vs 7.33%) and unsafe-output rates (1.78% vs 3.56%) than DeepSeek-V3-only outputs. Risk-stratified analysis revealed a residual unsafe-output rate of 10.0% (3/30 expert ratings) for critical prompts, including one prompt classified as unsafe by majority rating. Automated formative scoring correlated moderately with human scores (r=0.66; 95% CI 0.49-0.78), with Bland-Altman 95% limits of agreement from -9.6 to +10.4 points.
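
The percentages and interval estimates reported above can be checked with a few lines of arithmetic. The Python sketch below assumes that the correlation CI was obtained with the Fisher z-transformation and that the Bland-Altman limits are the mean difference plus or minus 1.96 SD of the paired differences (automated minus human). The abstract states neither method nor the number of score pairs, so the value n = 60 (one pair per session) is purely an illustrative assumption, and the function names are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple

Z95 = 1.96  # two-sided 95% normal critical value


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)


def pearson_ci(r: float, n: int, z_crit: float = Z95) -> Tuple[float, float]:
    """Approximate CI for Pearson's r via the Fisher z-transformation."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def limits_of_agreement(auto: Sequence[float], human: Sequence[float],
                        z_crit: float = Z95) -> Tuple[float, float]:
    """Bland-Altman limits: mean difference +/- z_crit * SD of differences."""
    diffs = [a - h for a, h in zip(auto, human)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    return bias - z_crit * sd, bias + z_crit * sd


def bias_and_sd_from_limits(lower: float, upper: float,
                            z_crit: float = Z95) -> Tuple[float, float]:
    """Invert reported limits: bias is the midpoint, SD is half-width / z_crit."""
    return (lower + upper) / 2.0, (upper - lower) / (2.0 * z_crit)


if __name__ == "__main__":
    # Counts from the abstract. If both counts refer to the same 55 interceptions,
    # 50 + 7 > 55 means at least 2 outputs were both corrected and teacher-reviewed.
    print(pct(50, 55), pct(7, 55), pct(3, 30))
    lo, hi = pearson_ci(0.66, n=60)  # n = 60 is an illustrative assumption
    print(round(lo, 2), round(hi, 2))
    bias, sd = bias_and_sd_from_limits(-9.6, 10.4)
    print(round(bias, 1), round(sd, 2))
```

Run as a script, this prints `90.9 12.7 10.0`, then `0.49 0.78`, then `0.4 5.1`. The last line indicates that, under the stated formula, the reported limits correspond to a mean bias of about +0.4 points and an SD of differences of about 5.1 points. That n = 60 reproduces the reported CI is consistent with one pair per session but does not confirm it.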

Conclusions:

The framework made learner-system interactions auditable through structured event logs and demonstrated that rule interception can reduce, but not eliminate, unsafe educational outputs. Residual unsafe outputs in critical prompts and teacher-review events preclude claims of autonomous readiness, clinical safety, or educational effectiveness. The system should be viewed as a teacher-supervised process-analytics prototype requiring stronger safety gating and independent evaluation before broader deployment.

Clinical Trial:

Not applicable. This was a single-center informatics framework evaluation and safety benchmarking study, not a randomized controlled trial or clinical effectiveness trial.


Citation

Please cite as:

Xiaoyuan W, Tang Y, Lin Y, Huang Z, Huang X, Cai J

A Multi-Agent Orchestration Framework for an LLM-Based Virtual Patient in Intestinal Obstruction: Process Analytics and Safety Benchmarking

JMIR Preprints. 02/07/2026:106057

DOI: 10.2196/preprints.106057

URL: https://preprints.jmir.org/preprint/106057

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.