JMIR Preprints #94741: A Multi-Agent LLM Framework for Rating the Quality of Surgical Feedback

A Multi-Agent LLM Framework for Rating the Quality of Surgical Feedback

Rafał Kocielnik;
J. Everett Knudsen;
Steven Y. Cen;
Jasmine Lin;
Cherine H. Yang;
Atharva Deo;
Ujjwal Pasupulety;
Peter Wager;
Anima Anandkumar;
Andrew J. Hung

ABSTRACT

Verbal feedback delivered by attending surgeons in the operating room plays a critical formative role in resident skill acquisition. Yet assessing the quality of trainer feedback, and its effectiveness in influencing trainee behavior during live surgery, remains a challenge. Prior studies assessed feedback content through extensive manual annotation by expert human raters and focused on developing broad taxonomies that overlook qualitative aspects of feedback delivery, such as clarity or urgency. The few existing automated methods, including keyword analysis and topic modeling, also fail to capture these nuanced aspects. We introduce a two-stage LLM-based framework that discovers interpretable feedback quality criteria grounded in the context of surgical training. In the first stage, our method uses multi-agent prompting and surgical domain knowledge injection to discover a small set of human-interpretable scoring criteria (e.g., Encouraging, Urgent, Clear). In the second stage, these criteria are used to automatically score live surgical feedback via an LLM-as-a-judge approach. Evaluation on 4.2k trainer feedback instances demonstrates that our AI-discovered criteria outperform prior content-based frameworks in predicting feedback effectiveness, including observed trainee behavioral adjustments and trainer approval. This work advances scalable, human-aligned assessment of communication quality in the operating room and provides a foundation for improving surgical teaching practices.
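The scoring stage described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the three criterion names are taken from the abstract's examples, while the 1 to 5 scale, the prompt wording, the JSON reply format, and the names `build_judge_prompt`, `score_feedback`, and `stub_judge` are all assumptions made for illustration. The judge is passed in as a plain callable, and a fixed-output stub stands in for a real LLM call.

```python
import json
from typing import Callable, Dict, List

# Criterion names come from the abstract's examples; the 1-5 scale is an assumption.
CRITERIA: List[str] = ["Encouraging", "Urgent", "Clear"]


def build_judge_prompt(feedback: str, criteria: List[str]) -> str:
    """Compose a rubric prompt asking a judge model for one score per criterion."""
    rubric = "\n".join(
        f"- {name}: integer from 1 (low) to 5 (high)" for name in criteria
    )
    return (
        "Rate the following surgical trainer feedback on each criterion.\n"
        f"{rubric}\n"
        "Reply with a JSON object mapping each criterion to its score.\n"
        f"Feedback: {feedback!r}"
    )


def score_feedback(
    feedback: str,
    judge: Callable[[str], str],
    criteria: List[str] = CRITERIA,
) -> Dict[str, int]:
    """Send the rubric prompt to `judge` and validate the returned scores."""
    raw = judge(build_judge_prompt(feedback, criteria))
    parsed = json.loads(raw)
    result: Dict[str, int] = {}
    for name in criteria:
        value = int(parsed[name])
        if not 1 <= value <= 5:
            raise ValueError(f"score for {name} out of range: {value}")
        result[name] = value
    return result


def stub_judge(prompt: str) -> str:
    """Hypothetical placeholder for a real LLM call; always returns fixed scores."""
    return json.dumps({"Encouraging": 2, "Urgent": 4, "Clear": 5})


scores = score_feedback("Stop, pull back the needle driver now.", stub_judge)
print(scores)
```

Keeping the judge behind a callable means any model client can be substituted without changing the validation logic, and the per-criterion scores remain a small, inspectable dictionary.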

Citation

Please cite as:

Kocielnik R, Knudsen JE, Cen SY, Lin J, Yang CH, Deo A, Pasupulety U, Wager P, Anandkumar A, Hung AJ

A Multi-Agent LLM Framework for Rating the Quality of Surgical Feedback

JMIR Preprints. 05/03/2026:94741

DOI: 10.2196/preprints.94741

URL: https://preprints.jmir.org/preprint/94741

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Currently submitted to: JMIR Medical Education

Date Submitted: Mar 5, 2026

Open Peer Review Period: Mar 5, 2026 - Apr 30, 2026

(currently open for review)
