Currently submitted to: JMIR Medical Informatics
Date Submitted: Mar 15, 2026
Open Peer Review Period: Mar 26, 2026 - May 21, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Beyond the Keyboard: The Imperative for Multimodal Ambient AI and Computer Vision as the Anesthesiologist’s "Visual Scribe"
ABSTRACT
The administrative burden in anesthesiology has reached a critical tipping point, as the digitization of healthcare via Electronic Health Records (EHRs) often forces clinicians to spend more time interacting with screens than with patients. In the high-stakes, high-velocity environment of the Operating Room (OR), this "documentation tax" competes directly with cognitive vigilance. While "ambient AI scribes" that listen to and transcribe patient encounters are revolutionizing outpatient care, they remain largely ineffective in the perioperative setting, where care is a complex choreography of physical actions, physiological monitoring, and silent vigilance rather than mere conversation. This Viewpoint argues that the next generation of AI documentation in anesthesiology must evolve from unimodal "listening" to multimodal "sensing." We propose the concept of the "Visual Scribe," an ambient intelligence system integrating Computer Vision (CV) with audio and telemetry data to automatically document the physical reality of surgical care. Synthesizing current research on AI-enabled perioperative workflow analysis, we explore how CV algorithms—such as temporal action localization and pose estimation—can segment surgical cases into granular phases with a temporal precision exceeding that of manual charting. Contrasting the retrospective imprecision of manual documentation with the real-time capabilities of multimodal AI, we highlight how emerging architectures can accurately detect and timestamp critical "silent" events such as patient transport, intubation, and incision. Automating the capture of these data points can substantially reduce clinician burnout, reveal hidden provider-level workflow variability, and enhance patient safety through real-time "sterile cockpit" monitoring. To address the ethical, medicolegal, and ergonomic implications of deploying "always-on" visual sensors, we emphasize the need for a paradigm shift in privacy engineering, utilizing edge-based skeletonization to mitigate surveillance concerns.
Ultimately, by equipping the EHR with "eyes" as well as "ears," we can create a self-documenting operating room, transforming the EHR from a distractor into a silent, autonomous partner that restores the anesthesiologist’s unwavering focus to the patient.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.