Currently submitted to: JMIR Medical Informatics

Date Submitted: Mar 15, 2026
Open Peer Review Period: Mar 26, 2026 - May 21, 2026
(currently open for review)

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Beyond the Keyboard: The Imperative for Multimodal Ambient AI and Computer Vision as the Anesthesiologist’s "Visual Scribe"

  • Michele Russo
  • Elena Giovanna Bignami

ABSTRACT

The administrative burden in anesthesiology has reached a critical tipping point, as the digitization of healthcare via Electronic Health Records (EHRs) often forces clinicians to spend more time interacting with screens than with patients. In the high-stakes, high-velocity environment of the Operating Room (OR), this "documentation tax" competes directly with cognitive vigilance. While "ambient AI scribes" that listen to and transcribe patient encounters are revolutionizing outpatient care, they remain largely ineffective in the perioperative setting, where care is a complex choreography of physical actions, physiological monitoring, and silent vigilance rather than mere conversation. This Viewpoint argues that the next generation of AI documentation in anesthesiology must evolve from unimodal "listening" to multimodal "sensing." We propose the concept of the "Visual Scribe," an ambient intelligence system integrating Computer Vision (CV) with audio and telemetry data to automatically document the physical reality of surgical care. Synthesizing current research on AI-enabled perioperative workflow analysis, we explore how CV algorithms—such as temporal action localization and pose estimation—can segment surgical cases into granular phases with superhuman precision. Contrasting the retrospective imprecision of manual documentation with the real-time capabilities of multimodal AI, we highlight how emerging architectures can accurately detect and timestamp critical "silent" events like patient transport, intubation, and incision. Automating these data points can drastically reduce clinician burnout, reveal hidden provider-level workflow variability, and enhance patient safety through real-time "sterile cockpit" monitoring. To address the ethical, medicolegal, and ergonomic implications of deploying "always-on" visual sensors, we emphasize the need for a paradigm shift in privacy engineering, utilizing edge-based skeletonization to mitigate surveillance concerns. 
Ultimately, by equipping the EHR with "eyes" as well as "ears," we can create a self-documenting operating room, transforming the EHR from a distractor into a silent, autonomous partner that restores the anesthesiologist’s unwavering focus to the patient.


Citation

Please cite as:

Russo M, Bignami EG

Beyond the Keyboard: The Imperative for Multimodal Ambient AI and Computer Vision as the Anesthesiologist’s "Visual Scribe"

JMIR Preprints. 15/03/2026:95366

DOI: 10.2196/preprints.95366

URL: https://preprints.jmir.org/preprint/95366


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.