Currently submitted to: JMIR Medical Informatics
Date Submitted: Mar 15, 2026
Open Peer Review Period: Mar 26, 2026 - May 21, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Beyond the Keyboard: The Imperative for Multimodal Ambient AI and Computer Vision as the Anesthesiologist’s "Visual Scribe"
ABSTRACT
The administrative burden in anesthesiology has reached a critical tipping point, as the digitization of healthcare via Electronic Health Records (EHRs) often forces clinicians to spend more time interacting with screens than with patients. In the high-stakes, high-velocity environment of the Operating Room (OR), this "documentation tax" competes directly with cognitive vigilance. While "ambient AI scribes" that listen to and transcribe patient encounters are revolutionizing outpatient care, they remain largely ineffective in the perioperative setting, where care is a complex choreography of physical actions, physiological monitoring, and silent vigilance rather than mere conversation. This Viewpoint argues that the next generation of AI documentation in anesthesiology must evolve from unimodal "listening" to multimodal "sensing." We propose the concept of the "Visual Scribe," an ambient intelligence system integrating Computer Vision (CV) with audio and telemetry data to automatically document the physical reality of surgical care. Synthesizing current research on AI-enabled perioperative workflow analysis, we explore how CV algorithms—such as temporal action localization and pose estimation—can segment surgical cases into granular phases with a temporal precision exceeding that of manual charting. Contrasting the retrospective imprecision of manual documentation with the real-time capabilities of multimodal AI, we highlight how emerging architectures can accurately detect and timestamp critical "silent" events such as patient transport, intubation, and incision. Automating the capture of these data points can substantially reduce clinician burnout, reveal hidden provider-level workflow variability, and enhance patient safety through real-time "sterile cockpit" monitoring. To address the ethical, medicolegal, and ergonomic implications of deploying "always-on" visual sensors, we emphasize the need for a paradigm shift in privacy engineering, utilizing edge-based skeletonization to mitigate surveillance concerns.
Ultimately, by equipping the EHR with "eyes" as well as "ears," we can create a self-documenting operating room, transforming the EHR from a distractor into a silent, autonomous partner that restores the anesthesiologist’s unwavering focus to the patient.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.