Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Oct 24, 2025
Date Accepted: Mar 6, 2026

The final, peer-reviewed published version of this preprint can be found here:

Quality of Clinical Notes Created by Ambient Listening Generative AI: Pragmatic Prospective Pilot Study

Taylor SL, Jost M, MacDonald S, Red Y, Davenport S, Aizenberg D, Hall B, Lyles CR, Adams JY

JMIR Med Inform 2026;14:e86474

DOI: 10.2196/86474

PMID: 41996389

PMCID: 13089619

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Quality of clinical notes created by ambient-listening generative AI: A pragmatic, prospective pilot assessment

  • Sandra L Taylor
  • Melissa Jost
  • Scott MacDonald
  • Yunyi Red
  • Sadie Davenport
  • Debra Aizenberg
  • Bruce Hall
  • Courtney R Lyles
  • Jason Y Adams

ABSTRACT

Background:

Physicians routinely document the specifics of patient encounters in clinic visit notes, a critical but potentially time-consuming task. Ambient-listening artificial intelligence (AI) technology is being integrated into clinical workflows to reduce this documentation burden by creating draft visit notes. Although the technology is promising, it is imperfect, and its potential for patient harm needs to be understood and mitigated.

Objective:

The objective of this study was to develop and pilot an efficient, standardized, and scalable approach to evaluating AI-generated notes for safety concerns in ambulatory care visits.

Methods:

During a two-month pilot (July–August 2024), 31 physicians across multiple specialties used an ambient-listening AI scribe to assist with clinic note creation. A novel survey instrument was developed to assess note quality, focusing on four error types: accidental inclusions, accidental omissions, hallucinations, and bias. Physicians evaluated 356 AI-generated notes. Where an error was present, physicians rated its severity on a 0–5 scale according to its potential to cause patient harm if left uncorrected. Additionally, a vendor-reported metric on the percentage of note content edited by physicians was analyzed.
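As an illustration only, the rating scheme described above could be tallied along the following lines. This is a minimal sketch, not the authors' analysis code: the record layout (one dictionary per note, mapping an error type to its 0–5 severity, with absent keys meaning no error of that type was reported), the names `error_rates` and `share_serious`, and the five example notes are all assumptions made for demonstration and are not study data.

```python
# Hypothetical tally of physician ratings of AI-generated notes.
# Assumed layout: one dict per note, {error_type: severity 0-5};
# an error type missing from the dict means it was not reported.

ERROR_TYPES = ("inclusion", "omission", "hallucination", "bias")

# Made-up example ratings (NOT data from the study).
notes = [
    {"omission": 2},
    {},
    {"hallucination": 4, "omission": 1},
    {"inclusion": 3},
    {},
]


def error_rates(notes):
    """Fraction of notes in which each error type was reported."""
    if not notes:
        raise ValueError("at least one rated note is required")
    n = len(notes)
    return {t: sum(1 for note in notes if t in note) / n for t in ERROR_TYPES}


def share_serious(notes, threshold=4):
    """Fraction of notes with any error rated at or above the threshold."""
    if not notes:
        raise ValueError("at least one rated note is required")
    serious = sum(
        1 for note in notes if any(sev >= threshold for sev in note.values())
    )
    return serious / len(notes)


rates = error_rates(notes)
serious = share_serious(notes)
```

Counting at the note level, rather than at the error level, matches how the abstract reports its figures (for example, the share of notes containing errors rated 4–5).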

Results:

Accidental omissions were the most frequent error (18%), followed by hallucinations (11.5%) and accidental inclusions (9.3%). Bias was rare (1%). Most errors were rated as mild to moderate (severity 1–3), with only 2.5% of notes containing errors rated as posing serious or imminent risk (severity 4–5). Editing metrics showed that a median of 9.0% of AI-generated words were changed, and 15% of notes were left entirely unedited. Physician editing practices varied widely.

Conclusions:

AI-generated clinical notes were generally of high quality, with over 80% free from significant errors. However, because a small number contained errors that carried risk of serious harm if not corrected, careful clinician review of notes remains imperative. Prior to deploying an AI scribe, organizations should pilot the technology and include an efficient review process to understand the nature and type of errors common at their organization. This pilot provides a scalable model for other health systems seeking to implement AI scribe technology responsibly.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.