Currently submitted to: JMIR Medical Informatics
Date Submitted: Mar 3, 2026
Open Peer Review Period: Mar 12, 2026 - May 7, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Evaluating Dutch-Language Ambient Listening in Simulated Clinical Encounters: Comparing Three Providers in a Multi-Speaker, Multi-Dialect Study
ABSTRACT
Background:
Clinicians spend substantial time on Electronic Health Record (EHR) documentation, often at the expense of patient interaction. Ambient listening technology uses artificial intelligence to passively record and summarize clinical encounters. While initial studies are promising, there is limited evidence on system performance in complex, non-English settings.
Objective:
To compare the documentation performance of three commercially available ambient listening systems in simulated Dutch-language outpatient consultations by assessing note completeness, correctness, and conciseness under predefined linguistic and interactional challenges.
Methods:
Standardized audio recordings of ten scripted physician–patient interactions in two specialties were used. Scenarios included multi-speaker dynamics (patient companion), conversational disruptions (nurse interruption), evasive patient communication, and a regional dialect (Gronings). Three distinct AI documentation systems (Provider A, Provider B, and Provider C) processed the audio files. Eight human raters evaluated the resulting AI-generated notes against reference summaries for Completeness, Conciseness, and Correctness using a 5-point ordinal scale. Inter-rater agreement was assessed using Gwet’s AC2. System-level technical characteristics were assessed alongside clinical performance to aid interpretation of between-vendor differences.
Results:
Across 351 ratings on a 1-5 scale, overall inter-rater agreement was high (Gwet's AC2 = 0.827). Mean scores were tightly clustered across providers (Provider C: 4.26; Provider B: 4.00; Provider A: 3.82) and were higher in Otolaryngology (mean 4.36) than in Surgical Oncology (mean 3.68). Across scoring domains, correctness received the highest mean score (4.21), while completeness received the lowest (3.81). Mean scores varied across script scenarios: dialect-specific scenarios showed the lowest mean score (3.77) and the greatest variability across providers. Median summary generation times ranged from 13.5 seconds (Provider C) to 22.0 seconds (Provider B).
Conclusions:
Ambient listening systems demonstrate good performance in Dutch clinical settings, even under conditions simulating common conversational challenges. While accuracy is generally high, performance is sensitive to linguistic variation. Future deployment studies must prioritize linguistic equity, real-world validation of efficiency gains, and evaluation of both clinician and patient perception to understand how these systems influence consultation dynamics and care delivery across diverse patient populations.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.