JMIR Preprints #83790: Explainable and Interpretable AI for Voice and Speech Analysis in Clinical Care: A Systematic Review


Mohamed Ebraheem;
Jamie Toghranegar;
Bridge2AI-Voice Consortium;
Yael Bensoussan;
John Michael Templeton

ABSTRACT

Background:

Driven by recent advances in artificial intelligence (AI), particularly in medicine, audio-based voice and speech biomarkers are increasingly investigated for a range of medical applications as a complementary, or even alternative, modality to traditional medical devices. Recent literature has increasingly adopted deep learning techniques because of their superior performance compared with classical machine learning (ML) methods. However, ethical and regulatory concerns about the black-box nature of these models have limited their integration into clinical workflows. Consequently, explainable AI (XAI) has recently been employed to address this issue by generating explanations for opaque model outputs. Ideally, medical XAI systems provide human-understandable, clinically grounded explanations, which are essential for enhancing the trustworthiness of AI and thereby facilitating its adoption in real-world clinical settings.

Objective:

We conduct a systematic literature review of XAI methods applied to explain deep learning techniques in audio-based clinical voice and speech applications. We present a taxonomy of the XAI methods found in the literature and discuss their limitations, particularly regarding their application to clinical audio, the evaluation of XAI outputs, and the relevance of the generated explanations to stakeholders. We then identify opportunities and recommendations for future clinical audio XAI design.

Methods:

This review follows the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. Six databases (IEEE Xplore, ACM, Scopus, PubMed, Web of Science, and Nature) were searched for articles published between January 2015 and February 2025. Included studies applied explainability and/or interpretability methods to deep learning techniques for clinical voice and speech audio.

Results:

Thirty studies met the eligibility criteria, and a taxonomy of the XAI methods they used is presented. These methods are grouped into four categories: visualization-based techniques, feature importance and attribution methods, attention-based explanations, and concept detectors and model-intrinsic approaches. We find that current XAI methods and implementations lack rigorous evaluation and validation, are poorly suited to the unique characteristics of clinical audio, and do not align with stakeholder expectations and needs.

Conclusions:

This systematic review presents a categorization of XAI techniques employed in voice and speech AI. We discuss several gaps and considerations and identify opportunities for future clinical audio XAI design.

Citation

Please cite as:

Ebraheem M, Toghranegar J, Bridge2AI-Voice Consortium, Bensoussan Y, Templeton JM

Explainable and Interpretable AI for Voice and Speech Analysis in Clinical Care: Systematic Review

J Med Internet Res 2026;28:e83790

DOI: 10.2196/83790

PMID: 42341346

PMCID: 13293602

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Sep 9, 2025

Date Accepted: Apr 20, 2026

Date Submitted to PubMed: May 5, 2026
