Accepted for/Published in: JMIR Formative Research

Date Submitted: Oct 3, 2025
Date Accepted: May 15, 2026

The final, peer-reviewed published version of this preprint can be found here:

Vinet P, Dillenbourg P, Slot A, Selvanayakam S, Giovanoli S, Du E, Cardoso J, Branscheidt M, Easthope Awai C, Bauer CM

Automatic Speech Recognition and Acoustic Analysis for Dysarthria Assessment in Telerehabilitation: User-Centered Design and Usability Study

JMIR Form Res 2026;10:e85230

DOI: 10.2196/85230

PMID: 42397858

I-Speak-Tele: A Prototype Web Application Combining Automatic Intelligibility Scoring and Acoustic Feature Analysis for Dysarthric Speech

  • Pierre Vinet
  • Pierre Dillenbourg
  • Amelieke Slot
  • Sharmila Selvanayakam
  • Sandra Giovanoli
  • Elisa Du
  • Julia Cardoso
  • Meret Branscheidt
  • Chris Easthope Awai
  • Christoph Michael Bauer

ABSTRACT

Background:

Dysarthria is a frequent motor speech disorder following stroke, affecting up to 42% of survivors and resulting in reduced speech intelligibility and diminished quality of life. Clinical assessments such as the Frenchay Dysarthria Assessment–2 (FDA-2) rely heavily on subjective judgment by speech-language pathologists (SLPs), which limits comparability and scalability. Telepractice solutions have the potential to extend access to care, but validated digital tools that combine automatic analysis with clinically usable interfaces remain scarce.

Objective:

This study aimed to develop and evaluate a web-based application that integrates automatic speech recognition (ASR) and acoustic analysis into a user-centered dashboard for SLPs. Specifically, we investigated: (1) whether ASR can provide intelligibility scores comparable to human listeners; (2) the usability of the system in two iterative cycles with SLPs; and (3) the feasibility of presenting clinically relevant acoustic features to support tele-rehabilitation.

Methods:

A user-centered design process was followed, involving contextual inquiry, requirements gathering, prototype development, and iterative testing with SLPs. The analytical core of the prototype included an ASR module (Whisper Large-v3) to compute intelligibility scores, combining word error rate–based accuracy with sentence- and word-level alignment. Phoneme-level error highlighting was implemented to identify frequent substitution or deletion patterns. In parallel, an acoustic module extracted clinically relevant measures, including fundamental frequency (mean and range), intensity (mean and variability), and vowel formants (F1–F2 space), supplemented by sustained phonation duration. A pilot validation compared ASR-based intelligibility scores with transcriptions from eight lay listeners for three dysarthric patients performing FDA-2 word and sentence tasks. Usability was evaluated in two cycles with eight and four SLPs, respectively, using the System Usability Scale (SUS) and structured questionnaires.
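The Methods describe intelligibility scores built on word error rate (WER)-based accuracy. As a rough illustration only, the sketch below computes WER with a word-level edit distance and maps it to a 0-100 score as 100 × (1 − WER). The function names, the lowercase and whitespace normalization, and the score mapping are assumptions made for this example; the abstract does not specify how the prototype normalizes text or how it combines WER with sentence- and word-level alignment.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions, deletions, insertions)
    divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def intelligibility_score(reference: str, hypothesis: str) -> float:
    """Hypothetical mapping from WER to a 0-100 score (an assumption,
    not the formula used in the study)."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis)) * 100.0


if __name__ == "__main__":
    target = "the cat sat on the mat"       # prompt the speaker was asked to read
    transcript = "the cat sat on a mat"     # illustrative ASR or listener transcript
    print(round(intelligibility_score(target, transcript), 1))
```

The same function can score either an ASR transcript or a lay listener's transcription against the target sentence, which is what makes a like-for-like comparison between the two possible. The `max(0.0, ...)` clamp is needed because WER can exceed 1 when the hypothesis contains many inserted words.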

Results:

In the pilot validation, ASR performance was comparable to, and in some cases better than, untrained human listeners for mild and moderate dysarthria, though performance declined with severe cases. Both usability cycles yielded excellent SUS scores (Cycle 1 mean 88.4; Cycle 2 mean 91.7). Core workflow elements, including navigation, session upload, and intelligibility score presentation, were consistently rated highly. Feedback evolved from bug reports and requests for clearer terminology in Cycle 1 to suggestions for advanced analytic features in Cycle 2, such as additional voice-quality indices and integrated note-taking.
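For readers unfamiliar with how SUS values such as those above are derived, the following sketch applies the standard SUS scoring rule to one respondent's ten answers (each on a 1-5 scale): odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5 to give a 0-100 score. The function name and the example responses are hypothetical and are not data from the study.

```python
from statistics import mean


def sus_score(responses: list[int]) -> float:
    """Score one completed System Usability Scale questionnaire (0-100)."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    total = 0
    for index, response in enumerate(responses):
        if not 1 <= response <= 5:
            raise ValueError("each response must be between 1 and 5")
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded.
            total += response - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded.
            total += 5 - response
    return total * 2.5


if __name__ == "__main__":
    # Hypothetical responses from two raters, for illustration only.
    raters = [
        [5, 1, 5, 2, 4, 1, 5, 1, 5, 2],
        [4, 2, 5, 1, 5, 2, 4, 1, 5, 1],
    ]
    print(mean(sus_score(r) for r in raters))
```

A cycle-level mean is the average of the per-respondent scores, not a score computed from averaged item responses, although the two coincide when every respondent answers all ten items.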

Conclusions:

The prototype demonstrates that automatic intelligibility scoring and acoustic analysis can be integrated into a clinically usable, web-based dashboard. While current limitations include reliance on English-only phoneme analysis, limited advanced acoustic features, and lack of regulatory compliance, the application achieved excellent usability and shows promise for scalable tele-rehabilitation. Future work should expand multilingual support, incorporate additional acoustic measures, and validate the tool in larger clinical cohorts.



© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.