Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: May 15, 2024
Date Accepted: Nov 17, 2024
Date Submitted to PubMed: Nov 22, 2024

The final, peer-reviewed published version of this preprint can be found here:

Kim DH, Jeong JW, Kang D, Ahn T, Hong Y, Im Y, Kim JW, Kim MJ, Jang DH

Usefulness of Automatic Speech Recognition Assessment of Children With Speech Sound Disorders: Validation Study

J Med Internet Res 2025;27:e60520

DOI: 10.2196/60520

PMID: 39576242

PMCID: 11775490

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Usefulness of Automatic Speech Recognition Assessment of Children with Speech Sound Disorders: Validation Study

  • Do Hyung Kim
  • Joo Won Jeong
  • Dayoung Kang
  • Taekyung Ahn
  • Yeonjung Hong
  • Younggon Im
  • Jae Won Kim
  • Min Jung Kim
  • Dae-Hyun Jang

ABSTRACT

Background:

Speech sound disorders (SSDs) are common communication challenges in children and are typically evaluated by speech-language pathologists (SLPs) using standardized tools. However, traditional evaluation methods are time-consuming, and their reliability varies slightly among testers.

Objective:

We developed an automatic speech recognition (ASR) model and assessed its performance in detecting incorrect pronunciations among children with SSDs.

Methods:

The ASR model is an end-to-end model pretrained on 436,000 hours of adult voice data spanning 128 languages. It was additionally trained on 137 hours of speech data from typically developing children, to adapt it to children’s voices, and on 93.6 minutes of speech data from children with articulation errors, to enhance error detection. Two standardized SSD tests, the Assessment of Phonology and Articulation for Children (APAC) and the Urimal Test of Articulation and Phonology (U-TAP), were administered, and the ASR transcriptions were compared with those produced by speech-language pathologists (SLPs).

Results:

This study included 30 children, aged 3–7 years, who were suspected of having SSDs. The reliability between SLPs and ASR for the percentage of consonants correct (PCC) was excellent, with an intraclass correlation coefficient (ICC) of 0.984 for APAC (95% CI: 0.953–0.994) and 0.978 for U-TAP (95% CI: 0.941–0.990). The phoneme error rates (PERs) for APAC and U-TAP were 11.5% and 12.22%, respectively, reflecting phoneme-level discrepancies between the ASR and SLP transcriptions. Disagreements between the ASR and SLPs occurred, on average, 2.37 (APAC) and 2.7 (U-TAP) times per child for phonemes that SLPs transcribed as correct pronunciations, and 7.8 (APAC) and 7 (U-TAP) times per child for phonemes that SLPs transcribed as incorrect pronunciations.
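The abstract does not state how PER and PCC were computed. As a rough illustration only, the sketch below uses the conventional definitions: PER as the Levenshtein (edit) distance between a reference and a hypothesis phoneme sequence divided by the reference length, and PCC as the share of target consonants produced correctly. The function names, the toy phoneme sequences, and the consonant set are hypothetical and are not taken from the study.

```python
from typing import Sequence, Set


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """PER in percent, relative to the reference length (assumed definition)."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def percentage_consonants_correct(target: Sequence[str],
                                  transcribed: Sequence[str],
                                  consonants: Set[str]) -> float:
    """PCC in percent, assuming target and transcription are position-aligned."""
    if len(target) != len(transcribed):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(t, p) for t, p in zip(target, transcribed) if t in consonants]
    if not pairs:
        raise ValueError("target contains no consonants")
    return 100.0 * sum(t == p for t, p in pairs) / len(pairs)


# Hypothetical toy data: one target word and two transcriptions of a child's production.
CONSONANTS = {"s", "t", "g", "w"}
target = ["s", "a", "g", "w", "a"]
slp_transcription = ["t", "a", "g", "w", "a"]  # SLP hears /s/ produced as /t/
asr_transcription = ["t", "a", "g", "a"]       # ASR additionally drops /w/

per = phoneme_error_rate(slp_transcription, asr_transcription)
pcc = percentage_consonants_correct(target, slp_transcription, CONSONANTS)
```

In this toy case the ASR output differs from the SLP transcription by one deleted phoneme out of five, and two of the three target consonants are judged correct. In the study, a PCC would be derived once from the SLP transcriptions and once from the ASR transcriptions, and agreement between the two sets of scores would then be summarized with the ICC.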

Conclusions:

This study demonstrates the effectiveness of ASR in identifying incorrect pronunciations in children with SSDs.


