Accepted for/Published in: JMIR Pediatrics and Parenting

Date Submitted: Dec 9, 2021
Open Peer Review Period: Dec 9, 2021 - Dec 21, 2021
Date Accepted: Jan 25, 2022
Date Submitted to PubMed: Apr 18, 2022

The final, peer-reviewed published version of this preprint can be found here:

Chi N, Washington P, Kline A, Husic A, Hou C, He C, Dunlap K, Wall D

Classifying Autism From Crowdsourced Semistructured Speech Recordings: Machine Learning Model Comparison Study

JMIR Pediatr Parent 2022;5(2):e35406

DOI: 10.2196/35406

PMID: 35436234

PMCID: 9052034

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Classifying Autism from Crowdsourced Semi-Structured Speech Recordings: A Machine Learning System

  • Nathan Chi
  • Peter Washington
  • Aaron Kline
  • Arman Husic
  • Cathy Hou
  • Chloe He
  • Kaitlyn Dunlap
  • Dennis Wall

ABSTRACT

Background:

Autism spectrum disorder (ASD) is a neurodevelopmental disorder that results in altered behavior, social development, and communication patterns. In past years, autism prevalence has tripled, with 1 in 54 children now affected. Because traditional diagnosis is a lengthy, labor-intensive process that requires the work of trained physicians, significant attention has been given to developing systems that automatically diagnose and screen for autism.

Objective:

Prosody abnormalities are among the clearest signs of autism, with affected children displaying speech idiosyncrasies such as echolalia, monotonous intonation, atypical pitch, and irregular linguistic stress patterns. In this work, we present a suite of machine learning approaches to detect autism in self-recorded speech audio captured from autistic and neurotypical (NT) children in home environments.

Methods:

We consider three methods to detect autism in child speech: (1) random forests trained on extracted audio features (including Mel-frequency cepstral coefficients); (2) convolutional neural networks (CNNs) trained on spectrograms; and (3) fine-tuned wav2vec 2.0, a state-of-the-art Transformer-based speech recognition model. We train our classifiers on a novel dataset of cellphone-recorded child speech audio curated from Stanford’s Guess What? mobile game, an app designed to crowdsource videos of autistic and neurotypical children in a natural home environment.
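To make the spectrogram input to the CNN more concrete, the following is a minimal sketch of how a log-magnitude spectrogram can be computed from a raw waveform. The abstract does not state the spectrogram settings used in the study, so the function name `log_spectrogram`, the 400-sample frame length, the 160-sample hop, the Hann window, and the synthetic 440 Hz test tone are all assumptions chosen only for illustration.

```python
import numpy as np

def log_spectrogram(signal, frame_len=400, hop=160, eps=1e-10):
    """Return a (num_frames, frame_len // 2 + 1) log-magnitude spectrogram.

    frame_len and hop are illustrative defaults, not values from the study.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < frame_len:
        # Zero-pad short clips so that at least one full frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    num_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice the waveform into overlapping frames and taper each one.
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(num_frames)
    ])
    # Magnitude of the one-sided FFT of every frame.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # eps avoids log(0) for silent bins.
    return np.log(magnitude + eps)

# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

spec = log_spectrogram(tone)
# Frequency bin with the highest average energy, converted to Hz.
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * sample_rate / 400
```

Each row of `spec` is one time frame and each column one frequency bin, so the array can be treated as a single-channel image by a CNN. A real pipeline would typically add a Mel filter bank and handle variable clip lengths, which this sketch omits.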

Results:

When classifying children’s audio as either ASD or NT, the random forest classifier achieves 70% accuracy, the fine-tuned wav2vec 2.0 model achieves 77% accuracy, and the CNN achieves 79% accuracy. We use five-fold cross-validation to evaluate model performance.
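The five-fold cross-validation procedure can be sketched as follows. This is an illustration under stated assumptions, not the study's evaluation code: the abstract does not say whether folds were stratified by class, so the stratification here is an assumption, and the helper names (`stratified_k_fold`, `cross_validate`), the majority-label stand-in classifier, and the toy labels are hypothetical.

```python
import random

def stratified_k_fold(labels, k=5, seed=0):
    """Split sample indices into k folds with similar class balance."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a similar share per class.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds

def cross_validate(features, labels, fit, predict, k=5, seed=0):
    """Return the per-fold accuracies of a classifier under k-fold CV."""
    folds = stratified_k_fold(labels, k, seed)
    accuracies = []
    for held_out in folds:
        held_out_set = set(held_out)
        train = [i for i in range(len(labels)) if i not in held_out_set]
        # Fit on the other k - 1 folds, then score on the held-out fold.
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, features[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies

# Hypothetical stand-in classifier: always predicts the majority training label.
def fit_majority(train_features, train_labels):
    return max(set(train_labels), key=train_labels.count)

def predict_majority(model, feature):
    return model

# Toy placeholder data (not from the study): 30 "ASD" and 20 "NT" clips.
toy_labels = ["ASD"] * 30 + ["NT"] * 20
toy_features = [[0.0]] * 50
fold_accuracies = cross_validate(
    toy_features, toy_labels, fit_majority, predict_majority
)
mean_accuracy = sum(fold_accuracies) / len(fold_accuracies)
```

Any classifier can be plugged in through the `fit` and `predict` callables; the majority-label baseline is included only so the sketch runs end to end, and its accuracy reflects the toy class balance rather than any result from the paper.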

Conclusions:

Our models were able to predict autism status when trained on a varied selection of home audio clips with inconsistent recording quality, a setting that may generalize better to real-world conditions. The results demonstrate that machine learning methods offer promise in detecting autism automatically from speech without specialized equipment.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.