Accepted for/Published in: JMIR Aging

Date Submitted: Sep 8, 2021
Date Accepted: Jul 23, 2022

The final, peer-reviewed published version of this preprint can be found here:

Soroski T, da Cunha Vasco T, Newton-Mason S, Granby S, Lewis C, Harisinghani A, Rizzo M, Conati C, Murray G, Carenini G, Field TS, Jang H

Evaluating Web-Based Automatic Transcription for Alzheimer Speech Data: Transcript Comparison and Machine Learning Analysis

JMIR Aging 2022;5(3):e33460

DOI: 10.2196/33460

PMID: 36129754

PMCID: 9536526

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Evaluating Web-Based Automatic Transcription for Alzheimer’s Speech Data: Transcript Comparison and Machine Learning Analysis

  • Thomas Soroski
  • Thiago da Cunha Vasco
  • Sally Newton-Mason
  • Saffrin Granby
  • Caitlin Lewis
  • Anuj Harisinghani
  • Matteo Rizzo
  • Cristina Conati
  • Gabriel Murray
  • Giuseppe Carenini
  • Thalia Shoshana Field
  • Hyeju Jang

ABSTRACT

Background:

Speech data for medical research can be collected non-invasively and in large volumes. Speech analysis has shown promise in diagnosing neurodegenerative disease. To effectively leverage speech data, transcription is important as there is valuable information contained in lexical content. Manual transcription, while highly accurate, limits potential scalability and cost savings associated with language-based screening.

Objective:

To better understand the use of automatic transcription for classification of neurodegenerative disease (Alzheimer’s Disease [AD], mild cognitive impairment [MCI] or subjective memory complaints [SMC] versus healthy controls), we compared automatically generated transcripts against transcripts that went through manual correction.

Methods:

We recruited individuals from a memory clinic (“patients”) with a diagnosis of mild-moderate AD (n=44), MCI (n=20), or SMC (n=8), and healthy controls living in the community (n=77). Participants were asked to describe a standardized picture, read a paragraph, and recall a pleasant life experience. We compared transcripts generated using Google speech-to-text software to manually verified transcripts by examining transcription confidence scores, transcription error rates, and machine learning classification accuracy. For the classification tasks, Logistic Regression, Gaussian Naive Bayes, and Random Forests were used.
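
As an illustration of what a transcription error rate can look like in practice, the sketch below computes a word error rate (WER): the word-level edit distance between a manually verified transcript and an automatic one, divided by the length of the verified transcript. The abstract does not state which error metric was used, so treating it as WER is an assumption; the function name `word_error_rate` and the example sentences are hypothetical and not taken from the study data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: simple lowercase, whitespace tokenization. The study's own
    text normalization (punctuation, fillers, pauses) is not described in
    the abstract and is not reproduced here.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word out of six reference words.
verified = "the boy is on the stool"
automatic = "the boy is on a stool"
print(round(word_error_rate(verified, automatic), 3))
```

A per-participant WER computed this way could then be compared between patients and controls, which is the kind of group comparison the Results report.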

Results:

The transcription software showed significantly higher confidence scores (P<.001) for speech from healthy controls than for speech from patients; error rates were also lower for controls, but this difference was not statistically significant (P>.05). Classification models using human-verified transcripts significantly (P<.001) outperformed models using automatically generated transcripts for both spontaneous speech tasks, whereas the reading task showed no difference. Manually adding pauses to transcripts had no impact on classification performance. Manually correcting the transcripts of both spontaneous speech tasks led to significantly higher performance in the machine learning models.

Conclusions:

We found that automatically-transcribed speech data could be used to distinguish patients with a diagnosis of AD, MCI or SMC from controls. We recommend a human verification step to improve the performance of automatic transcripts, especially for spontaneous tasks. Moreover, human verification can focus on correcting errors and adding punctuation to transcripts. Manual addition of pauses, however, is not needed, which can simplify the human verification step to more efficiently process large volumes of speech data.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.