
Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Oct 25, 2021
Open Peer Review Period: Oct 25, 2021 - Dec 20, 2021
Date Accepted: Dec 18, 2022

The final, peer-reviewed published version of this preprint can be found here:

Kim A, Jang EH, Lee SH, Choi KY, Park JG, Shin HC

Automatic Depression Detection Using Smartphone-Based Text-Dependent Speech Signals: Deep Convolutional Neural Network Approach

J Med Internet Res 2023;25:e34474

DOI: 10.2196/34474

PMID: 36696160

PMCID: 9909514

Automatic Depression Detection Using Smartphone-Based Text-Dependent Speech Signals: A Deep CNN Approach

  • Ahyoung Kim; 
  • Eun Hye Jang; 
  • Seung-Hwan Lee; 
  • Kwang-Yeon Choi; 
  • Jeon Gyu Park; 
  • Hyun-Chool Shin

ABSTRACT

Background:

In the future, automatic diagnosis of depression based on speech could complement mental health treatment methods. Previous studies have reported that acoustic properties, including the mel-frequency cepstral coefficients (MFCCs) used in speech recognition, can be used to recognize depression. However, few studies have examined whether these characteristics support differential diagnosis of patients with depressive disorder.

Objective:

This paper proposes a framework for automatic depression detection in a mobile environment, where speech data can be easily obtained. Specifically, we recorded speech data through a predefined text-based reading task on a mobile device, investigated whether the recorded data can screen for depression, and proposed a deep learning-based framework for automatic depression detection.

Methods:

We recruited 125 patients who met the criteria for major depressive disorder (MDD) and 113 healthy controls with no current or past mental illness. Participants' voices were recorded on a smartphone while they read predefined text-based sentences. We examined the differences in voice characteristics between the MDD and healthy control groups using statistical analysis. We also investigated the feasibility of automatic depression detection using the proposed log mel (LM) spectrogram-based deep convolutional neural network (CNN) architectures and machine learning models.
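
The log mel spectrogram feature described above can be sketched with NumPy and SciPy alone. This is a minimal illustration of the general technique, not the authors' pipeline: the parameter choices (16 kHz sampling, 512-point FFT, 50% overlap, 40 mel bands) are assumptions for the example, and the input is synthetic noise standing in for a smartphone recording.

```python
import numpy as np
from scipy.signal import stft

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters with centers spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising slope
            if center > left:
                fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):         # falling slope
            if right > center:
                fb[i - 1, k] = (right - k) / (right - center)
    return fb

def log_mel_spectrogram(wave, sr=16000, n_fft=512, n_mels=40):
    # Power STFT -> mel filterbank -> log compression.
    _, _, Z = stft(wave, fs=sr, nperseg=n_fft, noverlap=n_fft // 2)
    power = np.abs(Z) ** 2
    mel = mel_filterbank(n_mels, n_fft, sr) @ power
    return np.log(mel + 1e-10)  # small offset avoids log(0)

# One second of synthetic audio as a stand-in for a recorded utterance.
rng = np.random.default_rng(0)
wave = rng.standard_normal(16000)
lm = log_mel_spectrogram(wave)
print(lm.shape)  # (n_mels, n_frames)
```

A 2D array like `lm` is what a spectrogram-based CNN would consume as its input image, one per utterance.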

Results:

We found statistically discernible differences between the MDD and control groups in the MFCC features extracted from utterances of the predefined text-based sentences. Moreover, the best accuracy achieved with the LM spectrogram-based CNN and a softmax classifier on the speech data was 80.00%. Our results show that deep-learned acoustic characteristics yield better classifier performance than the conventional approach.
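
The softmax classification step mentioned above can be sketched in NumPy under stated assumptions: the 128-dimensional embeddings, the untrained random weights, and the label convention (0 = control, 1 = MDD) are all hypothetical placeholders for whatever the trained CNN's penultimate layer actually produces.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)

# Hypothetical deep features: one 128-dim embedding per recording,
# as a CNN's penultimate layer might emit for each utterance.
features = rng.standard_normal((4, 128))

# Linear head + softmax over the two classes (control vs. MDD).
W = rng.standard_normal((128, 2)) * 0.01
b = np.zeros(2)
probs = softmax(features @ W + b)
pred = probs.argmax(axis=1)  # assumed convention: 0 = control, 1 = MDD

print(probs.sum(axis=1))  # each row is a probability distribution summing to 1
```

In the actual system, `W` and `b` would be learned jointly with the convolutional layers, and the predicted class would drive the reported screening decision.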

Conclusions:

This study suggests that analysis of speech data recorded while reading text-dependent sentences could help predict depression status automatically by capturing characteristics of depression. Our method can contribute to an approach that allows individuals to assess their depressive state easily and automatically, anytime and anywhere, without an expert conducting a psychological assessment on-site.




© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.