Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Oct 25, 2021
Open Peer Review Period: Oct 25, 2021 - Dec 20, 2021
Date Accepted: Dec 18, 2022
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Automatic Depression Detection of Mobile-Based Text-dependent Speech Signals Using a Deep CNN Approach: A Prospective Cohort Study
ABSTRACT
Background:
In the future, automatic diagnosis of depression based on speech could complement mental health treatment methods. Previous studies have reported that acoustic properties can be used to recognize depression, including mel-frequency cepstral coefficients (MFCCs), which are widely applied in speech recognition. However, few studies have examined whether these characteristics allow differential diagnosis of patients with depressive disorder.
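For context, MFCCs are built on the mel scale, which warps frequency to approximate human pitch perception. Below is a minimal sketch of the standard Hz-to-mel conversion (the common 2595/700 formulation); this is general background, not code from the study:

```python
import math

def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (standard HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse conversion: mels back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Example: concert A (440 Hz) expressed in mels
m_440 = hz_to_mel(440.0)
```

MFCC extraction applies this warping via a mel filterbank before the cepstral (DCT) step, so low frequencies are resolved more finely than high ones.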
Objective:
This paper proposes a framework for automatic depression detection in a mobile environment, where speech data can be easily obtained. Specifically, we recorded speech data from a predefined text-based reading task performed on a mobile device, investigated whether the recorded data can screen for depression, and proposed a deep learning-based framework that supports automatic depression detection.
Methods:
We recruited 125 patients who met the criteria for major depressive disorder (MDD) and 113 healthy controls without current or past mental illness. Participants' voices were recorded on a smartphone while they read predefined text-based sentences. We investigated the differences in voice characteristics between the MDD and healthy control groups using statistical analysis. We also investigated the feasibility of automatic depression detection using the proposed log mel (LM) spectrogram-based deep convolutional neural network (CNN) architectures and machine learning models.
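The abstract does not specify the exact feature-extraction settings; the sketch below illustrates the generic log mel spectrogram pipeline that the LM features are based on (framing, windowed power spectrum, triangular mel filterbank, log compression). The frame length, hop size, and filter count here are illustrative assumptions, not the study's parameters:

```python
import math

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]

def power_spectrum(frame):
    """Naive DFT power spectrum (bins 0..N/2) of a Hamming-windowed frame."""
    n = len(frame)
    win = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
           for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(win))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(win))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular mel filters spanning 0..sr/2 Hz."""
    hz_to_mel = lambda f: 2595.0 * math.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    top = hz_to_mel(sr / 2.0)
    mel_pts = [i * top / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sr)) for m in mel_pts]
    banks = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        fb = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):
            fb[k] = (k - left) / (center - left)
        for k in range(center, right):
            fb[k] = (right - k) / (right - center)
        banks.append(fb)
    return banks

def log_mel_spectrogram(signal, sr, frame_len=64, hop=32, n_mels=8):
    """Log mel energies per frame (toy sizes for illustration)."""
    banks = mel_filterbank(n_mels, frame_len, sr)
    out = []
    for frame in frame_signal(signal, frame_len, hop):
        spec = power_spectrum(frame)
        out.append([math.log(sum(f * s for f, s in zip(bank, spec)) + 1e-10)
                    for bank in banks])
    return out

# Toy usage: a 440 Hz tone sampled at 8 kHz
sr = 8000
tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(512)]
lm = log_mel_spectrogram(tone, sr)  # frames x mel-bands matrix
```

The resulting frames-by-bands matrix is the 2-D "image" that a spectrogram-based CNN would consume; production code would typically use an FFT-based library rather than the naive DFT shown here.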
Results:
We found statistically significant differences between the MDD and control groups in the MFCC features extracted from utterances of predefined text-based sentences. Moreover, the best accuracy achieved with the LM spectrogram-based CNN and a softmax classifier on the speech data was 80.00%. Our results show that the deep-learned acoustic characteristics lead to better classifier performance than the conventional approach.
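For readers unfamiliar with the final classification step: a softmax layer maps the CNN's output scores (logits) to class probabilities. A minimal, numerically stable sketch with hypothetical logits for the two classes (not values from the study):

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical CNN logits for the classes [MDD, control]
probs = softmax([2.0, -1.0])
predicted = "MDD" if probs[0] > probs[1] else "control"
```

The predicted class is simply the one with the larger probability; reported accuracy is the fraction of held-out recordings for which this prediction matches the clinical label.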
Conclusions:
This study suggests that analyzing speech data recorded while reading text-dependent sentences could help predict depression status automatically by capturing characteristics of depression. Our method can contribute to an approach that allows individuals to easily and automatically assess their depressive state anytime, anywhere, without the need for experts to conduct psychological assessments on-site.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.