JMIR Preprints #66907: Multimodal Multitask Learning for Predicting Depression Severity and Suicide Risk Using Pretrained Audio and Text Embeddings: Methodology Development and Applications

Multimodal Multitask Learning for Predicting Depression Severity and Suicide Risk Using Pretrained Audio and Text Embeddings: Methodology Development and Applications

Ya-Han Hu;
Ruei-Yan Wu;
Min-Yi Su;
I-Li Lin;
Cheng-Che Shen

ABSTRACT

Background:

Depression is a critical psychological disorder necessitating urgent assessment and treatment, given its strong association with increased suicide risk (SR). Effective management hinges on promptly identifying individuals with high depression severity (DS) and SR. While machine learning and deep learning have advanced the identification of DS and SR, research focusing on both aspects simultaneously remains limited and requires further refinement.

Objective:

This study aimed to evaluate whether the proposed method, which integrates multitask learning (MTL), multimodal learning, and transfer learning, enhances the efficacy of deep learning models in the joint classification of DS and SR.

Methods:

This study proposes a multitask framework that employs a multimodal fusion strategy for pretrained audio and text embeddings to concurrently assess DS and SR. Data comprising Chinese audio recordings and clinical questionnaire scores from 100 depressed patients and 100 healthy controls were used. Preprocessed audio and text data were transformed into pretrained embeddings and integrated using concatenation and hard parameter sharing. The single-task learning (STL) models (one for the DS task and one for the SR task) were evaluated with different embeddings and then compared with the MTL models.
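
The fusion and sharing scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy dimensions, the single shared ReLU layer, the binary (high vs low) framing of both tasks, and the equally weighted sum of cross-entropy losses are all assumptions made for illustration, and the random vectors stand in for real wav2vec 2.0, HuBERT, or ERNIE-health embeddings.

```python
import math
import random

random.seed(0)

# Toy sizes only (assumption); real pretrained embeddings are far wider.
AUDIO_DIM, TEXT_DIM, HIDDEN_DIM = 4, 3, 5


def init_matrix(rows, cols):
    """Small random weights for a dense layer (rows = outputs, cols = inputs)."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def dense(weights, inputs):
    """Matrix-vector product: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hard parameter sharing: a single shared layer feeds both task heads.
W_shared = init_matrix(HIDDEN_DIM, AUDIO_DIM + TEXT_DIM)
w_ds = init_matrix(1, HIDDEN_DIM)  # depression severity (DS) head
w_sr = init_matrix(1, HIDDEN_DIM)  # suicide risk (SR) head


def forward(audio_emb, text_emb):
    """Concatenate the two embeddings, then predict both tasks at once."""
    fused = audio_emb + text_emb  # list concatenation = feature-level fusion
    hidden = [max(0.0, h) for h in dense(W_shared, fused)]  # shared ReLU layer
    p_ds = sigmoid(dense(w_ds, hidden)[0])
    p_sr = sigmoid(dense(w_sr, hidden)[0])
    return p_ds, p_sr


def bce(p, y):
    """Binary cross-entropy for one example."""
    eps = 1e-12
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))


def joint_loss(p_ds, p_sr, y_ds, y_sr, ds_weight=1.0, sr_weight=1.0):
    """Weighted sum of task losses; equal weights are an assumption."""
    return ds_weight * bce(p_ds, y_ds) + sr_weight * bce(p_sr, y_sr)


# Random stand-ins for one participant's audio and text embeddings.
audio = [random.gauss(0, 1) for _ in range(AUDIO_DIM)]
text = [random.gauss(0, 1) for _ in range(TEXT_DIM)]

p_ds, p_sr = forward(audio, text)
loss = joint_loss(p_ds, p_sr, y_ds=1, y_sr=0)
print(f"P(high DS)={p_ds:.3f}  P(high SR)={p_sr:.3f}  joint loss={loss:.3f}")
```

Because both heads read the same hidden layer, a gradient step on the joint loss would update `W_shared` using signal from both tasks. That shared update is the mechanism by which one task can help, or hurt, the other.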

Results:

The STL models demonstrated exceptional DS prediction (AUC=0.878) using wav2vec 2.0 combined with ERNIE-health, and SR prediction (AUC=0.876) using HuBERT combined with ERNIE-health. The MTL models significantly improved SR prediction over DS prediction, achieving the highest DS classification (AUC=0.887) with wav2vec 2.0 combined with ERNIE-health, and SR classification (AUC=0.883) with HuBERT combined with ERNIE-health.
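
For readers less familiar with the metric, the sketch below shows one way to compute an AUC, assuming the reported values are areas under the receiver operating characteristic curve for a binary outcome. Under that assumption, AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores are invented toy values, not study data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example (invented values): 3 positive and 3 negative cases.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")  # 8 of 9 pairs are ranked correctly
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a scale for reading the values reported above.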

Conclusions:

This study underscores the effectiveness of the proposed MTL models using specific pretrained audio and text embeddings in enhancing model performance. However, we advocate for cautious implementation of MTL to mitigate potential negative transfer effects. Our research presents a method that is both promising and effective, offering an objective approach for accurate clinical decision support in the parallel diagnosis of DS and SR.

Citation

Please cite as:

Hu YH, Wu RY, Su MY, Lin IL, Shen CC

Multimodal Multitask Learning for Predicting Depression Severity and Suicide Risk Using Pretrained Audio and Text Embeddings: Methodology Development and Application

JMIR Med Inform 2025;13:e66907

DOI: 10.2196/66907

PMID: 41166502

PMCID: 12574750

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Sep 29, 2024

Open Peer Review Period: Sep 26, 2024 - Nov 21, 2024

Date Accepted: Oct 5, 2025
