Accepted for/Published in: JMIR Formative Research
Date Submitted: Sep 15, 2025
Date Accepted: Feb 27, 2026
Multimodal Depression Detection through Conversational Interactions with an Emotion-Aware Social Robot: Pilot Study
ABSTRACT
Background:
Depression affects more than 300 million people worldwide and is a leading contributor to the global disease burden. Traditional diagnostic methods, such as structured clinical interviews, are reliable but impractical for frequent or large-scale screening. Self-report tools such as the 8-item Patient Health Questionnaire (PHQ-8) require disclosure and clinician oversight, limiting accessibility. Recent AI-based approaches leverage multimodal behavioral cues (linguistic, acoustic, visual) for automated depression detection but remain constrained by limited adaptability, scarce annotated data, weak emotional expression in real-world settings, and the high computational cost of deployment on social assistant robots (SARs).
Objective:
This study introduces DEPRESAR-Fusion, a lightweight multimodal depression detection framework designed for natural interactions with emotion-aware SARs. The objective was to enhance detection accuracy in everyday conversations while addressing the challenges of data scarcity, weak emotional cues, and computational efficiency.
Methods:
DEPRESAR-Fusion integrates acoustic, linguistic, and visual features with an emotion-aware response module powered by large language models (LLMs) to adapt conversational strategies dynamically. To stimulate richer emotional expression, participants were exposed to emotionally evocative videos before SAR interactions. To overcome data scarcity, we augmented training with (1) public depression-related social media corpora and (2) synthetic samples generated via LLMs. The proposed multimodal fusion architecture was evaluated on benchmark clinical datasets for both binary depression classification and PHQ-8 regression tasks. Performance was compared against prior multimodal baselines using root mean square error (RMSE), mean absolute error (MAE), and classification accuracy.
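As an illustration of the three evaluation metrics named above, the following minimal sketch computes RMSE and MAE on predicted PHQ-8 scores and accuracy on the derived binary labels. It is not the authors' evaluation code: the function names, the example scores, and the binarization threshold of 10 (a conventional PHQ-8 screening cutoff that the abstract does not state) are assumptions made for illustration.

```python
import math

# Assumption: a PHQ-8 total of 10 or more is treated as "depressed".
# This is a conventional screening cutoff; the abstract does not state one.
PHQ8_CUTOFF = 10


def rmse(y_true, y_pred):
    """Root mean square error between true and predicted scores."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted scores."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


def accuracy(y_true, y_pred, cutoff=PHQ8_CUTOFF):
    """Share of cases where true and predicted scores fall on the same side of the cutoff."""
    hits = sum((t >= cutoff) == (p >= cutoff) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Hypothetical scores for illustration only; these are not study data.
true_scores = [4, 12, 20, 7]
pred_scores = [6, 10, 17, 9]

print(rmse(true_scores, pred_scores))
print(mae(true_scores, pred_scores))
print(accuracy(true_scores, pred_scores))
```

RMSE penalizes large errors more heavily than MAE, so reporting both shows whether a model's regression error is dominated by a few badly mispredicted cases.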
Results:
Participants who viewed emotional stimuli before interacting with SARs exhibited significantly higher emotional expressiveness, leading to improved model performance. Compared with the non-stimulus condition, regression error (RMSE and MAE) was lower and classification accuracy was significantly higher. DEPRESAR-Fusion outperformed prior multimodal baselines across multiple benchmark datasets, achieving state-of-the-art performance in both binary classification and PHQ-8 regression. The system maintained a lightweight architecture suitable for real-time deployment on SARs.
Conclusions:
DEPRESAR-Fusion demonstrates that integrating emotion induction, data augmentation, and lightweight multimodal fusion can enable accurate and scalable depression detection in naturalistic SAR interactions. By bridging the gap between structured clinical assessments and everyday conversations, this approach highlights the potential of SAR-based systems as non-intrusive, AI-driven tools for proactive mental health support.
Clinical Trial:
This study involving human participants was reviewed and approved by the NTHU Research Ethics Committee D (National Tsing Hua University, Taiwan) on June 18, 2021 (Protocol No. 202105013RINB). As the primary outcome of this work was system performance evaluation rather than patient health outcomes, this study does not constitute a clinical trial as defined by ICMJE guidelines, and trial registration was not required.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.