
Accepted for/Published in: JMIR Mental Health

Date Submitted: Oct 19, 2021
Open Peer Review Period: Oct 19, 2021 - Oct 26, 2021
Date Accepted: Nov 23, 2021

The final, peer-reviewed published version of this preprint can be found here:

Automatic Assessment of Emotion Dysregulation in American, French, and Tunisian Adults and New Developments in Deep Multimodal Fusion: Cross-sectional Study

Parra F, Benezeth Y, Yang F

Automatic Assessment of Emotion Dysregulation in American, French, and Tunisian Adults and New Developments in Deep Multimodal Fusion: Cross-sectional Study

JMIR Ment Health 2022;9(1):e34333

DOI: 10.2196/34333

PMID: 35072643

PMCID: 8822434

The multimodal assessment of emotion dysregulation: how advances in deep multimodal fusion unleash training complex models with small samples

  • Federico Parra; 
  • Yannick Benezeth; 
  • Fan Yang

ABSTRACT

Background:

Emotion dysregulation is a key dimension of adult psychological functioning, and there is growing interest in developing a computer-based, multimodal, automatic measure of it.

Objective:

To train a deep multimodal fusion model to estimate emotion dysregulation in adults based on their responses to the Multimodal Developmental Profile (MDP), a computer-based psychometric test, using only a small training sample and without transfer learning.

Methods:

Two hundred forty-eight participants from three countries took the MDP, which exposed them to 14 picture and music stimuli and asked them to express their feelings about each, while the software extracted features from the video and audio signals: facial expressions, linguistic and paralinguistic characteristics of speech, head movements, gaze direction, and heart rate variability (HRV). Participants also completed the brief version of the Difficulties in Emotion Regulation Scale (DERS-16). We separated and averaged the feature signals corresponding to the responses to each stimulus, building a structured dataset. We then transformed each person's per-stimulus structured data into a multimodal codex: a grayscale image created by projecting each feature's normalized intensity value onto a Cartesian space, deriving each pixel's position by applying the Uniform Manifold Approximation and Projection (UMAP) method to our transposed dataset. The codex sequence was treated as a video-to-regression problem. First, 13 CNNs addressed the spatial aspect of the problem, estimating emotion dysregulation from each of the codified responses. These CNN estimates were then fed to a Transformer network that decoded the temporal aspect of the problem, estimating emotion dysregulation from the succession of responses. We introduce the Feature Map Average Pooling (FMAP) layer, which computes the mean of the convolved feature maps produced by our convolution layers, dramatically reducing the number of learnable weights and increasing regularization through an ensembling effect. The CNNs include a local feature extraction (LFE) module that lets them keep learning at the detail level even in the deepest layers of the network. We used 8-fold cross-validation to estimate generalization to unseen samples.
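The codex construction described above can be sketched roughly as follows. This is an illustrative assumption of the idea, not the authors' implementation: `coords_2d` stands in for the 2D per-feature positions that UMAP would produce from the transposed dataset, and the grid size and feature count are arbitrary.

```python
import numpy as np

# Stand-in for UMAP output: one 2D coordinate per feature, assumed
# already normalized to [0, 1). In the real pipeline these would come
# from fitting UMAP on the transposed (features x samples) dataset.
rng = np.random.default_rng(0)
n_features = 6
coords_2d = rng.uniform(0, 1, size=(n_features, 2))

def build_codex(feature_values, coords, size=8):
    """Project normalized feature intensities onto a size x size
    grayscale grid, one pixel per feature."""
    codex = np.zeros((size, size), dtype=np.float32)
    # Map each feature's 2D coordinate to a pixel index.
    pix = np.minimum((coords * size).astype(int), size - 1)
    for (row, col), value in zip(pix, feature_values):
        codex[row, col] = value
    return codex

values = rng.uniform(0, 1, size=n_features)  # normalized intensities
img = build_codex(values, coords_2d)         # one grayscale codex image
```

Each stimulus response yields one such image, and the sequence of 14 images per participant is what the CNN plus Transformer stack then consumes.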
Most of the experiments in this paper are easily replicable using the associated Google Colab.
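As a minimal sketch of the FMAP idea described above (our reading of the abstract, not the authors' code), averaging a stack of convolved feature maps channel-wise collapses C maps into one, so downstream layers see a single map rather than C of them:

```python
import numpy as np

def fmap_pool(feature_maps):
    """Feature Map Average Pooling: average convolved feature maps
    channel-wise, collapsing (C, H, W) to (1, H, W). Fewer channels
    downstream means far fewer learnable weights, and the averaging
    acts like an ensemble over the C maps."""
    return feature_maps.mean(axis=0, keepdims=True)

pooled = fmap_pool(np.ones((4, 3, 3), dtype=np.float32))
```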

Results:

We found an average Pearson correlation of r=.55 (average p<.00) between DERS-16 emotion dysregulation scores and our system's estimates of emotion dysregulation. We also found an average mean absolute error (MAE) of .16 and a mean concordance correlation coefficient (CCC) of .54.
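For reference, the concordance correlation coefficient reported above combines correlation with agreement in scale and location; the standard formula can be computed from means, variances, and covariance, shown here in plain Python:

```python
def ccc(x, y):
    """Lin's concordance correlation coefficient:
    2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2),
    using population (1/n) variance and covariance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * cov / (vx + vy + (mx - my) ** 2)
```

Perfect agreement gives a CCC of 1, and perfect reversal gives -1; unlike Pearson's r, the CCC is penalized when predictions are shifted or rescaled relative to the targets.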

Conclusions:

In psychometrics, our results represent a very high correlation and strong evidence of convergent validity, suggesting that the MDP, used with this methodology, could provide a valid measure of emotion dysregulation in adults. Future studies should replicate our findings on a held-out test sample. Our methodology could be applied more generally to train deep neural networks for multimodal fusion, or for other tasks where only small training samples are available.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft other than for review purposes.