
Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Nov 18, 2024
Date Accepted: May 2, 2025

The final, peer-reviewed published version of this preprint can be found here:

Facial Emotion Recognition of 16 Distinct Emotions From Smartphone Videos: Comparative Study of Machine Learning and Human Performance

Keinert M, Pistrosch S, Mallol-Ragolta A, Schuller BW, Berking M

J Med Internet Res 2025;27:e68942

DOI: 10.2196/68942

PMID: 40601921

PMCID: 12268218

Facial Emotion Recognition of 16 Distinct Emotions from Smartphone Video: Comparing Machine-Learning vs Human Performance

  • Marie Keinert; 
  • Simon Pistrosch; 
  • Adria Mallol-Ragolta; 
  • Björn W. Schuller; 
  • Matthias Berking

ABSTRACT

Background:

The development of automatic emotion recognition models from smartphone videos is a crucial step toward the dissemination of psychotherapeutic app interventions that encourage emotional expression. Existing models focus mainly on the six basic emotions while neglecting other, therapeutically relevant emotions. To support this research, we introduce the novel Stress reduction Training through the Recognition of Emotions Wizard-of-Oz (STREs WoZ) dataset, which contains 14,412 smartphone videos of 63 individuals displaying 16 distinct, therapeutically relevant emotions.

Objective:

The aim of the present research is to develop automatic facial emotion recognition (FER) models for binary (positive vs negative) and multi-class emotion classification tasks, to assess the models' performance, and to compare it with that of human observers in two studies.

Methods:

In Study 1, automatic FER models using both appearance and deep-learnt features are developed for binary and multi-class emotion classification. In Study 2, three human observers are trained on the same tasks. A test set of 3018 facial emotion videos is then classified by both the automatic FER models and the human observers. Performance is assessed with unweighted average recall (UAR).
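The abstract does not spell out the metric, but unweighted average recall is conventionally defined as the mean of the per-class recalls, so every emotion class contributes equally regardless of how many test videos it has. A minimal sketch (the function name and toy labels are illustrative, not from the paper):

```python
from collections import defaultdict

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls: each class weighs equally,
    independent of its prevalence in the test set."""
    hits = defaultdict(int)    # correct predictions per true class
    totals = defaultdict(int)  # test samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

# Toy imbalanced binary example:
y_true = ["pos", "pos", "pos", "neg"]
y_pred = ["pos", "pos", "neg", "neg"]
# recall(pos) = 2/3, recall(neg) = 1/1, so UAR = (2/3 + 1) / 2
print(unweighted_average_recall(y_true, y_pred))  # ≈ 0.833
```

Note that plain accuracy on this example would be 3/4 = 0.75; UAR differs whenever classes are imbalanced, which is why it is the standard metric in affective-computing challenges.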

Results:

Results show that appearance features outperform deep-learnt features in both tasks, with the attention network using appearance features emerging as the best-performing model. The attention network achieves an accuracy of 92.2% in the binary classification task, comparable to human performance, but shows lower accuracy (59.0%-90.0%) in the multi-class task, falling short of human accuracy.

Conclusions:

Future studies are needed to enhance the performance of automatic FER models for practical use in psychotherapeutic apps. Nevertheless, this study makes an important first step toward advancing emotion-focused psychotherapeutic interventions via smartphone apps.





© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.