Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Nov 3, 2025
Open Peer Review Period: Nov 3, 2025 - Dec 29, 2025
Date Accepted: Mar 17, 2026
Detection of Micro-Behavior Intervals: A Clinically Relevant and Advanced Multimodal Temporal Approach for Predicting Mental Health
ABSTRACT
Background:
Healthcare workers (HCWs) face sustained psychological demands that place them at heightened risk for burnout and posttraumatic stress disorder (PTSD). Yet, assessing psychological distress in this population remains challenging due to stigma, underreporting, and the limitations of self-report tools. Although nonverbal behaviors hold diagnostic promise, most approaches overlook the fine-grained, temporal fluctuations in these signals. In this study, we focused on micro-behavior intervals—brief, involuntary changes in multimodal nonverbal signals—that emerge during emotion-eliciting interviews.
Objective:
To determine whether micro-behavior intervals improve discrimination of psychological distress profiles among HCWs with symptoms of burnout and PTSD.
Methods:
HCWs participated in a semi-structured interview that included five emotionally charged, work-related questions and was recorded on Webex, an online videoconferencing platform. Participants also completed validated questionnaires for burnout (MBI-GS-9) and PTSD (PCL-5). Recordings were analyzed with computer vision models to generate time series of facial expressions, head movement, gaze, body posture, and hand gestures. An unsupervised anomaly detection model (MOMENT) isolated micro-behavior intervals without the need for manual labels. Features derived from these intervals were used to train a deep learning classifier that predicted four symptom classes of psychological distress: ‘Moderate-Severe Burnout’, ‘Subthreshold-Provisional PTSD’, ‘Burnout + PTSD’, and ‘Resilient’. We conducted an ablation study by systematically removing one behavioral data stream at a time. Finally, we conducted an explainability analysis to characterize the features driving model predictions.
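The abstract does not state how MOMENT's output is converted into discrete intervals. The following is a minimal sketch under stated assumptions, not the authors' pipeline. It assumes the model yields one anomaly score per video frame (for example, a reconstruction error), a hypothetical mean + k·SD threshold, a 30 fps frame rate, and a minimum interval length. The function names `detect_intervals` and `interval_features` are hypothetical. The per-interval summaries are illustrative stand-ins for "variability" and "irregularity" and do not reproduce the authors' feature set: the within-interval SD and the mean absolute frame-to-frame change.

```python
import numpy as np


def detect_intervals(scores, fps=30.0, k=2.0, min_frames=3):
    """Return (start_s, end_s) pairs where the anomaly score exceeds mean + k*SD.

    Hypothetical thresholding rule: the source does not specify how
    per-frame anomaly scores are turned into micro-behavior intervals.
    """
    scores = np.asarray(scores, dtype=float)
    threshold = scores.mean() + k * scores.std()
    above = scores > threshold
    # Pad with False so every run of above-threshold frames has a rising and a falling edge.
    padded = np.concatenate(([False], above, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # first frame of each run
    ends = np.flatnonzero(edges == -1)    # one past the last frame of each run
    intervals = []
    for s, e in zip(starts, ends):
        if e - s >= min_frames:           # drop runs too short to count as an interval
            intervals.append((s / fps, e / fps))
    return intervals


def interval_features(signal, intervals, fps=30.0):
    """Summarize one behavioral signal (e.g., a gaze angle) inside each interval."""
    signal = np.asarray(signal, dtype=float)
    feats = []
    for start_s, end_s in intervals:
        seg = signal[int(round(start_s * fps)):int(round(end_s * fps))]
        feats.append({
            "duration_s": end_s - start_s,
            "mean": float(seg.mean()),
            "std": float(seg.std()),  # variability within the interval
            # Irregularity proxy: average absolute change between consecutive frames.
            "mean_abs_diff": float(np.abs(np.diff(seg)).mean()) if seg.size > 1 else 0.0,
        })
    return feats
```

In this sketch, features are computed per interval and per signal. A downstream classifier would then need some aggregation across intervals and modalities, which the abstract does not describe.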
Results:
We analyzed 258 interview recordings from N=151 HCWs. Per interview, 19.65±6.01 micro-behavior intervals were detected, each lasting 1.31±1.10 seconds. The classifier demonstrated robust performance across classes, achieving a macro F1 = 0.75 and a macro ROC-AUC = 0.80 on held-out data. Ablation showed that excluding gaze or arousal-valence signals caused the largest performance declines, particularly in recall and F1 score. Explainability analysis revealed distinct temporal patterns across symptom classes, with irregularity and variability in micro-behaviors emerging as key predictors.
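Macro F1 is the unweighted mean of the per-class F1 scores, so each of the four symptom classes counts equally regardless of how many participants fall into it. The following is a minimal pure-Python sketch of that computation. It is not the authors' evaluation code. In practice, scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same quantity over the labels that appear in the data.

```python
CLASSES = [
    "Moderate-Severe Burnout",
    "Subthreshold-Provisional PTSD",
    "Burnout + PTSD",
    "Resilient",
]


def macro_f1(y_true, y_pred, classes=CLASSES):
    """Unweighted mean of per-class F1 scores (one-vs-rest counts per class).

    Unlike scikit-learn's default, a class listed in `classes` but absent from
    both y_true and y_pred contributes an F1 of 0 here.
    """
    pairs = list(zip(y_true, y_pred))
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1s.append(f1)
    return sum(f1s) / len(f1s)
```

Because every class is weighted equally, errors on a small class lower macro F1 as much as errors on a large one. This makes macro F1 a stricter summary than accuracy when class sizes are unbalanced.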
Conclusions:
Focusing on micro-behavior intervals yields a scalable, interpretable, and annotation-free framework for detecting psychological distress from nonverbal signals. By moving from whole-video features to fine-grained multimodal temporal modeling, we successfully captured subtle, involuntary fluctuations in nonverbal responses to emotion-eliciting questions. This multimodal approach enables objective, robust, and explainable assessment of psychological distress, offering a promising complement to conventional psychometric assessments.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.