Facial Expression-Based Evaluation of the Emotion Estimation Software “Kokoro Sensor”: A Pilot Study on the Validity and Reliability in Healthy Individuals
ABSTRACT
Background:
In recent years, artificial intelligence (AI) systems have increasingly been used to assess emotional states in health care. AI offers a safe, quick, user-friendly, and objective emotional evaluation method. However, evidence supporting its implementation in health care remains limited.
Objective:
Thus, the present study aimed to explore the concurrent validity and test–retest reliability of emotion recognition AI based on facial expressions.
Methods:
In this study, we used the Kokoro Sensor, an accurate and widely recognized automated facial expression recognition system. The Japanese version of the Profile of Mood States-Short Form (POMS-SF) was used to screen the potential influence of mental states on facial expressions. The study participants made positive, negative, and neutral expressions, which were analyzed by the emotion recognition AI. Agreement between the results of the AI and subjective evaluations was assessed by participants and independent researchers using a four-point Likert-type scale. The facial expressions and emotion analysis process were repeated after a 30-minute interval to investigate reliability. Concurrent validity was evaluated using the Content Validity Index (CVI) and the weighted κ coefficient, and test–retest reliability was determined using the weighted κ coefficient.
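The two agreement statistics named in the Methods can be sketched as follows. This is a minimal illustration, not the authors' analysis code. It assumes that the coefficient is Cohen's weighted kappa with linear weights, that the CVI is the share of ratings of 3 or 4 on the four-point scale, and that expressions are ordered negative < neutral < positive; the abstract specifies none of these details. The function names and the toy data are hypothetical.

```python
from collections import Counter

def content_validity_index(ratings, threshold=3):
    """Share of ratings at or above `threshold` (assumed cut-off: 3 of 4)."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    return sum(r >= threshold for r in ratings) / len(ratings)

def weighted_kappa(rater_a, rater_b, categories, scheme="linear"):
    """Cohen's weighted kappa for two label sequences over ordered categories."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters need the same non-zero number of labels")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)
    power = 1 if scheme == "linear" else 2  # any other value means quadratic

    # Observed joint counts and each rater's marginal counts.
    observed = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))
    row = Counter(index[a] for a in rater_a)
    col = Counter(index[b] for b in rater_b)

    def weight(i, j):
        # Disagreement weight: 0 on the diagonal, 1 for the most distant pair.
        return (abs(i - j) / (k - 1)) ** power

    obs_dis = sum(weight(i, j) * c for (i, j), c in observed.items()) / n
    exp_dis = sum(
        weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k)
    ) / n ** 2
    if exp_dis == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1 - obs_dis / exp_dis

# Toy data only (not from the study): intended expressions vs. AI output.
ORDER = ["negative", "neutral", "positive"]
intended = ["negative", "negative", "neutral", "neutral", "positive", "positive"]
ai_labels = ["negative", "neutral", "neutral", "neutral", "positive", "neutral"]

print(content_validity_index([4, 3, 2, 4]))
print(weighted_kappa(intended, ai_labels, ORDER))
```

Running the same `weighted_kappa` on the first and second recording sessions, rather than on intended versus estimated labels, would give a test–retest coefficient under the same assumptions.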
Results:
The study participants were 40 individuals whose mental states did not deviate from the reference range of the POMS manual. Among the participants, the CVI values for positive, neutral, and negative expressions were 95%, 98%, and 85%, respectively. Among the researchers, the corresponding CVI values were 100%, 100%, and 70%, respectively. The overall weighted κ coefficient was 0.55 (95% confidence interval [CI]: 0.44–0.67), indicating moderate agreement. The agreement was almost perfect for distinguishing positive from neutral expressions (κ = 0.83, 95% CI: 0.70–0.95), but not statistically significant for distinguishing negative from neutral expressions (κ = 0.15, 95% CI: –0.77 to 0.37). Test–retest reliability analysis showed an overall weighted κ coefficient of 0.66, reflecting substantial reliability. Almost perfect reproducibility was observed for distinguishing positive from neutral expressions (κ = 0.85, 95% CI: 0.73–0.97), while distinguishing negative from neutral expressions showed fair reproducibility (κ = 0.36, 95% CI: 0.16–0.57).
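The verbal labels in the Results (moderate, substantial, almost perfect, fair) match the commonly used Landis and Koch bands for agreement coefficients. The sketch below assumes those bands were the ones applied, which the abstract does not state; `interpret_kappa` and `ci_includes_zero` are hypothetical helper names.

```python
# Upper bounds of the Landis and Koch agreement bands (assumed, see lead-in).
BANDS = [
    (0.20, "slight"),
    (0.40, "fair"),
    (0.60, "moderate"),
    (0.80, "substantial"),
    (1.00, "almost perfect"),
]

def interpret_kappa(kappa):
    """Map an agreement coefficient to its verbal label."""
    if kappa > 1:
        raise ValueError("kappa cannot exceed 1")
    if kappa < 0:
        return "poor"
    for upper, label in BANDS:
        if kappa <= upper:
            return label

def ci_includes_zero(lower, upper):
    """True when the interval spans 0, i.e. agreement is not distinguishable from chance."""
    return lower <= 0 <= upper

# Coefficients and intervals as reported in the Results.
for value in (0.55, 0.83, 0.66, 0.85, 0.36):
    print(value, interpret_kappa(value))
print(ci_includes_zero(-0.77, 0.37))
```

The interval check illustrates why the negative-versus-neutral validity estimate is reported as not statistically significant rather than given a band label: its reported interval contains 0.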
Conclusions:
The present findings indicate that the Kokoro Sensor is a useful tool for identifying positive tendencies, given its acceptable validity and reliability in estimating overall expressions and distinguishing positive from neutral expressions. However, because it distinguishes negative from neutral expressions poorly, its results should be interpreted with caution when differentiating between negative and neutral expressions. In clinical settings, the Kokoro Sensor should serve as an assistive tool rather than a stand-alone method.
Citation
Per the author's request the PDF is not available.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.