Accepted for/Published in: JMIR mHealth and uHealth

Date Submitted: Feb 19, 2021
Date Accepted: May 9, 2022

The final, peer-reviewed published version of this preprint can be found here:

Schick A, Feine J, Morana S, Maedche A, Reininghaus U

Validity of Chatbot Use for Mental Health Assessment: Experimental Study

JMIR Mhealth Uhealth 2022;10(10):e28082

DOI: 10.2196/28082

PMID: 36315228

PMCID: 9664331

Validity of Chatbot Use for Mental Health Assessment: An Experimental Study

  • Anita Schick
  • Jasper Feine
  • Stefan Morana
  • Alexander Maedche
  • Ulrich Reininghaus

ABSTRACT

Background:

Mental disorders in youth are a major public health concern. Digital tools such as text-based conversational agents (i.e., chatbots) are a promising technology for facilitating mental health assessment. However, the human-like interaction style of chatbots may induce biases, such as socially desirable responding (SDR), and may require more effort to complete assessments.

Objective:

The study aimed to investigate i) the convergent and discriminant validity of chatbots for mental health assessment, ii) the effect of assessment mode on SDR, and iii) the effort required for assessments using chatbots compared to established modes.

Methods:

In a counterbalanced within-subject design, we assessed two different constructs, i.e., psychological distress (Kessler Psychological Distress Scale [K10], Brief Symptom Inventory [BSI-18]) and problematic alcohol use (Alcohol Use Disorders Identification Test [AUDIT-3]), in three modes (chatbot, paper-and-pencil, web-based) and i) compared convergent and discriminant validity. In addition, we investigated ii) the effect of mode on SDR, controlling for the perceived sensitivity of items and individuals’ tendency to respond in a socially desirable manner, and assessed the perceived social presence of each mode. Including a between-subject condition, we further investigated whether SDR is increased in chatbot assessments when applied in a self-report setting vs. when a human interaction may be expected. Finally, we investigated iii) effort, i.e., complexity, difficulty, burden, and the time required to complete the assessments.

Results:

A total of 146 young adults (mean age 24 [SD = 6.42] years; females: 45.89%) were recruited from a research panel for a laboratory experiment. Results revealed i) high positive correlations (all P < .001) across modes for the measures of psychological distress (K10, BSI-18) and for the AUDIT-3, indicating convergent validity of chatbot assessments. Further, there were no correlations between constructs, indicating discriminant validity. Moreover, ii) SDR did not differ between modes or depending on whether a human interaction was expected, although perceived social presence was higher in the chatbot mode than in established modes (P < .001). Finally, iii) chatbot assessments required higher effort (all P < .05) and more time to complete than established modes (P < .001).

Conclusions:

The current study indicated that chatbots may yield valid results, and further established an understanding of their design trade-offs in terms of potential strengths (i.e., increased social presence) and limitations (i.e., increased effort) when assessing mental health.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.