Accepted for/Published in: JMIR mHealth and uHealth
Date Submitted: Feb 19, 2021
Date Accepted: May 9, 2022
Validity of Chatbot Use for Mental Health Assessment: An Experimental Study
ABSTRACT
Background:
Mental disorders in youth are a major public health concern. Digital tools such as text-based conversational agents (i.e., chatbots) are a promising technology for facilitating mental health assessment. However, the human-like interaction style of chatbots may induce response biases, such as socially desirable responding (SDR), and may require more effort to complete assessments.
Objective:
The study aimed to investigate i) the convergent and discriminant validity of chatbots for mental health assessment, ii) the effect of assessment mode on SDR, and iii) the effort required to complete assessments using chatbots compared with established modes.
Methods:
In a counterbalanced within-subject design, we assessed two constructs, psychological distress (Kessler Psychological Distress Scale [K10], Brief Symptom Inventory [BSI-18]) and problematic alcohol use (Alcohol Use Disorders Identification Test [AUDIT-3]), in three modes (chatbot, paper-and-pencil, and web-based) and i) compared convergent and discriminant validity. In addition, ii) we investigated the effect of mode on SDR, controlling for the perceived sensitivity of items and individuals’ tendency to respond in a socially desirable manner, and assessed the perceived social presence of the modes. Using an additional between-subject condition, we further investigated whether SDR increases in chatbot assessments when they are applied in a pure self-report setting vs. when a human interaction may be expected. Finally, iii) we investigated the effort of the assessments, i.e., their complexity, difficulty, burden, and the time required to complete them.
Results:
A total of 146 young adults (mean age 24 years, SD 6.42; 45.89% female) were recruited from a research panel for a laboratory experiment. Results revealed i) high positive correlations (all P<.001) of measures of psychological distress (K10, BSI-18) and problematic alcohol use (AUDIT-3) across modes, indicating convergent validity of chatbot assessments. Further, there were no correlations between the two constructs, indicating discriminant validity. Moreover, ii) SDR did not differ between modes or by whether a human interaction was expected, although the perceived social presence of the chatbot mode was higher than that of the established modes (P<.001). Finally, iii) chatbot assessments required higher effort (all P<.05) and more time to complete than established modes (P<.001).
Conclusions:
The current study indicates that chatbot assessments can yield valid results, and it further establishes an understanding of their design trade-offs in terms of potential strengths (i.e., increased social presence) and limitations (i.e., increased effort) when assessing mental health.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have granted JMIR Publications an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be published under a CC-BY license, at this stage the authors and publisher expressly prohibit redistribution of this draft other than for review purposes.