Accepted for/Published in: JMIR Mental Health
Date Submitted: Jun 29, 2025
Open Peer Review Period: Jun 29, 2025 - Aug 24, 2025
Date Accepted: Sep 29, 2025
Evaluating Generative Artificial Intelligence Psychotherapy Chatbots Used by Youth: A Cross-Sectional Study
ABSTRACT
Background:
Many youth rely on direct-to-consumer generative artificial intelligence (GenAI) chatbots for mental health support, yet the quality of the psychotherapeutic capabilities of these chatbots is understudied.
Objective:
We sought to comprehensively evaluate and compare the quality of widely used GenAI chatbots with psychotherapeutic capabilities.
Methods:
In this cross-sectional study, trained raters used an evaluation framework to rate the quality of five chatbots from GenAI platforms widely used by youth. Raters roleplayed as youth, using personas of youth with mental health challenges to prompt the chatbots and facilitate conversations. Chatbot responses were generated from August to October 2024. The primary outcomes were rated scores across nine sections. The proportion of high-quality ratings (binary rating of 1) in each section was compared between chatbots using Bonferroni-corrected χ2 tests.
Results:
While GenAI chatbots were found to be accessible (104 high-quality ratings [87%]) and to avoid harmful statements and misinformation (71 of 80 [89%]), they performed poorly in their therapeutic approach (14 of 45 [35%]) and in their ability to monitor and assess risk (31 of 80 [39%]). Information on chatbot model training and knowledge was unavailable, resulting in low scores. Bonferroni-corrected χ2 tests showed statistically significant differences in chatbot quality in the background, therapeutic approach, and monitoring and risk evaluation sections. Qualitatively, raters perceived most chatbots as having strong conversational abilities but found them plagued by various issues, including fabricated content and poor handling of crisis situations.
Conclusions:
Overall, direct-to-consumer GenAI chatbots showed mixed results in terms of quality, suggesting potential for harm and demonstrating a greater need for transparency and oversight. These findings may enable youth and other stakeholders to make informed decisions about using chatbots for mental health support.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.