JMIR Preprints #65605: An Evaluation Framework to Assess the Quality of Psychotherapy Conversational Agents: Development and Chatbot Evaluation Study


An Evaluation Framework to Assess the Quality of Psychotherapy Conversational Agents: Development and Chatbot Evaluation Study

Kunmi Sobowale;
Daniel Kevin Humphrey

ABSTRACT

Background:

Despite potential risks, artificial intelligence-based chatbots that simulate psychotherapy are becoming more widely available and frequently used by the general public. A comprehensive way of evaluating the quality of these chatbots is needed.

Objective:

To address this need, we developed the CAPE (Conversational Agent for Psychotherapy Evaluation) framework to aid clinicians, researchers, and lay users in assessing psychotherapy chatbot quality.

Methods:

We identified four popular chatbots on OpenAI’s GPT store. Two reviewers independently applied the CAPE framework to these chatbots, using two fictional personas to simulate interactions. The modular framework has eight sections, each yielding an independent quality subscore. We used t tests and nonparametric Wilcoxon signed-rank tests to examine pairwise differences in subscores between chatbots.
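As an illustration only, the pairwise comparison described above might look like the following minimal sketch. It assumes the subscores for two chatbots are paired (for example, by reviewer and persona), which the abstract does not state. The function names and the example scores are hypothetical and are not taken from the study.

```python
from itertools import product
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for paired score lists a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


def wilcoxon_signed_rank(a, b):
    """Return (W_plus, W_minus, exact two-sided p) for paired scores.

    Zero differences are dropped and tied absolute differences share
    the average rank. The exact p-value enumerates every sign
    assignment, so this is only practical for small samples.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    observed = min(w_plus, w_minus)
    total = sum(ranks)
    # Under the null hypothesis each rank is equally likely to be + or -.
    hits = 0
    count = 0
    for signs in product((0, 1), repeat=len(ranks)):
        wp = sum(r for r, s in zip(ranks, signs) if s)
        if min(wp, total - wp) <= observed:
            hits += 1
        count += 1
    return w_plus, w_minus, hits / count


# Hypothetical subscores for one section: two reviewers x two personas.
chatbot_a = [8, 6, 9, 7]
chatbot_b = [5, 5, 7, 3]
print(paired_t_statistic(chatbot_a, chatbot_b))
print(wilcoxon_signed_rank(chatbot_a, chatbot_b))
```

With only four paired observations per comparison, the smallest exact two-sided p value the signed-rank test can produce is 0.125, which is one reason a sketch like this reports the test statistics alongside the p value.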

Results:

Chatbots consistently scored highly on the background information, conversational capabilities, therapeutic alliance and boundaries, and accessibility sections. Scores were low for the therapeutic orientation and the monitoring and risk evaluation sections. For the training data and knowledge base sections, the chatbots did not provide transparent information. Except for the privacy and harm section, there were no differences in subscores between the chatbots.

Conclusions:

The CAPE framework offers a robust and reliable method for assessing the quality of psychotherapy chatbots, enabling users to make informed choices based on their specific needs and preferences. Our evaluation revealed that while the popular chatbots on OpenAI’s GPT store were effective at developing rapport and were easily accessible, they failed to address essential safety and privacy functions adequately.

Citation

Please cite as:

Sobowale K, Humphrey DK

Evaluating the Quality of Psychotherapy Conversational Agents: Framework Development and Cross-Sectional Study

JMIR Form Res 2025;9:e65605

DOI: 10.2196/65605

PMID: 40600851

PMCID: 12239686


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Formative Research

Date Submitted: Aug 20, 2024

Date Accepted: Jan 19, 2025
