
Accepted for/Published in: JMIR Mental Health

Date Submitted: Dec 5, 2024
Date Accepted: Mar 30, 2025

The final, peer-reviewed published version of this preprint can be found here:

A Comparison of Responses from Human Therapists and Large Language Model–Based Chatbots to Assess Therapeutic Communication: Mixed Methods Study

Scholich T, Barr M, Wiltsey Stirman S, Raj S

JMIR Ment Health 2025;12:e69709

DOI: 10.2196/69709

PMID: 40397927

PMCID: 12138294

Can Chatbots Offer What Therapists Do? A Mixed Methods Comparison Between Responses From Therapists and LLM-Based Chatbots

  • Till Scholich; 
  • Maya Barr; 
  • Shannon Wiltsey Stirman; 
  • Shriti Raj

ABSTRACT

Background:

Consumers are increasingly using large language model (LLM)-based chatbots to seek mental health advice or intervention because of their ease of access and the limited availability of mental health professionals. However, the suitability and safety of these chatbots for mental health applications remain underexplored, particularly in comparison to professional therapeutic practice.

Objective:

This study aimed to evaluate how general-purpose chatbots respond to mental health scenarios and compare their responses to those provided by licensed therapists. Specifically, we sought to identify chatbots’ strengths, limitations, and the ethical and practical considerations necessary for their use in mental health care.

Methods:

We conducted a mixed methods study comparing responses from chatbots and licensed therapists to scripted mental health scenarios. We created 2 fictional scenarios and prompted 3 chatbots with each, creating 6 interaction logs. We recruited 17 therapists and conducted study sessions consisting of 3 activities. First, therapists responded to the 2 scenarios using a Qualtrics form. Second, therapists went through the 6 interaction logs using a think-aloud procedure to share their thoughts about the chatbots' responses. Last, we conducted a semistructured interview to explore their opinions on the use of chatbots for supporting mental health. The study sessions were analyzed using thematic analysis. The chatbot and therapist responses were coded using the Multitheoretical List of Therapeutic Interventions (MULTI) codes and then compared to each other.

Results:

We identified 7 themes describing the strengths and limitations of the chatbots compared to therapists: elements of good therapy in chatbots' responses, conversational style of chatbots, insufficient inquiry and feedback seeking by chatbots, chatbot interventions, client engagement, chatbots' responses to crisis situations, and considerations for chatbot-based therapy. In the MULTI coding, therapists evoked more elaboration than the chatbots (t = 4.50, p = 0.001) and used self-disclosure somewhat more often, although this difference was not statistically significant (t = 1.05, p = 0.31). The chatbots used reassuring language more often than the therapists (t = 2.29, p = 0.03), with a nonsignificant trend toward more affirming language (t = 1.71, p = 0.10). The chatbots also employed psychoeducation (t = 2.69, p = 0.01) and suggestions (t = 4.23, p = 0.001) more often than the therapists did.

Conclusions:

Our study demonstrates that general-purpose chatbots are not suited to safely engage in mental health conversations, particularly in crisis situations. While the chatbots displayed elements of good therapy, such as validation and reassurance, their overuse of directive advice without sufficient inquiry and their reliance on generic interventions make them unsuitable as therapeutic agents. Careful research and evaluation will be necessary to determine the impact of chatbot interactions and to identify the most appropriate mental health use cases.



© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft other than for review purposes.