Currently submitted to: JMIR Mental Health
Date Submitted: Apr 1, 2026
Open Peer Review Period: Apr 1, 2026 - May 27, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Large language models as a new tool for iCBT therapists? A blinded clinician rating experiment
ABSTRACT
Background:
Internet-based cognitive behavioral therapy (iCBT) is an effective and scalable alternative to face-to-face psychotherapy, but its reach is constrained by the time therapists spend reviewing patient input and manually drafting written responses. Studies suggest that large language models (LLMs) may be capable of generating high-quality therapeutic text and may support therapists in delivering high-quality care while increasing the number of patients each therapist can serve. The suitability of LLMs in an iCBT setting, however, remains insufficiently studied.
Objective:
This study aims to evaluate the quality of LLM-generated iCBT responses to patient messages by comparing them to the quality of responses produced by humans.
Methods:
In a pre-registered blinded clinician rating experiment, experienced clinicians assessed the quality of human-produced versus LLM-generated therapist responses within a simulated iCBT treatment for functional somatic disorder. Raters were presented with stimulus material consisting of five fictitious patient messages, each paired with one human-written and one LLM-generated response. Raters assessed message/response pairs on five quality dimensions (overall quality, helpfulness, empathy, professionalism, and protocol adherence) and were asked to indicate the source of each response (human/LLM). Analyses were primarily descriptive, supplemented by exploratory statistical tests and descriptive thematic content analysis of open-ended text fields. The full pre-registered study protocol is available at https://osf.io/yxncv/.
Results:
A total of 61 raters provided data, of whom 54 were eligible and included in the analysis. Human- and LLM-generated responses were rated similarly across quality dimensions on a 1-5 scale: overall quality (LLM: M = 4.00 vs human: M = 3.96, d = 0.06), helpfulness (LLM: M = 3.85 vs human: M = 3.93, d = 0.13), professionalism (LLM: M = 4.25 vs human: M = 4.11, d = 0.24), and protocol adherence (LLM: M = 4.13 vs human: M = 4.13, d = 0.03). LLM-generated responses, however, received higher scores on empathy (LLM: M = 4.31 vs human: M = 4.08, d = 0.42). Raters identified the source of human-written responses more accurately (79%) than that of LLM-generated responses (63%). In all, 55% of raters responded to one or more open text fields. Qualitative analysis indicated that LLM-generated responses were perceived as polished but also generic and at times excessively empathetic.
Conclusions:
LLM-generated responses were judged to be of comparable quality to those written by human therapists, though qualitative feedback indicated they were at times generic and insufficiently challenging. These findings provide initial support for the feasibility of using LLMs as therapist-support tools in iCBT, but further research is needed to determine whether their integration yields tangible clinical and organizational benefits.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.