Currently accepted at: JMIR Mental Health
Date Submitted: Aug 27, 2025
Open Peer Review Period: Aug 28, 2025 - Oct 23, 2025
Date Accepted: Dec 25, 2025
This paper has been accepted and is currently in production.
It will appear shortly under DOI 10.2196/82642.
This is the final accepted version (not yet copyedited).
Effectiveness of a fully automated mobile therapeutic versus a general chatbot in reducing depression and anxiety and improving well-being: A pilot study
ABSTRACT
Background:
Given the increasing prevalence of depression and anxiety disorders and enduring barriers to care, there is a critical need for alternative treatment options. Generative AI chatbots show promise for increasing access to mental health care, though more direct research is needed to establish their efficacy.
Objective:
This pilot study tested the efficacy of a generative mental health chatbot rooted in solution-focused therapy, compared to the general-purpose ChatGPT and an assessment-only control (AOC) group, on depression, anxiety, and well-being.
Methods:
A total of 185 English-speaking adults recruited online were randomly assigned to one of three groups: AI therapy, ChatGPT, or AOC. Of these, 147 eligible participants completed a pretreatment assessment. Over a three-week period, the AI therapy group (n=44) was instructed to complete three structured, fully automated app-based sessions per week (9 total), while the ChatGPT group (n=60) was instructed to engage in 9 unstructured conversations with ChatGPT (GPT-4o-based models). The control group (n=43) received no intervention. All sessions were completed by 39% of the AI therapy group and 62% of the ChatGPT group. Primary outcome measures, self-assessed online at baseline and post-intervention, were the PHQ-9 and ODSIS (depression), GAD-7 (anxiety), and WHO-5 (well-being). Data were analyzed with linear mixed-effects models.
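The participant flow reported above can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts and percentages stated in the Methods. It assumes that the completion percentages refer to the pretreatment group sizes, which the abstract does not state explicitly, so the completer counts are estimates rather than reported figures. All variable names are illustrative.

```python
# Counts as reported in the Methods.
randomized = 185
baseline_n = {"AI therapy": 44, "ChatGPT": 60, "AOC": 43}

# Reported share of each intervention group that completed all 9 sessions.
completion_rate = {"AI therapy": 0.39, "ChatGPT": 0.62}

# Participants with a pretreatment assessment, summed across groups.
total_baseline = sum(baseline_n.values())

# Share of randomized participants who completed the pretreatment assessment.
baseline_share = total_baseline / randomized

# ASSUMPTION: percentages use the pretreatment n as denominator. They are
# rounded in the source, so these counts are approximations only.
approx_completers = {
    group: round(baseline_n[group] * rate)
    for group, rate in completion_rate.items()
}

print(total_baseline)
print(f"{baseline_share:.1%}")
print(approx_completers)
```

The three group sizes sum to the 147 participants with a pretreatment assessment, which is roughly four fifths of the 185 who were randomized.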
Results:
Compared to AOC, both the AI therapy group (d = -0.47, P = 0.012) and the ChatGPT group (d = -0.44, P = 0.023) demonstrated significant reductions in depression scores measured by PHQ-9. The AI therapy group showed non-significant reductions in anxiety (d = -0.37, P = 0.106) and ODSIS depression scores (d = -0.25, P = 0.220), and a non-significant increase in well-being (d = 0.12, P = 0.525), compared to AOC. Similarly, the ChatGPT group showed non-significant reductions in anxiety (d = -0.27, P = 0.220) and ODSIS depression scores (d = -0.12, P = 0.530), and a non-significant increase in well-being (d = 0.20, P = 0.285), compared to AOC. The AI therapy group did not significantly outperform the ChatGPT group on any outcome (PHQ-9: b = -0.19, d = 0.03, P = 0.874; GAD-7: b = -0.57, d = -0.11, P = 0.621; ODSIS: b = -0.59, d = -0.13, P = 0.498; WHO-5: b = -0.38, d = -0.07, P = 0.691).
Conclusions:
Both the structured generative AI chatbot and ChatGPT were associated with significantly greater reductions in PHQ-9 depression scores than the control group. No significant effects were observed on the other outcomes, although descriptive trends indicated improvements in anxiety. While the AI therapy group showed descriptively better outcomes for depression and anxiety than the ChatGPT group, the differences between the two were not significant. A larger sample and a longer intervention may be needed for the emerging trends to yield clinically meaningful effect sizes.
Clinical Trial:
OSF Registries https://osf.io/r76ef
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.