Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: May 28, 2025
Date Accepted: Oct 8, 2025
Date Submitted to PubMed: Oct 10, 2025

The final, peer-reviewed published version of this preprint can be found here:

Du Q, Ren Y, Meng Zl, He H, Meng S

The Efficacy of Rule-Based Versus Large Language Model-Based Chatbots in Alleviating Symptoms of Depression and Anxiety: Systematic Review and Meta-Analysis

J Med Internet Res 2025;27:e78186

DOI: 10.2196/78186

PMID: 41073272

PMCID: 12677872

The Efficacy of Rule-based Versus LLM-based Chatbots in Alleviating Symptoms of Depression and Anxiety: A Systematic Review and Meta-Analysis

  • Qiuxue Du
  • Yongliang Ren
  • Ze-long Meng
  • Han He
  • Shasha Meng

ABSTRACT

Background:

The global mental health crisis is worsening. Shortages of mental health professionals, high treatment costs, and limited access to services create an urgent need for scalable, low-cost interventions. In recent years, chatbots based on large language models (LLMs) have shown potential for delivering convenient and scalable psychological interventions. However, the difference in efficacy between LLM-based and rule-based chatbots has not been systematically evaluated: few studies compare the two directly, and existing meta-analyses have notable limitations. Intervention designs are highly heterogeneous across studies (for example, in dialogue structure, interaction frequency, and duration of effects), and the differential effects on depressive versus anxiety symptoms have not been compared directly, which makes it difficult to integrate conclusions.

Objective:

This study integrates empirical studies from the past five years to evaluate differences in effectiveness between LLM-based and rule-based chatbots in alleviating depressive and anxiety symptoms. It also analyzes the potential influence of control group type, intervention duration, and age on intervention outcomes, and it breaks down chatbot functions and their intervention effects for comparative analysis. The aim is to provide evidence-based guidance on choosing a technological pathway, along with optimization recommendations, for differentiated interventions for depression and anxiety.

Methods:

A systematic search of eight databases identified 15 eligible studies published between 2020 and 2025. Standardized mean differences (SMDs) were calculated as Hedges' g, and robust variance estimation (RVE) was used to account for non-independent effect sizes. Because clinical and methodological heterogeneity among studies was expected, a random-effects model was selected in advance; the pooled effect size was estimated using restricted maximum likelihood (REML) and interpreted according to Cohen's criteria. Publication bias was assessed using the RVE-adjusted Egger test, funnel plot asymmetry, and a fail-safe N.
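As a rough illustration of the effect-size step described above, the sketch below computes Hedges' g from two-group summary statistics and pools several estimates under a random-effects model. It is not the authors' pipeline: for brevity it uses the DerSimonian-Laird estimator of between-study variance rather than REML, and it does not implement RVE. The function names, the sign convention (control mean minus intervention mean, so that a positive g means lower symptom scores in the intervention group), and the example numbers are all assumptions made for illustration.

```python
import math


def hedges_g(m_ctrl, sd_ctrl, n_ctrl, m_int, sd_int, n_int):
    """Return (g, variance) for two independent groups.

    Assumed sign convention: control minus intervention, so a positive g
    means the intervention group scored lower on the symptom scale.
    """
    df = n_ctrl + n_int - 2
    sd_pooled = math.sqrt(
        ((n_ctrl - 1) * sd_ctrl ** 2 + (n_int - 1) * sd_int ** 2) / df
    )
    d = (m_ctrl - m_int) / sd_pooled          # Cohen's d
    j = 1 - 3 / (4 * df - 1)                  # small-sample correction factor
    n_total = n_ctrl + n_int
    var_d = n_total / (n_ctrl * n_int) + d ** 2 / (2 * n_total)
    return j * d, j ** 2 * var_d


def pool_random_effects(effects, variances):
    """Pool effect sizes with DerSimonian-Laird random-effects weights.

    Returns (pooled estimate, standard error, tau^2). This is a simpler
    stand-in for REML, used here only to keep the sketch short.
    """
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    k = len(effects)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, se, tau2


if __name__ == "__main__":
    # Hypothetical summary statistics, not taken from any included study:
    # (control mean, control SD, control n, intervention mean, SD, n)
    studies = [
        (12.0, 4.0, 30, 10.5, 4.2, 32),
        (15.0, 5.0, 25, 14.0, 5.5, 24),
        (9.0, 3.0, 40, 8.6, 3.1, 41),
    ]
    results = [hedges_g(*s) for s in studies]
    pooled, se, tau2 = pool_random_effects(
        [g for g, _ in results], [v for _, v in results]
    )
    print(f"pooled g = {pooled:.3f}, SE = {se:.3f}, tau^2 = {tau2:.3f}")
```

When a single study contributes several effect sizes, as the use of RVE in this review implies, the plain pooling above would understate the uncertainty, which is the problem RVE is meant to address.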

Results:

For depression, rule-based chatbots achieved a small, statistically significant effect (g = 0.266, 95% CI [0.02, 0.512], p = 0.039), while LLM-based chatbots had a larger but nonsignificant effect (g = 0.407, 95% CI [-0.734, 1.55], p = 0.169). For anxiety, rule-based chatbots had no significant effect (g = 0.147, 95% CI [-0.073, 0.367], p = 0.152), while LLM-based chatbots showed a larger but still nonsignificant effect (g = 0.711, 95% CI [-0.334, 1.76], p = 0.127). Subgroup analysis showed that rule-based chatbots were more effective than the control for depression, with the greatest effect in the medium term (4-8 weeks). LLM-based chatbots showed greater potential for anxiety and were more effective in the short term (<4 weeks).
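To make the reading of these estimates concrete, the following sketch labels each reported g with Cohen's conventional thresholds (0.2 small, 0.5 medium, 0.8 large) and checks whether its 95% CI excludes zero. The effect sizes and intervals are copied from the results above; the helper names and the "negligible" label for values below 0.2 are assumptions made for illustration, not terminology from the paper.

```python
def cohen_label(g):
    """Classify an effect size using Cohen's conventional thresholds."""
    size = abs(g)
    if size < 0.2:
        return "negligible"  # assumed label for values below the "small" cutoff
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


def ci_excludes_zero(lower, upper):
    """True when the whole interval lies on one side of zero."""
    return lower > 0 or upper < 0


# (outcome, chatbot type) -> (g, CI lower bound, CI upper bound), as reported
REPORTED = {
    ("depression", "rule-based"): (0.266, 0.02, 0.512),
    ("depression", "LLM-based"): (0.407, -0.734, 1.55),
    ("anxiety", "rule-based"): (0.147, -0.073, 0.367),
    ("anxiety", "LLM-based"): (0.711, -0.334, 1.76),
}


def summarize(reported):
    """Map each estimate to (Cohen label, whether the CI excludes zero)."""
    return {
        key: (cohen_label(g), ci_excludes_zero(lower, upper))
        for key, (g, lower, upper) in reported.items()
    }


if __name__ == "__main__":
    for (outcome, kind), (label, clear) in summarize(REPORTED).items():
        note = "CI excludes zero" if clear else "CI includes zero"
        print(f"{outcome:<10} {kind:<10} {label:<10} {note}")
```

Only the rule-based depression estimate has an interval that excludes zero, so the larger LLM-based point estimates are best read as suggestive rather than established.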

Conclusions:

Rule-based interventions appear better suited to depressive symptoms, while LLM-based interventions appear better suited to anxiety symptoms. However, the difference in intervention effects between the two technical pathways did not reach statistical significance. LLM-based chatbots may have greater potential, but larger samples are needed in future studies to verify the robustness of their effects.

Clinical Trial:

This study was preregistered on the PROSPERO platform (registration number: CRD420250653433).


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.