Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Aug 20, 2025
Date Accepted: Jan 10, 2026
Text-Based Depression Estimation Using Machine Learning with Standard Labels: A Systematic Review and Meta-Analysis
ABSTRACT
Background:
Depression affects people's daily lives and can even lead to suicidal behavior. Text-based depression estimation using natural language processing (NLP) has emerged as a feasible approach for early mental health screening. However, many existing reviews included studies with weak depression labels, which undermines the reliability of their findings and limits the practical application of automatic depression estimation (ADE) models.
Objective:
This review aimed to evaluate the predictive performance of text-based depression models that used standard labels, and to identify how text resource, text representation, model architecture, annotation source, and reporting quality contribute to performance heterogeneity.
Methods:
Following PRISMA guidelines, we systematically searched four main databases (PubMed, Scopus, IEEE Xplore, and Web of Science) for studies published between 2014 and 2025. Studies were eligible if they developed machine learning models based on participant-generated text and used validated scales or clinical diagnoses as depression labels. Pooled effect sizes (r) were calculated using random-effects meta-analysis in Comprehensive Meta-Analysis software (version 4.0). Subgroup and meta-regression analyses explored potential moderators.
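The random-effects pooling of correlations described above can be sketched with a standard Fisher z-transform plus DerSimonian-Laird approach. This is a minimal illustration only: the study values below are hypothetical, and the review itself used Comprehensive Meta-Analysis software rather than this code.

```python
import math

# Hypothetical per-model data: (correlation r, sample size n).
# These values are illustrative, not taken from the review.
studies = [(0.55, 120), (0.62, 200), (0.48, 90), (0.70, 150)]

# Fisher z-transform each correlation; the variance of z is 1/(n - 3).
zs = [0.5 * math.log((1 + r) / (1 - r)) for r, n in studies]
vs = [1.0 / (n - 3) for r, n in studies]

# Fixed-effect weights and the Q statistic (heterogeneity).
w = [1.0 / v for v in vs]
z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sum(w)
Q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))

# DerSimonian-Laird estimate of between-study variance tau^2.
df = len(studies) - 1
c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
tau2 = max(0.0, (Q - df) / c)

# Random-effects weights and pooled estimate on the z scale.
w_re = [1.0 / (v + tau2) for v in vs]
z_re = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
se = math.sqrt(1.0 / sum(w_re))

# Back-transform the pooled z and its 95% CI to the r scale.
pooled_r = math.tanh(z_re)
ci = (math.tanh(z_re - 1.96 * se), math.tanh(z_re + 1.96 * se))
print(f"pooled r = {pooled_r:.3f}, 95% CI ({ci[0]:.3f}, {ci[1]:.3f})")
```

Because the confidence interval is computed on the z scale and then back-transformed, it is asymmetric around the pooled r, matching how intervals such as 0.582 (95% CI 0.487-0.663) are typically reported.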
Results:
We screened 2,047 articles and included 14 models from 10 studies in the meta-analysis. The overall pooled effect size was r = 0.582 (95% CI 0.487-0.663), indicating a strong association. Models using embedding-based features and deep model architectures showed higher predictive performance than those using traditional features and shallow models (r = 0.715 and 0.710; P <.001). Models using clinical diagnoses performed slightly better than those using self-report scales (r = 0.660 vs 0.500; P = .062). Reporting quality, assessed by TRIPOD, was positively associated with model performance (β = 0.077; P <.001), while sample size and positive rate were not significant.
Conclusions:
Text-based depression estimation models trained with standard labels perform well. Embedding-based features and deep model architectures yield better results. Clinical diagnosis labels and transcribed speech tended to yield higher performance, though these differences were not statistically significant. Transparent reporting is essential for model reproducibility and comparison. Clinical Trial: PROSPERO (CRD20251056902)
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.