Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Aug 20, 2025
Date Accepted: Jan 10, 2026

The final, peer-reviewed published version of this preprint can be found here:

Text-Based Depression Estimation Using Machine Learning With Standard Labels: Systematic Review and Meta-Analysis

Zhang S, Zhang J

J Med Internet Res 2026;28:e82686

DOI: 10.2196/82686

PMID: 41671575

PMCID: 12936666

Text-Based Depression Estimation Using Machine Learning with Standard Labels: A Systematic Review and Meta-Analysis

  • Shengming Zhang
  • Jiaxin Zhang

ABSTRACT

Background:

Depression affects people's daily lives and can even lead to suicidal behavior. Text-based depression estimation using natural language processing (NLP) has emerged as a feasible approach for early mental health screening. However, most existing reviews have included studies with weak depression labels, which reduces the reliability of their results and limits the practical application of automatic depression estimation (ADE) models.

Objective:

This review aimed to evaluate the predictive performance of text-based depression models that used standard labels, and to examine whether text source, text representation, model architecture, annotation source, and reporting quality contribute to performance heterogeneity.

Methods:

Following PRISMA guidelines, we systematically searched four major databases (PubMed, Scopus, IEEE Xplore, and Web of Science) for studies published between 2014 and 2025. Studies were eligible if they developed machine learning models on participant-generated text and used validated scales or clinical diagnoses as depression labels. Pooled effect sizes (r) were calculated with a random-effects meta-analysis in Comprehensive Meta-Analysis software version 4.0. Subgroup and meta-regression analyses explored potential moderators.
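To make the pooling step concrete, the following is a minimal sketch of one common way to pool correlations under a random-effects model: Fisher z transformation with the DerSimonian-Laird estimator of between-study variance. It is an illustration only and is not taken from the review; the exact computations performed by Comprehensive Meta-Analysis are not described here, and the function name `pool_correlations` and the example inputs are hypothetical.

```python
import math


def pool_correlations(rs, ns, z_crit=1.96):
    """Pool correlations with a DerSimonian-Laird random-effects model.

    Assumption (not from the source): each r is Fisher z transformed and
    its sampling variance is approximated by 1 / (n - 3).
    """
    zs = [math.atanh(r) for r in rs]           # Fisher z per model
    vs = [1.0 / (n - 3) for n in ns]           # within-study variances
    w = [1.0 / v for v in vs]                  # fixed-effect weights
    sw = sum(w)

    # Fixed-effect mean and Cochran's Q (heterogeneity statistic)
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sw
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))

    # DerSimonian-Laird between-study variance, truncated at zero
    df = len(rs) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights, pooled estimate, and CI on the z scale
    w_re = [1.0 / (v + tau2) for v in vs]
    z_re = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    lo, hi = z_re - z_crit * se, z_re + z_crit * se

    # Back-transform to the correlation scale
    return {
        "r": math.tanh(z_re),
        "ci": (math.tanh(lo), math.tanh(hi)),
        "tau2": tau2,
        "q": q,
    }


# Hypothetical inputs for illustration; these are NOT the review's data.
example = pool_correlations(rs=[0.45, 0.62, 0.70], ns=[120, 85, 200])
print(example)
```

Pooling on the z scale and back-transforming keeps the interval inside (-1, 1), which is why the resulting confidence interval is generally not symmetric around the pooled r.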

Results:

We screened 2047 articles and included 14 models from 10 studies in the meta-analysis. The overall pooled effect size was r = 0.582 (95% CI 0.487–0.663), indicating a strong association. Models using embedding-based features and deep model architectures showed higher predictive performance than those using traditional features and shallow models (r = 0.715 and 0.710, respectively; P <.001). Models labeled with clinical diagnoses showed higher performance than those labeled with self-report scales, although the difference was not statistically significant (r = 0.660 vs 0.500; P = .062). Reporting quality, assessed by TRIPOD, was positively associated with model performance (β = 0.077; P <.001), while sample size and positive rate were not significant moderators.
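The reported interval (0.487 to 0.663) is not symmetric around r = 0.582. That pattern is what one would expect if the interval was computed on the Fisher z scale and then back-transformed, which is an assumption here and is not stated in the abstract. The short check below, with the hypothetical helper `fisher_z_summary`, takes the midpoint of the two limits on the z scale and back-transforms it; under that assumption the result falls within rounding of the reported pooled r.

```python
import math


def fisher_z_summary(r_lo, r_hi, z_crit=1.96):
    """Recover the point estimate and z-scale SE implied by a CI for r.

    Assumes the CI is symmetric on the Fisher z scale (an assumption,
    not something stated in the abstract).
    """
    z_lo, z_hi = math.atanh(r_lo), math.atanh(r_hi)
    z_mid = (z_lo + z_hi) / 2                 # centre of the CI in z units
    se_z = (z_hi - z_lo) / (2 * z_crit)       # implied standard error in z units
    return math.tanh(z_mid), se_z


# Limits taken from the reported 95% CI for the overall pooled effect.
r_mid, se_z = fisher_z_summary(0.487, 0.663)
print(f"implied pooled r = {r_mid:.3f}, implied SE(z) = {se_z:.3f}")
```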

Conclusions:

Text-based depression estimation models trained with standard labels perform well. Embedding features and deep model architectures yield better results. Using clinical diagnosis labels and transcribed speech tends to yield higher performance, though the differences are not statistically significant. Transparent reporting is essential for model reproducibility and comparison.

Clinical Trial:

PROSPERO (CRD20251056902)



© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.