Accepted for/Published in: JMIR Mental Health

Date Submitted: Dec 12, 2024
Date Accepted: Feb 18, 2025

The final, peer-reviewed published version of this preprint can be found here:

Evaluating Generative AI in Mental Health: Systematic Review of Capabilities and Limitations

Wang L, Bhanushali T, Huang Z, Yang J, Badami S, Hightow-Weidman L

Evaluating Generative AI in Mental Health: Systematic Review of Capabilities and Limitations

JMIR Ment Health 2025;12:e70014

DOI: 10.2196/70014

PMID: 40373033

PMCID: 12097452

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Beyond the Buzz: A Systematic Review of Generative AI’s Capabilities in Mental Health

  • Liying Wang
  • Tanmay Bhanushali
  • Zhouran Huang
  • Jingyi Yang
  • Sukriti Badami
  • Lisa Hightow-Weidman

ABSTRACT

Background:

The global shortage of mental health professionals, exacerbated by increasing mental health needs post-COVID-19, has driven interest in leveraging large language models (LLMs) like ChatGPT to address these challenges through applications such as clinical note generation, personalized treatment planning, and therapeutic support.

Objective:

This systematic review aims to evaluate the current capabilities of generative AI (genAI) models in the context of mental health applications.

Methods:

A comprehensive search across five databases yielded 1,046 references, of which eight studies met the inclusion criteria. These criteria required original research with experimental designs (e.g., Turing tests, socio-cognitive tasks, trials, or qualitative methods), a focus on genAI models, and explicit measurement of socio-cognitive abilities (e.g., empathy, emotional awareness), mental health outcomes, and user experience (e.g., perceived trust, empathy).

Results:

The studies, published between 2023 and 2024, primarily evaluated models like ChatGPT 3.5 and 4.0, Bard, and Claude in tasks such as psychoeducation, diagnosis, emotional awareness, and clinical interventions. Most studies employed zero-shot prompting and human evaluators to assess the AI responses, using standardized rating scales or qualitative analysis. However, these methods were often insufficient to fully capture the complexity of genAI capabilities. The reliance on single-shot evaluation techniques, limited comparisons, and task-based assessments isolated from a specific context may oversimplify genAI’s abilities and overlook the nuances of human-AI interaction, especially in areas requiring contextual reasoning or cultural sensitivity. The findings suggest that while genAI models demonstrate strengths in psychoeducation and emotional awareness, their diagnostic accuracy, cultural competence, and ability to engage users emotionally remain limited. Users frequently reported concerns about trustworthiness, accuracy, and the lack of emotional engagement.

Conclusions:

Future research could use more sophisticated evaluation methods, such as few-shot and chain-of-thought prompting, to more fully uncover genAI’s potential. Future studies should also focus on longitudinal research, broader comparisons with human benchmarks, and exploring how AI can be better integrated into mental health care with improved socio-cognitive and ethical decision-making capabilities.
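To make the distinction between the prompting strategies named in this abstract concrete, the following is a minimal illustrative sketch of how zero-shot, few-shot, and chain-of-thought prompts differ in construction. The function names, prompt wording, and example question are hypothetical and are not taken from the reviewed studies; no model is called, only the prompt text is built.

```python
from typing import List, Tuple

# Hypothetical helpers for illustration only; not from the reviewed studies.

def zero_shot(question: str) -> str:
    """Zero-shot: the task is posed with no worked examples."""
    return f"Question: {question}\nAnswer:"

def few_shot(question: str, examples: List[Tuple[str, str]]) -> str:
    """Few-shot: worked question/answer pairs precede the new question."""
    shots = "\n\n".join(f"Question: {q}\nAnswer: {a}" for q, a in examples)
    return f"{shots}\n\nQuestion: {question}\nAnswer:"

def chain_of_thought(question: str) -> str:
    """Chain-of-thought: the prompt asks for step-by-step reasoning first."""
    return (
        f"Question: {question}\n"
        "Reason step by step, then state a final answer.\n"
        "Answer:"
    )

if __name__ == "__main__":
    # Placeholder question and example pair, invented for this sketch.
    q = "How might a listener respond supportively to someone who feels overwhelmed?"
    demo = [("What is psychoeducation?", "Teaching people about mental health conditions and coping.")]
    for prompt in (zero_shot(q), few_shot(q, demo), chain_of_thought(q)):
        print(prompt, end="\n---\n")
```

An evaluation that compares strategies would send each variant to the same model and rate the responses with the same scale, so that differences can be attributed to the prompt rather than to the task.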




© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.