Accepted for/Published in: JMIR Mental Health
Date Submitted: Dec 12, 2024
Date Accepted: Feb 18, 2025
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Beyond the Buzz: A Systematic Review of Generative AI’s Capabilities in Mental Health
ABSTRACT
Background:
The global shortage of mental health professionals, exacerbated by increasing mental health needs post-COVID-19, has driven interest in leveraging large language models (LLMs) like ChatGPT to address these challenges through applications such as clinical note generation, personalized treatment planning, and therapeutic support.
Objective:
This systematic review aims to evaluate the current capabilities of generative AI (genAI) models in the context of mental health applications.
Methods:
A comprehensive search across five databases yielded 1,046 references, of which eight studies met the inclusion criteria. These criteria required original research with experimental designs (e.g., Turing tests, socio-cognitive tasks, trials, or qualitative methods), a focus on genAI models, and explicit measurement of socio-cognitive abilities (e.g., empathy, emotional awareness), mental health outcomes, and user experience (e.g., perceived trust, empathy).
Results:
The studies, published between 2023 and 2024, primarily evaluated models like ChatGPT 3.5 and 4.0, Bard, and Claude in tasks such as psychoeducation, diagnosis, emotional awareness, and clinical interventions. Most studies employed zero-shot prompting and human evaluators to assess the AI responses, using standardized rating scales or qualitative analysis. However, these methods were often insufficient to fully capture the complexity of genAI capabilities. The reliance on single-shot evaluation techniques, limited comparisons, and task-based assessments isolated from a specific context may oversimplify genAI’s abilities and overlook the nuances of human-AI interaction, especially in areas requiring contextual reasoning or cultural sensitivity. The findings suggest that while genAI models demonstrate strengths in psychoeducation and emotional awareness, their diagnostic accuracy, cultural competence, and ability to engage users emotionally remain limited. Users frequently reported concerns about trustworthiness, accuracy, and the lack of emotional engagement.
Conclusions:
Future research could use more sophisticated evaluation methods, such as few-shot and chain-of-thought prompting, to more fully uncover genAI's potential. Future studies should also prioritize longitudinal research, broader comparisons with human benchmarks, and exploration of how AI can be better integrated into mental health care with improved socio-cognitive and ethical decision-making capabilities.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.