Currently accepted at: JMIR Mental Health
Date Submitted: Nov 18, 2025
Open Peer Review Period: Nov 20, 2025 - Jan 15, 2026
Date Accepted: Feb 16, 2026
This paper has been accepted and is currently in production.
It will appear shortly at DOI 10.2196/88057.
The final accepted version (not copyedited yet) is in this tab.
Large Language Models and their Applications in Mental Health: A Scoping Review
ABSTRACT
Background:
Large language models (LLMs) are poised to transform mental healthcare, offering advanced capabilities in diagnosis, prognosis, and decision support. Since their inception, numerous mental health-focused LLMs have emerged in scientific literature, reflecting the growing interest in leveraging these models across various clinical applications. With a broad range of models available, diverse optimization strategies, and multiple use cases, reviewing the current landscape is critical to understanding where future impact lies.
Objective:
We performed a scoping review investigating the use of LLMs in mental health across diagnostic, prognostic, and decision support tasks.
Methods:
We screened 3,121 papers from PubMed, Scopus, and Web of Science for studies published between January 2023 and October 2025, using search terms related to LLMs and mental health. After removing duplicates, two reviewers independently screened the studies, with a third reviewer resolving conflicting opinions. From the selected papers, we extracted and synthesized information on the models, use cases, datasets, and adaptation methods.
Results:
A total of 41 papers were selected. Many studies included evaluations of OpenAI’s GPT series: GPT-4 (24 studies, 58.5%) and GPT-3.5 (15 studies, 36.6%). Others evaluated BERT-derived models (7 studies, 17.1%), LLaMA (8 studies, 19.5%), and RoBERTa-derived models (6 studies, 14.6%). While all studies initially applied out-of-the-box LLMs, several adapted them through few-shot learning or fine-tuning to better align with specific research goals. The most common use case was diagnostics (31 studies, 75.6%), and the most common target condition was depression (11 studies, 26.8%). Although many studies reported superior performance of LLMs, only a minority (13 studies, 29.3%) validated LLM performance against clinician assessments using real patient data; the majority relied on proxy outcomes such as clinical vignettes, exam questions, or social media posts.
Conclusions:
Despite the rapid growth and diversity of LLM applications in mental health, the field remains nascent and exploratory. Future developments must emphasize consistent model adaptation procedures to ensure safety and alignment with clinical workflows. Models must also be evaluated against robust criteria, using standardized protocols and real clinical outcome measures.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.