JMIR Preprints #103517: Generative AI Chatbot Responses to Suicide and Self-Harm: A Systematic Review

Generative AI Chatbot Responses to Suicide and Self-Harm: A Systematic Review

Deirdra Ashleigh Kelly;
Peter J. Franz;
Ayelet Gutman;
Mirabel Sleiman;
Jessica Dorsky;
Harsheen Singh;
Wesley Appel;
Sydney Morgan Wyatt;
Daniel M. Low;
Shannon Wiltsey Stirman;
Gabriela Kattan Khazanov

ABSTRACT

Background:

A growing number of US adults and youth confide in generative artificial intelligence (AI) chatbots for mental health support, including disclosure of suicide and self-harm risk. While the quality, safety, and effectiveness of chatbot responses to risk disclosure have the potential to impact population-level rates of suicide and self-harm, there have been no systematic reviews of this burgeoning literature.

Objective:

We conducted a systematic review of studies evaluating generative AI chatbot responses to disclosure of suicide and self-harm risk.

Methods:

We searched six databases for records from January 2020 to December 2025 and identified empirical studies involving interactions with generative AI chatbots that included discussion of suicide or self-harm. Following deduplication, studies (k = 1,042) were imported into Covidence, and titles and abstracts were independently screened by two reviewers, with discrepancies resolved by a third reviewer. The same procedure was used to screen 126 full texts. Data extraction was led by one reviewer and verified by a second.

Results:

We identified 29 papers (14 published; 15 preprints). Most (k = 20) were solely audit studies evaluating AI chatbot responses to suicide risk disclosure. Two developed chatbots or AI evaluation frameworks, and one was a jailbreaking study (adversarially testing AI systems or attempting to circumvent chatbot safety guardrails). The remaining studies combined approaches. Across studies, proprietary frontier-model chatbots (eg, ChatGPT, Claude) provided higher quality responses to suicide and self-harm risk than open-source chatbots (eg, Llama, DeepSeek) and many AI companions (eg, Replika, Character.AI). All chatbots, not just proprietary models, generally performed well on empathy, validation, and support. However, chatbot responses were often generic and lacked context. Chatbots did not proactively assess risk and performed most poorly when risk disclosure was ambiguous or moderate, frequently failing to recognize implicit risk or escalate to human-delivered services. Furthermore, responses were inconsistent between chatbots and often required multiple conversational turns before providing referrals to crisis resources and human-delivered professional support. While there were few examples of overtly harmful responses under standard conditions, jailbreaking attempts easily led to problematic responses. Finally, no chatbot proactively recommended limiting access to lethal means such as firearms, medications, or sharps.

Conclusions:

Chatbots provide validation and support in response to suicide and self-harm disclosure. Overall, however, their poor risk assessment, delays in referrals to crisis resources and human-delivered support, difficulty detecting jailbreaking attempts, and general lack of adherence to clinical guidelines present safety risks. Findings are limited by the rapid versioning of AI models over time. Research is needed to evaluate stakeholder perspectives on AI chatbot responses to suicide and self-harm risk disclosure. Research should also examine the short- and long-term impact of these responses on clinical outcomes, using follow-up assessments in real-world or clinical settings.

Clinical Trial:

OSF Registries osf.io/9uva3

Citation

Please cite as:

Kelly DA, Franz PJ, Gutman A, Sleiman M, Dorsky J, Singh H, Appel W, Wyatt SM, Low DM, Stirman SW, Khazanov GK

Generative AI Chatbot Responses to Suicide and Self-Harm: A Systematic Review

JMIR Preprints. 05/06/2026:103517

DOI: 10.2196/preprints.103517

URL: https://preprints.jmir.org/preprint/103517

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Currently submitted to: Journal of Medical Internet Research

Date Submitted: Jun 5, 2026

Open Peer Review Period: Jun 6, 2026 - Aug 1, 2026

(currently open for review)
