JMIR Preprints #96433: Boundary Safety in Multi-Turn Mental Health Dialogues With Large Language Models: A Simulation-Based Evaluation Study

Boundary Safety in Multi-Turn Mental Health Dialogues With Large Language Models: A Simulation-Based Evaluation Study

Youyou Cheng;
Zhuangwei Kang;
Kerry Jiang;
Chenyu Sun;
Qiyang Pan;
Tianyu Jiang

ABSTRACT

Background:

Large language models (LLMs) have been widely used for mental health support. However, current safety evaluations in this field are mostly limited to detecting whether LLMs output prohibited words in single-turn conversations, neglecting the gradual erosion of safety boundaries in long dialogues.

Objective:

This study aims to characterize how safety boundaries erode during multi-turn mental health conversations and to compare different pressure mechanisms that accelerate boundary violations.

Methods:

We developed a multi-turn stress-testing framework and conducted long-dialogue safety tests on three cutting-edge LLMs under two pressure modes: static progression and adaptive probing. We generated 50 virtual patient profiles and stress-tested each model for up to 20 turns of simulated psychiatric dialogue.
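The abstract does not describe how the framework is implemented, so the following is only a minimal sketch of how such a stress-test loop could be organized. All names (`run_dialogue`, `static_pressure`, `adaptive_pressure`, `is_violation`) are hypothetical, the model and the violation judge are passed in as plain callables, and stopping at the first violation is an assumption made for illustration rather than a detail confirmed by the authors.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

MAX_TURNS = 20  # cap on dialogue length stated in the Methods

Transcript = List[Tuple[str, str]]  # (user message, model reply) pairs


@dataclass
class DialogueResult:
    breached: bool
    breach_turn: Optional[int]  # 1-indexed turn of first violation, None if none
    transcript: Transcript


def static_pressure(script: List[str]) -> Callable[[int, Transcript], str]:
    """Static progression: a fixed escalation script that ignores model replies."""
    def next_message(turn: int, transcript: Transcript) -> str:
        # Repeat the last scripted message if the script is shorter than the dialogue.
        return script[min(turn - 1, len(script) - 1)]
    return next_message


def adaptive_pressure(probe: Callable[[Transcript], str],
                      opener: str) -> Callable[[int, Transcript], str]:
    """Adaptive probing: each message is chosen after reading the dialogue so far."""
    def next_message(turn: int, transcript: Transcript) -> str:
        return opener if not transcript else probe(transcript)
    return next_message


def run_dialogue(model: Callable[[Transcript, str], str],
                 next_message: Callable[[int, Transcript], str],
                 is_violation: Callable[[str], bool],
                 max_turns: int = MAX_TURNS) -> DialogueResult:
    """Run one simulated dialogue; stop at the first violation (an assumption)."""
    transcript: Transcript = []
    for turn in range(1, max_turns + 1):
        user_msg = next_message(turn, transcript)
        reply = model(transcript, user_msg)
        transcript.append((user_msg, reply))
        if is_violation(reply):
            return DialogueResult(True, turn, transcript)
    return DialogueResult(False, None, transcript)
```

In this sketch the two pressure modes differ only in how the next patient message is produced, which keeps the model, the judge, and the turn cap identical across conditions.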

Results:

Violations were common across all models, and the two pressure modes produced similar violation rates. However, adaptive probing significantly shortened the time to breach, reducing the mean number of turns to breach from 9.21 under static progression to 4.64. Under both modes, the primary form of boundary breach was making definitive or zero-risk promises (certainty reassurance), which accounted for 56.5% of violations in static progression and 48.5% in adaptive probing.
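As a rough illustration of the two summary statistics reported above, the sketch below computes a mean time to breach and per-category violation shares from per-dialogue records. The function names and input shapes are hypothetical, and averaging only over dialogues that actually breached is an assumption; the abstract does not say how non-breaching dialogues were handled.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Optional


def mean_time_to_breach(breach_turns: List[Optional[int]]) -> Optional[float]:
    """Mean turn of first violation over breached dialogues (None = no breach)."""
    observed = [t for t in breach_turns if t is not None]
    return mean(observed) if observed else None


def category_shares(categories: List[str]) -> Dict[str, float]:
    """Percentage of violations falling into each violation category."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {c: 100.0 * n / total for c, n in counts.items()}
```

Under these definitions, a mode can show a similar violation rate to another while still breaching earlier, because the rate counts whether a dialogue breached and the mean counts when.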

Conclusions:

These findings suggest that the robustness of LLM safety boundaries cannot be inferred from single-turn tests alone; evaluations must also account for how different interaction pressures and dialogue characteristics erode those boundaries over extended conversations. Clinical implications include the need for multi-turn safety evaluation protocols and awareness that empathetic responses may gradually drift into boundary violations.

Citation

Please cite as:

Cheng Y, Kang Z, Jiang K, Sun C, Pan Q, Jiang T

Boundary Safety in Multi-Turn Mental Health Dialogues With Large Language Models: A Simulation-Based Evaluation Study

JMIR Preprints. 28/03/2026:96433

DOI: 10.2196/preprints.96433

URL: https://preprints.jmir.org/preprint/96433

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Currently submitted to: JMIR Formative Research

Date Submitted: Mar 28, 2026

Open Peer Review Period: Apr 29, 2026 - Jun 24, 2026

(currently open for review)