Accepted for/Published in: JMIR Mental Health
Date Submitted: Jan 14, 2026
Date Accepted: Mar 4, 2026
It’s the journey, not the destination: Moving from endpoints to trajectories when assessing chatbot mental health safety
ABSTRACT
Large language models are rapidly becoming embedded in everyday life through artificial intelligence (AI) chatbots that people use for practical assistance and companionship, as well as for support with mental health and emotional wellbeing. Alongside clear benefits, clinicians and public reports increasingly describe a minority of users whose interactions seem to drift over days or weeks toward entrenched questionable convictions, delusions, or suicidal crises. Importantly, clinically meaningful deterioration can occur even without overtly unsafe text outputs, via more insidious processes such as compulsive use, sleep disruption, withdrawal from human contact, and progressive narrowing of attention around the chatbot relationship. These patterns suggest that risk often arises not at a single tipping point but through trajectory effects that accumulate across extended dialogue. In this Viewpoint, we argue that prevailing safety evaluation approaches are misaligned with this reality because they primarily score discrete endpoints, often reached through scripted dialogues lasting only one or a few turns. Mental health benchmarks and safety suites (including clinician-informed efforts) have advanced the field by testing refusal behaviour, toxicity, and adversarial prompting, but they often treat the last message as the unit of analysis and therefore miss when risk-relevant relational cues, signs of validation, contradiction handling, and shifts in certainty first emerge and how they compound. We propose that mental health safety assessment should shift from endpoints to trajectories by 1) treating the whole dialogue, not only its end point, as the unit of evaluation; 2) reporting turn-by-turn dynamics such as delusion confirmation and harm enablement, as well as the timing and persistence of safety interventions; and 3) calibrating short multi-turn tests against longer, clinically realistic interaction sequences that can reveal context-length effects and drift. We further argue that transcript-only evaluation is insufficient in mental health contexts: similar language can reflect very different internal states, and the relationship between expressed psychopathology and real-world harm is non-linear. Safety research should therefore incorporate proximal human outcomes measured after interactions (e.g., shifts in certainty, openness to counterevidence, arousal, urge to continue, and subsequent sleep or behaviour) and build prospective clinical surveillance infrastructure that supports consented transcript donation and linkage to health outcomes. Together, these steps would enable benchmarks that are clinically relevant and better aligned with the kinds of harms now being observed in real-world chatbot use.
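To make the proposed turn-by-turn reporting concrete, the sketch below shows one way an evaluator might summarise an annotated dialogue with trajectory metrics such as delusion-confirmation rate, harm-enablement rate, and the timing and persistence of safety interventions. This is a minimal illustration, not the article's method: the class, function, and field names (TurnAnnotation, trajectory_metrics, confirms_delusion, and so on) are hypothetical, and any real annotation scheme would require a clinician-informed rubric.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class TurnAnnotation:
    """Hypothetical labels assigned to a single assistant turn by a
    human rater or judge model; each construct would need a
    clinician-informed operational definition."""
    confirms_delusion: bool      # turn validates a questionable conviction
    enables_harm: bool           # turn supplies harm-facilitating content
    safety_intervention: bool    # turn de-escalates or signposts support

def trajectory_metrics(turns: List[TurnAnnotation]) -> dict:
    """Summarise the whole dialogue rather than scoring only its endpoint."""
    n = len(turns)
    # Timing: index of the first turn containing a safety intervention.
    first_intervention: Optional[int] = next(
        (i for i, t in enumerate(turns) if t.safety_intervention), None
    )
    # Persistence: once safety behaviour appears, what fraction of the
    # remaining turns sustain it? (None if it never appears.)
    persistence = None
    if first_intervention is not None:
        later = turns[first_intervention:]
        persistence = sum(t.safety_intervention for t in later) / len(later)
    return {
        "n_turns": n,
        "delusion_confirmation_rate": sum(t.confirms_delusion for t in turns) / n,
        "harm_enablement_rate": sum(t.enables_harm for t in turns) / n,
        "first_intervention_turn": first_intervention,
        "intervention_persistence": persistence,
    }

# Example: an endpoint-only score would pass this dialogue (the final turn
# is safe), but the trajectory reveals early validation and a late,
# single intervention.
dialogue = [
    TurnAnnotation(confirms_delusion=True,  enables_harm=False, safety_intervention=False),
    TurnAnnotation(confirms_delusion=True,  enables_harm=True,  safety_intervention=False),
    TurnAnnotation(confirms_delusion=False, enables_harm=False, safety_intervention=True),
]
print(trajectory_metrics(dialogue))
```

Reporting timing and persistence as separate quantities distinguishes a model that never intervenes from one that intervenes once and then drifts back into validation, a difference that an endpoint-only score cannot detect.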
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.