Accepted for/Published in: JMIR Mental Health
Date Submitted: Apr 2, 2026
Date Accepted: May 4, 2026
Date Submitted to PubMed: May 4, 2026
When AI Colludes: Clinical Reliability of Training and Preference Data as a Trustworthy-AI Criterion
ABSTRACT
The growing literature on artificial intelligence (AI) and mental health has focused predominantly on the downstream risks of AI interaction with vulnerable users and the potential of AI to deliver psychological interventions. This Viewpoint argues that a more fundamental question has received insufficient attention: how reliable is the human-generated data on which AI systems were trained? Large language models learn from text produced by humans subject to well-documented cognitive biases, including anchoring, confirmation bias, catastrophising, and selective abstraction. In clinical populations, these distortions are further amplified by the illness itself. Recent empirical evidence demonstrates that users prefer AI responses that distort reality over those that challenge it, creating a feedback loop that rewards the reproduction of cognitive bias. This paper introduces the clinical concept of collusion to the AI safety conversation, arguing that AI systems optimised for user approval collude with distorted input by default. It proposes three immediate steps: incorporating clinical expertise into training data evaluation, introducing routine clinical enquiry about AI use, and integrating the clinical validity of training data into AI governance frameworks.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.