JMIR Preprints #96894: When AI Colludes: Clinical Reliability of Training and Preference Data as a Trustworthy-AI Criterion


When AI Colludes: Clinical Reliability of Training and Preference Data as a Trustworthy-AI Criterion

Hina Tahseen

ABSTRACT

The growing literature on artificial intelligence (AI) and mental health has focused predominantly on the downstream risks of AI interaction with vulnerable users and the potential of AI to deliver psychological interventions. This Viewpoint argues that a more fundamental question has received insufficient attention: how reliable is the human-generated data on which AI systems were trained? Large language models learn from text produced by humans subject to well-documented cognitive biases, including anchoring, confirmation bias, catastrophising, and selective abstraction. In clinical populations, these distortions are further amplified by the illness itself. Recent empirical evidence demonstrates that users prefer AI responses that distort reality over those that challenge it, creating a feedback loop that rewards the reproduction of cognitive bias. This paper introduces the clinical concept of collusion to the AI safety conversation, arguing that AI systems optimised for user approval collude with distorted input by default. It proposes three immediate steps: incorporating clinical expertise into training data evaluation, routine clinical enquiry about AI use, and integrating the clinical validity of training data into AI governance frameworks.

Citation

Please cite as:

Tahseen H

When AI Colludes: Clinical Reliability of Training and Preference Data as a Trustworthy-AI Criterion

JMIR Ment Health 2026;13:e96894

DOI: 10.2196/96894

PMID: 42076921

PMCID: 13205454


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Mental Health

Date Submitted: Apr 2, 2026

Date Accepted: May 4, 2026

Date Submitted to PubMed: May 4, 2026
