Accepted for/Published in: JMIR Mental Health

Date Submitted: Apr 2, 2026
Date Accepted: May 4, 2026
Date Submitted to PubMed: May 4, 2026

The final, peer-reviewed published version of this preprint can be found here:

When AI Colludes: Clinical Reliability of Training and Preference Data as a Trustworthy-AI Criterion

Tahseen H

JMIR Ment Health 2026;13:e96894

DOI: 10.2196/96894

PMID: 42076921

When AI Colludes: Clinical Reliability of Training and Preference Data as a Trustworthy-AI Criterion

Hina Tahseen

ABSTRACT

The growing literature on artificial intelligence (AI) and mental health has focused predominantly on the downstream risks of AI interaction with vulnerable users and on the potential of AI to deliver psychological interventions. This Viewpoint argues that a more fundamental question has received insufficient attention: how reliable is the human-generated data on which AI systems are trained? Large language models learn from text produced by humans who are subject to well-documented cognitive biases, including anchoring, confirmation bias, catastrophising, and selective abstraction. In clinical populations, the illness itself further amplifies these distortions. Recent empirical evidence demonstrates that users prefer AI responses that distort reality over those that challenge it, creating a feedback loop that rewards the reproduction of cognitive bias. This paper introduces the clinical concept of collusion into the AI safety conversation, arguing that AI systems optimised for user approval collude with distorted input by default. It proposes three immediate steps: incorporating clinical expertise into the evaluation of training data, establishing routine clinical enquiry about AI use, and integrating the clinical validity of training data into AI governance frameworks.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.