Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Oct 13, 2025
Open Peer Review Period: Oct 13, 2025 - Dec 8, 2025
Date Accepted: May 4, 2026
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Evaluating Patient Cognitive Bias in Large Language Model–Supported Health Consultations: A Simulation-Based Comparative Study
ABSTRACT
Background:
Large language models (LLMs) are increasingly used by the public, including patients, for health information and preliminary medical advice. However, users often interact with these systems through preconceived diagnoses or selectively framed symptom descriptions—a form of cognitive bias that alters the information provided to the model and poses a potential risk to diagnostic reliability in LLM-supported consultations.
Objective:
To quantify the effect of user cognitive bias on LLM diagnostic performance, assess the effectiveness of prompt-based mitigation strategies and temperature settings, and evaluate a dual-system framework inspired by dual-process cognitive theory.
Methods:
We developed a simulated patient agent to generate unbiased and confirmation-biased consultations using 1,273 MedQA-USMLE cases. Six LLMs of varying capacities were evaluated through multi-turn dialogues. Diagnostic accuracy was the primary outcome; bias-induced accuracy decline (BIAD, loss in accuracy under bias) and bias-influenced error proportion (BIEP, fraction of errors aligned with user misconceptions) were secondary metrics. Four prompt-based mitigation strategies, four temperature settings, and a dual-system framework—pairing a foundation model (System 1) with a reasoning model (System 2, o1-Mini)—were tested.
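The two secondary metrics can be illustrated with a minimal Python sketch. It assumes BIAD is the unbiased accuracy minus the biased accuracy, in percentage points, and BIEP is the share of errors under bias whose answer matches the simulated patient's mistaken belief. These readings follow the parenthetical definitions above; the exact formulas in the full paper may differ, and the `CaseResult` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CaseResult:
    """One case evaluated twice (hypothetical record layout)."""
    correct_answer: str   # reference answer for the case
    biased_belief: str    # incorrect option the simulated patient favors
    answer_unbiased: str  # model answer in the unbiased consultation
    answer_biased: str    # model answer in the confirmation-biased consultation


def accuracy(results: List[CaseResult], biased: bool) -> float:
    """Fraction of cases answered correctly in the chosen condition."""
    hits = sum(
        (r.answer_biased if biased else r.answer_unbiased) == r.correct_answer
        for r in results
    )
    return hits / len(results)


def biad(results: List[CaseResult]) -> float:
    """Bias-induced accuracy decline, in percentage points (assumed definition)."""
    return 100.0 * (accuracy(results, biased=False) - accuracy(results, biased=True))


def biep(results: List[CaseResult]) -> float:
    """Share of errors under bias that match the patient's belief (assumed definition)."""
    errors = [r for r in results if r.answer_biased != r.correct_answer]
    if not errors:
        return 0.0  # no errors under bias, so no bias-aligned errors
    aligned = sum(r.answer_biased == r.biased_belief for r in errors)
    return aligned / len(errors)


# Toy data for illustration only; these are not results from the study.
toy_results = [
    CaseResult("A", "B", "A", "B"),  # bias flips a correct answer to the belief
    CaseResult("C", "D", "C", "C"),  # model resists the bias
    CaseResult("B", "A", "B", "C"),  # error under bias, not aligned with belief
    CaseResult("D", "A", "A", "A"),  # wrong in both conditions, aligned under bias
]
```

On the toy data, `biad(toy_results)` compares 3 of 4 correct without bias against 1 of 4 correct with bias, and `biep(toy_results)` counts 2 belief-aligned errors among the 3 errors made under bias.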
Results:
User cognitive bias significantly reduced diagnostic accuracy by 10–40 percentage points (P < .001), with smaller models occasionally performing near chance. Errors frequently reflected user misconceptions (BIEP > 33%). Prompt and temperature adjustments yielded limited or inconsistent improvement, whereas the dual-system framework increased accuracy by 10–39 points and recovered most or all of the performance lost under bias (P < .001).
Conclusions:
User cognitive bias represents a new behavioral dimension of risk in LLM-supported healthcare. Quick fixes such as prompt engineering or temperature control offer limited resilience. Integrating a dual-system reasoning framework provides a scalable path toward safer and more bias-aware medical AI.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.