Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Oct 13, 2025
Open Peer Review Period: Oct 13, 2025 - Dec 8, 2025
Date Accepted: May 4, 2026
Patient Cognitive Bias in Large Language Model–Supported Health Consultations: A Simulation-Based Comparative Study
ABSTRACT
Background:
Large language models (LLMs) are increasingly used by patients for health information and preliminary medical advice. In patient-facing consultations, users may present explicitly stated diagnostic preferences or symptom narratives emphasizing a preferred explanation. Such cognitively biased input constrains the diagnostic context available to the model and may systematically steer its reasoning during interactive LLM-supported health consultations.
Objective:
To quantify the impact of patient cognitive bias on LLM diagnostic performance in multi-turn consultations, to assess the effectiveness of prompt-based mitigation strategies and decoding temperature adjustment, and to evaluate a dual-system framework for improving robustness under biased interaction.
Methods:
We developed a simulated patient agent to generate both unbiased and cognitively biased consultations using 1,273 MedQA-USMLE cases. Six widely used LLMs of varying capacity were evaluated through three-round, multi-turn dialogues, after which each model produced a final diagnostic judgment based on the complete consultation record. Diagnostic accuracy was the primary outcome. Secondary outcomes included bias-induced accuracy decline (BIAD; absolute reduction in accuracy under biased versus standard consultations) and bias-influenced error proportion (BIEP; proportion of incorrect responses aligned with the patient’s preferred but incorrect diagnosis). Four prompt-based mitigation strategies and four decoding temperature settings were tested. In addition, a dual-system framework was evaluated, in which a conversational foundation LLM conducted patient interaction and history taking (System 1), while a reasoning-oriented LLM (o1-Mini) generated the final diagnostic judgment (System 2). In the foundation-only condition, the same LLM performed both interaction and diagnosis.
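The two secondary outcomes defined above can be made concrete with a minimal sketch. It assumes each case is scored as a single answer label per condition. The `CaseResult` record, its field names, and the toy data are hypothetical illustrations, not the authors' evaluation code or results.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CaseResult:
    """One case scored under both conditions (hypothetical record layout)."""
    correct_answer: str        # reference diagnosis for the case
    patient_preferred: str     # incorrect diagnosis the simulated patient favors
    standard_prediction: str   # model's final answer after the standard consultation
    biased_prediction: str     # model's final answer after the biased consultation


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of predictions that match the reference answers."""
    if not answers:
        raise ValueError("no cases to score")
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)


def biad(results: Sequence[CaseResult]) -> float:
    """Bias-induced accuracy decline: standard accuracy minus biased accuracy."""
    answers = [r.correct_answer for r in results]
    acc_standard = accuracy([r.standard_prediction for r in results], answers)
    acc_biased = accuracy([r.biased_prediction for r in results], answers)
    return acc_standard - acc_biased


def biep(results: Sequence[CaseResult]) -> float:
    """Bias-influenced error proportion: share of biased-condition errors
    that match the patient's preferred (incorrect) diagnosis."""
    errors = [r for r in results if r.biased_prediction != r.correct_answer]
    if not errors:
        return 0.0  # assumption: report 0 when there are no errors to attribute
    aligned = sum(r.biased_prediction == r.patient_preferred for r in errors)
    return aligned / len(errors)


# Toy data for illustration only; these are not study results.
toy_results = [
    CaseResult("A", "B", "A", "B"),  # correct under standard, follows patient under bias
    CaseResult("C", "D", "C", "C"),  # correct in both conditions
    CaseResult("A", "C", "A", "D"),  # wrong under bias, but not the patient's choice
    CaseResult("B", "A", "D", "A"),  # wrong in both, follows patient under bias
]

print(f"BIAD: {biad(toy_results):.2f}")
print(f"BIEP: {biep(toy_results):.2f}")
```

Multiplying BIAD by 100 gives the decline in percentage points, the unit used when reporting accuracy differences.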
Results:
Across all six evaluated models, cognitively biased consultations led to marked diagnostic accuracy declines of approximately 8 to 39 percentage points compared with standard multi-turn consultations (P < .001), whereas static single-response tests and standard consultations showed comparable accuracy. Larger deteriorations were observed in lower-capacity models, with some approaching random-guess performance under bias. Errors were frequently aligned with patient bias, with BIEP exceeding one-third across models, indicating systematic conformity rather than random error. Prompt-based mitigation strategies and decoding temperature reduction yielded limited and inconsistent improvements and did not reliably prevent bias-induced performance loss. By contrast, the dual-system framework substantially improved diagnostic accuracy under biased conditions in most models, producing gains of approximately 10 to 39 percentage points and recovering a large proportion of the performance lost due to bias (P < .001), particularly in lower-capacity systems.
Conclusions:
Patient-driven cognitive bias represents an underrecognized behavioral risk in LLM-supported health consultations. Common mitigation approaches such as prompt engineering or decoding parameter adjustment provide limited resilience. Explicitly separating conversational interaction from deliberative diagnostic reasoning through a dual-system architecture enables more robust diagnostic performance under biased input while preserving fluent patient-facing dialogue, offering a scalable design strategy for safer medical AI systems.
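The separation of conversational interaction from diagnostic reasoning can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the authors' implementation. The models are passed in as plain callables, so `interviewer`, `diagnostician`, `patient_reply`, and the transcript layout are all hypothetical names, and one "round" is assumed to mean one clinician question followed by one patient answer.

```python
from typing import Callable, Dict, List

Transcript = List[Dict[str, str]]
ModelFn = Callable[[Transcript], str]  # hypothetical: maps a transcript to a model reply


def run_consultation(
    opening_statement: str,
    patient_reply: Callable[[str], str],
    interviewer: ModelFn,
    diagnostician: ModelFn,
    rounds: int = 3,
) -> str:
    """Run a multi-turn consultation, then request one final diagnostic judgment.

    System 1 (interviewer) handles history taking; System 2 (diagnostician)
    sees only the completed consultation record.
    """
    transcript: Transcript = [{"role": "patient", "content": opening_statement}]
    for _ in range(rounds):
        question = interviewer(transcript)
        transcript.append({"role": "clinician", "content": question})
        transcript.append({"role": "patient", "content": patient_reply(question)})
    return diagnostician(transcript)


# Stub models so the sketch runs without any external service (assumptions only).
def stub_interviewer(transcript: Transcript) -> str:
    return f"Follow-up question {len(transcript) // 2 + 1}?"


def stub_patient(question: str) -> str:
    return f"Answer to: {question}"


def stub_diagnostician(transcript: Transcript) -> str:
    return f"Diagnosis based on {len(transcript)} messages"


dual_system_output = run_consultation(
    "I am sure this is just a cold.", stub_patient, stub_interviewer, stub_diagnostician
)
print(dual_system_output)
```

Passing the same callable as both `interviewer` and `diagnostician` corresponds to the foundation-only condition, in which one model performs both interaction and diagnosis.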
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.