Maintenance Notice

Due to necessary scheduled maintenance, the JMIR Publications website will be unavailable from Wednesday, July 01, 2020 at 8:00 PM to 10:00 PM EST. We apologize in advance for any inconvenience this may cause you.

Who will be affected?

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Oct 13, 2025
Open Peer Review Period: Oct 13, 2025 - Dec 8, 2025
Date Accepted: May 4, 2026
(closed for review but you can still tweet)

The final, peer-reviewed published version of this preprint can be found here:

Patient Cognitive Bias in Large Language Model–Supported Health Consultations: Simulation-Based Comparative Study

Zuo Y, Wan Q, Wang S

Patient Cognitive Bias in Large Language Model–Supported Health Consultations: Simulation-Based Comparative Study

J Med Internet Res 2026;28:e85770

DOI: 10.2196/85770

PMID: 42275635

Patient Cognitive Bias in Large Language Model–Supported Health Consultations: A Simulation-Based Comparative Study

  • Yi Zuo; 
  • Qifeng Wan; 
  • Shalong Wang

ABSTRACT

Background:

Large language models (LLMs) are increasingly used by patients for health information and preliminary medical advice. In patient-facing consultations, users may present explicitly stated diagnostic preferences or symptom narratives emphasizing a preferred explanation. Such cognitively biased input constrains the diagnostic context available to the model and may systematically steer its reasoning during interactive LLM-supported health consultations.

Objective:

To quantify the impact of patient cognitive bias on LLM diagnostic performance in multi-turn consultations, to assess the effectiveness of prompt-based mitigation strategies and decoding temperature adjustment, and to evaluate a dual-system framework for improving robustness under biased interaction.

Methods:

We developed a simulated patient agent to generate both unbiased and cognitively biased consultations using 1,273 MedQA-USMLE cases. Six widely used LLMs of varying capacity were evaluated through three-round, multi-turn dialogues, after which each model produced a final diagnostic judgment based on the complete consultation record. Diagnostic accuracy was the primary outcome. Secondary outcomes included bias-induced accuracy decline (BIAD; absolute reduction in accuracy under biased versus standard consultations) and bias-influenced error proportion (BIEP; proportion of incorrect responses aligned with the patient’s preferred but incorrect diagnosis). Four prompt-based mitigation strategies and four decoding temperature settings were tested. In addition, a dual-system framework was evaluated, in which a conversational foundation LLM conducted patient interaction and history taking (System 1), while a reasoning-oriented LLM (o1-Mini) generated the final diagnostic judgment (System 2). In the foundation-only condition, the same LLM performed both interaction and diagnosis.

Results:

Across all six evaluated models, cognitively biased consultations led to marked diagnostic accuracy declines of approximately 8 to 39 percentage points compared with standard multi-turn consultations (P < .001), whereas static single-response tests and standard consultations showed comparable accuracy. Larger deteriorations were observed in lower-capacity models, with some approaching random-guess performance under bias. Errors were frequently aligned with patient bias, with BIEP exceeding one-third across models, indicating systematic conformity rather than random error. Prompt-based mitigation strategies and decoding temperature reduction yielded limited and inconsistent improvements and did not reliably prevent bias-induced performance loss. By contrast, the dual-system framework substantially improved diagnostic accuracy under biased conditions in most models, producing gains of approximately 10 to 39 percentage points and recovering a large proportion of the performance lost due to bias (P < .001), particularly in lower-capacity systems.

Conclusions:

Patient-driven cognitive bias represents an underrecognized behavioral risk in LLM-supported health consultations. Common mitigation approaches such as prompt engineering or decoding parameter adjustment provide limited resilience. Explicitly separating conversational interaction from deliberative diagnostic reasoning through a dual-system architecture enables more robust diagnostic performance under biased input while preserving fluent patient-facing dialogue, offering a scalable design strategy for safer medical AI systems.


 Citation

Please cite as:

Zuo Y, Wan Q, Wang S

Patient Cognitive Bias in Large Language Model–Supported Health Consultations: Simulation-Based Comparative Study

J Med Internet Res 2026;28:e85770

DOI: 10.2196/85770

PMID: 42275635

Download PDF


Request queued. Please wait while the file is being generated. It may take some time.

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on it's website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressively prohibit redistribution of this draft paper other than for review purposes.