Currently submitted to: JMIR Medical Informatics
Date Submitted: Mar 8, 2026
Open Peer Review Period: Mar 20, 2026 - May 15, 2026 (currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Evaluation of Prompt Design and Internal Reasoning in Chatbot-Based Medical History Taking
ABSTRACT
Background:
A persistent discrepancy exists between patient-reported information and physician documentation. While conversational agents have been developed to collect medical histories prior to consultation, existing evaluations have largely focused on diagnostic accuracy or user satisfaction rather than the completeness and clinical usefulness of the information collected. There remains a need to assess the extent of clinically relevant information captured through chatbot-based interviews and to understand how model configurations and instructional strategies influence this coverage.
Objective:
This study aimed to evaluate the extent to which a chatbot can obtain clinically useful patient history information and to examine how prompt detail and internal reasoning influence information coverage during chatbot-based medical interviews.
Methods:
We developed a medical history-taking chatbot using the Qwen3-14B-Instruct model and evaluated four configurations in a 2×2 factorial design: Detailed/Thinking (DT), Detailed/Non-thinking (DN), Minimal/Thinking (MT), and Minimal/Non-thinking (MN). These configurations were compared against a rule-based system baseline (choice-based mode) using 66 standardized primary care clinical cases, with simulated patients interacting with the chatbot according to predefined case scripts. Information coverage (%) was assessed using a checklist inspired by Objective Structured Clinical Examination (OSCE) frameworks. Three physicians independently evaluated transcript coverage, with inter-rater agreement assessed using full agreement rates and Fleiss’ κ. Coverage percentages were compared across configurations using repeated-measures analysis of variance with post hoc testing.
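As a concrete illustration of the two statistical procedures named above, the sketch below shows how Fleiss' κ and the repeated-measures ANOVA could be computed with the statsmodels library. The DataFrame layouts, column names, and toy numbers are assumptions introduced for illustration only; they are not the study's actual data or analysis code.

import pandas as pd
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa
from statsmodels.stats.anova import AnovaRM

# --- Inter-rater agreement (Fleiss' kappa) ---
# `ratings` holds one 0/1 checklist judgment per item from each of the
# three physician raters (toy values; real data would have many items).
ratings = pd.DataFrame({
    "rater_1": [1, 0, 1, 1],
    "rater_2": [1, 0, 1, 0],
    "rater_3": [1, 1, 1, 0],
})
# aggregate_raters converts (items x raters) labels to (items x categories) counts.
counts, _ = aggregate_raters(ratings.to_numpy())
kappa = fleiss_kappa(counts, method="fleiss")
print(f"Fleiss' kappa = {kappa:.2f}")

# --- Coverage comparison (repeated-measures ANOVA) ---
# `df` holds one coverage score per case x configuration; AnovaRM requires
# this balanced layout (each case observed once under every configuration).
df = pd.DataFrame({
    "case": list(range(4)) * 5,
    "config": ["DT"] * 4 + ["DN"] * 4 + ["MT"] * 4 + ["MN"] * 4 + ["rule"] * 4,
    "coverage": [72, 70, 75, 71, 61, 59, 62, 60,
                 60, 58, 63, 59, 52, 50, 55, 51, 53, 51, 56, 52],
})
anova = AnovaRM(df, depvar="coverage", subject="case", within=["config"]).fit()
print(anova)

A significant omnibus F test from AnovaRM would then motivate the post hoc pairwise comparisons between configurations described in the Methods.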
Results:
Inter-rater agreement was substantial (Fleiss' κ = 0.75). Across all 66 simulated cases, information coverage differed significantly among configurations (p < .001). The Detailed/Thinking (DT) configuration achieved the highest mean coverage (72.3%); the configurations using only one of the two factors, Detailed/Non-thinking (DN) and Minimal/Thinking (MT), reached moderate coverage (approximately 60%); and the Minimal/Non-thinking (MN) and rule-based configurations showed the lowest coverage (approximately 51%-54%). Differences were most pronounced in the past medical history and family history domains. Symptom-level analyses revealed substantial variability: coverage was higher for symptoms associated with well-defined diagnostic frameworks and lower for multi-system presentations.
Conclusions:
Combining clinically detailed prompt instructions with internal reasoning significantly improved information coverage, enhancing the clinical usefulness of AI-driven history taking by supporting more comprehensive data collection. This approach provides a more systematic and robust foundation for automated clinical documentation and may facilitate integration into healthcare workflows.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have granted JMIR Publications an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be published under a CC BY license, at this stage the authors and publisher expressly prohibit redistribution of this draft other than for review purposes.