Accepted for/Published in: JMIR Medical Education
Date Submitted: Sep 18, 2025
Open Peer Review Period: Sep 24, 2025 - Nov 19, 2025
Date Accepted: Mar 9, 2026
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Assessing Students’ Clinical Reasoning Skills in History Taking with Large Language Model–Based Virtual Patients: Development and Validation of a Structured Coding Scheme Using Systematic Text Condensation
ABSTRACT
Background:
Large language model (LLM)–driven virtual patients (VPs) are increasingly used to simulate history taking. However, there is currently no straightforward methodological approach for identifying students’ clinical reasoning activities during these interactions, which limits the ability to provide personalised feedback.
Objective:
This study aims to develop a structured coding scheme to characterise medical students’ behaviours during interactions with LLM-driven VPs.
Methods:
Second-year medical students (N=210) completed text-based history-taking sessions across five simulated chest pain cases, yielding 1,030 dialogues. Dialogues from Cases 1–4 were analysed using systematic text condensation (STC) to develop a coding scheme inductively. Two raters independently coded a subset of dialogues, and inter-coder reliability was assessed using Cohen’s kappa. The established scheme was then applied to the dialogues from Case 5, and Pearson correlation coefficients (r) were used to assess associations between code frequencies and external performance outcomes: diagnostic accuracy, history-taking checklist scores, clinical knowledge test scores, and post-encounter form (PEF) scores.
Results:
The STC analysis produced a 12-code scheme comprising four clinical reasoning codes (Pathophysiologic Question, Relevant Response, Summarising & Integrating, Logical Organisation), six information-gathering codes, and two communication codes. Inter-coder reliability was high across all dimensions: clinical reasoning (κ=1.00), information gathering (κ=0.95-0.98), and communication (κ=1.00). In Case 5, Summarising & Integrating was the most predictive code, correlating with diagnostic accuracy (χ²=6.019, P=.014), checklist scores (r=0.208, P=.003), knowledge test scores (r=0.225, P=.002), and PEF scores (r=0.191, P=.009). Logical Organisation also correlated with diagnostic accuracy (χ²=0.188, P=.008), checklist scores (r=0.592, P<.001), and knowledge test scores (r=0.170, P=.013). Pathophysiologic Question showed weaker but significant associations with checklist scores (r=0.177, P=.013) and knowledge test scores (r=0.145, P=.042). Only two information-gathering codes demonstrated weak-to-moderate associations with checklist and knowledge test scores, and only one communication code showed a weak association with knowledge test scores.
Conclusions:
This study developed a theory-informed coding scheme that reliably distinguishes information-gathering and reasoning behaviours in history taking with virtual patients. By enabling the identification of these diverse behaviours, the scheme provides a foundation for formative assessment and personalised feedback, offering a scalable approach to supporting the development of clinical reasoning in medical students.
Clinical Trial: NO
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.