Accepted for/Published in: JMIR Formative Research
Date Submitted: Apr 27, 2025
Open Peer Review Period: Nov 6, 2025 - Jan 1, 2026
Date Accepted: Nov 10, 2025
Teaching Clinical Reasoning in the Age of AI: A Mixed-Methods Formative Evaluation of AI-Generated Script Concordance Tests and Expert Embodiment
ABSTRACT
Background:
The integration of artificial intelligence (AI) in medical education is evolving, offering new tools to enhance teaching and assessment. Among these, script concordance tests (SCTs) are well suited to evaluating clinical reasoning under uncertainty. Traditionally, SCTs require expert panels for scoring and feedback, which can be resource intensive. Recent advances in generative AI, particularly large language models (LLMs), suggest the possibility of replacing human experts with simulated ones, though this potential remains underexplored.
Objective:
This study aimed to evaluate whether LLMs can effectively simulate expert judgment in SCTs by using generative AI to author, score, and provide feedback for SCTs in cardiology and pneumology. A secondary goal was to assess students’ perceptions of the test’s difficulty and the pedagogical value of AI-generated feedback.
Methods:
A cross-sectional, mixed-methods study was conducted with 25 second-year medical students who completed a 32-item SCT authored by ChatGPT-4o. Six LLMs (three trained on course material and three untrained) served as simulated experts to generate scoring keys and feedback. Students answered SCT questions, rated perceived difficulty, and selected the most helpful feedback explanation for each item. Quantitative analysis included scoring, difficulty ratings, and correlation between student and AI responses. Qualitative comments were thematically analyzed.
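The scoring described here follows the standard SCT aggregate scoring method, in which each item is scored against the spread of panel answers rather than a single correct response: the modal panel answer earns full credit and less frequent panel answers earn proportional partial credit. The minimal Python sketch below illustrates that method with the six simulated-expert (LLM) responses acting as the panel; the function names and example values are hypothetical and are not taken from the authors' actual pipeline.

```python
from collections import Counter

def sct_scoring_key(panel_answers):
    """Build an aggregate scoring key for one SCT item from a panel's
    Likert-scale answers (here, the six simulated-expert responses).
    The modal answer earns full credit; other answers chosen by some
    panellists earn credit proportional to how often they were chosen."""
    counts = Counter(panel_answers)
    modal_count = max(counts.values())
    return {answer: n / modal_count for answer, n in counts.items()}

def score_student(student_answers, panels):
    """Sum the credit a student earns across all SCT items."""
    total = 0.0
    for answer, panel in zip(student_answers, panels):
        key = sct_scoring_key(panel)
        total += key.get(answer, 0.0)  # answers no panellist chose score 0
    return total

# Hypothetical example: one item answered on a 5-point Likert scale
# (-2 .. +2) by six simulated experts.
panels = [[1, 1, 2, 1, 0, 1]]
print(score_student([1], panels))  # modal answer -> full credit: 1.0
print(score_student([0], panels))  # minority answer -> partial credit: 0.25
```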
Results:
The average student score was 22.8 out of 32 (SD = 1.6), with scores ranging from 19.75 to 26.75. Trained AI systems showed significantly higher concordance with student responses (ρ = 0.64) than untrained models (ρ = 0.41). AI-generated feedback was rated as most helpful in 62.5% of cases, especially when provided by trained models. The SCT demonstrated good internal consistency (Cronbach’s α = 0.76), and students reported moderate perceived difficulty (mean = 3.7/7). Qualitative feedback highlighted appreciation for SCTs as reflective tools, while recommending clearer guidance on Likert-scale use and more contextual detail in vignettes.
Conclusions:
This is among the first studies to demonstrate that trained generative AI models can reliably simulate expert clinical reasoning in a script concordance framework. The findings suggest that AI can both streamline SCT design and offer educationally valuable feedback without compromising authenticity. Future studies should explore longitudinal effects on learning and assess how hybrid (human and AI) models can optimize reasoning instruction in medical education.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.