Accepted for/Published in: JMIR Formative Research

Date Submitted: Apr 27, 2025
Open Peer Review Period: Nov 6, 2025 - Jan 1, 2026
Date Accepted: Nov 10, 2025

The final, peer-reviewed published version of this preprint can be found here:

Teaching Clinical Reasoning in Health Care Professions Learners Using AI-Generated Script Concordance Tests: Mixed Methods Formative Evaluation

Hudon A, Phan V, Charlin B, Wittmer R

JMIR Form Res 2025;9:e76618

DOI: 10.2196/76618

PMID: 41264864

PMCID: 12634011

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Teaching Clinical Reasoning in the Age of AI: A Mixed-Methods Formative Evaluation of AI-Generated Script Concordance Tests and Expert Embodiment

  • Alexandre Hudon; 
  • Véronique Phan; 
  • Bernard Charlin; 
  • René Wittmer

ABSTRACT

Background:

The integration of artificial intelligence (AI) in medical education is evolving, offering new tools to enhance teaching and assessment. Among these, script concordance tests (SCTs) are well suited to evaluating clinical reasoning under uncertainty. Traditionally, SCTs require expert panels for scoring and feedback, which can be resource intensive. Recent advances in generative AI, particularly large language models (LLMs), suggest the possibility of replacing human experts with simulated ones, though this potential remains underexplored.

Objective:

This study aimed to evaluate whether LLMs can effectively simulate expert judgment in SCTs by using generative AI to author, score, and provide feedback for SCTs in cardiology and pneumology. A secondary goal was to assess students’ perceptions of the test’s difficulty and the pedagogical value of AI-generated feedback.

Methods:

A cross-sectional, mixed-methods study was conducted with 25 second-year medical students who completed a 32-item SCT authored by ChatGPT-4o. Six LLMs (three trained on course material and three untrained) served as simulated experts to generate scoring keys and feedback. Students answered SCT questions, rated perceived difficulty, and selected the most helpful feedback explanation for each item. Quantitative analysis included scoring, difficulty ratings, and correlation between student and AI responses. Qualitative comments were thematically analyzed.
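Scoring against a panel of simulated experts follows the standard SCT aggregate-scoring rule: an answer is credited in proportion to how many panelists chose it, relative to the modal panel answer. The sketch below illustrates that rule only; the function name and the example panel data are illustrative, not the authors' implementation.

```python
from collections import Counter

def sct_item_score(panel_answers, student_answer):
    """Aggregate SCT scoring for one item: credit the student's answer
    in proportion to how many panelists chose it, relative to the
    modal (most frequently chosen) panel answer."""
    counts = Counter(panel_answers)
    modal_count = max(counts.values())
    return counts.get(student_answer, 0) / modal_count

# Hypothetical six-member panel (matching the study's six simulated
# experts) answering on a -2..+2 Likert scale:
panel = [1, 1, 0, 1, 2, 0]
print(sct_item_score(panel, 1))   # modal answer -> full credit (1.0)
print(sct_item_score(panel, 0))   # 2 votes vs 3 modal votes -> 2/3 credit
print(sct_item_score(panel, -2))  # answer no panelist chose -> 0.0
```

Per-item credits are then summed over the 32 items to give the total score out of 32 reported in the Results.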

Results:

The average student score was 22.8 out of 32 (SD = 1.6), with scores ranging from 19.75 to 26.75. Trained AI systems showed significantly higher concordance with student responses (ρ = 0.64) than untrained models (ρ = 0.41). AI-generated feedback was rated as most helpful in 62.5% of cases, especially when provided by trained models. The SCT demonstrated good internal consistency (Cronbach’s α = 0.76), and students reported moderate perceived difficulty (mean 3.7 on a 7-point scale). Qualitative feedback highlighted appreciation for SCTs as reflective tools, while recommending clearer guidance on Likert-scale use and more contextual detail in vignettes.
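The internal-consistency figure reported above is Cronbach's α, computed from the variance of per-item scores versus the variance of total scores. A minimal sketch of that formula, using toy data rather than the study's scores:

```python
import statistics as st

def cronbach_alpha(item_scores):
    """Cronbach's alpha for internal consistency.

    item_scores: one inner list per item, each aligned by student,
    i.e. item_scores[i][j] is student j's score on item i.
    """
    k = len(item_scores)                                  # number of items
    totals = [sum(per_student) for per_student in zip(*item_scores)]
    item_var_sum = sum(st.pvariance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_var_sum / st.pvariance(totals))

# Toy example: 3 items, 4 students (illustrative data only)
print(cronbach_alpha([[1, 0, 1, 1],
                      [1, 0, 1, 0],
                      [1, 1, 1, 0]]))
```

In the study, the same statistic over the 32 items and 25 students yielded α = 0.76.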

Conclusions:

This is among the first studies to demonstrate that trained generative AI models can reliably simulate expert clinical reasoning in a script concordance framework. The findings suggest that AI can both streamline SCT design and offer educationally valuable feedback without compromising authenticity. Future studies should explore longitudinal effects on learning and assess how hybrid models (human and AI) can optimize reasoning instruction in medical education.


Citation

Please cite as:

Hudon A, Phan V, Charlin B, Wittmer R

Teaching Clinical Reasoning in Health Care Professions Learners Using AI-Generated Script Concordance Tests: Mixed Methods Formative Evaluation

JMIR Form Res 2025;9:e76618

DOI: 10.2196/76618

PMID: 41264864

PMCID: 12634011


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.