Accepted for/Published in: JMIR Medical Education
Date Submitted: Dec 11, 2025
Date Accepted: Feb 6, 2026
From Validation to Real-World Impact: A Retrospective Propensity Score-Matched Cohort Study on the Educational Effectiveness of an AI-Powered Medical History-Taking System
ABSTRACT
Background:
Medical history-taking is a core clinical skill, yet traditional teaching methods face challenges such as limited standardized patient (SP) resources and inconsistent feedback. We previously developed an AI-powered Medical History-Taking Training and Evaluation System (AMTES) and established its technical feasibility as an extracurricular resource. Evidence on whether such tools improve learning outcomes when voluntarily embedded in routine curricula remains limited.
Objective:
To ascertain the real-world educational effectiveness of AMTES beyond its technical feasibility by assessing its impact on student academic performance as an opt-in extracurricular resource, and to examine how students’ voluntary use of AMTES across different baseline ability levels can inform precision education strategies when implementing AI-driven educational technologies.
Methods:
We conducted a retrospective cohort study of 478 undergraduates enrolled in a Diagnostics course (academic year 2024-2025). Students were categorized as AMTES users (205/478, 42.9%) or non-users (273/478, 57.1%) based on voluntary extracurricular adoption of the system during the month preceding a high-stakes final practical skill examination. We implemented propensity score matching (PSM) with prior academic performance measures as covariates. The average treatment effect on the treated (ATT) was estimated in the matched groups, and robustness to unobserved confounding was assessed via Rosenbaum sensitivity analysis. Among users, we extracted digital trace data and applied K-means clustering to identify distinct practice intensity patterns. We then explored aptitude-treatment interaction (ATI) by testing the interaction between practice intensity and midterm examination scores in regression models.
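The matching and ATT steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the matching algorithm, ratio, or caliper, so greedy 1:1 nearest-neighbor matching without replacement and a caliper of 0.05 are assumptions, as are the function names and the toy propensity scores and examination scores. Propensity scores are taken as already estimated (for example, from a logistic regression on prior academic performance).

```python
from statistics import mean


def match_nearest(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, controls: lists of (propensity_score, outcome) tuples.
    Returns a list of (treated_outcome, control_outcome) pairs.
    The caliper value is an assumption, not taken from the study.
    """
    available = list(controls)
    pairs = []
    # Process users with the highest propensity scores first.
    for ps_t, y_t in sorted(treated, reverse=True):
        if not available:
            break
        # Closest remaining non-user on the propensity score.
        best = min(available, key=lambda c: abs(c[0] - ps_t))
        if abs(best[0] - ps_t) <= caliper:
            pairs.append((y_t, best[1]))
            available.remove(best)
        # Users with no non-user inside the caliper stay unmatched.
    return pairs


def att(pairs):
    """Mean within-pair outcome difference (user minus non-user)."""
    return mean(y_t - y_c for y_t, y_c in pairs)


# Toy data, purely illustrative: (propensity score, final exam score).
users = [(0.60, 62.0), (0.45, 58.0), (0.90, 66.0)]
non_users = [(0.58, 60.0), (0.47, 57.0), (0.20, 50.0)]

matched = match_nearest(users, non_users)
print(len(matched), att(matched))
```

In this toy run the user with a propensity score of 0.90 has no comparable non-user and is dropped, which mirrors how matching can retain fewer pairs than there are users.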
Results:
PSM yielded 161 matched pairs (N=322) with excellent covariate balance (absolute standardized mean difference, |SMD|<0.1), effectively mitigating selection bias. In the matched cohort, AMTES users achieved significantly higher final practical examination scores than non-users (ATT=1.89, 95% CI 0.53–3.24, P<.05), corresponding to a 2.7% increase relative to the maximum score. This finding was robust to moderate unmeasured confounding (Rosenbaum's Γ=1.2). Cluster analysis of usage logs revealed low- and high-intensity practice profiles. However, greater practice intensity did not translate into higher scores (P>.05), suggesting that how students engaged with AMTES was more important than practice volume. Exploratory analyses revealed systematic heterogeneity in benefit from the system: the high-intensity group was estimated to score 1.1 points higher than the low-intensity group at the 25th percentile of midterm scores but approximately 0.1 points lower at the 75th percentile.
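One way to read the heterogeneity result is through a linear interaction model. The form and notation below are assumptions, since the abstract does not give the exact regression specification: $Y_i$ is the final practical examination score, $H_i$ indicates membership in the high-intensity cluster, and $M_i$ is the midterm score.

```latex
\begin{align}
Y_i &= \beta_0 + \beta_1 H_i + \beta_2 M_i + \beta_3 H_i M_i + \varepsilon_i \\
\Delta(m) &= \mathbb{E}[Y \mid H=1, M=m] - \mathbb{E}[Y \mid H=0, M=m] = \beta_1 + \beta_3 m \\
\beta_3 &= \frac{\Delta(m_{75}) - \Delta(m_{25})}{m_{75} - m_{25}} \approx \frac{-0.1 - 1.1}{m_{75} - m_{25}} < 0
\end{align}
```

Under this sketch, the reported estimates of about 1.1 points at the 25th percentile ($m_{25}$) and about -0.1 points at the 75th percentile ($m_{75}$) imply a negative interaction coefficient: the estimated advantage of high-intensity practice shrinks as midterm performance rises.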
Conclusions:
AMTES emerges as a cost-effective supplementary tool that improves students’ history-taking performance and helps address persistent challenges in medical education. These findings highlight the importance of supporting self-regulated learning and adopting precision education strategies when implementing AI-driven educational technologies.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.