Accepted for/Published in: JMIR Medical Education
Date Submitted: Oct 10, 2025
Open Peer Review Period: Oct 16, 2025 - Dec 11, 2025
Date Accepted: Dec 18, 2025
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Multidimensional Evaluation of Human and AI Co-produced Feedback on Case Reports in Health Professions Education: Mixed Methods Study
ABSTRACT
Background:
While artificial intelligence (AI)-generated feedback (AI-FB) offers significant potential to overcome constraints on faculty time and resources associated with providing personalized feedback, its effectiveness can be limited by issues such as algorithm aversion. A hybrid, human-AI co-produced feedback (Co-FB) model has been proposed as a solution. However, there is limited research comprehensively evaluating the value of this hybrid approach, particularly in health professions education.
Objective:
This paper aimed to conduct a multidimensional evaluation comparing the quality of human-created feedback (Hu-FB), AI-FB, and Co-FB on case reports, from the dual perspectives of physical therapy teachers and students.
Methods:
This mixed methods research consisted of two studies. In Study 1, three physical therapy teachers ranked the three feedback types and participated in focus group interviews to clarify their potential and limitations. In Study 2, 35 physical therapy students compared the quality of AI-FB and Co-FB using three methods: a direct preference question regarding overall usefulness, the Feedback Perceptions Questionnaire (FPQ), and focus group interviews. Both studies also examined how perceptions changed before and after the feedback’s identity was revealed, to investigate the impact of algorithm aversion.
Results:
In Study 1, all three teachers consistently ranked Hu-FB last. In an initial blinded evaluation, they valued AI-FB for its balance of textual and clinical content and praised Co-FB for its specialization. In Study 2, a clear majority of students (26/35, 74.3%) preferred Co-FB over AI-FB for overall usefulness. On the FPQ scales, Co-FB scored significantly higher than AI-FB on key items of Fairness (P=.032), Usefulness (P=.017), and Willingness to Improve (P=.036). In contrast, AI-FB scored significantly higher on the Affect scale, eliciting more positive emotions (eg, “successful,” P=.031) and fewer negative emotions (eg, “angry,” P=.007). A central finding across both studies was the significant impact of algorithm aversion: after the feedback’s identity was revealed, both teachers and students re-evaluated AI-FB more negatively, describing it as superficial and lacking a clinical focus, while Co-FB was viewed more favorably.
Conclusions:
Based on these findings, we propose a conceptual model for Co-FB that combines the contextual understanding characteristic of Hu-FB with the real-time capability and motivating, praising tone of AI-FB.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.