Accepted for/Published in: JMIR Medical Education
Date Submitted: Apr 5, 2024
Date Accepted: Jun 27, 2024
A language model-powered simulated patient with automated feedback for history taking: Prospective study
ABSTRACT
Background:
History taking is fundamental in diagnosing medical conditions, but teaching and providing feedback on this skill can be challenging due to constraints on patient and staff resources. Virtual simulated patients and web-based chatbots have emerged as educational tools, with recent advancements in artificial intelligence such as large language models (LLMs) enhancing their realism and potential to provide feedback.
Objective:
This study aimed to evaluate the effectiveness of a Generative Pre-trained Transformer 4 (GPT-4) model to provide structured feedback on medical students' performance in history-taking with a simulated patient.
Methods:
We conducted a prospective study in which medical students performed history taking with a GPT-4-powered chatbot designed to simulate patient responses and provide immediate feedback on the comprehensiveness of the students' history taking. Students' interactions were analysed, and the chatbot's feedback was compared with that of a human rater. We measured inter-rater reliability and performed a descriptive analysis to assess the quality of feedback.
Results:
The study included 106 participants, most of them in their third year of medical school. A total of 1,894 question-answer pairs (QAPs) from 106 conversations were included in the analysis. GPT-4’s roleplay and responses were medically plausible in over 99% of cases. Inter-rater reliability between GPT-4 and the human rater showed an “almost perfect” agreement (Cohen’s κ=0.832). A lower agreement (κ<0.6) was detected for 8 out of 45 feedback categories, highlighting areas where the model’s assessments were overly specific or diverged from human judgment.
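The agreement statistic reported above, Cohen's κ, corrects the raw rate of rater agreement for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). A minimal sketch of that computation in plain Python is shown below; the labels and counts are hypothetical illustrations, not data from this study.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning nominal labels to the same items."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independence, from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b.get(label, 0) for label in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: two raters judging 10 items as "addressed" or "missed".
gpt4_labels  = ["addressed"] * 6 + ["missed"] * 4
human_labels = ["addressed"] * 5 + ["missed"] * 5
print(round(cohen_kappa(gpt4_labels, human_labels), 3))  # prints 0.8
```

On the conventional Landis–Koch scale, values above 0.8 (as with the study's κ=0.832) are interpreted as "almost perfect" agreement.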
Conclusions:
The GPT-4 model was effective in providing structured feedback on history-taking dialogues performed by medical students. While we identified limitations in the specificity of feedback for certain feedback categories, the overall high agreement with the human rater suggests that LLMs can be a valuable tool for medical education. This study supports the integration of AI-driven feedback mechanisms in medical training and highlights important aspects to consider when LLMs are employed in this context.