Accepted for/Published in: JMIR Medical Education
Date Submitted: Nov 4, 2025
Date Accepted: Dec 29, 2025
Using AI to train future clinicians in depression assessment: a feasibility study
ABSTRACT
Background:
Depression is a major global healthcare challenge, causing significant individual distress and contributing substantially to the worldwide disease burden. Timely and accurate diagnosis is crucial. To help future clinicians develop these essential skills, we trained a GPT-powered chatbot to simulate patients with varying degrees of depression and suicidality.
Objective:
The objective of this study is to evaluate the applicability and transferability of our GPT-4-powered chatbot for psychosomatic cases. Specifically, we aim to investigate how accurately the chatbot can simulate patients exhibiting various stages of depression and phases of suicidal ideation while adhering to a predefined role script and maintaining a sufficient level of authenticity. Additionally, we analyze to what extent the chatbot is suitable for practicing the diagnosis of depressive disorders and the assessment of suicidality stages.
Methods:
Three virtual patient role scripts depicting complex, realistic cases of depression with varying degrees of suicidality were developed collaboratively by field experts and aligned with mental health assessment guidelines. These cases were integrated into a GPT-4-powered chatbot for practicing clinical history-taking. A total of 148 medical students, with an average age of 22.71 years and mostly in their sixth semester, each interacted individually with one randomly assigned virtual patient via chat. Afterward, they completed a questionnaire assessing their demographics and user experience. Chats were analyzed descriptively to assess students' diagnostic accuracy and suicidality assessments, as well as the chatbot's role script adherence and authenticity.
Results:
In over 90% of cases, the chatbot maintained its assigned role. On average, students correctly identified the severity of depression in 60% of cases and the phase of suicidality in 67%. Notably, the majority either failed to address or insufficiently explored the topic of suicidality, despite explicit instructions beforehand.
Conclusions:
This study demonstrates that a GPT-powered chatbot can simulate depressive patients fairly accurately. More than two-thirds of participants perceived the AI-simulated depressive patients as authentic, and nearly 80% indicated they would like to use the application for further practice, highlighting its potential as a training tool. While a small proportion of students expressed reservations, and diagnostic accuracy varied with case severity, the findings overall support the feasibility and educational value of AI-based role-playing in clinical training. AI-supported virtual patients provide a flexible, standardized, and readily available training tool, independent of real-life constraints.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.