
Accepted for/Published in: JMIR Medical Education

Date Submitted: May 16, 2023
Date Accepted: Jul 25, 2023

The final, peer-reviewed published version of this preprint can be found here:

Performance of ChatGPT on the Situational Judgement Test—A Professional Dilemmas–Based Examination for Doctors in the United Kingdom

Borchert RJ, Hickman CR, Pepys J, Sadler TJ

Performance of ChatGPT on the Situational Judgement Test—A Professional Dilemmas–Based Examination for Doctors in the United Kingdom

JMIR Med Educ 2023;9:e48978

DOI: 10.2196/48978

PMID: 37548997

PMCID: 10442724

Performance of ChatGPT on the Situational Judgement Test: A professional dilemmas exam for doctors in the UK.

  • Robin Jacob Borchert; 
  • Charlotte Rachel Hickman; 
  • Jack Pepys; 
  • Timothy J Sadler

ABSTRACT

Background:

ChatGPT is a large language model that has performed well on professional examinations in the fields of medicine, law, and business. However, it is unclear how ChatGPT would perform on an examination assessing professionalism and situational judgement for doctors.

Objective:

We evaluated the performance of ChatGPT on the Situational Judgement Test (SJT), a national examination taken by all final-year medical students in the United Kingdom (UK). This examination is designed to assess attributes such as communication, team-working, patient safety, prioritisation skills, professionalism, and ethics.

Methods:

All questions from the United Kingdom Foundation Programme Office (UKFPO) 2023 SJT practice examination were entered into ChatGPT. For each question, ChatGPT's answers and rationales were recorded and assessed against the official UKFPO scoring template. Questions were categorised into domains of Good Medical Practice according to the domains referenced in the rationales of the official scoring sheet. Questions without clear domain links were screened by multiple reviewers and assigned one or more domains. ChatGPT's overall performance, as well as its performance across the domains of Good Medical Practice, was evaluated.

Results:

Overall, ChatGPT performed well, scoring 76% on the SJT, but achieved full marks on only a minority of questions (9%). This may reflect flaws in ChatGPT's situational judgement, inconsistencies in the reasoning across questions in the examination itself, or both. ChatGPT demonstrated consistent performance across the four domains outlined in Good Medical Practice for doctors.

Conclusions:

Further research is needed to understand the potential applications of large language models, such as ChatGPT, in medical education, including standardising questions and providing consistent rationales for examinations assessing professionalism and ethics.



© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.