Evaluation of ChatGPT Performance on Emergency Medicine Board Exam Questions: Observational Study
ABSTRACT
Background:
The rapidly evolving field of medicine has highlighted the potential of ChatGPT as an assistive platform. However, opinion on its use in medical board exam preparation and completion remains divided.
Objective:
This study aimed to evaluate the performance of a custom-modified version of ChatGPT-4, tailored with emergency medicine board exam preparatory materials (an Anki flashcard deck), compared to its default version and its previous iteration (ChatGPT-3.5). The goal was to assess the accuracy of ChatGPT-4 in answering board-style questions and its suitability as a tool to aid students and trainees in standardized examination preparation.
Methods:
A comparative analysis was conducted using a random selection of 598 questions from the Rosh In-Training Exam Question Bank. Three versions of ChatGPT were evaluated: the default ChatGPT-4 (Default), the custom-modified ChatGPT-4 (Custom), and ChatGPT-3.5. Accuracy, response length, medical discipline subgroups, and underlying causes of error were analyzed.
Results:
The Custom version did not demonstrate a significant improvement in accuracy over the Default version (P=.61), though both significantly outperformed ChatGPT-3.5 (P<.001). The Default version produced significantly longer responses than the Custom version (1371±444 vs 929±408, respectively; P<.001). Subgroup analysis revealed no significant difference in performance across medical subdisciplines between the versions (P>.05 in all cases). Both ChatGPT-4 versions had similar underlying error types (P>.05 in all cases) and a 99% predicted probability of passing, whereas ChatGPT-3.5 had an 85% probability.
Conclusions:
The findings suggest that while newer versions of ChatGPT exhibit improved performance in emergency medicine board exam preparation, specific enhancement with a comprehensive Anki flashcard deck on the topic does not significantly impact accuracy. The study highlights the potential of ChatGPT-4 as a tool for medical education, capable of providing accurate support across a wide range of topics in emergency medicine in its default form.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.