Accepted for/Published in: JMIR AI

Date Submitted: Oct 18, 2024
Date Accepted: Feb 12, 2025

The final, peer-reviewed published version of this preprint can be found here:

Evaluation of ChatGPT Performance on Emergency Medicine Board Examination Questions: Observational Study

Pastrak M, Kajitani S, Goodings AJ, Drewek A, Lafree A, Murphy A

JMIR AI 2025;4:e67696

DOI: 10.2196/67696

PMID: 40611478

PMCID: 12231519

Evaluation of ChatGPT Performance on Emergency Medicine Board Exam Questions: Observational Study

  • Mila Pastrak; 
  • Sten Kajitani; 
  • Anthony James Goodings; 
  • Austin Drewek; 
  • Andrew Lafree; 
  • Adrian Murphy

ABSTRACT

Background:

The ever-evolving field of medicine has highlighted the potential of ChatGPT as an assistive platform. However, opinion on its use in medical board examination preparation and completion remains divided.

Objective:

This study aimed to evaluate the performance of a custom-modified version of ChatGPT-4, tailored with emergency medicine board examination preparatory materials (an Anki flashcard deck), against the default ChatGPT-4 and the previous iteration, ChatGPT-3.5. The goal was to assess the accuracy of ChatGPT-4 in answering board-style questions and its suitability as a tool to aid students and trainees in preparing for standardized examinations.

Methods:

A comparative analysis was conducted on a random selection of 598 questions from the Rosh In-Training Exam Question Bank. The subjects of the study were three versions of ChatGPT: the default ChatGPT-4, the custom-modified ChatGPT-4, and ChatGPT-3.5. Accuracy, response length, performance across medical discipline subgroups, and underlying causes of error were analyzed.

Results:

The Custom version did not demonstrate a significant improvement in accuracy over the Default version (P=.61), though both significantly outperformed ChatGPT-3.5 (P<.001). The Default version produced significantly longer responses than the Custom version (1371, SD 444, vs 929, SD 408, respectively; P<.001). Subgroup analysis revealed no significant difference in performance across medical subdisciplines between the versions (P>.05 in all cases). Both ChatGPT-4 versions exhibited similar underlying error types (P>.05 in all cases) and had a 99% predicted probability of passing, whereas ChatGPT-3.5 had an 85% probability.
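
The abstract does not state which statistical test produced the accuracy P values. As an illustration only, a two-sided two-proportion z-test, one standard way to compare the correct-answer rates of two models on the same 598-question set, can be sketched as follows; the counts used here are hypothetical and are not the study's data.

```python
from math import sqrt, erf

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two proportions.

    x1/n1 and x2/n2 are correct-answer counts over total questions
    for two models answering the same question bank.
    """
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # pooled proportion under H0: p1 == p2
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided P value from the standard normal CDF, via math.erf
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical counts out of 598 questions (not the study's results)
z, p = two_proportion_z_test(500, 598, 495, 598)
```

With near-identical counts the test returns a large P value (no significant difference), mirroring the pattern reported between the Custom and Default versions; a larger gap in correct counts drives the P value below .001.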

Conclusions:

The findings suggest that while newer versions of ChatGPT exhibit improved performance in emergency medicine board exam preparation, specific enhancement with a comprehensive Anki flashcard deck on the topic does not significantly impact accuracy. The study highlights the potential of ChatGPT-4 as a tool for medical education, capable of providing accurate support across a wide range of topics in emergency medicine in its default form.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.