Accepted for/Published in: JMIR Medical Education
Date Submitted: Jun 19, 2024
Date Accepted: Sep 14, 2024
Date Submitted to PubMed: Sep 14, 2024
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Critical Analysis of ChatGPT 4 Omni in USMLE Disciplines, Clinical Clerkships, and Clinical Skills
ABSTRACT
Background:
Recent studies, including those by the National Board of Medical Examiners (NBME), have highlighted the remarkable capability of recent large language models (LLMs) such as ChatGPT to pass the United States Medical Licensing Examination (USMLE). However, detailed analyses of these models' performance in specific medical content areas are lacking, limiting assessment of their potential utility in medical education.
Objective:
To assess and compare the accuracy of successive ChatGPT versions (GPT-3.5, GPT-4, and GPT-4 Omni) across USMLE disciplines, clinical clerkships, and the clinical skills of diagnostics and management.
Methods:
This study used 750 clinical vignette-based multiple-choice questions (MCQs) to characterize the performance of successive ChatGPT versions [ChatGPT 3.5 (GPT-3.5), ChatGPT 4 (GPT-4), and ChatGPT 4 Omni (GPT-4o)] across USMLE disciplines, clinical clerkships, and clinical skills (diagnostics and management). Accuracy was assessed using a standardized protocol, and statistical analyses were conducted to compare the models' performance.
Results:
GPT-4o demonstrated the highest overall accuracy at 90.4%, performing best in social sciences (95.5%), microbiology (92.3%), and immunology (92.9%). GPT-4 showed significant improvement over GPT-3.5, achieving an overall accuracy of 81.1%. GPT-3.5 accurately responded to 60.0% of MCQs, with its highest accuracy in microbiology (87.0%) and its lowest in physiology (54.2%) and social sciences (59.1%). In clinical diagnostics, GPT-4o achieved a 92.7% accuracy rate, and in clinical management tasks it reached 88.8%, both significantly higher than the corresponding rates for GPT-4 and GPT-3.5.
Conclusions:
ChatGPT 4 Omni’s performance in USMLE preclinical content areas and clinical skills represents a substantial improvement over its predecessors, suggesting significant potential for this technology as an educational aid for medical students. These findings underscore the need for careful consideration of how LLMs are integrated into medical education, emphasizing the importance of structured curricula to guide their appropriate use and of ongoing critical analyses to ensure their reliability and effectiveness.