JMIR Preprints #55048: Exploring the Proficiency of ChatGPT 3.5, 4, and 4 with Vision in the Chilean Medical Licensing Exam: An Observational Study

Exploring the Proficiency of ChatGPT 3.5, 4, and 4 with Vision in the Chilean Medical Licensing Exam: An Observational Study

Marcos Rojas;
Marcelo Rojas;
Valentina Burgess;
Javier Toro;
Shima Salehi

ABSTRACT

Background:

The deployment of OpenAI's ChatGPT 3.5 and its subsequent versions, ChatGPT 4 and 4 with Vision (4V), has notably influenced the medical field. Demonstrating remarkable performance in medical exams globally, these models show potential for educational applications. However, their effectiveness in non-English contexts, particularly in Chile's Medical Licensing Exam, a critical step for medical practitioners in Chile, is less explored. This gap highlights the need to evaluate ChatGPT's adaptability to diverse linguistic and cultural scenarios.

Objective:

This study aims to evaluate the proficiency of ChatGPT versions 3.5, 4, and 4V in answering questions in the format of the EUNACOM (Examen Único Nacional de Conocimientos de Medicina), a major medical examination in Chile.

Methods:

Three official drills from the University of Chile, mirroring the EUNACOM exam structure and difficulty, were used to test ChatGPT versions 3.5, 4, and 4V. The three ChatGPT versions underwent three rounds of answering each drill. Responses to questions during each round were systematically categorized and analyzed to assess the accuracy rate of the responses.

Results:

All versions of ChatGPT passed the EUNACOM-style exams, with version 4 outperforming 3.5 and 4V. A detailed analysis revealed a higher accuracy rate in questions related to Surgery and Psychiatry for all versions, while performance dipped in Internal Medicine and Public Health. Version 4V did not perform better than the other two versions, despite having access to the figures accompanying the questions.

Conclusions:

The study reveals ChatGPT's ability to pass the EUNACOM test, with distinct proficiencies across versions 3.5, 4, and 4V. Notably, advancements in artificial intelligence (AI) do not significantly enhance image-based question performance. The variations in proficiency across medical fields suggest the need for more nuanced AI training. While AI shows promise in medical education, its limitations in depth and variability of expertise highlight the necessity for medical curricula to focus on critical thinking and reflective practices, ensuring the effective integration of AI in patient-centered care.

Citation

Please cite as:

Rojas M, Rojas M, Burgess V, Toro J, Salehi S

Exploring the Performance of ChatGPT Versions 3.5, 4, and 4 With Vision in the Chilean Medical Licensing Examination: Observational Study

JMIR Med Educ 2024;10:e55048

DOI: 10.2196/55048

PMID: 38686550

PMCID: 11082432

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: JMIR Medical Education

Date Submitted: Nov 30, 2023

Date Accepted: Mar 22, 2024
