Accepted for/Published in: JMIR Medical Education
Date Submitted: Nov 30, 2023
Date Accepted: Mar 22, 2024
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Exploring the Proficiency of ChatGPT 3.5, 4, and 4 with Vision in the Chile Medical Licensing Exam
ABSTRACT
Background:
The deployment of OpenAI's ChatGPT 3.5 and its subsequent versions, ChatGPT 4 and 4 with Vision (4V), has notably influenced the medical field. Demonstrating remarkable performance in medical exams globally, these models show potential for educational applications. However, their effectiveness in non-English contexts, particularly in Chile's Medical Licensing Exam, a critical step for medical practitioners in Chile, is less explored. This gap highlights the need to evaluate ChatGPT's adaptability to diverse linguistic and cultural scenarios.
Objective:
This study aims to evaluate the proficiency of ChatGPT versions 3.5, 4, and 4V in answering questions in the format of the EUNACOM (Examen Único Nacional de Conocimientos de Medicina), a major medical examination in Chile.
Methods:
Three official drills from the University of Chile, mirroring the EUNACOM exam structure and difficulty, were used to test ChatGPT versions 3.5, 4, and 4V. The three ChatGPT versions underwent three rounds of answering each drill. Responses to questions during each round were systematically categorized and analyzed to assess the accuracy rate of the responses.
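As a minimal sketch of how such an accuracy rate could be tallied, the snippet below groups graded responses by model version and divides correct answers by total answers. The record layout, the function name `accuracy_by_version`, and the sample values are all hypothetical and illustrative; they are not the study's data or the authors' actual analysis pipeline.

```python
from collections import defaultdict

# Hypothetical records: (model_version, drill, round, answered_correctly).
# These values are illustrative only and are NOT the study's data.
records = [
    ("3.5", 1, 1, True),
    ("3.5", 1, 1, False),
    ("4", 1, 1, True),
    ("4", 1, 1, True),
    ("4V", 1, 1, True),
    ("4V", 1, 1, False),
]


def accuracy_by_version(rows):
    """Return the fraction of correct answers for each model version."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for version, _drill, _round, is_correct in rows:
        total[version] += 1
        correct[version] += int(is_correct)
    return {version: correct[version] / total[version] for version in total}


rates = accuracy_by_version(records)
```

The same grouping could be keyed on drill, round, or medical specialty instead of version to compare performance along those dimensions.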
Results:
All versions of ChatGPT passed the EUNACOM-style exams, with version 4 outperforming 3.5 and 4V. A detailed analysis revealed a higher accuracy rate in questions related to Surgery and Psychiatry for all versions, while performance dipped in Internal Medicine and Public Health. Version 4V did not perform better than the other two versions, despite having access to the figures accompanying the questions.
Conclusions:
The study reveals ChatGPT's ability to pass the EUNACOM test, with distinct proficiencies across versions 3.5, 4, and 4V. Notably, advancements in artificial intelligence (AI) do not significantly enhance performance on image-based questions. The variations in proficiency across medical fields suggest the need for more nuanced AI training. While AI shows promise in medical education, its limitations in depth and variability of expertise highlight the necessity for medical curricula to focus on critical thinking and reflective practices, ensuring the effective integration of AI in patient-centered care.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.