
Accepted for/Published in: JMIR Medical Education

Date Submitted: Nov 30, 2023
Date Accepted: Mar 22, 2024

The final, peer-reviewed published version of this preprint can be found here:

Exploring the Performance of ChatGPT Versions 3.5, 4, and 4 With Vision in the Chilean Medical Licensing Examination: Observational Study

Rojas M, Rojas M, Burgess V, Toro J, Salehi S

JMIR Med Educ 2024;10:e55048

DOI: 10.2196/55048

PMID: 38686550

PMCID: 11082432

Exploring the Proficiency of ChatGPT 3.5, 4, and 4 with Vision in the Chilean Medical Licensing Exam: An Observational Study

  • Marcos Rojas; 
  • Marcelo Rojas; 
  • Valentina Burgess; 
  • Javier Toro; 
  • Shima Salehi

ABSTRACT

Background:

The release of OpenAI's ChatGPT 3.5 and its successors, ChatGPT 4 and ChatGPT 4 with Vision (4V), has notably influenced the medical field. These models have performed remarkably well on medical examinations worldwide and show potential for educational applications. However, their effectiveness in non-English contexts is less well explored, particularly on Chile's Medical Licensing Examination, a critical step for physicians seeking to practice in Chile. This gap highlights the need to evaluate how well ChatGPT adapts to diverse linguistic and cultural settings.

Objective:

This study aims to evaluate the proficiency of ChatGPT versions 3.5, 4, and 4V in answering questions in the format of the EUNACOM (Examen Único Nacional de Conocimientos de Medicina), a major medical examination in Chile.

Methods:

Three official drills from the University of Chile, mirroring the structure and difficulty of the EUNACOM exam, were used to test ChatGPT versions 3.5, 4, and 4V. Each version answered each drill in three separate rounds. The responses from every round were systematically categorized and analyzed to determine their accuracy rate.

Results:

All versions of ChatGPT passed the EUNACOM-style exams, with version 4 outperforming versions 3.5 and 4V. A detailed analysis showed that all versions achieved higher accuracy on questions related to Surgery and Psychiatry, while performance dropped in Internal Medicine and Public Health. Version 4V did not perform better than the other two versions, despite having access to the figures included in the questions.

Conclusions:

The study shows that ChatGPT can pass the EUNACOM test, with distinct levels of proficiency across versions 3.5, 4, and 4V. Notably, these advancements in artificial intelligence (AI) did not significantly improve performance on image-based questions. The variation in proficiency across medical fields suggests the need for more nuanced AI training. While AI shows promise in medical education, its limited depth and uneven expertise highlight the need for medical curricula to emphasize critical thinking and reflective practice, ensuring that AI is integrated effectively into patient-centered care.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.