Accepted for/Published in: JMIR Medical Education

Date Submitted: Apr 9, 2023
Open Peer Review Period: Apr 9, 2023 - Apr 24, 2023
Date Accepted: Sep 5, 2023

The final, peer-reviewed published version of this preprint can be found here:

García-Vicente A, Vizcarra-Jiménez SF, De la Cruz-Galán JP, Gutiérrez-Arratia J, Quiroga Torres BG, Flores-Cohaila JA

Performance of ChatGPT on the Peruvian National Licensing Medical Examination: Cross-Sectional Study

JMIR Med Educ 2023;9:e48039

DOI: 10.2196/48039

PMID: 37768724

PMCID: 10570896

Warning: This is an author submission that has not been peer-reviewed or edited. Preprints, unless they are marked as "accepted," should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

ChatGPT Performance on the Peruvian National Licensing Medical Examination: A Cross-sectional Study

  • Abigaíl García-Vicente
  • Sonia F. Vizcarra-Jiménez
  • Janith Paola De la Cruz-Galán
  • Jesús Gutiérrez-Arratia
  • Blanca Geraldine Quiroga Torres
  • Javier A. Flores-Cohaila

ABSTRACT

Background:

The Peruvian National Licensing Medical Examination (ENAM) is an important milestone for Peruvian medical doctors, yet its failure rate approaches 60%. Access to high-quality medical education in Peru is inequitable. Recent advances in artificial intelligence (AI) offer an opportunity to narrow this gap and improve Peruvian medical education.

Objective:

To evaluate the performance of ChatGPT (GPT-3.5) on the Peruvian National Licensing Medical Examination and to assess the usefulness of the explanation it provided for each correct answer.

Methods:

The dataset consisted of 180 multiple-choice questions from the 2022 Peruvian National Licensing Medical Examination (ENAM). ChatGPT's performance on these questions was evaluated with three prompt types: open-ended (OE), multiple-choice questions without justification (MCQ-NJ), and multiple-choice questions with justification (MCQ-J). Two independent raters evaluated the quality of the explanations. ChatGPT's performance was compared with the scores of 1025 Peruvian junior doctors who took the ENAM in 2023 as a progress test and with the historical mean score of Peruvian medical doctors from 2009 to 2019.

Results:

ChatGPT passed the ENAM with all three prompt types, achieving its highest accuracy with MCQ-J: 77% (139/180). This surpassed both the historical mean score of junior doctors (55%) and the mean score of the 2023 junior doctors (54%). Of the correct answers under MCQ-J, 64% (89/139) were accompanied by good-quality explanations.

Conclusions:

ChatGPT not only passed the ENAM but also outperformed the mean score of Peruvian examinees, raising concerns about the current state of Peruvian medical education. Its variable performance across prompt types underscores the need for further research on prompt engineering. Although ChatGPT provided good-quality explanations for 64% of its correct answers, its use as a medical education aid still requires a review process. We anticipate continued improvement in AI performance, which may eventually lower the barriers to accessing high-quality medical education.

Clinical Trial: None

© The authors. All rights reserved. This is a privileged document currently under peer review or community review (or an accepted or rejected manuscript). The authors have granted JMIR Publications an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. Although the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.