JMIR Preprints #48039: ChatGPT Performance on the Peruvian National Licensing Medical Examination: A Cross-sectional Study



Abigaíl García-Vicente;
Sonia F. Vizcarra-Jiménez;
Janith Paola De la Cruz-Galán;
Jesús Gutiérrez-Arratia;
Blanca Geraldine Quiroga Torres;
Javier A. Flores-Cohaila

ABSTRACT

Background:

The Peruvian National Licensing Medical Examination (ENAM) is an important milestone for Peruvian medical doctors. However, the failure rate is nearly 60%, and access to high-quality medical education is inequitable. The recent growth of artificial intelligence (AI) offers an opportunity to close this gap and improve Peruvian medical education.

Objective:

To evaluate the performance of ChatGPT 3.5 on the Peruvian National Licensing Medical Examination and assess the usefulness of the explanations provided for each correct answer.

Methods:

The dataset consisted of 180 multiple-choice questions from the 2022 Peruvian National Licensing Medical Examination (ENAM). ChatGPT performance on these questions was evaluated with three prompts: open-ended (OE), multiple-choice questions without justification (MCQ-NJ), and multiple-choice questions with justification (MCQ-J). Two independent raters evaluated the quality of the explanations. The performance of ChatGPT was compared with the scores of 1025 Peruvian junior doctors who took the ENAM in 2023 as a progress test and with the historical mean of Peruvian medical doctors from 2009-2019.

Results:

ChatGPT passed the ENAM with all three prompts, with the highest accuracy on MCQ-J, where it scored 77% (139/180). This surpassed both the historical mean score of Peruvian examinees (55%) and the mean score of the 2023 junior doctors (54%). Among correct answers on MCQ-J, 64% (89/139) provided explanations of good quality.
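The percentages reported above follow directly from the raw counts. As a minimal sketch (the helper name `percent` and the variable names are illustrative and do not come from the study), the arithmetic can be checked as follows:

```python
from fractions import Fraction


def percent(numerator: int, denominator: int) -> int:
    """Return numerator/denominator as a percentage rounded to the nearest integer."""
    return round(100 * Fraction(numerator, denominator))


# Counts taken from the Results paragraph.
mcq_j_accuracy = percent(139, 180)     # correct answers on the MCQ-J prompt
good_explanations = percent(89, 139)   # good-quality explanations among correct answers

# Reference means reported in the abstract, in percent.
historical_mean = 55
junior_doctors_2023_mean = 54

# Margins in percentage points, computed from the rounded figures.
margin_over_historical = mcq_j_accuracy - historical_mean
margin_over_2023 = mcq_j_accuracy - junior_doctors_2023_mean

print(mcq_j_accuracy, good_explanations)
print(margin_over_historical, margin_over_2023)
```

Exact fractions are used here so that rounding is applied only once, at the final step; the margins are differences between rounded percentages and should be read as approximate.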

Conclusions:

ChatGPT not only passed the ENAM but also outperformed the mean score of Peruvian examinees, raising concerns about the current state of Peruvian medical education. The variable performance across prompts emphasizes the need for further research on prompt engineering. Although ChatGPT provided good-quality explanations in 64% of correct answers, its use as an aid in medical education still requires a review process. We anticipate continuous improvement in AI performance, potentially lowering the barrier to access to high-quality medical education in the future.

Clinical Trial: None

Citation

Please cite as:

García-Vicente A, Vizcarra-Jiménez SF, De la Cruz-Galán JP, Gutiérrez-Arratia J, Quiroga Torres BG, Flores-Cohaila JA

Performance of ChatGPT on the Peruvian National Licensing Medical Examination: Cross-Sectional Study

JMIR Med Educ 2023;9:e48039

DOI: 10.2196/48039

PMID: 37768724

PMCID: 10570896

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: JMIR Medical Education

Date Submitted: Apr 9, 2023

Open Peer Review Period: Apr 9, 2023 - Apr 24, 2023

Date Accepted: Sep 5, 2023
