Accepted for/Published in: JMIR Medical Education

Date Submitted: Feb 17, 2024
Date Accepted: Oct 9, 2024

The final, peer-reviewed published version of this preprint can be found here:

Jin HK, Kim E

Performance of GPT-3.5 and GPT-4 on the Korean Pharmacist Licensing Examination: Comparison Study

JMIR Med Educ 2024;10:e57451

DOI: 10.2196/57451

PMID: 39630413

PMCID: 11633516

Performance of GPT-3.5 and GPT-4 in the Korean National Examination for Pharmacists: Comparison Study

  • Hye Kyung Jin; 
  • EunYoung Kim

ABSTRACT

Background:

ChatGPT, a recently developed AI chatbot built on a large language model (LLM), has demonstrated strong performance on medical field examinations. However, there is currently little research on its efficacy in languages other than English or on pharmacy-related examinations.

Objective:

This study aimed to evaluate the performance of GPT models on the Korean Pharmacist Licensing Examination (KPLE).

Methods:

We evaluated the percentage of correct answers provided by two versions of ChatGPT (GPT-3.5 and GPT-4) for all multiple-choice single-answer exam questions, excluding image-based questions. In total, 320, 317, and 323 questions from the 2021, 2022, and 2023 KPLEs, respectively, were included in the final analysis. The exam consists of four units: pharmaceutical life sciences, industrial pharmacy, clinical pharmacy practice, and pharmacy law.

Results:

GPT-4 consistently outperformed GPT-3.5 across all years and question categories. The three-year average percentage of correct answers was 86.5% (SD 0.7%) for GPT-4 and 60.7% (SD 1.6%) for GPT-3.5. The highest percentage of correct answers (97.4%) was observed for clinical pharmacy practice I questions in 2023, whereas the lowest (42.3%) was recorded by GPT-3.5 on the clinical pharmacy practice II and pharmacy law unit in 2022. When AI performance was compared with that of human participants, pharmacy students outperformed both models, with an average of 92.3% correct answers.

Conclusions:

The GPT models either passed the KPLE in each of the three most recent years or scored very close to the passing threshold. This study demonstrates the potential of applying LLMs in the pharmacy domain; however, extensive research is needed to evaluate their reliability and ensure their secure application in pharmacy contexts. Addressing these limitations could make GPT a more reliable tool for pharmacy education and for pharmacists in their daily practice.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.