Accepted for/Published in: JMIR Medical Education
Date Submitted: Feb 17, 2024
Date Accepted: Oct 9, 2024
Performance of GPT-3.5 and GPT-4 in the Korean National Examination for Pharmacists: Comparison Study
ABSTRACT
Background:
ChatGPT, a recently developed artificial intelligence (AI) chatbot built on large language models (LLMs), has demonstrated improved performance on medical examinations. However, little research has examined its performance in languages other than English or on pharmacy-related examinations.
Objective:
This study aimed to evaluate the performance of GPT models on the Korean Pharmacist Licensing Examination (KPLE).
Methods:
We evaluated the percentage of correct answers provided by two versions of ChatGPT (GPT-3.5 and GPT-4) on all multiple-choice, single-answer questions, excluding image-based questions. In total, 320, 317, and 323 questions from the 2021, 2022, and 2023 KPLEs, respectively, were included in the final analysis. The questions covered four units: pharmaceutical life sciences, industrial pharmacy, clinical pharmacy practice, and pharmacy law.
Results:
GPT-4 consistently outperformed GPT-3.5 across all years and question categories. The three-year average percentage of correct answers was 86.5% (SD 0.7%) for GPT-4 and 60.7% (SD 1.6%) for GPT-3.5. The highest percentage of correct answers (97.4%) was achieved by GPT-4 on the Clinical Pharmacy Practice I questions in 2023, whereas the lowest (42.3%) was recorded for GPT-3.5 on the Clinical Pharmacy Practice II & Pharmacy Law unit in 2022. When compared with human participants, pharmacy students outperformed both models, with an average of 92.3% correct answers.
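The scoring arithmetic behind these figures can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, and it assumes that the three-year average is the unweighted mean of the yearly percentages and that the SD is the sample standard deviation (n - 1). The abstract does not state either choice. Only the per-year question counts come from the Methods; no correct-answer counts from the study are used.

```python
from statistics import mean, stdev

# Number of analysed (text-only, single-answer) questions per exam year, from the Methods.
QUESTIONS_PER_YEAR = {2021: 320, 2022: 317, 2023: 323}


def percent_correct(n_correct: int, n_questions: int) -> float:
    """Return the share of questions answered correctly, as a percentage."""
    if not 0 <= n_correct <= n_questions:
        raise ValueError("n_correct must lie between 0 and n_questions")
    return 100.0 * n_correct / n_questions


def three_year_summary(correct_by_year: dict[int, int]) -> tuple[float, float]:
    """Return (mean, SD) of the yearly percentages of correct answers.

    Assumption (not stated in the abstract): the average is the unweighted
    mean of the three yearly scores, and the SD is the sample SD (n - 1).
    """
    yearly = [
        percent_correct(correct_by_year[year], QUESTIONS_PER_YEAR[year])
        for year in sorted(QUESTIONS_PER_YEAR)
    ]
    return mean(yearly), stdev(yearly)
```

A pooled alternative would divide the total number of correct answers by all 960 questions. Because the yearly denominators differ only slightly, the two approaches would give very similar averages.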
Conclusions:
GPT models passed the KPLE in each of the three most recent years or performed very close to the passing threshold. This study demonstrates the potential of applying LLMs in the pharmacy domain; however, extensive research is needed to evaluate their reliability and ensure their safe application in pharmacy contexts. Addressing these issues could make GPT a more reliable tool for pharmacy education and for pharmacists in their daily practice.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.