Accepted for/Published in: JMIR Medical Education

Date Submitted: Feb 17, 2024
Date Accepted: Oct 9, 2024

The final, peer-reviewed published version of this preprint can be found here:

Jin HK, Kim E

Performance of GPT-3.5 and GPT-4 on the Korean Pharmacist Licensing Examination: Comparison Study

JMIR Med Educ 2024;10:e57451

DOI: 10.2196/57451

PMID: 39630413

PMCID: 11633516

Performance of GPT-3.5 and GPT-4 in the Korean National Examination for Pharmacists: Comparison Study

  • Hye Kyung Jin; 
  • EunYoung Kim

ABSTRACT

Background:

ChatGPT, a recently developed AI chatbot built on a large language model (LLM), has demonstrated strong performance on medical field examinations. However, there is currently little research on its efficacy in languages other than English or on pharmacy-related examinations.

Objective:

This study aimed to evaluate the performance of GPT models on the Korean Pharmacist Licensing Examination (KPLE).

Methods:

We evaluated the percentage of correct answers provided by two versions of ChatGPT (GPT-3.5 and GPT-4) for all multiple-choice single-answer exam questions, excluding image-based questions. In total, 320, 317, and 323 questions from the 2021, 2022, and 2023 KPLEs, respectively, were included in the final analysis. The exam consists of four units: pharmaceutical life sciences, industrial pharmacy, clinical pharmacy practice, and pharmacy law.

Results:

GPT-4 consistently outperformed GPT-3.5 across all years and question categories. The three-year average percentage of correct answers was 86.5% (SD 0.7%) for GPT-4 and 60.7% (SD 1.6%) for GPT-3.5. The highest percentage of correct answers (97.4%) was observed for clinical pharmacy practice I questions in 2023, whereas the lowest (42.3%) was recorded by GPT-3.5 on the clinical pharmacy practice II and pharmacy law unit in 2022. When AI performance was compared with that of human participants, pharmacy students outperformed both models, with an average of 92.3% correct answers.

Conclusions:

The GPT models either passed the KPLE in each of the three most recent years or scored very close to the passing threshold. This study demonstrates the potential of applying LLMs in the pharmacy domain; however, extensive research is needed to evaluate their reliability and ensure their secure application in pharmacy contexts. Addressing these limitations could make GPT a more reliable tool for pharmacy education and for pharmacists in their daily practice.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.