Accepted for/Published in: JMIR Medical Education

Date Submitted: Jan 28, 2024
Date Accepted: Dec 17, 2024

The final, peer-reviewed published version of this preprint can be found here:

Performance of ChatGPT-3.5 and ChatGPT-4 in the Taiwan National Pharmacist Licensing Examination: Comparative Evaluation Study

Wang YM, Shen HW, Chen TJ, Chiang SC, Lin TG

JMIR Med Educ 2025;11:e56850

DOI: 10.2196/56850

PMID: 39864950

PMCID: 11769692

Performance of ChatGPT-3.5 and ChatGPT-4 in the Taiwan National Pharmacist Licensing Examination: A Comparative Evaluation Study

  • Ying-Mei Wang
  • Hung-Wei Shen
  • Tzeng-Ji Chen
  • Shu-Chiung Chiang
  • Ting-Guan Lin

ABSTRACT

Background:

OpenAI released ChatGPT-3.5 (Chat Generative Pre-trained Transformer) and GPT-4 between 2022 and 2023. GPT-3.5 has demonstrated proficiency in various examinations, particularly the United States Medical Licensing Examination. GPT-4, however, has more advanced capabilities.

Objective:

This study aims to examine the efficacy of GPT-3.5 and GPT-4 in the Taiwan National Pharmacist Licensing Examination and to ascertain their utility and potential in clinical pharmacy and education.

Methods:

The pharmacist examination in Taiwan consists of 2 stages: basic subjects and clinical subjects. In this study, exam questions were entered manually into the GPT-3.5 and GPT-4 models, and their responses were recorded; graphic-based questions were excluded. The analysis comprised the following: 1) determining the answering accuracy of GPT-3.5 and GPT-4, 2) categorizing question types and observing differences in model performance across these categories, and 3) comparing model performance in computational and situational questions. Microsoft Excel and R software were used for the statistical analyses.
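As an illustration of the three analysis steps, a minimal Python sketch follows. The study itself used Microsoft Excel and R; the record layout, the function names `accuracy_by_category` and `mcnemar_exact`, and the choice of an exact McNemar test for paired answers are assumptions made here for illustration only. The sample records are invented and are not study data.

```python
from collections import defaultdict
from math import comb


def accuracy_by_category(records):
    """Return {(category, model): accuracy} from (category, model, is_correct) records.

    The record layout is a hypothetical one chosen for this sketch.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for category, model, is_correct in records:
        totals[(category, model)] += 1
        hits[(category, model)] += int(is_correct)
    return {key: hits[key] / totals[key] for key in totals}


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value for paired answers.

    b: questions only the first model answered correctly.
    c: questions only the second model answered correctly.
    Assumption: the source does not state which test was used.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Invented toy records, for illustration only (not study data).
toy_records = [
    ("basic", "GPT-3.5", False), ("basic", "GPT-4", True),
    ("basic", "GPT-3.5", True), ("basic", "GPT-4", True),
    ("clinical", "GPT-3.5", True), ("clinical", "GPT-4", True),
    ("clinical", "GPT-3.5", False), ("clinical", "GPT-4", False),
]
toy_accuracy = accuracy_by_category(toy_records)
```

Because both models answer the same questions, a paired test on the discordant questions is one reasonable choice; an unpaired comparison of proportions would be an alternative under different assumptions.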

Results:

GPT-4 achieved an accuracy rate of 72.9%, exceeding GPT-3.5’s 59.1%. In basic subjects, GPT-4 significantly outperformed GPT-3.5 (73.4% vs 53.2%). In clinical subjects, however, only minor differences in accuracy were observed. GPT-4 also outperformed GPT-3.5 in computational and situational questions.

Conclusions:

GPT-4 clearly outperformed GPT-3.5 in the national pharmacist examination, especially in basic subjects. GPT-4 thus has considerable potential in clinical pharmacy and education settings; for example, it could provide drug interaction information and aid in consultations. Future research could integrate ChatGPT with medical databases, offering a platform for medical decision-making and thereby enhancing the quality of medical care.


Per the author's request the PDF is not available.

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.