Accepted for/Published in: JMIR Medical Education

Date Submitted: Jan 28, 2024
Date Accepted: Aug 15, 2024

The final, peer-reviewed published version of this preprint can be found here:

Performance of ChatGPT in the In-Training Examination for Anesthesiology and Pain Medicine Residents in South Korea: Observational Study


Performance of ChatGPT in the In-Training Examination for Anesthesiology and Pain Medicine Residents in South Korea: Observational Study

  • Soo-Hyuk Yoon; 
  • Seok Kyeong Oh; 
  • Byung Gun Lim; 
  • Ho-Jin Lee

ABSTRACT

Background:

Since ChatGPT's release, it has been tested in health care, including on the United States Medical Licensing Examination and specialty board examinations, with near-passing results. Its performance on English-language anesthesiology board examination questions has also been assessed, but its effectiveness in Korean remains unexplored.

Objective:

This study investigated the problem-solving performance of ChatGPT in the fields of anesthesiology and pain medicine in a Korean language context, highlighted advancements in artificial intelligence (AI), and explored its potential applications in medical education.

Methods:

We evaluated the performance of GPT-4, GPT-3.5, and CLOVA X in solving problems in the fields of anesthesiology and pain medicine, using the in-training examinations administered to Korean anesthesiology residents over the past 5 years (100 questions per year). Questions containing images, diagrams, or photographs were excluded from the analysis. Furthermore, to assess performance differences across languages, we compared GPT-4's problem-solving accuracy on the original Korean questions and their English translations.

Results:

A total of 398 questions were analyzed. GPT-4 (67.8%) achieved a significantly higher overall accuracy rate than GPT-3.5 (37.2%) and CLOVA X (36.7%), whereas GPT-3.5 and CLOVA X did not differ significantly from each other. Additionally, GPT-4 performed better on questions translated into English, indicating a language processing discrepancy (English 75.4% vs Korean 67.8%; difference 7.5%; 95% CI 3.1%-11.9%; P=.001).
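Because the same 398 questions were answered in both languages, the English-Korean comparison is a paired one, for which a McNemar-style analysis is a natural choice. The sketch below illustrates how such a comparison can be computed; the discordant counts `b` and `c` are hypothetical, since the abstract reports only aggregate accuracies, not the per-question contingency table.

```python
from math import comb, sqrt

# Paired comparison of accuracy on the same questions answered in two
# languages (McNemar-style analysis). The discordant counts are hypothetical:
# the abstract reports only aggregate accuracies (75.4% vs 67.8%, n = 398),
# not the per-question contingency table.
n = 398   # total questions analyzed
b = 55    # hypothetical: correct in English, wrong in Korean
c = 25    # hypothetical: wrong in English, correct in Korean

diff = (b - c) / n                         # difference in accuracy rates
se = sqrt(b + c - (b - c) ** 2 / n) / n    # SE of the paired difference
ci = (diff - 1.96 * se, diff + 1.96 * se)  # approximate 95% CI

# Exact two-sided McNemar test: binomial test on the discordant pairs
m = b + c
p_value = min(1.0, 2 * sum(comb(m, k) for k in range(b, m + 1)) * 0.5 ** m)

print(f"difference = {diff:.1%}, "
      f"95% CI ({ci[0]:.1%}, {ci[1]:.1%}), P = {p_value:.4f}")
```

Note that a paired analysis typically yields a narrower confidence interval than an unpaired two-proportion comparison would, because each question serves as its own control.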

Conclusions:

This study underscores the potential of AI tools such as ChatGPT in medical education and practice but emphasizes the need for cautious application and further refinement, especially in non-English medical contexts. The findings suggest that, although AI advancements are promising, they require careful evaluation and development to ensure accuracy and reliability across diverse linguistic and professional settings.

Clinical Trial: Not applicable


 Citation

Please cite as:

Yoon SH, Oh SK, Lim BG, Lee HJ

Performance of ChatGPT in the In-Training Examination for Anesthesiology and Pain Medicine Residents in South Korea: Observational Study

JMIR Med Educ 2024;10:e56859

DOI: 10.2196/56859

PMID: 39284182

PMCID: 11443200


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.