Maintenance Notice

Due to necessary scheduled maintenance, the JMIR Publications website will be unavailable from Wednesday, July 01, 2020 at 8:00 PM to 10:00 PM EST. We apologize in advance for any inconvenience this may cause you.

Who will be affected?

Accepted for/Published in: JMIR Medical Education

Date Submitted: Dec 2, 2024
Date Accepted: Jun 1, 2025

The final, peer-reviewed published version of this preprint can be found here:

Role of Artificial Intelligence in Surgical Training by Assessing GPT-4 and GPT-4o on the Japan Surgical Board Examination With Text-Only and Image-Accompanied Questions: Performance Evaluation Study

Maruyama H, Toyama Y, Takanami K, Takase K, Kamei T

Role of Artificial Intelligence in Surgical Training by Assessing GPT-4 and GPT-4o on the Japan Surgical Board Examination With Text-Only and Image-Accompanied Questions: Performance Evaluation Study

JMIR Med Educ 2025;11:e69313

DOI: 10.2196/69313

PMID: 40737609

PMCID: 12310146

Role of Artificial Intelligence in Surgical Training by Assessing GPT-4 and GPT-4o on the Japan Surgical Board Examination with Text-Only and Image-Accompanied Questions

  • Hiroki Maruyama; 
  • Yoshitaka Toyama; 
  • Kentaro Takanami; 
  • Kei Takase; 
  • Takashi Kamei

ABSTRACT

Background:

AI and large language models (LLMs), particularly GPT-4 and GPT-4o, have demonstrated high accuracy in medical examinations. GPT-4o, with its enhanced diagnostic capabilities through advanced image processing and updated knowledge, holds significant promise for medical education. Japanese surgeons face critical challenges, including a declining workforce, regional healthcare disparities, and issues related to work hours. They anticipate the potential utility of LLMs in surgical education. However, no studies to date have assessed GPT-4o’s surgical knowledge or its performance in the field of surgery.

Objective:

This study aimed to evaluate the potential of GPT-4 and GPT-4o in surgical education by using them to take the Japan Surgical Board Examination (JSBE), which includes both textual questions and medical images, such as surgical and computed tomography scans, to comprehensively assesses surgical knowledge.

Methods:

We used 297 multiple-choice questions from the 2021–2023 JSBEs. The questions were in Japanese and included 104 images. First, the responses of GPT-4 and GPT-4o to only the textual questions were collected via the OpenAI’s application programming interface to evaluate their accuracy. Subsequently, the accuracy of responses to questions that included images was assessed by inputting both the text and images.

Results:

The overall correct answer rates of GPT-4o and GPT-4 for the text-only questions were 78% (231 out of 297) and 55% (163 out of 297), respectively, with GPT-4o outperforming GPT-4 by 23% (p = 0.0001). In contrast, there was no significant improvement in correct answer rate for questions that included images compared with the results for the text-only questions.

Conclusions:

GPT-4o outperformed GPT-4 on the JSBE. Despite their capabilities, image recognition remains a challenge for these LLMs, and their clinical applications require caution owing to the potential inaccuracies of their results. Clinical Trial: None


 Citation

Please cite as:

Maruyama H, Toyama Y, Takanami K, Takase K, Kamei T

Role of Artificial Intelligence in Surgical Training by Assessing GPT-4 and GPT-4o on the Japan Surgical Board Examination With Text-Only and Image-Accompanied Questions: Performance Evaluation Study

JMIR Med Educ 2025;11:e69313

DOI: 10.2196/69313

PMID: 40737609

PMCID: 12310146

Download PDF


Request queued. Please wait while the file is being generated. It may take some time.

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on it's website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressively prohibit redistribution of this draft paper other than for review purposes.