Accepted for/Published in: JMIR Medical Education

Date Submitted: Apr 7, 2023
Open Peer Review Period: Apr 7, 2023 - Apr 24, 2023
Date Accepted: Jun 14, 2023

The final, peer-reviewed published version of this preprint can be found here:

Takagi S, Watari T, Erabi A, Sakaguchi K

Performance of GPT-3.5 and GPT-4 on the Japanese Medical Licensing Examination: Comparison Study

JMIR Med Educ 2023;9:e48002

DOI: 10.2196/48002

PMID: 37384388

PMCID: 10365615

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Comparison of Performances of ChatGPT and GPT-4 in the Japanese National Medical Examination

  • Soshi Takagi; 
  • Takashi Watari; 
  • Ayano Erabi; 
  • Kota Sakaguchi

ABSTRACT

Background:

ChatGPT’s competence in non-English languages is not well studied.

Objective:

This study compares the performance of ChatGPT and GPT-4 on the Japanese Medical Licensing Examination (JMLE) to evaluate the reliability of these models for clinical reasoning and medical knowledge in non-English languages.

Methods:

The study used the default mode of ChatGPT, based on GPT-3.5; the GPT-4 model of ChatGPT Plus; and the 117th JMLE (2022). A total of 254 questions were included in the final analysis and were categorized into three types: general, clinical, and clinical sentence questions.

Results:

GPT-4 outperformed ChatGPT in accuracy, particularly on general, clinical, and clinical sentence questions. GPT-4 also performed better on difficult questions and on questions about specific diseases. Furthermore, GPT-4 met the passing criteria for the JMLE, indicating its reliability for clinical reasoning and medical knowledge in non-English languages.

Conclusions:

GPT-4 could become a valuable tool for medical education and clinical support in non-English-speaking regions, such as Japan.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final, peer-reviewed paper may be licensed under a CC BY license upon publication, at this stage the authors and publisher expressly prohibit redistribution of this draft other than for review purposes.