Accepted for/Published in: JMIR Nursing

Date Submitted: Mar 20, 2023
Date Accepted: May 27, 2023

The final, peer-reviewed published version of this preprint can be found here:

Performance of the Large Language Model ChatGPT on the National Nurse Examinations in Japan: Evaluation Study

Taira K, Itaya T, Hanada A

JMIR Nursing 2023;6:e47305

DOI: 10.2196/47305

PMID: 37368470

PMCID: 10337249

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Performance of the ChatGPT on the National Nurse Examination In Japan: Large Language Model

  • Kazuya Taira
  • Takahiro Itaya
  • Ayame Hanada

ABSTRACT

Background:

ChatGPT (Chat Generative Pre-trained Transformer), a large language model, has shown good performance on physician certification exams and in medical consultations. However, its performance has not been examined in languages other than English or on nursing exams.

Objective:

We aimed to evaluate the performance of ChatGPT on the Japanese National Nurse Examinations.

Methods:

We calculated the percentage of correct answers given by ChatGPT (GPT-3.5) on all questions of the Japanese National Nurse Examinations from 2018 to 2022, excluding inappropriate questions and questions containing images. Each year's exam consists of 240 questions, divided into basic knowledge questions, which test fundamentals of particular importance to nurses, and general questions, which test a wide range of specialized knowledge. The questions also come in two formats: simple-choice and situation-setup. Simple-choice questions are primarily knowledge-based multiple-choice questions, whereas situation-setup questions require the candidate to read a description of a patient and family situation and then select the nurse's action or the patient's response. The questions were therefore standardized using two types of prompts before answers were requested from ChatGPT. Chi-square tests were used to compare the percentage of correct answers by exam format and by the specialty area of the question within each year's exam. In addition, a Cochran-Armitage trend test was performed on the percentage of correct answers from 2018 to 2022.
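As a rough illustration of the two tests named above, the following Python sketch computes a Pearson chi-square test for a 2×2 table (for example, correct vs. incorrect answers by question format) and a Cochran-Armitage trend test across exam years. It is not the authors' analysis code: the function names, the equally spaced year scores, and all counts are hypothetical placeholders, and the chi-square test is shown without a continuity correction, which is an assumption.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def cochran_armitage(correct, total, scores=None):
    """Two-sided Cochran-Armitage test for a linear trend in proportions.

    correct[i] and total[i] are the counts for ordered group i (e.g. exam year).
    Returns (z, p_value).
    """
    if scores is None:
        scores = list(range(len(total)))  # assumption: equally spaced years
    n = sum(total)
    p_bar = sum(correct) / n  # pooled proportion of correct answers
    t = sum(s * (c - m * p_bar) for s, c, m in zip(scores, correct, total))
    var = p_bar * (1 - p_bar) * (
        sum(m * s * s for s, m in zip(scores, total))
        - sum(m * s for s, m in zip(scores, total)) ** 2 / n
    )
    z = t / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts only; these are NOT the study's data.
    stat, p = chi_square_2x2(120, 40, 45, 25)
    print(f"chi-square = {stat:.3f}, p = {p:.4f}")

    correct_by_year = [160, 150, 148, 145, 140]  # placeholder values
    asked_by_year = [230, 228, 231, 229, 232]    # placeholder values
    z, p = cochran_armitage(correct_by_year, asked_by_year)
    print(f"trend z = {z:.3f}, p = {p:.4f}")
```

A negative z in this sketch would indicate a falling percentage of correct answers in later years. Comparisons across many specialty areas would need a chi-square test with more degrees of freedom than this 2×2 helper handles.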

Results:

The 5-year average percentage of correct answers for ChatGPT was 75.1% ± 3.0% for basic knowledge questions and 64.5% ± 5.0% for general questions. The highest percentages of correct answers were on the 2018 exam: 80% for basic knowledge questions and 71.2% for general questions. ChatGPT met the passing criteria for the 2018 Japanese National Nurse Examination and was close to passing the 2019–2022 exams, needing only a few more correct answers to pass. ChatGPT had lower percentages of correct answers in some areas, such as Pharmacology, Social welfare, Related Law and Regulations, Endocrinology/Metabolism, and Skin, and higher percentages in the areas of Nutrition, Pathology, Hematology, Eye, Ear Nose and Throat, Tooth and Oral, and Nursing Integration and Practice.

Conclusions:

ChatGPT passed only the 2018 Japanese National Nurse Examination. Although it did not pass the exams from other years, it performed very close to the passing level, including on questions about psychology, communication, and nursing-specific topics.

Clinical Trial: Not applicable.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.