Accepted for/Published in: JMIR Formative Research
Date Submitted: Apr 9, 2023
Date Accepted: Oct 3, 2023
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Can ChatGPT answer medical questions of the Japanese National Medical Examination: Comparison of accuracy of ChatGPT-3.5 and ChatGPT-4
ABSTRACT
ChatGPT (OpenAI, San Francisco, California, USA) has gained considerable attention because of its natural and intuitive responses. One limitation of ChatGPT is that its reinforcement learning is not based on reliable information, so it can provide inaccurate or meaningless answers. The March 2023 update introduced GPT-4, which, according to OpenAI's internal evaluations, is expected to be 40% more likely to produce factual responses than its predecessor, GPT-3.5. We verified the accuracy of ChatGPT based on GPT-4 (ChatGPT4) and based on GPT-3.5 (ChatGPT3.5) by having both answer questions from the Japanese National Medical Examination. We excluded questions containing figures or tables, which ChatGPT does not support. Of the 400 questions, 292 were analyzed. The correct response rate for ChatGPT4 was 81.5%, which was significantly higher than the 42.8% rate for ChatGPT3.5. Moreover, ChatGPT4 surpassed the passing standard (>72%) for the Japanese National Medical Examination, indicating its potential as a diagnostic and therapeutic decision aid for physicians. We anticipate that future updates of ChatGPT will further enhance its accuracy, making it an invaluable resource in the field of medicine.
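The arithmetic behind the reported rates can be checked with a short sketch. The abstract gives only percentages and the number of analyzed questions, so the counts of correct answers below are inferred by rounding and are an assumption, not figures reported by the authors. The function names are hypothetical, and the sketch does not reproduce the authors' significance test, which the abstract does not name.

```python
# Figures stated in the abstract.
TOTAL_ANALYZED = 292          # questions remaining after exclusions
PASSING_STANDARD = 0.72       # abstract gives the passing standard as >72%
REPORTED_RATES = {"ChatGPT4": 0.815, "ChatGPT3.5": 0.428}


def implied_correct(rate: float, total: int) -> int:
    """Nearest whole number of correct answers consistent with a reported rate.

    This is an inference from the rounded percentage, not a reported count.
    """
    return round(rate * total)


def summarize(rates: dict, total: int, threshold: float) -> dict:
    """Recompute each model's rate from the implied count and compare it to the threshold."""
    rows = {}
    for model, rate in rates.items():
        correct = implied_correct(rate, total)
        recomputed = correct / total
        rows[model] = {
            "correct": correct,
            "rate_pct": round(recomputed * 100, 1),
            "passes": recomputed > threshold,
        }
    return rows


summary = summarize(REPORTED_RATES, TOTAL_ANALYZED, PASSING_STANDARD)
for model, row in summary.items():
    print(model, row)
```

Under this rounding assumption, the implied counts reproduce the reported percentages to one decimal place, and only ChatGPT4 clears the 72% threshold.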
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.