Accepted for/Published in: JMIR Medical Education

Date Submitted: Jul 18, 2023
Date Accepted: Dec 11, 2023

The final, peer-reviewed published version of this preprint can be found here:

Meyer A, Riese J, Streichert T

Comparison of the Performance of GPT-3.5 and GPT-4 With That of Medical Students on the Written German Medical Licensing Examination: Observational Study

JMIR Med Educ 2024;10:e50965

DOI: 10.2196/50965

PMID: 38329802

PMCID: 10884900

GPT-4 Outperforms GPT-3.5 and Ranks Among the Top 8% of Medical Students: An Observational Study of Original German Medical Licensing Exam Questions

  • Annika Meyer
  • Janik Riese
  • Thomas Streichert

ABSTRACT

Background:

The potential of artificial intelligence, such as GPT, has gained significant attention in the medical field. This enthusiasm is driven not only by recent breakthroughs and improved accessibility, but also by the prospect of democratizing medical knowledge and promoting equitable healthcare.

Objective:

However, the performance of ChatGPT is substantially influenced by the input language. Given the growing public trust in this artificial intelligence compared with traditional sources of information, investigating its medical accuracy across different languages is of particular importance.

Methods:

To assess GPT-3.5's and GPT-4's medical proficiency, we used 937 original multiple-choice questions from three written German medical licensing exams in October 2021, April 2022, and October 2022.
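
The abstract does not specify the exact prompting setup. As a rough, hypothetical sketch only (assuming the OpenAI Python client; the model name, system prompt, and answer-extraction rule are invented for illustration and are not the authors' protocol), one way to pose such a multiple-choice question to a chat model and record its chosen option is:

```python
# Hypothetical sketch, not the authors' code: pose one exam-style
# multiple-choice question to a chat model and read back the chosen letter.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def ask_mcq(question: str, options: dict[str, str], model: str = "gpt-4") -> str:
    """Return the option letter the model selects for one question."""
    prompt = question + "\n" + "\n".join(f"{k}) {v}" for k, v in options.items())
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # keep answers as reproducible as possible for grading
        messages=[
            {"role": "system",
             "content": "Answer with the letter of the single best option."},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content.strip()[0].upper()
```

Exam accuracy is then simply the share of the 937 questions for which the returned letter matches the official answer key.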

Results:

GPT-4 achieved an average score of 85% and ranked in the 92.8th, 99.5th, and 92.6th percentiles among medical students who took the same exams in October 2021, April 2022, and October 2022, respectively. This represents a substantial improvement of 27% compared to GPT-3.5, which passed only one of the three exams. While GPT-3.5 performed well on psychiatric questions, GPT-4 showed strengths in internal medicine and surgery but weaknesses in academic research.
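
For context, a percentile rank like those above is a standard calculation: the share of student scores falling below the model's score (counting ties as half). A minimal sketch with invented numbers (not the study's data):

```python
# Illustration only, with fictitious scores: percentile rank of a model's
# exam score within a cohort of student scores, counting ties as half.
def percentile_rank(model_score: float, student_scores: list[float]) -> float:
    below = sum(s < model_score for s in student_scores)
    ties = sum(s == model_score for s in student_scores)
    return 100.0 * (below + 0.5 * ties) / len(student_scores)

students = [55, 62, 70, 71, 74, 78, 80, 83, 86, 90]  # fictitious cohort
print(percentile_rank(85, students))  # -> 80.0
```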

Conclusions:

The study results highlight ChatGPT's remarkable improvement from moderate (GPT-3.5) to high competency (GPT-4) in answering medical licensing examination questions in German. While its predecessor was imprecise and inconsistent, GPT-4 demonstrates considerable potential to improve medical education and patient care, provided that medically trained users critically evaluate its output. As artificial intelligence may increasingly replace search engines in the future, further studies using questions posed by laypersons are needed to assess the safety and accuracy of ChatGPT for the general population. Clinical Trial: None needed.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.