JMIR Preprints #66114: Large Language Models in Worldwide Medical Exams: Platform Development and Comprehensive Analysis

Large Language Models in Worldwide Medical Exams: Platform Development and Comprehensive Analysis

Bairong Shen;
Hui Zong;
Rongrong Wu;
Jiaxue Cha;
Jiao Wang;
Erman Wu;
Jiakun Li;
Yi Zhou;
Chi Zhang;
Weizhe Feng

ABSTRACT

Background:

The integration of large language models (LLMs) into medical education has demonstrated tremendous potential, with significant implications for learning and assessment.

Objective:

This study aims to present MedExamLLM, a comprehensive platform designed to systematically evaluate the performance of LLMs across a diverse range of medical exams conducted globally.

Methods:

We performed a systematic search of the PubMed database to identify relevant publications. Two researchers independently screened the candidate publications to ensure accuracy and reliability. We manually curated, standardized, and organized the data, including exam information, data processing information, model performance, data availability, and references. The web platform was developed using Streamlit, Bootstrap, and Apache ECharts.

Results:

MedExamLLM is an open-source, freely accessible, and publicly available online platform that provides comprehensive performance evaluation information and evidence on LLMs in medical exams around the world. It covers 16 large language models and 198 medical exams conducted across 28 countries in 15 languages from 2009 to 2023. The United States leads in the number of medical exams and publications, and English is the primary language used in these exams. The GPT series models, especially GPT-4, demonstrate superior performance compared with other models, achieving significantly higher pass rates. The analysis reveals significant variability in the capabilities of LLMs across different geographic and linguistic contexts.

Conclusions:

The MedExamLLM platform serves as a resource for educators, researchers, and developers in the fields of clinical medicine and artificial intelligence. By providing insights into the capabilities of LLMs in medical exams around the world, MedExamLLM not only contributes to the growing body of knowledge on LLMs in education but also supports the future integration of artificial intelligence technologies into medical education.

Citation

Please cite as:

Shen B, Zong H, Wu R, Cha J, Wang J, Wu E, Li J, Zhou Y, Zhang C, Feng W

Large Language Models in Worldwide Medical Exams: Platform Development and Comprehensive Analysis

J Med Internet Res 2024;26:e66114

DOI: 10.2196/66114

PMID: 39729356

PMCID: 11724220

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Sep 4, 2024

Date Accepted: Dec 10, 2024
