
Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Sep 4, 2024
Date Accepted: Dec 10, 2024

The final, peer-reviewed published version of this preprint can be found here:

Large Language Models in Worldwide Medical Exams: Platform Development and Comprehensive Analysis

Shen B, Zong H, Wu R, Cha J, Wang J, Wu E, Li J, Zhou Y, Zhang C, Feng W

J Med Internet Res 2024;26:e66114

DOI: 10.2196/66114

PMID: 39729356

PMCID: 11724220

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Large Language Models in Worldwide Medical Exams: Platform Development and Comprehensive Analysis

  • Bairong Shen; 
  • Hui Zong; 
  • Rongrong Wu; 
  • Jiaxue Cha; 
  • Jiao Wang; 
  • Erman Wu; 
  • Jiakun Li; 
  • Yi Zhou; 
  • Chi Zhang; 
  • Weizhe Feng

ABSTRACT

Background:

The integration of large language models (LLMs) into medical education has demonstrated tremendous potential, with significant implications for learning and assessment.

Objective:

This study aims to present MedExamLLM, a comprehensive platform designed to systematically evaluate the performance of LLMs across a diverse range of medical exams conducted globally.

Methods:

We performed a systematic search of the PubMed database to identify relevant publications. Candidate publications were screened independently by two researchers to ensure accuracy and reliability. We manually curated, standardized, and organized the data, including exam information, data processing details, model performance, data availability, and references. The web platform was developed using Streamlit, Bootstrap, and Apache ECharts.
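
To illustrate how curated exam performance records might be served through Streamlit, a minimal sketch follows. The file name exam_records.csv and its column names (country, model, exam, accuracy) are hypothetical placeholders, not the platform's actual schema or source code.

    # Minimal Streamlit sketch for browsing curated LLM exam results.
    # "exam_records.csv" and its columns are illustrative placeholders.
    import pandas as pd
    import streamlit as st

    st.title("MedExamLLM: LLM performance on medical exams")

    # Load the curated exam records (placeholder file and columns).
    records = pd.read_csv("exam_records.csv")

    # Sidebar filters mirroring a browse-by-country/model view.
    country = st.sidebar.selectbox("Country", sorted(records["country"].unique()))
    model = st.sidebar.selectbox("Model", sorted(records["model"].unique()))

    subset = records[(records["country"] == country) & (records["model"] == model)]

    st.dataframe(subset)                                # exam-level details
    st.bar_chart(subset.set_index("exam")["accuracy"])  # accuracy per exam

Charting here uses Streamlit's built-in bar chart for brevity; the production platform, per the Methods, renders its visualizations with Apache ECharts.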

Results:

MedExamLLM is an open-source, freely accessible, and publicly available online platform that provides comprehensive performance evaluation information and evidence-based knowledge about LLMs on medical exams around the world. MedExamLLM comprises information on 16 large language models evaluated on 198 medical exams conducted across 28 countries in 15 languages from 2009 to 2023. The United States leads in the number of medical exams and publications, and English is the primary language used in these exams. The GPT series models, especially GPT-4, demonstrate superior performance compared with other models, achieving significantly higher pass rates. The analysis reveals significant variability in the capabilities of LLMs across geographic and linguistic contexts.

Conclusions:

The MedExamLLM platform serves as a valuable resource for educators, researchers, and developers in the fields of clinical medicine and artificial intelligence. By providing insights into the capabilities of LLMs on medical exams around the world, MedExamLLM not only contributes to the growing body of knowledge on LLMs in education but also supports the future integration of artificial intelligence technologies into medical education.


Citation

Please cite as:

Shen B, Zong H, Wu R, Cha J, Wang J, Wu E, Li J, Zhou Y, Zhang C, Feng W

Large Language Models in Worldwide Medical Exams: Platform Development and Comprehensive Analysis

J Med Internet Res 2024;26:e66114

DOI: 10.2196/66114

PMID: 39729356

PMCID: 11724220


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.