Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Sep 4, 2024
Date Accepted: Dec 10, 2024
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Large Language Models in Worldwide Medical Exams: Platform Development and Comprehensive Analysis
ABSTRACT
Background:
The integration of large language models (LLMs) into medical education has demonstrated tremendous potential, with significant implications for learning and assessment.
Objective:
This study aims to present MedExamLLM, a comprehensive platform designed to systematically evaluate the performance of LLMs across a diverse range of medical exams conducted globally.
Methods:
We performed a systematic search of the PubMed database to identify relevant publications. Two researchers independently screened the candidate publications to ensure accuracy and reliability. We manually curated, standardized, and organized the data, including exam information, data processing information, model performance, data availability, and references. The web platform was developed using Streamlit, Bootstrap, and Apache ECharts.
Results:
MedExamLLM is an open-source, freely accessible, and publicly available online platform that provides comprehensive performance evaluation information and evidence on LLMs in medical exams around the world. MedExamLLM comprises information on 16 LLMs across 198 medical exams conducted in 28 countries and 15 languages from 2009 to 2023. The United States leads in the number of medical exams and publications, and English is the primary language used in these exams. The GPT series models, especially GPT-4, demonstrate superior performance compared to other models, achieving significantly higher pass rates. The analysis reveals significant variability in the capabilities of LLMs across different geographic and linguistic contexts.
Conclusions:
The MedExamLLM platform serves as a valuable resource for educators, researchers, and developers in the fields of clinical medicine and artificial intelligence. By providing insights into the capabilities of LLMs in medical exams around the world, MedExamLLM not only contributes to the growing body of knowledge on LLMs in education but also supports the future integration of artificial intelligence technologies into medical education.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.