Currently submitted to: JMIR Medical Education
Date Submitted: Jan 16, 2026
Open Peer Review Period: Jan 16, 2026 - Mar 13, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Large Language Models and Examination Performance in Healthcare Education: A Bibliometric Analysis
ABSTRACT
Background:
Large language models (LLMs) are increasingly used and evaluated in health professions education, including studies assessing model performance on healthcare examination questions. The rapid growth and heterogeneity of this literature make it difficult to track research concentration, collaboration patterns, and emerging themes.
Objective:
To map publication trends, key contributors, collaboration networks, and thematic hotspots in research on LLM-supported exam solving in healthcare education.
Methods:
We conducted a bibliometric analysis of publications from 2023–2025. Searches were performed in PubMed, Scopus, CINAHL Ultimate (EBSCOhost), and Web of Science using structured terms for AI/LLMs (eg, ChatGPT, generative AI, large language models) combined with healthcare education and training concepts. Eligible studies addressed AI-based technologies within healthcare education or training contexts; studies focused solely on clinical practice or non-educational applications were excluded. Bibliographic metadata from PubMed (TXT) and Scopus (BIB) were merged and analyzed using bibliometrix/Biblioshiny (R) and VOSviewer to quantify productivity, collaboration (including international co-authorship), and keyword co-occurrence patterns.
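The following is a minimal sketch of this merging and analysis workflow in R with the bibliometrix package; the file names are placeholders, and the exact convert2df options depend on how the records were exported (here assumed to be a PubMed TXT export and a Scopus BibTeX export).

# Sketch of the record-merging and analysis steps (R, bibliometrix); file names are illustrative
library(bibliometrix)

# Convert each database export into a bibliometrix data frame
pubmed_df <- convert2df("pubmed_export.txt", dbsource = "pubmed", format = "pubmed")
scopus_df <- convert2df("scopus_export.bib", dbsource = "scopus", format = "bibtex")

# Merge the two sources and remove duplicate records
M <- mergeDbSources(pubmed_df, scopus_df, remove.duplicated = TRUE)

# Descriptive indicators: productivity, authors, sources, citations
results <- biblioAnalysis(M, sep = ";")
summary(results, k = 10)

# Keyword co-occurrence network (comparable maps can also be built in VOSviewer)
NetMatrix <- biblioNetwork(M, analysis = "co-occurrences", network = "keywords", sep = ";")
networkPlot(NetMatrix, n = 30, Title = "Keyword co-occurrence", type = "fruchterman")

# The merged data frame M can also be explored interactively via biblioshiny()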
Results:
The dataset comprised 262 documents from 158 sources, with an annual publication growth rate of 36.58% and a mean document age of 1.83 years. A total of 1,351 authors contributed (mean 5.97 co-authors per document); internationally co-authored publications accounted for 13.36%. Most records were journal articles (253/262), followed by letters (8/262) and one conference paper. Annual output rose from 52 (2023) to 113 (2024; +117.3%), then decreased to 97 (2025; −14.2% vs 2024) while remaining above 2023 levels. JMIR Medical Education published the most articles on this topic (34/262), followed by Scientific Reports (9/262) and BMC Medical Education (7/262). Frequent keywords included “humans” (n=144), “artificial intelligence” (n=82), “generative AI” (n=30), and “large language models” (n=20); education-focused terms such as “educational measurement/methods” were also prominent (n=76).
Conclusions:
Research on LLMs and exam performance in healthcare education expanded rapidly from 2023 to 2025, with publication activity concentrated in a limited set of journals and relatively low international collaboration. Thematic patterns emphasize assessment-related outcomes and LLM/ChatGPT performance, underscoring the need for more comparable and transparent reporting (eg, prompts and model versions) and for education-centered outcomes beyond accuracy in future studies.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.