Previously submitted to: JMIR Medical Education (no longer under consideration since Mar 30, 2026)
Date Submitted: Jan 16, 2026
Open Peer Review Period: Jan 16, 2026 - Mar 13, 2026
NOTE: This is an unreviewed preprint. Readers are warned that the document has not been peer-reviewed by expert/patient reviewers or an academic editor, may contain misleading claims, and is likely to undergo changes before final publication, if accepted, or may have been rejected/withdrawn (a note "no longer under consideration" will appear above).
Large Language Models and Examination Performance in Healthcare Education: A Bibliometric Analysis
ABSTRACT
Background:
Large language models (LLMs) are increasingly used and evaluated in health professions education, including studies assessing model performance on healthcare examination questions. The rapid growth and heterogeneity of this literature make it difficult to track research concentration, collaboration patterns, and emerging themes.
Objective:
To map publication trends, key contributors, collaboration networks, and thematic hotspots in research on LLM-supported exam solving in healthcare education.
Methods:
We conducted a bibliometric analysis of publications from 2023–2025. Searches were performed in PubMed, Scopus, CINAHL Ultimate (EBSCOhost), and Web of Science using structured terms for AI/LLMs (eg, ChatGPT, generative AI, large language models) combined with healthcare education and training concepts. Eligible studies addressed AI-based technologies within healthcare education or training contexts; studies focused solely on clinical practice or non-educational applications were excluded. Bibliographic metadata from PubMed (TXT) and Scopus (BIB) were merged and analyzed using bibliometrix/Biblioshiny (R) and VOSviewer to quantify productivity, collaboration (including international co-authorship), and keyword co-occurrence patterns.
Results:
The dataset comprised 262 documents from 158 sources, with an annual publication growth rate of 36.58% and a mean document age of 1.83 years. A total of 1,351 authors contributed (mean 5.97 co-authors per document); international co-authored publications accounted for 13.36%. Most records were journal articles (253/262), followed by letters (8/262) and one conference paper. Annual output rose from 52 (2023) to 113 (2024; +117.3%), then decreased to 97 (2025; −14.2% vs 2024) while remaining above 2023 levels. JMIR Medical Education published the most articles on this topic (34/262), followed by Scientific Reports (9/262) and BMC Medical Education (7/262). Frequent keywords included “humans” (n=144), “artificial intelligence” (n=82), “generative AI” (n=30), and “large language models” (n=20); education-focused terms such as “educational measurement/methods” were also prominent (n=76).
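The reported growth figures follow directly from the annual counts. A minimal Python check, assuming the annual growth rate was computed with the compound formula (last/first)^(1/(n−1)) − 1 that bibliometrix uses for this statistic:

```python
# Annual document counts taken from the Results (2023-2025).
counts = {2023: 52, 2024: 113, 2025: 97}

# Year-over-year percentage change.
yoy_2024 = (counts[2024] - counts[2023]) / counts[2023] * 100  # +117.3%
yoy_2025 = (counts[2025] - counts[2024]) / counts[2024] * 100  # -14.2%

# Compound annual growth rate over n observed years: (last/first)^(1/(n-1)) - 1.
# (Assumed to be the formula behind the reported 36.58% annual growth rate.)
n = len(counts)
cagr = (counts[2025] / counts[2023]) ** (1 / (n - 1)) - 1

print(round(yoy_2024, 1), round(yoy_2025, 1), round(cagr * 100, 2))
```

All three rounded values match the figures reported above, which suggests the abstract's percentages are internally consistent with its raw annual counts.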
Conclusions:
Research on LLMs and exam performance in healthcare education expanded rapidly from 2023 to 2025, with publication activity concentrated in a limited set of journals and relatively low international collaboration. Thematic patterns emphasize assessment-related outcomes and LLM/ChatGPT performance, supporting the need for more comparable, transparent reporting (eg, prompts and model versions) and education-centered outcomes beyond accuracy in future studies.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.