
Currently submitted to: JMIR Research Protocols

Date Submitted: Jan 18, 2026
Open Peer Review Period: Jan 19, 2026 - Mar 16, 2026 (currently open for review)

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Large Language Models in German Continuing Medical Education Assessment: Fully Crossed Experimental Study Protocol

  • Leyla Özmen; 
  • Timur Sellmann; 
  • Christian Burisch; 
  • Daniel Gödde; 
  • Frank Breuckmann; 
  • Jan Ehlers

ABSTRACT

Background:

Continuing Medical Education (CME) is a legal and ethical obligation for physicians in Germany. The rapid rise of large language models (LLMs) such as ChatGPT, Gemini, Claude, and Grok raises concerns about the integrity of CME assessments, as LLMs can already pass German CME tests.

Objective:

To determine whether document format (searchable PDF, raster PDF, or vector PDF) and the choice of LLM influence whether LLMs can solve CME test questions above the passing threshold specified for each CME module (typically 70%).

Methods:

In a fully crossed, within-subjects, repeated-measures design, 18 expired CME articles from three major German publishers across six specialties will be converted into three PDF formats and processed by four current LLMs (ChatGPT-5, Mistral 3.1 small, Claude Sonnet 4, Grok-4) and two predecessor versions (ChatGPT-4o and Grok-3). Each model will answer every article once per file-format condition, yielding 18 experimental conditions (6 models × 3 formats). The primary outcome is the proportion of correctly answered questions; secondary outcomes are pass/fail rate and efficiency. The study has been approved by the University of Witten/Herdecke Ethics Committee (reference number S-260/2025, dated October 8, 2025) and is preregistered at the Open Science Framework (DOI: 10.17605/OSF.IO/V96R5).
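The factorial structure above can be sketched as follows. This is an illustrative enumeration only (article labels are placeholders, not the study's actual identifiers): crossing 6 models with 3 file formats gives the 18 experimental conditions, and running all 18 articles through every condition gives 324 model responses.

```python
from itertools import product

# Models and formats as named in the protocol; article labels are hypothetical.
models = ["ChatGPT-5", "Mistral 3.1 small", "Claude Sonnet 4",
          "Grok-4", "ChatGPT-4o", "Grok-3"]
formats = ["searchable PDF", "raster PDF", "vector PDF"]
articles = [f"article_{i:02d}" for i in range(1, 19)]  # 18 expired CME articles

# Fully crossed design: every model sees every format (18 conditions) ...
conditions = list(product(models, formats))
# ... and every article is answered once per condition (324 runs total).
runs = list(product(articles, models, formats))

print(len(conditions))  # 18
print(len(runs))        # 324
```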

Results:

Data collection will start in January 2026 and will last approximately 4 weeks. As of December 2025, the study has been preregistered, and no results are available yet. The analyses will quantify performance differences across document formats and model generations; these findings may inform the feasibility of non-searchable document formats as a temporary measure to reduce AI-enabled cheating risks in CME contexts.

Conclusions:

By quantifying how document format constrains LLM performance, this study aims to evaluate simple technical safeguards that may reduce AI-assisted manipulation of CME tests and inform regulators and CME providers on balancing assessment validity, accessibility, and responsible LLM integration into postgraduate medical education. Clinical Trial: Open Science Framework DOI: 10.17605/OSF.IO/V96R5.


Citation

Please cite as:

Özmen L, Sellmann T, Burisch C, Gödde D, Breuckmann F, Ehlers J

Large Language Models in German Continuing Medical Education Assessment: Fully Crossed Experimental Study Protocol

JMIR Preprints. 18/01/2026:91675

DOI: 10.2196/preprints.91675

URL: https://preprints.jmir.org/preprint/91675


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft other than for review purposes.