JMIR Preprints #82116: AI-Driven OSCE Generation in Digital Health Education: Comparative Analysis of Three GPT-4o Configurations


Zineb Zouakia;
Emmanuel Logak;
Alan Szymczak;
Jean-Philippe Jais;
Anita Burgun;
Rosy Tsopra

ABSTRACT

Background:

Objective Structured Clinical Examinations (OSCEs) are used as an evaluation method in medical education but require substantial pedagogical expertise and investment, especially in emerging fields such as digital health. Large Language Models (LLMs), such as ChatGPT, have shown potential for automating the generation of educational content. However, OSCE generation using LLMs remains underexplored.

Objective:

This study evaluates three GPT-4o configurations for generating OSCE stations in digital health: (1) Standard GPT with a simple prompt and OSCE guidelines; (2) Personalized GPT with a simple prompt, OSCE guidelines, and a reference book in digital health; and (3) Simulated-Agents GPT with a structured prompt simulating specialized OSCE agents and the digital health reference book.

Methods:

Twenty-four OSCE stations were generated across 8 digital health topics with each GPT-4o configuration. Format compliance was evaluated by one expert, while educational content was assessed independently by two digital health experts, blinded to the GPT-4o configuration, using a comprehensive assessment grid. Statistical analyses were performed using Kruskal-Wallis tests.

Results:

Simulated-Agents GPT performed best in format compliance and on most content quality criteria, including accuracy (mean 4.47/5, P=.012) and clarity (mean 4.46/5, P=.004). It also achieved an 88% rate of usability without major revisions and was ranked first in preference, outperforming the other configurations. Personalized GPT showed the lowest format compliance, while Standard GPT scored lowest for clarity and educational value.

Conclusions:

Structured prompting strategies, particularly agent simulation, enhance the reliability and usability of LLM-generated OSCE content. These findings offer practical guidance for integrating artificial intelligence into medical education, while highlighting the continued need for expert validation.

Citation

Please cite as:

Zouakia Z, Logak E, Szymczak A, Jais JP, Burgun A, Tsopra R. AI-Driven Objective Structured Clinical Examination Generation in Digital Health Education: Comparative Analysis of Three GPT-4o Configurations. JMIR Med Educ 2026;12:e82116. DOI: 10.2196/82116. PMID: 41539673. PMCID: 12856406.


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Medical Education

Date Submitted: Aug 10, 2025

Date Accepted: Dec 13, 2025
