JMIR Preprints #76586: Physical Examination Identification in Medical Education Videos: Zero-shot Multimodal AI with Temporal Sequence Optimization

Physical Examination Identification in Medical Education Videos: Zero-shot Multimodal AI with Temporal Sequence Optimization

Shinyoung Kang;
Michael Holcomb;
David Hein;
Ameer Hamza Shakur;
Thomas O. Dalton;
Andrew Jamieson

ABSTRACT

Background:

Objective Structured Clinical Examinations (OSCEs) are widely used to assess medical student competency, but their evaluation is resource-intensive, requiring trained evaluators to review 15-minute videos. The physical examination component typically constitutes only a small portion of these recordings, yet current automated approaches struggle to process long medical videos because of computational constraints and the difficulty of maintaining temporal context.

Objective:

To determine whether multimodal large language models (MM-LLMs) can effectively segment physical examination periods within OSCE videos without prior training, potentially reducing the evaluation burden on both human graders and automated assessment systems.

Methods:

We analyzed 500 videos from five OSCE stations at UT Southwestern Simulation Center, each 15 minutes long, using hand-labeled physical examination periods as ground truth. MM-LLMs processed video frames at one frame per second, classifying them into discrete activity states. A hidden Markov model with Viterbi decoding ensured temporal consistency across segments, addressing the inherent challenges of frame-by-frame classification.
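
The temporal smoothing step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `viterbi_smooth`, the two-state setup, and the parameter values `p_stay` and `p_correct` are assumptions chosen for illustration, and the per-frame labels stand in for whatever the MM-LLM returns at one frame per second.

```python
import math

def viterbi_smooth(frame_labels, n_states, p_stay=0.95, p_correct=0.8):
    """Return the most likely hidden-state path for noisy per-frame labels.

    Assumptions (not from the paper): "sticky" transitions that stay in the
    same state with probability p_stay, and a symmetric emission model in
    which the classifier reports the true state with probability p_correct.
    Requires n_states >= 2.
    """
    if not frame_labels:
        return []
    p_switch = (1.0 - p_stay) / (n_states - 1)
    p_wrong = (1.0 - p_correct) / (n_states - 1)
    log_trans = [[math.log(p_stay if i == j else p_switch)
                  for j in range(n_states)] for i in range(n_states)]
    log_emit = [[math.log(p_correct if s == o else p_wrong)
                 for o in range(n_states)] for s in range(n_states)]

    # Initialise with a uniform prior over states.
    score = [math.log(1.0 / n_states) + log_emit[s][frame_labels[0]]
             for s in range(n_states)]
    backptr = []
    for obs in frame_labels[1:]:
        new_score, ptr = [], []
        for s in range(n_states):
            # Best predecessor for state s at this frame.
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            ptr.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_emit[s][obs])
        score = new_score
        backptr.append(ptr)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical labels: 0 = other activity, 1 = physical examination.
noisy = [0] * 5 + [1] + [0] * 5 + [1] * 10 + [0] + [1] * 10 + [0] * 5
smoothed = viterbi_smooth(noisy, n_states=2)
```

In this toy input, the two isolated misclassified frames are absorbed into their surrounding segments, leaving a single contiguous examination period. In practice the transition and emission probabilities would be estimated from a small labeled set rather than fixed by hand.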

Results:

Using this combined approach of zero-shot visual classification with Viterbi decoding, GPT-4o achieved 99.8% recall and 78.3% intersection over union (IoU), effectively capturing physical examinations with an average duration of 175 seconds from 900-second videos, an 81% reduction in frames requiring review.
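
For readers unfamiliar with the reported metrics, the sketch below shows one common way to compute recall and IoU from per-frame binary labels, and reproduces the frame-reduction arithmetic from the figures above. It assumes frame-level evaluation; the abstract does not state exactly how the metrics were computed, and the function name `recall_and_iou` and the example arrays are hypothetical.

```python
def recall_and_iou(truth, pred):
    """Frame-level recall and IoU for binary labels (1 = physical exam).

    truth and pred are equal-length sequences with one flag per frame.
    """
    tp = sum(1 for t, p in zip(truth, pred) if t and p)      # correctly kept
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)  # exam frames missed
    fp = sum(1 for t, p in zip(truth, pred) if p and not t)  # extra frames kept
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    union = tp + fp + fn
    iou = tp / union if union else 0.0
    return recall, iou

# Hypothetical example: the prediction starts two frames early.
truth = [0] * 10 + [1] * 10 + [0] * 10
pred = [0] * 8 + [1] * 12 + [0] * 10
recall, iou = recall_and_iou(truth, pred)

# Frame reduction implied by the reported durations:
# 175 retained seconds out of 900, at one frame per second.
reduction = 1 - 175 / 900
```

A prediction that over-covers the examination, as in this example, keeps recall at 1.0 while lowering IoU, which is consistent with the pattern of very high recall and lower IoU reported above.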

Conclusions:

Integrating zero-shot multimodal large language models with minimal-supervision temporal modeling effectively segments physical examination periods in OSCE videos without requiring extensive training data. This approach significantly reduces review time while maintaining clinical assessment integrity, demonstrating that AI methods combining zero-shot capabilities and light supervision can be optimized for medical education's specific requirements. The technique establishes a foundation for more efficient and scalable clinical skills assessment across diverse medical education settings.

Citation

Please cite as:

Kang S, Holcomb M, Hein D, Shakur AH, Dalton TO, Jamieson A

Physical Examination Identification in Medical Education Videos: Zero-Shot Multimodal AI With Temporal Sequence Optimization Study

JMIR AI 2025;4:e76586

DOI: 10.2196/76586

PMID: 41411647

PMCID: 12757708

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: JMIR AI

Date Submitted: Apr 30, 2025

Open Peer Review Period: May 21, 2025 - Jul 16, 2025

Date Accepted: Oct 30, 2025
