Accepted for/Published in: JMIR Medical Education
Date Submitted: Aug 13, 2025
Date Accepted: Nov 9, 2025
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Quantifying Emergency Medicine Residency Learning Curves Using Natural Language Processing: A Retrospective Cohort Study
ABSTRACT
Background:
The optimal duration of emergency medicine (EM) residency training remains a subject of national debate, with the Accreditation Council for Graduate Medical Education considering standardizing all programs to four years. However, empirical data on how residents accumulate clinical exposure over time are limited. Traditional measures, such as case logs and diagnostic codes, often fail to capture the breadth and depth of diagnostic reasoning. Natural language processing (NLP) of clinical documentation offers a novel approach to quantify clinical experiences more comprehensively.
Objective:
This study aimed to: (1) quantify how EM residents acquire clinical topic exposure over the course of training; (2) evaluate variation in exposure patterns across residents and classes; and (3) assess changes in workload and case complexity over time to inform the discussion on optimal program length.
Methods:
We conducted a retrospective cohort study of EM residents at Stanford Hospital, analyzing 244,255 emergency department encounters from July 1, 2016, to November 30, 2023. The sample included 62 residents across four graduating classes (2020–2023) and comprised all encounters at the primary training site in which residents served as the primary or supervising provider. Using a retrieval-augmented generation NLP pipeline, we mapped resident clinical documentation to the 895 subcategories of the 2022 Model for Clinical Practice of Emergency Medicine (MCPEM) via an intermediate mapping to the SNOMED CT CORE Problem List Subset. We generated cumulative topic exposure curves, quantified the diversity of topic coverage, assessed between-resident variability, and analyzed progression in clinical complexity using Emergency Severity Index (ESI) scores and admission rates.
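The cumulative topic exposure curve described here can be illustrated with a minimal sketch: for each resident, encounters are ordered by date and the running count of unique MCPEM subcategories is tracked. The field names and toy data below are illustrative assumptions, not the study's actual pipeline or dataset.

```python
# Hypothetical sketch of a cumulative topic-exposure curve.
# Topic strings, dates, and the function name are illustrative only.
from datetime import date

TOTAL_MCPEM_SUBCATEGORIES = 895  # subcategory count in the 2022 MCPEM


def cumulative_exposure(encounters):
    """Given (encounter_date, mcpem_topic) pairs, return a date-ordered list of
    (date, cumulative_unique_topic_count, percent_of_mcpem_covered)."""
    seen = set()
    curve = []
    for when, topic in sorted(encounters):
        seen.add(topic)  # repeats do not increase the unique-topic count
        curve.append((when, len(seen), 100.0 * len(seen) / TOTAL_MCPEM_SUBCATEGORIES))
    return curve


# Toy example: three encounters, two unique topics (one repeat)
toy = [
    (date(2020, 7, 1), "chest pain"),
    (date(2020, 7, 2), "ankle fracture"),
    (date(2020, 7, 3), "chest pain"),  # repeat topic: count stays at 2
]
curve = cumulative_exposure(toy)
```

A plateau in such a curve corresponds to long stretches of encounters that add no new subcategories, which is how the 39–41-month plateau reported in the Results would appear.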
Results:
Residents encountered the largest increase in new topics during postgraduate year 1 (PGY1), averaging 376.7 unique topics (42.1% of MCPEM subcategories). By PGY4, they averaged 565.9 topics (63.2% of MCPEM), representing a 9.9% increase over PGY3. Exposure plateaus generally occurred at 39–41 months, though substantial individual variation was observed, with some residents continuing to acquire new topics until graduation. Annual case volume more than tripled from PGY1 (mean 445.7 encounters) to PGY4 (mean 1,528.4 encounters). Case complexity increased, as evidenced by a decrease in mean ESI score from 2.94 to 2.79 and a rise in high-acuity (ESI 1–2) cases from 16.0% to 30.9%.
Conclusions:
NLP analysis of clinical documentation provides a scalable, detailed method for tracking EM resident clinical exposure and progression. Many residents continue to gain new experiences into their fourth year, particularly with higher-acuity cases. These findings suggest that a four-year training model may offer meaningful additional educational value, while also highlighting the importance of individualized assessment given variability in learning trajectories.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have granted JMIR Publications an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be published under a CC-BY license, at this stage the authors and publisher expressly prohibit redistribution of this draft other than for review purposes.