Accepted for/Published in: JMIR Medical Education

Date Submitted: Aug 13, 2025
Date Accepted: Nov 9, 2025

The final, peer-reviewed published version of this preprint can be found here:

Preiksaitis C, Hughes J, Kabeer R, Dixon W, Rose C

Quantifying Emergency Medicine Residency Learning Curves Using Natural Language Processing: Retrospective Cohort Study

JMIR Med Educ 2025;11:e82326

DOI: 10.2196/82326

PMID: 41364786

PMCID: 12688050

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Quantifying Emergency Medicine Residency Learning Curves Using Natural Language Processing: A Retrospective Cohort Study

  • Carl Preiksaitis
  • Joshua Hughes
  • Rana Kabeer
  • William Dixon
  • Christian Rose

ABSTRACT

Background:

The optimal duration of emergency medicine (EM) residency training remains a subject of national debate, with the Accreditation Council for Graduate Medical Education considering standardizing all programs to four years. However, empirical data on how residents accumulate clinical exposure over time are limited. Traditional measures, such as case logs and diagnostic codes, often fail to capture the breadth and depth of diagnostic reasoning. Natural language processing (NLP) of clinical documentation offers a novel approach to quantify clinical experiences more comprehensively.

Objective:

This study aimed to: (1) quantify how EM residents acquire clinical topic exposure over the course of training; (2) evaluate variation in exposure patterns across residents and classes; and (3) assess changes in workload and case complexity over time to inform the discussion on optimal program length.

Methods:

We conducted a retrospective cohort study of EM residents at Stanford Hospital, analyzing 244,255 emergency department encounters from July 1, 2016, to November 30, 2023. The sample included 62 residents across four graduating classes (2020–2023) and covered all encounters at the primary training site in which residents served as primary or supervisory providers. Using a retrieval-augmented generation NLP pipeline, we mapped resident clinical documentation to the 895 subcategories of the 2022 Model for Clinical Practice of Emergency Medicine (MCPEM) via intermediate mapping to the SNOMED CT CORE Problem List Subset. We generated cumulative topic exposure curves, quantified the diversity of topic coverage, assessed variability between residents, and analyzed progression in clinical complexity using Emergency Severity Index (ESI) scores and admission rates.
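
As an illustration of what a cumulative topic exposure curve is, the following minimal Python sketch counts how many distinct topics one resident has seen by the end of each training month. It is not the authors' pipeline: the function name `cumulative_topic_exposure`, the `(month, topics)` input shape, and the topic labels are hypothetical, and the NLP mapping step is assumed to have already produced topic labels for each encounter. Only the denominator of 895 MCPEM subcategories comes from the abstract.

```python
from collections import defaultdict

def cumulative_topic_exposure(encounters, total_topics=895):
    """Build a cumulative exposure curve for one resident.

    encounters: iterable of (month_index, topic_ids) pairs (hypothetical shape),
                where topic_ids is a set of already-mapped topic labels.
    total_topics: number of MCPEM subcategories (895 per the abstract).
    Returns a list of (month, cumulative_unique_topics, fraction_of_total).
    """
    # Group the topics seen in each month.
    by_month = defaultdict(set)
    for month, topics in encounters:
        by_month[month].update(topics)

    # Walk the months in order, accumulating topics seen so far.
    seen = set()
    curve = []
    for month in sorted(by_month):
        seen |= by_month[month]
        curve.append((month, len(seen), len(seen) / total_topics))
    return curve

# Toy input with made-up topic labels, for illustration only.
example = [
    (1, {"chest_pain", "sepsis"}),
    (1, {"sepsis"}),
    (2, {"stroke", "chest_pain"}),
    (4, {"sepsis"}),
]
curve = cumulative_topic_exposure(example)
for month, count, fraction in curve:
    print(month, count, f"{fraction:.4f}")
```

In this sketch a plateau shows up as consecutive months in which the cumulative count does not change, as in month 4 of the toy input.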

Results:

Residents encountered the largest increase in new topics during postgraduate year 1 (PGY1), averaging 376.7 unique topics (42.1% of MCPEM subcategories). By PGY4, they averaged 565.9 topics (63.2% of MCPEM), representing a 9.9% increase over PGY3. Exposure plateaus generally occurred at 39–41 months, though substantial individual variation was observed, with some residents continuing to acquire new topics until graduation. Annual case volume more than tripled from PGY1 (mean 445.7 encounters) to PGY4 (mean 1,528.4 encounters). Case complexity increased, as evidenced by a decrease in mean ESI score from 2.94 to 2.79 and a rise in high-acuity (ESI 1–2) cases from 16.0% to 30.9%.
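
The coverage percentages and the volume increase reported above follow directly from the stated means and the 895 MCPEM subcategories. The short Python check below reproduces that arithmetic; the helper name `coverage_pct` is hypothetical, and all input figures are taken from the abstract.

```python
MCPEM_SUBCATEGORIES = 895  # from the Methods section of the abstract

def coverage_pct(mean_topics, total=MCPEM_SUBCATEGORIES):
    """Mean unique topics as a percentage of MCPEM subcategories, to 1 decimal."""
    return round(100 * mean_topics / total, 1)

pgy1_cov = coverage_pct(376.7)   # mean unique topics by end of PGY1
pgy4_cov = coverage_pct(565.9)   # mean unique topics by PGY4
volume_ratio = 1528.4 / 445.7    # PGY4 vs PGY1 mean annual encounters

print(pgy1_cov, pgy4_cov, round(volume_ratio, 2))
```

The two coverage values match the reported 42.1% and 63.2%, and the volume ratio of about 3.4 is consistent with "more than tripled".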

Conclusions:

NLP analysis of clinical documentation provides a scalable, detailed method for tracking EM resident clinical exposure and progression. Many residents continue to gain new experiences into their fourth year, particularly with higher-acuity cases. These findings suggest that a four-year training model may offer meaningful additional educational value, while also highlighting the importance of individualized assessment given variability in learning trajectories.



© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.