Currently accepted at: JMIR Medical Informatics
Date Submitted: Sep 11, 2025
Open Peer Review Period: Sep 25, 2025 - Nov 20, 2025
Date Accepted: Dec 22, 2025
This paper has been accepted and is currently in production.
It will appear shortly at DOI 10.2196/83318.
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Development and Comparative Evaluation of Three Artificial Intelligence Models (NLP, LLM, JEPA) for Predicting Triage in Emergency Departments: A 7-Month Retrospective Proof-of-Concept
ABSTRACT
Triage errors, including undertriage and overtriage, are persistent challenges in emergency departments (EDs). With increasing patient influx and staff shortages, the integration of artificial intelligence (AI) into triage protocols has gained attention. This study compares the performance of three AI models, based on Natural Language Processing (NLP), a Large Language Model (LLM), and a Joint Embedding Predictive Architecture (JEPA), in predicting triage outcomes against the FRENCH scale and clinical practice. We conducted a retrospective analysis of a prospectively recruited cohort of adult patients, using triage data collected over a 7-month period at the Roger Salengro Hospital ED (Lille, France). Three AI models were trained and validated: (1) TRIAGEMASTER (NLP), (2) URGENTIAPARSE (LLM), and (3) EMERGINET (JEPA). Data included demographic details, verbatim chief complaints, vital signs, and triage outcomes based on the FRENCH scale and GEMSA coding. The primary outcome was the concordance of the AI-predicted triage level with the FRENCH gold standard, assessed using several metrics: F1-score, weighted kappa, Spearman correlation, mean absolute error (MAE), root mean square error (RMSE), and area under the receiver operating characteristic curve (AUC-ROC). The LLM model (URGENTIAPARSE) achieved the highest composite score (2.514), ahead of JEPA (EMERGINET, 0.438) and NLP (TRIAGEMASTER, -3.511), and outperformed nurse triage (-4.343). This observation is reinforced by the F1-score and AUC-ROC, which were respectively 0.900 and 0.879 for URGENTIAPARSE, 0.731 and 0.686 for EMERGINET, 0.618 and 0.642 for TRIAGEMASTER, and 0.303 and 0.776 for nurse triage. Secondary analyses highlighted the effectiveness of URGENTIAPARSE in predicting hospitalization needs (GEMSA) and its robustness with structured data versus raw transcripts, for both GEMSA and FRENCH prediction. The LLM architecture, through abstraction of patient representations, offered the most accurate triage predictions among the tested models.
Integrating AI into ED workflows could enhance patient safety and operational efficiency, but doing so requires addressing model limitations and ensuring ethical transparency.
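As a purely illustrative aside, three of the agreement metrics named in the abstract (MAE, RMSE, and weighted kappa) can be sketched in a few lines of Python. This is a minimal sketch and not the authors' evaluation code: the triage levels are hypothetical integer codes from 1 to 5, the toy labels are invented for the example, and the quadratic weighting is an assumption, since the abstract does not state which weighting scheme was used. The composite score is not defined in the abstract and is not reproduced here.

```python
from math import sqrt


def mae(reference, predicted):
    """Mean absolute error between two lists of ordinal triage levels."""
    return sum(abs(r - p) for r, p in zip(reference, predicted)) / len(reference)


def rmse(reference, predicted):
    """Root mean square error between two lists of ordinal triage levels."""
    return sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)) / len(reference))


def weighted_kappa(reference, predicted, levels, power=2):
    """Cohen's weighted kappa (power=1: linear weights, power=2: quadratic).

    Assumes at least two levels and that the ratings are not all
    concentrated in a single level (otherwise the denominator is zero).
    """
    k = len(levels)
    index = {level: i for i, level in enumerate(levels)}
    n = len(reference)

    # Observed joint distribution of (reference, predicted) levels.
    observed = [[0.0] * k for _ in range(k)]
    for r, p in zip(reference, predicted):
        observed[index[r]][index[p]] += 1 / n

    # Marginal distributions, used for the chance-expected disagreement.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = (abs(i - j) / (k - 1)) ** power
            observed_disagreement += weight * observed[i][j]
            expected_disagreement += weight * row[i] * col[j]
    return 1.0 - observed_disagreement / expected_disagreement


# Hypothetical toy labels (not study data): gold-standard vs. model output.
levels = [1, 2, 3, 4, 5]
reference = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
predicted = [1, 2, 3, 3, 3, 4, 4, 4, 5, 4]

print("MAE:", mae(reference, predicted))
print("RMSE:", rmse(reference, predicted))
print("Weighted kappa:", weighted_kappa(reference, predicted, levels))
```

Weighted kappa penalizes a prediction more the further it falls from the reference level, which is why it suits ordinal triage scales better than unweighted agreement.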
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.