Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Aug 7, 2025
Open Peer Review Period: Aug 11, 2025 - Oct 6, 2025
Date Accepted: Dec 29, 2025
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Multi-Evidence Clinical Reasoning with Retrieval-Augmented Generation (MECR-RAG) for Emergency Triage: Retrospective Evaluation Study
ABSTRACT
Background:
Emergency triage accuracy is vital yet varies significantly due to factors such as clinical experience, cognitive load, and symptom complexity. Inaccuracies can lead to critical consequences, including preventable morbidity, mortality, or resource misallocation. Large language models (LLMs) have shown potential in clinical decision-making but risk generating inaccurate outputs. Retrieval-augmented generation (RAG) systems dynamically retrieve and incorporate external authoritative information to enhance LLM reliability. Previous studies of LLMs in emergency triage have typically relied on structured datasets or textbook-derived inputs, or have lacked independently adjudicated ground truth, limiting their external validity.
Objective:
To evaluate whether a dual-source RAG system integrating procedural and experiential clinical knowledge improves the accuracy and consistency of emergency triage classification compared to baseline LLMs.
Methods:
We developed and evaluated a novel dual-source RAG architecture—Multi-Evidence Clinical Reasoning RAG (MECR-RAG)—that combines the Hong Kong Accident and Emergency Triage Guidelines (HKAETG) with a structured database of 3,000 real-world triage cases from 2024. The system, implemented using DeepSeek-V3, was retrospectively assessed on 236 real clinical triage records sampled across a calendar year. Gold-standard labels were assigned through blinded consensus by senior triage nurses. Model performance was benchmarked against a prompt-only LLM baseline and evaluated using quadratic weighted kappa (QWK), accuracy, and triage group–specific classification metrics including precision, recall, and F1 score.
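The abstract does not specify the retrieval mechanics, but the dual-source design it describes — pulling evidence from both the guideline corpus and the historical case database and merging it into the model's context — can be sketched in minimal form. Everything below is an illustrative assumption: the toy lexical similarity function, the `build_context` name, and the section labels are not from the paper, which would in practice use embedding-based retrieval.

```python
def score(query, doc):
    """Toy lexical similarity: fraction of query tokens that appear in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def build_context(query, guideline_passages, past_cases, k=2):
    """Retrieve top-k entries from each evidence source and merge into one prompt context."""
    top_guidelines = sorted(guideline_passages, key=lambda d: score(query, d), reverse=True)[:k]
    top_cases = sorted(past_cases, key=lambda d: score(query, d), reverse=True)[:k]
    return ("GUIDELINE EVIDENCE:\n" + "\n".join(top_guidelines)
            + "\n\nSIMILAR PAST CASES:\n" + "\n".join(top_cases))
```

The key design point mirrored here is that procedural knowledge (guidelines) and experiential knowledge (past cases) are retrieved independently, so a strong match in one source cannot crowd the other out of the context window.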
Results:
MECR-RAG achieved a mean QWK of 0.902 (95% CI: 0.901–0.904) and mean accuracy of 0.802 (95% CI: 0.795–0.808), significantly outperforming the baseline LLM (QWK = 0.801; accuracy = 0.542; P<.001). Its agreement was non-inferior to professional raters (QWK = 0.887). The MECR-RAG system achieved an overall F1 score of 0.860, reduced overtriage from 28.8% to 12.7%, and slightly lowered undertriage from 1.7% to 1.3%. The greatest performance gains were observed in Categories 3 and 4, which are the most diagnostically ambiguous and operationally impactful tiers.
Conclusions:
MECR-RAG demonstrates expert-comparable triage accuracy by integrating guideline knowledge with case-based reasoning. This study is the first to evaluate a dual-source RAG-enhanced LLM on real triage documentation with expert consensus labels, offering a methodologically rigorous and clinically grounded approach to decision support in emergency medicine.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.