Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Jan 22, 2025
Open Peer Review Period: Jan 22, 2025 - Feb 6, 2025
Date Accepted: May 1, 2025

The final, peer-reviewed published version of this preprint can be found here:

Patient Triage and Guidance in Emergency Departments Using Large Language Models: Multimetric Study

Wang C, Wang F, Li S, Ren Qw, Tan X, Fu Y, Liu D, Qian G, Cao Y, Yin R, Li K

J Med Internet Res 2025;27:e71613

DOI: 10.2196/71613

PMID: 40374171

PMCID: 12123234

Patient triage and guidance in emergency departments using Large Language Models: Multimetric Assessment

  • Chenxu Wang
  • Fei Wang
  • Shuhan Li
  • Qing-wen Ren
  • Xiaomei Tan
  • Yaoyu Fu
  • Di Liu
  • Guangwu Qian
  • Yu Cao
  • Rong Yin
  • Kang Li

ABSTRACT

Background:

Emergency departments (EDs) face significant challenges due to overcrowding, prolonged waiting times, and staffing shortages, leading to increased strain on healthcare systems. Efficient triage systems and accurate departmental guidance are critical to alleviating these pressures. Recent advancements in Large Language Models (LLMs), such as ChatGPT, offer potential solutions for improving patient triage and outpatient department selection in emergency settings.

Objective:

The study aims to assess the accuracy, consistency, and feasibility of GPT-4-based ChatGPT models (GPT-4o and GPT-4-Turbo) for patient triage using the Modified Early Warning Score (MEWS) and to evaluate GPT-4o’s ability to provide accurate outpatient department guidance based on simulated patient scenarios.

Methods:

A two-phase experimental study was conducted. In phase one, two ChatGPT models (GPT-4o and GPT-4-Turbo) were evaluated for MEWS-based patient triage accuracy using 1,854 simulated patient scenarios. Accuracy and consistency were assessed before and after prompt engineering. In phase two, GPT-4o was tested for outpatient department selection accuracy using 264 scenarios sourced from the Chinese Medical Case Repository. Each scenario was independently evaluated by GPT-4o three times. Data analyses included Wilcoxon tests, Kendall correlation coefficients, and logistic regression.
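The repeated-evaluation design described above (each scenario scored three times) can be summarized with two simple quantities: accuracy over all runs and the share of scenarios on which every run agreed. The following is a minimal sketch under stated assumptions, not the authors' analysis code. The function name `summarize_runs`, the data layout, and the toy labels are all hypothetical, and the abstract does not say that consistency was measured this way.

```python
def summarize_runs(runs, truth):
    """Summarize repeated model outputs (hypothetical helper).

    runs:  dict mapping scenario id -> list of labels, one per repeated run
    truth: dict mapping scenario id -> reference label
    """
    total_runs = 0
    correct_runs = 0
    consistent_scenarios = 0
    for scenario_id, labels in runs.items():
        total_runs += len(labels)
        # Count every run whose label matches the reference label.
        correct_runs += sum(label == truth[scenario_id] for label in labels)
        # A scenario is "consistent" when all repeated runs gave one label.
        consistent_scenarios += len(set(labels)) == 1
    return {
        "accuracy": correct_runs / total_runs,
        "consistency": consistent_scenarios / len(runs),
    }


# Toy data for illustration only; these are not study results.
runs = {
    "case-1": ["cardiology", "cardiology", "cardiology"],
    "case-2": ["general surgery", "internal medicine", "general surgery"],
}
truth = {"case-1": "cardiology", "case-2": "general surgery"}
summary = summarize_runs(runs, truth)
```

With this toy input, five of six runs match the reference label and one of two scenarios is fully consistent across runs.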

Results:

In the first phase, ChatGPT’s MEWS-based triage accuracy improved following prompt engineering. Although GPT-4o performed better before prompt engineering, GPT-4-Turbo outperformed it afterward, achieving an accuracy of 100% compared with GPT-4o’s 96.2%, which suggests that GPT-4-Turbo may be more adaptable to prompt optimization. In the second phase, GPT-4o, which showed better emotional responsiveness than GPT-4-Turbo, achieved an overall guidance accuracy of 92.63% (95% CI, 90.34%-94.93%), with the highest accuracy in internal medicine (93.51%; 95% CI, 90.85%-96.17%) and the lowest in general surgery (91.46%; 95% CI, 86.50%-96.43%).
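For readers unfamiliar with how an accuracy figure is paired with a 95% CI, one common approach is the normal-approximation (Wald) interval for a proportion. The sketch below is illustrative only: the abstract does not state which interval method the authors used, the function name `wald_ci` is hypothetical, and the counts are made up rather than taken from the study.

```python
import math


def wald_ci(correct, total, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    Assumption: independent trials; z=1.96 gives a roughly 95% interval.
    """
    p = correct / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    # Clamp to [0, 1] because the Wald interval can overshoot near the ends.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts: 90 correct answers out of 100 evaluations.
p, lower, upper = wald_ci(90, 100)
```

The Wald interval is known to behave poorly when accuracy is close to 0% or 100% (for example, it collapses to zero width at exactly 100%), so alternatives such as the Wilson interval are often preferred in that range.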

Conclusions:

ChatGPT demonstrates promising capability for supporting patient triage and outpatient guidance in EDs. GPT-4-Turbo showed greater adaptability to prompt engineering, whereas GPT-4o exhibited superior responsiveness and emotional interaction, essential for patient-facing tasks. Future studies should explore real-world implementation and address identified limitations to enhance ChatGPT’s clinical integration.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.