Currently submitted to: JMIR AI
Date Submitted: May 6, 2026
Open Peer Review Period: May 12, 2026 - Jul 7, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Generative AI Interfaces for Emergency Department Volume Forecasting and Fiscal-Year Proximal Forecasting: Longitudinal Observational Study
ABSTRACT
Background:
Forecasting emergency department (ED) arrivals is foundational to operational decision-making, including clinician scheduling, bed management, throughput initiatives, and revenue forecasting. The historical baseline for many health systems has been expert consensus combined with prior-year volume—a process that is often opaque, variably calibrated, and limited in its ability to accommodate seasonality and structural disruption.
Objective:
To compare the accuracy of forecasts produced by commonly available generative AI interfaces with a Holt–Winters (HW) exponential smoothing baseline, and to determine whether interactive updating (“proximal forecasting”) identifies a pragmatic recency window for fiscal-year planning.
Methods:
We conducted a two-part observational forecasting study using monthly ED visit volumes from four hospitals. Twelve-month forecasts were generated from 12- and 24-month historical windows and compared across HW and five large language model (LLM) interfaces (ChatGPT, Claude, Copilot, Perplexity, Gemini) using standardized prompts and adjusting for seasonality. We then generated FY2025 forecasts using a fixed training window while systematically varying the end month of available data to simulate a budget-cycle lag. Forecast accuracy was measured using mean absolute percentage error (MAPE) and root mean square error (RMSE).
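As an illustration of the two accuracy metrics and the lagged-data design described above, the following is a minimal sketch and not the authors' code. The function names, the 24-month default window, and the sample volumes are assumptions made for the example; the abstract does not state the window length used for the FY2025 forecasts.

```python
import math


def mape(actual, forecast):
    """Mean absolute percentage error, expressed in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equal in length")
    # Each term is the absolute error relative to the observed volume.
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean square error, in the same units as the data (visits per month)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equal in length")
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def lagged_training_window(history, lag_months, window=24):
    """Return the fixed-length training window available `lag_months` before
    the forecast start. `window=24` is an assumed default, not a value taken
    from the study."""
    end = len(history) - lag_months
    if lag_months < 0 or end < window:
        raise ValueError("not enough history for the requested lag and window")
    return history[end - window:end]


# Synthetic monthly volumes for illustration only (not study data).
observed = [100, 200]
predicted = [110, 190]
print(round(mape(observed, predicted), 2))
print(round(rmse(observed, predicted), 2))
```

Calling `lagged_training_window(history, lag_months=4)` on a monthly series would return the 24 months ending four months before the forecast start, which corresponds to a forecast prepared four months ahead of the fiscal year.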
Results:
Forecasts produced by one LLM interface (ChatGPT 4.0) demonstrated accuracy comparable to HW (MAPE 3.63±1.64 vs 3.26±0.92), whereas other interfaces showed higher error and greater variability (Perplexity 5.39±1.88; Copilot 5.15±1.77; Claude 8.14±3.19; Gemini 4.27±1.31). No approach met a predefined threshold in the pediatric ED. In adult EDs with typical seasonality, forecasts generated up to four months before the fiscal-year start were comparable to fiscal-year–proximal forecasts (mean MAPE difference 0–0.4%). Pediatric and highly seasonal sites demonstrated higher baseline error and greater susceptibility to recency bias.
Conclusions:
With adequate historical context, an LLM interface can produce monthly ED volume forecasts with accuracy similar to Holt–Winters exponential smoothing and enable “proximal forecasting” workflows for fiscal-year planning. Strongly seasonal and pediatric settings may require tailored approaches.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.