Currently submitted to: JMIR Medical Informatics
Date Submitted: Jun 3, 2026
Open Peer Review Period: Jun 17, 2026 - Aug 12, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Predicting Delayed Hemodynamic Deterioration in Initially Stable NSTEMI Patients: Development, Calibration, and Cross-Database Transportability of a Machine Learning Model
ABSTRACT
Background:
Patients with non-ST-elevation myocardial infarction (NSTEMI) are frequently triaged as stable at first emergency department (ED) contact; however, a subset of patients deteriorates into circulatory or respiratory failure within hours. High-sensitivity troponin pathways confirm infarction but are not designed to forecast delayed collapse, leaving a decision-support gap that data-driven tools could address.
Objective:
We aimed to develop, calibrate, and interpret a machine learning model that identifies initially stable NSTEMI patients at risk of early hemodynamic deterioration (“cryptic shock”) and to quantify how well such a model transports to a structurally different critical care database.
Methods:
In this retrospective dual-database study, the Medical Information Mart for Intensive Care IV (MIMIC-IV) database was used for model development and internal validation, and the eICU Collaborative Research Database was used for transportability assessment. Adults admitted with NSTEMI who received vasopressors or invasive ventilation within 2 h of admission were classified as overtly unstable and excluded. Cryptic shock was defined as new vasopressor support and/or invasive ventilation initiated between 2 and 24 h after admission. The predictors, all available early in the admission, were age, sex, admission troponin, creatinine, a Glasgow Coma Scale (GCS)-derived neurologic severity proxy, and proxies for the Acute Physiology and Chronic Health Evaluation (APACHE) II and Simplified Acute Physiology Score (SAPS) II. Missing values were handled with multiple imputation by chained equations (for logistic regression and random forest) or native sparsity-aware splitting (for extreme gradient boosting [XGBoost]), each applied within cross-validation folds to prevent leakage. Models were trained with stratified 5-fold cross-validation; discrimination was reported with bootstrap 95% confidence intervals (CIs), and calibration, decision curve analysis, Shapley additive explanations (SHAP), and DeLong comparisons against single severity scores were performed. A partial National Early Warning Score 2 (NEWS2) and the Modified Early Warning Score (MEWS) were computed from recorded eICU vital signs for external benchmarking.
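The cohort and endpoint definitions above can be illustrated with a minimal Python sketch. The function name, the label strings, and the handling of the exact 2 h boundary (treated here as overtly unstable) are assumptions made for illustration; this is not the authors' extraction code.

```python
from typing import Optional

# Time windows taken from the Methods text; boundary handling is an assumption.
OVERT_WINDOW_H = 2.0     # support within 2 h: overtly unstable, excluded
CRYPTIC_WINDOW_H = 24.0  # support between 2 and 24 h: cryptic shock endpoint


def classify_admission(hours_to_first_support: Optional[float]) -> str:
    """Label one admission by hours from admission to the first vasopressor
    or invasive ventilation event. None means no such event was recorded."""
    if hours_to_first_support is None:
        return "stable"
    if hours_to_first_support <= OVERT_WINDOW_H:
        return "overt_unstable"
    if hours_to_first_support <= CRYPTIC_WINDOW_H:
        return "cryptic_shock"
    # Support started after 24 h falls outside the endpoint window.
    return "stable"


# Hypothetical example values, not study data.
example_hours = [0.5, 6.0, None, 30.0]
example_labels = [classify_admission(t) for t in example_hours]
```

Under these assumptions, only admissions labeled "stable" or "cryptic_shock" would enter the modeling cohort, with "cryptic_shock" as the positive class.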
Results:
Of the 4846 MIMIC-IV NSTEMI admissions, 311 met the overt early instability criteria; among the 4535 initially stable patients, 410 (9.0%) developed cryptic shock. These patients had substantially higher in-hospital mortality (27.8% vs. 7.3%), and ICU admission was recorded in all patients meeting the cryptic shock endpoint. Internal discrimination was high (area under the receiver operating characteristic curve [AUROC]: random forest 0.905, 95% CI 0.893-0.916; XGBoost 0.897, 95% CI 0.885-0.909; logistic regression 0.892, 95% CI 0.878-0.906). Random forest significantly outperformed the strongest single comparator, the APACHE II proxy (AUROC 0.872; DeLong P<.001), and far exceeded troponin alone (AUROC 0.554; P<.001). After isotonic recalibration, the random forest Brier score improved from 0.120 to 0.061 (calibration slope, 0.86), and decision curve analysis showed a positive net benefit across threshold probabilities of approximately 3% to 20%. Transportability to eICU was limited (broad-proxy AUROC 0.65 for logistic regression and near chance for tree models). In the subset where NEWS2 was computable (n=3903), the model modestly exceeded NEWS2 (AUROC 0.67 vs. 0.64, respectively).
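For readers less familiar with the calibration and decision curve metrics reported above, the following Python sketch shows the standard definitions of the Brier score and of net benefit at a threshold probability. The function names and the toy data are hypothetical and are not drawn from the study; the sketch does not reproduce the authors' pipeline.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome (0/1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of acting on patients whose predicted risk is >= threshold:
    TP/n - FP/n * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy for a decision curve: treat every patient as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data for illustration only (not study data).
y_toy = [1, 0, 0, 1]
p_toy = [0.9, 0.1, 0.2, 0.6]
toy_brier = brier_score(y_toy, p_toy)
toy_nb_model = net_benefit(y_toy, p_toy, 0.15)
toy_nb_all = net_benefit_treat_all(y_toy, 0.15)
```

A decision curve plots these net benefit values over a range of thresholds; a model shows potential clinical usefulness where its curve lies above both the treat-all reference and zero (treat none).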
Conclusions:
Routinely available early variables identified initially stable NSTEMI patients at risk of imminent deterioration with strong internal discrimination, good calibration after isotonic recalibration, and decision curve evidence of potential clinical usefulness in the derivation setting. The loss of transportability was most plausibly related to endpoint mismatch, database-specific severity score scaling, and differences in data capture, underscoring the need for harmonized, time-stamped external evaluation before clinical deployment.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.