JMIR Preprints #83918: Prediction of 30-Day All-Cause Hospital Readmissions Using Limited Structured Electronic Health Record Data: Retrospective Comparative Study


Prediction of 30-Day All-Cause Hospital Readmissions Using Limited Structured Electronic Health Record Data: Retrospective Comparative Study

Ritam Ghosh;
Dariush Khezrimotlagh;
Sara Imanpour;
Ilya Shvartsman

ABSTRACT

Background:

Unplanned hospital readmissions represent a critical operational and financial challenge for healthcare systems in the United States, with 3.8 million 30-day all-cause readmissions in 2018 at an average cost of $15,200 each, totaling approximately $58 billion. Many published prediction models rely on comprehensive information (e.g., full billing abstractions, discharge summaries, labs, and vitals) that becomes available only late in the encounter, limiting their usefulness for real-time, in-hospital intervention. This creates a timeliness-accuracy trade-off: the models that are most accurate retrospectively may arrive too late to act upon.

Objective:

This study tests the central hypothesis that a clinically meaningful predictive signal for 30-day all-cause readmission is present within the minimal, structured data available at the beginning of a patient’s hospital stay. This approach addresses the critical trade-off between predictive accuracy and the timeliness required for actionable intervention.

Methods:

We conducted a retrospective comparative modeling study using a large, de-identified electronic health record (EHR) cohort of 50,000 inpatient encounters. Two feature sets were constructed: (1) a Limited set simulating an early-encounter view, consisting of the first five International Classification of Diseases (ICD) codes, the first five Current Procedural Terminology (CPT) codes, and the Charlson Comorbidity Index (CCI); and (2) a Rich set using all available ICD/CPT codes plus CCI. We trained four models: Random Forest, CatBoost, Multi-Layer Perceptron (MLP), and DistilBERT (structured codes mapped to text and tokenized with distilbert-base-uncased). Evaluation used an untouched hold-out set. Primary metrics were area under the receiver operating characteristic curve (AUC-ROC), area under the precision-recall curve (PR-AUC), F1, accuracy, and calibration. To address class imbalance, only the training split was balanced, via undersampling of the majority class and bootstrap oversampling of the minority class; validation/test distributions were left unchanged.
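The feature-set truncation and training-split balancing described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the dictionary layout of an encounter, the placeholder code strings, and the `target_per_class` parameter are all assumptions made for illustration, since the abstract does not state the per-class size after balancing.

```python
import random
from collections import Counter


def limited_view(encounter, k=5):
    """Keep the first k ICD codes, the first k CPT codes, and the CCI.

    `encounter` is a hypothetical dict with keys "icd", "cpt", and "cci".
    """
    return {
        "icd": encounter["icd"][:k],
        "cpt": encounter["cpt"][:k],
        "cci": encounter["cci"],
    }


def balance_training_split(rows, labels, target_per_class, seed=0):
    """Balance a training split only.

    The majority class is undersampled without replacement; every other
    class is bootstrapped (sampled with replacement) up to target_per_class.
    Assumes minority count <= target_per_class <= majority count.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    majority = max(counts, key=counts.get)
    chosen = []
    for cls in counts:
        idx = [i for i, y in enumerate(labels) if y == cls]
        if cls == majority:
            chosen.extend(rng.sample(idx, target_per_class))
        else:
            chosen.extend(rng.choices(idx, k=target_per_class))
    rng.shuffle(chosen)
    return [rows[i] for i in chosen], [labels[i] for i in chosen]


# Toy encounter with placeholder code strings (not real ICD/CPT codes).
encounter = {
    "icd": ["icd_1", "icd_2", "icd_3", "icd_4", "icd_5", "icd_6", "icd_7"],
    "cpt": ["cpt_1", "cpt_2"],
    "cci": 4,
}
limited = limited_view(encounter)

# Toy training split: 10 readmitted (1) and 90 not readmitted (0).
train_rows = list(range(100))
train_labels = [1] * 10 + [0] * 90
bal_rows, bal_labels = balance_training_split(
    train_rows, train_labels, target_per_class=30
)
```

In this sketch the validation and test rows are never passed to `balance_training_split`, which mirrors the statement that their class distributions were left unchanged.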

Results:

Across three of four architectures, models trained on the Limited feature set matched, or modestly exceeded, the discrimination of their Rich counterparts, indicating that early-encounter data can be competitively predictive. For example, Random Forest achieved AUC 0.5596 (Limited) vs 0.5541 (Rich), and MLP achieved AUC 0.5386 (Limited) vs 0.5287 (Rich). Differences across architectures were small in absolute terms, with threshold-dependent metrics (e.g., F1) similarly comparable.
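To make the discrimination figures concrete, AUC-ROC can be read as the probability that a randomly chosen readmitted encounter receives a higher predicted score than a randomly chosen non-readmitted one, with ties counted as one half. The sketch below computes that quantity directly on toy data; the function name and the example scores are illustrative assumptions, not values from the study, and the pairwise loop is quadratic, so it is meant for explanation rather than for a 50,000-encounter cohort.

```python
def auc_roc(y_true, scores):
    """AUC-ROC via pairwise comparison (Mann-Whitney U formulation).

    Returns the fraction of (positive, negative) pairs in which the
    positive has the higher score; ties contribute one half.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels (1 = readmitted) and made-up model scores.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.1]
toy_auc = auc_roc(toy_labels, toy_scores)  # 5 of 6 pairs ranked correctly
```

Under this reading, an AUC of 0.5 corresponds to random ranking, so the reported values near 0.55 indicate a ranking only modestly better than chance for both the Limited and Rich feature sets.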

Conclusions:

Minimal admission-time coding data (ICD/CPT) augmented with CCI can provide timely and competitive performance for 30-day readmission prediction. Focusing on the quality and accessibility of early-encounter data enables real-time risk stratification and supports a shift from reactive, post-discharge analysis to proactive, in-hospital resource management. These findings motivate early-warning clinical decision-support tools that prioritize timeliness without incurring a substantial loss in accuracy.

Citation

Please cite as:

Ghosh R, Khezrimotlagh D, Imanpour S, Shvartsman I

Prediction of 30-Day All-Cause Hospital Readmissions Using Limited Structured Electronic Health Record Data: Retrospective Comparative Study

JMIR Form Res 2026;10:e83918

DOI: 10.2196/83918

PMID: 42172660

PMCID: 13197155

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Formative Research

Date Submitted: Sep 10, 2025

Open Peer Review Period: Sep 10, 2025 - Nov 5, 2025

Date Accepted: Apr 30, 2026
