JMIR Preprints #93680: Network Analysis-Driven Machine Learning Model: Identifying High-Cost Stroke Inpatients Using Hospital discharge data

Network Analysis-Driven Machine Learning Model: Identifying High-Cost Stroke Inpatients Using Hospital discharge data

Haohui Shen;
Yilong Yang;
Mengge Zhang;
Jingyi Xiang;
Runan Wang;
Pin Yao

ABSTRACT

Background:

The escalating medical burden associated with stroke poses a substantial challenge, characterized by a skewed distribution wherein a minority of high-cost patients accounts for a disproportionate share of healthcare expenditures. Consequently, the timely and accurate identification of this cohort is paramount for optimizing the quality of care and mitigating unnecessary resource utilization.

Objective:

This study aims to construct a comorbidity network for stroke patients using hospital discharge data, extract topological features characterizing disease interactions, and integrate these features with machine learning algorithms to establish a robust and clinically interpretable framework for the accurate identification of high-cost stroke patients.

Methods:

We conducted a retrospective study using hospital discharge data from 10,301 stroke inpatients at a tertiary hospital in Northeast China between 2021 and 2023. Data from the 2021–2022 period were used to construct two networks: the Phenotypic Comorbidity Network (PCN) and the Distance-based Disease Cost Network (DDCN). From these networks, topological features were extracted to capture latent associations between comorbidities and high costs. The 2023 dataset was then partitioned into training and testing sets to develop five machine learning models for identifying high-cost stroke inpatients: Logistic Regression (LR), Support Vector Machine (SVM), Neural Network (NN), Random Forest (RF), and XGBoost. Finally, the SHapley Additive exPlanations (SHAP) method was applied to elucidate both the global and local contributions of the model features.
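As an illustration of the network step described in the Methods, the following is a minimal sketch and not the authors' implementation. The abstract does not state how PCN edges or the Shortest Distance feature are defined, so this sketch assumes a relative-risk edge rule (a common choice in the comorbidity-network literature) and treats the shortest distance as the fewest hops from a patient's diagnoses to a chosen set of target diagnoses. The function names, the `min_rr` threshold, and the toy ICD-10 records are all hypothetical.

```python
from collections import Counter, deque
from itertools import combinations


def build_comorbidity_network(records, min_rr=1.0):
    """Link two diagnosis codes when their relative risk of co-occurrence
    exceeds min_rr (assumed edge rule, not taken from the preprint)."""
    n = len(records)
    prevalence = Counter(code for rec in records for code in set(rec))
    pair_counts = Counter(
        pair for rec in records for pair in combinations(sorted(set(rec)), 2)
    )
    graph = {code: set() for code in prevalence}
    for (a, b), c_ab in pair_counts.items():
        # Relative risk: observed co-occurrence over the count expected
        # if the two diagnoses were independent.
        rr = c_ab * n / (prevalence[a] * prevalence[b])
        if rr > min_rr:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def shortest_distance(graph, sources, targets):
    """Fewest hops from any source code to any target code, found by
    breadth-first search; returns None when no path exists."""
    targets = set(targets)
    queue = deque((s, 0) for s in sources if s in graph)
    seen = {s for s, _ in queue}
    while queue:
        node, dist = queue.popleft()
        if node in targets:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Toy discharge records: one set of diagnosis codes per admission (made up).
records = [
    {"I63", "I10"}, {"I63", "I10"},
    {"I10", "E11"}, {"I10", "E11"},
    {"E11", "N18"}, {"E11", "N18"},
    {"J18"}, {"J18"}, {"J18"}, {"J18"},
]
graph = build_comorbidity_network(records)
d_chain = shortest_distance(graph, {"I63"}, {"N18"})         # I63-I10-E11-N18
d_near = shortest_distance(graph, {"I63", "E11"}, {"N18"})   # E11 is adjacent to N18
d_none = shortest_distance(graph, {"J18"}, {"N18"})          # J18 is isolated
print(d_chain, d_near, d_none)
```

In a pipeline like the one described, such per-patient network features would be computed from the network built on the earlier period and then appended to the tabular features used by the classifiers.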

Results:

The integration of network features significantly improved model performance, with XGBoost exhibiting superior predictive capability (AUC = 0.911). Global feature importance analysis indicated that network features accounted for the majority of the total contribution (52.8%). Specifically, Shortest Distance (SD), length of stay, Normalized High-Cost Propensity (NHCP), age, and insurance type were identified as the top five predictors of high-cost risk. Moreover, SHAP interaction analysis revealed the phasic heterogeneity inherent in patient resource utilization.
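The abstract does not say how the 52.8% share was computed. One common convention, assumed here, is to take each feature's mean absolute SHAP value and divide the sum over the network features by the sum over all features. The sketch below uses made-up SHAP values and a hypothetical helper name purely to show the arithmetic; it does not reproduce the study's numbers.

```python
def group_contribution_share(shap_rows, feature_names, group):
    """Fraction of total mean |SHAP| carried by the features in `group`.

    shap_rows: one list of per-feature SHAP values per patient.
    """
    n = len(shap_rows)
    mean_abs = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    total = sum(mean_abs.values())
    return sum(mean_abs[name] for name in group) / total


# Toy values only: two patients, four features.
feature_names = ["SD", "NHCP", "length_of_stay", "age"]
shap_rows = [
    [0.4, -0.2, 0.3, 0.1],
    [-0.2, 0.2, -0.1, 0.1],
]
network_share = group_contribution_share(
    shap_rows, feature_names, group=["SD", "NHCP"]
)
print(f"network feature share: {network_share:.1%}")
```

With these toy inputs the mean absolute values are 0.3, 0.2, 0.2, and 0.1, so the two network features carry 0.5 of a total of 0.8.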

Conclusions:

Our comprehensive framework, integrating comorbidity network analysis with machine learning algorithms, significantly enhances the identification of high-cost stroke inpatients. These findings highlight the framework's potential utility in optimizing healthcare resource allocation and enabling proactive cost containment strategies.

Clinical Trial:

Not applicable

Citation

Please cite as:

Shen H, Yang Y, Zhang M, Xiang J, Wang R, Yao P

Network Analysis-Driven Machine Learning Model: Identifying High-Cost Stroke Inpatients Using Hospital discharge data

JMIR Preprints. 19/02/2026:93680

DOI: 10.2196/preprints.93680

URL: https://preprints.jmir.org/preprint/93680

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Currently submitted to: JMIR Medical Informatics

Date Submitted: Feb 19, 2026

Open Peer Review Period: Mar 4, 2026 - Apr 29, 2026

(currently open for review)
