Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Jan 22, 2025
Date Accepted: Oct 8, 2025
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Key variances in clinical pathways for prolonged stay identified using machine learning and ePath real-world data: Models Development and Validation Study
ABSTRACT
Background:
Prolonged hospital stays can lead to inefficiencies in healthcare delivery and unnecessary consumption of medical resources.
Objective:
This study aimed to identify key clinical variances associated with prolonged length of stay (PLOS) in clinical pathways using a machine learning model trained on real-world data from the ePath system.
Methods:
We analyzed data from 480 patients with lung cancer (mean age 68.3 ± 11.2 years, 51% men) who underwent video-assisted thoracoscopic surgery (VATS) at a university hospital between 2019 and 2023. PLOS was defined as a hospital stay of more than 9 days after VATS. Variables collected between admission and postoperative day 4 were examined, and those significantly associated with PLOS in the univariate analysis were selected as predictors. Predictive models were developed using regularized linear regression methods (Lasso, ridge, elastic net) and decision-tree ensembles (random forest, XGBoost). The data were divided into derivation (earlier study period) and testing (later study period) cohorts for temporal validation. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC), the Brier score, and calibration plots. Counterfactual analysis was used to identify key clinical factors influencing PLOS.
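The temporal validation and two of the reported metrics can be illustrated with a minimal sketch. It is not the authors' code: the record fields (`year`, `plos`, `risk`), the cutoff year, and all toy values are assumptions made for illustration, and the abstract does not state where the study period was split.

```python
from typing import Dict, List, Sequence, Tuple


def temporal_split(records: List[Dict], cutoff_year: int) -> Tuple[List[Dict], List[Dict]]:
    """Earlier admissions form the derivation cohort, later ones the test cohort."""
    derivation = [r for r in records if r["year"] < cutoff_year]
    testing = [r for r in records if r["year"] >= cutoff_year]
    return derivation, testing


def auroc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a random PLOS case is scored above a random non-PLOS case.

    Tied scores count as half a win (the Mann-Whitney formulation of the AUROC).
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one case from each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy records: years, outcomes, and predicted risks are made up for illustration.
records = [
    {"year": 2019, "plos": 0, "risk": 0.10},
    {"year": 2020, "plos": 1, "risk": 0.70},
    {"year": 2021, "plos": 0, "risk": 0.20},
    {"year": 2022, "plos": 0, "risk": 0.10},
    {"year": 2022, "plos": 0, "risk": 0.40},
    {"year": 2023, "plos": 1, "risk": 0.35},
    {"year": 2023, "plos": 1, "risk": 0.80},
]
CUTOFF_YEAR = 2022  # hypothetical split point, not taken from the study

derivation, testing = temporal_split(records, CUTOFF_YEAR)
y_true = [r["plos"] for r in testing]
y_prob = [r["risk"] for r in testing]
print(f"test AUROC = {auroc(y_true, y_prob):.2f}, Brier = {brier_score(y_true, y_prob):.3f}")
```

A higher AUROC indicates better discrimination, whereas a lower Brier score indicates predicted risks that lie closer to the observed outcomes.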
Results:
A three-dimensional heatmap illustrated the temporal relationships between clinical factors and PLOS based on patient demographics, comorbidities, functional status, surgical details, care processes, medications, and variances recorded on postoperative day 4. The ridge regression model achieved the best performance, with an AUROC of 0.84 and a Brier score of 0.16 in the derivation cohort, and an AUROC of 0.82 and a Brier score of 0.17 in the test cohort. Six key variables increased the risk of PLOS: abnormal respiratory sounds, postoperative fever, arrhythmia, impaired ambulation, complications after drain removal, and pulmonary air leaks.
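One common way to run a counterfactual analysis on a fitted linear model is to switch off one recorded variance at a time and observe how the predicted risk changes. The sketch below assumes a logistic model with binary variance indicators; the abstract does not describe the authors' exact procedure, and the intercept, coefficients, and patient record are hypothetical values, not results from the study.

```python
import math
from typing import Dict, Tuple


def predicted_risk(intercept: float, coefs: Dict[str, float], patient: Dict[str, int]) -> float:
    """Logistic model: risk = sigmoid(intercept + sum of coefficient * indicator)."""
    z = intercept + sum(weight * patient.get(name, 0) for name, weight in coefs.items())
    return 1.0 / (1.0 + math.exp(-z))


def counterfactual_effects(
    intercept: float, coefs: Dict[str, float], patient: Dict[str, int]
) -> Tuple[float, Dict[str, float]]:
    """For each variance present, return the drop in risk had that variance alone not occurred."""
    baseline = predicted_risk(intercept, coefs, patient)
    effects = {}
    for name, value in patient.items():
        if value == 1 and name in coefs:
            altered = dict(patient)
            altered[name] = 0  # counterfactual: this variance did not occur
            effects[name] = baseline - predicted_risk(intercept, coefs, altered)
    return baseline, effects


# Hypothetical model and patient; none of these numbers come from the study.
INTERCEPT = -2.0
COEFS = {"postoperative_fever": 1.0, "pulmonary_air_leak": 1.5, "impaired_ambulation": 0.5}
patient = {"postoperative_fever": 1, "pulmonary_air_leak": 1, "impaired_ambulation": 0}

baseline, effects = counterfactual_effects(INTERCEPT, COEFS, patient)
for name, drop in sorted(effects.items(), key=lambda kv: -kv[1]):
    print(f"{name}: predicted risk would fall by {drop:.3f} (baseline {baseline:.3f})")
```

Ranking variances by the size of this drop gives a per-patient view of which events contribute most to the predicted risk under the model. It describes the model's behavior and does not by itself establish causation.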
Conclusions:
A machine learning-based model using ePath data effectively identified critical variances in the clinical pathways associated with PLOS. This automated tool may enhance clinical decision-making and improve patient management.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.