Accepted for/Published in: JMIR Perioperative Medicine
Date Submitted: Jul 16, 2023
Open Peer Review Period: Jul 16, 2023 - Sep 10, 2023
Date Accepted: Sep 29, 2023
Temporal Generalizability of Machine Learning Models for Predicting Postoperative Delirium Using Electronic Health Record Data: Models Development and Validation Study
ABSTRACT
Background:
Although machine learning models show considerable potential for predicting postoperative delirium, it remains unclear whether they offer practical advantages over conventional models in real-world settings.
Objective:
The objective of this study was to validate the temporal generalizability of decision tree ensemble and sparse linear regression models for predicting delirium after surgery compared with that of the traditional logistic regression model.
Methods:
The health record data of patients hospitalized at an advanced emergency and critical care medical center in Kumamoto, Japan, were collected electronically. We developed a decision tree ensemble model using eXtreme Gradient Boosting (XGBoost) and a sparse linear regression model using Least Absolute Shrinkage and Selection Operator (LASSO) regression. To evaluate the predictive performance of the models, we used the area under the receiver operating characteristic curve (AUROC) and the Matthews correlation coefficient (MCC) to measure discrimination, and the slope and intercept of the regression between predicted and observed probabilities to measure calibration. The Brier score was evaluated as an overall performance metric. We included 11,863 consecutive patients who underwent surgery with general anesthesia between December 2017 and February 2022. The patients were divided into a derivation cohort from before the coronavirus disease 2019 (COVID-19) pandemic and a validation cohort from during the pandemic. Postoperative delirium was diagnosed according to the Confusion Assessment Method (CAM).
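The evaluation metrics named above can be illustrated with a minimal Python sketch. It is not the authors' code: the toy labels and probabilities, the 0.5 classification threshold, the function names, and the binned least-squares approach to the calibration line are all assumptions made for illustration only.

```python
from math import sqrt

# Hypothetical toy data (not from the study): 1 = delirium, 0 = no delirium.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.4, 0.35, 0.6, 0.7, 0.9]


def auroc(y_true, y_prob):
    """Chance that a random positive case is scored above a random negative case."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mcc(y_true, y_prob, threshold=0.5):
    """Matthews correlation coefficient at an assumed probability threshold."""
    y_pred = [int(p >= threshold) for p in y_prob]
    tp = sum(1 for y, h in zip(y_true, y_pred) if y == 1 and h == 1)
    tn = sum(1 for y, h in zip(y_true, y_pred) if y == 0 and h == 0)
    fp = sum(1 for y, h in zip(y_true, y_pred) if y == 0 and h == 1)
    fn = sum(1 for y, h in zip(y_true, y_pred) if y == 1 and h == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_line(y_true, y_prob, n_bins=10):
    """Slope and intercept of observed event rate regressed on mean predicted
    probability, computed over bins of patients sorted by predicted risk.
    The binning scheme is an assumption of this sketch."""
    pairs = sorted(zip(y_prob, y_true))
    size = max(1, len(pairs) // n_bins)
    xs, ys = [], []
    for i in range(0, len(pairs), size):
        chunk = pairs[i:i + size]
        xs.append(sum(p for p, _ in chunk) / len(chunk))
        ys.append(sum(y for _, y in chunk) / len(chunk))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predicted probabilities do not vary across bins")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


print("AUROC:", auroc(y_true, y_prob))
print("MCC:", mcc(y_true, y_prob))
print("Brier score:", brier_score(y_true, y_prob))
print("Calibration (slope, intercept):", calibration_line(y_true, y_prob, n_bins=2))
```

In this framing, a slope near 1 and an intercept near 0 indicate good calibration, and a lower Brier score indicates better overall performance. In a temporal validation, the same functions would be applied to predictions for the later cohort from models fitted only on the earlier cohort.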
Results:
A total of 6497 patients (68.5±14.4 years, female 40.4%) were included in the derivation cohort, and 5366 patients (67.8±14.6 years, female 39.2%) were included in the validation cohort. Regarding discrimination, the eXtreme Gradient Boosting model (AUROC: 0.87–0.90, MCC: 0.34–0.44) did not significantly outperform the Least Absolute Shrinkage and Selection Operator model (AUROC: 0.86–0.89, MCC: 0.34–0.41). The logistic regression model (AUROC: 0.84–0.88, MCC: 0.33–0.40, slope: 1.01–1.19, intercept: −0.16 to 0.06, Brier score: 0.06–0.07), with several significant predictors, achieved good predictive performance.
Conclusions:
The eXtreme Gradient Boosting model did not significantly outperform the Least Absolute Shrinkage and Selection Operator model in predicting postoperative delirium. Furthermore, a parsimonious logistic model with a few important predictors achieved comparable performance to machine learning models in predicting postoperative delirium.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.