Machine Learning for Prediction of Procedural Case Durations Developed Using a Large Multicenter Database: Algorithm Development and Validation
ABSTRACT
Background:
Accurate projection of procedural case duration is complex but critical to the planning of perioperative staffing, operating room resources, and patient communication. Nonlinear prediction models using machine learning methods may offer hospitals an opportunity to improve upon current estimates of procedure duration.
Objective:
We hypothesized that a machine learning algorithm derived from a large multicenter dataset would predict surgical procedure duration more accurately than a baseline linear regression approach. We further expected that an explainable machine learning-based algorithm would provide additional insight into procedure duration and its variability.
Methods:
A total of 1,177,893 procedures performed at 13 academic and private hospitals between 2016 and 2019 were included. Deep learning, gradient boosting, and ensemble machine learning models were developed using perioperative data available at three distinct time points: time of scheduling, time of patient arrival in the operating or procedure room (primary model), and time of surgical incision or procedure start. The primary outcome was procedure duration, defined as the time between the patient's arrival in and departure from the procedure room. Model performance was assessed by mean absolute error, the proportion of predictions within 20% of actual duration, and other standard metrics. Performance was compared with a baseline method that used historical means within a linear regression model. Features driving model predictions were assessed using Shapley values and permutation feature importance.
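The two headline metrics can be illustrated with a minimal sketch. It assumes that "within 20% of actual duration" means an absolute error no larger than 20% of the observed duration; the function names and sample values are hypothetical and are not taken from the study.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted durations (minutes)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def fraction_within_tolerance(actual, predicted, tolerance=0.20):
    """Share of predictions whose absolute error is at most `tolerance` x actual duration.

    Assumption: the tolerance band is relative to the observed (actual) duration.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for a, p in zip(actual, predicted) if abs(p - a) <= tolerance * a)
    return hits / len(actual)


# Hypothetical observed and predicted case durations, in minutes.
actual = [94, 50, 167, 120]
predicted = [100, 70, 150, 121]

mae = mean_absolute_error(actual, predicted)        # errors 6, 20, 17, 1 -> 11.0
within_20 = fraction_within_tolerance(actual, predicted)  # 3 of 4 within band -> 0.75
```

Because the tolerance scales with the actual duration, a 20-minute miss counts as acceptable for a long case but not for a 50-minute one, which is why this metric complements the absolute-minutes view given by mean absolute error.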
Results:
Across all procedures, the median procedure duration was 94 minutes (interquartile range 50-167 minutes). The gradient boosting machine was the best-performing model, with a mean absolute error of 34 minutes and 46% of predictions within 20% of actual duration in the test dataset. This was a statistically and clinically significant improvement over the baseline linear regression model (mean absolute error 43 minutes, p < 0.001; 39% of predictions within 20% of actual duration). The most important features in model training were historical procedure duration by surgeon, the word “free” in the procedure text, and time of day.
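A feature such as historical procedure duration by surgeon is typically computed from training data only and then looked up for new cases. The sketch below shows one way to do this, assuming hypothetical record fields `surgeon` and `duration_min` and a fallback to the overall training mean for surgeons with no history; the study's actual feature construction is not described in the abstract.

```python
from collections import defaultdict


def fit_historical_means(train_rows, key="surgeon"):
    """Per-group mean duration from training rows, plus the overall mean as a fallback.

    `key` and the `duration_min` field are hypothetical names used for illustration.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in train_rows:
        totals[row[key]] += row["duration_min"]
        counts[row[key]] += 1
    global_mean = sum(totals.values()) / sum(counts.values())
    group_means = {group: totals[group] / counts[group] for group in totals}
    return group_means, global_mean


def historical_mean_feature(rows, group_means, global_mean, key="surgeon"):
    """Look up each row's historical mean; unseen groups receive the global mean."""
    return [group_means.get(row[key], global_mean) for row in rows]


train = [
    {"surgeon": "A", "duration_min": 60},
    {"surgeon": "A", "duration_min": 80},
    {"surgeon": "B", "duration_min": 150},
]
new_cases = [{"surgeon": "A"}, {"surgeon": "B"}, {"surgeon": "C"}]

means, overall = fit_historical_means(train)
features = historical_mean_feature(new_cases, means, overall)
# A -> 70.0, B -> 150.0, unseen C -> overall training mean (290 / 3)
```

Fitting the means on the training split alone keeps test-set durations from leaking into the feature; the same lookup could be grouped by procedure instead of by surgeon.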
Conclusions:
Nonlinear models using machine learning techniques may be used to generate high-performing, automatable, explainable, and scalable prediction models for procedure duration. Medi
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.