Accepted for/Published in: JMIR AI
Date Submitted: May 15, 2023
Open Peer Review Period: May 15, 2023 - Jul 10, 2023
Date Accepted: Sep 28, 2023
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Cross-Validation for Model Development and Evaluation in Healthcare: Practical Considerations and Applied Examples
ABSTRACT
By learning complex statistical relationships from historical data, predictive models enable automated, scalable risk detection and prognostication that can inform clinical decision making. Although relatively few have been implemented into clinical use compared to the number developed, predictive models are increasingly being deployed and tested in clinical trials. The stakes in predictive modeling continue to rise, including through regulation by groups like the U.S. Food and Drug Administration. Efforts to standardize steps in model development and validation include statements like TRIPOD and multiple published guidelines on deployment and governance. Yet the most common choice for a critical step in model development, the validation strategy, remains a simple "hold-out" or "train-test split", which has been shown to introduce bias, fail to generalize, and hinder clinical utility. Broadly, validation consists of either internal validation, which should be reported alongside model development, or external validation, in which a developed model is tested on an unseen dataset in a new setting. A newer concept of "internal-external" validation has also been suggested for studies with multi-site data [9]. Most published models evaluate performance metrics by splitting the available dataset into an independent "hold-out" or "test set" consisting of unseen samples excluded from model training. Such held-out sets are often selected randomly, e.g., "80% training and 20% testing", from data in the original model development setting. In contrast to hold-out validation, cross-validation and resampling methods like bootstrapping can produce less biased estimates of the true out-of-sample performance (i.e., the ability to generalize to new samples). Although cross-validation is a widely used and extensively studied statistical method, many variations of cross-validation exist, each with respective strengths and weaknesses, distinct use cases for model development and performance estimation that are often misapplied, and domain-specific considerations necessary for effective healthcare implementation. This tutorial aims to define and compare approaches to cross-validation using representative, accessible data derived from the well-known and well-studied MIMIC-III dataset. All cross-validation modeling experiments and preprocessing code will be provided through reproducible notebooks that further guide readers through the comparisons and concepts introduced. Best practices and common missteps, particularly in modeling with electronic healthcare data, will be emphasized.
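To make the contrast concrete, the following is a minimal sketch, not drawn from the paper's notebooks, that compares a single random 80/20 hold-out estimate with 5-fold cross-validation. It assumes scikit-learn and uses a synthetic classification dataset in place of MIMIC-III; all variable names are illustrative.

```python
# Sketch: single hold-out split vs. 5-fold cross-validation (assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split, cross_val_score

# Synthetic stand-in for a clinical dataset (features X, binary outcome y).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000)

# Hold-out ("train-test split") validation: one random 80/20 split yields a
# single point estimate that can vary substantially with the chosen split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model.fit(X_train, y_train)
holdout_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

# 5-fold cross-validation: every sample is held out exactly once, giving a
# distribution of scores and a less split-dependent performance estimate.
cv_aucs = cross_val_score(model, X, y, cv=5, scoring="roc_auc")

print(f"Hold-out AUROC:  {holdout_auc:.3f}")
print(f"5-fold CV AUROC: {cv_aucs.mean():.3f} +/- {cv_aucs.std():.3f}")
```

The spread of the cross-validated scores, which a single hold-out split cannot provide, is one reason resampling-based estimates are generally preferred for internal validation.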
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.