Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Aug 3, 2023
Date Accepted: Oct 16, 2024
Evaluating the longitudinal model shift of machine learning-based clinical risk prediction models: a study on multiple use cases across different hospitals
ABSTRACT
Background:
In recent years, machine learning-based models that predict clinical risk events have become widely used in clinical domains. However, the performance of such models relies heavily on the data used for training and evaluation. Data shift, characterized by differences between the real-world data distribution and the distribution of the training and testing data, has significant implications for prediction models, leading to performance degradation and reduced clinical efficacy. Monitoring data shifts and evaluating their impact on prediction models is therefore of utmost importance.
Objective:
This study aims to assess the impact of data shifts on machine learning-based prediction models. To generalize our findings, we evaluate three different use cases from two hospitals with different patient populations. Additionally, we investigate potential model deterioration during the COVID-19 pandemic.
Methods:
We train prediction models using retrospective data from earlier years and examine the presence of data shifts, and their impact on the models, using data from more recent years. We use the area under the receiver operating characteristic curve (AUROC) to evaluate model performance and analyze the calibration curves over time. We also assess the influence on clinical decisions by evaluating the alert rate and the rates of overdiagnosis and underdiagnosis.
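The evaluation metrics described above can be sketched as follows. This is an illustrative example on synthetic data, not the study's actual pipeline; the threshold, score distribution, and rate definitions are assumptions chosen for demonstration, using standard scikit-learn functions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(42)

# Synthetic ground-truth labels and model risk scores for one evaluation period.
y_true = rng.integers(0, 2, size=1000)
y_score = np.clip(y_true * 0.3 + rng.normal(0.4, 0.2, size=1000), 0.0, 1.0)

# AUROC: discrimination of the model over the evaluation period.
auroc = roc_auc_score(y_true, y_score)

# Calibration curve: observed event fraction vs. mean predicted risk per bin.
frac_positive, mean_predicted = calibration_curve(y_true, y_score, n_bins=10)

# Alert rate: fraction of cases whose risk score exceeds the alert threshold
# (0.5 is a hypothetical threshold for illustration).
threshold = 0.5
alerts = y_score >= threshold
alert_rate = alerts.mean()

# Overdiagnosis rate: alerts fired for cases without the event;
# underdiagnosis rate: events missed among the non-alerted cases.
overdiagnosis_rate = (alerts & (y_true == 0)).sum() / max(alerts.sum(), 1)
underdiagnosis_rate = (~alerts & (y_true == 1)).sum() / max((~alerts).sum(), 1)

print(f"AUROC={auroc:.3f} alert_rate={alert_rate:.3f}")
```

Tracking these quantities on successive yearly cohorts, rather than on a single held-out set, is what surfaces drift that a one-time AUROC check would miss.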
Results:
Significant data shifts were observed when models were trained on data from earlier years. However, when models were trained on more recent data, we did not observe substantial data shifts, and the AUROC of the prediction models remained stable. Nevertheless, drift was observed for the delirium and sepsis use cases when the calibration curves were evaluated at the two hospitals. Additionally, the two hospitals showed different patterns in the changes of the alert rate and the overdiagnosis rate. Importantly, we did not observe any model deterioration during the COVID-19 pandemic, and the prediction models did not cause a notable surge in alerts.
Conclusions:
Clinical data undergo continuous change due to evolving clinical practices and workflows, which directly affects the predictions generated by clinical risk prediction models. Although model performance appears stable when assessed using AUROC, model drift becomes evident when alternative evaluation metrics are employed after training. Consequently, it is crucial to closely monitor data changes and detect data shifts, along with their potential influence on the predictions generated by these models.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.