A Comparison of Machine Learning Models for Colon Cancer Survival Estimation
ABSTRACT
Background:
Colon cancer is a leading cause of cancer-related deaths worldwide, and survival outcomes are influenced by a variety of factors, including biological traits, treatment type, and patient characteristics. Traditional statistical methods, such as Kaplan-Meier curves, have been widely used to estimate survival probabilities. However, these methods often have difficulty handling many covariates, complex interactions, and non-linear relationships between risk factors. Recently, machine learning techniques have emerged as promising tools for improving survival prediction by accommodating large numbers of covariates and capturing complex patterns.
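For readers unfamiliar with the Kaplan-Meier approach mentioned above, the following is a minimal sketch of the product-limit estimator in plain Python. The follow-up times and event indicators are hypothetical illustration values, not registry data, and the function name `kaplan_meier` is introduced here only for the example.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimator).

    times  -- follow-up time for each patient
    events -- 1 if the event (death) was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical follow-up in months; event=1 is death, 0 is censored.
times = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
km_curve = kaplan_meier(times, events)
```

The estimator uses no covariates at all, so comparing patient groups requires stratifying the data and fitting one curve per stratum. This is the limitation that motivates the regression and machine learning models.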
Objective:
This study compares several machine learning (ML) models for estimating colon cancer survival using data from the Kentucky Cancer Registry (KCR). By identifying key risk factors, the analysis aims to improve risk stratification, treatment planning, and prognosis, both for colon cancer survival overall and within subgroups.
Methods:
This retrospective study applied a variety of predictive modeling techniques to the registry data to predict survival probabilities: Cox proportional hazards, accelerated failure time (AFT) models, random survival forests, Lasso, and Elastic Net. The models were evaluated on their predictive accuracy, feature importance, and ability to handle complex risk factors.
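The abstract does not say which accuracy metric was used. One common choice for comparing survival models is Harrell's concordance index (C-index), sketched below in plain Python as an illustration only. It assumes each model outputs a risk score where higher means shorter expected survival. The data and the name `concordance_index` are hypothetical, and pairs with tied follow-up times are skipped for simplicity.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when patient i had an observed event and
    a strictly shorter follow-up time than patient j.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier event has the higher risk score
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up (months), event flags, and model risk scores.
times = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
risk = [0.9, 0.3, 0.4, 0.6, 0.2, 0.7]
c_index = concordance_index(times, events, risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the same function can score each candidate model on a common held-out set.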
Results:
Age, treatment type, positive nodes, tumor stage, smoking, and comorbidities were identified as significant predictors of survival. The results highlight the strengths and limitations of each approach, with the random survival forest and Lasso models outperforming traditional methods in prediction accuracy and in identifying non-linear relationships.
Conclusions:
This comparative analysis offers valuable insights for clinical decision-making and prognosis, highlighting the potential of machine learning to identify risk factors specific to different subgroups, ultimately advancing personalized care for colon cancer patients.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.