Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Sep 8, 2021
Open Peer Review Period: Sep 8, 2021 - Nov 3, 2021
Date Accepted: Jan 2, 2022

The final, peer-reviewed published version of this preprint can be found here:

Xiao J, Mo M, Wang Z, Zhou C, Shen J, Yuan J, He Y, Zheng Y

The Application and Comparison of Machine Learning Models for the Prediction of Breast Cancer Prognosis: Retrospective Cohort Study

JMIR Med Inform 2022;10(2):e33440

DOI: 10.2196/33440

PMID: 35179504

PMCID: 8900909

Machine Learning Models for the Prediction of Breast Cancer Prognostic: Application and Comparison Based on a Retrospective Cohort Study

  • Jialong Xiao
  • Miao Mo
  • Zezhou Wang
  • Changming Zhou
  • Jie Shen
  • Jing Yuan
  • Yulian He
  • Ying Zheng

ABSTRACT

Background:

In recent years, machine learning (ML) methods have been increasingly explored for cancer prognosis prediction, driven by improved algorithms that can model censored data, such as support vector machines (SVM) for survival analysis and random survival forests (RSF). However, it is still debated whether traditional (Cox proportional hazards regression) or ML-based prognostic models have better predictive performance.

Objective:

This study aims to use machine learning algorithms to predict the survival of patients with breast cancer and to compare their predictive performance with that of traditional Cox regression.

Methods:

This retrospective cohort study included all patients diagnosed with breast cancer and subsequently hospitalized at Fudan University Shanghai Cancer Center (FUSCC) between January 1, 2008, and December 31, 2016. A total of 25,267 cases with 21 features were eligible for model development, and the data set was randomly split into a training set (70%) and a test set (30%) for developing four models and predicting overall survival in patients with breast cancer. The discriminative ability of the models was evaluated by the concordance index (C-index) and the time-dependent area under the curve (AUC); the calibration of the models was evaluated by the Brier score.
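
As an illustration of the discrimination metric named above, the following is a minimal sketch of Harrell's C-index on censored data. It is not the authors' code: the function name `concordance_index`, the toy inputs, and the choice to skip pairs with tied times are assumptions made for this example.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable pairs that are ranked correctly.

    times       -- observed follow-up time per patient
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- model output; a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is comparable only if the shorter time is an observed event.
        # Pairs with tied times are skipped (a simplification in this sketch).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier death had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, two of them censored.
c = concordance_index(
    times=[5, 8, 12, 20],
    events=[1, 0, 1, 0],
    risk_scores=[0.9, 0.4, 0.1, 0.6],
)
print(c)  # 3 of the 4 comparable pairs are concordant
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the C-index values in this abstract are reported.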

Results:

The RSF model showed the best discriminative performance among the four models, with 3-year, 5-year, and 10-year time-dependent AUCs of 0.857, 0.838, and 0.781, respectively, and a C-index of 0.827 (0.809, 0.845), which significantly outperformed the Cox-EN model (0.816, p=0.007), the Cox model (0.814, p=0.003), and the SVM model (0.812, p<0.001). The 3-year, 5-year, and 10-year Brier scores of the four models were very close, ranging from 0.027 to 0.094, indicating that all models were well calibrated. In terms of feature importance, elastic net and RSF both indicated that TNM staging, neoadjuvant therapy, number of lymph node metastases, age, and tumor diameter were the top 5 features for predicting the prognosis of breast cancer. A final online tool was developed to predict the overall survival of patients with breast cancer.
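
To make the Brier scores above concrete, here is a simplified sketch of a Brier score at a fixed time horizon. It is an assumption-laden illustration, not the study's method: patients censored before the horizon are simply dropped, whereas survival software commonly applies inverse-probability-of-censoring weights, and the abstract does not say which variant was used. The function name `brier_score_at` and the toy values are hypothetical.

```python
def brier_score_at(horizon, times, events, surv_probs):
    """Mean squared error between predicted survival probability at `horizon`
    and the observed status at `horizon` (1 = alive, 0 = dead).

    Simplification: patients censored on or before the horizon have unknown
    status and are excluded rather than reweighted.
    """
    total = 0.0
    used = 0
    for t, event, s in zip(times, events, surv_probs):
        if t > horizon:
            outcome = 1.0       # followed past the horizon, so known alive
        elif event:
            outcome = 0.0       # death observed on or before the horizon
        else:
            continue            # censored before the horizon: status unknown
        total += (outcome - s) ** 2
        used += 1
    if used == 0:
        raise ValueError("no patients with known status at the horizon")
    return total / used


# Hypothetical toy data: predicted probabilities of surviving past t = 5.
score = brier_score_at(
    horizon=5,
    times=[2, 4, 6, 7],
    events=[1, 0, 1, 0],
    surv_probs=[0.2, 0.7, 0.9, 0.8],
)
print(round(score, 3))  # 0.03
```

Lower is better: 0 means the predicted probabilities match the outcomes exactly, and a constant prediction of 0.5 scores 0.25.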

Conclusions:

The RSF model slightly outperformed the other models in discriminative ability, showing strong potential as an effective approach for survival analysis.

Clinical Trial:

ClinicalTrials.gov, registration number: NCT04996732.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.