Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Apr 5, 2020
Date Accepted: Sep 16, 2020
A Personalized Prognostic Model for Early Invasive Breast Cancer by Machine-Learning Multidimensional Data: A Population-based Cohort Study in China
ABSTRACT
Background:
Current online prognostic prediction models for breast cancer, Adjuvant! Online and PREDICT, are mainly based on specific populations. They have been well validated and widely used in the United States and Western Europe. However, several validation attempts in non-European countries revealed suboptimal predictions.
Objective:
We aimed to develop an advanced breast cancer prognosis model for disease progression, cancer-specific mortality, and all-cause mortality by integrating tumor, demographic, and treatment characteristics based on a large breast cancer cohort in China.
Methods:
This study was approved by the Clinical Test and Biomedical Ethics Committee of West China Hospital, Sichuan University, on May 17, 2012. Data collection for this project began in May 2017 and ended in March 2019. Data on 5,293 women diagnosed with stage I–III invasive breast cancer between 2000 and 2013 were collected. The endpoints were disease progression, cancer-specific mortality, and all-cause mortality, and the likelihood of disease progression or death within a 5-year period was predicted. The machine learning method XGBoost was used to develop the prediction model. Model performance was assessed by calculating the area under the curve (AUC), followed by calibration and comparison with PREDICT.
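As an illustration of the discrimination metric named above, the AUC can be read as the probability that a randomly chosen patient who had the event receives a higher predicted risk than a randomly chosen patient who did not. The following is a minimal sketch of that definition, not the authors' implementation; the function name `auc` and the toy labels and scores are assumptions made for illustration only. In practice the scores would come from a fitted model such as XGBoost.

```python
from itertools import product


def auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: 1 if the event (e.g., progression within 5 years) occurred, else 0.
    scores: predicted risks; higher means higher predicted risk.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
toy_labels = [0, 0, 1, 0, 1, 1]
toy_scores = [0.10, 0.30, 0.35, 0.40, 0.80, 0.90]
toy_auc = auc(toy_labels, toy_scores)  # 8 of 9 positive/negative pairs are ordered correctly
```

The pairwise form is quadratic in the number of patients, so it is suited to explanation rather than to cohorts of several thousand women, where a rank-based computation would normally be used.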
Results:
The training, test, and validation populations comprised 3,276 (499 progressions, 202 breast cancer-specific deaths, and 261 all-cause deaths within the 5-year follow-up), 1,405 (211 progressions, 94 breast cancer-specific deaths, and 129 all-cause deaths), and 612 (109 progressions, 33 breast cancer-specific deaths, and 37 all-cause deaths) women, respectively. The AUCs for disease progression, cancer-specific mortality, and all-cause mortality were 0.76, 0.88, and 0.82 in the training population; 0.79, 0.80, and 0.83 in the test population; and 0.79, 0.84, and 0.88 in the validation population, respectively. Calibration analysis demonstrated good agreement between the predicted and observed events within 5 years. Comparable AUCs and calibrations were confirmed in subgroups of different ages, residence statuses, and receptor statuses. Compared with PREDICT, our model showed similar AUCs and improved calibrations.
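The counts reported above can be checked with simple arithmetic: the three populations sum to the 5,293 women in the cohort, and each count gives a crude 5-year event proportion. The sketch below does this and also shows one common way to summarize calibration, the observed-to-expected ratio. The helper `observed_to_expected` and the predicted probabilities passed to it are hypothetical; the abstract does not state which calibration statistic the authors used.

```python
# Event counts copied from the Results paragraph.
cohorts = {
    "training": {"n": 3276, "progression": 499, "cancer_death": 202, "all_cause_death": 261},
    "test": {"n": 1405, "progression": 211, "cancer_death": 94, "all_cause_death": 129},
    "validation": {"n": 612, "progression": 109, "cancer_death": 33, "all_cause_death": 37},
}

# The three populations should account for the whole cohort (5,293 women).
total_women = sum(c["n"] for c in cohorts.values())

# Crude proportion of women with each event within 5 years, per population.
event_rates = {
    name: {event: count / c["n"] for event, count in c.items() if event != "n"}
    for name, c in cohorts.items()
}


def observed_to_expected(observed_events, predicted_probs):
    """Hypothetical calibration summary: observed events / sum of predicted risks.

    A value near 1 means the model predicts about as many events as occurred.
    """
    expected = sum(predicted_probs)
    if expected == 0:
        raise ValueError("sum of predicted probabilities is zero")
    return observed_events / expected


# Made-up predictions for four patients, two of whom had the event.
toy_ratio = observed_to_expected(2, [0.5, 0.5, 0.5, 0.5])
```

On these counts, progression within 5 years occurred in roughly 15% of the training and test populations and roughly 18% of the validation population.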
Conclusions:
Our integrative prognostic model exhibits high discrimination and good calibration. It may facilitate prognosis prediction and clinical decision making for Chinese breast cancer patients.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.