Accepted for/Published in: JMIR Cancer
Date Submitted: Feb 11, 2020
Date Accepted: Jun 18, 2020
Date Submitted to PubMed: Sep 24, 2020
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Incorporating Breast Cancer Recurrence Events into Population-based Cancer Registries using Medical Claims
ABSTRACT
Background:
There is a need for automated, scalable approaches to incorporate information on cancer recurrence events into population-based cancer registries.
Objective:
We aimed to develop a new statistical learning algorithm to predict the occurrence and timing of second breast cancer events (SBCEs) using cancer registry information linked with medical claims among women with localized breast cancer diagnosed in the Puget Sound SEER cancer registry (CSS) and treated at Kaiser Permanente Washington (KPWA), formerly Group Health. Because statistical learning algorithms that use only a single tree often exhibit suboptimal predictive performance, we sought to improve performance and increase the efficiency of claims-based recurrence identification for population-based cancer registries.
Methods:
We used supervised data from 3,092 stage I and II breast cancer cases (number of recurrences = 394), diagnosed between 1993 and 2006 inclusive, who were patients at Kaiser Permanente Washington and cases in the Puget Sound Cancer Surveillance System (CSS). Our goal was to classify each month after primary treatment as pre- versus post-SBCE. The prediction feature set for a given month consisted of registry variables on disease and patient characteristics related to the primary breast cancer event, as well as features based on monthly counts of diagnosis and procedure codes for the current, prior, and future months. A month was classified as post-SBCE if the predicted probability exceeded a probability threshold (PT); the predicted time of the SBCE was taken to be the month of maximum increase in the predicted probability between adjacent months.
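The decision rule described above can be illustrated with a minimal Python sketch. This is not the study's implementation: the function name `predict_sbce`, the zero-based month indexing, and the person-level rule (a person is flagged if any month exceeds the PT) are assumptions made for illustration, and the monthly probabilities are taken as given from some already fitted classifier.

```python
from typing import List, Optional, Tuple


def predict_sbce(monthly_probs: List[float], pt: float = 0.5) -> Tuple[bool, Optional[int]]:
    """Apply a threshold-and-jump rule to one person's monthly probabilities.

    monthly_probs[i] is the predicted probability that month i is post-SBCE.
    Returns (flagged, predicted_month); predicted_month is None if not flagged.
    """
    # Assumption: a person is flagged if any month exceeds the probability threshold.
    flagged = any(p > pt for p in monthly_probs)
    if not flagged:
        return False, None
    if len(monthly_probs) == 1:
        # Only one month observed, so there is no adjacent-month increase to compare.
        return True, 0

    # Increase in predicted probability between each pair of adjacent months.
    increases = [
        monthly_probs[i + 1] - monthly_probs[i]
        for i in range(len(monthly_probs) - 1)
    ]
    # Index of the largest increase; the event is placed in the later month of the pair.
    jump = max(range(len(increases)), key=increases.__getitem__)
    return True, jump + 1


# Hypothetical probabilities for two people (not study data).
print(predict_sbce([0.02, 0.03, 0.05, 0.80, 0.90, 0.95]))  # flagged, largest jump into month 3
print(predict_sbce([0.10, 0.20, 0.30]))                    # never exceeds PT, not flagged
```

Placing the event in the later month of the pair with the largest increase is one reasonable reading of "the month of maximum increase"; the earlier month would be an equally simple convention.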
Results:
The Kaplan–Meier net probability of SBCE was 0.25 at 14 years. The month-level ROC curve on test data (20% of the dataset) had an area under the curve of 0.986. The person-level predictions (at a monthly PT of 0.5) had sensitivity=0.89, specificity=0.98, PPV=0.85, and NPV=0.98. The corresponding median difference between the observed and predicted months of recurrence was 0 months, and the mean difference was 0.04 months.
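For readers less familiar with the person-level measures reported here, the following Python sketch shows how sensitivity, specificity, PPV, and NPV are computed from a 2x2 confusion table. The function name `person_level_metrics` and the counts in the usage example are hypothetical and are not the study's data.

```python
import math
from typing import Dict


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def person_level_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard diagnostic accuracy measures from confusion-table counts.

    tp: people with an SBCE who were flagged
    fp: people without an SBCE who were flagged
    tn: people without an SBCE who were not flagged
    fn: people with an SBCE who were not flagged
    """
    return {
        "sensitivity": _ratio(tp, tp + fn),  # share of true events that were detected
        "specificity": _ratio(tn, tn + fp),  # share of non-events correctly left unflagged
        "ppv": _ratio(tp, tp + fp),          # share of flagged people with a true event
        "npv": _ratio(tn, tn + fn),          # share of unflagged people with no event
    }


# Made-up counts for illustration only.
metrics = person_level_metrics(tp=8, fp=2, tn=88, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```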
Conclusions:
Data mining of medical claims holds promise of streamlining cancer registry operations to feasibly collect information about second breast cancer events.
Clinical Trial: Not applicable
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.