Accepted for/Published in: JMIR Cancer
Date Submitted: Apr 8, 2024
Open Peer Review Period: Apr 11, 2024 - Jun 6, 2024
Date Accepted: Dec 30, 2024
Assessing the Data Quality Dimensions of Partial and Complete Mastectomy Cohorts in the All of Us Research Program: A Cross-Sectional Study
ABSTRACT
Background:
Breast cancer is prevalent among women in the United States. Non-metastatic disease is treated with partial or complete mastectomy procedures. However, the rates of those procedures vary across practices. Generating real-world evidence on breast cancer surgery could lead to improved and more consistent practices.
Objective:
This study aimed to determine whether All of Us data are fit for use in generating real-world evidence on mastectomy procedures.
Methods:
Our mastectomy phenotype consisted of adult female participants who had CPT4 or SNOMED codes for a partial or complete mastectomy procedure. We evaluated the phenotype with a novel data quality framework that consisted of five elements: conformance, completeness, concordance, plausibility, and temporality. We also used a previously developed adjectival rating matrix with categories of poor (providing little to no data), fair (using only internal EHR data), and good (using internal and external benchmarks or data) to rate each data quality dimension (DQD). Our subgroup analysis compared the partial and complete mastectomy procedure phenotypes.
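The adjectival rating described above can be sketched as a small decision rule. This is only an illustration of the three categories as worded in this abstract, not the authors' rating matrix: the function name `rate_dimension`, its two boolean inputs, and the handling of the case with external but no internal evidence (treated here as poor) are all assumptions.

```python
# Hypothetical sketch of the poor/fair/good rating described in the Methods.
# The inputs and their mapping to ratings are assumptions for illustration.

DIMENSIONS = (
    "conformance",
    "completeness",
    "concordance",
    "plausibility",
    "temporality",
)


def rate_dimension(has_internal_evidence: bool, has_external_benchmark: bool) -> str:
    """Return an adjectival rating for one data quality dimension.

    good: internal EHR data and an external benchmark were both used
    fair: only internal EHR data were used
    poor: little to no data were available (assumed to include the
          case where no internal evidence exists)
    """
    if has_internal_evidence and has_external_benchmark:
        return "good"
    if has_internal_evidence:
        return "fair"
    return "poor"


if __name__ == "__main__":
    # Generic example inputs, not the study's actual evidence sources.
    print(rate_dimension(True, False))
```

Keeping the rule as a pure function makes it straightforward to apply the same criteria to each of the five dimensions listed in `DIMENSIONS`.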
Results:
There were 3,704 participants in the partial or complete mastectomy cohort. The geospatial distribution of our cohort varied substantially across states. For example, our cohort included 817 (22.1%) participants from Massachusetts but fewer than 20 participants from multiple other states. We compared the sociodemographics of the partial (n = 2,445) and complete (n = 1,259) mastectomy subgroups. On chi-square analysis, the subgroups differed in the distributions of education (P = .02) and income (P < .001) levels. The DQD conformance was rated as good. A total of 3,216 (86.7%) participants in our cohort had CPT4 codes for a mastectomy that did not conform to a SNOMED standard. The DQD completeness was rated as fair. The prevalence of breast cancer-related concepts was higher in our cohort than among adult female participants who did not have a mastectomy procedure (P < .001). The DQD concordance was rated as fair. In both the partial and complete mastectomy subgroups, the correlations among concepts were consistent with the clinical management of breast cancer. The DQD plausibility was rated as fair. Although we did not have external benchmark comparisons, the distributions of concepts by age group and time were consistent with expectations. The DQD temporality was rated as fair. The median time between biopsy and mastectomy was seven weeks.
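A temporality measure such as the median biopsy-to-mastectomy interval could be computed along the following lines. This is a minimal sketch under stated assumptions: the function name `median_weeks_between`, the exclusion of pairs where the mastectomy precedes the biopsy, and the three example date pairs are all hypothetical and are not taken from the All of Us data or the authors' code.

```python
from datetime import date
from statistics import median


def median_weeks_between(pairs):
    """Median interval in weeks between paired (biopsy, mastectomy) dates.

    Pairs where the mastectomy precedes the biopsy are skipped; that
    exclusion rule is an assumption made for this sketch.
    """
    intervals = [
        (mastectomy - biopsy).days / 7
        for biopsy, mastectomy in pairs
        if mastectomy >= biopsy
    ]
    if not intervals:
        raise ValueError("no valid biopsy/mastectomy pairs")
    return median(intervals)


# Made-up dates for illustration only.
example_pairs = [
    (date(2020, 1, 6), date(2020, 2, 10)),   # 35 days = 5 weeks
    (date(2020, 3, 2), date(2020, 4, 20)),   # 49 days = 7 weeks
    (date(2020, 5, 4), date(2020, 7, 13)),   # 70 days = 10 weeks
]

if __name__ == "__main__":
    print(median_weeks_between(example_pairs))
```

The median is used rather than the mean because a few very long intervals would otherwise dominate the summary.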
Conclusions:
Our data quality framework was implemented successfully on a mastectomy phenotype. Moreover, the framework allowed us to differentiate breast-conserving therapy and complete mastectomy subgroups in the All of Us data. The results of our analysis could be informative for future breast cancer studies with the OMOP CDM.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.