Accepted for/Published in: JMIR Cancer

Date Submitted: Apr 8, 2024
Open Peer Review Period: Apr 11, 2024 - Jun 6, 2024
Date Accepted: Dec 30, 2024

The final, peer-reviewed published version of this preprint can be found here:

Spotnitz M, Ostchega Y, Giannini J, Goff SL, Anandan LP, Clark E, Berman L

Assessing the Data Quality Dimensions of Partial and Complete Mastectomy Cohorts in the All of Us Research Program: Cross-Sectional Study

JMIR Cancer 2025;11:e59298

DOI: 10.2196/59298

PMID: 40068169

PMCID: 11918980

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Assessing the Data Quality Dimensions of Partial and Complete Mastectomy Cohorts in the All of Us Research Program: A Cross-Sectional Study

  • Matthew Spotnitz
  • Yechiam Ostchega
  • John Giannini
  • Stephanie L Goff
  • Lakshmi Priya Anandan
  • Emily Clark
  • Lew Berman

ABSTRACT

Background:

Breast cancer is prevalent among women in the United States. Nonmetastatic disease is treated with partial or complete mastectomy, but the rates of these procedures vary across practices. Real-world evidence on breast cancer surgery could lead to improved and more consistent practices.

Objective:

This study aimed to determine whether All of Us data are fit for use in generating real-world evidence on mastectomy procedures.

Methods:

Our mastectomy phenotype consisted of adult female participants who had CPT4 or SNOMED codes for a partial or complete mastectomy procedure. We evaluated the phenotype with a novel data quality framework consisting of five elements: conformance, completeness, concordance, plausibility, and temporality. We also used a previously developed adjectival rating matrix, with categories of poor (providing little to no data), fair (using only internal EHR data), and good (using both internal data and external benchmarks or data), to rate each data quality dimension (DQD). Our subgroup analysis compared the partial and complete mastectomy phenotypes.
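The adjectival rating step can be sketched in a few lines. This is a minimal illustration, assuming a rating depends only on which evidence sources were available for a dimension. The names `Rating` and `rate_dimension` are hypothetical and are not taken from the study's code. Rating a dimension with no internal data as poor, even when an external benchmark exists, is also an assumption.

```python
from enum import Enum


class Rating(Enum):
    POOR = "poor"  # little to no data
    FAIR = "fair"  # internal EHR data only
    GOOD = "good"  # internal data plus an external benchmark or data source


def rate_dimension(has_internal_data: bool, has_external_benchmark: bool) -> Rating:
    """Map the evidence available for one data quality dimension to a rating.

    Assumption: without internal data the dimension is rated poor,
    even if an external benchmark exists.
    """
    if not has_internal_data:
        return Rating.POOR
    if has_external_benchmark:
        return Rating.GOOD
    return Rating.FAIR


# A dimension assessed with internal EHR data only.
print(rate_dimension(True, False).value)  # prints "fair"
```

Encoding the matrix as a pure function keeps the rating rule explicit and makes it easy to apply the same rule to each of the five dimensions.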

Results:

There were 3,704 participants in the partial or complete mastectomy cohort. The geospatial distribution of our cohort varied substantially across states. For example, 817 (22.1%) participants were from Massachusetts, whereas multiple other states each had fewer than 20 participants. We compared the sociodemographics of the partial (n = 2,445) and complete (n = 1,259) mastectomy subgroups. On chi-square analysis, those groups differed in the distribution of education (P = .02) and income (P < .001) levels. The DQD conformance was rated as good. A total of 3,216 (86.7%) participants in our cohort had CPT4 codes for a mastectomy that did not conform to a SNOMED standard. The DQD completeness was rated as fair. The prevalence of breast cancer-related concepts was higher in our cohort than among adult female participants who did not have a mastectomy procedure (P < .001). The DQD concordance was rated as fair. In both the partial and complete mastectomy subgroups, the correlations among concepts were consistent with the clinical management of breast cancer. The DQD plausibility was rated as fair. Although we did not have external benchmark comparisons, the distributions of concepts by age group and time were consistent with expectations. The DQD temporality was rated as fair. The median time between biopsy and mastectomy was seven weeks.
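The cohort proportions can be checked directly from the counts. The sketch below uses only counts stated in this abstract. The helper name `percent` is an illustrative choice, not part of the study's analysis code.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


cohort = 3704        # partial or complete mastectomy cohort
partial = 2445       # partial mastectomy subgroup
complete = 1259      # complete mastectomy subgroup
massachusetts = 817  # participants from Massachusetts

# The two subgroups partition the cohort.
assert partial + complete == cohort

print(percent(massachusetts, cohort))  # 22.1
print(percent(partial, cohort))        # 66.0
print(percent(complete, cohort))       # 34.0
```

About two-thirds of the cohort falls in the partial mastectomy subgroup, which is useful context when reading the subgroup comparisons.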

Conclusions:

Our data quality framework was implemented successfully on a mastectomy phenotype. Moreover, the framework allowed us to differentiate breast-conserving therapy and complete mastectomy subgroups in the All of Us data. The results of our analysis could inform future breast cancer studies that use the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM).


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.