Accepted for/Published in: JMIR Public Health and Surveillance
Date Submitted: Aug 21, 2024
Date Accepted: Sep 17, 2025
Non-Random Missingness in Child Race and Ethnicity Records and the U.S. Federal Data Standards: A Pooled Analysis of Community-based Child Health Studies
ABSTRACT
Background:
Racialization – the foundation upon which racism manifests – is the social categorization of people based on phenotypic characteristics, national origin, cultural practices, and other traits. Public health surveillance of racial and ethnic groups attempts to racialize populations based largely on self-reported survey data, often perpetuating bias and misclassification. These methodological limitations uphold racial health disparities by creating an evidence base that lacks reliability and validity for non-dominant racial groups.
Objective:
To explore the prevalence of systematic racial bias within the current federal race/ethnicity data standards in the U.S. and develop a standardized method towards improving race/ethnicity reporting in public health research and policy.
Methods:
We developed a replicable reallocation process to uncover the racial heterogeneity obscured by main components of U.S. federal data standards, including open-ended responses and the “Other” and multiracial categories. To demonstrate real-world implications, we pooled demographic surveys from 8 pediatric studies and reallocated child race/ethnicity data (N=8,087), examining pre-post differences in descriptive statistics following reallocation. Logistic regression models were used to determine if the odds of reallocation were greater for minoritized children than White children under federal data standards.
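As a rough illustration of what a reallocation step like the one described above might look like, the following minimal sketch maps open-ended write-in text for "Other" and multiracial responses onto named categories and tallies pre-post counts. The keyword map, category labels, and function names (`KEYWORD_TO_GROUP`, `match_groups`, `reallocate`, `tally`) are hypothetical and are not the study's actual coding scheme, which the abstract does not specify.

```python
import re
from collections import Counter

# Hypothetical keyword map for illustration only; NOT the study's coding scheme.
KEYWORD_TO_GROUP = {
    "african american": "Black/African American",
    "black": "Black/African American",
    "hispanic": "Hispanic/Latino",
    "latino": "Hispanic/Latino",
    "native american": "American Indian/Alaska Native",
    "white": "White",
}


def match_groups(text):
    """Return the distinct groups whose keywords appear in a write-in response."""
    text = text.lower()
    groups = []
    for keyword, group in KEYWORD_TO_GROUP.items():
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, text) and group not in groups:
            groups.append(group)
    return groups


def reallocate(reported, write_in=None):
    """Reallocate 'Other' or 'Multiracial' records using the write-in text.

    Records that cannot be matched keep their originally reported category.
    """
    if reported in ("Other", "Multiracial") and write_in:
        matched = match_groups(write_in)
        if matched:
            return matched
    return [reported]


def tally(records):
    """Count categories before and after reallocation.

    After reallocation, a disaggregated multiracial record counts once in
    each of its component groups.
    """
    before = Counter(reported for reported, _ in records)
    after = Counter(
        group
        for reported, write_in in records
        for group in reallocate(reported, write_in)
    )
    return before, after


# Toy records (made up for illustration): (reported category, write-in text)
records = [
    ("Other", "African American"),
    ("Multiracial", "Black and Native American"),
    ("White", None),
    ("Other", "prefer not to say"),
]
before, after = tally(records)
```

In this sketch, a disaggregated multiracial record contributes to every component group, so post-reallocation counts can sum to more than the number of children; whether the study counted records this way is an assumption.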
Results:
Overall, 93% of parents/guardians provided child race/ethnicity data; 7% did not report the information and 3.7% identified children as "Other." Based on open-ended written responses, most children who were reallocated were moved from "Other Race" to "Black/African American" (59%). We observed a three-fold increase in the share of Indigenous Americans by disaggregating multiracial responses. Compared to White children, those racialized as Black (OR=8.8; 95% CI=6.1, 12.7) or Hispanic (OR=1.6; 95% CI=1.0, 2.6) were more likely to be misrepresented by U.S. data standards.
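For readers less familiar with how an odds ratio and its 95% confidence interval are derived, the sketch below computes an unadjusted odds ratio with a Wald interval from a 2x2 table. For a logistic regression with a single binary predictor, the exponentiated coefficient equals this unadjusted odds ratio. The function name `odds_ratio_ci` and the counts are hypothetical; they are not the study's data, and the study's models may have included covariates not reflected here.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: comparison group, reallocated      b: comparison group, not reallocated
    c: reference group, reallocated       d: reference group, not reallocated
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Made-up counts for illustration only (not from the study)
odds_ratio, lower, upper = odds_ratio_ci(a=20, b=80, c=10, d=90)
```

An interval whose lower bound sits at or near 1.0, as reported for Hispanic children above, indicates a borderline association at the conventional 5% level.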
Conclusions:
Black subpopulations in our study were at the greatest risk of non-random misclassification under U.S. federal data standards. These findings emphasize how the paradigm of assessing race/ethnicity in public health policy and practice leads to avoidable marginalization of individuals already facing disproportionately high exposure to racial discrimination. Efforts to improve public health surveillance and equitable data systems should focus on expanding survey response options and developing new statistical techniques for data analysis, ultimately avoiding the aggregation of diverse populations with unique life experiences.
Citation
Per the author's request, the PDF is not available.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.