Accepted for/Published in: JMIR Public Health and Surveillance
Date Submitted: Mar 10, 2022
Date Accepted: Aug 5, 2022
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
If You’re Happy and You Know It, Answer This Question: A simulation study of the impact of non-random missingness in surveillance data on population-level summaries
ABSTRACT
Background:
Surveillance data are an essential public health resource for guiding policy and allocation of human and capital resources. These data often consist of large collections of information based on non-random sample designs. Population estimates based on such data may be distorted when the distribution of the sample differs from that of the true population of interest. Here we simulate a population of interest and allow response rates to vary in non-random ways to illustrate and measure the effect this has on population-based estimates of an important public health policy outcome.
Objective:
To explore the effects of non-random missing data on surveillance-based population estimates.
Methods:
We simulate a population of respondents answering a survey question about their satisfaction with their community’s policy regarding vaccination mandates for government personnel. We allow response rates to differ between the generally satisfied and dissatisfied and consider the effect of common efforts to control for potential bias: sampling weights, sample size inflation and hypothesis tests for determining missingness at random. We compare these conditions via mean squared errors and sampling variability to characterize the bias in estimation arising under these different approaches.
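The kind of simulation described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `simulate_bias`, the true satisfaction proportion (0.5), the response rates (0.6 for the satisfied, 0.3 for the dissatisfied), and the sample sizes are all hypothetical values chosen for illustration. The sketch estimates the satisfied proportion from respondents only and reports the bias and mean squared error of that naive estimate.

```python
import random


def simulate_bias(true_p=0.5, resp_sat=0.6, resp_dis=0.3,
                  n_invited=1000, n_reps=500, seed=0):
    """Return (mean estimate, bias, MSE) of the naive satisfied proportion.

    All default values are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        sat_resp = 0  # satisfied people who responded
        dis_resp = 0  # dissatisfied people who responded
        for _ in range(n_invited):
            satisfied = rng.random() < true_p
            # Response probability depends on the outcome itself,
            # so the missingness is non-random.
            p_respond = resp_sat if satisfied else resp_dis
            if rng.random() < p_respond:
                if satisfied:
                    sat_resp += 1
                else:
                    dis_resp += 1
        total = sat_resp + dis_resp
        if total > 0:
            estimates.append(sat_resp / total)
    mean_est = sum(estimates) / len(estimates)
    mse = sum((e - true_p) ** 2 for e in estimates) / len(estimates)
    return mean_est, mean_est - true_p, mse


if __name__ == "__main__":
    # Compare a baseline sample with an inflated one (four times as many invitations).
    for n in (1000, 4000):
        mean_est, bias, mse = simulate_bias(n_invited=n, n_reps=200)
        print(f"n_invited={n}: mean={mean_est:.3f} bias={bias:.3f} mse={mse:.4f}")
```

Under these assumed rates, the naive estimate converges to 0.5·0.6 / (0.5·0.6 + 0.5·0.3) = 2/3 rather than the true 0.5. Inviting more people shrinks the sampling variability around 2/3 but does not move the estimate toward 0.5, which is why sample size inflation alone leaves the bias in place.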
Results:
Sample estimates present clear, quantifiable bias, even in the most favorable response profile. Efforts to mitigate bias through sample size inflation and sampling weights have negligible effect on the overall result. Additionally, hypothesis testing for departures from random missingness rarely detects the non-random missingness across the widest range of response profiles considered.
Conclusions:
Our results suggest that assuming surveillance data are missing at random during analysis could provide estimates that are very different from what we might see in the whole population. Policy decisions based on such potentially biased estimates could have devastating consequences in terms of community disengagement and health disparities. Alternative approaches to analysis that move away from broad generalization of a mis-measured population at risk are necessary to identify the marginalized groups whose overall response may be very different from that observed in the measured respondents.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.