Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Oct 2, 2025
Open Peer Review Period: Oct 2, 2025 - Nov 27, 2025
Date Accepted: Jan 30, 2026
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
What’s a survey researcher to do? Applying an epidemiological approach to the detection of fraudulent survey responses
ABSTRACT
Background:
Survey research has the potential to elevate the experiences and opinions of marginalized populations. The rising number of bot attacks, a method of participant fraud in which automated software creates multiple records in survey data, threatens to drown out those voices and produce inaccurate findings. Rapid identification and mitigation of bot attacks are vital, but researchers have limited guidance on scalable approaches to the problem.
Objective:
Applying the epidemiological framework used to evaluate diagnostic tests, we assessed how well recommended methods detected fraud, to give other web-based survey researchers insight into how best to identify and shut down bot attacks.
Methods:
We analyzed data from a cross-sectional, web-based statewide survey on access to pediatric subspecialty care in California that used Qualtrics survey software. Caregivers of children with chronic conditions were recruited through Family Resource Centers (FRCs), nonprofit agencies serving families with developmental delays and chronic medical conditions. The survey was sent to 17 FRCs, whose staff distributed anonymous links to their clients through listservs and flyers. Respondents who completed the survey received a $30 gift card. Prior to launch, we designed a protocol to identify and respond to bot attacks, and we reviewed responses for markers of fraudulent activity. If markers were identified or there was a spike in responses, a senior member of our research team reviewed patterns among all submitted surveys for each FRC to look for signs of bot attacks. We calculated epidemiologic measures of diagnostic test accuracy, including sensitivity, specificity, positive predictive value, and negative predictive value, to better understand the utility of recommended strategies for identifying bot attacks.
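As an illustration of the diagnostic accuracy measures named above, the following minimal sketch computes them from a 2x2 table in which a single fraud marker is the "test" and the reviewed fraud status of each record is the reference standard. The class and function names and the example counts are hypothetical and are not taken from the study. Youden's J is included as one common way to express whether a marker does better than chance; treating it as the chance criterion is an assumption of this sketch, not a statement of the authors' method.

```python
from dataclasses import dataclass


@dataclass
class MarkerCounts:
    """2x2 counts for one fraud marker (hypothetical helper, not from the paper)."""
    tp: int  # fraudulent records that the marker flagged
    fp: int  # valid records that the marker flagged
    fn: int  # fraudulent records that the marker missed
    tn: int  # valid records that the marker did not flag


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a margin of the table is empty.
    return numerator / denominator if denominator else float("nan")


def diagnostic_accuracy(c: MarkerCounts) -> dict:
    """Standard diagnostic test accuracy measures for one marker."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # share of fraudulent records flagged
    specificity = _ratio(c.tn, c.tn + c.fp)  # share of valid records not flagged
    ppv = _ratio(c.tp, c.tp + c.fp)          # share of flagged records that are fraudulent
    npv = _ratio(c.tn, c.tn + c.fn)          # share of unflagged records that are valid
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        # Assumption: Youden's J > 0 is read as "better than chance".
        "youden_j": sensitivity + specificity - 1,
    }


if __name__ == "__main__":
    # Made-up counts for illustration only.
    example = MarkerCounts(tp=30, fp=10, fn=70, tn=90)
    for name, value in diagnostic_accuracy(example).items():
        print(f"{name}: {value:.3f}")
```

A marker can have a high positive predictive value and still miss most fraudulent records, which is why sensitivity is reported alongside it.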
Results:
We received 646 valid survey records and 905 fraudulent records resulting from bot attacks. The primary indicator of a bot attack was a sudden spike in responses to the survey. Differences in demographics and outcomes, including wait times for pediatric subspecialty care and use of health care services, between the valid and fraudulent data indicated that failure to remove fraudulent records would have dramatically altered the survey results. Most methods recommended in the literature for identifying fraudulent responses had low sensitivity for detecting bot attacks, and only two performed better than chance at correctly identifying them. Combinations of fraud markers and blocks of repeated responses were particularly useful for identifying bot attacks.
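Because a sudden spike in responses was the primary indicator, a simple automated check on daily response counts may help trigger manual review. The sketch below is one possible approach and not the authors' protocol: the function name, the `ratio` and `min_count` thresholds, and the example dates are all hypothetical.

```python
from collections import Counter
from datetime import date
from statistics import median


def flag_spike_days(submission_dates, ratio=3.0, min_count=10):
    """Return the days whose response count looks like a spike.

    A day is flagged when its count is at least `min_count` and at least
    `ratio` times the median daily count. Both thresholds are illustrative
    assumptions; the median is taken over days with at least one response.
    """
    counts = Counter(submission_dates)
    if not counts:
        return []
    baseline = median(counts.values())
    return sorted(
        day
        for day, n in counts.items()
        if n >= min_count and n >= ratio * baseline
    )


if __name__ == "__main__":
    # Made-up submission dates: a quiet week with one burst of responses.
    dates = (
        [date(2025, 1, 1)] * 2
        + [date(2025, 1, 2)] * 3
        + [date(2025, 1, 3)] * 40
        + [date(2025, 1, 4)] * 2
    )
    print(flag_spike_days(dates))
```

A flagged day is only a prompt for closer review of the records submitted that day, since legitimate recruitment pushes can also produce bursts of responses.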
Conclusions:
Fraudulent data entry using bots is increasing in survey research. Sharing flexible protocols that identify and mitigate bot attacks, and that can respond to their ever-changing nature, is vital to ensuring that survey research elevates the voices of real people to inform policy and programmatic discussions.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.