Currently accepted at: Journal of Medical Internet Research
Date Submitted: Dec 9, 2025
Date Accepted: Jun 3, 2026
This paper has been accepted and is currently in production.
It will appear shortly on 10.2196/88838
Battling the bots: Defending against fraudulent responses while conducting an international community-engaged web-based survey with people living with Long COVID
ABSTRACT
Background:
Web-based surveys involving self-reported questionnaires are vulnerable to fraudulent responses. Advancements in artificial intelligence (AI) and bots have introduced additional challenges to preventing and identifying fraudulent responses to online questionnaires.
Objective:
To describe our experiences with fraudulent responses, strategies for preventing and identifying fraudulent responses, lessons learned when conducting a web-based survey with adults living with Long COVID, and recommendations for web-based survey research.
Methods:
The Long COVID and Episodic Disability Study is an international community-engaged study among adults living with Long COVID in Canada, Ireland, the United Kingdom (UK), and the United States (US). We conducted a longitudinal web-based survey, with online administration of a self-reported questionnaire at two timepoints (Time One and Time Two), one week apart. We recruited through Long COVID community groups using social media, emails, and word of mouth. The survey was disrupted by fraudulent responses, including bots. To defend data integrity, we implemented the following strategies: a) pausing our initial launch (Wave One), b) developing and implementing screening criteria to identify fraudulent responses, and c) re-launching the web-based survey (Wave Two) with revised recruitment strategies and questionnaire design to prevent and identify fraudulent responses.
Results:
We received 4663 responses for Time One and 1281 responses for Time Two, of which we retained 798/4663 (17%) and 629/1281 (49%). Strategies for preventing fraudulent responses included enabling survey protection features in survey software, shutting down compromised survey links, avoiding recruitment via public social media groups, and removing mention of a financial incentive from recruitment materials. Strategies for identifying fraudulent responses included monitoring response completion times, start and end timestamps, and geolocation, and screening for suspicious email address characteristics and duplicate email addresses.
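As an illustration only, the identification strategies listed above could be combined into a simple automated screen along the following lines. This is a minimal sketch, not the study's actual procedure: the `Response` fields, the five-minute `MIN_MINUTES` threshold, the `SUSPICIOUS_EMAIL` pattern, and the rule that flags responses geolocated outside the four study countries are all hypothetical assumptions chosen for the example.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import datetime

# The four study countries (ISO codes); using them as a geolocation
# filter is an assumption made for this sketch.
STUDY_COUNTRIES = {"CA", "IE", "GB", "US"}
# Hypothetical minimum plausible completion time, in minutes.
MIN_MINUTES = 5
# Hypothetical pattern: local part ending in five or more digits.
SUSPICIOUS_EMAIL = re.compile(r"\d{5,}@")


@dataclass
class Response:
    response_id: str
    email: str
    started: datetime
    ended: datetime
    country_code: str


def screen(responses):
    """Return {response_id: [reasons]} for responses that trip any check."""
    # Count normalized email addresses so duplicates can be flagged.
    email_counts = Counter(r.email.strip().lower() for r in responses)
    flags = {}
    for r in responses:
        reasons = []
        minutes = (r.ended - r.started).total_seconds() / 60
        if minutes < MIN_MINUTES:
            reasons.append("fast_completion")
        if r.country_code not in STUDY_COUNTRIES:
            reasons.append("outside_study_countries")
        email = r.email.strip().lower()
        if SUSPICIOUS_EMAIL.search(email):
            reasons.append("suspicious_email")
        if email_counts[email] > 1:
            reasons.append("duplicate_email")
        if reasons:
            flags[r.response_id] = reasons
    return flags
```

In a sketch like this, a flag marks a response for manual review rather than automatic exclusion, since any single criterion can also be met by a genuine participant.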
Conclusions:
Our lessons learned fell into three areas: 1) survey design and implementation to prevent and identify fraudulent and bot-generated responses; 2) recruitment strategies to mitigate risk of disruption by bots; and 3) responding to disruptions caused by fraudulent and bot responses. We recommend the following tactics to prevent and mitigate the risks of fraudulent and bot responses when administering web-based questionnaires: a) review current literature and connect with researchers and Research Ethics Boards (REBs) about strategies prior to launching; b) invest in survey software with rigorous information security technology; c) employ bot-detection features available in survey software prior to launching; d) design questionnaire items to identify bots and fraudulent actors; e) tailor criteria for identifying fraudulent and bot responses to the characteristics of the target population; f) avoid recruitment in public social media groups; g) engage community leaders in tailored and targeted recruitment; h) avoid advertising incentives; i) shut down compromised links rapidly; j) communicate with the REB about disruptions; and k) combine automated with manual methods to identify potentially fraudulent responses in a timely manner.
Clinical Trial: n/a
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.