Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Jul 30, 2020
Date Accepted: Sep 16, 2020
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Bots and Other Bad Actors: Threats to Data Quality following Research Participant Recruitment through Social Media
ABSTRACT
Background:
Recruitment of health research participants through social media is becoming more common. In the United States, 80% of adults use at least one social media platform. Social media platforms may allow researchers to reach potential participants efficiently. However, online research methods may be associated with unique threats to sample validity and data integrity. Limited research has described issues of data quality and authenticity associated with the recruitment of health research participants through social media, and sources of low-quality and fraudulent data in this context are poorly understood.
Objective:
To (a) describe and explain threats to sample validity and data integrity following recruitment of health research participants through social media; and (b) summarize recommended strategies to mitigate these threats. Our experience designing and implementing a research study using social media recruitment and online data collection serves as a case study.
Methods:
Using published strategies to preserve data integrity, we recruited participants to complete an online survey through the social media platforms Twitter and Facebook. Participants were to receive $15 upon survey completion. Prior to manually issuing remuneration, we reviewed completed surveys for indicators of fraudulent or low-quality data. Indicators attributable to respondent error were labeled “suspicious,” while those suggesting misrepresentation were labeled “fraudulent.” We planned to remove cases with one “fraudulent” indicator or at least three “suspicious” indicators.
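The removal rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' actual screening procedure: the `SurveyCase` class, the `should_remove` function, and the indicator names are hypothetical, and the sketch assumes "one fraudulent indicator" means at least one.

```python
from dataclasses import dataclass, field
from typing import List

# Thresholds taken from the Methods text; treating "one" as "at least one"
# is an assumption of this sketch.
FRAUDULENT_THRESHOLD = 1
SUSPICIOUS_THRESHOLD = 3


@dataclass
class SurveyCase:
    """One completed survey and the indicators flagged during manual review."""
    case_id: str
    fraudulent_indicators: List[str] = field(default_factory=list)
    suspicious_indicators: List[str] = field(default_factory=list)


def should_remove(case: SurveyCase) -> bool:
    """Return True if the case meets the planned removal criteria."""
    return (
        len(case.fraudulent_indicators) >= FRAUDULENT_THRESHOLD
        or len(case.suspicious_indicators) >= SUSPICIOUS_THRESHOLD
    )


# Hypothetical cases; the indicator labels are placeholders, not the study's codebook.
cases = [
    SurveyCase("A", fraudulent_indicators=["indicator_f1"]),
    SurveyCase("B", suspicious_indicators=["indicator_s1", "indicator_s2"]),
    SurveyCase("C", suspicious_indicators=["indicator_s1", "indicator_s2", "indicator_s3"]),
]
removed_ids = [c.case_id for c in cases if should_remove(c)]
print(removed_ids)
```

Under these assumptions, case B is retained because two "suspicious" indicators fall below the threshold of three, while cases A and C are flagged for removal.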
Results:
Within seven hours of survey activation, we received 271 completed surveys. We classified 256 (94%) cases as fraudulent and 15 (6%) as suspicious. Of the fraudulent cases, 235 (87%) provided inconsistent responses to verifiable items, 138 (51%) provided a duplicate or unusual response to one or more open-ended items, 133 (49%) exhibited evidence of inattention, and 44 (15%) exhibited evidence of bot automation.
Conclusions:
Research findings from several disciplines suggest studies in which research participants are recruited through social media are susceptible to data quality issues. Opportunistic individuals who use virtual private servers to fraudulently complete research surveys for profit may contribute to low-quality data. Strategies to preserve data integrity following research participant recruitment through social media are limited. Development and testing of novel strategies to prevent and detect fraud is a research priority.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.