Accepted for/Published in: JMIR Formative Research

Date Submitted: Mar 7, 2023
Date Accepted: Dec 11, 2023

The final, peer-reviewed published version of this preprint can be found here:

Assessing and Improving Data Integrity in Web-Based Surveys: Comparison of Fraud Detection Systems in a COVID-19 Study

Bonett S, Lin W, Sexton Topper P, Wolfe J, Golinkoff J, Deshpande A, Villarruel A, Bauermeister J

JMIR Form Res 2024;8:e47091

DOI: 10.2196/47091

PMID: 38214962

PMCID: 10818231

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Improving Data Integrity in Web-Based Surveys: A Comparison of Fraud Detection Systems in a COVID-19 Study

  • Stephen Bonett; 
  • Willey Lin; 
  • Patrina Sexton Topper; 
  • James Wolfe; 
  • Jesse Golinkoff; 
  • Aayushi Deshpande; 
  • Antonia Villarruel; 
  • José Bauermeister

ABSTRACT

Background:

Online surveys increase access to study participation and improve opportunities to reach diverse populations. However, web-based surveys are vulnerable to data quality threats, including fraudulent entries from automated bots and duplicative submissions. Widely used proprietary tools to identify fraud offer little transparency about the methods used, effectiveness, or representativeness of resulting datasets. Robust, reproducible, and context-specific methods of accurately detecting fraudulent responses are needed to ensure integrity and maximize the value of web-based survey research.

Objective:

This study aims to describe a multilayered fraud detection system implemented in a large web-based survey about COVID-19 attitudes, beliefs, and behaviors, examine the agreement between this fraud detection system and a proprietary fraud detection system, and compare the resulting study samples from each of the two fraud detection methods.

Methods:

The PhillyCEAL Common Survey is a cross-sectional web-based survey that remotely enrolled residents aged 13 years and older to assess how the COVID-19 pandemic impacted individuals, neighborhoods, and communities in Philadelphia. Two fraud detection methods are described and compared: (1) a multilayer fraud detection strategy developed by the research team that combined automated validation of response data with real-time verification of study entries by study personnel and (2) the proprietary fraud detection system used by the Qualtrics survey platform. Descriptive statistics were computed for the full sample and for the responses classified as valid by each of the two fraud detection methods. Classification tables were created to assess agreement between the methods, and the impact of the fraud detection method on the distribution of a key study variable was assessed.

Results:

Of 7,950 completed surveys, our multilayer fraud detection system identified 3,228 (40.60%) cases as valid, while the Qualtrics fraud detection system identified 4,389 (55.21%) cases as valid. The two methods showed only “fair” or “minimal” agreement in their classifications (kappa=0.25, 95% CI 0.23 to 0.27). The choice of fraud detection method impacted the distributions of key study variables.
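
As an illustration of how figures of this kind are derived, the sketch below recomputes the two valid-response percentages from the counts reported above and shows how Cohen's kappa is obtained from a 2×2 classification table. The abstract does not report the cell counts of the study's classification table, so the table passed to `cohens_kappa` uses made-up toy counts; the function names and arguments are hypothetical and are not taken from the study's analysis code.

```python
def percent(part, total):
    """Share of `part` in `total`, as a percentage rounded to 2 decimals."""
    return round(100.0 * part / total, 2)


def cohens_kappa(both_valid, only_a_valid, only_b_valid, both_fraud):
    """Cohen's kappa for two binary classifiers, A and B.

    Arguments are the four cells of the 2x2 classification table:
    cases both methods call valid, cases only A calls valid,
    cases only B calls valid, and cases both call fraudulent.
    """
    n = both_valid + only_a_valid + only_b_valid + both_fraud
    if n == 0:
        raise ValueError("classification table is empty")

    # Observed agreement: proportion of cases where A and B agree.
    p_observed = (both_valid + both_fraud) / n

    # Agreement expected by chance, from each method's marginal totals.
    a_valid = both_valid + only_a_valid
    b_valid = both_valid + only_b_valid
    p_expected = (a_valid * b_valid + (n - a_valid) * (n - b_valid)) / n**2

    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Percentages from the counts reported in the Results.
multilayer_pct = percent(3228, 7950)
qualtrics_pct = percent(4389, 7950)
print(multilayer_pct, qualtrics_pct)

# Toy table (NOT study data): 100 hypothetical responses.
toy_kappa = cohens_kappa(both_valid=40, only_a_valid=10,
                         only_b_valid=20, both_fraud=30)
print(round(toy_kappa, 2))
```

In the toy table the methods agree on 70% of cases, but 50% agreement would be expected by chance given the marginal totals, so kappa is well below the raw agreement rate. This is why a chance-corrected statistic is used to compare the two fraud detection methods.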

Conclusions:

Selection of a fraud detection method can affect a study’s sample composition. A multilayered approach to fraud detection that includes conservative use of automated fraud detection and integration of human review of entries, tailored to the study’s specific context and its participants, may be warranted for future survey research.

