
Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Mar 7, 2025
Date Accepted: Jan 21, 2026

The final, peer-reviewed published version of this preprint can be found here:

Identifying and Analyzing Bot-Generated Responses in Online Health Care Surveys: Methodological Study

Hamovitch E, McKellar K, Wodchis WP

JMIR Med Inform 2026;14:e73622

DOI: 10.2196/73622

Identifying and Analyzing Bot-Generated Responses in Healthcare Research

  • Emily Hamovitch
  • Kaileah McKellar
  • Walter P Wodchis

ABSTRACT

Background:

The increasing reliance on online surveys for collecting patient-reported outcome measures (PROMs) and patient-reported experience measures (PREMs) has led to growing concerns over fraudulent responses generated by bots. These automated responses threaten data integrity by fabricating survey results, distorting statistical analyses, and potentially misguiding policy decisions. Addressing this issue is critical for maintaining the validity of research findings that inform healthcare practice and policy.

Objective:

This study aimed to develop a robust set of criteria for identifying bot-generated responses in online healthcare surveys and to examine how these responses affect data quality. We then explored differences in survey results between human and bot respondents in a survey assessing PROMs and PREMs within a geographic region in Ontario, Canada.

Methods:

A survey was conducted from July to October 2023 using REDCap; it was distributed as a generic link via email and later shared on social media. The survey collected data on healthcare use, patient experiences, health outcomes, digital healthcare engagement, and demographics. A three-tier classification system was developed to detect bot responses based on predefined “red flags,” including duplicate open-ended responses, inconsistent demographic reporting, identical timestamps, and location discrepancies. Quantitative analyses included chi-square tests to compare human and bot responses, and Spearman’s correlation tests to examine relationships among healthcare indicators.
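As an illustration only, red-flag screening of this kind could be automated along the lines of the Python sketch below. It assumes a tidy export with one row per response; the field names (`record_id`, `open_text`, `age_early`, `stated_region`, `ip_region`, etc.), the IP-derived region, the one-year age tolerance, and the flag-count cut-offs used for the three tiers are all hypothetical, not the study's actual criteria or REDCap variables.

```python
"""Hypothetical red-flag screen for online survey responses (illustrative sketch only)."""
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Response:
    # Field names are assumptions for illustration, not the study's REDCap variables.
    record_id: str
    submitted_at: datetime  # completion timestamp
    open_text: str          # answer to an open-ended item
    age_early: int          # age reported near the start of the survey
    age_late: int           # age re-asked (or derived from birth year) later on
    stated_region: str      # region the respondent reports living in
    ip_region: str          # region inferred from IP geolocation (assumed available)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so lightly edited copies still match."""
    return " ".join(text.lower().split())


def red_flags(responses: list[Response], target_region: str) -> dict[str, list[str]]:
    """Return the names of the red flags raised for each record_id."""
    texts = Counter(normalize(r.open_text) for r in responses if r.open_text.strip())
    stamps = Counter(r.submitted_at for r in responses)
    flags: dict[str, list[str]] = {r.record_id: [] for r in responses}
    for r in responses:
        if r.open_text.strip() and texts[normalize(r.open_text)] > 1:
            flags[r.record_id].append("duplicate_open_text")
        if abs(r.age_early - r.age_late) > 1:  # 1-year tolerance is an assumption
            flags[r.record_id].append("inconsistent_demographics")
        if stamps[r.submitted_at] > 1:
            flags[r.record_id].append("identical_timestamp")
        if r.stated_region != target_region or r.ip_region != target_region:
            flags[r.record_id].append("location_discrepancy")
    return flags


def tier(n_flags: int) -> str:
    """Map a flag count to one of three tiers; these cut-offs are hypothetical."""
    if n_flags == 0:
        return "likely_human"
    if n_flags == 1:
        return "manual_review"
    return "likely_bot"


# Toy data, invented for illustration.
sample = [
    Response("r1", datetime(2023, 7, 15, 9, 12, 3), "Wait times were long.", 34, 34, "Region A", "Region A"),
    Response("r2", datetime(2023, 7, 15, 10, 0, 0), "Great service overall", 50, 41, "Region A", "Elsewhere"),
    Response("r3", datetime(2023, 7, 15, 10, 0, 0), "great  service overall", 29, 29, "Region A", "Region A"),
]
flags = red_flags(sample, target_region="Region A")
tiers = {rid: tier(len(f)) for rid, f in flags.items()}
print(tiers)
```

Recording each flag by name, rather than only a total score, keeps the classification auditable: a reviewer can see which criterion drove each decision. Because the duplicate-text and identical-timestamp checks are defined relative to the whole dataset, a screen like this would need to run on the full export rather than on partial batches.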

Results:

Analysis included 1,154 responses, with 58% (n=668) classified as bot-generated. The most frequent bot-identification criterion was duplicated open-ended responses (n=293). Chi-square tests revealed statistically significant differences (P<.001) between bots and humans across nearly all survey items. Bots demonstrated response patterns concentrated in the middle of Likert scales, whereas humans were more likely to select extreme values. Correlation analyses showed that expected relationships between key health indicators (e.g., depression symptoms) were present in human responses but reversed in bot-generated data, highlighting the potential for compromised validity in unfiltered survey datasets.
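The two analyses named above can be sketched with the Python standard library alone. The block below computes the Pearson chi-square statistic and degrees of freedom for a respondent-type by response-category table, and Spearman's rho as the Pearson correlation of tie-averaged ranks, run separately for each group. The function names, Likert counts, and paired scores are invented for illustration and are not the study's data. For P values in practice, `scipy.stats.chi2_contingency` and `scipy.stats.spearmanr` provide tested implementations, and the chi-square approximation assumes adequate expected counts in each cell (a common rule of thumb is at least 5).

```python
"""Chi-square test of independence and Spearman's rho, standard library only (sketch)."""


def chi_square_independence(table: list[list[float]]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)  # raises ZeroDivisionError if either variable is constant


# Invented 5-point Likert counts (rows: humans, bots); not the study's data.
likert_counts = [
    [60, 40, 30, 50, 90],   # humans: more mass at the scale extremes
    [20, 90, 140, 80, 20],  # bots: mass piled toward the midpoint
]
stat, dof = chi_square_independence(likert_counts)
print(f"chi-square = {stat:.1f}, df = {dof}")

# Invented paired scores per respondent: depression symptoms vs. poor self-rated health.
human_rho = spearman_rho([2, 5, 7, 9, 12, 15], [1, 2, 2, 3, 4, 5])
bot_rho = spearman_rho([8, 9, 10, 11, 12, 13], [4, 3, 3, 2, 2, 1])
print(f"human rho = {human_rho:.2f}, bot rho = {bot_rho:.2f}")
```

Computing the correlation within each respondent group, rather than on the pooled data, is what allows a reversal like the one described above to become visible; pooled unfiltered data would blend the two patterns.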

Conclusions:

The findings underscore the necessity of implementing bot prevention and detection methods in online healthcare surveys to preserve data integrity. Failure to do so risks distorting research conclusions, particularly in health equity studies where demographic misclassification may bias results. The study highlights effective bot detection strategies, including open-text analysis, timestamp evaluation, and geographic validation, and recommends integrating these techniques into survey design. As bots continue to evolve, ongoing advancements in bot prevention and detection will be crucial to maintaining the reliability of digital health research.




© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.