Accepted for/Published in: JMIR Formative Research
Date Submitted: Jun 19, 2025
Open Peer Review Period: Jun 23, 2025 - Aug 18, 2025
Date Accepted: Feb 16, 2026
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Evaluating Crowdsourced Data Collection for Carceral Death Surveillance: A Pilot Study Using Amazon Mechanical Turk
ABSTRACT
Background:
People who are incarcerated face significantly higher health risks than the general population, yet deaths in custody remain underreported and poorly monitored by public health systems. Although the federal Death in Custody Reporting Act (DCRA) requires states to report all deaths in correctional facilities to the U.S. Department of Justice, reporting has been inconsistent, delayed, and often inaccessible to the public. As a result, researchers have turned to press releases issued by correctional agencies as one of the few timely sources of information on individual deaths in custody. These press releases, however, vary widely in content and structure, making it difficult to extract standardized information. Manually reviewing and coding these documents is time-consuming and hard to scale. Crowdsourcing platforms like Amazon Mechanical Turk (MTurk) may offer a faster, low-cost method for gathering data, but their utility in this setting remains untested.
Objective:
This pilot study evaluated whether MTurk could be used to extract structured information from press releases about deaths in custody, as part of a broader effort to improve the timeliness and transparency of health data in correctional systems.
Methods:
We selected 144 press releases describing individual deaths that occurred between 2000 and 2023 across 35 U.S. prison systems. Each press release was assigned to three MTurk workers, for a total of 432 participants. Workers completed a 16-question form aligned with DCRA variables, including age, race and ethnicity, date of death, and facility location. We assessed how often workers agreed on responses, reviewed common types of errors, and recorded the time to complete tasks.
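As a rough illustration of how agreement among the three workers assigned to a press release could be tallied, the sketch below counts a release as concordant for a field only when every worker supplied the same non-empty answer after trimming whitespace and ignoring case. This definition is an assumption; the abstract does not state how concordance was calculated. The function name `concordance_rate`, the `release_id` key, and the toy responses are hypothetical and are not study data.

```python
from collections import defaultdict


def concordance_rate(responses, field):
    """Share of press releases on which all workers gave the same answer.

    Assumed definition (not taken from the study): a release is concordant
    when no worker left the field blank and all normalized answers match.
    """
    answers_by_release = defaultdict(list)
    for row in responses:
        raw = row.get(field)
        # Normalize: treat missing values as blank, trim, and lowercase.
        normalized = "" if raw is None else str(raw).strip().lower()
        answers_by_release[row["release_id"]].append(normalized)

    if not answers_by_release:
        return 0.0

    concordant = 0
    for answers in answers_by_release.values():
        no_blanks = all(answers)
        if no_blanks and len(set(answers)) == 1:
            concordant += 1
    return concordant / len(answers_by_release)


# Hypothetical toy responses: two press releases, three workers each.
toy_responses = [
    {"release_id": "r1", "age": "45"},
    {"release_id": "r1", "age": "45 "},
    {"release_id": "r1", "age": "45"},
    {"release_id": "r2", "age": "60"},
    {"release_id": "r2", "age": "61"},
    {"release_id": "r2", "age": ""},
]

toy_rate = concordance_rate(toy_responses, "age")
```

In this toy input the workers agree on the first release and disagree on the second, so one of two releases is concordant. A looser definition, such as agreement between any two of the three workers, would yield higher rates.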
Results:
All 144 data abstraction tasks were completed within 48 hours, illustrating the efficiency of the MTurk platform. However, interrater agreement was low, with concordance rates of 14.2 percent for age, 12.3 percent for race or ethnicity, and 11.4 percent for date of birth. Qualitative analysis revealed frequent errors, omissions, and indications of inattentive or automated responses. Workers often misinterpreted system-specific terminology and, in some cases, submitted placeholder text rather than extracting information directly from the source material.
Conclusions:
Although MTurk allowed for rapid task completion, the quality of the extracted data was consistently low when applied to press releases about deaths in prisons and jails. These findings suggest that general crowdsourcing platforms may not be well suited for extracting accurate and detailed health information from unstructured or inconsistent sources without additional training, oversight, or quality checks. Even so, this remains a promising area for further research. With improved task design and support from artificial intelligence tools, crowdsourcing may help address important gaps in public health surveillance of deaths in custody. Long-term progress, however, will require correctional agencies to implement consistent, transparent, and standardized systems for reporting deaths, similar to those in healthcare and public health systems.
Clinical Trial: N/A
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.