Currently submitted to: JMIR Research Protocols
Date Submitted: Jun 18, 2026
Open Peer Review Period: Jun 18, 2026 - Aug 13, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Existing Tools and Frameworks for Data Quality Root Cause Analysis in Healthcare: A Scoping Review Protocol
ABSTRACT
Background:
The importance of the quality of clinical data for secondary research use is well established. Frameworks for the structured analysis of data quality (DQ) are abundant, as are tools for alleviating DQ issues, a process known as data cleaning. However, efforts to fix DQ issues at the point of origin are scarce, despite being considered a superior long-term solution in many cases. The process of identifying the origin of DQ issues is known as root cause analysis (RCA). Currently, there is no systematic collection of methods for this kind of RCA, which hinders prospective researchers in this field from making informed decisions about their approach.
Objective:
To systematically collect, describe, and map known procedural methodologies for DQ RCA, taxonomies for the classification of root causes, and analytical tools that visualize RCA processes or causal chains. This scoping review will serve as a foundation for future primary research and systematic reviews in the field of DQ RCA.
Methods and analysis:
This scoping review follows the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) checklist and its extension for scoping reviews (PRISMA-ScR). Eligibility criteria are defined using the Population, Concept, and Context (PCC) framework. The database interfaces Web of Science, IEEE Xplore, and Scopus will be searched for articles that present or employ RCA methods in the context of DQ. Four researchers will independently perform the screening and data extraction. Each of these two stages includes a pilot phase: the screening pilot tests and refines the inclusion/exclusion criteria, and the extraction pilot tests and refines the data extraction form. The extracted data will be qualitatively analyzed, and the results will be presented using a combination of charts, tables, and narrative synthesis.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.