Accepted for/Published in: JMIR Research Protocols

Date Submitted: Jul 2, 2021
Date Accepted: Sep 7, 2021
Date Submitted to PubMed: Nov 23, 2021

The final, peer-reviewed published version of this preprint can be found here:

Gierend K, Krüger F, Waltemath D, Fünfgeld M, Zeleke AA, Ganslandt T

Approaches and Criteria for Provenance in Biomedical Data Sets and Workflows: Protocol for a Scoping Review

JMIR Res Protoc 2021;10(11):e31750

DOI: 10.2196/31750

PMID: 34813494

PMCID: 8663663

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Approaches and criteria for provenance in biomedical data sets/workflows: a scoping review

  • Kerstin Gierend
  • Frank Krüger
  • Dagmar Waltemath
  • Maximilian Fünfgeld
  • Atinkut Alamirrew Zeleke
  • Thomas Ganslandt

ABSTRACT

Background:

Provenance supports the understanding of data genesis and is a key factor in ensuring the trustworthiness of digital objects containing (sensitive) scientific data. Provenance information contributes to a better understanding of scientific results and fosters collaboration on existing data as well as data sharing. This requires comprehensive concepts and standards for transparency, traceability, reproducibility, validity, and quality assurance in clinical and scientific data workflows and research.

Objective:

The aim of this scoping review is to investigate approaches to and challenges of provenance tracking and to identify current knowledge gaps in the area. The review covers modeling aspects as well as metadata frameworks for capturing meaningful and usable provenance information during the creation, collection, and processing of (sensitive) scientific biomedical data. The review will also examine quality aspects of provenance criteria.

Methods:

The scoping review will follow the methodological framework by Arksey and O'Malley. Relevant publications will be obtained by querying PubMed and Web of Science. All English-language articles published between 2006 and March 23, 2021, will be included. Database retrieval will be accompanied by a manual search for grey literature. Potential publications will then be exported into reference management software, and duplicates will be removed. Afterwards, the resulting set of papers will be transferred into a systematic review management tool. All publications will be screened, extracted, and analyzed: title and abstract screening will be carried out by 4 independent reviewers, and a majority vote is required to deem an article eligible under the defined inclusion and exclusion criteria. Full-text reading will be performed independently by 2 reviewers, and in the last step key information will be extracted using a template that the reviewers have evaluated beforehand. If agreement cannot be reached, the conflict will be resolved by a domain expert. Charted data will be analyzed by categorizing and summarizing the individual data items according to the research questions. Tabular or graphical overviews will be provided where applicable.
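The deduplication and title/abstract screening steps described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' tooling: the record fields (`doi`, `title`), the function names, and the rule that a tied vote is flagged as a conflict for the domain expert are all assumptions made for illustration.

```python
from collections import Counter


def normalize_title(title):
    """Lowercase a title and collapse punctuation and whitespace for comparison."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in title)
    return " ".join(cleaned.split())


def deduplicate(records):
    """Keep the first record per DOI, or per normalized title when no DOI is given.

    Hypothetical record shape: {"doi": str or None, "title": str}.
    """
    seen = set()
    unique = []
    for rec in records:
        key = rec.get("doi") or normalize_title(rec["title"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def screening_decision(votes):
    """Map reviewer votes (True = include) to 'include', 'exclude', or 'conflict'.

    Assumption: a strict majority decides; a tie is treated as a conflict
    to be escalated to a domain expert.
    """
    counts = Counter(votes)
    if counts[True] > len(votes) / 2:
        return "include"
    if counts[False] > len(votes) / 2:
        return "exclude"
    return "conflict"


if __name__ == "__main__":
    records = [
        {"doi": "10.1/a", "title": "A"},
        {"doi": "10.1/a", "title": "A (duplicate export)"},
        {"doi": None, "title": "Data Provenance!"},
        {"doi": None, "title": "data  provenance"},
    ]
    print(len(deduplicate(records)))
    print(screening_decision([True, True, True, False]))
    print(screening_decision([True, True, False, False]))
```

With 4 reviewers a strict majority means at least 3 matching votes, so a 2-2 split needs an explicit rule; the sketch surfaces it as a conflict instead of silently picking a side. A record exported once with a DOI and once without would not be matched by this simple key, which a real pipeline would have to handle.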

Results:

The reporting follows the PRISMA extension for scoping reviews (PRISMA-ScR). Electronic database searches in PubMed and Web of Science yielded 469 records after deduplication. As of June 2021, the scoping review is in the full-text screening stage. Data extraction using the pretested charting template will follow. We expect the scoping review report to be completed by the end of 2021.

Conclusions:

Information about the origin of healthcare data has a major impact on the quality and reusability of scientific results and on follow-up activities. This scoping review will provide information about current approaches, challenges, and knowledge gaps in provenance tracking in the biomedical sciences.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.