Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Jul 27, 2023
Date Accepted: Jun 17, 2024

The final, peer-reviewed published version of this preprint can be found here:

Provenance Information for Biomedical Data and Workflows: Scoping Review

Gierend K, Krüger F, Genehr S, Hartmann F, Siegel F, Waltemath D, Ganslandt T, Zeleke AA

J Med Internet Res 2024;26:e51297

DOI: 10.2196/51297

PMID: 39178413

PMCID: 11380065

Provenance Information for Biomedical Data and Workflows: A Scoping Review

  • Kerstin Gierend; 
  • Frank Krüger; 
  • Sascha Genehr; 
  • Francisca Hartmann; 
  • Fabian Siegel; 
  • Dagmar Waltemath; 
  • Thomas Ganslandt; 
  • Atinkut Alamirrew Zeleke

ABSTRACT

Background:

Provenance information leads to higher interpretability of scientific results and enables reliable collaboration and data sharing. However, the lack of comprehensive evidence on provenance approaches hinders the uptake of good scientific practice in clinical research.

Objective:

Our scoping review aimed to identify approaches and criteria for provenance tracking in the biomedical domain. We reviewed state-of-the-art frameworks, associated artifacts, and methodologies for provenance tracking.

Methods:

This scoping review follows the methodological framework by Arksey and O'Malley. PubMed and Web of Science databases were searched for English-language articles published from 2006 to 2022. Title and abstract screening were carried out by four independent reviewers using the Rayyan screening tool. A majority vote was required to reach consensus on the eligibility of papers based on the defined inclusion and exclusion criteria. Full-text reading and screening were performed independently by two reviewers, and information was extracted into a pre-tested template for the five research questions. Disagreements were resolved by a domain expert. The study protocol has previously been published.
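The majority vote in the title and abstract screening can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' tooling: the function name `screening_decision`, the "include"/"exclude" labels, and the rule that a 2-2 split among four reviewers yields no consensus are all hypothetical, since the abstract does not say how ties were handled.

```python
from collections import Counter


def screening_decision(votes: list[str]) -> str:
    """Return the majority decision for one paper's title/abstract screening.

    Hypothetical sketch: `votes` holds one "include" or "exclude" label per
    reviewer. A label wins only with a strict majority (more than half of the
    votes); otherwise the result is "no consensus" (assumed tie handling).
    """
    if not votes:
        raise ValueError("at least one vote is required")
    label, count = Counter(votes).most_common(1)[0]
    if count * 2 > len(votes):
        return label
    return "no consensus"


# Example with four reviewers, as in the screening step described above.
print(screening_decision(["include", "include", "include", "exclude"]))  # include
print(screening_decision(["include", "include", "exclude", "exclude"]))  # no consensus
```

With four reviewers, a strict majority means at least three agreeing votes; how the study actually resolved 2-2 splits is not stated here.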

Results:

The search resulted in a total of 764 papers. Of the 624 identified, de-duplicated papers, 66 studies fulfilled the inclusion criteria. We identified diverse provenance tracking approaches, ranging from practical provenance processing and management to theoretical frameworks distinguishing diverse concepts and details of data and metadata models, provenance components, and notations. A large majority investigated the underlying requirements to varying extents and with varying validation intensity, but lacked coverage of provenance completeness. The most frequently cited requirements concerned knowledge about data integrity and reproducibility. Requirements also revolved around robust data quality assessments, consistent policies for sensitive data protection, improved user interfaces, and automated ontology development. We found that different stakeholder groups benefit from the availability of provenance information. We also recognized that the term "provenance" is subject to an evolutionary and technical process, with multifaceted meanings and roles. Challenges included organizational and technical issues linked to data annotation, provenance modeling, and performance, amplified by subsequent matters such as enhanced provenance information and quality principles.

Conclusions:

As data volumes grow and computing power increases, the challenge of scaling provenance systems to handle data efficiently and support complex queries intensifies, necessitating automated and scalable solutions. With rising legal and scientific demands, there is an urgent need for greater transparency in implementing provenance systems in research projects, despite the challenges of unresolved granularity and knowledge bottlenecks. We believe that our recommendations enable quality and guide the implementation of auditable and measurable provenance approaches and solutions in the daily tasks of biomedical scientists.

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.