Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Aug 30, 2022
Date Accepted: Dec 23, 2022

The final, peer-reviewed published version of this preprint can be found here:

Data Provenance in Biomedical Research: Scoping Review

Johns M, Meurers T, Wirth FN, Haber AC, Müller A, Halilovic M, Balzer F, Prasser F

J Med Internet Res 2023;25:e42289

DOI: 10.2196/42289

PMID: 36972116

PMCID: 10132013

Data provenance in biomedical research: scoping review

  • Marco Johns
  • Thierry Meurers
  • Felix N. Wirth
  • Anna C. Haber
  • Armin Müller
  • Mehmed Halilovic
  • Felix Balzer
  • Fabian Prasser

ABSTRACT

Background:

Data provenance is information about the origin, processing and movement of data. Reliable and precise knowledge about data provenance has great potential to improve reproducibility as well as quality in biomedical research and therefore to foster good scientific practice. However, despite increasing interest in the literature and implementation in other disciplines, data provenance technologies have not yet been widely adopted in biomedical research.

Objective:

The aim of this scoping review was to provide a structured overview of the body of knowledge on provenance methods in biomedical research by (1) systematizing articles covering data provenance technologies developed for or used in this application area, (2) describing and comparing the functionalities and the design of the provenance technologies used, and (3) identifying gaps in the literature that could provide opportunities for future research on technologies suited to more widespread adoption.

Methods:

Following the PRISMA extension for scoping reviews, articles were identified by database searches in PubMed, IEEE Xplore and Web of Science and subsequently screened for eligibility. We included original articles covering software-based provenance management for scientific research published between 2010 and 2021. A set of data items was defined along five axes: (1) publication metadata, (2) application scope, (3) provenance aspects covered, (4) data representation and (5) functionalities. The data items were extracted from the articles, stored in a charting spreadsheet and summarized in tables and figures.

Results:

We identified 44 original articles published between 2010 and 2021. We found that the solutions described are heterogeneous along all axes. We also identified relationships between the motivations for using provenance information, the feature sets (capture, storage, retrieval, visualization, and analysis), and implementation details such as the data models and technologies used. Important gaps we identified are that few publications address the analysis of provenance data or use established provenance standards such as PROV.
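To make the PROV vocabulary and the capture/retrieval features more concrete, here is a minimal sketch assuming a hypothetical in-memory `ProvGraph` helper; it is not taken from any of the reviewed tools or from an existing PROV library. Each processing step is recorded as PROV-style statements (entities, activities and agents linked by `used`, `wasGeneratedBy` and `wasAssociatedWith`), the lineage of a result is retrieved by walking those links backwards, and the graph is exported in a simplified PROV-N form. All identifiers such as `ex:raw_export` and the `http://example.org/` namespace are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ProvGraph:
    """Hypothetical in-memory store of PROV-style statements (illustration only)."""
    entities: set = field(default_factory=set)
    activities: set = field(default_factory=set)
    agents: set = field(default_factory=set)
    used: list = field(default_factory=list)             # (activity, entity): activity used entity
    generated_by: dict = field(default_factory=dict)     # entity -> activity that generated it
    associated_with: list = field(default_factory=list)  # (activity, agent)

    def record_step(self, activity, inputs, outputs, agent):
        """Capture one processing step: inputs -> activity -> outputs, run by agent."""
        self.activities.add(activity)
        self.agents.add(agent)
        self.associated_with.append((activity, agent))
        for entity in inputs:
            self.entities.add(entity)
            self.used.append((activity, entity))
        for entity in outputs:
            self.entities.add(entity)
            self.generated_by[entity] = activity

    def lineage(self, entity):
        """Return every upstream entity the given entity was (transitively) derived from."""
        upstream = set()
        stack = [entity]
        while stack:
            current = stack.pop()
            activity = self.generated_by.get(current)
            if activity is None:
                continue  # source entity: no generating activity was recorded
            for act, inp in self.used:
                if act == activity and inp not in upstream:
                    upstream.add(inp)
                    stack.append(inp)
        return upstream

    def to_prov_n(self):
        """Serialize to a simplified PROV-N document (no attributes or timestamps)."""
        lines = ["document", "  prefix ex <http://example.org/>"]
        lines += [f"  entity({e})" for e in sorted(self.entities)]
        lines += [f"  activity({a})" for a in sorted(self.activities)]
        lines += [f"  agent({ag})" for ag in sorted(self.agents)]
        lines += [f"  used({a}, {e}, -)" for a, e in self.used]
        lines += [f"  wasGeneratedBy({e}, {a}, -)" for e, a in self.generated_by.items()]
        lines += [f"  wasAssociatedWith({a}, {ag}, -)" for a, ag in self.associated_with]
        lines.append("endDocument")
        return "\n".join(lines)


if __name__ == "__main__":
    g = ProvGraph()
    g.record_step("ex:pseudonymize", ["ex:raw_export"], ["ex:pseudonymized"], "ex:data_steward")
    g.record_step("ex:analyze", ["ex:pseudonymized", "ex:codebook"], ["ex:results"], "ex:analyst")
    print(sorted(g.lineage("ex:results")))
    # ['ex:codebook', 'ex:pseudonymized', 'ex:raw_export']
    print(g.to_prov_n())
```

Keeping the statements in standard PROV terms, rather than in a tool-specific schema, is what would let such records be exchanged with other provenance tooling; a production system would more likely rely on a dedicated PROV library and persistent storage than on this in-memory sketch.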

Conclusions:

The heterogeneity of provenance methods, models and implementations found in the literature points towards a lack of a unified understanding of provenance concepts for biomedical data. Providing a common framework as well as biomedical reference and benchmarking datasets could foster the development of more comprehensive provenance solutions.

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.