Currently submitted to: Journal of Medical Internet Research
Date Submitted: Jun 4, 2026
Open Peer Review Period: Jun 5, 2026 - Jul 31, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
MIRAPIE: Proposing a harmonising framework as a minimal community standard for biomedical provenance documentation
ABSTRACT
Background:
Generating Findable, Accessible, Interoperable, and Reusable (FAIR) biomedical samples, data, and tools is costly and time-consuming. Transparency about their processing or evolution and about their reuse, particularly for health data, is therefore highly desirable, and an appropriate fact-based decision framework is required to evaluate data (re)usability. Provenance information documents the processing or evolution of a data object and thereby provides an essential formal basis for such a (re)usability evaluation. When standardised, this provenance information supports better FAIR biomedical data.
Objective:
The MInimal Requirements for Automated Provenance Information Enrichment (MIRAPIE) project aims to define the minimal provenance information required for harmonised documentation of a data object's processing history, and to establish the MIRAPIE approach as a community standard that assures interoperability of the collected provenance information.
Methods:
A hybrid consensus-finding method, adapted from the Nominal Group Technique (NGT) and the Delphi method, was applied within an international community setting to iteratively implement a minimal data model, an ontology, and an application guideline. The data model is based on the PROV Data Model (PROV-DM); the ontology extends the PROV Ontology (PROV-O).
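To illustrate the kind of structure a PROV-DM-based data model builds on, the following minimal Python sketch records a processing history using three core PROV-DM concepts (entity, activity, agent) and three core relations (used, wasGeneratedBy, wasAssociatedWith). It is not the MIRAPIE data model, whose fields are not given in this abstract; the class `ProvDocument`, the function `trace_history`, and all identifiers such as `raw_data` or `lab_pipeline` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class ProvDocument:
    """Hypothetical container for a few core PROV-DM relations.

    Each relation is stored as a list of identifier pairs.
    """
    used: list = field(default_factory=list)                 # (activity, entity)
    was_generated_by: list = field(default_factory=list)     # (entity, activity)
    was_associated_with: list = field(default_factory=list)  # (activity, agent)


def trace_history(doc, entity_id):
    """Walk backwards from an entity through the activities that produced it.

    Returns a list of (entity, activity, input_entities, agents) steps,
    starting with the most recent processing step.
    """
    steps = []
    frontier = [entity_id]
    seen = set()
    while frontier:
        current = frontier.pop()
        if current in seen:
            continue
        seen.add(current)
        for entity, activity in doc.was_generated_by:
            if entity != current:
                continue
            inputs = [e for a, e in doc.used if a == activity]
            agents = [g for a, g in doc.was_associated_with if a == activity]
            steps.append((current, activity, inputs, agents))
            frontier.extend(inputs)
    return steps


# Illustrative (made-up) history: raw_data -> normalise -> clean_data -> analyse -> result
doc = ProvDocument(
    used=[("normalise", "raw_data"), ("analyse", "clean_data")],
    was_generated_by=[("clean_data", "normalise"), ("result", "analyse")],
    was_associated_with=[("normalise", "lab_pipeline"), ("analyse", "researcher")],
)

steps = trace_history(doc, "result")
for entity, activity, inputs, agents in steps:
    print(f"{entity} was generated by {activity} using {inputs}, associated with {agents}")
```

Storing relations as plain identifier pairs keeps the sketch short; a real implementation would more likely serialise to one of the PROV formats and attach further attributes to each record.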
Results:
With the MIRAPIE question, we defined a harmonising framework for provenance information in biomedicine and presumably beyond. The minimal data model, a corresponding ontology, and an accompanying guideline provide the means for standardised and possibly automated provenance documentation. Their general applicability to data, workflows, models, and even samples is shown in diverse biomedical usage scenarios. Setting up provenance documentation from scratch is supported, as are linking alternative data schemata and mapping existing provenance documentation.
Conclusions:
Together, the MIRAPIE question, minimal data model, ontology, and guideline contribute significantly to the advancement of biomedical and especially health research by setting up a basis for a contextual (re)usability evaluation. This fosters traceability of changes applied to data, workflows, tools, and samples and, in consequence, sustainable data usage and reproducibility of scientific results. The generalisation allows domain-specific differences and local, national, and international boundaries to be overcome. We invite the biomedical research community and health data gathering institutions to create lasting change by establishing MIRAPIE-compliant provenance information for transparent data processing and (re)usability assessment.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.