Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: May 8, 2023
Date Accepted: Sep 29, 2023
The Status of Data Management Practices: a Mixed-Method Study across German Medical Data Integration Centers
ABSTRACT
Background:
In the context of the Medical Informatics Initiative funded by the German government, medical data integration centers have implemented complex data flows to load routine health care data into research data repositories for secondary use. Data management practices are of importance throughout these processes, and special attention should be given to provenance aspects. Additionally, insufficient knowledge about these processes can lead to validity risks and weaken the quality of the extracted data. The need to collect provenance data during the data life cycle is undisputed, but there is a great lack of clarity on the status.
Objective:
Our study examines the current provenance tracking practices throughout the data lifecycle within the MIRACUM consortium. We outline the current data management maturity status and present recommendations to enable a trustful dissemination and re-use of patient data.
Methods:
Our study design is based on a mixed-method study. We conducted semi-structured interviews with stakeholders from ten data integration centers between July and September 2021. We used a self-designed questionnaire that we tailored to the MIRACUM data integration centers, to collect qualitative and quantitative data. Our study method is compliant with the Good Reporting of a Mixed Methods Study (GRAMMS) checklist.
Results:
From a provenance perspective, our study provides insights into the data management practices concerning data extraction, transformation, storage, and provision. We identified several traceability and reproducibility issues that can be partially explained with a lack of contextual information within non-harmonized workflow steps, unclear responsibilities, missing or incomplete data elements and incomplete computational environment information. Based on the identified shortcomings, we suggest a data management maturity framework to reach more clarity and to help define enhanced data management strategies.
Conclusions:
In this study, we present insights on provenance practices at the data integration centers. The data management maturity framework supports the production and dissemination of accurate and provenance enriched data for their second use. Furthermore, our work serves as a catalyst for the derivation of an overarching data management strategy, abiding data integrity and provenance characteristics as a key factor for quality and FAIR sustained health and research data.
Citation
Request queued. Please wait while the file is being generated. It may take some time.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on it's website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressively prohibit redistribution of this draft paper other than for review purposes.