Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Sep 20, 2023
Date Accepted: Dec 3, 2023
The use of metadata-driven approaches for data harmonization in the medical domain: a scoping review
ABSTRACT
Background:
Multi-site clinical studies are increasingly using real-world data (RWD) to generate real-world evidence (RWE). However, because source data are heterogeneous, it is difficult to analyze such data in a unified way across clinics. Implementing Extract-Transform-Load (ETL) or Extract-Load-Transform (ELT) processes to harmonize local health data is therefore necessary to ensure data quality for research. Developing such processes, however, is time-consuming and unsustainable. A promising way to reduce this effort is to generalize ETL/ELT processes.
Objective:
In this work, we investigate existing approaches to developing generic ETL/ELT processes. In particular, we focus on approaches with low development complexity that use descriptive and structural metadata.
Methods:
We conducted a literature review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. We searched four publication databases (i.e., PubMed, IEEE Xplore, Web of Science, and BioMed Central) for relevant publications from 2012 to 2022. The PRISMA flow was visualized using an R-based online tool. All relevant content from the publications was extracted into a spreadsheet for further analysis and visualization.
Results:
Following the PRISMA guidelines, we included 33 publications in this literature review. All included publications were categorized into seven focus groups (i.e., Medicine, Data warehouse, Big Data, Industry, Geoinformatics, Archaeology, and Military). Based on the extracted data, ontology-based and rule-based approaches were the two most frequently used approaches across focus groups. The ontology-based approach was mostly implemented using Protégé, whereas the rule-based approach was mostly implemented manually.
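To make the idea of a rule-based, metadata-driven transformation step more concrete, the following is a minimal Python sketch under stated assumptions. The site names, column names, target schema, and the names `RULES`, `SITE_MAPPINGS`, and `transform` are all hypothetical and are not taken from any of the reviewed publications. The sketch only illustrates the general principle: the harmonization logic lives in metadata (a per-site mapping table plus a catalog of reusable rules), so that a single generic transform function can serve several heterogeneous sources.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical catalog of reusable transformation rules (illustrative only).
RULES: dict[str, Callable[[object], object]] = {
    "identity": lambda v: v,
    # Convert pounds to kilograms, rounded to one decimal place.
    "lb_to_kg": lambda v: round(float(v) * 0.45359237, 1),
    # Normalize heterogeneous local sex codes to one target vocabulary.
    "sex_code": lambda v: {"m": "male", "f": "female", "1": "male", "2": "female"}.get(
        str(v).lower(), "unknown"
    ),
}

# Hypothetical structural metadata: for each site, map each source column
# to a (target column, rule name) pair in a shared target schema.
SITE_MAPPINGS: dict[str, dict[str, tuple[str, str]]] = {
    "site_a": {
        "pat_id": ("patient_id", "identity"),
        "weight_kg": ("weight_kg", "identity"),
        "gender": ("sex", "sex_code"),
    },
    "site_b": {
        "PID": ("patient_id", "identity"),
        "WeightLb": ("weight_kg", "lb_to_kg"),
        "SEX": ("sex", "sex_code"),
    },
}


def transform(site: str, records: list[dict]) -> list[dict]:
    """Generic transform step: behavior is driven entirely by the site's mapping metadata."""
    mapping = SITE_MAPPINGS[site]
    harmonized = []
    for record in records:
        row = {}
        for source_col, (target_col, rule_name) in mapping.items():
            # Skip source columns that are absent in this record.
            if source_col in record:
                row[target_col] = RULES[rule_name](record[source_col])
        harmonized.append(row)
    return harmonized
```

For example, `transform("site_b", [{"PID": "B-7", "WeightLb": 154, "SEX": "2"}])` yields a record with `patient_id`, `weight_kg`, and `sex` in the shared schema. In this design, onboarding an additional site would mean adding a mapping entry rather than writing a new site-specific ETL job; an ontology-based approach would instead resolve such mappings through shared concepts rather than a hand-maintained table.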
Conclusions:
Our literature review shows that metadata-driven approaches to developing ETL/ELT processes can serve different purposes in different focus groups. In some cases, combining multiple metadata-driven approaches can provide further opportunities for developing ETL/ELT processes. Future work should therefore verify whether combining multiple metadata-driven approaches can improve ETL/ELT processes for harmonizing medical data.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.