Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Mar 15, 2023
Date Accepted: May 3, 2023

The final, peer-reviewed published version of this preprint can be found here:

An Extract-Transform-Load Process Design for the Incremental Loading of German Real-World Data Based on FHIR and OMOP CDM: Algorithm Development and Validation

Henke E, Peng Y, Reinecke I, Zoch M, Sedlmayr M, Bathelt F

JMIR Med Inform 2023;11:e47310

DOI: 10.2196/47310

PMID: 37621207

PMCID: 10466444

An ETL-process design for incremental loading German real-world data based on FHIR and OMOP CDM: Algorithm Development and Validation

  • Elisa Henke
  • Yuan Peng
  • Ines Reinecke
  • Michéle Zoch
  • Martin Sedlmayr
  • Franziska Bathelt

ABSTRACT

Background:

In the Medical Informatics in Research and Care in University Medicine (MIRACUM) consortium, an IT-based clinical trial recruitment support system (CTRSS) was developed based on the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). Currently, OMOP CDM is populated with German Fast Healthcare Interoperability Resources (FHIR) data using an Extract-Transform-Load (ETL)-process, which was designed as a bulk load. However, running a full load every day requires more computational effort than is practical for daily recruitment.

Objective:

The objective of this study is to extend our existing ETL-process with the option of incremental loading to efficiently support daily updated data.

Methods:

Based on our existing bulk ETL-process, we performed an analysis to determine the requirements for incremental loading. Furthermore, a literature review was conducted to identify adaptable approaches. Based on this, we implemented three methods to integrate incremental loading into our ETL-process. Lastly, a test suite was defined to evaluate the incremental loading for data correctness and performance compared to bulk loading.

Results:

The resulting ETL-process supports both bulk and incremental loading. In performance tests covering the changes of one day, the incremental load took 87.5% less execution time than the bulk load, and no data differences occurred in OMOP CDM.

Conclusions:

Since incremental loading is more efficient than a daily bulk load and both loading options result in the same amount of data, we recommend using bulk load for an initial load and switching to incremental load for daily updates. The resulting incremental ETL-logic can be applied internationally, since it is not restricted to German FHIR profiles.
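
The difference between the two loading options can be illustrated with a minimal sketch. This is a generic example and not the three methods implemented in the study, which the abstract does not describe. The record layout (`id`, `value`, `last_updated`, `deleted`), the `transform` function, and the in-memory dictionary standing in for OMOP CDM are all hypothetical assumptions made for illustration. The final check mirrors the kind of correctness test named in the Methods: after one day of changes, the incremental load should yield the same data as a fresh bulk load.

```python
from datetime import datetime, timezone


def transform(resource):
    """Hypothetical stand-in for mapping one source resource to a target row."""
    return {"source_id": resource["id"], "value": resource["value"]}


def bulk_load(source):
    """Rebuild the target from scratch from every non-deleted source resource."""
    return {r["id"]: transform(r) for r in source if not r["deleted"]}


def incremental_load(target, source, since):
    """Apply only resources changed after `since` to a copy of the target.

    Returns the updated target and the number of resources processed.
    """
    updated = dict(target)
    processed = 0
    for r in source:
        if r["last_updated"] <= since:
            continue  # unchanged since the last load, skip it
        processed += 1
        if r["deleted"]:
            updated.pop(r["id"], None)  # remove rows of deleted resources
        else:
            updated[r["id"]] = transform(r)  # insert new or overwrite changed
    return updated, processed


def ts(day):
    """Helper: a UTC timestamp on the given day of January 2023."""
    return datetime(2023, 1, day, tzinfo=timezone.utc)


# Assumed example data: the source on day 1 and on day 2.
day1 = [
    {"id": "obs-1", "value": 5.1, "last_updated": ts(1), "deleted": False},
    {"id": "obs-2", "value": 7.3, "last_updated": ts(1), "deleted": False},
    {"id": "obs-3", "value": 2.0, "last_updated": ts(1), "deleted": False},
]
day2 = [
    {"id": "obs-1", "value": 5.1, "last_updated": ts(1), "deleted": False},
    {"id": "obs-2", "value": 7.9, "last_updated": ts(2), "deleted": False},  # updated
    {"id": "obs-3", "value": 2.0, "last_updated": ts(2), "deleted": True},   # deleted
    {"id": "obs-4", "value": 1.4, "last_updated": ts(2), "deleted": False},  # new
]

# Initial bulk load, then a daily incremental update.
target = bulk_load(day1)
incremental_target, processed = incremental_load(target, day2, since=ts(1))

# Correctness check: both loading options yield the same data.
assert incremental_target == bulk_load(day2)
print(processed, "of", len(day2), "resources processed incrementally")
```

In this toy data set the incremental run touches only the three changed resources and leaves the unchanged one alone, which is where the time saving of an incremental load comes from when daily changes are a small fraction of the full data set.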

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.