Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Sep 17, 2019
Date Accepted: May 6, 2020

The final, peer-reviewed published version of this preprint can be found here:

Enabling Agile Clinical and Translational Data Warehousing: Platform Development and Evaluation

Spengler H, Lang C, Mahapatra T, Gatz I, Kuhn KA, Prasser F

JMIR Med Inform 2020;8(7):e15918

DOI: 10.2196/15918

PMID: 32706673

PMCID: 7404007

A Comprehensive Platform for Agile Clinical and Translational Data Warehousing

  • Helmut Spengler
  • Claudia Lang
  • Tanmaya Mahapatra
  • Ingrid Gatz
  • Klaus A Kuhn
  • Fabian Prasser

ABSTRACT

Background:

Modern data-driven medical research promises to provide new insights into the development and course of diseases and to enable novel methods of clinical decision support. Clinical and translational data warehouses are an important building block of infrastructures that provide the large datasets needed to realize this. These databases provide users with unified access to heterogeneous datasets and support use cases such as cohort selection, hypothesis generation and ad-hoc data analysis. They can also be used to implement distributed cross-institutional data analyses by representing data in common models using standard terminologies and ontologies.

Objective:

Often, different warehousing platforms are needed to support different use cases and different types of data. Moreover, to achieve an optimal data representation within the target systems, both technical know-how and project-specific domain knowledge are needed when designing data transformation and loading processes. As a result, informaticians need to work in close cooperation with clinicians and researchers, with short feedback cycles. This is a challenging task, as the installation and maintenance of common warehousing platforms can be complex and time-consuming. Moreover, data loading typically requires significant effort in terms of data preprocessing, cleansing and restructuring. The work described in this article aimed to address these challenges.

Methods:

We have developed a (private) cloud infrastructure for managing instances of common biomedical data warehousing platforms, combined with a flexible and easy-to-use pipeline for data loading. The platform supports both i2b2 and tranSMART and it comes with built-in security and comprehensive documentation. The data loading pipeline is based on a declarative configuration paradigm, which enables the agile development of data import processes and the automation of a wide range of common data cleansing and preprocessing tasks.
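To illustrate what a declarative configuration paradigm for data loading can look like in general, the following minimal Python sketch separates a description of the desired cleansing steps from the generic code that applies them. It is not the authors' pipeline: the configuration keys (`source`, `target`, `steps`, `mapping`, `date_format`), the column names and the `transform` function are all hypothetical and chosen only for illustration.

```python
from datetime import datetime

# Hypothetical declarative configuration (not the platform's actual format):
# each rule names a source column, a target field, and the cleansing to apply.
CONFIG = [
    {"source": "pat_id", "target": "patient_id", "steps": ["trim"]},
    {"source": "sex", "target": "gender", "steps": ["trim", "lower"],
     "mapping": {"m": "male", "f": "female"}},
    {"source": "dob", "target": "birth_date", "steps": ["trim"],
     "date_format": "%d.%m.%Y"},
]

# Generic, reusable cleansing steps referenced by name in the configuration.
STEPS = {"trim": str.strip, "lower": str.lower}


def transform(row, config):
    """Apply the configured cleansing rules to one raw input row (a dict)."""
    out = {}
    for rule in config:
        value = row.get(rule["source"], "")
        # Apply the named string-cleansing steps in order.
        for step in rule.get("steps", []):
            value = STEPS[step](value)
        # Optionally map local codes to harmonized values.
        if "mapping" in rule:
            value = rule["mapping"].get(value, value)
        # Optionally normalize dates to ISO 8601.
        if "date_format" in rule and value:
            value = datetime.strptime(value, rule["date_format"]).date().isoformat()
        # Represent empty values as missing.
        out[rule["target"]] = value or None
    return out


if __name__ == "__main__":
    raw = {"pat_id": " 0042 ", "sex": " M", "dob": "03.07.1985 "}
    print(transform(raw, CONFIG))
```

In a setup of this kind, adapting an import to feedback from clinicians or researchers means editing the configuration entries rather than the transformation code, which is what makes short feedback cycles practical.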

Results:

The described platform has successfully been used to support a wide range of projects, from which we present three in this paper: one in which we provided translational access to highly structured research data, one in which we supported clinician-scientists by providing them with an overview of longitudinal semi-structured clinical data, and one in which we loaded highly structured and standardized billing data to prepare a large-scale distributed study.

Conclusions:

Our platform significantly simplifies the management of data warehousing platforms and allows data to be loaded quickly in various representations. This enables the agile development of such solutions in close cooperation with end users. Both the cloud-based hosting infrastructure and the data loading pipeline are available to the community as open-source software.




© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.