Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Mar 26, 2025
Open Peer Review Period: Mar 26, 2025 - May 21, 2025
Date Accepted: May 28, 2025

The final, peer-reviewed published version of this preprint can be found here:

Gebler R, Reinecke I, Sedlmayr M, Goldammer M

Enhancing Clinical Data Infrastructure for AI Research: Comparative Evaluation of Data Management Architectures

J Med Internet Res 2025;27:e74976

DOI: 10.2196/74976

PMID: 40749197

PMCID: 12357119

Warning: This is an author submission that is not peer-reviewed or edited. Preprints, unless they show as "accepted", should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Enhancing Clinical Data Infrastructure for AI Research: A Comparative Evaluation of Data Management Architectures

  • Richard Gebler
  • Ines Reinecke
  • Martin Sedlmayr
  • Miriam Goldammer

ABSTRACT

Background:

The rapid growth of clinical data, driven by digital technologies and high-resolution sensors, presents significant challenges for healthcare organisations aiming to support advanced AI research and improve patient care. Traditional data management approaches may struggle to handle the large, diverse and rapidly updating datasets prevalent in modern clinical environments.

Objective:

This study compares three clinical data management architectures: clinical data warehouses (cDWH), clinical data lakes (cDL) and clinical data lakehouses (cDLH). It analyses their performance against the FAIR principles (Findable, Accessible, Interoperable, Reusable) and the Big Data 5 Vs (Volume, Variety, Velocity, Veracity, Value). The aim is to provide guidance on selecting an architecture that balances robust data governance with the flexibility required for advanced analytics.

Methods:

We developed a comprehensive analysis framework that integrates aspects of data governance with technical performance criteria. A rapid literature review was conducted to synthesise evidence from multiple studies, focusing on how each architecture manages large, heterogeneous and dynamically updating clinical data. The review assessed key dimensions such as scalability, real-time processing capabilities, metadata consistency, and the technical expertise required for implementation and maintenance.

Results:

The results show that cDWHs offer strong data governance, stability and structured reporting, making them well suited for environments that require strict compliance and reliable analysis. However, they are limited in terms of real-time processing and scalability. In contrast, cDLs offer greater flexibility and cost-effective scalability for managing heterogeneous data types, although they may suffer from inconsistent metadata management and challenges in maintaining data quality. cDLHs combine the strengths of both approaches by supporting real-time data ingestion and structured querying; however, their hybrid nature requires high technical expertise and involves complex integration efforts.

Conclusions:

The optimal data management architecture for clinical applications depends on an organisation's specific needs, available resources, and strategic goals. Healthcare institutions need to weigh the trade-offs between robust data governance, operational flexibility and scalability to build future-proof infrastructures that support both clinical operations and AI research. Further research should focus on reducing the complexity of hybrid models and strengthening the integration of clinical standards to improve overall system reliability and ease of implementation.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.