Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Mar 31, 2025
Date Accepted: Oct 6, 2025

The final, peer-reviewed published version of this preprint can be found here:

Assessing Data Quality in Heterogeneous Health Care Integration: Simulation Study of the AIDAVA Framework

Declerck J, Kiliç ÖD, Emir Erol E, Mehryar S, Kalra D, de Zegher I, Celebi R

JMIR Med Inform 2025;13:e75275

DOI: 10.2196/75275

PMID: 41223409

PMCID: 12779104

Assessing Data Quality in Heterogeneous Healthcare Integration: The AIDAVA Framework

  • Jens Declerck; 
  • Ömer Durukan Kiliç; 
  • Ensar Emir Erol; 
  • Shervin Mehryar; 
  • Dipak Kalra; 
  • Isabelle de Zegher; 
  • Remzi Celebi

ABSTRACT

Background:

Integrated health data is foundational for secondary use, research, and policy making. However, data quality issues – such as missing values and inconsistencies – are common due to the heterogeneity of health data sources. Existing frameworks often apply static, one-time assessments, limiting their ability to address quality problems across evolving data pipelines.

Objective:

This study evaluates the AIDAVA data quality framework, which introduces dynamic, lifecycle-based validation of health data using knowledge graph technologies and SHACL-based rules. The framework is assessed for its ability to detect and manage data quality issues – specifically, completeness and consistency – during integration.

Methods:

Using the MIMIC-III dataset, we simulated real-world data quality challenges by introducing structured noise, including missing values and logical inconsistencies. The data was transformed into Source Knowledge Graphs (SKGs) and integrated into a unified Personal Health Knowledge Graph (PHKG). SHACL validation rules were applied iteratively during the integration process, and data quality was assessed under varying noise levels and integration orders.
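The noise-injection step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it works on plain dictionaries instead of knowledge graphs and uses ordinary Python checks instead of SHACL. The field names `admit_day` and `discharge_day`, the function names, and the single rule (discharge must not precede admission) are assumptions made for illustration only.

```python
import random
from copy import deepcopy


def inject_noise(records, fields, missing_rate, inconsistency_rate, seed=0):
    """Return a noisy copy of `records` (hypothetical helper, not from the paper).

    Each listed field is blanked with probability `missing_rate`; each record
    that still has both dates gets a discharge day before its admission day
    with probability `inconsistency_rate`.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    noisy = deepcopy(records)  # leave the clean data untouched
    for rec in noisy:
        for field in fields:
            if rng.random() < missing_rate:
                rec[field] = None
        both_present = (
            rec.get("admit_day") is not None
            and rec.get("discharge_day") is not None
        )
        if both_present and rng.random() < inconsistency_rate:
            rec["discharge_day"] = rec["admit_day"] - 1
    return noisy


def completeness(records, fields):
    """Fraction of (record, field) cells that hold a value."""
    total = len(records) * len(fields)
    filled = sum(1 for r in records for f in fields if r.get(f) is not None)
    return filled / total if total else 1.0


def consistency(records):
    """Fraction of records with both dates whose discharge is not before admission.

    Returns None when no record can be checked.
    """
    checked = [
        r for r in records
        if r.get("admit_day") is not None and r.get("discharge_day") is not None
    ]
    if not checked:
        return None
    ok = sum(1 for r in checked if r["discharge_day"] >= r["admit_day"])
    return ok / len(checked)
```

Calling `inject_noise` with increasing `missing_rate` and `inconsistency_rate` values, then recomputing `completeness` and `consistency` on the result, mimics in miniature the idea of assessing quality under varying noise levels.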

Results:

The AIDAVA framework effectively detected completeness and consistency issues across all scenarios. Completeness was shown to influence the interpretability of consistency scores, and domain-specific attributes (e.g., diagnoses, procedures) were more sensitive to integration order and data gaps.
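One possible reading of the link between completeness and consistency is that a consistency rule can only be evaluated on records where the values it compares are present, so a high score computed over few records says little. The sketch below shows that reading under stated assumptions; the function name, the field arguments, and the ordering rule are hypothetical and are not taken from the AIDAVA framework.

```python
def consistency_with_coverage(records, start_field, end_field):
    """Return (score, coverage) for the rule end_field >= start_field.

    score    -- share of evaluable records that satisfy the rule,
                or None if no record is evaluable
    coverage -- share of all records on which the rule could be evaluated
    """
    evaluable = [
        r for r in records
        if r.get(start_field) is not None and r.get(end_field) is not None
    ]
    coverage = len(evaluable) / len(records) if records else 0.0
    if not evaluable:
        return None, coverage
    passed = sum(1 for r in evaluable if r[end_field] >= r[start_field])
    return passed / len(evaluable), coverage
```

Reporting the coverage next to the score makes it visible when a consistency figure rests on only a small, complete subset of the data.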

Conclusions:

AIDAVA supports dynamic, rule-based validation throughout the data lifecycle. By addressing both dimension-specific vulnerabilities and cross-dimensional effects, it lays the groundwork for scalable, high-quality health data integration. Future work should explore deployment in live clinical settings and expand to additional quality dimensions.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.