Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Dec 23, 2025
Date Accepted: May 5, 2026

The final, peer-reviewed published version of this preprint can be found here:

Addressing Data Quality Challenges in Lung Cancer Data Within the Observational Medical Outcomes Partnership Common Data Model: Observational Study

Declerck J, Deschepper M, Colpaert K, Kalra D, Coorevits P

J Med Internet Res 2026;28:e90246

DOI: 10.2196/90246

PMID: 42258805

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Addressing Data Quality Challenges in OMOP CDM: A Case Study on Lung Cancer Data Mapping

  • Jens Declerck
  • Mieke Deschepper
  • Kirsten Colpaert
  • Dipak Kalra
  • Pascal Coorevits

ABSTRACT

Background:

The secondary use of health data is essential for advancing medical research and improving clinical practices. The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) enables large-scale, multi-center studies but faces challenges in consistency, completeness, and transparency during data mapping from the original data sources.

Objective:

This study aimed to evaluate the quality of the mapping process for lung cancer data within the Federated Health Innovation Network (FHIN) project, focusing on consistency, completeness, and challenges encountered throughout the process.

Methods:

Clinical data from Ghent University Hospital were mapped to the OMOP CDM using a reference data dictionary. Consistency was assessed through Cohen’s kappa scores, while completeness was evaluated by comparing patient and record counts pre- and post-mapping. Challenges, including unstructured data and evolving reference standards, were documented and analyzed.
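
As an illustration of the consistency measure named above, Cohen’s kappa can be computed from two aligned lists of categorical labels. The abstract does not state which two label sets were compared, so this minimal sketch assumes two independent mapping passes over the same records. The function name `cohens_kappa` and the labels "A" and "B" are hypothetical placeholders, not OMOP concepts or study data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(labels_a)
    # Observed agreement: share of records where both passes agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of marginal label frequencies, summed over labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both passes used one identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels assigned to four records by two mapping passes.
pass_1 = ["A", "A", "B", "B"]
pass_2 = ["A", "A", "B", "A"]
kappa = cohens_kappa(pass_1, pass_2)
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is often preferred over a simple percentage match when some categories dominate.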

Results:

High consistency was observed for structured variables, while some unstructured variables like “Smoking status” were excluded due to free-text format and a lack of suitable OMOP concepts. Completeness analysis showed minimal data loss for most structured variables but significant challenges for unstructured data. Persistent issues included evolving data dictionary versions and diagnostic code granularity mismatches between institutions, underscoring structural challenges in standardization.
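
A completeness comparison of the kind described, record counts before and after mapping, can be sketched as a per-variable retention ratio. This is an assumed formulation, not the authors’ procedure. The function name `retention_by_variable`, the variable names, and all counts below are made-up illustrations and do not reflect the study’s results.

```python
def retention_by_variable(pre_counts, post_counts):
    """Fraction of source records still present after mapping, per variable.

    Variables missing from post_counts are treated as fully lost (0 records).
    Variables with no source records yield None, since no ratio is defined.
    """
    retained = {}
    for variable, before in pre_counts.items():
        after = post_counts.get(variable, 0)
        retained[variable] = after / before if before else None
    return retained


# Made-up counts for illustration only.
pre = {"diagnosis": 200, "smoking_status": 50}
post = {"diagnosis": 198}  # smoking_status excluded during mapping
ratios = retention_by_variable(pre, post)
```

Reporting the ratio per variable, rather than a single overall figure, keeps an excluded free-text variable visible instead of letting it be averaged away by well-mapped structured fields.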

Conclusions:

The transformation of lung cancer data to the OMOP CDM highlights both technical and systemic challenges, including handling unstructured data and addressing granularity discrepancies. A multidisciplinary approach involving clinical and technical expertise is crucial to ensure reliable, high-quality datasets for multi-center research.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.