Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Dec 31, 2018
Date Accepted: Feb 25, 2019
Genomic Common Data Model for Seamless Interoperation of Biomedical Data in Clinical Practice: Retrospective Study
ABSTRACT
Background:
Clinical sequencing data should be shared to achieve the scale and diversity required to provide strong evidence for improving patient care. A distributed research network (DRN) allows researchers to share this evidence, rather than patient-level data, across centers, thereby avoiding privacy issues. The Observational Medical Outcomes Partnership (OMOP) common data model (CDM), currently used in DRNs, has low coverage of sequencing data and does not reflect the latest trends in precision medicine.
Objective:
The aim of this study was to develop and evaluate the feasibility of a genomic CDM (G-CDM), as an extension of the OMOP-CDM, for application of genomic data in clinical practice.
Methods:
Existing genomic data models and sequencing reports were reviewed to extend the OMOP-CDM to cover genomic data. Human Genome Organisation (HUGO) Gene Nomenclature Committee (HGNC) and Human Genome Variation Society (HGVS) nomenclature was adopted to standardize the terminology in the model. Sequencing data of 114 and 1060 patients with lung cancer were obtained from the Ajou University School of Medicine (AUSOM) database of Ajou University Hospital and The Cancer Genome Atlas (TCGA), respectively, and were transformed to a format appropriate for the G-CDM. The two datasets were then compared with respect to gene name, variant type, and actionable mutations.
Results:
The G-CDM was extended into four tables linked to tables of the OMOP-CDM. Upon comparison with TCGA data, a clinically actionable mutation, ‘p.Leu858Arg’, in the EGFR gene was 6.64 times more frequent in the AUSOM database, while the ‘p.Gly12Xaa’ mutation in the KRAS gene was 2.02 times more frequent in the TCGA dataset. GeneProfiler, a data-exploration tool, was further developed to conduct descriptive analyses automatically using the G-CDM; it reports the proportions of genes, variant types, and actionable mutations. GeneProfiler also allows users to query a specific gene name and HGVS nomenclature to calculate the proportion of patients with a given mutation.
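The proportion-of-patients query and the between-dataset comparison described above can be illustrated with a minimal sketch. This is not GeneProfiler's implementation and does not use the G-CDM schema: the record fields (`person_id`, `gene`, `hgvs_p`), the function names, and the toy records are all hypothetical, chosen only to show the arithmetic.

```python
def mutation_proportion(records, n_patients, gene, hgvs_p):
    """Fraction of patients carrying a given gene/protein-level change.

    records    : iterable of dicts with hypothetical keys
                 'person_id', 'gene', 'hgvs_p' (one row per variant call)
    n_patients : total number of sequenced patients in the dataset,
                 passed separately because patients without any
                 reported variant have no rows in `records`
    """
    if n_patients <= 0:
        raise ValueError("n_patients must be positive")
    # A set counts each patient once, even if the same call is duplicated.
    carriers = {
        r["person_id"]
        for r in records
        if r["gene"] == gene and r["hgvs_p"] == hgvs_p
    }
    return len(carriers) / n_patients


def fold_difference(prop_a, prop_b):
    """How many times more frequent a mutation is in dataset A than in B."""
    if prop_b == 0:
        raise ZeroDivisionError("mutation absent from dataset B")
    return prop_a / prop_b


# Toy, made-up records for illustration only (not AUSOM or TCGA data).
records_a = [
    {"person_id": 1, "gene": "EGFR", "hgvs_p": "p.Leu858Arg"},
    {"person_id": 1, "gene": "EGFR", "hgvs_p": "p.Leu858Arg"},  # duplicate call
    {"person_id": 2, "gene": "EGFR", "hgvs_p": "p.Leu858Arg"},
    {"person_id": 3, "gene": "KRAS", "hgvs_p": "p.Gly12Cys"},
]
records_b = [
    {"person_id": 10, "gene": "EGFR", "hgvs_p": "p.Leu858Arg"},
]

prop_a = mutation_proportion(records_a, 4, "EGFR", "p.Leu858Arg")
prop_b = mutation_proportion(records_b, 8, "EGFR", "p.Leu858Arg")
ratio = fold_difference(prop_a, prop_b)
```

Passing the cohort size explicitly keeps the denominator correct when some patients have no variant rows, which matters when comparing cohorts of very different sizes.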
Conclusions:
We developed the G-CDM for effective integration of genomic data with standardized clinical data allowing for data sharing across institutes. The feasibility of the G-CDM was validated by assessing the differences in data characteristics between two different genomic databases through the proposed data-exploring tool, GeneProfiler. The G-CDM may facilitate analyses of interoperating clinical and genomic datasets across multiple institutions, minimizing privacy issues and thereby enabling researchers to better understand the characteristics of patients and promote personalized medicine in clinical practice.