Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: May 7, 2020
Date Accepted: Aug 17, 2020
Integrating genomics and clinical data for statistical analysis using GEMINI and FHIR: System Design and Implementation
ABSTRACT
Background:
The introduction of next-generation sequencing (NGS) into molecular cancer diagnostics has led to an increase in the data available for the identification and evaluation of driver mutations and for defining personalized cancer treatment regimens. The meaningful combination of omics data, i.e., pathogenic gene variants and alterations, with other patient data to understand the full picture of malignancy has been challenging.
Objective:
This study describes the implementation of a system capable of processing, analyzing, and subsequently combining NGS data with other non-high-throughput patient data for analysis within and across institutions.
Methods:
On the basis of existing NGS analysis workflows for the identification of malignant gene variants at the Institute of Pathology of the University Hospital Erlangen, we defined basic requirements for an NGS processing and analysis pipeline and implemented a pipeline based on the GEMINI open source genetic variation database. For validation, this pipeline was applied to data from the 1000 Genomes Project and subsequently to NGS data derived from 206 patients of the local hospital. We further integrated the pipeline into the existing structures of the data integration center at the University Hospital Erlangen and combined NGS data with local non-genomic patient-derived data available in Fast Healthcare Interoperability Resources (FHIR) format.
Results:
Using data from the 1000 Genomes Project and from the patient cohort as input, the implemented system produced the same results as already established methodologies. Further, it satisfied all identified requirements and was successfully integrated into the existing infrastructure. Finally, we showed in an exemplary analysis how the data could be quickly loaded into and analyzed in KETOS, a web-based analysis platform for statistical analysis and clinical decision support.
Conclusions:
This study demonstrates that the GEMINI open source database can be augmented to create an NGS analysis pipeline. This pipeline generates high-quality results consistent with already established workflows for gene variant annotation and pathological evaluation. We further demonstrate how NGS-derived genomic and non-genomic data can be combined for further analysis, enabling data integration using standardized vocabularies and methods. Finally, we demonstrate the feasibility of integrating the pipeline into the data integration center infrastructure currently being established across Germany.
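The combination of genomic and non-genomic data summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the in-memory SQLite table is a simplified, hypothetical stand-in for a GEMINI-style variant store (the column names `sample_id`, `gene`, and `impact_severity` are assumptions made for this example), the helper names `build_variant_db` and `combine` are invented here, the patient records are made-up FHIR Patient resources, and the sample identifier is assumed to equal the FHIR patient `id`.

```python
import sqlite3


def build_variant_db(rows):
    """Create a hypothetical, simplified variant table in memory.

    A real GEMINI database is a SQLite file with a much richer schema;
    this table only mimics the idea for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE variants (sample_id TEXT, gene TEXT, impact_severity TEXT)"
    )
    conn.executemany("INSERT INTO variants VALUES (?, ?, ?)", rows)
    return conn


def combine(conn, patients):
    """Join high-severity variants to FHIR Patient resources by identifier."""
    by_id = {p["id"]: p for p in patients if p.get("resourceType") == "Patient"}
    cursor = conn.execute(
        "SELECT sample_id, gene FROM variants "
        "WHERE impact_severity = 'HIGH' ORDER BY sample_id, gene"
    )
    combined = []
    for sample_id, gene in cursor:
        patient = by_id.get(sample_id)
        if patient is None:
            # No clinical record for this sample: skip rather than guess.
            continue
        combined.append(
            {
                "patient_id": sample_id,
                "gender": patient.get("gender"),
                "birthDate": patient.get("birthDate"),
                "gene": gene,
            }
        )
    return combined


# Made-up example data, for illustration only.
patients = [
    {"resourceType": "Patient", "id": "p1", "gender": "female", "birthDate": "1961-04-02"},
    {"resourceType": "Patient", "id": "p2", "gender": "male", "birthDate": "1975-11-30"},
]
conn = build_variant_db(
    [
        ("p1", "KRAS", "HIGH"),
        ("p1", "TP53", "LOW"),
        ("p2", "BRAF", "HIGH"),
        ("p3", "EGFR", "HIGH"),
    ]
)
records = combine(conn, patients)
```

Each entry in `records` is a flat row pairing one variant with demographic fields, a shape that a statistical analysis platform could load as a table. Samples without a matching Patient resource are dropped in this sketch; a production pipeline would more likely log them for review.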
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.