Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Feb 24, 2020
Date Accepted: Aug 19, 2020
Phenotypically Similar Rare Disease Identification from an Integrative Knowledge Graph for Data Harmonization
ABSTRACT
Background:
Rare diseases are often hard to diagnose precisely because many primary health care providers have had limited exposure to them. This can lead to missed, delayed, or inaccurate diagnoses even when an approved, effective therapy is available. Although many efforts have been made to develop comprehensive disease resources that capture rare disease information for clinical decision making and education, there is no single, standardized method to define and harmonize rare diseases across multiple resources. This introduces redundancy and inconsistency that may ultimately make these resources more confusing and harder to use widely. To overcome this obstacle and reduce the need for human curation and maintenance, we report our initial work to identify related diseases present in the Genetic and Rare Diseases (GARD) database to support further data harmonization.
Objective:
We aimed to systematically determine disease relevance among rare diseases in the GARD database and to establish systematic rules for data harmonization. Ultimately, the results of this study can serve as a potential rare disease resource for clinical decision support.
Methods:
We computed disease similarity among the GARD diseases based on their mappings to several well-known rare disease resources, and applied human adjudication to further evaluate and categorize the relevant disease pairs into predefined disease relevance groups. In addition, we adopted the disease relevance present among siblings in disease classification trees, and prioritized relevant diseases based on the number of shared phenotypes.
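As a rough illustration of mapping-based similarity, the sketch below scores disease pairs by the overlap of their cross-resource mappings. The abstract does not name the similarity measure, so the Jaccard index used here is an assumption, as are the function names and the placeholder identifiers.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard index of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(mappings: dict, threshold: float = 0.5) -> list:
    """Return (disease_a, disease_b, score) for pairs scoring above threshold.

    `mappings` maps a disease ID to the set of external IDs it is mapped to.
    The Jaccard measure is an assumed stand-in, not the measure from the paper.
    """
    results = []
    for d1, d2 in combinations(sorted(mappings), 2):
        score = jaccard(mappings[d1], mappings[d2])
        if score > threshold:
            results.append((d1, d2, score))
    return results


# Hypothetical identifiers, for illustration only.
example_mappings = {
    "GARD:A": {"ORPHA:1", "OMIM:10"},
    "GARD:B": {"ORPHA:1", "OMIM:10", "MONDO:101"},
    "GARD:C": {"ORPHA:2"},
}
pairs = similar_pairs(example_mappings)
```

Here only the pair GARD:A and GARD:B passes the 0.5 cutoff, since the two diseases share two of three distinct mappings. Pairs selected this way would still go to human review for assignment to a relevance group.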
Results:
Using the GARD disease mappings to several well-known rare disease resources, we computed disease similarity: about 86% (339) of disease pairs were identified as relevant, of which 68% of disease pairs (268) had similarity scores greater than 0.5. In addition, scanning the disease classification trees from MONDO and Orphanet identified a total of 102,034 disease pairs with one or more shared clinical phenotypes as relevant. Manual evaluation showed an accuracy of 88% in prioritizing relevant diseases by clinical phenotypes.
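The sibling-based approach can be sketched as follows: enumerate disease pairs that share a parent in a classification tree, keep those with one or more shared phenotypes, and rank them by the shared count. This is a minimal sketch under assumed data structures (a parent-to-children dictionary and a disease-to-phenotypes dictionary); all names and identifiers are hypothetical, and it is not the authors' implementation.

```python
from itertools import combinations


def rank_sibling_pairs(children_by_parent: dict, phenotypes_by_disease: dict) -> list:
    """Rank sibling disease pairs by number of shared phenotypes, descending.

    Pairs that are siblings under several parents (for example, in two
    different classification trees) are counted once.
    """
    shared_counts = {}
    for siblings in children_by_parent.values():
        for d1, d2 in combinations(sorted(siblings), 2):
            shared = (phenotypes_by_disease.get(d1, set())
                      & phenotypes_by_disease.get(d2, set()))
            if shared:  # keep only pairs with one or more shared phenotypes
                shared_counts[(d1, d2)] = len(shared)
    # Highest shared count first; ties broken by pair name for stable output.
    return sorted(shared_counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical tree fragments and phenotype annotations.
tree = {"P1": ["D1", "D2", "D3"], "P2": ["D2", "D3"]}
phenotypes = {
    "D1": {"HP:a", "HP:b"},
    "D2": {"HP:a", "HP:b", "HP:c"},
    "D3": {"HP:c"},
}
ranked = rank_sibling_pairs(tree, phenotypes)
```

In this toy input, D1 and D2 rank first with two shared phenotypes, D2 and D3 follow with one, and D1 and D3 are dropped because they share none.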
Conclusions:
We successfully identified relevant rare diseases from the GARD database via two different approaches: disease similarity comparison and adoption of disease relevance from disease siblings. The results will not only direct GARD data harmonization for use in expanding translational science research, but also advance data transparency and consistency across different disease resources and terminologies, moving toward the most robust and up-to-date knowledge on rare diseases.
Citation
Per the author's request the PDF is not available.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.