Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Feb 26, 2021
Date Accepted: May 19, 2021
Matching Biomedical Ontologies: Clues, Approach, and Scalability
ABSTRACT
Background:
Ontology matching seeks to find semantic correspondences between ontologies. As more and more biomedical ontologies are developed independently with overlapping content, matching these ontologies has become a critical task in many biomedical applications. However, several challenges remain. First, constructing matching clues from biomedical ontology information is a nontrivial problem. Second, it is unknown whether dominant matchers exist when matching biomedical ontologies. Finally, ontology matching suffers from high computational complexity owing to the large scale of biomedical ontologies.
Objective:
Interoperability between biomedical ontologies is critically important. However, owing to the natural heterogeneity and large scale of biomedical ontologies, it remains very difficult to find alignments between them efficiently. This paper aims to explore matching clues and to study empirically how various strategies for combining clues influence biomedical ontology alignments. In addition, extended reduction anchors are introduced to decrease the time complexity of matching large biomedical ontologies.
Methods:
We first construct atomic and composite matching clues from four dimensions: terminology, structure, external knowledge, and representation learning. We then present a spectrum of matchers based on these clues and comprehensively investigate their effectiveness. We also carry out a systematic comparative evaluation of different combinations of matchers. Finally, extended reduction anchors are proposed to reduce the time complexity of matching large-scale biomedical ontologies.
Results:
The experimental results show that considering distinguishable matching clues in biomedical ontologies leads to a substantial improvement in F-measure over using all available information. Incorporating different types of matchers with reliability also leads to a marked improvement that is comparable to state-of-the-art methods: the dominant matchers achieve F1 scores of 0.9271 for Anatomy, 0.8218 for FMA-NCI, and 0.50 for FMA-SNOMED. Extended reduction anchors resolve the scalability problem of matching large biomedical ontologies, achieving a significant reduction in time complexity with little loss in F1: a 0.21% decrease for Anatomy and a 0.84% decrease for FMA-NCI, with a 2.65% increase for FMA-SNOMED.
Conclusions:
We have systematically investigated and compared the effectiveness of different matching clues, matchers, and combination strategies. Our empirical study demonstrates that using distinguishing clues performs better than using all the clues available in ontologies when matching biomedical ontologies. Compared with matchers that use a single clue, matchers that combine multiple clues perform more stably and accurately. In addition, our results provide evidence that the approach based on extended reduction anchors performs well on large ontology matching tasks, offering an effective solution to the scalability problem.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.