Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Mar 7, 2022
Date Accepted: Apr 26, 2022
Date Submitted to PubMed: Apr 27, 2022
Construction of A Linked Dataset of COVID-19 Knowledge Graphs: Development and Applications
ABSTRACT
Background:
With the continuous spread of COVID-19, information about the worldwide pandemic is growing explosively, and organizing such a large volume of information is both necessary and important. As a key branch of artificial intelligence, the knowledge graph (KG) helps to structure, reason over, and understand data.
Objective:
To increase the value of this information and to help researchers combat COVID-19, we have constructed and successively released OpenKG-COVID19, a unified linked dataset and one of the largest knowledge graphs about COVID-19. OpenKG-COVID19 comprises ten interlinked COVID-19 sub-graphs covering encyclopedia, concept, medical, research, event, health, epidemiology, goods, prevention, and character.
Methods:
In this paper, we introduce the key techniques used to build the COVID-19 KGs in a top-down way. First, we describe the schema modelling process of each KG in OpenKG-COVID19. Second, we propose different methods for extracting knowledge from open government sites, professional texts, public domain-specific sources, and public encyclopedia sites. The ten curated COVID-19 KGs are then linked together at both the schema level and the data level. In addition, we present the naming convention for OpenKG-COVID19.
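As a rough illustration of what data-level linking between two sub-graphs can look like, the sketch below connects entities whose normalized labels match and records each match as an owl:sameAs triple. The abstract does not state which linking technique the authors use, so exact label matching is an assumption made only for this example. The prefixes, identifiers, and toy triples are likewise hypothetical.

```python
# Toy sub-graphs: each triple is a (subject, predicate, object) tuple.
# All identifiers and labels here are hypothetical, not taken from OpenKG-COVID19.
medical_kg = {
    ("med:e1", "rdfs:label", "COVID-19"),
    ("med:e2", "rdfs:label", "Fever"),
}
encyclopedia_kg = {
    ("enc:e9", "rdfs:label", "covid-19 "),
    ("enc:e7", "rdfs:label", "Wuhan"),
}


def labels(graph):
    """Map each normalized label to the set of entities carrying it."""
    index = {}
    for s, p, o in graph:
        if p == "rdfs:label":
            index.setdefault(o.strip().lower(), set()).add(s)
    return index


def link_by_label(graph_a, graph_b):
    """Return owl:sameAs triples for entities whose normalized labels match.

    Exact matching on normalized labels is a simplification assumed for
    illustration; it is not presented as the authors' linking method.
    """
    index_a, index_b = labels(graph_a), labels(graph_b)
    links = set()
    for label in index_a.keys() & index_b.keys():
        for a in index_a[label]:
            for b in index_b[label]:
                links.add((a, "owl:sameAs", b))
    return links


links = link_by_label(medical_kg, encyclopedia_kg)
# The linked dataset is the union of both sub-graphs plus the link triples.
linked_dataset = medical_kg | encyclopedia_kg | links
```

Keeping the links as ordinary triples means the merged dataset stays a plain set of facts, so each sub-graph can still be downloaded and used on its own.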
Results:
OpenKG-COVID19 contains more than 2,572 concepts, 329,600 entities, 513 properties, and 2,687,329 facts, and the dataset will be updated continuously. Each COVID-19 KG has been evaluated, and the average precision is more than 93%. The OpenKG-COVID19 dataset is available at the OpenKG website, where individual sub-graphs can be downloaded. We have developed search and browse interfaces and a SPARQL endpoint to make access more convenient. Possible intelligent applications that could be built on OpenKG-COVID19 are also listed in this paper.
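To give a sense of how a SPARQL endpoint is typically queried, the sketch below builds a GET request URL following the SPARQL 1.1 Protocol, which passes the query text in a `query` parameter. The endpoint address is a placeholder because the abstract does not give the real one. The use of rdfs:label to name entities is also an assumption about the dataset's vocabulary.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the real OpenKG-COVID19 address is not given in the abstract.
ENDPOINT = "https://example.org/openkg-covid19/sparql"


def build_query_url(endpoint, label, limit=10):
    """Build a SPARQL Protocol GET URL that lists facts about a labelled entity.

    Assumes entities are named with rdfs:label (not confirmed by the source).
    The label is interpolated naively, so it must not contain quote characters.
    """
    query = (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?entity ?p ?o WHERE {\n"
        f'  ?entity rdfs:label "{label}" .\n'
        "  ?entity ?p ?o .\n"
        f"}} LIMIT {int(limit)}"
    )
    # urlencode percent-encodes the query text for use in the URL.
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, "COVID-19", limit=5)
```

The URL is only constructed here, not sent; a client would issue the request with an `Accept` header such as `application/sparql-results+json` to get machine-readable results.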
Conclusions:
Knowledge graphs are known to be useful for intelligent question answering, semantic search, recommendation systems, visualization analysis, and decision-making support. Research on COVID-19, biomedicine, and many other fields can benefit from OpenKG-COVID19. Furthermore, the ten KGs will be continuously updated so that the public can access sufficient and up-to-date knowledge.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.