JMIR Preprints #26892: Constructing high-fidelity phenotype knowledge graphs with a fine-grained semantic information model

Constructing high-fidelity phenotype knowledge graphs with a fine-grained semantic information model

Lizong Deng;
Luming Chen;
Tao Yang;
Mi Liu;
Shicheng Li;
Taijiao Jiang

ABSTRACT

Background:

Phenotypes are the clinical manifestations of diseases and provide important information for diagnosis. Therefore, constructing phenotype knowledge graphs of diseases is valuable to the development of artificial intelligence in medicine. However, the phenotype knowledge graphs in current knowledge bases such as WikiData and DBpedia are coarse-grained, because they consider only the core concepts of phenotypes and neglect the details (attributes) associated with them.

Objective:

To characterize details of disease phenotypes in clinical guidelines, we proposed a fine-grained semantic information model named PhenoSSU (Semantic Structured Unit of Phenotypes).

Methods:

PhenoSSU is an "entity-attribute-value" model by its very nature; it aims to capture the full semantics underlying phenotype descriptions with a series of attributes and values. A total of 193 clinical guidelines for infectious diseases from Wikipedia were selected as the study corpus, and 12 attributes from SNOMED-CT were introduced into the PhenoSSU model based on the co-occurrences of phenotype concepts and attribute values. The expressive power of the PhenoSSU model was evaluated by analyzing whether a PhenoSSU instance could capture the full semantics underlying the corresponding phenotype description. To automatically construct fine-grained phenotype knowledge graphs, a hybrid strategy was developed that first recognized phenotype concepts with the MetaMap tool and then predicted the attribute values of phenotypes with machine learning classifiers.
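
As a rough illustration of the "entity-attribute-value" idea, the sketch below stores one phenotype concept together with its attribute-value pairs and flattens them into triples. It is not the authors' implementation: the class layout, the attribute names (`severity`, `temporal_pattern`), and the example description are hypothetical, since the abstract does not list the 12 SNOMED-CT attributes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PhenoSSU:
    """One phenotype concept plus its attribute-value pairs (illustrative only)."""
    concept: str
    attributes: Dict[str, str] = field(default_factory=dict)

    def to_triples(self) -> List[Tuple[str, str, str]]:
        """Flatten the unit into (entity, attribute, value) triples, sorted by attribute."""
        return [(self.concept, name, value)
                for name, value in sorted(self.attributes.items())]


# Hypothetical description: "severe headache in the morning".
# The attribute names below are assumptions, not the model's actual attribute set.
unit = PhenoSSU("headache", {"severity": "severe", "temporal_pattern": "morning"})
triples = unit.to_triples()
for triple in triples:
    print(triple)
```

A coarse-grained graph would keep only the concept (`headache`); the attribute-value pairs are what a fine-grained representation adds.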

Results:

Fine-grained phenotype knowledge graphs of 193 infectious diseases were manually constructed with the BRAT annotation tool. It was found that the PhenoSSU model could precisely represent 89.5% (3757/4020) of phenotype descriptions in clinical guidelines. By comparison, other information models such as the Clinical Element Model and the HL7 FHIR model could only capture full semantics underlying 48.4% and 21.8% of phenotype descriptions, respectively. The hybrid strategy achieved an F1-score of 0.732 for the subtask of phenotype concept recognition and an average weighted accuracy of 0.776 for the subtask of attribute value prediction.
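
For readers unfamiliar with the two metrics, the following sketch shows how such figures are commonly computed. The abstract does not state how the "average weighted accuracy" is weighted; weighting each attribute's accuracy by its number of instances is an assumption here, and the input numbers are made up for illustration rather than taken from the study.

```python
from typing import List, Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def weighted_accuracy(per_attribute: List[Tuple[float, int]]) -> float:
    """Average of per-attribute accuracies, weighted by instance count (assumed scheme)."""
    total = sum(count for _, count in per_attribute)
    if total == 0:
        raise ValueError("no instances to average over")
    return sum(acc * count for acc, count in per_attribute) / total


# Made-up inputs, for illustration only.
example_f1 = f1_score(precision=0.5, recall=1.0)
example_acc = weighted_accuracy([(0.9, 100), (0.6, 300)])
print(round(example_f1, 3), round(example_acc, 3))
```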

Conclusions:

PhenoSSU is an effective information model for the precise representation of phenotype knowledge in clinical guidelines, and machine learning can be used to improve the efficiency of constructing PhenoSSU-based knowledge graphs. Our work will potentially benefit knowledge-based systems for diagnosis.

Citation

Please cite as:

Deng L, Chen L, Yang T, Liu M, Li S, Jiang T

Constructing High-Fidelity Phenotype Knowledge Graphs for Infectious Diseases With a Fine-Grained Semantic Information Model: Development and Usability Study

J Med Internet Res 2021;23(6):e26892

DOI: 10.2196/26892

PMID: 34128811

PMCID: 8277235

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Jan 2, 2021

Date Accepted: May 6, 2021
