JMIR Preprints #38584: Deep Denoising of Raw Biomedical Knowledge Graph from COVID-19 Literature, LitCovid and Pubtator

Deep Denoising of Raw Biomedical Knowledge Graph from COVID-19 Literature, LitCovid and Pubtator

Chao Jiang;
Victoria Ngo;
Richard Chapman;
Yue Yu;
Hongfang Liu;
Guoqian Jiang;
Nansu Zong

ABSTRACT

Most knowledge graphs, including those related to COVID-19, are constructed from co-occurring biomedical entities retrieved from recent literature. However, applications built on these graphs (e.g., association prediction among genes, drugs, and diseases) have a high probability of false-positive predictions, because co-occurrence in the literature does not always indicate a true biomedical association between two entities. Data quality plays an important role in training deep neural network models; however, most current work in this area has focused on improving model performance under the assumption that the preprocessed data are clean. Here, we studied how to remove noise from raw knowledge graphs with limited labeled information. Two novel Generative Adversarial Network models, NetGAN and CELL, were applied to both a synthetic dataset and a real dataset for edge classification (i.e., link prediction), leveraging unlabeled link information. The link prediction performance, especially in the extreme case of a 1:9 ratio of training data to test data, demonstrated that the proposed method still achieved favorable results (AUCROC > 0.8 for the synthetic dataset and 0.7 for the real dataset) despite the limited amount of training data available. Our preliminary findings showed that the proposed method achieved promising results for removing noise during data preprocessing of the biomedical knowledge graph and could potentially improve the performance of downstream applications by providing cleaner data.
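To make the evaluation setup concrete, the following is a minimal Python sketch of edge denoising framed as edge classification, scored with AUC-ROC on a 1:9 train-to-test split. It is an illustration under stated assumptions, not the authors' pipeline: NetGAN and CELL are replaced here by a simple common-neighbor score, the graph is a toy synthetic one (two dense groups plus random cross-group "noise" edges), and all function names and parameters (`make_noisy_graph`, `common_neighbor_scores`, `stratified_split`, `auc_roc`) are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean


def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (true, noisy) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_noisy_graph(n_per_group=15, n_noise=40, seed=0):
    """Toy graph (assumption): two cliques are 'true' edges (label 1),
    random cross-group edges are 'noise' (label 0)."""
    rng = random.Random(seed)
    groups = [range(0, n_per_group), range(n_per_group, 2 * n_per_group)]
    true_edges = [e for g in groups for e in combinations(g, 2)]
    cross_pairs = [(u, v) for u in groups[0] for v in groups[1]]
    noise_edges = rng.sample(cross_pairs, n_noise)
    edges = [(e, 1) for e in true_edges] + [(e, 0) for e in noise_edges]
    rng.shuffle(edges)
    return edges


def common_neighbor_scores(edges):
    """Score each edge by the number of neighbors its endpoints share
    in the observed (noisy) graph. Stand-in for a learned model."""
    neighbors = {}
    for (u, v), _ in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return [len(neighbors[u] & neighbors[v]) for (u, v), _ in edges]


def stratified_split(edges, train_fraction=0.1, seed=0):
    """Split edge indices so each class contributes train_fraction to training."""
    rng = random.Random(seed)
    train, test = [], []
    for label in (0, 1):
        group = [i for i, (_, y) in enumerate(edges) if y == label]
        rng.shuffle(group)
        k = max(1, round(train_fraction * len(group)))
        train += group[:k]
        test += group[k:]
    return train, test


edges = make_noisy_graph()
labels = [y for _, y in edges]
scores = common_neighbor_scores(edges)
train_idx, test_idx = stratified_split(edges, train_fraction=0.1)  # 1:9 split

# Use the small labeled training set only to pick a keep/drop threshold.
pos_mean = mean(scores[i] for i in train_idx if labels[i] == 1)
neg_mean = mean(scores[i] for i in train_idx if labels[i] == 0)
threshold = (pos_mean + neg_mean) / 2

# Evaluate ranking quality on the held-out 90% of edges.
test_auc = auc_roc([labels[i] for i in test_idx], [scores[i] for i in test_idx])
kept = [edges[i][0] for i in test_idx if scores[i] > threshold]
print(f"train={len(train_idx)} test={len(test_idx)} kept={len(kept)} AUC={test_auc:.3f}")
```

The toy graph is far easier to denoise than a literature-derived knowledge graph, so the score it produces says nothing about the results reported above; the sketch only shows how a small labeled set and a held-out AUC-ROC fit together.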

Citation

Please cite as:

Jiang C, Ngo V, Chapman R, Yu Y, Liu H, Jiang G, Zong N

Deep Denoising of Raw Biomedical Knowledge Graph From COVID-19 Literature, LitCovid, and Pubtator: Framework Development and Validation

J Med Internet Res 2022;24(7):e38584

DOI: 10.2196/38584

PMID: 35658098

PMCID: 9301549

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Apr 7, 2022

Open Peer Review Period: Apr 7, 2022 - Jun 2, 2022

Date Accepted: May 30, 2022

Date Submitted to PubMed: Jun 3, 2022
