Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Apr 7, 2022
Open Peer Review Period: Apr 7, 2022 - Jun 2, 2022
Date Accepted: May 30, 2022
Date Submitted to PubMed: Jun 3, 2022

The final, peer-reviewed published version of this preprint can be found here:

Deep Denoising of Raw Biomedical Knowledge Graph From COVID-19 Literature, LitCovid, and Pubtator: Framework Development and Validation

Jiang C, Ngo V, Chapman R, Yu Y, Liu H, Jiang G, Zong N

J Med Internet Res 2022;24(7):e38584

DOI: 10.2196/38584

PMID: 35658098

PMCID: 9301549

Deep Denoising of Raw Biomedical Knowledge Graph from COVID-19 Literature, LitCovid and Pubtator

  • Chao Jiang
  • Victoria Ngo
  • Richard Chapman
  • Yue Yu
  • Hongfang Liu
  • Guoqian Jiang
  • Nansu Zong

ABSTRACT

Most knowledge graphs, including COVID-19-related ones, are constructed from co-occurring biomedical entities retrieved from recent literature. However, applications built on these graphs (e.g., predicting associations among genes, drugs, and diseases) have a high probability of false-positive predictions, because co-occurrence in the literature does not always indicate a true biomedical association between two entities. Data quality plays an important role in training deep neural network models, yet most current work in this area focuses on improving model performance under the assumption that the preprocessed data are clean. Here, we studied how to remove noise from raw knowledge graphs with limited labeled information. Two novel generative adversarial network models, NetGAN and CELL, were applied to both a synthetic dataset and a real dataset for edge classification (i.e., link prediction), leveraging unlabeled link information. Link prediction performance, especially in the extreme case of a 1:9 ratio of training data to test data, demonstrated that the proposed method still achieved favorable results (AUROC > 0.8 for the synthetic dataset and > 0.7 for the real dataset) despite the limited amount of training data available. Our preliminary findings show that the proposed method achieves promising results for removing noise during preprocessing of biomedical knowledge graphs and could potentially improve the performance of downstream applications by providing cleaner data.
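To make the evaluation setup in the abstract concrete, the following is a minimal sketch of edge classification on a noisy graph with a 1:9 split of labeled to held-out edges, scored by the area under the ROC curve. It is not the authors' method: a simple common-neighbors heuristic stands in for NetGAN and CELL, and the toy graph, the function names (`make_noisy_graph`, `common_neighbor_scores`, `auroc`, `run_demo`), and all parameter values are assumptions chosen for illustration only.

```python
import random
from itertools import combinations


def make_noisy_graph(n_per_block=30, p_in=0.5, n_noise=60, seed=0):
    """Hypothetical toy graph: two dense communities (label 1) plus random
    cross-community edges standing in for spurious co-occurrences (label 0)."""
    rng = random.Random(seed)
    left = range(n_per_block)
    right = range(n_per_block, 2 * n_per_block)
    labels = {}
    for block in (left, right):
        for u, v in combinations(block, 2):
            if rng.random() < p_in:
                labels[(u, v)] = 1
    noise = set()
    while len(noise) < n_noise:
        noise.add((rng.choice(left), rng.choice(right)))
    for edge in sorted(noise):
        labels[edge] = 0
    return labels


def common_neighbor_scores(edges):
    """Score each edge by the number of neighbors its endpoints share.
    Uses every observed edge, labeled or not (a stand-in heuristic)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return {(u, v): len(adj[u] & adj[v]) for u, v in edges}


def auroc(y_true, scores):
    """Area under the ROC curve via pairwise comparison; ties count as 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def run_demo(seed=1):
    labels = make_noisy_graph()
    scores = common_neighbor_scores(labels)

    # 1:9 split: only 10% of edges keep their labels for model selection.
    edges = sorted(labels)
    random.Random(seed).shuffle(edges)
    cut = len(edges) // 10
    train, test = edges[:cut], edges[cut:]

    def train_accuracy(t):
        return sum((scores[e] >= t) == (labels[e] == 1) for e in train) / len(train)

    # Choose the decision threshold using the small labeled set only.
    threshold = max(sorted({scores[e] for e in train}), key=train_accuracy)

    test_acc = sum((scores[e] >= threshold) == (labels[e] == 1) for e in test) / len(test)
    test_auc = auroc([labels[e] for e in test], [scores[e] for e in test])
    return test_auc, test_acc


test_auc, test_acc = run_demo()
print(f"test AUROC = {test_auc:.3f}, test accuracy = {test_acc:.3f}")
```

The scoring step sees all edges but no labels, which loosely mirrors the idea of leveraging unlabeled link information; the labeled 10% is used only to pick a threshold, and the ROC area is computed on the remaining 90%.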


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.