Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Aug 8, 2021
Open Peer Review Period: Aug 8, 2021 - Aug 27, 2021
Date Accepted: Sep 18, 2021

The final, peer-reviewed published version of this preprint can be found here:

Adverse Drug Event Prediction Using Noisy Literature-Derived Knowledge Graphs: Algorithm Development and Validation

Dasgupta S, Jayagopal A, Jun Hong AL, Mariappan R, Rajan V

JMIR Med Inform 2021;9(10):e32730

DOI: 10.2196/32730

PMID: 34694230

PMCID: 8576589

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Adverse Drug Event Prediction using Noisy Literature-Derived Knowledge Graphs: Algorithm Development and Evaluation

  • Soham Dasgupta
  • Aishwarya Jayagopal
  • Abel Lim Jun Hong
  • Ragunathan Mariappan
  • Vaibhav Rajan

ABSTRACT

Background:

Adverse drug events (ADEs) are unintended side effects of drugs that cause substantial clinical and economic burden globally. Not all ADEs are discovered during clinical trials, so postmarketing surveillance, called pharmacovigilance, is routinely conducted to find unknown ADEs. A wealth of information that facilitates ADE discovery lies in the enormous and continuously growing body of biomedical literature. Knowledge graphs (KGs) encode information from the literature, where vertices and edges represent clinical concepts and their relations, respectively. The scale and unstructured form of the literature necessitate the use of natural language processing (NLP) to automatically create such KGs. Previous studies have demonstrated the utility of such literature-derived KGs in ADE prediction. By learning representations (features) of clinical concepts from the KG in an unsupervised manner and using them in machine learning models, these studies obtained state-of-the-art results for ADE prediction on benchmark datasets.

Objective:

In literature-derived KGs, there is "noise" in the form of false-positive (erroneous) and false-negative (absent) nodes and edges, due to limitations of the NLP techniques used to infer the KGs. Previous representation learning methods do not account for such inaccuracies in the graph. NLP algorithms can quantify the confidence in their inference of extracted concepts and relations from the literature. The hypothesis motivating this work is that utilizing such confidence scores during representation learning yields embeddings that provide better features for ADE prediction models.

Methods:

We develop methods to utilize these confidence scores in two well-known representation learning methods, Deepwalk and TransE, to obtain their "weighted" versions: Weighted Deepwalk and Weighted TransE. These methods are used to learn representations from a large literature-derived KG, SemMedDB, containing more than 93 million clinical relations. They are compared with Embedding of Semantic Predications (ESP), which, to our knowledge, is the best reported representation learning method on SemMedDB, with state-of-the-art results for ADE prediction. Representations learnt by the different methods are used (separately) as features of drugs and diseases to build classification models for ADE prediction using benchmark datasets. The classification performance of all the methods is compared rigorously over multiple cross-validation settings.
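The abstract does not spell out how the confidence scores enter each method, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that a weighted random walk picks the next node with probability proportional to edge confidence, and that a weighted TransE objective scales the usual margin loss by the confidence of the positive triple. The toy triples, their scores, and the function names `build_adjacency`, `weighted_walk`, and `weighted_transe_loss` are all hypothetical.

```python
import random

# Hypothetical toy triples: (subject, predicate, object, confidence in [0, 1]).
# Concept names and scores are invented for illustration only.
TRIPLES = [
    ("drug_a", "CAUSES", "event_x", 0.9),
    ("drug_a", "TREATS", "disease_y", 0.6),
    ("drug_b", "CAUSES", "event_x", 0.2),
    ("disease_y", "ASSOCIATED_WITH", "event_x", 0.5),
]


def build_adjacency(triples):
    """Map each node to a list of (neighbor, confidence); edges are treated as undirected."""
    adjacency = {}
    for subj, _pred, obj, conf in triples:
        adjacency.setdefault(subj, []).append((obj, conf))
        adjacency.setdefault(obj, []).append((subj, conf))
    return adjacency


def weighted_walk(adjacency, start, length, rng):
    """Random walk whose next step is drawn with probability proportional to edge confidence."""
    walk = [start]
    while len(walk) < length:
        neighbors = adjacency.get(walk[-1], [])
        if not neighbors:
            break  # dead end: stop the walk early
        nodes, weights = zip(*neighbors)
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk


def weighted_transe_loss(h, r, t, t_neg, confidence, margin=1.0):
    """Assumed weighting: confidence * max(0, margin + d(h + r, t) - d(h + r, t_neg)), with L1 distance d."""
    def dist(tail):
        return sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, tail))
    return confidence * max(0.0, margin + dist(t) - dist(t_neg))


adjacency = build_adjacency(TRIPLES)
walk = weighted_walk(adjacency, "drug_a", length=5, rng=random.Random(0))
print(walk)
```

In a Deepwalk-style pipeline, many such walks would be passed as "sentences" to a skip-gram model to obtain node embeddings. Under this assumed weighting, low-confidence edges are visited less often in the walks and contribute less to the TransE loss, rather than being removed from the graph outright.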

Results:

The "weighted" versions we design learn representations that yield more accurate predictive models than both the corresponding unweighted versions of Deepwalk and TransE and ESP in our experiments. Performance improvements are up to 5.75% in F1 score and 8.4% in AUC, thus advancing the state of the art in ADE prediction from literature-derived KGs. The implementation of our new methods and all experiments is available at https://bitbucket.org/cdal/kb_embeddings.
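For readers unfamiliar with the two reported metrics, the sketch below shows how F1 score and AUC can be computed for a binary ADE classifier. It assumes that AUC refers to the area under the ROC curve, which the abstract does not state explicitly. The labels, scores, and the 0.5 decision threshold are invented, and the helper names `f1_score` and `roc_auc` are hypothetical.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = ADE, 0 = no ADE)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented labels and classifier scores for five drug-event pairs.
y_true = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(round(f1, 3), round(auc, 3))
```

F1 depends on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason the two are often reported together.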

Conclusions:

Our classification models can be used to aid pharmacovigilance teams in detecting potentially new ADEs. Our experiments demonstrate the importance of modelling inaccuracies in the inferred KGs for representation learning, which may also be useful in other predictive models that utilize literature-derived KGs.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.