Accepted for/Published in: JMIR Formative Research

Date Submitted: Jul 18, 2023
Date Accepted: Oct 27, 2023

The final, peer-reviewed published version of this preprint can be found here:

Potential Schizophrenia Disease-Related Genes Prediction Using Metagraph Representations Based on a Protein-Protein Interaction Keyword Network: Framework Development and Validation

Yu S, Wang Z, Nan J, Li A, Yang X, Tang X

JMIR Form Res 2023;7:e50998

DOI: 10.2196/50998

PMID: 37966892

PMCID: 10687686

Potential Schizophrenia Disease-related Genes Prediction Using Metagraph Representations Based on PPI-Keyword Network: Framework Development and Validation

  • Shirui Yu
  • Ziyang Wang
  • Jiale Nan
  • Aihua Li
  • Xuemei Yang
  • Xiaoli Tang

ABSTRACT

Background:

Schizophrenia is a serious mental disorder. With increased research funding, it has become a key focus in the medical field. Identifying associations between diseases and genes is an effective way to study this complex disease; it may advance research on schizophrenia pathology and reveal targets for treatment.

Objective:

This study aims to identify potential schizophrenia risk genes by using machine learning methods to extract the topological characteristics of proteins and their functional roles in a protein-protein interaction (PPI)-keywords (PPIK) network, in order to better understand the disease-causing properties of this complex disease. To this end, we propose a PPIK-based metagraph representation approach.

Methods:

To enrich the PPI network, we integrated keywords describing protein properties into it and constructed a PPIK network. We extracted features describing the topology of this network through metagraphs, transformed these metagraphs into vectors, and represented each protein with a series of vectors. We then trained and optimized our model using random forest (RF), XGBoost, and LightGBM.
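The pipeline described in the Methods can be illustrated with a minimal sketch. It assumes a toy network and three hand-picked metagraph patterns; the function names (`build_ppik`, `metagraph_features`), the protein and keyword labels, and the patterns themselves are hypothetical and are not taken from the authors' implementation.

```python
from collections import defaultdict

def build_ppik(ppi_edges, protein_keywords):
    """Build a toy PPIK network: protein-protein edges plus protein-keyword edges."""
    pp = defaultdict(set)  # protein -> interacting proteins
    for a, b in ppi_edges:
        pp[a].add(b)
        pp[b].add(a)
    pk = {p: set(ks) for p, ks in protein_keywords.items()}  # protein -> keywords
    kp = defaultdict(set)  # keyword -> proteins annotated with it
    for p, ks in pk.items():
        for k in ks:
            kp[k].add(p)
    return pp, pk, kp

def metagraph_features(protein, pp, pk, kp):
    """Count instances of three illustrative metagraphs anchored at `protein`.

    M1: P-P             direct interaction
    M2: P-K-P           another protein sharing a keyword
    M3: P-P with P-K-P  interaction partner that also shares a keyword
    """
    neighbours = pp.get(protein, set())
    keywords = pk.get(protein, set())
    m1 = len(neighbours)
    m2 = sum(len(kp[k] - {protein}) for k in keywords)
    m3 = sum(len(keywords & pk.get(q, set())) for q in neighbours)
    return [m1, m2, m3]

# Hypothetical toy data (not from the study).
ppi_edges = [("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P3", "P4")]
protein_keywords = {
    "P1": ["kinase", "membrane"],
    "P2": ["kinase"],
    "P3": ["membrane", "kinase"],
    "P4": ["nucleus"],
}

pp, pk, kp = build_ppik(ppi_edges, protein_keywords)
features = {p: metagraph_features(p, pp, pk, kp) for p in sorted(pk)}
for p, vec in features.items():
    print(p, vec)
```

In a setup like this, each protein's count vector would serve as one input row for a classifier such as RF, XGBoost, or LightGBM, with known disease proteins as positive labels.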

Results:

Comprehensive experiments demonstrated the good performance of our proposed method, with an area under the curve (AUC) between 0.7 and 0.9. Our method also outperformed the baselines, including random walk with restart (RWR), average commute time (ACT), and Katz, for overall disease protein prediction. After keywords were added to the PPI network, the baselines improved by 8.3% in AUC on average compared with the plain PPI network, and our experiment on the PPIK network showed that the metagraph-based method in turn improved by 8.3% in AUC on average compared with the baselines. Based on the overall performance of the three models, we chose the best one, LightGBM, for disease protein prediction; its precision, recall, F1-score, and AUC were 0.528, 0.727, 0.704, and 0.856, respectively. We then mapped the predicted proteins to the IDs of their encoding genes and identified the top 20 genes as the most probable schizophrenia risk genes, including EYA3, CNTN4, HSPA8, LRRK2, and AFP. We further validated these outcomes against metagraph features and evidence from the literature, using feature analysis and published findings to interpret the associations between the predicted genes and the disease.
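For readers unfamiliar with the RWR baseline named in the Results, the following is a minimal sketch of random walk with restart on an unweighted graph. The toy graph, the seed set, and the restart probability of 0.3 are assumptions made for illustration; the study's actual baseline configuration is not stated in this abstract.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence.

    adj   : dict mapping each node to a set of neighbours (undirected graph)
    seeds : nodes treated as known disease proteins; the walk restarts there
    W     : transition matrix in which each node spreads its mass evenly
            over its neighbours (nodes without neighbours pass nothing on)
    """
    nodes = list(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            if not adj[u]:
                continue
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p

# Hypothetical path graph A - B - C - D with A as the only seed.
toy_adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = random_walk_with_restart(toy_adj, seeds={"A"})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Candidate proteins are ranked by their steady-state score, so nodes closer to the seed set rank higher; a ranking of this kind is what the metagraph-based classifiers are compared against.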

Conclusions:

The framework of metagraph representations based on the PPIK network proved effective in identifying potential schizophrenia risk genes, and the predictions are supported by evidence in the literature. Our approach can provide further biological insight into the pathogenesis of schizophrenia.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.