Currently submitted to: JMIR Bioinformatics and Biotechnology
Date Submitted: Dec 24, 2024
Open Peer Review Period: Jan 9, 2025 - Mar 6, 2025
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Unpacking Genomic Biomarkers For PD-1 Immunotherapy Success in Non-Small Cell Lung Cancer Using Deep Neural Networks
ABSTRACT
Background:
Non-small cell lung cancer (NSCLC) is one of the leading causes of cancer-related mortality worldwide. PD-1 immunotherapy has shown promising results in the treatment of NSCLC; however, not all patients respond effectively to this treatment. Identifying predictive biomarkers for PD-1 therapy response is critical to improving patient outcomes and optimizing treatment strategies. Traditional methods of biomarker discovery often fall short in terms of accuracy and comprehensiveness. Recent advancements in deep learning provide a powerful approach to analyze complex genomic data and identify novel biomarkers that may predict therapeutic responses.
Objective:
This study aims to use machine learning, particularly deep neural networks (DNN), to identify genomic biomarkers that predict response to PD-1 immunotherapy in NSCLC patients. Using RNA-seq data, it compares the performance of a DNN, a support vector machine (SVM), and XGBoost in predicting patient response, and introduces the DNN-based DeepImmunoGene model. Through feature selection and deep learning, the study seeks key biomarkers that can improve patient stratification and the accuracy of PD-1 immunotherapy response prediction, contributing to more personalized treatment strategies.
Methods:
RNA-seq data from 355 NSCLC patients were analyzed with the LIMMA package in R to identify differentially expressed genes (DEGs), and expression values were log2-transformed. Three machine learning models, a Support Vector Machine (SVM), XGBoost, and a Deep Neural Network (DNN), were trained on the gene expression data, with hyperparameters optimized using GridSearchCV. Permutation importance was applied to the DNN model to identify the genes most critical for predicting immunotherapy response. The models were trained on 284 patients and tested on the remaining 71. Performance was assessed using accuracy, area under the ROC curve (AUC), precision, recall, specificity, and F1 score. Statistical significance was tested using the Kruskal-Wallis test.
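The evaluation workflow described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the expression matrix is synthetic, the `+ 1` pseudocount, the stratified split, the 5-fold cross-validation, and the SVM parameter grid are all assumptions, since the abstract does not report them. Only the 284/71 split, the use of GridSearchCV, and the list of metrics come from the text.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for the cohort: 355 patients x 50 genes (hypothetical counts).
n_patients, n_genes = 355, 50
y = rng.integers(0, 2, size=n_patients)       # 1 = responder, 0 = non-responder
counts = rng.poisson(lam=20, size=(n_patients, n_genes)).astype(float)
counts[:, :5] += 15 * y[:, None]              # make five genes informative

X = np.log2(counts + 1)                       # log2 transform; pseudocount is assumed

# Holding out 71 of 355 patients leaves 284 for training, as in the abstract.
# Stratifying on the label is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=71, stratify=y, random_state=0)

# Illustrative grid only; the grid actually searched is not reported.
search = GridSearchCV(
    make_pipeline(StandardScaler(), SVC(random_state=0)),
    param_grid={"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]},
    scoring="roc_auc", cv=5)
search.fit(X_train, y_train)

y_pred = search.predict(X_test)
y_score = search.decision_function(X_test)    # continuous scores for the AUC
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "auc": roc_auc_score(y_test, y_score),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "specificity": tn / (tn + fp),            # true negative rate
    "f1": f1_score(y_test, y_pred),
}
print(search.best_params_, metrics)
```

Specificity has no dedicated scikit-learn scorer, so it is computed here from the confusion matrix. The same split and metric code could be reused for an XGBoost or neural network estimator by swapping the final pipeline step and its parameter grid.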
Results:
We first identified 1,093 differentially expressed genes from the RNA-seq data of 355 patients. We then trained SVM, XGBoost, and DNN models to predict immunotherapy response. The DNN model outperformed both SVM and XGBoost, with an accuracy of 82%, an AUC of 0.90, and a recall of 0.85, significantly improving predictive performance by capturing non-linear relationships in gene expression data. To identify key biomarkers, we performed a permutation importance analysis, which narrowed the gene set to 98 genes. DeepImmunoGene, trained on these 98 genes, reached an accuracy of 85% with an AUC of 0.90. Of the 98 genes, 36 were upregulated in responders and 62 were upregulated in non-responders; these could serve as potential biomarkers for predicting response to PD-1 inhibitors. These findings suggest that the DeepImmunoGene model, with its ability to capture complex gene interactions, can predict immunotherapy outcomes and provide insights into the molecular mechanisms of response, paving the way for more personalized treatment strategies.
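The permutation importance step that narrows the gene set could look something like the sketch below. It is an illustration under stated assumptions rather than the DeepImmunoGene implementation: scikit-learn's `MLPClassifier` stands in for the DNN, the data and gene names are synthetic, and the layer sizes, `n_repeats`, the AUC scoring, and the `top_k` cutoff are hypothetical choices (the abstract reports retaining 98 of 1,093 genes but not how the cutoff was set).

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Synthetic stand-in: 355 patients x 40 genes; only the first 4 genes carry signal.
n_patients, n_genes, n_informative = 355, 40, 4
y = rng.integers(0, 2, size=n_patients)
X = rng.normal(size=(n_patients, n_genes))
X[:, :n_informative] += 2.0 * y[:, None]
gene_names = np.array([f"gene_{i}" for i in range(n_genes)])  # hypothetical names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=71, stratify=y, random_state=1)

# Small multilayer perceptron as a stand-in for the study's DNN (assumed sizes).
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=1))
model.fit(X_train, y_train)

# Shuffle one gene at a time on held-out data and record the drop in AUC.
result = permutation_importance(
    model, X_test, y_test, scoring="roc_auc", n_repeats=20, random_state=1)

top_k = 4                                     # hypothetical cutoff for this toy data
ranked = np.argsort(result.importances_mean)[::-1]
selected_genes = gene_names[ranked[:top_k]]
print(selected_genes)
```

A gene whose shuffling causes a large drop in held-out AUC is one the model relies on; genes with importance near zero can be dropped before retraining on the reduced set.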
Conclusions:
The DeepImmunoGene predictive model identified 36 genes upregulated in responders that may serve as genomic biomarkers for predicting NSCLC patient response to PD-1 immunotherapy. Notably, the ten most significant genes (GSTT2B, HMGA2, AC135050.2, ANKRD33B, MMP13, PLA2G2D, RASGEF1A, BIRC7, DCAF4L2, and CHMP7) offer insights into the mechanisms underlying treatment response. These biomarkers not only help predict which patients are most likely to respond to PD-1 immunotherapy but also shed light on the molecular factors that may explain non-response.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.