Accepted for/Published in: JMIR Bioinformatics and Biotechnology
Date Submitted: Jan 18, 2024
Date Accepted: Apr 25, 2024
Deep Learning Based Identification of Tissue of Origin for Carcinomas of Unknown Primary utilizing micro-RNA expression
ABSTRACT
Background:
Carcinoma of Unknown Primary (CUP) is a subset of metastatic cancers in which the primary tissue source of the cancer cells, or origin, remains unidentified. CUP is the eighth most common malignancy worldwide and accounts for up to five percent of all malignancies. It is an exceptionally aggressive category of metastatic cancer, with a median survival of approximately three to six months. The tissue in which a cancer arises plays a key role in our understanding of how sensitive its cells are to various forms of cell death. Thus, the lack of knowledge of the tissue of origin makes it difficult to devise tailored and effective treatments for patients with CUP. Developing quick and clinically implementable methods to identify the tissue of origin is crucial in treating CUP patients. Non-coding RNAs may hold potential for origin identification and provide a robust route to clinical implementation due to their resistance to chemical degradation.
Objective:
In this work, we investigate the potential of microRNAs, a subset of non-coding RNAs, as highly accurate biomarkers for detecting the tissue of origin of metastatic cancers through data-driven machine learning approaches.
Methods:
We use microRNA expression data from The Cancer Genome Atlas (TCGA) dataset and assess various machine learning approaches, ranging from simple classifiers to deep learning models. To validate our classifiers, we evaluate their accuracy on a separate set of 194 primary tumor samples from the Sequence Read Archive (SRA). We use permutation feature importance to identify potential miRNA biomarkers and assess them with PCA and t-SNE visualizations.
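To illustrate what permutation feature importance measures, the following is a minimal sketch and not the study's pipeline. It assumes synthetic two-class data, a simple nearest-centroid classifier in place of the models evaluated here, and hypothetical names throughout (`make_data`, `fit_centroids`, `informative_feature`, `tissue_A`, and so on). The idea is to shuffle one feature at a time on held-out data and record how much accuracy drops; a large drop suggests the classifier relies on that feature.

```python
import random

# Hypothetical feature names; these are NOT real miRNA identifiers from the study.
FEATURES = ["informative_feature", "noise_feature_1", "noise_feature_2"]


def make_data(n_per_class, rng):
    """Synthetic data: only the first feature differs between the two classes."""
    X, y = [], []
    for label, shift in (("tissue_A", 0.0), ("tissue_B", 6.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(shift, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
            y.append(label)
    return X, y


def fit_centroids(X, y):
    """Nearest-centroid 'model': the per-class mean of every feature."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sqdist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: sqdist(centroids[label]))


def accuracy(centroids, X, y):
    hits = sum(predict(centroids, row) == label for row, label in zip(X, y))
    return hits / len(y)


def permutation_importance(centroids, X, y, n_repeats=10, seed=0):
    """Mean accuracy drop when each feature column is shuffled independently."""
    rng = random.Random(seed)
    baseline = accuracy(centroids, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the labels
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(centroids, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


rng = random.Random(0)
X_train, y_train = make_data(100, rng)
X_test, y_test = make_data(100, rng)  # held-out samples used for scoring

centroids = fit_centroids(X_train, y_train)
baseline, importances = permutation_importance(centroids, X_test, y_test)

print(f"baseline accuracy: {baseline:.3f}")
for name, score in zip(FEATURES, importances):
    print(f"{name}: mean accuracy drop {score:.3f}")
```

In this toy setting, shuffling the informative feature should remove most of the classifier's accuracy, while shuffling the noise features should change it very little. Ranking features by mean accuracy drop is how candidate biomarkers could be shortlisted under these assumptions.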
Results:
Our results show that it is possible to design robust classifiers that detect the tissue of origin for metastatic samples in the TCGA dataset with an accuracy of up to 96%, which may be useful in cases of CUP. Our findings demonstrate that deep learning techniques enhance prediction accuracy: accuracy on metastatic samples rose from 62.5% with decision trees to 93.2% with logistic regression and 96.1% with deep learning. On the SRA validation set, decision trees achieved a lower accuracy of 41.2%, while deep learning achieved 81.2%. Notably, our feature importance analysis identified mir-10b, mir-205, and mir-196b as the three most important biomarkers for predicting tissue of origin, which aligns with previous work.
Conclusions:
Our findings highlight the potential of using machine learning techniques to devise accurate tests for detecting the tissue of origin for CUP. Because microRNAs are carried throughout the body in extracellular vesicles secreted from cells and are present in blood plasma, they may serve as key biomarkers for liquid biopsy. Our work serves as a foundation for developing blood-based cancer detection tests based on microRNA presence.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.