Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Nov 4, 2019
Date Accepted: Apr 8, 2020
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Identification of high-order SNP barcodes in breast cancer using hybrid Taguchi-genetic algorithm
ABSTRACT
Background:
Breast cancer is a major disease burden among women and is strongly associated with genomic factors. However, in genetic studies of complex diseases, geneticists still face challenges in detecting interactions between loci.
Objective:
This study investigates whether variations of single-nucleotide polymorphisms (SNPs) are associated with histopathological tumor characteristics in breast cancer patients.
Methods:
A hybrid Taguchi-genetic algorithm (HTGA) was proposed to identify high-order single-nucleotide polymorphism (SNP) barcodes in a breast cancer case–control study. The Taguchi method was incorporated into a genetic algorithm (GA) immediately after the crossover operation, so that the generated offspring are optimized systematically and the search ability of the GA is enhanced.
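As a rough illustration of how a Taguchi step can refine offspring after crossover, the following minimal sketch uses the standard two-level L8 orthogonal array to decide, gene by gene, which parent to inherit from. It is not the authors' implementation: the chromosome encoding, the array size, and the fitness function used in the study are not given in this abstract, so `taguchi_offspring`, `parent_a`, `parent_b`, and `fitness` are hypothetical names, and the chromosome is assumed to have at most seven genes with a fitness that is to be maximized.

```python
# Standard L8(2^7) two-level orthogonal array.
# Level 0 = take the gene from parent A, level 1 = take it from parent B.
L8 = [
    (0, 0, 0, 0, 0, 0, 0),
    (0, 0, 0, 1, 1, 1, 1),
    (0, 1, 1, 0, 0, 1, 1),
    (0, 1, 1, 1, 1, 0, 0),
    (1, 0, 1, 0, 1, 0, 1),
    (1, 0, 1, 1, 0, 1, 0),
    (1, 1, 0, 0, 1, 1, 0),
    (1, 1, 0, 1, 0, 0, 1),
]


def taguchi_offspring(parent_a, parent_b, fitness):
    """Build one offspring by picking, per gene, the parent with the larger effect.

    Assumes len(parent_a) == len(parent_b) <= 7 and that higher fitness is better.
    """
    n = len(parent_a)
    if n != len(parent_b) or n > 7:
        raise ValueError("this sketch supports two parents of equal length <= 7")
    parents = (parent_a, parent_b)

    # Run the 8 trials prescribed by the orthogonal array (first n columns).
    scores = []
    for row in L8:
        trial = [parents[row[j]][j] for j in range(n)]
        scores.append(fitness(trial))

    # For each gene, sum the trial scores per level and keep the better level.
    child = []
    for j in range(n):
        effect = [0.0, 0.0]
        for row, score in zip(L8, scores):
            effect[row[j]] += score
        best_level = 0 if effect[0] >= effect[1] else 1
        child.append(parents[best_level][j])
    return child
```

Only 8 fitness evaluations are needed here instead of the 128 required to try every combination of seven two-way choices, which is the usual motivation for using an orthogonal array in this role.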
Results:
The proposed HTGA can effectively converge to a promising region within the problem space and provide excellent SNP barcode identification. Regression analysis was used to validate the association between breast cancer and the identified high-order SNP barcodes. The maximum odds ratio (OR) was less than 1 (between 0.755 and 0.870) for two- to seven-order SNP barcodes.
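For readers less familiar with odds ratios, the sketch below shows how an OR and an approximate 95% confidence interval can be computed from a 2x2 case-control table. It uses the simple cross-product formula with a Woolf (log) interval rather than the regression analysis used in the study, and the counts are hypothetical, not data from this work. An OR below 1 means the barcode is less common among cases than among controls.

```python
import math


def odds_ratio(case_with, case_without, control_with, control_without, z=1.96):
    """Return (OR, lower, upper) for a 2x2 table using the Woolf (log) interval.

    All four counts must be positive; z=1.96 gives an approximate 95% interval.
    """
    or_value = (case_with * control_without) / (case_without * control_with)
    se_log = math.sqrt(
        1 / case_with + 1 / case_without + 1 / control_with + 1 / control_without
    )
    lower = math.exp(math.log(or_value) - z * se_log)
    upper = math.exp(math.log(or_value) + z * se_log)
    return or_value, lower, upper


# Hypothetical counts, for illustration only (not data from the study).
or_value, lower, upper = odds_ratio(30, 70, 40, 60)
print(f"OR = {or_value:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```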
Conclusions:
We systematically evaluated the interaction effects of 26 SNPs within growth factor-related genes in breast carcinogenesis pathways. HTGA successfully identified significant high-order SNP barcodes by evaluating the differences between cases and controls. The validation results showed that HTGA provided better fitness values than the compared methods in identifying high-order SNP barcodes in breast cancer case–control datasets.
Citation