Accepted for/Published in: JMIR Formative Research
Date Submitted: Aug 19, 2025
Date Accepted: Mar 6, 2026
Artificial Intelligence Design for Racial-based Prostate Cancer Stage Classification with Multi-Layer Perceptron: Feature Selection Optimization Approach
ABSTRACT
Background:
Prostate cancer progression exhibits significant variability influenced by biological and racial factors. DNA methylation profiling has shown potential in early cancer detection, but its integration with machine learning across racially diverse populations remains limited.
Objective:
This study aims to develop a race-aware framework using DNA methylation data and a Multi-Layer Perceptron (MLP) model to classify prostate cancer stages into early (I–II) and late (III–IV) stages.
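The early/late split can be made explicit with a small label-mapping helper. This is a minimal sketch that assumes stage is stored as an AJCC-style string such as "Stage IIB". The function name `stage_to_label` and the choice to return `None` for missing or unparseable values, so those samples can be dropped, are illustrative and not taken from the study.

```python
import re

def stage_to_label(stage):
    """Map an AJCC-style stage string to 0 (early, I-II) or 1 (late, III-IV).

    Hypothetical helper: returns None for missing or unrecognized values
    so that those samples can be excluded before training.
    """
    if stage is None:
        return None
    # Optional "Stage" prefix, a Roman numeral I-IV, and an optional substage letter A-C
    m = re.match(r"^\s*(?:stage\s+)?(IV|III|II|I)[A-C]?\s*$", stage, flags=re.IGNORECASE)
    if m is None:
        return None
    roman = m.group(1).upper()
    return 0 if roman in ("I", "II") else 1
```

Substage letters are ignored, so "Stage IIA" and "Stage IIB" both map to the early class.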
Methods:
Methylation and phenotype data from The Cancer Genome Atlas Prostate Adenocarcinoma (TCGA-PRAD) dataset were processed using Differentially Methylated Position (DMP) analysis to identify CpG sites associated with cancer stage. These features were further refined through Recursive Feature Elimination (RFE) and used to train MLP models. SHapley Additive exPlanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME) were used to interpret the models and identify the DNA methylation features that contribute most to their predictions.
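The two-stage feature selection described above, a differential-methylation screen followed by RFE, can be sketched in plain NumPy. This is a simplified illustration and not the authors' pipeline. A per-CpG Welch t-statistic stands in for a dedicated DMP tool, which would typically fit moderated linear models. Because RFE needs an estimator that exposes per-feature weights, a ridge-regularized linear model is assumed as the ranking estimator. The helper names, `n_screen=200`, and the synthetic data are hypothetical; only the final count of 70 features mirrors the Results.

```python
import numpy as np

def welch_t(X, y):
    """Per-CpG Welch t-statistic between late (y == 1) and early (y == 0) samples."""
    a, b = X[y == 1], X[y == 0]
    va = a.var(axis=0, ddof=1) / len(a)
    vb = b.var(axis=0, ddof=1) / len(b)
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(va + vb + 1e-12)

def ridge_coef(X, y, alpha=1.0):
    """Closed-form ridge coefficients on standardized features, targets mapped to {-1, +1}."""
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    t = 2.0 * y - 1.0
    t = t - t.mean()  # centering replaces an explicit intercept
    p = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + alpha * np.eye(p), Xs.T @ t)

def select_features(X, y, n_screen=200, n_final=70, step=1):
    """Screen CpGs by |t|, then recursively eliminate the weakest linear weights.

    Returns sorted column indices into X of the n_final surviving CpG sites.
    """
    # Step 1: DMP-style screening, keep the n_screen most differential CpGs
    t = welch_t(X, y)
    kept = np.argsort(-np.abs(t))[:n_screen]
    # Step 2: RFE, refit and drop the `step` smallest-|coef| features each round
    while len(kept) > n_final:
        coef = ridge_coef(X[:, kept], y)
        n_drop = min(step, len(kept) - n_final)
        weakest = np.argsort(np.abs(coef))[:n_drop]
        kept = np.delete(kept, weakest)
    return np.sort(kept)

# Toy usage on synthetic beta values in [0, 1] (not TCGA data)
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(120, 1000))
y = rng.integers(0, 2, size=120)
X[y == 1, :5] += 0.3  # make the first five CpGs differential between classes
idx = select_features(X, y, n_screen=200, n_final=70)
```

The selected columns `X[:, idx]` would then be the inputs to the MLP classifier. Refitting after every single removal (`step=1`) is slower than dropping several features per round but lets the ranking adapt as correlated CpGs are removed.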
Results:
The best-performing model achieved approximately 95% accuracy and an AUC of up to 0.99 on the majority race (White) training data using 70 selected features. However, performance declined sharply on minority race groups, revealing the effects of sample imbalance and race-specific methylation patterns. Feature importance analysis indicates that a subset of CpG sites shows strong patterns that drive the model's predictions.
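Because these results hinge on how performance differs between race groups, metrics are most informative when computed separately for each group rather than only in aggregate. The sketch below is a hypothetical helper, not the study's evaluation code. It assumes the model outputs a late-stage probability per sample, uses a 0.5 decision threshold, and computes AUC through its rank (Mann-Whitney) interpretation. The group labels "A" and "B" and the scores are toy values.

```python
from collections import defaultdict

def auc(y_true, scores):
    """ROC AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        return float("nan")  # undefined when a group contains only one stage class
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def per_group_metrics(y_true, scores, groups, threshold=0.5):
    """Sample count, accuracy, and AUC computed separately for each group label."""
    rows_by_group = defaultdict(list)
    for i, g in enumerate(groups):
        rows_by_group[g].append(i)
    out = {}
    for g, rows in rows_by_group.items():
        yt = [y_true[i] for i in rows]
        sc = [scores[i] for i in rows]
        acc = sum((s >= threshold) == bool(y) for s, y in zip(sc, yt)) / len(rows)
        out[g] = {"n": len(rows), "accuracy": acc, "auc": auc(yt, sc)}
    return out

# Toy usage: labels (0 = early, 1 = late), predicted late-stage probabilities, group labels
y_true = [0, 0, 1, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.8, 0.9, 0.7, 0.3, 0.6]
groups = ["A", "A", "A", "A", "B", "B", "B"]
m = per_group_metrics(y_true, scores, groups)
```

Here group "A" scores perfectly while group "B" reaches only one correct prediction out of three, a gap that a pooled accuracy of 5/7 would hide. Small groups can also contain only one stage class, in which case AUC is undefined and reported as NaN.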
Conclusions:
We propose a race-aware MLP model for prostate cancer stage classification using DNA methylation data, optimized through DMP and RFE-based feature selection. SHAP and LIME confirmed the predictive relevance of selected CpG sites, supporting model transparency. Results highlight high performance within the White cohort but reveal poor generalization to minority groups, emphasizing the importance of race-specific modeling strategies.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.