JMIR Preprints #73765: Predicting Lymph Node Metastasis in Rectal Cancer: Development and Validation of a Machine Learning Model Using Clinical Data

Predicting Lymph Node Metastasis in Rectal Cancer: Development and Validation of a Machine Learning Model Using Clinical Data

Wei Hou;
Chuangwei Li;
Zhen Wang;
Wanqin Wang;
Shouhong Wan;
Bingbing Zou

ABSTRACT

Background:

Rectal cancer (RC) is a common malignant tumor with lymph node metastasis (LNM) being a critical determinant of patient prognosis. Traditional diagnostic methods have limitations, necessitating the development of predictive models using clinical data.

Objective:

This study aimed to construct and validate machine learning models to predict LNM risk in RC patients based on clinical data.

Methods:

Retrospective data from 2,454 RC patients in the SEER database were split into training (n=1,954) and internal validation (n=500) sets. An external validation cohort (n=500) was obtained from the First Affiliated Hospital of Anhui Medical University. Lymph node features identified on CT scans were integrated with clinicopathological data. Variables were selected using least absolute shrinkage and selection operator (LASSO) regression, followed by univariate and multivariate logistic regression. Eleven ML models were evaluated using the area under the receiver operating characteristic curve (AUC), calibration curves, and decision curve analysis (DCA): logistic regression (LR), k-nearest neighbors (KNN), extremely randomized trees (ET), naive Bayes (NB), XGBoost (XGB), LightGBM (LGBM), multilayer perceptron (MLP), gradient boosting (GB), support vector machine (SVM), random forest (RF), and AdaBoost (AB).
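As an illustration of the main comparison metric, the sketch below computes AUC as the probability that a randomly chosen LNM-positive patient receives a higher predicted risk than a randomly chosen LNM-negative patient, with ties counted as one half. The function name and the toy data are assumptions made for this example; this is not the authors' code, and the study itself does not state which software computed AUC.

```python
def auc_score(y_true, y_score):
    """AUC as the Mann-Whitney statistic: P(score_pos > score_neg), ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive case ranked above negative case
            elif p == n:
                wins += 0.5      # tie contributes half a concordant pair
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = LNM present) and predicted risks, invented for illustration.
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]
print(round(auc_score(y_true, y_score), 3))
```

The pairwise loop is quadratic in cohort size, which is acceptable for a sketch; a rank-based implementation would be preferable for cohorts of several thousand patients.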

Results:

LNM prevalence was 26.9% (training), 27% (internal validation), and 81% (external validation). Independent predictors of LNM included tumor grade, clinical T stage, N stage, tumor length, neural invasion, and total lymph nodes. Across the 11 models, AUC ranged from 0.859 to 0.964 in internal validation and from 0.735 to 0.838 in external validation. In the internal validation set, RF and ET achieved the highest AUC (0.964, 95% CI: 0.950–0.978), while XGBoost demonstrated superior cross-cohort stability (AUC=0.942, 95% CI: 0.925–0.959). In external validation, GB had the highest AUC (0.838, 95% CI: 0.801–0.875), followed by XGBoost (0.832, 95% CI: 0.794–0.869). XGBoost showed the smallest calibration error, with curves closest to the ideal diagonal, and yielded the highest net benefit in DCA across critical thresholds.
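For readers unfamiliar with DCA, the following minimal sketch shows the standard net benefit calculation at a single threshold probability: true positives per patient minus false positives per patient, weighted by the odds of the threshold. The function names and toy data are assumptions for illustration, and the thresholds examined in the study are not stated in this abstract.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating every patient whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)   # odds of the threshold probability
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(prevalence, threshold):
    """Reference strategy in DCA: treat every patient regardless of the model."""
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical outcomes and predicted risks, invented for illustration.
y_true = [1, 1, 0, 0, 1]
y_prob = [0.9, 0.6, 0.7, 0.2, 0.3]
print(net_benefit(y_true, y_prob, threshold=0.5))

# "Treat all" reference at the external cohort's reported 81% LNM prevalence.
print(net_benefit_treat_all(prevalence=0.81, threshold=0.5))
```

A model adds clinical value at a given threshold only when its net benefit exceeds both the "treat all" reference and the "treat none" reference, whose net benefit is zero. With a prevalence as high as 81%, the "treat all" line is itself high, so it is the more demanding comparator in the external cohort.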

Conclusions:

This study developed and validated 11 ML models to predict LNM risk in RC. XGBoost was the optimal model overall; 10 models achieved AUC > 0.9 in internal validation and 7 models achieved AUC > 0.8 in external validation. The identified predictors of LNM can facilitate early diagnosis and personalized treatment, highlighting the potential of integrating CT scan data with clinicopathological findings to build effective predictive models.

Trial Registration:

chictr.org.cn ChiCTR2400094858

Citation

Please cite as:

Hou W, Li C, Wang Z, Wang W, Wan S, Zou B

Predicting Lymph Node Metastasis in Rectal Cancer: Development and Validation of a Machine Learning Model Using Clinical Data

JMIR Med Inform 2025;13:e73765

DOI: 10.2196/73765

PMID: 40986886

PMCID: 12456929

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Mar 11, 2025

Date Accepted: Jul 30, 2025
