JMIR Preprints #80574: Development of a Deep Learning Model to Predict 5-Year Mortality in Non-Small Cell Lung Cancer Using the Korean Central Cancer Registry

Development of a Deep Learning Model to Predict 5-Year Mortality in Non-Small Cell Lung Cancer Using the Korean Central Cancer Registry

Jong Hyuk Lee;
Ho Cheol Kim;
Kyu-Won Jung;
Chang Min Choi

ABSTRACT

Background:

Non-small cell lung cancer (NSCLC) is one of the most common cancers and a leading cause of cancer-related mortality, making prognostic prediction clinically essential. Machine learning models are increasingly being utilized to assess prognosis; however, developing systems that combine high discrimination with clear, clinically interpretable reasoning remains challenging.

Objective:

To develop deep learning models that predict 5-year mortality in NSCLC using data from the Korea Central Cancer Registry (KCCR) and to quantify feature importance through permutation testing.

Methods:

We identified patients diagnosed between 2014 and 2017 who had complete clinical data, pulmonary function test results, histological information, genomic data, and staging details. After preprocessing, the cohort was divided into stratified training, validation, and test sets in a 70%:15%:15% ratio. Five models were tuned using Hyperband across ten predefined feature groups. The primary metric for evaluation was the area under the receiver operating characteristic curve (AUC); additional metrics reported included accuracy, F1 score, precision, and recall. Group-wise permutation importance was calculated for each model, and the concordance of importance rankings was assessed using the Friedman test. A Cox proportional hazards (CPH) model was utilized as a baseline comparator.
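The stratified 70%:15%:15% split and the group-wise permutation importance described above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the data are synthetic, the three feature groups ("strong", "weak", "noise") are hypothetical placeholders for the ten predefined groups, and a scikit-learn logistic regression stands in for the tuned deep learning models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def stratified_split(X, y, seed=0):
    """Split into 70% train, 15% validation, 15% test, stratified on y."""
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed)
    X_val, X_test, y_val, y_test = train_test_split(
        X_rest, y_rest, test_size=0.50, stratify=y_rest, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)


def group_permutation_importance(predict_proba, X, y, groups,
                                 n_repeats=10, seed=0):
    """Mean drop in AUC when all columns of a group are shuffled together."""
    rng = np.random.default_rng(seed)
    baseline = roc_auc_score(y, predict_proba(X))
    drops = {}
    for name, cols in groups.items():
        scores = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            perm = rng.permutation(len(X))
            # One shared row permutation keeps within-group structure intact.
            X_perm[:, cols] = X[perm][:, cols]
            scores.append(roc_auc_score(y, predict_proba(X_perm)))
        drops[name] = baseline - float(np.mean(scores))
    return baseline, drops


# Synthetic stand-in data (assumption): columns 0-1 drive the outcome.
rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 6))
logit = 2.0 * X[:, 0] + 1.5 * X[:, 1] + 0.3 * X[:, 2]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
groups = {"strong": [0, 1], "weak": [2], "noise": [3, 4, 5]}  # hypothetical

train, val, test = stratified_split(X, y)
model = LogisticRegression().fit(*train)  # stand-in for a tuned network
baseline_auc, drops = group_permutation_importance(
    lambda A: model.predict_proba(A)[:, 1], *test, groups)

for name, drop in sorted(drops.items(), key=lambda kv: -kv[1]):
    print(f"{name:>6}: AUC drop = {drop:.3f}")
```

Shuffling every column of a group with the same row permutation breaks the link between that group and the outcome while preserving correlations inside the group, which is why group-level importance suits inputs that are clinically bundled, such as staging variables.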

Results:

All five models yielded comparable discrimination on the test set (AUC 0.875–0.879; accuracy 0.796–0.822; F1 0.815–0.846). Permuting the 'Stage' group produced the largest decrease in AUC, followed by 'Pulmonary Function Test', 'Symptoms', and 'Age'. The 'Gene Mutation' group had a modest overall impact but became more influential within the adenocarcinoma subset. The Friedman test showed no statistically significant differences in importance rankings across the models (p = .928).
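A comparison of importance rankings across models with the Friedman test could look like the sketch below. The abstract does not state how the data were arranged, so this assumes each feature group is treated as a block and the five models are the conditions being compared. The importance values are synthetic placeholders, not the study's results, and `friedman_across_models` is a hypothetical helper name.

```python
import numpy as np
from scipy.stats import friedmanchisquare


def friedman_across_models(importance):
    """Friedman test on an array of shape (n_models, n_groups).

    Each feature group acts as a block; the test ranks the models
    within every group and asks whether any model ranks
    systematically higher or lower than the others.
    """
    importance = np.asarray(importance, dtype=float)
    stat, p = friedmanchisquare(*importance)  # one argument per model
    return float(stat), float(p)


# Synthetic stand-in values (assumption), NOT the study's data:
# a shared per-group effect plus small model-specific jitter.
rng = np.random.default_rng(0)
group_effect = np.linspace(0.20, 0.01, 10)      # ten feature groups
noise = rng.normal(scale=0.005, size=(5, 10))   # five models
importance = group_effect + noise               # shape (5, 10)

stat, p = friedman_across_models(importance)
print(f"Friedman chi-square = {stat:.3f}, p = {p:.3f}")
```

Under this arrangement, a large p-value means there is no evidence that any one model assigns systematically different importance than the others, which is consistent with reading p = .928 as agreement among the models.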

Conclusions:

A meticulously tuned, grouped-input deep learning framework offered reliable and interpretable predictions for 5-year mortality in NSCLC. Group-level permutation importance provided stable and reproducible insights into the clinical factors influencing risk, which may guide future model refinement and clinical decision-making.

Citation

Please cite as:

Lee JH, Kim HC, Jung KW, Choi CM

Predicting 5-Year Mortality in Non–Small-Cell Lung Cancer Using the Korean Central Cancer Registry: Model Development and Validation Study

JMIR Med Inform 2026;14:e80574

DOI: 10.2196/80574

PMID: 42258797

PMCID: 13245844

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Jul 13, 2025

Open Peer Review Period: Aug 1, 2025 - Sep 26, 2025

Date Accepted: Mar 6, 2026
