Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Jul 13, 2025
Open Peer Review Period: Aug 1, 2025 - Sep 26, 2025
Date Accepted: Mar 6, 2026

The final, peer-reviewed published version of this preprint can be found here:

Predicting 5-Year Mortality in Non–Small-Cell Lung Cancer Using the Korean Central Cancer Registry: Model Development and Validation Study

Lee JH, Kim HC, Jung KW, Choi CM

JMIR Med Inform 2026;14:e80574

DOI: 10.2196/80574

PMID: 42258797

Predicting 5-Year Mortality in NSCLC: Development and Validation of a Deep Learning Model Using the Korean Central Cancer Registry

  • Jong Hyuk Lee
  • Ho Cheol Kim
  • Kyu-Won Jung
  • Chang Min Choi

ABSTRACT

Background:

Non-small cell lung cancer (NSCLC) is one of the most common cancers and a leading cause of cancer-related mortality, making prognostic prediction clinically essential. Machine learning models are increasingly being utilized to assess prognosis; however, developing systems that combine high discrimination with clear, clinically interpretable reasoning remains challenging.

Objective:

To develop deep learning models that predict 5-year mortality in NSCLC using data from the Korea Central Cancer Registry (KCCR) and to quantify feature importance through permutation testing.

Methods:

We identified patients diagnosed between 2014 and 2017 who had complete clinical data, pulmonary function test results, histological information, genomic data, and staging details. After preprocessing, the cohort was divided into stratified training, validation, and test sets in a 70%:15%:15% ratio. Five models were tuned using Hyperband across ten predefined feature groups. The primary evaluation metric was the area under the receiver operating characteristic curve (AUC); accuracy, F1 score, precision, and recall were also reported. Group-wise permutation importance was calculated for each model, and the concordance of importance rankings was assessed using the Friedman test. A Cox proportional hazards (CPH) model served as a baseline comparator.
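The split and the group-wise permutation step described above can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline: the synthetic data, the fixed linear `toy_predict` scorer standing in for a trained network, the three feature groups, and all function names are assumptions made for illustration only. The idea shown is that all columns of one feature group are shuffled together across patients, and the resulting drop in AUC is taken as that group's importance.

```python
import random

def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return index lists (train, val, test) that preserve the class ratio."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train, val, test = [], [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_train = round(fractions[0] * len(idx))
        n_val = round(fractions[1] * len(idx))
        train += idx[:n_train]
        val += idx[n_train:n_train + n_val]
        test += idx[n_train + n_val:]
    return train, val, test

def auc(y_true, scores):
    """Rank-based AUC: share of (positive, negative) pairs ordered correctly,
    counting ties as one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def group_permutation_importance(predict, X, y, groups, n_repeats=5, seed=0):
    """Mean AUC drop when every column of a feature group is shuffled jointly."""
    rng = random.Random(seed)
    baseline = auc(y, [predict(row) for row in X])
    drops = {}
    for name, cols in groups.items():
        total = 0.0
        for _ in range(n_repeats):
            order = list(range(len(X)))
            rng.shuffle(order)
            X_perm = [list(row) for row in X]
            for i, j in enumerate(order):
                for c in cols:  # same row permutation for the whole group
                    X_perm[i][c] = X[j][c]
            total += baseline - auc(y, [predict(row) for row in X_perm])
        drops[name] = total / n_repeats
    return baseline, drops

# Synthetic cohort (hypothetical): stage drives risk, one lung-function
# column contributes weakly, and the last column is pure noise.
data_rng = random.Random(42)
X, y = [], []
for _ in range(400):
    stage = data_rng.randint(1, 4)
    fev1, fvc, noise = (data_rng.gauss(0, 1) for _ in range(3))
    risk = 0.9 * stage - 0.4 * fev1 + data_rng.gauss(0, 1)
    X.append([stage, fev1, fvc, noise])
    y.append(1 if risk > 2.25 else 0)

train_idx, val_idx, test_idx = stratified_split(y)

def toy_predict(row):
    """Fixed linear score standing in for a trained model (assumption)."""
    return 0.9 * row[0] - 0.4 * row[1]

groups = {"Stage": [0], "Pulmonary Function Test": [1, 2], "Noise": [3]}
X_test = [X[i] for i in test_idx]
y_test = [y[i] for i in test_idx]
baseline_auc, importance = group_permutation_importance(
    toy_predict, X_test, y_test, groups
)
```

Shuffling a group jointly, rather than column by column, keeps the within-group correlations intact, so the measured drop reflects the group as a unit. A group the scorer never reads (here `Noise`) yields a drop of exactly zero.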

Results:

All five models yielded comparable discrimination on the test set (AUC 0.875–0.879; accuracy 0.796–0.822; F1 0.815–0.846). Permuting the 'Stage' group produced the largest decrease in AUC, followed by 'Pulmonary Function Test', 'Symptoms', and 'Age'. The 'Gene Mutation' group had a modest overall impact but became more influential within the adenocarcinoma subset. The Friedman test showed no statistically significant differences in importance rankings across the models (P=.928).
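As a rough illustration of how a Friedman test can compare importance rankings across models, the sketch below treats feature groups as blocks and models as treatments. The abstract does not specify this layout, so it is an assumption, as are the function names and the toy importance values. The statistic is computed without tie correction, and the P value uses the closed-form chi-square tail that holds only for even degrees of freedom (five models give df = 4).

```python
import math
import random

def average_ranks(values):
    """Ranks 1..k within one row; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def friedman_test(table):
    """table[b][m] is the importance of feature group b under model m.

    Returns (Q, p). No tie correction is applied, and the tail formula
    below is valid only when df = k - 1 is even.
    """
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for m, r in enumerate(average_ranks(row)):
            rank_sums[m] += r
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    q = max(q, 0.0)  # guard against tiny negative rounding error
    df = k - 1
    if df % 2:
        raise ValueError("closed-form chi-square tail assumes even df")
    half = q / 2.0
    p = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    return q, min(p, 1.0)

# Hypothetical AUC drops for ten feature groups (rows) under five models
# (columns); these numbers are made up and are not the study's results.
toy_rng = random.Random(0)
base_drops = (0.20, 0.08, 0.05, 0.04, 0.02, 0.02, 0.01, 0.01, 0.01, 0.005)
toy_table = [[b + toy_rng.uniform(-0.005, 0.005) for _ in range(5)] for b in base_drops]
q_stat, p_value = friedman_test(toy_table)
```

Under this layout, a large P value means no model systematically assigns higher or lower importance than the others across the feature groups, which is the sense in which the rankings are described as concordant.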

Conclusions:

A meticulously tuned, grouped-input deep learning framework offered reliable and interpretable predictions for 5-year mortality in NSCLC. Group-level permutation importance provided stable and reproducible insights into the clinical factors influencing risk, which may guide future model refinement and clinical decision-making.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.