Accepted for/Published in: JMIR Cancer

Date Submitted: Jul 16, 2024
Date Accepted: Mar 26, 2025

The final, peer-reviewed published version of this preprint can be found here:

Next-Generation Sequencing–Based Testing Among Patients With Advanced or Metastatic Nonsquamous Non–Small Cell Lung Cancer in the United States: Predictive Modeling Using Machine Learning Methods

Brnabic AJM, Lipkovich I, Kadziola Z, He D, Krein PM, Hess LM

JMIR Cancer 2025;11:e64399

DOI: 10.2196/64399

PMID: 40497643

PMCID: 12198702

Next generation sequencing-based testing among patients with advanced/metastatic non-squamous non-small cell lung cancer in the U.S.: A predictive model using machine learning methods

  • Alan James Michael Brnabic
  • Ilya Lipkovich
  • Zbigniew Kadziola
  • Dan He
  • Peter M Krein
  • Lisa M Hess

ABSTRACT

Background:

.

Objective:

This study was designed to use machine learning methods to identify demographic and clinical characteristics of patients with advanced or metastatic non-small cell lung cancer (NSCLC) that may predict the likelihood of receiving next-generation sequencing (NGS)-based testing (ever versus never NGS-tested) and the timing of that testing (early versus late NGS-tested).

Methods:

This study analyzed de-identified patient-level data from a real-world cohort of patients with advanced or metastatic non-small cell lung cancer (NSCLC) in the U.S. Patients were included if they had nonsquamous disease, had received systemic therapy for NSCLC, and had at least 3 months of follow-up data available for analysis. Three strategies were used to identify predictors of ever versus never and of early versus late NGS testing from an a priori defined set of variables: logistic regression (LR), penalized logistic regression with a lasso penalty (PLR), and eXtreme Gradient Boosting (XGBoost) with classification trees as base learners. Data were split into a training-plus-validation set, D1 (80%), and a test set, D2 (20%). The three strategies were compared on m=1000 random splits of D1 into training (70%) and validation (30%) data. The final model was selected based on performance in the validation data, while also taking simplicity and clinical interpretability into account. Performance was then re-estimated on the test data D2.
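The split-and-compare protocol above (an 80/20 split into D1 and D2, then repeated 70/30 training/validation splits within D1) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic, the two strategies (`feature_0`, `feature_1`) are hypothetical single-feature placeholders standing in for LR, PLR, and XGBoost, m is reduced from 1000 to 100 for speed, the strategy with the highest mean validation AUC is picked automatically (the study also weighed simplicity and clinical interpretability), and refitting the chosen strategy on all of D1 before scoring D2 is an assumption.

```python
import random
from statistics import mean


def auc(labels, scores):
    """Rank-based (Mann-Whitney) AUC: the probability that a randomly chosen
    positive case scores higher than a randomly chosen negative case; ties count 1/2."""
    pos = [s for t, s in zip(labels, scores) if t == 1]
    neg = [s for t, s in zip(labels, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def split(indices, frac, rng):
    """Shuffle indices and cut them into two parts, with ~frac of them in the first part."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    cut = round(frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def mean_validation_auc(fit, X, y, d1, m, rng):
    """Mean validation AUC of one strategy over m random 70/30 training/validation splits of D1."""
    aucs = []
    for _ in range(m):
        train, valid = split(d1, 0.70, rng)
        model = fit([X[i] for i in train], [y[i] for i in train])
        aucs.append(auc([y[i] for i in valid], [model(X[i]) for i in valid]))
    return mean(aucs)


def single_feature_strategy(j):
    """HYPOTHETICAL placeholder for LR, PLR, or XGBoost: score a patient by feature j,
    oriented on the training rows so that higher scores go with the positive class."""
    def fit(X_train, y_train):
        pos = mean(x[j] for x, t in zip(X_train, y_train) if t == 1)
        neg = mean(x[j] for x, t in zip(X_train, y_train) if t == 0)
        sign = 1.0 if pos >= neg else -1.0
        return lambda x: sign * x[j]
    return fit


# Synthetic stand-in data: feature 0 carries signal, feature 1 is pure noise.
rng = random.Random(2024)
n = 500
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
y = [1 if x[0] + rng.gauss(0, 1) > 0 else 0 for x in X]

# Outer split: D1 (training + validation, 80%) and held-out test set D2 (20%).
d1, d2 = split(range(n), 0.80, rng)

# Compare strategies on repeated inner splits of D1 (the study used m=1000; 100 here for speed).
strategies = {"feature_0": single_feature_strategy(0), "feature_1": single_feature_strategy(1)}
val_auc = {name: mean_validation_auc(fit, X, y, d1, m=100, rng=rng) for name, fit in strategies.items()}

# Simplification: pick the best mean validation AUC, refit it on all of D1 (an assumption),
# and re-estimate its AUC on the untouched test set D2.
best = max(val_auc, key=val_auc.get)
final_model = strategies[best]([X[i] for i in d1], [y[i] for i in d1])
test_auc = auc([y[i] for i in d2], [final_model(X[i]) for i in d2])
```

Real learners can be swapped in as long as each `fit` returns a callable that maps a feature vector to a score; for example, one might wrap scikit-learn's `LogisticRegression` (with or without an L1 penalty) or xgboost's `XGBClassifier`. Keeping D2 out of every inner split is what makes the final test AUC an honest re-estimate.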

Results:

A total of 13,425 patients met the criteria for the ever NGS-tested group, and 17,982 were included in the never NGS-tested group. The area under the receiver operating characteristic curve (AUC) evaluated on validation data was similar across all models (77%-84%). Among patients in the ever NGS-tested group, 84.1% (n=11,289) were tested early and 15.9% (n=2,136) were tested late. Factors associated with both ever receiving NGS testing and early NGS testing included a later year of NSCLC diagnosis, no history of smoking, and evidence of PD-L1 testing (all p<0.05). Factors associated with a greater chance of never receiving NGS testing included older age, lower ECOG performance status, Black race, a higher number of single-gene tests, public insurance, and treatment in a geography associated with adoption of the Molecular Diagnostics Services (MolDX) Program (all p<0.05).

Conclusions:

Predictors of “ever” versus “never” and of “early” versus “late” NGS testing in the setting of advanced or metastatic NSCLC were consistent across the machine learning methods in this study. This demonstrates the ability of these models to identify factors that may predict which patients are most and least likely to receive testing in accordance with clinical practice guidelines. Age, race, insurance status, and geography were all associated with lower odds of receiving NGS testing in this study; there is a need to ensure that all patients, regardless of these factors, are provided with equitable access to NGS-based testing.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.