Accepted for/Published in: JMIR Public Health and Surveillance

Date Submitted: Oct 22, 2024
Date Accepted: Jan 29, 2025

The final, peer-reviewed published version of this preprint can be found here:

Lu Z, Dong B, Cai H, Tian T, Wang J, Fu L, Wang B, Zhang W, Lin S, Tuo X, Wang J, Yang T, Huang X, Zheng Z, Xue H, Xu S, Liu S, Sun P, Zou H

Identifying Data-Driven Clinical Subgroups for Cervical Cancer Prevention With Machine Learning: Population-Based, External, and Diagnostic Validation Study

JMIR Public Health Surveill 2025;11:e67840

DOI: 10.2196/67840

PMID: 40106366

Identifying Data-Driven Clinical Subgroups for Cervical Cancer Prevention with Machine Learning: A Population-based, External, and Diagnostic Validation Study

  • Zhen Lu
  • Binhua Dong
  • Hongning Cai
  • Tian Tian
  • Junfeng Wang
  • Leiwen Fu
  • Bingyi Wang
  • Weijie Zhang
  • Shaomei Lin
  • Xunyuan Tuo
  • Juntao Wang
  • Tianjie Yang
  • Xinxin Huang
  • Zheng Zheng
  • Huifeng Xue
  • Shuxia Xu
  • Siyang Liu
  • Pengming Sun
  • Huachun Zou

ABSTRACT

Background:

Successful scale-up of high-performance and cost-effective cervical cancer prevention (CCP) is key to identifying gaps and progressing towards cervical cancer elimination.

Objective:

We aimed to propose a computational phenomapping strategy to discover CCP subgroups with differential risks of cervical cancer and to validate them on population-representative data.

Methods:

We explored data-driven CCP subgroups by applying unsupervised machine learning to a deeply phenotyped, population-based discovery cohort. We extracted CCP-specific risks of cervical intraepithelial neoplasia grade 2 or worse (CIN2+) and grade 3 or worse (CIN3+) through weighted logistic regression analyses, which provided odds ratio (OR) estimates. We trained a supervised machine learning model and developed pathways to classify individuals, then evaluated the model's diagnostic validity and usability on an external cohort.
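As a rough illustration of what an odds ratio estimate with a 95% CI involves, the sketch below computes an unadjusted OR and a Wald interval from a 2x2 table. It is a simplification under stated assumptions, not the weighted logistic regression described above: there are no survey weights or covariates, the function name `odds_ratio_wald_ci` is hypothetical, and the counts are invented purely for demonstration.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: cases in the subgroup of interest
    b: non-cases in the subgroup of interest
    c: cases in the reference subgroup
    d: non-cases in the reference subgroup
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table (Woolf's method).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
or_, lo, hi = odds_ratio_wald_ci(a=40, b=960, c=10, d=990)
print(f"OR {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

With a single binary predictor and no weights, a logistic regression would give the same point estimate as this table-based OR; weighting and covariate adjustment change both the estimate and its variance.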

Results:

We included 551,934 and 47,130 women from the discovery and external cohorts, respectively. After identifying five CCP subgroups, we labelled them as (0) healthy, (1) early onset, (2) screening-targeted, (3) late onset, and (4) carcinoma-specific. In external validation, CCP subgroups were similar across datasets. In internal and external diagnostic validity analyses, women in CCP2-4 exhibited differential and increased risks of both CIN2+ (CCP2: OR 5.54, 95% CI 3.27-8.86; CCP3 and 4: OR 26.56, 95% CI 24.44-28.88) and CIN3+. CCP-specific risks of CIN2+/CIN3+ were evident in almost all subgroups. We proposed a computational phenomapping strategy and developed a prototype app to promote translation into real-world screening.
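For readers who want to reuse the reported intervals, for example in a meta-analysis, the sketch below shows one common way to back out an approximate standard error of log(OR) from an OR and its 95% CI. It assumes the interval is a Wald interval that is symmetric on the log scale, which the abstract does not state; the helper name `log_scale_summary` is hypothetical. Under that assumption the geometric mean of the CI limits should sit near the point estimate.

```python
import math


def log_scale_summary(lower, upper, z=1.96):
    """Approximate SE of log(OR) and the log-scale midpoint of a CI.

    Assumes a Wald interval: log(OR) +/- z * SE.
    """
    se_log_or = (math.log(upper) - math.log(lower)) / (2 * z)
    midpoint = math.sqrt(lower * upper)  # geometric mean of the limits
    return se_log_or, midpoint


# Values as reported in the Results paragraph for CIN2+.
reported = {
    "CCP2": (5.54, 3.27, 8.86),
    "CCP3 and 4": (26.56, 24.44, 28.88),
}

summaries = {}
for label, (odds_ratio, lower, upper) in reported.items():
    se, mid = log_scale_summary(lower, upper)
    summaries[label] = (se, mid)
    print(f"{label}: reported OR {odds_ratio}, "
          f"log-scale midpoint {mid:.2f}, SE(log OR) {se:.3f}")
```

A midpoint that differs somewhat from the reported OR would simply suggest the interval was not computed as a symmetric Wald interval (for instance because of weighting or a different variance estimator), so the recovered SE should be treated as approximate.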

Conclusions:

Across six data sources, multiple machine learning algorithms, and multiple validation methods, we identified five CCP subgroups with good accuracy and diagnostic validity for CIN2+/CIN3+ within and across cohorts, and proposed a triple screening strategy. This new substratification and strategy might offer global potential to tailor and target adequate follow-up surveillance visits and early treatment, prioritizing those in greatest need, and thereby facilitate precision medicine towards cervical cancer elimination.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.