Accepted for/Published in: JMIR Public Health and Surveillance

Date Submitted: Jan 23, 2025
Date Accepted: Jun 6, 2025

The final, peer-reviewed published version of this preprint can be found here:

Identifying Predictors of Cervical Cancer Screening Uptake in Sub-Saharan Africa Using Machine Learning: Cross-Sectional Study

Baykemagn ND, Aweke MN, Mesfin A, Baffa LD, Agimas MC, Abuhay HW, Adugna DG, Alemu TG, Bicha AT, Alemu GG

JMIR Public Health Surveill 2025;11:e71677

DOI: 10.2196/71677

PMID: 40961361

PMCID: 12443358

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Machine Learning-Based Prediction of Determinants for Cervical Cancer Screening Among Women Aged 30-49 in Sub-Saharan Africa

  • Nebebe Demis Baykemagn
  • Mekuriaw Nibret Aweke
  • Amare Mesfin
  • Lemlem Daniel Baffa
  • Muluken Chanie Agimas
  • Habtamu Wagnew Abuhay
  • Dagnew Getnet Adugna
  • Tewodros Getaneh Alemu
  • Alemu Teshale Bicha
  • Gebrie Getu Alemu

ABSTRACT

Background:

Cervical cancer is the fourth most prevalent cancer in women, with 660,000 new cases and 350,000 deaths in 2022. If early screening is effectively implemented, it could reduce the overall number of cervical cancer cases by up to 80%, prevent more than 40% of new cases, and save 5 million lives. Analyzing large datasets effectively and using them for decision-making now depends heavily on machine learning.

Objective:

This study aimed to assess a machine learning-based prediction model and identify the key determinants influencing cervical cancer screening uptake among women aged 30-49 years in Sub-Saharan Africa.

Methods:

A weighted dataset of 33,952 observations from the 2022 Demographic and Health Survey (DHS) in Ghana, Kenya, Mozambique, and Tanzania was used. STATA version 17 and Python 3.10 were used for data preprocessing and analysis. MinMax and Standard scalers were applied for feature scaling, and Recursive Feature Elimination (RFE) was used for feature selection. The data were split into training and test sets at an 80:20 ratio. Class imbalance was handled using Tomek Links combined with Random Over-Sampling. Seven models were selected and trained on both the balanced and the unbalanced datasets. Models were evaluated using ROC-AUC, accuracy, and the confusion matrix.
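The splitting, scaling, and resampling steps named above can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code: the function names, the toy one-feature data, the fixed seed, and the order of the steps are all assumptions made for illustration, and RFE and model training are omitted. A Tomek link is taken here to be a pair of mutual nearest neighbours with different labels, from which the majority-class member is dropped.

```python
import random
from math import dist


def split_80_20(rows, labels, seed=0):
    """Shuffle row indices and split them 80:20 into train and test sets."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(0.8 * len(idx))
    pick = lambda ids: ([rows[i] for i in ids], [labels[i] for i in ids])
    return pick(idx[:cut]) + pick(idx[cut:])  # X_train, y_train, X_test, y_test


def fit_min_max(rows):
    """Learn the per-feature minimum and maximum from the given rows."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_min_max(rows, lo, hi):
    """Rescale each feature to [0, 1]; constant features map to 0."""
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in rows]


def nearest(i, rows):
    """Index of the row closest to rows[i] (Euclidean distance)."""
    others = (j for j in range(len(rows)) if j != i)
    return min(others, key=lambda j: dist(rows[i], rows[j]))


def remove_tomek_links(rows, labels, majority):
    """Drop the majority-class member of every Tomek link (binary labels)."""
    nn = [nearest(i, rows) for i in range(len(rows))]
    drop = set()
    for i, j in enumerate(nn):
        if nn[j] == i and labels[i] != labels[j]:
            drop.add(i if labels[i] == majority else j)
    keep = [i for i in range(len(rows)) if i not in drop]
    return [rows[i] for i in keep], [labels[i] for i in keep]


def random_oversample(rows, labels, minority, seed=0):
    """Duplicate random minority rows until both classes are the same size."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    deficit = len(labels) - 2 * len(minority_idx)  # majority minus minority
    extra = [rng.choice(minority_idx) for _ in range(max(deficit, 0))]
    return rows + [rows[i] for i in extra], labels + [labels[i] for i in extra]


# Toy data (hypothetical, not from the DHS): 1 = screened, 0 = not screened.
toy_X = [[0.0], [0.1], [0.2], [0.3], [0.52], [0.5], [1.0]]
toy_y = [0, 0, 0, 0, 0, 1, 1]

cleaned_X, cleaned_y = remove_tomek_links(toy_X, toy_y, majority=0)
balanced_X, balanced_y = random_oversample(cleaned_X, cleaned_y, minority=1)
print(len(toy_y), len(cleaned_y), len(balanced_y))
```

In practice, resampling of this kind is usually applied to the training portion only, so that the test set keeps the original class distribution; whether the study did so is not stated in the abstract.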

Results:

The Random Forest classifier ranked best among the seven algorithms for predicting cervical cancer screening uptake, with an ROC accuracy of 78%, an AUC of 86%, and a confusion matrix score of 72.7% on the test data. It showed that wealth status, awareness of STIs, HIV testing, age at first sex, primary education and above, and living in urban areas were significant factors associated with increased cervical cancer screening. In contrast, not owning a smartphone, having a single sexual partner, and unknown health status were associated with decreased cervical cancer screening.
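For readers unfamiliar with the evaluation measures named above, the following minimal sketch shows how accuracy follows from confusion matrix counts and how AUC can be computed as the share of positive-negative pairs that the model ranks correctly. The labels, scores, and 0.5 threshold are toy values chosen for illustration; they are not the study's data, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels coded 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    return tn, fp, fn, tp


def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    return (tn + tp) / (tn + fp + fn + tp)


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): predicted probabilities of being screened.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(confusion_counts(y_true, y_pred))
print(accuracy(y_true, y_pred), roc_auc(y_true, scores))
```

Accuracy depends on the chosen threshold, whereas AUC depends only on how the scores rank the two classes, which is why the two measures can differ for the same model.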

Conclusions:

To promote cervical cancer screening in Africa, it is recommended to focus on education and awareness campaigns, establish outreach programs, begin screening at the health post or community level, and address the digital divide.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.