Accepted for/Published in: JMIR Formative Research

Date Submitted: Sep 20, 2022
Date Accepted: Feb 7, 2023

The final, peer-reviewed published version of this preprint can be found here:

Predicting Measles Outbreaks in the United States: Evaluation of Machine Learning Approaches

Ru B, Kujawski S, Lee Afanador N, Baumgartner R, Pawaskar M, Das A

Predicting Measles Outbreaks in the United States: Evaluation of Machine Learning Approaches

JMIR Form Res 2023;7:e42832

DOI: 10.2196/42832

PMID: 37014694

PMCID: 10131820

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Predicting Measles Outbreaks in the United States: Evaluation of Machine Learning Approaches

  • Boshu Ru
  • Stephanie Kujawski
  • Nelson Lee Afanador
  • Richard Baumgartner
  • Manjiri Pawaskar
  • Amar Das

ABSTRACT

Background:

Measles is resurging in the US, driven by international importation and declining domestic vaccination coverage. Improved methods to predict outbreaks at the county level would facilitate the optimal allocation of public health resources.

Objective:

We aimed to develop and compare supervised, unsupervised, and hybrid machine learning models to identify US counties at risk of measles outbreaks.

Methods:

We constructed a supervised machine learning model based on eXtreme Gradient Boosting (XGBoost) and unsupervised models based on Hierarchical Density-based Spatial Clustering of Applications with Noise (HDBSCAN) and unsupervised Random Forest (uRF). The unsupervised models were used to investigate clustering patterns among counties with measles outbreaks; these clustering data were also incorporated into hybrid XGBoost models as additional input variables. The machine learning models were then compared to weighted logistic regression models with and without input from the unsupervised models.
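The hybrid setup described above, in which cluster assignments from an unsupervised model become additional input variables for a supervised model, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say how cluster membership was encoded, so one-hot encoding is an assumption, as are the function name `add_cluster_features`, the toy feature values, and the cluster labels. The label -1 follows the HDBSCAN convention of marking noise points.

```python
def add_cluster_features(rows, cluster_labels):
    """Append one-hot cluster indicators to each feature row.

    rows: list of lists of numeric features, one per county (hypothetical).
    cluster_labels: one integer label per row; -1 marks noise, as in HDBSCAN.
    """
    if len(rows) != len(cluster_labels):
        raise ValueError("rows and cluster_labels must have the same length")
    # Fixed, sorted column order so every row gets the same layout.
    clusters = sorted(set(cluster_labels))
    augmented = []
    for row, label in zip(rows, cluster_labels):
        one_hot = [1 if label == c else 0 for c in clusters]
        augmented.append(list(row) + one_hot)
    return augmented, clusters


# Toy data: two made-up features per county (not taken from the study).
county_features = [[0.91, 12.0], [0.84, 3.5], [0.95, 0.7], [0.80, 8.1]]
labels = [0, 1, -1, 1]  # hypothetical cluster assignments

hybrid_rows, cluster_order = add_cluster_features(county_features, labels)
```

The augmented rows would then be passed to the supervised model in place of the original features.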

Results:

Both HDBSCAN and uRF identified clusters that included a high percentage of counties with measles outbreaks. XGBoost and its hybrid models outperformed weighted logistic regression and its hybrid models, with area under the receiver operating characteristic curve (AUROC) values of 0.920–0.926 versus 0.900–0.908, area under the precision–recall curve (AUPRC) values of 0.522–0.532 versus 0.485–0.513, and F2 scores of 0.595–0.601 versus 0.385–0.426. Weighted logistic regression and its hybrid models had higher sensitivity than XGBoost and its hybrid models (0.837–0.857 versus 0.704–0.735) but lower positive predictive value (0.122–0.141 versus 0.340–0.367) and specificity (0.793–0.821 versus 0.952–0.958). The hybrid versions of the weighted logistic regression and XGBoost models had slightly higher AUPRC, specificity, and positive predictive values than the respective models that did not include any unsupervised features.
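For reference, the F2 score weights sensitivity (recall) more heavily than positive predictive value (precision). The sketch below applies the standard F-beta formula with beta = 2. Pairing the lower ends of the sensitivity and positive predictive value ranges reported for weighted logistic regression is an assumption, since the abstract gives only ranges and not per-model values; the function name `f_beta` is illustrative.

```python
def f_beta(precision, recall, beta=2.0):
    """Standard F-beta score; beta > 1 favors recall over precision."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denom = beta**2 * precision + recall
    if denom == 0:
        return 0.0  # conventional value when both inputs are zero
    return (1 + beta**2) * precision * recall / denom


# Lower ends of the ranges reported for weighted logistic regression
# (assumed here to belong to the same model; the abstract does not say so).
wlr_f2 = f_beta(precision=0.122, recall=0.837)
print(round(wlr_f2, 3))  # 0.385
```

Under that pairing assumption, the result agrees with the lower end of the reported F2 range for weighted logistic regression (0.385–0.426).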

Conclusions:

XGBoost provided more accurate predictions of measles cases at the county level than weighted logistic regression. The threshold of prediction in this model can be adjusted to align with each county’s resources, priorities, and measles risk. While clustering pattern data from unsupervised machine learning approaches improved some aspects of model performance in this imbalanced data set, the optimal approach for integrating such approaches with supervised machine learning models requires further investigation.

Clinical Trial:

N/A




© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.