JMIR Preprints #42832: Predicting Measles Outbreaks in the United States: Evaluation of Machine Learning Approaches


Predicting Measles Outbreaks in the United States: Evaluation of Machine Learning Approaches

Boshu Ru;
Stephanie Kujawski;
Nelson Lee Afanador;
Richard Baumgartner;
Manjiri Pawaskar;
Amar Das

ABSTRACT

Background:

Measles is resurging in the US, driven by international importation and declining domestic vaccination coverage. Improved methods to predict outbreaks at the county level would facilitate the optimal allocation of public health resources.

Objective:

We aimed to develop and compare supervised, unsupervised, and hybrid machine learning models to identify US counties at risk of measles outbreaks.

Methods:

We constructed a supervised machine learning model based on eXtreme Gradient Boosting (XGBoost) and unsupervised models based on Hierarchical Density-based Spatial Clustering of Applications with Noise (HDBSCAN) and unsupervised Random Forest (uRF). The unsupervised models were used to investigate clustering patterns among counties with measles outbreaks; these clustering data were also incorporated into hybrid XGBoost models as additional input variables. The machine learning models were then compared to weighted logistic regression models with and without input from the unsupervised models.
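The hybrid idea described above (cluster assignments from an unsupervised model added as an extra input to a supervised classifier) can be sketched roughly as follows. This is an illustration only, not the authors' pipeline: the data are synthetic, scikit-learn's DBSCAN stands in for HDBSCAN, GradientBoostingClassifier stands in for XGBoost, and all variable names and parameter values are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for county-level predictors (hypothetical, not study data).
rng = np.random.default_rng(0)
n_counties = 600
X = rng.normal(size=(n_counties, 4))

# Imbalanced synthetic outcome: only a minority of "counties" have an outbreak.
logit = 2.0 * X[:, 0] + 1.5 * X[:, 1] - 3.5
y = (rng.random(n_counties) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Unsupervised step: cluster counties without using the outcome labels.
# DBSCAN is a stand-in for HDBSCAN; a label of -1 marks noise points.
cluster_labels = DBSCAN(eps=0.9, min_samples=5).fit_predict(X)

# Hybrid step: append the cluster label as one additional input variable.
X_hybrid = np.column_stack([X, cluster_labels])

X_train, X_test, y_train, y_test = train_test_split(
    X_hybrid, y, test_size=0.3, stratify=y, random_state=0
)

# Supervised step: gradient boosting as a stand-in for XGBoost.
model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]

# Average precision is a common estimate of the area under the
# precision-recall curve for imbalanced outcomes.
auprc = average_precision_score(y_test, scores)
print(f"AUPRC on held-out synthetic counties: {auprc:.3f}")
```

In this sketch the clustering is fitted on all rows before the train/test split for brevity; whether the study handled this differently is not stated in the abstract.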

Results:

Both HDBSCAN and uRF identified clusters that included a high percentage of counties with measles outbreaks. XGBoost and its hybrid models outperformed weighted logistic regression and its hybrid models, with area under the receiver operating characteristic curve values of 0.920–0.926 versus 0.900–0.908, area under the precision–recall curve (AUPRC) values of 0.522–0.532 versus 0.485–0.513, and F2 scores of 0.595–0.601 versus 0.385–0.426. Weighted logistic regression and its hybrid models had higher sensitivity than XGBoost and its hybrid models (0.837–0.857 versus 0.704–0.735) but lower positive predictive value (0.122–0.141 versus 0.340–0.367) and specificity (0.793–0.821 versus 0.952–0.958). The hybrid versions of the weighted logistic regression and XGBoost models had slightly higher AUPRC, specificity, and positive predictive values than the respective models that did not include any unsupervised features.
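For readers unfamiliar with the F2 score reported above, it is the F-beta score with beta = 2, which weights sensitivity (recall) more heavily than positive predictive value (precision). The short sketch below computes it from the standard definition. Pairing the lower ends and the upper ends of the reported weighted logistic regression ranges is an assumption made for illustration, since the abstract does not say which values belong to the same model.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score: weighted harmonic mean of precision and recall.

    beta > 1 weights recall (sensitivity) more than precision (PPV).
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0
    beta_sq = beta * beta
    return (1.0 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Weighted logistic regression, lower ends of the reported ranges
# (PPV 0.122, sensitivity 0.837); the pairing is an assumption.
f2_low = f_beta(0.122, 0.837)

# Weighted logistic regression, upper ends of the reported ranges
# (PPV 0.141, sensitivity 0.857); the pairing is an assumption.
f2_high = f_beta(0.141, 0.857)

print(f"F2 (lower ends): {f2_low:.3f}")   # approximately 0.385
print(f"F2 (upper ends): {f2_high:.3f}")  # approximately 0.425
```

Both values fall close to the F2 range of 0.385–0.426 reported for weighted logistic regression and its hybrid models, which shows how a high sensitivity can coexist with a modest F2 score when the positive predictive value is low.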

Conclusions:

XGBoost provided more accurate predictions of measles cases at the county level compared with weighted logistic regression. The threshold of prediction in this model can be adjusted to align with each county’s resources, priorities, and measles risk. While clustering pattern data from unsupervised machine learning approaches improved some aspects of model performance in this imbalanced data set, the optimal approach for integration of such approaches with supervised machine learning models requires further investigation.

Clinical Trial: N/A
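The threshold adjustment mentioned in the Conclusions can be illustrated with a minimal sketch. The scores and labels below are made-up toy values, not study data, and the function names are hypothetical. The example shows the general trade-off: lowering the threshold flags more counties (higher sensitivity, lower positive predictive value), while raising it does the opposite.

```python
def sensitivity_ppv(scores, labels, threshold):
    """Return (sensitivity, PPV) when scores >= threshold are flagged positive."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif not flagged and label == 1:
            fn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, ppv


# Toy predicted risks and true outbreak labels for eight hypothetical counties.
scores = [0.95, 0.80, 0.65, 0.40, 0.35, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0]

# A low threshold favors sensitivity; a high threshold favors PPV.
low_threshold = sensitivity_ppv(scores, labels, 0.3)
high_threshold = sensitivity_ppv(scores, labels, 0.7)

print("threshold 0.3 -> sensitivity, PPV:", low_threshold)
print("threshold 0.7 -> sensitivity, PPV:", high_threshold)
```

A county with limited follow-up capacity might prefer the higher threshold to reduce false alarms, whereas a county prioritizing early detection might accept more false positives at the lower threshold.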

Citation

Please cite as:

Ru B, Kujawski S, Lee Afanador N, Baumgartner R, Pawaskar M, Das A

Predicting Measles Outbreaks in the United States: Evaluation of Machine Learning Approaches

JMIR Form Res 2023;7:e42832

DOI: 10.2196/42832

PMID: 37014694

PMCID: 10131820


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Formative Research

Date Submitted: Sep 20, 2022

Date Accepted: Feb 7, 2023
