Accepted for/Published in: JMIR Infodemiology
Date Submitted: Apr 3, 2025
Date Accepted: Sep 5, 2025
Hand, foot and mouth disease risk prediction in southern China: a multivariate analysis integrating internet search and epidemiological surveillance data
ABSTRACT
Background:
Hand, foot, and mouth disease (HFMD) is a global health concern, and its prevention and control require a risk assessment framework grounded in a systematic analysis of contributing factors.
Objective:
This study aims to construct a comprehensive HFMD risk assessment framework by integrating multisource data, including historical incidence, environmental parameters, and internet search data.
Methods:
We integrated multisource data (HFMD cases, meteorology, air pollution, Baidu Index, and public health measures) from Bao’an District, Shenzhen, in southern China (2014-2023). Correlation analysis was used to assess the associations between HFMD incidence and systematic factors. The impacts of environmental factors were analyzed using the Distributed Lag Non-linear Model. A seasonal autoregressive integrated moving average (SARIMA) model and advanced machine learning methods were used to predict HFMD incidence 1 to 4 weeks ahead. Risk levels for the 1- to 4-week-ahead forecasts were determined by comparing the predicted weekly incidence against predefined thresholds.
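The threshold comparison described in the Methods can be illustrated with a minimal Python sketch. The cut-off values, the level names, the function name `risk_level`, and the forecast numbers are all assumptions made for illustration; the abstract does not state the actual thresholds or the number of risk levels.

```python
from bisect import bisect_right

# Hypothetical cut-offs (cases per week). The abstract does not report the
# thresholds actually used, so these values are placeholders.
THRESHOLDS = [50.0, 150.0, 300.0]
# Hypothetical level names, one more than the number of cut-offs.
LEVELS = ["low", "moderate", "high", "very high"]


def risk_level(predicted_incidence: float) -> str:
    """Map a predicted weekly incidence to a risk level.

    A value equal to a cut-off is assigned to the higher level.
    """
    return LEVELS[bisect_right(THRESHOLDS, predicted_incidence)]


# Made-up forecasts keyed by horizon in weeks (1 to 4 weeks ahead).
forecasts = {1: 42.0, 2: 95.5, 3: 180.0, 4: 310.2}
levels = {horizon: risk_level(value) for horizon, value in forecasts.items()}
```

Keeping the cut-offs in a sorted list means a change in the number of levels only requires editing `THRESHOLDS` and `LEVELS` together.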
Results:
Apart from sulfur dioxide, the environmental factors examined significantly influenced HFMD incidence in various nonlinear ways. SARIMA using only incidence data performed best for the 1-week-ahead forecast, with a coefficient of determination (R²) of 0.95. For the 2- to 4-week-ahead forecasts, the machine learning methods incorporating systematic factors achieved the best performance, with R² values of 0.83, 0.75, and 0.61, respectively. Additionally, the predicted risk levels matched the actual risk levels with accuracy rates of 96%, 87%, 88%, and 83% for the 1- to 4-week-ahead forecasts, respectively.
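For readers unfamiliar with the two metrics reported above, the following Python sketch shows one common way to compute them. It assumes R² is defined as 1 minus the ratio of residual to total sum of squares and that accuracy is the share of weeks whose predicted risk level equals the observed one; the abstract does not confirm either definition, and the function names and sample values are hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def level_accuracy(actual_levels, predicted_levels):
    """Fraction of weeks where the predicted risk level matches the observed one."""
    matches = sum(a == p for a, p in zip(actual_levels, predicted_levels))
    return matches / len(actual_levels)


# Made-up weekly incidence values, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
forecast = [12.0, 18.0, 33.0, 39.0]
r2 = r_squared(observed, forecast)

# Made-up risk levels for the same four weeks.
observed_levels = ["low", "low", "high", "high"]
forecast_levels = ["low", "high", "high", "high"]
accuracy = level_accuracy(observed_levels, forecast_levels)
```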
Conclusions:
HFMD incidence is influenced by systematic factors in a nonlinear way. For short-term HFMD incidence prediction, the SARIMA model stands out, whereas advanced machine learning methods incorporating systematic factors perform better in mid-term forecasts. The first 1- to 4-week-ahead risk level assessment index is established, with good accuracy.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.