Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Jan 26, 2025
Date Accepted: Jun 26, 2025
A Deep Learning Framework for Influenza-Like Illness Prediction: Distinguishing Epidemic and Non-Epidemic Seasons with Search Engine Data
ABSTRACT
Background:
Seasonal influenza epidemics pose a persistent and severe threat to global public health. Internet search data are recognized as a valuable source for forecasting influenza and other respiratory tract infection epidemics. However, current influenza prediction studies typically focus on seasonal trends in traditional surveillance data and overlook how the sensitivity of different internet search terms varies with the season, which makes prediction more difficult.
Objective:
This study aimed to develop a deep learning framework that predicts the influenza-like illness rate (ILI%) separately for different influenza epidemic states, using the Baidu index together with ILI%.
Methods:
Official weekly ILI% data from 2013 to 2024 were extracted from the Chinese National Notifiable Infectious Disease Reporting System (NIDRIS). Influenza-related search indexes for the same period were obtained from the Baidu index. A cross-correlation analysis was conducted to explore the association between influenza-related search queries and ILI%. The study period was divided into influenza epidemic and non-epidemic seasons, and the categories of influenza-related Baidu search terms relevant to each period were identified. Finally, a convolutional long short-term memory network (CLSTM) framework was used to predict ILI% at lags of 1-3 weeks for the entire study period, the epidemic season, and the non-epidemic season. The evaluation metrics were R-squared (R2), mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE).
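The abstract does not specify the cross-correlation estimator. A minimal sketch follows, assuming it is the Pearson correlation between a weekly search index and ILI% shifted forward by 0-3 weeks; the function names `pearson` and `lagged_correlations` and the toy series are hypothetical, for illustration only.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlations(
    search: Sequence[float], ili: Sequence[float], max_lag: int = 3
) -> Dict[int, float]:
    """Correlate search[t] with ili[t + lag] for lag = 0..max_lag.

    A positive lag means the search index leads ILI% by `lag` weeks.
    """
    return {
        lag: pearson(search[: len(search) - lag], ili[lag:])
        for lag in range(max_lag + 1)
    }


# Hypothetical toy data: ILI% trails the search index by exactly two weeks.
weekly_search = [1, 3, 2, 5, 4, 6, 8, 7, 9, 10]
weekly_ili = [0, 0] + weekly_search[:-2]
r = lagged_correlations(weekly_search, weekly_ili)
best_lag = max(r, key=r.get)  # 2, where the correlation is 1.0
```

Search terms whose correlation peaks at a positive lag are the ones that could, in principle, serve as leading indicators for a 1-3 week forecast.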
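The abstract does not describe how seasons are labeled or how the CLSTM inputs are shaped. The sketch below shows one plausible way to prepare the data, assuming a per-week boolean epidemic flag, a fixed look-back window, and reading the "1-3 week lag" as the forecast horizon. The names `make_windows` and `seasonal_datasets` are hypothetical, and the network itself is not shown. Passing `search=None` yields the ILI%-only baseline used for comparison.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# (inputs, targets): inputs[i] is a list of per-week feature vectors.
Samples = Tuple[List[List[List[float]]], List[float]]


def make_windows(
    ili: Sequence[float],
    search: Optional[Sequence[float]],
    window: int,
    horizon: int,
) -> Samples:
    """Build (input window, target) pairs for horizon-week-ahead forecasting.

    Each input covers `window` consecutive weeks; each week contributes the
    feature vector [ILI%] or, when `search` is given, [ILI%, search index].
    The target is ILI% `horizon` weeks after the last week of the window.
    """
    X: List[List[List[float]]] = []
    y: List[float] = []
    for end in range(window, len(ili) - horizon + 1):
        weeks = range(end - window, end)
        if search is None:
            X.append([[ili[t]] for t in weeks])
        else:
            X.append([[ili[t], search[t]] for t in weeks])
        y.append(ili[end - 1 + horizon])
    return X, y


def seasonal_datasets(
    ili: Sequence[float],
    search: Optional[Sequence[float]],
    is_epidemic: Sequence[bool],
    window: int,
    horizon: int,
) -> Dict[bool, Samples]:
    """Separate sample sets for epidemic (True) and non-epidemic (False) weeks.

    Windows and their targets are built only inside contiguous runs of weeks
    that share the same season label, so no sample mixes the two states.
    """
    data: Dict[bool, Samples] = {True: ([], []), False: ([], [])}
    start = 0
    for i in range(1, len(ili) + 1):
        # Close the current run at the end of the series or when the label flips.
        if i == len(ili) or is_epidemic[i] != is_epidemic[start]:
            seg_search = None if search is None else search[start:i]
            X, y = make_windows(ili[start:i], seg_search, window, horizon)
            label = bool(is_epidemic[start])
            data[label][0].extend(X)
            data[label][1].extend(y)
            start = i
    return data
```

Calling `seasonal_datasets` once per horizon (1, 2, 3) would give one sample set per season and lag, to each of which a separate model could be fitted. Keeping windows inside a single run is a design choice of this sketch, not something the abstract states.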
Results:
ILI% showed a regular seasonal pattern of high incidence in China. Search terms related to essential facts about influenza, symptoms, treatment, and prevention were highly correlated with ILI%. The public paid more attention to the “Influenza Essential Fact” category during the epidemic season and to the “Influenza Treatment” category during the non-epidemic season. Predicting ILI% separately for the epidemic and non-epidemic seasons (MAPE = 10.730, MSE = 0.884, MAE = 0.649, RMSE = 0.940, R2 = 0.877) outperformed prediction over the entire period (MAPE = 12.784, MSE = 1.513, MAE = 0.744, RMSE = 1.230, R2 = 0.786). In addition, combining ILI% with the Baidu search index yielded better predictions than ILI% alone, regardless of the study period and lag time.
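The five metrics above are reported without formulas. A minimal sketch using their standard definitions follows; it assumes R2 is the coefficient of determination (1 minus residual over total sum of squares) and that MAPE is expressed in percent, neither of which the abstract confirms. The `evaluate` name is hypothetical, and MAPE requires every observed ILI% to be nonzero.

```python
import math
from typing import Dict, Sequence


def evaluate(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Standard regression metrics for observed vs. predicted ILI%."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE in percent; undefined if any observed value is zero.
    mape = 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "MAPE": mape,
        "MSE": mse,
        "MAE": mae,
        "RMSE": math.sqrt(mse),
        "R2": 1 - ss_res / ss_tot,
    }


metrics = evaluate([2.0, 4.0, 6.0, 8.0], [3.0, 4.0, 5.0, 8.0])
# MSE = 0.5, MAE = 0.5, R2 = 0.9
```

Because RMSE is the square root of MSE, the reported pairs can be cross-checked: sqrt(0.884) is about 0.940 and sqrt(1.513) is about 1.230, matching the values given above.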
Conclusions:
This study shows the strong potential of influenza prediction that combines Baidu index data with traditional surveillance data and uses season-specific keywords for epidemic and non-epidemic periods. It offers a new perspective for public health preparedness and is expected to support early warning systems for influenza and other diseases. Future work will further optimize these models to deliver more timely and accurate predictions and strengthen public health responses.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.