Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: May 27, 2023
Open Peer Review Period: May 27, 2023 - Jul 22, 2023
Date Accepted: Sep 26, 2023

The final, peer-reviewed published version of this preprint can be found here:

Early Warning and Prediction of Scarlet Fever in China Using the Baidu Search Index and Autoregressive Integrated Moving Average With Explanatory Variable (ARIMAX) Model: Time Series Analysis

Huang J, Luo T, Zhou J, Yang J, Xie Y, Wei Y, Mai H, Lu D, Yang Y, Cui P, Ye L, Liang H

J Med Internet Res 2023;25:e49400

DOI: 10.2196/49400

PMID: 37902815

PMCID: 10644180

Early warning and prediction of scarlet fever using Baidu search index combined with ARIMAX model in China

  • Jiegang Huang
  • Tingyan Luo
  • Jie Zhou
  • Jing Yang
  • Yulan Xie
  • Yiru Wei
  • Huanzhuo Mai
  • Dongjia Lu
  • Yuecong Yang
  • Ping Cui
  • Li Ye
  • Hao Liang

ABSTRACT

Background:

Internet-derived data and the autoregressive integrated moving average (ARIMA) model are widely used for infectious disease surveillance. However, the effectiveness of the Baidu Search Index (BSI) in predicting the incidence of scarlet fever remains uncertain.

Objective:

Our objective was to investigate whether a low-cost BSI monitoring system could serve as a valuable complement to traditional scarlet fever surveillance in China.

Methods:

ARIMA and ARIMAX models were developed to predict the incidence of scarlet fever in China using data from the National Health Commission of the People's Republic of China for January 2011 to August 2022. The procedure comprised four steps: (1) establishing a keyword database; (2) selecting and filtering keywords using Spearman rank correlation and cross-correlation analysis; (3) constructing the scarlet fever comprehensive search index (CSI); and (4) fitting the models on the training sets, predicting on the testing sets, and comparing prediction performance.
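The keyword screening in step (2) can be illustrated with a minimal sketch. It assumes monthly series of equal length and uses a lagged Spearman correlation as a simplified stand-in for the cross-correlation analysis. The function names, the 0.6 threshold, the lag range, and the toy numbers are all hypothetical and are not taken from the study.

```python
from math import sqrt

def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; assumes neither series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def lagged_spearman(search, cases, max_lag):
    """Correlate search volume in month t with cases in month t + lag."""
    out = {}
    for lag in range(max_lag + 1):
        shifted_search = search[:len(search) - lag] if lag else search
        out[lag] = spearman(shifted_search, cases[lag:])
    return out

def select_keywords(keyword_series, cases, threshold=0.6, max_lag=1):
    """Keep keywords whose best lagged correlation reaches the threshold."""
    selected = {}
    for keyword, series in keyword_series.items():
        by_lag = lagged_spearman(series, cases, max_lag)
        best_lag = max(by_lag, key=by_lag.get)
        if by_lag[best_lag] >= threshold:
            selected[keyword] = (best_lag, by_lag[best_lag])
    return selected

# Toy numbers for illustration only; these are not study data.
cases = [10, 20, 35, 50, 40, 25, 15, 30, 45, 60]
keyword_series = {
    "keyword_a": [2 * c + 1 for c in cases],  # rises and falls with cases
    "keyword_b": [100 - c for c in cases],    # moves opposite to cases
}
selected = select_keywords(keyword_series, cases)
```

A keyword that peaks before reported cases would show its highest correlation at a positive lag, which is the property an early warning index relies on.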

Results:

The average monthly incidence of scarlet fever was 4462.17±3011.75 cases, and the annual incidence showed an upward trend until 2019. The keyword database contained 52 keywords, of which only 6 highly relevant ones were selected for modeling. A high correlation was observed between reported scarlet fever cases and the scarlet fever CSI (rs=0.881). We developed three models: an ARIMA(4,0,0)(0,1,2)[12] model and two models that incorporated the BSI, namely a multiple ARIMA(4,0,0)(0,1,2)[12] + CSI (lag=0) model and an ARIMAX(1,0,2)(2,0,0)[12] model. All three models fit the data well and passed the Ljung-Box test on the residuals (P>.05). All models demonstrated favorable predictive capability, with mean absolute errors of 1692.16 (95% CI 584.88, 2799.44), 1067.89 (95% CI 402.02, 1733.76), and 639.75 (95% CI 188.12, 1091.38), respectively; root mean squared errors of 2036.92 (95% CI 929.64, 3144.20), 1224.92 (95% CI 559.04, 1890.79), and 830.80 (95% CI 379.17, 1282.43), respectively; and absolute percentage errors of 4.33% (95% CI 0.54%, 8.13%), 3.36% (95% CI 0.54%, 8.13%), and 2.16% (95% CI -0.69%, 5.00%), respectively. However, the two models that incorporated the BSI outperformed the ARIMA model, with smaller values on all three error measures.
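For readers unfamiliar with the three error measures, the following minimal sketch shows how they are commonly computed from observed and predicted monthly counts. It assumes that the "absolute percentage error" reported here is the mean absolute percentage error. The function names and the toy numbers are illustrative and are not the study's data.

```python
from math import sqrt

def mean_absolute_error(actual, predicted):
    """Average size of the prediction errors, in cases."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """Square root of the average squared error; penalizes large misses."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return sqrt(squared / len(actual))

def mean_absolute_percentage_error(actual, predicted):
    """Average error relative to the observed value, in percent.

    Assumes no observed value is zero.
    """
    ratios = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100 * ratios / len(actual)

# Toy observed and predicted monthly counts (not study data).
observed = [100, 200, 400]
forecast = [110, 180, 400]

mae = mean_absolute_error(observed, forecast)
rmse = root_mean_squared_error(observed, forecast)
mape = mean_absolute_percentage_error(observed, forecast)
```

Lower values indicate better predictions on all three measures, which is the basis for ranking the models.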

Conclusions:

This study demonstrated that the BSI can be utilized for the early warning and prediction of scarlet fever, serving as a valuable supplement to traditional surveillance systems.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.