Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: May 22, 2022
Open Peer Review Period: May 15, 2022 - Jul 10, 2022
Date Accepted: Feb 19, 2023
Predicting Norovirus in England using Existing and Emerging Syndromic Data: Infodemiology study
ABSTRACT
Background:
Norovirus is associated with approximately 18% of the global burden of gastroenteritis and affects all age groups. There is currently no licensed vaccine or available antiviral treatment. However, well-designed early warning systems and forecasting can guide non-pharmaceutical approaches to norovirus infection prevention and control.
Objective:
This study evaluates how well existing syndromic surveillance data and emerging data sources, such as internet searches and Wikipedia page views, predict norovirus activity across a range of age groups in England.
Methods:
We compared laboratory data with existing syndromic surveillance and emerging syndromic data. First, we used the Granger causality framework to assess whether individual syndromic variables precede changes in norovirus laboratory reports in a given region or age group. Then, we used random forest modeling to estimate the importance of the variables with two methods: 1) change in the mean squared error; and 2) node purity. Finally, these results were combined into a visualization indicating the most influential predictors for norovirus laboratory reports in a specific age group and region.
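The first step above, checking whether a syndromic variable precedes changes in laboratory reports, can be illustrated with a minimal sketch of a Granger-style F test. This is not the authors' implementation: the data are synthetic, the names `syndromic` and `lab_reports` are hypothetical, and the one-week lag is an assumption made only for illustration. The sketch compares a model that predicts laboratory reports from their own past with one that also includes the lagged syndromic series; a large F statistic suggests that the syndromic series adds predictive information.

```python
import random


def ols_rss(rows, targets):
    """Fit least squares via the normal equations; return the residual sum of squares."""
    k = len(rows[0])
    # Build X'X and X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    # Solve (X'X) beta = X'y by Gauss-Jordan elimination with partial pivoting.
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    return sum(
        (t - sum(b * v for b, v in zip(beta, r))) ** 2
        for r, t in zip(rows, targets)
    )


def granger_f(x, y, lag=1):
    """F statistic for the hypothesis that lagged x helps predict y."""
    times = range(lag, len(y))
    targets = [y[t] for t in times]
    # Restricted model: intercept plus y's own lags.
    restricted = [[1.0] + [y[t - j] for j in range(1, lag + 1)] for t in times]
    # Unrestricted model: the same terms plus lags of x.
    unrestricted = [
        row + [x[t - j] for j in range(1, lag + 1)]
        for row, t in zip(restricted, times)
    ]
    rss_r = ols_rss(restricted, targets)
    rss_u = ols_rss(unrestricted, targets)
    dof = len(targets) - (1 + 2 * lag)  # observations minus fitted parameters
    return ((rss_r - rss_u) / lag) / (rss_u / dof)


# Synthetic weekly series (assumption): laboratory reports follow the
# syndromic indicator with a one-week delay plus a little noise.
rng = random.Random(0)
n_weeks = 200
syndromic = [rng.gauss(0, 1) for _ in range(n_weeks)]
lab_reports = [0.0] + [
    0.8 * syndromic[t - 1] + rng.gauss(0, 0.1) for t in range(1, n_weeks)
]

f_forward = granger_f(syndromic, lab_reports)  # syndromic -> laboratory reports
f_reverse = granger_f(lab_reports, syndromic)  # laboratory reports -> syndromic
print(f"F (syndromic leads lab reports): {f_forward:.1f}")
print(f"F (lab reports lead syndromic):  {f_reverse:.1f}")
```

In this synthetic setup the forward statistic should be far larger than the reverse one, because only the syndromic series carries information about the following week's laboratory reports. A real analysis would rely on an established statistical package, test several lags, and convert the F statistic into a p-value.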
Results:
Visual exploration of the results suggested that both existing and emerging syndromic surveillance data include valuable predictors for norovirus laboratory reports in England. Predictors displayed varying relevance across age groups and regions. For example, the random forest modeling explained 60% of the variance in the 65+ age group and 42% in the East of England, but only 13% in the South West region. Among the emerging data sets, the analysis highlighted relative search volumes for terms including “flu symptoms” and “norovirus in pregnancy”, as well as searches for norovirus activity in specific years, such as “norovirus 2016”. Symptoms of vomiting and gastroenteritis in multiple age groups were identified as important predictors within existing data sources.
Conclusions:
Syndromic data can predict the number of laboratory reports of norovirus, but the success and the specific variables vary across age groups and regions. The variation may be due to contrasting public health practices between regions and differing health-information-seeking behavior between age groups. Data biases, such as low spatial granularity in the Google Trends and Wikipedia data, are important factors too. Additionally, predictors relevant in one norovirus season may not contribute in other seasons. Moreover, internet searches provide insight into mental models, i.e., individuals' conceptual understanding of norovirus infection and transmission, which could be utilized in public health communication strategies.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.