Maintenance Notice

Due to necessary scheduled maintenance, the JMIR Publications website will be unavailable from Wednesday, July 01, 2020 at 8:00 PM to 10:00 PM EST. We apologize in advance for any inconvenience this may cause you.

Who will be affected?

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Jun 21, 2024
Date Accepted: Oct 29, 2024

The final, peer-reviewed published version of this preprint can be found here:

Discovering Time-Varying Public Interest for COVID-19 Case Prediction in South Korea Using Search Engine Queries: Infodemiology Study

Ahn SH, Yim K, Won HS, Kim KM, Jeong DH

Discovering Time-Varying Public Interest for COVID-19 Case Prediction in South Korea Using Search Engine Queries: Infodemiology Study

J Med Internet Res 2024;26:e63476

DOI: 10.2196/63476

PMID: 39680913

PMCID: 11686031

Discovering Time-Varying Public Interest for COVID-19 Case Prediction in South Korea Using Search Engine Queries: Infodemiology Study

  • Seong-Ho Ahn; 
  • Kwangil Yim; 
  • Hyun-Sik Won; 
  • Kang-Min Kim; 
  • Dong-Hwa Jeong

ABSTRACT

Background:

The number of confirmed coronavirus disease (COVID-19) cases is a crucial indicator of policies and lifestyles. Previous studies have attempted to forecast cases using machine learning techniques that utilize a previous number of case counts and search engine queries predetermined by experts. However, they have limitations in reflecting temporal variations in queries associated with pandemic dynamics.

Objective:

We propose a novel framework to extract keywords highly associated with COVID-19, considering their temporal occurrence. We aim to extract relevant keywords based on pandemic variations using query expansion. Additionally, we examine time-delayed online search behavior related to public interest in COVID-19 and adjust for better prediction performance.

Methods:

To capture temporal semantics regarding COVID-19, word embedding models were trained on a news corpus, and the top 100 words related to "Corona" were extracted over 4-month windows. Time-lagged cross-correlation was applied to select optimal time lags correlated to confirmed cases from the expanded queries. Subsequently, EleastcNet regression models were trained after reducing the feature dimensions using principal component analysis of the time-lagged features to predict future daily case counts.

Results:

Our approach successfully extracted relevant keywords depending on the pandemic phase, encompassing keywords directly related to COVID-19, such as its symptoms, and its societal impact. Specifically, during the first outbreak, keywords directly linked to COVID-19 and past infectious disease outbreaks similar to those of COVID-19 exhibited a high positive correlation. In the second phase of the pandemic, as community infections emerged, keywords related to the government's pandemic control policies were frequently observed with a high positive correlation. In the third phase of the pandemic, during the delta variant outbreak, keywords such as “economic crisis” and “anxiety” appeared, reflecting public fatigue. Consequently, prediction models trained by the extracted queries over 4-month windows outperformed previous methods for most 1-14 day ahead predictions. Notably, our approach showed significantly higher Pearson correlation coefficients than models based solely on the number of past cases for predictions 9-11 days ahead (P=.021, P =.004, P=.004), in contrast to heuristic- and symptom-based query sets.

Conclusions:

This study proposes a novel COVID-19 case-prediction model that automatically extracts relevant queries over time using word embedding. The model outperformed previous methods that relied on static symptom-based or heuristic queries, even without prior expert knowledge. The results demonstrate the capability of our approach to track temporal shifts in public interest regarding changes in the pandemic.


 Citation

Please cite as:

Ahn SH, Yim K, Won HS, Kim KM, Jeong DH

Discovering Time-Varying Public Interest for COVID-19 Case Prediction in South Korea Using Search Engine Queries: Infodemiology Study

J Med Internet Res 2024;26:e63476

DOI: 10.2196/63476

PMID: 39680913

PMCID: 11686031

Download PDF


Request queued. Please wait while the file is being generated. It may take some time.

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on it's website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressively prohibit redistribution of this draft paper other than for review purposes.