Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Jun 26, 2025
Date Accepted: Nov 12, 2025

The final, peer-reviewed published version of this preprint can be found here:

Systematic Determinants of Global COVID-19 Burden: Longitudinal Time-Series Analysis Using Big Data-Driven Artificial Intelligence

Cao Z, Han W, Zhang X, Zhang C, Zeng J, Chen Y, Long H, Chen J, Du X

Systematic Determinants of Global COVID-19 Burden: Longitudinal Time-Series Analysis Using Big Data-Driven Artificial Intelligence

J Med Internet Res 2025;27:e79745

DOI: 10.2196/79745

PMID: 41461077

PMCID: 12796881

Systematic Determinants of Global COVID-19 Burden: A Big Data-Driven Artificial Intelligence Analysis

  • Zicheng Cao; 
  • Wenjie Han; 
  • Xue Zhang; 
  • Chi Zhang; 
  • Jinfeng Zeng; 
  • Yilin Chen; 
  • Haoyu Long; 
  • Jian Chen; 
  • Xiangjun Du

ABSTRACT

Background:

The COVID-19 pandemic has transitioned into an endemic phase with heterogeneous resurgences. Despite widespread vaccination and public health measures, the interplay of viral evolution, population immunity, and environmental factors drives diverse global patterns of COVID-19 burden. This study examines how systematic factors shape ongoing COVID-19 outbreak patterns.

Objective:

To examine how systematic factors including viral variants, population immunity, environmental conditions, and public health measures shape ongoing COVID-19 outbreak patterns through big data-driven interpretable machine learning analysis.

Methods:

Using a big data-driven interpretable machine learning approach, we analyzed global multifaceted data covering variants, infection, vaccination, environment, policy, health care, and migration trends across 38 major countries. An XGBoost model, coupled with SHAP value interpretation, quantified the complex interdependencies among these factors and their spatiotemporal effects on COVID-19 burden metrics.
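To make concrete what a SHAP-style attribution quantifies, the following is a minimal sketch of exact Shapley values computed by brute force on a toy function. It is not the authors' pipeline: the abstract names XGBoost and SHAP but gives no implementation details, and tree-based SHAP implementations use a faster algorithm than the subset enumeration shown here. The function `toy_model`, the three feature names, the all-zero baseline, and the `percent_contribution` helper (mean absolute attribution normalized to 100%) are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input.

    Features outside a coalition are set to their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(x)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without_i = [x[j] if j in coalition else baseline[j]
                             for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                values[i] += weight * (predict(with_i) - predict(without_i))
    return values


def percent_contribution(rows):
    """Share of total mean |attribution| per feature, in percent (assumed metric)."""
    n = len(rows[0])
    mean_abs = [sum(abs(r[i]) for r in rows) / len(rows) for i in range(n)]
    total = sum(mean_abs)
    return [100.0 * m / total for m in mean_abs]


def toy_model(features):
    """Hypothetical stand-in for a fitted model, with one interaction term."""
    variant, vaccination, temperature = features
    return 2.0 * variant - 1.0 * vaccination + 0.5 * variant * temperature


x = [1.0, 0.6, 2.0]          # made-up observation
baseline = [0.0, 0.0, 0.0]   # made-up reference input
phi = shapley_values(toy_model, x, baseline)
shares = percent_contribution([phi])
```

The attributions sum to the difference between the prediction at `x` and the prediction at the baseline, and the interaction term is split evenly between the two features involved. That additivity is what allows per-feature contributions to be compared on a common scale.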

Results:

Beyond the well-studied effects of migration, policy-related, and natural infection-related factors, this study reveals a more detailed dynamic cascade of epidemic drivers, beginning with variant dominance in transmission/Rt (23.34% contribution) that progressively attenuates across disease severity metrics. Within this variant landscape, Omicron 21K and Delta 21J emerge as evolutionary outliers, exceeding baseline transmissibility by 12.2% and 3.4%, respectively. This variant-driven transmission pattern interacts with evolving population immunity, where natural infection demonstrates an escalating contribution to disease severity (22.8% for ICU, rising to 35.87% for deaths). The protective role of COVID-19 vaccination manifests through critical population thresholds: 29.9% coverage for transmission reduction and 72.3% for ICU admission prevention. This immunity landscape is further enriched by unexpected cross-protective effects from routine immunizations, particularly yellow fever vaccine (YFV) at doses exceeding 600,000. Environmental factors, primarily temperature, are identified as contributing modulators of these host-pathogen dynamics, demonstrating threshold effects around 14.95°C for hospitalizations and 9.57°C for critical cases, suggesting potential seasonal impacts on COVID-19 burden.
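The abstract does not state how the coverage and temperature thresholds were derived. One common way to read a threshold off a feature-attribution dependence curve is to locate the feature value at which the attribution changes sign; the sketch below does that by linear interpolation. The helper name `zero_crossings` and the sample numbers are invented for illustration and are not data from the study. Real dependence data are usually noisy scatter and would typically be smoothed or binned first.

```python
def zero_crossings(feature_values, attributions):
    """Feature values where the attribution changes sign.

    Points are sorted by feature value; each sign change between
    neighbouring points is located by linear interpolation.
    """
    points = sorted(zip(feature_values, attributions))
    crossings = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 == 0.0:
            crossings.append(x0)
        elif y0 * y1 < 0.0:
            crossings.append(x0 + (x1 - x0) * (-y0) / (y1 - y0))
    if points and points[-1][1] == 0.0:
        crossings.append(points[-1][0])
    return crossings


# Made-up curve: attribution to the outcome falls as coverage (%) rises
coverage = [10.0, 20.0, 30.0, 40.0, 50.0]
attribution = [0.3, 0.1, -0.1, -0.2, -0.25]
thresholds = zero_crossings(coverage, attribution)
```

With this made-up curve the attribution switches from positive to negative between 20% and 30% coverage, so a single threshold is returned. A curve that never changes sign yields an empty list, which is a useful reminder that a threshold need not exist.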

Conclusions:

Through large-scale epidemiological data mining, we revealed previously unrecognized patterns in the interplay of viral evolution, immunity, and environmental factors, with the discovery of variant effects suggesting evolutionary natural selection and the temporal dynamics between vaccination-induced and natural immunity informing vaccination strategies. Our big data approach provides novel insights into both the cross-protective effects of routine immunizations and the impact of environmental factors on COVID-19 disease burden, offering new strategies to enhance population resilience and informing targeted control measures. This systematic, data-driven framework establishes a template for understanding and mitigating future epidemic burden.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.