Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Jun 26, 2025
Date Accepted: Nov 12, 2025
Systematic Determinants of Global COVID-19 Burden: A Big Data-Driven Artificial Intelligence Analysis
ABSTRACT
Background:
The COVID-19 pandemic has transitioned into an endemic phase with heterogeneous resurgences. Despite widespread vaccination and public health measures, the interplay of viral evolution, population immunity, and environmental factors drives diverse global patterns of COVID-19 burden. This study examines how systematic factors shape ongoing COVID-19 outbreak patterns.
Objective:
To examine how systematic factors including viral variants, population immunity, environmental conditions, and public health measures shape ongoing COVID-19 outbreak patterns through big data-driven interpretable machine learning analysis.
Methods:
Through a big data-driven interpretable machine-learning approach, global multi-faceted data encompassing variants, infection, vaccination, environmental, policy, healthcare and migration trends across 38 major countries are scrutinized. The XGBoost model, coupled with SHAP value interpretation, quantifies the complex interdependencies and their spatiotemporal effects on COVID-19 burden metrics.
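As a rough illustration of how SHAP-style attributions turn a model's predictions into per-factor contribution percentages, the sketch below computes exact Shapley values for a tiny hand-written model and normalizes the mean absolute values into shares. It is not the study's pipeline: the toy model, the three feature names, the single all-zero baseline, and the helper names `shapley_values` and `contribution_shares` are all assumptions made for this example, and a real analysis would use a trained XGBoost model with a tree-specific SHAP explainer rather than brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample x, relative to a baseline input.

    Features outside a coalition are replaced by their baseline value
    (a simplifying assumption; SHAP libraries average over background data).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k that exclude feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def contribution_shares(predict, samples, baseline):
    """Percentage contribution per feature from summed absolute Shapley values."""
    totals = [0.0] * len(baseline)
    for x in samples:
        for i, value in enumerate(shapley_values(predict, x, baseline)):
            totals[i] += abs(value)
    grand_total = sum(totals)
    return [100.0 * t / grand_total for t in totals]


def toy_model(features):
    """Hypothetical burden score; stands in for a trained model's predict()."""
    variant, vaccination, temperature = features
    return 2.0 * variant - 1.0 * vaccination + 0.5 * variant * temperature


baseline = [0.0, 0.0, 0.0]
samples = [[1.0, 0.5, 1.0], [0.0, 1.0, 0.5], [1.0, 0.0, 0.0]]

phi = shapley_values(toy_model, samples[0], baseline)
shares = contribution_shares(toy_model, samples, baseline)
for name, share in zip(["variant", "vaccination", "temperature"], shares):
    print(f"{name}: {share:.2f}% contribution")
```

The per-sample values sum to the difference between the model output for that sample and the output at the baseline, which is the property that lets the absolute values be normalized into shares. Exact enumeration grows exponentially with the number of features, so it is only practical for toy cases like this one.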
Results:
Beyond the well-studied effects of migration, policy-related, and natural infection-related factors, this study reveals a more detailed dynamic cascade of epidemic drivers, beginning with variant dominance in transmission/Rt (23.34% contribution) that progressively attenuates across disease severity metrics. Within this variant landscape, Omicron 21K and Delta 21J emerge as evolutionary outliers, exceeding baseline transmissibility by 12.2% and 3.4%, respectively. This variant-driven transmission pattern interacts with evolving population immunity, where natural infection demonstrates an escalating contribution to disease severity (22.8% for ICU, rising to 35.87% for deaths). The protective role of COVID-19 vaccination manifests through critical population thresholds: 29.9% coverage for transmission reduction and 72.3% for ICU admission prevention. This immunity landscape is further enriched by unexpected cross-protective effects from routine immunizations, particularly yellow fever vaccine (YFV) at doses exceeding 600,000. Environmental factors, primarily temperature, are identified as contributing modulators of these host-pathogen dynamics, demonstrating threshold effects around 14.95°C for hospitalizations and 9.57°C for critical cases, suggesting potential seasonal impacts on COVID-19 burden.
Conclusions:
Through large-scale epidemiological data mining, we revealed previously unrecognized patterns in the interplay of viral evolution, immunity, and environmental factors. The observed variant effects suggest evolutionary natural selection, and the temporal dynamics between vaccination-induced and natural immunity can inform vaccination strategies. Our big data approach provides novel insights into both the cross-protective effects of routine immunizations and the impact of environmental factors on COVID-19 disease burden, offering new strategies to enhance population resilience and informing targeted control measures. This systematic, data-driven framework establishes a template for understanding and mitigating future epidemic burden.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.