Accepted for/Published in: JMIR Public Health and Surveillance
Date Submitted: Oct 17, 2023
Open Peer Review Period: Oct 16, 2023 - Oct 30, 2023
Date Accepted: Jun 12, 2024
Handling Missing Data in COVID-19 Incidence Estimation in Vietnam: Secondary data analysis
ABSTRACT
Background:
The COVID-19 pandemic, characterized by lockdowns of varying duration across nations and by overcrowded health care facilities, has introduced new challenges for disease forecasting. One pressing issue has been the handling of missing data arising from diverse sources.
Objective:
To show how the handling of missing data can affect estimates of the COVID-19 incidence rate (CIR).
Methods:
The current study used data from the surveillance system for patients with COVID-19 (SARS-CoV-2 infection) treated at the National Institute of Hygiene and Epidemiology, Hanoi, Vietnam. In the variable daily COVID-19 caseload, we randomly removed values to simulate data missing completely at random (MCAR), at proportions from 5% to 30% in increments of 5%. We selected six analytical methods to assess the effects of missing data handling: backfill imputation, moving average, median imputation, maximum likelihood, linear interpolation, and the autoregressive integrated moving average (ARIMA) model.
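The masking step and four of the six imputation methods can be sketched in plain Python as below. This is not the authors' code: every function name is hypothetical, and several details are assumptions not stated in the abstract. Backfill here copies the next observed value and, for trailing gaps with no later observation, the last observed one; the moving average uses a hypothetical window of 3 days on each side (`half_window=3`) with the overall mean as a fallback; linear interpolation copies the nearest observation at the series edges. Maximum likelihood and ARIMA imputation require a fitted statistical model and are left out of this sketch.

```python
import bisect
import random
import statistics


def mask_mcar(series, fraction, seed=None):
    """Set `fraction` of entries to None, chosen uniformly at random (MCAR)."""
    rng = random.Random(seed)
    n_missing = round(len(series) * fraction)
    missing_idx = set(rng.sample(range(len(series)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(series)]


def backfill(series):
    """Fill each gap with the next observed value; trailing gaps take the last observed value."""
    out = list(series)
    next_val = None
    for i in range(len(out) - 1, -1, -1):  # backward pass: carry later values back
        if out[i] is None:
            out[i] = next_val
        else:
            next_val = out[i]
    last_val = None
    for i, v in enumerate(out):  # forward pass: only trailing gaps are still None
        if v is None:
            out[i] = last_val
        else:
            last_val = v
    return out


def median_impute(series):
    """Replace every gap with the median of the observed values."""
    med = statistics.median(v for v in series if v is not None)
    return [med if v is None else v for v in series]


def moving_average_impute(series, half_window=3):
    """Replace a gap with the mean of observed values within +/- half_window days."""
    overall = statistics.fmean(v for v in series if v is not None)
    out = []
    for i, v in enumerate(series):
        if v is not None:
            out.append(v)
            continue
        lo, hi = max(0, i - half_window), min(len(series), i + half_window + 1)
        window = [w for w in series[lo:hi] if w is not None]
        out.append(statistics.fmean(window) if window else overall)  # fallback: global mean
    return out


def linear_interpolate(series):
    """Interpolate linearly between neighbouring observations; edges copy the nearest one."""
    known = [i for i, v in enumerate(series) if v is not None]
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        pos = bisect.bisect_left(known, i)  # index of first observed day after i
        left = known[pos - 1] if pos > 0 else None
        right = known[pos] if pos < len(known) else None
        if left is None:
            out[i] = series[right]
        elif right is None:
            out[i] = series[left]
        else:
            out[i] = series[left] + (i - left) * (series[right] - series[left]) / (right - left)
    return out


# Hypothetical toy series, masked at 5%, 10%, ..., 30% and imputed with each method.
daily_cases = [12, 15, 11, 20, 18, 25, 22, 30, 28, 35, 31, 40, 38, 45, 41, 50, 47, 55, 52, 60]
methods = {
    "backfill": backfill,
    "moving_average": moving_average_impute,
    "median": median_impute,
    "linear": linear_interpolate,
}
for pct in range(5, 35, 5):
    masked = mask_mcar(daily_cases, pct / 100, seed=pct)
    imputed = {name: fn(masked) for name, fn in methods.items()}
```

Using a seeded `random.Random` per missingness level keeps each simulated dataset reproducible, which matters when several imputation methods must be compared on exactly the same masked days.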
Results:
During the Zero-COVID period, median imputation yielded a lower mean absolute crude bias (ACB) and a lower mean crude root mean square error (RMSE) than the other methods, regardless of the proportion of missing data, and it also showed the lowest mean absolute percentage change (APC) in the CIR. During the Transition period, ARIMA-based imputation had the lowest mean ACB at every level of missing data and the lowest mean APC. During the New-normal period, backfill and linear interpolation had the lowest mean ACB at every level of missing data and relatively low mean APC values compared with the other imputation methods.
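The abstract does not define ACB, RMSE, APC, or how the CIR is computed, so the sketch below uses one plausible set of definitions, all of which are assumptions rather than the authors' formulas: ACB as the absolute mean difference between imputed and true counts over the masked days, RMSE over the same days, the CIR as cumulative cases per 100,000 inhabitants over a period (the population figure is hypothetical), and APC as the absolute relative change of that CIR in percent. The "mean" values reported above would then presumably be these metrics averaged over repeated random maskings.

```python
import math
import statistics


def absolute_crude_bias(true_vals, imputed_vals, missing_idx):
    """Assumed ACB: |mean(imputed - true)| over the masked days only."""
    diffs = [imputed_vals[i] - true_vals[i] for i in missing_idx]
    return abs(statistics.fmean(diffs))


def crude_rmse(true_vals, imputed_vals, missing_idx):
    """Root mean square error of imputed vs. true counts over the masked days."""
    sq_errors = [(imputed_vals[i] - true_vals[i]) ** 2 for i in missing_idx]
    return math.sqrt(statistics.fmean(sq_errors))


def incidence_rate(daily_cases, population, per=100_000):
    """Assumed CIR: cumulative cases over the period per `per` inhabitants."""
    return sum(daily_cases) / population * per


def absolute_percentage_change(cir_true, cir_imputed):
    """Assumed APC: |CIR after imputation - CIR from complete data| as % of the latter."""
    return abs(cir_imputed - cir_true) / cir_true * 100


# Hypothetical example: days 1 and 2 were masked and then imputed.
true_cases = [10, 20, 30, 40]
imputed_cases = [10, 24, 27, 40]
masked_days = [1, 2]
acb = absolute_crude_bias(true_cases, imputed_cases, masked_days)  # 0.5
rmse = crude_rmse(true_cases, imputed_cases, masked_days)
apc = absolute_percentage_change(
    incidence_rate(true_cases, population=1_000_000),
    incidence_rate(imputed_cases, population=1_000_000),
)
```

Under these definitions the population cancels out of the APC, so it reduces to the relative change in cumulative cases; note also that ACB lets over- and under-imputation offset each other, whereas RMSE does not.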
Conclusions:
Our study highlights the importance of choosing the missing data handling method best suited to the specific epidemic situation in order to obtain reliable estimates of the CIR.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.