JMIR Preprints #53719: Handling Missing Data in COVID-19 Incidence Estimation in Vietnam: Secondary data analysis


Handling Missing Data in COVID-19 Incidence Estimation in Vietnam: Secondary data analysis

Hai-Thanh Pham;
Thanh-Toan Do-Thi;
Jonggyu Baek;
Cong-Khanh Nguyen;
Quang-Thai Pham;
Hoa L Nguyen;
Robert J Goldberg;
Loc Quang Pham;
Giang Minh Le

ABSTRACT

Background:

The COVID-19 pandemic, characterized by varying lockdown durations across nations and overcrowding in healthcare facilities, has introduced new challenges for disease forecasting. One pressing issue has been the management of missing data stemming from diverse sources.

Objective:

To show how the handling of missing data can affect estimates of the COVID-19 incidence rate (CIR).

Methods:

The current study used data from the surveillance system of COVID-19/SARS-CoV-2 patients treated at the National Institute of Hygiene and Epidemiology, Hanoi, Vietnam. We randomly removed values from the daily COVID-19 caseload variable under a missing completely at random (MCAR) mechanism, at proportions from 5% to 30% in increments of 5%. We selected six analytical methods to assess the effects of handling missing data: backfill imputation, moving average, median imputation, maximum likelihood, linear interpolation, and the autoregressive integrated moving average (ARIMA) model.
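As an illustration only, the MCAR removal step and three of the simpler imputation methods named above (median, backfill, and linear interpolation) could be sketched as follows. The function names, the `seed` parameter, and the toy `daily_cases` series are hypothetical and are not taken from the study; the authors' actual implementation is not described in this abstract.

```python
import random
import statistics


def remove_mcar(series, fraction, seed=0):
    """Return a copy of `series` with `fraction` of its values set to None.

    Positions are drawn uniformly at random, so missingness does not depend
    on the values themselves (missing completely at random).
    """
    rng = random.Random(seed)
    n_missing = round(len(series) * fraction)
    missing_idx = set(rng.sample(range(len(series)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(series)]


def median_impute(series):
    """Replace every gap with the median of the observed values."""
    observed = [v for v in series if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in series]


def backfill_impute(series):
    """Replace each gap with the next observed value.

    Trailing gaps have no later value, so they fall back to the last
    observed value (an assumption of this sketch).
    """
    out = list(series)
    next_val = None
    for i in range(len(out) - 1, -1, -1):  # walk backward, carrying the next value
        if out[i] is None:
            out[i] = next_val
        else:
            next_val = out[i]
    last_val = None
    for i, v in enumerate(out):  # forward pass only touches trailing gaps
        if v is None:
            out[i] = last_val
        else:
            last_val = v
    return out


def linear_interpolate(series):
    """Fill gaps on a straight line between the nearest observed neighbors.

    Gaps at either end copy the nearest observed value. Assumes at least
    one observed value in `series`.
    """
    observed_idx = [i for i, v in enumerate(series) if v is not None]
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((j for j in observed_idx if j < i), default=None)
        right = min((j for j in observed_idx if j > i), default=None)
        if left is None:
            out[i] = series[right]
        elif right is None:
            out[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            out[i] = series[left] + weight * (series[right] - series[left])
    return out


# Toy daily case counts (made up for illustration, not study data).
daily_cases = [12, 15, 11, 20, 24, 30, 28, 35, 40, 38,
               45, 50, 48, 55, 60, 58, 65, 70, 72, 80]

# Missingness levels from 5% to 30% in steps of 5%, as in the Methods.
for pct in range(5, 35, 5):
    masked = remove_mcar(daily_cases, pct / 100, seed=pct)
    filled = linear_interpolate(masked)
    print(pct, sum(v is None for v in masked), round(sum(filled), 1))
```

The moving average, maximum likelihood, and ARIMA methods are omitted here because they need modeling choices (window size, likelihood, model order) that the abstract does not specify.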

Results:

During the Zero-COVID period, the median imputation method yielded lower mean absolute crude bias (ACB) and mean crude root mean square error (RMSE) values compared to the other methods, irrespective of the extent of missing data; the median imputation method exhibited the lowest mean absolute percentage change (APC) in the CIR. During the Transition period, the ARIMA model of imputation demonstrated the lowest mean ACB across all levels of missing data and the lowest mean APC values. During the New-normal period, the backfill and linear interpolation methods demonstrated the lowest mean ACB across all levels of missing data and relatively lower mean APC values compared with the other imputation methods.
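The abstract does not define ACB, RMSE, or APC formally, so the following is a minimal sketch under assumed definitions: bias as the absolute difference between the mean imputed and mean true daily counts, RMSE as the usual root mean square error, and APC as the absolute percentage change in total cases. With a fixed population denominator, the last quantity equals the percentage change in the incidence rate. All function names are hypothetical, and the study's exact formulas may differ.

```python
import math


def absolute_bias(true_values, imputed_values):
    """Absolute difference between the mean imputed and mean true values."""
    mean_true = sum(true_values) / len(true_values)
    mean_imputed = sum(imputed_values) / len(imputed_values)
    return abs(mean_imputed - mean_true)


def rmse(true_values, imputed_values):
    """Root mean square error between the imputed and true series."""
    squared_errors = [(i - t) ** 2 for t, i in zip(true_values, imputed_values)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def absolute_percentage_change(true_values, imputed_values):
    """Absolute percentage change in total cases after imputation.

    The population denominator of the incidence rate cancels out, so this
    is also the percentage change in the rate for a fixed population.
    """
    total_true = sum(true_values)
    total_imputed = sum(imputed_values)
    return abs(total_imputed - total_true) / total_true * 100


# Toy example: the true series versus a series with one badly imputed day.
true_series = [10, 12, 14, 16]
imputed_series = [10, 12, 18, 16]
print(absolute_bias(true_series, imputed_series))
print(rmse(true_series, imputed_series))
print(absolute_percentage_change(true_series, imputed_series))
```

In a simulation like the one described, these metrics would be computed for each imputation method at each missingness level and then averaged over repeated random removals.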

Conclusions:

Our study emphasizes the importance of choosing the most appropriate missing data handling method, in the context of a specific disease situation, to ensure reliable estimates of the CIR.

Citation

Please cite as:

Pham HT, Do-Thi TT, Baek J, Nguyen CK, Pham QT, Nguyen HL, Goldberg RJ, Pham LQ, Le GM

Handling Missing Data in COVID-19 Incidence Estimation: Secondary Data Analysis

JMIR Public Health Surveill 2024;10:e53719

DOI: 10.2196/53719

PMID: 39166439

PMCID: 11350390


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Public Health and Surveillance

Date Submitted: Oct 17, 2023

Open Peer Review Period: Oct 16, 2023 - Oct 30, 2023

Date Accepted: Jun 12, 2024

