Accepted for/Published in: JMIR Public Health and Surveillance

Date Submitted: Oct 17, 2023
Open Peer Review Period: Oct 16, 2023 - Oct 30, 2023
Date Accepted: Jun 12, 2024

The final, peer-reviewed published version of this preprint can be found here:

Handling Missing Data in COVID-19 Incidence Estimation: Secondary Data Analysis

Pham HT, Do-Thi TT, Baek J, Nguyen CK, Pham QT, Nguyen HL, Goldberg RJ, Pham LQ, Le GM

JMIR Public Health Surveill 2024;10:e53719

DOI: 10.2196/53719

PMID: 39166439

PMCID: 11350390

Handling Missing Data in COVID-19 Incidence Estimation in Vietnam: Secondary data analysis

  • Hai-Thanh Pham
  • Thanh-Toan Do-Thi
  • Jonggyu Baek
  • Cong-Khanh Nguyen
  • Quang-Thai Pham
  • Hoa L Nguyen
  • Robert J Goldberg
  • Loc Quang Pham
  • Giang Minh Le

ABSTRACT

Background:

The COVID-19 pandemic, characterized by varying lockdown durations across nations and overcrowding in healthcare facilities, has introduced novel challenges in disease forecasting. One pressing issue has been the management of missing data stemming from diverse sources.

Objective:

To show how the handling of missing data can affect estimates of the COVID-19 incidence rate (CIR).

Methods:

The current study used data from the surveillance system of COVID-19/SARS-CoV-2 patients treated at the National Institute of Hygiene and Epidemiology, Hanoi, Vietnam. We randomly removed values from the daily COVID-19 caseload variable to simulate data missing completely at random (MCAR), at proportions from 5% to 30% in increments of 5%. We selected six analytical methods to assess the effects of handling missing data: backfill imputation, moving average, median imputation, maximum likelihood, linear interpolation, and the autoregressive integrated moving average (ARIMA) model.
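The masking and imputation steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the `seed` parameter, the edge-handling rules, and the `daily_cases` values are all assumptions made for the example, and only three of the six methods (median, backfill, linear interpolation) are shown.

```python
import random
from statistics import median

def remove_mcar(series, fraction, seed=0):
    """Copy `series`, setting a `fraction` of values to None uniformly at random (MCAR)."""
    rng = random.Random(seed)
    n_missing = round(len(series) * fraction)
    missing_idx = set(rng.sample(range(len(series)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(series)]

def impute_median(series):
    """Replace every gap with the median of the observed values."""
    fill = median(v for v in series if v is not None)
    return [fill if v is None else v for v in series]

def impute_backfill(series):
    """Fill each gap with the next observed value.
    Assumption: trailing gaps (no later value) take the last observed value."""
    out = list(series)
    next_val = None
    for i in range(len(out) - 1, -1, -1):
        if out[i] is None:
            out[i] = next_val
        else:
            next_val = out[i]
    prev = None
    for i in range(len(out)):
        if out[i] is None:
            out[i] = prev
        else:
            prev = out[i]
    return out

def impute_linear(series):
    """Linearly interpolate gaps between the nearest observed neighbours.
    Assumptions: at least one value is observed; edge gaps take the nearest value."""
    known = [i for i, v in enumerate(series) if v is not None]
    out = list(series)
    for k, v in enumerate(series):
        if v is not None:
            continue
        left = max((i for i in known if i < k), default=None)
        right = min((i for i in known if i > k), default=None)
        if left is None:
            out[k] = series[right]
        elif right is None:
            out[k] = series[left]
        else:
            weight = (k - left) / (right - left)
            out[k] = series[left] + weight * (series[right] - series[left])
    return out

# Hypothetical daily case counts, for illustration only (not study data).
daily_cases = [10, 12, 15, 11, 9, 14, 20, 18, 16, 13,
               17, 21, 25, 22, 19, 24, 30, 28, 26, 23]

for pct in range(5, 35, 5):  # 5%, 10%, ..., 30% missing
    masked = remove_mcar(daily_cases, pct / 100, seed=pct)
    filled = impute_linear(masked)
    print(pct, masked.count(None), sum(filled))
```

Repeating the masking with different seeds and averaging the results would give a more stable comparison between methods than a single random draw.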

Results:

During the Zero-COVID period, the median imputation method yielded lower mean absolute crude bias (ACB) and mean crude root mean square error (RMSE) values compared to the other methods, irrespective of the extent of missing data; the median imputation method exhibited the lowest mean absolute percentage change (APC) in the CIR. During the Transition period, the ARIMA model of imputation demonstrated the lowest mean ACB across all levels of missing data and the lowest mean APC values. During the New-normal period, the backfill and linear interpolation methods demonstrated the lowest mean ACB across all levels of missing data and relatively lower mean APC values compared with the other imputation methods.
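For readers unfamiliar with these metrics, the sketch below shows one plausible way to compute them. The abstract does not give the formulas, so the definitions here are assumptions: crude bias as the mean signed difference between imputed and true daily counts, RMSE in its standard form, and the CIR as total cases per 100,000 population. The `population` figure and the case counts are hypothetical.

```python
from math import sqrt

def crude_bias(true_vals, imputed_vals):
    """Mean signed difference between imputed and true values (assumed definition)."""
    diffs = [m - t for t, m in zip(true_vals, imputed_vals)]
    return sum(diffs) / len(diffs)

def rmse(true_vals, imputed_vals):
    """Root mean square error between imputed and true values."""
    sq = [(m - t) ** 2 for t, m in zip(true_vals, imputed_vals)]
    return sqrt(sum(sq) / len(sq))

def incidence_rate(cases, population, per=100_000):
    """Cumulative cases per `per` population over the period (assumed CIR definition)."""
    return sum(cases) / population * per

def abs_pct_change(true_rate, est_rate):
    """Absolute percentage change of the estimated rate relative to the true rate."""
    return abs(est_rate - true_rate) / true_rate * 100

# Hypothetical values, for illustration only (not study data).
true_cases = [10, 12, 15, 11]
imputed_cases = [10, 13, 15, 9]
population = 1_000_000

acb = abs(crude_bias(true_cases, imputed_cases))
error = rmse(true_cases, imputed_cases)
apc = abs_pct_change(
    incidence_rate(true_cases, population),
    incidence_rate(imputed_cases, population),
)
print(round(acb, 3), round(error, 3), round(apc, 3))
```

In a simulation like the one described, these values would be averaged over repeated random deletions at each missingness level, which is presumably what "mean ACB" and "mean APC" refer to.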

Conclusions:

Our study emphasizes the importance of choosing the most appropriate missing data handling method, in the context of a specific disease situation, to ensure reliable estimates of the CIR.



© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.