Normal Workflow and Key Strategies for Data Cleaning Toward Real-World Data: Viewpoint
Manping Guo;
Yiming Wang;
Qiaoning Yang;
Rui Li;
Yang Zhao;
Chenfei Li;
Mingbo Zhu;
Yao Cui;
Xin Jiang;
Song Sheng;
Qingna Li;
Rui Gao
ABSTRACT
Real-world research inevitably generates "dirty data," which can seriously impair data utilization and the quality of decision-making. Data cleaning is a critical method for improving data quality. However, the current literature on real-world research provides little guidance on how to set up and carry out data cleaning both efficiently and ethically. To address this gap, we propose a data cleaning framework for real-world research that focuses on the three most common types of "dirty data" (duplicate data, missing data, and outlier data), along with a normal workflow for data cleaning, to provide a reference for applying such techniques in future studies.
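To make the three categories of "dirty data" concrete, the following is a minimal sketch, not the authors' framework. It assumes records are plain dictionaries, that duplicates are defined by a set of key fields, and that outliers are flagged with the common 1.5 x IQR rule; the function name flag_dirty_records, the field names, and the sample values are all hypothetical.

```python
from statistics import quantiles


def flag_dirty_records(records, key_fields, numeric_field):
    """Return row indices of duplicate, missing, and outlier records.

    Hypothetical helper for illustration only:
    - duplicate: same values in key_fields as an earlier record
    - missing:   numeric_field is None
    - outlier:   numeric_field outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
    """
    seen_keys = set()
    duplicates, missing, observed = [], [], []

    for index, record in enumerate(records):
        key = tuple(record.get(field) for field in key_fields)
        if key in seen_keys:
            duplicates.append(index)
        else:
            seen_keys.add(key)

        value = record.get(numeric_field)
        if value is None:
            missing.append(index)
        else:
            observed.append((index, value))

    outliers = []
    if len(observed) >= 2:  # quantiles() needs at least two data points
        q1, _, q3 = quantiles([value for _, value in observed], n=4)
        iqr = q3 - q1
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        outliers = [i for i, value in observed if value < low or value > high]

    return {"duplicates": duplicates, "missing": missing, "outliers": outliers}


# Made-up sample data: systolic blood pressure (sbp) per patient visit.
records = [
    {"patient_id": "P01", "visit": 1, "sbp": 118},
    {"patient_id": "P02", "visit": 1, "sbp": 124},
    {"patient_id": "P02", "visit": 1, "sbp": 124},   # repeated key
    {"patient_id": "P03", "visit": 1, "sbp": None},  # value not recorded
    {"patient_id": "P04", "visit": 1, "sbp": 131},
    {"patient_id": "P05", "visit": 1, "sbp": 127},
    {"patient_id": "P06", "visit": 1, "sbp": 1200},  # implausible value
    {"patient_id": "P07", "visit": 1, "sbp": 121},
]

result = flag_dirty_records(records, ["patient_id", "visit"], "sbp")
print(result)
```

The sketch only flags records and leaves them unchanged. Whether a flagged record is then removed, imputed, or corrected against the source is a separate decision that depends on the study design.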
Citation
Please cite as:
Guo M, Wang Y, Yang Q, Li R, Zhao Y, Li C, Zhu M, Cui Y, Jiang X, Sheng S, Li Q, Gao R
Normal Workflow and Key Strategies for Data Cleaning Toward Real-World Data: Viewpoint