Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Feb 2, 2020
Date Accepted: Jun 21, 2020
Convolutional Neural Network is not Robust to Label Noise: Validation with Chest X-Ray Images from Multi-centers
ABSTRACT
Background:
Computer-aided diagnosis (CAD) on chest X-ray images using deep learning is a widely studied modality in medicine. Many studies are based on public datasets, such as the National Institutes of Health (NIH) dataset and the Stanford CheXpert dataset. However, the labels in these datasets were produced by classical natural language processing, which may introduce some degree of label error.
Objective:
In this study, we aimed to investigate the robustness of a deep convolutional neural network (CNN) for binary classification of posteroanterior chest X-ray (CXR) images by introducing random incorrect labels.
Methods:
We trained and validated the CNN architecture at different label noise levels on three datasets (ours, NIH, and CheXpert) and tested the models on each test set. The disease in each CXR in our dataset was confirmed by thoracic radiologists using the corresponding computed tomography (CT) scan. The receiver operating characteristic (ROC) curve and the area under the curve (AUC) were evaluated in each test. Three medical doctors and one thoracic radiologist with more than 20 years of experience evaluated randomly chosen CXRs from the public datasets.
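The label-noise procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `inject_label_noise` and `auc`, the fixed seed, and the toy label vector are hypothetical, and only the 2% and 16% noise levels are taken from the abstract. The sketch flips a fixed fraction of binary labels at random and computes AUC from its rank-based definition (the probability that a positive case is scored above a negative one).

```python
import random


def inject_label_noise(labels, noise_rate, seed=0):
    """Return a copy of binary labels with a fixed fraction flipped at random.

    Hypothetical helper: exactly round(noise_rate * n) labels are flipped,
    chosen without replacement.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    rng = random.Random(seed)
    n_flip = round(noise_rate * len(labels))
    flip_idx = rng.sample(range(len(labels)), n_flip)
    noisy = list(labels)
    for i in flip_idx:
        noisy[i] = 1 - noisy[i]  # 0 -> 1, 1 -> 0
    return noisy


def auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: 100 clean labels, corrupted at illustrative noise levels.
clean = [0] * 50 + [1] * 50
for rate in (0.0, 0.02, 0.16):
    noisy = inject_label_noise(clean, rate, seed=42)
    changed = sum(a != b for a, b in zip(clean, noisy))
    print(f"noise={rate:.2f} flipped={changed}")
```

In a study of this kind, the noisy labels would presumably be applied only to the training and validation splits, while AUC is computed against clean test labels; the pairwise AUC above is quadratic in the number of cases and is meant for illustration only.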
Results:
In the public NIH and CheXpert datasets, AUCs did not drop significantly at label noise levels up to 16%. In contrast, the AUC on our dataset decreased significantly starting at 2% label noise. Evaluation of the public datasets by the four medical doctors yielded accuracies of around 65% to 80%.
Conclusions:
The results imply that deep learning-based CAD models are sensitive to label noise and that CAD trained with inaccurate labels is not credible. Furthermore, open datasets such as NIH and CheXpert need to be distilled before they are used for deep learning-based CAD.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.