Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Feb 2, 2020
Date Accepted: Jun 21, 2020

The final, peer-reviewed published version of this preprint can be found here:

Jang R, Kim N, Jang M, Lee K, Lee SM, Lee KH, Noh HN, Seo JB

Assessment of the Robustness of Convolutional Neural Networks in Labeling Noise by Using Chest X-Ray Images From Multiple Centers

JMIR Med Inform 2020;8(8):e18089

DOI: 10.2196/18089

PMID: 32749222

PMCID: 7435602

Convolutional Neural Network is not Robust to Label Noise: Validation with Chest X-Ray Images from Multi-centers

  • Ryoungwoo Jang
  • Namkug Kim
  • Miso Jang
  • Kyunghwa Lee
  • Sang Min Lee
  • Kyung Hee Lee
  • Han Na Noh
  • Joon Beom Seo

ABSTRACT

Background:

Computer-aided diagnosis (CAD) on chest X-ray images using deep learning is a widely studied application in medicine. Many studies are based on public datasets, such as the National Institutes of Health (NIH) dataset and the Stanford CheXpert dataset. However, these datasets are preprocessed with classical natural language processing, which may introduce a certain amount of label error.

Objective:

In this study, we aimed to investigate the robustness of deep convolutional neural networks (CNNs) to random incorrect labeling in the binary classification of posteroanterior chest X-rays (CXRs).

Methods:

We trained and validated the CNN architecture with different levels of label noise on three datasets, namely, ours, NIH, and CheXpert, and tested each model on the corresponding test set. The disease in each CXR in our dataset was confirmed against its corresponding computed tomography (CT) scan by thoracic radiologists. The receiver operating characteristic (ROC) curve and the area under the curve (AUC) were evaluated in each test. Three medical doctors and one thoracic radiologist with more than 20 years of experience evaluated randomly chosen CXRs from the public datasets.
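The noise-injection and evaluation steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the abstract says only that labels were made randomly incorrect, so symmetric flipping of a fixed fraction of binary labels is an assumption, and the function names `flip_labels` and `auc`, the seed, and the toy label vector are all hypothetical. The noise rates shown are just the levels mentioned in the Results.

```python
import random


def flip_labels(labels, noise_rate, seed=0):
    """Return a copy of binary (0/1) labels with a fraction `noise_rate` flipped.

    Assumption: noise is symmetric, i.e. any label is equally likely to be flipped.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    rng = random.Random(seed)
    n_flip = round(noise_rate * len(labels))
    # Choose distinct positions to corrupt, then invert the label at each one.
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]


def auc(labels, scores):
    """AUC as the probability that a positive case is scored above a negative one.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy label vector (hypothetical): 50 normal and 50 abnormal cases.
clean = [0] * 50 + [1] * 50
for rate in (0.0, 0.02, 0.16):
    noisy = flip_labels(clean, rate)
    changed = sum(a != b for a, b in zip(clean, noisy))
    print(f"noise rate {rate:.2f}: {changed} of {len(clean)} labels flipped")
```

In an experiment of this kind, the noisy labels would replace the clean ones only during training and validation, while `auc` would be computed against the clean test labels, so that any drop in AUC can be attributed to the corrupted training signal.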

Results:

On the public NIH and CheXpert datasets, AUCs did not drop significantly at label-noise levels up to 16%. In contrast, the AUC on our dataset decreased significantly starting from 2% label noise. The evaluation of the public datasets by the four medical doctors yielded accuracies of approximately 65% to 80%.

Conclusions:

The results imply that deep learning-based CAD models are sensitive to label noise and that CAD trained with inaccurate labels is not credible. Furthermore, open datasets such as NIH and CheXpert need to be distilled before they are used for deep learning-based CAD.



© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.