Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Aug 22, 2021
Date Accepted: Oct 12, 2021
Differential biases and variabilities of deep-learning-based artificial intelligence and human experts in clinical diagnosis: A retrospective cohort and survey study
ABSTRACT
Background:
Deep learning (DL) based artificial intelligence may have diagnostic characteristics different from those of human experts in medical diagnosis. As a data-driven knowledge system, DL is considered more susceptible than clinicians to bias from heterogeneous disease incidence in real-world clinical populations. Conversely, because human experts learn from a limited number of cases, they may exhibit large inter-individual variability. Thus, understanding how the two groups classify the same data differently is an essential step toward the cooperative use of DL in clinical applications.
Objective:
To evaluate and compare how clinical experience differentially affects otoendoscopic image diagnosis by DL models and physicians, as exemplified by the class imbalance problem, and to guide clinicians in utilizing decision support systems.
Methods:
We collected a total of 22,707 digital otoendoscopic images of patients who visited the otorhinolaryngology outpatient clinic at Severance Hospital, Seoul, South Korea, from January 2013 to June 2019. After excluding near-duplicate images, 7,500 otoendoscopic images were selected for labeling. We built a DL-based image classification model to classify a given image into one of six disease categories. Two test sets of 300 images each were constructed: one class-balanced and one imbalanced. Fourteen clinicians (otolaryngologists and non-otolaryngology physicians, including general practitioners) and 13 DL-based models were evaluated. We compared the results of individual physicians and models using accuracy (overall and per-class) and kappa statistics.
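The two comparison metrics named above — per-class accuracy and the kappa statistic (Cohen's kappa, measuring agreement beyond chance between two raters' label sequences) — can be sketched with standard-library Python. The function names and the toy labels are illustrative, not from the study's code.

```python
from collections import Counter

def per_class_accuracy(y_true, y_pred):
    """Fraction of samples classified correctly within each true class."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: correct[c] / totals[c] for c in totals}

def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance)."""
    n = len(y_true)
    classes = set(y_true) | set(y_pred)
    p_obs = sum(t == p for t, p in zip(y_true, y_pred)) / n
    ct, cp = Counter(y_true), Counter(y_pred)
    # Chance agreement: product of each rater's marginal class frequencies.
    p_exp = sum(ct[c] * cp[c] for c in classes) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)

# Toy example with two disease labels:
y_true = ["otitis", "otitis", "normal", "normal"]
y_pred = ["otitis", "otitis", "normal", "otitis"]
print(per_class_accuracy(y_true, y_pred))  # {'otitis': 1.0, 'normal': 0.5}
print(cohens_kappa(y_true, y_pred))        # 0.5
```

A high kappa among the ML models (as reported below) indicates they make highly similar predictions to one another; a lower kappa among physicians reflects greater inter-rater variability.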
Results:
Our ML models had consistently high accuracies (77.14±1.83% on the balanced and 82.03±3.06% on the imbalanced test set), equivalent to otolaryngologists (71.17±3.37% balanced, 72.83±6.41% imbalanced) and far better than non-otolaryngologists (45.63±7.89% balanced, 44.08±15.83% imbalanced). However, the ML models suffered from the class imbalance problem (77.14±1.83% vs 82.03±3.06% on the balanced and imbalanced test sets, respectively). This was mitigated by data augmentation, particularly for low-incidence classes, but per-class accuracies remained low for rare disease classes. Human physicians, despite being less affected by prevalence, showed high inter-physician variability (kappa=0.60±0.07, vs 0.83±0.02 for ML models).
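The data augmentation described above counters class imbalance by increasing the representation of low-incidence classes in training. One common, minimal form is random oversampling: duplicating examples of rare classes (with replacement) until every class matches the size of the most frequent one. The sketch below illustrates only this resampling step; the study's actual augmentation pipeline (e.g., any image transformations) is not specified here, and the function name is hypothetical.

```python
import random
from collections import Counter, defaultdict

def oversample_to_balance(samples, labels, seed=0):
    """Randomly duplicate rare-class samples until all classes have
    as many examples as the most frequent class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for s, l in zip(samples, labels):
        by_class[l].append(s)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        out_samples.extend(items)
        out_labels.extend([label] * len(items))
        # Draw extra copies with replacement for under-represented classes.
        extras = [rng.choice(items) for _ in range(target - len(items))]
        out_samples.extend(extras)
        out_labels.extend([label] * len(extras))
    return out_samples, out_labels

samples = ["img1", "img2", "img3", "img4", "img5"]
labels = ["common", "common", "common", "common", "rare"]
s2, l2 = oversample_to_balance(samples, labels)
print(Counter(l2))  # Counter({'common': 4, 'rare': 4})
```

Note that oversampling alone rebalances class frequencies but adds no new visual information, which is consistent with the finding that per-class accuracy for rare diseases remained low even after augmentation.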
Conclusions:
Even though ML models deliver excellent performance in classifying ear disease, physicians and ML models each have their own strengths. To deliver the best patient care given the shortage of otolaryngologists, our ML model can serve a cooperative role for clinicians of diverse expertise, provided users keep in mind that models may remain biased toward prevalent diseases even after data augmentation.
Keywords: Human-machine cooperation; Convolutional neural network; Deep learning; Class imbalance problem; Otoscopy; Eardrum; Artificial intelligence; Otology; Computer-aided diagnosis
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft other than for review purposes.