Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Aug 12, 2020
Date Accepted: Dec 6, 2020

The final, peer-reviewed published version of this preprint can be found here:

Hidden Variables in Deep Learning Digital Pathology and Their Potential to Cause Batch Effects: Prediction Model Study

Schmitt M, Maron RC, Hekler A, Stenzinger A, Hauschild A, Weichenthal M, Tiemann M, Krahl D, Kutzner H, Utikal JS, Haferkamp S, Kather JN, Klauschen F, Krieghoff-Henning E, Fröhling S, von Kalle C, Brinker TJ

J Med Internet Res 2021;23(2):e23436

DOI: 10.2196/23436

PMID: 33528370

PMCID: 7886613

Hidden Variables in Deep Learning Digital Pathology and their Potential to Cause Batch Effects: Technical Model Study

  • Max Schmitt
  • Roman Christoph Maron
  • Achim Hekler
  • Albrecht Stenzinger
  • Axel Hauschild
  • Michael Weichenthal
  • Markus Tiemann
  • Dieter Krahl
  • Heinz Kutzner
  • Jochen Sven Utikal
  • Sebastian Haferkamp
  • Jakob Nikolas Kather
  • Frederick Klauschen
  • Eva Krieghoff-Henning
  • Stefan Fröhling
  • Christof von Kalle
  • Titus Josef Brinker

ABSTRACT

Background:

An increasing number of studies in digital pathology show the potential of artificial intelligence (AI) to diagnose cancer from histological whole-slide images, a task that requires large and diverse datasets. While diversification may result in more generalizable AI-based systems, it can also introduce hidden variables. If neural networks are able to learn these hidden variables, the variables can introduce batch effects that compromise the accuracy of classification systems.

Objective:

To analyze the learnability of an exemplary selection of hidden variables (patient age, slide preparation date, slide origin, scanner type) which are commonly found in digital pathology whole-slide datasets and could create batch effects.

Methods:

We trained four separate convolutional neural networks (CNNs), one per variable, on a dataset of digitized whole-slide melanoma images from five different institutes. For robustness, each CNN training and evaluation run was repeated multiple times, and a variable was considered learnable only if the lower bound of the 95% confidence interval of its mean balanced accuracy was above 50.0%.
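The learnability criterion can be illustrated with a minimal sketch. It assumes a binary task (so that chance-level balanced accuracy is 50%) and a normal-approximation confidence interval over the repeated runs; the abstract does not state how the interval was computed, so that choice, the function names, and the example scores are all hypothetical.

```python
import math
import statistics


def balanced_accuracy(y_true, y_pred):
    """Balanced accuracy: the unweighted mean of per-class recall."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        correct = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


def mean_ci95(values, z=1.96):
    """Mean and normal-approximation 95% CI of the mean.

    Assumption: the CI method is illustrative, not taken from the paper.
    """
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - z * sem, mean + z * sem


def is_learnable(run_scores, chance=0.5):
    """A variable counts as learnable if the CI lower bound exceeds chance."""
    _, lower, _ = mean_ci95(run_scores)
    return lower > chance


# Hypothetical balanced accuracies from five repeated runs of one task.
example_runs = [0.56, 0.57, 0.55, 0.56, 0.58]
print(mean_ci95(example_runs), is_learnable(example_runs))
```

Using the lower bound rather than the mean guards against declaring a variable learnable on the strength of a few lucky runs.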

Results:

Mean balanced accuracy exceeded 50.0% for all four tasks, even at the lower bound of the 95% confidence interval. Performance varied strongly between tasks, ranging from 56.1% (slide preparation date) to 100% (slide origin).

Conclusions:

As all analyzed hidden variables were learnable, they have the potential to create batch effects in dermatopathology datasets, which can negatively affect AI-based classification systems. Practitioners should be aware of these and similar pitfalls when developing and evaluating such systems, and should address these and other batch effect-causing variables in their datasets through sufficient dataset stratification.
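One simple way to check whether a dataset is stratified with respect to a hidden variable is to compare the share of a diagnostic class across the levels of that variable. The sketch below is an illustration only, not the authors' procedure; the function names, the class labels, and the scanner names are hypothetical.

```python
from collections import Counter, defaultdict


def class_share_by_group(labels, groups):
    """Return {group: {label: share of that label within the group}}."""
    counts = defaultdict(Counter)
    for label, group in zip(labels, groups):
        counts[group][label] += 1
    return {
        group: {label: n / sum(c.values()) for label, n in c.items()}
        for group, c in counts.items()
    }


def max_share_gap(labels, groups, positive):
    """Largest difference in the positive-class share between any two groups.

    A gap near 0 suggests the hidden variable is balanced across classes;
    a gap near 1 means the variable almost determines the class label.
    """
    shares = class_share_by_group(labels, groups)
    positive_shares = [s.get(positive, 0.0) for s in shares.values()]
    return max(positive_shares) - min(positive_shares)


# Hypothetical example: every melanoma slide was digitized on scanner A.
labels = ["melanoma", "melanoma", "nevus", "nevus"]
scanners = ["A", "A", "B", "B"]
print(max_share_gap(labels, scanners, positive="melanoma"))
```

A large gap indicates that a classifier could reach high accuracy by recognizing the hidden variable rather than the pathology, which is the batch effect described above.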



© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.