Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Aug 12, 2020
Date Accepted: Dec 6, 2020
Hidden Variables in Deep Learning Digital Pathology and their Potential to Cause Batch Effects: Technical Model Study
ABSTRACT
Background:
An increasing number of studies within digital pathology show the potential of artificial intelligence (AI) to diagnose cancer using histological whole-slide images, which requires large and diverse datasets. While diversification may result in more generalizable AI-based systems, it can also introduce hidden variables. If neural networks can learn to distinguish these hidden variables, the variables can introduce batch effects that compromise the accuracy of classification systems.
Objective:
To analyze the learnability of an exemplary selection of hidden variables (patient age, slide preparation date, slide origin, scanner type) which are commonly found in digital pathology whole-slide datasets and could create batch effects.
Methods:
We trained four separate convolutional neural networks (CNNs) to learn these four variables using a dataset of digitized whole-slide melanoma images from five different institutes. For robustness, each CNN training and evaluation run was repeated multiple times, and a variable was only considered learnable if the lower bound of its mean balanced accuracy 95% confidence interval was above 50.0%.
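The learnability criterion described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state how the confidence interval was computed, so the normal approximation (z = 1.96) is an assumption, as are the function names `balanced_accuracy`, `ci95`, and `is_learnable`. For a small number of repeated runs, a t-based interval would be wider.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; always predicting one class scores 1/num_classes."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    return mean(hits[c] / totals[c] for c in totals)


def ci95(scores, z=1.96):
    """Approximate 95% CI of the mean over repeated runs.

    Assumption: normal approximation; the source does not specify the method.
    """
    m = mean(scores)
    half_width = z * stdev(scores) / sqrt(len(scores))
    return m - half_width, m + half_width


def is_learnable(scores, chance=0.5):
    """A variable counts as learnable only if the CI lower bound exceeds chance."""
    lower, _ = ci95(scores)
    return lower > chance
```

With hypothetical per-run scores such as `[0.55, 0.57, 0.56, 0.58, 0.54]`, `is_learnable` returns `True` because the whole interval lies above 0.5, whereas a noisier set of runs with a similar mean can fail the criterion.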
Results:
Mean balanced accuracy exceeded 50.0% for all four tasks, even at the lower bound of the 95% confidence interval. Performance varied strongly between tasks, ranging from 56.1% (slide preparation date) to 100% (slide origin).
Conclusions:
As all analyzed hidden variables were learnable, they have the potential to create batch effects in dermatopathology datasets, which can negatively affect AI-based classification systems. Practitioners should be aware of these and similar pitfalls when developing and evaluating such systems, and should address these and other potential batch-effect-causing variables in their datasets through sufficient dataset stratification.
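One possible reading of "dataset stratification" is to split data so that every combination of diagnosis label and hidden variable is proportionally represented in both the training and test sets. The sketch below illustrates that idea under stated assumptions: the record layout, the field names `label` and `scanner`, and the function `stratified_split` are hypothetical and are not taken from the study.

```python
import random
from collections import defaultdict


def stratified_split(records, keys, test_fraction=0.2, seed=0):
    """Split records so each stratum (unique combination of `keys`) is
    represented in train and test in roughly the same proportion."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[tuple(rec[k] for k in keys)].append(rec)

    train, test = [], []
    for members in strata.values():
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Hypothetical slide records: 10 slides per (label, scanner) combination.
slides = [
    {"id": i, "label": label, "scanner": scanner}
    for i, (label, scanner) in enumerate(
        (l, s)
        for l in ("melanoma", "nevus")
        for s in ("scanner_A", "scanner_B")
        for _ in range(10)
    )
]
train_set, test_set = stratified_split(slides, keys=("label", "scanner"))
```

Stratifying on the joint key keeps a hidden variable such as scanner type from being correlated with the label in only one of the two sets; it does not remove the variable's influence within a stratum.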
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.