Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Jan 18, 2026
Date Accepted: Mar 26, 2026
Deep Learning Algorithms versus Radiologists in Digital Breast Tomosynthesis for Breast Cancer Detection: A Systematic Review and Meta-Analysis
ABSTRACT
Background:
Deep learning (DL) algorithms based on digital breast tomosynthesis (DBT) have been increasingly developed, demonstrating emerging potential in enhancing lesion detection and classification.
Objective:
To compare the diagnostic performance of DBT-based DL algorithms with that of radiologists of varying experience, and to assess the clinical impact of DL assistance.
Methods:
A systematic search of PubMed, Embase, Web of Science, and Cochrane Library was conducted up to November 8, 2025. Included studies compared standalone DBT-based DL performance, radiologist interpretation alone, and DL-assisted diagnosis. Study quality was assessed using PROBAST+AI. Performance metrics were pooled using bivariate random-effects and generalized linear mixed models.
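As a rough illustration of what random-effects pooling of a diagnostic proportion involves, the sketch below pools per-study sensitivities on the logit scale with the DerSimonian-Laird estimator. This is a simplified univariate stand-in, not the bivariate or generalized linear mixed models named above, which model sensitivity and specificity jointly. The function name, the 0.5 continuity correction, and the input counts are assumptions made for illustration and are not taken from the included studies.

```python
import math


def inv_logit(x):
    """Map a logit value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def pool_proportions_dl(events, totals):
    """Univariate DerSimonian-Laird random-effects pooling on the logit scale.

    events: per-study counts (e.g. true positives)
    totals: per-study denominators (e.g. all cancer cases)
    Returns (pooled estimate, 95% CI lower, 95% CI upper, tau^2).
    """
    y, v = [], []
    for e, n in zip(events, totals):
        # Assumption: a 0.5 continuity correction is applied to every study
        a = e + 0.5
        b = n - e + 0.5
        y.append(math.log(a / b))      # logit of the corrected proportion
        v.append(1.0 / a + 1.0 / b)    # approximate within-study variance

    # Fixed-effect (inverse-variance) weights and Cochran's Q
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))

    # Between-study variance, truncated at zero
    k = len(y)
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights, pooled logit, and Wald-type 95% CI
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return inv_logit(mu), inv_logit(mu - 1.96 * se), inv_logit(mu + 1.96 * se), tau2


# Hypothetical counts, for illustration only
est, lo, hi, tau2 = pool_proportions_dl([45, 80, 30], [50, 100, 40])
print(f"pooled sensitivity {est:.2f} (95% CI {lo:.2f}-{hi:.2f}), tau^2 = {tau2:.3f}")
```

A bivariate model additionally estimates the correlation between logit sensitivity and logit specificity across studies, so separate univariate pooling as shown here should be read only as a first approximation.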
Results:
Thirteen studies with 38,565 patients were included in the final analysis. Standalone DL algorithms achieved a pooled sensitivity of 0.88 (95% CI: 0.80-0.93), specificity of 0.74 (95% CI: 0.59-0.85), and AUC of 0.89 (95% CI: 0.86-0.92). DL performance did not differ significantly from that of all radiologists (AUC 0.89 vs. 0.88) or senior radiologists (AUC 0.89 vs. 0.62), but DL showed significantly higher sensitivity than junior radiologists (0.88 vs. 0.76, P = 0.03). Notably, DL assistance did not significantly improve diagnostic metrics for radiologists at any experience level. Meta-regression identified validation methods as a significant source of heterogeneity.
Conclusions:
DBT-based DL algorithms exhibited diagnostic proficiency comparable to senior radiologists and superior sensitivity to junior radiologists, supporting their utility as adjunctive tools to enhance consistency and reduce oversight in less experienced settings. However, given that DL assistance did not significantly elevate human performance, current models act primarily as standardization aids. Future prospective, multimodal studies are warranted to validate these findings and optimize clinical integration.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.