Accepted for/Published in: JMIR Formative Research
Date Submitted: Mar 31, 2024
Date Accepted: Oct 30, 2024
Intersection of Performance, Interpretability, and Fairness in Neural Prototype Tree for Chest X-ray Pathology Detection: Algorithm Development and Validation Study
ABSTRACT
Background:
While deep learning classifiers have shown remarkable results in detecting chest x-ray (CXR) pathologies, their adoption in clinical settings is often hampered by a lack of transparency. To bridge this gap, this study introduces the Neural Prototype Tree (NPT), an interpretable image classifier that combines the diagnostic capability of deep learning models with the interpretability of decision trees for CXR pathology detection.
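The sketch below illustrates the core idea behind a prototype-tree node: a sample is routed softly toward a child node according to how similar its convolutional feature patches are to a learned prototype. This is a minimal, hedged illustration assuming a ProtoTree-style similarity and routing rule; the class name, shapes, and similarity function are illustrative and not the authors' implementation.

```python
# Illustrative sketch of a single prototype-tree internal node (ProtoTree-style soft routing).
# Names, tensor shapes, and the similarity function are assumptions for exposition only.
import torch
import torch.nn as nn


class PrototypeNode(nn.Module):
    """Internal node: routes a sample left/right by similarity to a learned prototype."""

    def __init__(self, channels: int):
        super().__init__()
        # Learned 1x1 prototype patch compared against CNN feature-map patches.
        self.prototype = nn.Parameter(torch.randn(1, channels, 1, 1))

    def forward(self, feature_map: torch.Tensor) -> torch.Tensor:
        # feature_map: (batch, channels, H, W) from a convolutional backbone.
        # Squared L2 distance between the prototype and every spatial location.
        dists = ((feature_map - self.prototype) ** 2).sum(dim=1)  # (batch, H, W)
        min_dist = dists.flatten(1).min(dim=1).values             # closest patch per sample
        # Similarity in (0, 1] acts as the probability of routing to the right child.
        return torch.exp(-min_dist)
```

A full tree composes such nodes, with class distributions at the leaves, so each prediction can be read as a sequence of prototype comparisons.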
Objective:
We first investigate the NPT classifier’s utility across three dimensions: performance, interpretability, and fairness, and subsequently examine the complex interactions among these dimensions. We also showcase both local and global explanations of the NPT classifier and discuss its potential utility in clinical settings.
Methods:
The study utilizes CXRs from the publicly available ChestX-ray14 dataset. We trained six separate classifiers for each CXR pathology: one baseline ResNet-152 classifier and five NPT classifiers with varying levels of interpretability. Performance, interpretability, and fairness were measured with the area under the receiver operating characteristic curve (ROC AUC), interpretation complexity (IC), and mean true positive rate (TPR) disparity, respectively. Linear regression analyses were performed to investigate the relationship between IC and ROC AUC, as well as between IC and mean TPR disparity.
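The following sketch shows how the evaluation described above could be computed. The exact definition of mean TPR disparity used by the authors is not given in the abstract; here it is assumed to be the mean absolute gap between each subgroup's TPR and the overall TPR, and the numeric values are purely hypothetical placeholders.

```python
# Hedged sketch of the evaluation metrics: ROC AUC, an assumed mean TPR disparity,
# and a linear regression relating interpretation complexity (IC) to performance.
import numpy as np
from scipy import stats
from sklearn.metrics import roc_auc_score


def true_positive_rate(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    positives = y_true == 1
    return float((y_pred[positives] == 1).mean()) if positives.any() else float("nan")


def mean_tpr_disparity(y_true: np.ndarray, y_pred: np.ndarray, groups: np.ndarray) -> float:
    """Assumed definition: mean absolute TPR gap between each subgroup and the overall TPR."""
    overall = true_positive_rate(y_true, y_pred)
    gaps = [abs(true_positive_rate(y_true[groups == g], y_pred[groups == g]) - overall)
            for g in np.unique(groups)]
    return float(np.mean(gaps))


# Hypothetical example values (not results from the study).
y_true = np.array([0, 1, 1, 0, 1, 0])
y_score = np.array([0.2, 0.8, 0.6, 0.3, 0.7, 0.4])
print("ROC AUC:", roc_auc_score(y_true, y_score))

ic = np.array([1, 3, 7, 15, 31])                 # IC of the five NPT classifiers
auc = np.array([0.70, 0.74, 0.77, 0.79, 0.80])   # hypothetical ROC AUC values
slope, intercept, r, p_value, se = stats.linregress(ic, auc)
print("Slope:", slope, "P value:", p_value)
```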
Results:
The NPT classifier with an IC of 31 achieves a competitive ROC AUC, comparable to that of ResNet-152, in detecting CXR pathologies. The NPT classifier with an IC of 1 exhibits the highest level of unfairness, with a mean TPR disparity of 0.057 for sex-based subgroups and 0.113 for age-based subgroups. The bias quantified by mean TPR disparity was more pronounced in age-based subgroups than in sex-based subgroups. A significant positive relationship between interpretability (i.e., IC) and performance (i.e., ROC AUC) was observed for all CXR pathologies (P<.001). Linear regression analysis indicated a significant negative relationship between interpretability and fairness (i.e., mean TPR disparity) across age and sex subgroups (P<.001).
Conclusions:
By illuminating the intricate relationship between performance, interpretability, and fairness of the NPT classifier, this research offers insightful perspectives that could guide future developments in effective, interpretable, and equitable deep learning classifiers for CXR pathology detection.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.