SCAAT: Improving Neural Network Interpretability via Saliency Constrained Adaptive Adversarial Training

Xu, Rui; Qin, Wenkang; Huang, Peixiang; Wang, Hao; Luo, Lin

arXiv:2311.05143 (cs)

[Submitted on 9 Nov 2023 (v1), last revised 10 Nov 2023 (this version, v2)]

Abstract: Deep Neural Networks (DNNs) are expected to provide explanations that help users understand their black-box predictions. A saliency map is a common form of explanation that presents feature attributions as a heatmap, but it suffers from noise that makes important features hard to distinguish. In this paper, we propose a model-agnostic learning method called Saliency Constrained Adaptive Adversarial Training (SCAAT) to improve the quality of such DNN interpretability. By constructing adversarial samples under the guidance of the saliency map, SCAAT eliminates most of this noise and makes saliency maps sparser and more faithful without any modification to the model architecture. We apply SCAAT to multiple DNNs and evaluate the quality of the generated saliency maps on various natural and pathological image datasets. Evaluations across domains and metrics show that SCAAT significantly improves the interpretability of DNNs by providing more faithful saliency maps without sacrificing predictive power.
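
The abstract does not spell out how adversarial samples are built "under the guidance of saliency map", so the following is only an illustrative sketch of one possible reading, not the authors' algorithm. It assumes a toy logistic-regression model, a plain input-gradient saliency map, and an FGSM-style sign step applied only to the least salient features. The names `saliency_guided_perturbation`, `eps`, and `low_fraction` are hypothetical and do not come from the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def input_gradient(w, b, x, y):
    """Gradient of the binary cross-entropy loss with respect to the input x
    for a logistic-regression model p = sigmoid(w . x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = sigmoid(z)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def saliency_guided_perturbation(w, b, x, y, eps=0.1, low_fraction=0.5):
    """Perturb only the least salient features (assumed scheme, for illustration).

    Saliency is taken to be the absolute input gradient. The perturbation is a
    sign-gradient step of size eps, masked to the low-saliency indices.
    """
    grad = input_gradient(w, b, x, y)
    saliency = [abs(g) for g in grad]
    k = int(low_fraction * len(x))
    # Indices of the k features with the smallest saliency.
    low_idx = set(sorted(range(len(x)), key=lambda i: saliency[i])[:k])
    x_adv = [
        xi + eps * sign(g) if i in low_idx else xi
        for i, (xi, g) in enumerate(zip(x, grad))
    ]
    return x_adv, saliency, sorted(low_idx)

# Toy example: two strongly weighted features and two weakly weighted ones.
w = [2.0, -1.0, 0.1, 0.05]
b = 0.0
x = [1.0, 0.5, -0.3, 0.8]
y = 1.0
x_adv, saliency, low_idx = saliency_guided_perturbation(w, b, x, y)
print(low_idx)  # indices that were perturbed
print(x_adv)
```

In an adversarial training loop, samples such as `x_adv` would be fed back to the model alongside the clean inputs. How SCAAT selects features, sizes the perturbation, and adapts it during training is specified in the paper itself, not in this sketch.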

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2311.05143 [cs.CV]
	(or arXiv:2311.05143v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2311.05143

Submission history

From: Rui Xu
[v1] Thu, 9 Nov 2023 04:48:38 UTC (17,553 KB)
[v2] Fri, 10 Nov 2023 08:53:57 UTC (17,554 KB)
