Rethinking Robustness of Model Attributions

Kamath, Sandesh; Mittal, Sankalp; Deshpande, Amit; Balasubramanian, Vineeth N

arXiv:2312.10534 (cs)

[Submitted on 16 Dec 2023]

Abstract: For machine learning models to be reliable and trustworthy, their decisions must be interpretable. As these models find increasing use in safety-critical applications, it is important that not only the model predictions but also their explanations (as feature attributions) be robust to small, human-imperceptible input perturbations. Recent works have shown that many attribution methods are fragile and have proposed improvements to either these methods or the model training. We observe two main causes of fragile attributions: first, the existing metrics of robustness (e.g., top-k intersection) over-penalize even reasonable local shifts in attribution, thereby making random perturbations appear to be strong attacks; and second, the attribution can be concentrated in a small region even when there are multiple important parts in an image. To rectify this, we propose simple ways to strengthen existing metrics and attribution methods by incorporating locality of pixels in robustness metrics and diversity of pixel locations in attributions. Regarding the role of model training in attributional robustness, we empirically observe that adversarially trained models have more robust attributions on smaller datasets; however, this advantage disappears on larger datasets. Code is available at this https URL.
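
To illustrate the first observation, the following is a minimal sketch, not the authors' implementation, of how a plain top-k intersection can score a one-pixel shift in attribution as a complete mismatch, while a locality-aware variant does not. The function names, the neighborhood radius `r`, and the use of a square (Chebyshev) neighborhood are assumptions made for illustration only; the paper's exact metric definitions may differ.

```python
import numpy as np

def top_k_mask(attr, k):
    """Boolean mask marking the k largest entries of a 2-D attribution map."""
    flat = attr.ravel()
    idx = np.argpartition(flat, -k)[-k:]  # indices of the k largest values
    mask = np.zeros(flat.shape, dtype=bool)
    mask[idx] = True
    return mask.reshape(attr.shape)

def top_k_intersection(a, b, k):
    """Fraction of top-k pixels shared exactly by attribution maps a and b."""
    return np.logical_and(top_k_mask(a, k), top_k_mask(b, k)).sum() / k

def dilate(mask, r):
    """Mark every pixel within Chebyshev distance r of a True pixel."""
    h, w = mask.shape
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def local_top_k_intersection(a, b, k, r=1):
    """Hypothetical locality-aware variant: a top-k pixel of b counts as
    matched if it lies within distance r of some top-k pixel of a."""
    near_a = dilate(top_k_mask(a, k), r)
    return np.logical_and(near_a, top_k_mask(b, k)).sum() / k

# Toy example: two salient pixels, then the same map shifted right by one pixel.
a = np.zeros((8, 8))
a[2, 2], a[5, 5] = 1.0, 0.9
b = np.roll(a, 1, axis=1)

print(top_k_intersection(a, b, k=2))             # 0.0
print(local_top_k_intersection(a, b, k=2, r=1))  # 1.0
```

In this toy case the attribution has moved by a single pixel, yet the exact top-k intersection drops to zero; allowing a small neighborhood restores full agreement. This is one simple way to read "incorporating locality of pixels" and is offered only as an illustration of the idea.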

Comments:	Accepted AAAI 2024
Subjects:	Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2312.10534 [cs.LG]
	(or arXiv:2312.10534v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2312.10534

Submission history

From: Sandesh Kamath K
[v1] Sat, 16 Dec 2023 20:20:38 UTC (2,700 KB)
