Fooling Network Interpretation in Image Classification

Subramanya, Akshayvarun; Pillai, Vipin; Pirsiavash, Hamed

Computer Science > Computer Vision and Pattern Recognition

arXiv:1812.02843 (cs)

[Submitted on 6 Dec 2018 (v1), last revised 24 Sep 2019 (this version, v2)]

Abstract: Deep neural networks have been shown to be fooled rather easily using adversarial attack algorithms. Practical methods such as adversarial patches have been shown to be extremely effective in causing misclassification. However, these patches are highlighted using standard network interpretation algorithms, thus revealing the identity of the adversary. We show that it is possible to create adversarial patches which not only fool the prediction, but also change what we interpret regarding the cause of the prediction. Moreover, we introduce our attack as a controlled setting to measure the accuracy of interpretation algorithms. We show this using extensive experiments for Grad-CAM interpretation that transfers to occluding patch interpretation as well. We believe our algorithms can facilitate developing more robust network interpretation tools that truly explain the network's underlying decision-making process.
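
To make the abstract's central idea concrete, the following is a minimal sketch of how a Grad-CAM heatmap is formed and of one plausible way to quantify how strongly it highlights a patch region. It is not the authors' implementation: the function names `grad_cam` and `patch_energy`, the sum-to-one normalization, the toy 4x4 feature maps, and the use of plain Python lists are all assumptions made for illustration. The weighting step (average each channel's gradient, take the weighted sum of feature maps, apply ReLU) follows the standard Grad-CAM formulation.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap from per-channel feature maps and their gradients.

    Both arguments are lists of K channels, each an H x W list of floats.
    `gradients` holds d(class score)/d(activation) for the same layer.
    """
    h, w = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for act, grad in zip(activations, gradients):
        # Channel weight: global average of the gradient over spatial positions.
        alpha = sum(map(sum, grad)) / (h * w)
        for i in range(h):
            for j in range(w):
                cam[i][j] += alpha * act[i][j]
    # ReLU keeps only evidence that supports the class.
    cam = [[max(v, 0.0) for v in row] for row in cam]
    # Assumption of this sketch: normalize so the heatmap sums to 1.
    total = sum(map(sum, cam))
    if total > 0:
        cam = [[v / total for v in row] for row in cam]
    return cam


def patch_energy(cam, patch_mask):
    """Share of heatmap mass inside the patch (hypothetical metric)."""
    return sum(c * m for crow, mrow in zip(cam, patch_mask)
               for c, m in zip(crow, mrow))


# Toy data: channel 0 fires on the top-left 2x2 "patch", channel 1 elsewhere.
acts = [
    [[1.0 if i < 2 and j < 2 else 0.0 for j in range(4)] for i in range(4)],
    [[1.0 if i >= 2 and j >= 2 else 0.0 for j in range(4)] for i in range(4)],
]
ones = [[1.0] * 4 for _ in range(4)]
zeros = [[0.0] * 4 for _ in range(4)]
mask = acts[0]  # the patch occupies the top-left 2x2 block

# Prediction driven by the patch channel: the heatmap reveals the patch.
revealed = patch_energy(grad_cam(acts, [ones, zeros]), mask)
# Prediction driven by the other channel: the heatmap ignores the patch.
hidden = patch_energy(grad_cam(acts, [zeros, ones]), mask)
print(revealed, hidden)  # 1.0 0.0
```

Under these assumptions, an attack of the kind the abstract describes would aim for a patch that changes the predicted class while keeping a quantity like `patch_energy` low, so that the interpretation no longer points at the patch.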

Comments:	Accepted at ICCV 2019
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:1812.02843 [cs.CV]
	(or arXiv:1812.02843v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1812.02843

Submission history

From: Akshayvarun Subramanya
[v1] Thu, 6 Dec 2018 22:53:53 UTC (9,611 KB)
[v2] Tue, 24 Sep 2019 23:48:28 UTC (9,407 KB)