TranSegPGD: Improving Transferability of Adversarial Examples on Semantic Segmentation

Xiaojun Jia$^{1}$, **dong Gu$^{2}$, Yihao Huang$^{1}$, Simeng Qin$^{3}$, Qing Guo$^{4}$, Yang Liu$^{1}$, Xiaochun Cao$^{5,\ddagger}$

$^{1}$ Nanyang Technological University, Singapore
$^{2}$ University of Oxford, UK
$^{3}$ Yanshan University, China
$^{4}$ CFAR and IHPC, Agency for Science, Technology and Research (A*STAR), Singapore
$^{5}$ Shenzhen Campus of Sun Yat-sen University, China

Abstract

The transferability of adversarial examples, which makes black-box attacks possible, has been systematically explored for image classification. However, the transferability of adversarial examples on semantic segmentation has been largely overlooked. In this paper, we propose an effective two-stage adversarial attack strategy, dubbed TranSegPGD, to improve the transferability of adversarial examples on semantic segmentation. Specifically, at the first stage, the pixels of an input image are divided into branches based on their adversarial property. The branches are assigned different weights during optimization to improve the adversarial performance of all pixels: we assign high weights to the loss of the hard-to-attack pixels so that all pixels are misclassified. At the second stage, the pixels are divided into branches based on their transferable property, which is measured by the Kullback-Leibler divergence. The branches are again assigned different weights during optimization: we assign high weights to the loss of the high-transferability pixels to improve the transferability of the adversarial examples. Extensive experiments with various segmentation models on the PASCAL VOC 2012 and Cityscapes datasets demonstrate the effectiveness of the proposed method, which achieves state-of-the-art performance.

1 Introduction

Figure 1: Visualization of clean images, adversarial examples, and segmentation predictions. PSPNet-Res50 is used as the source model and PSPNet-Res101 is used as the target model. The adversarial example generated by the proposed method transfers better to other segmentation models. More figures are presented in the supplementary material.

A series of research works have indicated that deep learning methods [29] are vulnerable to adversarial examples [15, 26, 19], which are generated by adding well-designed, imperceptible perturbations to benign samples. Recent works use adversarial examples to study adversarial robustness in many fields [2, 27, 5, 4, 41], such as speech recognition, image compression, and image generation. In particular, many researchers focus on image classification and generate adversarial examples to attack classification models from different perspectives [51, 55, 23, 6]. More importantly, several works have indicated that adversarial examples generated on a specific white-box classification model (the source model) can also fool other black-box classification models (target models), a property known as the transferability of adversarial examples [42, 3, 43, 40]. Transferability has attracted significant research attention because it enables practical black-box attacks. Existing works improve the transferability of adversarial examples on image classification from the perspectives of data augmentation [54, 31], optimization methods [30, 35], and loss functions [53, 50].

Although the transferability of adversarial examples generated on image classification tasks has been comprehensively explored [18, 49], the transferability of adversarial examples on semantic segmentation, which can be regarded as an extension of image classification, has rarely been studied. Semantic segmentation classifies every individual pixel of the input image. Segmentation models have a wide range of real-world applications, such as medical image segmentation. Hence, recent works [1, 44, 14, 21, 12] pay much attention to the adversarial robustness of image segmentation models. However, previous works [48, 17] mainly focus on improving the success rate of adversarial examples on white-box models and ignore transferability. Some works [20] have even found that adversarial examples generated on one segmentation model transfer poorly to other segmentation models. As shown in Figure 1, previous segmentation attacks [48, 17] achieve limited transferability.

To improve the transferability of adversarial examples on semantic segmentation, we propose an effective two-stage adversarial attack strategy, dubbed TranSegPGD. As shown in Figure 2, we divide the generation of adversarial examples on semantic segmentation into two stages. At the first stage, we divide the pixels of the input image into branches based on their adversarial property and assign different weights to the branches in the loss function. To misclassify all pixels of the input image, we assign high weights to the loss of the hard-to-attack pixels. The second stage is motivated by related works on out-of-distribution examples [34, 37, 56], which show that well-trained models struggle to classify out-of-distribution examples. This suggests that adversarial examples whose distribution lies farther from that of the original clean examples could have higher transferability. Generalized to semantic segmentation, adversarial pixels that lie farther from the original clean pixels could have higher transferability, so they should be assigned high weights during the second stage. Hence, at the second stage, we first compute the Kullback-Leibler (KL) divergence, which measures the distance between two distributions, between each pixel of the generated adversarial image and the corresponding pixel of the original clean image. We then divide the pixels of the adversarial image into branches based on their transferable property and assign high weights to the loss of the high-transferability pixels to improve adversarial transferability. As shown in Figure 1, the proposed method achieves better transferability than previous segmentation attack methods.

Our main contributions are in three aspects:

• We propose an effective two-stage adversarial attack strategy to improve the transferability of adversarial examples on semantic segmentation, dubbed TranSegPGD.

• We also propose a dynamic attack step size to increase the diversity of generated adversarial examples, thus boosting the adversarial transferability.

• Experiments and analyses across various network architectures and datasets are conducted to demonstrate the effectiveness of the proposed method. The proposed adversarial attack method can achieve state-of-the-art performance.

2 Related Work

Adversarial attack on image classification: for a given image classification network $f_{\boldsymbol{\theta}}(\cdot)$ with model parameters $\boldsymbol{\theta}$, input data $\mathbf{x}$, and the corresponding ground-truth label $\mathbf{y}$, adversarial attack methods [15, 33, 32, 25] maximize the loss function $\mathcal{L}(f_{\boldsymbol{\theta}}(\mathbf{x}),\mathbf{y})$ to generate adversarial perturbations $\boldsymbol{\delta}$. In detail, Goodfellow et al. propose the Fast Gradient Sign Method (FGSM) [15], an efficient gradient-based adversarial attack method, to generate adversarial examples. Madry et al. use Projected Gradient Descent (PGD) [32], a multi-step iteration of FGSM, for adversarial example generation. Several works improve the attack performance of adversarial examples from multiple perspectives. Besides, recent works pay attention to improving the transferability of adversarial examples, i.e., the ability of adversarial examples generated on a white-box model to attack another black-box model. Specifically, a series of works adopt data-augmentation-based adversarial attack methods [43, 39, 24] to improve adversarial transferability. For example, Xie et al. [46] propose to perform I-FGSM with the Diverse Inputs Method (DI-FGSM) to increase adversarial transferability. Dong et al. [10] propose to combine I-FGSM with the Translation Invariance Method (TI-FGSM) to enhance transferability. Some works [38, 47] boost the transferability of adversarial examples with optimization-based methods. Dong et al. [9] use the Momentum Iterative Fast Gradient Sign Method (MI-FGSM), a classic adversarial attack method, to improve transferability. Lin et al. [31] propose the Nesterov Iterative Fast Gradient Sign Method (NI-FGSM), which performs I-FGSM with Nesterov Accelerated Gradient (NAG), to boost transferability.
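
The momentum update behind MI-FGSM can be sketched in a few lines of NumPy. This is a minimal illustration rather than the reference implementation of [9]: `grad_fn` is a hypothetical callable that returns the loss gradient with respect to the input, the decay factor `mu`, the $[0,1]$ pixel range, and the toy quadratic loss are all assumptions chosen for the example.

```python
import numpy as np

def mi_fgsm(x, grad_fn, eps=8 / 255, alpha=2 / 255, steps=20, mu=1.0):
    """Momentum iterative FGSM sketch under an l_inf budget.

    grad_fn(x_adv) is a hypothetical callable returning dL/dx_adv.
    """
    x_adv = x.copy()
    g = np.zeros_like(x)  # accumulated (momentum) gradient
    for _ in range(steps):
        grad = grad_fn(x_adv)
        # Normalise by the l1 norm so the momentum term is scale-free.
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)
        x_adv = x_adv + alpha * np.sign(g)
        # Project back into the eps-ball around x, then into [0, 1]
        # (assumed pixel range).
        x_adv = np.clip(x_adv, x - eps, x + eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv

# Toy loss L(x) = 0.5 * ||x - target||^2, so dL/dx = x - target.
x = np.full((2, 2, 3), 0.5)
target = np.zeros_like(x)
x_adv = mi_fgsm(x, lambda z: z - target)
```

In this toy setting the gradient sign never changes, so every pixel ends on the boundary of the $\epsilon$-ball; with a real network the accumulated term `g` smooths out sign flips between iterations.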

Adversarial attack on semantic segmentation: previous works adopt adversarial attack methods to evaluate the robustness of semantic segmentation models. In detail, Arnab et al. [1] use FGSM and PGD to study the adversarial robustness of semantic segmentation models. Some works [22, 13] attack semantic segmentation models by generating universal adversarial perturbations. Recently, some researchers [17] have focused on generating adversarial examples for semantic segmentation more efficiently. For example, Gu et al. [17] find that wrongly classified pixels, which dominate the generation of adversarial examples, cause an imbalanced gradient contribution and thus limit attack performance. They propose an efficient adversarial attack method for semantic segmentation that assigns high weights to the loss of the correctly classified pixels to reduce the impact of the wrongly classified ones. They further use this attack in adversarial training, which focuses on improving model robustness rather than adversarial transferability. Moreover, a series of works [45, 16, 20] indicate that adversarial examples generated on semantic segmentation models easily overfit the source model, which makes it hard for them to fool the target model. To improve the transferability of adversarial examples on semantic segmentation, we propose a two-stage adversarial attack strategy.

3 The Proposed Method

We propose an effective two-stage adversarial attack strategy. In this section, we first introduce the pipeline of the proposed method. We then introduce the first-stage adversarial example generation strategy. Finally, we present the second-stage adversarial example generation strategy, which improves transferability.

3.1 The pipeline of the proposed method

The pipeline of the proposed method is shown in Fig. 2. The proposed method divides the generation process of adversarial examples into two stages. During the first stage, the proposed method aims to improve the adversarial performance of adversarial examples on semantic segmentation. During the second stage, the proposed method aims to improve the adversarial transferability of adversarial examples on semantic segmentation.

For a given benign input image $\mathbf{x}\in\mathbb{R}^{H\times W\times C}$ and the corresponding segmentation label $\mathbf{y}\in\mathbb{R}^{H\times W\times M}$, a semantic segmentation model $f^{seg}_{\boldsymbol{\theta}}(\cdot)$ with model parameters $\boldsymbol{\theta}$ classifies every individual pixel, $f^{seg}_{\boldsymbol{\theta}}(\mathbf{x})\in\mathbb{R}^{H\times W\times M}$, where $H\times W$ is the image size, $C$ is the number of channels of the input image, and $M$ is the number of classes. The objective of adversarial attacks on semantic segmentation is to generate an adversarial example that fools the segmentation model into misclassifying each pixel of the input image. Previous works mainly pay attention to adversarial performance on the source segmentation model, which is used to generate the adversarial examples, and ignore the performance on the target model, which is not accessible to attackers. In this paper, we focus not only on the adversarial performance on the source model but also on that on the target model, that is, on how to improve the transferability of adversarial examples. Specifically, we propose an effective two-stage adversarial attack strategy: the first stage improves the adversarial performance of the generated adversarial examples, and the second stage boosts their transferability.

Algorithm 1 Two-Stage Adversarial Attack Strategy

Input: the semantic segmentation model $f^{seg}_{\boldsymbol{\theta}}(\cdot)$, the benign image $\mathbf{x}$, the corresponding label $\mathbf{y}$, the image size $H\times W$, the maximal perturbation $\epsilon$, the step size $\alpha$, and the iteration number $N$

1: $\mathbf{x}_{adv}^{0}=\mathbf{x}+\mathcal{U}(-\epsilon,+\epsilon)$ ▷ Initialization of the adversarial example
2: for $t=0,\dots,N-1$ do
3:     $P=f^{seg}_{\boldsymbol{\theta}}(\mathbf{x}_{adv}^{t})$ ▷ Segmentation result on the adversarial example
4:     $P_{T},P_{F}\leftarrow P$ ▷ The first stage of pixel division
5:     if $P_{T}\neq\varnothing$ then
6:         $\mathcal{L}\left(f^{seg}_{\boldsymbol{\theta}}\left(\mathbf{x}_{adv}^{t}\right),\mathbf{y}\right)=\frac{1-\gamma}{H\times W}\sum_{i\in P_{T}}\mathcal{L}_{i}\left(f^{seg}_{\boldsymbol{\theta}}\left(\mathbf{x}_{adv}^{t}\right),\mathbf{y}\right)+\frac{\gamma}{H\times W}\sum_{j\in P_{F}}\mathcal{L}_{j}\left(f^{seg}_{\boldsymbol{\theta}}\left(\mathbf{x}_{adv}^{t}\right),\mathbf{y}\right)$ ▷ Loss calculation, Eq. (2)
7:     else
8:         $D_{KL}(\mathbf{x}_{adv})^{i}=\sum_{j=1}^{n}\sigma(f^{seg}_{\boldsymbol{\theta}}(\mathbf{x}_{adv})^{i})_{j}\log\frac{\sigma(f^{seg}_{\boldsymbol{\theta}}(\mathbf{x}_{adv})^{i})_{j}}{\sigma(f^{seg}_{\boldsymbol{\theta}}(\mathbf{x})^{i})_{j}}$ ▷ KL distance of the adversarial example, Eq. (3)
9:         $P_{L},P_{H}\leftarrow P$ ▷ The second stage of pixel division
10:        $\mathcal{L}\left(f^{seg}_{\boldsymbol{\theta}}\left(\mathbf{x}_{adv}^{t}\right),\mathbf{y}\right)=\frac{1-\beta}{H\times W}\sum_{i\in P_{H}}\mathcal{L}_{i}\left(f^{seg}_{\boldsymbol{\theta}}\left(\mathbf{x}_{adv}^{t}\right),\mathbf{y}\right)+\frac{\beta}{H\times W}\sum_{j\in P_{L}}\mathcal{L}_{j}\left(f^{seg}_{\boldsymbol{\theta}}\left(\mathbf{x}_{adv}^{t}\right),\mathbf{y}\right)$ ▷ Loss calculation, Eq. (4)
11:    end if
12:    $\mathbf{x}_{adv}^{t+1}=\prod_{[-\epsilon,\epsilon]}\left[\mathbf{x}_{adv}^{t}+\alpha\cdot\operatorname{sign}\left(\nabla_{\mathbf{x}_{adv}^{t}}\mathcal{L}\left(f^{seg}_{\boldsymbol{\theta}}\left(\mathbf{x}_{adv}^{t}\right),\mathbf{y}\right)\right)\right]$ ▷ Generation of adversarial examples, Eq. (1)
13: end for

3.2 The first-stage adversarial attack strategy

During the first stage, the goal is to generate the adversarial example $\mathbf{x}_{adv}$ to misclassify each pixel of the input image as soon as possible. Previous works mainly adopt a classic adversarial attack method PGD to generate adversarial examples for semantic segmentation. It can be calculated as:

$$\mathbf{x}_{adv}^{t+1}=\prod_{[-\epsilon,\epsilon]}\left[\mathbf{x}_{adv}^{t}+\alpha\cdot\operatorname{sign}\left(\nabla_{\mathbf{x}_{adv}^{t}}\mathcal{L}\left(f^{seg}_{\boldsymbol{\theta}}\left(\mathbf{x}_{adv}^{t}\right),\mathbf{y}\right)\right)\right], \tag{1}$$

where $\mathbf{x}_{adv}^{t+1}$ represents the generated adversarial example after the $(t+1)$-th step, $\epsilon$ represents the maximum perturbation strength, $\alpha$ represents the attack step size, and $\mathcal{L}\left(f^{seg}_{\boldsymbol{\theta}}\left(\mathbf{x}_{adv}^{t}\right),\mathbf{y}\right)$ represents the cross-entropy loss. However, this approach assigns equal importance to every pixel, leading to a situation where misclassified pixels dominate the adversarial example generation process, thus limiting the adversarial performance of PGD on semantic segmentation.
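
A single update of Eq. (1) can be sketched as follows. This is a minimal NumPy illustration, assuming pixel values normalised to $[0,1]$; the array `grad` is a random stand-in for the loss gradient that a real segmentation model would provide, and `pgd_step` is a hypothetical helper name.

```python
import numpy as np

def pgd_step(x_adv, x_clean, grad, eps=8 / 255, alpha=2 / 255):
    """One update of Eq. (1): signed ascent step, then projection."""
    x_next = x_adv + alpha * np.sign(grad)
    # Projection onto the l_inf ball of radius eps centred at the clean image.
    x_next = np.clip(x_next, x_clean - eps, x_clean + eps)
    # Assumption: pixel values are normalised to [0, 1].
    return np.clip(x_next, 0.0, 1.0)

rng = np.random.default_rng(0)
x_clean = rng.uniform(0.2, 0.8, size=(4, 4, 3))
# Random start inside the eps-ball.
x_adv = x_clean + rng.uniform(-8 / 255, 8 / 255, size=x_clean.shape)
grad = rng.normal(size=x_clean.shape)  # stand-in for the loss gradient
for _ in range(20):
    x_adv = pgd_step(x_adv, x_clean, grad)
```

Because the stand-in gradient is held fixed, the iterate settles on the corner of the $\epsilon$-ball selected by `np.sign(grad)`; the projection guarantees $\|\mathbf{x}_{adv}-\mathbf{x}\|_{\infty}\leq\epsilon$ at every step.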

To generate adversarial examples more effectively during the first stage, following the previous work [17], we divide all pixels into two branches based on prediction correctness, i.e., the correctly classified pixels $P_{T}$ and the incorrectly classified pixels $P_{F}$. The loss function can be formulated as:

$$\begin{aligned}\mathcal{L}\left(f^{seg}_{\boldsymbol{\theta}}\left(\mathbf{x}_{adv}^{t}\right),\mathbf{y}\right)&=\frac{1-\gamma}{H\times W}\sum_{i\in P_{T}}\mathcal{L}_{i}\left(f^{seg}_{\boldsymbol{\theta}}\left(\mathbf{x}_{adv}^{t}\right),\mathbf{y}\right)\\&\quad+\frac{\gamma}{H\times W}\sum_{j\in P_{F}}\mathcal{L}_{j}\left(f^{seg}_{\boldsymbol{\theta}}\left(\mathbf{x}_{adv}^{t}\right),\mathbf{y}\right),\end{aligned} \tag{2}$$

where $\mathcal{L}_{i}$ represents the cross-entropy loss of the $i$ -th pixel for semantic segmentation, and $\gamma$ represents a hyper-parameter. Although this method can effectively improve the adversarial performance of adversarial examples, it has limited improvement in adversarial transferability. Specifically, when all pixels are misclassified, the first-stage attack method still treats all pixels equally and ignores the different transferable properties of different pixels.
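
The weighting in Eq. (2) can be sketched in NumPy as follows. The helper names (`softmax`, `stage1_loss`), the random logits, and the value $\gamma=0.3$ are illustrative assumptions; note that any $\gamma<0.5$ puts more weight on the pixels that are still classified correctly.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def stage1_loss(logits, labels, gamma):
    """Eq. (2): weighted cross-entropy over P_T and P_F.

    logits: (H, W, M) model outputs; labels: (H, W) integer class ids.
    """
    h, w, _ = logits.shape
    probs = softmax(logits)
    # Per-pixel cross-entropy L_i.
    ce = -np.log(np.take_along_axis(probs, labels[..., None], axis=-1)[..., 0])
    correct = logits.argmax(axis=-1) == labels  # P_T mask; ~correct is P_F
    loss_t = ce[correct].sum()
    loss_f = ce[~correct].sum()
    return ((1 - gamma) * loss_t + gamma * loss_f) / (h * w)

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 4, 3))        # toy outputs, M = 3 classes
labels = rng.integers(0, 3, size=(4, 4))   # toy ground truth
loss = stage1_loss(logits, labels, gamma=0.3)  # gamma value is arbitrary
```

With $\gamma=0.5$ both branches receive the same weight and the expression reduces to half of the mean per-pixel cross-entropy, i.e. standard PGD up to a constant factor.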

Source model	Attack	Target: PSPNet-Res50	Target: PSPNet-Res101	Target: DeepLabV3-Res50	Target: DeepLabV3-Res101
(none)	Clean images	78.55	79.11	78.17	80.55
PSPNet-Res50	PGD	4.60	36.91	5.05	38.59
PSPNet-Res50	SegPGD	2.09	36.76	2.42	38.31
PSPNet-Res50	TranSegPGD (Ours)	1.55	34.57	2.38	36.74
PSPNet-Res101	PGD	31.67	2.88	31.18	5.42
PSPNet-Res101	SegPGD	30.44	1.36	29.97	3.59
PSPNet-Res101	TranSegPGD (Ours)	29.06	1.08	27.88	3.19
DeepLabV3-Res50	PGD	4.04	32.49	3.72	33.81
DeepLabV3-Res50	SegPGD	2.38	31.43	1.63	33.25
DeepLabV3-Res50	TranSegPGD (Ours)	2.10	30.59	1.55	30.97
DeepLabV3-Res101	PGD	31.23	4.84	30.67	3.48
DeepLabV3-Res101	SegPGD	30.62	2.89	30.15	1.58
DeepLabV3-Res101	TranSegPGD (Ours)	29.93	2.73	29.14	1.14

Table 1: Transferring adversarial examples generated on source segmentation models to target models on PASCAL VOC 2012. We present the mIoU (%) of target models on adversarial examples and corresponding clean images. For each source model, three adversarial attack methods (PGD, SegPGD, and our proposed attack) are used to generate adversarial examples. A lower mIoU of the target models on the adversarial examples indicates that the generated adversarial examples transfer better.

3.3 The second-stage adversarial attack strategy

During the second stage, the goal is to boost the transferability of the adversarial examples generated in the first stage, so that they fool not only the source model $f^{seg}_{\boldsymbol{\theta}_{s}}$ but also the target model $f^{seg}_{\boldsymbol{\theta}_{t}}$. Motivated by previous works on out-of-distribution examples, we note that well-trained models find it hard to classify out-of-distribution examples. Several works have shown that adversarial examples that lie farther from the original clean examples in distribution could have higher adversarial transferability. Image segmentation is an extension of image classification: each pixel of the input image in the segmentation task can be considered a sample in the classification task. Hence, generalized to semantic segmentation, adversarial pixels that lie farther from the original clean pixels could also have higher adversarial transferability. Thus, we compute the Kullback-Leibler divergence between the model outputs on the generated adversarial pixels $\mathbf{x}_{adv}$ and on the corresponding benign pixels $\mathbf{x}$. It can be calculated as:

$$D_{KL}(\mathbf{x}_{adv})^{i}=\sum_{j=1}^{n}\sigma(f^{seg}_{\boldsymbol{\theta}}(\mathbf{x}_{adv})^{i})_{j}\log\frac{\sigma(f^{seg}_{\boldsymbol{\theta}}(\mathbf{x}_{adv})^{i})_{j}}{\sigma(f^{seg}_{\boldsymbol{\theta}}(\mathbf{x})^{i})_{j}}, \tag{3}$$

where $D_{KL}(\mathbf{x}_{adv})^{i}$ represents the KL distance between the model output on the $i$ -th adversarial pixel and the model output on the corresponding clean pixel, $n$ represents the output dimension of the segmentation model, and $\sigma$ represents the softmax operation. Then we calculate the mean KL distance between all adversarial pixels and clean pixels, which can be formulated as $D_{KL}(\mathbf{x}_{adv})^{mean}$ . We adopt the mean KL distance to divide the adversarial pixel into different branches, i.e., the high-transferability adversarial pixels $P_{H}$ and low-transferability adversarial pixels $P_{L}$ . In detail, if the KL distance of the $i$ -th pixel $D_{KL}(\mathbf{x}_{adv})^{i}$ is greater than the KL mean distance $D_{KL}(\mathbf{x}_{adv})^{mean}$ , then the $i$ -th adversarial pixel belongs to the high-transferability adversarial pixels $P_{H}$ , otherwise it belongs to the low-transferability adversarial pixels $P_{L}$ .
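
The per-pixel KL map of Eq. (3) and the mean-threshold division can be sketched as below. This is an illustrative NumPy version: the function names, the small constant `tiny` added for numerical safety, and the toy logits are assumptions, not part of the method description.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_per_pixel(logits_adv, logits_clean, tiny=1e-12):
    """Eq. (3): KL(softmax(adv) || softmax(clean)) for every pixel."""
    p = softmax(logits_adv)
    q = softmax(logits_clean)
    return np.sum(p * (np.log(p + tiny) - np.log(q + tiny)), axis=-1)

def split_by_transferability(logits_adv, logits_clean):
    """Boolean masks (P_H, P_L) from thresholding the KL map at its mean."""
    kl = kl_per_pixel(logits_adv, logits_clean)
    high = kl > kl.mean()
    return high, ~high

rng = np.random.default_rng(0)
logits_clean = rng.normal(size=(4, 4, 3))
logits_adv = logits_clean.copy()
logits_adv[0, 0] += np.array([5.0, 0.0, -5.0])  # shift one pixel's output
p_high, p_low = split_by_transferability(logits_adv, logits_clean)
```

In this toy input only the pixel at position (0, 0) has a non-zero KL value, so it is the only member of $P_{H}$; all other pixels fall into $P_{L}$.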

Hence, the loss function at the second stage can be formulated as:

$$\begin{aligned}\mathcal{L}\left(f^{seg}_{\boldsymbol{\theta}}\left(\mathbf{x}_{adv}^{t}\right),\mathbf{y}\right)&=\frac{1-\beta}{H\times W}\sum_{i\in P_{H}}\mathcal{L}_{i}\left(f^{seg}_{\boldsymbol{\theta}}\left(\mathbf{x}_{adv}^{t}\right),\mathbf{y}\right)\\&\quad+\frac{\beta}{H\times W}\sum_{j\in P_{L}}\mathcal{L}_{j}\left(f^{seg}_{\boldsymbol{\theta}}\left(\mathbf{x}_{adv}^{t}\right),\mathbf{y}\right),\end{aligned} \tag{4}$$

where $\beta$ represents a hyper-parameter at the second stage, which controls the allocation of weights. Based on the first-stage and second-stage adversarial attack strategies, we can establish our two-stage adversarial attack method to generate adversarial examples for semantic segmentation. The algorithm of the proposed method is summarized in Algorithm 1.
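
To make the control flow of the two stages concrete, the following NumPy sketch runs the whole loop on a toy model. Everything about the model is an assumption made for illustration: `toy_model` is a hypothetical stand-in (a circular 5-point blur followed by a per-pixel linear classifier) whose input gradient can be written by hand, the weights $\gamma=\beta=0.3$ are arbitrary, pixel values are assumed to lie in $[0,1]$, and the step size is kept fixed. Following the prose of Sections 3.2 and 3.3, the sketch switches to the second stage once no correctly classified pixel remains.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def blur(a):
    """Circular 5-point average: linear and self-adjoint, mimics a receptive field."""
    return (a + np.roll(a, 1, 0) + np.roll(a, -1, 0)
            + np.roll(a, 1, 1) + np.roll(a, -1, 1)) / 5.0

def toy_model(x, w_cls):
    """Hypothetical segmentation model: (H, W, C) image -> (H, W, M) logits."""
    return blur(x) @ w_cls

def two_stage_attack(x, y, w_cls, eps=8 / 255, alpha=2 / 255, steps=20,
                     gamma=0.3, beta=0.3, seed=0):
    rng = np.random.default_rng(seed)
    h, w, _ = x.shape
    onehot = np.eye(w_cls.shape[1])[y]
    p_clean = softmax(toy_model(x, w_cls))
    x_adv = np.clip(x + rng.uniform(-eps, eps, size=x.shape), 0.0, 1.0)
    stages = []
    for _ in range(steps):
        p = softmax(toy_model(x_adv, w_cls))
        correct = p.argmax(-1) == y                    # P_T
        if correct.any():                              # stage 1, Eq. (2)
            weights = np.where(correct, 1 - gamma, gamma)
            stages.append(1)
        else:                                          # stage 2, Eqs. (3)-(4)
            kl = np.sum(p * (np.log(p + 1e-12) - np.log(p_clean + 1e-12)), -1)
            weights = np.where(kl > kl.mean(), 1 - beta, beta)
            stages.append(2)
        # Gradient of the weighted cross-entropy w.r.t. the logits
        # (the branch masks are treated as constants) ...
        g_logits = weights[..., None] * (p - onehot) / (h * w)
        # ... back-propagated through the linear layer and the blur.
        grad = blur(g_logits @ w_cls.T)
        x_adv = np.clip(x_adv + alpha * np.sign(grad), x - eps, x + eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv, stages

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=(8, 8, 3))
w_cls = rng.normal(scale=20.0, size=(3, 4))
y = toy_model(x, w_cls).argmax(-1)  # clean predictions serve as labels
x_adv, stages = two_stage_attack(x, y, w_cls)
```

The list `stages` records which loss was used at each iteration. Because neighbouring pixels share inputs through the blur, the branch weights change the direction of the signed gradient and not only its magnitude, which is the property the weighting relies on in a real network.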

4 Experiments

Method	PSPNet-Res50	PSPNet-Res101	DeepLabV3-Res50	DeepLabV3-Res101	FCNs-VGG	Segformer	Segmenter
Clean	78.55	79.11	78.17	80.55	69.10	77.19	78.5
MI-FGSM	5.46	29.71	5.85	32.63	36.88	43.61	54.53
MI-FGSM-ours	2.09	26.42	2.62	28.61	34.62	42.60	53.76
TI-FGSM	4.89	35.43	7.08	39.60	35.62	47.30	55.42
TI-FGSM-ours	2.19	35.06	5.12	38.76	34.88	47.15	54.72
NI-FGSM	5.59	30.18	5.96	32.67	37.02	43.82	54.50
NI-FGSM-ours	2.17	26.35	2.77	29.02	34.69	42.85	53.96

Table 2: Transferring adversarial examples generated on source segmentation models to target models on PASCAL VOC 2012. We present the mIoU (%) of target models on adversarial examples and corresponding clean images. PSPNet-Res50 is used as the source model. A lower mIoU of the target models on the adversarial examples indicates that the generated adversarial examples transfer better.

4.1 Settings

Following previous works, we conduct experiments on two widely used semantic segmentation datasets: PASCAL VOC 2012 (VOC) [11] and Cityscapes (CS) [8]. The VOC dataset consists of 20 object classes and 1 background class. It has 1,464 training images, 1,449 validation images, and 1,456 testing images. The Cityscapes dataset consists of urban street scene images with high-quality pixel-level annotations. It has 19 classes with 2,975 training images, 500 validation images, and 1,525 testing images. As for the semantic segmentation models, we use FCN8s-VGG16 [36], FCN16s-VGG16 [36], PSPNet-Res50 [52], PSPNet-Res101 [52], DeepLabV3-Res50 [7], and DeepLabV3-Res101 [7] for adversarial example generation and performance evaluation. As baseline attacks, we adopt the popular PGD and the advanced SegPGD. We also compare the proposed method with popular transferable attack methods from image classification, namely TI-FGSM, MI-FGSM, and NI-FGSM. All comparison experiments are conducted under the $l_{\infty}$-norm. Specifically, we set the maximum perturbation strength $\epsilon$ to $8/255$, the attack step size $\alpha$ to $2/255$, and the number of attack iterations to 20. The mean Intersection over Union (mIoU) is used as the metric to evaluate adversarial performance.
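
For reference, the mIoU metric can be sketched as follows. This is a simplified per-image version written for illustration; benchmark implementations usually accumulate the intersections and unions over the whole evaluation set, and the tiny label maps below are made up for the example.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """mIoU over the classes that appear in the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps; skip it
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))

target = np.array([[0, 0, 1],
                   [1, 2, 2]])
pred = np.array([[0, 1, 1],
                 [1, 2, 0]])
score = mean_iou(pred, target, num_classes=3)
```

Here the three per-class IoU values are 1/3, 2/3, and 1/2, so `score` is 0.5. A successful attack drives this value down, which is why lower numbers in the tables indicate stronger attacks.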

Method	PSPNet-Res50	PSPNet-Res101	DeepLabV3-Res50	DeepLabV3-Res101	Bisenet	Segformer
Clean	74.20	76.04	74.06	76.05	75.16	81.08
MI-FGSM	1.06	16.64	1.56	21.99	41.39	45.86
MI-FGSM-ours	0.32	13.20	1.36	18.56	37.85	43.95
TI-FGSM	1.08	14.25	1.84	21.53	34.72	49.20
TI-FGSM-ours	0.11	12.19	1.67	19.49	32.66	47.72
NI-FGSM	1.07	16.70	1.38	21.87	40.16	45.85
NI-FGSM-ours	0.34	13.01	1.33	18.39	37.9	43.97

Table 3: Transferring adversarial examples generated on source segmentation models to target models on Cityscapes. We present the mIoU (%) of target models on adversarial examples and corresponding clean images. PSPNet-Res50 is used as the source model. A lower mIoU of the target models on the adversarial examples indicates that the generated adversarial examples transfer better.

4.2 Comparisons with other adversarial attack methods on semantic segmentation

We compare the proposed method with the previous popular PGD and advanced SegPGD to evaluate adversarial transferability. In detail, we adopt PSPNet-Res50, PSPNet-Res101, DeepLabV3-Res50, and DeepLabV3-Res101 as the source models to generate adversarial examples on VOC. The results are shown in Table 1. Analyses are as follows. First, the proposed method outperforms other adversarial attack methods under all attack scenarios.

In particular, compared with the popular PGD, the proposed method not only improves the adversarial performance on the source model but also boosts the transferability to the target models. For example, with PSPNet-Res50 as the source model, the proposed method lowers the mIoU of the source model from 4.60% (PGD) to 1.55%, an improvement of about 3.05 percentage points, and lowers the mIoU of the target PSPNet-Res101 from 36.91% to 34.57%, an improvement of about 2.34 points. Besides, the proposed method also achieves the best adversarial transferability on the other target models. We attribute the improvements to assigning different weights to different branches of the input image. Second, compared with the advanced SegPGD, the proposed method also achieves better adversarial transferability under all attack scenarios, even though there is always a trade-off between adversarial performance and transferability. Compared with PGD, SegPGD achieves only a limited improvement in transferability, whereas the proposed method yields a larger and consistent improvement. We attribute this to the proposed second-stage adversarial attack strategy, which assigns different weights to branches with different transferability. For example, with DeepLabV3-Res50 as the source model, the proposed method improves on the transferability of SegPGD to DeepLabV3-Res101 by about 2.28 points (from 33.25% to 30.97%).
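
The improvements quoted above are absolute differences in mIoU and can be reproduced directly from the corresponding entries of Table 1. The short script below is only a convenience check; the dictionary layout and the helper name `miou_drop` are our own choices for this illustration.

```python
# mIoU (%) copied from Table 1, keyed by (source model, target model).
TABLE1 = {
    ("PSPNet-Res50", "PSPNet-Res50"):
        {"PGD": 4.60, "SegPGD": 2.09, "TranSegPGD": 1.55},
    ("PSPNet-Res50", "PSPNet-Res101"):
        {"PGD": 36.91, "SegPGD": 36.76, "TranSegPGD": 34.57},
    ("DeepLabV3-Res50", "DeepLabV3-Res101"):
        {"PGD": 33.81, "SegPGD": 33.25, "TranSegPGD": 30.97},
}

def miou_drop(source, target, baseline, method="TranSegPGD"):
    """Absolute mIoU reduction (percentage points) of `method` over `baseline`."""
    row = TABLE1[(source, target)]
    return round(row[baseline] - row[method], 2)

print(miou_drop("PSPNet-Res50", "PSPNet-Res50", "PGD"))            # 3.05
print(miou_drop("PSPNet-Res50", "PSPNet-Res101", "PGD"))           # 2.34
print(miou_drop("DeepLabV3-Res50", "DeepLabV3-Res101", "SegPGD"))  # 2.28
```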

Moreover, we visualize the adversarial examples and the predictions of other semantic segmentation models on them. In detail, we adopt PSPNet-Res101 as the source model to generate adversarial examples and DeepLabV3-Res101, DeepLabV3-Res50, and PSPNet-Res50 as the target models to evaluate adversarial transferability. The result is shown in Figure 3. The adversarial examples generated by PGD, SegPGD, and the proposed method all successfully fool the source model. Adversarial examples generated by PGD and SegPGD do not significantly affect the output of the target models, whereas those generated by the proposed method fool the target models, which demonstrates the effectiveness of the proposed method in improving adversarial transferability for semantic segmentation.

4.3 Comparisons with other transferable attack methods on image classification

To further evaluate the effectiveness of TranSegPGD, we first generalize popular transferable attack methods from image classification, namely MI-FGSM, TI-FGSM, and NI-FGSM, to semantic segmentation and compare the proposed method with them. Our method can be combined with these transferable attack methods as a plug-and-play component to improve the transferability of adversarial examples on semantic segmentation; we denote the combinations as MI-FGSM-ours, TI-FGSM-ours, and NI-FGSM-ours.

For PASCAL VOC 2012, we use PSPNet-Res50 as the source model to generate adversarial examples and use PSPNet-Res101, DeepLabV3-Res50, DeepLabV3-Res101, FCNs-VGG, Segformer, and Segmenter as the target models to evaluate adversarial transferability. The results are shown in Table 2. The three combined attacks achieve better adversarial transferability than their base attacks under all attack scenarios. For example, with DeepLabV3-Res101 as the target model, MI-FGSM leaves the target with an mIoU of 32.63%, whereas MI-FGSM-ours lowers it to 28.61%, an improvement of about 4.02 points. TI-FGSM leaves the target with an mIoU of 39.60%, whereas TI-FGSM-ours lowers it to 38.76%, an improvement of about 0.84 points. NI-FGSM leaves the target with an mIoU of 32.67%, whereas NI-FGSM-ours lowers it to 29.02%, an improvement of about 3.65 points. These results indicate that TranSegPGD consistently improves the transferability of adversarial examples.

For Cityscapes, we also adopt PSPNet-Res50 as the source model to generate adversarial examples and adopt PSPNet-Res101, DeepLabV3-Res50, DeepLabV3-Res101, Bisenet, and Segformer as the target models to evaluate adversarial transferability. The results are shown in Table 3. We observe a trend similar to that on PASCAL VOC 2012: the proposed method boosts the adversarial transferability of the base attacks under all attack scenarios. For example, with PSPNet-Res101 as the target model, MI-FGSM leaves the target with an mIoU of 16.64%, while MI-FGSM-ours lowers it to 13.20%, an improvement of about 3.44 points. TI-FGSM leaves the target with an mIoU of 14.25%, while TI-FGSM-ours lowers it to 12.19%, an improvement of about 2.06 points. NI-FGSM leaves the target with an mIoU of 16.70%, while NI-FGSM-ours lowers it to 13.01%, an improvement of about 3.69 points. The experimental results indicate that the proposed method can further boost the transferability of adversarial examples.

4.4 Transfer attacks on the Segment Anything Model

More and more works adopt foundation models for semantic segmentation, among which the Segment Anything Model (SAM) [28] stands out. We therefore also transfer the adversarial examples to SAM to evaluate the effectiveness of the proposed method on VOC. In detail, we adopt PSPNet-Res50 as the source model to generate adversarial examples and use MI-FGSM, TI-FGSM, and NI-FGSM as the baseline methods. The results are shown in Figure 4. It can be observed that the proposed method can significantly improve the transferability of adversarial examples to SAM. Visualization results and the experimental results on Cityscapes are provided in the supplementary material.

4.5 Ablation Study

The proposed method is a two-stage adversarial attack strategy for boosting the transferability of adversarial examples on semantic segmentation. The first stage is designed to generate adversarial examples effectively, and the second stage is designed to improve their transferability. To validate the effectiveness of each stage, we conduct ablation experiments on PASCAL VOC 2012. Specifically, we adopt PSPNet-Res50 as the source model for adversarial example generation, and use PSPNet-Res101 and DeepLabV3-Res101 to validate the transferability of the generated adversarial examples. The results are shown in Table 4. Analyses are summarized as follows.

First, when only the first-stage strategy is incorporated, the adversarial performance on the source model improves significantly (the mIoU of PSPNet-Res50 drops from 4.60% to 2.80%), while the performance on the target models improves only slightly. When only the second-stage strategy is incorporated, the adversarial performance on the source model improves slightly (4.60% to 3.93%), while the performance on the target models improves more noticeably, e.g., the mIoU of DeepLabV3-Res101 drops from 38.59% to 37.22%. This indicates that the first stage contributes more to the adversarial performance, whereas the second stage contributes more to the adversarial transferability. Second, using both strategies achieves the best adversarial performance on the source model (1.55%) and the best transferability (34.57% and 36.74% on the two target models), which suggests that the two strategies are complementary and that their integration yields the best overall performance.

| First Stage | Second Stage | PSPNet-Res50 | PSPNet-Res101 | DeepLabV3-Res101 |
| --- | --- | --- | --- | --- |
| | | 4.60 | 36.91 | 38.59 |
| ✔ | | 2.80 | 36.38 | 38.03 |
| | ✔ | 3.93 | 36.15 | 37.22 |
| ✔ | ✔ | 1.55 | 34.57 | 36.74 |

Table 4: Ablation study of the proposed method. The mIoU(%) of segmentation models on the adversarial examples is reported. PSPNet-Res50 is used as the source model. PSPNet-Res101 and DeepLabV3-Res101 are used as the target models.
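To make the ablation easier to read, the sketch below computes how much each setting lowers the mIoU relative to the setting without either stage. The values are copied from Table 4. The mapping of rows to settings ("first only", "second only", "both") is inferred from the accompanying analysis, and the names `table4` and `drop_vs_baseline` are illustrative.

```python
# mIoU (%) on adversarial examples, copied from Table 4.
# The row-to-setting mapping is an assumption inferred from the analysis text.
table4 = {
    "baseline":    {"PSPNet-Res50": 4.60, "PSPNet-Res101": 36.91, "DeepLabV3-Res101": 38.59},
    "first only":  {"PSPNet-Res50": 2.80, "PSPNet-Res101": 36.38, "DeepLabV3-Res101": 38.03},
    "second only": {"PSPNet-Res50": 3.93, "PSPNet-Res101": 36.15, "DeepLabV3-Res101": 37.22},
    "both":        {"PSPNet-Res50": 1.55, "PSPNet-Res101": 34.57, "DeepLabV3-Res101": 36.74},
}


def drop_vs_baseline(setting):
    """mIoU reduction (percentage points) of a setting relative to the baseline row."""
    base = table4["baseline"]
    return {model: round(base[model] - miou, 2) for model, miou in table4[setting].items()}


for setting in ("first only", "second only", "both"):
    print(setting, drop_vs_baseline(setting))
```

Under this reading, the first stage alone gives the larger drop on the source model, the second stage alone gives the larger drop on both target models, and combining them gives the largest drop everywhere.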

5 Conclusion

In this paper, we focus on improving the transferability of adversarial examples on semantic segmentation, which has been largely overlooked by previous works. We propose an effective two-stage adversarial attack strategy to boost transferability, dubbed TranSegPGD. At the first stage, we assign each pixel in an input image to a branch according to its adversarial property, and give the branches distinct weights during optimization to enhance the adversarial performance of all pixels. At the second stage, we assign each pixel to a branch according to its transferable property, which is determined by Kullback-Leibler divergence, and give the branches distinct weights to boost transferability. In particular, we assign high weights to the loss of pixels with high transferability. Extensive experiments across diverse segmentation models on the PASCAL VOC 2012 and Cityscapes datasets validate the efficacy of our method. The experimental results show that TranSegPGD achieves state-of-the-art performance.
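The two branch-weighted losses summarized above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and it rests on several assumptions: "hard-to-attack" pixels are taken to be those the model still classifies correctly, the transferable property is approximated by thresholding the KL divergence between clean and adversarial predictions, and the names `hard_weight`, `high_weight`, and `kl_threshold` are hypothetical parameters. The exact formulation is the one given in the method section.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    return -math.log(max(probs[label], 1e-12))


def kl_divergence(p, q):
    return sum(pi * math.log(max(pi, 1e-12) / max(qi, 1e-12)) for pi, qi in zip(p, q))


def stage1_loss(adv_logits, labels, hard_weight=0.8):
    """Stage 1 (assumed form): up-weight pixels that are still classified correctly."""
    total = 0.0
    for logits, y in zip(adv_logits, labels):
        probs = softmax(logits)
        still_correct = probs.index(max(probs)) == y  # hard-to-attack branch
        w = hard_weight if still_correct else 1.0 - hard_weight
        total += w * cross_entropy(probs, y)
    return total / len(labels)


def stage2_loss(clean_logits, adv_logits, labels, kl_threshold=0.5, high_weight=0.8):
    """Stage 2 (assumed form): up-weight pixels whose KL divergence is high."""
    total = 0.0
    for clean, adv, y in zip(clean_logits, adv_logits, labels):
        p_clean, p_adv = softmax(clean), softmax(adv)
        transferable = kl_divergence(p_clean, p_adv) >= kl_threshold
        w = high_weight if transferable else 1.0 - high_weight
        total += w * cross_entropy(p_adv, y)
    return total / len(labels)


# Two pixels, two classes: pixel 0 is still correct, pixel 1 is already fooled.
labels = [0, 0]
clean_logits = [[2.0, 0.0], [2.0, 0.0]]
adv_logits = [[2.0, 0.0], [0.0, 2.0]]
print(stage1_loss(adv_logits, labels), stage2_loss(clean_logits, adv_logits, labels))
```

In an actual attack, these losses would be maximized with respect to the perturbation by a gradient-based method such as MI-FGSM, which requires an automatic differentiation framework and is omitted here.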

References

Arnab et al. [2018] Anurag Arnab, Ondrej Miksik, and Philip HS Torr. On the robustness of semantic segmentation models to adversarial attacks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 888–897, 2018.
Bai et al. [2019] Yang Bai, Yan Feng, Yisen Wang, Tao Dai, Shu-Tao Xia, and Yong Jiang. Hilbert-based generative defense for adversarial examples. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4784–4793, 2019.
Bai et al. [2020] Yang Bai, Yuyuan Zeng, Yong Jiang, Yisen Wang, Shu-Tao Xia, and Weiwei Guo. Improving query efficiency of black-box adversarial attack. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXV 16, pages 101–116. Springer, 2020.
Bai et al. [2021a] Yang Bai, Xin Yan, Yong Jiang, Shu-Tao Xia, and Yisen Wang. Clustering effect of (linearized) adversarial robust models. arXiv preprint arXiv:2111.12922, 2021a.
Bai et al. [2021b] Yang Bai, Yuyuan Zeng, Yong Jiang, Shu-Tao Xia, Xingjun Ma, and Yisen Wang. Improving adversarial robustness via channel-wise activation suppressing. arXiv preprint arXiv:2103.08307, 2021b.
Bai et al. [2023] Yang Bai, Yisen Wang, Yuyuan Zeng, Yong Jiang, and Shu-Tao Xia. Query efficient black-box adversarial attack on deep neural networks. Pattern Recognition, 133:109037, 2023.
Chen et al. [2017] Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587, 2017.
Cordts et al. [2016] Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3213–3223, 2016.
Dong et al. [2018] Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and Jianguo Li. Boosting adversarial attacks with momentum. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 9185–9193, 2018.
Dong et al. [2019] Yinpeng Dong, Tianyu Pang, Hang Su, and Jun Zhu. Evading defenses to transferable adversarial examples by translation-invariant attacks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4312–4321, 2019.
Everingham et al. [2010] Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The pascal visual object classes (voc) challenge. International journal of computer vision, 88:303–338, 2010.
Fischer et al. [2021] Marc Fischer, Maximilian Baader, and Martin Vechev. Scalable certified segmentation via randomized smoothing. In International Conference on Machine Learning, pages 3340–3351. PMLR, 2021.
Fischer et al. [2017] Volker Fischer, Mummadi Chaithanya Kumar, Jan Hendrik Metzen, and Thomas Brox. Adversarial examples for semantic image segmentation. arXiv preprint arXiv:1703.01101, 2017.
Frangi et al. [2018] Alejandro F Frangi, Sailesh Conjeti, Christos Davatzikos, Nassir Navab, Julia A Schnabel, Carlos Alberola-López, Gabor Fichtinger, Magdalini Paschali, and Fernando Navarro. Generalizability vs. robustness: Investigating medical imaging networks using adversarial examples. In International Conference on Medical Image Computing and Computer-Assisted Intervention, number DZNE-2022-01068. Image Analysis, 2018.
Goodfellow et al. [2015] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.
Gu et al. [2021] Jindong Gu, Hengshuang Zhao, Volker Tresp, and Philip Torr. Adversarial examples on segmentation models can be easy to transfer. arXiv preprint arXiv:2111.11368, 2021.
Gu et al. [2022] Jindong Gu, Hengshuang Zhao, Volker Tresp, and Philip HS Torr. Segpgd: An effective and efficient adversarial attack for evaluating and boosting segmentation robustness. In European Conference on Computer Vision, pages 308–325. Springer, 2022.
Gu et al. [2023] Jindong Gu, Xiaojun Jia, Pau de Jorge, Wenqian Yu, Xinwei Liu, Avery Ma, Yuan Xun, Anjun Hu, Ashkan Khakzar, Zhijiang Li, et al. A survey on transferability of adversarial examples across deep neural networks. arXiv preprint arXiv:2310.17626, 2023.
He et al. [2023a] Bangyan He, Jian Liu, Yiming Li, Siyuan Liang, Jingzhi Li, Xiaojun Jia, and Xiaochun Cao. Generating transferable 3d adversarial point cloud via random perturbation factorization. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 764–772, 2023a.
He et al. [2023b] Mengqi He, Jing Zhang, Zhaoyuan Yang, Mingyi He, Nick Barnes, and Yuchao Dai. Transferable attack for semantic segmentation. arXiv preprint arXiv:2307.16572, 2023b.
He et al. [2019] Xiang He, Sibei Yang, Guanbin Li, Haofeng Li, Huiyou Chang, and Yizhou Yu. Non-local context encoder: Robust biomedical image segmentation against adversarial attacks. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 8417–8424, 2019.
Hendrik Metzen et al. [2017] Jan Hendrik Metzen, Mummadi Chaithanya Kumar, Thomas Brox, and Volker Fischer. Universal adversarial perturbations against semantic image segmentation. In Proceedings of the IEEE international conference on computer vision, pages 2755–2764, 2017.
Huang et al. [2023a] Lifeng Huang, Chengying Gao, and Ning Liu. Erosion attack: Harnessing corruption to improve adversarial examples. IEEE Transactions on Image Processing, 2023a.
Huang and Kong [2021] Yi Huang and Adams Wai-Kin Kong. Transferable adversarial attack based on integrated gradients. In International Conference on Learning Representations, 2021.
Huang et al. [2023b] Yihao Huang, Yue Cao, Tianlin Li, Felix Juefei-Xu, Di Lin, Ivor W Tsang, Yang Liu, and Qing Guo. On the robustness of segment anything. arXiv preprint arXiv:2305.16220, 2023b.
Jia et al. [2020] Xiaojun Jia, Xingxing Wei, Xiaochun Cao, and Xiaoguang Han. Adv-watermark: A novel watermark perturbation for adversarial examples. In Proceedings of the 28th ACM International Conference on Multimedia, pages 1579–1587, 2020.
Jia et al. [2022] Xiaojun Jia, Yong Zhang, Baoyuan Wu, Ke Ma, Jue Wang, and Xiaochun Cao. Las-at: adversarial training with learnable attack strategy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13398–13408, 2022.
Kirillov et al. [2023] Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment anything. arXiv preprint arXiv:2304.02643, 2023.
LeCun et al. [2015] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
Li et al. [2020] Maosen Li, Cheng Deng, Tengjiao Li, Junchi Yan, Xinbo Gao, and Heng Huang. Towards transferable targeted attack. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 641–649, 2020.
Lin et al. [2019] Jiadong Lin, Chuanbiao Song, Kun He, Liwei Wang, and John E Hopcroft. Nesterov accelerated gradient and scale invariance for adversarial attacks. In International Conference on Learning Representations, 2019.
Madry et al. [2018] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.
Moosavi-Dezfooli et al. [2016] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2574–2582, 2016.
Nagarajan et al. [2020] Vaishnavh Nagarajan, Anders Andreassen, and Behnam Neyshabur. Understanding the failure modes of out-of-distribution generalization. In International Conference on Learning Representations, 2020.
Qin et al. [2022] Zeyu Qin, Yanbo Fan, Yi Liu, Li Shen, Yong Zhang, Jue Wang, and Baoyuan Wu. Boosting the transferability of adversarial attacks with reverse adversarial perturbation. Advances in Neural Information Processing Systems, 35:29845–29858, 2022.
Shelhamer et al. [2017] Evan Shelhamer, Jonathan Long, and Trevor Darrell. Fully convolutional networks for semantic segmentation. IEEE transactions on pattern analysis and machine intelligence, 39(4):640–651, 2017.
Wald et al. [2021] Yoav Wald, Amir Feder, Daniel Greenfeld, and Uri Shalit. On calibration and out-of-domain generalization. Advances in neural information processing systems, 34:2215–2227, 2021.
Wang and He [2021] Xiaosen Wang and Kun He. Enhancing the transferability of adversarial attacks through variance tuning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1924–1933, 2021.
Wang et al. [2021] Xiaosen Wang, Xuanran He, Jingdong Wang, and Kun He. Admix: Enhancing the transferability of adversarial attacks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 16158–16167, 2021.
Wang and Farnia [2023] Yilin Wang and Farzan Farnia. On the role of generalization in transferability of adversarial examples. In Uncertainty in Artificial Intelligence, pages 2259–2270. PMLR, 2023.
Wang et al. [2023] Zhaoxin Wang, Handing Wang, Cong Tian, and Yaochu Jin. Adversarial training of deep neural networks guided by texture and structural information. In Proceedings of the 31st ACM International Conference on Multimedia, pages 4958–4967, 2023.
Wu and Zhu [2020] Lei Wu and Zhanxing Zhu. Towards understanding and improving the transferability of adversarial examples in deep neural networks. In Asian Conference on Machine Learning, pages 837–850. PMLR, 2020.
Wu et al. [2021] Weibin Wu, Yuxin Su, Michael R Lyu, and Irwin King. Improving the transferability of adversarial samples with adversarial transformations. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9024–9033, 2021.
Xiao et al. [2018] Chaowei Xiao, Ruizhi Deng, Bo Li, Fisher Yu, Mingyan Liu, and Dawn Song. Characterizing adversarial examples based on spatial consistency information for semantic segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), pages 217–234, 2018.
Xie et al. [2017] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Yuyin Zhou, Lingxi Xie, and Alan Yuille. Adversarial examples for semantic segmentation and object detection. In Proceedings of the IEEE international conference on computer vision, pages 1369–1378, 2017.
Xie et al. [2019] Cihang Xie, Zhishuai Zhang, Yuyin Zhou, Song Bai, Jianyu Wang, Zhou Ren, and Alan L Yuille. Improving transferability of adversarial examples with input diversity. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2730–2739, 2019.
Xiong et al. [2022] Yifeng Xiong, Jiadong Lin, Min Zhang, John E Hopcroft, and Kun He. Stochastic variance reduced ensemble adversarial attack for boosting the adversarial transferability. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14983–14992, 2022.
Xu et al. [2021] Xiaogang Xu, Hengshuang Zhao, and Jiaya Jia. Dynamic divide-and-conquer adversarial training for robust semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7486–7495, 2021.
Yu et al. [2023] Wenqian Yu, Jindong Gu, Zhijiang Li, and Philip Torr. Reliable evaluation of adversarial transferability. arXiv preprint arXiv:2306.08565, 2023.
Zhang et al. [2022] Chaoning Zhang, Philipp Benz, Adil Karjauv, Jae Won Cho, Kang Zhang, and In So Kweon. Investigating top-k white-box and transferable black-box attack. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15085–15094, 2022.
Zhang et al. [2023] Jianping Zhang, Jen-tse Huang, Wenxuan Wang, Yichen Li, Weibin Wu, Xiaosen Wang, Yuxin Su, and Michael R Lyu. Improving the transferability of adversarial samples by path-augmented method. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8173–8182, 2023.
Zhao et al. [2017] Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. Pyramid scene parsing network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2881–2890, 2017.
Zhao et al. [2021] Zhengyu Zhao, Zhuoran Liu, and Martha Larson. On success and simplicity: A second look at transferable targeted attacks. Advances in Neural Information Processing Systems, 34:6115–6128, 2021.
Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2921–2929, 2016.
Zhu et al. [2023] Hegui Zhu, Haoran Zheng, Ying Zhu, and Xiaoyan Sui. Boosting the transferability of adversarial attacks with adaptive points selecting in temporal neighborhood. Information Sciences, 641:119081, 2023.
Zhu et al. [2022] Yao Zhu, Yuefeng Chen, Xiaodan Li, Kejiang Chen, Yuan He, Xiang Tian, Bolun Zheng, Yaowu Chen, and Qingming Huang. Toward understanding and boosting adversarial transferability from a distribution perspective. IEEE Transactions on Image Processing, 31:6487–6501, 2022.