License: arXiv.org perpetual non-exclusive license
arXiv:2312.02207v1 [cs.CV] 03 Dec 2023

TranSegPGD: Improving Transferability of Adversarial Examples on Semantic Segmentation

Xiaojun Jia$^{1}$, **dong Gu$^{2}$, Yihao Huang$^{1}$, Simeng Qin$^{3}$, Qing Guo$^{4}$, Yang Liu$^{1}$, Xiaochun Cao$^{5,\ddagger}$
$^{1}$Nanyang Technological University, Singapore  $^{2}$University of Oxford, UK
$^{3}$Yanshan University, China
$^{4}$CFAR and IHPC, Agency for Science, Technology and Research (A*STAR), Singapore
$^{5}$Shenzhen Campus of Sun Yat-sen University, China
Abstract

The transferability of adversarial examples on image classification, which enables attacks in black-box mode, has been systematically explored. However, the transferability of adversarial examples on semantic segmentation has been largely overlooked. In this paper, we propose an effective two-stage adversarial attack strategy, dubbed TranSegPGD, to improve the transferability of adversarial examples on semantic segmentation. Specifically, at the first stage, every pixel in an input image is assigned to one of several branches based on its adversarial property. Different branches are assigned different weights for optimization to improve the adversarial performance of all pixels. We assign high weights to the loss of the hard-to-attack pixels so that all pixels are misclassified. At the second stage, the pixels are divided into different branches based on their transferable property, which is measured by the Kullback-Leibler divergence. Different branches are again assigned different weights for optimization: we assign high weights to the loss of the high-transferability pixels to improve the transferability of the adversarial examples. Extensive experiments with various segmentation models are conducted on the PASCAL VOC 2012 and Cityscapes datasets to demonstrate the effectiveness of the proposed method. The proposed adversarial attack method can achieve state-of-the-art performance.

1 Introduction

Figure 1: Visualization of Clean Images, Adversarial examples, and Segmentation Predictions. PSPNet-Res50 is used as the source model and PSPNet-Res101 is used as the target model. The adversarial example generated by using the proposed method transfers better to other segmentation models. More figures are presented in the supplementary material.
Figure 2: The proposed method consists of two-stage adversarial attack strategies. At the first stage, all pixels in the input image are divided into different branches based on their adversarial property. At the second stage, all pixels are divided into different branches based on their transferable property.

A series of research works have indicated that deep learning methods [29] are vulnerable to adversarial examples [15, 26, 19], which are generated by adding well-designed, imperceptible perturbations to benign samples. Recent works use adversarial examples to study adversarial robustness in many fields of research [2, 27, 5, 4, 41], such as speech recognition, image compression, and image generation. In particular, many researchers focus on the task of image classification and generate adversarial examples to attack classification models from different perspectives [51, 55, 23, 6]. More importantly, several works have indicated that adversarial examples generated on a specific white-box classification model (the source model) can also fool other black-box classification models (target models), a property known as the transferability of adversarial examples [42, 3, 43, 40]. Transferability has garnered significant research attention because it enables practical black-box attacks. In detail, existing works improve the transferability of adversarial examples on the image classification task from the perspectives of data augmentation [54, 31], optimization methods [30, 35], and loss functions [53, 50].

Although the transferability of adversarial examples generated on image classification tasks has been comprehensively explored [18, 49], the transferability of adversarial examples on semantic segmentation tasks, which can be regarded as an extension of the image classification task, has rarely been studied. In detail, image semantic segmentation aims to classify every individual pixel of the input image. Segmentation models have a wide range of real-world applications, such as medical image segmentation. Hence, recent works [1, 44, 14, 21, 12] pay much attention to the adversarial robustness of image segmentation models. However, previous works [48, 17] mainly focus on improving the success rate of adversarial examples on white-box models, while ignoring the improvement of transferability. Some works [20] have even found that it is hard for adversarial examples generated on one segmentation model to transfer to other segmentation models. As shown in Figure 1, previous segmentation attacks [48, 17] achieve limited transferable performance.

To improve the transferability of adversarial examples on semantic segmentation, we propose an effective two-stage adversarial attack strategy, dubbed TranSegPGD. As shown in Figure 2, we divide the entire generation process of adversarial examples on semantic segmentation into two stages. In detail, at the first stage, we divide the pixels of the input image into different branches based on their adversarial property. Then, we assign different weights to the different branches in the loss function for optimization to generate adversarial examples. To misclassify all pixels of the input image, we assign high weights to the loss of the hard-to-attack pixels. Related works on out-of-distribution examples [34, 37, 56] show that it is hard for well-trained models to classify out-of-distribution examples. This suggests that adversarial examples whose distribution is farther from that of the original clean examples could have higher transferability. Generalized to semantic segmentation tasks, the generated adversarial pixels that are distributed farther from the original clean pixels could have higher transferability. To improve the transferability of adversarial examples on semantic segmentation, these pixels need to be assigned high weights during the second stage of adversarial example generation. Hence, at the second stage, we first compute the Kullback-Leibler (KL) divergence, which can be used to measure the distance between two distributions, between each pixel in the generated adversarial image and its corresponding pixel in the original clean image. Then we divide the pixels of the generated adversarial image into different branches based on their transferable property. The loss of the high-transferability pixels is assigned high weights to improve adversarial transferability. As shown in Figure 1, the proposed method achieves better transferability than previous segmentation adversarial attack methods.

Our main contributions are in three aspects:

  • We propose an effective two-stage adversarial attack strategy to improve the transferability of adversarial examples on semantic segmentation, dubbed TranSegPGD.

  • We also propose a dynamic attack step size to increase the diversity of generated adversarial examples, thus boosting the adversarial transferability.

  • Experiments and analyses across various network architectures and datasets are conducted to demonstrate the effectiveness of the proposed method. The proposed adversarial attack method can achieve state-of-the-art performance.

2 Related Work

Adversarial attack on image classification: for a given image classification network $f_{\boldsymbol{\theta}}(\cdot)$ with model parameters $\boldsymbol{\theta}$, input data $\mathbf{x}$, and the corresponding ground-truth label $\mathbf{y}$, adversarial attack methods [15, 33, 32, 25] maximize the loss function $\mathcal{L}(f_{\boldsymbol{\theta}}(\mathbf{x}),\mathbf{y})$ to generate adversarial perturbations $\boldsymbol{\delta}$. In detail, Goodfellow et al. propose the Fast Gradient Sign Method (FGSM) [15], an efficient gradient-based adversarial attack method, to generate adversarial examples. Madry et al. use Projected Gradient Descent (PGD) [32], a multi-step iteration of FGSM, for adversarial example generation. Several works improve the attack performance of adversarial examples from multiple perspectives. Besides, recent works pay attention to improving the transferability of adversarial examples, i.e., the ability of adversarial examples generated on a white-box model to attack another black-box model. Specifically, a series of works adopt data augmentation-based adversarial attack methods [43, 39, 24] to improve adversarial transferability. For example, Xie et al. [46] propose to perform I-FGSM with the Diverse Inputs Method (DI-FGSM) to increase adversarial transferability. Dong et al. [10] propose to implement I-FGSM with the Translation Invariance Method (TI-FGSM) to enhance transferability. Some works [38, 47] boost the transferability of adversarial examples by using optimization-based methods. Dong et al. [9] use the Momentum Iterative Fast Gradient Sign Method (MI-FGSM), a classic adversarial attack method, to improve transferability. Lin et al. [31] propose the Nesterov Iterative Fast Gradient Sign Method (NI-FGSM), which performs I-FGSM with Nesterov Accelerated Gradient (NAG), to boost transferability.
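To make the optimization-based idea concrete, the following is a minimal sketch of one momentum-iterative step in the spirit of MI-FGSM, assuming the commonly stated form of the update: the gradient is normalized by its L1 norm, accumulated with a decay factor, and the image is moved along the sign of the accumulated direction. The function name `mi_fgsm_step`, its arguments, and the flat-list image representation are illustrative assumptions, not code from any of the cited works.

```python
def sign(v):
    """Return -1, 0, or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def mi_fgsm_step(x_adv, x_clean, grad, momentum, alpha, eps, mu=1.0):
    """One momentum-iterative sign-gradient step on flat lists of pixel values."""
    # Normalize the current gradient by its L1 norm before accumulating it.
    l1 = sum(abs(g) for g in grad) or 1.0
    momentum = [mu * m + g / l1 for m, g in zip(momentum, grad)]
    x_new = []
    for xa, xc, m in zip(x_adv, x_clean, momentum):
        # Step along the sign of the accumulated gradient, then keep the
        # perturbation inside the L-infinity ball of radius eps around x_clean.
        delta = max(-eps, min(eps, xa + alpha * sign(m) - xc))
        x_new.append(xc + delta)
    return x_new, momentum


x = [0.5, 0.5]
x1, m1 = mi_fgsm_step(x, x, [2.0, -2.0], [0.0, 0.0], alpha=0.01, eps=0.03)
# The gradient of the first pixel flips sign here, but the accumulated
# momentum of that pixel stays positive, so the step direction is kept.
x2, m2 = mi_fgsm_step(x1, x, [-0.2, 0.6], m1, alpha=0.01, eps=0.03)
print(x1, x2, m2)
```

The accumulated direction smooths out sign flips of individual gradients, which is the property usually credited for the improved transferability of momentum-based attacks.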

Adversarial attack on semantic segmentation: previous works adopt adversarial attack methods to evaluate the robustness of semantic segmentation models against adversarial examples. In detail, Arnab et al. [1] use FGSM and PGD to study the adversarial robustness of semantic segmentation models. Some works [22, 13] attack semantic segmentation models by generating universal adversarial perturbations. Recently, some researchers [17] have focused on how to generate adversarial examples for semantic segmentation more efficiently. For example, Gu et al. [17] find that wrongly classified pixels, which dominate the generation of adversarial examples, cause an imbalanced gradient contribution, resulting in limited attack performance. They then propose an efficient adversarial attack method for semantic segmentation, which assigns high weights to the loss of the accurately classified pixels to relieve the impact of the wrongly classified pixels. They further use the proposed attack in adversarial training to improve model robustness, so their work mainly focuses on improving the adversarial robustness of the model rather than the transferability of adversarial examples. Moreover, a series of works [45, 16, 20] indicate that adversarial examples generated on semantic segmentation can easily overfit the source model, which makes it hard for them to fool the target model. To improve the transferability of adversarial examples on semantic segmentation, we propose a two-stage adversarial attack strategy.

3 The Proposed Method

We propose an effective two-stage adversarial attack strategy. In this section, we first introduce the pipeline of the proposed method. We then introduce the first-stage adversarial example generation strategy. Finally, we propose a second-stage adversarial example generation strategy to improve transferability.

3.1 The pipeline of the proposed method

The pipeline of the proposed method is shown in Fig. 2. The proposed method divides the generation process of adversarial examples into two stages. During the first stage, the proposed method aims to improve the adversarial performance of adversarial examples on semantic segmentation. During the second stage, the proposed method aims to improve the adversarial transferability of adversarial examples on semantic segmentation.

For a given benign input image $\mathbf{x}\in\mathbb{R}^{H\times W\times C}$ and the corresponding segmentation label $\mathbf{y}\in\mathbb{R}^{H\times W\times M}$, a semantic segmentation model $f^{seg}_{\boldsymbol{\theta}}(\cdot)$ with model parameters $\boldsymbol{\theta}$ categorizes every individual pixel, producing $f^{seg}_{\boldsymbol{\theta}}(\mathbf{x})\in\mathbb{R}^{H\times W\times M}$, where $H\times W$ represents the image size, $C$ represents the number of channels of the input image, and $M$ represents the number of classes. The objective of adversarial attacks on semantic segmentation is to generate an adversarial example that fools the segmentation model into misclassifying each pixel of the input image. Previous works mainly pay attention to adversarial performance on the source segmentation model, which is used to generate adversarial examples, and ignore the performance on the target model, which is not accessible to attackers. In this paper, we focus not only on the adversarial performance on the source model but also on the adversarial performance on the target model, that is, on how to improve the transferability of adversarial examples. Specifically, we propose an effective two-stage adversarial attack strategy to improve adversarial transferability. The proposed method improves the adversarial performance of the generated adversarial examples at the first stage and boosts their adversarial transferability at the second stage.

Algorithm 1 Two-Stage Adversarial Attack Strategy
Require: The semantic segmentation model $f^{seg}_{\boldsymbol{\theta}}(\cdot)$, the benign image $\mathbf{x}$, the corresponding label $\mathbf{y}$, the image size $H\times W$, the maximal perturbation $\epsilon$, the step size $\alpha$, the loss weights $\gamma$ and $\beta$, and the iteration number $N$.
1:  $\mathbf{x}_{adv}^{0}=\mathbf{x}+\mathcal{U}(-\epsilon,+\epsilon)$  ▷ Initialization of the adversarial example
2:  for $t=1,\dots,N$ do
3:     $P=f^{seg}_{\boldsymbol{\theta}}(\mathbf{x}_{adv}^{t})$  ▷ Segmentation result on the adversarial example
4:     $P_{T},P_{F}\leftarrow P$  ▷ The first stage of pixel division
5:     if $P_{T}\neq\varnothing$ then
6:        $\mathcal{L}\left(f^{seg}_{\boldsymbol{\theta}}\left(\mathbf{x}_{adv}^{t}\right),\mathbf{y}\right)=\frac{1-\gamma}{H\times W}\sum_{i\in P_{T}}\mathcal{L}_{i}\left(f^{seg}_{\boldsymbol{\theta}}\left(\mathbf{x}_{adv}^{t}\right),\mathbf{y}\right)+\frac{\gamma}{H\times W}\sum_{j\in P_{F}}\mathcal{L}_{j}\left(f^{seg}_{\boldsymbol{\theta}}\left(\mathbf{x}_{adv}^{t}\right),\mathbf{y}\right)$  ▷ Loss calculation
7:     else
8:        $D_{KL}(\mathbf{x}_{adv})=\sum_{j=1}^{n}\sigma(f^{seg}_{\boldsymbol{\theta}}(\mathbf{x}_{adv}))_{j}\log\frac{\sigma(f^{seg}_{\boldsymbol{\theta}}(\mathbf{x}_{adv}))_{j}}{\sigma(f^{seg}_{\boldsymbol{\theta}}(\mathbf{x}))_{j}}$  ▷ KL distance of adversarial example
9:        $P_{L},P_{H}\leftarrow P$  ▷ The second stage of pixel division
10:       $\mathcal{L}\left(f^{seg}_{\boldsymbol{\theta}}\left(\mathbf{x}_{adv}^{t}\right),\mathbf{y}\right)=\frac{1-\beta}{H\times W}\sum_{i\in P_{H}}\mathcal{L}_{i}\left(f^{seg}_{\boldsymbol{\theta}}\left(\mathbf{x}_{adv}^{t}\right),\mathbf{y}\right)+\frac{\beta}{H\times W}\sum_{j\in P_{L}}\mathcal{L}_{j}\left(f^{seg}_{\boldsymbol{\theta}}\left(\mathbf{x}_{adv}^{t}\right),\mathbf{y}\right)$  ▷ Loss calculation
11:     end if
12:     $\mathbf{x}_{adv}^{t+1}=\prod_{[-\epsilon,\epsilon]}\left[\mathbf{x}_{adv}^{t}+\alpha\cdot\operatorname{sign}\left(\nabla_{\mathbf{x}_{adv}^{t}}\mathcal{L}\left(f^{seg}_{\boldsymbol{\theta}}\left(\mathbf{x}_{adv}^{t}\right),\mathbf{y}\right)\right)\right]$  ▷ Generation of adversarial examples
13:  end for
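The control flow of Algorithm 1 can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the "segmentation model" is a hypothetical two-class linear classifier applied to each single-channel pixel independently (weights `W`, `B`), gradients are computed analytically instead of by backpropagation, and all function names are illustrative. Two points are assumptions made for the sketch: the second stage is entered once no correctly classified pixel remains, as the text of Section 3 describes, and a pixel is treated as high-transferability when its KL value exceeds the mean over all pixels.

```python
import math
import random

# Hypothetical toy "segmentation model": each single-channel pixel value v gets
# logits [w * v + b] per class, so pixels are classified independently.
W = [1.0, -1.0]
B = [0.0, 0.0]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(x):
    """Per-pixel class probabilities for a flat image x."""
    return [softmax([w * v + b for w, b in zip(W, B)]) for v in x]


def argmax(p):
    return max(range(len(p)), key=p.__getitem__)


def kl(p, q):
    """KL(p || q) for two discrete distributions with positive entries in q."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def two_stage_attack(x, y, eps, alpha, n_iter, gamma, beta, seed=0):
    rng = random.Random(seed)
    clean_probs = predict(x)
    x_adv = [v + rng.uniform(-eps, eps) for v in x]            # random start
    for _ in range(n_iter):
        probs = predict(x_adv)                                 # segmentation result
        wrong = [argmax(p) != yi for p, yi in zip(probs, y)]   # True marks P_F
        if not all(wrong):
            # Stage 1: some pixels are still classified correctly (P_T not empty).
            weights = [gamma if w else 1.0 - gamma for w in wrong]
        else:
            # Stage 2 (assumed rule): KL above the mean marks a pixel as P_H.
            kls = [kl(p, q) for p, q in zip(probs, clean_probs)]
            mean_kl = sum(kls) / len(kls)
            weights = [1.0 - beta if k > mean_kl else beta for k in kls]
        x_next = []
        for xa, xc, p, yi, wt in zip(x_adv, x, probs, y, weights):
            # Analytic gradient of the weighted cross-entropy w.r.t. this pixel.
            g = wt * sum((pm - (m == yi)) * W[m] for m, pm in enumerate(p)) / len(x)
            step = alpha * ((g > 0) - (g < 0))
            # Sign step followed by projection onto the eps-ball around x.
            x_next.append(xc + max(-eps, min(eps, xa + step - xc)))
        x_adv = x_next
    return x_adv


x = [0.5, 0.3, -0.4, 0.2]
y = [0, 0, 1, 0]
x_adv = two_stage_attack(x, y, eps=0.6, alpha=0.1, n_iter=20, gamma=0.3, beta=0.3)
print([argmax(p) for p in predict(x_adv)])
```

In a real setting the per-pixel gradient would come from automatic differentiation through the segmentation network, and pixel values would additionally be clipped to the valid image range, which the sketch omits.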

3.2 The first-stage adversarial attack strategy

During the first stage, the goal is to generate the adversarial example $\mathbf{x}_{adv}$ to misclassify each pixel of the input image as soon as possible. Previous works mainly adopt a classic adversarial attack method PGD to generate adversarial examples for semantic segmentation. It can be calculated as:

$$\mathbf{x}_{adv}^{t+1}=\prod_{[-\epsilon,\epsilon]}\left[\mathbf{x}_{adv}^{t}+\alpha\cdot\operatorname{sign}\left(\nabla_{\mathbf{x}_{adv}^{t}}\mathcal{L}\left(f^{seg}_{\boldsymbol{\theta}}\left(\mathbf{x}_{adv}^{t}\right),\mathbf{y}\right)\right)\right], \tag{1}$$

where $\mathbf{x}_{adv}^{t+1}$ represents the generated adversarial example after the $(t+1)$-th step, $\epsilon$ represents the maximum perturbation strength, $\alpha$ represents the attack step size, and $\mathcal{L}\left(f^{seg}_{\boldsymbol{\theta}}\left(\mathbf{x}_{adv}^{t}\right),\mathbf{y}\right)$ represents the cross-entropy loss. However, this approach assigns equal importance to every pixel, leading to a situation where misclassified pixels dominate the adversarial example generation process, thus limiting the adversarial performance of PGD on semantic segmentation.
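The update in Eq. (1) can be sketched in a few lines. The snippet below is a minimal illustration that assumes the gradient has already been computed and that images are stored as flat lists of pixel values; `pgd_step` and its argument names are hypothetical. The projection is implemented as clamping the perturbation to $[-\epsilon,\epsilon]$ around the clean pixel.

```python
def sign(v):
    """Return -1, 0, or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def pgd_step(x_adv, x_clean, grad, alpha, eps):
    """One sign-gradient ascent step followed by projection onto the eps-ball."""
    out = []
    for xa, xc, g in zip(x_adv, x_clean, grad):
        stepped = xa + alpha * sign(g)
        # Projection: the perturbation relative to the clean pixel must stay
        # within [-eps, eps].
        delta = max(-eps, min(eps, stepped - xc))
        out.append(xc + delta)
    return out


x_clean = [0.5, 0.5, 0.5]
x_next = pgd_step(x_clean, x_clean, grad=[1.0, -2.0, 0.0], alpha=0.1, eps=0.05)
print(x_next)
```

Because only the sign of the gradient is used, every pixel with a nonzero gradient moves by the same amount $\alpha$; how the per-pixel losses are weighted inside $\mathcal{L}$ therefore matters only through the sign of the summed gradient.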

To generate adversarial examples more effectively during the first stage, following the previous work [17], we divide all pixels into two branches based on the prediction accuracy, i.e., the correctly classified pixels $P_{T}$ and the incorrectly classified pixels $P_{F}$. The loss function can be formulated as:

$$\begin{aligned}\mathcal{L}\left(f^{seg}_{\boldsymbol{\theta}}\left(\mathbf{x}_{adv}^{t}\right),\mathbf{y}\right)&=\frac{1-\gamma}{H\times W}\sum_{i\in P_{T}}\mathcal{L}_{i}\left(f^{seg}_{\boldsymbol{\theta}}\left(\mathbf{x}_{adv}^{t}\right),\mathbf{y}\right)\\&+\frac{\gamma}{H\times W}\sum_{j\in P_{F}}\mathcal{L}_{j}\left(f^{seg}_{\boldsymbol{\theta}}\left(\mathbf{x}_{adv}^{t}\right),\mathbf{y}\right),\end{aligned} \tag{2}$$

where $\mathcal{L}_{i}$ represents the cross-entropy loss of the $i$-th pixel for semantic segmentation, and $\gamma$ represents a hyper-parameter. Although this method can effectively improve the adversarial performance of adversarial examples, it has limited improvement in adversarial transferability. Specifically, when all pixels are misclassified, the first-stage attack method still treats all pixels equally and ignores the different transferable properties of different pixels.
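As a concrete reading of Eq. (2), the sketch below computes the weighted loss from per-pixel class probabilities. It is illustrative only: `stage1_loss` is a hypothetical name, the probabilities are given directly instead of being produced by a network, and pixels are stored in a flat list so that the number of pixels plays the role of $H\times W$.

```python
import math


def stage1_loss(probs, labels, gamma):
    """Weighted cross-entropy over correctly (P_T) and wrongly (P_F) classified pixels.

    probs:  list of per-pixel class-probability lists
    labels: list of integer class ids, one per pixel
    """
    n_pixels = len(labels)
    loss_correct, loss_wrong = 0.0, 0.0
    for p, y in zip(probs, labels):
        ce = -math.log(p[y])                          # per-pixel cross-entropy
        pred = max(range(len(p)), key=p.__getitem__)  # predicted class
        if pred == y:
            loss_correct += ce                        # pixel belongs to P_T
        else:
            loss_wrong += ce                          # pixel belongs to P_F
    return ((1.0 - gamma) * loss_correct + gamma * loss_wrong) / n_pixels


probs = [[0.9, 0.1], [0.2, 0.8]]   # pixel 0 is correct, pixel 1 is wrong
labels = [0, 0]
print(stage1_loss(probs, labels, gamma=0.2))
```

With $\gamma=0.5$ the expression equals half of the standard mean cross-entropy, so the sign-gradient step coincides with plain PGD; choosing $\gamma<0.5$ shifts the emphasis toward the pixels that are still classified correctly.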

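| Source model | Attack | Target: PSPNet-Res50 | Target: PSPNet-Res101 | Target: DeepLabV3-Res50 | Target: DeepLabV3-Res101 |
|---|---|---|---|---|---|
| (none) | Clean images | 78.55 | 79.11 | 78.17 | 80.55 |
| PSPNet-Res50 | PGD | 4.60 | 36.91 | 5.05 | 38.59 |
| PSPNet-Res50 | SegPGD | 2.09 | 36.76 | 2.42 | 38.31 |
| PSPNet-Res50 | TranSegPGD (Ours) | 1.55 | 34.57 | 2.38 | 36.74 |
| PSPNet-Res101 | PGD | 31.67 | 2.88 | 31.18 | 5.42 |
| PSPNet-Res101 | SegPGD | 30.44 | 1.36 | 29.97 | 3.59 |
| PSPNet-Res101 | TranSegPGD (Ours) | 29.06 | 1.08 | 27.88 | 3.19 |
| DeepLabV3-Res50 | PGD | 4.04 | 32.49 | 3.72 | 33.81 |
| DeepLabV3-Res50 | SegPGD | 2.38 | 31.43 | 1.63 | 33.25 |
| DeepLabV3-Res50 | TranSegPGD (Ours) | 2.10 | 30.59 | 1.55 | 30.97 |
| DeepLabV3-Res101 | PGD | 31.23 | 4.84 | 30.67 | 3.48 |
| DeepLabV3-Res101 | SegPGD | 30.62 | 2.89 | 30.15 | 1.58 |
| DeepLabV3-Res101 | TranSegPGD (Ours) | 29.93 | 2.73 | 29.14 | 1.14 |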
Table 1: Transferring adversarial examples generated on source segmentation models to target models on PASCAL VOC 2012. We present the mIoU of target models on adversarial examples and on the corresponding clean images. For each source model, three adversarial attack methods (PGD, SegPGD, and our proposed attack) are used to generate adversarial examples. A lower mIoU of the target model on the adversarial examples indicates that the generated adversarial examples transfer more easily.
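For readers unfamiliar with the metric in Table 1, the following sketch computes a mean intersection-over-union (mIoU) on a toy example. It assumes flat lists of predicted and ground-truth class ids and skips classes that appear in neither list, which is a common convention and not something stated in the paper; `mean_iou` is an illustrative name. Benchmark implementations typically accumulate intersections and unions over a whole validation set rather than over one image.

```python
def mean_iou(pred, label, num_classes):
    """Mean intersection-over-union (in percent) over classes present in pred or label."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, l in zip(pred, label) if p == c and l == c)
        union = sum(1 for p, l in zip(pred, label) if p == c or l == c)
        if union > 0:                 # skip classes absent from both lists
            ious.append(inter / union)
    return 100.0 * sum(ious) / len(ious)


label = [0, 0, 1, 1]
clean_pred = [0, 0, 1, 1]   # target model on the clean image
adv_pred = [1, 0, 0, 1]     # target model on a transferred adversarial example
print(mean_iou(clean_pred, label, 2), mean_iou(adv_pred, label, 2))
```

A stronger transfer attack flips more of the target model's pixel predictions, which lowers the intersection terms and hence the mIoU.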

3.3 The second-stage adversarial attack strategy

During the second stage, the goal is to boost the transferability of adversarial examples generated in the first stage, which could not only fool the source model f𝜽ssegsubscriptsuperscript𝑓𝑠𝑒𝑔subscript𝜽𝑠f^{seg}_{\boldsymbol{\theta}_{s}}italic_f start_POSTSUPERSCRIPT italic_s italic_e italic_g end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_italic_θ start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT end_POSTSUBSCRIPT, but also the target model f𝜽tsegsubscriptsuperscript𝑓𝑠𝑒𝑔subscript𝜽𝑡f^{seg}_{\boldsymbol{\theta}_{t}}italic_f start_POSTSUPERSCRIPT italic_s italic_e italic_g end_POSTSUPERSCRIPT start_POSTSUBSCRIPT bold_italic_θ start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUBSCRIPT. Motivated by the previous works of out-of-distribution, the well-trained models make it hard to identify the out-of-distribution examples. Several works have proven that the adversarial examples, which are further distributedly from the original clean examples, could have higher adversarial transferability. Image segmentation is an extension of image classification. Each pixel of the input image in the segmentation task can be considered as a sample in the classification task. Hence, generalized to the task of semantic segmentation, the adversarial pixels, which are farther distributedly from the original clean pixels, could also have higher adversarial transferability. Thus, we compute the Kullback-Leribler divergence between the generated adversarial pixels 𝐱advsubscript𝐱𝑎𝑑𝑣\mathbf{x}_{adv}bold_x start_POSTSUBSCRIPT italic_a italic_d italic_v end_POSTSUBSCRIPT and the corresponding benign pixels 𝐱𝐱\mathbf{x}bold_x. It can be calculated as:

$$D_{KL}(\mathbf{x}_{adv})^{i}=\sum_{j=1}^{n}\sigma\left(f^{seg}_{\boldsymbol{\theta}}(\mathbf{x}_{adv})^{i}\right)_{j}\log\frac{\sigma\left(f^{seg}_{\boldsymbol{\theta}}(\mathbf{x}_{adv})^{i}\right)_{j}}{\sigma\left(f^{seg}_{\boldsymbol{\theta}}(\mathbf{x})^{i}\right)_{j}}, \tag{3}$$

where $D_{KL}(\mathbf{x}_{adv})^{i}$ represents the KL distance between the model output on the $i$-th adversarial pixel and the model output on the corresponding clean pixel, $n$ represents the output dimension of the segmentation model, and $\sigma$ represents the softmax operation. Then we calculate the mean KL distance over all pairs of adversarial and clean pixels, denoted $D_{KL}(\mathbf{x}_{adv})^{mean}$. We use the mean KL distance to divide the adversarial pixels into two branches, i.e., the high-transferability adversarial pixels $P_{H}$ and the low-transferability adversarial pixels $P_{L}$.
In detail, if the KL distance of the $i$-th pixel, $D_{KL}(\mathbf{x}_{adv})^{i}$, is greater than the mean KL distance $D_{KL}(\mathbf{x}_{adv})^{mean}$, then the $i$-th adversarial pixel belongs to the high-transferability adversarial pixels $P_{H}$; otherwise, it belongs to the low-transferability adversarial pixels $P_{L}$.
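The branching rule above can be sketched in a few lines. This is a minimal illustration in pure Python, not the authors' implementation: pixels are assumed to be flattened into a list of per-pixel logit vectors, and the helper names `softmax`, `pixel_kl`, and `split_by_transferability`, as well as the small constant `eps` added for numerical safety, are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one pixel's class logits.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pixel_kl(adv_logits, clean_logits, eps=1e-12):
    # KL(softmax(adv) || softmax(clean)) for a single pixel, following Eq. (3).
    # eps is an assumption that guards against log(0).
    p = softmax(adv_logits)
    q = softmax(clean_logits)
    return sum(pj * math.log((pj + eps) / (qj + eps)) for pj, qj in zip(p, q))

def split_by_transferability(adv_logits, clean_logits):
    # Pixels whose KL distance exceeds the mean go to P_H, all others to P_L.
    kl = [pixel_kl(a, c) for a, c in zip(adv_logits, clean_logits)]
    mean_kl = sum(kl) / len(kl)
    p_high = [i for i, d in enumerate(kl) if d > mean_kl]
    p_low = [i for i, d in enumerate(kl) if d <= mean_kl]
    return p_high, p_low, kl

# Toy input: 4 pixels, 3 classes. Pixel 0 changes a lot, pixel 2 not at all.
clean = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0], [4.0, 0.0, 0.0]]
adv = [[0.0, 4.0, 0.0], [0.0, 3.5, 0.5], [0.0, 0.0, 4.0], [3.0, 1.0, 0.0]]
p_high, p_low, kl = split_by_transferability(adv, clean)
```

In this toy input only pixel 0 has an above-average KL distance, so it alone lands in $P_{H}$. A practical implementation would compute the same quantities with a tensor library over the full $H \times W$ output map.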

Hence, the loss function at the second stage can be formulated:

$$\begin{aligned}\mathcal{L}\left(f^{seg}_{\boldsymbol{\theta}}\left(\mathbf{x}_{adv}^{t}\right),\mathbf{y}\right) &= \frac{1-\beta}{H\times W}\sum_{i\in P_{H}}\mathcal{L}_{i}\left(f^{seg}_{\boldsymbol{\theta}}\left(\mathbf{x}_{adv}^{t}\right),\mathbf{y}\right) \\ &\quad+\frac{\beta}{H\times W}\sum_{j\in P_{L}}\mathcal{L}_{j}\left(f^{seg}_{\boldsymbol{\theta}}\left(\mathbf{x}_{adv}^{t}\right),\mathbf{y}\right),\end{aligned} \tag{4}$$

where $\beta$ is a hyper-parameter of the second stage that controls the allocation of weights between the two branches. Combining the first-stage and second-stage adversarial attack strategies, we obtain our two-stage adversarial attack method for generating adversarial examples for semantic segmentation. The proposed method is summarized in Algorithm 1.
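As a rough illustration of Eq. (4), the sketch below evaluates the weighted loss from per-pixel logits and a given set $P_{H}$. It assumes the per-pixel loss $\mathcal{L}_{i}$ is the cross-entropy, which this section does not state explicitly; the function names `cross_entropy` and `second_stage_loss` are hypothetical, and pixels are again flattened into a list of length $H \times W$.

```python
import math

def cross_entropy(logits, label):
    # Per-pixel cross-entropy, computed as log-sum-exp minus the true-class logit.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[label]

def second_stage_loss(adv_logits, labels, p_high, beta):
    # Eq. (4): weight (1 - beta) on pixels in P_H, weight beta on pixels in P_L,
    # both normalized by the total number of pixels H * W.
    high = set(p_high)
    n_pixels = len(adv_logits)
    loss_high = 0.0
    loss_low = 0.0
    for i, (logits, y) in enumerate(zip(adv_logits, labels)):
        pixel_loss = cross_entropy(logits, y)
        if i in high:
            loss_high += pixel_loss
        else:
            loss_low += pixel_loss
    return ((1.0 - beta) * loss_high + beta * loss_low) / n_pixels

# Toy input: two pixels with uniform logits, so each pixel loss equals log(2).
toy_logits = [[0.0, 0.0], [0.0, 0.0]]
toy_labels = [0, 1]
balanced = second_stage_loss(toy_logits, toy_labels, p_high=[0], beta=0.5)
high_only = second_stage_loss(toy_logits, toy_labels, p_high=[0], beta=0.0)
```

With `beta=0.5` the expression reduces to half of the ordinary mean loss, while `beta=0.0` keeps only the contribution of the pixels in $P_{H}$; values of $\beta$ below 0.5 therefore emphasize the high-transferability branch.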

4 Experiments

Method PSPNet-Res50 PSPNet-Res101 DeepLabV3-Res50 DeepLabV3-Res101 FCNs-VGG Segformer Segmenter
Clean 78.55 79.11 78.17 80.55 69.10 77.19 78.5
MI-FGSM 5.46 29.71 5.85 32.63 36.88 43.61 54.53
MI-FGSM-ours 2.09 26.42 2.62 28.61 34.62 42.60 53.76
TI-FGSM 4.89 35.43 7.08 39.60 35.62 47.30 55.42
TI-FGSM-ours 2.19 35.06 5.12 38.76 34.88 47.15 54.72
NI-FGSM 5.59 30.18 5.96 32.67 37.02 43.82 54.50
NI-FGSM-ours 2.17 26.35 2.77 29.02 34.69 42.85 53.96
Table 2: Transferring adversarial examples generated on the source segmentation model to target models on PASCAL VOC 2012. We report the mIoU of the target models on adversarial examples and on the corresponding clean images. PSPNet-Res50 is used as the source model. A lower mIoU of the target models on the adversarial examples indicates that the generated adversarial examples transfer more easily.
Figure 3: Visualization of Clean Images, Adversarial examples, and Segmentation Predictions. PSPNet-Res101 is used as the source model. DeepLabV3-Res101, DeepLabV3-Res50, and PSPNet-Res50 are used as the target models. The adversarial examples are generated by PGD, SegPGD, and the proposed TranSegPGD. The adversarial example generated by using the proposed method transfers better to other segmentation models. More figures are presented in the supplementary material.

4.1 Settings

Following previous works, we conduct experiments on two widely used semantic segmentation datasets: PASCAL VOC 2012 (VOC) [11] and Cityscapes (CS) [8]. The VOC dataset consists of 20 object classes and 1 background class. It has 1,464 training images, 1,449 validation images, and 1,456 testing images. The Cityscapes dataset consists of urban street scene images with high-quality pixel-level annotations. It has 19 classes with 2,975 training images, 500 validation images, and 1,525 testing images. As for the semantic segmentation models, we use FCN8s-VGG16 [36], FCN16s-VGG16 [36], PSPNet-Res50 [52], PSPNet-Res101 [52], DeepLabv3-Res50 [7], and DeepLabv3-Res101 [7] for adversarial example generation and performance evaluation. As baseline adversarial attack methods, we adopt the popular PGD and the advanced SegPGD. We also compare the proposed method with popular transferable adversarial attack methods from image classification, namely TI-FGSM, MI-FGSM, and NI-FGSM. All comparison experiments are conducted under the $l_{\infty}$-norm. Specifically, we set the maximum perturbation strength $\epsilon$ to $8/255$, the attack step size $\alpha$ to $2/255$, and the number of attack iterations to 20. The mean Intersection over Union (mIoU) is used as the metric to evaluate adversarial performance.
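Since every table reports mIoU, a small reference computation may help in reading the numbers. The sketch below is a generic mIoU over flattened label maps, not the evaluation code used in the paper. It assumes that label 255 marks ignored pixels (a common convention for VOC annotations) and that classes absent from both prediction and ground truth are skipped; the function name `mean_iou` is hypothetical.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    # Accumulate per-class intersection and union over all valid pixels.
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # assumed convention: skip unlabeled pixels
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            union[t] += 1
            union[p] += 1
    # Average IoU over classes that appear in the prediction or the ground truth.
    ious = [inter[c] / union[c] for c in range(num_classes) if union[c] > 0]
    if not ious:
        raise ValueError("no valid pixels to evaluate")
    return 100.0 * sum(ious) / len(ious)

# Toy input: 4 pixels, 2 classes, one pixel of class 0 predicted as class 1.
score = mean_iou(pred=[0, 1, 1, 1], target=[0, 0, 1, 1], num_classes=2)
```

Here class 0 has IoU 1/2 and class 1 has IoU 2/3, so the score is their average in percent. A successful attack drives this value down, which is why lower entries in the tables indicate stronger attacks.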

Method PSPNet-Res50 PSPNet-Res101 DeepLabV3-Res50 DeepLabV3-Res101 Bisenet Segformer
Clean 74.20 76.04 74.06 76.05 75.16 81.08
MI-FGSM 1.06 16.64 1.56 21.99 41.39 45.86
MI-FGSM-ours 0.32 13.20 1.36 18.56 37.85 43.95
TI-FGSM 1.08 14.25 1.84 21.53 34.72 49.20
TI-FGSM-ours 0.11 12.19 1.67 19.49 32.66 47.72
NI-FGSM 1.07 16.70 1.38 21.87 40.16 45.85
NI-FGSM-ours 0.34 13.01 1.33 18.39 37.9 43.97
Table 3: Transferring adversarial examples generated on the source segmentation model to target models on Cityscapes. We report the mIoU of the target models on adversarial examples and on the corresponding clean images. PSPNet-Res50 is used as the source model. A lower mIoU of the target models on the adversarial examples indicates that the generated adversarial examples transfer more easily.

4.2 Comparisons with other adversarial attack methods on semantic segmentation

We compare the proposed method with the previous popular PGD and advanced SegPGD to evaluate adversarial transferability. In detail, we adopt PSPNet-Res50, PSPNet-Res101, DeepLabV3-Res50, and DeepLabV3-Res101 as the source models to generate adversarial examples on VOC. The results are shown in Table 1. Analyses are as follows. First, the proposed method outperforms other adversarial attack methods under all attack scenarios.

In particular, compared with the popular PGD, the proposed method not only improves the adversarial performance of the adversarial examples on the source model but also boosts their transferability to the target models. For example, when PSPNet-Res50 is used as the source model, the proposed method improves the adversarial performance of PGD by about 3.05% and improves the transferability of PGD to PSPNet-Res101 by about 2.34%. Besides, the proposed method also achieves the best adversarial transferability on the other target models. We attribute these improvements to assigning different weights to different branches of the input image. Second, compared with the advanced SegPGD, the proposed method also achieves better adversarial transferability under all attack scenarios, although there is always a trade-off between adversarial performance and transferability. Compared with PGD, SegPGD achieves only a limited improvement in adversarial example transferability, whereas the proposed method improves it significantly. We attribute this to the proposed second-stage adversarial attack strategy, which assigns different weights to branches with different transferability. For example, when DeepLabV3-Res50 is used as the source model, the proposed method improves the transferability of SegPGD to DeepLabV3-Res101 by about 2.28%.

Moreover, we visualize the adversarial examples and the predictions of other semantic segmentation models on them. In detail, we adopt PSPNet-Res101 as the source model to generate adversarial examples and DeepLabV3-Res101, DeepLabV3-Res50, and PSPNet-Res50 as the target models to evaluate the adversarial transferability. The result is shown in Figure 3. It is clear that the adversarial examples generated by PGD, SegPGD, and the proposed method can all successfully fool the source model. Adversarial examples generated by PGD and SegPGD do not significantly affect the output of the target models, whereas the adversarial examples generated by the proposed method can fool the target models, which demonstrates the effectiveness of the proposed method in improving adversarial example transferability for semantic segmentation.

4.3 Comparisons with other transferable attack methods on image classification

To further evaluate the effectiveness of TranSegPGD, we first generalize popular transferable attack methods from image classification, namely MI-FGSM, TI-FGSM, and NI-FGSM, to semantic segmentation and compare the proposed method with them. Our method can be combined with these transferable attack methods as a plug-and-play component to improve the transferability of adversarial examples on semantic segmentation, yielding MI-FGSM-ours, TI-FGSM-ours, and NI-FGSM-ours.
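One plausible reading of the plug-and-play combination is that the base attack keeps its own update rule and only the loss whose gradient it consumes is replaced by the branch-weighted loss. The sketch below shows the standard MI-FGSM update (momentum accumulation of the $l_1$-normalized gradient, a signed step, and projection onto the $l_{\infty}$ ball) using the settings from Section 4.1. How the authors actually wire the two together is not specified here, so this is an assumption: `grad_fn` is a hypothetical callback that would return the gradient of the weighted loss, and the decay factor `mu=1.0` is the MI-FGSM default rather than a value given in this paper.

```python
def sign(v):
    # Sign function returning -1, 0, or 1.
    return (v > 0) - (v < 0)

def mi_fgsm(x, grad_fn, eps=8 / 255, alpha=2 / 255, steps=20, mu=1.0):
    # x: flattened clean image with intensities in [0, 1].
    # grad_fn: hypothetical callback returning d(loss)/d(x_adv) as a list.
    x_adv = list(x)
    momentum = [0.0] * len(x)
    for _ in range(steps):
        grad = grad_fn(x_adv)
        l1 = sum(abs(v) for v in grad) or 1.0  # avoid dividing by zero
        momentum = [mu * m + v / l1 for m, v in zip(momentum, grad)]
        stepped = [xa + alpha * sign(m) for xa, m in zip(x_adv, momentum)]
        # Project back onto the l_inf ball around x, then onto the valid range.
        x_adv = [
            min(max(min(max(xa, xi - eps), xi + eps), 0.0), 1.0)
            for xa, xi in zip(stepped, x)
        ]
    return x_adv

# Toy loss with a constant gradient: increase pixel 0, decrease pixel 1.
x_clean = [0.5, 0.5, 0.5]
x_attacked = mi_fgsm(x_clean, lambda z: [1.0, -2.0, 0.0])
```

With a constant gradient the perturbation saturates at $\pm\epsilon$ after a few steps, and the pixel with zero gradient is left untouched. Replacing the toy callback with the gradient of the branch-weighted loss would give one possible form of MI-FGSM-ours under the assumption stated above.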

For PASCAL VOC 2012, we use PSPNet-Res50 as the source model to generate adversarial examples and use PSPNet-Res101, DeepLabV3-Res50, DeepLabV3-Res101, FCNs-VGG, Segformer, and Segmenter as the target models to evaluate the adversarial transferability. The results are shown in Table 2. The performance analyses are summarized as follows. The three proposed combined attacks achieve better adversarial transferability than their base attack methods under all attack scenarios. For example, when DeepLabV3-Res101 is used as the target model, MI-FGSM reduces the target mIoU to about 32.63%, whereas the proposed MI-FGSM-ours reduces it to about 28.61%, a further reduction of about 4.02%. TI-FGSM reduces the target mIoU to about 39.60%, whereas the proposed TI-FGSM-ours reduces it to about 38.76%, a further reduction of about 0.84%. NI-FGSM reduces the target mIoU to about 32.67%, whereas the proposed NI-FGSM-ours reduces it to about 29.02%, a further reduction of about 3.65%. This indicates that TranSegPGD can significantly improve the transferability of adversarial examples.

For Cityscapes, we also adopt PSPNet-Res50 as the source model to generate adversarial examples and adopt PSPNet-Res101, DeepLabV3-Res50, DeepLabV3-Res101, Bisenet, and Segformer as the target models to evaluate the adversarial transferability. The results are shown in Table 3. We observe a phenomenon similar to that on PASCAL VOC 2012: the proposed method boosts the adversarial transferability of the base attack methods under all attack scenarios. For example, when PSPNet-Res101 is used as the target model, MI-FGSM reduces the target mIoU to about 16.64%, while the proposed MI-FGSM-ours reduces it to about 13.20%, a further reduction of about 3.44%. TI-FGSM reduces the target mIoU to about 14.25%, while the proposed TI-FGSM-ours reduces it to about 12.19%, a further reduction of about 2.06%. NI-FGSM reduces the target mIoU to about 16.70%, while the proposed NI-FGSM-ours reduces it to about 13.01%, a further reduction of about 3.69%. The experimental results indicate that the proposed method can further boost the transferability of adversarial examples.
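The reductions quoted for the two datasets can be checked directly from the table entries. The short script below copies the relevant mIoU pairs from Table 2 (target DeepLabV3-Res101) and Table 3 (target PSPNet-Res101) and recomputes the differences; the dictionary layout and the helper name `miou_reduction` are only illustrative.

```python
# (base attack mIoU, "-ours" mIoU) in percent, copied from Table 2 (VOC).
voc_deeplabv3_res101 = {
    "MI-FGSM": (32.63, 28.61),
    "TI-FGSM": (39.60, 38.76),
    "NI-FGSM": (32.67, 29.02),
}
# Same layout, copied from Table 3 (Cityscapes).
cityscapes_pspnet_res101 = {
    "MI-FGSM": (16.64, 13.20),
    "TI-FGSM": (14.25, 12.19),
    "NI-FGSM": (16.70, 13.01),
}

def miou_reduction(pairs):
    # Additional drop in target mIoU obtained by adding the proposed component.
    return {name: round(base - ours, 2) for name, (base, ours) in pairs.items()}

voc_gain = miou_reduction(voc_deeplabv3_res101)
cityscapes_gain = miou_reduction(cityscapes_pspnet_res101)
```

The recomputed differences match the values stated in the text for both datasets, with TI-FGSM on VOC showing the smallest additional drop.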

4.4 Transfer to attack segment anything model

More and more works adopt foundation models to perform semantic segmentation, and the Segment Anything Model (SAM) [28] stands out among them. We therefore also transfer the adversarial examples to attack SAM on VOC to evaluate the effectiveness of the proposed method. In detail, we adopt PSPNet-Res50 as the source model to generate adversarial examples, and MI-FGSM, TI-FGSM, and NI-FGSM are used as the baseline methods. The results are shown in Figure 4. It can be observed that the proposed method significantly improves the transferability of adversarial examples to SAM. We provide visualization results and the experimental results on Cityscapes in the supplementary material.

Figure 4: The mIoU of the Segment Anything Model on adversarial examples generated on PSPNet-Res50 and on the corresponding clean images on VOC. MI-FGSM, NI-FGSM, and TI-FGSM are used as the baseline methods. The $x$-axis represents the attack methods and the $y$-axis represents the mIoU (%).

4.5 Ablation Study

In this paper, we propose an effective two-stage adversarial attack strategy to boost the transferability of adversarial examples for semantic segmentation. The first-stage adversarial attack strategy is used to generate adversarial examples effectively. The second-stage adversarial attack strategy is used to improve the adversarial example transferability. To validate the effectiveness of each stage in the proposed method, we conduct ablation experiments on VOC. Specifically, we adopt the PSPNet-Res50 as the source model for adversarial example generation. PSPNet-Res101 and DeepLabV3-Res101 are used to validate the transferability of generated adversarial examples. The results are shown in Table 4. Analyses are summarized as follows.

First, when only the first-stage adversarial attack strategy is incorporated, the adversarial performance on the source model improves significantly while the performance on the target models improves only a little. When only the second-stage adversarial attack strategy is incorporated, the adversarial performance on the source model improves slightly while the performance on the target models improves considerably. This indicates that the first-stage strategy contributes more to improving the adversarial performance and the second-stage strategy contributes more to improving the adversarial transferability. Second, using both adversarial attack strategies achieves the best adversarial performance on the source model and the best adversarial transferability, which suggests that the two strategies are compatible and that their integration achieves the best overall performance.

First-Stage Second-Stage PSPNet-Res50 PSPNet-Res101 DeepLabV3-Res101
✗ ✗ 4.60 36.91 38.59
✓ ✗ 2.80 36.38 38.03
✗ ✓ 3.93 36.15 37.22
✓ ✓ 1.55 34.57 36.74
Table 4: Ablation study of the proposed method. The mIoU(%) of segmentation models on the adversarial examples is reported. PSPNet-Res50 is used as the source model. PSPNet-Res101 and DeepLabV3-Res101 are used as the target models.

5 Conclusion

In this paper, we focus on how to improve the transferability of adversarial examples on semantic segmentation, which has been largely overlooked by previous works. We propose an effective two-stage adversarial attack strategy to boost transferability, dubbed TranSegPGD. At the first stage, we divide each pixel in an input image into different branches according to its adversarial property. We assign distinct weights to different branches for optimization to enhance the adversarial performance of all pixels. At the second stage, we divide each pixel into different branches according to its transferable property, determined by Kullback-Leibler divergence. We assign distinct weights to different branches to boost transferability. We emphasize high weights on the loss of pixels with high transferability to amplify the transferability. Extensive experiments across diverse segmentation models conducted on the PASCAL VOC 2012 and Cityscapes datasets validate the efficacy of our method. The experiment results show that our TranSegPGD achieves state-of-the-art performance.

References

  • Arnab et al. [2018] Anurag Arnab, Ondrej Miksik, and Philip HS Torr. On the robustness of semantic segmentation models to adversarial attacks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 888–897, 2018.
  • Bai et al. [2019] Yang Bai, Yan Feng, Yisen Wang, Tao Dai, Shu-Tao Xia, and Yong Jiang. Hilbert-based generative defense for adversarial examples. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4784–4793, 2019.
  • Bai et al. [2020] Yang Bai, Yuyuan Zeng, Yong Jiang, Yisen Wang, Shu-Tao Xia, and Weiwei Guo. Improving query efficiency of black-box adversarial attack. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXV 16, pages 101–116. Springer, 2020.
  • Bai et al. [2021a] Yang Bai, Xin Yan, Yong Jiang, Shu-Tao Xia, and Yisen Wang. Clustering effect of (linearized) adversarial robust models. arXiv preprint arXiv:2111.12922, 2021a.
  • Bai et al. [2021b] Yang Bai, Yuyuan Zeng, Yong Jiang, Shu-Tao Xia, Xingjun Ma, and Yisen Wang. Improving adversarial robustness via channel-wise activation suppressing. arXiv preprint arXiv:2103.08307, 2021b.
  • Bai et al. [2023] Yang Bai, Yisen Wang, Yuyuan Zeng, Yong Jiang, and Shu-Tao Xia. Query efficient black-box adversarial attack on deep neural networks. Pattern Recognition, 133:109037, 2023.
  • Chen et al. [2017] Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587, 2017.
  • Cordts et al. [2016] Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3213–3223, 2016.
  • Dong et al. [2018] Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and Jianguo Li. Boosting adversarial attacks with momentum. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 9185–9193, 2018.
  • Dong et al. [2019] Yinpeng Dong, Tianyu Pang, Hang Su, and Jun Zhu. Evading defenses to transferable adversarial examples by translation-invariant attacks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4312–4321, 2019.
  • Everingham et al. [2010] Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The pascal visual object classes (voc) challenge. International journal of computer vision, 88:303–338, 2010.
  • Fischer et al. [2021] Marc Fischer, Maximilian Baader, and Martin Vechev. Scalable certified segmentation via randomized smoothing. In International Conference on Machine Learning, pages 3340–3351. PMLR, 2021.
  • Fischer et al. [2017] Volker Fischer, Mummadi Chaithanya Kumar, Jan Hendrik Metzen, and Thomas Brox. Adversarial examples for semantic image segmentation. arXiv preprint arXiv:1703.01101, 2017.
  • Frangi et al. [2018] Alejandro F Frangi, Sailesh Conjeti, Christos Davatzikos, Nassir Navab, Julia A Schnabel, Carlos Alberola-López, Gabor Fichtinger, Magdalini Paschali, and Fernando Navarro. Generalizability vs. robustness: Investigating medical imaging networks using adversarial examples. In International Conference on Medical Image Computing and Computer-Assisted Intervention, number DZNE-2022-01068. Image Analysis, 2018.
  • Goodfellow et al. [2015] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.
  • Gu et al. [2021] Jindong Gu, Hengshuang Zhao, Volker Tresp, and Philip Torr. Adversarial examples on segmentation models can be easy to transfer. arXiv preprint arXiv:2111.11368, 2021.
  • Gu et al. [2022] Jindong Gu, Hengshuang Zhao, Volker Tresp, and Philip HS Torr. Segpgd: An effective and efficient adversarial attack for evaluating and boosting segmentation robustness. In European Conference on Computer Vision, pages 308–325. Springer, 2022.
  • Gu et al. [2023] Jindong Gu, Xiaojun Jia, Pau de Jorge, Wenqian Yu, Xinwei Liu, Avery Ma, Yuan Xun, Anjun Hu, Ashkan Khakzar, Zhijiang Li, et al. A survey on transferability of adversarial examples across deep neural networks. arXiv preprint arXiv:2310.17626, 2023.
  • He et al. [2023a] Bangyan He, Jian Liu, Yiming Li, Siyuan Liang, Jingzhi Li, Xiaojun Jia, and Xiaochun Cao. Generating transferable 3d adversarial point cloud via random perturbation factorization. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 764–772, 2023a.
  • He et al. [2023b] Mengqi He, Jing Zhang, Zhaoyuan Yang, Mingyi He, Nick Barnes, and Yuchao Dai. Transferable attack for semantic segmentation. arXiv preprint arXiv:2307.16572, 2023b.
  • He et al. [2019] Xiang He, Sibei Yang, Guanbin Li, Haofeng Li, Huiyou Chang, and Yizhou Yu. Non-local context encoder: Robust biomedical image segmentation against adversarial attacks. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 8417–8424, 2019.
  • Hendrik Metzen et al. [2017] Jan Hendrik Metzen, Mummadi Chaithanya Kumar, Thomas Brox, and Volker Fischer. Universal adversarial perturbations against semantic image segmentation. In Proceedings of the IEEE international conference on computer vision, pages 2755–2764, 2017.
  • Huang et al. [2023a] Lifeng Huang, Chengying Gao, and Ning Liu. Erosion attack: Harnessing corruption to improve adversarial examples. IEEE Transactions on Image Processing, 2023a.
  • Huang and Kong [2021] Yi Huang and Adams Wai-Kin Kong. Transferable adversarial attack based on integrated gradients. In International Conference on Learning Representations, 2021.
  • Huang et al. [2023b] Yihao Huang, Yue Cao, Tianlin Li, Felix Juefei-Xu, Di Lin, Ivor W Tsang, Yang Liu, and Qing Guo. On the robustness of segment anything. arXiv preprint arXiv:2305.16220, 2023b.
  • Jia et al. [2020] Xiaojun Jia, Xingxing Wei, Xiaochun Cao, and Xiaoguang Han. Adv-watermark: A novel watermark perturbation for adversarial examples. In Proceedings of the 28th ACM International Conference on Multimedia, pages 1579–1587, 2020.
  • Jia et al. [2022] Xiaojun Jia, Yong Zhang, Baoyuan Wu, Ke Ma, Jue Wang, and Xiaochun Cao. Las-at: adversarial training with learnable attack strategy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13398–13408, 2022.
  • Kirillov et al. [2023] Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment anything. arXiv preprint arXiv:2304.02643, 2023.
  • LeCun et al. [2015] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. nature, 521(7553):436–444, 2015.
  • Li et al. [2020] Maosen Li, Cheng Deng, Tengjiao Li, Junchi Yan, Xinbo Gao, and Heng Huang. Towards transferable targeted attack. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 641–649, 2020.
  • Lin et al. [2019] Jiadong Lin, Chuanbiao Song, Kun He, Liwei Wang, and John E Hopcroft. Nesterov accelerated gradient and scale invariance for adversarial attacks. In International Conference on Learning Representations, 2019.
  • Madry et al. [2018] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.
  • Moosavi-Dezfooli et al. [2016] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2574–2582, 2016.
  • Nagarajan et al. [2020] Vaishnavh Nagarajan, Anders Andreassen, and Behnam Neyshabur. Understanding the failure modes of out-of-distribution generalization. In International Conference on Learning Representations, 2020.
  • Qin et al. [2022] Zeyu Qin, Yanbo Fan, Yi Liu, Li Shen, Yong Zhang, Jue Wang, and Baoyuan Wu. Boosting the transferability of adversarial attacks with reverse adversarial perturbation. Advances in Neural Information Processing Systems, 35:29845–29858, 2022.
  • Shelhamer et al. [2017] Evan Shelhamer, Jonathan Long, and Trevor Darrell. Fully convolutional networks for semantic segmentation. IEEE transactions on pattern analysis and machine intelligence, 39(4):640–651, 2017.
  • Wald et al. [2021] Yoav Wald, Amir Feder, Daniel Greenfeld, and Uri Shalit. On calibration and out-of-domain generalization. Advances in neural information processing systems, 34:2215–2227, 2021.
  • Wang and He [2021] Xiaosen Wang and Kun He. Enhancing the transferability of adversarial attacks through variance tuning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1924–1933, 2021.
  • Wang et al. [2021] Xiaosen Wang, Xuanran He, Jingdong Wang, and Kun He. Admix: Enhancing the transferability of adversarial attacks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 16158–16167, 2021.
  • Wang and Farnia [2023] Yilin Wang and Farzan Farnia. On the role of generalization in transferability of adversarial examples. In Uncertainty in Artificial Intelligence, pages 2259–2270. PMLR, 2023.
  • Wang et al. [2023] Zhaoxin Wang, Handing Wang, Cong Tian, and Yaochu Jin. Adversarial training of deep neural networks guided by texture and structural information. In Proceedings of the 31st ACM International Conference on Multimedia, pages 4958–4967, 2023.
  • Wu and Zhu [2020] Lei Wu and Zhanxing Zhu. Towards understanding and improving the transferability of adversarial examples in deep neural networks. In Asian Conference on Machine Learning, pages 837–850. PMLR, 2020.
  • Wu et al. [2021] Weibin Wu, Yuxin Su, Michael R Lyu, and Irwin King. Improving the transferability of adversarial samples with adversarial transformations. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9024–9033, 2021.
  • Xiao et al. [2018] Chaowei Xiao, Ruizhi Deng, Bo Li, Fisher Yu, Mingyan Liu, and Dawn Song. Characterizing adversarial examples based on spatial consistency information for semantic segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), pages 217–234, 2018.
  • Xie et al. [2017] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Yuyin Zhou, Lingxi Xie, and Alan Yuille. Adversarial examples for semantic segmentation and object detection. In Proceedings of the IEEE international conference on computer vision, pages 1369–1378, 2017.
  • Xie et al. [2019] Cihang Xie, Zhishuai Zhang, Yuyin Zhou, Song Bai, Jianyu Wang, Zhou Ren, and Alan L Yuille. Improving transferability of adversarial examples with input diversity. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2730–2739, 2019.
  • Xiong et al. [2022] Yifeng Xiong, Jiadong Lin, Min Zhang, John E Hopcroft, and Kun He. Stochastic variance reduced ensemble adversarial attack for boosting the adversarial transferability. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14983–14992, 2022.
  • Xu et al. [2021] Xiaogang Xu, Hengshuang Zhao, and Jiaya Jia. Dynamic divide-and-conquer adversarial training for robust semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7486–7495, 2021.
  • Yu et al. [2023] Wenqian Yu, Jindong Gu, Zhijiang Li, and Philip Torr. Reliable evaluation of adversarial transferability. arXiv preprint arXiv:2306.08565, 2023.
  • Zhang et al. [2022] Chaoning Zhang, Philipp Benz, Adil Karjauv, Jae Won Cho, Kang Zhang, and In So Kweon. Investigating top-k white-box and transferable black-box attack. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15085–15094, 2022.
  • Zhang et al. [2023] Jianping Zhang, Jen-tse Huang, Wenxuan Wang, Yichen Li, Weibin Wu, Xiaosen Wang, Yuxin Su, and Michael R Lyu. Improving the transferability of adversarial samples by path-augmented method. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8173–8182, 2023.
  • Zhao et al. [2017] Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2881–2890, 2017.
  • Zhao et al. [2021] Zhengyu Zhao, Zhuoran Liu, and Martha Larson. On success and simplicity: A second look at transferable targeted attacks. Advances in Neural Information Processing Systems, 34:6115–6128, 2021.
  • Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2921–2929, 2016.
  • Zhu et al. [2023] Hegui Zhu, Haoran Zheng, Ying Zhu, and Xiaoyan Sui. Boosting the transferability of adversarial attacks with adaptive points selecting in temporal neighborhood. Information Sciences, 641:119081, 2023.
  • Zhu et al. [2022] Yao Zhu, Yuefeng Chen, Xiaodan Li, Kejiang Chen, Yuan He, Xiang Tian, Bolun Zheng, Yaowu Chen, and Qingming Huang. Toward understanding and boosting adversarial transferability from a distribution perspective. IEEE Transactions on Image Processing, 31:6487–6501, 2022.