Data-Driven Lipschitz Continuity: A Cost-Effective Approach to Improve Adversarial Robustness

Erh-Chung Chen
National Tsing Hua University
Pin-Yu Chen
IBM Research
I-Hsin Chung
IBM Research
Che-Rung Lee
National Tsing Hua University

Abstract

The security and robustness of deep neural networks (DNNs) have become increasingly concerning. This paper aims to provide both a theoretical foundation and a practical solution to ensure the reliability of DNNs. We explore the concept of Lipschitz continuity to certify the robustness of DNNs against adversarial attacks, which aim to mislead the network by adding imperceptible perturbations to inputs. We propose a novel algorithm that remaps the input domain into a constrained range, reducing the Lipschitz constant and potentially enhancing robustness. Unlike existing adversarially trained models, where robustness is enhanced by introducing additional examples from other datasets or generative models, our method is almost cost-free as it can be integrated with existing models without requiring re-training. Experimental results demonstrate the generalizability of our method, as it can be combined with various models and achieve enhancements in robustness. Furthermore, our method achieves the best robust accuracy for CIFAR10, CIFAR100, and ImageNet datasets on the RobustBench leaderboard.

1 Introduction

Deep neural networks (DNNs) have demonstrated promising results across various tasks [1, 2], prompting concerns about AI security as these networks are increasingly deployed in our daily lives. A single erroneous prediction could lead to catastrophic consequences. For example, the Overload attack can significantly inflate the inference time of detecting objects for self-driving systems [3], while even minor typos in input prompts can cause large language models (LLMs) to produce unexpected responses [4].

The focus of this paper is to design robust DNNs that can defend against adversarial attacks, which aim to create perturbations in inputs that are imperceptible to humans but can mislead DNNs. Previous studies have revealed the existence of adversarial examples in diverse domains, such as image pixels [5], audio data [6], and textual content [7]. Consequently, exploring the vulnerabilities of DNNs and developing theoretically grounded explainable AI is crucial for ensuring the reliability of DNN-based applications.

Adversarial training [8] has proven to be an effective strategy for enhancing the robustness of DNNs. It achieves this by generating adversarial examples on the fly during the training phase and optimizing the model’s weights to minimize the losses caused by these examples. Recent studies have shown that robustness can be further improved by introducing additional examples from other datasets [9] or using generative models [10, 11] to cover low-frequency data. Despite the promising improvements in robustness, training costs increase significantly due to the demand for additional data, which can be up to 20 to 100 times larger than the original dataset. This poses a trade-off between training cost and robustness. The concern over high computational costs becomes a significant obstacle in deploying robust DNN-based applications, especially in fields like medicine, autonomous driving systems, and other areas where human lives are at stake.

In this paper, we explore how robustness is certified by the theorem of Lipschitz continuity, which theoretically gauges how much outputs are amplified by the perturbations. However, we argue that the set of observed data is finite and cannot cover the entire real data space, leading to an overestimation of the Lipschitz constant derived from the theorem. Therefore, we propose an algorithm that can remap the input domain into a constrained range, resulting in a Lipschitz constant for the modified function that is less than or equal to the Lipschitz constant of the original function, thus potentially enhancing robustness. Our key contributions are outlined as follows:

• We introduce the concept of the empirical Lipschitz constant, which can more precisely reflect the robustness of the corresponding observed data. Compared with the original definition of the Lipschitz constant, the empirical value is derived from a set of observed data, thereby eliminating the influence of space that is never drawn from real data. As illustrated in Figure 1, we prove that any function that can be formulated as a linear system, when combined with our proposed function to remap the input domain of a specific layer to a constrained range, can reduce its empirical Lipschitz constant, resulting in better robustness.
• The proposed function can enhance the robustness of adversarially trained models with minimal additional costs. Specifically, it introduces only one parameter, the value of which can be determined by scanning the observed data once without the need for re-training or fine-tuning.
• The experimental results suggest that our method can be combined with various existing methods and gain robustness improvements. In addition, our method achieves the best robust accuracy against adversarial examples generated by AutoAttack [12], a state-of-the-art ensemble attack, for CIFAR10, CIFAR100, and ImageNet datasets on the RobustBench leaderboard [13]. By assessing accuracy against adaptive attacks, transfer attacks, and evaluation methods for validating obfuscated gradients [14, 15], we believe that the proposed algorithm should not cause robustness to be overestimated.

Figure 1: The empirical Lipschitz constant of specific layers that can be represented by linear systems, such as convolutional or fully connected layers, can be reduced by remapping their input domain to a constrained range.

The rest of this paper is organized as follows. Section 2 introduces the background on adversarial attacks and adversarial training. Section 3 presents the theoretical proof of how robustness is enhanced by manipulating the domain of linear functions and introduces the proposed algorithm. Section 4 shows the experimental results, including the comparisons among related works, and ablation studies on various hyper-parameters, combination with different activation functions and gradient masking verification. The last section is our conclusion.

2 Related Works

2.1 Adversarial Attacks

Adversarial attacks aim to inject tiny perturbations into inputs, causing victim DNNs to output incorrect predictions with high confidence [16]. These attacks have been observed in numerous vision applications [17, 18, 19, 20]. Furthermore, these tiny perturbations can be embedded not only in image pixels but also in textual contexts [21, 22], audio space [23], and other fields [24]. Some research has shown how adversarial attacks threaten real applications [25, 26, 27, 28]. Investigating the vulnerability of DNNs and theoretically avoiding adversarial examples when optimizing model weights or designing architecture is an ongoing challenge.

Depending on the amount of information the attacker has access to, adversarial attacks can be divided into two types: white-box attacks and black-box attacks. For white-box attacks, all information about the victim models is public. Attackers can craft adversarial examples through the gradient direction, which is usually opposite to the direction in which the model weights were optimized during the training phase [8, 12]. Although this scenario is unrealistic, this type of research could lead to the design of more reliable models in the future. Conversely, for black-box attacks, the only information leaked to the attacker is the output prediction of the victim model. Adversarial examples can be generated by random search [29], discrepancies in outputs [30], or transferability from models with similar architectures [31, 32]. The purpose of black-box attacks is to study the risk of the victim models being attacked in real-world application scenarios.

2.2 Defensive Strategies

Adversarial training is a defensive strategy that aims to find optimal weights against adversarial attacks [8, 33, 34]. It achieves this by generating adversarial examples on the fly during the training phase and optimizing the model’s weights to minimize the losses caused by these examples. Despite the superior robustness achieved by adversarial training, the associated training costs of adversarially trained models are generally ten times more expensive than those of models trained utilizing a standard policy. The concern over high computational costs becomes a significant obstacle in deploying DNN-based applications.

Balancing between training cost and robustness is a challenge for adversarial training. Fast adversarial training has been proposed for applications pursuing higher robustness under a limited budget [35, 36]. However, numerous adversarial examples cannot be drawn from these approaches, potentially leading to catastrophic overfitting, where robust accuracy significantly decreases without warning signs [37]. On the contrary, some studies attempted to refine robustness by introducing additional examples from other datasets [9] or using generative models [10, 11]. Alternatively, another line of research has demonstrated that the removal of partial adversarial examples does not compromise robust accuracy, addressing the issue of unaffordable training costs [38, 39].

Despite the potential of adversarial training to enhance model robustness, budgetary constraints often limit the scope of their crafting to one or two specific attack types during the training stage. This restricted approach may inadvertently render adversarially trained models susceptible to novel, unseen attacks. As an alternative, Lipschitz-based certified training offers a theoretical framework for ensuring an upper bound on prediction errors [40, 41, 42]. However, it is important to acknowledge that these training methods often suffer from scalability issues.

3 Methodology

3.1 Motivation

In this paper, we approach robustness from a theoretical perspective, aiming to demonstrate that all risks posed by adversarial examples are limited while minimizing the additional costs associated with improving robustness. Our evaluation is conducted under the white-box scenario, where the target model is capable of defending against various types of known adversarial attacks, including white-box attacks [8, 18, 12], black-box attacks [43], and transfer attacks [44, 45]. Additionally, we conduct a set of experiments to verify that gradient masking [14] does not occur in our method and to ensure that robustness is not overestimated.

3.2 Lipschitz Continuity

To achieve our goal, we introduce a quantitative metric known as the Lipschitz constant, which gauges how much outputs are amplified by the perturbations within the input domain. Formally, a function $f:\mathbb{R}^{m}\rightarrow\mathbb{R}^{n}$ is globally Lipschitz continuous if there exists a constant $K\geq 0$ such that

D_{f}(f(x_{1}),f(x_{2}))\leq KD_{x}(x_{1},x_{2})\quad\forall x_{1},x_{2}\in\mathbb{R}^{m},

(1)

where $D_{x}$ is a metric on the domain of $f$; $D_{f}$ is a metric on the range of $f$; and $x_{1}\neq x_{2}$. A DNN can be considered as a composite function:

F(x)=(f_{1}\circ f_{2}\circ\dots\circ f_{L})(x),

(2)

where $f_{i}$ is the function of $i$ -th layer. If there exists a Lipschitz constant for each individual layer, we can derive an upper bound of the Lipschitz constant for the victim model as follows,

K_{F}\leq\prod_{i=1}^{L}K_{i},

(3)

where $K_{i}$ is the Lipschitz constant of $f_{i}$ .

By taking an image $x$ and an adversarial example $x^{\text{adv}}$ within an $\epsilon$-ball centered at $x$ as the inputs of (1), we can assess the impact caused by adversarial examples. Therefore, the Lipschitz constant serves as a bridge that connects the design of robust models with the measurement of risks posed by adversarial examples. A small Lipschitz constant for the victim model implies that the increase in loss is minimal, indicating a higher ability to resist adversarial attacks. Consequently, the objective of this paper is to lower the upper bound of the Lipschitz constant for the given models.
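As a worked instance of this substitution, and assuming the perturbation budget $\epsilon$ is measured with the same metric $D_{x}$ used in (1), setting $x_{1}=x^{\text{adv}}$ and $x_{2}=x$ and then applying (3) gives the chain below. It uses only the symbols already defined in (1) to (3).

```latex
D_{f}\big(F(x^{\text{adv}}),F(x)\big)\leq K_{F}\,D_{x}(x^{\text{adv}},x)\leq K_{F}\,\epsilon\leq\Big(\prod_{i=1}^{L}K_{i}\Big)\epsilon .
```

The output change is therefore controlled by the product of the per-layer constants, which is why lowering any single $K_{i}$ tightens the overall bound.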

As indicated by previous studies [46, 47], the Lipschitz constant of the given model defined in (3) can be minimized by reducing the output discrepancy of individual linear layers. Under the $L_{2}$ norm, we have

\frac{||f(x^{\text{adv}})-f(x)||_{2}}{||x^{\text{adv}}-x||_{2}}=\frac{||(Wx^{\text{adv}}+b)-(Wx+b)||_{2}}{||\delta||_{2}}=\frac{||W\delta||_{2}}{||\delta||_{2}},

(4)

where $W$ is the weight matrix, $b$ is the bias, and $\delta=x^{\text{adv}}-x$ is the perturbation. Therefore, the original optimization problem of minimizing the Lipschitz constant is transformed into the following minimization problem:

\min_{W}\max_{\delta\neq 0,\delta\in\mathbb{R}^{m}}\frac{||W\delta||_{2}}{||\delta||_{2}}=\min_{W}\sigma_{\text{max}}(W),

(5)

where $\sigma_{\text{max}}(W)$ represents the largest singular value of the matrix $W$ . Notably, there is a relation to eigenvalues:

\sigma^{2}_{i}(W)=\lambda_{i}(WW^{\dagger})=\lambda_{i}(W^{\dagger}W),

(6)

where $W^{\dagger}$ is the conjugate transpose of $W$ . Each singular value of the matrix $W$ is the square root of the eigenvalue of the matrices $WW^{\dagger}$ or $W^{\dagger}W$ . In other words, minimizing $\lambda_{\text{max}}(WW^{\dagger})$ , the largest eigenvalue of the matrices, can achieve the same objective.
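The relation between (5) and (6) can be checked numerically. The sketch below is an illustration only, not the authors' code: it estimates $\sigma_{\text{max}}(W)$ for a real matrix by power iteration on $W^{\top}W$, using a small hand-picked $2\times 2$ matrix and helper names that are assumptions of this example.

```python
import math

def matvec(W, v):
    """Multiply a matrix W (list of rows) by a vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def transpose(W):
    """Transpose a real matrix given as a list of rows."""
    return [list(col) for col in zip(*W)]

def norm2(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))

def largest_singular_value(W, iters=500):
    """Estimate sigma_max(W) by power iteration on W^T W."""
    Wt = transpose(W)
    v = [1.0 + 0.01 * i for i in range(len(W[0]))]  # arbitrary start vector
    for _ in range(iters):
        v = matvec(Wt, matvec(W, v))
        n = norm2(v)
        if n == 0.0:
            return 0.0
        v = [x / n for x in v]
    return norm2(matvec(W, v))  # ||W v||_2 for a unit vector v

# Illustrative weights: W^T W = [[25, 20], [20, 25]] has eigenvalues 45 and 5.
W = [[3.0, 0.0], [4.0, 5.0]]
sigma = largest_singular_value(W)

# The amplification of any particular perturbation stays below sigma_max.
delta = [1.0, 2.0]
ratio = norm2(matvec(W, delta)) / norm2(delta)
```

Here `sigma` agrees with $\sqrt{\lambda_{\text{max}}(W^{\top}W)}=\sqrt{45}$, and `ratio` for the chosen `delta` is smaller, as (5) requires.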

Rather than minimizing the objective directly, the Gershgorin circle theorem provides an alternative way to estimate the robustness of the given linear system.

Theorem 1.

(Gershgorin Circle Theorem) For an $m\times m$ matrix $A$ with entries $a_{ij}$, each eigenvalue of $A$ is in at least one of the disks:

R_{i}=\{z\in\mathbb{C}:|z-a_{ii}|\leq\sum_{j\neq i}|a_{ij}|\}\quad\mathrm{for}\quad i\in\{1,2,\ldots,m\}.

(7)

Theorem 1 indicates that each row of $A$ defines a disk centered at the diagonal entry $a_{ii}$ whose radius is the sum of the absolute values of the off-diagonal entries $a_{ij}$ in that row. For any layer which can be represented by a linear system, such as convolutional or fully connected layers, robustness can be improved by shrinking the radius of the disk that contains the largest eigenvalue.
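To make the disk picture concrete, the following minimal sketch computes the largest right endpoint of the Gershgorin disks, which upper-bounds every eigenvalue of a matrix whose eigenvalues are real. The function name and the example matrix are assumptions chosen for illustration.

```python
def gershgorin_upper_bound(A):
    """Return max_i (a_ii + sum_{j != i} |a_ij|) over the rows of A.

    When the eigenvalues of A are real (e.g. A is symmetric), every
    eigenvalue is less than or equal to this value.
    """
    return max(
        row[i] + sum(abs(a) for j, a in enumerate(row) if j != i)
        for i, row in enumerate(A)
    )

# Symmetric example with eigenvalues 45 and 5.
A = [[25.0, 20.0], [20.0, 25.0]]
bound = gershgorin_upper_bound(A)
```

For this particular matrix both disks are centered at 25 with radius 20, so the bound equals the largest eigenvalue; in general it is only an upper estimate.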

3.3 Forged Function

We argue that the largest singular value provides a loose bound for the Lipschitz constant. To precisely reflect the robustness of the corresponding observed data, we define the empirical Lipschitz constant that eliminates the influence of space that is never drawn from real data.

Definition 1.

Empirical Lipschitz constant:

\max_{x\neq 0,x\in\mathcal{S}}\frac{||Wx||_{2}}{||x||_{2}},

(8)

where $\mathcal{S}$ is an observed dataset. As can be seen, the empirical Lipschitz constant on the finite dataset is less than or equal to its Lipschitz constant derived from the theorem.
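The gap between the two constants can be seen on a toy example. The sketch below evaluates the ratio in Definition 1 over a small hand-made set of vectors; the matrix, the sample set, and the helper names are assumptions for illustration and do not come from the paper's experiments.

```python
import math

def matvec(W, v):
    """Multiply a matrix W (list of rows) by a vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def norm2(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))

def empirical_lipschitz(W, samples):
    """Maximum of ||W x||_2 / ||x||_2 over the nonzero vectors in samples."""
    return max(norm2(matvec(W, x)) / norm2(x) for x in samples if norm2(x) > 0)

W = [[3.0, 0.0], [4.0, 5.0]]                     # sigma_max(W) = sqrt(45)
S = [[1.0, 0.0], [1.0, -1.0], [0.0, 2.0]]        # observed data (toy)
emp = empirical_lipschitz(W, S)
```

None of the samples points along the top singular direction $(1,1)/\sqrt{2}$, so `emp` stays below $\sigma_{\text{max}}(W)=\sqrt{45}$, which is the situation the definition is meant to capture.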

Based on Definition 1, we can build robust models by manipulating the output ranges of individual layers, thereby restricting the input domain of the next layer. If input vectors do not align with the direction of the eigenvector with the largest eigenvalue, the empirical constant should be bounded. Therefore, we propose a forged function defined as follows:

f^{\text{forge}}(x)=\begin{cases}0\quad&\text{if}\quad|x|\leq c^{\text{th}}_{i},\\ x\quad&\text{otherwise},\end{cases}

(9)

where $c^{\text{th}}_{i}$ is a threshold for the $i$-th layer. Compared with the identity mapping, the forged function suppresses to zero any input whose absolute value does not exceed the threshold. When $c^{\text{th}}_{i}$ is set to $0$, the forged function reduces to the identity.
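Applied elementwise, (9) can be sketched in a few lines. The function name `forge` and the sample values are assumptions made for this illustration.

```python
def forge(values, threshold):
    """Elementwise forged function of (9): zero entries with |x| <= threshold."""
    return [0.0 if abs(v) <= threshold else v for v in values]

activations = [0.05, -0.2, 0.0, 1.5, -0.01]
suppressed = forge(activations, 0.1)   # small-magnitude entries become 0
unchanged = forge(activations, 0.0)    # threshold 0 leaves the input as it is
```

Unlike ReLU, large negative entries such as `-0.2` pass through; only entries close to zero are removed.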

The forged function aims to reduce the empirical Lipschitz constant of the layers that can be represented as linear systems by remapping the input domain of these layers into a constrained set. Figure 2 provides a visual representation of potential insertion points for the forged function, while maintaining the integrity of other layers. For the ResNet architecture, the forged function is placed before the convolutional layers in each residual block. Similarly, for vision transformer architectures, the structure of MLP layers is adapted to seamlessly integrate the forged function.

Here is the proof that the largest eigenvalue can be shrunk by the forged function. Let $W$ be the weight of the target layer, which can be represented by an $m\times n$ matrix, and let $\textbf{t}$ be the input vector. Without loss of generality, we assume that $A=W^{\dagger}W$ and that $f^{\text{forge}}(\textbf{t})$ is defined as:

f^{\text{forge}}(t_{i})=\begin{cases}0\quad&\text{if}\quad i\leq k,\\ t_{i}\quad&\text{otherwise},\end{cases}

(10)

where $t_{i}$ is the $i$-th element of $\textbf{t}$ and $k$ is a positive integer.

Lemma 2.

There exists a matrix $A^{\prime}$ whose largest eigenvalue, $\lambda_{\text{max}}(A^{\prime})$ , is less than or equal to the largest eigenvalue of $A$ , $\lambda_{\text{max}}(A)$ , if

Af^{\text{forge}}(\textbf{t})=A^{\prime}\textbf{t}.

(11)

Proof.

Since the first $k$ entries of the vector $\textbf{t}$ are replaced with zeros, the above condition can be achieved by replacing the corresponding column vectors of the matrix $A$ with zero vectors. Therefore, the entries of $A^{\prime}$ are formulated as

a^{\prime}_{ij}=\begin{cases}0\quad&\text{if}\quad j\leq k,\\ a_{ij}\quad&\text{otherwise}.\end{cases}

(12)

The matrix $A$ is positive semidefinite, implying that its diagonal entries are non-negative. Moreover, from the entry representation of $A^{\prime}$ in (12), modifications are only applied to the first $k$ columns, while the rest remain unchanged. Applying the Gershgorin Circle Theorem, the centers of the first $k$ disks of $A^{\prime}$ are shifted to zero, and the radius of every disk, which is the sum of the absolute values of the off-diagonal entries in the corresponding row of $A^{\prime}$, does not increase. Consequently, the upper bound of the largest eigenvalue of $A^{\prime}$ is at least as tight as that of the original matrix $A$. ∎
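The construction in the proof can be replayed on a small example. The sketch below builds $A=W^{\top}W$ for an illustrative real matrix, zeroes its first $k$ columns as in (12), checks condition (11) for one vector, and compares the Gershgorin upper bounds. All names and numbers are assumptions of this example rather than part of the paper.

```python
def gram(W):
    """Return A = W^T W for a real matrix W given as a list of rows."""
    cols = list(zip(*W))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

def matvec(A, v):
    """Multiply a matrix A (list of rows) by a vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def forge_first_k(t, k):
    """Forged function of (10): zero the first k entries of t."""
    return [0.0 if i < k else v for i, v in enumerate(t)]

def zero_first_k_columns(A, k):
    """Matrix A' of (12): zero the first k columns of A."""
    return [[0.0 if j < k else a for j, a in enumerate(row)] for row in A]

def gershgorin_upper_bound(A):
    """Largest right endpoint of the Gershgorin disks of A."""
    return max(
        row[i] + sum(abs(a) for j, a in enumerate(row) if j != i)
        for i, row in enumerate(A)
    )

W = [[3.0, 0.0], [4.0, 5.0]]
A = gram(W)                          # [[25, 20], [20, 25]]
k = 1
A_prime = zero_first_k_columns(A, k) # [[0, 20], [0, 25]]

t = [2.0, 3.0]
lhs = matvec(A, forge_first_k(t, k)) # A f_forge(t)
rhs = matvec(A_prime, t)             # A' t

bound_A = gershgorin_upper_bound(A)
bound_A_prime = gershgorin_upper_bound(A_prime)
```

In this instance the two products coincide and the bound drops from 45 to 25, matching the direction claimed by Lemma 2.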

Notably, the outputs of $f^{\text{forge}}(\textbf{t})$ vary depending on the inputs, resulting in each input having its own $A^{\prime}$. The upper bound of the largest eigenvalue of each matrix $A^{\prime}$ is no greater than that of the matrix $A$. With Lemma 2, a precise upper bound of the largest eigenvalue can be obtained by feeding a set of observed images. On the contrary, there might be cases in which solving the minimization problem in (5) leads to the theoretical largest eigenvalue being minimized, but the empirical Lipschitz constant remains unchanged.

The choice of a proper $c^{\text{th}}_{i}$ is a crucial factor in reducing the largest eigenvalue. In this paper, we propose obtaining the value of $c^{\text{th}}_{i}$ through the following equation:

c^{\text{th}}_{i}=c^{r}\max(F_{1\rightarrow i}(x))\quad\forall x\in\mathcal{S},

(13)

where $\mathcal{S}$ can include all or a subset of images in the training set, $c^{r}$ is a positive number, and $F_{1\rightarrow i}(x)$ represents the output of the $i$-th layer. Specifically, each layer has its own $c^{\text{th}}_{i}$, but they share the same hyper-parameter $c^{r}$. Algorithm 1 specifies the implementation details of the forged function. The variable $b$ is used to store the maximum value that appeared in $\mathcal{S}$, as defined in (13), and is initialized during construction. Similar to the implementation of the batchnorm layer, the behavior depends on the mode configuration. When the mode is set to tracking mode, the variable $b$ is updated accordingly, and the input is passed to the output without any modification. Conversely, when the mode is set to inference mode, the value of $b$ is frozen, and the input is transformed as defined by (9). By default, the mode is set to inference, and the values of $b$ and $c^{r}$ are both zero. As a result, the set $\mathcal{M}$ contains only entries that are already zero, and the algorithm reduces to the identity function.

It is worth emphasizing that by feeding all images in the set $\mathcal{S}$ once in tracking mode beforehand, the value of $c^{\text{th}}_{i}$ can be obtained. The elements satisfying the constraints are appropriately deactivated during inference. Notably, this operation does not necessitate gradient computations and incurs minimal time consumption, typically only a few minutes, even when executed on commonly used GPUs. In comparison to adversarial training, this process is nearly cost-free. The overall procedure shares many similarities with post-pruning techniques. Nevertheless, we posit that the proposed function is very similar to the ReLU function, as it suppresses the output values within a specific range, but the defined range in the forged function is adaptive to the observed dataset.

Algorithm 1 Forged Function

require: input $\textbf{x}$, mode $m$
if $m$ is tracking mode then
    $b=\max(b,\textbf{x})$
else
    $\mathcal{M}=\left\{x\middle|\text{abs}(x)\leq c^{r}b\right\}$
    for all $s\in\mathcal{M}$ do
        $s=0$
    end for
end if
return $\textbf{x}$
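Algorithm 1 could be realized along the following lines. This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: the class name, the `tracking` flag, and the use of flat lists in place of tensors are all hypothetical choices made for readability.

```python
class ForgedFunction:
    """Stateful forged function with a tracking mode and an inference mode."""

    def __init__(self, c_r=0.0):
        self.c_r = c_r         # shared ratio hyper-parameter c^r
        self.b = 0.0           # running maximum observed in tracking mode
        self.tracking = False  # inference mode by default, as in the text

    def __call__(self, x):
        if self.tracking:
            # Tracking mode: update b and pass the input through unchanged.
            self.b = max(self.b, max(x, default=self.b))
            return list(x)
        # Inference mode: b is frozen; zero entries with |x| <= c^r * b.
        threshold = self.c_r * self.b
        return [0.0 if abs(v) <= threshold else v for v in x]

layer = ForgedFunction(c_r=2.0 ** -7)

# One pass over the observed data in tracking mode fixes the threshold.
layer.tracking = True
for sample in [[0.5, -1.0, 2.0], [4.0, 0.01, -0.02]]:
    layer(sample)

# Inference: b = 4.0, so the threshold is 4.0 / 128 = 0.03125.
layer.tracking = False
out = layer([0.02, -0.03, 0.5, -4.0])

# With the default settings the layer acts as the identity.
identity_out = ForgedFunction()([1.0, -2.0, 0.0])
```

Keeping the statistic `b` separate from the ratio `c_r` means the tracking pass only has to run once, after which different values of `c_r` can be tried without revisiting the data.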

4 Experiments

4.1 Setup

We evaluated the performance on CIFAR10, CIFAR100, and ImageNet datasets under the white-box scenario with an $L_{\infty}$ norm. To ensure comparability of results, we assessed robustness using AutoAttack [12]. For CIFAR10 and CIFAR100 datasets, $\epsilon$ is set to $8/255$, while for the ImageNet dataset, $\epsilon$ is set to $4/255$. The model weights are publicly accessible from RobustBench. The ablation study involves exploring the selection of the optimal $c^{r}$, the combination of various models trained from different techniques, the verification of gradient masking, and assessments of certified adversarial robustness via randomized smoothing. Due to the page limit, the full experimental results of the ablation study are listed in the Appendix.

4.2 White-box Evaluation

4.2.1 Performance Analysis on CIFAR10 and CIFAR100 Datasets

Table 1: The results of top-3 competitors on Robustbench.

(a) CIFAR10 dataset

#	Method	acc_nat	acc_AA
*	[11] + Ours	93.20	71.70
1	[48]	93.27	71.07
2	[11]	93.25	70.69
3	[49]	95.19	69.71

(b) CIFAR100 dataset

#	Method	acc_nat	acc_AA
*	[11] + Ours	74.97	44.00
1	[11]	75.22	42.67
2	[49]	83.08	41.80
3	[50]	73.85	39.18

The model used in this study is based on the WRN-70-16 architecture with the SiLU activation function and was trained with additional generated data. The value of $c^{\text{th}}_{i}$ is obtained by feeding all images from the training set without any augmentation, and $c^{r}$ is set to $2^{-8}$ for this experiment. Tables 1(a) and 1(b) summarize the top-3 competitors on RobustBench for the CIFAR10 and CIFAR100 datasets, respectively, where $\#$ represents the ranking, our results are marked by an asterisk (*), and acc_nat and acc_AA denote the accuracy on clean data and on adversarial examples generated by AutoAttack, respectively.

As can be seen, our method combined with WRN-70-16 with SiLU function gains improvement in robustness by at least 0.9% and achieves the best results on Robustbench for both datasets. Nevertheless, standard accuracy (acc_nat) is decreased. Many factors might affect the results. For example, the single additional hyper-parameter introduced in this study might not provide sufficient granularity to fit all layers in the target model.

4.2.2 Performance Analysis on ImageNet Dataset

Table 2: The results of top-3 competitors for ImageNet dataset on Robustbench.

#	Method	Architecture	acc_nat	acc_AA
*	[51] + Ours	Swin-L	78.88	60.04
1	[51]	Swin-L	78.92	59.56
2	[49]	ConvNeXtV2-L + Swin-L	81.48	58.50
3	[51]	ConvNeXt-L	78.02	58.48

In this experiment, we utilized the Swin [52] model architecture, a variant of transformers. However, scanning the approximately 1.2 million training images provided by the ImageNet dataset to determine the value of the hyper-parameter introduced in the forged function defined in (13) might take a long time. Alternatively, we randomly selected about 5,000 images as the observed images to determine the value of the hyper-parameter. Ideally, determining the optimal choice of $c^{r}$ requires conducting an ablation study to explore the relationship between the chosen $c^{r}$ and robust accuracy on a validation set. To accelerate this procedure, we first seek a value of $c^{r}$ with the highest standard accuracy. The candidate values are selected in a small range centered around this value.
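The two-stage search described in this paragraph might be organized as in the sketch below. The power-of-two grid, the `width` parameter, and the `standard_accuracy` callable are assumptions introduced for illustration; the text does not specify the grid or the size of the range, and the toy accuracy function merely stands in for a real evaluation on held-out data.

```python
def candidate_ratios(standard_accuracy, exponents=range(-12, -3), width=1):
    """Pick the c_r = 2**e with the best clean accuracy, then return its neighbours.

    standard_accuracy: hypothetical callable mapping a ratio c_r to the
    standard accuracy of the model with the forged function enabled.
    """
    exps = list(exponents)
    best = max(exps, key=lambda e: standard_accuracy(2.0 ** e))
    # Candidates for the more expensive robust-accuracy evaluation.
    return [2.0 ** e for e in range(best - width, best + width + 1)]

def toy_accuracy(c_r):
    """Stand-in for a real evaluation; peaks at c_r = 2**-8 by construction."""
    return -abs(c_r - 2.0 ** -8)

cands = candidate_ratios(toy_accuracy)
```

Only the few values in `cands` would then need a full robust-accuracy evaluation, which is the saving the paragraph aims for.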

Table 2 lists the top-3 competitors on RobustBench for the ImageNet dataset, including ranking, architecture, standard accuracy, and robust accuracy against AutoAttack. The experimental results demonstrate that the Swin-L model with GELU combined with our method obtains an improvement in robust accuracy and achieves the best result, while standard accuracy drops only slightly. This finding suggests that our method can be applied to both convolutional and fully connected layers.

4.2.3 Combination with Various Models

The experiment aims to assess the generalizability of the proposed function on adversarially trained models with identical architecture but different training strategies, and to evaluate the potential cost reduction of adversarial training. We integrated the proposed approach with a selection of models from RobustBench, whose weights are obtained directly from the official release without any modifications, and also included a model trained by TRADES [53] as a baseline for the CIFAR10 dataset. The selected models were trained using different techniques, such as adding perturbations in internal layers, retrieving information using knowledge distillation, reducing inefficient training data, or involving additional images from generative models or another dataset. Except for the model used in RST-AWP, which is WRN-28-10, the model architecture we utilized is WRN-34-10 with ReLU, as it is the most popular network on the RobustBench leaderboard [13].

For the white-box evaluation, the value of $c^{r}$ is set to $2^{-7}$. Tables 3(a) and 3(b) present the standard and robust accuracy of models integrated with our method for the CIFAR10 and CIFAR100 datasets, respectively. In these tables, the column Original indicates the original results reported by RobustBench, and the column Original+Ours shows the results of the proposed method. As indicated in these tables, for the CIFAR10 dataset, the proposed method enhances robust accuracy by more than $2\%$ for the RST-AWP, DefEAT, and LTD models, while the other models receive approximately $1$ to $1.7\%$ improvement in robustness. Similarly, for the CIFAR100 dataset, these models gain roughly $0.7$ to $1.4\%$ in robustness. The empirical results indicate that the resilience of existing models against adversarial attacks can be improved in the way Lemma 2 suggests. We believe that the proposed solution is general, as it succeeds on models incorporating different training techniques.

Another advantage of the proposed method that we would like to highlight is that its cost can almost be ignored compared to the cost of adversarial training, as it involves only a single-pass scan of a set of images to determine the hyper-parameter $c^{\text{th}}$. This implies that these models can enhance robustness for free. Specifically, DefEAT can achieve a robust accuracy of 57.30% by removing inefficient training data. By combining the DefEAT model with our approach, a robust accuracy of 59.55% can be achieved, which is comparable to RST-AWP (60.04%). However, RST-AWP introduces more images from another dataset, resulting in a higher cost in each epoch. Similarly, for the CIFAR100 dataset, DefEAT with our proposed method achieves a robust accuracy of 32.11%, which is better than EffAug (31.85%), which involves more complex data augmentation during the training stage. This aligns with the suggestion by DefEAT that some data can be removed without hurting robustness. Holistically, we believe that our approach might provide a hint during the late phase of adversarial training to drop inefficient weights, resulting in further cost savings or enhanced resilience.

Another interesting observation presented in these tables is that standard accuracy is also improved for all models on both CIFAR10 and CIFAR100 datasets. Although this phenomenon cannot be explained by Lemma 2, we believe that the output ranges of ReLU and the proposed function are highly similar. Consequently, this arrangement can maintain the accuracy on clean data.

Table 3: Standard and robust accuracy of models integrated with our method.

(a) CIFAR10 dataset

Method	Original acc_nat	Original acc_AA	Original+Ours acc_nat	Original+Ours acc_AA
RST-AWP [54]	88.25	60.04	89.50	62.76
DefEAT [39]	86.54	57.30	87.40	59.55
LTD [55]	85.21	56.94	85.98	59.25
AWP [54]	85.36	56.17	86.19	57.85
TRADES [53]	85.34	52.86	85.78	53.80

(b) CIFAR100 dataset

Method	Original acc_nat	Original acc_AA	Original+Ours acc_nat	Original+Ours acc_AA
EffAug [56]	68.75	31.85	69.14	32.57
DKLD [50]	64.08	31.65	64.26	32.58
DefEAT [39]	64.32	31.13	66.42	32.11
LTD [55]	64.07	30.59	64.29	31.95
AWP [54]	60.38	28.86	60.63	29.72

4.2.4 Gradient Masking Verification

Previous studies suggest that the resilience of models might be unintentionally overestimated [14, 15]. The proposed function in (9) suppresses values to zero if the condition is satisfied. One might argue that this property could unintentionally cause obfuscated gradients, resulting in gradient attacks being unable to efficiently produce adversarial examples. Therefore, to verify that the proposed method does not encounter the gradient masking issue, we should conduct more experiments from the following aspects:

1. White-box attacks should be better than black-box attacks.
2. Iterative attacks should have better performance than one-step attacks.
3. Robust accuracy should gradually decrease to zero when the radius of the $\epsilon$-ball increases.
4. The modified model should defend against adversarial examples generated by the original models.
5. Certified robustness should be evaluated by randomized smoothing [57].

The first item has been examined by AutoAttack, as it involves three white-box attacks and one black-box attack. Comparing the robust accuracy shown in Tables 1(a), 1(b), and 2, the models combined with the proposed method achieve higher robust accuracy than the original models. This indicates that the black-box attack does not produce more adversarial examples than the white-box attacks.

The full experimental results for the rest of the experiments can be found in the appendix. The results demonstrate that the proposed algorithm does not violate any of the above rules and that certified robustness is improved by our method across most settings. From the evidence, we believe that the proposed method does not encounter the gradient masking problem across different hyper-parameters and various models on CIFAR10 and CIFAR100 datasets.

5 Conclusion

In this paper, we recap how robustness is certified by the theorem of Lipschitz continuity. We introduce the concept of the empirical Lipschitz constant, which minimizes the influence of the space not drawn from real data, resulting in a precise estimation of the robustness of the corresponding observed data. We prove that by remapping the input domain of a specific layer to a constrained range, the Lipschitz constant can be shrunk, leading to better robustness. The proposed function introduces only one parameter, the value of which can be determined by scanning the training data once, without re-training or fine-tuning. Compared with adversarial training, the proposed method is almost cost-free. The experimental results suggest that our method can be combined with various existing methods and achieve robustness improvements, and no gradient masking occurs in our algorithm. Furthermore, our method can achieve the best robust accuracy for CIFAR10, CIFAR100, and ImageNet datasets on the RobustBench leaderboard.

Numerous future directions merit exploration. Firstly, due to the property of maximization, the proposed function might easily be influenced by outliers. Designing a better function is an interesting research topic. Secondly, exploring the combination with various activation functions, different model architectures or large-scale datasets would be beneficial. Lastly, it is worth investigating to understand the theoretical reasons why our proposed function improves standard accuracy.

References

[1] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, 25, 2012.
[2] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 779–788, 2016.
[3] Erh-Chung Chen, Pin-Yu Chen, I Chung, Che-rung Lee, et al. Overload: Latency attacks on object detection for edge devices. arXiv preprint arXiv:2304.05370, 2023.
[4] Kaijie Zhu, Jindong Wang, Jiaheng Zhou, Zichen Wang, Hao Chen, Yidong Wang, Linyi Yang, Wei Ye, Neil Zhenqiang Gong, Yue Zhang, et al. Promptbench: Towards evaluating the robustness of large language models on adversarial prompts. arXiv preprint arXiv:2306.04528, 2023.
[5] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
[6] Nicholas Carlini and David Wagner. Audio adversarial examples: Targeted attacks on speech-to-text. In 2018 IEEE Security and Privacy Workshops (SPW), pages 1–7. IEEE, 2018.
[7] Jinfeng Li, Shouling Ji, Tianyu Du, Bo Li, and Ting Wang. Textbugger: Generating adversarial text against real-world applications. arXiv preprint arXiv:1812.05271, 2018.
[8] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
[9] Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, John C Duchi, and Percy S Liang. Unlabeled data improves adversarial robustness. Advances in neural information processing systems, 32, 2019.
[10] Sven Gowal, Sylvestre-Alvise Rebuffi, Olivia Wiles, Florian Stimberg, Dan Andrei Calian, and Timothy A Mann. Improving robustness using generated data. Advances in Neural Information Processing Systems, 34:4218–4233, 2021.
[11] Zekai Wang, Tianyu Pang, Chao Du, Min Lin, Weiwei Liu, and Shuicheng Yan. Better diffusion models further improve adversarial training. In International Conference on Machine Learning, pages 36246–36263. PMLR, 2023.
[12] Francesco Croce and Matthias Hein. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In International conference on machine learning, pages 2206–2216. PMLR, 2020.
[13] Francesco Croce, Maksym Andriushchenko, Vikash Sehwag, Edoardo Debenedetti, Nicolas Flammarion, Mung Chiang, Prateek Mittal, and Matthias Hein. Robustbench: a standardized adversarial robustness benchmark. arXiv preprint arXiv:2010.09670, 2020.
[14] Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In International conference on machine learning, pages 274–283. PMLR, 2018.
[15] Nicholas Carlini, Anish Athalye, Nicolas Papernot, Wieland Brendel, Jonas Rauber, Dimitris Tsipras, Ian Goodfellow, Aleksander Madry, and Alexey Kurakin. On evaluating adversarial robustness. arXiv preprint arXiv:1902.06705, 2019.
[16] Pin-Yu Chen and Cho-Jui Hsieh. Adversarial robustness for machine learning. Academic Press, 2022.
[17] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
[18] Pin-Yu Chen, Yash Sharma, Huan Zhang, Jinfeng Yi, and Cho-Jui Hsieh. Ead: elastic-net attacks to deep neural networks via adversarial examples. In Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018.
[19] Yixiang Wang, Jiqiang Liu, Xiaolin Chang, Ricardo J Rodríguez, and Jianhua Wang. Di-aa: An interpretable white-box attack for fooling deep neural networks. Information Sciences, 610:14–32, 2022.
[20] Mingjun Yin, Shasha Li, Chengyu Song, M Salman Asif, Amit K Roy-Chowdhury, and Srikanth V Krishnamurthy. Adc: Adversarial attacks against object detection that evade context consistency checks. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 3278–3287, 2022.
[21] Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Soheil Feizi, and Hima Lakkaraju. Certifying llm safety against adversarial prompting. arXiv preprint arXiv:2309.02705, 2023.
[22] Jia-Yu Yao, Kun-Peng Ning, Zhen-Hui Liu, Mu-Nan Ning, and Li Yuan. Llm lies: Hallucinations are not bugs, but features as adversarial examples. arXiv preprint arXiv:2310.01469, 2023.
[23] Yi Xie, Zhuohang Li, Cong Shi, Jian Liu, Yingying Chen, and Bo Yuan. Enabling fast and universal audio adversarial attack using generative model. In Proceedings of the AAAI conference on artificial intelligence, volume 35, pages 14129–14137, 2021.
[24] Inaam Ilahi, Muhammad Usama, Junaid Qadir, Muhammad Umar Janjua, Ala Al-Fuqaha, Dinh Thai Hoang, and Dusit Niyato. Challenges and countermeasures for adversarial attacks on deep reinforcement learning. IEEE Transactions on Artificial Intelligence, 3(2):90–109, 2021.
[25] Kaidi Xu, Gaoyuan Zhang, Sijia Liu, Quanfu Fan, Mengshu Sun, Hongge Chen, Pin-Yu Chen, Yanzhi Wang, and Xue Lin. Adversarial t-shirt! evading person detectors in a physical world. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, pages 665–681. Springer, 2020.
[26] Stepan Komkov and Aleksandr Petiushko. Advhat: Real-world adversarial attack on arcface face id system. In 2020 25th international conference on pattern recognition (ICPR), pages 819–826. IEEE, 2021.
[27] Andrew Du, Bo Chen, Tat-Jun Chin, Yee Wei Law, Michele Sasdelli, Ramesh Rajasegaran, and Dillon Campbell. Physical adversarial attacks on an aerial imagery object detector. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 1796–1806, 2022.
[28] Xingxing Wei, Ying Guo, and Jie Yu. Adversarial sticker: A stealthy attack method in the physical world. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(3):2711–2725, 2022.
[29] Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion, and Matthias Hein. Square attack: a query-efficient black-box adversarial attack via random search. In European conference on computer vision, pages 484–501. Springer, 2020.
[30] Pu Zhao, Pin-Yu Chen, Siyue Wang, and Xue Lin. Towards query-efficient black-box adversary with zeroth-order natural gradient descent. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 6909–6916, 2020.
[31] Xiaosen Wang, Xuanran He, Jingdong Wang, and Kun He. Admix: Enhancing the transferability of adversarial attacks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 16158–16167, 2021.
[32] Erh-Chung Chen, Pin-Yu Chen, I Chung, Che-Rung Lee, et al. Steal now and attack later: Evaluating robustness of object detection against black-box adversarial attacks. arXiv preprint arXiv:2404.15881, 2024.
[33] Minhao Cheng, Pin-Yu Chen, Sijia Liu, Shiyu Chang, Cho-Jui Hsieh, and Payel Das. Self-progressing robust training. In Proceedings of the AAAI conference on artificial intelligence, volume 35, pages 7107–7115, 2021.
[34] Minhao Cheng, Qi Lei, Pin-Yu Chen, Inderjit Dhillon, and Cho-Jui Hsieh. Cat: Customized adversarial training for improved robustness. arXiv preprint arXiv:2002.06789, 2020.
[35] Erh-Chung Chen and Che-Rung Lee. Towards fast and robust adversarial training for image classification. In Proceedings of the Asian Conference on Computer Vision, 2020.
[36] Yihua Zhang, Guanhua Zhang, Prashant Khanduri, Mingyi Hong, Shiyu Chang, and Sijia Liu. Revisiting and advancing fast adversarial training through the lens of bi-level optimization. In International Conference on Machine Learning, pages 26693–26712. PMLR, 2022.
[37] Leslie Rice, Eric Wong, and Zico Kolter. Overfitting in adversarially robust deep learning. In International conference on machine learning, pages 8093–8104. PMLR, 2020.
[38] Jingfeng Zhang, Xilie Xu, Bo Han, Gang Niu, Lizhen Cui, Masashi Sugiyama, and Mohan Kankanhalli. Attacks which do not kill training make adversarial learning stronger. In International conference on machine learning, pages 11278–11287. PMLR, 2020.
[39] Erh-Chung Chen and Che-Rung Lee. Data filtering for efficient adversarial training. Pattern Recognition, page 110394, 2024.
[40] Sven Gowal, Krishnamurthy Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy Mann, and Pushmeet Kohli. On the effectiveness of interval bound propagation for training verifiably robust models. arXiv preprint arXiv:1810.12715, 2018.
[41] Yujia Huang, Huan Zhang, Yuanyuan Shi, J Zico Kolter, and Anima Anandkumar. Training certifiably robust neural networks with efficient local lipschitz bounds. Advances in Neural Information Processing Systems, 34:22745–22757, 2021.
[42] Mark Niklas Müller, Franziska Eckert, Marc Fischer, and Martin Vechev. Certified training: Small boxes are all you need. arXiv preprint arXiv:2210.04871, 2022.
[43] Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM workshop on artificial intelligence and security, pages 15–26, 2017.
[44] Ambra Demontis, Marco Melis, Maura Pintor, Matthew Jagielski, Battista Biggio, Alina Oprea, Cristina Nita-Rotaru, and Fabio Roli. Why do adversarial attacks transfer? explaining transferability of evasion and poisoning attacks. In 28th USENIX security symposium (USENIX security 19), pages 321–338, 2019.
[45] Zeyu Qin, Yanbo Fan, Yi Liu, Li Shen, Yong Zhang, Jue Wang, and Baoyuan Wu. Boosting the transferability of adversarial attacks with reverse adversarial perturbation. Advances in neural information processing systems, 35:29845–29858, 2022.
[46] Yuichi Yoshida and Takeru Miyato. Spectral norm regularization for improving the generalizability of deep learning. arXiv preprint arXiv:1705.10941, 2017.
[47] Farzan Farnia, Jesse M Zhang, and David Tse. Generalizable adversarial training via spectral normalization. arXiv preprint arXiv:1811.07457, 2018.
[48] ShengYun Peng, Weilin Xu, Cory Cornelius, Matthew Hull, Kevin Li, Rahul Duggal, Mansi Phute, Jason Martin, and Duen Horng Chau. Robust principles: Architectural design principles for adversarially robust cnns. arXiv preprint arXiv:2308.16258, 2023.
[49] Yatong Bai, Mo Zhou, Vishal M Patel, and Somayeh Sojoudi. Mixednuts: Training-free accuracy-robustness balance via nonlinearly mixed classifiers. arXiv preprint arXiv:2402.02263, 2024.
[50] Jiequan Cui, Zhuotao Tian, Zhisheng Zhong, Xiaojuan Qi, Bei Yu, and Hanwang Zhang. Decoupled kullback-leibler divergence loss. arXiv preprint arXiv:2305.13948, 2023.
[51] Chang Liu, Yinpeng Dong, Wenzhao Xiang, Xiao Yang, Hang Su, Jun Zhu, Yuefeng Chen, Yuan He, Hui Xue, and Shibao Zheng. A comprehensive study on robustness of image classification models: Benchmarking and rethinking. arXiv preprint arXiv:2302.14301, 2023.
[52] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF international conference on computer vision, pages 10012–10022, 2021.
[53] Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric Xing, Laurent El Ghaoui, and Michael Jordan. Theoretically principled trade-off between robustness and accuracy. In International conference on machine learning, pages 7472–7482. PMLR, 2019.
[54] Dongxian Wu, Shu-Tao Xia, and Yisen Wang. Adversarial weight perturbation helps robust generalization. Advances in neural information processing systems, 33:2958–2969, 2020.
[55] Erh-Chung Chen and Che-Rung Lee. Ltd: Low temperature distillation for robust adversarial training. arXiv preprint arXiv:2111.02331, 2021.
[56] Sravanti Addepalli, Samyak Jain, et al. Efficient and effective augmentation strategy for adversarial training. Advances in Neural Information Processing Systems, 35:1488–1501, 2022.
[57] Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In international conference on machine learning, pages 1310–1320. PMLR, 2019.

Appendix A Ablation Study

A.1 Hyper-parameter Selection

This experiment investigates how the choice of the hyper-parameter $c^{r}$ influences standard accuracy and robust accuracy. Most models are stored in a 16-bit format, and the fraction widths of the FP16 format defined by the IEEE-754 standard and of BFloat16 are 10 and 7 bits, respectively, so truncation errors can easily occur when adding two numbers whose magnitudes differ by a factor of $2^{8}$ or more. On the other hand, when $c^{r}$ is set to $2^{-5}$, all models experience a significant drop in standard accuracy, and it is meaningless to evaluate robustness in this configuration. We therefore suggest $2^{-8}$, $2^{-7}$, and $2^{-6}$ as candidates for $c^{r}$.
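The precision argument above can be illustrated with a minimal sketch that uses only the Python standard library. The helper names (`to_fp16`, `to_bf16`, `survives_addition`) are hypothetical and not part of the paper; BFloat16 is emulated by rounding the float32 encoding to its upper 16 bits, which is an assumption about how a given framework rounds.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to IEEE-754 binary16 (10 fraction bits) and convert back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bf16(x: float) -> float:
    """Round x to bfloat16 (7 fraction bits) with round-to-nearest-even.

    Emulated by keeping the upper 16 bits of the float32 encoding.
    NaN, infinity, and overflow are not handled in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # rounding bias, ties go to even
    bits &= 0xFFFF0000                   # drop the lower 16 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def survives_addition(rounder, small: float, large: float = 1.0) -> bool:
    """True if adding `small` to `large` still changes the rounded result."""
    return rounder(large + small) != rounder(large)


if __name__ == "__main__":
    # Print, for each exponent k, whether 1 + 2^-k is distinguishable from 1.
    for k in range(6, 13):
        small = 2.0 ** -k
        print(k, survives_addition(to_fp16, small), survives_addition(to_bf16, small))
```

In this emulation, a term that is $2^{8}$ times smaller than the other operand is lost in BFloat16, while FP16 keeps terms down to a factor of $2^{10}$.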

The results on CIFAR10 and CIFAR100 are presented in Table 4 and Table 5, respectively. Moreover, the robust accuracy against the CW attack under the $L_{\infty}$ norm on the CIFAR10 and CIFAR100 datasets is presented in Tables 6(a) and 6(b), respectively. As can be seen, when $c^{r}$ is set to $2^{-8}$, the models achieve better standard accuracy and robust accuracy than the originals; the only exception is the standard accuracy of AWP on CIFAR100. Additionally, the results for all models with $c^{r}=2^{-8}$ are matched or surpassed by those with $c^{r}=2^{-7}$. Robust accuracy can be further enhanced for most models by setting $c^{r}=2^{-6}$, although standard accuracy might drop compared to the original. The results suggest that $c^{r}=2^{-7}$ is a solution that balances standard accuracy and robustness. Nevertheless, when robustness is a major concern, $c^{r}=2^{-6}$ is a better choice.
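The selection rule described above can be checked against the CIFAR10 numbers in Table 4. The following is a minimal sketch, not the authors' procedure: the function names and the rule "keep standard accuracy for every model, then maximize mean AutoAttack accuracy" are assumptions made for illustration.

```python
# (acc_nat, acc_AA) pairs copied from Table 4 (CIFAR10).
# Keys are log2(c^r); None denotes the original RobustBench model.
TABLE4 = {
    "RST-AWP": {None: (88.25, 60.04), -8: (88.82, 60.96), -7: (89.50, 62.76), -6: (87.88, 61.96)},
    "DefEAT": {None: (86.54, 57.30), -8: (86.88, 57.81), -7: (87.40, 59.55), -6: (84.59, 61.08)},
    "LTD": {None: (85.21, 56.94), -8: (85.28, 57.28), -7: (85.98, 59.25), -6: (85.59, 60.63)},
    "AWP": {None: (85.36, 56.17), -8: (85.80, 56.53), -7: (86.19, 57.85), -6: (84.55, 59.21)},
    "TRADES": {None: (85.34, 52.86), -8: (85.57, 52.97), -7: (85.78, 53.80), -6: (85.49, 55.37)},
}


def mean_robust_accuracy(table, exponent):
    """Mean acc_AA over all models for a given log2(c^r)."""
    return sum(row[exponent][1] for row in table.values()) / len(table)


def keeps_standard_accuracy(table, exponent):
    """True if no model loses standard accuracy relative to the original."""
    return all(row[exponent][0] >= row[None][0] for row in table.values())


def pick_exponent(table, candidates=(-8, -7, -6), robustness_first=False):
    """Pick log2(c^r) with the highest mean acc_AA.

    Unless robustness_first is set, only candidates that keep standard
    accuracy for every model are considered (hypothetical rule).
    Raises ValueError if no candidate qualifies.
    """
    pool = [k for k in candidates
            if robustness_first or keeps_standard_accuracy(table, k)]
    return max(pool, key=lambda k: mean_robust_accuracy(table, k))


if __name__ == "__main__":
    print("balanced choice: 2^%d" % pick_exponent(TABLE4))
    print("robustness first: 2^%d" % pick_exponent(TABLE4, robustness_first=True))
```

The averaging over models is a design choice of this sketch; a per-model selection would use the same comparisons on a single row.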

Intuitively, we expect standard accuracy to decrease gradually as the value of $c^{r}$ increases. This trend can be observed when $c^{r}$ is $2^{-6}$ or higher, but two counterexamples are reported in the ablation study when $c^{r}$ is set to $2^{-7}$ and $2^{-8}$. A possible explanation is that the optimizer becomes stuck in a saddle region, as ReLU is non-differentiable at zero. This might cause the gradient direction to oscillate when values are close to zero. By shifting those values to zero, antagonistic effects among different feature maps, filters, or channels are accidentally mitigated. However, further investigation and evidence are needed to support this conjecture.

We argue that any function that satisfies the conditions defined in (11) can shrink the largest eigenvalue. There might be another function that can perform better than the proposed one. Besides, the hyper-parameter is determined by choosing the maximum value appearing in the dataset.

Table 4: Ablation study of selecting the optimal $c^{r}$ for the CIFAR10 dataset.

Method	RobustBench		$c^{r}=2^{-8}$		$c^{r}=2^{-7}$		$c^{r}=2^{-6}$
	acc_nat	acc_AA	acc_nat	acc_AA	acc_nat	acc_AA	acc_nat	acc_AA
RST-AWP	88.25	60.04	88.82	60.96	89.50	62.76	87.88	61.96
DefEAT	86.54	57.30	86.88	57.81	87.40	59.55	84.59	61.08
LTD	85.21	56.94	85.28	57.28	85.98	59.25	85.59	60.63
AWP	85.36	56.17	85.80	56.53	86.19	57.85	84.55	59.21
TRADES	85.34	52.86	85.57	52.97	85.78	53.80	85.49	55.37

Table 5: Ablation study of selecting the optimal $c^{r}$ for the CIFAR100 dataset.

Method	RobustBench		$c^{r}=2^{-8}$		$c^{r}=2^{-7}$		$c^{r}=2^{-6}$
	acc_nat	acc_AA	acc_nat	acc_AA	acc_nat	acc_AA	acc_nat	acc_AA
EffAug	68.75	31.85	68.81	32.00	69.14	32.57	68.44	33.64
DKLD	64.08	31.65	64.10	31.77	64.26	32.58	63.50	33.87
DefEAT	65.89	30.57	66.12	31.11	66.42	32.46	65.06	34.07
LTD	64.07	30.59	64.29	31.13	64.29	31.95	64.18	34.04
AWP	60.38	28.86	60.18	29.10	60.63	29.72	60.71	30.82

Appendix B Full Experimental Results of Gradient Masking Verification

Table 6: The robust accuracy against the CW attack under the $L_{\infty}$ norm.

(a) CIFAR10 dataset

Method	Origin	$c^{r}=2^{-8}$	$c^{r}=2^{-7}$	$c^{r}=2^{-6}$
RST-AWP	58.98	61.84	68.24	80.92
DefEAT	56.92	58.02	61.06	65.56
LTD	58.12	58.56	60.50	64.86
AWP	56.84	57.34	60.58	66.50
TRADES	56.10	56.52	58.18	63.62

(b) CIFAR100 dataset

Method	Origin	$c^{r}=2^{-8}$	$c^{r}=2^{-7}$	$c^{r}=2^{-6}$
EffAug	37.40	37.70	38.70	43.00
DKLD	37.50	38.06	39.38	44.20
DefEAT	36.90	37.56	39.82	44.30
LTD	36.66	37.32	38.86	43.44
AWP	34.56	35.20	35.94	40.40

Tables 7(a) and 7(b) present the robust accuracy against adversarial examples generated by the original models on the CIFAR10 and CIFAR100 datasets, respectively. As observed, none of the models combined with our method shows lower robust accuracy than the original model. Moreover, these transferred examples are less effective than AutoAttack applied directly to the victim models (Tables 4 and 5). This indicates that adversarial examples can be efficiently crafted by utilizing the gradients from the victim models.

Tables 8 and 9 present the robust accuracy against FGSM and PGD attacks for different radii of the $\epsilon$-ball on the CIFAR10 and CIFAR100 datasets, respectively. As observed, the robust accuracy against FGSM, a one-step attack, is always higher than the robust accuracy against PGD, an iterative attack. This implies that the gradient is reliable, allowing the PGD attack to adjust the gradient direction multiple times to find adversarial examples. Additionally, we observe that the robust accuracy against PGD attacks for all models gradually decreases to zero as the radius of the $\epsilon$-ball increases. This indicates that the quality of gradients is preserved, enabling PGD attacks to move inputs toward examples outside the observed distribution.
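The two checks described above can be sketched as a small script applied to one row pair of Table 8 (RST-AWP with $c^{r}=2^{-8}$). The function name and the wording of the flags are hypothetical; the checks are only the ones named in the paragraph, not a complete gradient masking test.

```python
# Numerators of epsilon / 255, as in the header of Table 8.
EPS = [1, 2, 4, 8, 16, 32, 64, 96]
# Robust accuracy of RST-AWP with c^r = 2^-8 on CIFAR10, copied from Table 8.
FGSM = [88.28, 86.94, 83.80, 75.12, 57.23, 34.04, 18.80, 19.03]
PGD = [86.78, 84.47, 79.03, 66.03, 34.02, 2.01, 0.01, 0.0]


def gradient_masking_flags(fgsm, pgd):
    """Return a list of warning strings; an empty list means both checks pass.

    Check 1: the iterative attack (PGD) is at least as strong as the
             one-step attack (FGSM) at every radius.
    Check 2: accuracy under PGD never increases with the radius and
             reaches zero at the largest radius.
    """
    flags = []
    if any(p > f for f, p in zip(fgsm, pgd)):
        flags.append("one-step attack is stronger than the iterative attack")
    if any(later > earlier for earlier, later in zip(pgd, pgd[1:])):
        flags.append("PGD accuracy increases with a larger radius")
    if pgd[-1] > 0.0:
        flags.append("PGD accuracy does not reach zero at the largest radius")
    return flags


if __name__ == "__main__":
    flags = gradient_masking_flags(FGSM, PGD)
    print("no warning" if not flags else "; ".join(flags))
```

Only the PGD curve is required to be monotone here, since the FGSM rows in Tables 8 and 9 are not strictly decreasing at the largest radii.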

Figure 3 illustrates the certified robustness achieved by randomized smoothing [57] for various models on the CIFAR10 dataset, where Original refers to the certified robustness of the original model, while Ours denotes the robustness of the model combined with the proposed method. As can be seen, our method brings slight improvements in robustness, except for the AWP model. These results demonstrate that our algorithm does not suffer from the gradient masking issue. However, the empirical Lipschitz constant is derived from the observed data. As the input distribution produced by randomized smoothing might differ from the observed data, this could result in fluctuations in robustness.
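For readers unfamiliar with how a certified radius is obtained, the following minimal sketch computes the $L_{2}$ radius used in the randomized smoothing certificate of [57], assuming the simplified form $R=\sigma\,\Phi^{-1}(\underline{p_A})$. The function name, the noise level, and the probabilities are illustrative assumptions and are not values reported in this paper.

```python
from statistics import NormalDist


def certified_radius(p_a_lower: float, sigma: float):
    """L2 radius certified by randomized smoothing, following [57].

    p_a_lower: lower confidence bound on the probability that the base
               classifier returns the top class under Gaussian noise
               N(0, sigma^2 I).
    sigma:     standard deviation of the smoothing noise.
    Returns None (abstain) when p_a_lower does not exceed 0.5.
    """
    if not 0.0 < p_a_lower < 1.0:
        raise ValueError("p_a_lower must lie strictly between 0 and 1")
    if p_a_lower <= 0.5:
        return None
    return sigma * NormalDist().inv_cdf(p_a_lower)


if __name__ == "__main__":
    # Illustrative values only: sigma = 0.25 and three hypothetical bounds.
    for p in (0.55, 0.90, 0.99):
        print(p, certified_radius(p, sigma=0.25))
```

The radius grows with the lower bound on the top-class probability, so a method that makes predictions more stable under Gaussian noise yields a larger certificate.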

Table 7: The robust accuracy against adversarial examples generated by the original models.

(a) CIFAR10 dataset

Method	Origin	$c^{r}=2^{-8}$	$c^{r}=2^{-7}$	$c^{r}=2^{-6}$
RST-AWP	60.04	62.10	65.10	70.53
DefEAT	57.30	58.37	60.39	66.10
LTD	56.94	58.71	61.63	66.47
AWP	56.17	57.49	59.74	65.58
TRADES	52.86	55.55	55.09	58.68

(b) CIFAR100 dataset

Method	Origin	$c^{r}=2^{-8}$	$c^{r}=2^{-7}$	$c^{r}=2^{-6}$
EffAug	31.85	32.87	35.08	40.04
DKLD	31.65	32.91	35.04	40.58
DefEAT	30.57	31.82	33.94	40.67
LTD	30.59	32.05	34.07	39.11
AWP	28.86	29.88	32.18	36.67

Table 8: The robust accuracy against FGSM and PGD attacks for different radii of the $\epsilon$-ball on the CIFAR10 dataset.

Method	$c^{r}$	Attack	$\epsilon=\frac{1}{255}$	$\epsilon=\frac{2}{255}$	$\epsilon=\frac{4}{255}$	$\epsilon=\frac{8}{255}$	$\epsilon=\frac{16}{255}$	$\epsilon=\frac{32}{255}$	$\epsilon=\frac{64}{255}$	$\epsilon=\frac{96}{255}$
RST-AWP	$2^{-8}$	FGSM	88.28	86.94	83.80	75.12	57.23	34.04	18.80	19.03
	$2^{-8}$	PGD	86.78	84.47	79.03	66.03	34.02	2.01	0.01	0.0
	$2^{-7}$	FGSM	89.46	88.28	85.64	77.62	60.60	35.91	19.39	20.68
	$2^{-7}$	PGD	88.03	85.88	80.89	69.24	38.27	3.08	0.1	0.0
	$2^{-6}$	FGSM	87.70	86.63	84.38	77.91	60.93	33.48	16.12	18.81
	$2^{-6}$	PGD	86.38	84.69	81.19	73.72	52.24	11.89	0.19	0.0
DefEAT	$2^{-8}$	FGSM	86.38	85.40	81.98	72.73	53.28	30.34	18.13	19.65
	$2^{-8}$	PGD	84.52	82.07	76.51	63.71	33.87	1.76	0.0	0.0
	$2^{-7}$	FGSM	86.69	85.70	83.05	74.35	56.36	32.04	18.88	21.40
	$2^{-7}$	PGD	85.11	82.87	78.00	66.54	38.70	3.12	0.0	0.0
	$2^{-6}$	FGSM	84.14	83.57	81.03	74.63	57.67	28.11	13.17	19.38
	$2^{-6}$	PGD	82.96	81.35	77.37	69.52	50.70	10.05	0.2	0.0
LTD	$2^{-8}$	FGSM	84.94	83.87	81.15	72.80	55.45	33.06	18.44	17.48
	$2^{-8}$	PGD	83.13	80.68	75.53	63.52	34.81	2.64	0.0	0.0
	$2^{-7}$	FGSM	85.48	84.67	82.24	74.10	57.41	35.53	17.57	17.50
	$2^{-7}$	PGD	83.88	81.88	77.00	65.38	28.57	3.95	0.0	0.0
	$2^{-6}$	FGSM	85.06	84.28	82.21	75.77	60.79	34.39	14.34	15.42
	$2^{-6}$	PGD	83.91	82.00	78.33	69.82	49.95	11.63	0.21	0.0
AWP	$2^{-8}$	FGSM	85.11	83.90	80.68	71.28	53.78	33.21	20.86	19.61
	$2^{-8}$	PGD	83.34	80.34	75.08	61.53	30.50	1.89	0.03	0.0
	$2^{-7}$	FGSM	85.50	84.68	81.75	73.56	57.19	35.50	20.63	20.16
	$2^{-7}$	PGD	83.94	81.62	76.34	65.57	34.16	2.89	0.02	0.0
	$2^{-6}$	FGSM	83.87	83.19	81.08	74.97	61.49	35.13	16.12	18.69
	$2^{-6}$	PGD	83.00	81.42	78.13	71.50	54.95	14.78	0.41	0.0
TRADES	$2^{-8}$	FGSM	84.74	83.51	70.58	70.50	54.23	36.53	23.97	23.46
	$2^{-8}$	PGD	82.62	79.72	72.81	57.30	24.21	1.21	0.01	0.0
	$2^{-7}$	FGSM	85.00	84.02	80.57	71.61	55.94	25.67	22.31	22.51
	$2^{-7}$	PGD	82.96	80.31	73.75	58.91	26.22	1.31	0.02	0.0
	$2^{-6}$	FGSM	85.05	84.19	81.40	75.07	60.24	36.99	21.81	21.21
	$2^{-6}$	PGD	83.23	81.33	75.81	64.56	36.76	3.37	0.02	0.0

Table 9: The robust accuracy against FGSM and PGD attacks for different radii of the $\epsilon$-ball on the CIFAR100 dataset.

Method	$c^{r}$	Attack	$\epsilon=\frac{1}{255}$	$\epsilon=\frac{2}{255}$	$\epsilon=\frac{4}{255}$	$\epsilon=\frac{8}{255}$	$\epsilon=\frac{16}{255}$	$\epsilon=\frac{32}{255}$	$\epsilon=\frac{64}{255}$	$\epsilon=\frac{96}{255}$
EffAug	$2^{-8}$	FGSM	68.02	65.96	60.36	49.65	33.90	17.39	7.11	5.77
	$2^{-8}$	PGD	64.92	61.04	52.82	39.37	17.53	1.81	0.0	0.0
	$2^{-7}$	FGSM	68.41	66.56	61.84	51.83	36.59	18.34	7.02	6.70
	$2^{-7}$	PGD	65.66	62.02	54.58	41.55	19.70	2.26	0.0	0.0
	$2^{-6}$	FGSM	67.84	67.25	64.65	57.97	44.23	22.20	8.26	8.85
	$2^{-6}$	PGD	66.38	64.20	59.63	50.89	33.98	7.27	0.17	0.0
DKLD	$2^{-8}$	FGSM	63.47	61.94	58.15	48.86	34.39	17.73	6.26	3.66
	$2^{-8}$	PGD	60.71	57.14	50.44	38.14	17.38	1.97	0.0	0.0
	$2^{-7}$	FGSM	63.55	62.31	58.65	50.37	36.49	18.69	6.36	4.29
	$2^{-7}$	PGD	61.06	57.84	51.51	39.99	19.71	2.37	0.0	0.0
	$2^{-6}$	FGSM	63.26	62.77	60.41	55.18	43.86	21.17	5.50	4.97
	$2^{-6}$	PGD	61.67	59.62	55.83	48.06	33.51	7.55	0.17	0.0
DefEAT	$2^{-8}$	FGSM	65.57	64.36	59.88	49.38	32.48	15.38	5.50	3.30
	$2^{-8}$	PGD	62.39	58.96	51.85	38.59	17.18	1.45	0.0	0.0
	$2^{-7}$	FGSM	65.97	64.96	60.67	51.44	34.84	16.36	5.33	3.96
	$2^{-7}$	PGD	62.94	59.83	52.98	41.05	20.07	2.02	0.0	0.0
	$2^{-6}$	FGSM	64.69	63.58	61.31	54.47	40.06	16.91	4.64	4.46
	$2^{-6}$	PGD	62.85	60.50	56.29	47.54	30.76	5.84	0.09	0.0
LTD	$2^{-8}$	FGSM	63.59	62.35	58.10	48.86	33.11	16.78	5.92	3.20
	$2^{-8}$	PGD	61.06	57.65	50.70	38.21	18.21	1.98	0.0	0.0
	$2^{-7}$	FGSM	64.05	62.87	59.02	50.16	34.90	17.27	5.54	3.11
	$2^{-7}$	PGD	61.51	58.32	51.84	39.89	20.32	2.26	0.0	0.0
	$2^{-6}$	FGSM	63.62	62.98	60.96	54.96	41.51	19.78	4.69	3.12
	$2^{-6}$	PGD	61.90	59.68	55.23	46.49	29.13	5.74	0.05	0.0
AWP	$2^{-8}$	FGSM	59.77	58.06	54.21	45.54	30.98	16.69	6.49	3.97
	$2^{-8}$	PGD	56.72	52.92	46.53	34.90	16.03	2.11	0.0	0.0
	$2^{-7}$	FGSM	60.00	58.52	54.93	46.65	32.66	17.42	6.04	3.60
	$2^{-7}$	PGD	57.19	53.75	47.46	36.22	17.48	2.58	0.0	0.0
	$2^{-6}$	FGSM	60.20	59.71	57.47	52.05	39.78	21.10	5.70	3.90
	$2^{-6}$	PGD	58.19	55.66	50.91	42.37	25.53	5.68	0.09	0.0

NeurIPS Paper Checklist

1. Claims
Question: Do the main claims made in the abstract and introduction accurately reflect the paper’s contributions and scope?
Answer: [Yes]
Justification: The main claim is stated in the abstract and highlighted in the introduction.
2. Limitations
Question: Does the paper discuss the limitations of the work performed by the authors?
Answer: [Yes]
Justification: We leave some topics unexplored for further work as described in the conclusion.
3. Theory Assumptions and Proofs
Question: For each theoretical result, does the paper provide the full set of assumptions and a complete (and correct) proof?
Answer: [Yes]
Justification: The proof of Lemma 2 is shown in Section 2.
4. Experimental Result Reproducibility
Question: Does the paper fully disclose all the information needed to reproduce the main experimental results of the paper to the extent that it affects the main claims and/or conclusions of the paper (regardless of whether the code and data are provided or not)?
Answer: [Yes]
Justification: The experiments were conducted with the public leaderboard RobustBench. The process is standardized and hardware-independent. The results can be fairly compared with those of competitors.
5. Open access to data and code
Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material?
Answer: [N/A]
Justification: The model architecture and pre-trained weights proposed in this work will be submitted to RobustBench. This information will be published once the results are certified by RobustBench.
6. Experimental Setting/Details
Question: Does the paper specify all the training and test details (e.g., data splits, hyperparameters, how they were chosen, type of optimizer, etc.) necessary to understand the results?
Answer: [Yes]
Justification: The assessments utilized by RobustBench are standardized.
Guidelines:
- The answer NA means that the paper does not include experiments.
- The experimental setting should be presented in the core of the paper to a level of detail that is necessary to appreciate the results and make sense of them.
- The full details can be provided either with the code, in appendix, or as supplemental material.
7. Experiment Statistical Significance
Question: Does the paper report error bars suitably and correctly defined or other appropriate information about the statistical significance of the experiments?
Answer: [N/A]
Justification: The assessments utilized by RobustBench are standardized. The impact of statistical variance is accounted for in the assessments.
Guidelines:
- The answer NA means that the paper does not include experiments.
- The authors should answer "Yes" if the results are accompanied by error bars, confidence intervals, or statistical significance tests, at least for the experiments that support the main claims of the paper.
- The factors of variability that the error bars are capturing should be clearly stated (for example, train/test split, initialization, random drawing of some parameter, or overall run with given experimental conditions).
- The method for calculating the error bars should be explained (closed form formula, call to a library function, bootstrap, etc.)
- The assumptions made should be given (e.g., Normally distributed errors).
- It should be clear whether the error bar is the standard deviation or the standard error of the mean.
- It is OK to report 1-sigma error bars, but one should state it. The authors should preferably report a 2-sigma error bar rather than state that they have a 96% CI, if the hypothesis of Normality of errors is not verified.
- For asymmetric distributions, the authors should be careful not to show in tables or figures symmetric error bars that would yield results that are out of range (e.g. negative error rates).
- If error bars are reported in tables or plots, the authors should explain in the text how they were calculated and reference the corresponding figures or tables in the text.
8. Experiments Compute Resources
Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments?
Answer: [N/A]
Justification: The assessments utilized by RobustBench are hardware-independent. Experiments can be performed on any GPU, although more powerful GPUs complete the evaluation in less time.
Guidelines:
- The answer NA means that the paper does not include experiments.
- The paper should indicate the type of compute workers (CPU or GPU, internal cluster, or cloud provider), including relevant memory and storage.
- The paper should provide the amount of compute required for each of the individual experimental runs as well as estimate the total compute.
- The paper should disclose whether the full research project required more compute than the experiments reported in the paper (e.g., preliminary or failed experiments that didn’t make it into the paper).
9. Code Of Ethics
Question: Does the research conducted in the paper conform, in every respect, with the NeurIPS Code of Ethics (https://neurips.cc/public/EthicsGuidelines)?
Answer: [Yes]
Justification: This work follows the NeurIPS Code of Ethics.
Guidelines:
- The answer NA means that the authors have not reviewed the NeurIPS Code of Ethics.
- If the authors answer No, they should explain the special circumstances that require a deviation from the Code of Ethics.
- The authors should make sure to preserve anonymity (e.g., if there is a special consideration due to laws or regulations in their jurisdiction).
10. Broader Impacts
Question: Does the paper discuss both potential positive societal impacts and negative societal impacts of the work performed?
Answer: [N/A]
Justification: The main goal of this work is to design reliable models, so it is unlikely to have a negative societal impact.
Guidelines:
- The answer NA means that there is no societal impact of the work performed.
- If the authors answer NA or No, they should explain why their work has no societal impact or why the paper does not address societal impact.
- Examples of negative societal impacts include potential malicious or unintended uses (e.g., disinformation, generating fake profiles, surveillance), fairness considerations (e.g., deployment of technologies that could make decisions that unfairly impact specific groups), privacy considerations, and security considerations.
- The conference expects that many papers will be foundational research and not tied to particular applications, let alone deployments. However, if there is a direct path to any negative applications, the authors should point it out. For example, it is legitimate to point out that an improvement in the quality of generative models could be used to generate deepfakes for disinformation. On the other hand, it is not needed to point out that a generic algorithm for optimizing neural networks could enable people to train models that generate deepfakes faster.
- The authors should consider possible harms that could arise when the technology is being used as intended and functioning correctly, harms that could arise when the technology is being used as intended but gives incorrect results, and harms following from (intentional or unintentional) misuse of the technology.
- If there are negative societal impacts, the authors could also discuss possible mitigation strategies (e.g., gated release of models, providing defenses in addition to attacks, mechanisms for monitoring misuse, mechanisms to monitor how a system learns from feedback over time, improving the efficiency and accessibility of ML).
11. Safeguards
Question: Does the paper describe safeguards that have been put in place for responsible release of data or models that have a high risk for misuse (e.g., pretrained language models, image generators, or scraped datasets)?
Answer: [N/A]
Justification: The models used in this work are existing public models. Therefore, this work poses no such risks.
Guidelines:
- The answer NA means that the paper poses no such risks.
- Released models that have a high risk for misuse or dual-use should be released with necessary safeguards to allow for controlled use of the model, for example by requiring that users adhere to usage guidelines or restrictions to access the model or implementing safety filters.
- Datasets that have been scraped from the Internet could pose safety risks. The authors should describe how they avoided releasing unsafe images.
- We recognize that providing effective safeguards is challenging, and many papers do not require this, but we encourage authors to take this into account and make a best faith effort.
12. Licenses for existing assets
Question: Are the creators or original owners of assets (e.g., code, data, models), used in the paper, properly credited and are the license and terms of use explicitly mentioned and properly respected?
Answer: [Yes]
Justification: The related works and RobustBench are correctly cited in this work.
Guidelines:
- The answer NA means that the paper does not use existing assets.
- The authors should cite the original paper that produced the code package or dataset.
- The authors should state which version of the asset is used and, if possible, include a URL.
- The name of the license (e.g., CC-BY 4.0) should be included for each asset.
- For scraped data from a particular source (e.g., website), the copyright and terms of service of that source should be provided.
- If assets are released, the license, copyright information, and terms of use in the package should be provided. For popular datasets, paperswithcode.com/datasets has curated licenses for some datasets. Their licensing guide can help determine the license of a dataset.
- For existing datasets that are re-packaged, both the original license and the license of the derived asset (if it has changed) should be provided.
- If this information is not available online, the authors are encouraged to reach out to the asset’s creators.
13. New Assets
Question: Are new assets introduced in the paper well documented and is the documentation provided alongside the assets?
Answer: [N/A]
Justification: This work does not release new assets.
Guidelines:
- The answer NA means that the paper does not release new assets.
- Researchers should communicate the details of the dataset/code/model as part of their submissions via structured templates. This includes details about training, license, limitations, etc.
- The paper should discuss whether and how consent was obtained from people whose asset is used.
- At submission time, remember to anonymize your assets (if applicable). You can either create an anonymized URL or include an anonymized zip file.
14. Crowdsourcing and Research with Human Subjects
Question: For crowdsourcing experiments and research with human subjects, does the paper include the full text of instructions given to participants and screenshots, if applicable, as well as details about compensation (if any)?
Answer: [N/A]
Justification: This work involves neither crowdsourcing nor research with human subjects.
Guidelines:
- The answer NA means that the paper involves neither crowdsourcing nor research with human subjects.
- Including this information in the supplemental material is fine, but if the main contribution of the paper involves human subjects, then as much detail as possible should be included in the main paper.
- According to the NeurIPS Code of Ethics, workers involved in data collection, curation, or other labor should be paid at least the minimum wage in the country of the data collector.
15. Institutional Review Board (IRB) Approvals or Equivalent for Research with Human Subjects
Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or institution) were obtained?
Answer: [N/A]
Justification: This work involves neither crowdsourcing nor research with human subjects.
Guidelines:
- The answer NA means that the paper involves neither crowdsourcing nor research with human subjects.
- Depending on the country in which research is conducted, IRB approval (or equivalent) may be required for any human subjects research. If you obtained IRB approval, you should clearly state this in the paper.
- We recognize that the procedures for this may vary significantly between institutions and locations, and we expect authors to adhere to the NeurIPS Code of Ethics and the guidelines for their institution.
- For initial submissions, do not include any information that would break anonymity (if applicable), such as the institution conducting the review.