Consistency Purification: Effective and Efficient Diffusion Purification towards Certified Robustness

Yiquan Li¹ Zhongzhu Chen²∗ Kun **²∗ Jiongxiao Wang¹∗ Bo Li³ Chaowei Xiao¹
¹University of Wisconsin-Madison, ²University of Michigan, Ann Arbor, ³University of Chicago. ∗The first four authors contributed equally.

Abstract

Diffusion Purification, purifying noised images with diffusion models, has been widely used for enhancing certified robustness via randomized smoothing. However, existing frameworks often grapple with the balance between efficiency and effectiveness. While the Denoising Diffusion Probabilistic Model (DDPM) offers an efficient single-step purification, it falls short in ensuring purified images reside on the data manifold. Conversely, the Stochastic Diffusion Model effectively places purified images on the data manifold but demands solving cumbersome stochastic differential equations, while its derivative, the Probability Flow Ordinary Differential Equation (PF-ODE), though solving simpler ordinary differential equations, still requires multiple computational steps. In this work, we demonstrate that an ideal purification pipeline should generate purified images that lie on the data manifold and are as semantically aligned with the original images as possible (for effectiveness), and should do so in one step (for efficiency). We therefore introduce Consistency Purification, a purifier that is Pareto superior to previous work in efficiency and effectiveness. Consistency Purification employs the consistency model, a one-step generative model distilled from PF-ODE, and can thus generate on-manifold purified images with a single network evaluation. However, the consistency model is not designed for purification, so it does not inherently ensure semantic alignment between purified and original images. To resolve this issue, we further refine it through Consistency Fine-tuning with LPIPS loss, which improves semantic alignment while keeping the purified images on the data manifold. Our comprehensive experiments demonstrate that the Consistency Purification framework achieves state-of-the-art certified robustness and efficiency compared to baseline methods.

1 Introduction

Diffusion models were first proposed for high-quality image generation [1; 2; 3; 4; 5] and have been extended to generative tasks across various modalities, including audio [6; 7; 8], video [9; 10], and 3D object [11; 12; 13]. A diffusion model for image generation typically involves two key processes: (1) a forward diffusion process, which transforms the source image into an isotropic Gaussian by gradually adding Gaussian noise, and (2) the reverse diffusion process, which uses a Deep Neural Network (DNN) to perform iterative denoising starting from random Gaussian noise.

Due to the inherent denoising capability of diffusion models, they have been widely applied to improve the robustness of DNNs. This enhancement is achieved by Diffusion Purification [14; 15; 16; 17; 18], which purifies the network inputs to reduce the effects of various types of unforeseen corruptions or adversarial attacks. Among these, one particularly suitable and effective application of purification is improving certified robustness through randomized smoothing [19] for image classification tasks. This method provides a tight robustness guarantee in the $\ell_{2}$ norm for a smoothed classifier. However, many previous works [19; 20; 21; 22; 23; 24] have shown that it still requires retraining with Gaussian augmented examples for each noise level to optimize the smoothed classifier. Diffusion models, capable of purifying Gaussian perturbed images before classification, can be seamlessly integrated with any base classifier to produce a smoothed classifier for arbitrary noise levels. This integration has been demonstrated to effectively enhance certified robustness, as supported by numerous studies [18; 25; 26; 27].

Figure 1: An illustration of the Consistency Purification framework.

However, current diffusion purification for certified robustness via randomized smoothing still faces significant trade-offs between efficiency and effectiveness. Although the Denoising Diffusion Probabilistic Model (DDPM) [28] requires only a single network evaluation in the purification process [25], it generates the mean of the posterior data distribution conditioned on the noisy sample, which does not necessarily lie on the data manifold and may be ambiguous during classification. To further improve diffusion purification, various methods such as DensePure [26], Local Smoothing [27] and Noised Diffusion Classifiers [29] have been proposed. However, these methods are considerably less efficient, as they require several times the computational cost of one-step DDPM. Another promising approach uses the Probability Flow Ordinary Differential Equation (PF-ODE) [3]. It accelerates the sampling process [4] and yields a distribution closer to the original data, balancing efficiency and effectiveness well. However, several computational steps are still needed to solve the ODE numerically.

To find a Pareto superior solution in terms of efficiency and effectiveness, we introduce a new framework, Consistency Purification, which integrates consistency models into diffusion purification with Consistency Fine-tuning. The consistency model is a novel category of diffusion models that learns the trajectory of the PF-ODE that transports the data distribution to the noisy distribution. It is trained to map any point along this trajectory back to its starting point. This property is desirable for diffusion purification, as it allows images with any scale of Gaussian noise to be directly purified to clean images. Distilled from a pre-trained diffusion model by simulating the PF-ODE trajectory, the consistency model can generate high-quality in-distribution images in a single step, thereby ensuring both efficiency and effectiveness. However, since consistency models are primarily trained for image generation, this may not suffice to guarantee that the purified image maintains the same semantic meaning as the original image. To address this issue, we propose adding a Consistency Fine-tuning step to the purification framework, which further fine-tunes the consistency model using the Learned Perceptual Image Patch Similarity (LPIPS) [30] loss. This loss minimizes the perceptual differences between the purified and original images, thereby ensuring better semantic alignment while keeping the purified images on the data manifold.

We show that Consistency Purification is Pareto superior to the baselines in two respects. First, compared with effective methods like DensePure [26], Local Smoothing [27] and Noised Diffusion Classifiers [29], Consistency Purification is much more efficient since it enables single-step purification. Second, compared with efficient methods like onestep-DDPM [25], we provide both theoretical analysis and experimental results to support the effectiveness improvement of Consistency Purification. In Example 3.1, we give a one-dimensional example demonstrating that the consistency model can generate on-manifold purified samples while onestep-DDPM does not have this property.

In Theorem 3.3, we show an important theoretical result: given a purifier, the lower the transport from the original distribution to the purified distribution, the higher the probability that the purified sample is sufficiently close to the original sample, and thus the better the purification outcome. Our experimental results verify that both the integration of the consistency model in Consistency Purification and the further Consistency Fine-tuning decrease this transport and achieve better semantic alignment between purified samples and original samples.

Beyond the validation of our theory, we conduct comprehensive experiments to demonstrate the empirical improvements of Consistency Purification. Compared to various baseline settings, our approach has shown significant improvements, achieving an average 5% gain in performance over the previous onestep-DDPM under the same cost with single-step purification. These observations underscore our success in finding a Pareto superior diffusion purification framework in both efficiency and effectiveness for certified robustness.

2 Backgrounds

Randomized Smoothing [19]. Randomized smoothing is designed to certify the robustness of a given classifier under $\ell_{2}$ norm perturbations. Given a base classifier $f$ and an input ${\bm{x}}$, randomized smoothing first defines the smoothed classifier by $g({\bm{x}})=\arg\max_{c}\mathbb{P}_{\bm{\epsilon}\sim\mathcal{N}(\bm{0},\sigma^{2}\bm{I})}(f({\bm{x}}+\bm{\epsilon})=c)$, where $\sigma$ is the noise level, which controls the trade-off between robustness and accuracy. [19] shows that $g({\bm{x}})$ is certifiably robust at ${\bm{x}}$ under the $\ell_{2}$ norm with radius $R=\frac{\sigma}{2}\left(\Phi^{-1}(p_{A})-\Phi^{-1}(p_{B})\right)$, where $p_{A}$ and $p_{B}$ are the probabilities of the most probable class and the “runner-up” class respectively, and $\Phi^{-1}$ is the inverse of the standard Gaussian CDF. $p_{A}$ and $p_{B}$ can be estimated with arbitrarily high confidence via the Monte Carlo method.
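The radius formula above can be evaluated directly. The following is a minimal sketch using only the Python standard library; the function name `certified_radius` is illustrative and not taken from the paper or from any randomized smoothing codebase. In practice, the Monte Carlo estimates are usually replaced by a lower confidence bound on $p_{A}$ and an upper confidence bound on $p_{B}$ before this formula is applied.

```python
from statistics import NormalDist


def certified_radius(p_a: float, p_b: float, sigma: float) -> float:
    """Return R = sigma / 2 * (Phi^{-1}(p_a) - Phi^{-1}(p_b)).

    p_a: probability of the most probable class under Gaussian noise.
    p_b: probability of the runner-up class under Gaussian noise.
    sigma: standard deviation of the smoothing noise.
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("p_a and p_b must lie strictly between 0 and 1")
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    inv_cdf = NormalDist().inv_cdf  # inverse standard Gaussian CDF
    return 0.5 * sigma * (inv_cdf(p_a) - inv_cdf(p_b))


if __name__ == "__main__":
    # Illustrative values only, not results from the paper.
    print(certified_radius(p_a=0.9, p_b=0.1, sigma=0.5))
```

The radius grows linearly with $\sigma$ for fixed class probabilities, which is why a purifier that keeps $p_{A}$ high at large noise levels translates directly into larger certified radii.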

Continuous-Time Diffusion Model [3]. The diffusion model has two components: the diffusion process followed by the reverse process. Given an input random variable ${\bm{x}}_{0}\sim p$ , the diffusion process adds isotropic Gaussian noises to the data so that the diffused random variable at time $t$ is ${\bm{x}}_{t}=\sqrt{\alpha_{t}}({\bm{x}}_{0}+\bm{\epsilon}_{t})$ , s.t., $\bm{\epsilon}_{t}\sim\mathcal{N}(\bm{0},\sigma_{t}^{2}{\bm{I}})$ , and $\sigma_{t}^{2}=(1-\alpha_{t})/\alpha_{t}$ , and we denote ${\bm{x}}_{t}\sim p_{t}$ . The forward diffusion process can also be defined by the stochastic differential equation

\mathrm{d}{\bm{x}}=D({\bm{x}},t)\mathrm{d}t+G(t)\mathrm{d}{\bm{w}},

(SDE)

where ${\bm{x}}_{0}\sim p$, $D:\mathbb{R}^{d}\times\mathbb{R}\mapsto\mathbb{R}^{d}$ is the drift coefficient and typically has the form $D({\bm{x}},t)=D(t){\bm{x}}$, $G:\mathbb{R}\mapsto\mathbb{R}$ is the diffusion coefficient, $\mathrm{d}t$ is an infinitesimal time step, and ${\bm{w}}(t)\in\mathbb{R}^{d}$ is the standard Wiener process.

The reverse process exists and removes the added noise by solving the reverse-time SDE [31]

\mathrm{d}{\bm{x}}=\left[D(t){\bm{x}}-G(t)^{2}\nabla_{{\bm{x}}}\log p_{t}({\bm{x}})\right]\mathrm{d}t+G(t)\mathrm{d}\overline{{\bm{w}}},

(reverse-SDE)

where $p_{t}({\bm{x}})$ denotes the marginal distribution at time $t$ , and $\overline{{\bm{w}}}(t)$ is a reverse-time standard Wiener process. [3] defined the probability flow ODE (PF ODE) which has the same marginal distribution as reverse-SDE but can be solved much faster

\mathrm{d}{\bm{x}}=\left[D(t){\bm{x}}-\frac{1}{2}G(t)^{2}\nabla_{{\bm{x}}}\log p_{t}({\bm{x}})\right]\mathrm{d}t.

(PF-ODE)

As shown in [4], the perturbation kernel of SDE has the general form

p_{0t}(\bm{x}(t)\mid\bm{x}(0))=\mathcal{N}\left(\bm{x}(t);s(t)\bm{x}(0),s(t)^{2}\sigma(t)^{2}\mathbf{I}\right)

(perturbation-kernel)

where $s(t)=\exp\left(\int_{0}^{t}D(\xi)\mathrm{d}\xi\right)$ and $\sigma(t)=\sqrt{\int_{0}^{t}\frac{G(\xi)^{2}}{s(\xi)^{2}}\mathrm{d}\xi}$, with $D$ and $G$ the drift and diffusion coefficients of SDE. Under this formulation, PF-ODE can be written as

\mathrm{d}\bm{x}=\left[\frac{\dot{s}(t)}{s(t)}\bm{x}-s(t)^{2}\dot{\sigma}(t)\sigma(t)\nabla_{\bm{x}}\log p\left(\frac{\bm{x}}{s(t)};\sigma(t)\right)\right]\mathrm{d}t

where the dot denotes the time derivative and $p\left(\frac{\bm{x}}{s(t)};\sigma(t)\right)$ denotes the marginal distribution at time $t$. In our context, we use the EDM parameterization [4], where $s(t)=1$ and $\sigma(t)=t$, which gives the probability flow ODE

\mathrm{d}{\bm{x}}=-t\nabla_{{\bm{x}}}\log p_{t}({\bm{x}})\mathrm{d}t.

(EDM-ODE)

We use $\{{\bm{x}}_{t}\}_{t\in[0,1]}$ and $\{\hat{\bm{x}}_{t}\}_{t\in[0,1]}$ to denote the diffusion process and the reverse process generated by SDE and reverse-SDE respectively, which follow the same distribution. We also use $\{\tilde{{\bm{x}}}_{t}\}_{t\in[0,1]}$ to denote the reverse process generated by PF-ODE, which has the same marginal distribution as $\{{\bm{x}}_{t}\}_{t\in[0,1]}$ and $\{\hat{\bm{x}}_{t}\}_{t\in[0,1]}$ given $t$ .

Discrete-Time Diffusion Model (DDPM [28]). DDPM constructs a discrete Markov chain $\{{\bm{x}}_{0},{\bm{x}}_{1},\cdots,{\bm{x}}_{i},\cdots,{\bm{x}}_{N}\}$ as the forward process for the training data ${\bm{x}}_{0}\sim p$, such that $\mathbb{P}({\bm{x}}_{i}|{\bm{x}}_{i-1})=\mathcal{N}({\bm{x}}_{i};\sqrt{1-\beta_{i}}{\bm{x}}_{i-1},\beta_{i}{\bm{I}})$, where $0<\beta_{1}<\beta_{2}<\cdots<\beta_{N}<1$ are predefined noise scales such that ${\bm{x}}_{N}$ approximates Gaussian white noise. Denoting $\overline{\alpha}_{i}=\prod_{j=1}^{i}(1-\beta_{j})$, we have $\mathbb{P}({\bm{x}}_{i}|{\bm{x}}_{0})=\mathcal{N}({\bm{x}}_{i};\sqrt{\overline{\alpha}_{i}}{\bm{x}}_{0},(1-\overline{\alpha}_{i}){\bm{I}})$, i.e., ${\bm{x}}_{i}({\bm{x}}_{0},\bm{\epsilon})=\sqrt{\overline{\alpha}_{i}}{\bm{x}}_{0}+\sqrt{1-\overline{\alpha}_{i}}\bm{\epsilon},\ \bm{\epsilon}\sim\mathcal{N}(\bm{0},{\bm{I}})$.

The reverse process of DDPM learns a reverse direction variational Markov chain $p_{\bm{\theta}}({\bm{x}}_{i-1}|{\bm{x}}_{i})=\mathcal{N}({\bm{x}}_{i-1};\bm{\mu}_{\bm{\theta}}({\bm{x}}_{i},i),\Sigma_{\bm{\theta}}({\bm{x}}_{i},i))$. [28] defines $\bm{\epsilon}_{\bm{\theta}}$ as a function approximator to predict $\bm{\epsilon}$ from ${\bm{x}}_{i}$ such that $\bm{\mu}_{\bm{\theta}}({\bm{x}}_{i},i)=\frac{1}{\sqrt{1-\beta_{i}}}\left({\bm{x}}_{i}-\frac{\beta_{i}}{\sqrt{1-\overline{\alpha}_{i}}}\bm{\epsilon}_{\bm{\theta}}({\bm{x}}_{i},i)\right)$. Then the reverse time samples are generated by $\hat{{\bm{x}}}_{i-1}=\frac{1}{\sqrt{1-\beta_{i}}}\left(\hat{\bm{x}}_{i}-\frac{\beta_{i}}{\sqrt{1-\overline{\alpha}_{i}}}\bm{\epsilon}_{\bm{\theta}^{*}}(\hat{\bm{x}}_{i},i)\right)+\sqrt{\beta_{i}}\bm{\epsilon},\ \bm{\epsilon}\sim\mathcal{N}(\mathbf{0},{\bm{I}})$, and the optimal parameters $\bm{\theta}^{*}$ are obtained by solving $\bm{\theta}^{*}:=\arg\min_{\bm{\theta}}\mathbb{E}_{{\bm{x}}_{0},\bm{\epsilon}}\left[\|\bm{\epsilon}-\bm{\epsilon}_{\bm{\theta}}(\sqrt{\overline{\alpha}_{i}}{\bm{x}}_{0}+\sqrt{1-\overline{\alpha}_{i}}\bm{\epsilon},i)\|_{2}^{2}\right]$. [28] also provided a one-step approximate reconstruction of ${\bm{x}}_{0}$ from any ${\bm{x}}_{t}$,

{\bm{x}}_{0}\approx\hat{{\bm{x}}}_{0}=\left({\bm{x}}_{t}-\sqrt{1-\overline{\alpha}_{t}}\bm{\epsilon}_{\theta}({\bm{x}}_{t})\right)/\sqrt{\overline{\alpha}_{t}}.

(onestep-DDPM)
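To make the relation between the forward marginal and the onestep-DDPM reconstruction concrete, the sketch below builds $\overline{\alpha}_{i}$ as a cumulative product, draws a forward sample, and inverts it in one step. It is only an illustration under stated assumptions: the linear $\beta$ schedule and all function names are hypothetical, and the true noise stands in for a trained noise predictor $\bm{\epsilon}_{\theta}$, so the reconstruction is exact here, which it would not be with a learned model.

```python
import math
import random


def alpha_bars(betas):
    """Cumulative products alpha_bar_i = prod_{j <= i} (1 - beta_j)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def forward_sample(x0, alpha_bar, eps):
    """x_i = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps, elementwise."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, eps)]


def onestep_reconstruct(x_t, alpha_bar, eps_pred):
    """x0_hat = (x_t - sqrt(1 - alpha_bar) * eps_pred) / sqrt(alpha_bar)."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - b * e) / a for x, e in zip(x_t, eps_pred)]


# Hypothetical linear schedule with 1000 steps (an assumption, not from the paper).
BETAS = [1e-4 + (0.02 - 1e-4) * i / 999 for i in range(1000)]
ABAR = alpha_bars(BETAS)

if __name__ == "__main__":
    rng = random.Random(0)
    x0 = [0.5, -0.25, 1.0]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = forward_sample(x0, ABAR[199], eps)
    # The true noise plays the role of a perfect eps_theta.
    print(onestep_reconstruct(x_t, ABAR[199], eps))
```

With an imperfect noise estimate, the error in `eps_pred` is scaled by $\sqrt{1-\overline{\alpha}_{t}}/\sqrt{\overline{\alpha}_{t}}$, so the reconstruction degrades as the noise level grows.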

Consistency Model [32]. Given a solution trajectory of PF-ODE, the consistency model is defined as $D:({\bm{x}}_{t},t)\mapsto{\bm{x}}_{\epsilon}$ . The model exhibits the property of self-consistency, ensuring that its outputs are consistent for arbitrary pairs of $({\bm{x}}_{t},t)$ from the same PF-ODE trajectory; specifically, $D({\bm{x}}_{t},t)=D({\bm{x}}_{t^{\prime}},t^{\prime})$ for all $t,t^{\prime}\in[\epsilon,T]$ . As shown by the definition, consistency models are suitable for one-shot denoising, allowing for the recovery of ${\bm{x}}_{\epsilon}$ from any noisy input ${\bm{x}}_{t}$ in one network evaluation. Two distinct training strategies can be employed for training the consistency models: distillation mode and isolation mode. The primary distinction lies in whether the models distill the knowledge from pre-trained diffusion models or train from initial parameters. According to the experiments reported in [32], consistency models trained in the distillation mode have been shown to outperform those trained in isolation mode for generating high-quality images. Consequently, our paper only considers consistency models trained in the distillation mode.

3 Theoretical Analysis

In this section, we provide theoretical explanations on the advantages of Consistency Purification, with a focus on its purification performance improvement in terms of certified robustness over [25].

As demonstrated in [3], PF-ODE maintains the marginal distribution of reverse-SDE, thereby establishing a deterministic mapping between the noisy distribution ${\bm{x}}_{t}$ and the data distribution ${\bm{x}}_{0}$. In other words, PF-ODE guarantees that the purified sample lies on the data manifold, unlike onestep-DDPM, which lacks this assurance. We present here a simple one-dimensional example for illustration.

Example 3.1.

Consider a one-dimensional space with a data set consisting of two samples $\{{\bm{y}}_{1},{\bm{y}}_{2}\}$, where ${\bm{y}}_{1}=1$ and ${\bm{y}}_{2}=-1$. The distribution can be represented as a mixture of Dirac delta distributions: $p_{\text{data}}({\bm{x}})=\frac{1}{2}\left(\delta({\bm{x}}-{\bm{y}}_{1})+\delta({\bm{x}}-{\bm{y}}_{2})\right)$. By setting $s(t)=1$ and $\sigma(t)=t$ in perturbation-kernel, the distribution at time $t$ becomes $p_{t}({\bm{x}})=\frac{1}{2t\sqrt{2\pi}}\left(e^{-\frac{1}{2}\left(\frac{{\bm{x}}-1}{t}\right)^{2}}+e^{-\frac{1}{2}\left(\frac{{\bm{x}}+1}{t}\right)^{2}}\right)$. Then

\begin{aligned}
\frac{\mathrm{d}\log p_{t}({\bm{x}})}{\mathrm{d}{\bm{x}}} &= \frac{-\left(\frac{{\bm{x}}-1}{t^{2}}\right)e^{-\frac{1}{2}\left(\frac{{\bm{x}}-1}{t}\right)^{2}}-\left(\frac{{\bm{x}}+1}{t^{2}}\right)e^{-\frac{1}{2}\left(\frac{{\bm{x}}+1}{t}\right)^{2}}}{2t\sqrt{2\pi}\,p_{t}({\bm{x}})} \\
&= -\frac{{\bm{x}}}{t^{2}}+\frac{1}{t^{2}}\cdot\frac{e^{-\frac{1}{2}\left(\frac{{\bm{x}}-1}{t}\right)^{2}}-e^{-\frac{1}{2}\left(\frac{{\bm{x}}+1}{t}\right)^{2}}}{e^{-\frac{1}{2}\left(\frac{{\bm{x}}-1}{t}\right)^{2}}+e^{-\frac{1}{2}\left(\frac{{\bm{x}}+1}{t}\right)^{2}}}.
\end{aligned}

From the formula for $\frac{\mathrm{d}\log p_{t}({\bm{x}})}{\mathrm{d}{\bm{x}}}$, it is evident that ${\bm{x}}=0$ is an equilibrium point of PF-ODE, and the right-hand side is Lipschitz continuous around ${\bm{x}}=0$ by L’Hôpital’s rule. Thus, according to the Picard-Lindelöf theorem, any trajectory starting on either side of ${\bm{x}}=0$ will not cross this point. As PF-ODE drives $p_{t}({\bm{x}})$ closer to the Dirac delta distribution $p_{\text{data}}({\bm{x}})$ as $t$ approaches zero, any initial point on the positive or negative side of ${\bm{x}}=0$ will eventually approach $1$ or $-1$ respectively, i.e., the data manifold. Furthermore, in this example, PF-ODE generates a purified sample that is not only on the data manifold but also the data point closest to the noisy sample. This property is desirable as it establishes a relatively large "robust" neighborhood around each true data point, which implies high certified robustness and a large certified radius, as discussed further below. With the consistency model, we do not need to solve the ODE but rather directly map the noisy sample to either $1$ or $-1$ depending on its location relative to ${\bm{x}}=0$.

For comparison, given any ${\bm{x}}$ and $t$ , the onestep-DDPM will output a posterior mean that is

\frac{e^{-\frac{1}{2}\left(\frac{{\bm{x}}-1}{t}\right)^{2}}-e^{-\frac{1}{2}\left(\frac{{\bm{x}}+1}{t}\right)^{2}}}{e^{-\frac{1}{2}\left(\frac{{\bm{x}}-1}{t}\right)^{2}}+e^{-\frac{1}{2}\left(\frac{{\bm{x}}+1}{t}\right)^{2}}}=\frac{e^{\frac{2{\bm{x}}}{t^{2}}}-1}{e^{\frac{2{\bm{x}}}{t^{2}}}+1}.

The posterior mean will be near $1$ or $-1$ only when $t$ is sufficiently small compared to $\|{\bm{x}}\|$. Otherwise, it deviates from the data manifold. When $t$ is large, the posterior mean will be close to zero, which lies in an ambiguous classification region. In adversarial purification [25; 26; 14], we typically select $t$ based on the variance of the noise added to the data sample rather than using a very small $t$. This practice helps avoid significant deviations in the posterior mean estimation due to imperfect estimation of the score/noise. With a very small $t$, even a slight bias in score/noise estimation can lead to a substantial deviation, resulting in a denoised sample even farther from the data manifold represented by $p_{\text{data}}({\bm{x}})$.
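The contrast in Example 3.1 can be checked numerically. The sketch below is an illustration, not the paper's code: it uses the closed-form score of the two-point mixture, which simplifies to $(-{\bm{x}}+\tanh({\bm{x}}/t^{2}))/t^{2}$, integrates EDM-ODE backward with a plain Euler scheme in $\log t$ (the solver, step count, and end time are arbitrary choices made here), and compares the result with the posterior mean $\tanh({\bm{x}}/t^{2})$ returned by onestep-DDPM.

```python
import math


def score(x: float, t: float) -> float:
    """d/dx log p_t(x) for the mixture of N(+1, t^2) and N(-1, t^2)."""
    return (-x + math.tanh(x / t**2)) / t**2


def posterior_mean(x: float, t: float) -> float:
    """onestep-DDPM output E[x_0 | x_t = x] = x + t^2 * score = tanh(x / t^2)."""
    return x + t**2 * score(x, t)


def pf_ode_purify(x: float, t_start: float, t_end: float = 1e-3,
                  steps: int = 4000) -> float:
    """Integrate dx/dt = -t * score(x, t) from t_start down to t_end.

    Uses Euler steps in u = log t, where dx/du = t * dx/dt = -t^2 * score(x, t).
    """
    u = math.log(t_start)
    du = (math.log(t_end) - u) / steps  # negative: time runs backward
    for _ in range(steps):
        t = math.exp(u)
        x += du * (-(t**2) * score(x, t))
        u += du
    return x


if __name__ == "__main__":
    x_noisy, t = 0.3, 1.0
    print("PF-ODE endpoint:", pf_ode_purify(x_noisy, t))    # close to +1
    print("posterior mean :", posterior_mean(x_noisy, t))   # equals tanh(0.3)
```

For a noisy sample at $0.3$ with $t=1$, the ODE endpoint lands near the data point $1$, while the posterior mean stays near $0.29$, between the two data points.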

Additionally, PF-ODE is deterministic, eliminating the overhead of majority voting required when using reverse-SDE as a purifier [26]. The consistency model, which reduces ODE solving to a one-step mapping, further ensures that purification has the same efficiency as onestep-DDPM while keeping the in-distribution property.

Though the consistency model enjoys both the in-distribution property and one-step efficiency, it does not guarantee that the purified sample has the same semantic meaning as the original sample. This is because the derivation of PF-ODE only guarantees a mapping between the noisy distribution and the data distribution, which is sufficient for generation but not enough for denoising purposes.

To address this concern, we first delineate the desired characteristics of the purifier. As evidenced in prior works [14; 25; 26; 33], an ideal purifier should yield a purified output that lies close to the original input. It is generally presumed that such purified outputs retain the semantic meaning of the original inputs with high probability. The semantic mismatch between the noisy input and the purified output generated by PF-ODE arises when the purified output lands closer to other samples. We therefore propose quantifying this mismatch through the notion of transport between the data distribution and the purified distribution, where the latter is obtained by adding Gaussian perturbations to the data distribution and then denoising via PF-ODE. Given an original sample ${\bm{x}}$, Gaussian noise $\epsilon$, and purifier $d$, the mapping in the transport process is defined as $T:{\bm{x}}\rightarrow d({\bm{x}}+\epsilon)$, which is probabilistic. We aim to demonstrate that a smaller transport between the data distribution and the purified distribution leads to a higher likelihood that the purified output is close to the original sample, thereby preserving its semantic meaning.

We will leverage the following definition.

Definition 3.2.

Given the data distribution $p$, Gaussian noise $\epsilon$, timestep $t$, and a purifier $d$, we define $\pi_{t}:{\bm{x}}\rightarrow d({\bm{x}}+t\epsilon)$ and the “transport” under $\pi_{t}$ between the data distribution and the purified distribution as $T_{\pi_{t}}(p):=\int\|{\bm{x}}-\pi_{t}({\bm{x}})\|\cdot p({\bm{x}})\mathrm{d}{\bm{x}}$.

Intuitively, transport measures the average distance between the original and purified samples, which should be small for an effective purifier. Below, we quantify this intuition and present our main theorem. See the detailed proof in Appendix B.

Theorem 3.3.

Given the transport $T_{\pi_{t}}(p)$ between the data distribution $p$ and the corresponding purified distribution under $\pi_{t}$, then for any $r>0$, the probability that the distance between the original sample ${\bm{x}}$ and the purified sample $\hat{{\bm{x}}}=\pi_{t}({\bm{x}})$ is larger than $r$ is upper bounded by $\frac{T_{\pi_{t}}(p)}{r}$.
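The bound has the shape of Markov's inequality applied to the nonnegative random variable $\|{\bm{x}}-\pi_{t}({\bm{x}})\|$. The one-line sketch below is only a reading aid; it assumes the transport is interpreted as the expectation of this distance over both ${\bm{x}}\sim p$ and the noise $\epsilon$, and the full argument in Appendix B may differ in detail.

```latex
\mathbb{P}\big(\|{\bm{x}}-\pi_{t}({\bm{x}})\|\geq r\big)
\;\leq\;\frac{\mathbb{E}\,\|{\bm{x}}-\pi_{t}({\bm{x}})\|}{r}
\;=\;\frac{T_{\pi_{t}}(p)}{r}.
```

For instance, under this reading a transport of $0.1$ implies that at most $20\%$ of purified samples lie farther than $r=0.5$ from their originals.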

Remark 3.4.

By Theorem 3.3, the efficacy of the purifier hinges on two crucial factors: the transport $T_{\pi_{t}}(p)$ and the radius $r$. A theoretically perfect purifier would yield zero transport; however, this is unattainable due to the inherent randomness of $\pi_{t}$. Typically, we can optimize the parameter $t$ to minimize the resulting bound, denoted as $T^{*}=\min_{t}\frac{T_{\pi_{t}}(p)}{r}$. In the context of classification tasks, the selection of $r$ also depends on the robustness of the classifier; a more robust classifier allows a larger $r$ to be chosen, thereby guaranteeing better purification efficacy.

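Table 1: FID at different $\sigma$ for different Consistency Fine-tuning losses (lower is better).

Loss	$\sigma=0.25$	$\sigma=0.5$	$\sigma=1.0$
- -	60.3	155.3	350.3
$\ell_{1}$	96.8	205.7	383.6
$\ell_{2}$	102.1	214.8	375.4
LPIPS	20.5	100.9	338.1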

For ensuring consistency in semantic meaning between the original and purified samples, it is insufficient merely to minimize their distance; it is also necessary that the purified sample resides on the data manifold, which is the in-distribution property we previously mentioned. To concurrently achieve both objectives, rather than solely focusing on minimizing the Euclidean distance between the original and purified samples, we opt to minimize the Learned Perceptual Image Patch Similarity (LPIPS) loss between them. This strategy aids in mitigating the risk of the purified sample deviating from the data manifold, thereby preserving semantic meaning. In Table 1, we show that using LPIPS is better than $\ell_{1}$ and $\ell_{2}$ loss for Consistency Fine-tuning when we want to guarantee the generated images are in-distribution, where lower FID scores indicate better in-distribution properties.

Figure 2 validates the effectiveness of Consistency Purification in light of Theorem 3.3. It shows that both the integration of the consistency model in Consistency Purification and the further Consistency Fine-tuning decrease the transport from the original distribution to the purified distribution. Specifically, Consistency Purification achieves a lower average distance from the purified sample to the original sample than onestep-DDPM, and Consistency Fine-tuning further decreases this average distance, indicating that both components result in a lower transport and thus better semantic alignment between purified samples and original samples.

4 Method

We propose our framework, Consistency Purification, with a further improved version using Consistency Fine-tuning.

4.1 Consistency Purification

We introduce Consistency Purification, which directly applies the consistency model as a purifier and combines it with a base classifier to form the smoothed classifier used in randomized smoothing.

Following Diffusion Denoised Smoothing outlined in [25], it is necessary to establish a mapping between the Gaussian noise augmented images required by randomized smoothing and the noised images on the ODE trajectory of the consistency model. For a given consistency model purifier $D_{\theta}$, any noisy input ${\bm{x}}_{t}\sim\mathcal{N}({\bm{x}},t^{2}{\bm{I}})$ can be recovered to the trajectory’s start ${\bm{x}}_{\epsilon}$ by directly passing it through the model with time $t$: ${\bm{x}}_{\epsilon}=D_{\theta}({\bm{x}}_{t},t)$.

When comparing this to the image augmented with additive Gaussian noise ${\bm{x}}_{rs}\sim\mathcal{N}({\bm{x}},\sigma^{2}{\bm{I}})$ , which is required by randomized smoothing, we observe that ${\bm{x}}_{rs}$ and ${\bm{x}}_{t}$ share the same formula when $t=\sigma$ . However, since the variances $\sigma\in\{\sigma_{i}\}_{i=1}^{m}$ may not be used during the training of the consistency model, we empirically select the nearest time step $t$ from the discrete time steps used in training for each $\sigma$ .

For the entire time horizon $[\epsilon,T]$ divided into $N-1$ sub-intervals with boundaries $t_{1}=\epsilon<t_{2}<\cdots<t_{N}=T$, the time steps used in training are computed by $t_{i}=\left(\epsilon^{1/\rho}+\frac{i-1}{N-1}\left(T^{1/\rho}-\epsilon^{1/\rho}\right)\right)^{\rho}$, where $\rho=7$.

Given the variance $\sigma$ of Gaussian noise used in randomized smoothing, we select the corresponding time step $t^{*}_{\sigma}$ for Consistency Purified Smoothing by $t^{*}_{\sigma}=\{t_{i}|\sigma\in(\frac{t_{i-1}+t_{i}}{2},\frac{t_{i}+t_{i+1}}{2}]\}$.
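The schedule and the selection rule can be sketched as follows. The function names are illustrative, and the values $\epsilon=0.002$, $T=80$, and $N=18$ in the usage example are assumptions chosen for demonstration rather than settings stated in this section. Because each half-open interval is bounded by the midpoints between neighbouring time steps, the rule amounts to picking the training time step nearest to $\sigma$.

```python
import bisect


def training_time_steps(eps: float, T: float, N: int, rho: float = 7.0):
    """t_i = (eps^(1/rho) + (i-1)/(N-1) * (T^(1/rho) - eps^(1/rho)))^rho, i = 1..N."""
    lo, hi = eps ** (1.0 / rho), T ** (1.0 / rho)
    return [(lo + (i - 1) / (N - 1) * (hi - lo)) ** rho for i in range(1, N + 1)]


def select_time_step(sigma: float, steps):
    """Return the t_i with sigma in ((t_{i-1}+t_i)/2, (t_i+t_{i+1})/2].

    Values of sigma outside the midpoint range fall back to the first or
    last time step.
    """
    midpoints = [(a + b) / 2.0 for a, b in zip(steps, steps[1:])]
    # bisect_left returns the first index j with midpoints[j] >= sigma,
    # which matches the half-open interval convention above.
    return steps[bisect.bisect_left(midpoints, sigma)]


if __name__ == "__main__":
    # Hypothetical schedule parameters, used only for illustration.
    steps = training_time_steps(eps=0.002, T=80.0, N=18)
    for sigma in (0.25, 0.5, 1.0):
        print(sigma, select_time_step(sigma, steps))
```

Using `bisect` on the precomputed midpoints keeps the lookup logarithmic in $N$, although for the small $N$ typical of such schedules a linear scan would work equally well.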

4.2 Consistency Fine-tuning

To optimize the consistency model for aligning semantic meanings during purification, we fine-tune the purifier $D_{\theta}$ by minimizing the following loss function: $\mathcal{L}_{\theta}=\mathbb{E}\|{\bm{x}}-D_{\theta}({\bm{x}}_{\sigma},t^{*}_{\sigma})\|_{\text{LPIPS}}$, where the expectation is taken over ${\bm{x}}\sim p_{data}$, $\sigma\sim\mathcal{U}\{\sigma_{i}\}_{i=1}^{m}$, and ${\bm{x}}_{\sigma}\sim\mathcal{N}({\bm{x}},\sigma^{2}{\bm{I}})$. Here LPIPS denotes the distance computed by the Learned Perceptual Image Patch Similarity [30], $p_{data}$ represents the distribution of the training data, from which clean images ${\bm{x}}$ are sampled, and $\mathcal{U}\{\sigma_{i}\}_{i=1}^{m}$ denotes the uniform distribution over the $m$ different noise scales $\sigma_{i}$ used for randomized smoothing. Typically, we select the scale set $\sigma_{i}\in\{0.25,0.5,1.0\}$, which is commonly used to compute the certified radius via randomized smoothing.

After obtaining the fine-tuned consistency model purifier $D_{\theta^{*}}$, it can replace the original model used in Consistency Purified Smoothing to purify any noised image ${\bm{x}}_{rs}$ with Gaussian variance $\sigma_{i}$, resulting in the final purified image ${\bm{x}}_{p}=D_{\theta^{*}}({\bm{x}}_{rs},t^{*}_{\sigma_{i}})$.

We present the detailed algorithm of our Consistency Purification in Appendix A.

5 Experiments

In this section, we begin by detailing the experimental settings, followed by our main results. Additionally, we conduct ablation studies to further demonstrate the effectiveness of our framework. All experiments are conducted with 1 $\times$ NVIDIA RTX A5000 24GB GPU.

5.1 Experimental Settings.

Dataset. We evaluate the Consistency Purification framework on both CIFAR-10 [34] and ImageNet-64 [35]. CIFAR-10 contains $32\times 32$ pixel images across 10 categories, while ImageNet-64 includes $64\times 64$ pixel images across 1000 categories. For CIFAR-10, we select 500 test images balanced across classes. Due to limited computational resources, we select only 100 test images for ImageNet-64.

Consistency Purification. For CIFAR-10, to demonstrate the effectiveness of Consistency Purification, we first perform purification with a public unconditional consistency model [36]. To further improve performance, we then fine-tune the model with noise levels $\sigma$ sampled from $\{0.25,0.5,1.0\}$; these results are reported as "+ Consistency Fine-tuning". For ImageNet, however, there is currently no publicly available unconditional consistency model checkpoint that can be used directly for purification. The only available model is the conditional consistency model on ImageNet-64. We therefore train an unconditional consistency model on ImageNet-64, initializing it with the existing conditional consistency model checkpoint. Details of the training process are included in Appendix C. We also conduct Consistency Fine-tuning on the ImageNet-64 model with noise levels $\sigma\in\{0.05,0.15,0.25\}$.

Baselines. For comparative analysis on CIFAR-10, we conduct baseline experiments under various settings. The first baseline is onestep-DDPM, where we employ the 50-M unconditional improved diffusion model from [2] with the one-shot denoising method [25] for purification. Given that our consistency model is distilled from an EDM model [4], we also include EDM baselines, applying both one-shot denoising (onestep-EDM) and an ODE solver (PF-ODE EDM) for purification. Additionally, we include a recent advancement in diffusion purification, Diffusion Calibration [37], as a baseline; it fine-tunes the diffusion model with the guidance of a WideResNet28-10 classifier to improve purification accuracy under that specific classifier. For ImageNet-64, due to the lack of a public unconditional EDM model, we only compare against onestep-DDPM.

Randomized Smoothing Settings. We set $N=10000$ for both CIFAR-10 and ImageNet as the number of sampling times used in randomized smoothing. We compute the certified radius for each test example at three different noise levels $\sigma\in\{0.25,0.5,1.0\}$ for CIFAR-10 and $\sigma\in\{0.05,0.15,0.25\}$ for ImageNet-64. Then we calculate the proportion of test examples whose radius exceeds a specific threshold $\epsilon$ . The highest accuracy among these noise levels is reported as the certified accuracy at $\epsilon$ .
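The aggregation described above can be sketched in a few lines. This is an illustrative reading rather than the authors' evaluation code: the function name and data layout are hypothetical, and the sketch assumes that an example counts as certified at threshold $\epsilon$ only when the smoothed prediction is correct and its radius is at least $\epsilon$, a convention commonly used in randomized smoothing evaluations.

```python
def certified_accuracy(radii_by_sigma, correct_by_sigma, threshold):
    """Best certified accuracy over noise levels at a given radius threshold.

    radii_by_sigma:   dict sigma -> list of certified radii, one per test example.
    correct_by_sigma: dict sigma -> list of bools (smoothed prediction correct).
    threshold:        the radius epsilon at which accuracy is reported.
    """
    best = 0.0
    for sigma, radii in radii_by_sigma.items():
        correct = correct_by_sigma[sigma]
        # Count examples that are both correctly classified and certified.
        hits = sum(1 for r, ok in zip(radii, correct) if ok and r >= threshold)
        best = max(best, hits / len(radii))
    return best


if __name__ == "__main__":
    # Toy numbers for illustration only, not results from the paper.
    radii = {0.25: [0.3, 0.6, 0.0, 0.1], 0.5: [0.8, 0.0, 0.0, 0.7]}
    correct = {0.25: [True, True, False, True], 0.5: [True, False, False, True]}
    for eps in (0.0, 0.5, 1.0):
        print(eps, certified_accuracy(radii, correct, eps))
```

Taking the maximum over $\sigma$ reflects the usual pattern that small noise levels do best at small radii while large noise levels are needed to certify large radii.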

Classifiers. As the classifier applied after purification on CIFAR-10, we employ a ViT-B/16 model [38] that is pretrained on ImageNet-21k [35] and fine-tuned on the CIFAR-10 dataset. In our ablation studies, we also use ResNet [39] and WideResNet [40] trained on CIFAR-10. For ImageNet-64, we up-sample the 64 $\times$ 64 images and directly apply ViT-B/16 as the classifier.

5.2 Main Results.

We report the certified accuracy of Consistency Purification on both CIFAR-10 and ImageNet-64 in Table 2. The table also lists the purification steps, which indicate whether the purifier needs multiple network evaluations (Multi Steps) or a single network evaluation (One Step). As Table 2 shows, Consistency Purification significantly outperforms onestep-DDPM on both CIFAR-10 and ImageNet-64, and Consistency Fine-tuning raises certified accuracy further at every radius except $\epsilon=0.0$ on CIFAR-10. For CIFAR-10, Consistency Purification with Consistency Fine-tuning also matches or exceeds the additional baselines onestep-EDM, PF-ODE EDM and Diffusion Calibration at every radius. A detailed certified accuracy evaluation for fine-grained $\epsilon$ at different noise levels $\sigma$, compared with onestep-DDPM, is given in Figure 3 of Appendix D. Taken together, these results show that Consistency Purification certifies robustness both efficiently and effectively.

Table 2: Certified Accuracy of Consistency Purification for CIFAR-10 and ImageNet-64.

CIFAR-10		Certified Accuracy at $\epsilon$ (%)
Method	Purification Steps	0.0	0.25	0.5	0.75	1.0
onestep-DDPM [25]	One Step	87.6	73.6	55.6	39.2	29.6
onestep-EDM	One Step	87.4	76.2	58.8	40.8	32.4
PF-ODE EDM	Multi Steps	89.6	77.0	60.4	42.6	34.0
Diffusion Calibration [37]	One Step	90.2	76.4	57.2	42.6	32.4
Consistency Purification	One Step	90.4	77.2	59.8	42.8	33.2
+ Consistency Fine-tuning	One Step	90.2	79.4	62.4	43.8	35.4
ImageNet-64		Certified Accuracy at $\epsilon$ (%)
Method	Purification Steps	0.0	0.05	0.15	0.25	0.35
onestep-DDPM [25]	One Step	53.0	44.0	32.0	15.0	7.0
Consistency Purification	One Step	61.0	52.0	34.0	19.0	13.0
+ Consistency Fine-tuning	One Step	69.0	57.0	35.0	21.0	16.0

5.3 Ablation Studies.

We conduct various ablation studies to evaluate the effectiveness of our proposed method.

Fine-tuning Loss Functions. To further demonstrate that LPIPS loss is the best choice when considering both on-manifold purification and semantic alignment, we assess the certified accuracy of Consistency Purification using different loss functions during Consistency Fine-tuning. Instead of the LPIPS distance between the clean and purified images, we experiment with $\ell_{1}$ and $\ell_{2}$ distances as the loss function. The results in Table 3 indicate that LPIPS achieves the highest certified accuracy among the three losses at every radius. In contrast, fine-tuning with $\ell_{1}$ or $\ell_{2}$ distance brings little or no improvement over the model without fine-tuning and often reduces certified accuracy. This suggests that fine-tuning with the LPIPS loss effectively aligns semantic meaning, whereas $\ell_{1}$ or $\ell_{2}$ distances may hurt it.

Noise Levels Sampling Schedules during Consistency Fine-tuning. In our Consistency Fine-tuning experiments, we sample the noise level uniformly from the same discrete set used in randomized smoothing, $\sigma\sim\mathcal{U}\{0.25,0.5,1.0\}$. To empirically demonstrate the effectiveness of this choice, we compare it with a continuous sampling schedule where $\sigma\sim\mathcal{U}[0,1]$. The results in Table 4 show that our discrete sampling schedule achieves higher certified accuracy at every radius. This indicates that fine-tuning on a discrete set aligned with the noise levels used in randomized smoothing enhances certified robustness.
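The two schedules compared in this ablation differ only in how $\sigma$ is drawn before the training image is noised. The following sketch illustrates that difference; the function names are hypothetical, and a flat list of floats stands in for an image tensor.

```python
import random
from typing import List, Sequence

# CIFAR-10 noise levels used in randomized smoothing (from the text).
SMOOTHING_SIGMAS = (0.25, 0.5, 1.0)


def sample_sigma_discrete(rng: random.Random) -> float:
    """Discrete schedule: sigma ~ U{0.25, 0.5, 1.0}."""
    return rng.choice(SMOOTHING_SIGMAS)


def sample_sigma_continuous(rng: random.Random) -> float:
    """Continuous schedule: sigma ~ U[0, 1]."""
    return rng.uniform(0.0, 1.0)


def add_gaussian_noise(x: Sequence[float], sigma: float, rng: random.Random) -> List[float]:
    """Return x + N(0, sigma^2 I), applied element-wise."""
    return [v + rng.gauss(0.0, sigma) for v in x]


rng = random.Random(0)
x = [0.1, 0.5, 0.9]  # stand-in for a flattened image
sigma = sample_sigma_discrete(rng)
x_noised = add_gaussian_noise(x, sigma, rng)
print(sigma, x_noised)
```

With the discrete schedule, every fine-tuning step sees exactly one of the noise levels that the purifier will later face during certification.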

Table 3: Certified Accuracy of Consistency Purification with different loss functions during fine-tuning for CIFAR-10. "- -" represents the setting without fine-tuning.

	Certified Accuracy at $\epsilon$ (%)
Distance	0.0	0.25	0.5	0.75	1.0
- -	90.4	77.2	59.8	42.8	33.2
$\ell_{1}$	89.4	76.4	59.6	42.4	31.4
$\ell_{2}$	90.0	77.0	59.8	42.4	33.4
LPIPS	90.2	79.4	62.4	43.8	35.4

Table 4: Certified Accuracy of Consistency Purification with continuous and discrete sampling schedules during fine-tuning for CIFAR-10. "- -" represents the setting without fine-tuning.

	Certified Accuracy at $\epsilon$ (%)
Schedules	0.0	0.25	0.5	0.75	1.0
- -	90.4	77.2	59.8	42.8	33.2
$[0,1]$	89.0	76.2	59.8	43.2	33.8
{0.25, 0.5, 1.0}	90.2	79.4	62.4	43.8	35.4

Generalizability with Different Classifiers. We compute certified accuracy with various classifiers to test whether our framework remains effective with arbitrary classifiers. The results, presented in Table 5, compare Consistency Fine-tuning with Diffusion Calibration, an alternative method that fine-tunes diffusion models to improve certified robustness. Across the different classifiers, including ViT-B/16, ResNet56, and WideResNet28-10 (WRN28-10), our method matches or outperforms Diffusion Calibration everywhere except for the certified accuracy at $\epsilon=0.0$ on the WRN28-10 model. It is worth noting that Diffusion Calibration, which requires a specific classifier for guidance during fine-tuning, exhibits limitations: it achieves comparable performance only with its guidance classifier WRN28-10. This demonstrates the advantage of Consistency Fine-tuning in generalizing across different classifiers.

Fine-tuning Classifier vs. Fine-tuning Diffusion Model. A potential concern with Consistency Fine-tuning is that Fine-tuning the Classifier (CLS-FT) offers higher certified accuracy at a lower training cost than our approach of Fine-tuning the Diffusion Model (DM-FT). However, our experiments, shown in Table 6, indicate that DM-FT does not conflict with CLS-FT; rather, combining the two methods achieves even higher certified accuracy. On the other hand, although CLS-FT yields slightly higher certified accuracy than DM-FT at most radii, its requirement of fine-tuning a specific classifier sacrifices a natural property of diffusion purification frameworks, namely compatibility with arbitrary off-the-shelf classifiers, and thus limits its practical applicability.

Table 5: Certified Accuracy of Consistency Fine-tuning with different classifiers on CIFAR-10. The guidance classifier used in Diffusion Calibration is WideResNet28-10.

		Certified Accuracy at $\epsilon$ (%)
Method	Classifier	0.0	0.25	0.5	0.75	1.0
Diffusion Calibration [37]	ViT-B/16	90.2	76.4	57.2	42.6	32.4
Diffusion Calibration [37]	WRN28-10	88.2	76.4	59.2	42.0	31.8
Diffusion Calibration [37]	ResNet56	86.0	72.8	52.6	35.2	25.8
Consistency Fine-tuning	ViT-B/16	90.2	79.4	62.4	43.8	35.4
Consistency Fine-tuning	WRN28-10	88.0	76.4	59.8	42.8	33.0
Consistency Fine-tuning	ResNet56	87.2	74.8	57.6	38.2	30.2

Table 6: Certified Accuracy of Fine-tuning the Diffusion Model (DM-FT) compared with Fine-tuning the Classifier (CLS-FT) in diffusion purification frameworks on CIFAR-10.

		Certified Accuracy at $\epsilon$ (%)
DM-FT	CLS-FT	0.0	0.25	0.5	0.75	1.0
-	-	90.4	77.2	59.8	42.8	33.2
✓	-	90.2	79.4	62.4	43.8	35.4
-	✓	90.4	79.8	63.4	44.2	35.2
✓	✓	90.8	80.0	64.8	44.6	36.8

6 Conclusion

In this paper, we introduced Consistency Purification, a novel framework for enhancing certified robustness via randomized smoothing. By incorporating consistency models into the diffusion purification approach and further refining them through Consistency Fine-tuning, our experiments demonstrate the framework's ability to achieve high certified robustness efficiently, with a single network evaluation for purification.

Limitations. A notable limitation of our study is that our empirical results do not include computing certified robustness of high-resolution images such as ImageNet 256 $\times$ 256. This constraint is due to the absence of publicly available checkpoints for the consistency model at this resolution. Additionally, training a consistency model for ImageNet 256 $\times$ 256 would require huge computing resources, which are currently beyond our affordability. However, our framework is designed for adaptability and could be easily extended to ImageNet 256 $\times$ 256 once these checkpoints become available. As a result, our empirical evaluations in this paper are limited to the CIFAR-10 and ImageNet 64 $\times$ 64 datasets.

References

[1] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in neural information processing systems, 33:6840–6851, 2020.
[2] Alexander Quinn Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. In International Conference on Machine Learning, pages 8162–8171. PMLR, 2021.
[3] Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021.
[4] Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models. Advances in Neural Information Processing Systems, 35:26565–26577, 2022.
[5] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022.
[6] Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, and Bryan Catanzaro. Diffwave: A versatile diffusion model for audio synthesis. In International Conference on Learning Representations, 2020.
[7] Nanxin Chen, Yu Zhang, Heiga Zen, Ron J Weiss, Mohammad Norouzi, and William Chan. Wavegrad: Estimating gradients for waveform generation. In International Conference on Learning Representations, 2020.
[8] Vadim Popov, Ivan Vovk, Vladimir Gogoryan, Tasnima Sadekova, and Mikhail Kudinov. Grad-tts: A diffusion probabilistic model for text-to-speech. In International Conference on Machine Learning, pages 8599–8608. PMLR, 2021.
[9] Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, and David J Fleet. Video diffusion models. Advances in Neural Information Processing Systems, 35:8633–8646, 2022.
[10] Jonathan Ho, William Chan, Chitwan Saharia, Jay Whang, Ruiqi Gao, Alexey Gritsenko, Diederik P Kingma, Ben Poole, Mohammad Norouzi, David J Fleet, et al. Imagen video: High definition video generation with diffusion models. arXiv preprint arXiv:2210.02303, 2022.
[11] Shitong Luo and Wei Hu. Diffusion probabilistic models for 3d point cloud generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2837–2845, 2021.
[12] Animesh Karnewar, Andrea Vedaldi, David Novotny, and Niloy J Mitra. Holodiffusion: Training a 3d diffusion model using 2d images. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 18423–18433, 2023.
[13] Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Mildenhall. Dreamfusion: Text-to-3d using 2d diffusion. arXiv preprint arXiv:2209.14988, 2022.
[14] Weili Nie, Brandon Guo, Yujia Huang, Chaowei Xiao, Arash Vahdat, and Anima Anandkumar. Diffusion models for adversarial purification. In International Conference on Machine Learning (ICML), 2022.
[15] Shutong Wu, Jiongxiao Wang, Wei Ping, Weili Nie, and Chaowei Xiao. Defending against adversarial audio via diffusion model. In The Eleventh International Conference on Learning Representations, 2022.
[16] Jiachen Sun, Jiongxiao Wang, Weili Nie, Zhiding Yu, Zhuoqing Mao, and Chaowei Xiao. A critical revisit of adversarial robustness in 3d point cloud recognition with diffusion-driven purification. In International Conference on Machine Learning, pages 33100–33114. PMLR, 2023.
[17] Jinyi Wang, Zhaoyang Lyu, Dahua Lin, Bo Dai, and Hongfei Fu. Guided diffusion model for adversarial purification. arXiv preprint arXiv:2205.14969, 2022.
[18] Quanlin Wu, Hang Ye, and Yuntian Gu. Guided diffusion model for adversarial purification from random noise. arXiv preprint arXiv:2206.10875, 2022.
[19] Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 1310–1320. PMLR, 09–15 Jun 2019.
[20] Hadi Salman, Jerry Li, Ilya Razenshteyn, Pengchuan Zhang, Huan Zhang, Sebastien Bubeck, and Greg Yang. Provably robust deep learning via adversarially trained smoothed classifiers. Advances in Neural Information Processing Systems, 32, 2019.
[21] Runtian Zhai, Chen Dan, Di He, Huan Zhang, Boqing Gong, Pradeep Ravikumar, Cho-Jui Hsieh, and Liwei Wang. Macer: Attack-free and scalable robust training via maximizing certified radius. arXiv preprint arXiv:2001.02378, 2020.
[22] Jongheon Jeong and Jinwoo Shin. Consistency regularization for certified robustness of smoothed classifiers. Advances in Neural Information Processing Systems, 33:10558–10570, 2020.
[23] Miklós Z Horváth, Mark Niklas Müller, Marc Fischer, and Martin Vechev. Boosting randomized smoothing with variance reduced classifiers. arXiv preprint arXiv:2106.06946, 2021.
[24] Jongheon Jeong, Sejun Park, Minkyu Kim, Heung-Chang Lee, Do-Guk Kim, and Jinwoo Shin. Smoothmix: Training confidence-calibrated smoothed classifiers for certified robustness. Advances in Neural Information Processing Systems, 34:30153–30168, 2021.
[25] Nicholas Carlini, Florian Tramer, J Zico Kolter, et al. (certified!!) adversarial robustness for free! arXiv preprint arXiv:2206.10550, 2022.
[26] Chaowei Xiao, Zhongzhu Chen, Kun Jin, Jiongxiao Wang, Weili Nie, Mingyan Liu, Anima Anandkumar, Bo Li, and Dawn Song. Densepure: Understanding diffusion models for adversarial robustness. In The Eleventh International Conference on Learning Representations, 2022.
[27] Jiawei Zhang, Zhongzhu Chen, Huan Zhang, Chaowei Xiao, and Bo Li. DiffSmooth: Certifiably robust learning via diffusion models and local smoothing. In 32nd USENIX Security Symposium (USENIX Security 23), pages 4787–4804, 2023.
[28] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models, 2020.
[29] Huanran Chen, Yinpeng Dong, Shitong Shao, Zhongkai Hao, Xiao Yang, Hang Su, and Jun Zhu. Your diffusion model is secretly a certifiably robust classifier. arXiv preprint arXiv:2402.02316, 2024.
[30] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018.
[31] Brian DO Anderson. Reverse-time diffusion equation models. Stochastic Processes and their Applications, 12(3):313–326, 1982.
[32] Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. In Proceedings of the 40th International Conference on Machine Learning, ICML’23. JMLR.org, 2023.
[33] Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions. In NeurIPS, 2018.
[34] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.
[35] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. Ieee, 2009.
[36] Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. In International Conference on Machine Learning, pages 32211–32252. PMLR, 2023.
[37] Jongheon Jeong and Jinwoo Shin. Multi-scale diffusion denoised smoothing. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
[38] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
[39] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
[40] Sergey Zagoruyko and Nikos Komodakis. Wide residual networks, 2017.

Appendix A Consistency Purification Algorithm

We provide detailed descriptions of Consistency Purification in the following algorithms. Algorithm 1 presents the Consistency Fine-tuning and Consistency Purification functions. Algorithm 2 shows the randomized smoothing algorithm from [19], which applies Consistency Purification to make predictions and compute the certified radius.

Algorithm 1 Consistency Fine-tuning and Consistency Purification

1: Consistency model purifier $D_{\theta}$, where $\theta$ represents the model parameters. Noise levels used in randomized smoothing $\{\sigma_{i}\}_{i=1}^{m}$. Arbitrary classification model $f_{\text{clf}}$. Fine-tuning learning rate $\eta$.
2: function ConsistencyFine-tuning($D_{\theta}$)
3:     repeat
4:         sample $x\in$ Training Dataset, $\sigma\in\{\sigma_{i}\}_{i=1}^{m}$
5:         $x_{\sigma}\leftarrow x+\mathcal{N}(0,\sigma^{2}{\bm{I}})$
6:         $t^{*}_{\sigma}\leftarrow\textsc{GetTimestep}(\sigma)$
7:         $\mathcal{L}\leftarrow\text{LPIPS}(x,D_{\theta}(x_{\sigma},t^{*}_{\sigma}))$
8:         $\theta\leftarrow\theta-\eta\nabla_{\theta}\mathcal{L}$
9:     until convergence
10:     return $D_{\theta}$
11: end function
12:
13: function ConsistencyPurification($f_{\text{clf}},{\bm{x}},\sigma$)
14:     $t^{*}_{\sigma}\leftarrow\textsc{GetTimestep}(\sigma)$
15:     ${\bm{x}}_{rs}\leftarrow{\bm{x}}+\mathcal{N}(0,\sigma^{2}{\bm{I}})$
16:     ${\bm{x}}_{p}\leftarrow D_{\theta^{*}}({\bm{x}}_{rs},t^{*}_{\sigma})$
17:     $y\leftarrow f_{\text{clf}}({\bm{x}}_{p})$
18:     return $y$
19: end function
20:
21: function GetTimestep($\sigma$)
22:     $t_{i}\leftarrow(\epsilon^{1/\rho}+\frac{i-1}{N-1}(T^{1/\rho}-\epsilon^{1/\rho}))^{\rho}$ for $i\in\{1,\ldots,N\}$
23:     $t^{*}_{\sigma}\leftarrow$ find $\{t_{i}\mid\sigma\in\left(\frac{t_{i-1}+t_{i}}{2},\frac{t_{i}+t_{i+1}}{2}\right]\}$
24:     return $t^{*}_{\sigma}$
25: end function
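As an illustration of the GetTimestep function in Algorithm 1, the sketch below builds the discretized time grid and returns the grid point whose half-open midpoint interval contains $\sigma$, which amounts to choosing the nearest grid point. The default values $\epsilon=0.002$, $T=80$, $\rho=7$ and the grid size $N=18$ are assumptions taken from common EDM and consistency-model settings, not values stated in this appendix, and clamping $\sigma$ to the first or last grid point when it falls outside all intervals is also an assumption.

```python
from typing import List


def karras_timesteps(
    n: int, eps: float = 0.002, t_max: float = 80.0, rho: float = 7.0
) -> List[float]:
    """t_i = (eps^(1/rho) + (i-1)/(N-1) * (T^(1/rho) - eps^(1/rho)))^rho, i = 1..N."""
    lo, hi = eps ** (1.0 / rho), t_max ** (1.0 / rho)
    # range(n) yields i-1 for i = 1..N
    return [(lo + (i / (n - 1)) * (hi - lo)) ** rho for i in range(n)]


def get_timestep(sigma: float, timesteps: List[float]) -> float:
    """Return the t_i with sigma in ((t_{i-1}+t_i)/2, (t_i+t_{i+1})/2]."""
    for i in range(len(timesteps) - 1):
        upper_midpoint = (timesteps[i] + timesteps[i + 1]) / 2.0
        if sigma <= upper_midpoint:
            # sigma exceeded every earlier midpoint, so t_i is the match
            return timesteps[i]
    # sigma lies above the last midpoint: clamp to the largest timestep
    return timesteps[-1]


grid = karras_timesteps(18)  # N = 18 is an assumed grid size
for sigma in (0.25, 0.5, 1.0):
    print(sigma, get_timestep(sigma, grid))
```

Because the grid is much denser near small noise levels when $\rho=7$, the selected $t^{*}_{\sigma}$ is closer to $\sigma$ for small $\sigma$ than for large $\sigma$.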

Appendix B Proof of Theorem 3.3

Theorem 3.3. Given the transport $T_{\pi_{t}}(p)$ between the data distribution $p$ and the corresponding purified distribution under $g_{t}$ , then for any $r>0$ , the probability that the distance between the original sample ${\bm{x}}$ and purified sample $\hat{{\bm{x}}}=\pi_{t}({\bm{x}})$ is larger than $r$ is upper bounded by $\frac{T_{\pi_{t}}(p)}{r}$ .

Proof.

We apply Markov's inequality. Because

$$\begin{aligned}
\mathbb{E}[\|{\bm{x}}-\hat{{\bm{x}}}\|] &= \int_{\|{\bm{x}}-\hat{{\bm{x}}}\|\leq r}\|{\bm{x}}-\hat{{\bm{x}}}\|\cdot p({\bm{x}})\,d{\bm{x}}+\int_{\|{\bm{x}}-\hat{{\bm{x}}}\|>r}\|{\bm{x}}-\hat{{\bm{x}}}\|\cdot p({\bm{x}})\,d{\bm{x}} \\
&\geq \int_{\|{\bm{x}}-\hat{{\bm{x}}}\|>r}\|{\bm{x}}-\hat{{\bm{x}}}\|\cdot p({\bm{x}})\,d{\bm{x}} \\
&\geq \int_{\|{\bm{x}}-\hat{{\bm{x}}}\|>r}r\cdot p({\bm{x}})\,d{\bm{x}} \\
&= r\cdot P(\|{\bm{x}}-\hat{{\bm{x}}}\|>r),
\end{aligned}$$

we have

$$\begin{aligned}
P(\|{\bm{x}}-\hat{{\bm{x}}}\|>r) &\leq \frac{\mathbb{E}[\|{\bm{x}}-\hat{{\bm{x}}}\|]}{r} \\
&= \frac{\mathbb{E}[\|{\bm{x}}-\pi_{t}({\bm{x}})\|]}{r} \\
&= \frac{T_{\pi_{t}}(p)}{r}.
\end{aligned}$$

∎
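The bound can also be checked numerically on any sample of non-negative distances, since Markov's inequality holds for the empirical distribution as well. The sketch below uses synthetic distances drawn from a half-normal distribution as a stand-in for $\|{\bm{x}}-\pi_{t}({\bm{x}})\|$; the data and the function name are hypothetical and are not measurements from our experiments.

```python
import random
from typing import List, Tuple


def markov_check(distances: List[float], r: float) -> Tuple[float, float]:
    """Return (empirical P(d > r), empirical E[d] / r) for non-negative distances."""
    n = len(distances)
    tail_prob = sum(1 for d in distances if d > r) / n
    transport = sum(distances) / n  # empirical estimate of E||x - x_hat||
    return tail_prob, transport / r


rng = random.Random(0)
# Synthetic stand-in for distances between original and purified samples.
distances = [abs(rng.gauss(0.0, 1.0)) for _ in range(10_000)]
for r in (0.5, 1.0, 2.0):
    tail, bound = markov_check(distances, r)
    print(f"r={r}: P(d>r)={tail:.3f} <= bound={bound:.3f}")
```

The bound is loose for small $r$ (it can exceed 1) and becomes informative as $r$ grows relative to the transport.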

Algorithm 2 Randomized Smoothing [19]

1: Sampling times for prediction $n$. Sampling times for certification $N$. Significance level $\alpha$. Function $\textsc{LowerConfBound}(k,n,1-\alpha)$ returns a one-sided ($1-\alpha$) lower confidence bound for the Binomial parameter $p$ given that $k\sim\text{Binomial}(n,p)$.
2: function Predict($f_{\text{clf}},{\bm{x}},\sigma,n,\alpha$)
3:     counts $\leftarrow 0$
4:     for $i\in\{1,2,\ldots,n\}$ do
5:         $y\leftarrow\textsc{ConsistencyPurification}(f_{\text{clf}},{\bm{x}},\sigma)$
6:         counts[y] $\leftarrow$ counts[y] + 1
7:     end for
8:     $\hat{y}_{A},\hat{y}_{B}\leftarrow$ top two labels in counts
9:     $n_{A},n_{B}\leftarrow$ counts$[\hat{y}_{A}]$, counts$[\hat{y}_{B}]$
10:     if $\textsc{BinomTest}(n_{A},n_{A}+n_{B},\frac{1}{2})\leq\alpha$ then
11:         return $\hat{y}_{A}$
12:     else
13:         return Abstain
14:     end if
15: end function
16:
17: function Certify($f_{\text{clf}},{\bm{x}},\sigma,n,N,\alpha$)
18:     counts0 $\leftarrow 0$
19:     for $i\in\{1,2,\ldots,n\}$ do
20:         $y\leftarrow\textsc{ConsistencyPurification}(f_{\text{clf}},{\bm{x}},\sigma)$
21:         counts0[y] $\leftarrow$ counts0[y] + 1
22:     end for
23:     $\hat{y}_{A}\leftarrow$ top label in counts0
24:     counts $\leftarrow 0$
25:     for $i\in\{1,2,\ldots,N\}$ do
26:         $y\leftarrow\textsc{ConsistencyPurification}(f_{\text{clf}},{\bm{x}},\sigma)$
27:         counts[y] $\leftarrow$ counts[y] + 1
28:     end for
29:     $\underline{p_{A}}\leftarrow\textsc{LowerConfBound}(\text{counts}[\hat{y}_{A}],N,1-\alpha)$
30:     if $\underline{p_{A}}>\frac{1}{2}$ then
31:         return prediction $\hat{y}_{A}$ and radius $\sigma\Phi^{-1}(\underline{p_{A}})$
32:     else
33:         return Abstain
34:     end if
35: end function
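The Certify function of Algorithm 2 can be sketched with the Python standard library alone. In this sketch the callable `sample_label` is a hypothetical stand-in for one call to ConsistencyPurification (noise, purify, classify), and LowerConfBound is implemented as a one-sided Clopper-Pearson bound found by bisection, which is one common choice and an assumption here rather than a detail specified in this appendix.

```python
import math
from collections import Counter
from statistics import NormalDist
from typing import Callable, Optional, Tuple


def binom_tail(k: int, n: int, p: float) -> float:
    """P[X >= k] for X ~ Binomial(n, p), summed in log space for stability."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def lower_conf_bound(k: int, n: int, alpha: float) -> float:
    """One-sided (1 - alpha) Clopper-Pearson lower bound on the Binomial parameter."""
    if k == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection on the monotone map p -> P[X >= k]
        mid = (lo + hi) / 2.0
        if binom_tail(k, n, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return lo  # conservative side of the root


def certify(sample_label: Callable[[], int], sigma: float, n: int, big_n: int,
            alpha: float) -> Tuple[Optional[int], float]:
    """Return (label, radius), or (None, 0.0) to abstain."""
    counts0 = Counter(sample_label() for _ in range(n))      # selection samples
    y_a = counts0.most_common(1)[0][0]
    counts = Counter(sample_label() for _ in range(big_n))   # estimation samples
    p_a = lower_conf_bound(counts[y_a], big_n, alpha)
    if p_a > 0.5:
        return y_a, sigma * NormalDist().inv_cdf(p_a)
    return None, 0.0


# Toy usage: a purifier-plus-classifier pipeline that always outputs class 3.
print(certify(lambda: 3, sigma=0.25, n=10, big_n=100, alpha=0.001))
```

Separating the selection samples from the estimation samples keeps the confidence bound valid, because the class being tested is fixed before the counts used for the bound are drawn.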

Appendix C Training Unconditional Consistency Model for ImageNet-64

We obtain an unconditional consistency model for ImageNet-64 from the publicly available conditional version by replacing the class embedding layers with a single learnable token, initialized with the average of the class embeddings. In each forward pass, this token is combined with the time embeddings for computation. We then train the unconditional consistency model, initialized with the conditional model's parameters, on the ImageNet-64 training set for 120k steps.

Appendix D Certified Accuracy with Fine-grained $\epsilon$

We present the detailed certified accuracy with fine-grained radius thresholds $\epsilon$ in Figure 3.
