Consistency Purification: Effective and Efficient Diffusion Purification towards Certified Robustness

Yiquan Li1  Zhongzhu Chen2∗  Kun **2∗  Jiongxiao Wang1∗  Bo Li3  Chaowei Xiao1
1University of Wisconsin-Madison, 2University of Michigan, Ann Arbor, 3University of Chicago
the first four authors contributed equally
Abstract

Diffusion Purification, which purifies noised images with diffusion models, has been widely used to enhance certified robustness via randomized smoothing. However, existing frameworks often struggle to balance efficiency and effectiveness. While the Denoising Diffusion Probabilistic Model (DDPM) offers efficient single-step purification, it falls short in ensuring that purified images reside on the data manifold. Conversely, the Stochastic Diffusion Model effectively places purified images on the data manifold but demands solving cumbersome stochastic differential equations, while its derivative, the Probability Flow Ordinary Differential Equation (PF-ODE), though it solves simpler ordinary differential equations, still requires multiple computational steps. In this work, we demonstrate that an ideal purification pipeline should, for effectiveness, generate purified images that lie on the data manifold and are as semantically aligned with the original images as possible, and, for efficiency, do so in one step. We therefore introduce Consistency Purification, a purifier that is Pareto superior to previous work in terms of efficiency and effectiveness. Consistency Purification employs the consistency model, a one-step generative model distilled from PF-ODE, and can thus generate on-manifold purified images with a single network evaluation. However, the consistency model is not designed for purification, so it does not inherently ensure semantic alignment between purified and original images. To resolve this issue, we further refine it through Consistency Fine-tuning with LPIPS loss, which enables better semantic alignment while keeping the purified images on the data manifold. Our comprehensive experiments demonstrate that our Consistency Purification framework achieves state-of-the-art certified robustness and efficiency compared to baseline methods.

1 Introduction

Diffusion models were first proposed for high-quality image generation [1; 2; 3; 4; 5] and have been extended to generative tasks across various modalities, including audio [6; 7; 8], video [9; 10], and 3D objects [11; 12; 13]. A diffusion model for image generation typically involves two key processes: (1) a forward diffusion process, which transforms the source image into an isotropic Gaussian by gradually adding Gaussian noise, and (2) a reverse diffusion process, which uses a Deep Neural Network (DNN) to perform iterative denoising starting from random Gaussian noise.

Due to the inherent denoising capability of diffusion models, they have been widely applied to improve the robustness of DNNs. This enhancement is achieved by Diffusion Purification [14; 15; 16; 17; 18], which purifies the network inputs to reduce the effects of various types of unforeseen corruptions or adversarial attacks. Among these, one particularly suitable and effective purification scenario is improving certified robustness through randomized smoothing [19] for image classification tasks. This method guarantees tight robustness in the $\ell_2$ norm with a smoothed classifier. However, many previous works [19; 20; 21; 22; 23; 24] have shown that it still requires retraining with Gaussian-augmented examples for each noise level to optimize the smoothed classifier. Diffusion models, capable of purifying Gaussian-perturbed images before classification, can be seamlessly integrated with any base classifier to produce a smoothed classifier for arbitrary noise levels. This integration has been demonstrated to effectively enhance certified robustness, as supported by numerous studies [18; 25; 26; 27].

Figure 1: An illustration of Consistency Purification framework.

However, current diffusion purification for certified robustness via randomized smoothing still faces significant trade-offs between efficiency and effectiveness. Although the Denoising Diffusion Probabilistic Model (DDPM) [28] requires only a single network evaluation in the purification process [25], it generates the mean of the posterior data distribution conditioned on the noisy sample, which does not necessarily lie on the data manifold and may be ambiguous during classification. To further improve diffusion purification, various methods such as DensePure [26], Local Smoothing [27] and Noised Diffusion Classifiers [29] have been applied. However, these methods are considerably less efficient, as their computational cost is a multiple of that of one-step DDPM. Another promising approach uses the Probability Flow Ordinary Differential Equation (PF-ODE) [3]. It offers a way to accelerate the sampling process [4] and yields a distribution closer to the original data, balancing efficiency and effectiveness well. However, several computational steps are still needed to solve the ODE numerically.

To find a Pareto superior solution in terms of efficiency and effectiveness, we introduce a new framework, Consistency Purification, which integrates consistency models into diffusion purification together with Consistency Fine-tuning. The consistency model is a novel category of diffusion models that learns the trajectory of the PF-ODE that transports the data distribution to the noisy distribution. It is trained to map any point along this trajectory back to its starting point. This property is desirable for diffusion purification, as it allows images with any scale of Gaussian noise to be purified directly into clean images. Distilled from a pre-trained diffusion model by simulating the PF-ODE trajectory, the consistency model can generate high-quality in-distribution images in a single step, thereby ensuring both efficiency and effectiveness. However, since consistency models are primarily trained for image generation, this may not suffice to guarantee that the purified image maintains the same semantic meaning as the original image. To address this issue, we propose adding a Consistency Fine-tuning step to the purification framework, which further fine-tunes the consistency model using the Learned Perceptual Image Patch Similarity (LPIPS) [30] loss. This step aims to minimize the perceptual differences between the purified and original images, thereby ensuring better semantic alignment while keeping the purified images on the data manifold.

We show that Consistency Purification is Pareto superior to the baselines in two respects. First, compared with effective methods like DensePure [26], Local Smoothing [27] and Noised Diffusion Classifiers [29], Consistency Purification is much more efficient since it enables single-step purification. Second, compared with efficient methods like onestep-DDPM [25], we provide both theoretical analysis and experimental results to support the improved effectiveness of Consistency Purification. In Example 3.1, we give a one-dimensional example demonstrating that the consistency model can generate on-manifold purified samples while onestep-DDPM does not have this property.

In Theorem 3.3, we show an important theoretical result: given a purifier, the lower the transport from the original distribution to the purified distribution, the higher the probability that the purified sample is sufficiently close to the original sample, and thus the better the purification outcome. Our experimental results verify that both the integration of the consistency model in Consistency Purification and the further Consistency Fine-tuning decrease such transport and achieve better semantic alignment between purified samples and original samples.

Beyond the validation of our theory, we conduct comprehensive experiments to demonstrate the empirical improvements of Consistency Purification. Compared to various baseline settings, our approach has shown significant improvements, achieving an average 5% gain in performance over the previous onestep-DDPM under the same cost with single-step purification. These observations underscore our success in finding a Pareto superior diffusion purification framework in both efficiency and effectiveness for certified robustness.

2 Backgrounds

Randomized Smoothing [19]. Randomized smoothing is designed to certify the robustness of a given classifier under $\ell_2$ norm perturbations. Given a base classifier $f$ and an input $\bm{x}$, randomized smoothing first defines the smoothed classifier by $g(\bm{x}) = \arg\max_{c} \mathbb{P}_{\bm{\epsilon}\sim\mathcal{N}(\bm{0},\sigma^{2}\bm{I})}(f(\bm{x}+\bm{\epsilon})=c)$, where $\sigma$ is the noise level, which controls the trade-off between robustness and accuracy. [19] shows that $g(\bm{x})$ induces certifiable robustness for $\bm{x}$ under the $\ell_2$ norm with radius $R$, where $R = \frac{\sigma}{2}\left(\Phi^{-1}(p_A) - \Phi^{-1}(p_B)\right)$, $p_A$ and $p_B$ are the probabilities of the most probable class and the "runner-up" class respectively, and $\Phi^{-1}$ is the inverse of the standard Gaussian CDF. $p_A$ and $p_B$ can be estimated with arbitrarily high confidence via the Monte Carlo method.
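To make the radius formula concrete, the following minimal sketch evaluates $R$ for given class probabilities. The function name `certified_radius` and the numbers are illustrative assumptions; in practice $p_A$ and $p_B$ would be replaced by a Monte Carlo lower bound and upper bound, respectively, which is not shown here.

```python
from statistics import NormalDist


def certified_radius(p_a: float, p_b: float, sigma: float) -> float:
    """Return R = sigma / 2 * (Phi^{-1}(p_a) - Phi^{-1}(p_b))."""
    if not (0.0 < p_b <= p_a < 1.0):
        raise ValueError("expected 0 < p_b <= p_a < 1")
    inv_cdf = NormalDist().inv_cdf  # inverse of the standard Gaussian CDF
    return 0.5 * sigma * (inv_cdf(p_a) - inv_cdf(p_b))


# Hypothetical probabilities and noise level, for illustration only.
radius = certified_radius(p_a=0.9, p_b=0.1, sigma=0.5)
```

The radius grows with both the noise level and the gap between the two class probabilities, and it collapses to zero when the two classes are equally likely.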

Continuous-Time Diffusion Model [3]. The diffusion model has two components: the diffusion process followed by the reverse process. Given an input random variable $\bm{x}_0 \sim p$, the diffusion process adds isotropic Gaussian noise to the data so that the diffused random variable at time $t$ is $\bm{x}_t = \sqrt{\alpha_t}(\bm{x}_0 + \bm{\epsilon}_t)$, s.t., $\bm{\epsilon}_t \sim \mathcal{N}(\bm{0}, \sigma_t^2\bm{I})$, and $\sigma_t^2 = (1-\alpha_t)/\alpha_t$, and we denote $\bm{x}_t \sim p_t$. The forward diffusion process can also be defined by the stochastic differential equation

$$\mathrm{d}\bm{x} = D(\bm{x},t)\,\mathrm{d}t + G(t)\,\mathrm{d}\bm{w}, \tag{SDE}$$

where $\bm{x}_0 \sim p$, $D: \mathbb{R}^{d}\times\mathbb{R} \mapsto \mathbb{R}^{d}$ is the drift coefficient and typically has the form $D(\bm{x},t) = D(t)\bm{x}$. $G: \mathbb{R} \mapsto \mathbb{R}$ is the diffusion coefficient, $\mathrm{d}t$ is an infinitesimal time step, and $\bm{w}(t) \in \mathbb{R}^{n}$ is the standard Wiener process.

The reverse process exists and removes the added noise by solving the reverse-time SDE [31]

$$\mathrm{d}\bm{x} = \left[D(t)\bm{x} - G(t)^{2}\nabla_{\bm{x}}\log p_t(\bm{x})\right]\mathrm{d}t + G(t)\,\mathrm{d}\overline{\bm{w}}, \tag{reverse-SDE}$$

where $p_t(\bm{x})$ denotes the marginal distribution at time $t$, and $\overline{\bm{w}}(t)$ is a reverse-time standard Wiener process. [3] defined the probability flow ODE (PF-ODE), which has the same marginal distribution as reverse-SDE but can be solved much faster:

$$\mathrm{d}\bm{x} = \left[D(t)\bm{x} - \frac{1}{2}G(t)^{2}\nabla_{\bm{x}}\log p_t(\bm{x})\right]\mathrm{d}t. \tag{PF-ODE}$$

As shown in [4], the perturbation kernel of SDE has the general form

$$p_{0t}(\bm{x}(t) \mid \bm{x}(0)) = \mathcal{N}\left(\bm{x}(t);\, s(t)\bm{x}(0),\, s(t)^{2}\sigma(t)^{2}\mathbf{I}\right) \tag{perturbation-kernel}$$

where $s(t) = \exp\left(\int_{0}^{t} D(\xi)\,\mathrm{d}\xi\right)$ and $\sigma(t) = \sqrt{\int_{0}^{t} \frac{G(\xi)^{2}}{s(\xi)^{2}}\,\mathrm{d}\xi}$, with $D(t)$ and $G(t)$ the drift and diffusion coefficients of SDE (written $f$ and $g$ in [4]). Under this formulation, PF-ODE can be written as

$$\mathrm{d}\bm{x} = \left[\frac{\dot{s}(t)}{s(t)}\bm{x} - s(t)^{2}\dot{\sigma}(t)\sigma(t)\nabla_{\bm{x}}\log p\left(\frac{\bm{x}}{s(t)};\sigma(t)\right)\right]\mathrm{d}t$$

where the dot denotes the time derivative and $p\left(\frac{\bm{x}}{s(t)};\sigma(t)\right)$ denotes the marginal distribution at time $t$. In our context, we use the EDM parameterization [4], where $s(t) = 1$ and $\sigma(t) = t$, which gives the probability flow ODE

$$\mathrm{d}\bm{x} = -t\,\nabla_{\bm{x}}\log p_t(\bm{x})\,\mathrm{d}t. \tag{EDM-ODE}$$
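As a rough illustration of what solving EDM-ODE numerically involves, the sketch below integrates it backward in time with explicit Euler steps. Everything specific here is an assumption made for illustration: the data are taken to be two points at $+1$ and $-1$ with equal weight (the toy setting of Example 3.1), so that $p_t$ is a two-component Gaussian mixture whose score is $(\tanh(\bm{x}/t^2) - \bm{x})/t^2$ in closed form; the function names, the geometric time grid, and the step count are hypothetical choices, not the paper's solver.

```python
import math


def score(x: float, t: float) -> float:
    """Closed-form score of p_t for toy data at +1 and -1 with sigma(t) = t."""
    return (math.tanh(x / t**2) - x) / t**2


def solve_pf_ode(x_t: float, t: float, t_min: float = 1e-3, steps: int = 2000) -> float:
    """Integrate dx = -t * score(x, t) dt from time t down to t_min (Euler)."""
    # Geometric time grid: steps shrink as t approaches t_min.
    ratio = (t_min / t) ** (1.0 / steps)
    x, cur = x_t, t
    for _ in range(steps):
        nxt = cur * ratio
        x += (nxt - cur) * (-cur * score(x, cur))  # dt = nxt - cur is negative
        cur = nxt
    return x


# A noisy sample at time t = 1 that started on the positive side.
purified = solve_pf_ode(x_t=0.4, t=1.0)
```

In this toy setting the trajectory ends close to the data point $+1$, at the cost of many score evaluations; a real purifier would replace the closed-form score with a learned network.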

We use $\{\bm{x}_t\}_{t\in[0,1]}$ and $\{\hat{\bm{x}}_t\}_{t\in[0,1]}$ to denote the diffusion process and the reverse process generated by SDE and reverse-SDE respectively, which follow the same distribution. We also use $\{\tilde{\bm{x}}_t\}_{t\in[0,1]}$ to denote the reverse process generated by PF-ODE, which has the same marginal distribution as $\{\bm{x}_t\}_{t\in[0,1]}$ and $\{\hat{\bm{x}}_t\}_{t\in[0,1]}$ given $t$.

Discrete-Time Diffusion Model (DDPM [28]). DDPM constructs a discrete Markov chain $\{\bm{x}_0, \bm{x}_1, \cdots, \bm{x}_i, \cdots, \bm{x}_N\}$ as the forward process for the training data $\bm{x}_0 \sim p$, such that $\mathbb{P}(\bm{x}_i \mid \bm{x}_{i-1}) = \mathcal{N}(\bm{x}_i; \sqrt{1-\beta_i}\,\bm{x}_{i-1}, \beta_i\bm{I})$, where $0 < \beta_1 < \beta_2 < \cdots < \beta_N < 1$ are predefined noise scales such that $\bm{x}_N$ approximates Gaussian white noise. Denoting $\overline{\alpha}_i = \prod_{s=1}^{i}(1-\beta_s)$, we have $\mathbb{P}(\bm{x}_i \mid \bm{x}_0) = \mathcal{N}(\bm{x}_i; \sqrt{\overline{\alpha}_i}\,\bm{x}_0, (1-\overline{\alpha}_i)\bm{I})$, i.e., $\bm{x}_i(\bm{x}_0, \bm{\epsilon}) = \sqrt{\overline{\alpha}_i}\,\bm{x}_0 + \sqrt{1-\overline{\alpha}_i}\,\bm{\epsilon}$, $\bm{\epsilon} \sim \mathcal{N}(\bm{0}, \bm{I})$.
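The closed-form forward step can be sketched in a few lines. The sketch uses the standard DDPM form, in which $\overline{\alpha}_i$ is the running product of $(1-\beta_s)$ up to step $i$ and the noise is scaled by $\sqrt{1-\overline{\alpha}_i}$. The linear schedule, the value of $N$, and the helper names `alpha_bars` and `diffuse` are illustrative assumptions rather than the paper's configuration.

```python
import math
import random


def alpha_bars(betas):
    """Running products alpha_bar_i = prod_{s <= i} (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def diffuse(x0, alpha_bar, rng):
    """Sample x_i ~ N(sqrt(alpha_bar) * x0, (1 - alpha_bar) * I) in one shot."""
    scale, std = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [scale * v + std * rng.gauss(0.0, 1.0) for v in x0]


# Hypothetical linear schedule with N = 1000 steps (an illustrative choice).
N = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (N - 1) for i in range(N)]
abar = alpha_bars(betas)
x_noisy = diffuse([0.5, -0.25, 1.0], abar[499], random.Random(0))
```

Because $\overline{\alpha}_i$ decreases monotonically toward zero, later steps keep less of the signal and more of the noise, which is why $\bm{x}_N$ is close to white noise.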

The reverse process of DDPM learns a reverse-direction variational Markov chain $p_{\bm{\theta}}(\bm{x}_{i-1} \mid \bm{x}_i) = \mathcal{N}(\bm{x}_{i-1}; \bm{\mu}_{\bm{\theta}}(\bm{x}_i, i), \Sigma_{\bm{\theta}}(\bm{x}_i, i))$. [28] defines $\bm{\epsilon}_{\bm{\theta}}$ as a function approximator to predict $\bm{\epsilon}$ from $\bm{x}_i$ such that $\bm{\mu}_{\bm{\theta}}(\bm{x}_i, i) = \frac{1}{\sqrt{1-\beta_i}}\left(\bm{x}_i - \frac{\beta_i}{\sqrt{1-\overline{\alpha}_i}}\bm{\epsilon}_{\bm{\theta}}(\bm{x}_i, i)\right)$. Then the reverse-time samples are generated by $\hat{\bm{x}}_{i-1} = \frac{1}{\sqrt{1-\beta_i}}\left(\hat{\bm{x}}_i - \frac{\beta_i}{\sqrt{1-\overline{\alpha}_i}}\bm{\epsilon}_{\bm{\theta}^{*}}(\hat{\bm{x}}_i, i)\right) + \sqrt{\beta_i}\,\bm{\epsilon}$, $\bm{\epsilon} \sim \mathcal{N}(\bm{0}, \bm{I})$, and the optimal parameters $\bm{\theta}^{*}$ are obtained by solving $\bm{\theta}^{*} := \arg\min_{\bm{\theta}} \mathbb{E}_{\bm{x}_0, \bm{\epsilon}}\left[\|\bm{\epsilon} - \bm{\epsilon}_{\bm{\theta}}(\sqrt{\overline{\alpha}_i}\,\bm{x}_0 + \sqrt{1-\overline{\alpha}_i}\,\bm{\epsilon}, i)\|_2^2\right]$. [28] also provided a one-step approximate reconstruction of $\bm{x}_0$ from any $\bm{x}_t$,

$$\bm{x}_0 \approx \hat{\bm{x}}_0 = \left(\bm{x}_t - \sqrt{1-\overline{\alpha}_t}\,\bm{\epsilon}_{\theta}(\bm{x}_t)\right)/\sqrt{\overline{\alpha}_t}. \tag{onestep-DDPM}$$
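The one-step reconstruction is a direct inversion of the forward relation, which the following sketch checks numerically. The function name `onestep_denoise` and all numbers are made up for illustration, and the true noise is passed in place of a trained network $\bm{\epsilon}_{\theta}$, which is an assumption that no real model satisfies exactly.

```python
import math


def onestep_denoise(x_t, eps_pred, alpha_bar_t):
    """x0_hat = (x_t - sqrt(1 - alpha_bar_t) * eps_pred) / sqrt(alpha_bar_t)."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [(xt - b * e) / a for xt, e in zip(x_t, eps_pred)]


# Build x_t from a known x0 and a known noise vector, then feed the true
# noise to the formula as a stand-in for the network prediction.
x0 = [0.2, -0.7]
eps = [1.5, -0.3]
alpha_bar_t = 0.6
x_t = [math.sqrt(alpha_bar_t) * v + math.sqrt(1.0 - alpha_bar_t) * e
       for v, e in zip(x0, eps)]
x0_hat = onestep_denoise(x_t, eps, alpha_bar_t)
```

With the exact noise the original sample is recovered up to floating-point error. A trained predictor only estimates the noise given $\bm{x}_t$, so in practice $\hat{\bm{x}}_0$ behaves like a posterior mean rather than a sample from the data distribution.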

Consistency Model [32]. Given a solution trajectory of PF-ODE, the consistency model is defined as $D: (\bm{x}_t, t) \mapsto \bm{x}_{\epsilon}$. The model exhibits the property of self-consistency, ensuring that its outputs are consistent for arbitrary pairs of $(\bm{x}_t, t)$ from the same PF-ODE trajectory; specifically, $D(\bm{x}_t, t) = D(\bm{x}_{t'}, t')$ for all $t, t' \in [\epsilon, T]$. As shown by the definition, consistency models are suitable for one-shot denoising, allowing for the recovery of $\bm{x}_{\epsilon}$ from any noisy input $\bm{x}_t$ in one network evaluation. Two distinct training strategies can be employed for training the consistency models: distillation mode and isolation mode. The primary distinction lies in whether the models distill the knowledge from pre-trained diffusion models or train from initial parameters. According to the experiments reported in [32], consistency models trained in the distillation mode have been shown to outperform those trained in isolation mode for generating high-quality images. Consequently, our paper only considers consistency models trained in the distillation mode.
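Self-consistency can be checked by hand in a case where the PF-ODE trajectory is known in closed form. The sketch below assumes, purely for illustration, one-dimensional Gaussian data $\mathcal{N}(0, S^2)$ under the EDM choice $s(t)=1$, $\sigma(t)=t$. Then $p_t = \mathcal{N}(0, S^2 + t^2)$, the ODE reads $\mathrm{d}\bm{x}/\mathrm{d}t = t\,\bm{x}/(S^2+t^2)$, and its trajectories satisfy $\bm{x}(t) \propto \sqrt{S^2 + t^2}$. The constants and function names are hypothetical, and a real consistency model is a trained network rather than this formula.

```python
import math

S = 1.0      # std of the hypothetical Gaussian data distribution
EPS = 1e-3   # smallest time epsilon at which trajectories are stopped


def trajectory_point(x_T: float, T: float, t: float) -> float:
    """Point at time t on the PF-ODE trajectory passing through (x_T, T)."""
    return x_T * math.sqrt((S**2 + t**2) / (S**2 + T**2))


def consistency_fn(x_t: float, t: float) -> float:
    """Exact map (x_t, t) -> x_eps for this toy Gaussian case."""
    return x_t * math.sqrt((S**2 + EPS**2) / (S**2 + t**2))


# Evaluate the map at several points of one trajectory.
x_T, T = 2.0, 5.0
outputs = [consistency_fn(trajectory_point(x_T, T, t), t)
           for t in (EPS, 0.5, 2.0, T)]
```

All entries of `outputs` agree up to rounding, which is the self-consistency property, and the map reduces to the identity at $t = \epsilon$.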

3 Theoretical Analysis

In this section, we provide theoretical explanations on the advantages of Consistency Purification, with a focus on its purification performance improvement in terms of certified robustness over [25].

As demonstrated in [3], PF-ODE maintains the marginal distribution of reverse-SDE, thereby establishing a deterministic mapping between the noisy distribution $\bm{x}_t$ and the data distribution $\bm{x}_0$. In other words, PF-ODE guarantees that the purified sample lies on the data manifold, unlike onestep-DDPM, which lacks this assurance. We present here a simple one-dimensional example for illustration.

Example 3.1.

Consider a one-dimensional space with a data set consisting of two samples $\{\bm{y}_1,\bm{y}_2\}$, where $\bm{y}_1=1$ and $\bm{y}_2=-1$. The distribution can be represented as a mixture of Dirac delta distributions: $p_{\text{data}}(\bm{x})=\frac{1}{2}\left(\delta(\bm{x}-\bm{y}_1)+\delta(\bm{x}-\bm{y}_2)\right)$. By setting $s(t)=1$ and $\sigma(t)=t$ in the perturbation kernel, the distribution at time $t$ becomes $p_t(\bm{x})=\frac{1}{2t\sqrt{2\pi}}\big(e^{-\frac{1}{2}\left(\frac{\bm{x}-1}{t}\right)^2}+e^{-\frac{1}{2}\left(\frac{\bm{x}+1}{t}\right)^2}\big)$. Then

$$
\begin{aligned}
\frac{\mathrm{d}\log p_t(\bm{x})}{\mathrm{d}\bm{x}}
&= \frac{-\left(\frac{\bm{x}-1}{t^2}\right)e^{-\frac{1}{2}\left(\frac{\bm{x}-1}{t}\right)^2}-\left(\frac{\bm{x}+1}{t^2}\right)e^{-\frac{1}{2}\left(\frac{\bm{x}+1}{t}\right)^2}}{2t\sqrt{2\pi}\,p_t(\bm{x})} \\
&= -\frac{\bm{x}}{t^2}+\frac{1}{t^2}\cdot\frac{e^{-\frac{1}{2}\left(\frac{\bm{x}-1}{t}\right)^2}-e^{-\frac{1}{2}\left(\frac{\bm{x}+1}{t}\right)^2}}{e^{-\frac{1}{2}\left(\frac{\bm{x}-1}{t}\right)^2}+e^{-\frac{1}{2}\left(\frac{\bm{x}+1}{t}\right)^2}}.
\end{aligned}
$$

From the derivative formula $\frac{\mathrm{d}\log p_t(\bm{x})}{\mathrm{d}\bm{x}}$, it is evident that $\bm{x}=0$ is an equilibrium point, and the right-hand side expression is Lipschitz continuous around $\bm{x}=0$ by L'Hôpital's rule. Thus, according to the Picard-Lindelöf theorem, any trajectory starting on either side of $\bm{x}=0$ will not cross this point. As PF-ODE drives $p_t(\bm{x})$ closer to the Dirac delta distribution $p_{\text{data}}(\bm{x})$ as $t$ approaches zero, any initial point on the positive or negative side of $\bm{x}=0$ will eventually approach $1$ or $-1$, respectively, i.e., the data manifold. Furthermore, in this example, PF-ODE generates a purified sample that is not only on the data manifold but also the data point closest to the noisy sample. This property is desirable as it establishes a relatively large "robust" neighborhood around each true data point, which implies high certified robustness and a large certified radius, as discussed further below. With the consistency model, we do not need to solve the ODE but rather directly map the noisy sample to either $1$ or $-1$ depending on its location relative to $\bm{x}=0$.

For comparison, given any $\bm{x}$ and $t$, the onestep-DDPM will output a posterior mean that is

$$
\frac{e^{-\frac{1}{2}\left(\frac{\bm{x}-1}{t}\right)^2}-e^{-\frac{1}{2}\left(\frac{\bm{x}+1}{t}\right)^2}}{e^{-\frac{1}{2}\left(\frac{\bm{x}-1}{t}\right)^2}+e^{-\frac{1}{2}\left(\frac{\bm{x}+1}{t}\right)^2}}=\frac{e^{\frac{2\bm{x}}{t^2}}-1}{e^{\frac{2\bm{x}}{t^2}}+1}.
$$

The posterior mean will be near $1$ or $-1$ only when $t$ is sufficiently small compared to $\|\bm{x}\|$. Otherwise, it deviates from the data manifold. When $t$ is large, the posterior mean will be close to zero, which lies in an ambiguous classification region. In adversarial purification [25; 26; 14], we typically select $t$ based on the variance of the noise added to the data sample rather than using a very small $t$. This practice helps avoid significant deviations in the posterior mean estimation due to the imperfect estimation of score/noise. With a very small $t$, even a slight bias in score/noise estimation can lead to a substantial deviation, resulting in a denoised sample even farther from the data manifold represented by $p_{\text{data}}(\bm{x})$.
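The contrast drawn in Example 3.1 can be checked numerically. The following is a minimal Python sketch, not code from the paper: it evaluates the one-step posterior mean (the closed form above equals $\tanh(\bm{x}/t^2)$) and integrates the PF-ODE $\mathrm{d}\bm{x}/\mathrm{d}t=-t\,\frac{\mathrm{d}\log p_t(\bm{x})}{\mathrm{d}\bm{x}}$ for $s(t)=1$, $\sigma(t)=t$ with a plain Euler scheme. The function names, the geometric time grid, the step count, and the stopping time `eps` are illustrative assumptions.

```python
import math


def posterior_mean(x: float, t: float) -> float:
    """One-step denoiser output E[x_0 | x_t = x] for the data set {+1, -1}."""
    # (e^{2x/t^2} - 1) / (e^{2x/t^2} + 1) is exactly tanh(x / t^2).
    return math.tanh(x / t**2)


def score(x: float, t: float) -> float:
    """d log p_t(x) / dx for the two-point mixture with sigma(t) = t."""
    return (posterior_mean(x, t) - x) / t**2


def pf_ode_purify(x: float, t: float, eps: float = 1e-3, steps: int = 2000) -> float:
    """Euler integration of dx/dt = -t * score(x, t) from time t down to eps.

    The geometric grid t_{k+1} = ratio * t_k is an assumption of this sketch.
    """
    ratio = (eps / t) ** (1.0 / steps)
    for _ in range(steps):
        t_next = t * ratio
        x += -t * score(x, t) * (t_next - t)
        t = t_next
    return x


if __name__ == "__main__":
    for x0 in (0.5, -0.5):
        # At t = 2 the posterior mean stays near zero, while the ODE
        # solution ends close to the data point on the same side of 0.
        print(x0, posterior_mean(x0, 2.0), pf_ode_purify(x0, 2.0))
```

In this toy setting, a noisy point observed at a large $t$ is pulled toward zero by the posterior mean, whereas the ODE solution keeps its sign and ends near $1$ or $-1$.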

Additionally, PF-ODE is deterministic, eliminating the overhead of majority voting required when using reverse-SDE as a purifier [26]. The consistency model, which reduces ODE solving to a one-step mapping, further ensures purification has the same efficiency as onestep-DDPM while keeping the in-distribution property.

Though the consistency model enjoys both in-distribution property and one-step efficiency, it does not guarantee that the purified sample has the same semantic meaning as the original sample. This is because the derivation of PF-ODE only guarantees a mapping between noisy distribution and data distribution, which is sufficient for generation, but not enough for denoising purposes.

To address this concern, we first delineate the desired characteristics of the purifier. As evidenced in prior works [14; 25; 26; 33], an ideal purifier should yield a purified output situated within a proximate vicinity of the original input. It is generally presumed that such purified outputs retain the semantic meaning of the original inputs with a high probability. The disparity in semantic consistency between the noisy input and the purified output generated by PF-ODE arises due to the proximity of the purified output to other samples. In this regard, we propose quantifying this disparity through the notion of transport between the data distribution and the purified distribution, derived by introducing Gaussian perturbations to the data distribution and subsequently applying denoising via PF-ODE. Given an original sample $\bm{x}$, Gaussian noise $\epsilon$, and purifier $d$, the mapping in the transport process is defined as $T:\bm{x}\rightarrow d(\bm{x}+\epsilon)$, which is probabilistic. We aim to demonstrate that a diminished transport between the data distribution and the purified distribution is conducive to a higher likelihood of the purified output being situated in proximity to the original sample, thereby preserving its semantic meaning.

We will leverage the following definition.

Definition 3.2.

Given the data distribution $p$, Gaussian noise $\epsilon$, timestep $t$, and a purifier $d$, we define $\pi_t:\bm{x}\rightarrow d(\bm{x}+t\epsilon)$ and the "transport" under $g_t$ between the data distribution and purified distribution as $T_{\pi_t}(p):=\int\|\bm{x}-\pi_t(\bm{x})\|\cdot p(\bm{x})\,d\bm{x}$.

Intuitively, transport measures the average distance between the original and purified samples, which should be small for an effective purifier. Below, we quantify this intuition and present our main theorem. See the detailed proof in Appendix B.

Theorem 3.3.

Given the transport $T_{\pi_t}(p)$ between the data distribution $p$ and the corresponding purified distribution under $g_t$, then for any $r>0$, the probability that the distance between the original sample $\bm{x}$ and purified sample $\hat{\bm{x}}=\pi_t(\bm{x})$ is larger than $r$ is upper bounded by $\frac{T_{\pi_t}(p)}{r}$.

Remark 3.4.

By Theorem 3.3, the efficacy of the purifier hinges on two crucial factors: the transport $T_{\pi_t}(p)$ and the radius $r$. A theoretically perfect purifier would yield zero transport; however, this is unattainable due to the inherent randomness of $g_t$. Typically, we can optimize the parameter $t$ to minimize the transport, denoted as $T^{*}=\min_t\frac{T_{\pi_t}(p)}{r}$. In the context of classification tasks, the selection of $r$ also depends on the robustness of the classifier; a more robust classifier allows a larger $r$ to be chosen, thereby guaranteeing better purification efficacy.
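The bound in Theorem 3.3 has the same form as Markov's inequality applied to the nonnegative distance $\|\bm{x}-\pi_t(\bm{x})\|$. The sketch below illustrates it by Monte Carlo on the two-point data set of Example 3.1. It is an illustration under stated assumptions, not the authors' evaluation code: `sign_purifier` is an idealized stand-in for an on-manifold purifier, `mean_purifier` is the one-step posterior mean from the example, and the sample size and seed are arbitrary.

```python
import math
import random


def sign_purifier(x_noisy: float, t: float) -> float:
    """Idealized on-manifold purifier for Example 3.1: nearest of {+1, -1}."""
    return 1.0 if x_noisy >= 0 else -1.0


def mean_purifier(x_noisy: float, t: float) -> float:
    """One-step posterior-mean purifier for the same example."""
    return math.tanh(x_noisy / t**2)


def transport_and_tail(purifier, t: float, r: float, n: int = 20000, seed: int = 0):
    """Monte Carlo estimates of T_{pi_t}(p) and P(|x - pi_t(x)| > r)."""
    rng = random.Random(seed)
    dists = []
    for _ in range(n):
        x = rng.choice((1.0, -1.0))              # x ~ p_data
        x_noisy = x + t * rng.gauss(0.0, 1.0)    # x + t * epsilon
        dists.append(abs(x - purifier(x_noisy, t)))
    transport = sum(dists) / n
    tail = sum(d > r for d in dists) / n
    return transport, tail


if __name__ == "__main__":
    for name, purifier in (("sign", sign_purifier), ("mean", mean_purifier)):
        transport, tail = transport_and_tail(purifier, t=0.5, r=0.5)
        print(name, "transport:", transport, "tail:", tail, "bound:", transport / 0.5)
```

For any purifier plugged in, the estimated tail probability stays below the estimated transport divided by $r$, so a smaller transport tightens the bound for a fixed radius.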

Figure 2: Transport between purified images and clean images with $\sigma\in\{0.25,0.5,0.75,1.0\}$.

FID at different $\sigma$:

| Loss | $\sigma=0.25$ | $\sigma=0.5$ | $\sigma=1.0$ |
|---|---|---|---|
| - - | 60.3 | 155.3 | 350.3 |
| $\ell_1$ | 96.8 | 205.7 | 383.6 |
| $\ell_2$ | 102.1 | 214.8 | 375.4 |
| LPIPS | 20.5 | 100.9 | 338.1 |

Table 1: FID between purified and clean images on CIFAR-10 test set for different fine-tuning loss. Images are purified at different noise levels.

For ensuring consistency in semantic meaning between the original and purified samples, it is insufficient merely to minimize their distance; it is also necessary that the purified sample resides on the data manifold, which is the in-distribution property we previously mentioned. To concurrently achieve both objectives, rather than solely focusing on minimizing the Euclidean distance between the original and purified samples, we opt to minimize the Learned Perceptual Image Patch Similarity (LPIPS) loss between them. This strategy aids in mitigating the risk of the purified sample deviating from the data manifold, thereby preserving semantic meaning. In Table 1, we show that using LPIPS is better than $\ell_1$ and $\ell_2$ loss for Consistency Fine-tuning when we want to guarantee the generated images are in-distribution, where lower FID scores indicate better in-distribution properties.

Figure 2 validates the effectiveness of Consistency Purification in light of Theorem 3.3. It shows that both the use of the consistency model in Consistency Purification and the further Consistency Fine-tuning decrease the transport from the original distribution to the purified distribution. Specifically, Consistency Purification achieves a lower average distance from the purified sample to the original sample than onestep-DDPM, and Consistency Fine-tuning further decreases this average distance, indicating that both components yield a lower transport and thus a better semantic alignment between purified and original samples.

4 Method

We propose our framework, Consistency Purification, with a further improved version using Consistency Fine-tuning.

4.1 Consistency Purification

We introduce Consistency Purification, which directly applies a consistency model as the purifier and combines it with a base classifier to form the smoothed classifier used in randomized smoothing.

Following Diffusion Denoised Smoothing outlined in [25], it is necessary to establish a mapping between the Gaussian-noise-augmented images required by randomized smoothing and the noised images in the ODE trajectory of the consistency model. For a given consistency model purifier $D_\theta$, any noisy input $\bm{x}_t\sim\mathcal{N}(\bm{x},t^2\bm{I})$ can be recovered to the trajectory's start $\bm{x}_\epsilon$ by directly passing it through the model with time $t$: $\bm{x}_\epsilon=D_\theta(\bm{x}_t,t)$.

When comparing this to the image augmented with additive Gaussian noise $\bm{x}_{rs}\sim\mathcal{N}(\bm{x},\sigma^2\bm{I})$, which is required by randomized smoothing, we observe that $\bm{x}_{rs}$ and $\bm{x}_t$ share the same formula when $t=\sigma$. However, since the noise levels $\sigma\in\{\sigma_i\}_{i=1}^{m}$ may not be among the time steps used during the training of the consistency model, we empirically select the nearest time step $t$ from the discrete time steps used in training for each $\sigma$.

The entire time horizon $[\epsilon,T]$ is divided into $N-1$ sub-intervals with boundaries $t_1=\epsilon<t_2<\cdots<t_N=T$, and the time steps used in training are computed by $t_i=\left(\epsilon^{1/\rho}+\frac{i-1}{N-1}\left(T^{1/\rho}-\epsilon^{1/\rho}\right)\right)^{\rho}$, where $\rho=7$.

Given the noise level $\sigma$ of the Gaussian noise used in randomized smoothing, we select the corresponding time step $t^{*}_{\sigma}$ for Consistency Purified Smoothing by $t^{*}_{\sigma}=\{t_i \mid \sigma\in(\frac{t_{i-1}+t_i}{2},\frac{t_i+t_{i+1}}{2}]\}$.
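The time-step construction and the selection rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the released implementation: the function names are hypothetical, and the values `eps=0.002`, `T=80.0`, and `N=18` in the usage example are assumptions chosen for demonstration, not settings stated in this section.

```python
from bisect import bisect_left


def training_time_steps(eps: float, T: float, N: int, rho: float = 7.0) -> list[float]:
    """Boundaries t_1 = eps < t_2 < ... < t_N = T of the training discretization."""
    lo, hi = eps ** (1.0 / rho), T ** (1.0 / rho)
    return [(lo + (i - 1) / (N - 1) * (hi - lo)) ** rho for i in range(1, N + 1)]


def select_time_step(sigma: float, steps: list[float]) -> float:
    """Return t_i with sigma in ((t_{i-1} + t_i) / 2, (t_i + t_{i+1}) / 2]."""
    midpoints = [(a + b) / 2.0 for a, b in zip(steps, steps[1:])]
    # bisect_left returns the index of the first midpoint >= sigma, which is
    # the index of the half-open interval (m_{i-1}, m_i] containing sigma.
    # Values outside all midpoints fall back to the first or last step.
    return steps[bisect_left(midpoints, sigma)]


if __name__ == "__main__":
    steps = training_time_steps(eps=0.002, T=80.0, N=18)  # assumed values
    for sigma in (0.25, 0.5, 1.0):
        print(sigma, select_time_step(sigma, steps))
```

Because the intervals are delimited by midpoints of neighboring time steps, the rule is equivalent to picking the training time step nearest to $\sigma$, with ties resolved toward the smaller step.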

4.2 Consistency Fine-tuning

To optimize the consistency model for aligning semantic meanings during purification, we fine-tune the purifier $D_\theta$ by minimizing the following loss function: $\mathcal{L}_\theta=\mathbb{E}\,\|\bm{x}-D_\theta(\bm{x}_\sigma,t^{*}_{\sigma})\|_{\text{LPIPS}}$, where the expectation is taken with $\bm{x}\sim p_{\text{data}}$, $\sigma\sim\mathcal{U}\{\sigma_i\}_{i=1}^{m}$, $\bm{x}_\sigma\sim\mathcal{N}(\bm{x},\sigma^2\bm{I})$. Here LPIPS denotes the distance computed by the Learned Perceptual Image Patch Similarity [30]. $p_{\text{data}}$ represents the distribution of the training data, from which clean images $\bm{x}$ are sampled. $\mathcal{U}\{\sigma_i\}_{i=1}^{m}$ denotes the uniform distribution over $m$ different noise scales $\sigma_i$ used for randomized smoothing. Typically, we select the scale set $\sigma_i\in\{0.25,0.5,1.0\}$, which is commonly used to compute the certified radius via randomized smoothing.

After obtaining the fine-tuned consistency model purifier $D_{\theta^{*}}$, it can replace the original model used in Consistency Purified Smoothing to purify any noised image $\bm{x}_{rs}$ with Gaussian noise level $\sigma_i$, resulting in the final purified image $\bm{x}_p=D_{\theta^{*}}(\bm{x}_{rs},t^{*}_{\sigma_i})$.
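The sampling structure of the fine-tuning objective can be sketched as a Monte Carlo estimate. The code below is a hedged illustration only: `finetune_loss_estimate`, its arguments, and the flattened-list image type are hypothetical, the gradient step on $\theta$ is omitted, and the usage example plugs in a mean-squared-error stand-in and a clipping stand-in where a real setup would supply an LPIPS implementation and the consistency model.

```python
import random
from typing import Callable, Sequence

Image = list[float]  # flattened image; a stand-in for a tensor


def finetune_loss_estimate(
    images: Sequence[Image],
    purifier: Callable[[Image, float], Image],
    distance: Callable[[Image, Image], float],
    sigmas: Sequence[float],
    sigma_to_t: Callable[[float], float],
    rng: random.Random,
) -> float:
    """Monte Carlo estimate of E[ distance(x, D(x_sigma, t*_sigma)) ]."""
    total = 0.0
    for x in images:                                   # x ~ p_data
        sigma = rng.choice(list(sigmas))               # sigma ~ U{sigma_1..sigma_m}
        x_sigma = [v + sigma * rng.gauss(0.0, 1.0) for v in x]  # N(x, sigma^2 I)
        x_hat = purifier(x_sigma, sigma_to_t(sigma))   # one-step purification
        total += distance(x, x_hat)
    return total / len(images)


if __name__ == "__main__":
    rng = random.Random(0)
    batch = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(4)]
    # Stand-ins (assumptions): clipping instead of a consistency model,
    # mean squared error instead of LPIPS, identity map instead of t*_sigma.
    clip = lambda img, t: [max(-1.0, min(1.0, v)) for v in img]
    mse = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)
    print(finetune_loss_estimate(batch, clip, mse, (0.25, 0.5, 1.0), lambda s: s, rng))
```

Keeping the distance as a pluggable argument mirrors the ablation over $\ell_1$, $\ell_2$, and LPIPS: only the distance changes while the noise sampling stays the same.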

We present the detailed algorithm of our Consistency Purification in Appendix A.

5 Experiments

In this section, we begin by detailing the experimental settings, followed by our main results. Additionally, we conduct ablation studies to further demonstrate the effectiveness of our framework. All experiments are conducted on a single NVIDIA RTX A5000 24GB GPU.

5.1 Experimental Settings.

Dataset. We evaluate the Consistency Purification framework on both CIFAR-10 [34] and ImageNet-64 [35]. CIFAR-10 contains $32\times 32$ pixel images across 10 different categories, while ImageNet-64 includes $64\times 64$ pixel images across 1000 categories. For CIFAR-10, we select 500 test images balanced across classes. Due to limited computational resources, we only select 100 test images for ImageNet-64.

Consistency Purification. For CIFAR-10, to demonstrate the effectiveness of Consistency Purification, we first perform purification with a public unconditional consistency model [36]. After that, to further improve the performance, we fine-tune the model with noise levels $\sigma$ sampled from $\{0.25,0.5,1.0\}$, reported as (+ Consistency Fine-tuning). However, there is currently no publicly available unconditional consistency model checkpoint for the ImageNet dataset that can be used directly for purification. The only available model is the conditional consistency model on ImageNet-64. Thus, we train an unconditional consistency model on ImageNet-64, initializing it with the existing conditional consistency model checkpoint. Details of the training process are included in Appendix C. Additionally, we conduct Consistency Fine-tuning on the ImageNet-64 model with noise levels $\sigma\in\{0.05,0.15,0.25\}$.

Baselines. For comparative analysis on CIFAR-10, we conduct baseline experiments under various settings. The first baseline is onestep-DDPM, where we employ the 50-M unconditional improved diffusion model from [2] with the one-shot denoising method [25] for purification. Given that our consistency model is distilled from an EDM model [4], we include EDM as a baseline, applying both one-shot denoising (onestep-EDM) and an ODE solver (PF-ODE EDM) for purification. Additionally, we include a recent advancement in diffusion purification, Diffusion Calibration [37], as a baseline; it fine-tunes the diffusion model with the guidance of a WideResNet28-10 classifier to improve the purification accuracy under that specific classifier. For ImageNet-64, due to the lack of a public unconditional EDM model, we only compare against onestep-DDPM.

Randomized Smoothing Settings. We set $N=10000$ for both CIFAR-10 and ImageNet as the number of sampling times used in randomized smoothing. We compute the certified radius for each test example at three different noise levels $\sigma\in\{0.25,0.5,1.0\}$ for CIFAR-10 and $\sigma\in\{0.05,0.15,0.25\}$ for ImageNet-64. Then we calculate the proportion of test examples whose radius exceeds a specific threshold $\epsilon$. The highest accuracy among these noise levels is reported as the certified accuracy at $\epsilon$.
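The reporting rule described above can be sketched as follows. This is a minimal illustration with hypothetical function names; the radii in the usage example are made up for demonstration and are not results from the paper. As an assumption of this sketch, a misclassified or abstained example is represented by `None`, and the comparison is taken as "at least $\epsilon$" so that $\epsilon=0$ counts every correctly certified example.

```python
from typing import Mapping, Optional, Sequence


def certified_accuracy(radii: Sequence[Optional[float]], eps: float) -> float:
    """Fraction of test examples certified correct with radius >= eps.

    A radius of None marks an example that was misclassified or abstained.
    """
    return sum(r is not None and r >= eps for r in radii) / len(radii)


def best_certified_accuracy(
    radii_by_sigma: Mapping[float, Sequence[Optional[float]]], eps: float
) -> float:
    """Report the highest certified accuracy over all noise levels."""
    return max(certified_accuracy(r, eps) for r in radii_by_sigma.values())


if __name__ == "__main__":
    # Made-up radii for four test examples at two noise levels.
    radii_by_sigma = {
        0.25: [0.3, 0.1, None, 0.6],
        0.5: [0.7, None, None, 0.9],
    }
    for eps in (0.0, 0.25, 0.65):
        print(eps, best_certified_accuracy(radii_by_sigma, eps))
```

Taking the maximum over noise levels means that small $\sigma$ typically determines the reported accuracy at small $\epsilon$, while larger $\sigma$ takes over at larger radii.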

Classifiers. For the classifier used after purification on CIFAR-10, we employ a ViT-B/16 model [38], which is pretrained on ImageNet-21k [35] and fine-tuned on the CIFAR-10 dataset. In our ablation studies, we also use ResNet [39] and WideResNet [40] trained on CIFAR-10. For ImageNet-64, we up-sample the $64\times 64$ images and directly apply ViT-B/16 as the classifier.

5.2 Main Results.

We present the certified accuracy of Consistency Purification for both the CIFAR-10 and ImageNet-64 datasets in Table 2. We also report the purification steps, which indicate whether the purifier needs multiple network evaluations (Multi Steps) or a single network evaluation (One Step). As observed from Table 2, Consistency Purification significantly outperforms onestep-DDPM on both CIFAR-10 and ImageNet-64, and Consistency Fine-tuning raises the certified accuracy further at nearly every $\epsilon$. Besides, for CIFAR-10, the results also suggest the effectiveness of Consistency Purification with Consistency Fine-tuning when compared with further baseline methods such as onestep-EDM, PF-ODE EDM and Diffusion Calibration. We also present a detailed certified accuracy evaluation for fine-grained $\epsilon$ at different noise levels $\sigma$ compared with onestep-DDPM in Figure 3 of Appendix D. These results demonstrate that Consistency Purification certifies robustness both efficiently and effectively.

Table 2: Certified Accuracy of Consistency Purification for CIFAR-10 and ImageNet-64.

CIFAR-10, certified accuracy (%) at $\epsilon$:

| Method | Purification Steps | 0.0 | 0.25 | 0.5 | 0.75 | 1.0 |
|---|---|---|---|---|---|---|
| onestep-DDPM [25] | One Step | 87.6 | 73.6 | 55.6 | 39.2 | 29.6 |
| onestep-EDM | One Step | 87.4 | 76.2 | 58.8 | 40.8 | 32.4 |
| PF-ODE EDM | Multi Steps | 89.6 | 77.0 | 60.4 | 42.6 | 34.0 |
| Diffusion Calibration [37] | One Step | 90.2 | 76.4 | 57.2 | 42.6 | 32.4 |
| Consistency Purification | One Step | 90.4 | 77.2 | 59.8 | 42.8 | 33.2 |
| + Consistency Fine-tuning | One Step | 90.2 | 79.4 | 62.4 | 43.8 | 35.4 |

ImageNet-64, certified accuracy (%) at $\epsilon$:

| Method | Purification Steps | 0.0 | 0.05 | 0.15 | 0.25 | 0.35 |
|---|---|---|---|---|---|---|
| onestep-DDPM [25] | One Step | 53.0 | 44.0 | 32.0 | 15.0 | 7.0 |
| Consistency Purification | One Step | 61.0 | 52.0 | 34.0 | 19.0 | 13.0 |
| + Consistency Fine-tuning | One Step | 69.0 | 57.0 | 35.0 | 21.0 | 16.0 |

5.3 Ablation Studies.

We conduct various ablation studies to evaluate the effectiveness of our proposed method.

Fine-tuning Loss Functions. To further demonstrate that LPIPS loss is the best choice considering both on-manifold purification and semantic meaning alignment, we assess the certified accuracy of Consistency Purification using different loss functions during Consistency Fine-tuning. Instead of the LPIPS distance between the clean and purified images as the loss function, we experiment with $\ell_1$ and $\ell_2$ distances. Results in Table 3 indicate that Consistency Purification with LPIPS loss achieves the highest certified accuracy among the three fine-tuning losses at every $\epsilon$ and improves on the model without fine-tuning at every $\epsilon>0$. In contrast, fine-tuning with $\ell_1$ or $\ell_2$ distances does not improve, and in most cases slightly reduces, the certified accuracy relative to no fine-tuning. This demonstrates that fine-tuning with the LPIPS loss function effectively aligns semantic meanings, whereas $\ell_1$ or $\ell_2$ distances may hurt them.

Noise-Level Sampling Schedules during Consistency Fine-tuning. In our Consistency Fine-tuning experiments, we sample the noise level uniformly from the same discrete set of noise levels used in randomized smoothing, $\sigma \sim \mathcal{U}\{0.25, 0.5, 1.0\}$. To empirically demonstrate the effectiveness of this choice, we compare it with a continuous sampling schedule where $\sigma \sim \mathcal{U}[0, 1]$. The results in Table 4 show that our discrete sampling schedule achieves higher certified accuracy at every radius. This indicates that fine-tuning on a discrete set of noise levels, aligned with those used in randomized smoothing, enhances certified robustness.
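The two schedules compared in this ablation can be sketched as follows. This is a minimal illustration rather than the training code used in the paper: the function names `sample_sigma_discrete` and `sample_sigma_continuous` are hypothetical, and only the noise levels {0.25, 0.5, 1.0} and the interval [0, 1] are taken from the text.

```python
import random

# Noise levels used in randomized smoothing (from the text).
SMOOTHING_SIGMAS = (0.25, 0.5, 1.0)


def sample_sigma_discrete(rng):
    """Discrete schedule: sigma ~ U{0.25, 0.5, 1.0}."""
    return rng.choice(SMOOTHING_SIGMAS)


def sample_sigma_continuous(rng):
    """Continuous schedule: sigma ~ U[0, 1]."""
    return rng.uniform(0.0, 1.0)


rng = random.Random(0)
discrete_draws = [sample_sigma_discrete(rng) for _ in range(1000)]
continuous_draws = [sample_sigma_continuous(rng) for _ in range(1000)]
```

With the discrete schedule, every fine-tuning step sees a noise level that is also used at certification time, whereas the continuous schedule spends most steps on noise levels that are never evaluated.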

Table 3: Certified Accuracy of Consistency Purification with different loss functions during fine-tuning for CIFAR-10. "- -" represents the setting without fine-tuning.
Certified Accuracy at $\epsilon$ (%)
Distance 0.0 0.25 0.5 0.75 1.0
- - 90.4 77.2 59.8 42.8 33.2
$\ell_1$ 89.4 76.4 59.6 42.4 31.4
$\ell_2$ 90.0 77.0 59.8 42.4 33.4
LPIPS 90.2 79.4 62.4 43.8 35.4
Table 4: Certified Accuracy of Consistency Purification with continuous and discrete sampling schedules during fine-tuning for CIFAR-10. "- -" represents the setting without fine-tuning.
Certified Accuracy at $\epsilon$ (%)
Schedules 0.0 0.25 0.5 0.75 1.0
- - 90.4 77.2 59.8 42.8 33.2
[0, 1] 89.0 76.2 59.8 43.2 33.8
{0.25, 0.5, 1.0} 90.2 79.4 62.4 43.8 35.4

Generalizability with Different Classifiers. We compute certified accuracy with various classifiers to test whether our framework remains effective with arbitrary classifiers. The results, presented in Table 5, compare Consistency Fine-tuning with Diffusion Calibration, an alternative method that fine-tunes diffusion models to improve certified robustness. Across the classifiers we evaluate (ViT-B/16, ResNet56, and WideResNet28-10, abbreviated WRN28-10), our method matches or outperforms Diffusion Calibration in every setting except the certified accuracy at $\epsilon = 0.0$ with the WRN28-10 model. It is worth noting that Diffusion Calibration, which requires a specific classifier for guidance during fine-tuning, achieves comparable performance only with its guidance classifier WRN28-10. This demonstrates the advantage of Consistency Fine-tuning in generalizing across different classifiers.

Fine-tuning Classifier vs. Fine-tuning Diffusion Model. A potential concern with Consistency Fine-tuning is that fine-tuning the classifier (CLS-FT) can reach higher certified accuracy at lower training cost than our approach of fine-tuning the diffusion model (DM-FT). However, our experiments, shown in Table 6, indicate that DM-FT does not conflict with CLS-FT; rather, combining the two methods achieves even higher certified accuracy. On the other hand, although CLS-FT yields slightly higher certified accuracy than DM-FT at most radii, it requires fine-tuning a specific classifier, which sacrifices a natural property of diffusion purification frameworks, namely their compatibility with arbitrary off-the-shelf classifiers, and thus limits practical applicability.

Table 5: Certified Accuracy of Consistency Fine-tuning with different classifiers on CIFAR-10. The guidance classifier used in Diffusion Calibration is WideResNet28-10.
Certified Accuracy at $\epsilon$ (%)
Method Classifier 0.0 0.25 0.5 0.75 1.0
Diffusion Calibration [37] ViT-B/16 90.2 76.4 57.2 42.6 32.4
Diffusion Calibration [37] WRN28-10 88.2 76.4 59.2 42.0 31.8
Diffusion Calibration [37] ResNet56 86.0 72.8 52.6 35.2 25.8
Consistency Fine-tuning ViT-B/16 90.2 79.4 62.4 43.8 35.4
Consistency Fine-tuning WRN28-10 88.0 76.4 59.8 42.8 33.0
Consistency Fine-tuning ResNet56 87.2 74.8 57.6 38.2 30.2
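The comparison in Table 5 can be checked mechanically. The short sketch below copies the numbers from the table and lists every (classifier, $\epsilon$) setting in which Diffusion Calibration is strictly ahead of Consistency Fine-tuning; the helper name `settings_where_baseline_wins` is hypothetical and used only for illustration.

```python
EPSILONS = (0.0, 0.25, 0.5, 0.75, 1.0)

# Certified accuracy (%) copied from Table 5.
DIFFUSION_CALIBRATION = {
    "ViT-B/16": (90.2, 76.4, 57.2, 42.6, 32.4),
    "WRN28-10": (88.2, 76.4, 59.2, 42.0, 31.8),
    "ResNet56": (86.0, 72.8, 52.6, 35.2, 25.8),
}
CONSISTENCY_FINETUNING = {
    "ViT-B/16": (90.2, 79.4, 62.4, 43.8, 35.4),
    "WRN28-10": (88.0, 76.4, 59.8, 42.8, 33.0),
    "ResNet56": (87.2, 74.8, 57.6, 38.2, 30.2),
}


def settings_where_baseline_wins(ours, baseline):
    """Return (classifier, epsilon) pairs where the baseline is strictly better."""
    losses = []
    for classifier, ours_row in ours.items():
        for eps, acc_ours, acc_base in zip(EPSILONS, ours_row, baseline[classifier]):
            if acc_ours < acc_base:
                losses.append((classifier, eps))
    return losses


losses = settings_where_baseline_wins(CONSISTENCY_FINETUNING, DIFFUSION_CALIBRATION)
```

Ties (for example ViT-B/16 at $\epsilon = 0.0$) are not counted as losses, so the only setting returned is the one the text singles out.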
Table 6: Certified Accuracy of Fine-tuning the Diffusion Model (DM-FT) compared with Fine-tuning the Classifier (CLS-FT) in diffusion purification frameworks on CIFAR-10.
Certified Accuracy at $\epsilon$ (%)
DM-FT CLS-FT 0.0 0.25 0.5 0.75 1.0
- - 90.4 77.2 59.8 42.8 33.2
✓ - 90.2 79.4 62.4 43.8 35.4
- ✓ 90.4 79.8 63.4 44.2 35.2
✓ ✓ 90.8 80.0 64.8 44.6 36.8

6 Conclusion

In this paper, we introduced Consistency Purification, a novel framework for enhancing certified robustness via randomized smoothing. By incorporating consistency models into the diffusion purification approach and further refining them through Consistency Fine-tuning, the framework achieves high certified robustness efficiently, with a single network evaluation for purification, as our experiments demonstrate.

Limitations. A notable limitation of our study is that our empirical results do not include certified robustness on high-resolution images such as ImageNet 256×256. This is due to the absence of publicly available consistency model checkpoints at this resolution; training a consistency model for ImageNet 256×256 ourselves would require computing resources beyond our current budget. As a result, our empirical evaluations in this paper are limited to the CIFAR-10 and ImageNet 64×64 datasets. However, our framework is designed for adaptability and could be easily extended to ImageNet 256×256 once such checkpoints become available.

References

  • [1] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in neural information processing systems, 33:6840–6851, 2020.
  • [2] Alexander Quinn Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. In International Conference on Machine Learning, pages 8162–8171. PMLR, 2021.
  • [3] Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021.
  • [4] Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models. Advances in Neural Information Processing Systems, 35:26565–26577, 2022.
  • [5] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022.
  • [6] Zhifeng Kong, Wei **, Jiaji Huang, Kexin Zhao, and Bryan Catanzaro. Diffwave: A versatile diffusion model for audio synthesis. In International Conference on Learning Representations, 2020.
  • [7] Nanxin Chen, Yu Zhang, Heiga Zen, Ron J Weiss, Mohammad Norouzi, and William Chan. Wavegrad: Estimating gradients for waveform generation. In International Conference on Learning Representations, 2020.
  • [8] Vadim Popov, Ivan Vovk, Vladimir Gogoryan, Tasnima Sadekova, and Mikhail Kudinov. Grad-tts: A diffusion probabilistic model for text-to-speech. In International Conference on Machine Learning, pages 8599–8608. PMLR, 2021.
  • [9] Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, and David J Fleet. Video diffusion models. Advances in Neural Information Processing Systems, 35:8633–8646, 2022.
  • [10] Jonathan Ho, William Chan, Chitwan Saharia, Jay Whang, Ruiqi Gao, Alexey Gritsenko, Diederik P Kingma, Ben Poole, Mohammad Norouzi, David J Fleet, et al. Imagen video: High definition video generation with diffusion models. arXiv preprint arXiv:2210.02303, 2022.
  • [11] Shitong Luo and Wei Hu. Diffusion probabilistic models for 3d point cloud generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2837–2845, 2021.
  • [12] Animesh Karnewar, Andrea Vedaldi, David Novotny, and Niloy J Mitra. Holodiffusion: Training a 3d diffusion model using 2d images. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 18423–18433, 2023.
  • [13] Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Mildenhall. Dreamfusion: Text-to-3d using 2d diffusion. arXiv preprint arXiv:2209.14988, 2022.
  • [14] Weili Nie, Brandon Guo, Yujia Huang, Chaowei Xiao, Arash Vahdat, and Anima Anandkumar. Diffusion models for adversarial purification. In International Conference on Machine Learning (ICML), 2022.
  • [15] Shutong Wu, Jiongxiao Wang, Wei **, Weili Nie, and Chaowei Xiao. Defending against adversarial audio via diffusion model. In The Eleventh International Conference on Learning Representations, 2022.
  • [16] Jiachen Sun, Jiongxiao Wang, Weili Nie, Zhiding Yu, Zhuoqing Mao, and Chaowei Xiao. A critical revisit of adversarial robustness in 3d point cloud recognition with diffusion-driven purification. In International Conference on Machine Learning, pages 33100–33114. PMLR, 2023.
  • [17] **yi Wang, Zhaoyang Lyu, Dahua Lin, Bo Dai, and Hongfei Fu. Guided diffusion model for adversarial purification. arXiv preprint arXiv:2205.14969, 2022.
  • [18] Quanlin Wu, Hang Ye, and Yuntian Gu. Guided diffusion model for adversarial purification from random noise. arXiv preprint arXiv:2206.10875, 2022.
  • [19] Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 1310–1320. PMLR, 09–15 Jun 2019.
  • [20] Hadi Salman, Jerry Li, Ilya Razenshteyn, Pengchuan Zhang, Huan Zhang, Sebastien Bubeck, and Greg Yang. Provably robust deep learning via adversarially trained smoothed classifiers. Advances in Neural Information Processing Systems, 32, 2019.
  • [21] Runtian Zhai, Chen Dan, Di He, Huan Zhang, Boqing Gong, Pradeep Ravikumar, Cho-Jui Hsieh, and Liwei Wang. Macer: Attack-free and scalable robust training via maximizing certified radius. arXiv preprint arXiv:2001.02378, 2020.
  • [22] Jongheon Jeong and **woo Shin. Consistency regularization for certified robustness of smoothed classifiers. Advances in Neural Information Processing Systems, 33:10558–10570, 2020.
  • [23] Miklós Z Horváth, Mark Niklas Müller, Marc Fischer, and Martin Vechev. Boosting randomized smoothing with variance reduced classifiers. arXiv preprint arXiv:2106.06946, 2021.
  • [24] Jongheon Jeong, Sejun Park, Minkyu Kim, Heung-Chang Lee, Do-Guk Kim, and **woo Shin. Smoothmix: Training confidence-calibrated smoothed classifiers for certified robustness. Advances in Neural Information Processing Systems, 34:30153–30168, 2021.
  • [25] Nicholas Carlini, Florian Tramer, J Zico Kolter, et al. (certified!!) adversarial robustness for free! arXiv preprint arXiv:2206.10550, 2022.
  • [26] Chaowei Xiao, Zhongzhu Chen, Kun **, Jiongxiao Wang, Weili Nie, Mingyan Liu, Anima Anandkumar, Bo Li, and Dawn Song. Densepure: Understanding diffusion models for adversarial robustness. In The Eleventh International Conference on Learning Representations, 2022.
  • [27] Jiawei Zhang, Zhongzhu Chen, Huan Zhang, Chaowei Xiao, and Bo Li. DiffSmooth: Certifiably robust learning via diffusion models and local smoothing. In 32nd USENIX Security Symposium (USENIX Security 23), pages 4787–4804, 2023.
  • [28] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models, 2020.
  • [29] Huanran Chen, Yinpeng Dong, Shitong Shao, Zhongkai Hao, Xiao Yang, Hang Su, and Jun Zhu. Your diffusion model is secretly a certifiably robust classifier. arXiv preprint arXiv:2402.02316, 2024.
  • [30] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018.
  • [31] Brian DO Anderson. Reverse-time diffusion equation models. Stochastic Processes and their Applications, 12(3):313–326, 1982.
  • [32] Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. In Proceedings of the 40th International Conference on Machine Learning, ICML’23. JMLR.org, 2023.
  • [33] Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions. In NeurIPS, 2018.
  • [34] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.
  • [35] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. IEEE, 2009.
  • [36] Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. In International Conference on Machine Learning, pages 32211–32252. PMLR, 2023.
  • [37] Jongheon Jeong and **woo Shin. Multi-scale diffusion denoised smoothing. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
  • [38] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
  • [39] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  • [40] Sergey Zagoruyko and Nikos Komodakis. Wide residual networks, 2017.

Appendix A Consistency Purification Algorithm

We provide detailed descriptions of Consistency Purification in the following algorithms. Algorithm 1 presents the Consistency Fine-tuning and Consistency Purification functions. Algorithm 2 shows the randomized smoothing algorithm from [19], which calls Consistency Purification to make predictions and compute the certified radius.

Algorithm 1 Consistency Fine-tuning and Consistency Purification
Consistency model purifier $D_\theta$, where $\theta$ represents the model parameters. Noise levels used in randomized smoothing $\{\sigma_i\}_{i=1}^{m}$. Arbitrary classification model $f_{\text{clf}}$. Fine-tuning learning rate $\eta$.
function ConsistencyFine-tuning($D_\theta$)
    repeat
        sample $x \in$ Training Dataset, $\sigma \in \{\sigma_i\}_{i=1}^{m}$
        $x_\sigma \leftarrow x + \mathcal{N}(0, \sigma^2 \bm{I})$
        $t^*_\sigma \leftarrow \textsc{GetTimestep}(\sigma)$
        $\mathcal{L} \leftarrow \text{LPIPS}(x, D_\theta(x_\sigma, t^*_\sigma))$
        $\theta \leftarrow \theta - \eta \nabla_\theta \mathcal{L}$
    until convergence
    return $D_\theta$
end function

function ConsistencyPurification($f_{\text{clf}}, \bm{x}, \sigma$)
    $t^*_\sigma \leftarrow \textsc{GetTimestep}(\sigma)$
    $\bm{x}_{rs} \leftarrow \bm{x} + \mathcal{N}(0, \sigma^2 \bm{I})$
    $\bm{x}_p \leftarrow D_{\theta^*}(\bm{x}_{rs}, t^*_\sigma)$
    $y \leftarrow f_{\text{clf}}(\bm{x}_p)$
    return $y$
end function

function GetTimestep($\sigma$)
    $t_i \leftarrow \left(\epsilon^{1/\rho} + \frac{i-1}{N-1}\left(T^{1/\rho} - \epsilon^{1/\rho}\right)\right)^{\rho}$ for $i \in \{1, \ldots, N\}$
    $t^*_\sigma \leftarrow$ find $\left\{t_i \,\middle|\, \sigma \in \left(\frac{t_{i-1} + t_i}{2}, \frac{t_i + t_{i+1}}{2}\right]\right\}$
    return $t^*_\sigma$
end function
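The GetTimestep and ConsistencyPurification routines of Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the default values `n_steps=40`, `eps=0.002`, `t_max=80.0`, and `rho=7.0` are placeholders that are not given in this section, images are represented as flat lists of floats, and `purifier` and `classifier` are hypothetical callables standing in for $D_{\theta^*}$ and $f_{\text{clf}}$.

```python
import random
from bisect import bisect_left


def karras_timesteps(n_steps=40, eps=0.002, t_max=80.0, rho=7.0):
    """Grid t_1..t_N from Algorithm 1 (default values are assumptions)."""
    lo, hi = eps ** (1.0 / rho), t_max ** (1.0 / rho)
    return [(lo + i / (n_steps - 1) * (hi - lo)) ** rho for i in range(n_steps)]


def get_timestep(sigma, timesteps):
    """Return the grid point t_i whose half-open cell (mid_{i-1}, mid_i] contains sigma.

    Values of sigma outside the grid are clamped to the first or last grid point.
    """
    # Midpoints between consecutive grid points act as cell boundaries.
    mids = [(a + b) / 2.0 for a, b in zip(timesteps, timesteps[1:])]
    # bisect_left finds the first boundary >= sigma, i.e. the upper (closed) end.
    return timesteps[bisect_left(mids, sigma)]


def consistency_purification(classifier, purifier, x, sigma, timesteps, rng):
    """One noisy sample, one purifier call, one classifier call."""
    t_star = get_timestep(sigma, timesteps)
    x_rs = [v + rng.gauss(0.0, sigma) for v in x]  # x + N(0, sigma^2 I)
    x_p = purifier(x_rs, t_star)                   # single network evaluation
    return classifier(x_p)


timesteps = karras_timesteps()
t_star = get_timestep(0.5, timesteps)
```

Because the grid is increasing, choosing the cell between midpoints is the same as picking the grid point nearest to $\sigma$, which is why a single binary search suffices.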

Appendix B Proof of Theorem 3.3

Theorem 3.3. Given the transport $T_{\pi_t}(p)$ between the data distribution $p$ and the corresponding purified distribution under $g_t$, then for any $r > 0$, the probability that the distance between the original sample $\bm{x}$ and the purified sample $\hat{\bm{x}} = \pi_t(\bm{x})$ is larger than $r$ is upper bounded by $\frac{T_{\pi_t}(p)}{r}$.

Proof.

We apply Markov's inequality. Because

\begin{align*}
\mathbb{E}[\|\bm{x} - \hat{\bm{x}}\|] &= \int_{\|\bm{x} - \hat{\bm{x}}\| \leq r} \|\bm{x} - \hat{\bm{x}}\| \cdot p(\bm{x})\, d\bm{x} + \int_{\|\bm{x} - \hat{\bm{x}}\| > r} \|\bm{x} - \hat{\bm{x}}\| \cdot p(\bm{x})\, d\bm{x} \\
&\geq \int_{\|\bm{x} - \hat{\bm{x}}\| > r} \|\bm{x} - \hat{\bm{x}}\| \cdot p(\bm{x})\, d\bm{x} \\
&\geq \int_{\|\bm{x} - \hat{\bm{x}}\| > r} r \cdot p(\bm{x})\, d\bm{x} \\
&= r \cdot P(\|\bm{x} - \hat{\bm{x}}\| > r),
\end{align*}

we have

\begin{align*}
P(\|\bm{x} - \hat{\bm{x}}\| > r) &\leq \frac{\mathbb{E}[\|\bm{x} - \hat{\bm{x}}\|]}{r} \\
&= \frac{\mathbb{E}[\|\bm{x} - \pi_t(\bm{x})\|]}{r} \\
&= \frac{T_{\pi_t}(p)}{r}.
\end{align*}
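The bound can be illustrated numerically. In the sketch below, the distances $\|\bm{x} - \hat{\bm{x}}\|$ are replaced by absolute values of standard Gaussian draws, which is purely an assumption made for illustration (no purifier is involved), and the helper name `markov_bound_check` is hypothetical. For any non-negative sample, the empirical tail probability never exceeds the empirical mean divided by $r$.

```python
import random


def markov_bound_check(distances, r):
    """Return (empirical P[d > r], empirical E[d] / r) for non-negative distances."""
    mean = sum(distances) / len(distances)
    tail = sum(d > r for d in distances) / len(distances)
    return tail, mean / r


rng = random.Random(0)
# Stand-in for ||x - x_hat||; the Gaussian choice is an illustrative assumption.
distances = [abs(rng.gauss(0.0, 1.0)) for _ in range(10_000)]
results = {r: markov_bound_check(distances, r) for r in (0.5, 1.0, 2.0)}
```

The bound is loose for small $r$ (it can exceed 1) and becomes informative only when $r$ is large relative to the transport $T_{\pi_t}(p)$, which is why a small transport is desirable for purification.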

Algorithm 2 Randomized Smoothing [19]
Number of samples for prediction $n$. Number of samples for certification $N$. Significance level $\alpha$. Function $\textsc{LowerConfBound}(k, n, 1-\alpha)$ returns a one-sided $(1-\alpha)$ lower confidence bound for the Binomial parameter $p$ given that $k \sim \text{Binomial}(n, p)$.
function Predict($f_{\text{clf}}, \bm{x}, \sigma, n, \alpha$)
    counts $\leftarrow 0$
    for $i \in \{1, 2, \ldots, n\}$ do
        $y \leftarrow \textsc{ConsistencyPurification}(f_{\text{clf}}, \bm{x}, \sigma)$
        counts[$y$] $\leftarrow$ counts[$y$] + 1
    end for
    $\hat{y}_A, \hat{y}_B \leftarrow$ top two labels in counts
    $n_A, n_B \leftarrow$ counts[$\hat{y}_A$], counts[$\hat{y}_B$]
    if $\textsc{BinomTest}(n_A, n_A + n_B, \frac{1}{2}) \leq \alpha$ then
        return $\hat{y}_A$
    else
        return Abstain
    end if
end function

function Certify($f_{\text{clf}}, \bm{x}, \sigma, n, N, \alpha$)
    counts0 $\leftarrow 0$
    for $i \in \{1, 2, \ldots, n\}$ do
        $y \leftarrow \textsc{ConsistencyPurification}(f_{\text{clf}}, \bm{x}, \sigma)$
        counts0[$y$] $\leftarrow$ counts0[$y$] + 1
    end for
    $\hat{y}_A \leftarrow$ top label in counts0
    counts $\leftarrow 0$
    for $i \in \{1, 2, \ldots, N\}$ do
        $y \leftarrow \textsc{ConsistencyPurification}(f_{\text{clf}}, \bm{x}, \sigma)$
        counts[$y$] $\leftarrow$ counts[$y$] + 1
    end for
    $\underline{p_A} \leftarrow \textsc{LowerConfBound}(\text{counts}[\hat{y}_A], N, 1-\alpha)$
    if $\underline{p_A} > \frac{1}{2}$ then
        return prediction $\hat{y}_A$ and radius $\sigma \Phi^{-1}(\underline{p_A})$
    else
        return Abstain
    end if
end function
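The Certify routine of Algorithm 2 can be sketched with the Python standard library alone. This is an illustrative sketch under stated assumptions, not the reference implementation of [19]: `sample_label` is a hypothetical zero-argument callable standing in for one call to ConsistencyPurification, the lower confidence bound is computed as a Clopper-Pearson bound by bisection on the binomial tail (a summation that is only practical for modest $N$), and abstention is represented by `(None, 0.0)`.

```python
import math
from collections import Counter
from statistics import NormalDist


def binom_tail_ge(k, n, p):
    """P[X >= k] for X ~ Binomial(n, p), with each term computed in log space."""
    if k <= 0 or p >= 1.0:
        return 1.0
    if p <= 0.0:
        return 0.0
    total = 0.0
    for j in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                   + j * math.log(p) + (n - j) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def lower_conf_bound(k, n, confidence):
    """One-sided Clopper-Pearson lower bound on p, found by bisection."""
    alpha = 1.0 - confidence
    if k == 0:
        return 0.0
    lo, hi = 0.0, k / n
    for _ in range(60):
        mid = (lo + hi) / 2.0
        # The tail probability increases with p; keep lo on the conservative side.
        if binom_tail_ge(k, n, mid) > alpha:
            hi = mid
        else:
            lo = mid
    return lo


def certify(sample_label, sigma, n, N, alpha):
    """Return (label, certified l2 radius), or (None, 0.0) to abstain."""
    guess = Counter(sample_label() for _ in range(n)).most_common(1)[0][0]
    counts = Counter(sample_label() for _ in range(N))
    p_lower = lower_conf_bound(counts[guess], N, 1.0 - alpha)
    if p_lower > 0.5:
        return guess, sigma * NormalDist().inv_cdf(p_lower)
    return None, 0.0
```

For example, `certify(lambda: 3, sigma=0.5, n=10, N=100, alpha=0.001)` certifies label 3 with a positive radius, while a sampler that alternates between two labels leads to abstention because the lower bound on $p_A$ stays below $\frac{1}{2}$.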

Appendix C Training Unconditional Consistency Model for ImageNet-64

We train an unconditional consistency model for ImageNet-64 from the publicly available conditional version by replacing the class embedding layers with a single learnable token, initialized with the average of the class embeddings. In each forward pass, this token is combined with the time embeddings. We then train the resulting unconditional consistency model, initialized with the conditional model's parameters, on the ImageNet-64 training set for 120k steps.
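The initialization of the learnable token can be sketched as follows. This is a toy illustration with embeddings as plain lists of floats; the function names are hypothetical, and combining the token with the time embedding by element-wise addition is an assumption, since the text does not specify how the two are combined.

```python
def average_class_embedding(class_embeddings):
    """Mean over the class axis of a (num_classes x dim) embedding table."""
    num_classes = len(class_embeddings)
    dim = len(class_embeddings[0])
    return [sum(row[j] for row in class_embeddings) / num_classes for j in range(dim)]


def combine_with_time_embedding(time_embedding, token):
    """Combine the class-free token with the time embedding (addition is assumed)."""
    return [t + u for t, u in zip(time_embedding, token)]


# Toy table with 2 classes and 2 dimensions, for illustration only.
token = average_class_embedding([[1.0, 2.0], [3.0, 4.0]])
conditioning = combine_with_time_embedding([0.5, 0.5], token)
```

Starting from the average embedding keeps the conditioning signal close to what the pretrained network has already seen, which plausibly eases subsequent training.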

Appendix D Certified Accuracy with Fine-grained $\epsilon$

We present the detailed certified accuracy with fine-grained radius thresholds $\epsilon$ in Figure 3.

Figure 3: Certified accuracy on CIFAR-10 (left) and ImageNet-64 (right). Each line shows the certified accuracy as a function of the $\ell_2$ perturbation bound for a different Gaussian noise level.