Improving Diffusion Inverse Problem Solving
with Decoupled Noise Annealing

Bingliang Zhang∗,¹ Wenda Chu∗,¹ Julius Berner¹
Chenlin Meng² Anima Anandkumar¹ Yang Song³
¹California Institute of Technology ²Stanford University ³OpenAI

Abstract

Diffusion models have recently achieved success in solving Bayesian inverse problems with learned data priors. Current methods build on top of the diffusion sampling process, where each denoising step makes small modifications to samples from the previous step. However, this process struggles to correct errors from earlier sampling steps, leading to worse performance in complicated nonlinear inverse problems, such as phase retrieval. To address this challenge, we propose a new method called Decoupled Annealing Posterior Sampling (DAPS) that relies on a novel noise annealing process. Specifically, we decouple consecutive steps in a diffusion sampling trajectory, allowing them to vary considerably from one another while ensuring their time-marginals anneal to the true posterior as we reduce noise levels. This approach enables the exploration of a larger solution space, improving the success rate for accurate reconstructions. We demonstrate that DAPS significantly improves sample quality and stability across multiple image restoration tasks, particularly in complicated nonlinear inverse problems. For example, we achieve a PSNR of 30.72dB on the FFHQ 256 dataset for phase retrieval, which is an improvement of 9.12dB compared to existing methods. Our code is available at the GitHub repository DAPS.

∗ These authors contributed equally to this work.

1 Introduction

Inverse problems are ubiquitous in science and engineering, with applications ranging from image restoration [1, 2, 3, 4, 5, 6] and medical imaging [7, 8, 9, 10, 11, 12] to astrophotography [13, 14, 15, 16]. Solving an inverse problem involves finding the underlying signal ${\mathbf{x}}_{0}$ from its partial, noisy measurement ${\mathbf{y}}$. Since the measurement process is typically noisy and many-to-one, inverse problems do not have a unique solution; instead, multiple solutions may exist that are consistent with the observed measurement. In the Bayesian inverse problem framework, the solution space is characterized by the posterior distribution $p({\mathbf{x}}_{0}\mid{\mathbf{y}})\propto p({\mathbf{y}}\mid{\mathbf{x}}_{0})p({\mathbf{x}}_{0})$, where $p({\mathbf{y}}\mid{\mathbf{x}}_{0})$ represents the noisy measurement process, and $p({\mathbf{x}}_{0})$ is the prior distribution. In this work, we aim to solve Bayesian inverse problems where the measurement process $p({\mathbf{y}}\mid{\mathbf{x}}_{0})$ is known, and the prior distribution $p({\mathbf{x}}_{0})$ is captured by a deep generative model trained on a corresponding dataset.

As score-based diffusion models [17, 18, 19, 1, 20, 21] have risen to dominance in modeling high-dimensional data distributions like images, audio, and video, they have become the leading method for estimating the prior distribution $p({\mathbf{x}}_{0})$ in Bayesian inverse problems. A diffusion model generates a sample ${\mathbf{x}}_{0}$ by smoothly removing noise from an unstructured initial noise sample ${\mathbf{x}}_{T}$ through solving stochastic differential equations (SDEs). In particular, each step of the sampling process recursively converts a noisy sample ${\mathbf{x}}_{t+\Delta t}$ , where $\Delta t>0$ denotes the step size, to a slightly less noisy sample ${\mathbf{x}}_{t}$ until $t=0$ . This iterative structure in the diffusion sampling process can be leveraged to facilitate Bayesian inverse problem solving. In fact, prior research [3, 22, 23] has shown that given measurement ${\mathbf{y}}$ and a diffusion model prior $p({\mathbf{x}}_{0})$ , we can sample from the posterior distribution $p({\mathbf{x}}_{0}\mid{\mathbf{y}})$ by perturbing a reverse-time SDE using an approximate gradient of the measurement process, $\nabla_{{\mathbf{x}}_{t}}\log p({\mathbf{y}}\mid{\mathbf{x}}_{t})$ , at every step of the SDE solver.

Despite the remarkable success of these methods in solving many real-world inverse problems, like image colorization [1, 2, 24], image super-resolution [1, 2, 3, 22], computed tomography [8, 25], and magnetic resonance imaging [8, 7], they face significant challenges in more complex inverse problems with nonlinear measurement processes, such as phase retrieval and nonlinear motion deblurring. This is partially because, in prior methods, each denoising step approximately samples from the distribution $p({\mathbf{x}}_{t}\mid{\mathbf{x}}_{t+\Delta t},{\mathbf{y}})$ . This causes ${\mathbf{x}}_{t}$ and ${\mathbf{x}}_{t+\Delta t}$ to be close to each other because of using a small step size $\Delta t$ in discretizing the reverse-time SDE. As a result, ${\mathbf{x}}_{t}$ can at most correct local errors in ${\mathbf{x}}_{t+\Delta t}$ but struggles to correct global errors that require significant modifications to the prior sample. This issue is exacerbated when the methods are applied to complicated, nonlinear inverse problems, such as phase retrieval, where they often converge to undesired samples that are consistent with the measurement but reside in low-probability areas of the prior distribution.

To address this challenge, we propose a new framework for solving general inverse problems, termed Decoupled Annealing Posterior Sampling (DAPS). Our method employs a new noise annealing process inspired by the diffusion sampling process, where we decouple the consecutive samples ${\mathbf{x}}_{t+\Delta t}$ and ${\mathbf{x}}_{t}$ in the sampling trajectory, allowing samplers to correct large, global errors made in earlier steps. Instead of repetitively sampling from $p({\mathbf{x}}_{t}\mid{\mathbf{x}}_{t+\Delta t},{\mathbf{y}})$ as in previous methods, which restricts the distances between consecutive samples ${\mathbf{x}}_{t+\Delta t}$ and ${\mathbf{x}}_{t}$ , DAPS recursively samples from the marginal distribution $p({\mathbf{x}}_{t}\mid{\mathbf{y}})$ . As illustrated in Fig. 2, we factorize the time marginal $p({\mathbf{x}}_{t}\mid{\mathbf{y}})$ into three conditional distributions and sample from them in turn by solving the reverse diffusion process, simulating Langevin dynamics, and adding noise according to the forward diffusion process. We show that this creates approximate samples from corresponding marginal distributions. As the noise gradually anneals to zero, the time marginal $p({\mathbf{x}}_{t}\mid{\mathbf{y}})$ smoothly converges to the posterior distribution $p({\mathbf{x}}_{0}\mid{\mathbf{y}})$ , providing samples that approximately solve the Bayesian inverse problem.

Empirically, our method demonstrates significantly improved performance on various inverse problems compared to existing approaches, including DPS [3], DDRM [2], PnP-ADMM [26], PSLD [27], ReSample [25], and RED-diff [28]. Our method can be combined with diffusion models in both raw pixel space and learned latent space, which we refer to as DAPS and LatentDAPS, respectively. As shown in Fig. 1, both methods provide superior reconstructions with improved visual perceptual quality across a wide range of nonlinear inverse problems. Our approach exhibits remarkable stability and sampling quality, particularly for challenging nonlinear inverse problems. On the FFHQ 256 dataset, we achieve a 30.72dB PSNR for noisy phase retrieval, which is 9.12dB higher than all existing methods, while on the ImageNet dataset, which includes a broader range of classes, we achieve a 25.78dB PSNR for phase retrieval, surpassing others by 5.24dB. Additionally, DAPS performs well even with a very small number of neural network evaluations (approximately 100), striking a better balance between efficiency and sample quality.

2 Background

2.1 Diffusion Models

Diffusion models [17, 18, 19, 1, 20] generate data by reversing a predefined noising process. Let the data distribution be $p({\mathbf{x}}_{0})$ . We can define a series of noisy data distributions $p({\mathbf{x}};\sigma)$ by adding Gaussian noise with a standard deviation of $\sigma$ to the data. These form the time-marginals of a stochastic differential equation (SDE) [20], given by

\mathrm{d}{\mathbf{x}}_{t}=\sqrt{2\dot{\sigma}_{t}\sigma_{t}}\,\mathrm{d}{\mathbf{w}}_{t},

(1)

where $\sigma_{t}$ is the predefined noise schedule with $\sigma_{0}=0$ and $\sigma_{T}=\sigma_{\max}$ , $\dot{\sigma}_{t}$ is the time derivative of $\sigma_{t}$ , and ${\mathbf{w}}_{t}$ is a standard Wiener process. We use ${\mathbf{x}}_{t}$ interchangeably with ${\mathbf{x}}_{\sigma_{t}}$ . For a sufficiently large $\sigma_{\max}$ , the distribution $p({\mathbf{x}};\sigma_{\max})$ converges to pure Gaussian noise $\mathcal{N}({\bm{0}},\sigma_{\max}^{2}{\bm{I}})$ .
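As a quick sanity check on Eq. 1 (a standard Itô-isometry computation added here for illustration, assuming $\sigma_{t}$ is nondecreasing so that the square root is real), integrating the squared diffusion coefficient recovers the stated noise level:

```latex
\mathrm{Var}[{\mathbf{x}}_{t}\mid{\mathbf{x}}_{0}]
  = \int_{0}^{t} 2\dot{\sigma}_{s}\sigma_{s}\,\mathrm{d}s\;{\bm{I}}
  = \big(\sigma_{t}^{2}-\sigma_{0}^{2}\big){\bm{I}}
  = \sigma_{t}^{2}{\bm{I}},
\qquad\text{so}\qquad
{\mathbf{x}}_{t}\mid{\mathbf{x}}_{0}\sim\mathcal{N}({\mathbf{x}}_{0},\sigma_{t}^{2}{\bm{I}}).
```

This is why a sample at noise level $\sigma_{t}$ can be drawn directly as ${\mathbf{x}}_{0}$ plus Gaussian noise of standard deviation $\sigma_{t}$, without simulating the SDE.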

To sample from data distribution $p({\mathbf{x}}_{0})$ , we first draw an initial sample from $\mathcal{N}({\bm{0}},\sigma_{\max}^{2}{\bm{I}})$ , then solve the reverse SDE

\mathrm{d}{\mathbf{x}}_{t}=-2\dot{\sigma}_{t}\sigma_{t}\nabla_{{\mathbf{x}}_{t}}\log p({\mathbf{x}}_{t};\sigma_{t})\,\mathrm{d}t+\sqrt{2\dot{\sigma}_{t}\sigma_{t}}\,\mathrm{d}{\mathbf{w}}_{t},

(2)

where $\nabla_{{\mathbf{x}}_{t}}\log p({\mathbf{x}}_{t};\sigma_{t})$ is the time-dependent score function [17, 1]. Here $\nabla_{{\mathbf{x}}_{t}}\log p({\mathbf{x}}_{t};\sigma_{t})$ is unknown, but can be approximated by training a diffusion model ${\bm{s}}_{\bm{\theta}}({\mathbf{x}}_{t},\sigma_{t})$ such that ${\bm{s}}_{\bm{\theta}}({\mathbf{x}}_{t},\sigma_{t})\approx\nabla_{{\mathbf{x}}_{t}}\log p({\mathbf{x}}_{t};\sigma_{t})$.
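To make the sampling procedure concrete, the following is a minimal sketch of an Euler-Maruyama discretization of Eq. 2. It is not the sampler used in the experiments: it assumes the schedule $\sigma_{t}=t$ (so $\dot{\sigma}_{t}=1$), a one-dimensional Gaussian data distribution whose score is known in closed form, and uniform time steps. The names `gaussian_score` and `reverse_sde_sample` are illustrative.

```python
import numpy as np

def gaussian_score(x, sigma, mean=2.0, std=0.5):
    # Exact score of p(x; sigma) = N(mean, std^2 + sigma^2); this stands in
    # for a trained network s_theta(x, sigma) in this toy setting.
    return -(x - mean) / (std**2 + sigma**2)

def reverse_sde_sample(n_samples=20000, sigma_max=10.0, n_steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    # Assumed schedule sigma_t = t, integrated from t = sigma_max down to 0.
    ts = np.linspace(sigma_max, 0.0, n_steps + 1)
    x = sigma_max * rng.standard_normal(n_samples)  # x_T ~ N(0, sigma_max^2)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        dt = t_cur - t_next                 # positive step size
        g2 = 2.0 * t_cur                    # 2 * sigma_dot * sigma
        noise = rng.standard_normal(n_samples)
        # Running Eq. 2 backward in time flips the sign of the drift term.
        x = x + g2 * gaussian_score(x, t_cur) * dt + np.sqrt(g2 * dt) * noise
    return x
```

With these settings the returned samples should have mean and standard deviation close to the data distribution's 2.0 and 0.5, up to discretization and Monte Carlo error.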

2.2 Bayesian Inverse Problems with Diffusion Priors

Inverse problems aim to recover data from partial, potentially noisy measurements. Formally, solving an inverse problem involves finding the inversion to a forward model that describes the measurement process. In general, a forward model takes the form of

{\mathbf{y}}=\mathcal{A}({\mathbf{x}}_{0})+{\mathbf{n}},

(3)

where $\mathcal{A}$ is the measurement function, ${\mathbf{x}}_{0}$ represents the original data, ${\mathbf{y}}$ is the observed measurement, and ${\mathbf{n}}$ symbolizes the noise in the measurement process, often modeled as ${\mathbf{n}}\sim\mathcal{N}({\bm{0}},\beta_{{\mathbf{y}}}^{2}{\bm{I}})$. In a Bayesian framework, ${\mathbf{x}}_{0}$ comes from the posterior distribution $p({\mathbf{x}}_{0}\mid{\mathbf{y}})\propto p({\mathbf{x}}_{0})p({\mathbf{y}}\mid{\mathbf{x}}_{0})$. Here $p({\mathbf{x}}_{0})$ is a prior distribution that can be estimated from a given dataset, and $p({\mathbf{y}}\mid{\mathbf{x}}_{0})=\mathcal{N}(\mathcal{A}({\mathbf{x}}_{0}),\beta_{{\mathbf{y}}}^{2}{\bm{I}})$ models the noisy measurement process.
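The forward model in Eq. 3 and the Gaussian likelihood above can be sketched in a few lines. This is an illustrative sketch only: `forward_op` is a hypothetical callable standing in for $\mathcal{A}$, and the log-likelihood is returned up to an additive constant.

```python
import numpy as np

def measure(x0, forward_op, beta_y, rng):
    # y = A(x0) + n with n ~ N(0, beta_y^2 I), as in Eq. 3.
    clean = forward_op(x0)
    return clean + beta_y * rng.standard_normal(clean.shape)

def log_likelihood(y, x0, forward_op, beta_y):
    # log p(y | x0) for isotropic Gaussian noise, dropping the
    # normalization constant that does not depend on x0.
    residual = forward_op(x0) - y
    return -0.5 * np.sum(residual**2) / beta_y**2
```

Only the residual $\mathcal{A}({\mathbf{x}}_{0})-{\mathbf{y}}$ enters the likelihood, so any sampler that needs $\log p({\mathbf{y}}\mid{\mathbf{x}}_{0})$ or its gradient requires nothing beyond the ability to evaluate (and differentiate) $\mathcal{A}$.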

When the prior $p({\mathbf{x}}_{0})$ is modeled by a pre-trained diffusion model, we can modify Eq. 2 to approximately sample from the posterior distribution following Bayes’ rule, i.e.,

\mathrm{d}{\mathbf{x}}_{t}=-2\dot{\sigma}_{t}\sigma_{t}\Big(\nabla_{{\mathbf{x}}_{t}}\log p({\mathbf{x}}_{t};\sigma_{t})+\nabla_{{\mathbf{x}}_{t}}\log p({\mathbf{y}}\mid{\mathbf{x}}_{t})\Big)\,\mathrm{d}t+\sqrt{2\dot{\sigma}_{t}\sigma_{t}}\,\mathrm{d}{\mathbf{w}}_{t}.

(4)

Here, the gradient of the noisy log-likelihood, $\nabla_{{\mathbf{x}}_{t}}\log p({\mathbf{y}}\mid{\mathbf{x}}_{t})$, is generally intractable. Multiple methods have been proposed to estimate it [8, 29, 22, 23, 30, 31, 32, 6]. One predominant approach is the DPS algorithm [3], which estimates $p({\mathbf{y}}\mid{\mathbf{x}}_{t})\approx p({\mathbf{y}}\mid{\mathbf{x}}_{0}=\mathbb{E}[{\mathbf{x}}_{0}\mid{\mathbf{x}}_{t}])$. Another line of work [2, 33, 31] solves linear inverse problems by running the reverse diffusion process in the spectral domain via singular value decomposition (SVD). Other approaches bypass direct computation of this likelihood by interleaving optimization [25, 34, 35, 36, 37] or projection [2, 9, 7, 31] steps with normal diffusion sampling steps.
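For reference, the posterior mean that appears in the DPS approximation can be obtained from the score via Tweedie's formula. The identity below is standard for the noising convention of Eq. 1 and is stated here for illustration; the final approximation simply substitutes the learned score ${\bm{s}}_{\bm{\theta}}$.

```latex
\mathbb{E}[{\mathbf{x}}_{0}\mid{\mathbf{x}}_{t}]
  = {\mathbf{x}}_{t}+\sigma_{t}^{2}\,\nabla_{{\mathbf{x}}_{t}}\log p({\mathbf{x}}_{t};\sigma_{t})
  \approx {\mathbf{x}}_{t}+\sigma_{t}^{2}\,{\bm{s}}_{\bm{\theta}}({\mathbf{x}}_{t},\sigma_{t}).
```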

Despite promising empirical success, we find that this line of approaches faces challenges in solving more difficult inverse problems when the forward model is highly nonlinear. Accurately solving the reverse SDE in Eq. 4 requires the solver to take a very small step size $\Delta t>0$ , causing ${\mathbf{x}}_{t}$ and ${\mathbf{x}}_{t+\Delta t}$ to be very close to each other. Consequently, ${\mathbf{x}}_{t}$ can only correct minor errors in ${\mathbf{x}}_{t+\Delta t}$ , but oftentimes fails to address larger, global errors that require substantial changes to ${\mathbf{x}}_{t+\Delta t}$ . One such failure case is given in Fig. 4.

3 Method

3.1 Posterior Sampling with Decoupled Noise Annealing

Instead of solving the reverse-time SDE in Eq. 4, we propose a new noise annealing process that reduces the dependency between samples at consecutive time steps, as illustrated in Figs. 2, 3(a) and 3(b). Unlike previous methods, we ensure ${\mathbf{x}}_{t}$ and ${\mathbf{x}}_{t+\Delta t}$ are conditionally independent given ${\mathbf{x}}_{0}$. To generate sample ${\mathbf{x}}_{t}$ from ${\mathbf{x}}_{t+\Delta t}$, we follow a two-step procedure: (1) sampling ${\mathbf{x}}_{0\mid{\mathbf{y}}}\sim p({\mathbf{x}}_{0}\mid{\mathbf{x}}_{t+\Delta t},{\mathbf{y}})$, and (2) sampling ${\mathbf{x}}_{t}\sim\mathcal{N}({\mathbf{x}}_{0\mid{\mathbf{y}}},\sigma_{t}^{2}{\bm{I}})$. We repeat this process, gradually reducing noise until ${\mathbf{x}}_{0}$ is sampled. We call this process decoupled noise annealing, which is justified by the proposition below.

Proposition 1.

Suppose ${\mathbf{x}}_{t_{1}}$ is sampled from the time-marginal $p({\mathbf{x}}_{t_{1}}\mid{\mathbf{y}})$ , then

{\mathbf{x}}_{t_{2}}\sim\mathbb{E}_{{\mathbf{x}}_{0}\sim p({\mathbf{x}}_{0}\mid{\mathbf{x}}_{t_{1}},{\mathbf{y}})}[\mathcal{N}({\mathbf{x}}_{0},\sigma_{t_{2}}^{2}{\bm{I}})]

(5)

satisfies the time-marginal $p({\mathbf{x}}_{t_{2}}\mid{\mathbf{y}})$ .

Remark. The key idea of Proposition 1 is to enable the sampling from $p({\mathbf{x}}_{t_{2}}\mid{\mathbf{y}})$ for any noise level $\sigma_{t_{2}}$ given any sample ${\mathbf{x}}_{t_{1}}$ at another noise level $\sigma_{t_{1}}$. For a sufficiently large $\sigma_{T}$, one can assume $p({\mathbf{x}}_{T}\mid{\mathbf{y}})\approx p({\mathbf{x}}_{T};\sigma_{T})\approx\mathcal{N}(\bm{0},\sigma_{T}^{2}{\bm{I}})$. Starting from ${\mathbf{x}}_{T}$, we can iteratively sample from $p({\mathbf{x}}_{t}\mid{\mathbf{y}})$ with $\sigma_{t}$ annealed down from $\sigma_{T}$ to 0.
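A short argument for why Proposition 1 holds can be written as follows. This is a sketch added for illustration and may differ from the formal proof; it uses only that ${\mathbf{x}}_{t_{2}}$ is conditionally independent of ${\mathbf{y}}$ and ${\mathbf{x}}_{t_{1}}$ given ${\mathbf{x}}_{0}$, with $p({\mathbf{x}}_{t_{2}}\mid{\mathbf{x}}_{0})=\mathcal{N}({\mathbf{x}}_{0},\sigma_{t_{2}}^{2}{\bm{I}})$.

```latex
\begin{aligned}
p({\mathbf{x}}_{t_{2}}\mid{\mathbf{y}})
 &= \int p({\mathbf{x}}_{t_{2}}\mid{\mathbf{x}}_{0})\,p({\mathbf{x}}_{0}\mid{\mathbf{y}})\,\mathrm{d}{\mathbf{x}}_{0} \\
 &= \int p({\mathbf{x}}_{t_{2}}\mid{\mathbf{x}}_{0})
    \left[\int p({\mathbf{x}}_{0}\mid{\mathbf{x}}_{t_{1}},{\mathbf{y}})\,
               p({\mathbf{x}}_{t_{1}}\mid{\mathbf{y}})\,\mathrm{d}{\mathbf{x}}_{t_{1}}\right]\mathrm{d}{\mathbf{x}}_{0} \\
 &= \mathbb{E}_{{\mathbf{x}}_{t_{1}}\sim p({\mathbf{x}}_{t_{1}}\mid{\mathbf{y}})}\,
    \mathbb{E}_{{\mathbf{x}}_{0}\sim p({\mathbf{x}}_{0}\mid{\mathbf{x}}_{t_{1}},{\mathbf{y}})}
    \big[\mathcal{N}({\mathbf{x}}_{t_{2}};{\mathbf{x}}_{0},\sigma_{t_{2}}^{2}{\bm{I}})\big].
\end{aligned}
```

The last line is exactly the two-stage sampling of Eq. 5 applied to a draw ${\mathbf{x}}_{t_{1}}\sim p({\mathbf{x}}_{t_{1}}\mid{\mathbf{y}})$.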

Algorithm 1 Decoupled Annealing Posterior Sampling

Require: Score model ${\bm{s}}_{{\bm{\theta}}}$, measurement ${\mathbf{y}}$, noise schedule $\sigma_{t}$, $(t_{i})_{i\in\{0,\dots,N_{A}\}}$
Sample ${\mathbf{x}}_{T}\sim\mathcal{N}({\bm{0}},\sigma_{T}^{2}{\bm{I}})$
for $i=N_{A},N_{A}-1,\dots,1$
    Compute $\hat{\mathbf{x}}_{0}^{(0)}=\hat{\mathbf{x}}_{0}({\mathbf{x}}_{t_{i}})$ by solving the probability flow ODE in Eq. 39 with ${\bm{s}}_{\bm{\theta}}$
    for $j=0,\dots,N-1$
        $\hat{\mathbf{x}}_{0}^{(j+1)}\leftarrow\hat{\mathbf{x}}_{0}^{(j)}+\eta_{t}\Big(\nabla_{\hat{\mathbf{x}}_{0}}\log p(\hat{\mathbf{x}}_{0}^{(j)}|{\mathbf{x}}_{t_{i}})+\nabla_{\hat{\mathbf{x}}_{0}}\log p({\mathbf{y}}|\hat{\mathbf{x}}_{0}^{(j)})\Big)+\sqrt{2\eta_{t}}\bm{\epsilon}_{j}$, $\ \bm{\epsilon}_{j}\sim\mathcal{N}({\bm{0}},{\bm{I}})$
    end for
    Sample ${\mathbf{x}}_{t_{i-1}}\sim\mathcal{N}(\hat{\mathbf{x}}_{0}^{(N)},\sigma_{t_{i-1}}^{2}{\bm{I}})$
end for
Return ${\mathbf{x}}_{0}$
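The structure of Algorithm 1 can be illustrated on a toy problem where the true posterior is known. The sketch below is not the paper's implementation and rests on several simplifying assumptions: a scalar prior ${\mathbf{x}}_{0}\sim\mathcal{N}(0,1)$, an identity measurement function, and the exact Gaussian $p({\mathbf{x}}_{0}\mid{\mathbf{x}}_{t})$ (mean $x_{t}/(1+\sigma_{t}^{2})$, variance $\sigma_{t}^{2}/(1+\sigma_{t}^{2})$) substituted for the ODE-based estimate and heuristic variance. The step-size rule for `eta` and the function name `daps_toy` are likewise illustrative choices.

```python
import numpy as np

def daps_toy(y, beta_y=0.5, sigma_max=10.0, sigma_min=0.01,
             n_anneal=30, n_langevin=100, n_chains=5000, seed=0):
    rng = np.random.default_rng(seed)
    # Decreasing noise levels, ending at exactly zero.
    sigmas = np.append(np.geomspace(sigma_max, sigma_min, n_anneal), 0.0)
    x_t = sigma_max * rng.standard_normal(n_chains)      # x_T ~ N(0, sigma_T^2)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        # Exact p(x0 | x_t) for the N(0, 1) prior (toy substitute for the
        # probability-flow-ODE estimate and heuristic variance r_t^2).
        x0_hat = x_t / (1.0 + sigma**2)
        r2 = sigma**2 / (1.0 + sigma**2)
        # Step size scaled to the target's precision (hypothetical rule).
        eta = 0.05 / (1.0 / r2 + 1.0 / beta_y**2)
        x0 = x0_hat.copy()
        for _ in range(n_langevin):
            # grad log p(x0 | x_t) + grad log p(y | x0), with A = identity.
            grad = -(x0 - x0_hat) / r2 - (x0 - y) / beta_y**2
            x0 = x0 + eta * grad + np.sqrt(2.0 * eta) * rng.standard_normal(n_chains)
        # Decoupled step: re-noise the x0 sample to the next noise level.
        x_t = x0 + sigma_next * rng.standard_normal(n_chains)
    return x_t
```

For `y = 1.0` and `beta_y = 0.5`, the analytic posterior is $\mathcal{N}(0.8, 0.2)$, so the returned samples should have mean near 0.8 and standard deviation near $\sqrt{0.2}\approx 0.447$, up to Langevin discretization bias and Monte Carlo error. Note that each outer iteration draws a fresh ${\mathbf{x}}_{t}$ from the current ${\mathbf{x}}_{0}$ sample rather than from the previous noisy iterate.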

The first step of our decoupled noise annealing requires sampling ${\mathbf{x}}_{0\mid{\mathbf{y}}}\sim p({\mathbf{x}}_{0}\mid{\mathbf{x}}_{t},{\mathbf{y}})$ where ${\mathbf{x}}_{t}$ and ${\mathbf{y}}$ are known. Since ${\mathbf{y}}$ is conditionally independent of ${\mathbf{x}}_{t}$ given ${\mathbf{x}}_{0}$, we can deduce from Bayes’ rule that

p({\mathbf{x}}_{0}\mid{\mathbf{x}}_{t},{\mathbf{y}})=\frac{p({\mathbf{x}}_{0}\mid{\mathbf{x}}_{t})p({\mathbf{y}}\mid{\mathbf{x}}_{0},{\mathbf{x}}_{t})}{p({\mathbf{y}}\mid{\mathbf{x}}_{t})}\propto p({\mathbf{x}}_{0}\mid{\mathbf{x}}_{t})p({\mathbf{y}}\mid{\mathbf{x}}_{0}).

(6)

To sample ${\mathbf{x}}_{0\mid{\mathbf{y}}}$ from this unnormalized distribution, we propose to run Langevin dynamics [38], given by

{\mathbf{x}}_{0}^{(j+1)}={\mathbf{x}}_{0}^{(j)}+\eta\cdot\left(\nabla_{{\mathbf{x}}_{0}^{(j)}}\log p({\mathbf{x}}_{0}^{(j)}\mid{\mathbf{x}}_{t})+\nabla_{{\mathbf{x}}_{0}^{(j)}}\log p({\mathbf{y}}\mid{\mathbf{x}}_{0}^{(j)})\right)+\sqrt{2\eta}\bm{\epsilon}_{j},

(7)

where $\eta>0$ is the step size and $\bm{\epsilon}_{j}\sim\mathcal{N}({\bm{0}},{\bm{I}})$ . When $\eta\to 0$ and $j\to\infty$ , the sample ${\mathbf{x}}_{0}^{(j)}$ will be approximately distributed according to $p({\mathbf{x}}_{0}\mid{\mathbf{x}}_{t},{\mathbf{y}})$ .

Following previous works [3, 22, 23], we approximate the conditional distribution $p({\mathbf{x}}_{0}\mid{\mathbf{x}}_{t})$ with a Gaussian distribution

p({\mathbf{x}}_{0}\mid{\mathbf{x}}_{t})\approx\mathcal{N}({\mathbf{x}}_{0};\hat{{\mathbf{x}}}_{0}({\mathbf{x}}_{t}),r_{t}^{2}{\bm{I}}),

(8)

where $\hat{{\mathbf{x}}}_{0}({\mathbf{x}}_{t})$ is an estimator of ${\mathbf{x}}_{0}$ given ${\mathbf{x}}_{t}$, and the variance $r_{t}^{2}$ is specified using heuristics. Given a pre-trained diffusion model ${\bm{s}}_{\bm{\theta}}({\mathbf{x}}_{t},t)$, one way to compute $\hat{{\mathbf{x}}}_{0}({\mathbf{x}}_{t})$ is to solve the (unconditional) probability flow ODE starting at ${\mathbf{x}}_{t}$. Details are given in Section D.2. When the measurement noise is an isotropic Gaussian, i.e., ${\mathbf{n}}\sim\mathcal{N}({\bm{0}},\beta_{\mathbf{y}}^{2}{\bm{I}})$, the update rule simplifies to:

{\mathbf{x}}_{0}^{(j+1)}={\mathbf{x}}_{0}^{(j)}-\eta\cdot\nabla_{{\mathbf{x}}_{0}^{(j)}}\left(\frac{\|{\mathbf{x}}_{0}^{(j)}-\hat{{\mathbf{x}}}_{0}({\mathbf{x}}_{t})\|^{2}}{2r_{t}^{2}}+\frac{\|{\mathcal{A}}({\mathbf{x}}_{0}^{(j)})-{\mathbf{y}}\|^{2}}{2\beta_{\mathbf{y}}^{2}}\right)+\sqrt{2\eta}\bm{\epsilon}_{j}.

(9)
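Expanding the gradient in Eq. 9 makes the two competing terms explicit. In the sketch below, ${\bm{J}}_{\mathcal{A}}$ denotes the Jacobian of $\mathcal{A}$, a notation introduced here for illustration; in practice this term would typically be obtained by automatic differentiation rather than formed explicitly.

```latex
\nabla_{{\mathbf{x}}_{0}}\left(
  \frac{\|{\mathbf{x}}_{0}-\hat{{\mathbf{x}}}_{0}({\mathbf{x}}_{t})\|^{2}}{2r_{t}^{2}}
  +\frac{\|\mathcal{A}({\mathbf{x}}_{0})-{\mathbf{y}}\|^{2}}{2\beta_{\mathbf{y}}^{2}}\right)
 = \frac{{\mathbf{x}}_{0}-\hat{{\mathbf{x}}}_{0}({\mathbf{x}}_{t})}{r_{t}^{2}}
  +\frac{{\bm{J}}_{\mathcal{A}}({\mathbf{x}}_{0})^{\top}\big(\mathcal{A}({\mathbf{x}}_{0})-{\mathbf{y}}\big)}{\beta_{\mathbf{y}}^{2}}.
```

The first term pulls the iterate toward the denoised estimate with strength $1/r_{t}^{2}$, and the second pulls it toward measurement consistency with strength $1/\beta_{\mathbf{y}}^{2}$. For a linear forward model $\mathcal{A}({\mathbf{x}}_{0})={\bm{A}}{\mathbf{x}}_{0}$, the second term reduces to ${\bm{A}}^{\top}({\bm{A}}{\mathbf{x}}_{0}-{\mathbf{y}})/\beta_{\mathbf{y}}^{2}$.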

Combining Eq. 9 with Proposition 1, we define an algorithm called Decoupled Annealing Posterior Sampling (DAPS), summarized in Algorithm 1. In particular, given a noise schedule $\sigma_{t}$ and a time discretization $\{t_{i},i=0,\dots,N_{A}\}$ , we iteratively sample ${\mathbf{x}}_{t_{i}}$ from the measurement-conditioned time-marginal $p({\mathbf{x}}_{t_{i}}\mid{\mathbf{y}})$ for $i=N_{A},N_{A}-1,\dots,0$ , following Eq. 5 in Proposition 1. This algorithm provides an approximate sample ${\mathbf{x}}_{0}$ from the posterior distribution $p({\mathbf{x}}_{0}\mid{\mathbf{y}})$ .

The computational cost of Langevin dynamics mostly originates from evaluating the measurement function $\mathcal{A}$ . In most image restoration tasks, this operation is significantly more efficient compared to evaluating the diffusion model. As demonstrated in Section C.1, Langevin dynamics introduce only a small overhead to the sampling process. Moreover, this framework can be adapted to sampling with pre-trained latent diffusion models by factorizing the probabilistic graphical model for the latent diffusion process, as shown in Fig. 3(b). Further details are provided in Appendix A.

3.2 Discussion and Connection with Existing Methods

Comparison with other posterior sampling methods. Unlike most previous algorithms, our sampling method does not solve a specific SDE/ODE; instead, we recursively sample from the time-marginal $p({\mathbf{x}}_{t}\mid{\mathbf{y}})$ with noise annealing to zero. We therefore decouple the dependency of ${\mathbf{x}}_{t}$ and ${\mathbf{x}}_{t+\Delta t}$ in the sampling process. We argue that decoupling helps correct the errors accumulated in early diffusion sampling steps by allowing non-local transitions. This is particularly important when the measurement function is nonlinear. As a concrete example, Fig. 4 compares DAPS and DPS when solving a 2D nonlinear inverse problem with a Gaussian mixture prior. We visualize the trajectories of ${\mathbf{x}}_{t}$ from time $T$ to $0$ for both methods. For DAPS, points on the trajectory have significantly larger variations compared to those of DPS. As a result, DPS converges to wrong solutions, but DAPS is able to approximate the true posterior distribution. We provide more discussions in Appendix E.

Comparison with optimization-based methods. Although our method does not directly involve inner-loop optimization, we highlight that it has connections with existing optimization-based solvers for inverse problems. For example, ReSample [25] alternates between denoising, optimizing, and resampling to solve inverse problems using latent diffusion models. Our method also resembles ReSample [25] if we let the standard deviation $\beta_{\mathbf{y}}\to 0$ and assume $\eta\beta_{\mathbf{y}}^{2}$ is constant in Eq. 12.

4 Experiments

4.1 Experimental Setup

We evaluate our method using both pixel-space and latent diffusion models. For pixel-based diffusion experiments, we leverage the pre-trained diffusion models trained by [3] on the FFHQ dataset and the pre-trained model from [39] on the ImageNet dataset. For latent diffusion models, we use the same pre-trained models as [25]: the unconditional LDM-VQ4 trained on FFHQ and ImageNet by [40]. The autoencoder’s downsample factor is $4$ . We use the same time step discretization and noise schedule as EDM [20].
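For readers unfamiliar with the EDM discretization [20], the sketch below reproduces its polynomial noise schedule. The defaults shown ($\sigma_{\min}=0.002$, $\sigma_{\max}=80$, $\rho=7$) are the values published with EDM and are an assumption here; the values used in our experiments are listed with the other hyperparameters in Appendix D and may differ.

```python
import numpy as np

def edm_sigmas(n_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    # Interpolate linearly in sigma^(1/rho) space from sigma_max to sigma_min,
    # then raise back to the power rho (EDM time step discretization).
    ramp = np.linspace(0.0, 1.0, n_steps)
    inv_rho_max = sigma_max ** (1.0 / rho)
    inv_rho_min = sigma_min ** (1.0 / rho)
    return (inv_rho_max + ramp * (inv_rho_min - inv_rho_max)) ** rho
```

Larger `rho` concentrates more of the steps at low noise levels, where fine image detail is resolved.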

As mentioned in Section 3, we implement a few-step Euler ODE solver to compute $\hat{\mathbf{x}}_{0}({\mathbf{x}}_{t})$ , while maintaining the same number of neural function evaluations (NFE) for different noise levels. In our experiments, we use DAPS-1K for all linear tasks and DAPS-4K for all nonlinear tasks. DAPS-1K uses $4$ ODE solver NFE and $250$ annealing scheduling steps, while DAPS-4K uses $10$ ODE solver NFE and $400$ annealing scheduling steps. Ablation studies on their effects are shown in Section 4.3. The corresponding latent version, LatentDAPS, follows the same settings but performs sampling in the latent space of VAEs. We use $100$ Langevin steps per denoising iteration and tune learning rates separately for each task. Further details on model configurations, samplers, and other hyperparameters are provided in Appendix D, along with more discussion on sampling efficiency in Section C.1.
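A few-step Euler solver of this kind can be sketched as follows. The sketch assumes the probability flow ODE in the EDM form $\mathrm{d}{\mathbf{x}}/\mathrm{d}\sigma=-\sigma\nabla_{{\mathbf{x}}}\log p({\mathbf{x}};\sigma)$ and a uniform grid in $\sigma$; both are illustrative assumptions, and `score_fn` is a hypothetical stand-in for ${\bm{s}}_{\bm{\theta}}$. A closed-form Gaussian score is included only so the example runs on its own.

```python
import numpy as np

def euler_pf_ode(x_t, sigma_t, score_fn, n_steps):
    # Integrate dx/dsigma = -sigma * score(x, sigma) from sigma_t down to 0.
    # Each step costs one score evaluation, so the call uses n_steps NFE.
    sigmas = np.linspace(sigma_t, 0.0, n_steps + 1)
    x = np.asarray(x_t, dtype=float)
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        drift = -s_cur * score_fn(x, s_cur)   # dx/dsigma at the current point
        x = x + (s_next - s_cur) * drift      # Euler step toward lower noise
    return x

def gaussian_score(x, sigma):
    # Exact score when the data distribution is N(0, 1); toy stand-in only.
    return -x / (1.0 + sigma**2)
```

With `n_steps=1` the update reduces to ${\mathbf{x}}_{t}+\sigma_{t}^{2}\nabla\log p({\mathbf{x}}_{t};\sigma_{t})$, i.e., the Tweedie posterior-mean denoiser. In terms of cost, the configurations above correspond to $4\times 250=1000$ diffusion model evaluations for DAPS-1K and $10\times 400=4000$ for DAPS-4K.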

Datasets and metrics. Adopting the previous convention, we test our method on two image datasets, FFHQ $256\times 256$ [41] and ImageNet $256\times 256$ [42]. To evaluate our method, we use $100$ images from the validation set for both FFHQ and ImageNet. We include Learned Perceptual Image Patch Similarity (LPIPS) score [43] and peak signal-to-noise-ratio (PSNR) as our main evaluation metrics. For both our method and baselines, we use the versions implemented in piq [44] with all images normalized to the range $[0,1]$ . The replace-pooling option is enabled for LPIPS evaluation.
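For concreteness, PSNR for images normalized to $[0,1]$ can be computed as in the sketch below. This is a plain NumPy illustration of the standard definition, not the piq implementation used for the reported numbers.

```python
import numpy as np

def psnr(x, ref, data_range=1.0):
    # Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE).
    # Identical inputs give MSE = 0 and hence an infinite PSNR.
    mse = np.mean((np.asarray(x, dtype=float) - np.asarray(ref, dtype=float)) ** 2)
    return 10.0 * np.log10(data_range**2 / mse)
```

As a worked example, a uniform error of 0.1 on every pixel gives an MSE of 0.01 and therefore a PSNR of 20 dB. Every additional 10 dB corresponds to a tenfold reduction in MSE.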

Inverse problems. We evaluate our method with a series of linear and nonlinear tasks. For linear inverse problems, we consider (1) super-resolution, (2) Gaussian deblurring, (3) motion deblurring, (4) inpainting (with a box mask), and (5) inpainting (with a 70% random mask). For Gaussian and motion deblurring, kernels of size 61 $\times$ 61 with standard deviations of 3.0 and 0.5, respectively, are used. In the super-resolution task, a bicubic resizer downscales images by a factor of 4. The box inpainting task uses a random box of size 128 $\times$ 128 to mask the original images, while random mask inpainting uses a generated random mask where each pixel has a 70% chance of being masked, following the settings in [25].
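The two inpainting operators can be sketched as elementwise masking. The code below is illustrative: the uniform placement of the box (with no margin from the image border) is an assumption, and the function names are hypothetical.

```python
import numpy as np

def box_mask(height, width, box, rng):
    # 1 = observed pixel, 0 = masked pixel; the box position is drawn
    # uniformly so that the box lies fully inside the image (assumption).
    top = rng.integers(0, height - box + 1)
    left = rng.integers(0, width - box + 1)
    mask = np.ones((height, width))
    mask[top:top + box, left:left + box] = 0.0
    return mask

def random_mask(height, width, drop_prob, rng):
    # Each pixel is masked independently with probability drop_prob.
    return (rng.random((height, width)) >= drop_prob).astype(float)

def inpaint_op(x, mask):
    # Linear forward operator A(x) = mask * x (elementwise).
    return x * mask
```

For a $256\times 256$ image, `box_mask(256, 256, 128, rng)` hides a quarter of the pixels in one contiguous block, whereas `random_mask(256, 256, 0.7, rng)` hides roughly 70% of them but leaves observed pixels spread across the whole image.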

We consider three nonlinear inverse problems: (1) phase retrieval, (2) high dynamic range (HDR) reconstruction, and (3) nonlinear deblurring. Due to the inherent instability of phase retrieval, we adopt the strategy from DPS [3], using an oversampling rate of 2.0 and reporting the best result out of four independent samples. The goal of HDR reconstruction is to recover a higher dynamic range image (factor of 2) from a low dynamic range image. For nonlinear deblurring, we use the default setting as described in [45]. All linear and nonlinear measurements are subject to white Gaussian noise with a standard deviation of $\beta_{\mathbf{y}}=0.05$. Further details regarding the forward measurement functions for each task and their respective hyperparameters are provided in Appendix D.
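To illustrate why phase retrieval is unstable, the sketch below implements a Fourier-magnitude measurement with zero-padding. It is a simplified single-channel illustration; how the oversampling rate maps to the padding width `pad` is left to the caller and is not taken from our experimental configuration.

```python
import numpy as np

def phase_retrieval_op(x, pad):
    # Zero-pad a 2D image by `pad` pixels on every side (oversampling),
    # take the 2D Fourier transform, and keep only the magnitude.
    padded = np.pad(x, pad)
    return np.abs(np.fft.fft2(padded))
```

Because the phase is discarded, distinct images can produce identical measurements: for any real image `x`, both `-x` and the 180-degree rotation `x[::-1, ::-1]` yield the same output as `x`. A sampler must therefore rely on the prior to select among measurement-consistent candidates, which is one reason multiple independent runs are used for this task.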

Baselines. We compare our methods with the following baselines: Denoising Diffusion Restoration Models (DDRM) [2], Diffusion Posterior Sampling (DPS) [3], Denoising Diffusion Null-Space Model (DDNM) [31], and Plug-and-Play Alternating Direction Method of Multipliers (PnP-ADMM) [46] for pixel-based diffusion model experiments. We compare the latent diffusion version of our method with Posterior Sampling with Latent Diffusion Models (PSLD) [27] and ReSample [25], which also operate in the latent space. Note that DDRM, DDNM, and PSLD are not able to handle nonlinear inverse problems. We also include Regularization by Denoising Diffusion Process (RED-diff) [28] for nonlinear experiments.

Table 1: Quantitative evaluation on FFHQ 256 $\times$ 256. Performance comparison of different methods on various linear tasks in image domain. The value shows the mean over $100$ images.

Method	SR (×4)		Inpaint (Box)		Inpaint (Random)		Gaussian deblurring		Motion deblurring
	LPIPS $\downarrow$	PSNR $\uparrow$	LPIPS $\downarrow$	PSNR $\uparrow$	LPIPS $\downarrow$	PSNR $\uparrow$	LPIPS $\downarrow$	PSNR $\uparrow$	LPIPS $\downarrow$	PSNR $\uparrow$
DAPS (ours)	0.177	29.07	0.133	24.07	0.098	31.12	0.165	29.19	0.157	29.66
DPS	0.260	24.38	0.198	23.32	0.193	28.39	0.211	25.52	0.270	23.14
DDRM	0.210	27.65	0.159	22.37	0.218	25.75	0.236	23.36	-	-
DDNM	0.197	28.03	0.235	24.47	0.121	29.91	0.216	28.20	-	-
PnP-ADMM	0.725	23.48	0.775	13.39	0.724	20.94	0.751	21.31	0.703	23.40
LatentDAPS (ours)	0.275	27.48	0.194	23.99	0.157	30.71	0.234	27.93	0.283	27.00
PSLD	0.287	24.35	0.158	24.22	0.221	30.31	0.316	23.27	0.336	22.31
ReSample	0.392	23.29	0.184	20.06	0.140	29.61	0.255	26.39	0.198	27.41

Table 2: Quantitative evaluation on FFHQ 256 $\times$ 256. Performance comparison of different methods on various nonlinear tasks in the image domain. The mean and standard deviation are computed over $100$ images.

Method	Phase retrieval		Nonlinear deblurring		High dynamic range
	LPIPS $\downarrow$	PSNR $\uparrow$	LPIPS $\downarrow$	PSNR $\uparrow$	LPIPS $\downarrow$	PSNR $\uparrow$
DAPS (ours)	0.121 (0.073)	30.72 (3.06)	0.155 (0.032)	28.29 (1.77)	0.162 (0.072)	27.12 (3.53)
DPS	0.410 (0.090)	17.64 (2.97)	0.278 (0.060)	23.39 (2.01)	0.264 (0.156)	22.73 (6.07)
RED-diff	0.596 (0.092)	15.60 (4.48)	0.160 (0.034)	30.86 (0.51)	0.258 (0.089)	22.16 (3.41)
LatentDAPS (ours)	0.199 (0.078)	29.16 (3.55)	0.235 (0.049)	28.11 (1.75)	0.223 (0.080)	25.94 (2.87)
ReSample	0.406 (0.224)	21.60 (8.10)	0.185 (0.039)	28.24 (1.69)	0.182 (0.085)	25.65 (3.57)

Table 3: Quantitative evaluation on ImageNet 256 $\times$ 256. Performance comparison of different methods on various linear tasks in image domain.

Method	SR (×4)		Inpaint (Box)		Inpaint (Random)		Gaussian deblurring		Motion deblurring
	LPIPS $\downarrow$	PSNR $\uparrow$	LPIPS $\downarrow$	PSNR $\uparrow$	LPIPS $\downarrow$	PSNR $\uparrow$	LPIPS $\downarrow$	PSNR $\uparrow$	LPIPS $\downarrow$	PSNR $\uparrow$
DAPS (ours)	0.276	25.89	0.214	21.43	0.135	28.44	0.253	26.15	0.196	27.86
DPS	0.354	23.92	0.309	19.78	0.326	24.43	0.360	21.86	0.357	21.46
DDRM	0.284	25.21	0.229	19.45	0.325	23.23	0.341	23.86	-	-
DDNM	0.475	23.96	0.319	21.64	0.191	31.16	0.278	28.06	-	-
PnP-ADMM	0.724	22.18	0.702	12.61	0.680	20.03	0.729	20.47	0.684	24.23
LatentDAPS (ours)	0.343	25.06	0.340	17.19	0.219	27.59	0.349	25.05	0.296	26.83
PSLD	0.360	25.42	0.465	20.10	0.337	31.30	0.390	25.86	0.511	20.85
ReSample	0.370	22.61	0.262	18.29	0.143	27.50	0.254	25.97	0.227	26.94

Table 4: Quantitative evaluation on ImageNet 256 $\times$ 256. Performance comparison of different methods on various nonlinear tasks in the image domain. The mean and standard deviation are computed over $100$ images.

Method	Phase retrieval		Nonlinear deblurring		High dynamic range
	LPIPS $\downarrow$	PSNR $\uparrow$	LPIPS $\downarrow$	PSNR $\uparrow$	LPIPS $\downarrow$	PSNR $\uparrow$
DAPS (ours)	0.254 (0.125)	25.78 (6.92)	0.169 (0.056)	27.73 (3.23)	0.175 (0.107)	26.30 (4.15)
DPS	0.447 (0.099)	16.81 (3.61)	0.306 (0.081)	22.49 (3.20)	0.503 (0.106)	19.23 (2.52)
RED-diff	0.536 (0.129)	14.98 (3.75)	0.211 (0.083)	30.07 (1.41)	0.274 (0.198)	22.03 (5.90)
LatentDAPS (ours)	0.361 (0.150)	20.54 (6.41)	0.314 (0.080)	25.34 (3.44)	0.269 (0.099)	23.64 (4.10)
ReSample	0.403 (0.174)	19.24 (4.21)	0.206 (0.057)	26.20 (3.71)	0.198 (0.089)	25.11 (4.21)

4.2 Main Results

We show quantitative results for the FFHQ dataset in Tables 1 and 2, and results for the ImageNet dataset in Tables 3 and 4. The qualitative comparisons are provided in Fig. 1. As shown in Table 1, our method achieves comparable or even better performance across all selected linear inverse problems. Moreover, our methods are remarkably stable when handling nonlinear inverse problems. For example, although existing methods such as DPS can recover high-quality images from phase retrieval measurements, they suffer from an extremely high failure rate, resulting in a relatively low PSNR and a high LPIPS. However, DAPS demonstrates superior stability across different samples and achieves a significantly higher success rate, resulting in a substantial improvement in PSNR and LPIPS as evidenced by the results in Table 2. Moreover, DAPS captures and recovers much more fine-grained details in measurements compared to existing baselines. We include more samples and comparisons in Appendix F for further illustration.

To understand the evolution of the noise annealing procedure, we evaluate two crucial trajectories in our method: 1) the estimated mean of $p({\mathbf{x}}_{0}\mid{\mathbf{x}}_{t})$, denoted as $\hat{{\mathbf{x}}}_{0}({\mathbf{x}}_{t})$; and 2) samples from $p({\mathbf{x}}_{0}\mid{\mathbf{x}}_{t},{\mathbf{y}})$ obtained through Langevin dynamics, denoted as ${\mathbf{x}}_{0\mid{\mathbf{y}}}$. We assess image quality and measurement error across these trajectories, as depicted in Fig. 6. The measurement error for ${\mathbf{x}}_{0\mid{\mathbf{y}}}\sim p({\mathbf{x}}_{0}\mid{\mathbf{x}}_{t},{\mathbf{y}})$ remains consistently low throughout the process, while the measurement error for $\hat{{\mathbf{x}}}_{0}({\mathbf{x}}_{t})$ continuously decreases. Consequently, metrics such as PSNR and LPIPS exhibit stable improvement as the noise level is annealed.

In addition, our method can generate diversified samples when the measurements contain less information. To illustrate this, we selected two tasks where the posterior distributions may have multiple modes: (1) super-resolution by a factor of 16, and (2) box inpainting with a box size of 192 $\times$ 192. Some samples from DAPS are shown in Fig. 5, demonstrating DAPS’s ability to produce diverse samples while preserving the measurement information. We show more samples in Appendix F for further demonstration.

4.3 Ablation Studies

Effect of the number of function evaluations.

To better understand how the number of function evaluations (NFE) of the diffusion model influences the performance, we evaluate DAPS with different configurations. Because we use an ODE sampler in each inner loop to compute $\hat{\mathbf{x}}_{0}({\mathbf{x}}_{t})$, the total NFE for DAPS is the number of inner ODE steps times the number of noise annealing steps. We evaluate DAPS using NFE ranging from 50 to 4k, with configurations as specified in Section C.1. As indicated by Fig. 7, DAPS achieves reasonable performance even with a small NFE.

Table 5: Phase retrieval with DAPS at different oversampling ratios.

Oversample	2.0	1.5	1.0	0.5	0.0
LPIPS	0.117	0.131	0.235	0.331	0.489
PSNR	30.26	29.17	24.87	21.60	16.02

More Discussion on Phase Retrieval.

Compared to the baselines, DAPS exhibits significantly better sample quality and stability in the phase retrieval task. Unlike other selected tasks, phase retrieval is more ill-posed, meaning that images with the same measurements can appear quite different perceptually. A demonstration of this is shown in Section F.1. To understand how this ill-posedness affects performance, we present results on $20$ images from FFHQ in Table 5 using different oversampling ratios in phase retrieval. These results further demonstrate the strength of DAPS in addressing complex, ill-posed inverse problems.

5 Conclusion

In summary, we propose Decoupled Annealing Posterior Sampling (DAPS) for solving inverse problems, particularly those with complex nonlinear measurement processes such as phase retrieval. Our method decouples consecutive sample points in a diffusion sampling trajectory, allowing them to vary considerably, thereby enabling DAPS to explore a larger solution space. Empirically, we demonstrate that DAPS generates samples with better visual quality and stability compared to existing methods when solving a wide range of challenging inverse problems.

However, DAPS has limitations that could be addressed in future work. For example, our method approximates $p({\mathbf{x}}_{0}\mid{\mathbf{x}}_{t})$ with a Gaussian distribution. This approximation can be inaccurate and might lead to irrecoverable errors in posterior sampling. Moreover, our method calls the forward model multiple times while running Langevin dynamics, which increases the computational overhead when the forward model is complicated. Further discussions about limitations and future extensions are covered in Section C.2.

Acknowledgments

We are grateful to Pika for providing the computing resources essential for this research. We also extend our thanks to the Kortschak Scholars Fellowship for supporting B.Z. and W.C. at Caltech. J.B. acknowledges support from the Wally Baer and Jeri Weiss Postdoctoral Fellowship. A.A. is supported in part by Bren endowed chair and by the AI2050 senior fellow program at Schmidt Sciences.

References

[1] Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021.
[2] Bahjat Kawar, Michael Elad, Stefano Ermon, and Jiaming Song. Denoising diffusion restoration models. In Advances in Neural Information Processing Systems, 2022.
[3] Hyungjin Chung, Jeongsol Kim, Michael Thompson McCann, Marc Louis Klasky, and Jong Chul Ye. Diffusion posterior sampling for general noisy inverse problems. In The Eleventh International Conference on Learning Representations, 2023.
[4] Chitwan Saharia, Jonathan Ho, William Chan, Tim Salimans, David J. Fleet, and Mohammad Norouzi. Image super-resolution via iterative refinement. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(4):4713–4726, 2023.
[5] Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, and Luc Van Gool. Repaint: Inpainting using denoising diffusion probabilistic models, 2022.
[6] Yuanzhi Zhu, Kai Zhang, Jingyun Liang, Jiezhang Cao, Bihan Wen, Radu Timofte, and Luc Van Gool. Denoising diffusion models for plug-and-play image restoration. In IEEE Conference on Computer Vision and Pattern Recognition Workshops (NTIRE), 2023.
[7] Ajil Jalal, Marius Arvinte, Giannis Daras, Eric Price, Alexandros G Dimakis, and Jon Tamir. Robust compressed sensing mri with deep generative priors. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 14938–14954. Curran Associates, Inc., 2021.
[8] Yang Song, Liyue Shen, Lei Xing, and Stefano Ermon. Solving inverse problems in medical imaging with score-based generative models. In International Conference on Learning Representations, 2022.
[9] Hyungjin Chung and Jong Chul Ye. Score-based diffusion models for accelerated mri. Medical Image Analysis, page 102479, 2022.
[10] Alex Ling Yu Hung, Kai Zhao, Haoxin Zheng, Ran Yan, Steven S Raman, Demetri Terzopoulos, and Kyunghyun Sung. Med-cdiff: Conditional medical image generation with diffusion models. Bioengineering, 10(11):1258, 2023.
[11] Zolnamar Dorjsembe, Hsing-Kuo Pao, Sodtavilan Odonchimed, and Furen Xiao. Conditional diffusion models for semantic 3d brain mri synthesis. IEEE Journal of Biomedical and Health Informatics, 2024.
[12] Hyungjin Chung, Eun Sun Lee, and Jong Chul Ye. Mr image denoising and super-resolution using regularized reverse diffusion. IEEE Transactions on Medical Imaging, 42(4):922–934, 2022.
[13] Kazunori Akiyama, Antxon Alberdi, Walter Alef, Keiichi Asada, Rebecca Azulay, Anne-Kathrin Baczko, David Ball, Mislav Baloković, John Barrett, Dan Bintley, et al. First m87 event horizon telescope results. iv. imaging the central supermassive black hole. The Astrophysical Journal Letters, 875(1):L4, 2019.
[14] Berthy T. Feng, Jamie Smith, Michael Rubinstein, Huiwen Chang, Katherine L. Bouman, and William T. Freeman. Score-based diffusion models as principled priors for inverse imaging. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 10520–10531, October 2023.
[15] He Sun and Katherine L. Bouman. Deep probabilistic imaging: Uncertainty quantification and multi-modal solution characterization for computational imaging, 2020.
[16] Yu Sun, Zihui Wu, Yifan Chen, Berthy Feng, and Katherine L. Bouman. Provable probabilistic imaging using score-based generative priors. ArXiv, abs/2310.10835, 2023.
[17] Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. In Neural Information Processing Systems, 2019.
[18] Yang Song and Stefano Ermon. Improved techniques for training score-based generative models. Advances in neural information processing systems, 33:12438–12448, 2020.
[19] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 6840–6851. Curran Associates, Inc., 2020.
[20] Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, editors, Advances in Neural Information Processing Systems, 2022.
[21] Tero Karras, Miika Aittala, Jaakko Lehtinen, Janne Hellsten, Timo Aila, and Samuli Laine. Analyzing and improving the training dynamics of diffusion models, 2024.
[22] Jiaming Song, Arash Vahdat, Morteza Mardani, and Jan Kautz. Pseudoinverse-guided diffusion models for inverse problems. In International Conference on Learning Representations, 2023.
[23] Benjamin Boys, Mark Girolami, Jakiw Pidstrigach, Sebastian Reich, Alan Mosca, and O. Deniz Akyildiz. Tweedie moment projected diffusions for inverse problems, 2023.
[24] Hanyuan Liu, Jinbo Xing, Minshan Xie, Chengze Li, and Tien-Tsin Wong. Improved diffusion-based image colorization via piggybacked models, 2023.
[25] Bowen Song, Soo Min Kwon, Zecheng Zhang, Xinyu Hu, Qing Qu, and Liyue Shen. Solving inverse problems with latent diffusion models via hard data consistency. In The Twelfth International Conference on Learning Representations, 2024.
[26] Stanley Chan, Xiran Wang, and Omar Elgendy. Plug-and-play admm for image restoration: Fixed point convergence and applications. IEEE Transactions on Computational Imaging, PP, 05 2016.
[27] Litu Rout, Negin Raoof, Giannis Daras, Constantine Caramanis, Alex Dimakis, and Sanjay Shakkottai. Solving linear inverse problems provably via posterior sampling with latent diffusion models. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
[28] Morteza Mardani, Jiaming Song, Jan Kautz, and Arash Vahdat. A variational perspective on solving inverse problems with diffusion models. arXiv preprint arXiv:2305.04391, 2023.
[29] Hyungjin Chung, Byeongsu Sim, Dohoon Ryu, and Jong Chul Ye. Improving diffusion models for inverse problems using manifold constraints. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, editors, Advances in Neural Information Processing Systems, 2022.
[30] Xiangming Meng and Yoshiyuki Kabashima. Diffusion model based posterior sampling for noisy linear inverse problems. arXiv preprint arXiv:2211.12343, 2022.
[31] Yinhuai Wang, Jiwen Yu, and Jian Zhang. Zero-shot image restoration using denoising diffusion null-space model. The Eleventh International Conference on Learning Representations, 2023.
[32] Gabriel Victorino Cardoso, Yazid Janati, Sylvain Le Corff, and Éric Moulines. Monte carlo guided diffusion for bayesian linear inverse problems. ArXiv, abs/2308.07983, 2023.
[33] Bahjat Kawar, Gregory Vaksman, and Michael Elad. SNIPS: Solving noisy inverse problems stochastically. Advances in Neural Information Processing Systems, 34:21757–21769, 2021.
[34] Ulugbek Kamilov, Charles Bouman, Gregery Buzzard, and Brendt Wohlberg. Plug-and-play methods for integrating physical and learned models in computational imaging: Theory, algorithms, and applications. IEEE Signal Processing Magazine, 40:85–97, 01 2023.
[35] Junqing Chen and Haibo Liu. An alternating direction method of multipliers for inverse lithography problem, 2023.
[36] Marius Arvinte, Sriram Vishwanath, Ahmed H. Tewfik, and Jonathan I. Tamir. Deep j-sense: Accelerated mri reconstruction via unrolled alternating optimization, 2021.
[37] Yuanzhi Zhu, Kai Zhang, Jingyun Liang, Jiezhang Cao, Bihan Wen, Radu Timofte, and Luc Van Gool. Denoising diffusion models for plug-and-play image restoration, 2023.
[38] Max Welling and Yee W Teh. Bayesian learning via stochastic gradient langevin dynamics. In Proceedings of the 28th international conference on machine learning (ICML-11), pages 681–688. Citeseer, 2011.
[39] Prafulla Dhariwal and Alexander Quinn Nichol. Diffusion models beat GANs on image synthesis. In A. Beygelzimer, Y. Dauphin, P. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, 2021.
[40] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022.
[41] T. Karras, S. Laine, and T. Aila. A style-based generator architecture for generative adversarial networks. IEEE Transactions on Pattern Analysis & Machine Intelligence, 43(12):4217–4228, dec 2021.
[42] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.
[43] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang. The unreasonable effectiveness of deep features as a perceptual metric. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 586–595, Los Alamitos, CA, USA, jun 2018. IEEE Computer Society.
[44] Sergey Kastryulin, Jamil Zakirov, Denis Prokopenko, and Dmitry V. Dylov. Pytorch image quality: Metrics for image quality assessment, 2022.
[45] Phong Tran, Anh Tran, Quynh Phung, and Minh Hoai. Explore image deblurring via encoded blur kernel space. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[46] Singanallur Venkatakrishnan, Charles Bouman, and Brendt Wohlberg. Plug-and-play priors for model based reconstruction. pages 945–948, 12 2013.

Appendix A Sampling with Latent Diffusion Models

Latent diffusion models (LDMs) [40] operate the denoising process not directly on the pixel space, but in a low-dimensional latent space. LDMs have been known for their superior performance and computational efficiency in high-dimensional data synthesis. In this section, we show that our method can be naturally extended to sampling with latent diffusion models.

Let $\mathcal{E}:\mathbb{R}^{n}\to\mathbb{R}^{k}$ and $\mathcal{D}:\mathbb{R}^{k}\to\mathbb{R}^{n}$ be an encoder and decoder pair. Let ${\mathbf{z}}_{0}=\mathcal{E}({\mathbf{x}}_{0})$ where ${\mathbf{x}}_{0}\sim p({\mathbf{x}}_{0})$ , and let $p({\mathbf{z}};\sigma)$ be the distribution of the noisy latent vector ${\mathbf{z}}$ obtained by adding Gaussian noise of variance $\sigma^{2}$ to the latent code of clean data. The following proposition holds according to the factor graph in Fig. 3(b).

Proposition 2.

Suppose ${\mathbf{z}}_{t_{1}}$ is sampled from the measurement conditioned time-marginal $p({\mathbf{z}}_{t_{1}}\mid{\mathbf{y}})$ , then

{\mathbf{z}}_{t_{2}}\sim\mathbb{E}_{{\mathbf{x}}_{0}\sim p({\mathbf{x}}_{0}% \mid{\mathbf{z}}_{t_{1}},{\mathbf{y}})}[\mathcal{N}(\mathcal{E}({\mathbf{x}}_{% 0}),\sigma_{t_{2}}^{2}{\bm{I}})]

(10)

satisfies the measurement conditioned time-marginal $p({\mathbf{z}}_{t_{2}}\mid{\mathbf{y}})$ . Moreover,

{\mathbf{z}}_{t_{2}}\sim\mathbb{E}_{{\mathbf{z}}_{0}\sim p({\mathbf{z}}_{0}% \mid{\mathbf{z}}_{t_{1}},{\mathbf{y}})}[\mathcal{N}({\mathbf{z}}_{0},\sigma_{t% _{2}}^{2}{\bm{I}})].

(11)

also satisfies the measurement conditioned time-marginal $p({\mathbf{z}}_{t_{2}}\mid{\mathbf{y}})$ .

Remark. We can efficiently sample from $p({\mathbf{x}}_{0}\mid{\mathbf{z}}_{t_{1}},{\mathbf{y}})$ using similar strategies as in Section 3, i.e.,

{\mathbf{x}}_{0}^{(j+1)}={\mathbf{x}}_{0}^{(j)}+\eta\cdot\Big{(}\nabla_{{% \mathbf{x}}_{0}^{(j)}}\log p({\mathbf{x}}_{0}^{(j)}\mid{\mathbf{z}}_{t_{1}})+% \nabla_{{\mathbf{x}}_{0}^{(j)}}\log p({\mathbf{y}}\mid{\mathbf{x}}_{0}^{(j)})% \Big{)}+\sqrt{2\eta}\bm{\epsilon}_{j}.

(12)

We further approximate $p({\mathbf{x}}_{0}^{(j)}\mid{\mathbf{z}}_{t_{1}})$ by $\mathcal{N}({\mathbf{x}}_{0}^{(j)};\mathcal{D}(\hat{\mathbf{z}}_{0}({\mathbf{z% }}_{t_{1}})),r_{t_{1}}^{2}{\bm{I}})$ , where $\hat{\mathbf{z}}_{0}({\mathbf{z}}_{t_{1}})$ is computed by solving the (unconditional) probability flow ODE with a latent diffusion model ${\bm{s}}_{\bm{\theta}}$ starting at ${\mathbf{z}}_{t_{1}}$ . The Langevin dynamics can then be rewritten as

{\mathbf{x}}_{0}^{(j+1)}={\mathbf{x}}_{0}^{(j)}-\eta\cdot\nabla_{{\mathbf{x}}_{0}^{(j)}}\left(\frac{\|{\mathbf{x}}_{0}^{(j)}-\mathcal{D}(\hat{\mathbf{z}}_{0}({\mathbf{z}}_{t_{1}}))\|^{2}}{2r_{t_{1}}^{2}}+\frac{\|\mathcal{A}({\mathbf{x}}_{0}^{(j)})-{\mathbf{y}}\|^{2}}{2\beta_{\mathbf{y}}^{2}}\right)+\sqrt{2\eta}\bm{\epsilon}_{j}.

(13)
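The pixel space update in Eq. 13 can be illustrated with a small numerical sketch. This is not the authors' implementation: it assumes a linear forward operator represented by a matrix `A`, so that the gradient of the measurement term is available in closed form. The names `langevin_step`, `x_hat` (standing in for $\mathcal{D}(\hat{\mathbf{z}}_{0}({\mathbf{z}}_{t_{1}}))$ ), `r`, `beta_y`, and `eta` are illustrative.

```python
import math
import random


def matvec(A, x):
    """Multiply matrix A (a list of rows) by vector x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Multiply the transpose of A by vector r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def langevin_step(x, x_hat, y, A, r, beta_y, eta, rng=None):
    """One Langevin update for the Gaussian-approximated posterior (cf. Eq. 13).

    Assumption: the forward operator is linear, A(x) = A @ x.
    If rng is None the noise term is dropped (plain gradient descent).
    """
    residual = [ax - yi for ax, yi in zip(matvec(A, x), y)]
    # Gradient of ||A x - y||^2 / (2 beta_y^2).
    grad_meas = [g / beta_y**2 for g in rmatvec(A, residual)]
    # Gradient of ||x - x_hat||^2 / (2 r^2).
    grad_prior = [(xi - hi) / r**2 for xi, hi in zip(x, x_hat)]
    noise_scale = math.sqrt(2.0 * eta)
    out = []
    for xi, gp, gm in zip(x, grad_prior, grad_meas):
        eps = rng.gauss(0.0, 1.0) if rng is not None else 0.0
        out.append(xi - eta * (gp + gm) + noise_scale * eps)
    return out


# Usage: observe only the first coordinate of a 2D signal.
x_next = langevin_step([0.0, 0.0], x_hat=[1.0, 1.0], y=[2.0],
                       A=[[1.0, 0.0]], r=1.0, beta_y=1.0, eta=0.1,
                       rng=random.Random(0))
```

For a nonlinear $\mathcal{A}$ , the closed-form gradient would be replaced by automatic differentiation; the structure of the update stays the same.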

On the other hand, we can also decompose $p({\mathbf{z}}_{0}\mid{\mathbf{z}}_{t_{1}},{\mathbf{y}})\approx p({\mathbf{z}}% _{0}\mid{\mathbf{z}}_{t_{1}})p({\mathbf{y}}\mid{\mathbf{z}}_{0})$ and run Langevin dynamics directly on the latent space,

{\mathbf{z}}_{0}^{(j+1)}={\mathbf{z}}_{0}^{(j)}+\eta\cdot\Big{(}\nabla_{{% \mathbf{z}}_{0}^{(j)}}\log p({\mathbf{z}}_{0}^{(j)}\mid{\mathbf{z}}_{t_{1}})+% \nabla_{{\mathbf{z}}_{0}^{(j)}}\log p({\mathbf{y}}\mid{\mathbf{z}}_{0}^{(j)})% \Big{)}+\sqrt{2\eta}\bm{\epsilon}_{j}.

(14)

Approximating $p({\mathbf{z}}_{0}^{(j)}\mid{\mathbf{z}}_{t_{1}})$ by $\mathcal{N}({\mathbf{z}}_{0}^{(j)};\hat{\mathbf{z}}_{0}({\mathbf{z}}_{t_{1}}),r_{t_{1}}^{2}{\bm{I}})$ , we derive another Langevin MCMC update rule in the latent space,

{\mathbf{z}}_{0}^{(j+1)}={\mathbf{z}}_{0}^{(j)}-\eta\cdot\nabla_{{\mathbf{z}}_% {0}^{(j)}}\left(\frac{\|{\mathbf{z}}_{0}^{(j)}-\hat{\mathbf{z}}_{0}({\mathbf{z% }}_{t_{1}})\|^{2}}{2r_{t_{1}}^{2}}+\frac{\|\mathcal{A}(\mathcal{D}({\mathbf{z}% }_{0}^{(j)}))-{\mathbf{y}}\|^{2}}{2\beta_{\mathbf{y}}^{2}}\right)+\sqrt{2\eta}% \bm{\epsilon}_{j}.

(15)

Both approaches are applicable for our posterior sampling algorithm. We summarize DAPS with latent diffusion models in Algorithm 2.

Algorithm 2 Decoupled Annealing Posterior Sampling with Latent Diffusion Models

Require: Latent space score model ${\bm{s}}_{\bm{\theta}}$ , measurement ${\mathbf{y}}$ , noise schedule $\sigma_{t}$ , $(t_{i})_{i\in\{0,\dots,N_{A}\}}$ , encoder $\mathcal{E}$ and decoder $\mathcal{D}$

Sample ${\mathbf{z}}_{T}\sim\mathcal{N}({\bm{0}},\sigma_{T}^{2}{\bm{I}})$
for $i=N_{A},N_{A}-1,\dots,1$
    Compute $\hat{\mathbf{z}}_{0}^{(0)}=\hat{\mathbf{z}}_{0}({\mathbf{z}}_{t_{i}})$ by solving the probability flow ODE in Eq. 39 with ${\bm{s}}_{\bm{\theta}}$
    Pixel space Langevin dynamics:
        $\hat{\mathbf{x}}_{0}^{(0)}=\mathcal{D}(\hat{\mathbf{z}}_{0}^{(0)})$
        for $j=0,\dots,N-1$
            $\hat{\mathbf{x}}_{0}^{(j+1)}\leftarrow\hat{\mathbf{x}}_{0}^{(j)}+\eta_{t}\Big(\nabla_{\hat{\mathbf{x}}_{0}}\log p(\hat{\mathbf{x}}_{0}^{(j)}\mid{\mathbf{z}}_{t_{i}})+\nabla_{\hat{\mathbf{x}}_{0}}\log p({\mathbf{y}}\mid\hat{\mathbf{x}}_{0}^{(j)})\Big)+\sqrt{2\eta_{t}}\bm{\epsilon}_{j},\ \bm{\epsilon}_{j}\sim\mathcal{N}({\bm{0}},{\bm{I}}).$	(16)
        end for
        $\hat{\mathbf{z}}_{0}^{(N)}=\mathcal{E}(\hat{\mathbf{x}}_{0}^{(N)})$
    Or, latent space Langevin dynamics:
        for $j=0,\dots,N-1$
            $\hat{\mathbf{z}}_{0}^{(j+1)}\leftarrow\hat{\mathbf{z}}_{0}^{(j)}+\eta_{t}\Big(\nabla_{\hat{\mathbf{z}}_{0}^{(j)}}\log p(\hat{\mathbf{z}}_{0}^{(j)}\mid{\mathbf{z}}_{t_{i}})+\nabla_{\hat{\mathbf{z}}_{0}^{(j)}}\log p({\mathbf{y}}\mid\hat{\mathbf{z}}_{0}^{(j)})\Big)+\sqrt{2\eta_{t}}\bm{\epsilon}_{j},\ \bm{\epsilon}_{j}\sim\mathcal{N}({\bm{0}},{\bm{I}}).$	(17)
        end for
    Sample ${\mathbf{z}}_{t_{i-1}}\sim\mathcal{N}(\hat{\mathbf{z}}_{0}^{(N)},\sigma_{t_{i-1}}^{2}{\bm{I}})$
end for
Return $\mathcal{D}({\mathbf{z}}_{0})$ .
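To make the control flow of Algorithm 2 concrete, the following toy sketch runs the decoupled annealing loop on a one-dimensional problem. It rests on several simplifying assumptions that are not part of the method as described: the encoder and decoder are the identity, the prior is a standard normal so that the denoised estimate has a closed form (replacing the probability flow ODE solve with a learned score model), the measurement is $y=x_{0}+n$ , and the variance of the Gaussian approximation is set by the heuristic $r_{t}=\sigma_{t}$ . All function names and default values are illustrative.

```python
import math
import random


def denoise_gaussian_prior(x_t, sigma):
    """E[x0 | x_t] for a standard normal prior and x_t = x0 + sigma * eps.

    Closed-form stand-in for solving the probability flow ODE.
    """
    return x_t / (1.0 + sigma**2)


def anneal_schedule(sigma_max, sigma_min, n_steps, rho=7.0):
    """Polynomial interpolation from sigma_max down to sigma_min (cf. Eq. 40)."""
    a, b = sigma_max ** (1.0 / rho), sigma_min ** (1.0 / rho)
    return [(a + i / (n_steps - 1) * (b - a)) ** rho for i in range(n_steps)]


def daps_toy(y, beta_y=0.1, sigma_max=10.0, sigma_min=0.1,
             n_anneal=50, n_langevin=50, eta=2e-3, seed=0):
    """Decoupled annealing loop for the toy problem y = x0 + noise."""
    rng = random.Random(seed)
    sigmas = anneal_schedule(sigma_max, sigma_min, n_anneal)
    x_t = rng.gauss(0.0, sigma_max)              # x_T ~ N(0, sigma_max^2)
    for i, sigma in enumerate(sigmas):
        x0_hat = denoise_gaussian_prior(x_t, sigma)
        r = sigma                                # assumed heuristic r_t = sigma_t
        x0 = x0_hat                              # start the chain at the estimate
        for _ in range(n_langevin):
            # Gradient of the negative log of N(x0; x0_hat, r^2) * N(y; x0, beta_y^2).
            grad = (x0 - x0_hat) / r**2 + (x0 - y) / beta_y**2
            x0 = x0 - eta * grad + math.sqrt(2.0 * eta) * rng.gauss(0.0, 1.0)
        # Decoupled step: re-noise the fresh x0 sample to the next noise level.
        sigma_next = sigmas[i + 1] if i + 1 < len(sigmas) else 0.0
        x_t = x0 + sigma_next * rng.gauss(0.0, 1.0)
    return x_t


sample = daps_toy(y=1.0)
```

The step size `eta` is chosen so that `eta * (1 / r**2 + 1 / beta_y**2)` stays well below 2 at every noise level, which keeps the discretized Langevin chain stable in this toy setting.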

Appendix B Proof for Propositions

Proposition 3 (Restated).

Suppose ${\mathbf{x}}_{t_{1}}$ is sampled from the measurement conditioned time-marginal $p({\mathbf{x}}_{t_{1}}\mid{\mathbf{y}})$ , then

{\mathbf{x}}_{t_{2}}\sim\mathbb{E}_{{\mathbf{x}}_{0}\sim p({\mathbf{x}}_{0}% \mid{\mathbf{x}}_{t_{1}},{\mathbf{y}})}\mathcal{N}({\mathbf{x}}_{0},\sigma_{t_% {2}}^{2}{\bm{I}})

(18)

satisfies the measurement conditioned time-marginal $p({\mathbf{x}}_{t_{2}}\mid{\mathbf{y}})$ .

Proof.

We first factorize the measurement conditioned time-marginal $p({\mathbf{x}}_{t_{2}}\mid{\mathbf{y}})$ by

	$\displaystyle p({\mathbf{x}}_{t_{2}}\mid{\mathbf{y}})$	$\displaystyle=\iint p({\mathbf{x}}_{t_{2}},{\mathbf{x}}_{0},{\mathbf{x}}_{t_{1% }}\mid{\mathbf{y}})\mathrm{d}{\mathbf{x}}_{0}\mathrm{d}{\mathbf{x}}_{t_{1}}$		(19)
		$\displaystyle=\iint p({\mathbf{x}}_{t_{1}}\mid{\mathbf{y}})p({\mathbf{x}}_{0}% \mid{\mathbf{x}}_{t_{1}},{\mathbf{y}})p({\mathbf{x}}_{t_{2}}\mid{\mathbf{x}}_{% 0},{\mathbf{x}}_{t_{1}},{\mathbf{y}})\mathrm{d}{\mathbf{x}}_{0}\mathrm{d}{% \mathbf{x}}_{t_{1}}.$		(20)

Recall the probabilistic graphical model in Fig. 3(a). ${\mathbf{x}}_{t_{2}}$ is independent of ${\mathbf{x}}_{t_{1}}$ and ${\mathbf{y}}$ given ${\mathbf{x}}_{0}$ . Therefore,

p({\mathbf{x}}_{t_{2}}\mid{\mathbf{x}}_{0},{\mathbf{x}}_{t_{1}},{\mathbf{y}})=% p({\mathbf{x}}_{t_{2}}\mid{\mathbf{x}}_{0}).

(21)

As a result,

$\displaystyle p({\mathbf{x}}_{t_{2}}\mid{\mathbf{y}})$	$\displaystyle=\iint p({\mathbf{x}}_{t_{1}}\mid{\mathbf{y}})p({\mathbf{x}}_{0}% \mid{\mathbf{x}}_{t_{1}},{\mathbf{y}})p({\mathbf{x}}_{t_{2}}\mid{\mathbf{x}}_{% 0})\mathrm{d}{\mathbf{x}}_{0}\mathrm{d}{\mathbf{x}}_{t_{1}}$	(22)
	$\displaystyle=\mathbb{E}_{{\mathbf{x}}_{t_{1}}\sim p({\mathbf{x}}_{t_{1}}\mid{% \mathbf{y}})}\mathbb{E}_{{\mathbf{x}}_{0}\sim p({\mathbf{x}}_{0}\mid{\mathbf{x% }}_{t_{1}},{\mathbf{y}})}p({\mathbf{x}}_{t_{2}}\mid{\mathbf{x}}_{0})$	(23)
	$\displaystyle=\mathbb{E}_{{\mathbf{x}}_{0}\sim p({\mathbf{x}}_{0}\mid{\mathbf{% x}}_{t_{1}},{\mathbf{y}})}\mathcal{N}({\mathbf{x}}_{t_{2}};{\mathbf{x}}_{0},% \sigma_{t_{2}}^{2}{\bm{I}}),$	(24)

given that ${\mathbf{x}}_{t_{1}}$ is drawn from the measurement conditioned time-marginal $p({\mathbf{x}}_{t_{1}}\mid{\mathbf{y}})$ . ∎

Proposition 4 (Restated).

Suppose ${\mathbf{z}}_{t_{1}}$ is sampled from the measurement conditioned time-marginal $p({\mathbf{z}}_{t_{1}}\mid{\mathbf{y}})$ , then

{\mathbf{z}}_{t_{2}}\sim\mathbb{E}_{{\mathbf{x}}_{0}\sim p({\mathbf{x}}_{0}% \mid{\mathbf{z}}_{t_{1}},{\mathbf{y}})}\mathcal{N}(\mathcal{E}({\mathbf{x}}_{0% }),\sigma_{t_{2}}^{2}{\bm{I}})

(25)

satisfies the measurement conditioned time-marginal $p({\mathbf{z}}_{t_{2}}\mid{\mathbf{y}})$ . Moreover,

{\mathbf{z}}_{t_{2}}\sim\mathbb{E}_{{\mathbf{z}}_{0}\sim p({\mathbf{z}}_{0}% \mid{\mathbf{z}}_{t_{1}},{\mathbf{y}})}\mathcal{N}({\mathbf{z}}_{0},\sigma_{t_% {2}}^{2}{\bm{I}}).

(26)

also satisfies the measurement conditioned time-marginal $p({\mathbf{z}}_{t_{2}}\mid{\mathbf{y}})$ .

Proof.

We first factorize the measurement conditioned time-marginal $p({\mathbf{z}}_{t_{2}}\mid{\mathbf{y}})$ by

	$\displaystyle p({\mathbf{z}}_{t_{2}}\mid{\mathbf{y}})$	$\displaystyle=\iint p({\mathbf{z}}_{t_{2}},{\mathbf{x}}_{0},{\mathbf{z}}_{t_{1% }}\mid{\mathbf{y}})\mathrm{d}{\mathbf{x}}_{0}\mathrm{d}{\mathbf{z}}_{t_{1}}$		(27)
		$\displaystyle=\iint p({\mathbf{z}}_{t_{1}}\mid{\mathbf{y}})p({\mathbf{x}}_{0}\mid{\mathbf{z}}_{t_{1}},{\mathbf{y}})p({\mathbf{z}}_{t_{2}}\mid{\mathbf{x}}_{0},{\mathbf{z}}_{t_{1}},{\mathbf{y}})\mathrm{d}{\mathbf{x}}_{0}\mathrm{d}{\mathbf{z}}_{t_{1}}.$		(28)

Recall the probabilistic graphical model in Fig. 3(b). ${\mathbf{z}}_{t_{2}}$ is independent of ${\mathbf{z}}_{t_{1}}$ and ${\mathbf{y}}$ given ${\mathbf{z}}_{0}$ , while ${\mathbf{z}}_{0}$ is determined only by ${\mathbf{x}}_{0}$ . Therefore,

p({\mathbf{z}}_{t_{2}}\mid{\mathbf{x}}_{0},{\mathbf{z}}_{t_{1}},{\mathbf{y}})=% p({\mathbf{z}}_{t_{2}}\mid{\mathbf{x}}_{0})=\mathcal{N}({\mathbf{z}}_{t_{2}};% \mathcal{E}({\mathbf{x}}_{0}),\sigma_{t_{2}}^{2}{\bm{I}}).

(29)

Hence,

$\displaystyle p({\mathbf{z}}_{t_{2}}\mid{\mathbf{y}})$	$\displaystyle=\iint p({\mathbf{z}}_{t_{1}}\mid{\mathbf{y}})p({\mathbf{x}}_{0}\mid{\mathbf{z}}_{t_{1}},{\mathbf{y}})p({\mathbf{z}}_{t_{2}}\mid{\mathbf{x}}_{0})\mathrm{d}{\mathbf{x}}_{0}\mathrm{d}{\mathbf{z}}_{t_{1}}$	(30)
	$\displaystyle=\mathbb{E}_{{\mathbf{z}}_{t_{1}}\sim p({\mathbf{z}}_{t_{1}}\mid{% \mathbf{y}})}\mathbb{E}_{{\mathbf{x}}_{0}\sim p({\mathbf{x}}_{0}\mid{\mathbf{z% }}_{t_{1}},{\mathbf{y}})}p({\mathbf{z}}_{t_{2}}\mid{\mathbf{x}}_{0})$	(31)
	$\displaystyle=\mathbb{E}_{{\mathbf{x}}_{0}\sim p({\mathbf{x}}_{0}\mid{\mathbf{% z}}_{t_{1}},{\mathbf{y}})}\mathcal{N}({\mathbf{z}}_{t_{2}};\mathcal{E}({% \mathbf{x}}_{0}),\sigma_{t_{2}}^{2}{\bm{I}}),$	(32)

assuming ${\mathbf{z}}_{t_{1}}$ is drawn from $p({\mathbf{z}}_{t_{1}}\mid{\mathbf{y}})$ .

Moreover, we can also factorize $p({\mathbf{z}}_{t_{2}}\mid{\mathbf{y}})$ by

$\displaystyle p({\mathbf{z}}_{t_{2}}\mid{\mathbf{y}})$	$\displaystyle=\iint p({\mathbf{z}}_{t_{2}},{\mathbf{z}}_{t_{1}},{\mathbf{z}}_{% 0}\mid{\mathbf{y}})\mathrm{d}{\mathbf{z}}_{0}\mathrm{d}{\mathbf{z}}_{t_{1}}$	(33)
	$\displaystyle=\iint p({\mathbf{z}}_{t_{1}}\mid{\mathbf{y}})p({\mathbf{z}}_{0}% \mid{\mathbf{z}}_{t_{1}},{\mathbf{y}})p({\mathbf{z}}_{t_{2}}\mid{\mathbf{z}}_{% 0},{\mathbf{z}}_{t_{1}},{\mathbf{y}})\mathrm{d}{\mathbf{z}}_{0}\mathrm{d}{% \mathbf{z}}_{t_{1}}$	(34)
	$\displaystyle=\iint p({\mathbf{z}}_{t_{1}}\mid{\mathbf{y}})p({\mathbf{z}}_{0}\mid{\mathbf{z}}_{t_{1}},{\mathbf{y}})p({\mathbf{z}}_{t_{2}}\mid{\mathbf{z}}_{0})\mathrm{d}{\mathbf{z}}_{0}\mathrm{d}{\mathbf{z}}_{t_{1}}.$	(35)

The last equation is again derived directly from Fig. 3(b). Given that ${\mathbf{z}}_{t_{1}}$ is sampled from $p({\mathbf{z}}_{t_{1}}\mid{\mathbf{y}})$ , we have that

p({\mathbf{z}}_{t_{2}}\mid{\mathbf{y}})=\mathbb{E}_{{\mathbf{z}}_{0}\sim p({% \mathbf{z}}_{0}\mid{\mathbf{z}}_{t_{1}},{\mathbf{y}})}\mathcal{N}({\mathbf{z}}% _{t_{2}};{\mathbf{z}}_{0},\sigma_{t_{2}}^{2}{\bm{I}}).

(36)

∎

Appendix C Discussions

C.1 Sampling Efficiency

Sampling efficiency is a crucial aspect of inverse problem solvers. The time cost of diffusion model-based methods depends heavily on the number of neural function evaluations (NFE). Table 6 reports the NFE and sampling time for the default settings of some pixel space baseline methods and for DAPS under different configurations. Fig. 7 shows the quantitative evaluation of DAPS with different NFE. As shown there, DAPS can achieve better performance than the baselines even with a small NFE.

Table 6: Sampling time of DAPS on the phase retrieval task with FFHQ 256. We report the non-parallel, single-image sampling time on one NVIDIA A100-SXM4-80GB GPU. Times may differ slightly between runs.

Configuration	ODE Steps	Annealing Steps	NFE	Seconds/Image
DPS	-	-	1000	35
DDRM	-	-	20	2
RED-diff	-	-	1000	47
DAPS-50	2	25	50	4
DAPS-100	2	50	100	7
DAPS-200	2	100	200	13
DAPS-400	4	100	400	17
DAPS-1K	5	200	1000	37
DAPS-2K	8	250	2000	61
DAPS-4K	10	400	4000	108

C.2 Limitations and Future Extension

Though DAPS achieves significantly better performance on inverse problems like phase retrieval, there are still some limitations.

First, we only adopt a very naive implementation of the latent diffusion model with DAPS, referred to as LatentDAPS. However, some recent techniques [27, 25] have been proposed to improve the performance of posterior sampling with latent diffusion models. Specifically, one main challenge is that ${\mathbf{x}}_{0\mid{\mathbf{y}}}$ obtained by Langevin dynamics in pixel space might not lie on the manifold of clean images. This could lead to sub-optimal performance of the autoencoders in latent diffusion models, since they are trained only on the clean data manifold.

Furthermore, we only implement DAPS with a decreasing annealing scheduler, but the DAPS framework can support any scheduler function $\sigma_{t}^{A}$ as long as $\sigma_{0}^{A}=0$ . A non-monotonic scheduler has the potential of providing DAPS with more power to explore the solution space.

Finally, we use a fixed NFE for the ODE solver. However, one could adjust it automatically, for example by using fewer ODE solver steps for smaller $t$ in later sampling steps. We leave the directions discussed above as possible future extensions.

C.3 Broader Impacts

We anticipate that DAPS can offer a new paradigm for addressing challenging real-world inverse problems using diffusion models. DAPS tackles these problems by employing a diffusion model as a general denoiser, which learns to model a powerful prior data distribution. This approach could significantly enrich the array of methods available to the inverse problem-solving community. However, it is important to note that DAPS might generate biased samples if the diffusion model is trained on biased data. Therefore, caution should be exercised when using DAPS in bias-sensitive scenarios.

Appendix D Experimental Details

D.1 Inverse Problem Setup

Most inverse problems are implemented in the same way as introduced in [3]. However, for inpainting with random pixel masks, motion deblurring, and nonlinear deblurring, we fix a certain realization for fair comparison by using the same random seeds for mask generation and blurring kernels. Moreover, for phase retrieval, we adopt a slightly different version as follows:

{\mathbf{y}}\sim\mathcal{N}(|\mathbf{F}\mathbf{P}(0.5{\mathbf{x}}_{0}+0.5)|,% \beta_{\mathbf{y}}^{2}{\bm{I}}),

(37)

which first normalizes the data to lie in the range $[0,1]$ . Here $\mathbf{F}$ and $\mathbf{P}$ are the discrete Fourier transform matrix and the oversampling matrix with ratio $k/n$ , respectively. As in [3], we use an oversampling factor of $k=2$ and $n=8$ . We shift the data range of the input ${\mathbf{x}}_{0}$ from $[-1,1]$ to $[0,1]$ to better fit practical settings, where the measured signals are usually non-negative.
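The measurement in Eq. 37 can be sketched for a one-dimensional signal as follows. This is only an illustration under stated assumptions: the oversampling matrix $\mathbf{P}$ is modelled as symmetric zero padding, the Fourier transform is an unnormalized naive DFT, and the names `dft`, `phase_retrieval_forward`, and `pad` are hypothetical. The experiments in this paper operate on two-dimensional images.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform (unnormalized)."""
    n = len(signal)
    return [sum(signal[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]


def phase_retrieval_forward(x0, pad=0, beta_y=0.0, rng=None):
    """Fourier magnitudes |F P (0.5 * x0 + 0.5)| of a 1D signal in [-1, 1].

    Assumption: P pads `pad` zeros on each side of the shifted signal.
    Gaussian noise is added only if both beta_y > 0 and an rng
    (e.g. random.Random) are supplied.
    """
    shifted = [0.5 * v + 0.5 for v in x0]           # map [-1, 1] to [0, 1]
    padded = [0.0] * pad + shifted + [0.0] * pad    # oversampling by zero padding
    mags = [abs(c) for c in dft(padded)]
    if rng is None or beta_y == 0.0:
        return mags
    return [m + beta_y * rng.gauss(0.0, 1.0) for m in mags]


y_a = phase_retrieval_forward([-1.0, 1.0, -1.0, 1.0], pad=2)
y_b = phase_retrieval_forward([1.0, -1.0, 1.0, -1.0], pad=2)  # reversed signal
```

Here `y_a` and `y_b` agree up to floating-point error although the two signals differ, because the padded inputs are circular shifts of each other and only the Fourier phase changes. This is one simple instance of the ambiguity that makes phase retrieval ill-posed.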

The measurement for high dynamic range reconstruction is defined as

{\mathbf{y}}\sim\mathcal{N}(\mathrm{clip}(\alpha{\mathbf{x}}_{0},-1,1),\beta_{% \mathbf{y}}^{2}{\bm{I}}),

(38)

where the scale $\alpha$ controls the distortion strength. We set $\alpha=2$ in our experiments.
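A minimal sketch of the measurement in Eq. 38 is given below, assuming the signal is a flat list of values in $[-1,1]$ . The function name `hdr_forward` and its arguments are illustrative; noise is added only when a random number generator is passed in.

```python
def hdr_forward(x0, alpha=2.0, beta_y=0.0, rng=None):
    """clip(alpha * x0, -1, 1) plus optional Gaussian noise (cf. Eq. 38)."""
    clipped = [max(-1.0, min(1.0, alpha * v)) for v in x0]
    if rng is None or beta_y == 0.0:
        return clipped
    return [c + beta_y * rng.gauss(0.0, 1.0) for c in clipped]


y_hdr = hdr_forward([0.2, 0.6, -0.9])
```

With $\alpha=2$ , every entry with $|x|>0.5$ saturates at $\pm 1$ , so the measurement carries no information about the magnitude of those entries beyond their sign.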

D.2 DAPS Implementation Details

Euler ODE Solver

For any given increasing and differentiable noise scheduler $\sigma_{t}$ and any initial data distribution $p({\mathbf{x}}_{0})$ , we consider the forward diffusion SDE $\mathrm{d}{\mathbf{x}}_{t}=\sqrt{2\dot{\sigma_{t}}\sigma_{t}}\,\mathrm{d}{\mathbf{w}}_{t}$ , where $\dot{\sigma_{t}}$ denotes the time derivative of $\sigma_{t}$ and ${\mathbf{w}}_{t}$ is the standard Wiener process. This SDE induces a probability path of the marginal distributions of ${\mathbf{x}}_{t}$ , denoted as $p({\mathbf{x}}_{t};\sigma_{t})$ . As demonstrated in [20, 1], the probability flow ODE for the above process is given by:

\mathrm{d}{\mathbf{x}}_{t}=-\dot{\sigma_{t}}\sigma_{t}\nabla_{{\mathbf{x}}_{t}% }\log p({\mathbf{x}}_{t};\sigma_{t})\,\mathrm{d}t.

(39)

By employing the appropriate preconditioning introduced in [41], we can transform the pre-trained diffusion model with parameter ${\bm{\theta}}$ to approximate the score function of the above probability path: ${\bm{s}}_{\bm{\theta}}({\mathbf{x}}_{t},\sigma_{t})\approx\nabla_{{\mathbf{x}}% _{t}}\log p({\mathbf{x}}_{t};\sigma_{t})$ . In DAPS, we compute $\hat{\mathbf{x}}_{0}({\mathbf{x}}_{t})$ by solving the ODE given ${\mathbf{x}}_{t}$ and time $t$ as initial values.

Numerically, we use the scheduler $\sigma_{t}=t$ and implement an Euler solver [20], which evaluates $\frac{\mathrm{d}{\mathbf{x}}_{t}}{\mathrm{d}t}$ at $N_{\text{ode}}$ discretized time steps in the interval $[0,t]$ and updates ${\mathbf{x}}_{t}$ by the discretized ODE. The time steps $t_{i}$ , $i=1,\cdots,N_{\text{ode}}$ , are selected by a polynomial interpolation between $t$ and $t_{\min}$ :

t_{i}=\left(t^{\frac{1}{\rho}}+\dfrac{i}{N-1}\left(t_{\min}^{\frac{1}{\rho}}-t% ^{\frac{1}{\rho}}\right)\right)^{\rho}.

(40)

We use $\rho=7$ and $t_{\min}=0.02$ throughout all experiments.
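The time discretization in Eq. 40 and the Euler update for Eq. 39 with $\sigma_{t}=t$ can be sketched as follows. The indexing convention is an assumption of this sketch: we let $i=0,\dots,n-1$ so that the first time step equals $t$ and the last equals $t_{\min}$ . The function `gaussian_score` is a toy stand-in for ${\bm{s}}_{\bm{\theta}}$ , namely the exact score of the noisy marginal when the prior is a one-dimensional standard normal.

```python
def time_steps(t, t_min=0.02, n=5, rho=7.0):
    """Polynomial interpolation between t and t_min (cf. Eq. 40).

    Assumed indexing: i = 0, ..., n - 1, so ts[0] == t and ts[-1] == t_min
    up to floating-point error.
    """
    a, b = t ** (1.0 / rho), t_min ** (1.0 / rho)
    return [(a + i / (n - 1) * (b - a)) ** rho for i in range(n)]


def euler_pf_ode(x_t, t, score, n=5, t_min=0.02, rho=7.0):
    """Euler discretization of dx/dt = -t * score(x, t) (Eq. 39, sigma_t = t)."""
    ts = time_steps(t, t_min, n, rho)
    x = x_t
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        dx_dt = -t_cur * score(x, t_cur)
        x = x + (t_next - t_cur) * dx_dt   # t_next < t_cur: integrate toward t_min
    return x


def gaussian_score(x, sigma):
    """Score of N(0, 1 + sigma^2), the noisy marginal of a standard normal prior."""
    return -x / (1.0 + sigma**2)


ts = time_steps(10.0)
x0_hat = euler_pf_ode(3.0, 10.0, gaussian_score)
```

With $\rho=7$ the steps are concentrated near $t_{\min}$ , so most solver evaluations are spent at low noise levels.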

Annealing Scheduler

To sample from the posterior distribution $p({\mathbf{x}}_{0}\mid{\mathbf{y}})$ , DAPS adopts a noise annealing process to sample ${\mathbf{x}}_{t}$ from the measurement conditioned time-marginals $p({\mathbf{x}}_{t}\mid{\mathbf{y}})$ , where ${\mathbf{x}}_{t}$ is defined by a noisy perturbation of ${\mathbf{x}}_{0}$ : ${\mathbf{x}}_{t}={\mathbf{x}}_{0}+\sigma_{t}^{A}\bm{\epsilon}$ , $\bm{\epsilon}\sim\mathcal{N}({\bm{0}},{\bm{I}})$ , and $\sigma_{t}^{A}$ is the annealing scheduler. In practice, we start from time $T$ , assuming $p({\mathbf{x}}_{T}\mid{\mathbf{y}})\approx\mathcal{N}({\bm{0}},\sigma_{\max}^{2}{\bm{I}})$ , with $\sigma_{\max}=\sigma_{T}^{A}$ . For simplicity, we adopt $\sigma_{t}^{A}=t$ and the same polynomial interpolation as in Eq. 40 between $\sigma_{0}$ and $\sigma_{T}$ for a total of $N_{A}$ steps.

DAPS with Latent Diffusion Model

As shown in Appendix A, the Langevin dynamics can be performed either in pixel space (Eq. 25) or in latent space (Eq. 26) by adopting different assumptions. However, latent space Langevin dynamics is much more costly than its pixel space counterpart. We therefore perform pixel space Langevin dynamics in early annealing time steps and latent space Langevin dynamics later. We use a hyperparameter $R$ to set how the total annealing steps are split between pixel space and latent space Langevin dynamics.

Hyperparameters Overview

The hyperparameters of DAPS can be categorized into the following three categories.

(1) The number of ODE solver steps $N_{\text{ode}}$ and the number of annealing steps $N_{A}$ . These two control the total NFE of DAPS and trade off cost against quality. For linear tasks, we use $N_{\text{ode}}=5$ and $N_{A}=200$ ; for nonlinear tasks, we use $N_{\text{ode}}=10$ and $N_{A}=400$ .

(2) The Langevin step size $\eta$ and the total number of steps $N$ . These two control the quality of samples from the approximated $p({\mathbf{x}}_{0}\mid{\mathbf{x}}_{t},{\mathbf{y}})$ . For all tasks, we first fix $N=100$ and find the best $\eta$ for a given forward function, typically by a grid search. We report the final $\eta$ for both pixel and latent Langevin dynamics and the corresponding ratio $R$ in Table 7. Note that the latent step size and ratio are only used by LatentDAPS. Moreover, instead of using the true $\beta_{\mathbf{y}}=0.05$ in Eq. 7, we regard $\beta_{\mathbf{y}}$ as a hyperparameter and set it to $0.01$ for better empirical performance.

(3) The $\sigma_{\max}$ and $\sigma_{\min}$ used in the annealing process. We set $\sigma_{\max}=100$ for DAPS and $\sigma_{\max}=10$ for LatentDAPS, and $\sigma_{\min}=0.1$ to be more robust to noise in the measurement.

Tasks	SR (x4)	Inpaint (Box)	Inpaint (Random)	Gaussian deblurring	Motion deblurring	Phase retrieval	Nonlinear deblurring	High dynamic range
$\eta_{\text{pixel}}$	1e-4	5e-5	1e-4	1e-4	5e-5	5e-5	5e-5	2e-5
$\eta_{\text{latent}}$	2e-6	2e-6	2e-6	2e-6	2e-6	4e-6	2e-6	6e-7
$R$	0.1	0.1	0.1	0.1	0.1	0.3	0.1	0.1

Table 7: The Langevin dynamics hyperparameters for all tasks.
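For reference, the hyperparameters listed above can be collected in a small configuration object. This is a sketch only: the class `DAPSConfig`, its field names, and the dictionary keys are hypothetical and do not correspond to the released code, while the numeric values are taken from this section and from the pixel space row of Table 7.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DAPSConfig:
    n_ode: int                # ODE solver steps per annealing step
    n_anneal: int             # number of annealing steps N_A
    n_langevin: int = 100     # Langevin steps N
    beta_y: float = 0.01      # tuned value, not the true noise level 0.05
    sigma_max: float = 100.0  # 10 for LatentDAPS
    sigma_min: float = 0.1
    rho: float = 7.0          # exponent of the polynomial interpolation
    t_min: float = 0.02       # final time of the inner ODE solve

    @property
    def nfe(self) -> int:
        """Total diffusion model evaluations: ODE steps times annealing steps."""
        return self.n_ode * self.n_anneal


LINEAR = DAPSConfig(n_ode=5, n_anneal=200)
NONLINEAR = DAPSConfig(n_ode=10, n_anneal=400)

# Pixel space Langevin step sizes from Table 7 (keys are illustrative).
ETA_PIXEL = {
    "sr_x4": 1e-4, "inpaint_box": 5e-5, "inpaint_random": 1e-4,
    "gaussian_deblur": 1e-4, "motion_deblur": 5e-5,
    "phase_retrieval": 5e-5, "nonlinear_deblur": 5e-5, "hdr": 2e-5,
}
```

The `nfe` property reproduces the NFE column of Table 6 for the DAPS-1K and DAPS-4K settings.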

D.3 Baseline Details

DPS

All experiments are conducted with the original code and default settings as specified in [3]. For the high dynamic range reconstruction task, we use $\xi_{i}=1/\|{\mathbf{y}}-\mathcal{A}(\hat{\mathbf{x}}_{0}({\mathbf{x}}_{i}))\|$ .

DDRM

We adopt the default setting of $\eta_{B}=1.0$ and $\eta=0.85$ with 20 DDIM steps as specified in [2].

DDNM

We adopt the default setting of $\eta_{B}=1.0$ and $\eta=0.85$ with 100 DDIM steps as specified in [31].

PnP-ADMM

We use a UNet denoiser trained on ImageNet for experiments.

RED-diff

For a fair comparison, we use a slightly modified version of RED-diff [28] that initializes the algorithm with random noise instead of a solution from the pseudoinverse. This might lead to worse performance than the original RED-diff algorithm. We use $\lambda=0.25$ and a learning rate of $lr=0.5$ for all experiments.

PSLD

We use the official implementation of PSLD [27] with the default configurations. Specifically, we use Stable Diffusion v1.5 for ImageNet experiments, which is commonly believed to be a stronger pre-trained model than the LDM-VQ4 used in other experiments.

ReSample

All experiments are based on the official code of ReSample [25] with a 500-step DDIM sampler.

Appendix E Experiments on Synthetic Data Distributions

Fig. 4 shows the sampling trajectories and predicted posterior distribution of DPS and DAPS on a synthetic data distribution. Specifically, we create a 2D Gaussian mixture as the prior distribution, i.e., $p({\mathbf{x}}_{0})=\frac{1}{2}\left(\mathcal{N}({\mathbf{x}}_{0};{\bm{c}}_{1},{\bm{\Sigma}}_{1})+\mathcal{N}({\mathbf{x}}_{0};{\bm{c}}_{2},{\bm{\Sigma}}_{2})\right)$. Let ${\bm{c}}_{1}=(-0.3,-0.4)$, ${\bm{c}}_{2}=(0.6,0.5)$, and ${\bm{\Sigma}}_{1}={\bm{\Sigma}}_{2}=\mathrm{diag}(0.01,0.04)$. We draw 1000 samples from this prior distribution to create a small dataset, from which we can compute a closed-form empirical Stein score function at any noise level $\sigma$.
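To make the closed-form empirical score concrete, here is a minimal sketch. It assumes the noised data follow ${\mathbf{x}}_{\sigma}={\mathbf{x}}_{0}+\sigma{\bm{\epsilon}}$ with standard Gaussian ${\bm{\epsilon}}$, so that the noised empirical distribution is a uniform mixture of isotropic Gaussians centered at the data points; its score is then a softmax-weighted average of $({\mathbf{x}}_{i}-{\mathbf{x}})/\sigma^{2}$. The function names and the random seed are illustrative choices, not part of the original experiments.

```python
import math
import random


def sample_prior(n, rng):
    """Draw n points from the 2D two-component Gaussian mixture prior."""
    centers = [(-0.3, -0.4), (0.6, 0.5)]
    stds = (0.1, 0.2)  # square roots of diag(0.01, 0.04)
    points = []
    for _ in range(n):
        cx, cy = rng.choice(centers)  # equal mixture weights
        points.append((rng.gauss(cx, stds[0]), rng.gauss(cy, stds[1])))
    return points


def empirical_score(x, data, sigma):
    """Score at x of the mixture (1/n) * sum_i N(x; x_i, sigma^2 I).

    Assumes additive Gaussian noise of scale sigma on the data points.
    """
    # Unnormalized log-weight of each data point given x.
    logw = [-((x[0] - a) ** 2 + (x[1] - b) ** 2) / (2 * sigma**2) for a, b in data]
    m = max(logw)  # subtract the maximum for numerical stability
    w = [math.exp(lw - m) for lw in logw]
    z = sum(w)
    # Weighted mean of the data points, then score = (mean - x) / sigma^2.
    mean = [sum(wi * p[k] for wi, p in zip(w, data)) / z for k in range(2)]
    return [(mean[k] - x[k]) / sigma**2 for k in range(2)]


rng = random.Random(0)  # arbitrary seed for reproducibility
dataset = sample_prior(1000, rng)
score_at_origin = empirical_score((0.0, 0.0), dataset, sigma=1.0)
```

Because the score is computed exactly from the dataset, no network training is needed in this toy setting, which isolates the behavior of the sampling algorithm from score-estimation error.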

Moreover, we consider the simplest measurement function that contains two modes, i.e., ${\mathbf{y}}=\exp\left(-\frac{\|{\mathbf{x}}\|^{2}}{0.05}\right)+\exp\left(-\frac{\|{\mathbf{x}}-(0.5,0.5)\|^{2}}{0.05}\right)+{\mathbf{n}}$, where ${\mathbf{n}}\sim\mathcal{N}({\bm{0}},\beta_{\mathbf{y}}^{2}{\bm{I}})$ with $\beta_{\mathbf{y}}=0.3$. Let ${\mathbf{y}}=0$, so that the likelihood $p({\mathbf{y}}\mid{\mathbf{x}}_{0})$ has two modes at $(0.5,0.5)$ and $(0,0)$. Since the prior distribution is large only at $(0.5,0.5)$, the posterior distribution is single-mode, as illustrated in Fig. 8.

We run both DPS and DAPS for 200 steps with 100 independent samples on this synthetic dataset. As shown in Fig. 8, both the SDE and ODE versions of DPS converge to two different modes. This is because DPS suffers from large errors in estimating the likelihood $p({\mathbf{y}}\mid{\mathbf{x}}_{t})$, especially in the early stages. These errors can hardly be corrected and are propagated along the SDE/ODE trajectory. DAPS, on the other hand, samples from a time-marginal distribution at each time step and is able to recover the posterior distribution more accurately.

We further investigate the performance of posterior estimation by computing the Wasserstein distance between samples ${\mathbf{x}}_{t}$ and ground truth posterior $p({\mathbf{x}}_{t}\mid{\mathbf{y}})$ for each step $t$ . As shown in Fig. 9, the Wasserstein distance for DAPS decreases quickly and remains small throughout the sampling process. This conforms with our theory that the distribution of ${\mathbf{x}}_{t}$ is ensured to be $p({\mathbf{x}}_{t}\mid{\mathbf{y}})$ for every noise level $\sigma_{t}$ .

Appendix F Additional Results

F.1 More Ablation Study

Effect of the number of function evaluations in the ODE sampler.

Recall that we use an ODE sampler to compute $\hat{\mathbf{x}}_{0}({\mathbf{x}}_{t})$ , the estimated mean of the approximated distribution $p({\mathbf{x}}_{0}\mid{\mathbf{x}}_{t})$ . We use the same number of function evaluations in our ODE sampler throughout the entire algorithm. To test how the number of function evaluations (NFE) in the ODE sampler influences the performance, we try different NFE on two linear tasks and one nonlinear task. As shown in Fig. 10, increasing NFE in the ODE sampler consistently improves the overall image perceptual quality. In particular, when NFE is $1$ , the ODE sampler is equivalent to computing $\mathbb{E}[{\mathbf{x}}_{0}\mid{\mathbf{x}}_{t}]$ via Tweedie’s formula.
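The Tweedie estimate mentioned above can be checked on a case with a known answer. The sketch below assumes the noising model ${\mathbf{x}}_{t}={\mathbf{x}}_{0}+\sigma_{t}{\bm{\epsilon}}$, under which Tweedie's formula reads $\mathbb{E}[{\mathbf{x}}_{0}\mid{\mathbf{x}}_{t}]={\mathbf{x}}_{t}+\sigma_{t}^{2}\nabla_{{\mathbf{x}}_{t}}\log p({\mathbf{x}}_{t};\sigma_{t})$, and uses a one-dimensional Gaussian prior so that the exact posterior mean is available in closed form. The function names and the numerical values are illustrative assumptions.

```python
def tweedie_denoise(x_t, score, sigma):
    """Posterior mean E[x_0 | x_t] from the score of the noised marginal.

    Assumes x_t = x_0 + sigma * eps with standard Gaussian eps.
    """
    return x_t + sigma**2 * score


def gaussian_marginal_score(x_t, prior_mean, prior_std, sigma):
    """Score of the noised marginal when the prior is N(prior_mean, prior_std^2).

    The noised marginal is N(prior_mean, prior_std^2 + sigma^2).
    """
    return -(x_t - prior_mean) / (prior_std**2 + sigma**2)


# Illustrative values (not from the paper).
prior_mean, prior_std, sigma, x_t = 0.5, 0.2, 1.0, 2.0

x0_hat = tweedie_denoise(
    x_t, gaussian_marginal_score(x_t, prior_mean, prior_std, sigma), sigma
)

# Exact posterior mean for a Gaussian prior with Gaussian noise.
x0_exact = (prior_std**2 * x_t + sigma**2 * prior_mean) / (prior_std**2 + sigma**2)

print(abs(x0_hat - x0_exact) < 1e-12)
```

In this Gaussian case the two expressions agree up to floating-point error. With a learned score network in place of `gaussian_marginal_score`, the same one-line update gives the single-evaluation estimate, while more ODE steps refine it at a higher cost.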

Effect of the number of annealing noise scheduling steps.

To better understand how the scheduling of $\sigma$ influences performance, we also evaluate the effects of sampling with varying numbers of noise scheduling steps. A larger number of scheduling steps implies a denser discretization grid between $\sigma_{\max}$ and $\sigma_{\min}$. The quantitative results are shown in Fig. 11. The performance of DAPS on linear tasks improves slightly as the number of annealing noise scheduling steps increases, while its performance on nonlinear tasks (e.g., phase retrieval) improves dramatically with the number of scheduling steps. However, DAPS achieves near-optimal sample quality once the number of noise scheduling steps is larger than 200.

Different measurement noise levels.

When subjected to varying levels of measurement noise, the quality of solutions to inverse problems can differ significantly. To evaluate the performance of DAPS under different noise conditions, we present the results in Fig. 12. DAPS is robust to small noise levels ($\sigma<0.05$) and degrades almost linearly as $\sigma$ continues to increase.

F.2 More Discussion on Phase Retrieval

Phase retrieval is inherently a harder problem than other tasks considered in this paper. There are multiple disjoint modes with exactly the same measurement for phase retrieval. This is completely different from other tasks such as super-resolution and deblurring, for which the subset of images with low measurement error is a continuous set. We show in Fig. 13 eight images with disparate perceptual features but with exactly the same measurement in phase retrieval.

F.3 More Analysis on Sampling Trajectory

Here we show a longer trajectory of phase retrieval in Figs. 14, 15 and 16. The estimate $\hat{\mathbf{x}}_{0}({\mathbf{x}}_{t})$ evolves from unconditional samples of the model to posterior samples, while ${\mathbf{x}}_{0\mid{\mathbf{y}}}$ evolves from noisy conditioned samples to posterior samples. The two trajectories converge to the same sample as the noise anneals down.

F.4 More Qualitative Samples

We show a full stack of phase retrieval samples from $4$ runs without manual post-selection in Figs. 17 and 18. More samples for other tasks are shown in Figs. 19 and 20. More diverse samples from box inpainting of size $192\times 192$ and super-resolution of factor $16$ are shown in Figs. 22 and 21.