-
Target Score Matching
Authors:
Valentin De Bortoli,
Michael Hutchinson,
Peter Wirnsberger,
Arnaud Doucet
Abstract:
Denoising Score Matching estimates the score of a noised version of a target distribution by minimizing a regression loss and is widely used to train the popular class of Denoising Diffusion Models. A well known limitation of Denoising Score Matching, however, is that it yields poor estimates of the score at low noise levels. This issue is particularly unfavourable for problems in the physical sci…
▽ More
Denoising Score Matching estimates the score of a noised version of a target distribution by minimizing a regression loss and is widely used to train the popular class of Denoising Diffusion Models. A well known limitation of Denoising Score Matching, however, is that it yields poor estimates of the score at low noise levels. This issue is particularly unfavourable for problems in the physical sciences and for Monte Carlo sampling tasks for which the score of the clean original target is known. Intuitively, estimating the score of a slightly noised version of the target should be a simple task in such cases. In this paper, we address this shortcoming and show that it is indeed possible to leverage knowledge of the target score. We present a Target Score Identity and corresponding Target Score Matching regression loss which allows us to obtain score estimates admitting favourable properties at low noise levels.
△ Less
Submitted 13 February, 2024;
originally announced February 2024.
-
Particle Denoising Diffusion Sampler
Authors:
Angus Phillips,
Hai-Dang Dau,
Michael John Hutchinson,
Valentin De Bortoli,
George Deligiannidis,
Arnaud Doucet
Abstract:
Denoising diffusion models have become ubiquitous for generative modeling. The core idea is to transport the data distribution to a Gaussian by using a diffusion. Approximate samples from the data distribution are then obtained by estimating the time-reversal of this diffusion using score matching ideas. We follow here a similar strategy to sample from unnormalized probability densities and comput…
▽ More
Denoising diffusion models have become ubiquitous for generative modeling. The core idea is to transport the data distribution to a Gaussian by using a diffusion. Approximate samples from the data distribution are then obtained by estimating the time-reversal of this diffusion using score matching ideas. We follow here a similar strategy to sample from unnormalized probability densities and compute their normalizing constants. However, the time-reversed diffusion is here simulated by using an original iterative particle scheme relying on a novel score matching loss. Contrary to standard denoising diffusion models, the resulting Particle Denoising Diffusion Sampler (PDDS) provides asymptotically consistent estimates under mild assumptions. We demonstrate PDDS on multimodal and high dimensional sampling tasks.
△ Less
Submitted 15 June, 2024; v1 submitted 9 February, 2024;
originally announced February 2024.
-
Augmented Bridge Matching
Authors:
Valentin De Bortoli,
Guan-Horng Liu,
Tianrong Chen,
Evangelos A. Theodorou,
Weilie Nie
Abstract:
Flow and bridge matching are a novel class of processes which encompass diffusion models. One of the main aspect of their increased flexibility is that these models can interpolate between arbitrary data distributions i.e. they generalize beyond generative modeling and can be applied to learning stochastic (and deterministic) processes of arbitrary transfer tasks between two given distributions. I…
▽ More
Flow and bridge matching are a novel class of processes which encompass diffusion models. One of the main aspect of their increased flexibility is that these models can interpolate between arbitrary data distributions i.e. they generalize beyond generative modeling and can be applied to learning stochastic (and deterministic) processes of arbitrary transfer tasks between two given distributions. In this paper, we highlight that while flow and bridge matching processes preserve the information of the marginal distributions, they do \emph{not} necessarily preserve the coupling information unless additional, stronger optimality conditions are met. This can be problematic if one aims at preserving the original empirical pairing. We show that a simple modification of the matching process recovers this coupling by augmenting the velocity field (or drift) with the information of the initial sample point. Doing so, we lose the Markovian property of the process but preserve the coupling information between distributions. We illustrate the efficiency of our augmentation in learning mixture of image translation tasks.
△ Less
Submitted 12 November, 2023;
originally announced November 2023.
-
Plug-and-Play Posterior Sampling under Mismatched Measurement and Prior Models
Authors:
Marien Renaud,
Jiaming Liu,
Valentin de Bortoli,
Andrés Almansa,
Ulugbek S. Kamilov
Abstract:
Posterior sampling has been shown to be a powerful Bayesian approach for solving imaging inverse problems. The recent plug-and-play unadjusted Langevin algorithm (PnP-ULA) has emerged as a promising method for Monte Carlo sampling and minimum mean squared error (MMSE) estimation by combining physical measurement models with deep-learning priors specified using image denoisers. However, the intrica…
▽ More
Posterior sampling has been shown to be a powerful Bayesian approach for solving imaging inverse problems. The recent plug-and-play unadjusted Langevin algorithm (PnP-ULA) has emerged as a promising method for Monte Carlo sampling and minimum mean squared error (MMSE) estimation by combining physical measurement models with deep-learning priors specified using image denoisers. However, the intricate relationship between the sampling distribution of PnP-ULA and the mismatched data-fidelity and denoiser has not been theoretically analyzed. We address this gap by proposing a posterior-L2 pseudometric and using it to quantify an explicit error bound for PnP-ULA under mismatched posterior distribution. We numerically validate our theory on several inverse problems such as sampling from Gaussian mixture models and image deblurring. Our results suggest that the sensitivity of the sampling distribution of PnP-ULA to a mismatch in the measurement model and the denoiser can be precisely characterized.
△ Less
Submitted 5 October, 2023;
originally announced October 2023.
-
Diffusion Schrödinger Bridges for Bayesian Computation
Authors:
Jeremy Heng,
Valentin De Bortoli,
Arnaud Doucet
Abstract:
Denoising diffusion models are a novel class of generative models that have recently become extremely popular in machine learning. In this paper, we describe how such ideas can also be used to sample from posterior distributions and, more generally, any target distribution whose density is known up to a normalizing constant. The key idea is to consider a forward ``noising'' diffusion initialized a…
▽ More
Denoising diffusion models are a novel class of generative models that have recently become extremely popular in machine learning. In this paper, we describe how such ideas can also be used to sample from posterior distributions and, more generally, any target distribution whose density is known up to a normalizing constant. The key idea is to consider a forward ``noising'' diffusion initialized at the target distribution which ``transports'' this latter to a normal distribution for long diffusion times. The time-reversal of this process, the ``denoising'' diffusion, thus ``transports'' the normal distribution to the target distribution and can be approximated so as to sample from the target. To accelerate simulation, we show how one can introduce and approximate a Schrödinger bridge between these two distributions, i.e. a diffusion which transports the normal to the target in finite time.
△ Less
Submitted 27 August, 2023;
originally announced August 2023.
-
Nearly $d$-Linear Convergence Bounds for Diffusion Models via Stochastic Localization
Authors:
Joe Benton,
Valentin De Bortoli,
Arnaud Doucet,
George Deligiannidis
Abstract:
Denoising diffusions are a powerful method to generate approximate samples from high-dimensional data distributions. Recent results provide polynomial bounds on their convergence rate, assuming $L^2$-accurate scores. Until now, the tightest bounds were either superlinear in the data dimension or required strong smoothness assumptions. We provide the first convergence bounds which are linear in the…
▽ More
Denoising diffusions are a powerful method to generate approximate samples from high-dimensional data distributions. Recent results provide polynomial bounds on their convergence rate, assuming $L^2$-accurate scores. Until now, the tightest bounds were either superlinear in the data dimension or required strong smoothness assumptions. We provide the first convergence bounds which are linear in the data dimension (up to logarithmic factors) assuming only finite second moments of the data distribution. We show that diffusion models require at most $\tilde O(\frac{d \log^2(1/δ)}{\varepsilon^2})$ steps to approximate an arbitrary distribution on $\mathbb{R}^d$ corrupted with Gaussian noise of variance $δ$ to within $\varepsilon^2$ in KL divergence. Our proof extends the Girsanov-based methods of previous works. We introduce a refined treatment of the error from discretizing the reverse SDE inspired by stochastic localization.
△ Less
Submitted 5 March, 2024; v1 submitted 7 August, 2023;
originally announced August 2023.
-
Geometric Neural Diffusion Processes
Authors:
Emile Mathieu,
Vincent Dutordoir,
Michael J. Hutchinson,
Valentin De Bortoli,
Yee Whye Teh,
Richard E. Turner
Abstract:
Denoising diffusion models have proven to be a flexible and effective paradigm for generative modelling. Their recent extension to infinite dimensional Euclidean spaces has allowed for the modelling of stochastic processes. However, many problems in the natural sciences incorporate symmetries and involve data living in non-Euclidean spaces. In this work, we extend the framework of diffusion models…
▽ More
Denoising diffusion models have proven to be a flexible and effective paradigm for generative modelling. Their recent extension to infinite dimensional Euclidean spaces has allowed for the modelling of stochastic processes. However, many problems in the natural sciences incorporate symmetries and involve data living in non-Euclidean spaces. In this work, we extend the framework of diffusion models to incorporate a series of geometric priors in infinite-dimension modelling. We do so by a) constructing a noising process which admits, as limiting distribution, a geometric Gaussian process that transforms under the symmetry group of interest, and b) approximating the score with a neural network that is equivariant w.r.t. this group. We show that with these conditions, the generative functional model admits the same symmetry. We demonstrate scalability and capacity of the model, using a novel Langevin-based conditional sampler, to fit complex scalar and vector fields, with Euclidean and spherical codomain, on synthetic and real-world weather data.
△ Less
Submitted 11 July, 2023;
originally announced July 2023.
-
Tree-Based Diffusion Schrödinger Bridge with Applications to Wasserstein Barycenters
Authors:
Maxence Noble,
Valentin De Bortoli,
Arnaud Doucet,
Alain Durmus
Abstract:
Multi-marginal Optimal Transport (mOT), a generalization of OT, aims at minimizing the integral of a cost function with respect to a distribution with some prescribed marginals. In this paper, we consider an entropic version of mOT with a tree-structured quadratic cost, i.e., a function that can be written as a sum of pairwise cost functions between the nodes of a tree. To address this problem, we…
▽ More
Multi-marginal Optimal Transport (mOT), a generalization of OT, aims at minimizing the integral of a cost function with respect to a distribution with some prescribed marginals. In this paper, we consider an entropic version of mOT with a tree-structured quadratic cost, i.e., a function that can be written as a sum of pairwise cost functions between the nodes of a tree. To address this problem, we develop Tree-based Diffusion Schrödinger Bridge (TreeDSB), an extension of the Diffusion Schrödinger Bridge (DSB) algorithm. TreeDSB corresponds to a dynamic and continuous state-space counterpart of the multimarginal Sinkhorn algorithm. A notable use case of our methodology is to compute Wasserstein barycenters which can be recast as the solution of a mOT problem on a star-shaped tree. We demonstrate that our methodology can be applied in high-dimensional settings such as image interpolation and Bayesian fusion.
△ Less
Submitted 28 October, 2023; v1 submitted 25 May, 2023;
originally announced May 2023.
-
Trans-Dimensional Generative Modeling via Jump Diffusion Models
Authors:
Andrew Campbell,
William Harvey,
Christian Weilbach,
Valentin De Bortoli,
Tom Rainforth,
Arnaud Doucet
Abstract:
We propose a new class of generative models that naturally handle data of varying dimensionality by jointly modeling the state and dimension of each datapoint. The generative process is formulated as a jump diffusion process that makes jumps between different dimensional spaces. We first define a dimension destroying forward noising process, before deriving the dimension creating time-reversed gen…
▽ More
We propose a new class of generative models that naturally handle data of varying dimensionality by jointly modeling the state and dimension of each datapoint. The generative process is formulated as a jump diffusion process that makes jumps between different dimensional spaces. We first define a dimension destroying forward noising process, before deriving the dimension creating time-reversed generative process along with a novel evidence lower bound training objective for learning to approximate it. Simulating our learned approximation to the time-reversed generative process then provides an effective way of sampling data of varying dimensionality by jointly generating state values and dimensions. We demonstrate our approach on molecular and video datasets of varying dimensionality, reporting better compatibility with test-time diffusion guidance imputation tasks and improved interpolation capabilities versus fixed dimensional models that generate state values and dimensions separately.
△ Less
Submitted 30 October, 2023; v1 submitted 25 May, 2023;
originally announced May 2023.
-
Diffusion Models for Constrained Domains
Authors:
Nic Fishman,
Leo Klarner,
Valentin De Bortoli,
Emile Mathieu,
Michael Hutchinson
Abstract:
Denoising diffusion models are a novel class of generative algorithms that achieve state-of-the-art performance across a range of domains, including image generation and text-to-image tasks. Building on this success, diffusion models have recently been extended to the Riemannian manifold setting, broadening their applicability to a range of problems from the natural and engineering sciences. Howev…
▽ More
Denoising diffusion models are a novel class of generative algorithms that achieve state-of-the-art performance across a range of domains, including image generation and text-to-image tasks. Building on this success, diffusion models have recently been extended to the Riemannian manifold setting, broadening their applicability to a range of problems from the natural and engineering sciences. However, these Riemannian diffusion models are built on the assumption that their forward and backward processes are well-defined for all times, preventing them from being applied to an important set of tasks that consider manifolds defined via a set of inequality constraints. In this work, we introduce a principled framework to bridge this gap. We present two distinct noising processes based on (i) the logarithmic barrier metric and (ii) the reflected Brownian motion induced by the constraints. As existing diffusion model techniques cannot be applied in this setting, we derive new tools to define such models in our framework. We then demonstrate the practical utility of our methods on a number of synthetic and real-world tasks, including applications from robotics and protein design.
△ Less
Submitted 7 March, 2024; v1 submitted 11 April, 2023;
originally announced April 2023.
-
Diffusion Schrödinger Bridge Matching
Authors:
Yuyang Shi,
Valentin De Bortoli,
Andrew Campbell,
Arnaud Doucet
Abstract:
Solving transport problems, i.e. finding a map transporting one given distribution to another, has numerous applications in machine learning. Novel mass transport methods motivated by generative modeling have recently been proposed, e.g. Denoising Diffusion Models (DDMs) and Flow Matching Models (FMMs) implement such a transport through a Stochastic Differential Equation (SDE) or an Ordinary Diffe…
▽ More
Solving transport problems, i.e. finding a map transporting one given distribution to another, has numerous applications in machine learning. Novel mass transport methods motivated by generative modeling have recently been proposed, e.g. Denoising Diffusion Models (DDMs) and Flow Matching Models (FMMs) implement such a transport through a Stochastic Differential Equation (SDE) or an Ordinary Differential Equation (ODE). However, while it is desirable in many applications to approximate the deterministic dynamic Optimal Transport (OT) map which admits attractive properties, DDMs and FMMs are not guaranteed to provide transports close to the OT map. In contrast, Schrödinger bridges (SBs) compute stochastic dynamic map**s which recover entropy-regularized versions of OT. Unfortunately, existing numerical methods approximating SBs either scale poorly with dimension or accumulate errors across iterations. In this work, we introduce Iterative Markovian Fitting (IMF), a new methodology for solving SB problems, and Diffusion Schrödinger Bridge Matching (DSBM), a novel numerical algorithm for computing IMF iterates. DSBM significantly improves over previous SB numerics and recovers as special/limiting cases various recent transport methods. We demonstrate the performance of DSBM on a variety of problems.
△ Less
Submitted 11 December, 2023; v1 submitted 29 March, 2023;
originally announced March 2023.
-
SE(3) diffusion model with application to protein backbone generation
Authors:
Jason Yim,
Brian L. Trippe,
Valentin De Bortoli,
Emile Mathieu,
Arnaud Doucet,
Regina Barzilay,
Tommi Jaakkola
Abstract:
The design of novel protein structures remains a challenge in protein engineering for applications across biomedicine and chemistry. In this line of work, a diffusion model over rigid bodies in 3D (referred to as frames) has shown success in generating novel, functional protein backbones that have not been observed in nature. However, there exists no principled methodological framework for diffusi…
▽ More
The design of novel protein structures remains a challenge in protein engineering for applications across biomedicine and chemistry. In this line of work, a diffusion model over rigid bodies in 3D (referred to as frames) has shown success in generating novel, functional protein backbones that have not been observed in nature. However, there exists no principled methodological framework for diffusion on SE(3), the space of orientation preserving rigid motions in R3, that operates on frames and confers the group invariance. We address these shortcomings by develo** theoretical foundations of SE(3) invariant diffusion models on multiple frames followed by a novel framework, FrameDiff, for learning the SE(3) equivariant score over multiple frames. We apply FrameDiff on monomer backbone generation and find it can generate designable monomers up to 500 amino acids without relying on a pretrained protein structure prediction network that has been integral to previous methods. We find our samples are capable of generalizing beyond any known protein structure.
△ Less
Submitted 22 May, 2023; v1 submitted 4 February, 2023;
originally announced February 2023.
-
From Denoising Diffusions to Denoising Markov Models
Authors:
Joe Benton,
Yuyang Shi,
Valentin De Bortoli,
George Deligiannidis,
Arnaud Doucet
Abstract:
Denoising diffusions are state-of-the-art generative models exhibiting remarkable empirical performance. They work by diffusing the data distribution into a Gaussian distribution and then learning to reverse this noising process to obtain synthetic datapoints. The denoising diffusion relies on approximations of the logarithmic derivatives of the noised data densities using score matching. Such mod…
▽ More
Denoising diffusions are state-of-the-art generative models exhibiting remarkable empirical performance. They work by diffusing the data distribution into a Gaussian distribution and then learning to reverse this noising process to obtain synthetic datapoints. The denoising diffusion relies on approximations of the logarithmic derivatives of the noised data densities using score matching. Such models can also be used to perform approximate posterior simulation when one can only sample from the prior and likelihood. We propose a unifying framework generalising this approach to a wide class of spaces and leading to an original extension of score matching. We illustrate the resulting models on various applications.
△ Less
Submitted 18 February, 2024; v1 submitted 7 November, 2022;
originally announced November 2022.
-
Unbiased constrained sampling with Self-Concordant Barrier Hamiltonian Monte Carlo
Authors:
Maxence Noble,
Valentin De Bortoli,
Alain Durmus
Abstract:
In this paper, we propose Barrier Hamiltonian Monte Carlo (BHMC), a version of the HMC algorithm which aims at sampling from a Gibbs distribution $π$ on a manifold $\mathrm{M}$, endowed with a Hessian metric $\mathfrak{g}$ derived from a self-concordant barrier. Our method relies on Hamiltonian dynamics which comprises $\mathfrak{g}$. Therefore, it incorporates the constraints defining…
▽ More
In this paper, we propose Barrier Hamiltonian Monte Carlo (BHMC), a version of the HMC algorithm which aims at sampling from a Gibbs distribution $π$ on a manifold $\mathrm{M}$, endowed with a Hessian metric $\mathfrak{g}$ derived from a self-concordant barrier. Our method relies on Hamiltonian dynamics which comprises $\mathfrak{g}$. Therefore, it incorporates the constraints defining $\mathrm{M}$ and is able to exploit its underlying geometry. However, the corresponding Hamiltonian dynamics is defined via non separable Ordinary Differential Equations (ODEs) in contrast to the Euclidean case. It implies unavoidable bias in existing generalization of HMC to Riemannian manifolds. In this paper, we propose a new filter step, called "involution checking step", to address this problem. This step is implemented in two versions of BHMC, coined continuous BHMC (c-BHMC) and numerical BHMC (n-BHMC) respectively. Our main results establish that these two new algorithms generate reversible Markov chains with respect to $π$ and do not suffer from any bias in comparison to previous implementations. Our conclusions are supported by numerical experiments where we consider target distributions defined on polytopes.
△ Less
Submitted 28 October, 2023; v1 submitted 21 October, 2022;
originally announced October 2022.
-
Spectral Diffusion Processes
Authors:
Angus Phillips,
Thomas Seror,
Michael Hutchinson,
Valentin De Bortoli,
Arnaud Doucet,
Emile Mathieu
Abstract:
Score-based generative modelling (SGM) has proven to be a very effective method for modelling densities on finite-dimensional spaces. In this work we propose to extend this methodology to learn generative models over functional spaces. To do so, we represent functional data in spectral space to dissociate the stochastic part of the processes from their space-time part. Using dimensionality reducti…
▽ More
Score-based generative modelling (SGM) has proven to be a very effective method for modelling densities on finite-dimensional spaces. In this work we propose to extend this methodology to learn generative models over functional spaces. To do so, we represent functional data in spectral space to dissociate the stochastic part of the processes from their space-time part. Using dimensionality reduction techniques we then sample from their stochastic component using finite dimensional SGM. We demonstrate our method's effectiveness for modelling various multimodal datasets.
△ Less
Submitted 28 November, 2022; v1 submitted 28 September, 2022;
originally announced September 2022.
-
Solving Fredholm Integral Equations of the First Kind via Wasserstein Gradient Flows
Authors:
Francesca R. Crucinio,
Valentin De Bortoli,
Arnaud Doucet,
Adam M. Johansen
Abstract:
Solving Fredholm equations of the first kind is crucial in many areas of the applied sciences. In this work we adopt a probabilistic and variational point of view by considering a minimization problem in the space of probability measures with an entropic regularization. Contrary to classical approaches which discretize the domain of the solutions, we introduce an algorithm to asymptotically sample…
▽ More
Solving Fredholm equations of the first kind is crucial in many areas of the applied sciences. In this work we adopt a probabilistic and variational point of view by considering a minimization problem in the space of probability measures with an entropic regularization. Contrary to classical approaches which discretize the domain of the solutions, we introduce an algorithm to asymptotically sample from the unique solution of the regularized minimization problem. As a result our estimators do not depend on any underlying grid and have better scalability properties than most existing methods. Our algorithm is based on a particle approximation of the solution of a McKean--Vlasov stochastic differential equation associated with the Wasserstein gradient flow of our variational formulation. We prove the convergence towards a minimizer and provide practical guidelines for its numerical implementation. Finally, our method is compared with other approaches on several examples including density deconvolution and epidemiology.
△ Less
Submitted 15 May, 2024; v1 submitted 16 September, 2022;
originally announced September 2022.
-
Convergence of denoising diffusion models under the manifold hypothesis
Authors:
Valentin De Bortoli
Abstract:
Denoising diffusion models are a recent class of generative models exhibiting state-of-the-art performance in image and audio synthesis. Such models approximate the time-reversal of a forward noising process from a target distribution to a reference density, which is usually Gaussian. Despite their strong empirical results, the theoretical analysis of such models remains limited. In particular, al…
▽ More
Denoising diffusion models are a recent class of generative models exhibiting state-of-the-art performance in image and audio synthesis. Such models approximate the time-reversal of a forward noising process from a target distribution to a reference density, which is usually Gaussian. Despite their strong empirical results, the theoretical analysis of such models remains limited. In particular, all current approaches crucially assume that the target density admits a density w.r.t. the Lebesgue measure. This does not cover settings where the target distribution is supported on a lower-dimensional manifold or is given by some empirical distribution. In this paper, we bridge this gap by providing the first convergence results for diffusion models in this more general setting. In particular, we provide quantitative bounds on the Wasserstein distance of order one between the target data distribution and the generative distribution of the diffusion model.
△ Less
Submitted 29 May, 2023; v1 submitted 10 August, 2022;
originally announced August 2022.
-
Wavelet Score-Based Generative Modeling
Authors:
Florentin Guth,
Simon Coste,
Valentin De Bortoli,
Stephane Mallat
Abstract:
Score-based generative models (SGMs) synthesize new data samples from Gaussian white noise by running a time-reversed Stochastic Differential Equation (SDE) whose drift coefficient depends on some probabilistic score. The discretization of such SDEs typically requires a large number of time steps and hence a high computational cost. This is because of ill-conditioning properties of the score that…
▽ More
Score-based generative models (SGMs) synthesize new data samples from Gaussian white noise by running a time-reversed Stochastic Differential Equation (SDE) whose drift coefficient depends on some probabilistic score. The discretization of such SDEs typically requires a large number of time steps and hence a high computational cost. This is because of ill-conditioning properties of the score that we analyze mathematically. We show that SGMs can be considerably accelerated, by factorizing the data distribution into a product of conditional probabilities of wavelet coefficients across scales. The resulting Wavelet Score-based Generative Model (WSGM) synthesizes wavelet coefficients with the same number of time steps at all scales, and its time complexity therefore grows linearly with the image size. This is proved mathematically over Gaussian distributions, and shown numerically over physical processes at phase transition and natural image datasets.
△ Less
Submitted 9 August, 2022;
originally announced August 2022.
-
Riemannian Diffusion Schrödinger Bridge
Authors:
James Thornton,
Michael Hutchinson,
Emile Mathieu,
Valentin De Bortoli,
Yee Whye Teh,
Arnaud Doucet
Abstract:
Score-based generative models exhibit state of the art performance on density estimation and generative modeling tasks. These models typically assume that the data geometry is flat, yet recent extensions have been developed to synthesize data living on Riemannian manifolds. Existing methods to accelerate sampling of diffusion models are typically not applicable in the Riemannian setting and Rieman…
▽ More
Score-based generative models exhibit state of the art performance on density estimation and generative modeling tasks. These models typically assume that the data geometry is flat, yet recent extensions have been developed to synthesize data living on Riemannian manifolds. Existing methods to accelerate sampling of diffusion models are typically not applicable in the Riemannian setting and Riemannian score-based methods have not yet been adapted to the important task of interpolation of datasets. To overcome these issues, we introduce \emph{Riemannian Diffusion Schrödinger Bridge}. Our proposed method generalizes Diffusion Schrödinger Bridge introduced in \cite{debortoli2021neurips} to the non-Euclidean setting and extends Riemannian score-based models beyond the first time reversal. We validate our proposed method on synthetic data and real Earth and climate data.
△ Less
Submitted 6 July, 2022;
originally announced July 2022.
-
Can Push-forward Generative Models Fit Multimodal Distributions?
Authors:
Antoine Salmona,
Valentin de Bortoli,
Julie Delon,
Agnès Desolneux
Abstract:
Many generative models synthesize data by transforming a standard Gaussian random variable using a deterministic neural network. Among these models are the Variational Autoencoders and the Generative Adversarial Networks. In this work, we call them "push-forward" models and study their expressivity. We show that the Lipschitz constant of these generative networks has to be large in order to fit mu…
▽ More
Many generative models synthesize data by transforming a standard Gaussian random variable using a deterministic neural network. Among these models are the Variational Autoencoders and the Generative Adversarial Networks. In this work, we call them "push-forward" models and study their expressivity. We show that the Lipschitz constant of these generative networks has to be large in order to fit multimodal distributions. More precisely, we show that the total variation distance and the Kullback-Leibler divergence between the generated and the data distribution are bounded from below by a constant depending on the mode separation and the Lipschitz constant. Since constraining the Lipschitz constants of neural networks is a common way to stabilize generative models, there is a provable trade-off between the ability of push-forward models to approximate multimodal distributions and the stability of their training. We validate our findings on one-dimensional and image datasets and empirically show that generative models consisting of stacked networks with stochastic input at each step, such as diffusion models do not suffer of such limitations.
△ Less
Submitted 12 October, 2022; v1 submitted 29 June, 2022;
originally announced June 2022.
-
A Continuous Time Framework for Discrete Denoising Models
Authors:
Andrew Campbell,
Joe Benton,
Valentin De Bortoli,
Tom Rainforth,
George Deligiannidis,
Arnaud Doucet
Abstract:
We provide the first complete continuous time framework for denoising diffusion models of discrete data. This is achieved by formulating the forward noising process and corresponding reverse time generative process as Continuous Time Markov Chains (CTMCs). The model can be efficiently trained using a continuous time version of the ELBO. We simulate the high dimensional CTMC using techniques develo…
▽ More
We provide the first complete continuous time framework for denoising diffusion models of discrete data. This is achieved by formulating the forward noising process and corresponding reverse time generative process as Continuous Time Markov Chains (CTMCs). The model can be efficiently trained using a continuous time version of the ELBO. We simulate the high dimensional CTMC using techniques developed in chemical physics and exploit our continuous time framework to derive high performance samplers that we show can outperform discrete time methods for discrete data. The continuous time treatment also enables us to derive a novel theoretical result bounding the error between the generated sample distribution and the true data distribution.
△ Less
Submitted 14 October, 2022; v1 submitted 30 May, 2022;
originally announced May 2022.
-
Conditional Simulation Using Diffusion Schrödinger Bridges
Authors:
Yuyang Shi,
Valentin De Bortoli,
George Deligiannidis,
Arnaud Doucet
Abstract:
Denoising diffusion models have recently emerged as a powerful class of generative models. They provide state-of-the-art results, not only for unconditional simulation, but also when used to solve conditional simulation problems arising in a wide range of inverse problems. A limitation of these models is that they are computationally intensive at generation time as they require simulating a diffus…
▽ More
Denoising diffusion models have recently emerged as a powerful class of generative models. They provide state-of-the-art results, not only for unconditional simulation, but also when used to solve conditional simulation problems arising in a wide range of inverse problems. A limitation of these models is that they are computationally intensive at generation time as they require simulating a diffusion process over a long time horizon. When performing unconditional simulation, a Schrödinger bridge formulation of generative modeling leads to a theoretically grounded algorithm shortening generation time which is complementary to other proposed acceleration techniques. We extend the Schrödinger bridge framework to conditional simulation. We demonstrate this novel methodology on various applications including image super-resolution, optimal filtering for state-space models and the refinement of pre-trained networks. Our code can be found at https://github.com/vdeborto/cdsb.
△ Less
Submitted 26 June, 2022; v1 submitted 27 February, 2022;
originally announced February 2022.
-
Riemannian Score-Based Generative Modelling
Authors:
Valentin De Bortoli,
Emile Mathieu,
Michael Hutchinson,
James Thornton,
Yee Whye Teh,
Arnaud Doucet
Abstract:
Score-based generative models (SGMs) are a powerful class of generative models that exhibit remarkable empirical performance. Score-based generative modelling (SGM) consists of a ``noising'' stage, whereby a diffusion is used to gradually add Gaussian noise to data, and a generative model, which entails a ``denoising'' process defined by approximating the time-reversal of the diffusion. Existing S…
▽ More
Score-based generative models (SGMs) are a powerful class of generative models that exhibit remarkable empirical performance. Score-based generative modelling (SGM) consists of a ``noising'' stage, whereby a diffusion is used to gradually add Gaussian noise to data, and a generative model, which entails a ``denoising'' process defined by approximating the time-reversal of the diffusion. Existing SGMs assume that data is supported on a Euclidean space, i.e. a manifold with flat geometry. In many domains such as robotics, geoscience or protein modelling, data is often naturally described by distributions living on Riemannian manifolds and current SGM techniques are not appropriate. We introduce here Riemannian Score-based Generative Models (RSGMs), a class of generative models extending SGMs to Riemannian manifolds. We demonstrate our approach on a variety of manifolds, and in particular with earth and climate science spherical data.
△ Less
Submitted 22 November, 2022; v1 submitted 6 February, 2022;
originally announced February 2022.
-
On Maximum-a-Posteriori estimation with Plug & Play priors and stochastic gradient descent
Authors:
Rémi Laumont,
Valentin de Bortoli,
Andrés Almansa,
Julie Delon,
Alain Durmus,
Marcelo Pereyra
Abstract:
Bayesian methods to solve imaging inverse problems usually combine an explicit data likelihood function with a prior distribution that explicitly models expected properties of the solution. Many kinds of priors have been explored in the literature, from simple ones expressing local properties to more involved ones exploiting image redundancy at a non-local scale. In a departure from explicit model…
▽ More
Bayesian methods to solve imaging inverse problems usually combine an explicit data likelihood function with a prior distribution that explicitly models expected properties of the solution. Many kinds of priors have been explored in the literature, from simple ones expressing local properties to more involved ones exploiting image redundancy at a non-local scale. In a departure from explicit modelling, several recent works have proposed and studied the use of implicit priors defined by an image denoising algorithm. This approach, commonly known as Plug & Play (PnP) regularisation, can deliver remarkably accurate results, particularly when combined with state-of-the-art denoisers based on convolutional neural networks. However, the theoretical analysis of PnP Bayesian models and algorithms is difficult and works on the topic often rely on unrealistic assumptions on the properties of the image denoiser. This papers studies maximum-a-posteriori (MAP) estimation for Bayesian models with PnP priors. We first consider questions related to existence, stability and well-posedness, and then present a convergence proof for MAP computation by PnP stochastic gradient descent (PnP-SGD) under realistic assumptions on the denoiser used. We report a range of imaging experiments demonstrating PnP-SGD as well as comparisons with other PnP schemes.
△ Less
Submitted 16 January, 2022;
originally announced January 2022.
-
Simulating Diffusion Bridges with Score Matching
Authors:
Jeremy Heng,
Valentin De Bortoli,
Arnaud Doucet,
James Thornton
Abstract:
We consider the problem of simulating diffusion bridges, which are diffusion processes that are conditioned to initialize and terminate at two given states. The simulation of diffusion bridges has applications in diverse scientific fields and plays a crucial role in the statistical inference of discretely-observed diffusions. This is known to be a challenging problem that has received much attenti…
▽ More
We consider the problem of simulating diffusion bridges, which are diffusion processes that are conditioned to initialize and terminate at two given states. The simulation of diffusion bridges has applications in diverse scientific fields and plays a crucial role in the statistical inference of discretely-observed diffusions. This is known to be a challenging problem that has received much attention in the last two decades. This article contributes to this rich body of literature by presenting a new avenue to obtain diffusion bridge approximations. Our approach is based on a backward time representation of a diffusion bridge, which may be simulated if one can time-reverse the unconditioned diffusion. We introduce a variational formulation to learn this time-reversal with function approximation and rely on a score matching method to circumvent intractability. Another iteration of our proposed methodology approximates the Doob's $h$-transform defining the forward time representation of a diffusion bridge. We discuss algorithmic considerations and extensions, and present numerical results on an Ornstein--Uhlenbeck process, a model from financial econometrics for interest rates, and a model from genetics for cell differentiation and development to illustrate the effectiveness of our approach.
△ Less
Submitted 29 October, 2022; v1 submitted 14 November, 2021;
originally announced November 2021.
-
On quantitative Laplace-type convergence results for some exponential probability measures, with two applications
Authors:
Valentin De Bortoli,
Agnès Desolneux
Abstract:
Laplace-type results characterize the limit of sequence of measures $(π_\varepsilon)_{\varepsilon >0}$ with density w.r.t the Lebesgue measure $(\mathrm{d} π_\varepsilon / \mathrm{d} \mathrm{Leb})(x) \propto \exp[-U(x)/\varepsilon]$ when the temperature $\varepsilon>0$ converges to $0$. If a limiting distribution $π_0$ exists, it concentrates on the minimizers of the potential $U$. Classical resul…
▽ More
Laplace-type results characterize the limit of sequence of measures $(π_\varepsilon)_{\varepsilon >0}$ with density w.r.t the Lebesgue measure $(\mathrm{d} π_\varepsilon / \mathrm{d} \mathrm{Leb})(x) \propto \exp[-U(x)/\varepsilon]$ when the temperature $\varepsilon>0$ converges to $0$. If a limiting distribution $π_0$ exists, it concentrates on the minimizers of the potential $U$. Classical results require the invertibility of the Hessian of $U$ in order to establish such asymptotics. In this work, we study the particular case of norm-like potentials $U$ and establish quantitative bounds between $π_\varepsilon$ and $π_0$ w.r.t. the Wasserstein distance of order $1$ under an invertibility condition of a generalized Jacobian. One key element of our proof is the use of geometric measure theory tools such as the coarea formula. We apply our results to the study of maximum entropy models (microcanonical/macrocanonical distributions) and to the convergence of the iterates of the Stochastic Gradient Langevin Dynamics (SGLD) algorithm at low temperatures for non-convex minimization.
△ Less
Submitted 25 October, 2021;
originally announced October 2021.
-
Quantitative Uniform Stability of the Iterative Proportional Fitting Procedure
Authors:
George Deligiannidis,
Valentin De Bortoli,
Arnaud Doucet
Abstract:
We establish the uniform in time stability, w.r.t. the marginals, of the Iterative Proportional Fitting Procedure, also known as Sinkhorn algorithm, used to solve entropy-regularised Optimal Transport problems. Our result is quantitative and stated in terms of the 1-Wasserstein metric. As a corollary we establish a quantitative stability result for Schrödinger bridges.
We establish the uniform in time stability, w.r.t. the marginals, of the Iterative Proportional Fitting Procedure, also known as Sinkhorn algorithm, used to solve entropy-regularised Optimal Transport problems. Our result is quantitative and stated in terms of the 1-Wasserstein metric. As a corollary we establish a quantitative stability result for Schrödinger bridges.
△ Less
Submitted 22 October, 2021; v1 submitted 18 August, 2021;
originally announced August 2021.
-
Diffusion Schrödinger Bridge with Applications to Score-Based Generative Modeling
Authors:
Valentin De Bortoli,
James Thornton,
Jeremy Heng,
Arnaud Doucet
Abstract:
Progressively applying Gaussian noise transforms complex data distributions to approximately Gaussian. Reversing this dynamic defines a generative model. When the forward noising process is given by a Stochastic Differential Equation (SDE), Song et al. (2021) demonstrate how the time inhomogeneous drift of the associated reverse-time SDE may be estimated using score-matching. A limitation of this…
▽ More
Progressively applying Gaussian noise transforms complex data distributions to approximately Gaussian. Reversing this dynamic defines a generative model. When the forward noising process is given by a Stochastic Differential Equation (SDE), Song et al. (2021) demonstrate how the time inhomogeneous drift of the associated reverse-time SDE may be estimated using score-matching. A limitation of this approach is that the forward-time SDE must be run for a sufficiently long time for the final distribution to be approximately Gaussian. In contrast, solving the Schrödinger Bridge problem (SB), i.e. an entropy-regularized optimal transport problem on path spaces, yields diffusions which generate samples from the data distribution in finite time. We present Diffusion SB (DSB), an original approximation of the Iterative Proportional Fitting (IPF) procedure to solve the SB problem, and provide theoretical analysis along with generative modeling experiments. The first DSB iteration recovers the methodology proposed by Song et al. (2021), with the flexibility of using shorter time intervals, as subsequent DSB iterations reduce the discrepancy between the final-time marginal of the forward (resp. backward) SDE with respect to the prior (resp. data) distribution. Beyond generative modeling, DSB offers a widely applicable computational optimal transport tool as the continuous state-space analogue of the popular Sinkhorn algorithm (Cuturi, 2013).
△ Less
Submitted 5 April, 2023; v1 submitted 1 June, 2021;
originally announced June 2021.
-
Bayesian imaging using Plug & Play priors: when Langevin meets Tweedie
Authors:
Rémi Laumont,
Valentin de Bortoli,
Andrés Almansa,
Julie Delon,
Alain Durmus,
Marcelo Pereyra
Abstract:
Since the seminal work of Venkatakrishnan et al. in 2013, Plug & Play (PnP) methods have become ubiquitous in Bayesian imaging. These methods derive Minimum Mean Square Error (MMSE) or Maximum A Posteriori (MAP) estimators for inverse problems in imaging by combining an explicit likelihood function with a prior that is implicitly defined by an image denoising algorithm. The PnP algorithms proposed…
▽ More
Since the seminal work of Venkatakrishnan et al. in 2013, Plug & Play (PnP) methods have become ubiquitous in Bayesian imaging. These methods derive Minimum Mean Square Error (MMSE) or Maximum A Posteriori (MAP) estimators for inverse problems in imaging by combining an explicit likelihood function with a prior that is implicitly defined by an image denoising algorithm. The PnP algorithms proposed in the literature mainly differ in the iterative schemes they use for optimisation or for sampling. In the case of optimisation schemes, some recent works guarantee the convergence to a fixed point, albeit not necessarily a MAP estimate. In the case of sampling schemes, to the best of our knowledge, there is no known proof of convergence. There also remain important open questions regarding whether the underlying Bayesian models and estimators are well defined, well-posed, and have the basic regularity properties required to support these numerical schemes. To address these limitations, this paper develops theory, methods, and provably convergent algorithms for performing Bayesian inference with PnP priors. We introduce two algorithms: 1) PnP-ULA (Unadjusted Langevin Algorithm) for Monte Carlo sampling and MMSE inference; and 2) PnP-SGD (Stochastic Gradient Descent) for MAP inference. Using recent results on the quantitative convergence of Markov chains, we establish detailed convergence guarantees for these two algorithms under realistic assumptions on the denoising operators used, with special attention to denoisers based on deep neural networks. We also show that these algorithms approximately target a decision-theoretically optimal Bayesian model that is well-posed. The proposed algorithms are demonstrated on several canonical problems such as image deblurring, inpainting, and denoising, where they are used for point estimation as well as for uncertainty visualisation and quantification.
△ Less
Submitted 12 January, 2022; v1 submitted 8 March, 2021;
originally announced March 2021.
-
Maximum likelihood estimation of regularisation parameters in high-dimensional inverse problems: an empirical Bayesian approach. Part II: Theoretical Analysis
Authors:
Valentin De Bortoli,
Alain Durmus,
Ana F. Vidal,
Marcelo Pereyra
Abstract:
This paper presents a detailed theoretical analysis of the three stochastic approximation proximal gradient algorithms proposed in our companion paper [49] to set regularization parameters by marginal maximum likelihood estimation. We prove the convergence of a more general stochastic approximation scheme that includes the three algorithms of [49] as special cases. This includes asymptotic and non…
▽ More
This paper presents a detailed theoretical analysis of the three stochastic approximation proximal gradient algorithms proposed in our companion paper [49] to set regularization parameters by marginal maximum likelihood estimation. We prove the convergence of a more general stochastic approximation scheme that includes the three algorithms of [49] as special cases. This includes asymptotic and non-asymptotic convergence results with natural and easily verifiable conditions, as well as explicit bounds on the convergence rates. Importantly, the theory is also general in that it can be applied to other intractable optimisation problems. A main novelty of the work is that the stochastic gradient estimates of our scheme are constructed from inexact proximal Markov chain Monte Carlo samplers. This allows the use of samplers that scale efficiently to large problems and for which we have precise theoretical guarantees.
△ Less
Submitted 13 August, 2020;
originally announced August 2020.
-
Quantitative Propagation of Chaos for SGD in Wide Neural Networks
Authors:
Valentin De Bortoli,
Alain Durmus,
Xavier Fontaine,
Umut Simsekli
Abstract:
In this paper, we investigate the limiting behavior of a continuous-time counterpart of the Stochastic Gradient Descent (SGD) algorithm applied to two-layer overparameterized neural networks, as the number or neurons (ie, the size of the hidden layer) $N \to +\infty$. Following a probabilistic approach, we show 'propagation of chaos' for the particle system defined by this continuous-time dynamics…
▽ More
In this paper, we investigate the limiting behavior of a continuous-time counterpart of the Stochastic Gradient Descent (SGD) algorithm applied to two-layer overparameterized neural networks, as the number or neurons (ie, the size of the hidden layer) $N \to +\infty$. Following a probabilistic approach, we show 'propagation of chaos' for the particle system defined by this continuous-time dynamics under different scenarios, indicating that the statistical interaction between the particles asymptotically vanishes. In particular, we establish quantitative convergence with respect to $N$ of any particle to a solution of a mean-field McKean-Vlasov equation in the metric space endowed with the Wasserstein distance. In comparison to previous works on the subject, we consider settings in which the sequence of stepsizes in SGD can potentially depend on the number of neurons and the iterations. We then identify two regimes under which different mean-field limits are obtained, one of them corresponding to an implicitly regularized version of the minimization problem at hand. We perform various experiments on real datasets to validate our theoretical results, assessing the existence of these two regimes on classification problems and illustrating our convergence results.
△ Less
Submitted 14 July, 2020; v1 submitted 13 July, 2020;
originally announced July 2020.
-
Convergence rates and approximation results for SGD and its continuous-time counterpart
Authors:
Xavier Fontaine,
Valentin De Bortoli,
Alain Durmus
Abstract:
This paper proposes a thorough theoretical analysis of Stochastic Gradient Descent (SGD) with non-increasing step sizes. First, we show that the recursion defining SGD can be provably approximated by solutions of a time inhomogeneous Stochastic Differential Equation (SDE) using an appropriate coupling. In the specific case of a batch noise we refine our results using recent advances in Stein's met…
▽ More
This paper proposes a thorough theoretical analysis of Stochastic Gradient Descent (SGD) with non-increasing step sizes. First, we show that the recursion defining SGD can be provably approximated by solutions of a time inhomogeneous Stochastic Differential Equation (SDE) using an appropriate coupling. In the specific case of a batch noise we refine our results using recent advances in Stein's method. Then, motivated by recent analyses of deterministic and stochastic optimization methods by their continuous counterpart, we study the long-time behavior of the continuous processes at hand and establish non-asymptotic bounds. To that purpose, we develop new comparison techniques which are of independent interest. Adapting these techniques to the discrete setting, we show that the same results hold for the corresponding SGD sequences. In our analysis, we notably improve non-asymptotic bounds in the convex setting for SGD under weaker assumptions than the ones considered in previous works. Finally, we also establish finite-time convergence results under various conditions, including relaxations of the famous Łojasiewicz inequality, which can be applied to a class of non-convex functions.
△ Less
Submitted 29 January, 2021; v1 submitted 8 April, 2020;
originally announced April 2020.
-
Maximum entropy methods for texture synthesis: theory and practice
Authors:
Valentin De Bortoli,
Agnes Desolneux,
Alain Durmus,
Bruno Galerne,
Arthur Leclaire
Abstract:
Recent years have seen the rise of convolutional neural network techniques in exemplar-based image synthesis. These methods often rely on the minimization of some variational formulation on the image space for which the minimizers are assumed to be the solutions of the synthesis problem. In this paper we investigate, both theoretically and experimentally, another framework to deal with this proble…
▽ More
Recent years have seen the rise of convolutional neural network techniques in exemplar-based image synthesis. These methods often rely on the minimization of some variational formulation on the image space for which the minimizers are assumed to be the solutions of the synthesis problem. In this paper we investigate, both theoretically and experimentally, another framework to deal with this problem using an alternate sampling/minimization scheme. First, we use results from information geometry to assess that our method yields a probability measure which has maximum entropy under some constraints in expectation. Then, we turn to the analysis of our method and we show, using recent results from the Markov chain literature, that its error can be explicitly bounded with constants which depend polynomially in the dimension even in the non-convex setting. This includes the case where the constraints are defined via a differentiable neural network. Finally, we present an extensive experimental study of the model, including a comparison with state-of-the-art methods and an extension to style transfer.
△ Less
Submitted 3 December, 2019;
originally announced December 2019.
-
Maximum likelihood estimation of regularisation parameters in high-dimensional inverse problems: an empirical Bayesian approach. Part I: Methodology and Experiments
Authors:
Ana F. Vidal,
Valentin De Bortoli,
Marcelo Pereyra,
Alain Durmus
Abstract:
Many imaging problems require solving an inverse problem that is ill-conditioned or ill-posed. Imaging methods typically address this difficulty by regularising the estimation problem to make it well-posed. This often requires setting the value of the so-called regularisation parameters that control the amount of regularisation enforced. These parameters are notoriously difficult to set a priori,…
▽ More
Many imaging problems require solving an inverse problem that is ill-conditioned or ill-posed. Imaging methods typically address this difficulty by regularising the estimation problem to make it well-posed. This often requires setting the value of the so-called regularisation parameters that control the amount of regularisation enforced. These parameters are notoriously difficult to set a priori, and can have a dramatic impact on the recovered estimates. In this work, we propose a general empirical Bayesian method for setting regularisation parameters in imaging problems that are convex w.r.t. the unknown image. Our method calibrates regularisation parameters directly from the observed data by maximum marginal likelihood estimation, and can simultaneously estimate multiple regularisation parameters. Furthermore, the proposed algorithm uses the same basic operators as proximal optimisation algorithms, namely gradient and proximal operators, and it is therefore straightforward to apply to problems that are currently solved by using proximal optimisation techniques. Our methodology is demonstrated with a range of experiments and comparisons with alternative approaches from the literature. The considered experiments include image denoising, non-blind image deconvolution, and hyperspectral unmixing, using synthesis and analysis priors involving the L1, total-variation, total-variation and L1, and total-generalised-variation pseudo-norms. A detailed theoretical analysis of the proposed method is presented in the companion paper arXiv:2008.05793.
△ Less
Submitted 14 August, 2020; v1 submitted 26 November, 2019;
originally announced November 2019.
-
Approximate Bayesian Computation with the Sliced-Wasserstein Distance
Authors:
Kimia Nadjahi,
Valentin De Bortoli,
Alain Durmus,
Roland Badeau,
Umut Şimşekli
Abstract:
Approximate Bayesian Computation (ABC) is a popular method for approximate inference in generative models with intractable but easy-to-sample likelihood. It constructs an approximate posterior distribution by finding parameters for which the simulated data are close to the observations in terms of summary statistics. These statistics are defined beforehand and might induce a loss of information, w…
▽ More
Approximate Bayesian Computation (ABC) is a popular method for approximate inference in generative models with intractable but easy-to-sample likelihood. It constructs an approximate posterior distribution by finding parameters for which the simulated data are close to the observations in terms of summary statistics. These statistics are defined beforehand and might induce a loss of information, which has been shown to deteriorate the quality of the approximation. To overcome this problem, Wasserstein-ABC has been recently proposed, and compares the datasets via the Wasserstein distance between their empirical distributions, but does not scale well to the dimension or the number of samples. We propose a new ABC technique, called Sliced-Wasserstein ABC and based on the Sliced-Wasserstein distance, which has better computational and statistical properties. We derive two theoretical results showing the asymptotical consistency of our approach, and we illustrate its advantages on synthetic data and an image denoising task.
△ Less
Submitted 6 March, 2020; v1 submitted 28 October, 2019;
originally announced October 2019.
-
Efficient stochastic optimisation by unadjusted Langevin Monte Carlo. Application to maximum marginal likelihood and empirical Bayesian estimation
Authors:
Valentin De Bortoli,
Alain Durmus,
Marcelo Pereyra,
Ana F. Vidal
Abstract:
Stochastic approximation methods play a central role in maximum likelihood estimation problems involving intractable likelihood functions, such as marginal likelihoods arising in problems with missing or incomplete data, and in parametric empirical Bayesian estimation. Combined with Markov chain Monte Carlo algorithms, these stochastic optimisation methods have been successfully applied to a wide…
▽ More
Stochastic approximation methods play a central role in maximum likelihood estimation problems involving intractable likelihood functions, such as marginal likelihoods arising in problems with missing or incomplete data, and in parametric empirical Bayesian estimation. Combined with Markov chain Monte Carlo algorithms, these stochastic optimisation methods have been successfully applied to a wide range of problems in science and industry. However, this strategy scales poorly to large problems because of methodological and theoretical difficulties related to using high-dimensional Markov chain Monte Carlo algorithms within a stochastic approximation scheme. This paper proposes to address these difficulties by using unadjusted Langevin algorithms to construct the stochastic approximation. This leads to a highly efficient stochastic optimisation methodology with favourable convergence properties that can be quantified explicitly and easily checked. The proposed methodology is demonstrated with three experiments, including a challenging application to high-dimensional statistical audio analysis and a sparse Bayesian logistic regression with random effects problem.
△ Less
Submitted 30 May, 2020; v1 submitted 28 June, 2019;
originally announced June 2019.
-
Convergence of diffusions and their discretizations: from continuous to discrete processes and back
Authors:
Valentin De Bortoli,
Alain Durmus
Abstract:
In this paper, we establish new quantitative convergence bounds for a class of functional autoregressive models in weighted total variation metrics. To derive our results, we show that under mild assumptions, explicit minorization and Foster-Lyapunov drift conditions hold. The main applications and consequences of the bounds we obtain concern the geometric convergence of Euler-Maruyama discretizat…
▽ More
In this paper, we establish new quantitative convergence bounds for a class of functional autoregressive models in weighted total variation metrics. To derive our results, we show that under mild assumptions, explicit minorization and Foster-Lyapunov drift conditions hold. The main applications and consequences of the bounds we obtain concern the geometric convergence of Euler-Maruyama discretizations of diffusions with identity covariance matrix. Second, as a corollary, we provide a new approach to establish quantitative convergence of these diffusion processes by applying our conclusions in the discrete-time setting to a well-suited sequence of discretizations whose associated stepsizes decrease towards zero.
△ Less
Submitted 1 May, 2020; v1 submitted 22 April, 2019;
originally announced April 2019.