-
Accelerated Bayesian imaging by relaxed proximal-point Langevin sampling
Authors:
Teresa Klatzer,
Paul Dobson,
Yoann Altmann,
Marcelo Pereyra,
Jesús María Sanz-Serna,
Konstantinos C. Zygalakis
Abstract:
This paper presents a new accelerated proximal Markov chain Monte Carlo methodology to perform Bayesian inference in imaging inverse problems with an underlying convex geometry. The proposed strategy takes the form of a stochastic relaxed proximal-point iteration that admits two complementary interpretations. For models that are smooth or regularised by Moreau-Yosida smoothing, the algorithm is equivalent to an implicit midpoint discretisation of an overdamped Langevin diffusion targeting the posterior distribution of interest. This discretisation is asymptotically unbiased for Gaussian targets and is shown to converge in an accelerated manner for any target that is $κ$-strongly log-concave (i.e., requiring in the order of $\sqrt{κ}$ iterations to converge, similarly to accelerated optimisation schemes), comparing favorably to [M. Pereyra, L. Vargas Mieles, K.C. Zygalakis, SIAM J. Imaging Sciences, 13(2) (2020), pp. 905-935], which is only provably accelerated for Gaussian targets and is biased. For models that are not smooth, the algorithm is equivalent to a Leimkuhler-Matthews discretisation of a Langevin diffusion targeting a Moreau-Yosida approximation of the posterior distribution of interest, and hence achieves a significantly lower bias than conventional unadjusted Langevin strategies based on the Euler-Maruyama discretisation. For targets that are $κ$-strongly log-concave, the provided non-asymptotic convergence analysis also identifies the optimal time step that maximizes the convergence speed. The proposed methodology is demonstrated through a range of experiments related to image deconvolution with Gaussian and Poisson noise, with assumption-driven and data-driven convex priors. Source code for the numerical experiments of this paper is available from https://github.com/MI2G/accelerated-langevin-imla.
Submitted 12 January, 2024; v1 submitted 18 August, 2023;
originally announced August 2023.
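The claim that an implicit midpoint discretisation is asymptotically unbiased for Gaussian targets can be checked by hand in one dimension. The sketch below is not taken from the paper or its repository: it assumes the scalar target N(0, 1/a), for which the overdamped Langevin drift is -a x, and writes each scheme as a linear recursion x_{n+1} = c x_n + s ξ_n whose stationary variance is s²/(1 - c²). All function names are hypothetical.

```python
def stationary_variance(c, s):
    """Stationary variance of x_{n+1} = c * x_n + s * xi_n with xi_n ~ N(0, 1)."""
    if abs(c) >= 1.0:
        raise ValueError("the recursion is not contractive: need |c| < 1")
    return s ** 2 / (1.0 - c ** 2)


def euler_maruyama_variance(a, h):
    """Euler-Maruyama: x_{n+1} = x_n - h*a*x_n + sqrt(2h)*xi_n."""
    return stationary_variance(1.0 - a * h, (2.0 * h) ** 0.5)


def implicit_midpoint_variance(a, h):
    """Implicit midpoint: x_{n+1} = x_n - h*a*(x_n + x_{n+1})/2 + sqrt(2h)*xi_n.

    For a linear drift the implicit equation can be solved in closed form.
    """
    denom = 1.0 + a * h / 2.0
    return stationary_variance((1.0 - a * h / 2.0) / denom, (2.0 * h) ** 0.5 / denom)


a, h = 4.0, 0.1                      # illustrative values; target variance is 1/a
target_var = 1.0 / a
em_var = euler_maruyama_variance(a, h)
im_var = implicit_midpoint_variance(a, h)
```

Under these assumptions the implicit midpoint recursion reproduces the target variance 1/a for any admissible step size h, whereas Euler-Maruyama inflates it to 1/(a(1 - ah/2)).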
-
Adaptive multi-stage integration schemes for Hamiltonian Monte Carlo
Authors:
Lorenzo Nagar,
Mario Fernández-Pendás,
Jesús María Sanz-Serna,
Elena Akhmatskaya
Abstract:
Hamiltonian Monte Carlo (HMC) is a powerful tool for Bayesian statistical inference due to its potential to rapidly explore high-dimensional state spaces, avoiding the random walk behavior typical of many Markov Chain Monte Carlo samplers. The proper choice of the integrator of the Hamiltonian dynamics is key to the efficiency of HMC. It is becoming increasingly clear that multi-stage splitting integrators are a good alternative to the Verlet method, traditionally used in HMC. Here we propose a principled way of finding optimal, problem-specific integration schemes (in terms of the best conservation of energy for harmonic forces/Gaussian targets) within the families of 2- and 3-stage splitting integrators. The method, which we call Adaptive Integration Approach for statistics, or s-AIA, uses a multivariate Gaussian model and simulation data obtained at the HMC burn-in stage to identify a system-specific dimensional stability interval and assigns the most appropriate 2- or 3-stage integrator for any user-chosen simulation step size within that interval. s-AIA has been implemented in the in-house software package HaiCS without introducing computational overheads in the simulations. The efficiency of the s-AIA integrators and their impact on the HMC accuracy, sampling performance and convergence are discussed in comparison with known fixed-parameter multi-stage splitting integrators (including Verlet). Numerical experiments on well-known statistical models show that the adaptive schemes reach the best possible performance within the families of 2- and 3-stage splitting schemes.
Submitted 31 January, 2024; v1 submitted 5 July, 2023;
originally announced July 2023.
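To make the phrase "family of 2-stage splitting integrators" concrete, the following sketch writes out the usual one-parameter palindromic kick-drift composition next to velocity Verlet. It is an illustration only, not the s-AIA procedure or HaiCS code: the target is assumed to be a one-dimensional standard Gaussian (harmonic force) with unit mass, and the names `kick`, `drift`, `verlet_step` and `two_stage_step` are hypothetical.

```python
def grad_U(q):
    # Assumed target: standard Gaussian, U(q) = q**2 / 2, so the force is harmonic.
    return q


def kick(q, p, t):
    """Exact flow of the potential part over time t (momentum update)."""
    return q, p - t * grad_U(q)


def drift(q, p, t):
    """Exact flow of the kinetic part over time t (position update, unit mass)."""
    return q + t * p, p


def verlet_step(q, p, h):
    """Velocity Verlet: kick(h/2), drift(h), kick(h/2)."""
    q, p = kick(q, p, h / 2)
    q, p = drift(q, p, h)
    return kick(q, p, h / 2)


def two_stage_step(q, p, h, b):
    """Palindromic 2-stage splitting step with free parameter b."""
    q, p = kick(q, p, b * h)
    q, p = drift(q, p, h / 2)
    q, p = kick(q, p, (1 - 2 * b) * h)
    q, p = drift(q, p, h / 2)
    return kick(q, p, b * h)


def energy(q, p):
    """Hamiltonian H(q, p) = U(q) + p**2 / 2 for the assumed target."""
    return 0.5 * q * q + 0.5 * p * p
```

With b = 1/4 one step of length h coincides with two Verlet steps of length h/2, so the family contains Verlet as a special case. Other values of b trade off stability and energy conservation, which is the degree of freedom an adaptive scheme can tune. Because the composition is palindromic, every member is reversible.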
-
On the connections between optimization algorithms, Lyapunov functions, and differential equations: theory and insights
Authors:
Paul Dobson,
Jesus Maria Sanz-Serna,
Konstantinos Zygalakis
Abstract:
We revisit the general framework introduced by Fazlyab et al. (SIAM J. Optim. 28, 2018) to construct Lyapunov functions for optimization algorithms in discrete and continuous time. For smooth, strongly convex objective functions, we relax the requirements necessary for such a construction. As a result, we are able to prove rates of convergence that improve on those available in the literature, both for Polyak's ordinary differential equation and for a two-parameter family of Nesterov algorithms. We analyse the interpretation of Nesterov algorithms as discretizations of the Polyak equation. We show that the algorithms are instances of Additive Runge-Kutta integrators and discuss the reasons why most discretizations of the differential equation do not result in optimization algorithms with acceleration. We also introduce a modification of Polyak's equation and study its convergence properties. Finally, we extend the general framework to the stochastic scenario and consider an application to random algorithms with acceleration for overparameterized models; again we are able to prove convergence rates that improve on those in the literature.
Submitted 20 May, 2024; v1 submitted 15 May, 2023;
originally announced May 2023.
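As a reminder of what "acceleration" means for smooth, strongly convex objectives, the sketch below compares plain gradient descent with a textbook constant-momentum Nesterov iteration on a quadratic. It is a generic illustration and not the two-parameter family analysed in the paper: the quadratic with Hessian eigenvalues 1 and 100 (condition number κ = 100), the step 1/L and the momentum (√κ - 1)/(√κ + 1) are all assumptions chosen for the example.

```python
def gradient_descent(grad, x0, step, iters):
    """Plain gradient descent: x <- x - step * grad(x)."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def nesterov(grad, x0, step, beta, iters):
    """Constant-momentum Nesterov: extrapolate to y, then take a gradient step from y."""
    x_prev = list(x0)
    x = list(x0)
    for _ in range(iters):
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        g = grad(y)
        x_prev, x = x, [yi - step * gi for yi, gi in zip(y, g)]
    return x


EIGS = [1.0, 100.0]  # assumed Hessian eigenvalues of f(x) = 0.5 * sum(l_i * x_i**2)


def quad_grad(x):
    return [l * xi for l, xi in zip(EIGS, x)]


def norm(x):
    return sum(xi * xi for xi in x) ** 0.5


m, L = min(EIGS), max(EIGS)
kappa = L / m
beta = (kappa ** 0.5 - 1) / (kappa ** 0.5 + 1)

x_gd = gradient_descent(quad_grad, [1.0, 1.0], 1 / L, 100)
x_nag = nesterov(quad_grad, [1.0, 1.0], 1 / L, beta, 100)
```

On this example the slow mode contracts by a factor 1 - 1/κ per gradient-descent iteration but by roughly 1 - 1/√κ per Nesterov iteration, so after 100 iterations the distance to the minimiser differs by about three orders of magnitude.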
-
Split Hamiltonian Monte Carlo revisited
Authors:
Fernando Casas,
Jesús María Sanz-Serna,
Luke Shaw
Abstract:
We study Hamiltonian Monte Carlo (HMC) samplers based on splitting the Hamiltonian $H$ as $H_0(θ,p)+U_1(θ)$, where $H_0$ is quadratic and $U_1$ is small. We show that, in general, such samplers suffer from stepsize stability restrictions similar to those of algorithms based on the standard leapfrog integrator. The restrictions may be circumvented by preconditioning the dynamics. Numerical experiments show that, when the $H_0(θ,p)+U_1(θ)$ splitting is combined with preconditioning, it is possible to construct samplers far more efficient than standard leapfrog HMC.
Submitted 15 July, 2022;
originally announced July 2022.
-
Wasserstein distance estimates for the distributions of numerical approximations to ergodic stochastic differential equations
Authors:
J. M. Sanz-Serna,
Konstantinos C. Zygalakis
Abstract:
We present a framework that allows for the non-asymptotic study of the $2$-Wasserstein distance between the invariant distribution of an ergodic stochastic differential equation and the distribution of its numerical approximation in the strongly log-concave case. This allows us to study in a unified way a number of different integrators proposed in the literature for the overdamped and underdamped Langevin dynamics. In addition, we analyse a novel splitting method for the underdamped Langevin dynamics which only requires one gradient evaluation per time step. Under an additional smoothness assumption on a $d$-dimensional strongly log-concave distribution with condition number $κ$, the algorithm is shown to produce, with $\mathcal{O}\big(κ^{5/4} d^{1/4}ε^{-1/2} \big)$ complexity, samples from a distribution that is at most $ε>0$ away from the target distribution in Wasserstein distance.
Submitted 24 September, 2021; v1 submitted 26 April, 2021;
originally announced April 2021.
-
Is the NUTS algorithm correct?
Authors:
J. M. Sanz-Serna
Abstract:
This paper investigates whether the popular No-U-Turn (NUTS) sampling algorithm is correct, i.e. whether the target probability distribution is exactly conserved by the algorithm. It turns out that one of the Gibbs substeps used in the algorithm cannot always be guaranteed to be correct.
Submitted 7 May, 2020; v1 submitted 4 May, 2020;
originally announced May 2020.
-
HMC: avoiding rejections by not using leapfrog and some results on the acceptance rate
Authors:
M. P. Calvo,
D. Sanz-Alonso,
J. M. Sanz-Serna
Abstract:
The leapfrog integrator is routinely used within the Hamiltonian Monte Carlo method and its variants. We give strong numerical evidence that alternative, easy-to-implement algorithms yield fewer rejections for a given computational effort. When the dimensionality of the target distribution is high, the number of accepted proposals may be multiplied by a factor of three or more. This increase in the number of accepted proposals is not achieved by impairing any positive features of the sampling. We also establish new non-asymptotic and asymptotic results on the monotonic relationship between the expected acceptance rate and the expected energy error. These results further validate the derivation of one of the integrators we consider and are of independent interest.
Submitted 2 April, 2021; v1 submitted 6 December, 2019;
originally announced December 2019.
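The monotonic link between expected acceptance rate and expected energy error can be illustrated with a simple Gaussian model. The sketch below is an assumption-laden example and does not reproduce the paper's results: it supposes the energy error ΔH of a proposal is distributed as N(μ, 2μ) with μ ≥ 0 (the variance 2μ is what makes E[exp(-ΔH)] = 1), in which case E[min(1, exp(-ΔH))] = 2Φ(-√(μ/2)), with Φ the standard normal distribution function. The function name is hypothetical.

```python
import math


def expected_acceptance(mu):
    """Expected Metropolis acceptance when the energy error is N(mu, 2*mu), mu >= 0.

    Uses 2 * Phi(-sqrt(mu / 2)) = erfc(sqrt(mu) / 2), with Phi the standard
    normal distribution function.
    """
    if mu < 0:
        raise ValueError("the mean energy error must be non-negative in this model")
    return math.erfc(math.sqrt(mu) / 2.0)


# Acceptance rates for a few illustrative mean energy errors.
rates = [expected_acceptance(mu) for mu in (0.0, 0.1, 0.5, 1.0, 2.0)]
```

In this model the expected acceptance rate is a strictly decreasing function of the mean energy error, equal to 1 when the integrator conserves energy exactly.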
-
Geometric integrators and the Hamiltonian Monte Carlo method
Authors:
Nawaf Bou-Rabee,
Jesús María Sanz-Serna
Abstract:
This paper surveys in detail the relations between numerical integration and the Hamiltonian (or hybrid) Monte Carlo method (HMC). Since the computational cost of HMC mainly lies in the numerical integrations, these should be performed as efficiently as possible. However, HMC requires methods that have the geometric properties of being volume-preserving and reversible, and this limits the number of integrators that may be used. On the other hand, these geometric properties have important quantitative implications on the integration error, which in turn have an impact on the acceptance rate of the proposal. While at present the velocity Verlet algorithm is the method of choice for good reasons, we argue that Verlet can be improved upon. We also discuss in detail the behavior of HMC as the dimensionality of the target distribution increases.
Submitted 14 November, 2017;
originally announced November 2017.