Accelerated Bayesian imaging by relaxed proximal-point Langevin sampling

Authors: Teresa Klatzer, Paul Dobson, Yoann Altmann, Marcelo Pereyra, Jesús María Sanz-Serna, Konstantinos C. Zygalakis

Abstract: This paper presents a new accelerated proximal Markov chain Monte Carlo methodology to perform Bayesian inference in imaging inverse problems with an underlying convex geometry. The proposed strategy takes the form of a stochastic relaxed proximal-point iteration that admits two complementary interpretations. For models that are smooth or regularised by Moreau-Yosida smoothing, the algorithm is equivalent to an implicit midpoint discretisation of an overdamped Langevin diffusion targeting the posterior distribution of interest. This discretisation is asymptotically unbiased for Gaussian targets and shown to converge in an accelerated manner for any target that is $κ$-strongly log-concave (i.e., requiring in the order of $\sqrt{κ}$ iterations to converge, similarly to accelerated optimisation schemes), comparing favorably to [M. Pereyra, L. Vargas Mieles, K.C. Zygalakis, SIAM J. Imaging Sciences, 13,2 (2020), pp. 905-935] which is only provably accelerated for Gaussian targets and has bias. For models that are not smooth, the algorithm is equivalent to a Leimkuhler-Matthews discretisation of a Langevin diffusion targeting a Moreau-Yosida approximation of the posterior distribution of interest, and hence achieves a significantly lower bias than conventional unadjusted Langevin strategies based on the Euler-Maruyama discretisation. For targets that are $κ$-strongly log-concave, the provided non-asymptotic convergence analysis also identifies the optimal time step which maximizes the convergence speed.
The proposed methodology is demonstrated through a range of experiments related to image deconvolution with Gaussian and Poisson noise, with assumption-driven and data-driven convex priors. Source codes for the numerical experiments of this paper are available from https://github.com/MI2G/accelerated-langevin-imla.
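The claim that the implicit midpoint discretisation is asymptotically unbiased for Gaussian targets can be checked by hand in one dimension. The sketch below is only an illustration: it assumes the scheme has the standard form x_{n+1} = x_n - h U'((x_n + x_{n+1})/2) + sqrt(2h) ξ_n with U(x) = x²/(2σ²), and the function names and parameter values are hypothetical rather than taken from the paper or its repository. For this linear case both the implicit midpoint and the Euler-Maruyama schemes reduce to a recursion x_{n+1} = a x_n + b ξ_n, whose stationary variance is b²/(1 - a²).

```python
import math


def stationary_variance(a: float, b: float) -> float:
    """Stationary variance of x_{n+1} = a*x_n + b*xi_n with xi_n ~ N(0, 1)."""
    if abs(a) >= 1.0:
        raise ValueError("recursion is unstable: need |a| < 1")
    return b * b / (1.0 - a * a)


def implicit_midpoint_variance(h: float, sigma2: float) -> float:
    """Stationary variance of the (assumed) implicit midpoint Langevin scheme.

    Solving x' = x - (h / sigma2) * (x + x') / 2 + sqrt(2h) * xi for x'
    gives x' = a*x + b*xi with the coefficients below.
    """
    c = h / (2.0 * sigma2)
    a = (1.0 - c) / (1.0 + c)
    b = math.sqrt(2.0 * h) / (1.0 + c)
    return stationary_variance(a, b)


def euler_maruyama_variance(h: float, sigma2: float) -> float:
    """Stationary variance of x' = x - (h / sigma2) * x + sqrt(2h) * xi."""
    a = 1.0 - h / sigma2
    b = math.sqrt(2.0 * h)
    return stationary_variance(a, b)


if __name__ == "__main__":
    h, sigma2 = 0.5, 2.0  # illustrative values only
    print("target variance:   ", sigma2)
    print("implicit midpoint: ", implicit_midpoint_variance(h, sigma2))
    print("Euler-Maruyama:    ", euler_maruyama_variance(h, sigma2))
```

Under these assumptions the implicit midpoint recursion reproduces the target variance σ² for any step size h > 0, whereas Euler-Maruyama gives σ²/(1 - h/(2σ²)), which overestimates the variance and is only stable for h < 2σ².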

Submitted 12 January, 2024; v1 submitted 18 August, 2023; originally announced August 2023.

Comments: 34 pages, 13 figures

MSC Class: 65C40; 68U10; 62F15; 65C60; 65J22; 68W25

arXiv:2307.02096 [pdf, other]

doi 10.1016/j.jcp.2024.112800

Adaptive multi-stage integration schemes for Hamiltonian Monte Carlo

Authors: Lorenzo Nagar, Mario Fernández-Pendás, Jesús María Sanz-Serna, Elena Akhmatskaya

Abstract: Hamiltonian Monte Carlo (HMC) is a powerful tool for Bayesian statistical inference due to its potential to rapidly explore high dimensional state space, avoiding the random walk behavior typical of many Markov Chain Monte Carlo samplers. The proper choice of the integrator of the Hamiltonian dynamics is key to the efficiency of HMC. It is becoming increasingly clear that multi-stage splitting integrators are a good alternative to the Verlet method, traditionally used in HMC. Here we propose a principled way of finding optimal, problem-specific integration schemes (in terms of the best conservation of energy for harmonic forces/Gaussian targets) within the families of 2- and 3-stage splitting integrators. The method, which we call Adaptive Integration Approach for statistics, or s-AIA, uses a multivariate Gaussian model and simulation data obtained at the HMC burn-in stage to identify a system-specific dimensional stability interval and assigns the most appropriate 2-/3-stage integrator for any user-chosen simulation step size within that interval. s-AIA has been implemented in the in-house software package HaiCS without introducing computational overheads in the simulations. The efficiency of the s-AIA integrators and their impact on the HMC accuracy, sampling performance and convergence are discussed in comparison with known fixed-parameter multi-stage splitting integrators (including Verlet).
Numerical experiments on well-known statistical models show that the adaptive schemes reach the best possible performance within the family of 2-, 3-stage splitting schemes.
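A one-parameter family of 2-stage palindromic splitting integrators, of the general kind this abstract refers to, can be sketched as follows. This is an illustration under stated assumptions, not the s-AIA procedure or HaiCS code: the kick-drift-kick ordering, unit mass, the parameter name b, and the function names are all hypothetical choices made here. With b = 1/4 one step of size h coincides with two velocity Verlet steps of size h/2, which is why Verlet is usually regarded as a member of the family.

```python
from typing import Callable, Tuple

State = Tuple[float, float]


def two_stage_step(q: float, p: float, h: float, b: float,
                   grad_U: Callable[[float], float]) -> State:
    """One step of a palindromic 2-stage splitting integrator (unit mass).

    Sequence: kick(b*h), drift(h/2), kick((1-2b)*h), drift(h/2), kick(b*h).
    Each step uses two new gradient evaluations when steps are chained,
    since the last kick of one step and the first of the next share a gradient.
    """
    p -= b * h * grad_U(q)
    q += 0.5 * h * p
    p -= (1.0 - 2.0 * b) * h * grad_U(q)
    q += 0.5 * h * p
    p -= b * h * grad_U(q)
    return q, p


def verlet_step(q: float, p: float, h: float,
                grad_U: Callable[[float], float]) -> State:
    """One velocity Verlet step: kick(h/2), drift(h), kick(h/2)."""
    p -= 0.5 * h * grad_U(q)
    q += h * p
    p -= 0.5 * h * grad_U(q)
    return q, p


if __name__ == "__main__":
    grad = lambda q: q ** 3 + q  # gradient of U(q) = q^4/4 + q^2/2 (illustrative)
    h = 0.3
    print(two_stage_step(1.0, 0.5, h, 0.25, grad))
    print(verlet_step(*verlet_step(1.0, 0.5, h / 2, grad), h / 2, grad))
```

Other values of b give integrators with the same cost per step but different energy errors and stability intervals; choosing b for a given step size is the kind of decision the abstract describes s-AIA as automating.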

Submitted 31 January, 2024; v1 submitted 5 July, 2023; originally announced July 2023.

Journal ref: Journal of Computational Physics, Volume 502, 1 April 2024, 112800

arXiv:2305.08658 [pdf, other]

On the connections between optimization algorithms, Lyapunov functions, and differential equations: theory and insights

Authors: Paul Dobson, Jesus Maria Sanz-Serna, Konstantinos Zygalakis

Abstract: We revisit the general framework introduced by Fazlyab et al. (SIAM J. Optim. 28, 2018) to construct Lyapunov functions for optimization algorithms in discrete and continuous time. For smooth, strongly convex objective functions, we relax the requirements necessary for such a construction. As a result we are able to prove, for Polyak's ordinary differential equations and for a two-parameter family of Nesterov algorithms, rates of convergence that improve on those available in the literature. We analyse the interpretation of Nesterov algorithms as discretizations of the Polyak equation. We show that the algorithms are instances of Additive Runge-Kutta integrators and discuss the reasons why most discretizations of the differential equation do not result in optimization algorithms with acceleration. We also introduce a modification of Polyak's equation and study its convergence properties. Finally, we extend the general framework to the stochastic scenario and consider an application to random algorithms with acceleration for overparameterized models; again we are able to prove convergence rates that improve on those in the literature.

Submitted 20 May, 2024; v1 submitted 15 May, 2023; originally announced May 2023.

Comments: 23 pages, 5 figures

MSC Class: 65L06; 65L20; 90C25; 93C15

arXiv:2207.07516 [pdf, other]

Split Hamiltonian Monte Carlo revisited

Authors: Fernando Casas, Jesús María Sanz-Serna, Luke Shaw

Abstract: We study Hamiltonian Monte Carlo (HMC) samplers based on splitting the Hamiltonian $H$ as $H_0(θ,p)+U_1(θ)$, where $H_0$ is quadratic and $U_1$ small. We show that, in general, such samplers suffer from stepsize stability restrictions similar to those of algorithms based on the standard leapfrog integrator. The restrictions may be circumvented by preconditioning the dynamics. Numerical experiments show that, when the $H_0(θ,p)+U_1(θ)$ splitting is combined with preconditioning, it is possible to construct samplers far more efficient than standard leapfrog HMC.

Submitted 15 July, 2022; originally announced July 2022.

Comments: 25 pages, 6 figures

arXiv:2104.12384 [pdf, ps, other]

Wasserstein distance estimates for the distributions of numerical approximations to ergodic stochastic differential equations

Authors: J. M. Sanz-Serna, Konstantinos C. Zygalakis

Abstract: We present a framework that allows for the non-asymptotic study of the $2$-Wasserstein distance between the invariant distribution of an ergodic stochastic differential equation and the distribution of its numerical approximation in the strongly log-concave case. This allows us to study in a unified way a number of different integrators proposed in the literature for the overdamped and underdamped Langevin dynamics. In addition, we analyse a novel splitting method for the underdamped Langevin dynamics which only requires one gradient evaluation per time step. Under an additional smoothness assumption on a $d$-dimensional strongly log-concave distribution with condition number $κ$, the algorithm is shown to produce with an $\mathcal{O}\big(κ^{5/4} d^{1/4}ε^{-1/2} \big)$ complexity samples from a distribution that, in Wasserstein distance, is at most $ε>0$ away from the target distribution.

Submitted 24 September, 2021; v1 submitted 26 April, 2021; originally announced April 2021.

Comments: 29 pages, 2 figures

MSC Class: 65C40; 60H10; 60H35

arXiv:2005.01336

Is the NUTS algorithm correct?

Authors: J. M. Sanz-Serna

Abstract: This paper investigates whether the popular No U-turn (NUTS) sampling algorithm is correct, i.e. whether the target probability distribution is exactly conserved by the algorithm. It turns out that one of the Gibbs substeps used in the algorithm cannot always be guaranteed to be correct.

Submitted 7 May, 2020; v1 submitted 4 May, 2020; originally announced May 2020.

Comments: Some statements in the paper are misleading. It is possible to think of NUTS as not being a slice/Gibbs sampler and, with an alternative interpretation, it may be possible to prove that the algorithm is correct. In addition, the experiment reported in Figure 2 should have had many initial states drawn from the target rather than using a single value.

arXiv:1912.03253 [pdf, other]

HMC: avoiding rejections by not using leapfrog and some results on the acceptance rate

Authors: M. P. Calvo, D. Sanz-Alonso, J. M. Sanz-Serna

Abstract: The leapfrog integrator is routinely used within the Hamiltonian Monte Carlo method and its variants. We give strong numerical evidence that alternative, easy to implement algorithms yield fewer rejections with a given computational effort. When the dimensionality of the target distribution is high, the number of accepted proposals may be multiplied by a factor of three or more. This increase in the number of accepted proposals is not achieved by impairing any positive features of the sampling. We also establish new non-asymptotic and asymptotic results on the monotonic relationship between the expected acceptance rate and the expected energy error. These results further validate the derivation of one of the integrators we consider and are of independent interest.

Submitted 2 April, 2021; v1 submitted 6 December, 2019; originally announced December 2019.

Comments: 37 pages, 8 figures

arXiv:1711.05337 [pdf, other]

doi 10.1017/S0962492917000101

Geometric integrators and the Hamiltonian Monte Carlo method

Authors: Nawaf Bou-Rabee, Jesús María Sanz-Serna

Abstract: This paper surveys in detail the relations between numerical integration and the Hamiltonian (or hybrid) Monte Carlo method (HMC). Since the computational cost of HMC mainly lies in the numerical integrations, these should be performed as efficiently as possible. However, HMC requires methods that have the geometric properties of being volume-preserving and reversible, and this limits the number of integrators that may be used. On the other hand, these geometric properties have important quantitative implications on the integration error, which in turn have an impact on the acceptance rate of the proposal. While at present the velocity Verlet algorithm is the method of choice for good reasons, we argue that Verlet can be improved upon. We also discuss in detail the behavior of HMC as the dimensionality of the target distribution increases.
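The two geometric properties named in this abstract can be checked numerically for velocity Verlet in one dimension. The sketch below is illustrative only: the function names, the test potentials, and the step sizes are assumptions made here and do not come from the survey. Reversibility is tested by integrating forward, flipping the momentum, integrating again, and flipping back; volume preservation is tested on the harmonic oscillator, where one step is a linear map whose determinant should equal 1.

```python
from typing import Callable, Tuple


def leapfrog(q: float, p: float, h: float, n_steps: int,
             grad_U: Callable[[float], float]) -> Tuple[float, float]:
    """Integrate n_steps of velocity Verlet (leapfrog) with unit mass."""
    for _ in range(n_steps):
        p -= 0.5 * h * grad_U(q)  # half kick
        q += h * p                # full drift
        p -= 0.5 * h * grad_U(q)  # half kick
    return q, p


def reversibility_error(q0: float, p0: float, h: float, n_steps: int,
                        grad_U: Callable[[float], float]) -> float:
    """Distance from the start after forward run, momentum flip, second run, flip."""
    q1, p1 = leapfrog(q0, p0, h, n_steps, grad_U)
    q2, p2 = leapfrog(q1, -p1, h, n_steps, grad_U)
    return max(abs(q2 - q0), abs(-p2 - p0))


def harmonic_step_determinant(h: float) -> float:
    """Determinant of the one-step map for U(q) = q^2/2, built from basis vectors."""
    grad = lambda q: q
    qa, pa = leapfrog(1.0, 0.0, h, 1, grad)  # image of (1, 0)
    qb, pb = leapfrog(0.0, 1.0, h, 1, grad)  # image of (0, 1)
    return qa * pb - qb * pa


if __name__ == "__main__":
    grad = lambda q: q ** 3 + q  # gradient of U(q) = q^4/4 + q^2/2 (illustrative)
    print("reversibility error:", reversibility_error(1.0, 0.5, 0.1, 50, grad))
    print("one-step determinant:", harmonic_step_determinant(0.3))
```

Both quantities agree with their exact values (0 and 1) up to floating-point rounding. In HMC these two properties are what allow the Metropolis accept/reject step to use only the energy error of the proposal.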

Submitted 14 November, 2017; originally announced November 2017.

Comments: Final version will appear in Acta Numerica 2018

Journal ref: Acta Numerica, Vol. 27, pp. 113-206, 2018

Showing 1–8 of 8 results for author: Sanz-Serna, J M