-
A Variational Perspective on High-Resolution ODEs
Authors:
Hoomaan Maskan,
Konstantinos C. Zygalakis,
Alp Yurtsever
Abstract:
We consider unconstrained minimization of smooth convex functions. We propose a novel variational perspective using the forced Euler-Lagrange equation that allows for studying high-resolution ODEs. Through this, we obtain a faster convergence rate for gradient norm minimization using Nesterov's accelerated gradient method. Additionally, we show that Nesterov's method can be interpreted as a rate-matching discretization of an appropriately chosen high-resolution ODE. Finally, using the results from the new variational perspective, we propose a stochastic method for noisy gradients. Several numerical experiments illustrate our stochastic algorithm and compare it with state-of-the-art methods.
Submitted 3 November, 2023;
originally announced November 2023.
-
Accelerated Bayesian imaging by relaxed proximal-point Langevin sampling
Authors:
Teresa Klatzer,
Paul Dobson,
Yoann Altmann,
Marcelo Pereyra,
Jesús María Sanz-Serna,
Konstantinos C. Zygalakis
Abstract:
This paper presents a new accelerated proximal Markov chain Monte Carlo methodology to perform Bayesian inference in imaging inverse problems with an underlying convex geometry. The proposed strategy takes the form of a stochastic relaxed proximal-point iteration that admits two complementary interpretations. For models that are smooth or regularised by Moreau-Yosida smoothing, the algorithm is equivalent to an implicit midpoint discretisation of an overdamped Langevin diffusion targeting the posterior distribution of interest. This discretisation is asymptotically unbiased for Gaussian targets and shown to converge in an accelerated manner for any target that is $κ$-strongly log-concave (i.e., requiring in the order of $\sqrt{κ}$ iterations to converge, similarly to accelerated optimisation schemes), comparing favorably to [M. Pereyra, L. Vargas Mieles, K.C. Zygalakis, SIAM J. Imaging Sciences, 13,2 (2020), pp. 905-935] which is only provably accelerated for Gaussian targets and has bias. For models that are not smooth, the algorithm is equivalent to a Leimkuhler-Matthews discretisation of a Langevin diffusion targeting a Moreau-Yosida approximation of the posterior distribution of interest, and hence achieves a significantly lower bias than conventional unadjusted Langevin strategies based on the Euler-Maruyama discretisation. For targets that are $κ$-strongly log-concave, the provided non-asymptotic convergence analysis also identifies the optimal time step which maximizes the convergence speed. The proposed methodology is demonstrated through a range of experiments related to image deconvolution with Gaussian and Poisson noise, with assumption-driven and data-driven convex priors. Source codes for the numerical experiments of this paper are available from https://github.com/MI2G/accelerated-langevin-imla.
Submitted 12 January, 2024; v1 submitted 18 August, 2023;
originally announced August 2023.
-
Gaussian processes for Bayesian inverse problems associated with linear partial differential equations
Authors:
Tianming Bai,
Aretha L. Teckentrup,
Konstantinos C. Zygalakis
Abstract:
This work is concerned with the use of Gaussian surrogate models for Bayesian inverse problems associated with linear partial differential equations. A particular focus is on the regime where only a small amount of training data is available. In this regime the type of Gaussian prior used is of critical importance with respect to how well the surrogate model will perform in terms of Bayesian inversion. We extend the framework of Raissi et al. (2017) to construct PDE-informed Gaussian priors that we then use to construct different approximate posteriors. A number of different numerical experiments illustrate the superiority of the PDE-informed Gaussian priors over more traditional priors.
Submitted 17 July, 2023;
originally announced July 2023.
-
A Hierarchy of Network Models Giving Bistability Under Triadic Closure
Authors:
Stefano Di Giovacchino,
Desmond J. Higham,
Konstantinos C. Zygalakis
Abstract:
Triadic closure describes the tendency for new friendships to form between individuals who already have friends in common. It has been argued heuristically that the triadic closure effect can lead to bistability in the formation of large-scale social interaction networks. Here, depending on the initial state and the transient dynamics, the system may evolve towards either of two long-time states. In this work, we propose and study a hierarchy of network evolution models that incorporate triadic closure, building on the work of Grindrod, Higham, and Parsons [Internet Mathematics, 8, 2012, 402--423]. We use a chemical kinetics framework, paying careful attention to the reaction rate scaling with respect to the system size. In a macroscale regime, we show rigorously that a bimodal steady-state distribution is admitted. This behavior corresponds to the existence of two distinct stable fixed points in a deterministic mean-field ODE. The macroscale model is also seen to capture an apparent metastability property of the microscale system.
Computational simulations are used to support the analysis.
Submitted 10 November, 2021;
originally announced November 2021.
-
Wasserstein distance estimates for the distributions of numerical approximations to ergodic stochastic differential equations
Authors:
J. M. Sanz-Serna,
Konstantinos C. Zygalakis
Abstract:
We present a framework that allows for the non-asymptotic study of the $2$-Wasserstein distance between the invariant distribution of an ergodic stochastic differential equation and the distribution of its numerical approximation in the strongly log-concave case. This allows us to study in a unified way a number of different integrators proposed in the literature for the overdamped and underdamped Langevin dynamics. In addition, we analyse a novel splitting method for the underdamped Langevin dynamics which only requires one gradient evaluation per time step. Under an additional smoothness assumption on a $d$-dimensional strongly log-concave distribution with condition number $κ$, the algorithm is shown to produce, with $\mathcal{O}\big(κ^{5/4} d^{1/4}ε^{-1/2} \big)$ complexity, samples from a distribution that, in Wasserstein distance, is at most $ε>0$ away from the target distribution.
Submitted 24 September, 2021; v1 submitted 26 April, 2021;
originally announced April 2021.
-
Bayesian Imaging With Data-Driven Priors Encoded by Neural Networks: Theory, Methods, and Algorithms
Authors:
Matthew Holden,
Marcelo Pereyra,
Konstantinos C. Zygalakis
Abstract:
This paper proposes a new methodology for performing Bayesian inference in imaging inverse problems where the prior knowledge is available in the form of training data. Following the manifold hypothesis and adopting a generative modelling approach, we construct a data-driven prior that is supported on a sub-manifold of the ambient space, which we can learn from the training data by using a variational autoencoder or a generative adversarial network. We establish the existence and well-posedness of the associated posterior distribution and posterior moments under easily verifiable conditions, providing a rigorous underpinning for Bayesian estimators and uncertainty quantification analyses. Bayesian computation is performed by using a parallel tempered version of the preconditioned Crank-Nicolson algorithm on the manifold, which is shown to be ergodic and robust to the non-convex nature of these data-driven models. In addition to point estimators and uncertainty quantification analyses, we derive a model misspecification test to automatically detect situations where the data-driven prior is unreliable, and explain how to identify the dimension of the latent space directly from the training data. The proposed approach is illustrated with a range of experiments with the MNIST dataset, where it outperforms alternative image reconstruction approaches from the state of the art. A model accuracy analysis suggests that the Bayesian probabilities reported by the data-driven models are also remarkably accurate under a frequentist definition of probability.
Submitted 18 March, 2021;
originally announced March 2021.
-
A Linear Transportation $\mathrm{L}^p$ Distance for Pattern Recognition
Authors:
Oliver M. Crook,
Mihai Cucuringu,
Tim Hurst,
Carola-Bibiane Schönlieb,
Matthew Thorpe,
Konstantinos C. Zygalakis
Abstract:
The transportation $\mathrm{L}^p$ distance, denoted $\mathrm{TL}^p$, has been proposed as a generalisation of Wasserstein $\mathrm{W}^p$ distances motivated by the property that it can be applied directly to colour or multi-channelled images, as well as multivariate time-series without normalisation or mass constraints. These distances, as with $\mathrm{W}^p$, are powerful tools in modelling data with spatial or temporal perturbations. However, their computational cost can make them infeasible to apply to even moderate pattern recognition tasks. We propose linear versions of these distances and show that the linear $\mathrm{TL}^p$ distance significantly improves over the linear $\mathrm{W}^p$ distance on signal processing tasks, whilst being several orders of magnitude faster to compute than the $\mathrm{TL}^p$ distance.
Submitted 23 September, 2020;
originally announced September 2020.
-
The connections between Lyapunov functions for some optimization algorithms and differential equations
Authors:
J. M. Sanz-Serna,
Konstantinos C. Zygalakis
Abstract:
In this manuscript, we study the properties of a family of second-order differential equations with damping, its discretizations and their connections with accelerated optimization algorithms for $m$-strongly convex and $L$-smooth functions. In particular, using the Linear Matrix Inequality (LMI) framework developed by Fazlyab et al. (2018), we derive analytically a (discrete) Lyapunov function for a two-parameter family of Nesterov optimization methods, which allows for the complete characterization of their convergence rate. In the appropriate limit, this family of methods may be seen as a discretization of a family of second-order ordinary differential equations for which we construct (continuous) Lyapunov functions by means of the LMI framework. The continuous Lyapunov functions may, alternatively, be obtained by studying the limiting behaviour of their discrete counterparts. Finally, we show that the majority of typical discretizations of the family of ODEs, such as the Heavy ball method, do not possess Lyapunov functions with properties similar to those of the Lyapunov function constructed here for the Nesterov method.
Submitted 11 January, 2021; v1 submitted 1 September, 2020;
originally announced September 2020.
-
Constructing Gradient Controllable Recurrent Neural Networks Using Hamiltonian Dynamics
Authors:
Konstantin Rusch,
John W. Pearson,
Konstantinos C. Zygalakis
Abstract:
Recurrent neural networks (RNNs) have gained a great deal of attention in solving sequential learning problems. The learning of long-term dependencies, however, remains challenging due to the problem of a vanishing or exploding hidden states gradient. By exploring further the recently established connections between RNNs and dynamical systems, we propose a novel RNN architecture, which we call a Hamiltonian recurrent neural network (Hamiltonian RNN), based on a symplectic discretization of an appropriately chosen Hamiltonian system. The key benefit of this approach is that the corresponding RNN inherits the favorable long time properties of the Hamiltonian system, which in turn allows us to control the hidden states gradient with a hyperparameter of the Hamiltonian RNN architecture. This enables us to handle sequential learning problems with arbitrary sequence lengths, since for a range of values of this hyperparameter the gradient neither vanishes nor explodes. Additionally, we provide a heuristic for the optimal choice of the hyperparameter, which we use in our numerical simulations to illustrate that the Hamiltonian RNN is able to outperform other state-of-the-art RNNs without the need for computationally intensive hyperparameter optimization.
Submitted 16 March, 2020; v1 submitted 11 November, 2019;
originally announced November 2019.
-
Uncertainty quantification in graph-based classification of high dimensional data
Authors:
Andrea L. Bertozzi,
Xiyang Luo,
Andrew M. Stuart,
Konstantinos C. Zygalakis
Abstract:
Classification of high dimensional data finds wide-ranging applications. In many of these applications equipping the resulting classification with a measure of uncertainty may be as important as the classification itself. In this paper we introduce, develop algorithms for, and investigate the properties of, a variety of Bayesian models for the task of binary classification; via the posterior distribution on the classification labels, these methods automatically give measures of uncertainty. The methods are all based around the graph formulation of semi-supervised learning.
We provide a unified framework which brings together a variety of methods which have been introduced in different communities within the mathematical sciences. We study probit classification in the graph-based setting, generalize the level-set method for Bayesian inverse problems to the classification setting, and generalize the Ginzburg-Landau optimization-based classifier to a Bayesian setting; we also show that the probit and level-set approaches are natural relaxations of the harmonic function approach introduced in [Zhu et al. 2003].
We introduce efficient numerical methods, suited to large data-sets, for both MCMC-based sampling as well as gradient-based MAP estimation. Through numerical experiments we study classification accuracy and uncertainty quantification for our models; these experiments showcase a suite of datasets commonly used to evaluate graph-based semi-supervised learning algorithms.
Submitted 8 February, 2018; v1 submitted 26 March, 2017;
originally announced March 2017.