arXiv:2402.07676 [pdf, other]

Statistical modelling and Bayesian inversion for a Compton imaging system: application to radioactive source localisation

Authors: Cecilia Tarpau, Ming Fang, Konstantinos C. Zygalakis, Marcelo Pereyra, Angela Di Fulvio, Yoann Altmann

Abstract: This paper presents a statistical forward model for a Compton imaging system, called Compton imager. This system, under development at the University of Illinois Urbana-Champaign, is a variant of Compton cameras with a single type of sensor that can simultaneously act as scatterer and absorber. This imager is convenient for imaging situations requiring a wide field of view. The proposed statistical forward model is then used to solve the inverse problem of estimating the location and energy of point-like sources from observed data. This inverse problem is formulated and solved in a Bayesian framework by using a Metropolis-within-Gibbs algorithm for the estimation of the location, and an expectation-maximization algorithm for the estimation of the energy. This approach leads to more accurate estimation when compared with the deterministic standard back-projection approach, with the additional benefit of uncertainty quantification in the low photon imaging setting.

Submitted 16 February, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

arXiv:2401.09097 [pdf, other]

A hybrid tau-leap for simulating chemical kinetics with applications to parameter estimation

Authors: Thomas Trigo Trindade, Konstantinos C. Zygalakis

Abstract: We consider the problem of efficiently simulating stochastic models of chemical kinetics. The Gillespie Stochastic Simulation algorithm (SSA) is often used to simulate these models; however, in many scenarios of interest, the computational cost quickly becomes prohibitive. This is further exacerbated in the Bayesian inference context when estimating parameters of chemical models, as the intractability of the likelihood requires multiple simulations of the underlying system. To deal with issues of computational complexity, in this paper we propose a novel hybrid $τ$-leap algorithm for simulating well-mixed chemical systems. In particular, the algorithm uses $τ$-leap when appropriate (high population densities), and SSA when necessary (low population densities, when discrete effects become non-negligible). In the intermediate regime, a combination of the two methods, which leverages the properties of the underlying Poisson formulation, is employed. As illustrated through a number of numerical experiments, the hybrid $τ$-leap offers significant computational savings when compared to SSA without, however, sacrificing the overall accuracy. This feature is particularly welcome in the Bayesian inference context, as it allows for parameter estimation of stochastic chemical kinetics at reduced computational cost.

Submitted 18 January, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

Comments: 25 pages, 8 figures

arXiv:2308.09460 [pdf, other]

Accelerated Bayesian imaging by relaxed proximal-point Langevin sampling

Authors: Teresa Klatzer, Paul Dobson, Yoann Altmann, Marcelo Pereyra, Jesús María Sanz-Serna, Konstantinos C. Zygalakis

Abstract: This paper presents a new accelerated proximal Markov chain Monte Carlo methodology to perform Bayesian inference in imaging inverse problems with an underlying convex geometry. The proposed strategy takes the form of a stochastic relaxed proximal-point iteration that admits two complementary interpretations. For models that are smooth or regularised by Moreau-Yosida smoothing, the algorithm is equivalent to an implicit midpoint discretisation of an overdamped Langevin diffusion targeting the posterior distribution of interest. This discretisation is asymptotically unbiased for Gaussian targets and shown to converge in an accelerated manner for any target that is $κ$-strongly log-concave (i.e., requiring in the order of $\sqrt{κ}$ iterations to converge, similarly to accelerated optimisation schemes), comparing favorably to [M. Pereyra, L. Vargas Mieles, K.C. Zygalakis, SIAM J. Imaging Sciences, 13, 2 (2020), pp. 905-935], which is only provably accelerated for Gaussian targets and has bias. For models that are not smooth, the algorithm is equivalent to a Leimkuhler-Matthews discretisation of a Langevin diffusion targeting a Moreau-Yosida approximation of the posterior distribution of interest, and hence achieves a significantly lower bias than conventional unadjusted Langevin strategies based on the Euler-Maruyama discretisation. For targets that are $κ$-strongly log-concave, the provided non-asymptotic convergence analysis also identifies the optimal time step which maximizes the convergence speed.
The proposed methodology is demonstrated through a range of experiments related to image deconvolution with Gaussian and Poisson noise, with assumption-driven and data-driven convex priors. Source codes for the numerical experiments of this paper are available from https://github.com/MI2G/accelerated-langevin-imla.

Submitted 12 January, 2024; v1 submitted 18 August, 2023; originally announced August 2023.

Comments: 34 pages, 13 figures

MSC Class: 65C40; 68U10; 62F15; 65C60; 65J22; 68W25

arXiv:2307.08343 [pdf, ps, other]

doi 10.1007/s11222-024-10452-2

Gaussian processes for Bayesian inverse problems associated with linear partial differential equations

Authors: Tianming Bai, Aretha L. Teckentrup, Konstantinos C. Zygalakis

Abstract: This work is concerned with the use of Gaussian surrogate models for Bayesian inverse problems associated with linear partial differential equations. A particular focus is on the regime where only a small amount of training data is available. In this regime the type of Gaussian prior used is of critical importance with respect to how well the surrogate model will perform in terms of Bayesian inversion. We extend the framework of Raissi et al. (2017) to construct PDE-informed Gaussian priors that we then use to construct different approximate posteriors. A number of different numerical experiments illustrate the superiority of the PDE-informed Gaussian priors over more traditional priors.

Submitted 17 July, 2023; originally announced July 2023.

arXiv:2206.13894 [pdf, ps, other]

The split Gibbs sampler revisited: improvements to its algorithmic structure and augmented target distribution

Authors: Marcelo Pereyra, Luis A. Vargas-Mieles, Konstantinos C. Zygalakis

Abstract: Developing efficient Bayesian computation algorithms for imaging inverse problems is challenging due to the dimensionality involved and because Bayesian imaging models are often not smooth. Current state-of-the-art methods often address these difficulties by replacing the posterior density with a smooth approximation that is amenable to efficient exploration by using Langevin Markov chain Monte Carlo (MCMC) methods. An alternative approach is based on data augmentation and relaxation, where auxiliary variables are introduced in order to construct an approximate augmented posterior distribution that is amenable to efficient exploration by Gibbs sampling. This paper proposes a new accelerated proximal MCMC method called latent space SK-ROCK (ls SK-ROCK), which tightly combines the benefits of the two aforementioned strategies. Additionally, instead of viewing the augmented posterior distribution as an approximation of the original model, we propose to consider it as a generalisation of this model. Following on from this, we empirically show that there is a range of values for the relaxation parameter for which the accuracy of the model improves, and propose a stochastic optimisation algorithm to automatically identify the optimal amount of relaxation for a given problem. In this regime, ls SK-ROCK converges faster than competing approaches from the state of the art, and also achieves better accuracy since the underlying augmented Bayesian model has a higher Bayesian evidence.
The proposed methodology is demonstrated with a range of numerical experiments related to image deblurring and inpainting, as well as with comparisons with alternative approaches from the state of the art. An open-source implementation of the proposed MCMC methods is available from https://github.com/luisvargasmieles/ls-MCMC.

Submitted 3 May, 2023; v1 submitted 28 June, 2022; originally announced June 2022.

Comments: 22 pages, 9 figures, 7 tables. Accepted for publication in SIAM Journal on Imaging Sciences (SIIMS)

MSC Class: 60J22; 65C40; 68U10 (Primary) 62E17; 62F15; 62H10; 65J22; 68W25 (Secondary)

arXiv:2206.05350 [pdf, other]

Efficient Bayesian computation for low-photon imaging problems

Authors: Savvas Melidonis, Paul Dobson, Yoann Altmann, Marcelo Pereyra, Konstantinos C. Zygalakis

Abstract: This paper studies a new and highly efficient Markov chain Monte Carlo (MCMC) methodology to perform Bayesian inference in low-photon imaging problems, with particular attention to situations involving observation noise processes that deviate significantly from Gaussian noise, such as binomial, geometric and low-intensity Poisson noise. These problems are challenging for many reasons. From an inferential viewpoint, low-photon numbers lead to severe identifiability issues, poor stability and high uncertainty about the solution. Moreover, low-photon models often exhibit poor regularity properties that make efficient Bayesian computation difficult; e.g., hard non-negativity constraints, non-smooth priors, and log-likelihood terms with exploding gradients. More precisely, the lack of suitable regularity properties hinders the use of state-of-the-art Monte Carlo methods based on numerical approximations of the Langevin stochastic differential equation (SDE), as both the SDE and its numerical approximations behave poorly. We address this difficulty by proposing an MCMC methodology based on a reflected and regularised Langevin SDE, which is shown to be well-posed and exponentially ergodic under mild and easily verifiable conditions. This then allows us to derive four reflected proximal Langevin MCMC algorithms to perform Bayesian computation in low-photon imaging problems. The proposed approach is demonstrated with a range of experiments related to image deblurring, denoising, and inpainting under binomial, geometric and Poisson noise.

Submitted 10 June, 2022; originally announced June 2022.

Comments: 28 pages, 10 figures

MSC Class: 65C40; 68U10 (Primary) 62F15; 65C60; 65J22; 62E17; 62F30; 62H10; 68W25 (Secondary)

arXiv:2104.12384 [pdf, ps, other]

Wasserstein distance estimates for the distributions of numerical approximations to ergodic stochastic differential equations

Authors: J. M. Sanz-Serna, Konstantinos C. Zygalakis

Abstract: We present a framework that allows for the non-asymptotic study of the $2$-Wasserstein distance between the invariant distribution of an ergodic stochastic differential equation and the distribution of its numerical approximation in the strongly log-concave case. This allows us to study in a unified way a number of different integrators proposed in the literature for the overdamped and underdamped Langevin dynamics. In addition, we analyse a novel splitting method for the underdamped Langevin dynamics which only requires one gradient evaluation per time step. Under an additional smoothness assumption on a $d$-dimensional strongly log-concave distribution with condition number $κ$, the algorithm is shown to produce with an $\mathcal{O}\big(κ^{5/4} d^{1/4} ε^{-1/2}\big)$ complexity samples from a distribution that, in Wasserstein distance, is at most $ε>0$ away from the target distribution.

Submitted 24 September, 2021; v1 submitted 26 April, 2021; originally announced April 2021.

Comments: 29 pages, 2 figures

MSC Class: 65C40; 60H10; 60H35

arXiv:2103.10182 [pdf, other]

Bayesian Imaging With Data-Driven Priors Encoded by Neural Networks: Theory, Methods, and Algorithms

Authors: Matthew Holden, Marcelo Pereyra, Konstantinos C. Zygalakis

Abstract: This paper proposes a new methodology for performing Bayesian inference in imaging inverse problems where the prior knowledge is available in the form of training data. Following the manifold hypothesis and adopting a generative modelling approach, we construct a data-driven prior that is supported on a sub-manifold of the ambient space, which we can learn from the training data by using a variational autoencoder or a generative adversarial network. We establish the existence and well-posedness of the associated posterior distribution and posterior moments under easily verifiable conditions, providing a rigorous underpinning for Bayesian estimators and uncertainty quantification analyses. Bayesian computation is performed by using a parallel tempered version of the preconditioned Crank-Nicolson algorithm on the manifold, which is shown to be ergodic and robust to the non-convex nature of these data-driven models. In addition to point estimators and uncertainty quantification analyses, we derive a model misspecification test to automatically detect situations where the data-driven prior is unreliable, and explain how to identify the dimension of the latent space directly from the training data. The proposed approach is illustrated with a range of experiments with the MNIST dataset, where it outperforms alternative image reconstruction approaches from the state of the art. A model accuracy analysis suggests that the Bayesian probabilities reported by the data-driven models are also remarkably accurate under a frequentist definition of probability.

Submitted 18 March, 2021; originally announced March 2021.

arXiv:1911.05035

Constructing Gradient Controllable Recurrent Neural Networks Using Hamiltonian Dynamics

Authors: Konstantin Rusch, John W. Pearson, Konstantinos C. Zygalakis

Abstract: Recurrent neural networks (RNNs) have gained a great deal of attention in solving sequential learning problems. The learning of long-term dependencies, however, remains challenging due to the problem of a vanishing or exploding hidden states gradient. By exploring further the recently established connections between RNNs and dynamical systems, we propose a novel RNN architecture, which we call a Hamiltonian recurrent neural network (Hamiltonian RNN), based on a symplectic discretization of an appropriately chosen Hamiltonian system. The key benefit of this approach is that the corresponding RNN inherits the favorable long-time properties of the Hamiltonian system, which in turn allows us to control the hidden states gradient with a hyperparameter of the Hamiltonian RNN architecture. This enables us to handle sequential learning problems with arbitrary sequence lengths, since for a range of values of this hyperparameter the gradient neither vanishes nor explodes. Additionally, we provide a heuristic for the optimal choice of the hyperparameter, which we use in our numerical simulations to illustrate that the Hamiltonian RNN is able to outperform other state-of-the-art RNNs without the need for computationally intensive hyperparameter optimization.

Submitted 16 March, 2020; v1 submitted 11 November, 2019; originally announced November 2019.

Comments: Reasons: 1. The theoretical result of bounding the gradient dynamics is highly important when tackling the exploding gradient problem. However, we only proved the boundedness in one dimension and cannot generalize to the higher dimensional case, as the Hamiltonian argument is not valid in the general higher dimensional case. 2. The performance on the widely used sMNIST problem is only moderately strong.

arXiv:1909.10221 [pdf, other]

PDE-Inspired Algorithms for Semi-Supervised Learning on Point Clouds

Authors: Oliver M. Crook, Tim Hurst, Carola-Bibiane Schönlieb, Matthew Thorpe, Konstantinos C. Zygalakis

Abstract: Given a data set and a subset of labels, the problem of semi-supervised learning on point clouds is to extend the labels to the entire data set. In this paper we extend the labels by minimising the constrained discrete $p$-Dirichlet energy. Under suitable conditions the discrete problem can be connected, in the large data limit, with the minimiser of a weighted continuum $p$-Dirichlet energy with the same constraints. We take advantage of this connection by designing numerical schemes that first estimate the density of the data and then apply PDE methods, such as pseudo-spectral methods, to solve the corresponding Euler-Lagrange equation. We prove that our scheme is consistent in the large data limit for two methods of density estimation: kernel density estimation and spline kernel density estimation.

Submitted 23 September, 2019; originally announced September 2019.

arXiv:1908.08845 [pdf, ps, other]

Accelerating proximal Markov chain Monte Carlo by using an explicit stabilised method

Authors: Luis Vargas, Marcelo Pereyra, Konstantinos C. Zygalakis

Abstract: We present a highly efficient proximal Markov chain Monte Carlo methodology to perform Bayesian computation in imaging problems. Similarly to previous proximal Monte Carlo approaches, the proposed method is derived from an approximation of the Langevin diffusion. However, instead of the conventional Euler-Maruyama approximation that underpins existing proximal Monte Carlo methods, here we use a state-of-the-art orthogonal Runge-Kutta-Chebyshev stochastic approximation that combines several gradient evaluations to significantly accelerate its convergence speed, similarly to accelerated gradient optimisation methods. The proposed methodology is demonstrated via a range of numerical experiments, including non-blind image deconvolution, hyperspectral unmixing, and tomographic reconstruction, with total-variation and $\ell_1$-type priors. Comparisons with Euler-type proximal Monte Carlo methods confirm that the Markov chains generated with our method exhibit significantly faster convergence speeds, achieve larger effective sample sizes, and produce lower mean square estimation errors at equal computational budget.

Submitted 19 March, 2020; v1 submitted 23 August, 2019; originally announced August 2019.

Comments: 28 pages, 13 figures. Accepted for publication in SIAM Journal on Imaging Sciences (SIIMS)

arXiv:1703.08816 [pdf, other]

Uncertainty quantification in graph-based classification of high dimensional data

Authors: Andrea L. Bertozzi, Xiyang Luo, Andrew M. Stuart, Konstantinos C. Zygalakis

Abstract: Classification of high dimensional data finds wide-ranging applications. In many of these applications equipping the resulting classification with a measure of uncertainty may be as important as the classification itself. In this paper we introduce, develop algorithms for, and investigate the properties of, a variety of Bayesian models for the task of binary classification; via the posterior distribution on the classification labels, these methods automatically give measures of uncertainty. The methods are all based around the graph formulation of semi-supervised learning. We provide a unified framework which brings together a variety of methods which have been introduced in different communities within the mathematical sciences. We study probit classification in the graph-based setting, generalize the level-set method for Bayesian inverse problems to the classification setting, and generalize the Ginzburg-Landau optimization-based classifier to a Bayesian setting; we also show that the probit and level set approaches are natural relaxations of the harmonic function approach introduced in [Zhu et al 2003]. We introduce efficient numerical methods, suited to large data-sets, for both MCMC-based sampling as well as gradient-based MAP estimation. Through numerical experiments we study classification accuracy and uncertainty quantification for our models; these experiments showcase a suite of datasets commonly used to evaluate graph-based semi-supervised learning algorithms.

Submitted 8 February, 2018; v1 submitted 26 March, 2017; originally announced March 2017.

Comments: 33 pages, 14 figures

arXiv:1701.04247 [pdf, other]

Nonreversible Langevin Samplers: Splitting Schemes, Analysis and Implementation

Authors: A. B. Duncan, G. A. Pavliotis, K. C. Zygalakis

Abstract: For a given target density, there exist an infinite number of diffusion processes which are ergodic with respect to this density. As observed in a number of papers, samplers based on nonreversible diffusion processes can significantly outperform their reversible counterparts both in terms of asymptotic variance and rate of convergence to equilibrium. In this paper, we take advantage of this in order to construct efficient sampling algorithms based on the Lie-Trotter decomposition of a nonreversible diffusion process into reversible and nonreversible components. We show that samplers based on this scheme can significantly outperform standard MCMC methods, at the cost of introducing some controlled bias. In particular, we prove that numerical integrators constructed according to this decomposition are geometrically ergodic and characterise fully their asymptotic bias and variance, showing that the sampler inherits the good mixing properties of the underlying nonreversible diffusion. This is illustrated further with a number of numerical examples ranging from highly correlated low dimensional distributions, to logistic regression problems in high dimensions as well as inference for spatial models with many latent variables.

Submitted 16 January, 2017; originally announced January 2017.

arXiv:1506.07825 [pdf, ps, other]

Data Assimilation: A Mathematical Introduction

Authors: K. J. H. Law, A. M. Stuart, K. C. Zygalakis

Abstract: These notes provide a systematic mathematical treatment of the subject of data assimilation.

Submitted 25 June, 2015; originally announced June 2015.

arXiv:1501.00438 [pdf, other]

(Non-) asymptotic properties of Stochastic Gradient Langevin Dynamics

Authors: Sebastian J. Vollmer, Konstantinos C. Zygalakis, and Yee Whye Teh

Abstract: Applying standard Markov chain Monte Carlo (MCMC) algorithms to large data sets is computationally infeasible. The recently proposed stochastic gradient Langevin dynamics (SGLD) method circumvents this problem in three ways: it generates proposed moves using only a subset of the data, it skips the Metropolis-Hastings accept-reject step, and it uses sequences of decreasing step sizes. In \cite{TehThierryVollmerSGLD2014}, we provided the mathematical foundations for the decreasing step size SGLD, including consistency and a central limit theorem. However, in practice the SGLD is run for a relatively small number of iterations, and its step size is not decreased to zero. The present article investigates the behaviour of the SGLD with fixed step size. In particular we characterise the asymptotic bias explicitly, along with its dependence on the step size and the variance of the stochastic gradient. On that basis a modified SGLD which removes the asymptotic bias due to the variance of the stochastic gradients up to first order in the step size is derived. Moreover, we are able to obtain bounds on the finite-time bias, variance and mean squared error (MSE). The theory is illustrated with a Gaussian toy model for which the bias and the MSE for the estimation of moments can be obtained explicitly. For this toy model we study the gain of the SGLD over the standard Euler method in the limit of large data sets.

Submitted 21 September, 2015; v1 submitted 2 January, 2015; originally announced January 2015.

Comments: 42 pages, 7 figures

MSC Class: 60J05; 65C05

Showing 1–15 of 15 results for author: Zygalakis, K C