Search | arXiv e-print repository

Proximal Interacting Particle Langevin Algorithms

Authors: Paula Cordero Encinar, Francesca R. Crucinio, O. Deniz Akyildiz

Abstract: We introduce a class of algorithms, termed Proximal Interacting Particle Langevin Algorithms (PIPLA), for inference and learning in latent variable models whose joint probability density is non-differentiable. Leveraging proximal Markov chain Monte Carlo (MCMC) techniques and the recently introduced interacting particle Langevin algorithm (IPLA), we propose several variants within the novel proxim… ▽ More We introduce a class of algorithms, termed Proximal Interacting Particle Langevin Algorithms (PIPLA), for inference and learning in latent variable models whose joint probability density is non-differentiable. Leveraging proximal Markov chain Monte Carlo (MCMC) techniques and the recently introduced interacting particle Langevin algorithm (IPLA), we propose several variants within the novel proximal IPLA family, tailored to the problem of estimating parameters in a non-differentiable statistical model. We prove nonasymptotic bounds for the parameter estimates produced by multiple algorithms in the strongly log-concave setting and provide comprehensive numerical experiments on various models to demonstrate the effectiveness of the proposed methods. In particular, we demonstrate the utility of the proposed family of algorithms on a toy hierarchical example where our assumptions can be checked, as well as on the problems of sparse Bayesian logistic regression, sparse Bayesian neural network, and sparse matrix completion. Our theory and experiments together show that PIPLA family can be the de facto choice for parameter estimation problems in latent variable models for non-differentiable models. △ Less

Submitted 20 June, 2024; originally announced June 2024.

Comments: 50 pages

arXiv:2406.04187 [pdf, ps, other]

A Multiscale Perspective on Maximum Marginal Likelihood Estimation

Authors: O. Deniz Akyildiz, Michela Ottobre, Iain Souttar

Abstract: In this paper, we provide a multiscale perspective on the problem of maximum marginal likelihood estimation. We consider and analyse a diffusion-based maximum marginal likelihood estimation scheme using ideas from multiscale dynamics. Our perspective is based on stochastic averaging; we make an explicit connection between ideas in applied probability and parameter inference in computational statis… ▽ More In this paper, we provide a multiscale perspective on the problem of maximum marginal likelihood estimation. We consider and analyse a diffusion-based maximum marginal likelihood estimation scheme using ideas from multiscale dynamics. Our perspective is based on stochastic averaging; we make an explicit connection between ideas in applied probability and parameter inference in computational statistics. In particular, we consider a general class of coupled Langevin diffusions for joint inference of latent variables and parameters in statistical models, where the latent variables are sampled from a fast Langevin process (which acts as a sampler), and the parameters are updated using a slow Langevin process (which acts as an optimiser). We show that the resulting system of stochastic differential equations (SDEs) can be viewed as a two-time scale system. To demonstrate the utility of such a perspective, we show that the averaged parameter dynamics obtained in the limit of scale separation can be used to estimate the optimal parameter, within the strongly convex setting. We do this by using recent uniform-in-time non-asymptotic averaging bounds. Finally, we conclude by showing that the slow-fast algorithm we consider here, termed Slow-Fast Langevin Algorithm, performs on par with state-of-the-art methods on a variety of examples. We believe that the stochastic averaging approach we provide in this paper enables us to look at these algorithms from a fresh angle, as well as unlocking the path to develop and analyse new methods using well-established averaging principles. △ Less

Submitted 10 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

Comments: 31 pages, 3 figures

MSC Class: 62-08 (Primary); 60J60; 62C12; 62F15; 65C30; 65D15 (Secondary)

arXiv:2405.17955 [pdf, other]

Efficient Prior Calibration From Indirect Data

Authors: O. Deniz Akyildiz, Mark Girolami, Andrew M. Stuart, Arnaud Vadeboncoeur

Abstract: Bayesian inversion is central to the quantification of uncertainty within problems arising from numerous applications in science and engineering. To formulate the approach, four ingredients are required: a forward model map** the unknown parameter to an element of a solution space, often the solution space for a differential equation; an observation operator map** an element of the solution sp… ▽ More Bayesian inversion is central to the quantification of uncertainty within problems arising from numerous applications in science and engineering. To formulate the approach, four ingredients are required: a forward model map** the unknown parameter to an element of a solution space, often the solution space for a differential equation; an observation operator map** an element of the solution space to the data space; a noise model describing how noise pollutes the observations; and a prior model describing knowledge about the unknown parameter before the data is acquired. This paper is concerned with learning the prior model from data; in particular, learning the prior from multiple realizations of indirect data obtained through the noisy observation process. The prior is represented, using a generative model, as the pushforward of a Gaussian in a latent space; the pushforward map is learned by minimizing an appropriate loss function. A metric that is well-defined under empirical approximation is used to define the loss function for the pushforward map to make an implementable methodology. Furthermore, an efficient residual-based neural operator approximation of the forward model is proposed and it is shown that this may be learned concurrently with the pushforward map, using a bilevel optimization formulation of the problem; this use of neural operator approximation has the potential to make prior learning from indirect data more computationally efficient, especially when the observation process is expensive, non-smooth or not known. The ideas are illustrated with the Darcy flow inverse problem of finding permeability from piezometric head measurements. △ Less

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2311.13584 [pdf, other]

On diffusion-based generative models and their error bounds: The log-concave case with full convergence estimates

Authors: Stefano Bruno, Ying Zhang, Dong-Young Lim, Ömer Deniz Akyildiz, Sotirios Sabanis

Abstract: We provide full theoretical guarantees for the convergence behaviour of diffusion-based generative models under the assumption of strongly log-concave data distributions while our approximating class of functions used for score estimation is made of Lipschitz continuous functions. We demonstrate via a motivating example, sampling from a Gaussian distribution with unknown mean, the powerfulness of… ▽ More We provide full theoretical guarantees for the convergence behaviour of diffusion-based generative models under the assumption of strongly log-concave data distributions while our approximating class of functions used for score estimation is made of Lipschitz continuous functions. We demonstrate via a motivating example, sampling from a Gaussian distribution with unknown mean, the powerfulness of our approach. In this case, explicit estimates are provided for the associated optimization problem, i.e. score approximation, while these are combined with the corresponding sampling estimates. As a result, we obtain the best known upper bound estimates in terms of key quantities of interest, such as the dimension and rates of convergence, for the Wasserstein-2 distance between the data distribution (Gaussian with unknown mean) and our sampling algorithm. Beyond the motivating example and in order to allow for the use of a diverse range of stochastic optimizers, we present our results using an $L^2$-accurate score estimation assumption, which crucially is formed under an expectation with respect to the stochastic optimizer and our novel auxiliary process that uses only known information. This approach yields the best known convergence rate for our sampling algorithm. △ Less

Submitted 22 April, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

arXiv:2310.06721 [pdf, other]

Tweedie Moment Projected Diffusions For Inverse Problems

Authors: Benjamin Boys, Mark Girolami, Jakiw Pidstrigach, Sebastian Reich, Alan Mosca, O. Deniz Akyildiz

Abstract: Diffusion generative models unlock new possibilities for inverse problems as they allow for the incorporation of strong empirical priors into the process of scientific inference. Recently, diffusion models received significant attention for solving inverse problems by posterior sampling, but many challenges remain open due to the intractability of this sampling process. Prior work resorted to Gaus… ▽ More Diffusion generative models unlock new possibilities for inverse problems as they allow for the incorporation of strong empirical priors into the process of scientific inference. Recently, diffusion models received significant attention for solving inverse problems by posterior sampling, but many challenges remain open due to the intractability of this sampling process. Prior work resorted to Gaussian approximations to conditional densities of the reverse process, leveraging Tweedie's formula to parameterise its mean, complemented with various heuristics. In this work, we leverage higher order information using Tweedie's formula and obtain a finer approximation with a principled covariance estimate. This novel approximation removes any time-dependent step-size hyperparameters required by earlier methods, and enables higher quality approximations of the posterior density which results in better samples. Specifically, we tackle noisy linear inverse problems and obtain a novel approximation to the gradient of the likelihood. We then plug this gradient estimate into various diffusion models and show that this method is optimal for a Gaussian data distribution. We illustrate the empirical effectiveness of our approach for general linear inverse problems on toy synthetic examples as well as image restoration using pretrained diffusion models as the prior. We show that our method improves the sample quality by providing statistically principled approximations to diffusion posterior sampling problem. △ Less

Submitted 22 November, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

Comments: 9 pages, 4 figures, 1 table when excluding abstract and bibliography; 29 pages, 12 figures, 2 tables when including abstract and bibliography

arXiv:2307.09341 [pdf, other]

Adaptively Optimised Adaptive Importance Samplers

Authors: Carlos A. C. C. Perello, Ömer Deniz Akyildiz

Abstract: We introduce a new class of adaptive importance samplers leveraging adaptive optimisation tools, which we term AdaOAIS. We build on Optimised Adaptive Importance Samplers (OAIS), a class of techniques that adapt proposals to improve the mean-squared error of the importance sampling estimators by parameterising the proposal and optimising the $χ^2$-divergence between the target and the proposal. We… ▽ More We introduce a new class of adaptive importance samplers leveraging adaptive optimisation tools, which we term AdaOAIS. We build on Optimised Adaptive Importance Samplers (OAIS), a class of techniques that adapt proposals to improve the mean-squared error of the importance sampling estimators by parameterising the proposal and optimising the $χ^2$-divergence between the target and the proposal. We show that a naive implementation of OAIS using stochastic gradient descent may lead to unstable estimators despite its convergence guarantees. To remedy this shortcoming, we instead propose to use adaptive optimisers (such as AdaGrad and Adam) to improve the stability of the OAIS. We provide convergence results for AdaOAIS in a similar manner to OAIS. We also provide empirical demonstration on a variety of examples and show that AdaOAIS lead to stable importance sampling estimators in practice. △ Less

Submitted 18 July, 2023; originally announced July 2023.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2303.13429 [pdf, other]

Interacting Particle Langevin Algorithm for Maximum Marginal Likelihood Estimation

Authors: Ö. Deniz Akyildiz, Francesca Romana Crucinio, Mark Girolami, Tim Johnston, Sotirios Sabanis

Abstract: We develop a class of interacting particle systems for implementing a maximum marginal likelihood estimation (MMLE) procedure to estimate the parameters of a latent variable model. We achieve this by formulating a continuous-time interacting particle system which can be seen as a Langevin diffusion over an extended state space of parameters and latent variables. In particular, we prove that the pa… ▽ More We develop a class of interacting particle systems for implementing a maximum marginal likelihood estimation (MMLE) procedure to estimate the parameters of a latent variable model. We achieve this by formulating a continuous-time interacting particle system which can be seen as a Langevin diffusion over an extended state space of parameters and latent variables. In particular, we prove that the parameter marginal of the stationary measure of this diffusion has the form of a Gibbs measure where number of particles acts as the inverse temperature parameter in classical settings for global optimisation. Using a particular rescaling, we then prove geometric ergodicity of this system and bound the discretisation error in a manner that is uniform in time and does not increase with the number of particles. The discretisation results in an algorithm, termed Interacting Particle Langevin Algorithm (IPLA) which can be used for MMLE. We further prove nonasymptotic bounds for the optimisation error of our estimator in terms of key parameters of the problem, and also extend this result to the case of stochastic gradients covering practical scenarios. We provide numerical experiments to illustrate the empirical behaviour of our algorithm in the context of logistic regression with verifiable assumptions. Our setting provides a straightforward way to implement a diffusion-based optimisation routine compared to more classical approaches such as the Expectation Maximisation (EM) algorithm, and allows for especially explicit nonasymptotic bounds. △ Less

Submitted 11 October, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

Comments: 38 pages

arXiv:2301.11040 [pdf, other]

Random Grid Neural Processes for Parametric Partial Differential Equations

Authors: Arnaud Vadeboncoeur, Ieva Kazlauskaite, Yanni Papandreou, Fehmi Cirak, Mark Girolami, Ömer Deniz Akyildiz

Abstract: We introduce a new class of spatially stochastic physics and data informed deep latent models for parametric partial differential equations (PDEs) which operate through scalable variational neural processes. We achieve this by assigning probability measures to the spatial domain, which allows us to treat collocation grids probabilistically as random variables to be marginalised out. Adapting this… ▽ More We introduce a new class of spatially stochastic physics and data informed deep latent models for parametric partial differential equations (PDEs) which operate through scalable variational neural processes. We achieve this by assigning probability measures to the spatial domain, which allows us to treat collocation grids probabilistically as random variables to be marginalised out. Adapting this spatial statistics view, we solve forward and inverse problems for parametric PDEs in a way that leads to the construction of Gaussian process models of solution fields. The implementation of these random grids poses a unique set of challenges for inverse physics informed deep learning frameworks and we propose a new architecture called Grid Invariant Convolutional Networks (GICNets) to overcome these challenges. We further show how to incorporate noisy data in a principled manner into our physics informed model to improve predictions for problems where data may be available but whose measurement location does not coincide with any fixed mesh or grid. The proposed method is tested on a nonlinear Poisson problem, Burgers equation, and Navier-Stokes equations, and we provide extensive numerical comparisons. We demonstrate significant computational advantages over current physics informed neural learning methods for parametric PDEs while improving the predictive capabilities and flexibility of these models. △ Less

Submitted 7 June, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

arXiv:2210.10785 [pdf, ps, other]

Gradient-based Adaptive Importance Samplers

Authors: Víctor Elvira, Emilie Chouzenoux, Ömer Deniz Akyildiz, Luca Martino

Abstract: Importance sampling (IS) is a powerful Monte Carlo methodology for the approximation of intractable integrals, very often involving a target probability density function. The performance of IS heavily depends on the appropriate selection of the proposal distributions where the samples are simulated from. In this paper, we propose an adaptive importance sampler, called GRAMIS, that iteratively impr… ▽ More Importance sampling (IS) is a powerful Monte Carlo methodology for the approximation of intractable integrals, very often involving a target probability density function. The performance of IS heavily depends on the appropriate selection of the proposal distributions where the samples are simulated from. In this paper, we propose an adaptive importance sampler, called GRAMIS, that iteratively improves the set of proposals. The algorithm exploits geometric information of the target to adapt the location and scale parameters of those proposals. Moreover, in order to allow for a cooperative adaptation, a repulsion term is introduced that favors a coordinated exploration of the state space. This translates into a more diverse exploration and a better approximation of the target via the mixture of proposals. Moreover, we provide a theoretical justification of the repulsion term. We show the good performance of GRAMIS in two problems where the target has a challenging shape and cannot be easily approximated by a standard uni-modal proposal. △ Less

Submitted 21 June, 2023; v1 submitted 19 October, 2022; originally announced October 2022.

arXiv:2209.15609 [pdf, other]

$Φ$-DVAE: Physics-Informed Dynamical Variational Autoencoders for Unstructured Data Assimilation

Authors: Alex Glyn-Davies, Connor Duffin, Ö. Deniz Akyildiz, Mark Girolami

Abstract: Incorporating unstructured data into physical models is a challenging problem that is emerging in data assimilation. Traditional approaches focus on well-defined observation operators whose functional forms are typically assumed to be known. This prevents these methods from achieving a consistent model-data synthesis in configurations where the map** from data-space to model-space is unknown. To… ▽ More Incorporating unstructured data into physical models is a challenging problem that is emerging in data assimilation. Traditional approaches focus on well-defined observation operators whose functional forms are typically assumed to be known. This prevents these methods from achieving a consistent model-data synthesis in configurations where the map** from data-space to model-space is unknown. To address these shortcomings, in this paper we develop a physics-informed dynamical variational autoencoder ($Φ$-DVAE) to embed diverse data streams into time-evolving physical systems described by differential equations. Our approach combines a standard, possibly nonlinear, filter for the latent state-space model and a VAE, to assimilate the unstructured data into the latent dynamical system. Unstructured data, in our example systems, comes in the form of video data and velocity field measurements, however the methodology is suitably generic to allow for arbitrary unknown observation operators. A variational Bayesian framework is used for the joint estimation of the encoding, latent states, and unknown system parameters. To demonstrate the method, we provide case studies with the Lorenz-63 ordinary differential equation, and the advection and Korteweg-de Vries partial differential equations. Our results, with synthetic data, show that $Φ$-DVAE provides a data efficient dynamics encoding methodology which is competitive with standard approaches. Unknown parameters are recovered with uncertainty quantification, and unseen data are accurately predicted. △ Less

Submitted 14 July, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

Comments: 27 pages, 8 figures, updated version

arXiv:2208.04856 [pdf, other]

doi 10.1016/j.jcp.2023.112369

Fully probabilistic deep models for forward and inverse problems in parametric PDEs

Authors: Arnaud Vadeboncoeur, Ömer Deniz Akyildiz, Ieva Kazlauskaite, Mark Girolami, Fehmi Cirak

Abstract: We introduce a physics-driven deep latent variable model (PDDLVM) to learn simultaneously parameter-to-solution (forward) and solution-to-parameter (inverse) maps of parametric partial differential equations (PDEs). Our formulation leverages conventional PDE discretization techniques, deep neural networks, probabilistic modelling, and variational inference to assemble a fully probabilistic coheren… ▽ More We introduce a physics-driven deep latent variable model (PDDLVM) to learn simultaneously parameter-to-solution (forward) and solution-to-parameter (inverse) maps of parametric partial differential equations (PDEs). Our formulation leverages conventional PDE discretization techniques, deep neural networks, probabilistic modelling, and variational inference to assemble a fully probabilistic coherent framework. In the posited probabilistic model, both the forward and inverse maps are approximated as Gaussian distributions with a mean and covariance parameterized by deep neural networks. The PDE residual is assumed to be an observed random vector of value zero, hence we model it as a random vector with a zero mean and a user-prescribed covariance. The model is trained by maximizing the probability, that is the evidence or marginal likelihood, of observing a residual of zero by maximizing the evidence lower bound (ELBO). Consequently, the proposed methodology does not require any independent PDE solves and is physics-informed at training time, allowing the real-time solution of PDE forward and inverse problems after training. The proposed framework can be easily extended to seamlessly integrate observed data to solve inverse problems and to build generative models. We demonstrate the efficiency and robustness of our method on finite element discretized parametric PDE problems such as linear and nonlinear Poisson problems, elastic shells with complex 3D geometries, and time-dependent nonlinear and inhomogeneous PDEs using a physics-informed neural network (PINN) discretization. We achieve up to three orders of magnitude speed-up after training compared to traditional finite element method (FEM), while outputting coherent uncertainty estimates. △ Less

Submitted 14 July, 2023; v1 submitted 9 August, 2022; originally announced August 2022.

arXiv:2201.00409 [pdf, other]

Global convergence of optimized adaptive importance samplers

Authors: Ömer Deniz Akyildiz

Abstract: We analyze the optimized adaptive importance sampler (OAIS) for performing Monte Carlo integration with general proposals. We leverage a classical result which shows that the bias and the mean-squared error (MSE) of the importance sampling scales with the $χ^2$-divergence between the target and the proposal and develop a scheme which performs global optimization of $χ^2$-divergence. While it is kn… ▽ More We analyze the optimized adaptive importance sampler (OAIS) for performing Monte Carlo integration with general proposals. We leverage a classical result which shows that the bias and the mean-squared error (MSE) of the importance sampling scales with the $χ^2$-divergence between the target and the proposal and develop a scheme which performs global optimization of $χ^2$-divergence. While it is known that this quantity is convex for exponential family proposals, the case of the general proposals has been an open problem. We close this gap by utilizing the nonasymptotic bounds for stochastic gradient Langevin dynamics (SGLD) for the global optimization of $χ^2$-divergence and derive nonasymptotic bounds for the MSE by leveraging recent results from non-convex optimization literature. The resulting AIS schemes have explicit theoretical guarantees that are uniform-in-time. △ Less

Submitted 28 January, 2024; v1 submitted 2 January, 2022; originally announced January 2022.

Comments: Accepted to Foundations of Data Science (FoDS), 2024, to appear

arXiv:2110.11131 [pdf, other]

Statistical Finite Elements via Langevin Dynamics

Authors: Ömer Deniz Akyildiz, Connor Duffin, Sotirios Sabanis, Mark Girolami

Abstract: The recent statistical finite element method (statFEM) provides a coherent statistical framework to synthesise finite element models with observed data. Through embedding uncertainty inside of the governing equations, finite element solutions are updated to give a posterior distribution which quantifies all sources of uncertainty associated with the model. However to incorporate all sources of unc… ▽ More The recent statistical finite element method (statFEM) provides a coherent statistical framework to synthesise finite element models with observed data. Through embedding uncertainty inside of the governing equations, finite element solutions are updated to give a posterior distribution which quantifies all sources of uncertainty associated with the model. However to incorporate all sources of uncertainty, one must integrate over the uncertainty associated with the model parameters, the known forward problem of uncertainty quantification. In this paper, we make use of Langevin dynamics to solve the statFEM forward problem, studying the utility of the unadjusted Langevin algorithm (ULA), a Metropolis-free Markov chain Monte Carlo sampler, to build a sample-based characterisation of this otherwise intractable measure. Due to the structure of the statFEM problem, these methods are able to solve the forward problem without explicit full PDE solves, requiring only sparse matrix-vector products. ULA is also gradient-based, and hence provides a scalable approach up to high degrees-of-freedom. Leveraging the theory behind Langevin-based samplers, we provide theoretical guarantees on sampler performance, demonstrating convergence, for both the prior and posterior, in the Kullback-Leibler divergence, and, in Wasserstein-2, with further results on the effect of preconditioning. Numerical experiments are also provided, for both the prior and posterior, to demonstrate the efficacy of the sampler, with a Python package also included. △ Less

Submitted 27 December, 2021; v1 submitted 21 October, 2021; originally announced October 2021.

Comments: New version, experiments updated

arXiv:2010.10436 [pdf, other]

VarGrad: A Low-Variance Gradient Estimator for Variational Inference

Authors: Lorenz Richter, Ayman Boustati, Nikolas Nüsken, Francisco J. R. Ruiz, Ömer Deniz Akyildiz

Abstract: We analyse the properties of an unbiased gradient estimator of the ELBO for variational inference, based on the score function method with leave-one-out control variates. We show that this gradient estimator can be obtained using a new loss, defined as the variance of the log-ratio between the exact posterior and the variational approximation, which we call the $\textit{log-variance loss}$. Under… ▽ More We analyse the properties of an unbiased gradient estimator of the ELBO for variational inference, based on the score function method with leave-one-out control variates. We show that this gradient estimator can be obtained using a new loss, defined as the variance of the log-ratio between the exact posterior and the variational approximation, which we call the $\textit{log-variance loss}$. Under certain conditions, the gradient of the log-variance loss equals the gradient of the (negative) ELBO. We show theoretically that this gradient estimator, which we call $\textit{VarGrad}$ due to its connection to the log-variance loss, exhibits lower variance than the score function method in certain settings, and that the leave-one-out control variate coefficients are close to the optimal ones. We empirically demonstrate that VarGrad offers a favourable variance versus computation trade-off compared to other state-of-the-art estimators on a discrete VAE. △ Less

Submitted 29 October, 2020; v1 submitted 20 October, 2020; originally announced October 2020.

arXiv:2002.09998 [pdf, other]

Generalized Bayesian Filtering via Sequential Monte Carlo

Authors: Ayman Boustati, Ömer Deniz Akyildiz, Theodoros Damoulas, Adam M. Johansen

Abstract: We introduce a framework for inference in general state-space hidden Markov models (HMMs) under likelihood misspecification. In particular, we leverage the loss-theoretic perspective of Generalized Bayesian Inference (GBI) to define generalised filtering recursions in HMMs, that can tackle the problem of inference under model misspecification. In doing so, we arrive at principled procedures for ro… ▽ More We introduce a framework for inference in general state-space hidden Markov models (HMMs) under likelihood misspecification. In particular, we leverage the loss-theoretic perspective of Generalized Bayesian Inference (GBI) to define generalised filtering recursions in HMMs, that can tackle the problem of inference under model misspecification. In doing so, we arrive at principled procedures for robust inference against observation contamination by utilising the $β$-divergence. Operationalising the proposed framework is made possible via sequential Monte Carlo methods (SMC), where most standard particle methods, and their associated convergence results, are readily adapted to the new setting. We apply our approach to object tracking and Gaussian process regression problems, and observe improved performance over both standard filtering algorithms and other robust filters. △ Less

Submitted 21 October, 2020; v1 submitted 23 February, 2020; originally announced February 2020.

arXiv:2002.05465 [pdf, ps, other]

Nonasymptotic analysis of Stochastic Gradient Hamiltonian Monte Carlo under local conditions for nonconvex optimization

Authors: Ömer Deniz Akyildiz, Sotirios Sabanis

Abstract: We provide a nonasymptotic analysis of the convergence of the stochastic gradient Hamiltonian Monte Carlo (SGHMC) to a target measure in Wasserstein-2 distance without assuming log-concavity. Our analysis quantifies key theoretical properties of the SGHMC as a sampler under local conditions which significantly improves the findings of previous results. In particular, we prove that the Wasserstein-… ▽ More We provide a nonasymptotic analysis of the convergence of the stochastic gradient Hamiltonian Monte Carlo (SGHMC) to a target measure in Wasserstein-2 distance without assuming log-concavity. Our analysis quantifies key theoretical properties of the SGHMC as a sampler under local conditions which significantly improves the findings of previous results. In particular, we prove that the Wasserstein-2 distance between the target and the law of the SGHMC is uniformly controlled by the step-size of the algorithm, therefore demonstrate that the SGHMC can provide high-precision results uniformly in the number of iterations. The analysis also allows us to obtain nonasymptotic bounds for nonconvex optimization problems under local conditions and implies that the SGHMC, when viewed as a nonconvex optimizer, converges to a global minimum with the best known rates. We apply our results to obtain nonasymptotic bounds for scalable Bayesian inference and nonasymptotic generalization bounds. △ Less

Submitted 28 January, 2024; v1 submitted 13 February, 2020; originally announced February 2020.

Comments: Accepted to Journal of Machine Learning Research (JMLR), 2024, to appear

arXiv:1910.03906 [pdf, other]

Probabilistic sequential matrix factorization

Authors: Ömer Deniz Akyildiz, Gerrit J. J. van den Burg, Theodoros Damoulas, Mark F. J. Steel

Abstract: We introduce the probabilistic sequential matrix factorization (PSMF) method for factorizing time-varying and non-stationary datasets consisting of high-dimensional time-series. In particular, we consider nonlinear Gaussian state-space models where sequential approximate inference results in the factorization of a data matrix into a dictionary and time-varying coefficients with potentially nonline… ▽ More We introduce the probabilistic sequential matrix factorization (PSMF) method for factorizing time-varying and non-stationary datasets consisting of high-dimensional time-series. In particular, we consider nonlinear Gaussian state-space models where sequential approximate inference results in the factorization of a data matrix into a dictionary and time-varying coefficients with potentially nonlinear Markovian dependencies. The assumed Markovian structure on the coefficients enables us to encode temporal dependencies into a low-dimensional feature space. The proposed inference method is solely based on an approximate extended Kalman filtering scheme, which makes the resulting method particularly efficient. PSMF can account for temporal nonlinearities and, more importantly, can be used to calibrate and estimate generic differentiable nonlinear subspace models. We also introduce a robust version of PSMF, called rPSMF, which uses Student-t filters to handle model misspecification. We show that PSMF can be used in multiple contexts: modeling time series with a periodic subspace, robustifying changepoint detection methods, and imputing missing data in several high-dimensional time-series, such as measurements of pollutants across London. △ Less

Submitted 18 March, 2021; v1 submitted 9 October, 2019; originally announced October 2019.

Comments: Accepted for publication at AISTATS 2021

arXiv:1910.02008 [pdf, ps, other]

Nonasymptotic estimates for Stochastic Gradient Langevin Dynamics under local conditions in nonconvex optimization

Authors: Ying Zhang, Ömer Deniz Akyildiz, Theodoros Damoulas, Sotirios Sabanis

Abstract: In this paper, we are concerned with a non-asymptotic analysis of sampling algorithms used in nonconvex optimization. In particular, we obtain non-asymptotic estimates in Wasserstein-1 and Wasserstein-2 distances for a popular class of algorithms called Stochastic Gradient Langevin Dynamics (SGLD). In addition, the aforementioned Wasserstein-2 convergence result can be applied to establish a non-a… ▽ More In this paper, we are concerned with a non-asymptotic analysis of sampling algorithms used in nonconvex optimization. In particular, we obtain non-asymptotic estimates in Wasserstein-1 and Wasserstein-2 distances for a popular class of algorithms called Stochastic Gradient Langevin Dynamics (SGLD). In addition, the aforementioned Wasserstein-2 convergence result can be applied to establish a non-asymptotic error bound for the expected excess risk. Crucially, these results are obtained under a local Lipschitz condition and a local dissipativity condition where we remove the uniform dependence in the data stream. We illustrate the importance of this relaxation by presenting examples from variational inference and from index tracking optimization. △ Less

Submitted 14 October, 2022; v1 submitted 4 October, 2019; originally announced October 2019.

Comments: 38 pages

MSC Class: 60J20; 60J22; 65C05; 65C40; 62D05

arXiv:1903.12044 [pdf, ps, other]

Convergence rates for optimised adaptive importance samplers

Authors: Ömer Deniz Akyildiz, Joaquín Míguez

Abstract: Adaptive importance samplers are adaptive Monte Carlo algorithms to estimate expectations with respect to some target distribution which \textit{adapt} themselves to obtain better estimators over a sequence of iterations. Although it is straightforward to show that they have the same $\mathcal{O}(1/\sqrt{N})$ convergence rate as standard importance samplers, where $N$ is the number of Monte Carlo… ▽ More Adaptive importance samplers are adaptive Monte Carlo algorithms to estimate expectations with respect to some target distribution which \textit{adapt} themselves to obtain better estimators over a sequence of iterations. Although it is straightforward to show that they have the same $\mathcal{O}(1/\sqrt{N})$ convergence rate as standard importance samplers, where $N$ is the number of Monte Carlo samples, the behaviour of adaptive importance samplers over the number of iterations has been left relatively unexplored. In this work, we investigate an adaptation strategy based on convex optimisation which leads to a class of adaptive importance samplers termed \textit{optimised adaptive importance samplers} (OAIS). These samplers rely on the iterative minimisation of the $χ^2$-divergence between an exponential-family proposal and the target. The analysed algorithms are closely related to the class of adaptive importance samplers which minimise the variance of the weight function. We first prove non-asymptotic error bounds for the mean squared errors (MSEs) of these algorithms, which explicitly depend on the number of iterations and the number of samples together. The non-asymptotic bounds derived in this paper imply that when the target belongs to the exponential family, the $L_2$ errors of the optimised samplers converge to the optimal rate of $\mathcal{O}(1/\sqrt{N})$ and the rate of convergence in the number of iterations are explicitly provided. When the target does not belong to the exponential family, the rate of convergence is the same but the asymptotic $L_2$ error increases by a factor $\sqrt{ρ^\star} > 1$, where $ρ^\star - 1$ is the minimum $χ^2$-divergence between the target and an exponential-family proposal. △ Less

Submitted 6 May, 2020; v1 submitted 28 March, 2019; originally announced March 2019.

Comments: Revised version, new results added

arXiv:1812.01655 [pdf, ps, other]

A probabilistic incremental proximal gradient method

Authors: Ömer Deniz Akyildiz, Émilie Chouzenoux, Víctor Elvira, Joaquín Míguez

Abstract: In this paper, we propose a probabilistic optimization method, named probabilistic incremental proximal gradient (PIPG) method, by develo** a probabilistic interpretation of the incremental proximal gradient algorithm. We explicitly model the update rules of the incremental proximal gradient method and develop a systematic approach to propagate the uncertainty of the solution estimate over itera… ▽ More In this paper, we propose a probabilistic optimization method, named probabilistic incremental proximal gradient (PIPG) method, by develo** a probabilistic interpretation of the incremental proximal gradient algorithm. We explicitly model the update rules of the incremental proximal gradient method and develop a systematic approach to propagate the uncertainty of the solution estimate over iterations. The PIPG algorithm takes the form of Bayesian filtering updates for a state-space model constructed by using the cost function. Our framework makes it possible to utilize well-known exact or approximate Bayesian filters, such as Kalman or extended Kalman filters, to solve large-scale regularized optimization problems. △ Less

Submitted 19 June, 2019; v1 submitted 4 December, 2018; originally announced December 2018.

Comments: 5 pages, includes an extra numerical experiment

arXiv:1811.09469 [pdf, ps, other]

Parallel sequential Monte Carlo for stochastic gradient-free nonconvex optimization

Authors: Ömer Deniz Akyildiz, Dan Crisan, Joaquín Míguez

Abstract: We introduce and analyze a parallel sequential Monte Carlo methodology for the numerical solution of optimization problems that involve the minimization of a cost function that consists of the sum of many individual components. The proposed scheme is a stochastic zeroth order optimization algorithm which demands only the capability to evaluate small subsets of components of the cost function. It c… ▽ More We introduce and analyze a parallel sequential Monte Carlo methodology for the numerical solution of optimization problems that involve the minimization of a cost function that consists of the sum of many individual components. The proposed scheme is a stochastic zeroth order optimization algorithm which demands only the capability to evaluate small subsets of components of the cost function. It can be depicted as a bank of samplers that generate particle approximations of several sequences of probability measures. These measures are constructed in such a way that they have associated probability density functions whose global maxima coincide with the global minima of the original cost function. The algorithm selects the best performing sampler and uses it to approximate a global minimum of the cost function. We prove analytically that the resulting estimator converges to a global minimum of the cost function almost surely and provide explicit convergence rates in terms of the number of generated Monte Carlo samples and the dimension of the search space. We show, by way of numerical examples, that the algorithm can tackle cost functions with multiple minima or with broad "flat" regions which are hard to minimize using gradient-based techniques. △ Less

Submitted 1 January, 2022; v1 submitted 23 November, 2018; originally announced November 2018.

Comments: Statistics and Computing volume 30, (2020). Published at https://link.springer.com/article/10.1007/s11222-020-09964-4

arXiv:1807.04594 [pdf, ps, other]

The Incremental Proximal Method: A Probabilistic Perspective

Authors: Ömer Deniz Akyildiz, Victor Elvira, Joaquin Miguez

Abstract: In this work, we highlight a connection between the incremental proximal method and stochastic filters. We begin by showing that the proximal operators coincide, and hence can be realized with, Bayes updates. We give the explicit form of the updates for the linear regression problem and show that there is a one-to-one correspondence between the proximal operator of the least-squares regression and… ▽ More In this work, we highlight a connection between the incremental proximal method and stochastic filters. We begin by showing that the proximal operators coincide, and hence can be realized with, Bayes updates. We give the explicit form of the updates for the linear regression problem and show that there is a one-to-one correspondence between the proximal operator of the least-squares regression and the Bayes update when the prior and the likelihood are Gaussian. We then carry out this observation to a general sequential setting: We consider the incremental proximal method, which is an algorithm for large-scale optimization, and show that, for a linear-quadratic cost function, it can naturally be realized by the Kalman filter. We then discuss the implications of this idea for nonlinear optimization problems where proximal operators are in general not realizable. In such settings, we argue that the extended Kalman filter can provide a systematic way for the derivation of practical procedures. △ Less

Submitted 12 July, 2018; originally announced July 2018.

Comments: Presented at ICASSP, 15-20 April 2018

arXiv:1712.07879 [pdf, ps, other]

A probabilistic interpretation of replicator-mutator dynamics

Authors: Ömer Deniz Akyıldız

Abstract: In this note, we investigate the relationship between probabilistic updating mechanisms and discrete-time replicator-mutator dynamics. We consider the recently shown connection between Bayesian updating and replicator dynamics and extend it to the replicator-mutator dynamics by considering prediction and filtering recursions in hidden Markov models (HMM). We show that it is possible to understand… ▽ More In this note, we investigate the relationship between probabilistic updating mechanisms and discrete-time replicator-mutator dynamics. We consider the recently shown connection between Bayesian updating and replicator dynamics and extend it to the replicator-mutator dynamics by considering prediction and filtering recursions in hidden Markov models (HMM). We show that it is possible to understand the evolution of the frequency vector of a population under the replicator-mutator equation as a posterior predictive inference procedure in an HMM. This view enables us to derive a natural dual version of the replicator-mutator equation, which corresponds to updating the filtering distribution. Finally, we conclude with the implications of the interpretation and with some comments related to the recent discussions about evolution and learning. △ Less

Submitted 21 December, 2017; originally announced December 2017.

arXiv:1708.07801 [pdf, ps, other]

Nudging the particle filter

Authors: Ömer Deniz Akyıldız, Joaquín Míguez

Abstract: We investigate a new sampling scheme aimed at improving the performance of particle filters whenever (a) there is a significant mismatch between the assumed model dynamics and the actual system, or (b) the posterior probability tends to concentrate in relatively small regions of the state space. The proposed scheme pushes some particles towards specific regions where the likelihood is expected to… ▽ More We investigate a new sampling scheme aimed at improving the performance of particle filters whenever (a) there is a significant mismatch between the assumed model dynamics and the actual system, or (b) the posterior probability tends to concentrate in relatively small regions of the state space. The proposed scheme pushes some particles towards specific regions where the likelihood is expected to be high, an operation known as nudging in the geophysics literature. We re-interpret nudging in a form applicable to any particle filtering scheme, as it does not involve any changes in the rest of the algorithm. Since the particles are modified, but the importance weights do not account for this modification, the use of nudging leads to additional bias in the resulting estimators. However, we prove analytically that nudged particle filters can still attain asymptotic convergence with the same error rates as conventional particle methods. Simple analysis also yields an alternative interpretation of the nudging operation that explains its robustness to model errors. Finally, we show numerical results that illustrate the improvements that can be attained using the proposed scheme. In particular, we present nonlinear tracking examples with synthetic data and a model inference example using real-world financial data. △ Less

Submitted 19 March, 2019; v1 submitted 25 August, 2017; originally announced August 2017.

arXiv:1509.02088 [pdf, ps, other]

Matrix Factorisation with Linear Filters

Authors: Ömer Deniz Akyıldız

Abstract: This text investigates relations between two well-known family of algorithms, matrix factorisations and recursive linear filters, by describing a probabilistic model in which approximate inference corresponds to a matrix factorisation algorithm. Using the probabilistic model, we derive a matrix factorisation algorithm as a recursive linear filter. More precisely, we derive a matrix-variate recursi… ▽ More This text investigates relations between two well-known family of algorithms, matrix factorisations and recursive linear filters, by describing a probabilistic model in which approximate inference corresponds to a matrix factorisation algorithm. Using the probabilistic model, we derive a matrix factorisation algorithm as a recursive linear filter. More precisely, we derive a matrix-variate recursive linear filter in order to perform efficient inference in high dimensions. We also show that it is possible to interpret our algorithm as a nontrivial stochastic gradient algorithm. Demonstrations and comparisons on an image restoration task are given. △ Less

Submitted 7 September, 2015; originally announced September 2015.

Comments: Submitted

arXiv:1506.04389 [pdf, ps, other]

Online Matrix Factorization via Broyden Updates

Authors: Ömer Deniz Akyıldız

Abstract: In this paper, we propose an online algorithm to compute matrix factorizations. Proposed algorithm updates the dictionary matrix and associated coefficients using a single observation at each time. The algorithm performs low-rank updates to dictionary matrix. We derive the algorithm by defining a simple objective function to minimize whenever an observation is arrived. We extend the algorithm furt… ▽ More In this paper, we propose an online algorithm to compute matrix factorizations. Proposed algorithm updates the dictionary matrix and associated coefficients using a single observation at each time. The algorithm performs low-rank updates to dictionary matrix. We derive the algorithm by defining a simple objective function to minimize whenever an observation is arrived. We extend the algorithm further for handling missing data. We also provide a mini-batch extension which enables to compute the matrix factorization on big datasets. We demonstrate the efficiency of our algorithm on a real dataset and give comparisons with well-known algorithms such as stochastic gradient matrix factorization and nonnegative matrix factorization (NMF). △ Less

Submitted 26 June, 2015; v1 submitted 14 June, 2015; originally announced June 2015.

Comments: Submitted. Little revisions on acknowledgements, and added references

Showing 1–26 of 26 results for author: Akyildiz, O D