Search | arXiv e-print repository

Particle Semi-Implicit Variational Inference

Abstract: Semi-implicit variational inference (SIVI) enriches the expressiveness of variational families by utilizing a kernel and a mixing distribution to hierarchically define the variational distribution. Existing SIVI methods parameterize the mixing distribution using implicit distributions, leading to intractable variational densities. As a result, directly maximizing the evidence lower bound (ELBO) is… ▽ More Semi-implicit variational inference (SIVI) enriches the expressiveness of variational families by utilizing a kernel and a mixing distribution to hierarchically define the variational distribution. Existing SIVI methods parameterize the mixing distribution using implicit distributions, leading to intractable variational densities. As a result, directly maximizing the evidence lower bound (ELBO) is not possible and so, they resort to either: optimizing bounds on the ELBO, employing costly inner-loop Markov chain Monte Carlo runs, or solving minimax objectives. In this paper, we propose a novel method for SIVI called Particle Variational Inference (PVI) which employs empirical measures to approximate the optimal mixing distributions characterized as the minimizer of a natural free energy functional via a particle approximation of an Euclidean--Wasserstein gradient flow. This approach means that, unlike prior works, PVI can directly optimize the ELBO; furthermore, it makes no parametric assumption about the mixing distribution. Our empirical results demonstrate that PVI performs favourably against other SIVI methods across various tasks. Moreover, we provide a theoretical analysis of the behaviour of the gradient flow of a related free energy functional: establishing the existence and uniqueness of solutions as well as propagation of chaos results. △ Less

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2403.02004 [pdf, ps, other]

Error bounds for particle gradient descent, and extensions of the log-Sobolev and Talagrand inequalities

Authors: Rocco Caprio, Juan Kuntz, Samuel Power, Adam M. Johansen

Abstract: We prove non-asymptotic error bounds for particle gradient descent (PGD)~(Kuntz et al., 2023), a recently introduced algorithm for maximum likelihood estimation of large latent variable models obtained by discretizing a gradient flow of the free energy. We begin by showing that, for models satisfying a condition generalizing both the log-Sobolev and the Polyak--Łojasiewicz inequalities (LSI and PŁ… ▽ More We prove non-asymptotic error bounds for particle gradient descent (PGD)~(Kuntz et al., 2023), a recently introduced algorithm for maximum likelihood estimation of large latent variable models obtained by discretizing a gradient flow of the free energy. We begin by showing that, for models satisfying a condition generalizing both the log-Sobolev and the Polyak--Łojasiewicz inequalities (LSI and PŁI, respectively), the flow converges exponentially fast to the set of minimizers of the free energy. We achieve this by extending a result well-known in the optimal transport literature (that the LSI implies the Talagrand inequality) and its counterpart in the optimization literature (that the PŁI implies the so-called quadratic growth condition), and applying it to our new setting. We also generalize the Bakry--Émery Theorem and show that the LSI/PŁI generalization holds for models with strongly concave log-likelihoods. For such models, we further control PGD's discretization error, obtaining non-asymptotic error bounds. While we are motivated by the study of PGD, we believe that the inequalities and results we extend may be of independent interest. △ Less

Submitted 11 April, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

arXiv:2312.07335 [pdf, other]

Momentum Particle Maximum Likelihood

Authors: Jen Ning Lim, Juan Kuntz, Samuel Power, Adam M. Johansen

Abstract: Maximum likelihood estimation (MLE) of latent variable models is often recast as the minimization of a free energy functional over an extended space of parameters and probability distributions. This perspective was recently combined with insights from optimal transport to obtain novel particle-based algorithms for fitting latent variable models to data. Drawing inspiration from prior works which i… ▽ More Maximum likelihood estimation (MLE) of latent variable models is often recast as the minimization of a free energy functional over an extended space of parameters and probability distributions. This perspective was recently combined with insights from optimal transport to obtain novel particle-based algorithms for fitting latent variable models to data. Drawing inspiration from prior works which interpret `momentum-enriched' optimization algorithms as discretizations of ordinary differential equations, we propose an analogous dynamical-systems-inspired approach to minimizing the free energy functional. The result is a dynamical system that blends elements of Nesterov's Accelerated Gradient method, the underdamped Langevin diffusion, and particle methods. Under suitable assumptions, we prove that the continuous-time system minimizes the functional. By discretizing the system, we obtain a practical algorithm for MLE in latent variable models. The algorithm outperforms existing particle methods in numerical experiments and compares favourably with other MLE algorithms. △ Less

Submitted 4 June, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

Comments: ICML 2024 camera ready

arXiv:2204.12965 [pdf, other]

Particle algorithms for maximum likelihood training of latent variable models

Authors: Juan Kuntz, Jen Ning Lim, Adam M. Johansen

Abstract: (Neal and Hinton, 1998) recast maximum likelihood estimation of any given latent variable model as the minimization of a free energy functional $F$, and the EM algorithm as coordinate descent applied to $F$. Here, we explore alternative ways to optimize the functional. In particular, we identify various gradient flows associated with $F$ and show that their limits coincide with $F$'s stationary po… ▽ More (Neal and Hinton, 1998) recast maximum likelihood estimation of any given latent variable model as the minimization of a free energy functional $F$, and the EM algorithm as coordinate descent applied to $F$. Here, we explore alternative ways to optimize the functional. In particular, we identify various gradient flows associated with $F$ and show that their limits coincide with $F$'s stationary points. By discretizing the flows, we obtain practical particle-based algorithms for maximum likelihood estimation in broad classes of latent variable models. The novel algorithms scale to high-dimensional settings and perform well in numerical experiments. △ Less

Submitted 18 February, 2023; v1 submitted 27 April, 2022; originally announced April 2022.

arXiv:2002.09998 [pdf, other]

Generalized Bayesian Filtering via Sequential Monte Carlo

Authors: Ayman Boustati, Ömer Deniz Akyildiz, Theodoros Damoulas, Adam M. Johansen

Abstract: We introduce a framework for inference in general state-space hidden Markov models (HMMs) under likelihood misspecification. In particular, we leverage the loss-theoretic perspective of Generalized Bayesian Inference (GBI) to define generalised filtering recursions in HMMs, that can tackle the problem of inference under model misspecification. In doing so, we arrive at principled procedures for ro… ▽ More We introduce a framework for inference in general state-space hidden Markov models (HMMs) under likelihood misspecification. In particular, we leverage the loss-theoretic perspective of Generalized Bayesian Inference (GBI) to define generalised filtering recursions in HMMs, that can tackle the problem of inference under model misspecification. In doing so, we arrive at principled procedures for robust inference against observation contamination by utilising the $β$-divergence. Operationalising the proposed framework is made possible via sequential Monte Carlo methods (SMC), where most standard particle methods, and their associated convergence results, are readily adapted to the new setting. We apply our approach to object tracking and Gaussian process regression problems, and observe improved performance over both standard filtering algorithms and other robust filters. △ Less

Submitted 21 October, 2020; v1 submitted 23 February, 2020; originally announced February 2020.

Showing 1–5 of 5 results for author: Johansen, A M