-
Gromov-Wasserstein-like Distances in the Gaussian Mixture Models Space
Authors:
Antoine Salmona,
Julie Delon,
Agnès Desolneux
Abstract:
The Gromov-Wasserstein (GW) distance is frequently used in machine learning to compare distributions across distinct metric spaces. Despite its utility, it remains computationally intensive, especially for large-scale problems. Recently, a novel Wasserstein distance specifically tailored for Gaussian mixture models (GMMs), known as MW (mixture Wasserstein), has been introduced by several authors. In scenarios where data exhibit clustering, this approach simplifies to a small-scale discrete optimal transport problem, whose complexity depends solely on the number of Gaussian components in the GMMs. This paper aims to extend MW by introducing new Gromov-type distances. These distances are designed to be isometry-invariant in Euclidean spaces and are applicable for comparing GMMs across spaces of different dimensions. Our first contribution is the Mixture Gromov Wasserstein distance (MGW), which can be viewed as a Gromovized version of MW. This new distance has a straightforward discrete formulation, making it highly efficient for estimating distances between GMMs in practical applications. To facilitate the derivation of a transport plan between GMMs, we present a second distance, the Embedded Wasserstein distance (EW). This distance turns out to be closely related to several recent alternatives to Gromov-Wasserstein. We show that EW can be adapted to derive a distance as well as optimal transportation plans between GMMs. We demonstrate the efficiency of these newly proposed distances on medium to large-scale problems, including shape matching and hyperspectral image color transfer.
Submitted 29 March, 2024; v1 submitted 17 October, 2023;
originally announced October 2023.
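To make the "small-scale discrete optimal transport problem" mentioned in the abstract concrete, here is a minimal sketch of the squared MW distance between two one-dimensional GMMs. It assumes the standard formulation in which the ground cost between components is the squared 2-Wasserstein distance between Gaussians, which in 1D is $(m_0 - m_1)^2 + (s_0 - s_1)^2$. The function names `w2_sq_gauss_1d` and `mw2_sq_1d` and the example mixtures are hypothetical, and the linear program is solved with SciPy's general-purpose `linprog` rather than a dedicated OT solver. The sketch covers MW only, not the MGW or EW distances proposed in the paper.

```python
import numpy as np
from scipy.optimize import linprog


def w2_sq_gauss_1d(m0, s0, m1, s1):
    """Squared 2-Wasserstein distance between N(m0, s0^2) and N(m1, s1^2)."""
    return (m0 - m1) ** 2 + (s0 - s1) ** 2


def mw2_sq_1d(weights0, means0, stds0, weights1, means1, stds1):
    """Squared MW distance between two 1D GMMs (illustrative sketch).

    Solves a K0 x K1 discrete optimal transport problem whose size depends
    only on the number of Gaussian components, not on the number of samples.
    """
    a = np.asarray(weights0, dtype=float)
    b = np.asarray(weights1, dtype=float)
    m0, s0 = np.asarray(means0, dtype=float), np.asarray(stds0, dtype=float)
    m1, s1 = np.asarray(means1, dtype=float), np.asarray(stds1, dtype=float)
    K0, K1 = a.size, b.size

    # Pairwise cost between components k (first GMM) and l (second GMM).
    cost = w2_sq_gauss_1d(m0[:, None], s0[:, None], m1[None, :], s1[None, :])

    # Marginal constraints on the flattened (row-major) coupling matrix.
    A_eq = np.zeros((K0 + K1, K0 * K1))
    for k in range(K0):
        A_eq[k, k * K1:(k + 1) * K1] = 1.0   # row k sums to a[k]
    for l in range(K1):
        A_eq[K0 + l, l::K1] = 1.0            # column l sums to b[l]
    b_eq = np.concatenate([a, b])

    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.fun, res.x.reshape(K0, K1)


# Hypothetical example: two 2-component mixtures shifted by one unit.
value, plan = mw2_sq_1d([0.5, 0.5], [0.0, 4.0], [1.0, 1.0],
                        [0.5, 0.5], [1.0, 5.0], [1.0, 1.0])
```

In this example the optimal coupling matches each component with its shifted counterpart, so `value` is approximately 1.0 and `plan` is close to a diagonal matrix with entries 0.5.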
-
Can Push-forward Generative Models Fit Multimodal Distributions?
Authors:
Antoine Salmona,
Valentin de Bortoli,
Julie Delon,
Agnès Desolneux
Abstract:
Many generative models synthesize data by transforming a standard Gaussian random variable using a deterministic neural network. Among these models are the Variational Autoencoders and the Generative Adversarial Networks. In this work, we call them "push-forward" models and study their expressivity. We show that the Lipschitz constant of these generative networks has to be large in order to fit multimodal distributions. More precisely, we show that the total variation distance and the Kullback-Leibler divergence between the generated and the data distribution are bounded from below by a constant depending on the mode separation and the Lipschitz constant. Since constraining the Lipschitz constants of neural networks is a common way to stabilize generative models, there is a provable trade-off between the ability of push-forward models to approximate multimodal distributions and the stability of their training. We validate our findings on one-dimensional and image datasets and empirically show that generative models consisting of stacked networks with stochastic input at each step, such as diffusion models, do not suffer from such limitations.
Submitted 12 October, 2022; v1 submitted 29 June, 2022;
originally announced June 2022.
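The link between mode separation and Lipschitz constant can be illustrated in one dimension. The sketch below is a toy illustration, not the bound proved in the paper: it considers only one particular push-forward map, the increasing map $T = F^{-1} \circ Φ$ that sends $N(0, 1)$ onto the symmetric mixture $0.5\,N(-m, σ^2) + 0.5\,N(m, σ^2)$, and estimates its largest slope numerically. The function names, grid sizes, and the restriction to $z \in [-3, 3]$ are assumptions made for this example.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)


def normal_cdf(x):
    """CDF of the standard Gaussian, applied elementwise."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))


def monotone_map_lipschitz(m, sigma=1.0):
    """Estimate the largest slope, for z in [-3, 3], of the increasing map T
    pushing N(0, 1) onto 0.5*N(-m, sigma^2) + 0.5*N(m, sigma^2)."""
    x = np.linspace(-m - 8.0 * sigma, m + 8.0 * sigma, 200001)
    target_cdf = (0.5 * normal_cdf((x + m) / sigma)
                  + 0.5 * normal_cdf((x - m) / sigma))
    z = np.linspace(-3.0, 3.0, 6001)
    # T = F_target^{-1} o Phi, with the inverse CDF obtained by interpolation.
    T = np.interp(normal_cdf(z), target_cdf, x)
    # Finite-difference estimate of the slope of T on the grid.
    return float(np.max(np.gradient(T, z)))


# Mode separation 2*m: unimodal (m = 0), moderate (m = 1), well separated (m = 3).
slopes = {m: monotone_map_lipschitz(m) for m in (0.0, 1.0, 3.0)}
```

For `m = 0` the map is the identity and the estimated slope is close to 1. As `m` grows, the map must cross the low-density region between the two modes very quickly, so the estimated slope increases sharply.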
-
On quantitative Laplace-type convergence results for some exponential probability measures, with two applications
Authors:
Valentin De Bortoli,
Agnès Desolneux
Abstract:
Laplace-type results characterize the limit of a sequence of measures $(π_\varepsilon)_{\varepsilon >0}$ with density w.r.t. the Lebesgue measure $(\mathrm{d} π_\varepsilon / \mathrm{d} \mathrm{Leb})(x) \propto \exp[-U(x)/\varepsilon]$ when the temperature $\varepsilon>0$ converges to $0$. If a limiting distribution $π_0$ exists, it concentrates on the minimizers of the potential $U$. Classical results require the invertibility of the Hessian of $U$ in order to establish such asymptotics. In this work, we study the particular case of norm-like potentials $U$ and establish quantitative bounds between $π_\varepsilon$ and $π_0$ w.r.t. the Wasserstein distance of order $1$ under an invertibility condition of a generalized Jacobian. One key element of our proof is the use of geometric measure theory tools such as the coarea formula. We apply our results to the study of maximum entropy models (microcanonical/macrocanonical distributions) and to the convergence of the iterates of the Stochastic Gradient Langevin Dynamics (SGLD) algorithm at low temperatures for non-convex minimization.
Submitted 25 October, 2021;
originally announced October 2021.
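The concentration of $π_\varepsilon$ on the minimizers of $U$ can be checked numerically on a toy case. The sketch below assumes the potential $U(x) = |x|$ on the real line, chosen here only for illustration because it is not twice differentiable at its minimizer. In that case $π_\varepsilon$ is a Laplace distribution with scale $\varepsilon$, $π_0 = δ_0$, and $W_1(π_\varepsilon, δ_0) = \int |x| \, \mathrm{d}π_\varepsilon(x) = \varepsilon$. The function name, grid, and truncation to $[-5, 5]$ are assumptions of this example; it does not reproduce the bounds established in the paper.

```python
import numpy as np


def w1_to_dirac_at_zero(eps, half_width=5.0, n=20001):
    """W1 distance between a grid approximation of pi_eps ∝ exp(-|x|/eps)
    and the Dirac mass at 0 (the unique minimizer of U(x) = |x|)."""
    x = np.linspace(-half_width, half_width, n)
    potential = np.abs(x)
    weights = np.exp(-potential / eps)
    weights /= weights.sum()          # normalize to a probability vector
    # W1 to a Dirac mass at 0 is the mean absolute value under pi_eps.
    return float(np.sum(weights * np.abs(x)))


# The distance should shrink linearly with the temperature eps.
distances = {eps: w1_to_dirac_at_zero(eps) for eps in (0.5, 0.1, 0.01)}
```

Up to discretization and truncation error, each entry of `distances` is close to its key, which matches the linear rate $W_1(π_\varepsilon, π_0) = \varepsilon$ for this particular potential.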
-
Maximum entropy methods for texture synthesis: theory and practice
Authors:
Valentin De Bortoli,
Agnes Desolneux,
Alain Durmus,
Bruno Galerne,
Arthur Leclaire
Abstract:
Recent years have seen the rise of convolutional neural network techniques in exemplar-based image synthesis. These methods often rely on the minimization of some variational formulation on the image space for which the minimizers are assumed to be the solutions of the synthesis problem. In this paper we investigate, both theoretically and experimentally, another framework to deal with this problem using an alternate sampling/minimization scheme. First, we use results from information geometry to establish that our method yields a probability measure which has maximum entropy under some constraints in expectation. Then, we turn to the analysis of our method and we show, using recent results from the Markov chain literature, that its error can be explicitly bounded with constants which depend polynomially on the dimension even in the non-convex setting. This includes the case where the constraints are defined via a differentiable neural network. Finally, we present an extensive experimental study of the model, including a comparison with state-of-the-art methods and an extension to style transfer.
Submitted 3 December, 2019;
originally announced December 2019.
-
Exact Sampling of Determinantal Point Processes without Eigendecomposition
Authors:
Claire Launay,
Bruno Galerne,
Agnès Desolneux
Abstract:
Determinantal point processes (DPPs) enable the modeling of repulsion: they provide diverse sets of points. The repulsion is encoded in a kernel $K$ that can be seen as a matrix storing the similarity between points. The diversity comes from the fact that the inclusion probability of a subset is equal to the determinant of a submatrix of $K$. The exact algorithm to sample DPPs uses the spectral decomposition of $K$, a computation that becomes costly when dealing with a high number of points. Here, we present an alternative exact algorithm in the discrete setting that avoids the computation of the eigenvalues and eigenvectors. Instead, it relies on Cholesky decompositions. This is a two-step strategy: first, it samples a Bernoulli point process with an appropriate distribution, then it samples the target DPP distribution through a thinning procedure. Not only is the method used here innovative, but this algorithm can be competitive with the original algorithm or even faster for some applications specified here.
Submitted 22 February, 2021; v1 submitted 23 February, 2018;
originally announced February 2018.
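The statement that inclusion probabilities are determinants of submatrices of $K$ can be illustrated on a small example. The sketch below is not the sampling algorithm of the paper; it only evaluates $P(A \subseteq X) = \det(K_A)$ for a hypothetical 3-point kernel, assumed symmetric with eigenvalues in $[0, 1]$, in which points 0 and 1 are similar and point 2 is unrelated to both. The function name `inclusion_probability` is an assumption of this example.

```python
import numpy as np


def inclusion_probability(K, subset):
    """P(subset ⊆ X) for a DPP X with kernel K: the determinant of the
    submatrix of K indexed by `subset`."""
    idx = np.asarray(subset, dtype=int)
    if idx.size == 0:
        return 1.0  # the empty set is always included
    return float(np.linalg.det(K[np.ix_(idx, idx)]))


# Hypothetical kernel: eigenvalues are 0.9, 0.1 and 0.5, all within [0, 1].
K = np.array([[0.5, 0.4, 0.0],
              [0.4, 0.5, 0.0],
              [0.0, 0.0, 0.5]])

p_similar = inclusion_probability(K, [0, 1])    # 0.5*0.5 - 0.4*0.4
p_unrelated = inclusion_probability(K, [0, 2])  # 0.5*0.5
```

Each point is included with probability 0.5 on its own, but the similar pair appears together with probability 0.09, against 0.25 for the unrelated pair. This is the repulsion effect described in the abstract.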