-
Generative Assignment Flows for Representing and Learning Joint Distributions of Discrete Data
Authors:
Bastian Boll,
Daniel Gonzalez-Alvarado,
Stefania Petra,
Christoph Schnörr
Abstract:
We introduce a novel generative model for the representation of joint probability distributions of a possibly large number of discrete random variables. The approach uses measure transport by randomized assignment flows on the statistical submanifold of factorizing distributions, which also enables efficient sampling from the target distribution and assessment of the likelihood of unseen data points. The embedding of the flow via the Segre map in the meta-simplex of all discrete joint distributions ensures that any target distribution can be represented in principle; in practice, the complexity of the representation depends only on the parametrization of the affinity function of the dynamical assignment flow system. Our model can be trained in a simulation-free manner, i.e. without integration, by conditional Riemannian flow matching, using the training data encoded as closed-form geodesics with respect to the e-connection of information geometry. By projecting high-dimensional flow matching in the meta-simplex of joint distributions to the submanifold of factorizing distributions, our approach has strong motivation from first principles of modeling coupled discrete variables. Numerical experiments devoted to distributions of structured image labelings demonstrate applicability to large-scale problems, which may also arise for discrete distributions in other application areas. Performance measures show that our approach scales better with an increasing number of classes than recent related work.
Submitted 6 June, 2024;
originally announced June 2024.
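The Segre-map embedding can be illustrated in its simplest form. A factorizing distribution over n discrete variables is a tuple of per-variable probability vectors, and its image in the meta-simplex of joint distributions is their outer (tensor) product. The following minimal pure-Python sketch assumes exactly that; the name `segre_embed` is hypothetical and not taken from the authors' code.

```python
from itertools import product
from math import isclose, prod

def segre_embed(marginals):
    """Map per-variable probability vectors to the joint distribution
    they factorize into, i.e. the outer product of the marginals.

    Returns a dict from label configurations (tuples) to probabilities.
    """
    for p in marginals:
        if min(p) < 0 or not isclose(sum(p), 1.0):
            raise ValueError("each marginal must be a probability vector")
    ranges = [range(len(p)) for p in marginals]
    return {
        config: prod(p[k] for p, k in zip(marginals, config))
        for config in product(*ranges)
    }

# Two variables with 2 and 3 classes give 2 * 3 = 6 joint configurations.
joint = segre_embed([[0.25, 0.75], [0.5, 0.3, 0.2]])
print(len(joint))                        # 6
print(round(sum(joint.values()), 12))    # 1.0
```

Each point mass on a single joint configuration is itself the image of one-hot marginals, so every joint distribution is a convex combination of embedded factorizing distributions. This is one way to see why averaging in the meta-simplex can, in principle, reach any target distribution.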
-
Generative Modeling of Discrete Joint Distributions by E-Geodesic Flow Matching on Assignment Manifolds
Authors:
Bastian Boll,
Daniel Gonzalez-Alvarado,
Christoph Schnörr
Abstract:
This paper introduces a novel generative model for discrete distributions based on continuous normalizing flows on the submanifold of factorizing discrete measures. Integration of the flow gradually assigns categories and avoids issues of discretizing a latent continuous model, such as rounding and sample truncation. General non-factorizing discrete distributions, capable of representing complex statistical dependencies of structured discrete data, can be approximated by embedding the submanifold into the meta-simplex of all joint discrete distributions and by data-driven averaging. Efficient training of the generative model is demonstrated by matching the flow of geodesics of factorizing discrete distributions. Various experiments underline the approach's broad applicability.
Submitted 12 February, 2024;
originally announced February 2024.
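The e-geodesics in the title have a simple closed form on the open probability simplex: they are straight lines in log-probabilities followed by normalization, gamma(t) proportional to p^(1-t) * q^t componentwise. On the submanifold of factorizing distributions this applies to each variable separately. The sketch below assumes strictly positive marginals; the function names are hypothetical, and how the paper handles one-hot training labels on the boundary of the simplex is not reproduced.

```python
from math import exp, log

def e_geodesic(p, q, t):
    """Point at time t on the e-geodesic from p (t=0) to q (t=1)
    in the open probability simplex: gamma(t) ∝ p**(1-t) * q**t."""
    if min(p) <= 0 or min(q) <= 0:
        raise ValueError("e-geodesics here need strictly positive distributions")
    logits = [(1 - t) * log(a) + t * log(b) for a, b in zip(p, q)]
    m = max(logits)                      # subtract the max for numerical stability
    w = [exp(z - m) for z in logits]
    s = sum(w)
    return [x / s for x in w]

def factorized_e_geodesic(P, Q, t):
    """Apply the e-geodesic independently to each variable's marginal."""
    return [e_geodesic(p, q, t) for p, q in zip(P, Q)]

P = [[0.5, 0.5], [1 / 3, 1 / 3, 1 / 3]]   # uniform marginals
Q = [[0.9, 0.1], [0.1, 0.1, 0.8]]
mid = factorized_e_geodesic(P, Q, 0.5)
```

Interpolating the probabilities linearly instead would give the m-geodesic; the two curves share their endpoints but differ in between.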
-
On the Universality of Coupling-based Normalizing Flows
Authors:
Felix Draxler,
Stefan Wahl,
Christoph Schnörr,
Ullrich Köthe
Abstract:
We present a novel theoretical framework for understanding the expressive power of normalizing flows. Despite their prevalence in scientific applications, a comprehensive understanding of flows remains elusive due to their restricted architectures. Existing theorems fall short as they require the use of arbitrarily ill-conditioned neural networks, limiting practical applicability. We propose a distributional universality theorem for well-conditioned coupling-based normalizing flows such as RealNVP. In addition, we show that volume-preserving normalizing flows are not universal, characterize which distribution they learn instead, and show how to fix their expressivity. Our results support the general wisdom that affine and related couplings are expressive and in general outperform volume-preserving flows, bridging a gap between empirical results and theoretical understanding.
Submitted 5 June, 2024; v1 submitted 9 February, 2024;
originally announced February 2024.
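To make the affine versus volume-preserving distinction concrete: an affine coupling layer of the RealNVP kind keeps one part of the input fixed and scales and shifts the other part with functions of the fixed part, so its Jacobian is triangular and the log-determinant is just the sum of the log-scales. Fixing the scale to one gives a volume-preserving (additive) coupling with log-determinant zero. A minimal two-dimensional sketch; the conditioner functions `scale` and `shift` are hypothetical stand-ins for neural networks.

```python
import math

# Hypothetical conditioner functions; in RealNVP these would be neural networks of x1.
def scale(x1):
    return 0.5 * math.tanh(x1)

def shift(x1):
    return 2.0 * x1 - 1.0

def coupling_forward(x, volume_preserving=False):
    """Affine coupling on a 2-vector: keep x1, scale and shift x2 using x1.

    Returns (y, log|det J|). The Jacobian is triangular with diagonal
    (1, exp(s)), so log|det J| = s; s = 0 gives a volume-preserving layer.
    """
    x1, x2 = x
    s = 0.0 if volume_preserving else scale(x1)
    y2 = x2 * math.exp(s) + shift(x1)
    return (x1, y2), s

def coupling_inverse(y, volume_preserving=False):
    """Exact inverse: y1 equals x1, so the scale and shift can be recomputed from it."""
    y1, y2 = y
    s = 0.0 if volume_preserving else scale(y1)
    x2 = (y2 - shift(y1)) * math.exp(-s)
    return (y1, x2)

x = (0.3, -1.2)
y, log_det = coupling_forward(x)
x_rec = coupling_inverse(y)   # equals x up to floating-point rounding
```

Stacking such layers while alternating which coordinate is kept fixed is what lets coupling flows transform all dimensions.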
-
On the Convergence Rate of Gaussianization with Random Rotations
Authors:
Felix Draxler,
Lars Kühmichel,
Armand Rousselot,
Jens Müller,
Christoph Schnörr,
Ullrich Köthe
Abstract:
Gaussianization is a simple generative model that can be trained without backpropagation. It has shown compelling performance on low-dimensional data. As the dimension increases, however, it has been observed that the convergence speed slows down. We show analytically that the number of required layers scales linearly with the dimension for Gaussian input. We argue that this is because the model is unable to capture dependencies between dimensions. Empirically, we find the same linear increase in cost for arbitrary input $p(x)$, but observe favorable scaling for some distributions. We explore potential speed-ups and formulate challenges for further research.
Submitted 23 June, 2023;
originally announced June 2023.
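Gaussianization with random rotations is commonly built from layers that apply a random rotation followed by marginal Gaussianization of each coordinate. Because the marginal step acts on every dimension independently, dependencies can only be removed through the rotations, which is in line with the explanation given in the abstract. The 2-D sketch below uses only the standard library; the rank-based marginal transform, the toy data, and all names are assumptions for illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def marginal_gaussianize(values):
    """Map one coordinate to standard-normal quantiles via its empirical ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order):
        out[i] = STD_NORMAL.inv_cdf((rank + 0.5) / n)
    return out

def random_rotation_2d(rng):
    """Uniformly random rotation matrix in the plane."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def gaussianization_layer(points, rng):
    """One layer: random rotation, then per-dimension marginal Gaussianization."""
    (a, b), (c, d) = random_rotation_2d(rng)
    rotated = [(a * x + b * y, c * x + d * y) for x, y in points]
    xs = marginal_gaussianize([p[0] for p in rotated])
    ys = marginal_gaussianize([p[1] for p in rotated])
    return list(zip(xs, ys))

rng = random.Random(0)
# Dependent, non-Gaussian toy data: y is a noisy cubic function of x.
data = []
for _ in range(500):
    u = rng.uniform(-1.0, 1.0)
    data.append((u, u ** 3 + 0.1 * rng.gauss(0.0, 1.0)))

z = data
for _ in range(10):
    z = gaussianization_layer(z, rng)
```

Each marginal step returns symmetric normal quantiles per coordinate, so the marginals are standardized after every layer; how many layers are needed before the joint distribution is close to Gaussian is the quantity the paper analyzes.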
-
On Certified Generalization in Structured Prediction
Authors:
Bastian Boll,
Christoph Schnörr
Abstract:
In structured prediction, target objects have rich internal structure which does not factorize into independent components and violates common i.i.d. assumptions. This challenge becomes apparent through the exponentially large output space in applications such as image segmentation or scene graph generation. We present a novel PAC-Bayesian risk bound for structured prediction wherein the rate of generalization scales not only with the number of structured examples but also with their size. The underlying assumption, conforming to ongoing research on generative models, is that data are generated by the Knothe-Rosenblatt rearrangement of a factorizing reference measure. This allows us to explicitly distill the structure between random output variables into a Wasserstein dependency matrix. Our work makes a preliminary step towards leveraging powerful generative models to establish generalization bounds for discriminative downstream tasks in the challenging setting of structured prediction.
Submitted 16 October, 2023; v1 submitted 15 June, 2023;
originally announced June 2023.
-
Whitening Convergence Rate of Coupling-based Normalizing Flows
Authors:
Felix Draxler,
Christoph Schnörr,
Ullrich Köthe
Abstract:
Coupling-based normalizing flows (e.g. RealNVP) are a popular family of normalizing flow architectures that work surprisingly well in practice. This calls for theoretical understanding. Existing work shows that such flows weakly converge to arbitrary data distributions. However, these results make no statement about the stricter convergence criterion used in practice, the maximum likelihood loss. For the first time, we make a quantitative statement about this kind of convergence: We prove that all coupling-based normalizing flows perform whitening of the data distribution (i.e. diagonalize the covariance matrix) and derive corresponding convergence bounds that show a linear convergence rate in the depth of the flow. Numerical experiments demonstrate the implications of our theory and point at open questions.
Submitted 25 October, 2022;
originally announced October 2022.
-
Self-Certifying Classification by Linearized Deep Assignment
Authors:
Bastian Boll,
Alexander Zeilmann,
Stefania Petra,
Christoph Schnörr
Abstract:
We propose a novel class of deep stochastic predictors for classifying metric data on graphs within the PAC-Bayes risk certification paradigm. Classifiers are realized as linearly parametrized deep assignment flows with random initial conditions. Building on the recent PAC-Bayes literature and data-dependent priors, this approach enables (i) using risk bounds as training objectives for learning posterior distributions on the hypothesis space and (ii) computing tight out-of-sample risk certificates of randomized classifiers more efficiently than related work. Comparison with empirical test set errors illustrates the performance and practicality of this self-certifying classification method.
Submitted 18 February, 2022; v1 submitted 26 January, 2022;
originally announced January 2022.
-
Approximate Variational Inference Based on a Finite Sample of Gaussian Latent Variables
Authors:
Nikolaos Gianniotis,
Christoph Schnörr,
Christian Molkenthin,
Sanjay Singh Bora
Abstract:
Variational methods are employed in situations where exact Bayesian inference becomes intractable due to the difficulty in performing certain integrals. Typically, variational methods postulate a tractable posterior and formulate a lower bound on the desired integral to be approximated, e.g. the marginal likelihood. The lower bound is then optimised with respect to its free parameters, the so-called variational parameters. However, this is not always possible, as for certain integrals it is very challenging (or tedious) to come up with a suitable lower bound. Here we propose a simple scheme that overcomes some of the awkward cases where the usual variational treatment becomes difficult. The scheme relies on a rewriting of the lower bound on the model log-likelihood. We demonstrate the proposed scheme on a number of synthetic and real examples, as well as on a real geophysical model for which the standard variational approaches are inapplicable.
Submitted 11 June, 2019;
originally announced June 2019.
-
Fast Multivariate Log-Concave Density Estimation
Authors:
Fabian Rathke,
Christoph Schnörr
Abstract:
A novel computational approach to log-concave density estimation is proposed. Previous approaches utilize the piecewise-affine parametrization of the density induced by the given sample set. The number of parameters as well as non-smooth subgradient-based convex optimization for determining the maximum likelihood density estimate cause long runtimes for dimensions $d \geq 2$ and large sample sets. The presented approach is based on mildly non-convex smooth approximations of the objective function and sparse, adaptive piecewise-affine density parametrization. Established memory-efficient numerical optimization techniques make it possible to process larger data sets for dimensions $d \geq 2$. While there is no guarantee that the algorithm returns the maximum likelihood estimate for every problem instance, we provide comprehensive numerical evidence that it does yield near-optimal results after significantly shorter runtimes. For example, $10000$ samples in $\mathbb{R}^2$ are processed in two seconds, rather than in $\approx 14$ hours required by the previous approach to terminate. For higher dimensions, density estimation becomes tractable as well: Processing $10000$ samples in $\mathbb{R}^6$ requires 35 minutes. The software is publicly available as CRAN R package fmlogcondens.
Submitted 20 February, 2019; v1 submitted 18 May, 2018;
originally announced May 2018.
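One way to picture a smooth approximation of this problem: a concave piecewise-affine log-density phi(x) = min_k (a_k x + b_k) has kinks where the minimizing piece changes, and replacing the minimum by a soft-minimum -gamma * log(sum_k exp(-(a_k x + b_k) / gamma)) gives a smooth function within gamma * log(K) below phi. The 1-D sketch below evaluates this smoothed log-density and the usual log-concave maximum-likelihood objective, the sample mean of phi minus the integral of exp(phi), with the integral approximated by a midpoint Riemann sum. The soft-min smoothing, the quadrature, and all names are assumptions for illustration, not the fmlogcondens implementation.

```python
import math

def soft_min(values, gamma):
    """Smooth lower approximation of min(values):
    min(values) - gamma*log(K) <= soft_min <= min(values)."""
    m = min(values)
    s = sum(math.exp(-(v - m) / gamma) for v in values)   # 1 <= s <= K
    return m - gamma * math.log(s)

def smoothed_log_density(x, pieces, gamma):
    """Smoothed phi(x) for a concave piecewise-affine phi given as (a_k, b_k) pairs (1-D)."""
    return soft_min([a * x + b for a, b in pieces], gamma)

def ml_objective(samples, pieces, gamma, lo, hi, n_grid=2000):
    """Mean of phi over the samples minus a midpoint-rule estimate
    of the integral of exp(phi) over [lo, hi]."""
    data_term = sum(smoothed_log_density(x, pieces, gamma) for x in samples) / len(samples)
    h = (hi - lo) / n_grid
    integral = h * sum(
        math.exp(smoothed_log_density(lo + (i + 0.5) * h, pieces, gamma))
        for i in range(n_grid)
    )
    return data_term - integral

# phi(x) = min(x, -x) - log 2 = -|x| - log 2, the Laplace log-density.
pieces = [(1.0, -math.log(2.0)), (-1.0, -math.log(2.0))]
value = ml_objective([0.0], pieces, gamma=0.01, lo=-30.0, hi=30.0)
```

Shrinking gamma tightens the approximation at the cost of sharper curvature near the former kinks, which is the usual trade-off when smoothing a max/min-type objective for gradient-based optimizers.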
-
Sum-Product Graphical Models
Authors:
Mattia Desana,
Christoph Schnörr
Abstract:
This paper introduces a new probabilistic architecture called Sum-Product Graphical Model (SPGM). SPGMs combine traits from Sum-Product Networks (SPNs) and Graphical Models (GMs): Like SPNs, SPGMs always enable tractable inference using a class of models that incorporate context-specific independence. Like GMs, SPGMs provide a high-level model interpretation in terms of conditional independence assumptions and corresponding factorizations. Thus, the new architecture represents a class of probability distributions that combines, for the first time, the semantics of graphical models with the evaluation efficiency of SPNs. We also propose a novel algorithm for learning both the structure and the parameters of SPGMs. A comparative empirical evaluation demonstrates competitive performance of our approach in density estimation.
Submitted 21 August, 2017;
originally announced August 2017.
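The tractable inference that SPGMs share with SPNs can be illustrated on a tiny SPN over two binary variables: leaves are indicator functions, product nodes multiply children with disjoint variable scopes, and sum nodes form weighted mixtures. A marginal is computed in a single bottom-up pass by setting all indicators of the summed-out variable to 1. This is a generic SPN sketch with hypothetical weights, not the SPGM architecture or the learning algorithm from the paper.

```python
from itertools import product

def indicators(assignment):
    """Leaf values for binary variables A and B.
    assignment maps a variable to 0, 1, or None/missing (= marginalized)."""
    leaves = {}
    for var in ("A", "B"):
        val = assignment.get(var)
        for k in (0, 1):
            leaves[(var, k)] = 1.0 if val is None or val == k else 0.0
    return leaves

def spn_value(assignment):
    """Evaluate a small valid SPN: a root sum node over two product nodes,
    each multiplying a distribution over A with a distribution over B."""
    x = indicators(assignment)
    # Sum nodes over leaf indicators (hypothetical weights, each summing to one).
    a1 = 0.8 * x[("A", 0)] + 0.2 * x[("A", 1)]
    b1 = 0.3 * x[("B", 0)] + 0.7 * x[("B", 1)]
    a2 = 0.1 * x[("A", 0)] + 0.9 * x[("A", 1)]
    b2 = 0.6 * x[("B", 0)] + 0.4 * x[("B", 1)]
    # Product nodes over the disjoint scopes {A} and {B}, then the root sum node.
    return 0.5 * (a1 * b1) + 0.5 * (a2 * b2)

joint = {(a, b): spn_value({"A": a, "B": b}) for a, b in product((0, 1), repeat=2)}
p_a0 = spn_value({"A": 0})   # P(A=0): B summed out in one pass
```

The marginal costs one network evaluation regardless of how many variables are summed out, which is the evaluation efficiency the abstract attributes to SPNs.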
-
Symmetry-free SDP Relaxations for Affine Subspace Clustering
Authors:
Francesco Silvestri,
Gerhard Reinelt,
Christoph Schnörr
Abstract:
We consider clustering problems where the goal is to determine an optimal partition of a given point set in Euclidean space in terms of a collection of affine subspaces. While there is a vast literature on heuristics for this kind of problem, such approaches are known to be susceptible to poor initialization and to getting trapped in bad local optima. We alleviate these issues by introducing a semidefinite relaxation based on Lasserre's method of moments. While a similar approach is known for classical Euclidean clustering problems, a generalization to our more general subspace scenario is not straightforward, due to the high symmetry of the objective function that weakens any convex relaxation. We therefore introduce a new mechanism for symmetry breaking based on covering the feasible region with polytopes. Additionally, we introduce and analyze a deterministic rounding heuristic.
Submitted 25 July, 2016;
originally announced July 2016.
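The objective and its symmetry can be written down directly. For a partition of the points into K groups with one affine subspace per group, the cost is the sum of squared Euclidean distances of each point to its group's subspace; relabeling the groups and permuting the subspaces along with them yields exactly the same cost, so each solution has K! equivalent labelings. The sketch below shows this for affine lines in the plane; it covers only the objective, not the Lasserre moment relaxation or the polytope-based symmetry breaking, and all names and data are hypothetical.

```python
import math
from itertools import permutations

def dist2_to_affine_line(p, anchor, direction):
    """Squared Euclidean distance from point p to the line anchor + t*direction (2-D)."""
    dx, dy = direction
    norm = math.hypot(dx, dy)
    ux, uy = dx / norm, dy / norm
    vx, vy = p[0] - anchor[0], p[1] - anchor[1]
    t = vx * ux + vy * uy                  # coordinate of the projection along the line
    rx, ry = vx - t * ux, vy - t * uy      # residual orthogonal to the line
    return rx * rx + ry * ry

def clustering_cost(points, labels, lines):
    """Sum of squared distances of each point to the line of its assigned cluster."""
    return sum(dist2_to_affine_line(p, *lines[k]) for p, k in zip(points, labels))

points = [(0.0, 1.0), (1.0, 1.1), (2.0, 0.9), (0.0, 0.0), (1.0, 2.0), (2.0, 4.1)]
labels = [0, 0, 0, 1, 1, 1]
lines = [((0.0, 1.0), (1.0, 0.0)),   # the line y = 1
         ((0.0, 0.0), (1.0, 2.0))]   # the line y = 2x
base = clustering_cost(points, labels, lines)

# Relabeling clusters and permuting the subspaces accordingly leaves the cost unchanged.
for perm in permutations(range(len(lines))):
    relabeled = [perm[k] for k in labels]
    permuted_lines = [None] * len(lines)
    for old, new in enumerate(perm):
        permuted_lines[new] = lines[old]
    assert math.isclose(clustering_cost(points, relabeled, permuted_lines), base)
```

A convex relaxation cannot distinguish these K! copies, so its optimum tends toward their average, which is the weakening effect the abstract refers to.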
-
Shape from Texture using Locally Scaled Point Processes
Authors:
Eva-Maria Didden,
Thordis Linda Thorarinsdottir,
Alex Lenkoski,
Christoph Schnörr
Abstract:
Shape from texture refers to the extraction of 3D information from 2D images with irregular texture. This paper introduces a statistical framework to learn shape from texture where convex texture elements in a 2D image are represented through a point process. In a first step, the 2D image is preprocessed to generate a probability map corresponding to an estimate of the unnormalized intensity of the latent point process underlying the texture elements. The latent point process is subsequently inferred from the probability map in a non-parametric, model-free manner. Finally, the 3D information is extracted from the point pattern by applying a locally scaled point process model where the local scaling function represents the deformation caused by the projection of a 3D surface onto a 2D image.
Submitted 28 November, 2013;
originally announced November 2013.