Search | arXiv e-print repository

Intriguing properties of generative classifiers

Authors: Priyank Jaini, Kevin Clark, Robert Geirhos

Abstract: What is the best paradigm to recognize objects -- discriminative inference (fast but potentially prone to shortcut learning) or using a generative model (slow but potentially more robust)? We build on recent advances in generative modeling that turn text-to-image models into classifiers. This allows us to study their behavior and to compare them against discriminative models and human psychophysic… ▽ More What is the best paradigm to recognize objects -- discriminative inference (fast but potentially prone to shortcut learning) or using a generative model (slow but potentially more robust)? We build on recent advances in generative modeling that turn text-to-image models into classifiers. This allows us to study their behavior and to compare them against discriminative models and human psychophysical data. We report four intriguing emergent properties of generative classifiers: they show a record-breaking human-like shape bias (99% for Imagen), near human-level out-of-distribution accuracy, state-of-the-art alignment with human classification errors, and they understand certain perceptual illusions. Our results indicate that while the current dominant paradigm for modeling human object recognition is discriminative inference, zero-shot generative models approximate human object recognition data surprisingly well. △ Less

Submitted 14 February, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

Comments: ICLR 2024 Spotlight

arXiv:2303.15233 [pdf, other]

Text-to-Image Diffusion Models are Zero-Shot Classifiers

Authors: Kevin Clark, Priyank Jaini

Abstract: The excellent generative capabilities of text-to-image diffusion models suggest they learn informative representations of image-text data. However, what knowledge their representations capture is not fully understood, and they have not been thoroughly explored on downstream tasks. We investigate diffusion models by proposing a method for evaluating them as zero-shot classifiers. The key idea is us… ▽ More The excellent generative capabilities of text-to-image diffusion models suggest they learn informative representations of image-text data. However, what knowledge their representations capture is not fully understood, and they have not been thoroughly explored on downstream tasks. We investigate diffusion models by proposing a method for evaluating them as zero-shot classifiers. The key idea is using a diffusion model's ability to denoise a noised image given a text description of a label as a proxy for that label's likelihood. We apply our method to Stable Diffusion and Imagen, using it to probe fine-grained aspects of the models' knowledge and comparing them with CLIP's zero-shot abilities. They perform competitively with CLIP on a wide range of zero-shot image classification datasets. Additionally, they achieve state-of-the-art results on shape/texture bias tests and can successfully perform attribute binding while CLIP cannot. Although generative pre-training is prevalent in NLP, visual foundation models often use other methods such as contrastive learning. Based on our findings, we argue that generative pre-training should be explored as a compelling alternative for vision-language tasks. △ Less

Submitted 5 September, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

arXiv:2207.02149 [pdf, other]

Stochastic Optimal Control for Collective Variable Free Sampling of Molecular Transition Paths

Authors: Lars Holdijk, Yuanqi Du, Ferry Hooft, Priyank Jaini, Bernd Ensing, Max Welling

Abstract: We consider the problem of sampling transition paths between two given metastable states of a molecular system, e.g. a folded and unfolded protein or products and reactants of a chemical reaction. Due to the existence of high energy barriers separating the states, these transition paths are unlikely to be sampled with standard Molecular Dynamics (MD) simulation. Traditional methods to augment MD w… ▽ More We consider the problem of sampling transition paths between two given metastable states of a molecular system, e.g. a folded and unfolded protein or products and reactants of a chemical reaction. Due to the existence of high energy barriers separating the states, these transition paths are unlikely to be sampled with standard Molecular Dynamics (MD) simulation. Traditional methods to augment MD with a bias potential to increase the probability of the transition rely on a dimensionality reduction step based on Collective Variables (CVs). Unfortunately, selecting appropriate CVs requires chemical intuition and traditional methods are therefore not always applicable to larger systems. Additionally, when incorrect CVs are used, the bias potential might not be minimal and bias the system along dimensions irrelevant to the transition. Showing a formal relation between the problem of sampling molecular transition paths, the Schrödinger bridge problem and stochastic optimal control with neural network policies, we propose a machine learning method for sampling said transitions. Unlike previous non-machine learning approaches our method, named PIPS, does not depend on CVs. We show that our method successful generates low energy transitions for Alanine Dipeptide as well as the larger Polyproline and Chignolin proteins. △ Less

Submitted 18 July, 2023; v1 submitted 27 June, 2022; originally announced July 2022.

arXiv:2111.13772 [pdf, other]

Particle Dynamics for Learning EBMs

Authors: Kirill Neklyudov, Priyank Jaini, Max Welling

Abstract: Energy-based modeling is a promising approach to unsupervised learning, which yields many downstream applications from a single model. The main difficulty in learning energy-based models with the "contrastive approaches" is the generation of samples from the current energy function at each iteration. Many advances have been made to accomplish this subroutine cheaply. Nevertheless, all such samplin… ▽ More Energy-based modeling is a promising approach to unsupervised learning, which yields many downstream applications from a single model. The main difficulty in learning energy-based models with the "contrastive approaches" is the generation of samples from the current energy function at each iteration. Many advances have been made to accomplish this subroutine cheaply. Nevertheless, all such sampling paradigms run MCMC targeting the current model, which requires infinitely long chains to generate samples from the true energy distribution and is problematic in practice. This paper proposes an alternative approach to getting these samples and avoiding crude MCMC sampling from the current model. We accomplish this by viewing the evolution of the modeling distribution as (i) the evolution of the energy function, and (ii) the evolution of the samples from this distribution along some vector field. We subsequently derive this time-dependent vector field such that the particles following this field are approximately distributed as the current density model. Thereby we match the evolution of the particles with the evolution of the energy function prescribed by the learning procedure. Importantly, unlike Monte Carlo sampling, our method targets to match the current distribution in a finite time. Finally, we demonstrate its effectiveness empirically compared to MCMC-based learning methods. △ Less

Submitted 26 November, 2021; originally announced November 2021.

arXiv:2106.07832 [pdf, other]

Learning Equivariant Energy Based Models with Equivariant Stein Variational Gradient Descent

Authors: Priyank Jaini, Lars Holdijk, Max Welling

Abstract: We focus on the problem of efficient sampling and learning of probability densities by incorporating symmetries in probabilistic models. We first introduce Equivariant Stein Variational Gradient Descent algorithm -- an equivariant sampling method based on Stein's identity for sampling from densities with symmetries. Equivariant SVGD explicitly incorporates symmetry information in a density through… ▽ More We focus on the problem of efficient sampling and learning of probability densities by incorporating symmetries in probabilistic models. We first introduce Equivariant Stein Variational Gradient Descent algorithm -- an equivariant sampling method based on Stein's identity for sampling from densities with symmetries. Equivariant SVGD explicitly incorporates symmetry information in a density through equivariant kernels which makes the resultant sampler efficient both in terms of sample complexity and the quality of generated samples. Subsequently, we define equivariant energy based models to model invariant densities that are learned using contrastive divergence. By utilizing our equivariant SVGD for training equivariant EBMs, we propose new ways of improving and scaling up training of energy based models. We apply these equivariant energy models for modelling joint densities in regression and classification tasks for image datasets, many-body particle systems and molecular structure generation. △ Less

Submitted 29 July, 2021; v1 submitted 14 June, 2021; originally announced June 2021.

arXiv:2102.05379 [pdf, other]

Argmax Flows and Multinomial Diffusion: Learning Categorical Distributions

Authors: Emiel Hoogeboom, Didrik Nielsen, Priyank Jaini, Patrick Forré, Max Welling

Abstract: Generative flows and diffusion models have been predominantly trained on ordinal data, for example natural images. This paper introduces two extensions of flows and diffusion for categorical data such as language or image segmentation: Argmax Flows and Multinomial Diffusion. Argmax Flows are defined by a composition of a continuous distribution (such as a normalizing flow), and an argmax function.… ▽ More Generative flows and diffusion models have been predominantly trained on ordinal data, for example natural images. This paper introduces two extensions of flows and diffusion for categorical data such as language or image segmentation: Argmax Flows and Multinomial Diffusion. Argmax Flows are defined by a composition of a continuous distribution (such as a normalizing flow), and an argmax function. To optimize this model, we learn a probabilistic inverse for the argmax that lifts the categorical data to a continuous space. Multinomial Diffusion gradually adds categorical noise in a diffusion process, for which the generative denoising process is learned. We demonstrate that our method outperforms existing dequantization approaches on text modelling and modelling on image segmentation maps in log-likelihood. △ Less

Submitted 22 October, 2021; v1 submitted 10 February, 2021; originally announced February 2021.

Comments: Accepted at Neural Information Processing Systems (NeurIPS 2021)

arXiv:2102.02374 [pdf, other]

Sampling in Combinatorial Spaces with SurVAE Flow Augmented MCMC

Authors: Priyank Jaini, Didrik Nielsen, Max Welling

Abstract: Hybrid Monte Carlo is a powerful Markov Chain Monte Carlo method for sampling from complex continuous distributions. However, a major limitation of HMC is its inability to be applied to discrete domains due to the lack of gradient signal. In this work, we introduce a new approach based on augmenting Monte Carlo methods with SurVAE Flows to sample from discrete distributions using a combination of… ▽ More Hybrid Monte Carlo is a powerful Markov Chain Monte Carlo method for sampling from complex continuous distributions. However, a major limitation of HMC is its inability to be applied to discrete domains due to the lack of gradient signal. In this work, we introduce a new approach based on augmenting Monte Carlo methods with SurVAE Flows to sample from discrete distributions using a combination of neural transport methods like normalizing flows and variational dequantization, and the Metropolis-Hastings rule. Our method first learns a continuous embedding of the discrete space using a surjective map and subsequently learns a bijective transformation from the continuous space to an approximately Gaussian distributed latent variable. Sampling proceeds by simulating MCMC chains in the latent space and map** these samples to the target discrete space via the learned transformations. We demonstrate the efficacy of our algorithm on a range of examples from statistics, computational physics and machine learning, and observe improvements compared to alternative algorithms. △ Less

Submitted 1 March, 2021; v1 submitted 3 February, 2021; originally announced February 2021.

Comments: Accepted at AISTATS 2021; added experiments with longer MCMC chains

arXiv:2011.07248 [pdf, other]

Self Normalizing Flows

Authors: T. Anderson Keller, Jorn W. T. Peters, Priyank Jaini, Emiel Hoogeboom, Patrick Forré, Max Welling

Abstract: Efficient gradient computation of the Jacobian determinant term is a core problem in many machine learning settings, and especially so in the normalizing flow framework. Most proposed flow models therefore either restrict to a function class with easy evaluation of the Jacobian determinant, or an efficient estimator thereof. However, these restrictions limit the performance of such density models,… ▽ More Efficient gradient computation of the Jacobian determinant term is a core problem in many machine learning settings, and especially so in the normalizing flow framework. Most proposed flow models therefore either restrict to a function class with easy evaluation of the Jacobian determinant, or an efficient estimator thereof. However, these restrictions limit the performance of such density models, frequently requiring significant depth to reach desired performance levels. In this work, we propose Self Normalizing Flows, a flexible framework for training normalizing flows by replacing expensive terms in the gradient by learned approximate inverses at each layer. This reduces the computational complexity of each layer's exact update from $\mathcal{O}(D^3)$ to $\mathcal{O}(D^2)$, allowing for the training of flow architectures which were otherwise computationally infeasible, while also providing efficient sampling. We show experimentally that such models are remarkably stable and optimize to similar data likelihood values as their exact gradient counterparts, while training more quickly and surpassing the performance of functionally constrained counterparts. △ Less

Submitted 9 June, 2021; v1 submitted 14 November, 2020; originally announced November 2020.

arXiv:2007.02731 [pdf, other]

SurVAE Flows: Surjections to Bridge the Gap between VAEs and Flows

Authors: Didrik Nielsen, Priyank Jaini, Emiel Hoogeboom, Ole Winther, Max Welling

Abstract: Normalizing flows and variational autoencoders are powerful generative models that can represent complicated density functions. However, they both impose constraints on the models: Normalizing flows use bijective transformations to model densities whereas VAEs learn stochastic transformations that are non-invertible and thus typically do not provide tractable estimates of the marginal likelihood.… ▽ More Normalizing flows and variational autoencoders are powerful generative models that can represent complicated density functions. However, they both impose constraints on the models: Normalizing flows use bijective transformations to model densities whereas VAEs learn stochastic transformations that are non-invertible and thus typically do not provide tractable estimates of the marginal likelihood. In this paper, we introduce SurVAE Flows: A modular framework of composable transformations that encompasses VAEs and normalizing flows. SurVAE Flows bridge the gap between normalizing flows and VAEs with surjective transformations, wherein the transformations are deterministic in one direction -- thereby allowing exact likelihood computation, and stochastic in the reverse direction -- hence providing a lower bound on the corresponding likelihood. We show that several recently proposed methods, including dequantization and augmented normalizing flows, can be expressed as SurVAE Flows. Finally, we introduce common operations such as the max value, the absolute value, sorting and stochastic permutation as composable layers in SurVAE Flows. △ Less

Submitted 30 October, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

arXiv:2003.03731 [pdf, ps, other]

A Positivstellensatz for Conditional SAGE Signomials

Authors: Allen Houze Wang, Priyank Jaini, Yaoliang Yu, Pascal Poupart

Abstract: Recently, the conditional SAGE certificate has been proposed as a sufficient condition for signomial positivity over a convex set. In this article, we show that the conditional SAGE certificate is $\textit{complete}$. That is, for any signomial $f(\mathbf{x}) = \sum_{j=1}^{\ell}c_j \exp(\mathbf{A}_j\mathbf{x})$ defined by rational exponents that is positive over a compact convex set $\mathcal{X}$,… ▽ More Recently, the conditional SAGE certificate has been proposed as a sufficient condition for signomial positivity over a convex set. In this article, we show that the conditional SAGE certificate is $\textit{complete}$. That is, for any signomial $f(\mathbf{x}) = \sum_{j=1}^{\ell}c_j \exp(\mathbf{A}_j\mathbf{x})$ defined by rational exponents that is positive over a compact convex set $\mathcal{X}$, there is $p \in \mathbb{Z}_+$ and a specific positive definite function $w(\mathbf{x})$ such that $w(\mathbf{x})^p f(\mathbf{x})$ may be verified by the conditional SAGE certificate. The completeness result is analogous to Positivstellensatz results from algebraic geometry, which guarantees representation of positive polynomials with sum of squares polynomials. The result gives rise to a convergent hierarchy of lower bounds for constrained signomial optimization over an $\textit{arbitrary}$ compact convex set that is computable via the conditional SAGE certificate. △ Less

Submitted 24 October, 2020; v1 submitted 8 March, 2020; originally announced March 2020.

Comments: 19 pages, preprint

arXiv:1907.04481 [pdf, other]

Tails of Lipschitz Triangular Flows

Authors: Priyank Jaini, Ivan Kobyzev, Yaoliang Yu, Marcus Brubaker

Abstract: We investigate the ability of popular flow based methods to capture tail-properties of a target density by studying the increasing triangular maps used in these flow methods acting on a tractable source density. We show that the density quantile functions of the source and target density provide a precise characterization of the slope of transformation required to capture tails in a target density… ▽ More We investigate the ability of popular flow based methods to capture tail-properties of a target density by studying the increasing triangular maps used in these flow methods acting on a tractable source density. We show that the density quantile functions of the source and target density provide a precise characterization of the slope of transformation required to capture tails in a target density. We further show that any Lipschitz-continuous transport map acting on a source density will result in a density with similar tail properties as the source, highlighting the trade-off between a complex source density and a sufficiently expressive transformation to capture desirable properties of a target density. Subsequently, we illustrate that flow models like Real-NVP, MAF, and Glow as implemented originally lack the ability to capture a distribution with non-Gaussian tails. We circumvent this problem by proposing tail-adaptive flows consisting of a source distribution that can be learned simultaneously with the triangular map to capture tail-properties of a target density. We perform several synthetic and real-world experiments to compliment our theoretical findings. △ Less

Submitted 18 September, 2020; v1 submitted 9 July, 2019; originally announced July 2019.

Comments: Published at the 37th International Conference of Machine Learning, (ICML 2020)

arXiv:1905.02325 [pdf, other]

Sum-of-Squares Polynomial Flow

Authors: Priyank Jaini, Kira A. Selby, Yaoliang Yu

Abstract: Triangular map is a recent construct in probability theory that allows one to transform any source probability density function to any target density function. Based on triangular maps, we propose a general framework for high-dimensional density estimation, by specifying one-dimensional transformations (equivalently conditional densities) and appropriate conditioner networks. This framework (a) re… ▽ More Triangular map is a recent construct in probability theory that allows one to transform any source probability density function to any target density function. Based on triangular maps, we propose a general framework for high-dimensional density estimation, by specifying one-dimensional transformations (equivalently conditional densities) and appropriate conditioner networks. This framework (a) reveals the commonalities and differences of existing autoregressive and flow based methods, (b) allows a unified understanding of the limitations and representation power of these recent approaches and, (c) motivates us to uncover a new Sum-of-Squares (SOS) flow that is interpretable, universal, and easy to train. We perform several synthetic experiments on various density geometries to demonstrate the benefits (and short-comings) of such transformations. SOS flows achieve competitive results in simulations and several real-world datasets. △ Less

Submitted 10 June, 2019; v1 submitted 6 May, 2019; originally announced May 2019.

Comments: 13 pages, ICML'2019

arXiv:1609.05881 [pdf, ps, other]

Online and Distributed learning of Gaussian mixture models by Bayesian Moment Matching

Authors: Priyank Jaini, Pascal Poupart

Abstract: The Gaussian mixture model is a classic technique for clustering and data modeling that is used in numerous applications. With the rise of big data, there is a need for parameter estimation techniques that can handle streaming data and distribute the computation over several processors. While online variants of the Expectation Maximization (EM) algorithm exist, their data efficiency is reduced by… ▽ More The Gaussian mixture model is a classic technique for clustering and data modeling that is used in numerous applications. With the rise of big data, there is a need for parameter estimation techniques that can handle streaming data and distribute the computation over several processors. While online variants of the Expectation Maximization (EM) algorithm exist, their data efficiency is reduced by a stochastic approximation of the E-step and it is not clear how to distribute the computation over multiple processors. We propose a Bayesian learning technique that lends itself naturally to online and distributed computation. Since the Bayesian posterior is not tractable, we project it onto a family of tractable distributions after each observation by matching a set of sufficient moments. This Bayesian moment matching technique compares favorably to online EM in terms of time and accuracy on a set of data modeling benchmarks. △ Less

Submitted 19 September, 2016; originally announced September 2016.

Showing 1–13 of 13 results for author: Jaini, P