-
Nonlinear denoising score matching for enhanced learning of structured distributions
Authors:
Jeremiah Birrell,
Markos A. Katsoulakis,
Luc Rey-Bellet,
Benjamin Zhang,
Wei Zhu
Abstract:
We present a novel method for training score-based generative models which uses nonlinear noising dynamics to improve learning of structured distributions. Generalizing to a nonlinear drift allows for additional structure to be incorporated into the dynamics, thus making the training better adapted to the data, e.g., in the case of multimodality or (approximate) symmetries. Such structure can be obtained from the data by an inexpensive preprocessing step. The nonlinear dynamics introduces new challenges into training which we address in two ways: 1) we develop a new nonlinear denoising score matching (NDSM) method, 2) we introduce neural control variates in order to reduce the variance of the NDSM training objective. We demonstrate the effectiveness of this method on several examples: a) a collection of low-dimensional examples, motivated by clustering in latent space, b) high-dimensional images, addressing issues with mode collapse, small training sets, and approximate symmetries, the latter being a challenge for methods based on equivariant neural networks, which require exact symmetries.
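The sketch below is for orientation only. It shows classical denoising score matching in its simplest Gaussian form, with additive Gaussian noise (linear, here driftless, noising), which is the setting NDSM generalizes. It is not the NDSM objective or the control-variate scheme of the paper. The one-dimensional standard normal data, the linear score model s(x) = a x, and the function name `fit_linear_dsm` are illustrative assumptions. N(0, 1) data perturbed by N(0, σ²) noise has marginal N(0, 1 + σ²), so the fitted slope should approach −1/(1 + σ²).

```python
import random


def fit_linear_dsm(samples, sigma, seed=0):
    """Fit a linear score model s(x) = a * x with the classical Gaussian
    denoising score matching (DSM) loss

        E[(s(x + sigma * eps) + eps / sigma) ** 2],  eps ~ N(0, 1),

    whose minimiser over a has the closed form
        a = -E[x_noisy * eps / sigma] / E[x_noisy ** 2].
    """
    rng = random.Random(seed)
    num = 0.0  # running sum of x_noisy * eps / sigma
    den = 0.0  # running sum of x_noisy ** 2
    for x in samples:
        eps = rng.gauss(0.0, 1.0)
        x_noisy = x + sigma * eps  # perturb the data point with Gaussian noise
        num += x_noisy * eps / sigma
        den += x_noisy * x_noisy
    return -num / den


if __name__ == "__main__":
    data_rng = random.Random(1)
    data = [data_rng.gauss(0.0, 1.0) for _ in range(200_000)]
    sigma = 0.5
    a = fit_linear_dsm(data, sigma)
    # For N(0, 1) data the noisy marginal is N(0, 1 + sigma^2),
    # whose exact score is -x / (1 + sigma^2).
    print(f"fitted slope {a:.3f}, exact slope {-1 / (1 + sigma ** 2):.3f}")
```

With a linear model the DSM minimiser has a closed form, so the example needs no optimizer. In practice s would be a neural network trained by stochastic gradient descent on the same kind of loss.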
Submitted 24 May, 2024;
originally announced May 2024.
-
Learning heavy-tailed distributions with Wasserstein-proximal-regularized $α$-divergences
Authors:
Ziyu Chen,
Hyemin Gu,
Markos A. Katsoulakis,
Luc Rey-Bellet,
Wei Zhu
Abstract:
In this paper, we propose Wasserstein proximals of $α$-divergences as suitable objective functionals for learning heavy-tailed distributions in a stable manner. First, we provide sufficient, and in some cases necessary, relations among data dimension, $α$, and the decay rate of data distributions for the Wasserstein-proximal-regularized divergence to be finite. Finite-sample convergence rates for the estimation in the case of the Wasserstein-1 proximal divergences are then provided under certain tail conditions. Numerical experiments demonstrate stable learning of heavy-tailed distributions -- even those without first or second moment -- without any explicit knowledge of the tail behavior, using suitable generative models such as GANs and flow-based models related to our proposed Wasserstein-proximal-regularized $α$-divergences. Heuristically, $α$-divergences handle the heavy tails and Wasserstein proximals allow non-absolute continuity between distributions and control the velocities of flow-based algorithms as they learn the target distribution deep into the tails.
Submitted 22 May, 2024;
originally announced May 2024.
-
Statistical Guarantees of Group-Invariant GANs
Authors:
Ziyu Chen,
Markos A. Katsoulakis,
Luc Rey-Bellet,
Wei Zhu
Abstract:
Group-invariant generative adversarial networks (GANs) are a type of GAN in which the generators and discriminators are hardwired with group symmetries. Empirical studies have shown that these networks are capable of learning group-invariant distributions with significantly improved data efficiency. In this study, we aim to rigorously quantify this improvement by analyzing the reduction in sample complexity for group-invariant GANs. Our findings indicate that when learning group-invariant distributions, the number of samples required for group-invariant GANs decreases proportionally by a factor of the group size. Importantly, this sample complexity reduction cannot be achieved merely through data augmentation due to the probabilistic dependence of augmented data. Numerical results substantiate our theory and highlight the stark contrast between learning with group-invariant GANs and using data augmentation. This work presents the first statistical performance guarantees for group-invariant generative models, specifically for GANs, and it may shed light on the study of other generative models with group symmetries.
Submitted 4 June, 2024; v1 submitted 22 May, 2023;
originally announced May 2023.
-
Lipschitz-regularized gradient flows and generative particle algorithms for high-dimensional scarce data
Authors:
Hyemin Gu,
Panagiota Birmpa,
Yannis Pantazis,
Luc Rey-Bellet,
Markos A. Katsoulakis
Abstract:
We build a new class of generative algorithms capable of efficiently learning an arbitrary target distribution from possibly scarce, high-dimensional data and subsequently generating new samples. These generative algorithms are particle-based and are constructed as gradient flows of Lipschitz-regularized Kullback-Leibler or other $f$-divergences, where data from a source distribution can be stably transported as particles towards the vicinity of the target distribution. As a highlighted result in data integration, we demonstrate that the proposed algorithms correctly transport gene expression data points with dimension exceeding 54K, while the sample size is typically only in the hundreds.
Submitted 24 July, 2023; v1 submitted 31 October, 2022;
originally announced October 2022.
-
Function-space regularized Rényi divergences
Authors:
Jeremiah Birrell,
Yannis Pantazis,
Paul Dupuis,
Markos A. Katsoulakis,
Luc Rey-Bellet
Abstract:
We propose a new family of regularized Rényi divergences parametrized not only by the order $α$ but also by a variational function space. These new objects are defined by taking the infimal convolution of the standard Rényi divergence with the integral probability metric (IPM) associated with the chosen function space. We derive a novel dual variational representation that can be used to construct numerically tractable divergence estimators. This representation avoids risk-sensitive terms and therefore exhibits lower variance, making it well-behaved when $α>1$; this addresses a notable weakness of prior approaches. We prove several properties of these new divergences, showing that they interpolate between the classical Rényi divergences and IPMs. We also study the $α\to\infty$ limit, which leads to a regularized worst-case-regret and a new variational representation in the classical case. Moreover, we show that the proposed regularized Rényi divergences inherit features from IPMs such as the ability to compare distributions that are not absolutely continuous, e.g., empirical measures and distributions with low-dimensional support. We present numerical results on both synthetic and real datasets, showing the utility of these new divergences in both estimation and GAN training applications; in particular, we demonstrate significantly reduced variance and improved training performance.
Submitted 14 February, 2023; v1 submitted 10 October, 2022;
originally announced October 2022.
-
Structure-preserving GANs
Authors:
Jeremiah Birrell,
Markos A. Katsoulakis,
Luc Rey-Bellet,
Wei Zhu
Abstract:
Generative adversarial networks (GANs), a class of distribution-learning methods based on a two-player game between a generator and a discriminator, can generally be formulated as a minmax problem based on the variational representation of a divergence between the unknown and the generated distributions. We introduce structure-preserving GANs as a data-efficient framework for learning distributions with additional structure such as group symmetry, by developing new variational representations for divergences. Our theory shows that we can reduce the discriminator space to its projection on the invariant discriminator space, using the conditional expectation with respect to the sigma-algebra associated to the underlying structure. In addition, we prove that the discriminator space reduction must be accompanied by a careful design of structured generators, as flawed designs may easily lead to a catastrophic "mode collapse" of the learned distribution. We contextualize our framework by building symmetry-preserving GANs for distributions with intrinsic group symmetry, and demonstrate that both players, namely the equivariant generator and invariant discriminator, play important but distinct roles in the learning process. Empirical experiments and ablation studies across a broad range of data sets, including real-world medical imaging, validate our theory, and show our proposed methods achieve significantly improved sample fidelity and diversity -- almost an order of magnitude measured in Fréchet Inception Distance -- especially in the small data regime.
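For a finite group acting on the data and a group-invariant reference distribution, the conditional expectation onto invariant functions can be computed by averaging a function over the group orbit. The sketch below does this for the four-element rotation group C4 acting on square images. The helper names (`rotate90`, `c4_orbit`, `symmetrize`) and the toy discriminator are hypothetical and are not taken from the paper's code.

```python
from typing import Callable, List

Grid = List[List[float]]


def rotate90(img: Grid) -> Grid:
    """Rotate a square grid by 90 degrees counter-clockwise."""
    n = len(img)
    return [[img[j][n - 1 - i] for j in range(n)] for i in range(n)]


def c4_orbit(img: Grid) -> List[Grid]:
    """Return the orbit of img under C4 (rotations by 0, 90, 180, 270 degrees)."""
    orbit = [img]
    for _ in range(3):
        orbit.append(rotate90(orbit[-1]))
    return orbit


def symmetrize(f: Callable[[Grid], float]) -> Callable[[Grid], float]:
    """Project f onto C4-invariant functions by averaging over the group orbit."""
    def f_inv(img: Grid) -> float:
        orbit = c4_orbit(img)
        return sum(f(g) for g in orbit) / len(orbit)
    return f_inv


def toy_discriminator(img: Grid) -> float:
    """A deliberately non-invariant score: a position-weighted pixel sum."""
    return sum((i + 2 * j) * v for i, row in enumerate(img) for j, v in enumerate(row))


if __name__ == "__main__":
    img = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
    d_inv = symmetrize(toy_discriminator)
    print(toy_discriminator(img), toy_discriminator(rotate90(img)))  # differ
    print(d_inv(img), d_inv(rotate90(img)))  # agree up to rounding
```

Orbit averaging restricts the discriminator without changing its architecture. An equivariant network would instead build the symmetry into every layer.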
Submitted 17 June, 2022; v1 submitted 2 February, 2022;
originally announced February 2022.
-
Model Uncertainty and Correctability for Directed Graphical Models
Authors:
Panagiota Birmpa,
Jinchao Feng,
Markos A. Katsoulakis,
Luc Rey-Bellet
Abstract:
Probabilistic graphical models are a fundamental tool in probabilistic modeling, machine learning and artificial intelligence. They allow us to integrate in a natural way expert knowledge, physical modeling, heterogeneous and correlated data and quantities of interest. For exactly this reason, multiple sources of model uncertainty are inherent within the modular structure of the graphical model. In this paper we develop information-theoretic, robust uncertainty quantification methods and non-parametric stress tests for directed graphical models to assess the effect and the propagation through the graph of multi-sourced model uncertainties to quantities of interest. These methods allow us to rank the different sources of uncertainty and correct the graphical model by targeting its most impactful components with respect to the quantities of interest. Thus, from a machine learning perspective, we provide a mathematically rigorous approach to correctability that guarantees a systematic selection for improvement of components of a graphical model while controlling potential new errors created in the process in other parts of the model. We demonstrate our methods in two physico-chemical examples, namely quantum scale-informed chemical kinetics and materials screening to improve the efficiency of fuel cells.
Submitted 17 July, 2021;
originally announced July 2021.
-
$(f,Γ)$-Divergences: Interpolating between $f$-Divergences and Integral Probability Metrics
Authors:
Jeremiah Birrell,
Paul Dupuis,
Markos A. Katsoulakis,
Yannis Pantazis,
Luc Rey-Bellet
Abstract:
We develop a rigorous and general framework for constructing information-theoretic divergences that subsume both $f$-divergences and integral probability metrics (IPMs), such as the $1$-Wasserstein distance. We prove under which assumptions these divergences, hereafter referred to as $(f,Γ)$-divergences, provide a notion of `distance' between probability measures and show that they can be expressed as a two-stage mass-redistribution/mass-transport process. The $(f,Γ)$-divergences inherit features from IPMs, such as the ability to compare distributions which are not absolutely continuous, as well as from $f$-divergences, namely the strict concavity of their variational representations and the ability to control heavy-tailed distributions for particular choices of $f$. When combined, these features establish a divergence with improved properties for estimation, statistical learning, and uncertainty quantification applications. Using statistical learning as an example, we demonstrate their advantage in training generative adversarial networks (GANs) for heavy-tailed, not-absolutely continuous sample distributions. We also show improved performance and stability over gradient-penalized Wasserstein GAN in image generation.
Submitted 15 September, 2021; v1 submitted 11 November, 2020;
originally announced November 2020.
-
Variational Representations and Neural Network Estimation of Rényi Divergences
Authors:
Jeremiah Birrell,
Paul Dupuis,
Markos A. Katsoulakis,
Luc Rey-Bellet,
Jie Wang
Abstract:
We derive a new variational formula for the Rényi family of divergences, $R_α(Q\|P)$, between probability measures $Q$ and $P$. Our result generalizes the classical Donsker-Varadhan variational formula for the Kullback-Leibler divergence. We further show that this Rényi variational formula holds over a range of function spaces; this leads to a formula for the optimizer under very weak assumptions and is also key in our development of a consistency theory for Rényi divergence estimators. By applying this theory to neural-network estimators, we show that if a neural network family satisfies one of several strengthened versions of the universal approximation property then the corresponding Rényi divergence estimator is consistent. In contrast to density-estimator based methods, our estimators involve only expectations under $Q$ and $P$ and hence are more effective in high dimensional systems. We illustrate this via several numerical examples of neural network estimation in systems of up to 5000 dimensions.
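The Donsker-Varadhan formula, KL(Q‖P) = sup_g { E_Q[g] − log E_P[e^g] }, involves only expectations under Q and P. That property is what makes sample-based estimators with a neural-network g possible. The sketch below checks on a three-point space that the supremum is attained at g = log(dQ/dP).

It also evaluates a Rényi-type objective whose exact form is an assumption made for illustration. That form uses the 1/(α(α−1)) normalization, is bounded above by R_α through Hölder's inequality for α > 1, and reduces to the Donsker-Varadhan objective as α → 1. Consult the paper for the precise statement. All function names are hypothetical.

```python
import math
from typing import Sequence


def kl(q: Sequence[float], p: Sequence[float]) -> float:
    """KL(Q||P) on a finite space (assumes p > 0 wherever q > 0)."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def dv_objective(g, q, p) -> float:
    """Donsker-Varadhan objective E_Q[g] - log E_P[exp(g)]; its sup over g is KL(Q||P)."""
    e_q = sum(qi * gi for qi, gi in zip(q, g))
    return e_q - math.log(sum(pi * math.exp(gi) for pi, gi in zip(p, g)))


def renyi(q, p, alpha: float) -> float:
    """R_alpha(Q||P) = log E_P[(dQ/dP)^alpha] / (alpha * (alpha - 1)).
    This normalisation tends to KL(Q||P) as alpha -> 1."""
    s = sum(pi * (qi / pi) ** alpha for qi, pi in zip(q, p))
    return math.log(s) / (alpha * (alpha - 1))


def renyi_objective(g, q, p, alpha: float) -> float:
    """Renyi-type variational objective (form assumed for illustration, alpha > 1):
    log E_Q[exp((alpha - 1) g)] / (alpha - 1) - log E_P[exp(alpha g)] / alpha.
    By Hoelder's inequality it never exceeds renyi(q, p, alpha)."""
    a = math.log(sum(qi * math.exp((alpha - 1) * gi) for qi, gi in zip(q, g)))
    b = math.log(sum(pi * math.exp(alpha * gi) for pi, gi in zip(p, g)))
    return a / (alpha - 1) - b / alpha


if __name__ == "__main__":
    q = [0.5, 0.3, 0.2]
    p = [0.2, 0.3, 0.5]
    g_star = [math.log(qi / pi) for qi, pi in zip(q, p)]  # log-likelihood ratio
    print(dv_objective(g_star, q, p), kl(q, p))  # equal: the sup is attained at g_star
    print(renyi_objective(g_star, q, p, 2.0), renyi(q, p, 2.0))  # also equal
    print(dv_objective([0.0, 1.0, -1.0], q, p))  # any other g gives a value no larger
```

In an estimator, the expectations would be replaced by sample averages and g by a parametrized family.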
Submitted 20 July, 2021; v1 submitted 7 July, 2020;
originally announced July 2020.
-
Quantification of Model Uncertainty on Path-Space via Goal-Oriented Relative Entropy
Authors:
Jeremiah Birrell,
Markos A. Katsoulakis,
Luc Rey-Bellet
Abstract:
Quantifying the impact of parametric and model-form uncertainty on the predictions of stochastic models is a key challenge in many applications. Previous work has shown that the relative entropy rate is an effective tool for deriving path-space uncertainty quantification (UQ) bounds on ergodic averages. In this work we identify appropriate information-theoretic objects for a wider range of quantities of interest on path-space, such as hitting times and exponentially discounted observables, and develop the corresponding UQ bounds. In addition, our method yields tighter UQ bounds, even in cases where previous relative-entropy-based methods also apply, e.g., for ergodic averages. We illustrate these results with examples from option pricing, non-reversible diffusion processes, stochastic control, semi-Markov queueing models, and expectations and distributions of hitting times.
Submitted 2 September, 2020; v1 submitted 21 June, 2019;
originally announced June 2019.
-
How biased is your model? Concentration Inequalities, Information and Model Bias
Authors:
Konstantinos Gourgoulias,
Markos A. Katsoulakis,
Luc Rey-Bellet,
Jie Wang
Abstract:
We derive tight and computable bounds on the bias of statistical estimators, or more generally of quantities of interest, when evaluated on a baseline model P rather than on the typically unknown true model Q. Our proposed method combines the scalable information inequality derived by P. Dupuis, K. Chowdhary, the authors and their collaborators with classical concentration inequalities (such as the Bennett and Hoeffding-Azuma inequalities). Our bounds are expressed in terms of the Kullback-Leibler divergence R(Q||P) of model Q with respect to P and the moment generating function for the statistical estimator under P. Furthermore, concentration inequalities, i.e. bounds on moment generating functions, provide tight and computationally inexpensive model bias bounds for quantities of interest. Finally, they allow us to derive rigorous confidence bands for statistical estimators that account for model bias and are valid for an arbitrary amount of data.
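A toy example can illustrate how an information inequality combines with a concentration bound. The sketch uses the Gibbs-variational bound E_Q[f] − E_P[f] ≤ inf_{c>0} (Λ_P(c) + R(Q‖P))/c, where Λ_P is the centred cumulant generating function of f under P. It then replaces Λ_P by Hoeffding's bound c²(b − a)²/8, which gives the closed form (b − a)√(R(Q‖P)/2). The Bernoulli models, the grid over c, and the function names are illustrative assumptions. This is not claimed to be the exact inequality of the paper.

```python
import math


def kl_bernoulli(q: float, p: float) -> float:
    """R(Q||P) for Q = Bernoulli(q), P = Bernoulli(p), with 0 < p, q < 1."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))


def bias_bound_mgf(p: float, rel_ent: float, cs) -> float:
    """Bound on E_Q[f] - E_P[f] for f(x) = x on {0, 1} and P = Bernoulli(p):
    min over c in cs of (Lambda_P(c) + R(Q||P)) / c, where
    Lambda_P(c) = log E_P[exp(c * (f - E_P[f]))]. Every c > 0 gives a valid bound."""
    best = math.inf
    for c in cs:
        cgf = math.log(p * math.exp(c * (1 - p)) + (1 - p) * math.exp(-c * p))
        best = min(best, (cgf + rel_ent) / c)
    return best


def bias_bound_hoeffding(rel_ent: float, a: float = 0.0, b: float = 1.0) -> float:
    """Same bound with Lambda_P(c) replaced by Hoeffding's c^2 (b - a)^2 / 8;
    minimising over c > 0 gives (b - a) * sqrt(R(Q||P) / 2)."""
    return (b - a) * math.sqrt(rel_ent / 2)


if __name__ == "__main__":
    p, q = 0.5, 0.6  # baseline model P and "true" model Q
    r = kl_bernoulli(q, p)
    grid = [0.01 * k for k in range(1, 2001)]
    print("true bias      ", q - p)
    print("MGF bound      ", bias_bound_mgf(p, r, grid))
    print("Hoeffding bound", bias_bound_hoeffding(r))
```

The exact-MGF bound needs Λ_P, while the Hoeffding version needs only the range of f. That trade-off between tightness and computability is the one described above.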
Submitted 30 June, 2017;
originally announced June 2017.
-
Positive feedback in coordination games: stochastic evolutionary dynamics and the logit choice rule
Authors:
Sung-Ha Hwang,
Luc Rey-Bellet
Abstract:
We study the problem of stochastic stability for evolutionary dynamics under the logit choice rule. We consider general classes of coordination games, symmetric or asymmetric, with an arbitrary number of strategies, which satisfy the marginal bandwagon property (i.e., there is positive feedback to coordinate). Our main result is that the most likely evolutionary escape paths from a status quo convention consist of a series of identical mistakes. As an application of our result, we show that the Nash bargaining solution arises as the long-run convention for the evolutionary Nash demand game under the usual logit choice rule. We also obtain a new bargaining solution if the logit choice rule is combined with intentional idiosyncratic plays. The new bargaining solution is more egalitarian than the Nash bargaining solution, demonstrating that intentionality implies equality under the logit choice model.
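The following minimal sketch makes the logit choice rule concrete for a symmetric 2×2 coordination game played by N agents, with one randomly chosen agent revising at a time. The number of agents using strategy 1 then follows a birth-death chain, so its stationary distribution follows from detailed balance. The payoff matrix, population size, β, and function names are illustrative assumptions. The paper treats much more general games and studies escape paths in the small-noise limit rather than a fixed β.

```python
import math


def logit_choice(payoffs, beta):
    """Logit choice probabilities: P(s) proportional to exp(beta * payoff_s)."""
    m = max(payoffs)  # subtract the max for numerical stability
    w = [math.exp(beta * (u - m)) for u in payoffs]
    s = sum(w)
    return [x / s for x in w]


def payoffs_vs_others(A, ones, n_others):
    """Average payoff of strategies 0 and 1 against n_others opponents, `ones` of
    whom play strategy 1 (A[s][t] is the payoff of s against t)."""
    zeros = n_others - ones
    return [(A[s][0] * zeros + A[s][1] * ones) / n_others for s in (0, 1)]


def stationary_logit(A, n, beta):
    """Stationary distribution over k = number of agents playing strategy 1."""
    log_pi = [0.0]
    for k in range(n):
        probs = logit_choice(payoffs_vs_others(A, k, n - 1), beta)
        # k -> k+1: a 0-player revises (prob (n-k)/n); its n-1 opponents hold k ones.
        up = (n - k) / n * probs[1]
        # k+1 -> k: a 1-player revises (prob (k+1)/n); its opponents again hold k ones.
        down = (k + 1) / n * probs[0]
        log_pi.append(log_pi[-1] + math.log(up) - math.log(down))
    m = max(log_pi)
    w = [math.exp(x - m) for x in log_pi]
    s = sum(w)
    return [x / s for x in w]


if __name__ == "__main__":
    A = [[1.0, 0.0], [0.0, 2.0]]  # coordinating on strategy 1 pays more
    for beta in (0.5, 2.0, 5.0):
        pi = stationary_logit(A, n=10, beta=beta)
        print(beta, round(pi[0], 4), round(pi[-1], 4))
```

As β grows, the stationary mass concentrates on the all-1 convention, which is the harder one to escape by a run of mistakes in this game.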
Submitted 11 January, 2021; v1 submitted 17 January, 2017;
originally announced January 2017.
-
Scalable Information Inequalities for Uncertainty Quantification
Authors:
Markos A. Katsoulakis,
Luc Rey-Bellet,
Jie Wang
Abstract:
In this paper we demonstrate the only available scalable information bounds for quantities of interest of high dimensional probabilistic models. Scalability of inequalities allows us to (a) obtain uncertainty quantification bounds for quantities of interest in the large degree of freedom limit and/or at long time regimes; (b) assess the impact of large model perturbations as in nonlinear response regimes in statistical mechanics; (c) address model-form uncertainty, i.e. compare different extended models and corresponding quantities of interest. We demonstrate some of these properties by deriving robust uncertainty quantification bounds for phase diagrams in statistical mechanics models.
Submitted 13 May, 2016;
originally announced May 2016.
-
Information Criteria for quantifying loss of reversibility in parallelized KMC
Authors:
Konstantinos Gourgoulias,
Markos A. Katsoulakis,
Luc Rey-Bellet
Abstract:
Parallel Kinetic Monte Carlo (KMC) is a potent tool to simulate stochastic particle systems efficiently. However, despite literature on quantifying domain decomposition errors of the particle system for this class of algorithms in the short and in the long time regime, no study yet explores and quantifies the loss of time-reversibility in Parallel KMC. Inspired by concepts from non-equilibrium statistical mechanics, we propose the entropy production per unit time, or entropy production rate, given in terms of an observable and a corresponding estimator, as a metric that quantifies the loss of reversibility. Typically, this is a quantity that cannot be computed explicitly for Parallel KMC, which is why we develop a posteriori estimators that have good scaling properties with respect to the size of the system. Through these estimators, we can connect the different parameters of the scheme, such as the communication time step of the parallelization, the choice of the domain decomposition, and the computational schedule, with its performance in controlling the loss of reversibility. From this point of view, the entropy production rate can be seen both as an information criterion to compare the reversibility of different parallel schemes and as a tool to diagnose reversibility issues with a particular scheme. As a demonstration, we use Sandia Lab's SPPARKS software to compare different parallelization schemes and different domain (lattice) decompositions.
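As a reference point for the metric proposed above, consider the textbook entropy production per step of a stationary discrete-time Markov chain with transition matrix P and stationary distribution π: (1/2) Σ_{i,j} (π_i P_ij − π_j P_ji) log(π_i P_ij / (π_j P_ji)). It vanishes exactly when detailed balance (time-reversibility) holds. The sketch computes it for small explicit chains. It is not the a posteriori estimator for Parallel KMC developed in the paper, which works in continuous time and at far larger scale. The example matrices and function names are illustrative assumptions.

```python
import math


def stationary(P, iters=5000):
    """Stationary distribution by power iteration (assumes an ergodic, aperiodic chain)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def entropy_production_rate(P):
    """Entropy production per step of the stationary chain:
    0.5 * sum_{i,j} (pi_i P_ij - pi_j P_ji) * log(pi_i P_ij / (pi_j P_ji)).
    Zero iff detailed balance holds; infinite if a transition has no reverse."""
    pi = stationary(P)
    n = len(P)
    ep = 0.0
    for i in range(n):
        for j in range(n):
            a, b = pi[i] * P[i][j], pi[j] * P[j][i]
            if a > 0 and b > 0:
                ep += 0.5 * (a - b) * math.log(a / b)
            elif a > 0 or b > 0:
                return math.inf
    return ep


if __name__ == "__main__":
    # Birth-death chain: reversible, so no entropy is produced.
    P_rev = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
    # Chain biased to rotate 0 -> 1 -> 2 -> 0: irreversible.
    P_cyc = [[0.2, 0.7, 0.1], [0.1, 0.2, 0.7], [0.7, 0.1, 0.2]]
    print(entropy_production_rate(P_rev), entropy_production_rate(P_cyc))
```

A parallel scheme that breaks detailed balance of the underlying serial dynamics would show up as a strictly positive rate. This is the sense in which the quantity serves as a reversibility diagnostic.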
Submitted 16 October, 2016; v1 submitted 8 May, 2016;
originally announced May 2016.
-
Strategic Decompositions of Normal Form Games: Zero-sum Games and Potential Games
Authors:
Sung-Ha Hwang,
Luc Rey-Bellet
Abstract:
We study new classes of games, called zero-sum equivalent games and zero-sum equivalent potential games, and prove decomposition theorems involving these classes of games. We say that two games are "strategically equivalent" if, for every player, the payoff differences between two strategies (holding other players' strategies fixed) are identical. A zero-sum equivalent game is a game that is strategically equivalent to a zero-sum game; a zero-sum equivalent potential game is a zero-sum equivalent game that is strategically equivalent to a common interest game. We also call a game "normalized" if the sum of one player's payoffs, given the other players' strategies, is always zero. We show that any normal form game can be uniquely decomposed into either (i) a zero-sum equivalent game and a normalized common interest game, or (ii) a zero-sum equivalent potential game, a normalized zero-sum game, and a normalized common interest game, each with distinctive equilibrium properties. For example, we show that two-player zero-sum equivalent games with finite strategy sets generically have a unique Nash equilibrium and that two-player zero-sum equivalent potential games with finite strategy sets generically have a strictly dominant Nash equilibrium.
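The definitions of strategic equivalence and normalization above can be made concrete for two-player games with finitely many strategies. In the sketch, each player's payoff has its average over that player's own strategies subtracted, which produces a normalized game. The subtracted term depends only on the opponent's strategy, so it leaves every unilateral payoff difference unchanged. This illustrates only the normalization step, not the full decomposition theorem. The function names and the Prisoner's Dilemma example are illustrative choices.

```python
def normalize_game(A, B):
    """Split a bimatrix game into a normalized game (plus non-strategic terms).

    A[i][j], B[i][j]: payoffs of the row and column player when row plays i and
    column plays j. The row player's normalized payoff subtracts, for each column j,
    the mean over the row player's own strategies; symmetrically for the column player.
    """
    m, n = len(A), len(A[0])
    col_mean = [sum(A[i][j] for i in range(m)) / m for j in range(n)]  # depends only on j
    row_mean = [sum(B[i][j] for j in range(n)) / n for i in range(m)]  # depends only on i
    A_norm = [[A[i][j] - col_mean[j] for j in range(n)] for i in range(m)]
    B_norm = [[B[i][j] - row_mean[i] for j in range(n)] for i in range(m)]
    return A_norm, B_norm


def strategically_equivalent(A1, B1, A2, B2, tol=1e-12):
    """True if every unilateral payoff difference agrees in the two games."""
    m, n = len(A1), len(A1[0])
    rows = all(
        abs((A1[i][j] - A1[i2][j]) - (A2[i][j] - A2[i2][j])) < tol
        for i in range(m) for i2 in range(m) for j in range(n)
    )
    cols = all(
        abs((B1[i][j] - B1[i][j2]) - (B2[i][j] - B2[i][j2])) < tol
        for i in range(m) for j in range(n) for j2 in range(n)
    )
    return rows and cols


if __name__ == "__main__":
    A = [[3.0, 0.0], [5.0, 1.0]]  # Prisoner's Dilemma, row player
    B = [[3.0, 5.0], [0.0, 1.0]]  # column player
    A_norm, B_norm = normalize_game(A, B)
    print(A_norm, B_norm)  # each player's payoffs sum to zero over own strategies
    print(strategically_equivalent(A, B, A_norm, B_norm))  # True
```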
Submitted 18 May, 2020; v1 submitted 22 February, 2016;
originally announced February 2016.
-
Simple Characterizations of Potential Games and Zero-sum Games
Authors:
Sung-Ha Hwang,
Luc Rey-Bellet
Abstract:
We provide several tests to determine whether a game is a potential game or whether it is a zero-sum equivalent game---a game which is strategically equivalent to a zero-sum game in the same way that a potential game is strategically equivalent to a common interest game. We present a unified framework applicable for both potential and zero-sum equivalent games by deriving a simple but useful characterization of these games. This allows us to re-derive known criteria for potential games, as well as obtain several new criteria. In particular, we prove (1) new integral tests for potential games and for zero-sum equivalent games, (2) a new derivative test for zero-sum equivalent games, and (3) a new representation characterization for zero-sum equivalent games.
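For two-player games, one classical way to phrase such tests is through double differences, though this is not necessarily the paper's own formulation. (A, B) is strategically equivalent to a common interest game exactly when A − B splits into a row-only term plus a column-only term. It is strategically equivalent to a zero-sum game exactly when A + B does. Both conditions reduce to vanishing double differences M_ij − M_kj − M_ih + M_kh. The helper names are hypothetical.

```python
def _separable(M, tol=1e-9):
    """True if M[i][j] = u[i] + v[j] for some vectors u, v, i.e. every
    double difference M[i][j] - M[k][j] - M[i][h] + M[k][h] vanishes."""
    m, n = len(M), len(M[0])
    return all(
        abs(M[i][j] - M[k][j] - M[i][h] + M[k][h]) < tol
        for i in range(m) for k in range(m) for j in range(n) for h in range(n)
    )


def is_potential(A, B, tol=1e-9):
    """Strategically equivalent to a common interest game iff A - B is separable."""
    return _separable([[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)], tol)


def is_zero_sum_equivalent(A, B, tol=1e-9):
    """Strategically equivalent to a zero-sum game iff A + B is separable."""
    return _separable([[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)], tol)


if __name__ == "__main__":
    pennies = ([[1, -1], [-1, 1]], [[-1, 1], [1, -1]])  # Matching Pennies
    coordination = ([[1, 0], [0, 1]], [[1, 0], [0, 1]])
    dilemma = ([[3, 0], [5, 1]], [[3, 5], [0, 1]])  # Prisoner's Dilemma
    for name, (A, B) in [("pennies", pennies), ("coordination", coordination),
                         ("dilemma", dilemma)]:
        print(name, is_potential(A, B), is_zero_sum_equivalent(A, B))
```

The test is sufficient as well as necessary because, taking k = h = 0, a zero double difference gives M_ij = M_i0 + M_0j − M_00, which is an explicit row-plus-column split.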
Submitted 24 February, 2020; v1 submitted 13 February, 2016;
originally announced February 2016.
-
Decompositions of two player games: potential, zero-sum, and stable games
Authors:
Sung-Ha Hwang,
Luc Rey-Bellet
Abstract:
We introduce several methods of decomposition for two-player normal form games. Viewing the set of all games as a vector space, we exhibit explicit orthonormal bases for the subspaces of potential games, zero-sum games, and their orthogonal complements which we call anti-potential games and anti-zero-sum games, respectively. Perhaps surprisingly, every anti-potential game comes either from the Rock-Paper-Scissors type games (in the case of symmetric games) or from the Matching Pennies type games (in the case of asymmetric games). Using these decompositions, we prove old (and some new) cycle criteria for potential and zero-sum games (as orthogonality relations between subspaces). We illustrate the usefulness of our decomposition by (a) analyzing the generalized Rock-Paper-Scissors game, (b) completely characterizing the set of all null-stable games, (c) providing a large class of strict stable games, (d) relating the game decomposition to the decomposition of vector fields for the replicator equations, (e) constructing Lyapunov functions for some replicator dynamics, and (f) constructing Zeeman games, i.e., games with an interior asymptotically stable Nash equilibrium and a pure strategy ESS.
Submitted 18 July, 2011; v1 submitted 17 June, 2011;
originally announced June 2011.