-
Distributional barycenter problem through data-driven flows
Authors:
Esteban G. Tabak,
Giulio Trigila,
Wenjun Zhao
Abstract:
A new method is proposed for the solution of the data-driven optimal transport barycenter problem and of the more general distributional barycenter problem that the article introduces. The method improves on previous approaches based on adversarial games, by slaving the discriminator to the generator, minimizing the need for parameterizations and by allowing the adoption of general cost functions. It is applied to numerical examples, which include analyzing the MNIST data set with a new cost function that penalizes non-isometric maps.
Submitted 29 April, 2021;
originally announced April 2021.
-
Conditional Density Estimation, Latent Variable Discovery and Optimal Transport
Authors:
Hongkang Yang,
Esteban G. Tabak
Abstract:
A framework is proposed that addresses both conditional density estimation and latent variable discovery. The objective function maximizes explanation of variability in the data, achieved through the optimal transport barycenter generalized to a collection of conditional distributions indexed by a covariate --either given or latent-- in any suitable space. Theoretical results establish the existence of barycenters, a minimax formulation of optimal transport maps, and a general characterization of variability via the optimal transport cost. This framework leads to a family of non-parametric neural network-based algorithms, the BaryNet, with a supervised version that estimates conditional distributions and an unsupervised version that assigns latent variables. The efficacy of BaryNets is demonstrated by tests on both artificial and real-world data sets. A parallel drawn between autoencoders and the barycenter framework leads to the Barycentric autoencoder algorithm (BAE).
Submitted 1 July, 2020; v1 submitted 30 October, 2019;
originally announced October 2019.
-
Data Driven Conditional Optimal Transport
Authors:
Esteban G. Tabak,
Giulio Trigila,
Wenjun Zhao
Abstract:
A data-driven procedure is developed to compute the optimal map between two conditional probabilities $ρ(x|z_{1},...,z_{L})$ and $μ(y|z_{1},...,z_{L})$ depending on a set of covariates $z_{i}$. The procedure is tested on synthetic data from the ACIC Data Analysis Challenge 2017 and applied to non-uniform lightness transfer between images. Exactly solvable examples and simulations are performed to highlight the differences with ordinary optimal transport.
Submitted 24 October, 2019;
originally announced October 2019.
-
An implicit gradient-descent procedure for minimax problems
Authors:
Montacer Essid,
Esteban Tabak,
Giulio Trigila
Abstract:
A game-theory-inspired methodology is proposed for finding a function's saddle points. While explicit descent methods are known to have severe convergence issues, implicit methods are natural in an adversarial setting, as they take the other player's optimal strategy into account. The implicit scheme proposed has an adaptive learning rate that makes it transition to Newton's method in the neighborhood of saddle points. Convergence is shown through local analysis and, in non convex-concave settings, through numerical examples in optimal transport and linear programming. An ad hoc quasi-Newton method is developed for high-dimensional problems, for which the inversion of the Hessian of the objective function may entail a high computational cost.
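The contrast between explicit and implicit updates described in this abstract can be illustrated on a toy problem. The sketch below is not the paper's adaptive scheme: it assumes the bilinear objective f(x, y) = x * y (saddle point at the origin), a fixed step size `eta`, and a plain backward-Euler update whose linear system is solved in closed form. The function names are hypothetical and chosen only for this example.

```python
import math

def explicit_step(x, y, eta):
    """Simultaneous gradient descent in x and ascent in y for f(x, y) = x * y."""
    return x - eta * y, y + eta * x

def implicit_step(x, y, eta):
    """Backward-Euler step: gradients are evaluated at the new point.

    Solves x' = x - eta * y' and y' = y + eta * x' in closed form.
    """
    d = 1.0 + eta * eta
    return (x - eta * y) / d, (y + eta * x) / d

def run(step, x0=1.0, y0=1.0, eta=0.5, n=50):
    """Iterate `step` n times and return the distance to the saddle point (0, 0)."""
    x, y = x0, y0
    for _ in range(n):
        x, y = step(x, y, eta)
    return math.hypot(x, y)

explicit_dist = run(explicit_step)  # grows: each step scales the norm by sqrt(1 + eta^2)
implicit_dist = run(implicit_step)  # shrinks: each step scales the norm by 1 / sqrt(1 + eta^2)
print(explicit_dist, implicit_dist)
```

On this toy objective the explicit iterates spiral away from the saddle point while the implicit iterates spiral into it, which is the behavior the abstract attributes to each family of methods.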
Submitted 1 June, 2019;
originally announced June 2019.
-
Clustering, factor discovery and optimal transport
Authors:
Hongkang Yang,
Esteban G. Tabak
Abstract:
The clustering problem, and more generally, latent factor discovery --or latent space inference-- is formulated in terms of the Wasserstein barycenter problem from optimal transport. The objective proposed is the maximization of the variability attributable to class, further characterized as the minimization of the variance of the Wasserstein barycenter. Existing theory, which constrains the transport maps to rigid translations, is extended to affine transformations. The resulting non-parametric clustering algorithms include k-means as a special case and exhibit more robust performance. A continuous version of these algorithms discovers continuous latent variables and generalizes principal curves. The strength of these algorithms is demonstrated by tests on both artificial and real-world data sets.
Submitted 21 September, 2020; v1 submitted 26 February, 2019;
originally announced February 2019.
-
Adaptive Optimal Transport
Authors:
Montacer Essid,
Debra Laefer,
Esteban G. Tabak
Abstract:
An adaptive, adversarial methodology is developed for the optimal transport problem between two distributions $μ$ and $ν$, known only through a finite set of independent samples $(x_i)_{i=1..N}$ and $(y_j)_{j=1..M}$. The methodology automatically creates features that adapt to the data, thus avoiding reliance on a priori knowledge of data distribution. Specifically, instead of a discrete point-by-point assignment, the new procedure seeks an optimal map $T(x)$ defined for all $x$, minimizing the Kullback-Leibler divergence between $(T(x_i))$ and the target $(y_j)$. The relative entropy is given a sample-based, variational characterization, thereby creating an adversarial setting: as one player seeks to push forward one distribution to the other, the second player develops features that focus on those areas where the two distributions fail to match. The procedure solves local problems matching consecutive, intermediate distributions between $μ$ and $ν$. As a result, maps of arbitrary complexity can be built by composing the simple maps used for each local problem. Displacement interpolation is used to guarantee global from local optimality. The procedure is illustrated through synthetic examples in one and two dimensions.
Submitted 18 February, 2019; v1 submitted 1 July, 2018;
originally announced July 2018.
-
The data-driven Schroedinger bridge
Authors:
Michele Pavon,
Esteban G Tabak,
Giulio Trigila
Abstract:
Erwin Schroedinger posed, and to a large extent solved, in 1931/32 the problem of finding the most likely random evolution between two continuous probability distributions. This article considers this problem in the case when only samples of the two distributions are available. A novel iterative procedure is proposed, inspired by Fortet-Sinkhorn type algorithms. Since only samples of the marginals are available, the new approach features constrained maximum likelihood estimation in place of the nonlinear boundary couplings, and importance sampling to propagate the functions $\varphi$ and $\hat{\varphi}$ solving the Schroedinger system. This method is well-suited to high-dimensional settings, where introducing grids leads to numerically infeasible or unreliable methods. The methodology is illustrated in two applications: entropic interpolation of two-dimensional Gaussian mixtures, and the estimation of integrals through a variation of importance sampling.
Submitted 5 June, 2018; v1 submitted 4 June, 2018;
originally announced June 2018.
-
A data-driven linear-programming methodology for optimal transport
Authors:
Weikun Chen,
Esteban G. Tabak
Abstract:
A data-driven formulation of the optimal transport problem is presented and solved using adaptively refined meshes to decompose the problem into a sequence of finite linear programming problems. Both the marginal distributions and their unknown optimal coupling are approximated through mixtures, which decouples the problem into the optimal transport between the individual components of the mixtures and a classical assignment problem linking them all. A factorization of the components into products of single-variable distributions makes the first sub-problem solvable in closed form. The size of the assignment problem is addressed through an adaptive procedure: a sequence of linear programming problems which utilize at each level the solution from the previous coarser mesh to restrict the size of the function space where solutions are sought. The linear programming approach for pairwise optimal transportation, combined with an iterative scheme, gives a data-driven algorithm for the Wasserstein barycenter problem, which is well suited to parallel computing.
Submitted 9 October, 2017;
originally announced October 2017.
-
Principal dynamical components
Authors:
Manuel D. de la Iglesia,
Esteban G. Tabak
Abstract:
A new procedure is proposed for the dimensional reduction of time series. Similarly to principal components, the procedure seeks a low-dimensional manifold that minimizes information loss. Unlike principal components, however, the new procedure involves dynamical considerations, through the proposal of a predictive dynamical model in the reduced manifold. Hence the minimization of the uncertainty is not only over the choice of a reduced manifold, as in principal components, but also over the parameters of the dynamical model. Further generalizations are provided to non-autonomous and non-Markovian scenarios, which are then applied to historical sea-surface temperature data.
Submitted 17 December, 2010;
originally announced December 2010.