-
An implicit gradient-descent procedure for minimax problems
Authors:
Montacer Essid,
Esteban Tabak,
Giulio Trigila
Abstract:
A game-theory-inspired methodology is proposed for finding a function's saddle points. While explicit descent methods are known to have severe convergence issues, implicit methods are natural in an adversarial setting, as they take the other player's optimal strategy into account. The proposed implicit scheme has an adaptive learning rate that makes it transition to Newton's method in the neighborhood of saddle points. Convergence is shown through local analysis and, in non convex-concave settings, thorough numerical examples in optimal transport and linear programming. An ad hoc quasi-Newton method is developed for high-dimensional problems, for which the inversion of the Hessian of the objective function may entail a high computational cost.
Submitted 1 June, 2019;
originally announced June 2019.
-
Adaptive Optimal Transport
Authors:
Montacer Essid,
Debra Laefer,
Esteban G. Tabak
Abstract:
An adaptive, adversarial methodology is developed for the optimal transport problem between two distributions $μ$ and $ν$, known only through a finite set of independent samples $(x_i)_{i=1..N}$ and $(y_j)_{j=1..M}$. The methodology automatically creates features that adapt to the data, thus avoiding reliance on a priori knowledge of the data distribution. Specifically, instead of a discrete point-by-point assignment, the new procedure seeks an optimal map $T(x)$ defined for all $x$, minimizing the Kullback-Leibler divergence between $(T(x_i))$ and the target $(y_j)$. The relative entropy is given a sample-based, variational characterization, thereby creating an adversarial setting: as one player seeks to push forward one distribution to the other, the second player develops features that focus on those areas where the two distributions fail to match. The procedure solves local problems matching consecutive, intermediate distributions between $μ$ and $ν$. As a result, maps of arbitrary complexity can be built by composing the simple maps used for each local problem. Displacement interpolation is used to guarantee global from local optimality. The procedure is illustrated through synthetic examples in one and two dimensions.
Submitted 18 February, 2019; v1 submitted 1 July, 2018;
originally announced July 2018.
-
Traversing the Schroedinger Bridge strait: Robert Fortet's marvelous proof redux
Authors:
Montacer Essid,
Michele Pavon
Abstract:
In the early 1930s, Erwin Schrödinger, motivated by his quest for a more classical formulation of quantum mechanics, posed a large deviation problem for a cloud of independent Brownian particles. He showed that the solution to the problem could be obtained through a system of two linear equations with nonlinear coupling at the boundary (Schrödinger system). Existence and uniqueness for such a system, which represent a sort of bottleneck for the problem, were first established by R. Fortet in 1938/40 under rather general assumptions by proving convergence of an ingenious but complex approximation method. It is the first proof of what are nowadays called Sinkhorn-type algorithms in the much more challenging continuous case. Schrödinger bridges are also an early example of the maximum entropy approach and have been more recently recognized as a regularization of the important Optimal Mass Transport problem. Unfortunately, Fortet's contribution is by and large ignored in contemporary literature. This is likely due to the complexity of his approach coupled with an idiosyncratic exposition style and to missing details and steps in the proofs. Nevertheless, Fortet's approach maintains its importance to this day as it provides the only existing algorithmic proof under rather mild assumptions. It can be adapted, in principle, to other relevant problems such as the regularized Wasserstein barycenter problem. It is the purpose of this paper to remedy this situation by rewriting the bulk of his paper with all the missing passages and in a transparent fashion so as to make it fully available to the scientific community. We consider the problem in $R^d$ rather than $R$ and use his notation as much as possible to facilitate comparison.
Submitted 19 September, 2018; v1 submitted 7 May, 2018;
originally announced May 2018.
-
Quadratically-Regularized Optimal Transport on Graphs
Authors:
Montacer Essid,
Justin Solomon
Abstract:
Optimal transportation provides a means of lifting distances between points on a geometric domain to distances between signals over the domain, expressed as probability distributions. On a graph, transportation problems can be used to express challenging tasks involving matching supply to demand with minimal shipment expense; in discrete language, these become minimum-cost network flow problems. Regularization is typically needed to ensure uniqueness for the linear ground distance case and to improve optimization convergence; state-of-the-art techniques employ entropic regularization on the transportation matrix. In this paper, we explore a quadratic alternative to entropic regularization for transport over a graph. We theoretically analyze the behavior of quadratically regularized graph transport, characterizing how regularization affects the structure of flows in the regime of small but nonzero regularization. We further exploit elegant second-order structure in the dual of this problem to derive an easily implemented Newton-type optimization algorithm.
Submitted 23 March, 2018; v1 submitted 26 April, 2017;
originally announced April 2017.