Search | arXiv e-print repository

Conditional Optimal Transport on Function Spaces

Authors: Bamdad Hosseini, Alexander W. Hsu, Amirhossein Taghvaei

Abstract: We present a systematic study of conditional triangular transport maps in function spaces from the perspective of optimal transportation and with a view towards amortized Bayesian inference. More specifically, we develop a theory of constrained optimal transport problems that describe block-triangular Monge maps that characterize conditional measures along with their Kantorovich relaxations. This… ▽ More We present a systematic study of conditional triangular transport maps in function spaces from the perspective of optimal transportation and with a view towards amortized Bayesian inference. More specifically, we develop a theory of constrained optimal transport problems that describe block-triangular Monge maps that characterize conditional measures along with their Kantorovich relaxations. This generalizes the theory of optimal triangular transport to separable infinite-dimensional function spaces with general cost functions. We further tailor our results to the case of Bayesian inference problems and obtain regularity estimates on the conditioning maps from the prior to the posterior. Finally, we present numerical experiments that demonstrate the computational applicability of our theoretical results for amortized and likelihood-free inference of functional parameters. △ Less

Submitted 6 February, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

MSC Class: 49Q22; 62G86; 62F15; 60B05

arXiv:2203.11869 [pdf, other]

An Optimal Transport Formulation of Bayes' Law for Nonlinear Filtering Algorithms

Authors: Amirhossein Taghvaei, Bamdad Hosseini

Abstract: This paper presents a variational representation of the Bayes' law using optimal transportation theory. The variational representation is in terms of the optimal transportation between the joint distribution of the (state, observation) and their independent coupling. By imposing certain structure on the transport map, the solution to the variational problem is used to construct a Brenier-type map… ▽ More This paper presents a variational representation of the Bayes' law using optimal transportation theory. The variational representation is in terms of the optimal transportation between the joint distribution of the (state, observation) and their independent coupling. By imposing certain structure on the transport map, the solution to the variational problem is used to construct a Brenier-type map that transports the prior distribution to the posterior distribution for any value of the observation signal. The new formulation is used to derive the optimal transport form of the Ensemble Kalman filter (EnKF) for the discrete-time filtering problem and propose a novel extension of EnKF to the non-Gaussian setting utilizing input convex neural networks. Finally, the proposed methodology is used to derive the optimal transport form of the feedback particle filler (FPF) in the continuous-time limit, which constitutes its first variational construction without explicitly using the nonlinear filtering equation or Bayes' law. △ Less

Submitted 13 September, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

arXiv:2010.01183 [pdf, other]

doi 10.1109/CDC42340.2020.9304260

Deep FPF: Gain function approximation in high-dimensional setting

Authors: S. Yagiz Olmez, Amirhossein Taghvaei, Prashant G. Mehta

Abstract: In this paper, we present a novel approach to approximate the gain function of the feedback particle filter (FPF). The exact gain function is the solution of a Poisson equation involving a probability-weighted Laplacian. The numerical problem is to approximate the exact gain function using only finitely many particles sampled from the probability distribution. Inspired by the recent success of t… ▽ More In this paper, we present a novel approach to approximate the gain function of the feedback particle filter (FPF). The exact gain function is the solution of a Poisson equation involving a probability-weighted Laplacian. The numerical problem is to approximate the exact gain function using only finitely many particles sampled from the probability distribution. Inspired by the recent success of the deep learning methods, we represent the gain function as a gradient of the output of a neural network. Thereupon considering a certain variational formulation of the Poisson equation, an optimization problem is posed for learning the weights of the neural network. A stochastic gradient algorithm is described for this purpose. The proposed approach has two significant properties/advantages: (i) The stochastic optimization algorithm allows one to process, in parallel, only a batch of samples (particles) ensuring good scaling properties with the number of particles; (ii) The remarkable representation power of neural networks means that the algorithm is potentially applicable and useful to solve high-dimensional problems. We numerically establish these two properties and provide extensive comparison to the existing approaches. △ Less

Submitted 2 October, 2020; originally announced October 2020.

Comments: To be presented at 59th IEEE Conference on Decision and Control, 2020

arXiv:2007.04462 [pdf, other]

Scalable Computations of Wasserstein Barycenter via Input Convex Neural Networks

Authors: Jiaojiao Fan, Amirhossein Taghvaei, Yongxin Chen

Abstract: Wasserstein Barycenter is a principled approach to represent the weighted mean of a given set of probability distributions, utilizing the geometry induced by optimal transport. In this work, we present a novel scalable algorithm to approximate the Wasserstein Barycenters aiming at high-dimensional applications in machine learning. Our proposed algorithm is based on the Kantorovich dual formulation… ▽ More Wasserstein Barycenter is a principled approach to represent the weighted mean of a given set of probability distributions, utilizing the geometry induced by optimal transport. In this work, we present a novel scalable algorithm to approximate the Wasserstein Barycenters aiming at high-dimensional applications in machine learning. Our proposed algorithm is based on the Kantorovich dual formulation of the Wasserstein-2 distance as well as a recent neural network architecture, input convex neural network, that is known to parametrize convex functions. The distinguishing features of our method are: i) it only requires samples from the marginal distributions; ii) unlike the existing approaches, it represents the Barycenter with a generative model and can thus generate infinite samples from the barycenter without querying the marginal distributions; iii) it works similar to Generative Adversarial Model in one marginal case. We demonstrate the efficacy of our algorithm by comparing it with the state-of-art methods in multiple experiments. △ Less

Submitted 26 November, 2021; v1 submitted 8 July, 2020; originally announced July 2020.

Comments: 21 pages

MSC Class: 49Q22; 62Dxx; 62F15

arXiv:2005.09152 [pdf, other]

Lasso formulation of the shortest path problem

Authors: Anqi Dong, Amirhossein Taghvaei, Tryphon T. Georgiou

Abstract: The shortest path problem is formulated as an $l_1$-regularized regression problem, known as lasso. Based on this formulation, a connection is established between Dijkstra's shortest path algorithm and the least angle regression (LARS) for the lasso problem. Specifically, the solution path of the lasso problem, obtained by varying the regularization parameter from infinity to zero (the regularizat… ▽ More The shortest path problem is formulated as an $l_1$-regularized regression problem, known as lasso. Based on this formulation, a connection is established between Dijkstra's shortest path algorithm and the least angle regression (LARS) for the lasso problem. Specifically, the solution path of the lasso problem, obtained by varying the regularization parameter from infinity to zero (the regularization path), corresponds to shortest path trees that appear in the bi-directional Dijkstra algorithm. Although Dijkstra's algorithm and the LARS formulation provide exact solutions, they become impractical when the size of the graph is exceedingly large. To overcome this issue, the alternating direction method of multipliers (ADMM) is proposed to solve the lasso formulation. The resulting algorithm produces good and fast approximations of the shortest path by sacrificing exactness that may not be absolutely essential in many applications. Numerical experiments are provided to illustrate the performance of the proposed approach. △ Less

Submitted 22 May, 2020; v1 submitted 18 May, 2020; originally announced May 2020.

Comments: 17 pages

MSC Class: 05C38 (Primary) 62J07; 68R10; 90C25; 90C06(Secondary)

arXiv:1910.02338 [pdf, other]

An Optimal Transport Formulation of the Ensemble Kalman Filter

Authors: Amirhossein Taghvaei, Prashant G. Mehta

Abstract: Controlled interacting particle systems such as the ensemble Kalman filter (EnKF) and the feedback particle filter (FPF) are numerical algorithms to approximate the solution of the nonlinear filtering problem in continuous time. The distinguishing feature of these algorithms is that the Bayesian update step is implemented using a feedback control law. It has been noted in the literature that the c… ▽ More Controlled interacting particle systems such as the ensemble Kalman filter (EnKF) and the feedback particle filter (FPF) are numerical algorithms to approximate the solution of the nonlinear filtering problem in continuous time. The distinguishing feature of these algorithms is that the Bayesian update step is implemented using a feedback control law. It has been noted in the literature that the control law is not unique. This is the main problem addressed in this paper. To obtain a unique control law, the filtering problem is formulated here as an optimal transportation problem. An explicit formula for the (mean-field type) optimal control law is derived in the linear Gaussian setting. Comparisons are made with the control laws for different types of EnKF algorithms described in the literature. Via empirical approximation of the mean-field control law, a finite-$N$ controlled interacting particle algorithm is obtained. For this algorithm, the equations for empirical mean and covariance are derived and shown to be identical to the Kalman filter. This allows strong conclusions on convergence and error properties based on the classical filter stability theory for the Kalman filter. It is shown that, under certain technical conditions, the mean squared error (m.s.e.) converges to zero even with a finite number of particles. A detailed propagation of chaos analysis is carried out for the finite-$N$ algorithm. The analysis is used to prove weak convergence of the empirical distribution as $N\rightarrow\infty$. For a certain simplified filtering problem, analytical comparison of the m.s.e. with the importance sampling-based algorithms is described. The analysis helps explain the favorable scaling properties of the control-based algorithms reported in several numerical studies in recent literature. △ Less

Submitted 5 October, 2019; originally announced October 2019.

arXiv:1908.10962 [pdf, other]

Optimal transport map** via input convex neural networks

Authors: Ashok Vardhan Makkuva, Amirhossein Taghvaei, Sewoong Oh, Jason D. Lee

Abstract: In this paper, we present a novel and principled approach to learn the optimal transport between two distributions, from samples. Guided by the optimal transport theory, we learn the optimal Kantorovich potential which induces the optimal transport map. This involves learning two convex functions, by solving a novel minimax optimization. Building upon recent advances in the field of input convex n… ▽ More In this paper, we present a novel and principled approach to learn the optimal transport between two distributions, from samples. Guided by the optimal transport theory, we learn the optimal Kantorovich potential which induces the optimal transport map. This involves learning two convex functions, by solving a novel minimax optimization. Building upon recent advances in the field of input convex neural networks, we propose a new framework where the gradient of one convex function represents the optimal transport map**. Numerical experiments confirm that we learn the optimal transport map**. This approach ensures that the transport map** we find is optimal independent of how we initialize the neural networks. Further, target distributions from a discontinuous support can be easily captured, as gradient of a convex function naturally models a {\em discontinuous} transport map**. △ Less

Submitted 17 June, 2020; v1 submitted 28 August, 2019; originally announced August 2019.

arXiv:1902.07197 [pdf, other]

2-Wasserstein Approximation via Restricted Convex Potentials with Application to Improved Training for GANs

Authors: Amirhossein Taghvaei, Amin Jalali

Abstract: We provide a framework to approximate the 2-Wasserstein distance and the optimal transport map, amenable to efficient training as well as statistical and geometric analysis. With the quadratic cost and considering the Kantorovich dual form of the optimal transportation problem, the Brenier theorem states that the optimal potential function is convex and the optimal transport map is the gradient of… ▽ More We provide a framework to approximate the 2-Wasserstein distance and the optimal transport map, amenable to efficient training as well as statistical and geometric analysis. With the quadratic cost and considering the Kantorovich dual form of the optimal transportation problem, the Brenier theorem states that the optimal potential function is convex and the optimal transport map is the gradient of the optimal potential function. Using this geometric structure, we restrict the optimization problem to different parametrized classes of convex functions and pay special attention to the class of input-convex neural networks. We analyze the statistical generalization and the discriminative power of the resulting approximate metric, and we prove a restricted moment-matching property for the approximate optimal map. Finally, we discuss a numerical algorithm to solve the restricted optimization problem and provide numerical experiments to illustrate and compare the proposed approach with the established regularization-based approaches. We further discuss practical implications of our proposal in a modular and interpretable design for GANs which connects the generator training with discriminator computations to allow for learning an overall composite generator. △ Less

Submitted 19 February, 2019; originally announced February 2019.

arXiv:1901.03317 [pdf, other]

Accelerated Flow for Probability Distributions

Authors: Amirhossein Taghvaei, Prashant G. Mehta

Abstract: This paper presents a methodology and numerical algorithms for constructing accelerated gradient flows on the space of probability distributions. In particular, we extend the recent variational formulation of accelerated gradient methods in (wibisono, et. al. 2016) from vector valued variables to probability distributions. The variational problem is modeled as a mean-field optimal control problem.… ▽ More This paper presents a methodology and numerical algorithms for constructing accelerated gradient flows on the space of probability distributions. In particular, we extend the recent variational formulation of accelerated gradient methods in (wibisono, et. al. 2016) from vector valued variables to probability distributions. The variational problem is modeled as a mean-field optimal control problem. The maximum principle of optimal control theory is used to derive Hamilton's equations for the optimal gradient flow. The Hamilton's equation are shown to achieve the accelerated form of density transport from any initial probability distribution to a target probability distribution. A quantitative estimate on the asymptotic convergence rate is provided based on a Lyapunov function construction, when the objective functional is displacement convex. Two numerical approximations are presented to implement the Hamilton's equations as a system of $N$ interacting particles. The continuous limit of the Nesterov's algorithm is shown to be a special case with $N=1$. The algorithm is illustrated with numerical examples. △ Less

Submitted 10 January, 2019; v1 submitted 10 January, 2019; originally announced January 2019.

arXiv:1605.06201

Adversarial Delays in Online Strongly-Convex Optimization

Authors: Daniel Khashabi, Kent Quanrud, Amirhossein Taghvaei

Abstract: We consider the problem of strongly-convex online optimization in presence of adversarial delays; in a T-iteration online game, the feedback of the player's query at time t is arbitrarily delayed by an adversary for d_t rounds and delivered before the game ends, at iteration t+d_t-1. Specifically for \algo{online-gradient-descent} algorithm we show it has a simple regret bound of \Oh{\sum_{t=1}^T… ▽ More We consider the problem of strongly-convex online optimization in presence of adversarial delays; in a T-iteration online game, the feedback of the player's query at time t is arbitrarily delayed by an adversary for d_t rounds and delivered before the game ends, at iteration t+d_t-1. Specifically for \algo{online-gradient-descent} algorithm we show it has a simple regret bound of \Oh{\sum_{t=1}^T \log (1+ \frac{d_t}{t})}. This gives a clear and simple bound without resorting any distributional and limiting assumptions on the delays. We further show how this result encompasses and generalizes several of the existing known results in the literature. Specifically it matches the celebrated logarithmic regret \Oh{\log T} when there are no delays (i.e. d_t = 1) and regret bound of \Oh{τ\log T} for constant delays d_t = τ. △ Less

Submitted 11 September, 2019; v1 submitted 19 May, 2016; originally announced May 2016.

Comments: We discovered mistakes in the proof of proof of Theorem 3.1. The overall is no longer correct, although the claim is still true

Showing 1–10 of 10 results for author: Taghvaei, A