A Fast and Accurate Splitting Method for Optimal Transport: Analysis and Implementation

Authors: Vien V. Mai, Jacob Lindbäck, Mikael Johansson

Abstract: We develop a fast and reliable method for solving large-scale optimal transport (OT) problems at an unprecedented combination of speed and accuracy. Built on the celebrated Douglas-Rachford splitting technique, our method tackles the original OT problem directly instead of solving an approximate regularized problem, as many state-of-the-art techniques do. This allows us to provide sparse transport plans and avoid numerical issues of methods that use entropic regularization. The algorithm has the same cost per iteration as the popular Sinkhorn method, and each iteration can be executed efficiently, in parallel. The proposed method enjoys an iteration complexity $O(1/ε)$ compared to the best-known $O(1/ε^2)$ of the Sinkhorn method. In addition, we establish a linear convergence rate for our formulation of the OT problem. We detail an efficient GPU implementation of the proposed method that maintains a primal-dual stopping criterion at no extra cost. Substantial experiments demonstrate the effectiveness of our method, both in terms of computation times and robustness.

Submitted 22 October, 2021; originally announced October 2021.

Comments: 24 pages, 4 figures

arXiv:2102.06489 [pdf, other]

Stability and Convergence of Stochastic Gradient Clipping: Beyond Lipschitz Continuity and Smoothness

Authors: Vien V. Mai, Mikael Johansson

Abstract: Stochastic gradient algorithms are often unstable when applied to functions that do not have Lipschitz-continuous and/or bounded gradients. Gradient clipping is a simple and effective technique to stabilize the training process for problems that are prone to the exploding gradient problem. Despite its widespread popularity, the convergence properties of the gradient clipping heuristic are poorly understood, especially for stochastic problems. This paper establishes both qualitative and quantitative convergence results of the clipped stochastic (sub)gradient method (SGD) for non-smooth convex functions with rapidly growing subgradients. Our analyses show that clipping enhances the stability of SGD and that the clipped SGD algorithm enjoys finite convergence rates in many cases. We also study the convergence of a clipped method with momentum, which includes clipped SGD as a special case, for weakly convex problems under standard assumptions. With a novel Lyapunov analysis, we show that the proposed method achieves the best-known rate for the considered class of problems, demonstrating the effectiveness of clipped methods also in this regime. Numerical results confirm our theoretical developments.
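As a rough illustration of the clipping idea described in this abstract, the sketch below applies the standard norm-clipping operator before a (sub)gradient step. The function names, the test function f(x) = x^4, the step size, and the threshold are all illustrative assumptions; the paper's exact operator and parameter choices may differ.

```python
import math

def clip(g, threshold):
    """Rescale g so that its Euclidean norm is at most `threshold`."""
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm <= threshold:
        return list(g)
    scale = threshold / norm
    return [scale * gi for gi in g]

def clipped_subgradient_step(x, g, step_size, threshold):
    """One step x - step_size * clip(g); names are hypothetical."""
    d = clip(g, threshold)
    return [xi - step_size * di for xi, di in zip(x, d)]

# Illustrative problem (an assumption, not from the paper):
# f(x) = x^4, whose gradient 4x^3 grows rapidly away from the minimizer.
# Without clipping, the first step would move from 10 to 10 - 0.05 * 4000 = -190.
x = [10.0]
for _ in range(200):
    g = [4.0 * x[0] ** 3]
    x = clipped_subgradient_step(x, g, step_size=0.05, threshold=1.0)
```

With clipping, each step moves the iterate by at most step_size * threshold, so the iterate approaches the minimizer at 0 instead of overshooting.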

Submitted 10 June, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

Comments: ICML-2021

arXiv:2002.05466 [pdf, other]

Convergence of a Stochastic Gradient Method with Momentum for Non-Smooth Non-Convex Optimization

Authors: Vien V. Mai, Mikael Johansson

Abstract: Stochastic gradient methods with momentum are widely used in applications and at the core of optimization subroutines in many popular machine learning libraries. However, their sample complexities have not been obtained for problems beyond those that are convex or smooth. This paper establishes the convergence rate of a stochastic subgradient method with a momentum term of Polyak type for a broad class of non-smooth, non-convex, and constrained optimization problems. Our key innovation is the construction of a special Lyapunov function for which the proven complexity can be achieved without any tuning of the momentum parameter. For smooth problems, we extend the known complexity bound to the constrained case and demonstrate how the unconstrained case can be analyzed under weaker assumptions than the state-of-the-art. Numerical results confirm our theoretical developments.
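For readers unfamiliar with Polyak-type momentum, the sketch below shows the classical heavy-ball update x_{k+1} = x_k - α g_k + β (x_k - x_{k-1}) in its unconstrained, deterministic form. The function name, the quadratic test problem, and the parameter values are assumptions made for illustration; the paper itself treats stochastic, non-smooth, and constrained problems, which this sketch does not cover.

```python
def heavy_ball_step(x, x_prev, g, step_size, momentum):
    """Classical heavy-ball update: gradient step plus a multiple of the last displacement."""
    return [
        xi - step_size * gi + momentum * (xi - xpi)
        for xi, xpi, gi in zip(x, x_prev, g)
    ]

# Illustrative problem (an assumption): f(x) = 0.5 * x^2, so the gradient is x.
x_prev = [5.0]
x = [5.0]
for _ in range(200):
    g = list(x)  # gradient of 0.5 * x^2 at x
    x, x_prev = heavy_ball_step(x, x_prev, g, step_size=0.1, momentum=0.5), x
```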

Submitted 11 February, 2021; v1 submitted 13 February, 2020; originally announced February 2020.

Comments: ICML-2020

arXiv:1910.08590 [pdf, other]

Anderson Acceleration of Proximal Gradient Methods

Authors: Vien V. Mai, Mikael Johansson

Abstract: Anderson acceleration is a well-established and simple technique for speeding up fixed-point computations with countless applications. Previous studies of Anderson acceleration in optimization have only been able to provide convergence guarantees for unconstrained and smooth problems. This work introduces novel methods for adapting Anderson acceleration to (non-smooth and constrained) proximal gradient algorithms. Under some technical conditions, we extend the existing local convergence results of Anderson acceleration for smooth fixed-point mappings to the proposed scheme. We also prove analytically that it is not, in general, possible to guarantee global convergence of native Anderson acceleration. We therefore propose a simple scheme for stabilization that combines the global worst-case guarantees of proximal gradient methods with the local adaptation and practical speed-up of Anderson acceleration.

Submitted 15 June, 2020; v1 submitted 18 October, 2019; originally announced October 2019.

Comments: 25 pages, 7 figures

arXiv:1903.08742 [pdf, ps, other]

doi 10.1109/TSP.2019.2908126

Noisy Accelerated Power Method for Eigenproblems with Applications

Authors: Vien V. Mai, Mikael Johansson

Abstract: This paper introduces an efficient algorithm for finding the dominant generalized eigenvectors of a pair of symmetric matrices. Combining tools from approximation theory and convex optimization, we develop a simple scalable algorithm with strong theoretical performance guarantees. More precisely, the algorithm retains the simplicity of the well-known power method but enjoys the asymptotic iteration complexity of the powerful Lanczos method. Unlike these classic techniques, our algorithm is designed to decompose the overall problem into a series of subproblems that only need to be solved approximately. The combination of good initializations, fast iterative solvers, and appropriate error control in solving the subproblems leads to a linear running time in the input sizes compared to the superlinear time for the traditional methods. The improved running time immediately offers acceleration for several applications. As an example, we demonstrate how the proposed algorithm can be used to accelerate canonical correlation analysis, which is a fundamental statistical tool for learning of a low-dimensional representation of high-dimensional objects. Numerical experiments on real-world data sets confirm that our approach yields significant improvements over the current state-of-the-art.
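The abstract takes the classical power method as its baseline. A minimal sketch of that baseline for the standard (not generalized) symmetric eigenproblem is given below; it is not the noisy accelerated algorithm proposed in the paper. The function names, the starting vector, and the 2x2 example matrix are assumptions chosen for illustration.

```python
import math

def matvec(A, v):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def power_method(A, num_iters=100):
    """Plain power iteration: repeatedly apply A and renormalize."""
    n = len(A)
    v = [1.0 / math.sqrt(n)] * n  # arbitrary unit-norm starting vector
    for _ in range(num_iters):
        w = matvec(A, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v estimates the dominant eigenvalue (v has unit norm).
    eigval = sum(x * y for x, y in zip(v, matvec(A, v)))
    return eigval, v

# Illustrative symmetric matrix (an assumption); its eigenvalues are (7 ± sqrt(5)) / 2.
A = [[4.0, 1.0], [1.0, 3.0]]
eigval, eigvec = power_method(A)
```

The convergence speed of this baseline depends on the ratio between the two largest eigenvalue magnitudes, which is the kind of dependence that accelerated schemes aim to improve.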

Submitted 20 March, 2019; originally announced March 2019.

Comments: Accepted for publication in the IEEE Transactions on Signal Processing

arXiv:1901.08523 [pdf, ps, other]

Curvature-Exploiting Acceleration of Elastic Net Computations

Authors: Vien V. Mai, Mikael Johansson

Abstract: This paper introduces an efficient second-order method for solving the elastic net problem. Its key innovation is a computationally efficient technique for injecting curvature information in the optimization process which admits a strong theoretical performance guarantee. In particular, we show improved run time over popular first-order methods and quantify the speed-up in terms of statistical measures of the data matrix. The improved time complexity is the result of an extensive exploitation of the problem structure and a careful combination of second-order information, variance reduction techniques, and momentum acceleration. Besides theoretical speed-up, experimental results demonstrate great practical performance benefits of curvature information, especially for ill-conditioned data sets.

Submitted 24 January, 2019; originally announced January 2019.

Comments: 34 pages, 2 figures

arXiv:1703.00714 [pdf, ps, other]

doi 10.1109/JSTSP.2017.2678106

Wireless Power Transfer for Distributed Estimation in Sensor Networks

Authors: Vien V. Mai, Won-Yong Shin, Koji Ishibashi

Abstract: This paper studies power allocation for distributed estimation of an unknown scalar random source in sensor networks with a multiple-antenna fusion center (FC), where wireless sensors are equipped with radio-frequency based energy harvesting technology. The sensors' observation is locally processed by using an uncoded amplify-and-forward scheme. The processed signals are then sent to the FC, and are coherently combined at the FC, at which the best linear unbiased estimator (BLUE) is adopted for reliable estimation. We aim to solve the following two power allocation problems: 1) minimizing distortion under various power constraints; and 2) minimizing total transmit power under distortion constraints, where the distortion is measured in terms of mean-squared error of the BLUE. Two iterative algorithms are developed to solve the non-convex problems, which converge at least to a local optimum. In particular, the above algorithms are designed to jointly optimize the amplification coefficients, energy beamforming, and receive filtering. For each problem, a suboptimal design, a single-antenna FC scenario, and a common harvester deployment for colocated sensors, are also studied. Using the powerful semidefinite relaxation framework, our result is shown to be valid for any number of sensors, each with different noise power, and for an arbitrary number of antennas at the FC.

Submitted 2 March, 2017; originally announced March 2017.

Comments: 24 pages, 6 figures, To appear in IEEE Journal of Selected Topics in Signal Processing

arXiv:1609.07589 [pdf, ps, other]

doi 10.1109/TMC.2016.2614979

Opportunistic Network Decoupling With Virtual Full-Duplex Operation in Multi-Source Interfering Relay Networks

Authors: Won-Yong Shin, Vien V. Mai, Bang Chul Jung, Hyun Jong Yang

Abstract: We introduce a new achievability scheme, termed opportunistic network decoupling (OND), operating in virtual full-duplex mode. In the scheme, a novel relay scheduling strategy is utilized in the $K\times N\times K$ channel with interfering relays, consisting of $K$ source--destination pairs and $N$ half-duplex relays in-between them. A subset of relays using alternate relaying is opportunistically selected in terms of producing the minimum total interference level, thereby resulting in network decoupling. As our main result, it is shown that under a certain relay scaling condition, the OND protocol achieves $K$ degrees of freedom even in the presence of interfering links among relays. Numerical evaluation is also shown to validate the performance of the proposed OND. Our protocol basically operates in a fully distributed fashion along with local channel state information, thereby resulting in a relatively easy implementation.

Submitted 24 September, 2016; originally announced September 2016.

Comments: 22 pages, 5 figures, To appear in IEEE Transactions on Mobile Computing

Showing 1–8 of 8 results for author: Mai, V V