-
A Fast and Accurate Splitting Method for Optimal Transport: Analysis and Implementation
Authors:
Vien V. Mai,
Jacob Lindbäck,
Mikael Johansson
Abstract:
We develop a fast and reliable method for solving large-scale optimal transport (OT) problems at an unprecedented combination of speed and accuracy. Built on the celebrated Douglas-Rachford splitting technique, our method tackles the original OT problem directly instead of solving an approximate regularized problem, as many state-of-the-art techniques do. This allows us to provide sparse transport plans and avoid numerical issues of methods that use entropic regularization. The algorithm has the same cost per iteration as the popular Sinkhorn method, and each iteration can be executed efficiently, in parallel. The proposed method enjoys an iteration complexity $O(1/ε)$ compared to the best-known $O(1/ε^2)$ of the Sinkhorn method. In addition, we establish a linear convergence rate for our formulation of the OT problem. We detail an efficient GPU implementation of the proposed method that maintains a primal-dual stopping criterion at no extra cost. Substantial experiments demonstrate the effectiveness of our method, both in terms of computation times and robustness.
Submitted 22 October, 2021;
originally announced October 2021.
-
Stability and Convergence of Stochastic Gradient Clipping: Beyond Lipschitz Continuity and Smoothness
Authors:
Vien V. Mai,
Mikael Johansson
Abstract:
Stochastic gradient algorithms are often unstable when applied to functions that do not have Lipschitz-continuous and/or bounded gradients. Gradient clipping is a simple and effective technique to stabilize the training process for problems that are prone to the exploding gradient problem. Despite its widespread popularity, the convergence properties of the gradient clipping heuristic are poorly understood, especially for stochastic problems. This paper establishes both qualitative and quantitative convergence results of the clipped stochastic (sub)gradient method (SGD) for non-smooth convex functions with rapidly growing subgradients. Our analyses show that clipping enhances the stability of SGD and that the clipped SGD algorithm enjoys finite convergence rates in many cases. We also study the convergence of a clipped method with momentum, which includes clipped SGD as a special case, for weakly convex problems under standard assumptions. With a novel Lyapunov analysis, we show that the proposed method achieves the best-known rate for the considered class of problems, demonstrating the effectiveness of clipped methods also in this regime. Numerical results confirm our theoretical developments.
Submitted 10 June, 2021; v1 submitted 12 February, 2021;
originally announced February 2021.
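To make the stabilizing effect of clipping concrete, here is a minimal sketch. It is not the algorithm analyzed in the paper: it is deterministic rather than stochastic, and the test function f(x) = x^4, the step size, the threshold, and all function names are illustrative assumptions. It only shows that a fixed step that diverges on a rapidly growing gradient stays bounded once the gradient is clipped.

```python
def f_grad(x):
    """Gradient of the assumed test function f(x) = x**4 (not Lipschitz)."""
    return 4.0 * x ** 3


def clip(g, threshold):
    """Rescale g so that its magnitude never exceeds threshold."""
    norm = abs(g)
    if norm <= threshold:
        return g
    return g * (threshold / norm)


def gradient_descent(x0, step, iters, threshold=None):
    """Run gradient descent, clipped if threshold is given.

    Returns float('inf') as soon as the iterate leaves [-1e6, 1e6],
    which this sketch treats as divergence.
    """
    x = x0
    for _ in range(iters):
        g = f_grad(x)
        if threshold is not None:
            g = clip(g, threshold)
        x = x - step * g
        if abs(x) > 1e6:
            return float("inf")
    return x


# Same starting point and step size, with and without clipping.
x_plain = gradient_descent(10.0, step=0.1, iters=200)
x_clipped = gradient_descent(10.0, step=0.1, iters=200, threshold=1.0)
print(x_plain, x_clipped)
```

With clipping, each update moves the iterate by at most `step * threshold`, so the large initial gradient cannot throw the iterate far from the minimizer at zero.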
-
Convergence of a Stochastic Gradient Method with Momentum for Non-Smooth Non-Convex Optimization
Authors:
Vien V. Mai,
Mikael Johansson
Abstract:
Stochastic gradient methods with momentum are widely used in applications and at the core of optimization subroutines in many popular machine learning libraries. However, their sample complexities have not been obtained for problems beyond those that are convex or smooth. This paper establishes the convergence rate of a stochastic subgradient method with a momentum term of Polyak type for a broad class of non-smooth, non-convex, and constrained optimization problems. Our key innovation is the construction of a special Lyapunov function for which the proven complexity can be achieved without any tuning of the momentum parameter. For smooth problems, we extend the known complexity bound to the constrained case and demonstrate how the unconstrained case can be analyzed under weaker assumptions than the state-of-the-art. Numerical results confirm our theoretical developments.
Submitted 11 February, 2021; v1 submitted 13 February, 2020;
originally announced February 2020.
-
Anderson Acceleration of Proximal Gradient Methods
Authors:
Vien V. Mai,
Mikael Johansson
Abstract:
Anderson acceleration is a well-established and simple technique for speeding up fixed-point computations with countless applications. Previous studies of Anderson acceleration in optimization have only been able to provide convergence guarantees for unconstrained and smooth problems. This work introduces novel methods for adapting Anderson acceleration to (non-smooth and constrained) proximal gradient algorithms. Under some technical conditions, we extend the existing local convergence results of Anderson acceleration for smooth fixed-point mappings to the proposed scheme. We also prove analytically that it is not, in general, possible to guarantee global convergence of native Anderson acceleration. We therefore propose a simple scheme for stabilization that combines the global worst-case guarantees of proximal gradient methods with the local adaptation and practical speed-up of Anderson acceleration.
Submitted 15 June, 2020; v1 submitted 18 October, 2019;
originally announced October 2019.
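As a rough illustration of the basic idea behind Anderson acceleration, the sketch below applies memory-one mixing to a smooth scalar fixed-point problem, x = cos(x). It is not the proximal-gradient scheme or the stabilization proposed in the paper. The function name `anderson_m1`, the tolerance, and the test problem are assumptions made for this example.

```python
import math


def anderson_m1(g, x0, tol=1e-12, max_iter=50):
    """Solve x = g(x) for scalar x using memory-one Anderson mixing.

    Each step combines the two most recent evaluations of g with a
    weight chosen so that the combined residual g(x) - x vanishes.
    """
    g_prev = g(x0)
    r_prev = g_prev - x0
    x = g_prev  # the first step is a plain fixed-point update
    for _ in range(max_iter):
        gx = g(x)
        r = gx - x  # current residual
        if abs(r) < tol:
            return x
        denom = r - r_prev
        if denom == 0.0:
            x_next = gx  # fall back to a plain step if mixing is undefined
        else:
            a = r / denom  # solves (1 - a) * r + a * r_prev = 0
            x_next = (1.0 - a) * gx + a * g_prev
        g_prev, r_prev = gx, r
        x = x_next
    return x


root = anderson_m1(math.cos, 1.0)
print(root, abs(root - math.cos(root)))
```

In the scalar case this mixing rule reduces to a secant step on the residual. The fallback to a plain update is only a guard against division by zero and offers no global convergence guarantee.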
-
Noisy Accelerated Power Method for Eigenproblems with Applications
Authors:
Vien V. Mai,
Mikael Johansson
Abstract:
This paper introduces an efficient algorithm for finding the dominant generalized eigenvectors of a pair of symmetric matrices. Combining tools from approximation theory and convex optimization, we develop a simple scalable algorithm with strong theoretical performance guarantees. More precisely, the algorithm retains the simplicity of the well-known power method but enjoys the asymptotic iteration complexity of the powerful Lanczos method. Unlike these classic techniques, our algorithm is designed to decompose the overall problem into a series of subproblems that only need to be solved approximately. The combination of good initializations, fast iterative solvers, and appropriate error control in solving the subproblems leads to a running time that is linear in the input sizes, compared to the superlinear time of the traditional methods. The improved running time immediately offers acceleration for several applications. As an example, we demonstrate how the proposed algorithm can be used to accelerate canonical correlation analysis, which is a fundamental statistical tool for learning a low-dimensional representation of high-dimensional objects. Numerical experiments on real-world data sets confirm that our approach yields significant improvements over the current state-of-the-art.
Submitted 20 March, 2019;
originally announced March 2019.
-
Curvature-Exploiting Acceleration of Elastic Net Computations
Authors:
Vien V. Mai,
Mikael Johansson
Abstract:
This paper introduces an efficient second-order method for solving the elastic net problem. Its key innovation is a computationally efficient technique for injecting curvature information into the optimization process, which admits a strong theoretical performance guarantee. In particular, we show improved run time over popular first-order methods and quantify the speed-up in terms of statistical measures of the data matrix. The improved time complexity is the result of an extensive exploitation of the problem structure and a careful combination of second-order information, variance reduction techniques, and momentum acceleration. Besides the theoretical speed-up, experimental results demonstrate great practical performance benefits of curvature information, especially for ill-conditioned data sets.
Submitted 24 January, 2019;
originally announced January 2019.
-
Wireless Power Transfer for Distributed Estimation in Sensor Networks
Authors:
Vien V. Mai,
Won-Yong Shin,
Koji Ishibashi
Abstract:
This paper studies power allocation for distributed estimation of an unknown scalar random source in sensor networks with a multiple-antenna fusion center (FC), where wireless sensors are equipped with radio-frequency based energy harvesting technology. The sensors' observations are locally processed using an uncoded amplify-and-forward scheme. The processed signals are then sent to the FC and coherently combined there, where the best linear unbiased estimator (BLUE) is adopted for reliable estimation. We aim to solve the following two power allocation problems: 1) minimizing distortion under various power constraints; and 2) minimizing total transmit power under distortion constraints, where the distortion is measured in terms of mean-squared error of the BLUE. Two iterative algorithms are developed to solve the non-convex problems, which converge at least to a local optimum. In particular, the above algorithms are designed to jointly optimize the amplification coefficients, energy beamforming, and receive filtering. For each problem, a suboptimal design, a single-antenna FC scenario, and a common harvester deployment for colocated sensors are also studied. Using the powerful semidefinite relaxation framework, our result is shown to be valid for any number of sensors, each with different noise power, and for an arbitrary number of antennas at the FC.
Submitted 2 March, 2017;
originally announced March 2017.
-
Opportunistic Network Decoupling With Virtual Full-Duplex Operation in Multi-Source Interfering Relay Networks
Authors:
Won-Yong Shin,
Vien V. Mai,
Bang Chul Jung,
Hyun Jong Yang
Abstract:
We introduce a new achievability scheme, termed opportunistic network decoupling (OND), operating in virtual full-duplex mode. In the scheme, a novel relay scheduling strategy is utilized in the $K\times N\times K$ channel with interfering relays, consisting of $K$ source--destination pairs and $N$ half-duplex relays between them. A subset of relays using alternate relaying is opportunistically selected so as to produce the minimum total interference level, thereby resulting in network decoupling. As our main result, it is shown that under a certain relay scaling condition, the OND protocol achieves $K$ degrees of freedom even in the presence of interfering links among relays. Numerical evaluation is also shown to validate the performance of the proposed OND. Our protocol operates in a fully distributed fashion using only local channel state information, thereby resulting in a relatively easy implementation.
Submitted 24 September, 2016;
originally announced September 2016.