-
Practical Acceleration of the Condat-Vũ Algorithm
Authors:
Derek Driggs,
Matthias J. Ehrhardt,
Carola-Bibiane Schönlieb,
Junqi Tang
Abstract:
The Condat-Vũ algorithm is a widely used primal-dual method for optimizing composite objectives of three functions. Several algorithms for optimizing composite objectives of two functions are special cases of Condat-Vũ, including proximal gradient descent (PGD). It is well known that PGD exhibits suboptimal performance: a simple adjustment to PGD can accelerate its convergence rate from $\mathcal{O}(1/T)$ to $\mathcal{O}(1/T^2)$ on convex objectives, and this accelerated rate is optimal. In this work, we show that a simple adjustment to the Condat-Vũ algorithm allows it to recover accelerated PGD (APGD) as a special case, instead of PGD. We prove that this accelerated Condat-Vũ algorithm achieves optimal convergence rates and significantly outperforms the traditional Condat-Vũ algorithm in regimes where the Condat-Vũ algorithm approximates the dynamics of PGD. We demonstrate the effectiveness of our approach in various applications in machine learning and computational imaging.
Submitted 25 March, 2024;
originally announced March 2024.
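For readers unfamiliar with the PGD and APGD baselines named in the abstract above, the following is a minimal sketch of both on a lasso problem. It is not the accelerated Condat-Vũ algorithm from the paper; the function names, the lasso objective, and the step size 1/L are assumptions chosen for illustration, and the momentum rule is the standard FISTA-style one.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def lasso_objective(A, b, lam, x):
    """0.5 * ||Ax - b||^2 + lam * ||x||_1 (illustrative objective)."""
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))


def pgd(A, b, lam, n_iters):
    """Proximal gradient descent with step 1/L, O(1/T) rate on convex problems."""
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        x = soft_threshold(x - A.T @ (A @ x - b) / L, lam / L)
    return x


def apgd(A, b, lam, n_iters):
    """Accelerated PGD (FISTA-style momentum), O(1/T^2) rate on convex problems."""
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    y = x.copy()  # extrapolated point at which the gradient is evaluated
    t = 1.0
    for _ in range(n_iters):
        x_next = soft_threshold(y - A.T @ (A @ y - b) / L, lam / L)
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)
        x, t = x_next, t_next
    return x
```

The only difference between the two loops is the extrapolation step that forms `y`, which is the "simple adjustment" to PGD that the abstract refers to.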
-
SPRING: A fast stochastic proximal alternating method for non-smooth non-convex optimization
Authors:
Derek Driggs,
Junqi Tang,
Jingwei Liang,
Mike Davies,
Carola-Bibiane Schönlieb
Abstract:
We introduce SPRING, a novel stochastic proximal alternating linearized minimization algorithm for solving a class of non-smooth and non-convex optimization problems. Large-scale imaging problems are becoming increasingly prevalent due to advances in data acquisition and computational capabilities. Motivated by the success of stochastic optimization methods, we propose a stochastic variant of the proximal alternating linearized minimization (PALM) algorithm \cite{bolte2014proximal}. We provide global convergence guarantees, demonstrating that our proposed method with variance-reduced stochastic gradient estimators, such as SAGA \cite{SAGA} and SARAH \cite{sarah}, achieves state-of-the-art oracle complexities. We also demonstrate the efficacy of our algorithm via several numerical examples including sparse non-negative matrix factorization, sparse principal component analysis, and blind image deconvolution.
Submitted 19 January, 2021; v1 submitted 27 February, 2020;
originally announced February 2020.
-
Accelerating Variance-Reduced Stochastic Gradient Methods
Authors:
Derek Driggs,
Matthias J. Ehrhardt,
Carola-Bibiane Schönlieb
Abstract:
Variance reduction is a crucial tool for improving the slow convergence of stochastic gradient descent. Only a few variance-reduced methods, however, have yet been shown to directly benefit from Nesterov's acceleration techniques to match the convergence rates of accelerated gradient methods. Such approaches rely on "negative momentum", a technique for further variance reduction that is generally specific to the SVRG gradient estimator. In this work, we show that negative momentum is unnecessary for acceleration and develop a universal acceleration framework that allows all popular variance-reduced methods to achieve accelerated convergence rates. The constants appearing in these rates, including their dependence on the number of functions $n$, scale with the mean-squared-error and bias of the gradient estimator. In a series of numerical experiments, we demonstrate that versions of SAGA, SVRG, SARAH, and SARGE using our framework significantly outperform non-accelerated versions and compare favourably with algorithms using negative momentum.
Submitted 29 October, 2020; v1 submitted 21 October, 2019;
originally announced October 2019.
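As background for the variance-reduced estimators named in the abstract above, here is a minimal sketch of the SVRG gradient estimator on a least-squares finite sum. It does not implement the paper's acceleration framework; the function names and the least-squares components $f_i(x) = \tfrac{1}{2}(a_i^\top x - b_i)^2$ are assumptions made for illustration.

```python
import numpy as np


def component_grad(A, b, x, i):
    """Gradient of f_i(x) = 0.5 * (a_i^T x - b_i)^2 (illustrative component)."""
    return A[i] * (A[i] @ x - b[i])


def full_grad(A, b, x):
    """Gradient of the average (1/n) * sum_i f_i(x)."""
    return A.T @ (A @ x - b) / A.shape[0]


def svrg_estimate(A, b, x, x_snapshot, snapshot_grad, i):
    """SVRG estimator: component gradient at x, corrected by the snapshot.

    snapshot_grad must equal full_grad(A, b, x_snapshot). Averaged over a
    uniformly sampled index i, the estimate equals full_grad(A, b, x), and
    its variance shrinks as x and x_snapshot approach the same point.
    """
    return component_grad(A, b, x, i) - component_grad(A, b, x_snapshot, i) + snapshot_grad
```

The estimator is unbiased by construction; biased estimators such as SARAH replace the fixed snapshot with a recursive update.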
-
On Biased Stochastic Gradient Estimation
Authors:
Derek Driggs,
Jingwei Liang,
Carola-Bibiane Schönlieb
Abstract:
We present a uniform analysis of biased stochastic gradient methods for minimizing convex, strongly convex, and non-convex composite objectives, and identify settings where bias is useful in stochastic gradient estimation. The framework we present allows us to extend proximal support to biased algorithms, including SAG and SARAH, for the first time in the convex setting. We also use our framework to develop a new algorithm, Stochastic Average Recursive GradiEnt (SARGE), that achieves the oracle complexity lower-bound for non-convex, finite-sum objectives and requires strictly fewer calls to a stochastic gradient oracle per iteration than SVRG and SARAH. We support our theoretical results with numerical experiments that demonstrate the benefits of certain biased gradient estimators.
Submitted 27 February, 2020; v1 submitted 3 June, 2019;
originally announced June 2019.
-
Tensor Robust Principal Component Analysis: Better recovery with atomic norm regularization
Authors:
Derek Driggs,
Stephen Becker,
Jordan Boyd-Graber
Abstract:
This paper studies tensor-based Robust Principal Component Analysis (RPCA) using atomic-norm regularization. Given the superposition of a sparse and a low-rank tensor, we present conditions under which it is possible to exactly recover the sparse and low-rank components. Our results improve on existing performance guarantees for tensor-RPCA, including those for matrix RPCA. Our guarantees also show that atomic-norm regularization provides better recovery for tensor-structured data sets than other approaches based on matricization.
In addition to these performance guarantees, we study a nonconvex formulation of the tensor atomic-norm and identify a class of local minima of this nonconvex program that are globally optimal. We demonstrate the strong performance of our approach in numerical experiments, where we show that our nonconvex model reliably recovers tensors with ranks larger than all of their side lengths, significantly outperforming other algorithms that require matricization.
Submitted 30 January, 2019;
originally announced January 2019.
-
Adapting Regularized Low Rank Models for Parallel Architectures
Authors:
Derek Driggs,
Stephen Becker,
Aleksandr Aravkin
Abstract:
We introduce a reformulation of regularized low-rank recovery models to take advantage of GPU, multiple CPU, and hybridized architectures. Low-rank recovery often involves nuclear-norm minimization through iterative thresholding of singular values. These models are slow to fit and difficult to parallelize because of their dependence on computing a singular value decomposition at each iteration. Regularized low-rank recovery models also incorporate non-smooth terms to separate structured components (e.g. sparse outliers) from the low-rank component, making these problems more difficult.
Using Burer-Monteiro splitting and marginalization, we develop a smooth, non-convex formulation of regularized low-rank recovery models that can be fit with first-order solvers. We develop a computable certificate of convergence for this non-convex program, and use it to establish bounds on the suboptimality of any point. Using robust principal component analysis (RPCA) as an example, we include numerical experiments showing that this approach is an order of magnitude faster than existing RPCA solvers on the GPU. We also show that this acceleration allows new applications for RPCA, including real-time background subtraction and MR image analysis.
Submitted 4 October, 2017; v1 submitted 7 February, 2017;
originally announced February 2017.
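The "iterative thresholding of singular values" mentioned in the abstract above can be sketched as the proximal operator of the nuclear norm. This is the standard baseline step whose per-iteration SVD the paper aims to avoid, not the authors' Burer-Monteiro reformulation; the function name is an assumption made for illustration.

```python
import numpy as np


def singular_value_threshold(X, tau):
    """Proximal operator of tau * ||.||_* : shrink each singular value by tau."""
    # A full SVD is required on every call, which is the bottleneck that
    # makes nuclear-norm methods slow and hard to parallelize.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt
```

Singular values below `tau` are set to zero, so the output typically has lower rank than the input.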