arXiv:2406.18282 [pdf, ps, other]

Frank-Wolfe meets Shapley-Folkman: a systematic approach for solving nonconvex separable problems with linear constraints

Authors: Benjamin Dubois-Taine, Alexandre d'Aspremont

Abstract: We consider separable nonconvex optimization problems under affine constraints. For these problems, the Shapley-Folkman theorem provides an upper bound on the duality gap as a function of the nonconvexity of the objective functions, but does not provide a systematic way to construct primal solutions satisfying that bound. In this work, we develop a two-stage approach to do so. The first stage approximates the optimal dual value with a large set of primal feasible solutions. In the second stage, this set is trimmed down to a primal solution by computing (approximate) Caratheodory representations. The main computational requirement of our method is tractability of the Fenchel conjugates of the component functions and their (sub)gradients. When the function domains are convex, the method recovers the classical duality gap bounds obtained via Shapley-Folkman. When the function domains are nonconvex, the method also recovers classical duality gap bounds from the literature, based on a more general notion of nonconvexity.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2401.09961 [pdf, other]

Iteratively Reweighted Least Squares for Phase Unwrapping

Authors: Benjamin Dubois-Taine, Roland Akiki, Alexandre d'Aspremont

Abstract: The 2D phase unwrapping problem seeks to recover a phase image from its observation modulo 2$π$, and is a crucial step in a variety of imaging applications. In particular, it is one of the most time-consuming steps in the interferometric synthetic aperture radar (InSAR) pipeline. In this work we tackle the $L^1$-norm phase unwrapping problem. In optimization terms, this is a simple sparsity-inducing problem, albeit in very large dimension. To solve this high-dimensional problem, we iteratively solve a series of numerically simpler weighted least squares problems, which are themselves solved using a preconditioned conjugate gradient method. Our algorithm guarantees a sublinear rate of convergence in function values, is simple to implement and can easily be ported to GPUs, where it significantly outperforms state of the art phase unwrapping methods.

Submitted 18 January, 2024; originally announced January 2024.
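The reweighting idea in this abstract can be illustrated on a toy problem. The following is a minimal sketch only, not the authors' algorithm: it minimizes $\|Ax-b\|_1$ for a small dense system, solves each weighted least squares step with a direct solver instead of preconditioned conjugate gradient, and does not model the phase unwrapping operator. The function name `irls_l1` and the smoothing parameter `eps` are assumptions introduced for illustration.

```python
import numpy as np

def irls_l1(A, b, iters=50, eps=1e-6):
    """Approximately minimize ||A x - b||_1 by iteratively reweighted least squares.

    Hypothetical helper for illustration; eps smooths the weights 1/|r_i|.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    w = np.ones(A.shape[0])  # unit weights: the first pass is ordinary least squares
    for _ in range(iters):
        Aw = A * w[:, None]                       # rows of A scaled by current weights
        x = np.linalg.solve(A.T @ Aw, Aw.T @ b)   # weighted normal equations
        r = A @ x - b                             # residuals at the new iterate
        w = 1.0 / np.sqrt(r**2 + eps**2)          # smoothed 1/|r_i| weights
    return x

# Toy data: fitting a constant to observations containing one outlier.
A = np.ones((5, 1))
b = np.array([0.0, 0.0, 0.0, 0.0, 100.0])
x_l1 = irls_l1(A, b)                           # close to the median of b
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]    # the mean of b
```

On this toy data the least squares fit is pulled toward the outlier, while the reweighted iterates move toward the median, which is the $L^1$ minimizer.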

arXiv:2312.03583 [pdf, ps, other]

Strong Convexity of Sets in Riemannian Manifolds

Authors: Damien Scieur, Thomas Kerdreux, Martínez-Rubio, Alexandre d'Aspremont, Sebastian Pokutta

Abstract: Convex curvature properties are important in designing and analyzing convex optimization algorithms in the Hilbertian or Riemannian settings. In the case of the Hilbertian setting, strongly convex sets are well studied. Herein, we propose various definitions of strong convexity for uniquely geodesic sets in a Riemannian manifold. We study their relationship, propose tools to determine the geodesic strongly convex nature of sets, and analyze the convergence of optimization algorithms over those sets. In particular, we demonstrate that the Riemannian Frank-Wolfe algorithm enjoys a global linear convergence rate when the Riemannian scaling inequalities hold.

Submitted 6 February, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

arXiv:2306.17470 [pdf, other]

An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue Optimization Problems

Authors: Clément Lezane, Cristóbal Guzmán, Alexandre d'Aspremont

Abstract: In this work, we revisit the problem of solving large-scale semidefinite programs using randomized first-order methods and stochastic smoothing. We introduce two oblivious stochastic mirror descent algorithms based on a complementary composite setting. One algorithm is designed for non-smooth objectives, while an accelerated version is tailored for smooth objectives. Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function. For the non-smooth case with $\mathcal{M}-$bounded oracles, we prove a convergence rate of $ O( {\mathcal{M}}/{\sqrt{T}} ) $. For the $L$-smooth case with a feasible set bounded by $D$, we derive a convergence rate of $ O( {L^2 D^2}/{(T^{2}\sqrt{T})} + {(D_0^2+σ^2)}/{\sqrt{T}} )$, where $D_0$ is the starting distance to an optimal solution, and $ σ^2$ is the stochastic oracle variance. These rates had only been obtained so far by either assuming prior knowledge of the Lipschitz constant or the starting distance to an optimal solution. We further show how to extend our framework to relative scale and demonstrate the efficiency and robustness of our methods on large scale semidefinite programs.

Submitted 30 June, 2023; originally announced June 2023.

arXiv:2211.01758 [pdf, other]

Optimal Algorithms for Stochastic Complementary Composite Minimization

Authors: Alexandre d'Aspremont, Cristóbal Guzmán, Clément Lezane

Abstract: Inspired by regularization techniques in statistics and machine learning, we study complementary composite minimization in the stochastic setting. This problem corresponds to the minimization of the sum of a (weakly) smooth function endowed with a stochastic first-order oracle, and a structured uniformly convex (possibly nonsmooth and non-Lipschitz) regularization term. Despite intensive work on closely related settings, prior to our work no complexity bounds for this problem were known. We close this gap by providing novel excess risk bounds, both in expectation and with high probability. Our algorithms are nearly optimal, which we prove via novel lower complexity bounds for this class of problems. We conclude by providing numerical results comparing our methods to the state of the art.

Submitted 23 January, 2024; v1 submitted 3 November, 2022; originally announced November 2022.

arXiv:2103.05907 [pdf, ps, other]

Linear Bandits on Uniformly Convex Sets

Authors: Thomas Kerdreux, Christophe Roux, Alexandre d'Aspremont, Sebastian Pokutta

Abstract: Linear bandit algorithms yield $\tilde{\mathcal{O}}(n\sqrt{T})$ pseudo-regret bounds on compact convex action sets $\mathcal{K}\subset\mathbb{R}^n$ and two types of structural assumptions lead to better pseudo-regret bounds. When $\mathcal{K}$ is the simplex or an $\ell_p$ ball with $p\in]1,2]$, there exist bandit algorithms with $\tilde{\mathcal{O}}(\sqrt{nT})$ pseudo-regret bounds. Here, we derive bandit algorithms for some strongly convex sets beyond $\ell_p$ balls that enjoy pseudo-regret bounds of $\tilde{\mathcal{O}}(\sqrt{nT})$, which answers an open question from [BCB12, §5.5.]. Interestingly, when the action set is uniformly convex but not necessarily strongly convex, we obtain pseudo-regret bounds with a dimension dependency smaller than $\mathcal{O}(\sqrt{n})$. However, this comes at the expense of asymptotic rates in $T$ varying between $\tilde{\mathcal{O}}(\sqrt{T})$ and $\tilde{\mathcal{O}}(T)$.

Submitted 10 March, 2021; originally announced March 2021.

arXiv:2102.06742 [pdf, other]

Approximation Bounds for Sparse Programs

Authors: Armin Askari, Alexandre d'Aspremont, Laurent El Ghaoui

Abstract: We show that sparsity constrained optimization problems over low dimensional spaces tend to have a small duality gap. We use the Shapley-Folkman theorem to derive both data-driven bounds on the duality gap, and an efficient primalization procedure to recover feasible points satisfying these bounds. These error bounds are proportional to the rate of growth of the objective with the target cardinality, which means in particular that the relaxation is nearly tight as soon as the target cardinality is large enough so that only uninformative features are added.

Submitted 12 February, 2021; originally announced February 2021.

arXiv:2102.05134 [pdf, ps, other]

Local and Global Uniform Convexity Conditions

Authors: Thomas Kerdreux, Alexandre d'Aspremont, Sebastian Pokutta

Abstract: We review various characterizations of uniform convexity and smoothness on norm balls in finite-dimensional spaces and connect results stemming from the geometry of Banach spaces with \textit{scaling inequalities} used in analysing the convergence of optimization methods. In particular, we establish local versions of these conditions to provide sharper insights on a recent body of complexity results in learning theory, online learning, or offline optimization, which rely on the strong convexity of the feasible set. While they have a significant impact on complexity, these strong convexity or uniform convexity properties of feasible sets are not exploited as thoroughly as their functional counterparts, and this work is an effort to correct this imbalance. We conclude with some practical examples in optimization and machine learning where leveraging these conditions and localized assumptions lead to new complexity results.

Submitted 18 February, 2021; v1 submitted 9 February, 2021; originally announced February 2021.

arXiv:2101.09545 [pdf, ps, other]

doi 10.1561/2400000036

Acceleration Methods

Authors: Alexandre d'Aspremont, Damien Scieur, Adrien Taylor

Abstract: This monograph covers some recent advances in a range of acceleration techniques frequently used in convex optimization. We first use quadratic optimization problems to introduce two key families of methods, namely momentum and nested optimization schemes. They coincide in the quadratic case to form the Chebyshev method. We discuss momentum methods in detail, starting with the seminal work of Nesterov and structure convergence proofs using a few master templates, such as that for optimized gradient methods, which provide the key benefit of showing how momentum methods optimize convergence guarantees. We further cover proximal acceleration, at the heart of the Catalyst and Accelerated Hybrid Proximal Extragradient frameworks, using similar algorithmic patterns. Common acceleration techniques rely directly on the knowledge of some of the regularity parameters in the problem at hand. We conclude by discussing restart schemes, a set of simple techniques for reaching nearly optimal convergence rates while adapting to unobserved regularity parameters.

Submitted 21 December, 2021; v1 submitted 23 January, 2021; originally announced January 2021.

Comments: Published in Foundations and Trends in Optimization (see https://www.nowpublishers.com/article/Details/OPT-036)

Journal ref: Foundations and Trends in Optimization: Vol. 5: No. 1-2, pp 1-245 (2021)

arXiv:2010.15482 [pdf, other]

Convergence of Constrained Anderson Acceleration

Authors: Mathieu Barré, Adrien Taylor, Alexandre d'Aspremont

Abstract: We prove non-asymptotic linear convergence rates for the constrained Anderson acceleration extrapolation scheme. These guarantees come from new upper bounds on the constrained Chebyshev problem, which consists in minimizing the maximum absolute value of a polynomial on a bounded real interval with $l_1$ constraints on its coefficient vector. Constrained Anderson Acceleration has a numerical cost comparable to that of the original scheme.

Submitted 29 October, 2020; originally announced October 2020.

arXiv:2010.02762 [pdf, other]

Averaging Atmospheric Gas Concentration Data using Wasserstein Barycenters

Authors: Mathieu Barré, Clément Giron, Matthieu Mazzolini, Alexandre d'Aspremont

Abstract: Hyperspectral satellite images report greenhouse gas concentrations worldwide on a daily basis. While taking simple averages of these images over time produces a rough estimate of relative emission rates, atmospheric transport means that simple averages fail to pinpoint the source of these emissions. We propose using Wasserstein barycenters coupled with weather data to average gas concentration data sets and better concentrate the mass around significant sources.

Submitted 6 October, 2020; originally announced October 2020.

arXiv:2004.11053 [pdf, other]

Projection-Free Optimization on Uniformly Convex Sets

Authors: Thomas Kerdreux, Alexandre d'Aspremont, Sebastian Pokutta

Abstract: The Frank-Wolfe method solves smooth constrained convex optimization problems at a generic sublinear rate of $\mathcal{O}(1/T)$, and it (or its variants) enjoys accelerated convergence rates for two fundamental classes of constraints: polytopes and strongly-convex sets. Uniformly convex sets non-trivially subsume strongly convex sets and form a large variety of \textit{curved} convex sets commonly encountered in machine learning and signal processing. For instance, the $\ell_p$-balls are uniformly convex for all $p > 1$, but strongly convex for $p\in]1,2]$ only. We show that these sets systematically induce accelerated convergence rates for the original Frank-Wolfe algorithm, which continuously interpolate between known rates. Our accelerated convergence rates emphasize that it is the curvature of the constraint sets -- not just their strong convexity -- that leads to accelerated convergence rates. These results also importantly highlight that the Frank-Wolfe algorithm is adaptive to much more generic constraint set structures, thus explaining faster empirical convergence. Finally, we also show accelerated convergence rates when the set is only locally uniformly convex and provide similar results in online linear optimization.

Submitted 16 June, 2020; v1 submitted 23 April, 2020; originally announced April 2020.
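For readers unfamiliar with the method discussed in this entry, here is a minimal sketch of the vanilla Frank-Wolfe iteration on the unit $\ell_2$ ball, a strongly convex set. It is an illustration under stated assumptions, not the analysis or code of the paper: the function name `frank_wolfe_l2_ball`, the test objective $f(x) = \tfrac12\|x-b\|^2$, and the open-loop step size $2/(t+2)$ are choices made here.

```python
import numpy as np

def frank_wolfe_l2_ball(grad, x0, iters=1000):
    """Vanilla Frank-Wolfe over the unit l2 ball with step size 2/(t+2).

    Hypothetical helper for illustration; grad returns the gradient of f at x.
    """
    x = np.asarray(x0, dtype=float)
    for t in range(iters):
        g = grad(x)
        gnorm = np.linalg.norm(g)
        if gnorm == 0.0:           # stationary point: nothing left to do
            break
        s = -g / gnorm             # linear minimization oracle over the unit l2 ball
        gamma = 2.0 / (t + 2.0)    # open-loop step size, no line search
        x = (1.0 - gamma) * x + gamma * s   # convex combination stays feasible
    return x

# Toy problem: project b onto the unit ball, with b outside the ball.
b = np.array([2.0, 1.0])
f = lambda x: 0.5 * np.sum((x - b) ** 2)
grad = lambda x: x - b
x_fw = frank_wolfe_l2_ball(grad, x0=[0.0, -1.0])
f_star = 0.5 * (np.linalg.norm(b) - 1.0) ** 2   # optimum is attained at b / ||b||
```

No projection is ever computed: each step only needs the minimizer of a linear function over the ball, which is available in closed form here.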

arXiv:2002.02208 [pdf, ps, other]

Global Convergence of Frank Wolfe on One Hidden Layer Networks

Authors: Alexandre d'Aspremont, Mert Pilanci

Abstract: We derive global convergence bounds for the Frank Wolfe algorithm when training one hidden layer neural networks. When using the ReLU activation function, and under tractable preconditioning assumptions on the sample data set, the linear minimization oracle used to incrementally form the solution can be solved explicitly as a second order cone program. The classical Frank Wolfe algorithm then converges with rate $O(1/T)$ where $T$ is both the number of neurons and the number of calls to the oracle.

Submitted 6 February, 2020; originally announced February 2020.

arXiv:2002.00915 [pdf, ps, other]

Complexity Guarantees for Polyak Steps with Momentum

Authors: Mathieu Barré, Adrien Taylor, Alexandre d'Aspremont

Abstract: In smooth strongly convex optimization, knowledge of the strong convexity parameter is critical for obtaining simple methods with accelerated rates. In this work, we study a class of methods, based on Polyak steps, where this knowledge is substituted by that of the optimal value, $f_*$. We first show convergence bounds slightly better than previously known for the classical case of simple gradient descent with Polyak steps; we then derive an accelerated gradient method with Polyak steps and momentum, along with convergence guarantees.

Submitted 3 July, 2020; v1 submitted 3 February, 2020; originally announced February 2020.

Comments: Accepted to COLT2020
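As a point of reference for this entry, the classical (non-accelerated) gradient method with Polyak steps can be sketched as follows. This is only the textbook baseline, with step size $(f(x) - f_*)/\|\nabla f(x)\|^2$; it is not the accelerated method derived in the paper. The function name `polyak_gradient_descent` and the quadratic test problem are assumptions made for illustration.

```python
import numpy as np

def polyak_gradient_descent(f, grad, x0, f_star, iters=2000):
    """Gradient descent with the classical Polyak step (f(x) - f_star) / ||grad f(x)||^2.

    Hypothetical helper for illustration; requires the optimal value f_star.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        gap = f(x) - f_star        # current suboptimality, known because f_star is given
        gnorm2 = float(g @ g)
        if gap <= 0.0 or gnorm2 == 0.0:
            break                  # already optimal up to floating point
        x = x - (gap / gnorm2) * g
    return x

# Toy strongly convex quadratic with condition number 10 and f_star = 0.
A = np.diag([1.0, 10.0])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
x_final = polyak_gradient_descent(f, grad, [1.0, 1.0], f_star=0.0)
```

The step size uses neither the smoothness constant nor the strong convexity parameter, only the optimal value, which is the trade described in the abstract.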

arXiv:1911.08510 [pdf, other]

doi 10.1007/s10107-021-01618-1

Optimal Complexity and Certification of Bregman First-Order Methods

Authors: Radu-Alexandru Dragomir, Adrien Taylor, Alexandre d'Aspremont, Jérôme Bolte

Abstract: We provide a lower bound showing that the $O(1/k)$ convergence rate of the NoLips method (a.k.a. Bregman Gradient) is optimal for the class of functions satisfying the $h$-smoothness assumption. This assumption, also known as relative smoothness, appeared in the recent developments around the Bregman Gradient method, where acceleration remained an open issue. On the way, we show how to constructively obtain the corresponding worst-case functions by extending the computer-assisted performance estimation framework of Drori and Teboulle (Mathematical Programming, 2014) to Bregman first-order methods, and to handle the classes of differentiable and strictly convex functions.

Submitted 17 February, 2021; v1 submitted 19 November, 2019; originally announced November 2019.

Comments: To appear in Mathematical Programming

MSC Class: 90C25 (Primary) 90C06; 90C60; 90C22; 68Q25 (Secondary)

arXiv:1906.03056 [pdf, other]

Polyak Steps for Adaptive Fast Gradient Method

Authors: Mathieu Barré, Alexandre d'Aspremont

Abstract: Accelerated algorithms for minimizing smooth strongly convex functions usually require knowledge of the strong convexity parameter $μ$. In the case of an unknown $μ$, current adaptive techniques are based on restart schemes. When the optimal value $f^*$ is known, these strategies recover the accelerated linear convergence bound without additional grid search. In this paper we propose a new approach that has the same bound without any restart, using an online estimation of the strong convexity parameter. We show the robustness of the Fast Gradient Method when using a sequence of upper bounds on $μ$. We also present a good candidate for this estimate sequence and detail consistent empirical results.

Submitted 7 June, 2019; originally announced June 2019.

arXiv:1906.02746 [pdf, other]

Ranking and synchronization from pairwise measurements via SVD

Authors: Alexandre d'Aspremont, Mihai Cucuringu, Hemant Tyagi

Abstract: Given a measurement graph $G= (V,E)$ and an unknown signal $r \in \mathbb{R}^n$, we investigate algorithms for recovering $r$ from pairwise measurements of the form $r_i - r_j$; $\{i,j\} \in E$. This problem arises in a variety of applications, such as ranking teams in sports data and time synchronization of distributed networks. Framed in the context of ranking, the task is to recover the ranking of $n$ teams (induced by $r$) given a small subset of noisy pairwise rank offsets. We propose a simple SVD-based algorithmic pipeline for both the problem of time synchronization and ranking. We provide a detailed theoretical analysis in terms of robustness against both sampling sparsity and noise perturbations with outliers, using results from matrix perturbation and random matrix theory. Our theoretical findings are complemented by a detailed set of numerical experiments on both synthetic and real data, showcasing the competitiveness of our proposed algorithms with other state-of-the-art methods.

Submitted 7 August, 2020; v1 submitted 6 June, 2019; originally announced June 2019.

Comments: 49 pages, 10 figures

arXiv:1901.10791 [pdf, other]

doi 10.1007/s10957-021-01820-3

Quartic First-Order Methods for Low-Rank Minimization

Authors: Radu-Alexandru Dragomir, Alexandre d'Aspremont, Jérôme Bolte

Abstract: We study a generalized nonconvex Burer-Monteiro formulation for low-rank minimization problems. We use recent results on non-Euclidean first order methods to provide efficient and scalable algorithms. Our approach uses geometries induced by quartic kernels on matrix spaces; for unconstrained cases we introduce a novel family of Gram kernels that considerably improves numerical performance. Numerical experiments for Euclidean distance matrix completion and symmetric nonnegative matrix factorization show that our algorithms scale well and reach state of the art performance when compared to specialized methods.

Submitted 17 February, 2021; v1 submitted 30 January, 2019; originally announced January 2019.

Comments: To appear in Journal of Optimization Theory and Applications

MSC Class: 90C06 (Primary) 90C26 (Secondary)

arXiv:1810.04539 [pdf, other]

Nonlinear Acceleration of Momentum and Primal-Dual Algorithms

Authors: Raghu Bollapragada, Damien Scieur, Alexandre d'Aspremont

Abstract: We describe convergence acceleration schemes for multistep optimization algorithms. The extrapolated solution is written as a nonlinear average of the iterates produced by the original optimization method. Our analysis does not need the underlying fixed-point operator to be symmetric, hence handles e.g. algorithms with momentum terms such as Nesterov's accelerated method, or primal-dual methods. The weights are computed via a simple linear system and we analyze performance in both online and offline modes. We use Crouzeix's conjecture to show that acceleration performance is controlled by the solution of a Chebyshev problem on the numerical range of a non-symmetric operator modeling the behavior of iterates near the optimum. Numerical experiments are detailed on logistic regression problems.

Submitted 17 October, 2019; v1 submitted 10 October, 2018; originally announced October 2018.

arXiv:1810.02748 [pdf, other]

An M* Proxy for Sparse Recovery Performance

Authors: Mathieu Barré, Alexandre d'Aspremont

Abstract: This paper provides a new tractable lower bound for the sparse recovery threshold of sensing matrices. This lower bound is used as a proxy to quantify the quality of sensing matrices in two different applications. First, it serves as regularization for the classical dictionary learning problem in order to learn dictionaries with better generalisation properties on unseen data. Then, the proxy is used to design sampling schemes for MRI acquisition that exhibit high reconstruction performances.

Submitted 14 December, 2020; v1 submitted 5 October, 2018; originally announced October 2018.

arXiv:1810.02429 [pdf, other]

Restarting Frank-Wolfe: Faster Rates Under Hölderian Error Bounds

Authors: Thomas Kerdreux, Alexandre d'Aspremont, Sebastian Pokutta

Abstract: Conditional Gradient algorithms (aka Frank-Wolfe algorithms) form a classical set of methods for constrained smooth convex minimization due to their simplicity, the absence of projection steps, and competitive numerical performance. While the vanilla Frank-Wolfe algorithm only ensures a worst-case rate of $\mathcal{O}(1/ε)$, various recent results have shown that for strongly convex functions on polytopes, the method can be slightly modified to achieve linear convergence. However, this still leaves a huge gap between sublinear $\mathcal{O}(1/ε)$ convergence and linear $\mathcal{O}(\log 1/ε)$ convergence to reach an $ε$-approximate solution. Here, we present a new variant of Conditional Gradient algorithms, that can dynamically adapt to the function's geometric properties using restarts and smoothly interpolates between the sublinear and linear regimes. These interpolated convergence rates are obtained when the optimization problem satisfies a new type of error bounds, which we call \textit{strong Wolfe primal bounds}. They combine geometric information on the constraint set with Hölderian Error Bounds on the objective function.

Submitted 19 October, 2021; v1 submitted 4 October, 2018; originally announced October 2018.

Comments: Journal version

arXiv:1806.00664 [pdf, other]

Robust Seriation and Applications to Cancer Genomics

Authors: Antoine Recanati, Nicolas Servant, Jean-Philippe Vert, Alexandre d'Aspremont

Abstract: The seriation problem seeks to reorder a set of elements given pairwise similarity information, so that elements with higher similarity are closer in the resulting sequence. When a global ordering consistent with the similarity information exists, an exact spectral solution recovers it in the noiseless case and seriation is equivalent to the combinatorial 2-SUM problem over permutations, for which several relaxations have been derived. However, in applications such as DNA assembly, similarity values are often heavily corrupted, and the solution of 2-SUM may no longer yield an approximate serial structure on the elements. We introduce the robust seriation problem and show that it is equivalent to a modified 2-SUM problem for a class of similarity matrices modeling those observed in DNA assembly. We explore several relaxations of this modified 2-SUM problem and compare them empirically on both synthetic matrices and real DNA data. We then introduce the problem of seriation with duplications, which is a generalization of Seriation motivated by applications to cancer genome reconstruction. We propose an algorithm involving robust seriation to solve it, and present preliminary results on synthetic data sets.

Submitted 2 June, 2018; originally announced June 2018.

arXiv:1806.00370 [pdf, ps, other]

Nonlinear Acceleration of CNNs

Authors: Damien Scieur, Edouard Oyallon, Alexandre d'Aspremont, Francis Bach

Abstract: The Regularized Nonlinear Acceleration (RNA) algorithm is an acceleration method capable of improving the rate of convergence of many optimization schemes such as gradient descent, SAGA or SVRG. Until now, its analysis has been limited to convex problems, but empirical observations show that RNA may be extended to wider settings. In this paper, we further investigate the benefits of RNA when applied to neural networks, in particular for the task of image recognition on CIFAR10 and ImageNet. With very few modifications of existing frameworks, RNA slightly improves the optimization process of CNNs, after training.

Submitted 1 June, 2018; originally announced June 2018.

arXiv:1805.09639 [pdf, ps, other]

Online Regularized Nonlinear Acceleration

Authors: Damien Scieur, Edouard Oyallon, Alexandre d'Aspremont, Francis Bach

Abstract: Regularized nonlinear acceleration (RNA) estimates the minimum of a function by post-processing iterates from an algorithm such as the gradient method. It can be seen as a regularized version of Anderson acceleration, a classical acceleration scheme from numerical analysis. The new scheme provably improves the rate of convergence of fixed step gradient descent, and its empirical performance is comparable to that of quasi-Newton methods. However, RNA cannot accelerate faster multistep algorithms like Nesterov's method and often diverges in this context. Here, we adapt RNA to overcome these issues, so that our scheme can be used on fast algorithms such as gradient methods with momentum. We show optimal complexity bounds for quadratics and asymptotically optimal rates on general convex minimization problems. Moreover, this new scheme works online, i.e., extrapolated solution estimates can be reinjected at each iteration, significantly improving numerical performance over classical accelerated methods.

Submitted 21 June, 2019; v1 submitted 24 May, 2018; originally announced May 2018.

arXiv:1803.07348 [pdf, ps, other]

Frank-Wolfe with Subsampling Oracle

Authors: Thomas Kerdreux, Fabian Pedregosa, Alexandre d'Aspremont

Abstract: We analyze two novel randomized variants of the Frank-Wolfe (FW) or conditional gradient algorithm. While classical FW algorithms require solving a linear minimization problem over the domain at each iteration, the proposed method only requires solving a linear minimization problem over a small \emph{subset} of the original domain. The first algorithm that we propose is a randomized variant of the original FW algorithm and achieves a $\mathcal{O}(1/t)$ sublinear convergence rate as in the deterministic counterpart. The second algorithm is a randomized variant of the Away-step FW algorithm, and again as its deterministic counterpart, reaches a linear (i.e., exponential) convergence rate, making it the first provably convergent randomized variant of Away-step FW. In both cases, while subsampling reduces the convergence rate by a constant factor, the linear minimization step can be a fraction of the cost of that of the deterministic versions, especially when the data is streamed. We illustrate computational gains of the algorithms on regression problems, involving both $\ell_1$ and latent group lasso penalties.

Submitted 20 March, 2018; originally announced March 2018.
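
The subsampled linear minimization step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the least-squares objective, the $\ell_1$-ball domain, the uniform coordinate sampling and the exact line search are all assumptions chosen to keep the example short, and the names `subsampled_frank_wolfe` and `objective` are hypothetical.

```python
import random


def objective(x, b):
    """Illustrative objective 0.5 * ||x - b||^2 (an assumption, not from the paper)."""
    return 0.5 * sum((xi - bi) ** 2 for xi, bi in zip(x, b))


def subsampled_frank_wolfe(b, radius=1.0, sample_size=2, iters=200, seed=0):
    """Minimize 0.5 * ||x - b||^2 over the l1 ball, searching only a random
    subset of the ball's vertices (+/- radius * e_i) at each iteration."""
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n  # feasible starting point
    for _ in range(iters):
        grad = [xi - bi for xi, bi in zip(x, b)]
        # Linear minimization restricted to a random subset of coordinates:
        # the best vertex in the subset has the largest |gradient| entry.
        subset = rng.sample(range(n), sample_size)
        i = max(subset, key=lambda j: abs(grad[j]))
        s = [0.0] * n
        s[i] = -radius if grad[i] > 0 else radius
        d = [si - xi for si, xi in zip(s, x)]
        dd = sum(di * di for di in d)
        if dd == 0.0:
            continue  # already at the selected vertex
        # Exact line search for this quadratic, clipped to [0, 1] so the
        # iterate stays a convex combination of feasible points.
        gamma = -sum(g * di for g, di in zip(grad, d)) / dd
        gamma = min(1.0, max(0.0, gamma))
        x = [xi + gamma * di for xi, di in zip(x, d)]
    return x
```

Calling `subsampled_frank_wolfe([3.0, -1.0, 0.5, 0.0])` returns a point in the unit $\ell_1$ ball. Each iteration inspects only `sample_size` gradient entries to pick a vertex, which is where the per-iteration saving would come from on large problems.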

arXiv:1712.08559 [pdf, other]

An Approximate Shapley-Folkman Theorem

Authors: Thomas Kerdreux, Igor Colin, Alexandre d'Aspremont

Abstract: The Shapley-Folkman theorem shows that Minkowski averages of uniformly bounded sets tend to be convex when the number of terms in the sum becomes much larger than the ambient dimension. In optimization, Aubin and Ekeland [1976] show that this produces an a priori bound on the duality gap of separable nonconvex optimization problems involving finite sums. This bound is highly conservative and depends on unstable quantities, and we relax it in several directions to show that non convexity can have a much milder impact on finite sum minimization problems such as empirical risk minimization and multi-task classification. As a byproduct, we show a new version of Maurey's classical approximate Carathéodory lemma where we sample a significant fraction of the coefficients, without replacement, as well as a result on sampling constraints using an approximate Helly theorem, both of independent interest.

Submitted 1 July, 2019; v1 submitted 22 December, 2017; originally announced December 2017.

Comments: Added constraint sampling result, simplified sampling results, reformat, etc

arXiv:1706.07270 [pdf, other]

Nonlinear Acceleration of Stochastic Algorithms

Authors: Damien Scieur, Alexandre d'Aspremont, Francis Bach

Abstract: Extrapolation methods use the last few iterates of an optimization algorithm to produce a better estimate of the optimum. They were shown to achieve optimal convergence rates in a deterministic setting using simple gradient iterates. Here, we study extrapolation methods in a stochastic setting, where the iterates are produced by either a simple or an accelerated stochastic gradient algorithm. We first derive convergence bounds for arbitrary, potentially biased perturbations, then produce asymptotic bounds using the ratio between the variance of the noise and the accuracy of the current point. Finally, we apply this acceleration technique to stochastic algorithms such as SGD, SAGA, SVRG and Katyusha in different settings, and show significant performance gains.

Submitted 3 August, 2017; v1 submitted 22 June, 2017; originally announced June 2017.

arXiv:1702.06751 [pdf, ps, other]

Integration Methods and Accelerated Optimization Algorithms

Authors: Damien Scieur, Vincent Roulet, Francis Bach, Alexandre d'Aspremont

Abstract: We show that accelerated optimization methods can be seen as particular instances of multi-step integration schemes from numerical analysis, applied to the gradient flow equation. In comparison with recent advances in this vein, the differential equation considered here is the basic gradient flow and we show that multi-step schemes allow integration of this differential equation using larger step sizes, thus intuitively explaining acceleration results.

Submitted 22 February, 2017; originally announced February 2017.

arXiv:1702.03828 [pdf, ps, other]

Sharpness, Restart and Acceleration

Authors: Vincent Roulet, Alexandre d'Aspremont

Abstract: The Łojasiewicz inequality shows that sharpness bounds on the minimum of convex optimization problems hold almost generically. Sharpness directly controls the performance of restart schemes, as observed by Nemirovsky and Nesterov (1985). The constants quantifying these sharpness bounds are of course unobservable, but we show that optimal restart strategies are robust, in the sense that, in some important cases, finding the best restart scheme only requires a log scale grid search. Overall then, restart schemes generically accelerate accelerated first-order methods.

Submitted 4 November, 2019; v1 submitted 13 February, 2017; originally announced February 2017.

Comments: Short version appeared in Advances in Neural Information Processing Systems 30 (NIPS 2017)

MSC Class: 90C25; 90C06

arXiv:1606.04133 [pdf, ps, other]

Regularized Nonlinear Acceleration

Authors: Damien Scieur, Alexandre d'Aspremont, Francis Bach

Abstract: We describe a convergence acceleration technique for unconstrained optimization problems. Our scheme computes estimates of the optimum from a nonlinear average of the iterates produced by any optimization method. The weights in this average are computed via a simple linear system, whose solution can be updated online. This acceleration scheme runs in parallel to the base algorithm, providing improved estimates of the solution on the fly, while the original optimization method is running. Numerical experiments are detailed on classical classification problems.

Submitted 14 April, 2019; v1 submitted 13 June, 2016; originally announced June 2016.
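
As a rough illustration of how averaging weights can come from a small linear system, as the abstract above describes, here is a minimal sketch in plain Python. It is an assumption-laden reading of the idea rather than the paper's algorithm: the residual definition, the scaling of the regularization term `reg`, and the helper names `solve_linear_system` and `extrapolate` are all hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z


def extrapolate(iterates, reg=1e-10):
    """Combine iterates x_0..x_{k-1} with weights that sum to one and
    approximately cancel the residuals r_i = x_{i+1} - x_i."""
    k = len(iterates) - 1
    residuals = [[b - a for a, b in zip(iterates[i], iterates[i + 1])]
                 for i in range(k)]
    # Gram matrix of the residuals, plus a small multiple of the identity.
    gram = [[sum(u * v for u, v in zip(residuals[i], residuals[j]))
             for j in range(k)] for i in range(k)]
    scale = max(gram[i][i] for i in range(k)) or 1.0
    for i in range(k):
        gram[i][i] += reg * scale
    # Solve (R^T R + lambda * I) z = 1, then normalize so the weights sum to one.
    z = solve_linear_system(gram, [1.0] * k)
    total = sum(z)
    weights = [zi / total for zi in z]
    dim = len(iterates[0])
    return [sum(w * iterates[i][d] for i, w in enumerate(weights))
            for d in range(dim)]
```

Feeding `extrapolate` a handful of gradient-descent iterates on a small quadratic should give an estimate much closer to the minimizer than the last iterate. The base method is only observed, never modified, which matches the abstract's description of a scheme that runs alongside the original algorithm.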

arXiv:1506.03295 [pdf, ps, other]

Computational Complexity versus Statistical Performance on Sparse Recovery Problems

Authors: Vincent Roulet, Nicolas Boumal, Alexandre d'Aspremont

Abstract: We show that several classical quantities controlling compressed sensing performance directly match classical parameters controlling algorithmic complexity. We first describe linearly convergent restart schemes on first-order methods solving a broad range of compressed sensing problems, where sharpness at the optimum controls convergence speed. We show that for sparse recovery problems, this sharpness can be written as a condition number, given by the ratio between true signal sparsity and the largest signal size that can be recovered by the observation matrix. In a similar vein, Renegar's condition number is a data-driven complexity measure for convex programs, generalizing classical condition numbers for linear systems. We show that for a broad class of compressed sensing problems, the worst case value of this algorithmic complexity measure taken over all signals matches the restricted singular value of the observation matrix which controls robust recovery performance. Overall, this means in both cases that, in compressed sensing problems, a single parameter directly controls both computational complexity and recovery performance. Numerical experiments illustrate these points using several classical algorithms.

Submitted 2 November, 2018; v1 submitted 10 June, 2015; originally announced June 2015.

Comments: Final version, to appear in Information and Inference

MSC Class: 68U10; 49K40; 90C25

arXiv:1306.4805 [pdf, other]

Convex Relaxations for Permutation Problems

Authors: Fajwel Fogel, Rodolphe Jenatton, Francis Bach, Alexandre d'Aspremont

Abstract: Seriation seeks to reconstruct a linear order between variables using unsorted, pairwise similarity information. It has direct applications in archeology and shotgun gene sequencing for example. We write seriation as an optimization problem by proving the equivalence between the seriation and combinatorial 2-SUM problems on similarity matrices (2-SUM is a quadratic minimization problem over permutations). The seriation problem can be solved exactly by a spectral algorithm in the noiseless case and we derive several convex relaxations for 2-SUM to improve the robustness of seriation solutions in noisy settings. These convex relaxations also allow us to impose structural constraints on the solution, hence solve semi-supervised seriation problems. We derive new approximation bounds for some of these relaxations and present numerical experiments on archeological data, Markov chains and DNA assembly from shotgun gene sequencing data.

Submitted 6 February, 2015; v1 submitted 20 June, 2013; originally announced June 2013.

Comments: Final journal version, a few typos and references fixed

MSC Class: 06A07; 90C27; 90C25; 92D20

arXiv:1304.7735 [pdf, other]

Phase retrieval for imaging problems

Authors: Fajwel Fogel, Irène Waldspurger, Alexandre d'Aspremont

Abstract: We study convex relaxation algorithms for phase retrieval on imaging problems. We show that structural assumptions on the signal and the observations, such as sparsity, smoothness or positivity, can be exploited to both speed up convergence and improve recovery performance. We detail experimental results in molecular imaging problems simulated from PDB data.

Submitted 8 April, 2014; v1 submitted 29 April, 2013; originally announced April 2013.

Comments: Significantly expanded experiments

MSC Class: 94A12; 90C22; 90C27

arXiv:1301.0465 [pdf, ps, other]

An Optimal Affine Invariant Smooth Minimization Algorithm

Authors: Alexandre d'Aspremont, Cristóbal Guzmán, Martin Jaggi

Abstract: We formulate an affine invariant implementation of the accelerated first-order algorithm in Nesterov (1983). Its complexity bound is proportional to an affine invariant regularity constant defined with respect to the Minkowski gauge of the feasible set. We extend these results to more general problems, optimizing Hölder smooth functions using $p$-uniformly convex prox terms, and derive an algorithm whose complexity better fits the geometry of the feasible set and adapts to both the best Hölder smoothness parameter and the best gradient Lipschitz constant. Finally, we detail matching complexity lower bounds when the feasible set is an $\ell_p$ ball. In this setting, our upper bounds on iteration complexity for the algorithm in Nesterov (1983) are thus optimal in terms of target precision, smoothness and problem dimension.

Submitted 28 November, 2016; v1 submitted 3 January, 2013; originally announced January 2013.

Comments: Added algorithm with p-uniformly convex prox, optimal over all ell_p balls and adaptive algorithm in Holder smoothness and gradient Lipschitz constant

MSC Class: 90C25

arXiv:1207.0318 [pdf, ps, other]

Convex Algorithms for Nonnegative Matrix Factorization

Authors: Vijay Krishnamurthy, Alexandre d'Aspremont

Abstract: We derive approximation algorithms for the nonnegative matrix factorization problem, i.e. the problem of factorizing a matrix as the product of two matrices with nonnegative coefficients. We form convex approximations of this problem which can be solved efficiently and test our algorithms on some classic numerical examples.

Submitted 2 July, 2012; originally announced July 2012.

Comments: Discussion paper. Rejected by NIPS in 2007

arXiv:1206.0102 [pdf, ps, other]

Phase Recovery, MaxCut and Complex Semidefinite Programming

Authors: Irène Waldspurger, Alexandre d'Aspremont, Stéphane Mallat

Abstract: Phase retrieval seeks to recover a signal x from the amplitude |Ax| of linear measurements. We cast the phase retrieval problem as a non-convex quadratic program over a complex phase vector and formulate a tractable relaxation (called PhaseCut) similar to the classical MaxCut semidefinite program. We solve this problem using a provably convergent block coordinate descent algorithm whose structure is similar to that of the original greedy algorithm in Gerchberg-Saxton, where each iteration is a matrix vector product. Numerical results show the performance of this approach over three different phase retrieval problems, in comparison with greedy phase retrieval algorithms and matrix completion formulations.

Submitted 22 July, 2013; v1 submitted 1 June, 2012; originally announced June 2012.

Comments: Submitted revision

arXiv:1205.0121 [pdf, ps, other]

Approximation Bounds for Sparse Principal Component Analysis

Authors: Alexandre d'Aspremont, Francis Bach, Laurent El Ghaoui

Abstract: We produce approximation bounds on a semidefinite programming relaxation for sparse principal component analysis. These bounds control approximation ratios for tractable statistics in hypothesis testing problems where data points are sampled from Gaussian models with a single sparse leading component.

Submitted 18 June, 2012; v1 submitted 1 May, 2012; originally announced May 2012.

Comments: Section 4 substantially clarified. Added comparison with BBP transition for \lambdamax(.)

MSC Class: 62H25; 90C22; 90C27

arXiv:1204.0665 [pdf, other]

A Stochastic Smoothing Algorithm for Semidefinite Programming

Authors: Alexandre d'Aspremont, Noureddine El Karoui

Abstract: We use a rank one Gaussian perturbation to derive a smooth stochastic approximation of the maximum eigenvalue function. We then combine this smoothing result with an optimal smooth stochastic optimization algorithm to produce an efficient method for solving maximum eigenvalue minimization problems. We show that the complexity of this new method is lower than that of deterministic smoothing algorithms in certain precision/dimension regimes.

Submitted 4 March, 2014; v1 submitted 3 April, 2012; originally announced April 2012.

Comments: Final version. The paper was reorganized, with additional details in the regularity proof. The published version is missing the appendix

MSC Class: 90C22; 90C15; 47A75

arXiv:1101.3027 [pdf, ps, other]

Sparse Recovery, Kashin Decomposition and Conic Programming

Authors: Alexandre d'Aspremont

Abstract: We produce relaxation bounds on the diameter of arbitrary sections of the l1 ball in R^n. We use these results to test conditions for sparse recovery.

Submitted 15 January, 2011; originally announced January 2011.

Comments: First, conference version

MSC Class: 94A12; 90C27; 90C22

arXiv:1011.3781 [pdf, ps, other]

Sparse PCA: Convex Relaxations, Algorithms and Applications

Authors: Youwei Zhang, Alexandre d'Aspremont, Laurent El Ghaoui

Abstract: Given a sample covariance matrix, we examine the problem of maximizing the variance explained by a linear combination of the input variables while constraining the number of nonzero coefficients in this combination. This is known as sparse principal component analysis and has a wide array of applications in machine learning and engineering. Unfortunately, this problem is also combinatorially hard and we discuss convex relaxation techniques that efficiently produce good approximate solutions. We then describe several algorithms solving these relaxations as well as greedy algorithms that iteratively improve the solution quality. Finally, we illustrate sparse PCA in several applications, ranging from senate voting and finance to news data.

Submitted 22 December, 2010; v1 submitted 16 November, 2010; originally announced November 2010.

Comments: To appear in "Handbook on Semidefinite, Cone and Polynomial Optimization", M. Anjos and J.B. Lasserre, editors. This revision includes ROC curves for greedy algorithms

MSC Class: 90C27; 62H25; 90C22

arXiv:1006.3601 [pdf, ps, other]

Convex Relaxations for Subset Selection

Authors: Francis Bach, Selin Damla Ahipasaoglu, Alexandre d'Aspremont

Abstract: We use convex relaxation techniques to produce lower bounds on the optimal value of subset selection problems and generate good approximate solutions. We then explicitly bound the quality of these relaxations by studying the approximation ratio of sparse eigenvalue relaxations. Our results are used to improve the performance of branch-and-bound algorithms to produce exact solutions to subset selection problems.

Submitted 17 June, 2010; originally announced June 2010.

MSC Class: 62F07; 90C59; 68W25

arXiv:1004.5151 [pdf, ps, other]

Weak Recovery Conditions from Graph Partitioning Bounds and Order Statistics

Authors: Alexandre d'Aspremont, Noureddine El Karoui

Abstract: We study a weaker formulation of the nullspace property which guarantees recovery of sparse signals from linear measurements by l_1 minimization. We require this condition to hold only with high probability, given a distribution on the nullspace of the coding matrix A. Under some assumptions on the distribution of the reconstruction error, we show that testing these weak conditions means bounding the optimal value of two classical graph partitioning problems: the k-Dense-Subgraph and MaxCut problems. Both problems admit efficient, relatively tight relaxations and we use a randomization argument to produce new approximation bounds for k-Dense-Subgraph. We test the performance of our results on several families of coding matrices.

Submitted 7 November, 2012; v1 submitted 28 April, 2010; originally announced April 2010.

Comments: Final version

MSC Class: 65C60; 90C22; 92C55; 97K20

arXiv:0908.0143 [pdf, ps, other]

A Pathwise Algorithm for Covariance Selection

Authors: Vijay Krishnamurthy, Alexandre d'Aspremont

Abstract: Covariance selection seeks to estimate a covariance matrix by maximum likelihood while restricting the number of nonzero inverse covariance matrix coefficients. A single penalty parameter usually controls the tradeoff between log likelihood and sparsity in the inverse matrix. We describe an efficient algorithm for computing a full regularization path of solutions to this problem.

Submitted 8 October, 2010; v1 submitted 2 August, 2009; originally announced August 2009.

Comments: More details & numerical experiments. Not all figures could be uploaded on arXiv. Please get local pdf file for complete numerical results

arXiv:0908.0137 [pdf, ps, other]

Second order accurate distributed eigenvector computation for extremely large matrices

Authors: Noureddine El Karoui, Alexandre d'Aspremont

Abstract: We propose a second-order accurate method to estimate the eigenvectors of extremely large matrices thereby addressing a problem of relevance to statisticians working in the analysis of very large datasets. More specifically, we show that averaging eigenvectors of randomly subsampled matrices efficiently approximates the true eigenvectors of the original matrix under certain conditions on the incoherence of the spectral decomposition. This incoherence assumption is typically milder than those made in matrix completion and allows eigenvectors to be sparse. We discuss applications to spectral methods in dimensionality reduction and information retrieval.

Submitted 5 February, 2010; v1 submitted 2 August, 2009; originally announced August 2009.

Comments: Complete proofs are included on averaging performance

arXiv:0807.3520 [pdf, ps, other]

Testing the Nullspace Property using Semidefinite Programming

Authors: Alexandre d'Aspremont, Laurent El Ghaoui

Abstract: Recent results in compressed sensing show that, under certain conditions, the sparsest solution to an underdetermined set of linear equations can be recovered by solving a linear program. These results either rely on computing sparse eigenvalues of the design matrix or on properties of its nullspace. So far, no tractable algorithm is known to test these conditions and most current results rely on asymptotic properties of random matrices. Given a matrix A, we use semidefinite relaxation techniques to test the nullspace property on A and show on some numerical examples that these relaxation bounds can prove perfect recovery of sparse solutions with relatively high cardinality.

Submitted 1 November, 2010; v1 submitted 22 July, 2008; originally announced July 2008.

Comments: Some typos corrected in Section 4.2

arXiv:0803.1990 [pdf, ps, other]

Subsampling Algorithms for Semidefinite Programming

Authors: Alexandre d'Aspremont

Abstract: We derive a stochastic gradient algorithm for semidefinite optimization using randomization techniques. The algorithm uses subsampling to reduce the computational cost of each iteration and the subsampling ratio explicitly controls granularity, i.e. the tradeoff between cost per iteration and total number of iterations. Furthermore, the total computational cost is directly proportional to the complexity (i.e. rank) of the solution. We study numerical performance on some large-scale problems arising in statistical learning.

Submitted 29 August, 2011; v1 submitted 13 March, 2008; originally announced March 2008.

Comments: Final version, to appear in Stochastic Systems

MSC Class: 90C22; 65K05; 15A52

arXiv:math/0609812 [pdf, ps, other]

First-order methods for sparse covariance selection

Authors: Alexandre d'Aspremont, Onureena Banerjee, Laurent El Ghaoui

Abstract: Given a sample covariance matrix, we solve a maximum likelihood problem penalized by the number of nonzero coefficients in the inverse covariance matrix. Our objective is to find a sparse representation of the sample data and to highlight conditional independence relationships between the sample variables. We first formulate a convex relaxation of this combinatorial problem, we then detail two efficient first-order algorithms with low memory requirements to solve large-scale, dense problem instances.

Submitted 28 September, 2006; originally announced September 2006.

MSC Class: 90C22; 62H20; 90C59

arXiv:math/0512344 [pdf, ps, other]

Smooth Optimization with Approximate Gradient

Authors: Alexandre d'Aspremont

Abstract: We show that the optimal complexity of Nesterov's smooth first-order optimization algorithm is preserved when the gradient is only computed up to a small, uniformly bounded error. In applications of this method to semidefinite programs, this means in some instances computing only a few leading eigenvalues of the current iterate instead of a full matrix exponential, which significantly reduces the method's computational cost. This also allows sparse problems to be solved efficiently using sparse maximum eigenvalue packages.

Submitted 16 May, 2008; v1 submitted 14 December, 2005; originally announced December 2005.

Comments: Title changed from "Smooth Optimization for Sparse Semidefinite Programs". New figures, tests. Final version

MSC Class: 90C25; 90C22; 90C06

arXiv:math/0309048 [pdf, ps, other]

A Harmonic Analysis Solution to the Static Basket Arbitrage Problem

Authors: Alexandre d'Aspremont

Abstract: We consider the problem of computing upper and lower bounds on the price of a European basket call option, given prices on other similar baskets. We focus here on an interpretation of this program as a generalized moment problem. Recent results by Berg & Maserick (1984), Putinar & Vasilescu (1999) and Lasserre (2001) on harmonic analysis on semigroups, the K-moment problem and its applications to optimization, allow us to derive tractable necessary and sufficient conditions for the absence of static arbitrage between basket straddles, hence between basket calls and puts.

Submitted 2 September, 2003; originally announced September 2003.

Comments: Preliminary version for IMA workshop "Risk Management and Model Specifications Issues in Finance". Numerical results to be added later

MSC Class: 90C22; 47A57

arXiv:math/0302243 [pdf, ps, other]

Static Arbitrage Bounds on Basket Option Prices

Authors: Alexandre d'Aspremont, Laurent El Ghaoui

Abstract: We consider the problem of computing upper and lower bounds on the price of a European basket call option, given prices on other similar baskets. Although this problem is very hard to solve exactly in the general case, we show that in some instances the upper and lower bounds can be computed via simple closed-form expressions, or linear programs. We also introduce an efficient linear programming relaxation of the general problem based on an integral transform interpretation of the call price function. We show that this relaxation is tight in some of the special cases examined before.

Submitted 5 October, 2005; v1 submitted 19 February, 2003; originally announced February 2003.

Comments: To Appear in Mathematical Programming, Series A

MSC Class: 44A12; 44A60; 90C05; 90C34; 91B28

Showing 1–50 of 51 results for author: d'Aspremont, A