-
Type-B analogue of Bell numbers using Rota's Umbral calculus approach
Authors:
Eli Bagno,
David Garber
Abstract:
Rota used the functional L to recover old properties and obtain some new formulas for the Bell numbers. Tanny used Rota's functional L and the celebrated Worpitzky identity to obtain an expression for the ordered Bell numbers, which can be seen as evidence for the fact that the ordered Bell numbers are gamma-positive. In this paper, we extend some of Rota's and Tanny's results to the framework of the set partitions of Coxeter type B.
Submitted 24 June, 2024;
originally announced June 2024.
-
Projection-Free Online Convex Optimization with Time-Varying Constraints
Authors:
Dan Garber,
Ben Kretzu
Abstract:
We consider the setting of online convex optimization with adversarial time-varying constraints in which actions must be feasible w.r.t. a fixed constraint set, and are also required on average to approximately satisfy additional time-varying constraints. Motivated by scenarios in which the fixed feasible set (hard constraint) is difficult to project onto, we consider projection-free algorithms that access this set only through a linear optimization oracle (LOO). We present an algorithm that, on a sequence of length $T$ and using overall $T$ calls to the LOO, guarantees $\tilde{O}(T^{3/4})$ regret w.r.t. the losses and $O(T^{7/8})$ constraint violation (ignoring all quantities except for $T$). In particular, these bounds hold w.r.t. any interval of the sequence. We also present a more efficient algorithm that requires only first-order oracle access to the soft constraints and achieves similar bounds w.r.t. the entire sequence. We extend the latter to the setting of bandit feedback and obtain similar bounds (as a function of $T$) in expectation.
Submitted 13 February, 2024;
originally announced February 2024.
-
Combinatorics of q,r-analogues of Stirling numbers of type B
Authors:
Eli Bagno,
David Garber
Abstract:
Stirling numbers of the first and second kinds have seen many generalizations and applications in various areas of mathematics. We introduce some combinatorial parameters which realize $q$-analogues and Broder's $r$-variants of Stirling numbers of type $B$ of both kinds, which count signed set partitions and signed permutations respectively. Applications to orthogonality relations and power sums are given.
Submitted 16 January, 2024;
originally announced January 2024.
-
From Oja's Algorithm to the Multiplicative Weights Update Method with Applications
Authors:
Dan Garber
Abstract:
Oja's algorithm is a well-known online algorithm studied mainly in the context of stochastic principal component analysis. We make a simple observation, yet to the best of our knowledge a novel one, that when applied to any (not necessarily stochastic) sequence of symmetric matrices which share common eigenvectors, the regret of Oja's algorithm could be directly bounded in terms of the regret of the well-known multiplicative weights update method for the problem of prediction with expert advice. Several applications to optimization with quadratic forms over the unit sphere in $\mathbb{R}^n$ are discussed.
Submitted 24 October, 2023;
originally announced October 2023.
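For readers unfamiliar with the multiplicative weights update method named in the abstract above, the following is a minimal sketch of its textbook form (often called Hedge) for prediction with expert advice. It is not taken from the paper: the function name `multiplicative_weights`, the step size `eta`, and the assumption that losses lie in $[0,1]$ are illustrative choices.

```python
import math

def multiplicative_weights(loss_rounds, eta):
    """Textbook Hedge update; assumes every loss lies in [0, 1].

    loss_rounds: list of per-round loss vectors, one entry per expert.
    Returns (expected loss of the learner, regret vs. the best fixed expert).
    """
    n = len(loss_rounds[0])
    weights = [1.0] * n              # start from uniform weights
    total_loss = 0.0
    for losses in loss_rounds:
        z = sum(weights)
        probs = [w / z for w in weights]   # play the normalized weights
        total_loss += sum(p * l for p, l in zip(probs, losses))
        # Exponentially down-weight experts in proportion to their loss.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    best = min(sum(r[i] for r in loss_rounds) for i in range(n))
    return total_loss, total_loss - best
```

With $n$ experts and $T$ rounds, the standard analysis bounds the regret of this update by $\ln(n)/\eta + \eta T/2$, so a step size on the order of $\sqrt{\ln(n)/T}$ gives $O(\sqrt{T\ln n})$ regret.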
-
Efficiency of First-Order Methods for Low-Rank Tensor Recovery with the Tensor Nuclear Norm Under Strict Complementarity
Authors:
Dan Garber,
Atara Kaplan
Abstract:
We consider convex relaxations for recovering low-rank tensors based on constrained minimization over a ball induced by the tensor nuclear norm, recently introduced in \cite{tensor_tSVD}. We build on a recent line of results that considered convex relaxations for the recovery of low-rank matrices and established that under a strict complementarity condition (SC), both the convergence rate and per-iteration runtime of standard gradient methods may improve dramatically. We develop the appropriate strict complementarity condition for the tensor nuclear norm ball and obtain the following main results under this condition: 1. When the objective to minimize is of the form $f(\mathbf{X})=g(\mathbf{A}\mathbf{X})+\langle\mathbf{C},\mathbf{X}\rangle$, where $g$ is strongly convex and $\mathbf{A}$ is a linear map (e.g., least squares), a quadratic growth bound holds, which implies linear convergence rates for standard projected gradient methods, despite the fact that $f$ need not be strongly convex. 2. For a smooth objective function, when initialized in certain proximity of an optimal solution which satisfies SC, standard projected gradient methods only require SVD computations (for projecting onto the tensor nuclear norm ball) of rank that matches the tubal rank of the optimal solution. In particular, when the tubal rank is constant, this implies nearly linear (in the size of the tensor) runtime per iteration, as opposed to superlinear without further assumptions. 3. For a nonsmooth objective function which admits a popular smooth saddle-point formulation, we derive similar results to the latter for the well-known extragradient method. An additional contribution, which may be of independent interest, is the rigorous extension of many basic results regarding tensors of arbitrary order, which were previously obtained only for third-order tensors.
Submitted 3 August, 2023;
originally announced August 2023.
-
Projection-free Online Exp-concave Optimization
Authors:
Dan Garber,
Ben Kretzu
Abstract:
We consider the setting of online convex optimization (OCO) with \textit{exp-concave} losses. The best regret bound known for this setting is $O(n\log{}T)$, where $n$ is the dimension and $T$ is the number of prediction rounds (treating all other quantities as constants and assuming $T$ is sufficiently large), and is attainable via the well-known Online Newton Step algorithm (ONS). However, ONS requires computing a projection (according to some matrix-induced norm) onto the feasible convex set on each iteration, which is often computationally prohibitive in high-dimensional settings and when the feasible set admits a non-trivial structure. In this work we consider projection-free online algorithms for exp-concave and smooth losses, where by projection-free we refer to algorithms that rely only on the availability of a linear optimization oracle (LOO) for the feasible set, which in many applications of interest admits much more efficient implementations than a projection oracle. We present an LOO-based ONS-style algorithm, which, using overall $O(T)$ calls to an LOO, guarantees worst-case regret bounded by $\widetilde{O}(n^{2/3}T^{2/3})$ (ignoring all quantities except for $n,T$). However, our algorithm is most interesting in an important and plausible low-dimensional data scenario: if the gradients (approximately) span a subspace of dimension at most $\rho$, $\rho \ll n$, the regret bound improves to $\widetilde{O}(\rho^{2/3}T^{2/3})$, and by applying standard deterministic sketching techniques, both the space and average additional per-iteration runtime requirements are only $O(\rho n)$ (instead of $O(n^2)$). This improves upon recently proposed LOO-based algorithms for OCO which, while having the same state-of-the-art dependence on the horizon $T$, suffer from regret/oracle complexity that scales with $\sqrt{n}$ or worse.
Submitted 9 February, 2023;
originally announced February 2023.
-
Faster Projection-Free Augmented Lagrangian Methods via Weak Proximal Oracle
Authors:
Dan Garber,
Tsur Livney,
Shoham Sabach
Abstract:
This paper considers a convex composite optimization problem with affine constraints, which includes problems that take the form of minimizing a smooth convex objective function over the intersection of (simple) convex sets, or regularized with multiple (simple) functions. Motivated by high-dimensional applications in which exact projection/proximal computations are not tractable, we propose a \textit{projection-free} augmented Lagrangian-based method, in which primal updates are carried out using a \textit{weak proximal oracle} (WPO). In an earlier work, WPO was shown to be more powerful than the standard \textit{linear minimization oracle} (LMO) that underlies conditional gradient-based methods (aka Frank-Wolfe methods). Moreover, WPO is computationally tractable for many high-dimensional problems of interest, including those motivated by recovery of low-rank matrices and tensors, and optimization over polytopes which admit efficient LMOs. The main result of this paper shows that under a certain curvature assumption (which is weaker than strong convexity), our WPO-based algorithm achieves an ergodic rate of convergence of $O(1/T)$ for both the objective residual and feasibility gap. This result, to the best of our knowledge, improves upon the $O(1/\sqrt{T})$ rate for existing LMO-based projection-free methods for this class of problems. Empirical experiments on a low-rank and sparse covariance matrix estimation task and the Max Cut semidefinite relaxation demonstrate that our method can outperform state-of-the-art LMO-based Lagrangian-based methods.
Submitted 21 February, 2023; v1 submitted 25 October, 2022;
originally announced October 2022.
-
Analytic aspects of $q,r$-analogue of poly-Stirling numbers of both kinds
Authors:
Takao Komatsu,
Eli Bagno,
David Garber
Abstract:
The Stirling numbers of type $B$ of the second kind count signed set partitions. In this paper we provide new combinatorial and analytical identities regarding these numbers as well as Broder's $r$-version of these numbers. Among these identities one can find recursions, explicit formulas based on the inclusion-exclusion principle, and also exponential generating functions.
These Stirling numbers can be considered as members of a wider family of triangles of numbers that are characterized using results of Comtet and Lancaster.
We generalize these theorems, which present equivalent conditions for a triangle of numbers to be a triangle of generalized Stirling numbers, to the case of the $q,r$-poly Stirling numbers, which are $q$-analogues of the restricted Stirling numbers defined by Broder and having a polynomial value appearing in their defining recursion. There are two ways to do this and these ways are related by a nice identity.
Submitted 5 April, 2024; v1 submitted 14 September, 2022;
originally announced September 2022.
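As a concrete illustration of the recursions mentioned in the abstract above, the sketch below tabulates the Stirling numbers of type $B$ of the second kind using the recursion commonly stated in the literature, $S_B(n,k) = S_B(n-1,k-1) + (2k+1)\,S_B(n-1,k)$ with $S_B(0,0)=1$. The recursion is quoted from general knowledge rather than from this listing, and the function names are hypothetical. Row sums give the type-$B$ analogue of the Bell numbers (the Dowling numbers).

```python
def stirling_b_second_kind(n_max):
    """Return rows[n][k] = S_B(n, k) for 0 <= k <= n <= n_max."""
    rows = [[1]]                       # S_B(0, 0) = 1
    for n in range(1, n_max + 1):
        prev = rows[-1]
        row = []
        for k in range(n + 1):
            # Term S_B(n-1, k-1) of the recursion.
            new_block = prev[k - 1] if k >= 1 else 0
            # Term (2k+1) * S_B(n-1, k) of the recursion.
            existing = (2 * k + 1) * prev[k] if k <= n - 1 else 0
            row.append(new_block + existing)
        rows.append(row)
    return rows

def bell_b(n_max):
    """Type-B Bell numbers as row sums of the triangle above."""
    return [sum(row) for row in stirling_b_second_kind(n_max)]

print(bell_b(4))   # [1, 2, 6, 24, 116]
```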
-
Low-Rank Mirror-Prox for Nonsmooth and Low-Rank Matrix Optimization Problems
Authors:
Dan Garber,
Atara Kaplan
Abstract:
Low-rank and nonsmooth matrix optimization problems capture many fundamental tasks in statistics and machine learning. While significant progress has been made in recent years in developing efficient methods for \textit{smooth} low-rank optimization problems that avoid maintaining high-rank matrices and computing expensive high-rank SVDs, advances for nonsmooth problems have been slow paced. In this paper we consider standard convex relaxations for such problems. Mainly, we prove that under a \textit{strict complementarity} condition and under the relatively mild assumption that the nonsmooth objective can be written as a maximum of smooth functions, approximated variants of two popular \textit{mirror-prox} methods: the Euclidean \textit{extragradient method} and mirror-prox with \textit{matrix exponentiated gradient updates}, when initialized with a "warm-start", converge to an optimal solution with rate $O(1/t)$, while requiring only two \textit{low-rank} SVDs per iteration. Moreover, for the extragradient method we also consider relaxed versions of strict complementarity which yield a trade-off between the rank of the SVDs required and the radius of the ball in which we need to initialize the method. We support our theoretical results with empirical experiments on several nonsmooth low-rank matrix recovery tasks, demonstrating both the plausibility of the strict complementarity assumption, and the efficient convergence of our proposed low-rank mirror-prox variants.
Submitted 23 June, 2022;
originally announced June 2022.
-
Frank-Wolfe-based Algorithms for Approximating Tyler's M-estimator
Authors:
Lior Danon,
Dan Garber
Abstract:
Tyler's M-estimator is a well-known procedure for robust and heavy-tailed covariance estimation. Tyler himself suggested an iterative fixed-point algorithm for computing his estimator; however, it requires super-linear (in the size of the data) runtime per iteration, which may be prohibitive at large scale. In this work we propose, to the best of our knowledge, the first Frank-Wolfe-based algorithms for computing Tyler's estimator. One variant uses standard Frank-Wolfe steps, the second also considers \textit{away-steps} (AFW), and the third is a \textit{geodesic} version of AFW (GAFW). AFW provably requires, up to a log factor, only linear time per iteration, while GAFW runs in linear time (up to a log factor) in a large $n$ (number of data-points) regime. All three variants are shown to provably converge to the optimal solution with sublinear rate, under standard assumptions, despite the fact that the underlying optimization problem is neither convex nor smooth. Under an additional fairly mild assumption, that holds with probability 1 when the (normalized) data-points are i.i.d. samples from a continuous distribution supported on the entire unit sphere, AFW and GAFW are proved to converge with linear rates. Importantly, all three variants are parameter-free and use adaptive step-sizes.
Submitted 25 October, 2022; v1 submitted 19 June, 2022;
originally announced June 2022.
-
New Projection-free Algorithms for Online Convex Optimization with Adaptive Regret Guarantees
Authors:
Dan Garber,
Ben Kretzu
Abstract:
We present new efficient \textit{projection-free} algorithms for online convex optimization (OCO), where by projection-free we refer to algorithms that avoid computing orthogonal projections onto the feasible set, and instead rely on different and potentially much more efficient oracles. While most state-of-the-art projection-free algorithms are based on the \textit{follow-the-leader} framework, our algorithms are fundamentally different and are based on the \textit{online gradient descent} algorithm with a novel and efficient approach to computing so-called \textit{infeasible projections}. As a consequence, we obtain the first projection-free algorithms which naturally yield \textit{adaptive regret} guarantees, i.e., regret bounds that hold w.r.t. any sub-interval of the sequence. Concretely, when assuming the availability of a linear optimization oracle (LOO) for the feasible set, on a sequence of length $T$, our algorithms guarantee $O(T^{3/4})$ adaptive regret and $O(T^{3/4})$ adaptive expected regret, for the full-information and bandit settings, respectively, using only $O(T)$ calls to the LOO. These bounds match the current state-of-the-art regret bounds for LOO-based projection-free OCO, which are \textit{not adaptive}. We also consider a new natural setting in which the feasible set is accessible through a separation oracle. We present algorithms which, using overall $O(T)$ calls to the separation oracle, guarantee $O(\sqrt{T})$ adaptive regret and $O(T^{3/4})$ adaptive expected regret for the full-information and bandit settings, respectively.
Submitted 19 March, 2023; v1 submitted 9 February, 2022;
originally announced February 2022.
-
Low-Rank Extragradient Method for Nonsmooth and Low-Rank Matrix Optimization Problems
Authors:
Dan Garber,
Atara Kaplan
Abstract:
Low-rank and nonsmooth matrix optimization problems capture many fundamental tasks in statistics and machine learning. While significant progress has been made in recent years in developing efficient methods for \textit{smooth} low-rank optimization problems that avoid maintaining high-rank matrices and computing expensive high-rank SVDs, advances for nonsmooth problems have been slow paced.
In this paper we consider standard convex relaxations for such problems. Mainly, we prove that under a natural \textit{generalized strict complementarity} condition and under the relatively mild assumption that the nonsmooth objective can be written as a maximum of smooth functions, the \textit{extragradient method}, when initialized with a "warm-start" point, converges to an optimal solution with rate $O(1/t)$ while requiring only two \textit{low-rank} SVDs per iteration. We give a precise trade-off between the rank of the SVDs required and the radius of the ball in which we need to initialize the method. We support our theoretical results with empirical experiments on several nonsmooth low-rank matrix recovery tasks, demonstrating that using simple initializations, the extragradient method produces exactly the same iterates when full-rank SVDs are replaced with SVDs of rank that matches the rank of the (low-rank) ground-truth matrix to be recovered.
Submitted 8 February, 2022;
originally announced February 2022.
-
Local Linear Convergence of Gradient Methods for Subspace Optimization via Strict Complementarity
Authors:
Dan Garber,
Ron Fisher
Abstract:
We consider optimization problems in which the goal is to find a $k$-dimensional subspace of $\mathbb{R}^n$, $k \ll n$, which minimizes a convex and smooth loss. Such problems generalize the fundamental task of principal component analysis (PCA) to include robust and sparse counterparts, and logistic PCA for binary data, among others. This problem could be approached either via nonconvex gradient methods with highly efficient iterations, but for which arguing about fast convergence to a global minimizer is difficult, or via a convex relaxation for which arguing about convergence to a global minimizer is straightforward, but the corresponding methods are often inefficient in high dimensions. In this work we bridge these two approaches under a strict complementarity assumption, which in particular implies that the optimal solution to the convex relaxation is unique and is also the optimal solution to the original nonconvex problem. Our main result is a proof that a natural nonconvex gradient method which is \textit{SVD-free} and requires only a single QR-factorization of an $n\times k$ matrix per iteration, converges locally with a linear rate. We also establish linear convergence results for the nonconvex projected gradient method, and the Frank-Wolfe method when applied to the convex relaxation.
Submitted 25 October, 2022; v1 submitted 8 February, 2022;
originally announced February 2022.
-
Frank-Wolfe with a Nearest Extreme Point Oracle
Authors:
Dan Garber,
Noam Wolf
Abstract:
We consider variants of the classical Frank-Wolfe algorithm for constrained smooth convex minimization, that instead of access to the standard oracle for minimizing a linear function over the feasible set, have access to an oracle that can find an extreme point of the feasible set that is closest in Euclidean distance to a given vector. We first show that for many feasible sets of interest, such an oracle can be implemented with the same complexity as the standard linear optimization oracle. We then show that with such an oracle we can design new Frank-Wolfe variants which enjoy significantly improved complexity bounds in case the set of optimal solutions lies in the convex hull of a subset of extreme points with small diameter (e.g., a low-dimensional face of a polytope). In particular, for many $0\text{--}1$ polytopes, under quadratic growth and strict complementarity conditions, we obtain the first linearly convergent variant with rate that depends only on the dimension of the optimal face and not on the ambient dimension.
Submitted 9 February, 2022; v1 submitted 3 February, 2021;
originally announced February 2021.
-
On the Efficient Implementation of the Matrix Exponentiated Gradient Algorithm for Low-Rank Matrix Optimization
Authors:
Dan Garber,
Atara Kaplan
Abstract:
Convex optimization over the spectrahedron, i.e., the set of all real $n\times n$ positive semidefinite matrices with unit trace, has important applications in machine learning, signal processing and statistics, mainly as a convex relaxation for optimization problems with low-rank matrices. It is also one of the most prominent examples in the theory of first-order methods for convex optimization in which non-Euclidean methods can be significantly preferable to their Euclidean counterparts. In particular, the desirable choice is the Matrix Exponentiated Gradient (MEG) method which is based on the Bregman distance induced by the (negative) von Neumann entropy. Unfortunately, implementing MEG requires a full SVD computation on each iteration, which is not scalable to high-dimensional problems. In this work we propose efficient implementations of MEG, both with deterministic and stochastic gradients, which are tailored for optimization with low-rank matrices, and only use a single low-rank SVD computation on each iteration. We also provide efficiently-computable certificates for the correct convergence of our methods. Mainly, we prove that under a strict complementarity condition, the suggested methods converge from a "warm-start" initialization with similar rates to their full-SVD-based counterparts. Finally, we present empirical experiments which both support our theoretical findings and demonstrate the practical appeal of our methods.
Submitted 30 October, 2022; v1 submitted 18 December, 2020;
originally announced December 2020.
-
Revisiting Projection-free Online Learning: the Strongly Convex Case
Authors:
Dan Garber,
Ben Kretzu
Abstract:
Projection-free optimization algorithms, which are mostly based on the classical Frank-Wolfe method, have gained significant interest in the machine learning community in recent years due to their ability to handle convex constraints that are popular in many applications, but for which computing projections is often computationally impractical in high-dimensional settings, which prohibits the use of most standard projection-based methods. In particular, a significant research effort was put into projection-free methods for online learning. In this paper we revisit the Online Frank-Wolfe (OFW) method suggested by Hazan and Kale \cite{Hazan12} and fill a gap that has been left unnoticed for several years: OFW achieves a faster rate of $O(T^{2/3})$ on strongly convex functions (as opposed to the standard $O(T^{3/4})$ for convex but not strongly convex functions), where $T$ is the sequence length. This is somewhat surprising since it is known that for offline optimization, in general, strong convexity does not lead to faster rates for Frank-Wolfe. We also revisit the bandit setting under strong convexity and prove a similar bound of $\tilde O(T^{2/3})$ (instead of $O(T^{3/4})$ without strong convexity). Hence, in the current state of affairs, the best projection-free upper bounds for the full-information and bandit settings with strongly convex and nonsmooth functions match up to logarithmic factors in $T$.
Submitted 23 February, 2021; v1 submitted 15 October, 2020;
originally announced October 2020.
-
Revisiting Frank-Wolfe for Polytopes: Strict Complementarity and Sparsity
Authors:
Dan Garber
Abstract:
In recent years it was proved that simple modifications of the classical Frank-Wolfe algorithm (aka conditional gradient algorithm) for smooth convex minimization over convex and compact polytopes converge with linear rate, assuming the objective function has the quadratic growth property. However, the rate of these methods depends explicitly on the dimension of the problem, which cannot explain their empirical success for large-scale problems. In this paper we first demonstrate that already for very simple problems and even when the optimal solution lies on a low-dimensional face of the polytope, such dependence on the dimension cannot be avoided in the worst case. We then revisit the addition of a strict complementarity assumption already considered in Wolfe's classical book \cite{Wolfe1970}, and prove that under this condition, the Frank-Wolfe method with away-steps and line-search converges linearly with rate that depends explicitly only on the dimension of the optimal face. We motivate strict complementarity by proving that it implies sparsity-robustness of optimal solutions to noise.
Submitted 6 January, 2021; v1 submitted 31 May, 2020;
originally announced June 2020.
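For context on the classical Frank-Wolfe algorithm that the abstract above builds on, here is a minimal sketch of the plain method (without the away-steps or line-search analyzed in the paper) over the probability simplex, where the linear optimization oracle reduces to picking the coordinate with the smallest gradient entry. The function name, the open-loop step size $2/(t+2)$, and the quadratic test objective are illustrative assumptions.

```python
def frank_wolfe_simplex(grad, x0, steps):
    """Plain Frank-Wolfe over the probability simplex.

    grad: callable returning the gradient at x (a list of floats).
    x0:   feasible starting point (nonnegative entries summing to 1).
    """
    x = list(x0)
    for t in range(steps):
        g = grad(x)
        # Linear optimization oracle: the vertex e_i minimizing <g, v>.
        i = min(range(len(x)), key=lambda j: g[j])
        gamma = 2.0 / (t + 2)          # standard open-loop step size
        # Convex combination of x and e_i, so the iterate stays feasible.
        x = [(1.0 - gamma) * xj for xj in x]
        x[i] += gamma
    return x

# Hypothetical example: minimize ||x - target||^2 over the simplex.
target = [0.2, 0.5, 0.3]
grad = lambda x: [2.0 * (xj - bj) for xj, bj in zip(x, target)]
x = frank_wolfe_simplex(grad, [1.0, 0.0, 0.0], 2000)
```

No projection is ever computed: each iterate is a convex combination of simplex vertices, which is the sense in which such methods are called projection-free.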
-
The Worpitzky identity for the groups of signed and even-signed permutations
Authors:
Eli Bagno,
David Garber,
Mordechai Novick
Abstract:
The well-known Worpitzky identity provides a connection between two bases of $\mathbb{Q}[x]$: the standard basis $(x+1)^n$ and the binomial basis ${{x+n-i} \choose {n}}$, where the Eulerian numbers for the Coxeter group of type $A$ (the symmetric group) serve as the entries of the transformation matrix. Brenti has generalized this identity to the Coxeter groups of types $B$ and $D$ (the groups of signed and even-signed permutations, respectively) using generating function techniques.
Motivated by Foata-Schützenberger and Rawlings' proof of the Worpitzky identity in the symmetric group, we provide combinatorial proofs of this identity and of its $q$-analogues in the Coxeter groups of types $B$ and $D$.
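As a rough illustration of the classical (type $A$) identity mentioned in this abstract, the sketch below checks numerically that $(x+1)^n = \sum_{i=0}^{n-1} A(n,i)\binom{x+n-i}{n}$ for small $n$ and nonnegative integers $x$. The indexing convention (here $A(n,i)$ counts permutations of $n$ elements with $i$ descents) and the helper names `eulerian_row` and `worpitzky_rhs` are assumptions made for this example; the paper's own conventions and its type $B$ and $D$ versions may differ.

```python
from math import comb

def eulerian_row(n):
    """Return [A(n,0), ..., A(n,n-1)] for n >= 1, where A(n,i) is assumed to
    count permutations of n elements with exactly i descents."""
    row = [1]  # row for n = 1
    for m in range(2, n + 1):
        new = [0] * m
        for i in range(m):
            same = row[i] if i < m - 1 else 0      # A(m-1, i)
            fewer = row[i - 1] if i >= 1 else 0    # A(m-1, i-1)
            # Standard recurrence: A(m,i) = (i+1) A(m-1,i) + (m-i) A(m-1,i-1)
            new[i] = (i + 1) * same + (m - i) * fewer
        row = new
    return row

def worpitzky_rhs(n, x):
    """Right-hand side: sum over i of A(n,i) * C(x+n-i, n)."""
    return sum(a * comb(x + n - i, n) for i, a in enumerate(eulerian_row(n)))

# Compare both sides on a small grid of values.
identity_holds = all(
    (x + 1) ** n == worpitzky_rhs(n, x)
    for n in range(1, 8)
    for x in range(0, 10)
)
```

Since both sides are polynomials of degree $n$ in $x$, agreement at more than $n$ integer points confirms the identity for that $n$.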
Submitted 7 April, 2020;
originally announced April 2020.
-
On the Convergence of Stochastic Gradient Descent with Low-Rank Projections for Convex Low-Rank Matrix Problems
Authors:
Dan Garber
Abstract:
We revisit the use of Stochastic Gradient Descent (SGD) for solving convex optimization problems that serve as highly popular convex relaxations for many important low-rank matrix recovery problems such as \textit{matrix completion}, \textit{phase retrieval}, and more. The computational limitation of applying SGD to solving these relaxations in large-scale is the need to compute a potentially high-rank singular value decomposition (SVD) on each iteration in order to enforce the low-rank-promoting constraint. We begin by considering a simple and natural sufficient condition so that these relaxations indeed admit low-rank solutions. This condition is also necessary for a certain notion of low-rank-robustness to hold. Our main result shows that under this condition which involves the eigenvalues of the gradient vector at optimal points, SGD with mini-batches, when initialized with a "warm-start" point, produces iterates that are low-rank with high probability, and hence only a low-rank SVD computation is required on each iteration. This suggests that SGD may indeed be practically applicable to solving large-scale convex relaxations of low-rank matrix recovery problems. Our theoretical results are accompanied with supporting preliminary empirical evidence. As a side benefit, our analysis is quite simple and short.
Submitted 14 June, 2020; v1 submitted 31 January, 2020;
originally announced January 2020.
-
Linear Convergence of Frank-Wolfe for Rank-One Matrix Recovery Without Strong Convexity
Authors:
Dan Garber
Abstract:
We consider convex optimization problems which are widely used as convex relaxations for low-rank matrix recovery problems. In particular, in several important problems, such as phase retrieval and robust PCA, the underlying assumption in many cases is that the optimal solution is rank-one. In this paper we consider a simple and natural sufficient condition on the objective so that the optimal solution to these relaxations is indeed unique and rank-one. Mainly, we show that under this condition, the standard Frank-Wolfe method with line-search (i.e., without any tuning of parameters whatsoever), which only requires a single rank-one SVD computation per iteration, finds an $ε$-approximated solution in only $O(\log{1/ε})$ iterations (as opposed to the previous best known bound of $O(1/ε)$), despite the fact that the objective is not strongly convex. We consider several variants of the basic method with improved complexities, as well as an extension motivated by robust PCA, and finally, an extension to nonsmooth problems.
Submitted 19 June, 2022; v1 submitted 3 December, 2019;
originally announced December 2019.
-
Improved Regret Bounds for Projection-free Bandit Convex Optimization
Authors:
Dan Garber,
Ben Kretzu
Abstract:
We revisit the challenge of designing online algorithms for the bandit convex optimization problem (BCO) which are also scalable to high dimensional problems. Hence, we consider algorithms that are \textit{projection-free}, i.e., based on the conditional gradient method whose only access to the feasible decision set, is through a linear optimization oracle (as opposed to other methods which require potentially much more computationally-expensive subprocedures, such as computing Euclidean projections). We present the first such algorithm that attains $O(T^{3/4})$ expected regret using only $O(T)$ overall calls to the linear optimization oracle, in expectation, where $T$ is the number of prediction rounds. This improves over the $O(T^{4/5})$ expected regret bound recently obtained by \cite{Karbasi19}, and actually matches the current best regret bound for projection-free online learning in the \textit{full information} setting.
Submitted 8 October, 2019;
originally announced October 2019.
-
Signed partitions - A balls into urns approach
Authors:
Eli Bagno,
David Garber
Abstract:
Using Reiner's definition of Stirling numbers of type B of the second kind, we provide a 'balls into urns' approach for proving a generalization of a well-known identity concerning the classical Stirling numbers of the second kind: $x^n=\sum\limits_{k=0}^n{S(n,k)[x]_k}.$
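The classical identity quoted in this abstract, $x^n=\sum_{k=0}^n S(n,k)[x]_k$ with $[x]_k$ the falling factorial, can be checked numerically with a short sketch. This covers only the classical Stirling numbers of the second kind, not Reiner's type $B$ numbers; the function names `stirling2_row` and `falling` are hypothetical helpers introduced for the example.

```python
def stirling2_row(n):
    """Return [S(n,0), ..., S(n,n)] via S(m,k) = k*S(m-1,k) + S(m-1,k-1)."""
    row = [1]  # S(0,0) = 1
    for m in range(1, n + 1):
        new = [0] * (m + 1)
        for k in range(1, m + 1):
            prev_same = row[k] if k < m else 0  # S(m-1, k), zero when k = m
            new[k] = k * prev_same + row[k - 1]
        row = new
    return row

def falling(x, k):
    """Falling factorial [x]_k = x (x-1) ... (x-k+1), with [x]_0 = 1."""
    result = 1
    for j in range(k):
        result *= x - j
    return result

def stirling_rhs(n, x):
    """Right-hand side: sum over k of S(n,k) * [x]_k."""
    return sum(s * falling(x, k) for k, s in enumerate(stirling2_row(n)))

# Compare both sides on a small grid of values.
identity_holds = all(
    x ** n == stirling_rhs(n, x)
    for n in range(0, 8)
    for x in range(0, 10)
)
```

In 'balls into urns' terms, $x^n$ counts all ways to place $n$ labelled balls into $x$ labelled urns, and the term $S(n,k)[x]_k$ counts the placements that use exactly $k$ urns.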
Submitted 7 March, 2019;
originally announced March 2019.
-
On the Convergence of Projected-Gradient Methods with Low-Rank Projections for Smooth Convex Minimization over Trace-Norm Balls and Related Problems
Authors:
Dan Garber
Abstract:
Smooth convex minimization over the unit trace-norm ball is an important optimization problem in machine learning, signal processing, statistics and other fields, that underlies many tasks in which one wishes to recover a low-rank matrix given certain measurements. While first-order methods for convex optimization enjoy optimal convergence rates, they require in worst-case to compute a full-rank SVD on each iteration, in order to compute the projection onto the trace-norm ball. These full-rank SVD computations however prohibit the application of such methods to large problems. A simple and natural heuristic to reduce the computational cost is to approximate the projection using only a low-rank SVD. This raises the question if, and under what conditions, this simple heuristic can indeed result in provable convergence to the optimal solution. In this paper we show that any optimal solution is a center of a Euclidean ball inside which the projected-gradient mapping admits rank that is at most the multiplicity of the largest singular value of the gradient vector. Moreover, the radius of the ball scales with the spectral gap of this gradient vector. We show how this readily implies the local convergence (i.e., from a "warm-start" initialization) of standard first-order methods, using only low-rank SVD computations. We also quantify the effect of "over-parameterization", i.e., using SVD computations with higher rank, on the radius of this ball, showing it can increase dramatically with moderately larger rank. We extend our results also to the setting of optimization with trace-norm regularization and optimization over bounded-trace positive semidefinite matrices. Our theoretical investigation is supported by concrete empirical evidence that demonstrates the \textit{correct} convergence of first-order methods with low-rank projections on real-world datasets.
Submitted 28 November, 2020; v1 submitted 5 February, 2019;
originally announced February 2019.
-
Some identities involving second kind Stirling numbers of types $B$ and $D$
Authors:
Eli Bagno,
Riccardo Biagioli,
David Garber
Abstract:
Using Reiner's definition of Stirling numbers of the second kind in types $B$ and $D$, we generalize two well-known identities concerning the classical Stirling numbers of the second kind. The first identity relates them with Eulerian numbers and the second identity interprets them as entries in a transition matrix between the elements of two standard bases of the polynomial ring $R[x]$. Finally, we generalize these identities to the group of colored permutations $G_{m,n}$.
Submitted 23 January, 2019;
originally announced January 2019.
-
Property testing and expansion in cubical complexes
Authors:
David Garber,
Uzi Vishne
Abstract:
We consider expansion and property testing in the language of incidence geometry, covering both simplicial and cubical complexes in any dimension. We develop a general method for passing from an explicit description of the cohomology group, which need not be trivial, to a testability proof with linear ratio between errors. The method is demonstrated by testing functions on 2-cells in cubical complexes to be induced from the edges.
Submitted 26 November, 2019; v1 submitted 27 September, 2018;
originally announced September 2018.
-
On the Regret Minimization of Nonconvex Online Gradient Ascent for Online PCA
Authors:
Dan Garber
Abstract:
In this paper we focus on the problem of Online Principal Component Analysis in the regret minimization framework. For this problem, all existing regret minimization algorithms for the fully-adversarial setting are based on a positive semidefinite convex relaxation, and hence require quadratic memory and SVD computation (either thin or full) on each iteration, which amounts to at least quadratic runtime per iteration. This is in stark contrast to a corresponding stochastic i.i.d. variant of the problem, which was studied extensively lately, and admits very efficient gradient ascent algorithms that work directly on the natural non-convex formulation of the problem, and hence require only linear memory and linear runtime per iteration. This raises the question: can non-convex online gradient ascent algorithms be shown to minimize regret in online adversarial settings? In this paper we take a step forward towards answering this question. We introduce an \textit{adversarially-perturbed spiked-covariance model} in which each data point is assumed to follow a fixed stochastic distribution with a non-zero spectral gap in the covariance matrix, but is then perturbed with some adversarial vector. This model is a natural extension of a well studied standard stochastic setting that allows for non-stationary (adversarial) patterns to arise in the data and hence, might serve as a significantly better approximation for real-world data-streams. We show that in an interesting regime of parameters, when the non-convex online gradient ascent algorithm is initialized with a "warm-start" vector, it provably minimizes the regret with high probability. We further discuss the possibility of computing such a "warm-start" vector, and also the use of regularization to obtain fast regret rates. Our theoretical findings are supported by empirical experiments on both synthetic and real-world data.
Submitted 31 January, 2019; v1 submitted 27 September, 2018;
originally announced September 2018.
-
Fast Stochastic Algorithms for Low-rank and Nonsmooth Matrix Problems
Authors:
Dan Garber,
Atara Kaplan
Abstract:
Composite convex optimization problems which include both a nonsmooth term and a low-rank promoting term have important applications in machine learning and signal processing, such as when one wishes to recover an unknown matrix that is simultaneously low-rank and sparse. However, such problems are highly challenging to solve in large-scale: the low-rank promoting term prohibits efficient implementations of proximal methods for composite optimization and even simple subgradient methods. On the other hand, methods which are tailored for low-rank optimization, such as conditional gradient-type methods, which are often applied to a smooth approximation of the nonsmooth objective, are slow since their runtime scales with both the large Lipschitz parameter of the smoothed gradient vector and with $1/ε$. In this paper we develop efficient algorithms for \textit{stochastic} optimization of a strongly-convex objective which includes both a nonsmooth term and a low-rank promoting term. In particular, to the best of our knowledge, we present the first algorithm that enjoys all following critical properties for large-scale problems: i) (nearly) optimal sample complexity, ii) each iteration requires only a single \textit{low-rank} SVD computation, and iii) overall number of thin-SVD computations scales only with $\log{1/ε}$ (as opposed to $\textrm{poly}(1/ε)$ in previous methods). We also give an algorithm for the closely-related finite-sum setting. At the heart of our results lies a novel combination of a variance-reduction technique and the use of a \textit{weak-proximal oracle} which is key to obtaining all above three properties simultaneously.
Submitted 27 September, 2018;
originally announced September 2018.
-
Improved Complexities of Conditional Gradient-Type Methods with Applications to Robust Matrix Recovery Problems
Authors:
Dan Garber,
Shoham Sabach,
Atara Kaplan
Abstract:
Motivated by robust matrix recovery problems such as Robust Principal Component Analysis, we consider a general optimization problem of minimizing a smooth and strongly convex loss function applied to the sum of two blocks of variables, where each block of variables is constrained or regularized individually. We study a Conditional Gradient-Type method which is able to leverage the special structure of the problem to obtain faster convergence rates than those attainable via standard methods, under a variety of assumptions. In particular, our method is appealing for matrix problems in which one of the blocks corresponds to a low-rank matrix since it avoids prohibitive full-rank singular value decompositions required by most standard methods. While our initial motivation comes from problems which originated in statistics, our analysis does not impose any statistical assumptions on the data.
Submitted 15 November, 2019; v1 submitted 15 February, 2018;
originally announced February 2018.
-
Logarithmic Regret for Online Gradient Descent Beyond Strong Convexity
Authors:
Dan Garber
Abstract:
Hoffman's classical result gives a bound on the distance of a point from a convex and compact polytope in terms of the magnitude of violation of the constraints. Recently, several results showed that Hoffman's bound can be used to derive strongly-convex-like rates for first-order methods for \textit{offline} convex optimization of curved, though not strongly convex, functions, over polyhedral sets. In this work, we use this classical result for the first time to obtain faster rates for \textit{online convex optimization} over polyhedral sets with curved convex, though not strongly convex, loss functions. We show that under several reasonable assumptions on the data, the standard \textit{Online Gradient Descent} algorithm guarantees logarithmic regret. To the best of our knowledge, the only previous algorithm to achieve logarithmic regret in the considered settings is the \textit{Online Newton Step} algorithm which requires quadratic (in the dimension) memory and at least quadratic runtime per iteration, which greatly limits its applicability to large-scale problems. In particular, our results hold for \textit{semi-adversarial} settings in which the data is a combination of an arbitrary (adversarial) sequence and a stochastic sequence, which might provide reasonable approximation for many real-world sequences, or under a natural assumption that the data is low-rank. We demonstrate via experiments that the regret of OGD is indeed comparable to that of ONS (and even far better) on curved though not strongly-convex losses.
Submitted 18 February, 2019; v1 submitted 13 February, 2018;
originally announced February 2018.
-
Efficient Online Linear Optimization with Approximation Algorithms
Authors:
Dan Garber
Abstract:
We revisit the problem of \textit{online linear optimization} in case the set of feasible actions is accessible through an approximated linear optimization oracle with a factor $α$ multiplicative approximation guarantee. This setting is in particular interesting since it captures natural online extensions of well-studied \textit{offline} linear optimization problems which are NP-hard, yet admit efficient approximation algorithms. The goal here is to minimize the $α$\textit{-regret} which is the natural extension of the standard \textit{regret} in \textit{online learning} to this setting.
We present new algorithms with significantly improved oracle complexity for both the full information and bandit variants of the problem. Mainly, for both variants, we present $α$-regret bounds of $O(T^{-1/3})$, where $T$ is the number of prediction rounds, using only $O(\log{T})$ calls to the approximation oracle per iteration, on average. These are the first results to obtain both average oracle complexity of $O(\log{T})$ (or even poly-logarithmic in $T$) and $α$-regret bound $O(T^{-c})$ for a constant $c>0$, for both variants.
Submitted 10 September, 2017;
originally announced September 2017.
-
Efficient coordinate-wise leading eigenvector computation
Authors:
Jialei Wang,
Weiran Wang,
Dan Garber,
Nathan Srebro
Abstract:
We develop and analyze efficient "coordinate-wise" methods for finding the leading eigenvector, where each step involves only a vector-vector product. We establish global convergence with overall runtime guarantees that are at least as good as Lanczos's method and dominate it for slowly decaying spectrum. Our methods are based on combining a shift-and-invert approach with coordinate-wise algorithms for linear regression.
Submitted 25 February, 2017;
originally announced February 2017.
-
Faster Eigenvector Computation via Shift-and-Invert Preconditioning
Authors:
Dan Garber,
Elad Hazan,
Chi Jin,
Sham M. Kakade,
Cameron Musco,
Praneeth Netrapalli,
Aaron Sidford
Abstract:
We give faster algorithms and improved sample complexities for estimating the top eigenvector of a matrix $Σ$ -- i.e. computing a unit vector $x$ such that $x^T Σx \ge (1-ε)λ_1(Σ)$:
Offline Eigenvector Estimation: Given an explicit $A \in \mathbb{R}^{n \times d}$ with $Σ= A^TA$, we show how to compute an $ε$ approximate top eigenvector in time $\tilde O([nnz(A) + \frac{d*sr(A)}{gap^2} ]* \log 1/ε)$ and $\tilde O([\frac{nnz(A)^{3/4} (d*sr(A))^{1/4}}{\sqrt{gap}} ] * \log 1/ε)$. Here $nnz(A)$ is the number of nonzeros in $A$, $sr(A)$ is the stable rank, $gap$ is the relative eigengap. By separating the $gap$ dependence from the $nnz(A)$ term, our first runtime improves upon the classical power and Lanczos methods. It also improves prior work using fast subspace embeddings [AC09, CW13] and stochastic optimization [Sha15c], giving significantly better dependencies on $sr(A)$ and $ε$. Our second running time improves these further when $nnz(A) \le \frac{d*sr(A)}{gap^2}$.
Online Eigenvector Estimation: Given a distribution $D$ with covariance matrix $Σ$ and a vector $x_0$ which is an $O(gap)$ approximate top eigenvector for $Σ$, we show how to refine to an $ε$ approximation using $ O(\frac{var(D)}{gap*ε})$ samples from $D$. Here $var(D)$ is a natural notion of variance. Combining our algorithm with previous work to initialize $x_0$, we obtain improved sample complexity and runtime results under a variety of assumptions on $D$.
We achieve our results using a general framework that we believe is of independent interest. We give a robust analysis of the classic method of shift-and-invert preconditioning to reduce eigenvector computation to approximately solving a sequence of linear systems. We then apply fast stochastic variance reduced gradient (SVRG) based system solvers to achieve our claims.
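The reduction described above, power iteration applied to $(λI - Σ)^{-1}$ with each step solved approximately as a linear system, can be sketched as follows. This is only a minimal illustration under stated assumptions: the shift `shift` is taken as given and slightly above $λ_1(Σ)$, and plain conjugate gradient stands in for the SVRG-based solvers used in the paper. All function names are hypothetical.

```python
import math

def matvec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def conjugate_gradient(M, b, iters=50, tol=1e-12):
    """Approximately solve M y = b for symmetric positive definite M.
    (Stand-in solver; the paper uses SVRG-based system solvers instead.)"""
    y = [0.0] * len(b)
    r = list(b)           # residual b - M y
    p = list(r)
    rs = dot(r, r)
    for _ in range(iters):
        if rs <= tol * tol:
            break
        Mp = matvec(M, p)
        alpha = rs / dot(p, Mp)
        y = [yi + alpha * pi for yi, pi in zip(y, p)]
        r = [ri - alpha * mpi for ri, mpi in zip(r, Mp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return y

def shift_invert_top_eigvec(S, shift, x0, outer_iters=40):
    """Power iteration on (shift*I - S)^{-1}; assumes shift > lambda_1(S)."""
    n = len(S)
    M = [[(shift if i == j else 0.0) - S[i][j] for j in range(n)] for i in range(n)]
    x = list(x0)
    for _ in range(outer_iters):
        y = conjugate_gradient(M, x)      # y ~ (shift*I - S)^{-1} x
        norm = math.sqrt(dot(y, y))
        x = [yi / norm for yi in y]
    return x

# Toy example: S has eigenvalues 3 and 1, so a shift of 3.5 lies just above the top one.
S = [[2.0, 1.0], [1.0, 2.0]]
x = shift_invert_top_eigvec(S, shift=3.5, x0=[1.0, 0.0])
rayleigh = dot(x, matvec(S, x))  # should be close to the top eigenvalue
```

The point of the shift is conditioning: in this toy case the eigenvalues of $(3.5I - S)^{-1}$ are $2$ and $0.4$, so the inverted matrix has a much larger relative gap than $S$ itself, and the outer power iteration needs few steps.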
Submitted 25 May, 2016;
originally announced May 2016.
-
Linear-memory and Decomposition-invariant Linearly Convergent Conditional Gradient Algorithm for Structured Polytopes
Authors:
Dan Garber,
Ofer Meshi
Abstract:
Recently, several works have shown that natural modifications of the classical conditional gradient method (aka Frank-Wolfe algorithm) for constrained convex optimization, provably converge with a linear rate when: i) the feasible set is a polytope, and ii) the objective is smooth and strongly-convex. However, all of these results suffer from two significant shortcomings: large memory requirement due to the need to store an explicit convex decomposition of the current iterate, and as a consequence, large running-time overhead per iteration, and worst case convergence rate that depends unfavorably on the dimension.
In this work we present a new conditional gradient variant and a corresponding analysis that improves on both of the above shortcomings. In particular: both memory and computation overheads are only linear in the dimension. Moreover, in case the optimal solution is sparse, the new convergence rate replaces a factor which is at least linear in the dimension in previous works, with a linear dependence on the number of non-zeros in the optimal solution.
At the heart of our method, and corresponding analysis, is a novel way to compute decomposition-invariant away-steps. While our theoretical guarantees do not apply to any polytope, they apply to several important structured polytopes that capture central concepts such as paths in graphs, perfect matchings in bipartite graphs, marginal distributions that arise in structured prediction tasks, and more. Our theoretical findings are complemented by empirical evidence which shows that our method delivers state-of-the-art performance.
Submitted 20 May, 2016;
originally announced May 2016.
-
Faster Projection-free Convex Optimization over the Spectrahedron
Authors:
Dan Garber
Abstract:
Minimizing a convex function over the spectrahedron, i.e., the set of all positive semidefinite matrices with unit trace, is an important optimization task with many applications in optimization, machine learning, and signal processing. It is also notoriously difficult to solve in large-scale since standard techniques require expensive matrix decompositions. An alternative is the conditional gradient method (aka Frank-Wolfe algorithm) that regained much interest in recent years, mostly due to its application to this specific setting. The key benefit of the CG method is that it avoids expensive matrix decompositions altogether, and simply requires a single eigenvector computation per iteration, which is much more efficient. On the downside, the CG method, in general, converges with an inferior rate. The error for minimizing a $β$-smooth function after $t$ iterations scales like $β/t$. This convergence rate does not improve even if the function is also strongly convex.
In this work we present a modification of the CG method tailored for convex optimization over the spectrahedron. The per-iteration complexity of the method is essentially identical to that of the standard CG method: only a single eigenvector computation is required. For minimizing an $α$-strongly convex and $β$-smooth function, the expected approximation error of the method after $t$ iterations is: $$O\left(\min\left\{\frac{β}{t}, \left(\frac{β\sqrt{\textrm{rank}(\textbf{X}^*)}}{α^{1/4}t}\right)^{4/3}, \left(\frac{β}{\sqrt{α}λ_{\min}(\textbf{X}^*)t}\right)^{2}\right\}\right),$$ where $\textbf{X}^*$ is the optimal solution. To the best of our knowledge, this is the first result that attains provably faster convergence rates for a CG variant for optimization over the spectrahedron. We also present encouraging preliminary empirical results.
Submitted 19 May, 2016;
originally announced May 2016.
-
Fast and Simple PCA via Convex Optimization
Authors:
Dan Garber,
Elad Hazan
Abstract:
The problem of principal component analysis (PCA) is traditionally solved by spectral or algebraic methods. We show how computing the leading principal component could be reduced to solving a \textit{small} number of well-conditioned \textit{convex} optimization problems. This gives rise to a new efficient method for PCA based on recent advances in stochastic methods for convex optimization.
In particular we show that given a $d\times d$ matrix $\mathbf{X} = \frac{1}{n}\sum_{i=1}^n\mathbf{x}_i\mathbf{x}_i^{\top}$ with top eigenvector $\mathbf{u}$ and top eigenvalue $λ_1$ it is possible to:
(i) compute a unit vector $\mathbf{w}$ such that $(\mathbf{w}^{\top}\mathbf{u})^2 \geq 1-ε$ in $\tilde{O}\left({\frac{d}{δ^2}+N}\right)$ time, where $δ= λ_1 - λ_2$ and $N$ is the total number of non-zero entries in $\mathbf{x}_1,...,\mathbf{x}_n$,
(ii) compute a unit vector $\mathbf{w}$ such that $\mathbf{w}^{\top}\mathbf{X}\mathbf{w} \geq λ_1-ε$ in $\tilde{O}(d/ε^2)$ time.
To the best of our knowledge, these bounds are the fastest to date for a wide regime of parameters. These results could be further accelerated when $δ$ (in the first case) and $ε$ (in the second case) are smaller than $\sqrt{d/N}$.
Submitted 25 November, 2015; v1 submitted 18 September, 2015;
originally announced September 2015.
-
Faster Rates for the Frank-Wolfe Method over Strongly-Convex Sets
Authors:
Dan Garber,
Elad Hazan
Abstract:
The Frank-Wolfe method (a.k.a. conditional gradient algorithm) for smooth optimization has regained much interest in recent years in the context of large scale optimization and machine learning. A key advantage of the method is that it avoids projections - the computational bottleneck in many applications - replacing them with a linear optimization step. Despite this advantage, the known convergence rates of the FW method fall behind standard first order methods for most settings of interest. It is an active line of research to derive faster linear optimization-based algorithms for various settings of convex optimization.
In this paper we consider the special case of optimization over strongly convex sets, for which we prove that the vanilla FW method converges at a rate of $\frac{1}{t^2}$. This gives a quadratic improvement in convergence rate compared to the general case, in which convergence is of the order $\frac{1}{t}$, and known to be tight. We show that various balls induced by $\ell_p$ norms, Schatten norms and group norms are strongly convex on one hand and on the other hand, linear optimization over these sets is straightforward and admits a closed-form solution. We further show how several previous fast-rate results for the FW method follow easily from our analysis.
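As an illustration of the projection-free idea described in this abstract, the following is a minimal sketch of the vanilla Frank-Wolfe iteration over a Euclidean ($\ell_2$) ball, one of the strongly convex sets mentioned above. It is not taken from the paper: the function names `lmo_l2_ball` and `frank_wolfe`, the toy objective $f(x) = \frac{1}{2}\|x - b\|^2$, and the standard step size $2/(t+2)$ are all assumptions chosen for the example, and the sketch makes no attempt to demonstrate the $\frac{1}{t^2}$ rate.

```python
import math


def lmo_l2_ball(grad, radius=1.0):
    """Linear minimization oracle (hypothetical helper): returns the point v
    with ||v||_2 <= radius that minimizes <grad, v>. For the l2 ball this has
    the closed form v = -radius * grad / ||grad||_2."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        # Zero gradient: every feasible point is optimal, return the center.
        return [0.0] * len(grad)
    return [-radius * g / norm for g in grad]


def frank_wolfe(grad_f, x0, steps=200, radius=1.0):
    """Vanilla Frank-Wolfe with the standard step size 2 / (t + 2).
    Each iteration calls the linear oracle once and never projects."""
    x = list(x0)
    for t in range(steps):
        g = grad_f(x)
        v = lmo_l2_ball(g, radius)
        eta = 2.0 / (t + 2.0)
        # Convex combination of feasible points stays feasible.
        x = [(1.0 - eta) * xi + eta * vi for xi, vi in zip(x, v)]
    return x


# Toy problem (assumed for illustration): minimize 0.5 * ||x - b||^2 over the
# unit l2 ball, with b outside the ball. The minimizer is b / ||b||.
b = [3.0, 4.0]


def grad(x):
    return [xi - bi for xi, bi in zip(x, b)]


x_hat = frank_wolfe(grad, [0.0, 0.0])
print(x_hat)  # approximately [0.6, 0.8]
```

The only set-specific piece is the oracle: swapping `lmo_l2_ball` for the closed-form oracle of another norm ball leaves the loop unchanged, which is the design point the abstract emphasizes.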
Submitted 14 August, 2015; v1 submitted 5 June, 2014;
originally announced June 2014.
-
On the group of alternating colored permutations
Authors:
Eli Bagno,
David Garber,
Toufik Mansour
Abstract:
The group of alternating colored permutations is the natural analogue of the classical alternating group, inside the wreath product $\mathbb{Z}_r \wr S_n$. We present a 'Coxeter-like' presentation for this group and compute the length function with respect to that presentation. Then, we present this group as a covering of $\mathbb{Z}_{\frac{r}{2}} \wr S_n$ and use this point of view to give another expression for the length function. We also use this covering to lift several known parameters of $\mathbb{Z}_{\frac{r}{2}} \wr S_n$ to the group of alternating colored permutations.
Submitted 22 January, 2014;
originally announced January 2014.
-
Double Centralizers of Parabolic Subgroups of Braid Groups
Authors:
David Garber,
Arkadius Kalka,
Eran Liberman,
Mina Teicher
Abstract:
We characterize the double centralizer of all parabolic subgroups of the braid groups. We apply this result to provide a new and potentially more efficient solution to the subgroup conjugacy problem for parabolic subgroups. In the course of the proof we also characterize the centralizer for all parabolic subgroups.
Submitted 13 June, 2015; v1 submitted 3 October, 2013;
originally announced October 2013.
-
Length-based attacks in polycyclic groups
Authors:
David Garber,
Delaram Kahrobaei,
Ha T. Lam
Abstract:
After the Anshel-Anshel-Goldfeld (AAG) key-exchange protocol was introduced in 1999, it was implemented and studied with braid groups and with the Thompson group as its underlying platforms. The length-based attack, introduced by Hughes and Tannenbaum, has been used to extensively study AAG with the braid group as the underlying platform. Meanwhile, a new platform, using polycyclic groups, was proposed by Eick and Kahrobaei.
In this paper, we show that with a high enough Hirsch length, the polycyclic group as an underlying platform for AAG is resistant to the length-based attack. In particular, polycyclic groups could provide a secure platform for any cryptosystem based on the conjugacy search problem, such as the non-commutative Diffie-Hellman, ElGamal and Cramer-Shoup key exchange protocols.
Submitted 22 November, 2014; v1 submitted 2 May, 2013;
originally announced May 2013.
-
On the structure of fundamental groups of conic-line arrangements having a cycle in their graph
Authors:
Michael Friedman,
David Garber
Abstract:
The fundamental group of the complement of a plane curve is a very important topological invariant. In particular, it is interesting to find out whether this group is determined by the combinatorics of the curve or not, and whether it is a direct sum of free groups and a free abelian group, or it has a conjugation-free geometric presentation.
In this paper, we investigate the structure of this fundamental group when the graph of the conic-line arrangement is a unique cycle of length $n$ and the conic passes through all the multiple points of the cycle. We show that if $n$ is odd, then the affine fundamental group is abelian but not conjugation-free. For the even case, if $n>4$, then using quotients of the lower central series, we show that the fundamental group is not even a direct sum of a free abelian group and free groups.
Submitted 29 April, 2013;
originally announced April 2013.
-
On Left regular bands and real Conic-Line arrangements
Authors:
Michael Friedman,
David Garber
Abstract:
An arrangement of curves in the real plane divides it into a collection of faces. In the case of line arrangements, there exists an associative product which gives this collection a structure of a left regular band. A natural question is whether the same is possible for other arrangements. In this paper, we try to answer this question for the simplest generalization of line arrangements, that is, conic-line arrangements.
Investigating the different algebraic structures induced by the face poset of a conic-line arrangement, we present two different generalizations for the product and its associated structures: an alternative left regular band and an associative aperiodic semigroup. We also study the structure of sub left regular bands induced by these arrangements. We finish with some chamber counting results for conic-line arrangements.
Submitted 29 August, 2018; v1 submitted 22 January, 2013;
originally announced January 2013.
-
A Linearly Convergent Conditional Gradient Algorithm with Applications to Online and Stochastic Optimization
Authors:
Dan Garber,
Elad Hazan
Abstract:
Linear optimization is often algorithmically simpler than non-linear convex optimization. Linear optimization over matroid polytopes, matching polytopes and path polytopes are examples of problems for which we have simple and efficient combinatorial algorithms, but whose non-linear convex counterparts are harder and admit significantly less efficient algorithms. This motivates the computational model of convex optimization, including the offline, online and stochastic settings, using a linear optimization oracle. In this computational model we give several new results that improve over the previous state-of-the-art. Our main result is a novel conditional gradient algorithm for smooth and strongly convex optimization over polyhedral sets that performs only a single linear optimization step over the domain on each iteration and enjoys a linear convergence rate. This gives an exponential improvement in convergence rate over previous results.
Based on this new conditional gradient algorithm we give the first algorithms for online convex optimization over polyhedral sets that perform only a single linear optimization step over the domain while having optimal regret guarantees, answering an open question of Kalai and Vempala, and Hazan and Kale. Our online algorithms also imply conditional gradient algorithms for non-smooth and stochastic convex optimization with the same convergence rates as projected (sub)gradient methods.
Submitted 14 August, 2015; v1 submitted 20 January, 2013;
originally announced January 2013.
-
Almost Optimal Sublinear Time Algorithm for Semidefinite Programming
Authors:
Dan Garber,
Elad Hazan
Abstract:
We present an algorithm for approximating semidefinite programs with running time that is sublinear in the number of entries in the semidefinite instance. We also present lower bounds that show our algorithm to have a nearly optimal running time.
Submitted 26 August, 2012;
originally announced August 2012.
-
On the Orchard crossing number of prisms, ladders and other related graphs
Authors:
Elie Feder,
David Garber
Abstract:
This paper deals with the Orchard crossing number of some families of graphs which are based on cycles. These include disjoint cycles, cycles which share a vertex and cycles which share an edge. Specifically, we focus on the prism and ladder graphs.
Submitted 23 November, 2011;
originally announced November 2011.
-
On the structure of conjugation-free fundamental groups of conic-line arrangements
Authors:
Michael Friedman,
David Garber
Abstract:
The fundamental group of the complement of a hyperplane arrangement plays an important role in studying the corresponding arrangements. In particular, for large families of hyperplane arrangements, this fundamental group, being isomorphic to the fundamental group of a complement of a line arrangement, has some remarkable properties: either it is a direct sum of free groups and a free abelian group, or it has a conjugation-free geometric presentation.
In this paper, we first give a complete proof of the following key lemma: if we draw a new line through only one intersection point of a given real line arrangement whose fundamental group is conjugation-free, then the fundamental group of the new arrangement is also conjugation-free.
Second, we generalize this lemma to the case of conic-line arrangements. Moreover, we prove that if the graph associated to a conic-line arrangement (defined slightly differently from the corresponding graph for line arrangements) has no cycles, then the fundamental group of its complement has a conjugation-free geometric presentation and in addition can be written as a direct sum of free groups and a free abelian group. Also, we show that if the graph consists of one cycle, and the conic does not pass through all the multiple points corresponding to the vertices of the cycle, then the fundamental group has a conjugation-free geometric presentation as well.
In conclusion, we extend the family of real line arrangements having a conjugation-free geometric presentation (for their fundamental group) by defining the notion of a conjugation-free graph. We also extend this notion to certain families of conic-line arrangements.
Submitted 28 April, 2013; v1 submitted 22 November, 2011;
originally announced November 2011.
-
A conjugation-free geometric presentation of fundamental groups of arrangements II: Expansion and some properties
Authors:
Meital Eliyahu,
David Garber,
Mina Teicher
Abstract:
A conjugation-free geometric presentation of a fundamental group is a presentation with the natural topological generators $x_1, ..., x_n$ and the cyclic relations: $x_{i_k}x_{i_{k-1}} ... x_{i_1} = x_{i_{k-1}} ... x_{i_1} x_{i_k} = ... = x_{i_1} x_{i_k} ... x_{i_2}$ with no conjugations on the generators.
We have already proved that if the graph of the arrangement is a disjoint union of cycles, then its fundamental group has a conjugation-free geometric presentation. In this paper, we extend this property to arrangements whose graphs are a disjoint union of cycle-tree graphs.
Moreover, we study some properties of this type of presentation for the fundamental group of a line arrangement's complement. We show that these presentations satisfy a completeness property in the sense of Dehornoy, if the corresponding graph of the arrangement has no edges. The completeness property is a powerful one: it leads to many useful properties of the presentation, such as the left-cancellativity of the associated monoid, and yields a simple criterion for the solvability of the word problem in the group.
Submitted 6 June, 2012; v1 submitted 7 September, 2010;
originally announced September 2010.
-
On the Orchard crossing number of complete bipartite graphs
Authors:
Elie Feder,
David Garber
Abstract:
We compute the Orchard crossing number, which is defined in a similar way to the rectilinear crossing number, for the complete bipartite graphs $K_{n,n}$.
Submitted 16 August, 2010;
originally announced August 2010.
-
Eulerian partitions for configurations of skew lines
Authors:
Roland Bacher,
David Garber
Abstract:
In this paper, which is a complement of \cite{BG}, we study a few elementary invariants for configurations of skew lines, as introduced and analyzed first by Viro and his collaborators. We slightly simplify the exposition of some known invariants and use them to define a natural partition of the lines in a skew configuration.
We also describe an algorithm which constructs a spindle-permutation for a given switching class, or proves non-existence of such a spindle-permutation.
Submitted 17 June, 2010;
originally announced June 2010.
-
Conjugation-free geometric presentations of fundamental groups of arrangements
Authors:
Meital Eliyahu,
David Garber,
Mina Teicher
Abstract:
We introduce the notion of a conjugation-free geometric presentation for a fundamental group of a line arrangement's complement, and we show that the fundamental groups of the following family of arrangements have a conjugation-free geometric presentation: A real arrangement L, whose graph of multiple points is a union of disjoint cycles, has no line with more than two multiple points, and where the multiplicities of the multiple points are arbitrary.
We also compute the exact group structure (by means of a semi-direct product of groups) of the arrangement of 6 lines whose graph consists of a cycle of length 3, and all the multiple points have multiplicity 3.
Submitted 16 March, 2010; v1 submitted 30 October, 2008;
originally announced October 2008.
-
On the excedance sets of colored permutations
Authors:
Eli Bagno,
David Garber,
Robert Shwartz
Abstract:
We define the excedance set and the excedance word on $G_{r,n}$, generalizing work of Ehrenborg and Steingrimsson, and use the inclusion-exclusion principle to calculate the number of colored permutations having a prescribed excedance word. We show some symmetry properties, such as log-concavity and unimodality, of a specific sequence of excedance words.
Submitted 12 June, 2008;
originally announced June 2008.