Search | arXiv e-print repository

doi 10.4204/EPTCS.403.12

Type-B analogue of Bell numbers using Rota's Umbral calculus approach

Abstract: Rota used the functional L to recover old properties and obtain some new formulas for the Bell numbers. Tanny used Rota's functional L and the celebrated Worpitzky identity to obtain an expression for the ordered Bell numbers, which can be seen as evidence for the fact that the ordered Bell numbers are gamma-positive. In this paper, we extend some of Rota's and Tanny's results to the framework of the set partitions of Coxeter type B.

Submitted 24 June, 2024; originally announced June 2024.

Comments: In Proceedings GASCom 2024, arXiv:2406.14588

Journal ref: EPTCS 403, 2024, pp. 43-48

arXiv:2402.08799 [pdf, ps, other]

Projection-Free Online Convex Optimization with Time-Varying Constraints

Authors: Dan Garber, Ben Kretzu

Abstract: We consider the setting of online convex optimization with adversarial time-varying constraints in which actions must be feasible w.r.t. a fixed constraint set, and are also required on average to approximately satisfy additional time-varying constraints. Motivated by scenarios in which the fixed feasible set (hard constraint) is difficult to project onto, we consider projection-free algorithms that access this set only through a linear optimization oracle (LOO). We present an algorithm that, on a sequence of length $T$ and using overall $T$ calls to the LOO, guarantees $\tilde{O}(T^{3/4})$ regret w.r.t. the losses and $O(T^{7/8})$ constraint violation (ignoring all quantities except for $T$). In particular, these bounds hold w.r.t. any interval of the sequence. We also present a more efficient algorithm that requires only first-order oracle access to the soft constraints and achieves similar bounds w.r.t. the entire sequence. We extend the latter to the setting of bandit feedback and obtain similar bounds (as a function of $T$) in expectation.

Submitted 13 February, 2024; originally announced February 2024.

arXiv:2401.08365 [pdf, ps, other]

Combinatorics of q,r-analogues of Stirling numbers of type B

Authors: Eli Bagno, David Garber

Abstract: Stirling numbers of the first and second kinds have seen many generalizations and applications in various areas of mathematics. We introduce some combinatorial parameters which realize $q$-analogues and Broder's $r$-variants of Stirling numbers of type $B$ of both kinds, which count signed set partitions and signed permutations respectively. Applications to orthogonality relations and power sums are given.

Submitted 16 January, 2024; originally announced January 2024.

Comments: 32 pages, no figures; submitted

MSC Class: 05A18; 05A30

arXiv:2310.15559 [pdf, ps, other]

From Oja's Algorithm to the Multiplicative Weights Update Method with Applications

Authors: Dan Garber

Abstract: Oja's algorithm is a well-known online algorithm studied mainly in the context of stochastic principal component analysis. We make a simple observation, yet to the best of our knowledge a novel one, that when applied to any (not necessarily stochastic) sequence of symmetric matrices which share common eigenvectors, the regret of Oja's algorithm can be directly bounded in terms of the regret of the well-known multiplicative weights update method for the problem of prediction with expert advice. Several applications to optimization with quadratic forms over the unit sphere in $\mathbb{R}^n$ are discussed.

Submitted 24 October, 2023; originally announced October 2023.

arXiv:2308.01677 [pdf, ps, other]

Efficiency of First-Order Methods for Low-Rank Tensor Recovery with the Tensor Nuclear Norm Under Strict Complementarity

Authors: Dan Garber, Atara Kaplan

Abstract: We consider convex relaxations for recovering low-rank tensors based on constrained minimization over a ball induced by the tensor nuclear norm, recently introduced in \cite{tensor_tSVD}. We build on a recent line of results that considered convex relaxations for the recovery of low-rank matrices and established that under a strict complementarity condition (SC), both the convergence rate and per-iteration runtime of standard gradient methods may improve dramatically. We develop the appropriate strict complementarity condition for the tensor nuclear norm ball and obtain the following main results under this condition: 1. When the objective to minimize is of the form $f(\mX)=g(\mA\mX)+\langle{\mC,\mX}\rangle$, where $g$ is strongly convex and $\mA$ is a linear map (e.g., least squares), a quadratic growth bound holds, which implies linear convergence rates for standard projected gradient methods, despite the fact that $f$ need not be strongly convex. 2. For a smooth objective function, when initialized in certain proximity of an optimal solution which satisfies SC, standard projected gradient methods only require SVD computations (for projecting onto the tensor nuclear norm ball) of rank that matches the tubal rank of the optimal solution. In particular, when the tubal rank is constant, this implies nearly linear (in the size of the tensor) runtime per iteration, as opposed to superlinear without further assumptions. 3. For a nonsmooth objective function which admits a popular smooth saddle-point formulation, we derive similar results to the latter for the well-known extragradient method. An additional contribution, which may be of independent interest, is the rigorous extension of many basic results regarding tensors of arbitrary order, which were previously obtained only for third-order tensors.

Submitted 3 August, 2023; originally announced August 2023.

arXiv:2302.04859 [pdf, ps, other]

Projection-free Online Exp-concave Optimization

Authors: Dan Garber, Ben Kretzu

Abstract: We consider the setting of online convex optimization (OCO) with \textit{exp-concave} losses. The best regret bound known for this setting is $O(n\log{}T)$, where $n$ is the dimension and $T$ is the number of prediction rounds (treating all other quantities as constants and assuming $T$ is sufficiently large), and is attainable via the well-known Online Newton Step algorithm (ONS). However, ONS requires computing a projection (according to some matrix-induced norm) onto the feasible convex set on each iteration, which is often computationally prohibitive in high-dimensional settings and when the feasible set admits a non-trivial structure. In this work we consider projection-free online algorithms for exp-concave and smooth losses, where by projection-free we refer to algorithms that rely only on the availability of a linear optimization oracle (LOO) for the feasible set, which in many applications of interest admits much more efficient implementations than a projection oracle. We present an LOO-based ONS-style algorithm, which using overall $O(T)$ calls to a LOO, guarantees worst-case regret bounded by $\widetilde{O}(n^{2/3}T^{2/3})$ (ignoring all quantities except for $n,T$). However, our algorithm is most interesting in an important and plausible low-dimensional data scenario: if the gradients (approximately) span a subspace of dimension at most $ρ$, $ρ \ll n$, the regret bound improves to $\widetilde{O}(ρ^{2/3}T^{2/3})$, and by applying standard deterministic sketching techniques, both the space and average additional per-iteration runtime requirements are only $O(ρn)$ (instead of $O(n^2)$). This improves upon recently proposed LOO-based algorithms for OCO which, while having the same state-of-the-art dependence on the horizon $T$, suffer from regret/oracle complexity that scales with $\sqrt{n}$ or worse.

Submitted 9 February, 2023; originally announced February 2023.

arXiv:2210.13968 [pdf, other]

Faster Projection-Free Augmented Lagrangian Methods via Weak Proximal Oracle

Authors: Dan Garber, Tsur Livney, Shoham Sabach

Abstract: This paper considers a convex composite optimization problem with affine constraints, which includes problems that take the form of minimizing a smooth convex objective function over the intersection of (simple) convex sets, or regularized with multiple (simple) functions. Motivated by high-dimensional applications in which exact projection/proximal computations are not tractable, we propose a \textit{projection-free} augmented Lagrangian-based method, in which primal updates are carried out using a \textit{weak proximal oracle} (WPO). In an earlier work, WPO was shown to be more powerful than the standard \textit{linear minimization oracle} (LMO) that underlies conditional gradient-based methods (aka Frank-Wolfe methods). Moreover, WPO is computationally tractable for many high-dimensional problems of interest, including those motivated by recovery of low-rank matrices and tensors, and optimization over polytopes which admit efficient LMOs. The main result of this paper shows that under a certain curvature assumption (which is weaker than strong convexity), our WPO-based algorithm achieves an ergodic rate of convergence of $O(1/T)$ for both the objective residual and feasibility gap. This result, to the best of our knowledge, improves upon the $O(1/\sqrt{T})$ rate for existing LMO-based projection-free methods for this class of problems. Empirical experiments on a low-rank and sparse covariance matrix estimation task and the Max Cut semidefinite relaxation demonstrate that our method can outperform state-of-the-art LMO-based Lagrangian-based methods.

Submitted 21 February, 2023; v1 submitted 25 October, 2022; originally announced October 2022.

Comments: Accepted to International Conference on Artificial Intelligence and Statistics (AISTATS), 2023

arXiv:2209.06674 [pdf, ps, other]

Analytic aspects of $q,r$-analogue of poly-Stirling numbers of both kinds

Authors: Takao Komatsu, Eli Bagno, David Garber

Abstract: The Stirling numbers of type $B$ of the second kind count signed set partitions. In this paper we provide new combinatorial and analytical identities regarding these numbers as well as Broder's $r$-version of these numbers. Among these identities one can find recursions, explicit formulas based on the inclusion-exclusion principle, and also exponential generating functions. These Stirling numbers can be considered as members of a wider family of triangles of numbers that are characterized using results of Comtet and Lancaster. We generalize these theorems, which present equivalent conditions for a triangle of numbers to be a triangle of generalized Stirling numbers, to the case of the $q,r$-poly Stirling numbers, which are $q$-analogues of the restricted Stirling numbers defined by Broder and having a polynomial value appearing in their defining recursion. There are two ways to do this and these ways are related by a nice identity.

Submitted 5 April, 2024; v1 submitted 14 September, 2022; originally announced September 2022.

Comments: Revised version. 37 pages, no figures; submitted

MSC Class: Primary: 05A15; Secondary: 05A18; 05A19; 05A30; 11B73
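
As a rough illustration of the kind of recursion mentioned in the abstract above, the following Python sketch computes Stirling numbers of type $B$ of the second kind using the commonly stated recursion $S_B(n,k) = S_B(n-1,k-1) + (2k+1)\,S_B(n-1,k)$ with $S_B(0,0)=1$. The function names `stirling_b_second` and `bell_b` are illustrative choices, not taken from the paper, and the sketch does not cover the $q,r$-poly generalization.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling_b_second(n: int, k: int) -> int:
    """Type B Stirling number of the second kind, via the assumed recursion
    S_B(n, k) = S_B(n-1, k-1) + (2k + 1) * S_B(n-1, k), with S_B(0, 0) = 1."""
    if k < 0 or k > n:
        return 0
    if n == 0:
        return 1
    return stirling_b_second(n - 1, k - 1) + (2 * k + 1) * stirling_b_second(n - 1, k)


def bell_b(n: int) -> int:
    """Row sum of the type B Stirling triangle (a type B analogue of the Bell number)."""
    return sum(stirling_b_second(n, k) for k in range(n + 1))


if __name__ == "__main__":
    for n in range(5):
        row = [stirling_b_second(n, k) for k in range(n + 1)]
        print(n, row, bell_b(n))
```

The row sums give the total number of signed set partitions for each $n$, which is one way to sanity-check the triangle against small cases worked out by hand.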

arXiv:2206.11523 [pdf, ps, other]

Low-Rank Mirror-Prox for Nonsmooth and Low-Rank Matrix Optimization Problems

Authors: Dan Garber, Atara Kaplan

Abstract: Low-rank and nonsmooth matrix optimization problems capture many fundamental tasks in statistics and machine learning. While significant progress has been made in recent years in developing efficient methods for \textit{smooth} low-rank optimization problems that avoid maintaining high-rank matrices and computing expensive high-rank SVDs, advances for nonsmooth problems have been slow-paced. In this paper we consider standard convex relaxations for such problems. Mainly, we prove that under a \textit{strict complementarity} condition and under the relatively mild assumption that the nonsmooth objective can be written as a maximum of smooth functions, approximated variants of two popular \textit{mirror-prox} methods: the Euclidean \textit{extragradient method} and mirror-prox with \textit{matrix exponentiated gradient updates}, when initialized with a "warm-start", converge to an optimal solution with rate $O(1/t)$, while requiring only two \textit{low-rank} SVDs per iteration. Moreover, for the extragradient method we also consider relaxed versions of strict complementarity which yield a trade-off between the rank of the SVDs required and the radius of the ball in which we need to initialize the method. We support our theoretical results with empirical experiments on several nonsmooth low-rank matrix recovery tasks, demonstrating both the plausibility of the strict complementarity assumption, and the efficient convergence of our proposed low-rank mirror-prox variants.

Submitted 23 June, 2022; originally announced June 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2202.04026

arXiv:2206.09370 [pdf, other]

Frank-Wolfe-based Algorithms for Approximating Tyler's M-estimator

Authors: Lior Danon, Dan Garber

Abstract: Tyler's M-estimator is a well-known procedure for robust and heavy-tailed covariance estimation. Tyler himself suggested an iterative fixed-point algorithm for computing his estimator; however, it requires super-linear (in the size of the data) runtime per iteration, which may be prohibitive at large scale. In this work we propose, to the best of our knowledge, the first Frank-Wolfe-based algorithms for computing Tyler's estimator. One variant uses standard Frank-Wolfe steps, the second also considers \textit{away-steps} (AFW), and the third is a \textit{geodesic} version of AFW (GAFW). AFW provably requires, up to a log factor, only linear time per iteration, while GAFW runs in linear time (up to a log factor) in a large $n$ (number of data-points) regime. All three variants are shown to provably converge to the optimal solution with sublinear rate, under standard assumptions, despite the fact that the underlying optimization problem is neither convex nor smooth. Under an additional fairly mild assumption, that holds with probability 1 when the (normalized) data-points are i.i.d. samples from a continuous distribution supported on the entire unit sphere, AFW and GAFW are proved to converge with linear rates. Importantly, all three variants are parameter-free and use adaptive step-sizes.

Submitted 25 October, 2022; v1 submitted 19 June, 2022; originally announced June 2022.

Comments: In Neural Information Processing Systems (NeurIPS) 2022

arXiv:2202.04721 [pdf, ps, other]

New Projection-free Algorithms for Online Convex Optimization with Adaptive Regret Guarantees

Authors: Dan Garber, Ben Kretzu

Abstract: We present new efficient \textit{projection-free} algorithms for online convex optimization (OCO), where by projection-free we refer to algorithms that avoid computing orthogonal projections onto the feasible set, and instead rely on different and potentially much more efficient oracles. While most state-of-the-art projection-free algorithms are based on the \textit{follow-the-leader} framework, our algorithms are fundamentally different and are based on the \textit{online gradient descent} algorithm with a novel and efficient approach to computing so-called \textit{infeasible projections}. As a consequence, we obtain the first projection-free algorithms which naturally yield \textit{adaptive regret} guarantees, i.e., regret bounds that hold w.r.t. any sub-interval of the sequence. Concretely, when assuming the availability of a linear optimization oracle (LOO) for the feasible set, on a sequence of length $T$, our algorithms guarantee $O(T^{3/4})$ adaptive regret and $O(T^{3/4})$ adaptive expected regret, for the full-information and bandit settings, respectively, using only $O(T)$ calls to the LOO. These bounds match the current state-of-the-art regret bounds for LOO-based projection-free OCO, which are \textit{not adaptive}. We also consider a new natural setting in which the feasible set is accessible through a separation oracle. We present algorithms which, using overall $O(T)$ calls to the separation oracle, guarantee $O(\sqrt{T})$ adaptive regret and $O(T^{3/4})$ adaptive expected regret for the full-information and bandit settings, respectively.

Submitted 19 March, 2023; v1 submitted 9 February, 2022; originally announced February 2022.

Comments: Accepted to Conference on Learning Theory (COLT), 2022. This version subsumes the COLT version and fixes an error in the proof of Theorem 10 in the COLT version (convergence for strongly convex losses)

arXiv:2202.04026 [pdf, ps, other]

Low-Rank Extragradient Method for Nonsmooth and Low-Rank Matrix Optimization Problems

Authors: Dan Garber, Atara Kaplan

Abstract: Low-rank and nonsmooth matrix optimization problems capture many fundamental tasks in statistics and machine learning. While significant progress has been made in recent years in developing efficient methods for \textit{smooth} low-rank optimization problems that avoid maintaining high-rank matrices and computing expensive high-rank SVDs, advances for nonsmooth problems have been slow-paced. In this paper we consider standard convex relaxations for such problems. Mainly, we prove that under a natural \textit{generalized strict complementarity} condition and under the relatively mild assumption that the nonsmooth objective can be written as a maximum of smooth functions, the \textit{extragradient method}, when initialized with a "warm-start" point, converges to an optimal solution with rate $O(1/t)$ while requiring only two \textit{low-rank} SVDs per iteration. We give a precise trade-off between the rank of the SVDs required and the radius of the ball in which we need to initialize the method. We support our theoretical results with empirical experiments on several nonsmooth low-rank matrix recovery tasks, demonstrating that using simple initializations, the extragradient method produces exactly the same iterates when full-rank SVDs are replaced with SVDs of rank that matches the rank of the (low-rank) ground-truth matrix to be recovered.

Submitted 8 February, 2022; originally announced February 2022.

Comments: Appeared in Conference on Neural Information Processing Systems (NeurIPS), 2021

arXiv:2202.04020 [pdf, other]

Local Linear Convergence of Gradient Methods for Subspace Optimization via Strict Complementarity

Authors: Dan Garber, Ron Fisher

Abstract: We consider optimization problems in which the goal is to find a $k$-dimensional subspace of $\mathbb{R}^n$, $k \ll n$, which minimizes a convex and smooth loss. Such problems generalize the fundamental task of principal component analysis (PCA) to include robust and sparse counterparts, and logistic PCA for binary data, among others. This problem could be approached either via nonconvex gradient methods with highly efficient iterations, but for which arguing about fast convergence to a global minimizer is difficult, or via a convex relaxation for which arguing about convergence to a global minimizer is straightforward, but the corresponding methods are often inefficient in high dimensions. In this work we bridge these two approaches under a strict complementarity assumption, which in particular implies that the optimal solution to the convex relaxation is unique and is also the optimal solution to the original nonconvex problem. Our main result is a proof that a natural nonconvex gradient method which is \textit{SVD-free} and requires only a single QR-factorization of an $n\times k$ matrix per iteration, converges locally with a linear rate. We also establish linear convergence results for the nonconvex projected gradient method, and the Frank-Wolfe method when applied to the convex relaxation.

Submitted 25 October, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

Comments: In Neural Information Processing Systems (NeurIPS) 2022

arXiv:2102.02029 [pdf, other]

Frank-Wolfe with a Nearest Extreme Point Oracle

Authors: Dan Garber, Noam Wolf

Abstract: We consider variants of the classical Frank-Wolfe algorithm for constrained smooth convex minimization, that instead of access to the standard oracle for minimizing a linear function over the feasible set, have access to an oracle that can find an extreme point of the feasible set that is closest in Euclidean distance to a given vector. We first show that for many feasible sets of interest, such an oracle can be implemented with the same complexity as the standard linear optimization oracle. We then show that with such an oracle we can design new Frank-Wolfe variants which enjoy significantly improved complexity bounds in case the set of optimal solutions lies in the convex hull of a subset of extreme points with small diameter (e.g., a low-dimensional face of a polytope). In particular, for many $0\text{--}1$ polytopes, under quadratic growth and strict complementarity conditions, we obtain the first linearly convergent variant with rate that depends only on the dimension of the optimal face and not on the ambient dimension.

Submitted 9 February, 2022; v1 submitted 3 February, 2021; originally announced February 2021.

Comments: Appeared in Conference on Learning Theory (COLT), 2021

arXiv:2012.10469 [pdf, ps, other]

On the Efficient Implementation of the Matrix Exponentiated Gradient Algorithm for Low-Rank Matrix Optimization

Authors: Dan Garber, Atara Kaplan

Abstract: Convex optimization over the spectrahedron, i.e., the set of all real $n\times n$ positive semidefinite matrices with unit trace, has important applications in machine learning, signal processing and statistics, mainly as a convex relaxation for optimization problems with low-rank matrices. It is also one of the most prominent examples in the theory of first-order methods for convex optimization in which non-Euclidean methods can be significantly preferable to their Euclidean counterparts. In particular, the desirable choice is the Matrix Exponentiated Gradient (MEG) method which is based on the Bregman distance induced by the (negative) von Neumann entropy. Unfortunately, implementing MEG requires a full SVD computation on each iteration, which is not scalable to high-dimensional problems. In this work we propose efficient implementations of MEG, both with deterministic and stochastic gradients, which are tailored for optimization with low-rank matrices, and only use a single low-rank SVD computation on each iteration. We also provide efficiently computable certificates for the correct convergence of our methods. Mainly, we prove that under a strict complementarity condition, the suggested methods converge from a "warm-start" initialization with similar rates to their full-SVD-based counterparts. Finally, we bring empirical experiments which both support our theoretical findings and demonstrate the practical appeal of our methods.

Submitted 30 October, 2022; v1 submitted 18 December, 2020; originally announced December 2020.

Comments: Accepted for publication in Mathematics of Operations Research

arXiv:2010.07572 [pdf, ps, other]

Revisiting Projection-free Online Learning: the Strongly Convex Case

Authors: Dan Garber, Ben Kretzu

Abstract: Projection-free optimization algorithms, which are mostly based on the classical Frank-Wolfe method, have gained significant interest in the machine learning community in recent years due to their ability to handle convex constraints that are popular in many applications, but for which computing projections is often computationally impractical in high-dimensional settings, and hence prohibit the use of most standard projection-based methods. In particular, a significant research effort has been put into projection-free methods for online learning. In this paper we revisit the Online Frank-Wolfe (OFW) method suggested by Hazan and Kale \cite{Hazan12} and fill a gap that has been left unnoticed for several years: OFW achieves a faster rate of $O(T^{2/3})$ on strongly convex functions (as opposed to the standard $O(T^{3/4})$ for convex but not strongly convex functions), where $T$ is the sequence length. This is somewhat surprising since it is known that for offline optimization, in general, strong convexity does not lead to faster rates for Frank-Wolfe. We also revisit the bandit setting under strong convexity and prove a similar bound of $\tilde O(T^{2/3})$ (instead of $O(T^{3/4})$ without strong convexity). Hence, in the current state of affairs, the best projection-free upper bounds for the full-information and bandit settings with strongly convex and nonsmooth functions match up to logarithmic factors in $T$.

Submitted 23 February, 2021; v1 submitted 15 October, 2020; originally announced October 2020.

Comments: Accepted to Artificial Intelligence and Statistics (AISTATS) 2021

arXiv:2006.00558 [pdf, ps, other]

Revisiting Frank-Wolfe for Polytopes: Strict Complementarity and Sparsity

Authors: Dan Garber

Abstract: In recent years it was proved that simple modifications of the classical Frank-Wolfe algorithm (aka conditional gradient algorithm) for smooth convex minimization over convex and compact polytopes converge with linear rate, assuming the objective function has the quadratic growth property. However, the rate of these methods depends explicitly on the dimension of the problem, which cannot explain their empirical success for large-scale problems. In this paper we first demonstrate that already for very simple problems and even when the optimal solution lies on a low-dimensional face of the polytope, such dependence on the dimension cannot be avoided in the worst case. We then revisit the addition of a strict complementarity assumption already considered in Wolfe's classical book \cite{Wolfe1970}, and prove that under this condition, the Frank-Wolfe method with away-steps and line-search converges linearly with rate that depends explicitly only on the dimension of the optimal face. We motivate strict complementarity by proving that it implies sparsity-robustness of optimal solutions to noise.

Submitted 6 January, 2021; v1 submitted 31 May, 2020; originally announced June 2020.

Comments: Accepted to Conference on Neural Information Processing Systems (NeurIPS) 2020, spotlight presentation. This version corrects a mistake in the last part of the proof of Theorem 5
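
As a point of reference for the "classical Frank-Wolfe algorithm" over a polytope mentioned in the abstract above, here is a minimal sketch in Python. It is not the away-step variant analyzed in the paper: it is the vanilla method with the standard step size $2/(t+2)$, applied to an assumed toy objective $f(x)=\frac{1}{2}\|x-\mathrm{target}\|^2$ over the probability simplex. All function names (`lmo_simplex`, `frank_wolfe_simplex`, `objective`) and the target vector are hypothetical choices made for illustration.

```python
def lmo_simplex(grad):
    """Linear optimization oracle over the probability simplex.

    Minimizing <grad, v> over the simplex is attained at a vertex,
    namely the standard basis vector of the smallest gradient entry.
    """
    i = min(range(len(grad)), key=lambda j: grad[j])
    vertex = [0.0] * len(grad)
    vertex[i] = 1.0
    return vertex


def objective(x, target):
    """Toy smooth convex objective (an assumption): 0.5 * ||x - target||^2."""
    return 0.5 * sum((xi - ti) ** 2 for xi, ti in zip(x, target))


def frank_wolfe_simplex(target, iters=2000):
    """Vanilla Frank-Wolfe with step size 2/(t+2); no projections are used."""
    n = len(target)
    x = [1.0 / n] * n  # start at the barycenter of the simplex
    for t in range(iters):
        grad = [xi - ti for xi, ti in zip(x, target)]  # gradient of the toy objective
        v = lmo_simplex(grad)                          # one oracle call per iteration
        eta = 2.0 / (t + 2)
        # Convex combination keeps the iterate inside the simplex.
        x = [(1 - eta) * xi + eta * vi for xi, vi in zip(x, v)]
    return x


target = [0.2, 0.5, 0.3]  # lies in the simplex, so the optimal value is 0
x_fw = frank_wolfe_simplex(target)
```

Each iterate is a convex combination of simplex vertices, so feasibility is maintained without ever computing a projection; the only access to the feasible set is through `lmo_simplex`.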

arXiv:2004.03681 [pdf, ps, other]

The Worpitzky identity for the groups of signed and even-signed permutations

Authors: Eli Bagno, David Garber, Mordechai Novick

Abstract: The well-known Worpitzky identity provides a connection between two bases of $\mathbb{Q}[x]$: the standard basis $(x+1)^n$ and the binomial basis ${{x+n-i} \choose {n}}$, where the Eulerian numbers for the Coxeter group of type $A$ (the symmetric group) serve as the entries of the transformation matrix. Brenti has generalized this identity to the Coxeter groups of types $B$ and $D$ (signed and even-signed permutation groups, respectively) using generating function techniques. Motivated by Foata-Schützenberger and Rawlings' proof for the Worpitzky identity in the symmetric group, we provide combinatorial proofs of this identity and of its $q$-analogues in the Coxeter groups of types $B$ and $D$.

Submitted 7 April, 2020; originally announced April 2020.

Comments: 17 pages, 3 figures; submitted

MSC Class: 05A19; 05A30
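
For readers who want to see the type-$A$ identity from the abstract above in concrete form, the following Python sketch numerically checks $(x+1)^n=\sum_{i=0}^{n-1}A(n,i)\binom{x+n-i}{n}$ for small $n$ and nonnegative integers $x$, where $A(n,i)$ is the number of permutations of $n$ letters with $i$ descents. It covers only the classical symmetric-group case, not the type $B$ and $D$ generalizations treated in the paper; the function names are hypothetical.

```python
from math import comb


def eulerian_row(n):
    """Return [A(n,0), ..., A(n,n-1)] for n >= 1.

    Uses the standard recurrence
    A(m,k) = (k+1) * A(m-1,k) + (m-k) * A(m-1,k-1).
    """
    row = [1]  # A(1,0) = 1
    for m in range(2, n + 1):
        prev = row
        row = []
        for k in range(m):
            same = prev[k] if k < len(prev) else 0   # A(m-1, k)
            lower = prev[k - 1] if k >= 1 else 0     # A(m-1, k-1)
            row.append((k + 1) * same + (m - k) * lower)
    return row


def worpitzky_rhs(n, x):
    """Right-hand side: sum over i of A(n,i) * C(x+n-i, n)."""
    return sum(a * comb(x + n - i, n) for i, a in enumerate(eulerian_row(n)))


# Check the identity on a small grid of (n, x) values.
identity_holds = all(
    (x + 1) ** n == worpitzky_rhs(n, x)
    for n in range(1, 8)
    for x in range(0, 10)
)
```

For example, with $n=3$ the Eulerian row is $1,4,1$, and at $x=2$ the right-hand side is $\binom{5}{3}+4\binom{4}{3}+\binom{3}{3}=10+16+1=27=3^3$.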

arXiv:2001.11668 [pdf, ps, other]

On the Convergence of Stochastic Gradient Descent with Low-Rank Projections for Convex Low-Rank Matrix Problems

Authors: Dan Garber

Abstract: We revisit the use of Stochastic Gradient Descent (SGD) for solving convex optimization problems that serve as highly popular convex relaxations for many important low-rank matrix recovery problems such as \textit{matrix completion}, \textit{phase retrieval}, and more. The computational limitation of applying SGD to solving these relaxations in large-scale is the need to compute a potentially high-rank singular value decomposition (SVD) on each iteration in order to enforce the low-rank-promoting constraint. We begin by considering a simple and natural sufficient condition so that these relaxations indeed admit low-rank solutions. This condition is also necessary for a certain notion of low-rank-robustness to hold. Our main result shows that under this condition, which involves the eigenvalues of the gradient vector at optimal points, SGD with mini-batches, when initialized with a "warm-start" point, produces iterates that are low-rank with high probability, and hence only a low-rank SVD computation is required on each iteration. This suggests that SGD may indeed be practically applicable to solving large-scale convex relaxations of low-rank matrix recovery problems. Our theoretical results are accompanied by supporting preliminary empirical evidence. As a side benefit, our analysis is quite simple and short.

Submitted 14 June, 2020; v1 submitted 31 January, 2020; originally announced January 2020.

Comments: Accepted to Conference on Learning Theory 2020 (COLT 2020). This version fixes some minor errors in previous version

arXiv:1912.01467 [pdf, ps, other]

Linear Convergence of Frank-Wolfe for Rank-One Matrix Recovery Without Strong Convexity

Authors: Dan Garber

Abstract: We consider convex optimization problems which are widely used as convex relaxations for low-rank matrix recovery problems. In particular, in several important problems, such as phase retrieval and robust PCA, the underlying assumption in many cases is that the optimal solution is rank-one. In this paper we consider a simple and natural sufficient condition on the objective so that the optimal solution to these relaxations is indeed unique and rank-one. Mainly, we show that under this condition, the standard Frank-Wolfe method with line-search (i.e., without any tuning of parameters whatsoever), which only requires a single rank-one SVD computation per iteration, finds an $ε$-approximated solution in only $O(\log{1/ε})$ iterations (as opposed to the previous best known bound of $O(1/ε)$), despite the fact that the objective is not strongly convex. We consider several variants of the basic method with improved complexities, as well as an extension motivated by robust PCA, and finally, an extension to nonsmooth problems.

Submitted 19 June, 2022; v1 submitted 3 December, 2019; originally announced December 2019.

Comments: Accepted to Mathematical Programming Series A

arXiv:1910.03374 [pdf, ps, other]

Improved Regret Bounds for Projection-free Bandit Convex Optimization

Authors: Dan Garber, Ben Kretzu

Abstract: We revisit the challenge of designing online algorithms for the bandit convex optimization problem (BCO) which are also scalable to high dimensional problems. Hence, we consider algorithms that are \textit{projection-free}, i.e., based on the conditional gradient method whose only access to the feasible decision set is through a linear optimization oracle (as opposed to other methods which require potentially much more computationally-expensive subprocedures, such as computing Euclidean projections). We present the first such algorithm that attains $O(T^{3/4})$ expected regret using only $O(T)$ overall calls to the linear optimization oracle, in expectation, where $T$ is the number of prediction rounds. This improves over the $O(T^{4/5})$ expected regret bound recently obtained by \cite{Karbasi19}, and actually matches the current best regret bound for projection-free online learning in the \textit{full information} setting.

Submitted 8 October, 2019; originally announced October 2019.

arXiv:1903.02877 [pdf, ps, other]

Signed partitions - A balls into urns approach

Authors: Eli Bagno, David Garber

Abstract: Using Reiner's definition of Stirling numbers of type B of the second kind, we provide a 'balls into urns' approach for proving a generalization of a well-known identity concerning the classical Stirling numbers of the second kind: $x^n=\sum\limits_{k=0}^n{S(n,k)[x]_k}.$

Submitted 7 March, 2019; originally announced March 2019.

Comments: 6 pages, one figure; submitted

MSC Class: 05A18 (Primary); 05A19 (Secondary)
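
The classical identity quoted in the abstract above, $x^n=\sum_{k=0}^n S(n,k)[x]_k$, can be checked numerically. The sketch below assumes that $[x]_k$ denotes the falling factorial $x(x-1)\cdots(x-k+1)$ and that $S(n,k)$ are the classical Stirling numbers of the second kind; it does not implement the type-B generalization proved in the paper, and the function names are hypothetical.

```python
def stirling2_row(n):
    """Return [S(n,0), ..., S(n,n)] via S(m,k) = k*S(m-1,k) + S(m-1,k-1)."""
    row = [1]  # S(0,0) = 1
    for m in range(1, n + 1):
        prev = row + [0]          # pad so that prev[m] = S(m-1, m) = 0
        row = [0] * (m + 1)       # S(m,0) = 0 for m >= 1
        for k in range(1, m + 1):
            row[k] = k * prev[k] + prev[k - 1]
    return row


def falling_factorial(x, k):
    """[x]_k = x * (x-1) * ... * (x-k+1), with [x]_0 = 1."""
    result = 1
    for j in range(k):
        result *= x - j
    return result


def stirling_rhs(n, x):
    """Right-hand side: sum over k of S(n,k) * [x]_k."""
    return sum(s * falling_factorial(x, k) for k, s in enumerate(stirling2_row(n)))


# The identity is polynomial in x, so it also holds at negative integers.
identity_holds = all(
    x ** n == stirling_rhs(n, x)
    for n in range(0, 8)
    for x in range(-3, 10)
)
```

For example, with $n=3$ and $x=4$: $S(3,1)\cdot 4+S(3,2)\cdot 12+S(3,3)\cdot 24=4+36+24=64=4^3$.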

arXiv:1902.01644 [pdf, ps, other]

On the Convergence of Projected-Gradient Methods with Low-Rank Projections for Smooth Convex Minimization over Trace-Norm Balls and Related Problems

Authors: Dan Garber

Abstract: Smooth convex minimization over the unit trace-norm ball is an important optimization problem in machine learning, signal processing, statistics and other fields, that underlies many tasks in which one wishes to recover a low-rank matrix given certain measurements. While first-order methods for convex optimization enjoy optimal convergence rates, they require in worst-case to compute a full-rank SVD on each iteration, in order to compute the projection onto the trace-norm ball. These full-rank SVD computations however prohibit the application of such methods to large problems. A simple and natural heuristic to reduce the computational cost is to approximate the projection using only a low-rank SVD. This raises the question if, and under what conditions, this simple heuristic can indeed result in provable convergence to the optimal solution. In this paper we show that any optimal solution is a center of a Euclidean ball inside which the projected-gradient mapping admits rank that is at most the multiplicity of the largest singular value of the gradient vector. Moreover, the radius of the ball scales with the spectral gap of this gradient vector. We show how this readily implies the local convergence (i.e., from a "warm-start" initialization) of standard first-order methods, using only low-rank SVD computations. We also quantify the effect of "over-parameterization", i.e., using SVD computations with higher rank, on the radius of this ball, showing it can increase dramatically with moderately larger rank. We extend our results also to the setting of optimization with trace-norm regularization and optimization over bounded-trace positive semidefinite matrices. Our theoretical investigation is supported by concrete empirical evidence that demonstrates the \textit{correct} convergence of first-order methods with low-rank projections on real-world datasets.

Submitted 28 November, 2020; v1 submitted 5 February, 2019; originally announced February 2019.

Comments: Accepted to SIAM Journal on Optimization (SIOPT)

arXiv:1901.07830 [pdf, ps, other]

Some identities involving second kind Stirling numbers of types $B$ and $D$

Authors: Eli Bagno, Riccardo Biagioli, David Garber

Abstract: Using Reiner's definition of Stirling numbers of the second kind in types $B$ and $D$, we generalize two well-known identities concerning the classical Stirling numbers of the second kind. The first identity relates them with Eulerian numbers and the second identity interprets them as entries in a transition matrix between the elements of two standard bases of the polynomial ring $R[x]$. Finally, we generalize these identities to the group of colored permutations $G_{m,n}$.

Submitted 23 January, 2019; originally announced January 2019.

Comments: 17 pages, 2 figures; submitted

MSC Class: 05A19 (Primary); 05A18; 05A10 (Secondary)

Journal ref: Electron. J. Combin. 26(3) (2019), P3.9

arXiv:1809.10588 [pdf, ps, other]

Property testing and expansion in cubical complexes

Authors: David Garber, Uzi Vishne

Abstract: We consider expansion and property testing in the language of incidence geometry, covering both simplicial and cubical complexes in any dimension. We develop a general method for passing from an explicit description of the cohomology group, which need not be trivial, to a testability proof with linear ratio between errors. The method is demonstrated by testing functions on 2-cells in cubical complexes to be induced from the edges.

Submitted 26 November, 2019; v1 submitted 27 September, 2018; originally announced September 2018.

Comments: 25 pages; revised version; accepted to Discrete and Computational Geometry

arXiv:1809.10491 [pdf, other]

On the Regret Minimization of Nonconvex Online Gradient Ascent for Online PCA

Authors: Dan Garber

Abstract: In this paper we focus on the problem of Online Principal Component Analysis in the regret minimization framework. For this problem, all existing regret minimization algorithms for the fully-adversarial setting are based on a positive semidefinite convex relaxation, and hence require quadratic memory and SVD computation (either thin or full) on each iteration, which amounts to at least quadratic runtime per iteration. This is in stark contrast to a corresponding stochastic i.i.d. variant of the problem, which was studied extensively lately, and admits very efficient gradient ascent algorithms that work directly on the natural non-convex formulation of the problem, and hence require only linear memory and linear runtime per iteration. This raises the question: can non-convex online gradient ascent algorithms be shown to minimize regret in online adversarial settings? In this paper we take a step forward towards answering this question. We introduce an \textit{adversarially-perturbed spiked-covariance model} in which each data point is assumed to follow a fixed stochastic distribution with a non-zero spectral gap in the covariance matrix, but is then perturbed with some adversarial vector. This model is a natural extension of a well studied standard stochastic setting that allows for non-stationary (adversarial) patterns to arise in the data and hence, might serve as a significantly better approximation for real-world data-streams. We show that in an interesting regime of parameters, when the non-convex online gradient ascent algorithm is initialized with a "warm-start" vector, it provably minimizes the regret with high probability. We further discuss the possibility of computing such a "warm-start" vector, and also the use of regularization to obtain fast regret rates. Our theoretical findings are supported by empirical experiments on both synthetic and real-world data.

Submitted 31 January, 2019; v1 submitted 27 September, 2018; originally announced September 2018.

Comments: added logarithmic regret bounds, more related work, fixed some small errors

arXiv:1809.10477 [pdf, ps, other]

Fast Stochastic Algorithms for Low-rank and Nonsmooth Matrix Problems

Authors: Dan Garber, Atara Kaplan

Abstract: Composite convex optimization problems which include both a nonsmooth term and a low-rank promoting term have important applications in machine learning and signal processing, such as when one wishes to recover an unknown matrix that is simultaneously low-rank and sparse. However, such problems are highly challenging to solve in large-scale: the low-rank promoting term prohibits efficient implementations of proximal methods for composite optimization and even simple subgradient methods. On the other hand, methods which are tailored for low-rank optimization, such as conditional gradient-type methods, which are often applied to a smooth approximation of the nonsmooth objective, are slow since their runtime scales with both the large Lipschitz parameter of the smoothed gradient vector and with $1/ε$. In this paper we develop efficient algorithms for \textit{stochastic} optimization of a strongly-convex objective which includes both a nonsmooth term and a low-rank promoting term. In particular, to the best of our knowledge, we present the first algorithm that enjoys all of the following critical properties for large-scale problems: i) (nearly) optimal sample complexity, ii) each iteration requires only a single \textit{low-rank} SVD computation, and iii) overall number of thin-SVD computations scales only with $\log{1/ε}$ (as opposed to $\textrm{poly}(1/ε)$ in previous methods). We also give an algorithm for the closely-related finite-sum setting. At the heart of our results lies a novel combination of a variance-reduction technique and the use of a \textit{weak-proximal oracle} which is key to obtaining all three of the above properties simultaneously.

Submitted 27 September, 2018; originally announced September 2018.

arXiv:1802.05581 [pdf, other]

Improved Complexities of Conditional Gradient-Type Methods with Applications to Robust Matrix Recovery Problems

Authors: Dan Garber, Shoham Sabach, Atara Kaplan

Abstract: Motivated by robust matrix recovery problems such as Robust Principal Component Analysis, we consider a general optimization problem of minimizing a smooth and strongly convex loss function applied to the sum of two blocks of variables, where each block of variables is constrained or regularized individually. We study a Conditional Gradient-Type method which is able to leverage the special structure of the problem to obtain faster convergence rates than those attainable via standard methods, under a variety of assumptions. In particular, our method is appealing for matrix problems in which one of the blocks corresponds to a low-rank matrix since it avoids prohibitive full-rank singular value decompositions required by most standard methods. While our initial motivation comes from problems which originated in statistics, our analysis does not impose any statistical assumptions on the data.

Submitted 15 November, 2019; v1 submitted 15 February, 2018; originally announced February 2018.

Comments: Accepted to Mathematical Programming

arXiv:1802.04623 [pdf, ps, other]

Logarithmic Regret for Online Gradient Descent Beyond Strong Convexity

Authors: Dan Garber

Abstract: Hoffman's classical result gives a bound on the distance of a point from a convex and compact polytope in terms of the magnitude of violation of the constraints. Recently, several results showed that Hoffman's bound can be used to derive strongly-convex-like rates for first-order methods for \textit{offline} convex optimization of curved, though not strongly convex, functions, over polyhedral sets. In this work, we use this classical result for the first time to obtain faster rates for \textit{online convex optimization} over polyhedral sets with curved convex, though not strongly convex, loss functions. We show that under several reasonable assumptions on the data, the standard \textit{Online Gradient Descent} algorithm guarantees logarithmic regret. To the best of our knowledge, the only previous algorithm to achieve logarithmic regret in the considered settings is the \textit{Online Newton Step} algorithm which requires quadratic (in the dimension) memory and at least quadratic runtime per iteration, which greatly limits its applicability to large-scale problems. In particular, our results hold for \textit{semi-adversarial} settings in which the data is a combination of an arbitrary (adversarial) sequence and a stochastic sequence, which might provide reasonable approximation for many real-world sequences, or under a natural assumption that the data is low-rank. We demonstrate via experiments that the regret of OGD is indeed comparable to that of ONS (and even far better) on curved though not strongly-convex losses.

Submitted 18 February, 2019; v1 submitted 13 February, 2018; originally announced February 2018.

Comments: Revised version. Accepted to AISTATS 2019

arXiv:1709.03093 [pdf, ps, other]

Efficient Online Linear Optimization with Approximation Algorithms

Authors: Dan Garber

Abstract: We revisit the problem of \textit{online linear optimization} in case the set of feasible actions is accessible through an approximated linear optimization oracle with a factor $α$ multiplicative approximation guarantee. This setting is in particular interesting since it captures natural online extensions of well-studied \textit{offline} linear optimization problems which are NP-hard, yet admit efficient approximation algorithms. The goal here is to minimize the $α$\textit{-regret} which is the natural extension of the standard \textit{regret} in \textit{online learning} to this setting. We present new algorithms with significantly improved oracle complexity for both the full information and bandit variants of the problem. Mainly, for both variants, we present $α$-regret bounds of $O(T^{-1/3})$, where $T$ is the number of prediction rounds, using only $O(\log{T})$ calls to the approximation oracle per iteration, on average. These are the first results to obtain both average oracle complexity of $O(\log{T})$ (or even poly-logarithmic in $T$) and $α$-regret bound $O(T^{-c})$ for a constant $c>0$, for both variants.

Submitted 10 September, 2017; originally announced September 2017.

Comments: Accepted to Conference on Neural Information Processing Systems (NIPS) 2017

arXiv:1702.07834 [pdf, ps, other]

Efficient coordinate-wise leading eigenvector computation

Authors: Jialei Wang, Weiran Wang, Dan Garber, Nathan Srebro

Abstract: We develop and analyze efficient "coordinate-wise" methods for finding the leading eigenvector, where each step involves only a vector-vector product. We establish global convergence with overall runtime guarantees that are at least as good as Lanczos's method and dominate it for slowly decaying spectrum. Our methods are based on combining a shift-and-invert approach with coordinate-wise algorithms for linear regression.

Submitted 25 February, 2017; originally announced February 2017.

arXiv:1605.08754 [pdf, other]

Faster Eigenvector Computation via Shift-and-Invert Preconditioning

Authors: Dan Garber, Elad Hazan, Chi Jin, Sham M. Kakade, Cameron Musco, Praneeth Netrapalli, Aaron Sidford

Abstract: We give faster algorithms and improved sample complexities for estimating the top eigenvector of a matrix $Σ$ -- i.e. computing a unit vector $x$ such that $x^T Σx \ge (1-ε)λ_1(Σ)$: Offline Eigenvector Estimation: Given an explicit $A \in \mathbb{R}^{n \times d}$ with $Σ= A^TA$, we show how to compute an $ε$ approximate top eigenvector in time $\tilde O([nnz(A) + \frac{d*sr(A)}{gap^2} ]* \log 1/ε)$ and $\tilde O([\frac{nnz(A)^{3/4} (d*sr(A))^{1/4}}{\sqrt{gap}} ] * \log 1/ε)$. Here $nnz(A)$ is the number of nonzeros in $A$, $sr(A)$ is the stable rank, $gap$ is the relative eigengap. By separating the $gap$ dependence from the $nnz(A)$ term, our first runtime improves upon the classical power and Lanczos methods. It also improves prior work using fast subspace embeddings [AC09, CW13] and stochastic optimization [Sha15c], giving significantly better dependencies on $sr(A)$ and $ε$. Our second running time improves these further when $nnz(A) \le \frac{d*sr(A)}{gap^2}$. Online Eigenvector Estimation: Given a distribution $D$ with covariance matrix $Σ$ and a vector $x_0$ which is an $O(gap)$ approximate top eigenvector for $Σ$, we show how to refine to an $ε$ approximation using $ O(\frac{var(D)}{gap*ε})$ samples from $D$. Here $var(D)$ is a natural notion of variance. Combining our algorithm with previous work to initialize $x_0$, we obtain improved sample complexity and runtime results under a variety of assumptions on $D$. We achieve our results using a general framework that we believe is of independent interest. We give a robust analysis of the classic method of shift-and-invert preconditioning to reduce eigenvector computation to approximately solving a sequence of linear systems. We then apply fast stochastic variance reduced gradient (SVRG) based system solvers to achieve our claims.

Submitted 25 May, 2016; originally announced May 2016.

Comments: Appearing in ICML 2016. Combination of work in arXiv:1509.05647 and arXiv:1510.08896

arXiv:1605.06492 [pdf, other]

Linear-memory and Decomposition-invariant Linearly Convergent Conditional Gradient Algorithm for Structured Polytopes

Authors: Dan Garber, Ofer Meshi

Abstract: Recently, several works have shown that natural modifications of the classical conditional gradient method (aka Frank-Wolfe algorithm) for constrained convex optimization, provably converge with a linear rate when: i) the feasible set is a polytope, and ii) the objective is smooth and strongly-convex. However, all of these results suffer from two significant shortcomings: large memory requirement due to the need to store an explicit convex decomposition of the current iterate, and as a consequence, large running-time overhead per iteration, and worst case convergence rate that depends unfavorably on the dimension. In this work we present a new conditional gradient variant and a corresponding analysis that improves on both of the above shortcomings. In particular: both memory and computation overheads are only linear in the dimension. Moreover, in case the optimal solution is sparse, the new convergence rate replaces a factor which is at least linear in the dimension in previous works, with a linear dependence on the number of non-zeros in the optimal solution. At the heart of our method, and corresponding analysis, is a novel way to compute decomposition-invariant away-steps. While our theoretical guarantees do not apply to any polytope, they apply to several important structured polytopes that capture central concepts such as paths in graphs, perfect matchings in bipartite graphs, marginal distributions that arise in structured prediction tasks, and more. Our theoretical findings are complemented by empirical evidence which shows that our method delivers state-of-the-art performance.

Submitted 20 May, 2016; originally announced May 2016.

arXiv:1605.06203 [pdf, ps, other]

Faster Projection-free Convex Optimization over the Spectrahedron

Authors: Dan Garber

Abstract: Minimizing a convex function over the spectrahedron, i.e., the set of all positive semidefinite matrices with unit trace, is an important optimization task with many applications in optimization, machine learning, and signal processing. It is also notoriously difficult to solve in large-scale since standard techniques require expensive matrix decompositions. An alternative is the conditional gradient method (aka Frank-Wolfe algorithm) that regained much interest in recent years, mostly due to its application to this specific setting. The key benefit of the CG method is that it avoids expensive matrix decompositions altogether, and simply requires a single eigenvector computation per iteration, which is much more efficient. On the downside, the CG method, in general, converges with an inferior rate. The error for minimizing a $β$-smooth function after $t$ iterations scales like $β/t$. This convergence rate does not improve even if the function is also strongly convex. In this work we present a modification of the CG method tailored for convex optimization over the spectrahedron. The per-iteration complexity of the method is essentially identical to that of the standard CG method: only a single eigenvector computation is required. For minimizing an $α$-strongly convex and $β$-smooth function, the expected approximation error of the method after $t$ iterations is: $$O\left(\min\left\{\frac{β}{t},\ \left(\frac{β\sqrt{\textrm{rank}(\textbf{X}^*)}}{α^{1/4}t}\right)^{4/3},\ \left(\frac{β}{\sqrt{α}\,λ_{\min}(\textbf{X}^*)\,t}\right)^{2}\right\}\right),$$ where $\textbf{X}^*$ is the optimal solution. To the best of our knowledge, this is the first result that attains provably faster convergence rates for a CG variant for optimization over the spectrahedron. We also present encouraging preliminary empirical results.

Submitted 19 May, 2016; originally announced May 2016.

arXiv:1509.05647 [pdf, ps, other]

Fast and Simple PCA via Convex Optimization

Authors: Dan Garber, Elad Hazan

Abstract: The problem of principal component analysis (PCA) is traditionally solved by spectral or algebraic methods. We show how computing the leading principal component could be reduced to solving a small number of well-conditioned convex optimization problems. This gives rise to a new efficient method for PCA based on recent advances in stochastic methods for convex optimization. In particular, we show that given a $d\times d$ matrix $\mathbf{X} = \frac{1}{n}\sum_{i=1}^n\mathbf{x}_i\mathbf{x}_i^{\top}$ with top eigenvector $\mathbf{u}$ and top eigenvalue $λ_1$ it is possible to: (i) compute a unit vector $\mathbf{w}$ such that $(\mathbf{w}^{\top}\mathbf{u})^2 \geq 1-ε$ in $\tilde{O}\left(\frac{d}{δ^2}+N\right)$ time, where $δ= λ_1 - λ_2$ and $N$ is the total number of non-zero entries in $\mathbf{x}_1,\dots,\mathbf{x}_n$; (ii) compute a unit vector $\mathbf{w}$ such that $\mathbf{w}^{\top}\mathbf{X}\mathbf{w} \geq λ_1-ε$ in $\tilde{O}(d/ε^2)$ time. To the best of our knowledge, these bounds are the fastest to date for a wide regime of parameters. These results could be further accelerated when $δ$ (in the first case) and $ε$ (in the second case) are smaller than $\sqrt{d/N}$.

Submitted 25 November, 2015; v1 submitted 18 September, 2015; originally announced September 2015.

arXiv:1406.1305 [pdf, other]

Faster Rates for the Frank-Wolfe Method over Strongly-Convex Sets

Authors: Dan Garber, Elad Hazan

Abstract: The Frank-Wolfe method (a.k.a. conditional gradient algorithm) for smooth optimization has regained much interest in recent years in the context of large-scale optimization and machine learning. A key advantage of the method is that it avoids projections (the computational bottleneck in many applications), replacing them with a linear optimization step. Despite this advantage, the known convergence rates of the FW method fall behind standard first-order methods for most settings of interest. It is an active line of research to derive faster linear optimization-based algorithms for various settings of convex optimization. In this paper we consider the special case of optimization over strongly convex sets, for which we prove that the vanilla FW method converges at a rate of $\frac{1}{t^2}$. This gives a quadratic improvement in convergence rate compared to the general case, in which convergence is of the order $\frac{1}{t}$, and known to be tight. We show that various balls induced by $\ell_p$ norms, Schatten norms and group norms are strongly convex on one hand and on the other hand, linear optimization over these sets is straightforward and admits a closed-form solution. We further show how several previous fast-rate results for the FW method follow easily from our analysis.

Submitted 14 August, 2015; v1 submitted 5 June, 2014; originally announced June 2014.
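
Illustration (not from the paper): the abstract above notes that Frank-Wolfe replaces projections with a linear optimization step that has a closed form over norm balls. The following is a minimal sketch of that idea over the Euclidean unit ball. The names `lmo_l2_ball` and `frank_wolfe`, the toy objective $f(x) = \frac{1}{2}\|x - b\|^2$, and the standard $2/(t+2)$ step size are all assumptions made for this example; the step rule analysed in the paper may differ, and the sketch makes no claim about attaining the $1/t^2$ rate.

```python
import math

def lmo_l2_ball(grad, radius=1.0):
    """Linear minimization oracle: argmin over ||v||_2 <= radius of <grad, v>.

    For the Euclidean ball the minimizer has the closed form -radius * grad / ||grad||.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(grad)  # any feasible point is optimal when grad = 0
    return [-radius * g / norm for g in grad]

def frank_wolfe(grad_f, x0, steps, radius=1.0):
    """Vanilla Frank-Wolfe with the (assumed) step size 2 / (t + 2)."""
    x = list(x0)
    for t in range(steps):
        g = grad_f(x)
        v = lmo_l2_ball(g, radius)   # linear optimization step, no projection
        eta = 2.0 / (t + 2.0)
        # Convex combination of two feasible points stays feasible.
        x = [(1.0 - eta) * xi + eta * vi for xi, vi in zip(x, v)]
    return x

# Toy problem (hypothetical): minimize 0.5 * ||x - b||^2 over the unit ball,
# with b outside the ball so the constrained optimum is b / ||b||.
b = [2.0, 1.0]
grad = lambda x: [xi - bi for xi, bi in zip(x, b)]
x = frank_wolfe(grad, [0.0, 0.0], steps=200)
```

Every iterate is a convex combination of points in the ball, so feasibility is maintained without ever computing a projection.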

arXiv:1401.5568 [pdf, ps, other]

On the group of alternating colored permutations

Authors: Eli Bagno, David Garber, Toufik Mansour

Abstract: The group of alternating colored permutations is the natural analogue of the classical alternating group, inside the wreath product $\mathbb{Z}_r \wr S_n$. We present a 'Coxeter-like' presentation for this group and compute the length function with respect to that presentation. Then, we present this group as a covering of $\mathbb{Z}_{\frac{r}{2}} \wr S_n$ and use this point of view to give another expression for the length function. We also use this covering to lift several known parameters of $\mathbb{Z}_{\frac{r}{2}} \wr S_n$ to the group of alternating colored permutations.

Submitted 22 January, 2014; originally announced January 2014.

Comments: 29 pages, one figure; submitted

MSC Class: 20B35; 20F05

arXiv:1310.0936 [pdf, ps, other]

Double Centralizers of Parabolic Subgroups of Braid Groups

Authors: David Garber, Arkadius Kalka, Eran Liberman, Mina Teicher

Abstract: We characterize the double centralizer of all parabolic subgroups of the braid groups. We apply this result to provide a new and potentially more efficient solution to the subgroup conjugacy problem for parabolic subgroups. In the course of the proof we also characterize the centralizer for all parabolic subgroups.

Submitted 13 June, 2015; v1 submitted 3 October, 2013; originally announced October 2013.

Comments: 19 pages, 7 figures. Added Remark 3.4. Shortened abstract

MSC Class: 20F36

arXiv:1305.0548 [pdf, ps, other]

Length-based attacks in polycyclic groups

Authors: David Garber, Delaram Kahrobaei, Ha T. Lam

Abstract: After the Anshel-Anshel-Goldfeld (AAG) key-exchange protocol was introduced in 1999, it was implemented and studied with braid groups and with the Thompson group as its underlying platforms. The length-based attack, introduced by Hughes and Tannenbaum, has been used to extensively study AAG with the braid group as the underlying platform. Meanwhile, a new platform, using polycyclic groups, was proposed by Eick and Kahrobaei. In this paper, we show that with a high enough Hirsch length, the polycyclic group as an underlying platform for AAG is resistant to the length-based attack. In particular, polycyclic groups could provide a secure platform for any cryptosystem based on the conjugacy search problem, such as the non-commutative Diffie-Hellman, ElGamal and Cramer-Shoup key exchange protocols.

Submitted 22 November, 2014; v1 submitted 2 May, 2013; originally announced May 2013.

Comments: J. Math. Crypt. 2014

arXiv:1304.7561 [pdf, ps, other]

On the structure of fundamental groups of conic-line arrangements having a cycle in their graph

Authors: Michael Friedman, David Garber

Abstract: The fundamental group of the complement of a plane curve is a very important topological invariant. In particular, it is interesting to find out whether this group is determined by the combinatorics of the curve or not, and whether it is a direct sum of free groups and a free abelian group, or it has a conjugation-free geometric presentation. In this paper, we investigate the structure of this fundamental group when the graph of the conic-line arrangement is a unique cycle of length $n$ and the conic passes through all the multiple points of the cycle. We show that if $n$ is odd, then the affine fundamental group is abelian but not conjugation-free. For the even case, if $n>4$, then using quotients of the lower central series, we show that the fundamental group is not even a direct sum of a free abelian group and free groups.

Submitted 29 April, 2013; originally announced April 2013.

Comments: 25 pages, 19 figures; Originally was a part of the paper arXiv:1111.5291, but is now separated; submitted

MSC Class: 14H30

arXiv:1301.5299 [pdf, ps, other]

On Left regular bands and real Conic-Line arrangements

Authors: Michael Friedman, David Garber

Abstract: An arrangement of curves in the real plane divides it into a collection of faces. In the case of line arrangements, there exists an associative product which gives this collection a structure of a left regular band. A natural question is whether the same is possible for other arrangements. In this paper, we try to answer this question for the simplest generalization of line arrangements, that is, conic-line arrangements. Investigating the different algebraic structures induced by the face poset of a conic-line arrangement, we present two different generalizations for the product and its associated structures: an alternative left regular band and an associative aperiodic semigroup. We also study the structure of sub left regular bands induced by these arrangements. We finish with some chamber counting results for conic-line arrangements.

Submitted 29 August, 2018; v1 submitted 22 January, 2013; originally announced January 2013.

Comments: 46 pages, 27 figures; Totally revised and extended version due to referee's reports. To appear in Semigroup Forum

arXiv:1301.4666 [pdf, ps, other]

A Linearly Convergent Conditional Gradient Algorithm with Applications to Online and Stochastic Optimization

Authors: Dan Garber, Elad Hazan

Abstract: Linear optimization is often algorithmically simpler than non-linear convex optimization. Linear optimization over matroid polytopes, matching polytopes and path polytopes are examples of problems for which we have simple and efficient combinatorial algorithms, but whose non-linear convex counterpart is harder and admits significantly less efficient algorithms. This motivates the computational model of convex optimization, including the offline, online and stochastic settings, using a linear optimization oracle. In this computational model we give several new results that improve over the previous state-of-the-art. Our main result is a novel conditional gradient algorithm for smooth and strongly convex optimization over polyhedral sets that performs only a single linear optimization step over the domain on each iteration and enjoys a linear convergence rate. This gives an exponential improvement in convergence rate over previous results. Based on this new conditional gradient algorithm we give the first algorithms for online convex optimization over polyhedral sets that perform only a single linear optimization step over the domain while having optimal regret guarantees, answering an open question of Kalai and Vempala, and Hazan and Kale. Our online algorithms also imply conditional gradient algorithms for non-smooth and stochastic convex optimization with the same convergence rates as projected (sub)gradient methods.

Submitted 14 August, 2015; v1 submitted 20 January, 2013; originally announced January 2013.

arXiv:1208.5211 [pdf, ps, other]

Almost Optimal Sublinear Time Algorithm for Semidefinite Programming

Authors: Dan Garber, Elad Hazan

Abstract: We present an algorithm for approximating semidefinite programs with running time that is sublinear in the number of entries in the semidefinite instance. We also present lower bounds that show our algorithm to have a nearly optimal running time.

Submitted 26 August, 2012; originally announced August 2012.

arXiv:1111.5412 [pdf, ps, other]

On the Orchard crossing number of prisms, ladders and other related graphs

Authors: Elie Feder, David Garber

Abstract: This paper deals with the Orchard crossing number of some families of graphs which are based on cycles. These include disjoint cycles, cycles which share a vertex and cycles which share an edge. Specifically, we focus on the prism and ladder graphs.

Submitted 23 November, 2011; originally announced November 2011.

Comments: 17 pages, 14 figures; submitted

MSC Class: 05C62; 68R10 (Primary)

arXiv:1111.5291 [pdf, ps, other]

On the structure of conjugation-free fundamental groups of conic-line arrangements

Authors: Michael Friedman, David Garber

Abstract: The fundamental group of the complement of a hyperplane arrangement plays an important role in studying the corresponding arrangements. In particular, for large families of hyperplane arrangements, this fundamental group, being isomorphic to the fundamental group of a complement of a line arrangement, has some remarkable properties: either it is a direct sum of free groups and a free abelian group, or it has a conjugation-free geometric presentation. In this paper, we first give a complete proof of the following key lemma: if we draw a new line through only one intersection point of a given real line arrangement whose fundamental group is conjugation-free, then the fundamental group of the new arrangement is also conjugation-free. Second, we generalize this lemma to the case of conic-line arrangements. Moreover, we prove that once the graph associated to conic-line arrangements (defined slightly differently from the corresponding graph for line arrangements) has no cycles, then the fundamental group of its complement has a conjugation-free geometric presentation and in addition can be written as a direct sum of free groups and a free abelian group. Also, we show that if the graph consists of one cycle, and the conic does not pass through all the multiple points corresponding to the vertices of the cycle, then the fundamental group has a conjugation-free geometric presentation as well. In conclusion, we extend the family of real line arrangements having a conjugation-free geometric presentation (for their fundamental group) by defining the notion of a conjugation-free graph. We also extend this notion to certain families of conic-line arrangements.

Submitted 28 April, 2013; v1 submitted 22 November, 2011; originally announced November 2011.

Comments: 42 pages, 26 figures; Totally revised version. Part of the material of the paper is separated into a new paper; submitted

MSC Class: 14H30 (Primary)

arXiv:1009.1349 [pdf, ps, other]

A conjugation-free geometric presentation of fundamental groups of arrangements II: Expansion and some properties

Authors: Meital Eliyahu, David Garber, Mina Teicher

Abstract: A conjugation-free geometric presentation of a fundamental group is a presentation with the natural topological generators $x_1, ..., x_n$ and the cyclic relations: $x_{i_k}x_{i_{k-1}} ... x_{i_1} = x_{i_{k-1}} ... x_{i_1} x_{i_k} = ... = x_{i_1} x_{i_k} ... x_{i_2}$ with no conjugations on the generators. We have already proved that if the graph of the arrangement is a disjoint union of cycles, then its fundamental group has a conjugation-free geometric presentation. In this paper, we extend this property to arrangements whose graphs are a disjoint union of cycle-tree graphs. Moreover, we study some properties of this type of presentation for a fundamental group of a line arrangement's complement. We show that these presentations satisfy a completeness property in the sense of Dehornoy, if the corresponding graph of the arrangement has no edges. The completeness property is a powerful property which leads to many nice properties concerning the presentation (such as the left-cancellativity of the associated monoid) and yields a simple criterion for the solvability of the word problem in the group.

Submitted 6 June, 2012; v1 submitted 7 September, 2010; originally announced September 2010.

Comments: 17 pages, 9 figures; final version, which corrects a mistake in the published version

MSC Class: 14H30 (Primary) 32S22; 57M05; 20M05; 20F05 (Secondary)

arXiv:1008.2638 [pdf, ps, other]

On the Orchard crossing number of complete bipartite graphs

Authors: Elie Feder, David Garber

Abstract: We compute the Orchard crossing number, which is defined in a similar way to the rectilinear crossing number, for the complete bipartite graphs $K_{n,n}$.

Submitted 16 August, 2010; originally announced August 2010.

Comments: 23 pages, 4 figures; Submitted

MSC Class: 05C62

arXiv:1006.3447 [pdf, ps, other]

Eulerian partitions for configurations of skew lines

Authors: Roland Bacher, David Garber

Abstract: In this paper, which is a complement of \cite{BG}, we study a few elementary invariants for configurations of skew lines, as introduced and analyzed first by Viro and his collaborators. We slightly simplify the exposition of some known invariants and use them to define a natural partition of the lines in a skew configuration. We also describe an algorithm which constructs a spindle-permutation for a given switching class, or proves non-existence of such a spindle-permutation.

Submitted 17 June, 2010; originally announced June 2010.

Comments: 17 pages, 8 Figures. This paper is an update of a part of math/0205245. Submitted for publication

arXiv:0810.5615 [pdf, ps, other]

Conjugation-free geometric presentations of fundamental groups of arrangements

Authors: Meital Eliyahu, David Garber, Mina Teicher

Abstract: We introduce the notion of a conjugation-free geometric presentation for a fundamental group of a line arrangement's complement, and we show that the fundamental groups of the following family of arrangements have a conjugation-free geometric presentation: A real arrangement L, whose graph of multiple points is a union of disjoint cycles, has no line with more than two multiple points, and where the multiplicities of the multiple points are arbitrary. We also compute the exact group structure (by means of a semi-direct product of groups) of the arrangement of 6 lines whose graph consists of a cycle of length 3, and all the multiple points have multiplicity 3.

Submitted 16 March, 2010; v1 submitted 30 October, 2008; originally announced October 2008.

Comments: 28 pages, many figures; totally revised version; submitted

MSC Class: 14H30 (Primary); 32S22; 57M05 (Secondary)

arXiv:0806.2114 [pdf, ps, other]

On the excedance sets of colored permutations

Authors: Eli Bagno, David Garber, Robert Shwartz

Abstract: We define the excedance set and the excedance word on $G_{r,n}$, generalizing a work of Ehrenborg and Steingrimsson, and use the inclusion-exclusion principle to calculate the number of colored permutations having a prescribed excedance word. We show some symmetric properties, such as log-concavity and unimodality, of a specific sequence of excedance words.

Submitted 12 June, 2008; originally announced June 2008.

Comments: 9 pages, no figures; submitted

MSC Class: 05E15

Showing 1–50 of 72 results for author: Garber, D