-
Explicit diagonalization of an anti-triangular Cesaró matrix
Authors:
Suvrit Sra
Abstract:
We study a specific "anti-triangular" Cesaró matrix corresponding to a Markov chain. We derive closed forms for all the eigenvalues and eigenvectors of this matrix.
Submitted 4 April, 2015; v1 submitted 14 November, 2014;
originally announced November 2014.
-
Modular proximal optimization for multidimensional total-variation regularization
Authors:
Álvaro Barbero,
Suvrit Sra
Abstract:
We study \emph{TV regularization}, a widely used technique for eliciting structured sparsity. In particular, we propose efficient algorithms for computing prox-operators for $\ell_p$-norm TV. The most important among these is $\ell_1$-norm TV, for whose prox-operator we present a new geometric analysis which unveils a hitherto unknown connection to taut-string methods. This connection turns out to be remarkably useful as it shows how our geometry guided implementation results in efficient weighted and unweighted 1D-TV solvers, surpassing state-of-the-art methods. Our 1D-TV solvers provide the backbone for building more complex (two or higher-dimensional) TV solvers within a modular proximal optimization approach. We review the literature for an array of methods exploiting this strategy, and illustrate the benefits of our modular design through an extensive suite of experiments on (i) image denoising, (ii) image deconvolution, (iii) four variants of fused-lasso, and (iv) video denoising. To underscore our claims and permit easy reproducibility, we provide all the reviewed and our new TV solvers in an easy-to-use multi-threaded C++, Matlab and Python library.
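To make the prox-operator concrete: for 1D TV with weight $\lambda$, the prox at $y$ solves $\min_x \frac{1}{2}\|x - y\|^2 + \lambda \sum_i |x_{i+1} - x_i|$. The sketch below is NOT the taut-string solver described above; it swaps in a plain projected-gradient method on the box-constrained dual, which is easy to check but far slower. The function names and the iteration count are hypothetical.

```python
import numpy as np

def _dt(u):
    """Apply D^T, where (D x)[i] = x[i+1] - x[i] is the forward-difference operator."""
    out = np.zeros(len(u) + 1)
    out[1:] += u
    out[:-1] -= u
    return out

def tv1d_prox_dual_pg(y, lam, iters=2000):
    """Approximate argmin_x 0.5*||x - y||^2 + lam * sum_i |x[i+1] - x[i]|.

    Hypothetical helper: projected gradient on the dual problem
    min_{|u_i| <= lam} 0.5*||y - D^T u||^2, with primal recovery x = y - D^T u.
    """
    y = np.asarray(y, dtype=float)
    u = np.zeros(len(y) - 1)          # one dual variable per adjacent pair
    step = 0.25                       # 1 / L, since ||D D^T|| <= 4
    for _ in range(iters):
        x = y - _dt(u)                # current primal estimate
        u = np.clip(u + step * np.diff(x), -lam, lam)  # gradient step, then project onto the box
    return y - _dt(u)
```

For a step signal the prox shrinks the jump evenly: `tv1d_prox_dual_pg([0, 0, 0, 10, 10, 10], 1.5)` returns approximately `[0.5, 0.5, 0.5, 9.5, 9.5, 9.5]`, since each flat segment of length 3 moves by $\lambda/3$.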
Submitted 30 December, 2017; v1 submitted 3 November, 2014;
originally announced November 2014.
-
Hlawka-Popoviciu inequalities on positive definite tensors
Authors:
Wolfgang Berndt,
Suvrit Sra
Abstract:
We prove inequalities on symmetric tensor sums of positive definite operators. In particular, we prove multivariable operator inequalities inspired by generalizations to the well-known Hlawka and Popoviciu inequalities. As corollaries, we obtain generalized Hlawka and Popoviciu inequalities for determinants, permanents, and generalized matrix functions. The new operator inequalities and their corollaries contain a few recently published inequalities on positive definite matrices as special cases.
Submitted 15 November, 2014; v1 submitted 1 November, 2014;
originally announced November 2014.
-
Inference and Mixture Modeling with the Elliptical Gamma Distribution
Authors:
Reshad Hosseini,
Suvrit Sra,
Lucas Theis,
Matthias Bethge
Abstract:
We study modeling and inference with the Elliptical Gamma Distribution (EGD). We consider maximum likelihood (ML) estimation for EGD scatter matrices, a task for which we develop new fixed-point algorithms. Our algorithms are efficient and converge to global optima despite nonconvexity. Moreover, they turn out to be much faster than both a well-known iterative algorithm of Kent & Tyler (1991) and sophisticated manifold optimization algorithms. Subsequently, we invoke our ML algorithms as subroutines for estimating parameters of a mixture of EGDs. We illustrate our methods by applying them to model natural image statistics---the proposed EGD mixture model yields the most parsimonious model among several competing approaches.
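For readers unfamiliar with fixed-point scatter estimation, the sketch below shows the classical Tyler (1987) M-estimator iteration, a relative of the Kent & Tyler approach mentioned above. It is NOT the EGD fixed-point algorithm from this paper; it only illustrates the general pattern of reweighting samples by $x_i^\top S^{-1} x_i$ and renormalizing. Function names and defaults are hypothetical.

```python
import numpy as np

def tyler_step(X, S):
    """One fixed-point update: (d/n) * sum_i x_i x_i^T / (x_i^T S^{-1} x_i), rescaled to trace d."""
    n, d = X.shape
    q = np.einsum("ij,jk,ik->i", X, np.linalg.inv(S), X)  # q_i = x_i^T S^{-1} x_i
    S_new = (d / n) * (X / q[:, None]).T @ X
    return S_new * (d / np.trace(S_new))                  # scatter is only defined up to scale

def tyler_scatter(X, iters=1000, tol=1e-10):
    """Iterate tyler_step from the identity until successive iterates stop changing."""
    S = np.eye(X.shape[1])
    for _ in range(iters):
        S_new = tyler_step(X, S)
        if np.linalg.norm(S_new - S, "fro") < tol:
            return S_new
        S = S_new
    return S
```

The trace normalization is a design choice: Tyler's estimator fixes only the shape of the scatter matrix, so some scale convention must be imposed.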
Submitted 20 December, 2015; v1 submitted 17 October, 2014;
originally announced October 2014.
-
Completely strong superadditivity of generalized matrix functions
Authors:
Minghua Lin,
Suvrit Sra
Abstract:
We prove that generalized matrix functions satisfy a block-matrix strong superadditivity inequality over the cone of positive semidefinite matrices. Our result extends a recent result of Paksoy-Turkmen-Zhang (V. Paksoy, R. Turkmen, F. Zhang, Inequalities of generalized matrix functions via tensor products, Electron. J. Linear Algebra 27 (2014) 332-341). As an application, we obtain a short proof of a classical inequality of Thompson (1961) on block matrix determinants.
Submitted 7 October, 2014;
originally announced October 2014.
-
Parallel and Distributed Block-Coordinate Frank-Wolfe Algorithms
Authors:
Yu-Xiang Wang,
Veeranjaneyulu Sadhanala,
Wei Dai,
Willie Neiswanger,
Suvrit Sra,
Eric P. Xing
Abstract:
We develop parallel and distributed Frank-Wolfe algorithms; the former on shared memory machines with mini-batching, and the latter in a delayed update framework. Whenever possible, we perform computations asynchronously, which helps attain speedups on multicore machines as well as in distributed environments. Moreover, instead of worst-case bounded delays, our methods only depend (mildly) on \emph{expected} delays, allowing them to be robust to stragglers and faulty worker threads. Our algorithms assume block-separable constraints, and subsume the recent Block-Coordinate Frank-Wolfe (BCFW) method (Lacoste-Julien et al., 2013). Our analysis reveals problem-dependent quantities that govern the speedups of our methods over BCFW. We present experiments on structural SVM and Group Fused Lasso, obtaining significant speedups over competing state-of-the-art (and synchronous) methods.
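As a point of reference for the block-separable setting, here is a minimal serial BCFW sketch. It assumes a toy objective $\frac{1}{2}\|x - c\|^2$ over a product of probability simplices (one per block) and uses exact line search because the objective is quadratic. It does not implement the parallel, mini-batched or delayed-update variants of this paper; the function name is hypothetical.

```python
import numpy as np

def bcfw_product_simplex(c, n_blocks, iters=5000, seed=0):
    """Serial block-coordinate Frank-Wolfe for min 0.5*||x - c||^2 over a product of simplices."""
    c = np.asarray(c, dtype=float)
    rng = np.random.default_rng(seed)
    blocks = np.array_split(np.arange(len(c)), n_blocks)
    x = np.zeros(len(c))
    for b in blocks:
        x[b[0]] = 1.0                    # start each block at a vertex of its simplex
    for _ in range(iters):
        b = blocks[rng.integers(n_blocks)]
        g = x[b] - c[b]                  # gradient restricted to block b
        s = np.zeros(len(b))
        s[np.argmin(g)] = 1.0            # linear minimization oracle: best simplex vertex
        d = s - x[b]
        dd = d @ d
        if dd == 0.0:
            continue
        gamma = np.clip(-(g @ d) / dd, 0.0, 1.0)  # exact line search for a quadratic
        x[b] += gamma * d                # stays a convex combination, hence feasible
    return x
```

Each update touches one block only, which is the property that the parallel and distributed variants exploit.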
Submitted 12 February, 2016; v1 submitted 22 September, 2014;
originally announced September 2014.
-
Large-scale randomized-coordinate descent methods with non-separable linear constraints
Authors:
Sashank Reddi,
Ahmed Hefny,
Carlton Downey,
Avinava Dubey,
Suvrit Sra
Abstract:
We develop randomized (block) coordinate descent (CD) methods for linearly constrained convex optimization. Unlike most CD methods, we do not assume the constraints to be separable, but let them be coupled linearly. To our knowledge, ours is the first CD method that allows linear coupling constraints, without making the global iteration complexity have an exponential dependence on the number of constraints. We present algorithms and analysis for four key problem scenarios: (i) smooth; (ii) smooth + nonsmooth separable; (iii) asynchronous parallel; and (iv) stochastic. We illustrate the empirical behavior of our algorithms through simulation experiments.
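The simplest case of a linear coupling constraint is a single equality $\sum_i x_i = b$. A single coordinate cannot move without breaking it, but a pair can: moving along $e_i - e_j$ keeps the sum fixed. The sketch below assumes a separable quadratic objective $\sum_i \frac{a_i}{2}(x_i - t_i)^2$ so that each pairwise step has a closed form; it is an illustration of the idea, not the paper's general algorithm, and the function name is hypothetical.

```python
import numpy as np

def pairwise_cd(a, t, b, iters=20000, seed=0):
    """Randomized 2-coordinate descent for min sum_i 0.5*a_i*(x_i - t_i)^2 s.t. sum_i x_i = b."""
    a, t = np.asarray(a, dtype=float), np.asarray(t, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(a)
    x = np.full(n, b / n)                     # feasible starting point
    for _ in range(iters):
        i, j = rng.choice(n, size=2, replace=False)
        gi, gj = a[i] * (x[i] - t[i]), a[j] * (x[j] - t[j])
        delta = -(gi - gj) / (a[i] + a[j])    # exact minimizer along e_i - e_j
        x[i] += delta
        x[j] -= delta                         # sum(x) is unchanged
    return x
```

For this objective the optimum is known in closed form, $x_i = t_i - \nu / a_i$ with $\nu = (\sum_i t_i - b) / \sum_i a_i^{-1}$, which makes the sketch easy to check.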
Submitted 10 June, 2015; v1 submitted 9 September, 2014;
originally announced September 2014.
-
Randomized Nonlinear Component Analysis
Authors:
David Lopez-Paz,
Suvrit Sra,
Alex Smola,
Zoubin Ghahramani,
Bernhard Schölkopf
Abstract:
Classical methods such as Principal Component Analysis (PCA) and Canonical Correlation Analysis (CCA) are ubiquitous in statistics. However, these techniques are only able to reveal linear relationships in data. Although nonlinear variants of PCA and CCA have been proposed, these are computationally prohibitive at large scale.
In a separate strand of recent research, randomized methods have been proposed to construct features that help reveal nonlinear patterns in data. For basic tasks such as regression or classification, random features exhibit little or no loss in performance, while achieving drastic savings in computational requirements.
In this paper we leverage randomness to design scalable new variants of nonlinear PCA and CCA; our ideas extend to key multivariate analysis tools such as spectral clustering or LDA. We demonstrate our algorithms through experiments on real-world data, on which we compare against the state-of-the-art. A simple R implementation of the presented algorithms is provided.
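A minimal sketch of the random-features idea for nonlinear PCA, assuming the RBF kernel $\exp(-\gamma\|x - y\|^2)$ and random Fourier features in the style of Rahimi and Recht: map the data to random cosine features, then run ordinary linear PCA on them. The function names and default parameters are hypothetical, and this is Python rather than the R implementation mentioned above.

```python
import numpy as np

def random_fourier_features(X, n_features=200, gamma=1.0, seed=0):
    """Features Z with Z @ Z.T approximating the RBF kernel exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    # Frequencies drawn from the kernel's spectral density N(0, 2*gamma*I)
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def randomized_kernel_pca(X, k, **kw):
    """Linear PCA on random features: an approximate kernel PCA in O(n * n_features^2)."""
    Z = random_fourier_features(X, **kw)
    Z = Z - Z.mean(axis=0)                    # center in feature space
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:k].T                       # n x k nonlinear component scores
```

The cost is governed by the number of random features rather than by an $n \times n$ kernel matrix, which is where the computational savings come from.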
Submitted 13 May, 2014; v1 submitted 1 February, 2014;
originally announced February 2014.
-
Conic geometric optimisation on the manifold of positive definite matrices
Authors:
Suvrit Sra,
Reshad Hosseini
Abstract:
We develop \emph{geometric optimisation} on the manifold of Hermitian positive definite (HPD) matrices. In particular, we consider optimising two types of cost functions: (i) geodesically convex (g-convex); and (ii) log-nonexpansive (LN). G-convex functions are nonconvex in the usual Euclidean sense, but convex along the manifold and thus allow global optimisation. LN functions may fail to be even g-convex, but still remain globally optimisable due to their special structure. We develop theoretical tools to recognise and generate g-convex functions as well as cone theoretic fixed-point optimisation algorithms. We illustrate our techniques by applying them to maximum-likelihood parameter estimation for elliptically contoured distributions (a rich class that substantially generalises the multivariate normal distribution). We compare our fixed-point algorithms with sophisticated manifold optimisation methods and obtain notable speedups.
Submitted 12 December, 2014; v1 submitted 4 December, 2013;
originally announced December 2013.
-
Statistical estimation for optimization problems on graphs
Authors:
Mikhail Langovoy,
Suvrit Sra
Abstract:
Large graphs abound in machine learning, data mining, and several related areas. A useful step towards analyzing such graphs is that of obtaining certain summary statistics, e.g., the expected length of a shortest path between two nodes, or the expected weight of a minimum spanning tree of the graph. These statistics provide insight into the structure of a graph, and they can help predict global properties of a graph. Motivated thus, we propose to study statistical properties of structured subgraphs (of a given graph), in particular, to estimate the expected objective function value of a combinatorial optimization problem over these subgraphs. The general task is very difficult, if not unsolvable; so for concreteness we describe a more specific statistical estimation problem based on spanning trees. We hope that our position paper encourages others to also study other types of graphical structures for which one can prove nontrivial statistical estimates.
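As a concrete instance of the kind of statistic meant here, the expected weight of a minimum spanning tree can be estimated by plain Monte Carlo. The sketch assumes a complete graph on $n$ nodes with i.i.d. Uniform(0,1) edge weights; that weight model and the function names are illustrative assumptions, not an estimator proposed in this position paper. MST weights are computed with Prim's algorithm on a dense weight matrix.

```python
import numpy as np

def mst_weight(W):
    """Total weight of a minimum spanning tree of a complete graph (dense symmetric W), via Prim."""
    n = W.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = W[0].copy()                 # cheapest known edge from the tree to each node
    total = 0.0
    for _ in range(n - 1):
        cand = np.where(in_tree, np.inf, best)
        j = int(np.argmin(cand))       # closest node not yet in the tree
        total += cand[j]
        in_tree[j] = True
        best = np.minimum(best, W[j])
    return total

def estimate_expected_mst(n, trials=1000, seed=0):
    """Monte Carlo estimate of E[MST weight] under i.i.d. Uniform(0,1) edge weights (assumed model)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        U = np.triu(rng.random((n, n)), 1)
        total += mst_weight(U + U.T)   # symmetric weights, zero diagonal
    return total / trials
```

A quick sanity check: for $n = 3$ the MST is the sum of the two smallest of three uniform weights, whose expectation is $1/4 + 2/4 = 3/4$.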
Submitted 29 November, 2013;
originally announced November 2013.
-
Reflection methods for user-friendly submodular optimization
Authors:
Stefanie Jegelka,
Francis Bach,
Suvrit Sra
Abstract:
Recently, it has become evident that submodularity naturally captures widely occurring concepts in machine learning, signal processing and computer vision. Consequently, there is a need for efficient optimization procedures for submodular functions, especially for minimization problems. While general submodular minimization is challenging, we propose a new method that exploits existing decomposability of submodular functions. In contrast to previous approaches, our method is neither approximate, nor impractical, nor does it need any cumbersome parameter tuning. Moreover, it is easy to implement and parallelize. A key component of our method is a formulation of the discrete submodular minimization problem as a continuous best approximation problem that is solved through a sequence of reflections, and its solution can be easily thresholded to obtain an optimal discrete solution. This method solves both the continuous and discrete formulations of the problem, and therefore has applications in learning, inference, and reconstruction. In our experiments, we illustrate the benefits of our method on two image segmentation tasks.
Submitted 18 November, 2013;
originally announced November 2013.
-
Fast projections onto mixed-norm balls with applications
Authors:
Suvrit Sra
Abstract:
Joint sparsity offers powerful structural cues for feature selection, especially for variables that are expected to demonstrate a "grouped" behavior. Such behavior is commonly modeled via group-lasso, multitask lasso, and related methods where feature selection is effected via mixed-norms. Several mixed-norm based sparse models have received substantial attention, and for some cases efficient algorithms are also available. Surprisingly, several constrained sparse models seem to be lacking scalable algorithms. We address this deficiency by presenting batch and online (stochastic-gradient) optimization methods, both of which rely on efficient projections onto mixed-norm balls. We illustrate our methods by applying them to the multitask lasso. We conclude by mentioning some open problems.
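To illustrate what a projection onto a mixed-norm ball involves, here is the easiest case, the $\ell_{1,2}$ ball $\{X : \sum_i \|X_{i,:}\|_2 \le \tau\}$ with rows as groups. A standard reduction projects the vector of row norms onto an $\ell_1$ ball (here with the sort-based method of Duchi et al., 2008) and rescales each row. This is a sketch of that known reduction, not necessarily the method of this paper (other mixed norms, such as $\ell_{1,\infty}$, need more work); the function names are hypothetical and $\tau > 0$ is assumed.

```python
import numpy as np

def project_l1_ball_nonneg(v, tau):
    """Euclidean projection of a nonnegative vector v onto {w >= 0 : sum(w) <= tau}, tau > 0."""
    if v.sum() <= tau:
        return v.copy()
    u = np.sort(v)[::-1]                              # sort in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, len(u) + 1)
    rho = np.nonzero(u - (css - tau) / k > 0)[0][-1]  # last index kept positive
    theta = (css[rho] - tau) / (rho + 1)              # soft-threshold level
    return np.maximum(v - theta, 0.0)

def project_l12_ball(X, tau):
    """Project X onto {X : sum_i ||X[i, :]||_2 <= tau} by shrinking row norms."""
    norms = np.linalg.norm(X, axis=1)
    new_norms = project_l1_ball_nonneg(norms, tau)
    scale = np.divide(new_norms, norms, out=np.zeros_like(norms), where=norms > 0)
    return X * scale[:, None]                         # rows keep their direction
```

Because only row norms change, whole groups are either shrunk or zeroed out together, which is exactly the "grouped" sparsity pattern described above.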
Submitted 6 April, 2012;
originally announced April 2012.
-
Explicit eigenvalues of certain scaled trigonometric matrices
Authors:
Suvrit Sra
Abstract:
In a very recent paper "\emph{On eigenvalues and equivalent transformation of trigonometric matrices}" (D. Zhang, Z. Lin, and Y. Liu, LAA 436, 71--78 (2012)), the authors motivated and discussed a trigonometric matrix that arises in the design of finite impulse response (FIR) digital filters. The eigenvalues of this matrix shed light on the FIR filter design, so obtaining them in closed form was investigated. Zhang \emph{et al.}\ proved that their matrix had rank 4 and they conjectured closed-form expressions for its eigenvalues, leaving a rigorous proof as an open problem. This paper studies trigonometric matrices significantly more general than theirs, deduces their rank, and derives closed forms for their eigenvalues. As a corollary, it yields a short proof of the conjectures in the aforementioned paper.
Submitted 29 April, 2012; v1 submitted 23 January, 2012;
originally announced January 2012.
-
Positive definite matrices and the S-divergence
Authors:
Suvrit Sra
Abstract:
Positive definite matrices abound in a dazzling variety of applications. This ubiquity can be in part attributed to their rich geometric structure: positive definite matrices form a self-dual convex cone whose strict interior is a Riemannian manifold. The manifold view is endowed with a "natural" distance function while the conic view is not. Nevertheless, drawing motivation from the conic view, we introduce the S-divergence as a "natural" distance-like function on the open cone of positive definite matrices. We motivate the S-divergence via a sequence of results that connect it to the Riemannian distance. In particular, we show that (a) this divergence is the square of a distance, and (b) it has several geometric properties similar to those of the Riemannian distance, though without being computationally as demanding. The S-divergence is even more intriguing: although nonconvex, we can still compute matrix means and medians using it to global optimality. We complement our results with some numerical experiments illustrating our theorems and our optimization algorithm for computing matrix medians.
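For concreteness, the S-divergence is $S(X, Y) = \log\det\left(\frac{X + Y}{2}\right) - \frac{1}{2}\log\det(XY)$. The sketch below evaluates it with three log-determinants, which is one way to see why it is cheaper than the Riemannian distance (no eigendecomposition or matrix logarithm is needed). The function name is hypothetical; `numpy.linalg.slogdet` is used for numerical stability.

```python
import numpy as np

def s_divergence(X, Y):
    """S(X, Y) = log det((X + Y)/2) - 0.5 * log det(X Y), for symmetric positive definite X, Y."""
    _, ld_mid = np.linalg.slogdet((X + Y) / 2)
    _, ld_x = np.linalg.slogdet(X)
    _, ld_y = np.linalg.slogdet(Y)
    # log det(X Y) = log det X + log det Y
    return ld_mid - 0.5 * (ld_x + ld_y)
```

Two easy consequences of the formula: $S(X, X) = 0$, and $S(AXA^\top, AYA^\top) = S(X, Y)$ for invertible $A$, because the $\log|\det A|$ terms cancel.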
Submitted 27 December, 2013; v1 submitted 8 October, 2011;
originally announced October 2011.
-
Nonconvex proximal splitting: batch and incremental algorithms
Authors:
Suvrit Sra
Abstract:
Within the unmanageably large class of nonconvex optimization, we consider the rich subclass of nonsmooth problems that have composite objectives---this already includes the extensively studied convex, composite objective problems as a special case. For this subclass, we introduce a powerful, new framework that permits asymptotically non-vanishing perturbations. In particular, we develop perturbation-based batch and incremental (online-like) nonconvex proximal splitting algorithms. To our knowledge, this is the first time that such perturbation-based nonconvex splitting algorithms are being proposed and analyzed. While the main contribution of the paper is the theoretical framework, we complement our results by presenting some empirical results on matrix factorization.
Submitted 17 September, 2012; v1 submitted 1 September, 2011;
originally announced September 2011.
-
Sparse Inverse Covariance Estimation via an Adaptive Gradient-Based Method
Authors:
Suvrit Sra,
Dongmin Kim
Abstract:
We study the problem of estimating, from data, a sparse approximation to the inverse covariance matrix. Estimating a sparsity constrained inverse covariance matrix is a key component in Gaussian graphical model learning, but one that is numerically very challenging. We address this challenge by developing a new adaptive gradient-based method that carefully combines gradient information with an adaptive step-scaling strategy, which results in a scalable, highly competitive method. Our algorithm, like its predecessors, maximizes an $\ell_1$-norm penalized log-likelihood and has the same per-iteration arithmetic complexity as the best methods in its class. Our experiments reveal that our approach outperforms state-of-the-art competitors, often significantly so, for large problems.
Submitted 25 June, 2011;
originally announced June 2011.
-
The Multivariate Watson Distribution: Maximum-Likelihood Estimation and other Aspects
Authors:
Suvrit Sra,
Dmitrii Karp
Abstract:
This paper studies fundamental aspects of modelling data using multivariate Watson distributions. Although these distributions are natural for modelling axially symmetric data (i.e., unit vectors where $\pm \mathbf{x}$ are equivalent), using them in high dimensions can be difficult. Why so? Largely because for Watson distributions even basic tasks such as maximum-likelihood estimation are numerically challenging. To tackle the numerical difficulties some approximations have been derived, but these are either grossly inaccurate in high dimensions (\emph{Directional Statistics}, Mardia & Jupp, 2000) or, when reasonably accurate (\emph{J. Machine Learning Research, W. & C.P., v2}, Bijral \emph{et al.}, 2007, pp. 35--42), lack theoretical justification. We derive new approximations to the maximum-likelihood estimates; our approximations are theoretically well-defined, numerically accurate, and easy to compute. We build on our parameter estimation and discuss mixture-modelling with Watson distributions; here we uncover a hitherto unknown connection to the "diametrical clustering" algorithm of Dhillon \emph{et al.} (\emph{Bioinformatics}, 19(13), 2003, pp. 1612--1619).
Submitted 25 May, 2012; v1 submitted 22 April, 2011;
originally announced April 2011.
-
A Trivial Observation related to Sparse Recovery
Authors:
Suvrit Sra
Abstract:
We make a trivial modification to the elegant analysis of Garg and Khandekar
(\emph{Gradient Descent with Sparsification}, ICML 2009) that replaces the standard Restricted Isometry Property (RIP) with another RIP-type property (which could be simpler than the RIP, but we are not sure; it could be as hard as the RIP to check, thereby rendering this little writeup totally worthless).
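For context, the Garg and Khandekar iteration (GraDeS) alternates a gradient step on $\frac{1}{2}\|y - \Phi x\|^2$ with hard thresholding to the $s$ largest-magnitude entries, roughly $x \leftarrow H_s\big(x + \frac{1}{\gamma}\Phi^\top(y - \Phi x)\big)$ with $\gamma > 1$. The sketch below follows that commonly cited form; the default $\gamma = 4/3$, the iteration count and the function names are illustrative assumptions, and the RIP-type conditions under which recovery is guaranteed are not checked here.

```python
import numpy as np

def hard_threshold(v, s):
    """H_s: keep the s largest-magnitude entries of v and zero the rest (assumes 1 <= s <= len(v))."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -s)[-s:]
    out[idx] = v[idx]
    return out

def grades(Phi, y, s, gamma=4.0 / 3.0, iters=200):
    """Gradient descent with sparsification: x <- H_s(x + Phi^T (y - Phi x) / gamma)."""
    x = np.zeros(Phi.shape[1])
    for _ in range(iters):
        x = hard_threshold(x + Phi.T @ (y - Phi @ x) / gamma, s)
    return x
```

With $\Phi = I$ (trivially RIP) and an $s$-sparse $y$, the iterates settle on the correct support after one step and then converge geometrically to $y$.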
Submitted 27 June, 2009; v1 submitted 26 June, 2009;
originally announced June 2009.
-
Approximation Algorithms for Bregman Co-clustering and Tensor Clustering
Authors:
Stefanie Jegelka,
Suvrit Sra,
Arindam Banerjee
Abstract:
In the past few years, powerful generalizations of the Euclidean k-means problem have been made, such as Bregman clustering [7], co-clustering (i.e., simultaneous clustering of rows and columns of an input matrix) [9,18], and tensor clustering [8,34]. Like k-means, these more general problems also suffer from the NP-hardness of the associated optimization. Researchers have developed approximation algorithms of varying degrees of sophistication for k-means, k-medians, and more recently also for Bregman clustering [2]. However, there seem to be no approximation algorithms for Bregman co- and tensor clustering. In this paper we derive the first (to our knowledge) guaranteed methods for these increasingly important clustering settings. Going beyond Bregman divergences, we also prove an approximation factor for tensor clustering with arbitrary separable metrics. Through extensive experiments we evaluate the characteristics of our method, and show that it also has practical impact.
Submitted 9 November, 2009; v1 submitted 1 December, 2008;
originally announced December 2008.