
Showing 51–100 of 119 results for author: Sra, S

  1. arXiv:1906.05413  [pdf, other]

    cs.LG stat.ML

    Flexible Modeling of Diversity with Strongly Log-Concave Distributions

    Authors: Joshua Robinson, Suvrit Sra, Stefanie Jegelka

    Abstract: Strongly log-concave (SLC) distributions are a rich class of discrete probability distributions over subsets of some ground set. They are strictly more general than strongly Rayleigh (SR) distributions such as the well-known determinantal point process. While SR distributions offer elegant models of diversity, they lack an easy control over how they express diversity. We propose SLC as the right e…

    Submitted 12 June, 2019; originally announced June 2019.

  2. arXiv:1905.12436  [pdf, other]

    math.OC

    Acceleration in First Order Quasi-strongly Convex Optimization by ODE Discretization

    Authors: Jingzhao Zhang, Suvrit Sra, Ali Jadbabaie

    Abstract: We study gradient-based optimization methods obtained by direct Runge-Kutta discretization of the ordinary differential equation (ODE) describing the movement of a heavy ball under a constant friction coefficient. When the function is high-order smooth and strongly convex, we show that directly simulating the ODE with known numerical integrators achieves acceleration in a nontrivial neighborhood of t…

    Submitted 28 May, 2019; originally announced May 2019.

    Comments: arXiv admin note: text overlap with arXiv:1805.00521

  3. arXiv:1905.11881  [pdf, other]

    math.OC cs.LG

    Why gradient clipping accelerates training: A theoretical justification for adaptivity

    Authors: Jingzhao Zhang, Tianxing He, Suvrit Sra, Ali Jadbabaie

    Abstract: We provide a theoretical explanation for the effectiveness of gradient clipping in training deep neural networks. The key ingredient is a new smoothness condition derived from practical neural network training examples. We observe that gradient smoothness, a concept central to the analysis of first-order optimization algorithms that is often assumed to be a constant, demonstrates significant varia…

    Submitted 10 February, 2020; v1 submitted 28 May, 2019; originally announced May 2019.

  4. arXiv:1901.09149  [pdf, other]

    cs.LG math.OC stat.ML

    Escaping Saddle Points with Adaptive Gradient Methods

    Authors: Matthew Staib, Sashank J. Reddi, Satyen Kale, Sanjiv Kumar, Suvrit Sra

    Abstract: Adaptive methods such as Adam and RMSProp are widely used in deep learning but are not well understood. In this paper, we seek a crisp, clean and precise characterization of their behavior in nonconvex settings. To this end, we first provide a novel view of adaptive methods as preconditioned SGD, where the preconditioner is estimated in an online manner. By studying the preconditioner on its own,…

    Submitted 3 February, 2020; v1 submitted 25 January, 2019; originally announced January 2019.

    Comments: Update Theorem 4.1 and proof to use martingale concentration bounds, i.e. matrix Freedman

  5. arXiv:1812.03190  [pdf, other]

    cs.LG stat.ML

    Deep-RBF Networks Revisited: Robust Classification with Rejection

    Authors: Pourya Habib Zadeh, Reshad Hosseini, Suvrit Sra

    Abstract: One of the main drawbacks of deep neural networks, like many other classifiers, is their vulnerability to adversarial attacks. An important reason for their vulnerability is assigning high confidence to regions with few or even no feature points. By feature points, we mean a nonlinear transformation of the input space extracting a meaningful representation of the input data. On the other hand, dee…

    Submitted 7 December, 2018; originally announced December 2018.

  6. arXiv:1811.04194  [pdf, other]

    math.OC cs.LG

    R-SPIDER: A Fast Riemannian Stochastic Optimization Algorithm with Curvature Independent Rate

    Authors: Jingzhao Zhang, Hongyi Zhang, Suvrit Sra

    Abstract: We study smooth stochastic optimization problems on Riemannian manifolds. By adapting the recently proposed SPIDER algorithm (Fang et al., 2018), a variance-reduced stochastic method, to Riemannian manifolds, we can achieve a faster rate than known algorithms in both the finite-sum and stochastic settings. Unlike previous works, by not resorting to bounding iterate distances, our analysis…

    Submitted 14 December, 2018; v1 submitted 9 November, 2018; originally announced November 2018.

    Comments: arXiv admin note: text overlap with arXiv:1605.07147

  7. arXiv:1810.07770  [pdf, ps, other]

    cs.LG stat.ML

    Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity

    Authors: Chulhee Yun, Suvrit Sra, Ali Jadbabaie

    Abstract: We study finite sample expressivity, i.e., memorization power of ReLU networks. Recent results require $N$ hidden nodes to memorize/interpolate arbitrary $N$ data points. In contrast, by exploiting depth, we show that 3-layer ReLU networks with $\Omega(\sqrt{N})$ hidden nodes can perfectly memorize most datasets with $N$ points. We also prove that width $\Theta(\sqrt{N})$ is necessary and sufficient for mem…

    Submitted 29 October, 2019; v1 submitted 17 October, 2018; originally announced October 2018.

    Comments: 28 pages, 2 figures. NeurIPS 2019 Camera-ready version

  8. arXiv:1809.10858  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Efficiently testing local optimality and escaping saddles for ReLU networks

    Authors: Chulhee Yun, Suvrit Sra, Ali Jadbabaie

    Abstract: We provide a theoretical algorithm for checking local optimality and escaping saddles at nondifferentiable points of empirical risks of two-layer ReLU networks. Our algorithm receives any parameter value and returns: local minimum, second-order stationary point, or a strict descent direction. The presence of $M$ data points on the nondifferentiability of the ReLU divides the parameter space into a…

    Submitted 28 May, 2019; v1 submitted 28 September, 2018; originally announced September 2018.

    Comments: 23 pages, appeared at ICLR 2019

  9. arXiv:1806.10077  [pdf, other]

    math.OC stat.ML

    Random Shuffling Beats SGD after Finite Epochs

    Authors: Jeff Z. HaoChen, Suvrit Sra

    Abstract: A long-standing problem in the theory of stochastic gradient descent (SGD) is to prove that its without-replacement version RandomShuffle converges faster than the usual with-replacement version. We present the first (to our knowledge) non-asymptotic solution to this problem, which shows that after a "reasonable" number of epochs RandomShuffle indeed converges faster than SGD. Specifically, we pro…

    Submitted 7 October, 2019; v1 submitted 26 June, 2018; originally announced June 2018.

  10. arXiv:1806.02812  [pdf, other]

    math.OC cs.LG

    Towards Riemannian Accelerated Gradient Methods

    Authors: Hongyi Zhang, Suvrit Sra

    Abstract: We propose a Riemannian version of Nesterov's Accelerated Gradient algorithm (RAGD), and show that for geodesically smooth and strongly convex problems, within a neighborhood of the minimizer whose radius depends on the condition number as well as the sectional curvature of the manifold, RAGD converges to the minimizer with acceleration. Unlike the algorithm in (Liu et al., 2017) that requires the…

    Submitted 7 June, 2018; originally announced June 2018.

    Comments: Published in 31st Annual Conference on Learning Theory (COLT'18)

  11. arXiv:1805.00521  [pdf, other]

    math.OC cs.LG stat.ML

    Direct Runge-Kutta Discretization Achieves Acceleration

    Authors: Jingzhao Zhang, Aryan Mokhtari, Suvrit Sra, Ali Jadbabaie

    Abstract: We study gradient-based optimization methods obtained by directly discretizing a second-order ordinary differential equation (ODE) related to the continuous limit of Nesterov's accelerated gradient method. When the function is smooth enough, we show that acceleration can be achieved by a stable discretization of this ODE using standard Runge-Kutta integrators. Specifically, we prove that under Lip…

    Submitted 27 November, 2018; v1 submitted 1 May, 2018; originally announced May 2018.

    Comments: 24 pages, 4 figures

  12. arXiv:1803.11064  [pdf, other]

    cs.CV

    Non-Linear Temporal Subspace Representations for Activity Recognition

    Authors: Anoop Cherian, Suvrit Sra, Stephen Gould, Richard Hartley

    Abstract: Representations that can compactly and effectively capture the temporal evolution of semantic content are important to computer vision and machine learning algorithms that operate on multi-variate time-series data. We investigate such representations motivated by the task of human action recognition. Here each data instance is encoded by a multivariate feature (such as via a deep CNN) where action…

    Submitted 27 March, 2018; originally announced March 2018.

    Comments: Accepted at the IEEE International Conference on Computer Vision and Pattern Recognition, CVPR, 2018. arXiv admin note: substantial text overlap with arXiv:1705.08583

  13. arXiv:1803.10141  [pdf, other]

    math.OC math.CA

    New concavity and convexity results for symmetric polynomials and their ratios

    Authors: Suvrit Sra

    Abstract: We prove some "power" generalizations of Marcus-Lopes-style (including McLeod and Bullen) concavity inequalities for elementary symmetric polynomials, and convexity inequalities (of McLeod and Baston) for complete homogeneous symmetric polynomials. Finally, we present sundry concavity results for elementary symmetric polynomials, of which the main result is a concavity theorem that among other imp…

    Submitted 27 March, 2018; originally announced March 2018.

    Comments: 6 pages

  14. arXiv:1802.05649  [pdf, other]

    cs.LG

    Learning Determinantal Point Processes by Corrective Negative Sampling

    Authors: Zelda Mariet, Mike Gartrell, Suvrit Sra

    Abstract: Determinantal Point Processes (DPPs) have attracted significant interest from the machine-learning community due to their ability to elegantly and tractably model the delicate balance between quality and diversity of sets. DPPs are commonly learned from data using maximum likelihood estimation (MLE). While fitting observed sets well, MLE for DPPs may also assign high likelihoods to unobserved sets…

    Submitted 26 February, 2019; v1 submitted 15 February, 2018; originally announced February 2018.

    Comments: Will appear in AISTATS 2019

  15. arXiv:1802.03487  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Small nonlinearities in activation functions create bad local minima in neural networks

    Authors: Chulhee Yun, Suvrit Sra, Ali Jadbabaie

    Abstract: We investigate the loss surface of neural networks. We prove that even for one-hidden-layer networks with "slightest" nonlinearity, the empirical risks have spurious local minima in most cases. Our results thus indicate that in general "no spurious local minima" is a property limited to deep linear networks, and insights obtained from linear networks may not be robust. Specifically, for ReLU(-like…

    Submitted 28 May, 2019; v1 submitted 9 February, 2018; originally announced February 2018.

    Comments: 33 pages, appeared at ICLR 2019

  16. arXiv:1710.10770  [pdf, other]

    math.OC cs.LG math.FA

    Riemannian Optimization via Frank-Wolfe Methods

    Authors: Melanie Weber, Suvrit Sra

    Abstract: We study projection-free methods for constrained Riemannian optimization. In particular, we propose the Riemannian Frank-Wolfe (RFW) method. We analyze non-asymptotic convergence rates of RFW to an optimum for (geodesically) convex problems, and to a critical point for nonconvex objectives. We also present a practical setting under which RFW can attain a linear convergence rate. As a concrete exam…

    Submitted 24 November, 2021; v1 submitted 30 October, 2017; originally announced October 2017.

    Comments: Under Review. Updated version with new section on approximately solving the RLO

    MSC Class: 46N10; 15A24; 65K10; 49Q99

  17. arXiv:1709.01434  [pdf, other]

    cs.LG cs.AI

    A Generic Approach for Escaping Saddle points

    Authors: Sashank J Reddi, Manzil Zaheer, Suvrit Sra, Barnabas Poczos, Francis Bach, Ruslan Salakhutdinov, Alexander J Smola

    Abstract: A central challenge to using first-order methods for optimizing nonconvex problems is the presence of saddle points. First-order methods often get stuck at saddle points, greatly deteriorating their performance. Typically, to escape from saddles one has to use second-order methods. However, most works on second-order methods rely extensively on expensive Hessian-based computations, making them imp…

    Submitted 5 September, 2017; originally announced September 2017.

  18. arXiv:1707.02444  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Global optimality conditions for deep neural networks

    Authors: Chulhee Yun, Suvrit Sra, Ali Jadbabaie

    Abstract: We study the error landscape of deep linear and nonlinear neural networks with the squared error loss. Minimizing the loss of a deep linear neural network is a nonconvex problem, and despite recent progress, our understanding of this loss surface is still incomplete. For deep linear networks, we present necessary and sufficient conditions for a critical point of the risk function to be a global mi…

    Submitted 24 March, 2018; v1 submitted 8 July, 2017; originally announced July 2017.

    Comments: 14 pages. A camera-ready version that will appear at ICLR 2018

  19. arXiv:1706.09549  [pdf, other]

    cs.LG

    Distributional Adversarial Networks

    Authors: Chengtao Li, David Alvarez-Melis, Keyulu Xu, Stefanie Jegelka, Suvrit Sra

    Abstract: We propose a framework for adversarial training that relies on a sample rather than a single sample point as the fundamental unit of discrimination. Inspired by discrepancy measures and two-sample tests between probability distributions, we propose two such distributional adversaries that operate and predict on samples, and show how they can be easily implemented on top of existing models. Various…

    Submitted 9 July, 2017; v1 submitted 28 June, 2017; originally announced June 2017.

  20. arXiv:1706.03267  [pdf, other]

    stat.ML cs.LG

    An Alternative to EM for Gaussian Mixture Models: Batch and Stochastic Riemannian Optimization

    Authors: Reshad Hosseini, Suvrit Sra

    Abstract: We consider maximum likelihood estimation for Gaussian Mixture Models (GMMs). This task is almost invariably solved (in theory and practice) via the Expectation Maximization (EM) algorithm. EM owes its success to various factors, of which its ability to fulfill positive definiteness constraints in closed form is of key importance. We propose an alternative to EM by appealing to the rich Riemann…

    Submitted 10 June, 2017; originally announced June 2017.

    Comments: 21 pages, 6 figures

  21. arXiv:1705.09677  [pdf, ps, other]

    math.ST

    Elementary Symmetric Polynomials for Optimal Experimental Design

    Authors: Zelda Mariet, Suvrit Sra

    Abstract: We revisit the classical problem of optimal experimental design (OED) under a new mathematical model grounded in a geometric motivation. Specifically, we introduce models based on elementary symmetric polynomials; these polynomials capture "partial volumes" and offer a graded interpolation between the widely used A-optimal design and D-optimal design models, obtaining each of them as special cases…

    Submitted 24 May, 2017; originally announced May 2017.

  22. arXiv:1705.08583  [pdf, other]

    cs.CV

    Sequence Summarization Using Order-constrained Kernelized Feature Subspaces

    Authors: Anoop Cherian, Suvrit Sra, Richard Hartley

    Abstract: Representations that can compactly and effectively capture temporal evolution of semantic content are important to machine learning algorithms that operate on multi-variate time-series data. We investigate such representations motivated by the task of human action recognition. Here each data instance is encoded by a multivariate feature (such as via a deep CNN) where action dynamics are characteri…

    Submitted 23 May, 2017; originally announced May 2017.

  23. arXiv:1703.02674  [pdf, other]

    stat.ML

    Polynomial Time Algorithms for Dual Volume Sampling

    Authors: Chengtao Li, Stefanie Jegelka, Suvrit Sra

    Abstract: We study dual volume sampling, a method for selecting k columns from an n x m short and wide matrix (n <= k <= m) such that the probability of selection is proportional to the volume spanned by the rows of the induced submatrix. This method was proposed by Avron and Boutsidis (2013), who showed it to be a promising method for column subset selection and its multiple applications. However, its wide…

    Submitted 15 November, 2017; v1 submitted 7 March, 2017; originally announced March 2017.

  24. arXiv:1608.01008  [pdf, other]

    stat.ML

    Fast Mixing Markov Chains for Strongly Rayleigh Measures, DPPs, and Constrained Sampling

    Authors: Chengtao Li, Stefanie Jegelka, Suvrit Sra

    Abstract: We study probability measures induced by set functions with constraints. Such measures arise in a variety of real-world settings, where prior knowledge, resource limitations, or other pragmatic considerations impose constraints. We consider the task of rapidly sampling from such constrained measures, and develop fast Markov chain samplers for them. Our first main result is for MCMC sampling from S…

    Submitted 8 January, 2017; v1 submitted 2 August, 2016; originally announced August 2016.

    Comments: The present version subsumes arXiv:1607.03559

  25. arXiv:1607.08254  [pdf, other]

    math.OC cs.LG stat.ML

    Stochastic Frank-Wolfe Methods for Nonconvex Optimization

    Authors: Sashank J. Reddi, Suvrit Sra, Barnabas Poczos, Alex Smola

    Abstract: We study Frank-Wolfe methods for nonconvex stochastic and finite-sum optimization problems. Frank-Wolfe methods (in the convex case) have gained tremendous recent interest in machine learning and optimization communities due to their projection-free property and their ability to exploit structured constraints. However, our understanding of these algorithms in the nonconvex setting is fairly limite…

    Submitted 29 July, 2016; v1 submitted 27 July, 2016; originally announced July 2016.

  26. arXiv:1607.05002  [pdf, ps, other]

    stat.ML cs.LG

    Geometric Mean Metric Learning

    Authors: Pourya Habib Zadeh, Reshad Hosseini, Suvrit Sra

    Abstract: We revisit the task of learning a Euclidean metric from data. We approach this problem from first principles and formulate it as a surprisingly simple optimization problem. Indeed, our formulation even admits a closed form solution. This solution possesses several very attractive properties: (i) an innate geometric appeal through the Riemannian geometry of positive definite matrices; (ii) ease of…

    Submitted 18 July, 2016; originally announced July 2016.

    Comments: 7 pages, 4 figures

  27. arXiv:1607.03559  [pdf, other]

    cs.LG cs.DS math.PR stat.ML

    Fast Sampling for Strongly Rayleigh Measures with Application to Determinantal Point Processes

    Authors: Chengtao Li, Stefanie Jegelka, Suvrit Sra

    Abstract: In this note we consider sampling from (non-homogeneous) strongly Rayleigh probability measures. As an important corollary, we obtain a fast mixing Markov Chain sampler for Determinantal Point Processes.

    Submitted 12 July, 2016; originally announced July 2016.

  28. arXiv:1605.08374  [pdf, other]

    cs.LG cs.AI stat.ML

    Kronecker Determinantal Point Processes

    Authors: Zelda Mariet, Suvrit Sra

    Abstract: Determinantal Point Processes (DPPs) are probabilistic models over all subsets of a ground set of $N$ items. They have recently gained prominence in several applications that rely on "diverse" subsets. However, their applicability to large problems is still limited due to the $\mathcal O(N^3)$ complexity of core tasks such as sampling and learning. We enable efficient sampling and learning for DPPs b…

    Submitted 26 May, 2016; originally announced May 2016.

  29. arXiv:1605.07147  [pdf, other]

    math.OC cs.LG

    Riemannian SVRG: Fast Stochastic Optimization on Riemannian Manifolds

    Authors: Hongyi Zhang, Sashank J. Reddi, Suvrit Sra

    Abstract: We study optimization of finite sums of geodesically smooth functions on Riemannian manifolds. Although variance reduction techniques for optimizing finite-sums have witnessed tremendous attention in the recent years, existing work is limited to vector space problems. We introduce Riemannian SVRG (RSVRG), a new variance reduced Riemannian optimization method. We analyze RSVRG for both geodesically…

    Submitted 7 April, 2017; v1 submitted 23 May, 2016; originally announced May 2016.

    Comments: This is the final version that appeared in NIPS 2016. Our proof of Lemma 2 was incorrect in the previous arXiv version. (9 pages paper + 6 pages appendix)

    Journal ref: Advances in Neural Information Processing Systems 29 (NIPS 2016)

  30. arXiv:1605.06900  [pdf, other]

    math.OC cs.LG stat.ML

    Fast Stochastic Methods for Nonsmooth Nonconvex Optimization

    Authors: Sashank J. Reddi, Suvrit Sra, Barnabas Poczos, Alex Smola

    Abstract: We analyze stochastic algorithms for optimizing nonconvex, nonsmooth finite-sum problems, where the nonconvex part is smooth and the nonsmooth part is convex. Surprisingly, unlike the smooth case, our knowledge of this fundamental problem is very limited. For example, it is not known whether the proximal stochastic gradient method with constant minibatch converges to a stationary point. To tackle…

    Submitted 23 May, 2016; originally announced May 2016.

  31. arXiv:1605.00316  [pdf, other]

    stat.ML

    Directional Statistics in Machine Learning: a Brief Review

    Authors: Suvrit Sra

    Abstract: The modern data analyst must cope with data encoded in various forms: vectors, matrices, strings, graphs, or more. Consequently, statistical and machine learning models tailored to different data encodings are important. We focus on data encoded as normalized vectors, so that their "direction" is more important than their magnitude. Specifically, we consider high-dimensional vectors that lie eithe…

    Submitted 1 May, 2016; originally announced May 2016.

    Comments: 12 pages, slightly modified version of submitted book chapter

  32. arXiv:1604.02027  [pdf, other]

    cs.LG cs.CL stat.ML

    Combinatorial Topic Models using Small-Variance Asymptotics

    Authors: Ke Jiang, Suvrit Sra, Brian Kulis

    Abstract: Topic models have emerged as fundamental tools in unsupervised machine learning. Most modern topic modeling algorithms take a probabilistic view and derive inference algorithms based on Latent Dirichlet Allocation (LDA) or its variants. In contrast, we study topic modeling as a combinatorial optimization problem, and propose a new objective function derived from LDA by passing to the small-varianc…

    Submitted 26 May, 2016; v1 submitted 7 April, 2016; originally announced April 2016.

    Comments: 19 pages

  33. arXiv:1603.06160  [pdf, other]

    math.OC cs.LG cs.NE stat.ML

    Stochastic Variance Reduction for Nonconvex Optimization

    Authors: Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabas Poczos, Alex Smola

    Abstract: We study nonconvex finite-sum problems and analyze stochastic variance reduced gradient (SVRG) methods for them. SVRG and related methods have recently surged into prominence for convex optimization given their edge over stochastic gradient descent (SGD); but their theoretical analysis almost exclusively assumes convexity. In contrast, we prove non-asymptotic rates of convergence (to stationary po…

    Submitted 4 April, 2016; v1 submitted 19 March, 2016; originally announced March 2016.

    Comments: Minor feedback changes

  34. arXiv:1603.06159  [pdf, other]

    math.OC cs.LG stat.ML

    Fast Incremental Method for Nonconvex Optimization

    Authors: Sashank J. Reddi, Suvrit Sra, Barnabas Poczos, Alex Smola

    Abstract: We analyze a fast incremental aggregated gradient method for optimizing nonconvex problems of the form $\min_x \sum_i f_i(x)$. Specifically, we analyze the SAGA algorithm within an Incremental First-order Oracle framework, and show that it converges to a stationary point provably faster than both gradient descent and stochastic gradient descent. We also discuss Polyak's special class of nonconve…

    Submitted 19 March, 2016; originally announced March 2016.

  35. arXiv:1603.06052  [pdf, other]

    cs.LG

    Fast DPP Sampling for Nyström with Application to Kernel Methods

    Authors: Chengtao Li, Stefanie Jegelka, Suvrit Sra

    Abstract: The Nyström method has long been popular for scaling up kernel methods. Its theoretical guarantees and empirical performance rely critically on the quality of the landmarks selected. We study landmark selection for Nyström using Determinantal Point Processes (DPPs), discrete probability models that allow tractable generation of diverse samples. We prove that landmarks selected via DPPs guarantee b…

    Submitted 28 May, 2016; v1 submitted 19 March, 2016; originally announced March 2016.

  36. arXiv:1602.06053  [pdf, other]

    math.OC cs.LG stat.ML

    First-order Methods for Geodesically Convex Optimization

    Authors: Hongyi Zhang, Suvrit Sra

    Abstract: Geodesic convexity generalizes the notion of (vector space) convexity to nonlinear metric spaces. But unlike convex optimization, geodesically convex (g-convex) optimization is much less developed. In this paper we contribute to the understanding of g-convex optimization by developing iteration complexity analysis for several first-order algorithms on Hadamard manifolds. Specifically, we prove upp…

    Submitted 19 February, 2016; originally announced February 2016.

    Comments: 21 pages

  37. arXiv:1512.01904  [pdf, other]

    stat.ML math.NA

    Gauss quadrature for matrix inverse forms with applications

    Authors: Chengtao Li, Suvrit Sra, Stefanie Jegelka

    Abstract: We present a framework for accelerating a spectrum of machine learning algorithms that require computation of bilinear inverse forms $u^\top A^{-1}u$, where $A$ is a positive definite matrix and $u$ a given vector. Our framework is built on Gauss-type quadrature and easily scales to large, sparse matrices. Further, it allows retrospective computation of lower and upper bounds on $u^\top A^{-1}u$,…

    Submitted 28 May, 2016; v1 submitted 6 December, 2015; originally announced December 2015.

  38. arXiv:1511.05077  [pdf, other]

    cs.LG cs.NE

    Diversity Networks: Neural Network Compression Using Determinantal Point Processes

    Authors: Zelda Mariet, Suvrit Sra

    Abstract: We introduce Divnet, a flexible technique for learning networks with diverse neurons. Divnet models neuronal diversity by placing a Determinantal Point Process (DPP) over neurons in a given layer. It uses this DPP to select a subset of diverse neurons and subsequently fuses the redundant neurons into the selected ones. Compared with previous approaches, Divnet offers a more principled, flexible te…

    Submitted 18 April, 2017; v1 submitted 16 November, 2015; originally announced November 2015.

    Comments: This paper appeared under the shorter title Diversity Networks at ICLR 2016 (http://www.iclr.cc/doku.php?id=iclr2016:main#accepted_papers_conference_track)

  39. arXiv:1509.05902  [pdf, other]

    math.CA quant-ph

    Logarithmic inequalities under an elementary symmetric polynomial dominance order

    Authors: Suvrit Sra

    Abstract: We consider a dominance order on positive vectors induced by the elementary symmetric polynomials. Under this dominance order we provide conditions that yield simple proofs of several monotonicity questions. Notably, our approach yields a quick (4-line) proof of the so-called "sum-of-squared-logarithms" inequality conjectured in (P. Neff, B. Eidel, F. Osterbrink, and R. Martin, Applie…

    Submitted 22 June, 2017; v1 submitted 19 September, 2015; originally announced September 2015.

    Comments: 6 pages; updated typesetting, some minor bugfixes

  40. arXiv:1509.02447  [pdf, other]

    eess.SY math.OC

    Efficient Structured Matrix Rank Minimization

    Authors: Adams Wei Yu, Wanli Ma, Yaoliang Yu, Jaime G. Carbonell, Suvrit Sra

    Abstract: We study the problem of finding structured low-rank matrices using nuclear norm regularization where the structure is encoded by a linear map. In contrast to most known approaches for linearly structured rank minimization, we do not (a) use the full SVD, nor (b) resort to augmented Lagrangian techniques, nor (c) solve linear systems per iteration. Instead, we formulate the problem differently so t…

    Submitted 8 September, 2015; originally announced September 2015.

  41. arXiv:1509.01618  [pdf, other]

    cs.LG

    Efficient Sampling for k-Determinantal Point Processes

    Authors: Chengtao Li, Stefanie Jegelka, Suvrit Sra

    Abstract: Determinantal Point Processes (DPPs) are elegant probabilistic models of repulsion and diversity over discrete sets of items. But their applicability to large sets is hindered by expensive cubic-complexity matrix operations for basic tasks such as sampling. In light of this, we propose a new method for approximate sampling from discrete $k$-DPPs. Our method takes advantage of the diversity propert…

    Submitted 27 May, 2016; v1 submitted 4 September, 2015; originally announced September 2015.

  42. arXiv:1508.05003  [pdf, other]

    stat.ML cs.LG math.OC

    AdaDelay: Delay Adaptive Distributed Stochastic Convex Optimization

    Authors: Suvrit Sra, Adams Wei Yu, Mu Li, Alexander J. Smola

    Abstract: We study distributed stochastic convex optimization under the delayed gradient model where the server nodes perform parameter updates, while the worker nodes compute stochastic gradients. We discuss, analyze, and experiment with a setup motivated by the behavior of real-world distributed computation networks, where the machines are differently slow at different times. Therefore, we allow the parame…

    Submitted 20 August, 2015; originally announced August 2015.

    Comments: 19 pages

  43. arXiv:1508.04039  [pdf, ps, other]

    math.CA

    The sum of squared logarithms inequality in arbitrary dimensions

    Authors: Lev Borisov, Patrizio Neff, Suvrit Sra, Christian Thiel

    Abstract: We prove the sum of squared logarithms inequality (SSLI) which states that for nonnegative vectors $x, y \in \mathbb{R}^n$ whose elementary symmetric polynomials satisfy $e_k(x)\le e_k(y)$ (for $1\le k < n$) and $e_n(x)=e_n(y)$, the inequality $\sum_i (\log x_i)^2 \le \sum_i (\log y_i)^2$ holds. Our proof of this inequality follows by a suitable extension to the complex plane. In particular…

    Submitted 2 November, 2015; v1 submitted 17 August, 2015; originally announced August 2015.

    MSC Class: 26D05; 26D07; 30C15; 97H20

  44. arXiv:1508.00792  [pdf, other]

    cs.LG

    Fixed-point algorithms for learning determinantal point processes

    Authors: Zelda Mariet, Suvrit Sra

    Abstract: Determinantal point processes (DPPs) offer an elegant tool for encoding probabilities over subsets of a ground set. Discrete DPPs are parametrized by a positive semidefinite matrix (called the DPP kernel), and estimating this kernel is key to learning DPPs from observed data. We consider the task of learning the DPP kernel, and develop for it a surprisingly simple yet effective new algorithm. Our…

    Submitted 8 October, 2015; v1 submitted 4 August, 2015; originally announced August 2015.

    Comments: ICML, 2015

  45. arXiv:1507.08366  [pdf, other]

    math.NA math.OC

    On the matrix square root via geometric optimization

    Authors: Suvrit Sra

    Abstract: This paper is triggered by the preprint "\emph{Computing Matrix Squareroot via Non Convex Local Search}" by Jain et al. (arXiv:1507.05854), which analyzes gradient descent for computing the square root of a positive definite matrix. Contrary to claims of Jain et al., our experiments reveal that Newton-like methods compute matrix square roots rapidly and reliably, ev…

    Submitted 16 December, 2015; v1 submitted 29 July, 2015; originally announced July 2015.

    Comments: 8 pages, 12 plots, this version contains several more references and more words about the rank-deficient case

  46. arXiv:1507.02772  [pdf, ps, other]

    cs.CV

    Riemannian Dictionary Learning and Sparse Coding for Positive Definite Matrices

    Authors: Anoop Cherian, Suvrit Sra

    Abstract: Data encoded as symmetric positive definite (SPD) matrices frequently arise in many areas of computer vision and machine learning. While these matrices form an open subset of the Euclidean space of symmetric matrices, viewing them through the lens of non-Euclidean Riemannian geometry often turns out to be better suited to capturing several desirable data properties. However, formulating classical…

    Submitted 16 December, 2015; v1 submitted 9 July, 2015; originally announced July 2015.

  47. arXiv:1506.07677  [pdf, other]

    stat.ML cs.LG math.OC

    Manifold Optimization for Gaussian Mixture Models

    Authors: Reshad Hosseini, Suvrit Sra

    Abstract: We take a new look at parameter estimation for Gaussian Mixture Models (GMMs). In particular, we propose using \emph{Riemannian manifold optimization} as a powerful counterpart to Expectation Maximization (EM). An out-of-the-box invocation of manifold optimization, however, fails spectacularly: it converges to the same solution but vastly slower. Driven by intuition from manifold convexity, we the…

    Submitted 25 June, 2015; originally announced June 2015.

    Comments: 19 pages

  48. arXiv:1506.06840  [pdf, other]

    cs.LG stat.ML

    On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants

    Authors: Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabás Póczos, Alex Smola

    Abstract: We study optimization algorithms based on variance reduction for stochastic gradient descent (SGD). Remarkable recent progress has been made in this direction through development of algorithms like SAG, SVRG, SAGA. These algorithms have been shown to outperform SGD, both theoretically and empirically. However, asynchronous versions of these algorithms---a crucial requirement for modern large-scale…

    Submitted 24 January, 2016; v1 submitted 22 June, 2015; originally announced June 2015.

  49. arXiv:1503.01563  [pdf, other]

    cs.CV math.OC

    Convex Optimization for Parallel Energy Minimization

    Authors: K. S. Sesh Kumar, Alvaro Barbero, Stefanie Jegelka, Suvrit Sra, Francis Bach

    Abstract: Energy minimization has been an intensely studied core problem in computer vision. With growing image sizes (2D and 3D), it is now highly desirable to run energy minimization algorithms in parallel. But many existing algorithms, in particular, some efficient combinatorial algorithms, are difficult to parallelize. By exploiting results from convex and submodular theory, we reformulate the quadrati…

    Submitted 5 March, 2015; originally announced March 2015.

  50. On inequalities for normalized Schur functions

    Authors: Suvrit Sra

    Abstract: We prove a conjecture of Cuttler et al. [2011] [A. Cuttler, C. Greene, and M. Skandera; \emph{Inequalities for symmetric means}. European J. Combinatorics, 32(2011), 745--761] on the monotonicity of \emph{normalized Schur functions} under the usual (dominance) partial-order on partitions. We believe that our proof technique may be helpful in obtaining similar inequalities for other symmetric funct…

    Submitted 20 July, 2015; v1 submitted 16 February, 2015; originally announced February 2015.

    Comments: This version fixes the error of the previous one