
Showing 1–50 of 57 results for author: Suvrit

Searching in archive stat.
  1. arXiv:2402.10357  [pdf, other]

    math.ST cs.LG math.PR stat.CO stat.ML

    Efficient Sampling on Riemannian Manifolds via Langevin MCMC

    Authors: Xiang Cheng, Jingzhao Zhang, Suvrit Sra

    Abstract: We study the task of efficiently sampling from a Gibbs distribution $d\pi^* = e^{-h}\, d\mathrm{vol}_g$ over a Riemannian manifold $M$ via (geometric) Langevin MCMC; this algorithm involves computing exponential maps in random Gaussian directions and is efficiently implementable in practice. The key to our analysis of Langevin MCMC is a bound on the discretization error of the geometric Euler-Maruyama sche…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: This is an old paper from NeurIPS 2022. arXiv admin note: text overlap with arXiv:2204.13665
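    The abstract describes Langevin MCMC as taking exponential maps in random Gaussian directions. As a minimal illustrative sketch, assuming the unit sphere as the manifold and the plain geometric Euler-Maruyama step x <- Exp_x(-step * grad h(x) + sqrt(2 * step) * xi), with xi a standard Gaussian in the tangent space at x, the iteration might look like this. The function names and the von Mises-Fisher test target are hypothetical choices for the example, not the paper's code.

```python
import math
import random


def project_tangent(x, v):
    """Project v onto the tangent space of the unit sphere at x: v - <x, v> x."""
    dot = sum(xi * vi for xi, vi in zip(x, v))
    return [vi - dot * xi for xi, vi in zip(x, v)]


def exp_map_sphere(x, v):
    """Exponential map of the unit sphere: cos(|v|) x + sin(|v|) v / |v|."""
    norm = math.sqrt(sum(vi * vi for vi in v))
    if norm < 1e-12:
        return list(x)
    return [math.cos(norm) * xi + math.sin(norm) * vi / norm for xi, vi in zip(x, v)]


def langevin_sphere(grad_h, x0, step, n_steps, rng):
    """Geometric Euler-Maruyama sketch targeting exp(-h) d vol on the unit sphere.

    Each step follows the geodesic in the direction
    -step * grad h(x) + sqrt(2 * step) * xi, with xi a tangent Gaussian at x.
    """
    x = list(x0)
    samples = []
    for _ in range(n_steps):
        # Riemannian gradient = Euclidean gradient projected onto the tangent space.
        rgrad = project_tangent(x, grad_h(x))
        # Projecting an ambient N(0, I) vector gives a standard Gaussian on the tangent space.
        xi = project_tangent(x, [rng.gauss(0.0, 1.0) for _ in x])
        v = [-step * g + math.sqrt(2.0 * step) * z for g, z in zip(rgrad, xi)]
        x = exp_map_sphere(x, v)
        samples.append(x)
    return samples


# Hypothetical target: h(x) = -kappa * x[2], a von Mises-Fisher density tilted toward +e3.
kappa = 5.0
rng = random.Random(0)
samples = langevin_sphere(lambda x: [0.0, 0.0, -kappa], [1.0, 0.0, 0.0], 0.01, 5000, rng)
mean_z = sum(s[2] for s in samples[2000:]) / len(samples[2000:])
```

    The sphere is convenient because tangent projection and the exponential map have closed forms; on a general manifold both would need their own implementations.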

  2. arXiv:2305.15287  [pdf, other]

    cs.LG cs.AI stat.ML

    The Crucial Role of Normalization in Sharpness-Aware Minimization

    Authors: Yan Dai, Kwangjun Ahn, Suvrit Sra

    Abstract: Sharpness-Aware Minimization (SAM) is a recently proposed gradient-based optimizer (Foret et al., ICLR 2021) that greatly improves the prediction performance of deep neural networks. Consequently, there has been a surge of interest in explaining its empirical success. We focus, in particular, on understanding the role played by normalization, a key component of the SAM updates. We theoretically an…

    Submitted 23 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: 30 pages, Published in 37th Neural Information Processing Systems (NeurIPS 2023)
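    For reference, the SAM update of Foret et al. perturbs the weights along the normalized gradient before taking a descent step. The sketch below, on plain Python lists, contrasts that normalized perturbation with an unnormalized one; the function `sam_step`, its parameters, and the toy quadratic are illustrative assumptions rather than the paper's implementation.

```python
import math


def sam_step(w, grad_fn, lr, rho, normalize=True, eps=1e-12):
    """One SAM-style update on a list of parameters.

    normalize=True : perturbation rho * g / ||g||  (normalized ascent step)
    normalize=False: perturbation rho * g          (unnormalized variant)
    """
    g = grad_fn(w)
    if normalize:
        scale = rho / (math.sqrt(sum(gi * gi for gi in g)) + eps)
    else:
        scale = rho
    # Ascent step to an approximate worst-case point near w.
    w_adv = [wi + scale * gi for wi, gi in zip(w, g)]
    # Descend from w using the gradient evaluated at the perturbed point.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def grad(w):
    """Gradient of the toy loss L(w) = 0.5 * ||w||^2, which is w itself."""
    return list(w)


w_sam = sam_step([2.0, 0.0], grad, lr=0.1, rho=0.5)                   # approximately [1.75, 0.0]
w_usam = sam_step([2.0, 0.0], grad, lr=0.1, rho=0.5, normalize=False)  # approximately [1.7, 0.0]
```

    With normalization the perturbation length is always rho, regardless of how large or small the gradient is; without it, the perturbation shrinks as the gradient vanishes.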

  3. arXiv:2212.14511  [pdf, other]

    cs.LG eess.SY math.OC stat.ML

    Can Direct Latent Model Learning Solve Linear Quadratic Gaussian Control?

    Authors: Yi Tian, Kaiqing Zhang, Russ Tedrake, Suvrit Sra

    Abstract: We study the task of learning state representations from potentially high-dimensional observations, with the goal of controlling an unknown partially observable system. We pursue a direct latent model learning approach, where a dynamic model in some latent state space is learned by predicting quantities directly related to planning (e.g., costs) without reconstructing the observations. In particul…

    Submitted 13 March, 2024; v1 submitted 29 December, 2022; originally announced December 2022.

    Comments: 37 pages; Updated structure and proofs

  4. arXiv:2202.13013  [pdf, other]

    cs.LG stat.ML

    Sign and Basis Invariant Networks for Spectral Graph Representation Learning

    Authors: Derek Lim, Joshua Robinson, Lingxiao Zhao, Tess Smidt, Suvrit Sra, Haggai Maron, Stefanie Jegelka

    Abstract: We introduce SignNet and BasisNet, new neural architectures that are invariant to two key symmetries displayed by eigenvectors: (i) sign flips, since if $v$ is an eigenvector then so is $-v$; and (ii) more general basis symmetries, which occur in higher dimensional eigenspaces with infinitely many choices of basis eigenvectors. We prove that under certain conditions our networks are universal, i…

    Submitted 30 September, 2022; v1 submitted 25 February, 2022; originally announced February 2022.

    Comments: 42 pages
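    A minimal sketch of the sign-symmetrization idea in the abstract: applying an encoder to both v and -v and summing the results makes the output independent of each eigenvector's sign. The encoder `phi`, the readout `rho`, their fixed weights, and the example unit vectors are hypothetical stand-ins for learned networks and real eigenvectors; the basis-invariant BasisNet construction is not shown.

```python
import math

# Hypothetical fixed weights standing in for a learned encoder phi (3 inputs, 2 outputs).
WEIGHTS = [[0.5, -1.0, 0.3], [1.2, 0.4, -0.7]]
BIAS = 0.1


def phi(v):
    """Toy per-eigenvector encoder; the bias makes it NOT sign invariant on its own."""
    return [math.tanh(sum(w * x for w, x in zip(row, v)) + BIAS) for row in WEIGHTS]


def sign_invariant_features(eigvecs):
    """Concatenate phi(v) + phi(-v) for each v, so flipping the sign of any v changes nothing."""
    features = []
    for v in eigvecs:
        pos = phi(v)
        neg = phi([-x for x in v])
        features.extend(p + n for p, n in zip(pos, neg))
    return features


def rho(features):
    """Hypothetical readout applied after symmetrization (here just a sum)."""
    return sum(features)


eigvecs = [[0.6, 0.8, 0.0], [0.0, 0.6, -0.8]]
flipped = [[-0.6, -0.8, 0.0], [0.0, 0.6, -0.8]]  # first vector's sign flipped
out = rho(sign_invariant_features(eigvecs))
out_flipped = rho(sign_invariant_features(flipped))
```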

  5. arXiv:2202.06950  [pdf, other]

    math.OC cs.LG stat.ML

    Sion's Minimax Theorem in Geodesic Metric Spaces and a Riemannian Extragradient Algorithm

    Authors: Peiyuan Zhang, Jingzhao Zhang, Suvrit Sra

    Abstract: Deciding whether saddle points exist or are approximable for nonconvex-nonconcave problems is usually intractable. This paper takes a step towards understanding a broad class of nonconvex-nonconcave minimax problems that do remain tractable. Specifically, it studies minimax problems over geodesic metric spaces, which provide a vast generalization of the usual convex-concave saddle point problems.…

    Submitted 28 May, 2023; v1 submitted 13 February, 2022; originally announced February 2022.

    Comments: 23 pages, 3 figures

  6. arXiv:2112.14862  [pdf, ps, other]

    math.ST math.OC stat.ML

    Time varying regression with hidden linear dynamics

    Authors: Ali Jadbabaie, Horia Mania, Devavrat Shah, Suvrit Sra

    Abstract: We revisit a model for time-varying linear regression that assumes the unknown parameters evolve according to a linear dynamical system. Counterintuitively, we show that when the underlying dynamics are stable the parameters of this model can be estimated from data by combining just two ordinary least squares estimates. We offer a finite sample guarantee on the estimation error of our method and d…

    Submitted 29 December, 2021; originally announced December 2021.

    Comments: 22 pages

  7. arXiv:2111.02763  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Understanding Riemannian Acceleration via a Proximal Extragradient Framework

    Authors: Jikai Jin, Suvrit Sra

    Abstract: We contribute to advancing the understanding of Riemannian accelerated gradient methods. In particular, we revisit Accelerated Hybrid Proximal Extragradient (A-HPE), a powerful framework for obtaining Euclidean accelerated methods \citep{monteiro2013accelerated}. Building on A-HPE, we then propose and analyze Riemannian A-HPE. The core of our analysis consists of two key components: (i) a set of ne…

    Submitted 9 February, 2022; v1 submitted 4 November, 2021; originally announced November 2021.

  8. arXiv:2110.10342  [pdf, other]

    cs.LG math.OC stat.ML

    Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond

    Authors: Chulhee Yun, Shashank Rajput, Suvrit Sra

    Abstract: In distributed learning, local SGD (also known as federated averaging) and its simple baseline minibatch SGD are widely studied optimization methods. Most existing analyses of these methods assume independent and unbiased gradient estimates obtained via with-replacement sampling. In contrast, we study shuffling-based variants: minibatch and local Random Reshuffling, which draw stochastic gradients…

    Submitted 23 March, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

    Comments: ICLR 2022 camera-ready (selected for an oral presentation); 76 pages, 3 figures

  9. arXiv:2110.06256  [pdf, other]

    cs.LG math.OC stat.ML

    Neural Network Weights Do Not Converge to Stationary Points: An Invariant Measure Perspective

    Authors: Jingzhao Zhang, Haochuan Li, Suvrit Sra, Ali Jadbabaie

    Abstract: This work examines the deep disconnect between existing theoretical analyses of gradient-based algorithms and the practice of training deep neural networks. Specifically, we provide numerical evidence that in large-scale neural network training (e.g., ImageNet + ResNet101, and WT103 + TransformerXL models), the neural network's weights do not converge to stationary points where the gradient of the…

    Submitted 17 June, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

    Journal ref: ICML 2022

  10. arXiv:2012.15483  [pdf, other]

    cs.LG stat.ML

    Why do classifier accuracies show linear trends under distribution shift?

    Authors: Horia Mania, Suvrit Sra

    Abstract: Recent studies of generalization in deep learning have observed a puzzling trend: accuracies of models on one data distribution are approximately linear functions of the accuracies on another distribution. We explain this trend under an intuitive assumption on model similarity, which was verified empirically in prior work. More precisely, we assume the probability that two models agree in their pr…

    Submitted 22 February, 2021; v1 submitted 31 December, 2020; originally announced December 2020.

    Comments: 18 pages, 13 figures

  11. arXiv:2010.15020  [pdf, other]

    cs.LG stat.ML

    Online Learning in Unknown Markov Games

    Authors: Yi Tian, Yuanhao Wang, Tiancheng Yu, Suvrit Sra

    Abstract: We study online learning in unknown Markov games, a problem that arises in episodic multi-agent reinforcement learning where the actions of the opponents are unobservable. We show that in this challenging setting, achieving sublinear regret against the best response in hindsight is statistically hard. We then consider a weaker notion of regret by competing with the minimax value of the game…

    Submitted 6 February, 2021; v1 submitted 28 October, 2020; originally announced October 2020.

    Comments: 25 pages

  12. arXiv:2010.04592  [pdf, other]

    cs.LG stat.ML

    Contrastive Learning with Hard Negative Samples

    Authors: Joshua Robinson, Ching-Yao Chuang, Suvrit Sra, Stefanie Jegelka

    Abstract: How can you sample good negative examples for contrastive learning? We argue that, as with metric learning, contrastive learning of representations benefits from hard negative samples (i.e., points that are difficult to distinguish from an anchor point). The key challenge toward using hard negatives is that contrastive methods must remain unsupervised, making it infeasible to adopt existing negati…

    Submitted 24 January, 2021; v1 submitted 9 October, 2020; originally announced October 2020.

    Comments: Published as a conference paper at ICLR 2021

  13. arXiv:2006.13405  [pdf, other]

    cs.LG math.OC stat.ML

    Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes

    Authors: Yi Tian, Jian Qian, Suvrit Sra

    Abstract: We study minimax optimal reinforcement learning in episodic factored Markov decision processes (FMDPs), which are MDPs with conditionally independent transition components. Assuming the factorization is known, we propose two model-based algorithms. The first one achieves minimax optimal regret guarantees for a rich class of factored structures, while the second one enjoys better computational comp…

    Submitted 23 June, 2020; originally announced June 2020.

    Comments: 54 pages

  14. arXiv:2006.06946  [pdf, other]

    math.OC stat.ML

    SGD with shuffling: optimal rates without component convexity and large epoch requirements

    Authors: Kwangjun Ahn, Chulhee Yun, Suvrit Sra

    Abstract: We study without-replacement SGD for solving finite-sum optimization problems. Specifically, depending on how the indices of the finite-sum are shuffled, we consider the RandomShuffle (shuffle at the beginning of each epoch) and SingleShuffle (shuffle only once) algorithms. First, we establish minimax optimal convergence rates of these algorithms up to poly-log factors. Notably, our analysis is ge…

    Submitted 21 June, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

    Comments: 53 pages; supersedes the preprint arXiv:2004.08657; v2 corrects an erroneous claim about SingleShuffle and newly adds Theorem 24 and Appendix F for SingleShuffle
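    The two index orders named in the abstract (RandomShuffle reshuffles at the start of each epoch; SingleShuffle shuffles only once) can be sketched as below, alongside the with-replacement SGD baseline, and plugged into SGD on a hypothetical one-dimensional least-squares finite sum. The helper names and the toy objective are assumptions for illustration only.

```python
import random


def epoch_orders(n, n_epochs, scheme, rng):
    """Yield the component indices visited in each epoch.

    "random_shuffle":   a fresh permutation at the start of every epoch.
    "single_shuffle":   one permutation drawn once and reused every epoch.
    "with_replacement": n i.i.d. uniform indices per epoch (plain SGD baseline).
    """
    if scheme not in ("random_shuffle", "single_shuffle", "with_replacement"):
        raise ValueError(f"unknown scheme: {scheme}")
    fixed = list(range(n))
    rng.shuffle(fixed)
    for _ in range(n_epochs):
        if scheme == "random_shuffle":
            perm = list(range(n))
            rng.shuffle(perm)
            yield perm
        elif scheme == "single_shuffle":
            yield list(fixed)
        else:
            yield [rng.randrange(n) for _ in range(n)]


def sgd_finite_sum(b, lr, n_epochs, scheme, seed=0):
    """Minimize (1/n) * sum_i 0.5 * (x - b_i)^2, whose minimizer is mean(b)."""
    rng = random.Random(seed)
    x = 0.0
    for order in epoch_orders(len(b), n_epochs, scheme, rng):
        for i in order:
            x -= lr * (x - b[i])  # gradient of the i-th component at x
    return x


b = [1.0, 2.0, 3.0, 4.0, 5.0]
x_rr = sgd_finite_sum(b, lr=0.01, n_epochs=200, scheme="random_shuffle")
x_ss = sgd_finite_sum(b, lr=0.01, n_epochs=200, scheme="single_shuffle")
```

    With a constant step size, both shuffling schemes settle near mean(b) = 3 rather than exactly at it.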

  15. arXiv:2002.08483  [pdf, other]

    cs.LG stat.ML

    Strength from Weakness: Fast Learning Using Weak Supervision

    Authors: Joshua Robinson, Stefanie Jegelka, Suvrit Sra

    Abstract: We study generalization properties of weakly supervised learning. That is, learning where only a few "strong" labels (the actual target of our prediction) are present but many more "weak" labels are available. In particular, we show that having access to weak labels can significantly accelerate the learning rate for the strong task to the fast rate of $\mathcal{O}(1/n)$, where $n$ denotes…

    Submitted 19 February, 2020; originally announced February 2020.

    Comments: 21 pages, 8 figures

  16. arXiv:2001.08876  [pdf, other]

    math.OC stat.ML

    From Nesterov's Estimate Sequence to Riemannian Acceleration

    Authors: Kwangjun Ahn, Suvrit Sra

    Abstract: We propose the first global accelerated gradient method for Riemannian manifolds. Toward establishing our result we revisit Nesterov's estimate sequence technique and develop an alternative analysis for it that may also be of independent interest. Then, we extend this analysis to the Riemannian setting, localizing the key difficulty due to non-Euclidean structure into a certain "metric distortion…

    Submitted 23 January, 2020; originally announced January 2020.

    Comments: 30 pages

  17. arXiv:1912.01192  [pdf, ps, other]

    cs.LG stat.ML

    Learning Adversarial MDPs with Bandit Feedback and Unknown Transition

    Authors: Chi Jin, Tiancheng Jin, Haipeng Luo, Suvrit Sra, Tiancheng Yu

    Abstract: We consider the problem of learning in episodic finite-horizon Markov decision processes with an unknown transition function, bandit feedback, and adversarial losses. We propose an efficient algorithm that achieves $\tilde{\mathcal{O}}(L|X|\sqrt{|A|T})$ regret with high probability, where $L$ is the horizon, $|X|$ is the number of states, $|A|$ is the number of actions, and $T$ is the number of ep…

    Submitted 2 November, 2020; v1 submitted 3 December, 2019; originally announced December 2019.

    Comments: Fix a bug

    MSC Class: I.2.6; ACM Class: I.2.6

  18. arXiv:1907.09350   

    cs.LG stat.ML

    Efficient Policy Learning for Non-Stationary MDPs under Adversarial Manipulation

    Authors: Tiancheng Yu, Suvrit Sra

    Abstract: A Markov Decision Process (MDP) is a popular model for reinforcement learning. However, its commonly used assumption of stationary dynamics and rewards is too stringent and fails to hold in adversarial, nonstationary, or multi-agent problems. We study an episodic setting where the parameters of an MDP can differ across episodes. We learn a reliable policy of this potentially adversarial MDP by dev…

    Submitted 21 August, 2019; v1 submitted 22 July, 2019; originally announced July 2019.

    Comments: There is a problem in Theorem 1. We will try to fix it and post an updated version.

  19. arXiv:1907.03922  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Are deep ResNets provably better than linear predictors?

    Authors: Chulhee Yun, Suvrit Sra, Ali Jadbabaie

    Abstract: Recent results in the literature indicate that a residual network (ResNet) composed of a single residual block outperforms linear predictors, in the sense that all local minima in its optimization landscape are at least as good as the best linear predictor. However, these results are limited to a single residual block (i.e., shallow ResNets), instead of the deep ResNets composed of multiple residu…

    Submitted 29 October, 2019; v1 submitted 8 July, 2019; originally announced July 2019.

    Comments: 15 pages. NeurIPS 2019 Camera-ready version

  20. arXiv:1906.11289   

    cs.LG stat.ML

    Near Optimal Stratified Sampling

    Authors: Tiancheng Yu, Xiyu Zhai, Suvrit Sra

    Abstract: The performance of a machine learning system is usually evaluated by using i.i.d. observations with true labels. However, acquiring ground truth labels is expensive, while obtaining unlabeled samples may be cheaper. Stratified sampling can be beneficial in such settings and can reduce the number of true labels required without compromising the evaluation accuracy. Stratified sampling exploits sta…

    Submitted 26 July, 2019; v1 submitted 26 June, 2019; originally announced June 2019.

    Comments: We have discovered a mistake in the main result. The quantity on the RHS of (3) is not equal to the variance of estimator (2) when the sampling rule is designed adaptively as we do. There are further cross-product terms, which are now the dominant terms. Therefore, although our bound is correct for (3), it no longer implies a bound on the variance of (2).

  21. arXiv:1906.05413  [pdf, other]

    cs.LG stat.ML

    Flexible Modeling of Diversity with Strongly Log-Concave Distributions

    Authors: Joshua Robinson, Suvrit Sra, Stefanie Jegelka

    Abstract: Strongly log-concave (SLC) distributions are a rich class of discrete probability distributions over subsets of some ground set. They are strictly more general than strongly Rayleigh (SR) distributions such as the well-known determinantal point process. While SR distributions offer elegant models of diversity, they lack an easy control over how they express diversity. We propose SLC as the right e…

    Submitted 12 June, 2019; originally announced June 2019.

  22. arXiv:1901.09149  [pdf, other]

    cs.LG math.OC stat.ML

    Escaping Saddle Points with Adaptive Gradient Methods

    Authors: Matthew Staib, Sashank J. Reddi, Satyen Kale, Sanjiv Kumar, Suvrit Sra

    Abstract: Adaptive methods such as Adam and RMSProp are widely used in deep learning but are not well understood. In this paper, we seek a crisp, clean and precise characterization of their behavior in nonconvex settings. To this end, we first provide a novel view of adaptive methods as preconditioned SGD, where the preconditioner is estimated in an online manner. By studying the preconditioner on its own,…

    Submitted 3 February, 2020; v1 submitted 25 January, 2019; originally announced January 2019.

    Comments: Update Theorem 4.1 and proof to use martingale concentration bounds, i.e. matrix Freedman

  23. arXiv:1812.03190  [pdf, other]

    cs.LG stat.ML

    Deep-RBF Networks Revisited: Robust Classification with Rejection

    Authors: Pourya Habib Zadeh, Reshad Hosseini, Suvrit Sra

    Abstract: One of the main drawbacks of deep neural networks, like many other classifiers, is their vulnerability to adversarial attacks. An important reason for their vulnerability is assigning high confidence to regions with few or even no feature points. By feature points, we mean a nonlinear transformation of the input space extracting a meaningful representation of the input data. On the other hand, dee…

    Submitted 7 December, 2018; originally announced December 2018.

  24. arXiv:1810.07770  [pdf, ps, other]

    cs.LG stat.ML

    Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity

    Authors: Chulhee Yun, Suvrit Sra, Ali Jadbabaie

    Abstract: We study finite sample expressivity, i.e., memorization power of ReLU networks. Recent results require $N$ hidden nodes to memorize/interpolate arbitrary $N$ data points. In contrast, by exploiting depth, we show that 3-layer ReLU networks with $\Omega(\sqrt{N})$ hidden nodes can perfectly memorize most datasets with $N$ points. We also prove that width $\Theta(\sqrt{N})$ is necessary and sufficient for mem…

    Submitted 29 October, 2019; v1 submitted 17 October, 2018; originally announced October 2018.

    Comments: 28 pages, 2 figures. NeurIPS 2019 Camera-ready version

  25. arXiv:1809.10858  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Efficiently testing local optimality and escaping saddles for ReLU networks

    Authors: Chulhee Yun, Suvrit Sra, Ali Jadbabaie

    Abstract: We provide a theoretical algorithm for checking local optimality and escaping saddles at nondifferentiable points of empirical risks of two-layer ReLU networks. Our algorithm receives any parameter value and returns: local minimum, second-order stationary point, or a strict descent direction. The presence of $M$ data points on the nondifferentiability of the ReLU divides the parameter space into a…

    Submitted 28 May, 2019; v1 submitted 28 September, 2018; originally announced September 2018.

    Comments: 23 pages, appeared at ICLR 2019

  26. arXiv:1806.10077  [pdf, other]

    math.OC stat.ML

    Random Shuffling Beats SGD after Finite Epochs

    Authors: Jeff Z. HaoChen, Suvrit Sra

    Abstract: A long-standing problem in the theory of stochastic gradient descent (SGD) is to prove that its without-replacement version RandomShuffle converges faster than the usual with-replacement version. We present the first (to our knowledge) non-asymptotic solution to this problem, which shows that after a "reasonable" number of epochs RandomShuffle indeed converges faster than SGD. Specifically, we pro…

    Submitted 7 October, 2019; v1 submitted 26 June, 2018; originally announced June 2018.

  27. arXiv:1805.00521  [pdf, other]

    math.OC cs.LG stat.ML

    Direct Runge-Kutta Discretization Achieves Acceleration

    Authors: Jingzhao Zhang, Aryan Mokhtari, Suvrit Sra, Ali Jadbabaie

    Abstract: We study gradient-based optimization methods obtained by directly discretizing a second-order ordinary differential equation (ODE) related to the continuous limit of Nesterov's accelerated gradient method. When the function is smooth enough, we show that acceleration can be achieved by a stable discretization of this ODE using standard Runge-Kutta integrators. Specifically, we prove that under Lip…

    Submitted 27 November, 2018; v1 submitted 1 May, 2018; originally announced May 2018.

    Comments: 24 pages. 4 figures

  28. arXiv:1802.03487  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Small nonlinearities in activation functions create bad local minima in neural networks

    Authors: Chulhee Yun, Suvrit Sra, Ali Jadbabaie

    Abstract: We investigate the loss surface of neural networks. We prove that even for one-hidden-layer networks with "slightest" nonlinearity, the empirical risks have spurious local minima in most cases. Our results thus indicate that in general "no spurious local minima" is a property limited to deep linear networks, and insights obtained from linear networks may not be robust. Specifically, for ReLU(-like…

    Submitted 28 May, 2019; v1 submitted 9 February, 2018; originally announced February 2018.

    Comments: 33 pages, appeared at ICLR 2019

  29. arXiv:1707.02444  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Global optimality conditions for deep neural networks

    Authors: Chulhee Yun, Suvrit Sra, Ali Jadbabaie

    Abstract: We study the error landscape of deep linear and nonlinear neural networks with the squared error loss. Minimizing the loss of a deep linear neural network is a nonconvex problem, and despite recent progress, our understanding of this loss surface is still incomplete. For deep linear networks, we present necessary and sufficient conditions for a critical point of the risk function to be a global mi…

    Submitted 24 March, 2018; v1 submitted 8 July, 2017; originally announced July 2017.

    Comments: 14 pages. A camera-ready version that will appear at ICLR 2018

  30. arXiv:1706.03267  [pdf, other]

    stat.ML cs.LG

    An Alternative to EM for Gaussian Mixture Models: Batch and Stochastic Riemannian Optimization

    Authors: Reshad Hosseini, Suvrit Sra

    Abstract: We consider maximum likelihood estimation for Gaussian Mixture Models (GMMs). This task is almost invariably solved (in theory and practice) via the Expectation Maximization (EM) algorithm. EM owes its success to various factors, of which its ability to fulfill positive definiteness constraints in closed form is of key importance. We propose an alternative to EM by appealing to the rich Riemann…

    Submitted 10 June, 2017; originally announced June 2017.

    Comments: 21 pages, 6 figures

  31. arXiv:1703.02674  [pdf, other]

    stat.ML

    Polynomial Time Algorithms for Dual Volume Sampling

    Authors: Chengtao Li, Stefanie Jegelka, Suvrit Sra

    Abstract: We study dual volume sampling, a method for selecting $k$ columns from an $n \times m$ short and wide matrix ($n \le k \le m$) such that the probability of selection is proportional to the volume spanned by the rows of the induced submatrix. This method was proposed by Avron and Boutsidis (2013), who showed it to be a promising method for column subset selection and its multiple applications. However, its wide…

    Submitted 15 November, 2017; v1 submitted 7 March, 2017; originally announced March 2017.

  32. arXiv:1608.01008  [pdf, other]

    stat.ML

    Fast Mixing Markov Chains for Strongly Rayleigh Measures, DPPs, and Constrained Sampling

    Authors: Chengtao Li, Stefanie Jegelka, Suvrit Sra

    Abstract: We study probability measures induced by set functions with constraints. Such measures arise in a variety of real-world settings, where prior knowledge, resource limitations, or other pragmatic considerations impose constraints. We consider the task of rapidly sampling from such constrained measures, and develop fast Markov chain samplers for them. Our first main result is for MCMC sampling from S…

    Submitted 8 January, 2017; v1 submitted 2 August, 2016; originally announced August 2016.

    Comments: The present version subsumes arXiv:1607.03559

  33. arXiv:1607.08254  [pdf, other]

    math.OC cs.LG stat.ML

    Stochastic Frank-Wolfe Methods for Nonconvex Optimization

    Authors: Sashank J. Reddi, Suvrit Sra, Barnabas Poczos, Alex Smola

    Abstract: We study Frank-Wolfe methods for nonconvex stochastic and finite-sum optimization problems. Frank-Wolfe methods (in the convex case) have gained tremendous recent interest in machine learning and optimization communities due to their projection-free property and their ability to exploit structured constraints. However, our understanding of these algorithms in the nonconvex setting is fairly limite…

    Submitted 29 July, 2016; v1 submitted 27 July, 2016; originally announced July 2016.
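    The projection-free property mentioned in the abstract comes from replacing projections with a linear minimization oracle. A minimal sketch, assuming the probability simplex as the constraint set, a minibatch gradient estimate, and the classical step size 2/(t+2) (the step-size and batch-size schedules analyzed in the paper for the nonconvex case may differ), is shown below; all names and the toy objective are hypothetical.

```python
import random


def lmo_simplex(grad):
    """Linear minimization oracle over the probability simplex: the best vertex e_i."""
    i = min(range(len(grad)), key=lambda j: grad[j])
    s = [0.0] * len(grad)
    s[i] = 1.0
    return s


def stochastic_frank_wolfe(grad_i, n, x0, n_iters, batch_size, rng):
    """Frank-Wolfe on the simplex with a minibatch gradient estimate.

    grad_i(i, x) returns the gradient of the i-th component function at x.
    No projection is needed: each iterate is a convex combination of points
    in the simplex, so it stays feasible automatically.
    """
    x = list(x0)
    for t in range(n_iters):
        batch = rng.sample(range(n), batch_size)  # sampled without replacement
        g = [0.0] * len(x)
        for i in batch:
            for k, gk in enumerate(grad_i(i, x)):
                g[k] += gk / batch_size
        s = lmo_simplex(g)
        gamma = 2.0 / (t + 2.0)  # classical diminishing step size
        x = [(1.0 - gamma) * xk + gamma * sk for xk, sk in zip(x, s)]
    return x


# Toy components f_i(x) = 0.5 * ||x - c_i||^2; their average is minimized at mean(c_i).
c = [[0.1, 0.2, 0.7], [0.3, 0.4, 0.3]]
x = stochastic_frank_wolfe(
    lambda i, x: [xk - ck for xk, ck in zip(x, c[i])],
    n=2, x0=[1.0 / 3] * 3, n_iters=2000, batch_size=2, rng=random.Random(0),
)
```

    On the simplex the oracle only has to return the vertex with the smallest gradient coordinate, so each iteration costs one gradient estimate plus an argmin.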

  34. arXiv:1607.05002  [pdf, ps, other]

    stat.ML cs.LG

    Geometric Mean Metric Learning

    Authors: Pourya Habib Zadeh, Reshad Hosseini, Suvrit Sra

    Abstract: We revisit the task of learning a Euclidean metric from data. We approach this problem from first principles and formulate it as a surprisingly simple optimization problem. Indeed, our formulation even admits a closed form solution. This solution possesses several very attractive properties: (i) an innate geometric appeal through the Riemannian geometry of positive definite matrices; (ii) ease of…

    Submitted 18 July, 2016; originally announced July 2016.

    Comments: 7 pages, 4 figures

  35. arXiv:1607.03559  [pdf, other]

    cs.LG cs.DS math.PR stat.ML

    Fast Sampling for Strongly Rayleigh Measures with Application to Determinantal Point Processes

    Authors: Chengtao Li, Stefanie Jegelka, Suvrit Sra

    Abstract: In this note we consider sampling from (non-homogeneous) strongly Rayleigh probability measures. As an important corollary, we obtain a fast mixing Markov Chain sampler for Determinantal Point Processes.

    Submitted 12 July, 2016; originally announced July 2016.

  36. arXiv:1605.08374  [pdf, other]

    cs.LG cs.AI stat.ML

    Kronecker Determinantal Point Processes

    Authors: Zelda Mariet, Suvrit Sra

    Abstract: Determinantal Point Processes (DPPs) are probabilistic models over all subsets of a ground set of $N$ items. They have recently gained prominence in several applications that rely on "diverse" subsets. However, their applicability to large problems is still limited due to the $\mathcal O(N^3)$ complexity of core tasks such as sampling and learning. We enable efficient sampling and learning for DPPs b…

    Submitted 26 May, 2016; originally announced May 2016.

  37. arXiv:1605.06900  [pdf, other]

    math.OC cs.LG stat.ML

    Fast Stochastic Methods for Nonsmooth Nonconvex Optimization

    Authors: Sashank J. Reddi, Suvrit Sra, Barnabas Poczos, Alex Smola

    Abstract: We analyze stochastic algorithms for optimizing nonconvex, nonsmooth finite-sum problems, where the nonconvex part is smooth and the nonsmooth part is convex. Surprisingly, unlike the smooth case, our knowledge of this fundamental problem is very limited. For example, it is not known whether the proximal stochastic gradient method with constant minibatch converges to a stationary point. To tackle…

    Submitted 23 May, 2016; originally announced May 2016.

  38. arXiv:1605.00316  [pdf, other]

    stat.ML

    Directional Statistics in Machine Learning: a Brief Review

    Authors: Suvrit Sra

    Abstract: The modern data analyst must cope with data encoded in various forms: vectors, matrices, strings, graphs, or more. Consequently, statistical and machine learning models tailored to different data encodings are important. We focus on data encoded as normalized vectors, so that their "direction" is more important than their magnitude. Specifically, we consider high-dimensional vectors that lie eithe…

    Submitted 1 May, 2016; originally announced May 2016.

    Comments: 12 pages, slightly modified version of submitted book chapter

  39. arXiv:1604.02027  [pdf, other]

    cs.LG cs.CL stat.ML

    Combinatorial Topic Models using Small-Variance Asymptotics

    Authors: Ke Jiang, Suvrit Sra, Brian Kulis

    Abstract: Topic models have emerged as fundamental tools in unsupervised machine learning. Most modern topic modeling algorithms take a probabilistic view and derive inference algorithms based on Latent Dirichlet Allocation (LDA) or its variants. In contrast, we study topic modeling as a combinatorial optimization problem, and propose a new objective function derived from LDA by passing to the small-varianc…

    Submitted 26 May, 2016; v1 submitted 7 April, 2016; originally announced April 2016.

    Comments: 19 pages

  40. arXiv:1603.06160  [pdf, other]

    math.OC cs.LG cs.NE stat.ML

    Stochastic Variance Reduction for Nonconvex Optimization

    Authors: Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabas Poczos, Alex Smola

    Abstract: We study nonconvex finite-sum problems and analyze stochastic variance reduced gradient (SVRG) methods for them. SVRG and related methods have recently surged into prominence for convex optimization given their edge over stochastic gradient descent (SGD); but their theoretical analysis almost exclusively assumes convexity. In contrast, we prove non-asymptotic rates of convergence (to stationary po…

    Submitted 4 April, 2016; v1 submitted 19 March, 2016; originally announced March 2016.

    Comments: Minor feedback changes
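    The SVRG update combines a periodic full gradient at a snapshot point with cheap per-component corrections. The sketch below shows the classical loop for a scalar variable; the toy objective is convex so that the result is easy to check, whereas the paper's contribution is the nonconvex analysis. The names, step size, and epoch length are illustrative assumptions.

```python
import random


def svrg(grads, x0, lr, n_outer, m, rng):
    """SVRG for min_x (1/n) * sum_i f_i(x) with scalar x.

    grads[i](x) returns f_i'(x). Each outer loop computes the full gradient at a
    snapshot; inner steps use the variance-reduced direction
    v = grad_i(x) - grad_i(snapshot) + full_grad(snapshot).
    """
    n = len(grads)
    snapshot = x0
    for _ in range(n_outer):
        full_grad = sum(g(snapshot) for g in grads) / n
        x = snapshot
        for _ in range(m):
            i = rng.randrange(n)
            v = grads[i](x) - grads[i](snapshot) + full_grad
            x -= lr * v
        snapshot = x  # use the last inner iterate as the next snapshot
    return snapshot


# Hypothetical toy problem: f_i(x) = 0.5 * a_i * (x - b_i)^2, minimized at sum(a*b) / sum(a).
a = [1.0, 2.0, 3.0]
b = [1.0, 0.0, -1.0]
grads = [lambda x, ai=ai, bi=bi: ai * (x - bi) for ai, bi in zip(a, b)]
x_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(a)  # equals -1/3
x_hat = svrg(grads, x0=5.0, lr=0.1, n_outer=50, m=30, rng=random.Random(0))
```

    Because the correction term has zero mean and vanishes as x and the snapshot approach the minimizer, a constant step size suffices, unlike plain SGD.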

  41. arXiv:1603.06159  [pdf, other]

    math.OC cs.LG stat.ML

    Fast Incremental Method for Nonconvex Optimization

    Authors: Sashank J. Reddi, Suvrit Sra, Barnabas Poczos, Alex Smola

    Abstract: We analyze a fast incremental aggregated gradient method for optimizing nonconvex problems of the form $\min_x \sum_i f_i(x)$. Specifically, we analyze the SAGA algorithm within an Incremental First-order Oracle framework, and show that it converges to a stationary point provably faster than both gradient descent and stochastic gradient descent. We also discuss Polyak's special class of nonconve…

    Submitted 19 March, 2016; originally announced March 2016.
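    SAGA keeps a table with the most recently computed gradient of every component f_i instead of recomputing full gradients. A minimal sketch for minimizing the average of the f_i (the 1/n scaling does not change the minimizer) with scalar x, checked on a hypothetical convex toy problem rather than the nonconvex setting the paper studies; the names and step size are assumptions:

```python
import random


def saga(grads, x0, lr, n_steps, rng):
    """SAGA for min_x (1/n) * sum_i f_i(x) with scalar x.

    Keeps the most recent gradient of every component in a table and uses
    v = grad_j(x) - table[j] + mean(table) as the update direction.
    """
    n = len(grads)
    x = x0
    table = [g(x) for g in grads]
    table_mean = sum(table) / n
    for _ in range(n_steps):
        j = rng.randrange(n)
        g_new = grads[j](x)
        v = g_new - table[j] + table_mean
        x -= lr * v
        # Update the stored gradient and its running mean in O(1) time.
        table_mean += (g_new - table[j]) / n
        table[j] = g_new
    return x


# Hypothetical toy problem: f_i(x) = 0.5 * a_i * (x - b_i)^2, minimized at sum(a*b) / sum(a).
a = [1.0, 2.0, 3.0]
b = [1.0, 0.0, -1.0]
grads = [lambda x, ai=ai, bi=bi: ai * (x - bi) for ai, bi in zip(a, b)]
x_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(a)  # equals -1/3
x_hat = saga(grads, x0=5.0, lr=0.1, n_steps=3000, rng=random.Random(0))
```

    Compared with SVRG, this trades the periodic full-gradient pass for O(n) extra memory.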

  42. arXiv:1602.06053  [pdf, other]

    math.OC cs.LG stat.ML

    First-order Methods for Geodesically Convex Optimization

    Authors: Hongyi Zhang, Suvrit Sra

    Abstract: Geodesic convexity generalizes the notion of (vector space) convexity to nonlinear metric spaces. But unlike convex optimization, geodesically convex (g-convex) optimization is much less developed. In this paper we contribute to the understanding of g-convex optimization by developing iteration complexity analysis for several first-order algorithms on Hadamard manifolds. Specifically, we prove upp…

    Submitted 19 February, 2016; originally announced February 2016.

    Comments: 21 pages

  43. arXiv:1512.01904  [pdf, other]

    stat.ML math.NA

    Gauss quadrature for matrix inverse forms with applications

    Authors: Chengtao Li, Suvrit Sra, Stefanie Jegelka

    Abstract: We present a framework for accelerating a spectrum of machine learning algorithms that require computation of bilinear inverse forms $u^\top A^{-1}u$, where $A$ is a positive definite matrix and $u$ a given vector. Our framework is built on Gauss-type quadrature and easily scales to large, sparse matrices. Further, it allows retrospective computation of lower and upper bounds on $u^\top A^{-1}u$,…

    Submitted 28 May, 2016; v1 submitted 6 December, 2015; originally announced December 2015.
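    One standard way to obtain a Gauss quadrature estimate of u^T A^{-1} u is through the Lanczos process: after k steps from the starting vector u/||u||, the Gauss rule gives ||u||^2 (T_k^{-1})_{11}, where T_k is the k-by-k tridiagonal Lanczos matrix; for positive definite A this is a lower bound. The sketch below assumes small dense matrices stored as Python lists and no reorthogonalization. It covers only the plain Gauss rule, not the Gauss-Radau or Gauss-Lobatto variants that supply upper bounds, and it is not the paper's code.

```python
import math


def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]


def lanczos(A, u, k):
    """Run k Lanczos steps on symmetric A from u / ||u||.

    Returns the tridiagonal coefficients (alphas: diagonal, betas: off-diagonal)
    and ||u||; stops early if the Krylov space is exhausted.
    """
    n = len(u)
    norm_u = math.sqrt(sum(x * x for x in u))
    q_prev, q = [0.0] * n, [x / norm_u for x in u]
    alphas, betas, beta = [], [], 0.0
    for j in range(k):
        w = matvec(A, q)
        alpha = sum(wi * qi for wi, qi in zip(w, q))
        w = [wi - alpha * qi - beta * qp for wi, qi, qp in zip(w, q, q_prev)]
        alphas.append(alpha)
        beta = math.sqrt(sum(wi * wi for wi in w))
        if j == k - 1 or beta < 1e-12:
            break
        betas.append(beta)
        q_prev, q = q, [wi / beta for wi in w]
    return alphas, betas, norm_u


def first_entry_of_inverse(alphas, betas):
    """Return (T^{-1})_{11} for symmetric tridiagonal T by solving T y = e1 (Thomas algorithm)."""
    m = len(alphas)
    c, d = [0.0] * m, [0.0] * m
    c[0] = betas[0] / alphas[0] if m > 1 else 0.0
    d[0] = 1.0 / alphas[0]
    for i in range(1, m):
        denom = alphas[i] - betas[i - 1] * c[i - 1]
        c[i] = betas[i] / denom if i < m - 1 else 0.0
        d[i] = -betas[i - 1] * d[i - 1] / denom
    y = d[-1]
    for i in range(m - 2, -1, -1):  # back substitution down to y[0]
        y = d[i] - c[i] * y
    return y


def gauss_quadrature_estimate(A, u, k):
    """Gauss-rule estimate of u^T A^{-1} u after k Lanczos steps (a lower bound for SPD A)."""
    alphas, betas, norm_u = lanczos(A, u, k)
    return norm_u ** 2 * first_entry_of_inverse(alphas, betas)


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
u = [1.0, 2.0, 3.0]
estimates = [gauss_quadrature_estimate(A, u, k) for k in (1, 2, 3)]
# Exact value: u^T A^{-1} u = 43/9; the k = 1 estimate is ||u||^4 / (u^T A u) = 196/50.
```

    For this 3-by-3 example the estimates increase with k, and at k = 3 the Krylov space is the whole space, so the estimate matches 43/9 up to rounding.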

  44. arXiv:1508.05003  [pdf, other]

    stat.ML cs.LG math.OC

    AdaDelay: Delay Adaptive Distributed Stochastic Convex Optimization

    Authors: Suvrit Sra, Adams Wei Yu, Mu Li, Alexander J. Smola

    Abstract: We study distributed stochastic convex optimization under the delayed gradient model where the server nodes perform parameter updates, while the worker nodes compute stochastic gradients. We discuss, analyze, and experiment with a setup motivated by the behavior of real-world distributed computation networks, where the machines are differently slow at different times. Therefore, we allow the parame…

    Submitted 20 August, 2015; originally announced August 2015.

    Comments: 19 pages

  45. arXiv:1506.07677  [pdf, other]

    stat.ML cs.LG math.OC

    Manifold Optimization for Gaussian Mixture Models

    Authors: Reshad Hosseini, Suvrit Sra

    Abstract: We take a new look at parameter estimation for Gaussian Mixture Models (GMMs). In particular, we propose using Riemannian manifold optimization as a powerful counterpart to Expectation Maximization (EM). An out-of-the-box invocation of manifold optimization, however, fails spectacularly: it converges to the same solution but vastly slower. Driven by intuition from manifold convexity, we the…

    Submitted 25 June, 2015; originally announced June 2015.

    Comments: 19 pages

  46. arXiv:1506.06840  [pdf, other]

    cs.LG stat.ML

    On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants

    Authors: Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabás Póczos, Alex Smola

    Abstract: We study optimization algorithms based on variance reduction for stochastic gradient descent (SGD). Remarkable recent progress has been made in this direction through development of algorithms like SAG, SVRG, SAGA. These algorithms have been shown to outperform SGD, both theoretically and empirically. However, asynchronous versions of these algorithms---a crucial requirement for modern large-scale…

    Submitted 24 January, 2016; v1 submitted 22 June, 2015; originally announced June 2015.

  47. arXiv:1411.0589  [pdf, other]

    stat.ML math.OC

    Modular proximal optimization for multidimensional total-variation regularization

    Authors: Álvaro Barbero, Suvrit Sra

    Abstract: We study TV regularization, a widely used technique for eliciting structured sparsity. In particular, we propose efficient algorithms for computing prox-operators for $\ell_p$-norm TV. The most important among these is $\ell_1$-norm TV, for whose prox-operator we present a new geometric analysis which unveils a hitherto unknown connection to taut-string methods. This connection turns out to…

    Submitted 30 December, 2017; v1 submitted 3 November, 2014; originally announced November 2014.

    Comments: 67 pages, 32 figures, new non-iterative fast TV algorithm, extensive new experiments, corresponds to the github proxtv repository now

  48. arXiv:1410.4812  [pdf, other]

    stat.CO math.OC stat.ML

    Inference and Mixture Modeling with the Elliptical Gamma Distribution

    Authors: Reshad Hosseini, Suvrit Sra, Lucas Theis, Matthias Bethge

    Abstract: We study modeling and inference with the Elliptical Gamma Distribution (EGD). We consider maximum likelihood (ML) estimation for EGD scatter matrices, a task for which we develop new fixed-point algorithms. Our algorithms are efficient and converge to global optima despite nonconvexity. Moreover, they turn out to be much faster than both a well-known iterative algorithm of Kent & Tyler (1991) and…

    Submitted 20 December, 2015; v1 submitted 17 October, 2014; originally announced October 2014.

    Comments: 23 pages, 11 figures

    Journal ref: Computational Statistics & Data Analysis 2016, Vol. 101, 29-43

  49. arXiv:1409.6086  [pdf, other]

    stat.ML math.OC

    Parallel and Distributed Block-Coordinate Frank-Wolfe Algorithms

    Authors: Yu-Xiang Wang, Veeranjaneyulu Sadhanala, Wei Dai, Willie Neiswanger, Suvrit Sra, Eric P. Xing

    Abstract: We develop parallel and distributed Frank-Wolfe algorithms; the former on shared memory machines with mini-batching, and the latter in a delayed update framework. Whenever possible, we perform computations asynchronously, which helps attain speedups on multicore machines as well as in distributed environments. Moreover, instead of worst-case bounded delays, our methods only depend (mildly) on \emp…

    Submitted 12 February, 2016; v1 submitted 22 September, 2014; originally announced September 2014.

  50. arXiv:1409.2617  [pdf, other]

    math.OC stat.ML

    Large-scale randomized-coordinate descent methods with non-separable linear constraints

    Authors: Sashank Reddi, Ahmed Hefny, Carlton Downey, Avinava Dubey, Suvrit Sra

    Abstract: We develop randomized (block) coordinate descent (CD) methods for linearly constrained convex optimization. Unlike most CD methods, we do not assume the constraints to be separable, but let them be coupled linearly. To our knowledge, ours is the first CD method that allows linear coupling constraints, without making the global iteration complexity have an exponential dependence on the number of co…

    Submitted 10 June, 2015; v1 submitted 9 September, 2014; originally announced September 2014.
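The Gauss-quadrature idea summarized in entry 43 (estimating the bilinear inverse form $u^\top A^{-1}u$ for positive definite $A$) can be illustrated with the classical Lanczos-based construction. This is a minimal sketch under stated assumptions, not the authors' framework: the function name `gauss_bif_estimate` is hypothetical, `A` is assumed to be a dense symmetric positive definite NumPy array, and full reorthogonalization is added for numerical safety. It returns the Gauss-rule estimate $\|u\|^2 e_1^\top T_k^{-1} e_1$, where $T_k$ is the $k \times k$ tridiagonal Lanczos matrix; for positive definite $A$ this estimate is a lower bound on $u^\top A^{-1}u$.

```python
import numpy as np


def gauss_bif_estimate(A, u, k):
    """Gauss-rule estimate of the bilinear inverse form u^T A^{-1} u.

    Hypothetical helper for illustration (not from the paper). Runs k Lanczos
    steps on the symmetric positive definite matrix A, started from u / ||u||,
    builds the tridiagonal matrix T (alphas on the diagonal, betas off it)
    and returns ||u||^2 * [T^{-1}]_{11}.
    """
    n = A.shape[0]
    k = min(k, n)
    norm_u = np.linalg.norm(u)
    basis = [u / norm_u]  # Lanczos vectors q_1, q_2, ...
    alphas, betas = [], []
    for j in range(k):
        q = basis[-1]
        w = A @ q
        if betas:
            w = w - betas[-1] * basis[-2]  # three-term Lanczos recurrence
        alpha = q @ w
        w = w - alpha * q
        # Full reorthogonalization against all previous Lanczos vectors
        # (extra cost, but keeps this small sketch numerically stable).
        for v in basis:
            w = w - (v @ w) * v
        alphas.append(alpha)
        beta = np.linalg.norm(w)
        if j == k - 1 or beta < 1e-12:
            break  # k steps done, or an invariant subspace was reached
        betas.append(beta)
        basis.append(w / beta)
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    e1 = np.zeros(len(alphas))
    e1[0] = 1.0
    return norm_u ** 2 * np.linalg.solve(T, e1)[0]
```

Each step costs one product with `A`, so a small `k` gives a cheap lower bound; with `k` equal to the dimension of `A` the estimate agrees with `u @ np.linalg.solve(A, u)` up to rounding. The upper bounds mentioned in the abstract are not covered by this sketch.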