
Showing 1–50 of 60 results for author: Sanghavi, S

Searching in archive stat.
  1. arXiv:2406.11206

    cs.LG cs.CR stat.ML

    Retraining with Predicted Hard Labels Provably Increases Model Accuracy

    Authors: Rudrajit Das, Inderjit S. Dhillon, Alessandro Epasto, Adel Javanmard, Jieming Mao, Vahab Mirrokni, Sujay Sanghavi, Peilin Zhong

    Abstract: The performance of a model trained with noisy labels is often improved by simply retraining the model with its own predicted hard labels (i.e., $1$/$0$ labels). Yet, a detailed theoretical characterization of this phenomenon is lacking. In this paper, we theoretically analyze retraining in a linearly separable setting with randomly corrupted labels given to us and prove…

    Submitted 17 June, 2024; originally announced June 2024.

  2. arXiv:2406.02016

    math.OC cs.LG stat.ML

    Adaptive and Optimal Second-order Optimistic Methods for Minimax Optimization

    Authors: Ruichen Jiang, Ali Kavis, Qiujiang **, Sujay Sanghavi, Aryan Mokhtari

    Abstract: We propose adaptive, line search-free second-order methods with optimal rate of convergence for solving convex-concave min-max problems. By means of an adaptive step size, our algorithms feature a simple update rule that requires solving only one linear system per iteration, eliminating the need for line search or backtracking mechanisms. Specifically, we base our algorithms on the optimistic meth…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 33 pages, 2 figures

  3. arXiv:2402.07114

    cs.LG math.NA math.OC stat.ML

    Towards Quantifying the Preconditioning Effect of Adam

    Authors: Rudrajit Das, Naman Agarwal, Sujay Sanghavi, Inderjit S. Dhillon

    Abstract: There is a notable dearth of results characterizing the preconditioning effect of Adam and showing how it may alleviate the curse of ill-conditioning -- an issue plaguing gradient descent (GD). In this work, we perform a detailed analysis of Adam's preconditioning effect for quadratic functions and quantify to what extent Adam can mitigate the dependence on the condition number of the Hessian. Our…

    Submitted 11 February, 2024; originally announced February 2024.

  4. arXiv:2402.07052

    cs.LG stat.ML

    Understanding the Training Speedup from Sampling with Approximate Losses

    Authors: Rudrajit Das, Xi Chen, Bertram Ieong, Parikshit Bansal, Sujay Sanghavi

    Abstract: It is well known that selecting samples with large losses/gradients can significantly reduce the number of training steps. However, the selection overhead is often too high to yield any meaningful gains in terms of overall training time. In this work, we focus on the greedy approach of selecting samples with large approximate losses instead of exact losses in order to reduce the selection…

    Submitted 10 February, 2024; originally announced February 2024.

  5. arXiv:2306.09136

    cs.LG stat.ML

    Finite-Time Logarithmic Bayes Regret Upper Bounds

    Authors: Alexia Atsidakou, Branislav Kveton, Sumeet Katariya, Constantine Caramanis, Sujay Sanghavi

    Abstract: We derive the first finite-time logarithmic Bayes regret upper bounds for Bayesian bandits. In a multi-armed bandit, we obtain $O(c_\Delta \log n)$ and $O(c_h \log^2 n)$ upper bounds for an upper confidence bound algorithm, where $c_h$ and $c_\Delta$ are constants depending on the prior distribution and the gaps of bandit instances sampled from it, respectively. The latter bound asymptotically matches the lo…

    Submitted 21 January, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

  6. arXiv:2301.13304

    cs.LG cs.AI stat.ML

    Understanding Self-Distillation in the Presence of Label Noise

    Authors: Rudrajit Das, Sujay Sanghavi

    Abstract: Self-distillation (SD) is the process of first training a "teacher" model and then using its predictions to train a "student" model with the same architecture. Specifically, the student's objective function is $\xi\,\ell(\text{teacher's predictions}, \text{student's predictions}) + (1-\xi)\,\ell(\text{given labels}, \text{student's predictions})$, where $\ell$ is som…

    Submitted 30 January, 2023; originally announced January 2023.
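
    A minimal sketch of the student objective quoted above, evaluated for a single sample. The abstract is cut off before it names the loss $\ell$, so using cross-entropy between probability vectors is an assumption here, as are the function names; this illustrates the weighting, not the authors' code.

```python
import math
from typing import Sequence


def cross_entropy(target: Sequence[float], pred: Sequence[float], eps: float = 1e-12) -> float:
    """H(target, pred) for two probability vectors (an assumed choice for the loss l)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, pred))


def self_distillation_loss(
    teacher_probs: Sequence[float],
    label_onehot: Sequence[float],
    student_probs: Sequence[float],
    xi: float,
) -> float:
    """xi * l(teacher, student) + (1 - xi) * l(given labels, student) for one sample."""
    if not 0.0 <= xi <= 1.0:
        raise ValueError("xi must lie in [0, 1]")
    return (xi * cross_entropy(teacher_probs, student_probs)
            + (1.0 - xi) * cross_entropy(label_onehot, student_probs))


# Hypothetical example: the given label says class 1, the teacher leans toward class 0.
loss = self_distillation_loss([0.7, 0.3], [0.0, 1.0], [0.6, 0.4], xi=0.5)
```

    Setting xi = 0 recovers ordinary training on the given labels, and xi = 1 trains the student purely on the teacher's predictions.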

  7. arXiv:2212.08765

    cs.LG stat.ML

    Latent Variable Representation for Reinforcement Learning

    Authors: Tongzheng Ren, Chenjun Xiao, Tianjun Zhang, Na Li, Zhaoran Wang, Sujay Sanghavi, Dale Schuurmans, Bo Dai

    Abstract: Deep latent variable models have achieved significant empirical successes in model-based reinforcement learning (RL) due to their expressiveness in modeling complex transition dynamics. On the other hand, it remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of RL. In this paper, we provide a…

    Submitted 7 March, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

    Comments: ICLR 2023. The first two authors contributed equally. Project Website: https://rlrep.github.io/lvrep/

  8. arXiv:2211.08572

    cs.LG stat.ML

    Bayesian Fixed-Budget Best-Arm Identification

    Authors: Alexia Atsidakou, Sumeet Katariya, Sujay Sanghavi, Branislav Kveton

    Abstract: Fixed-budget best-arm identification (BAI) is a bandit problem where the agent maximizes the probability of identifying the optimal arm within a fixed budget of observations. In this work, we study this problem in the Bayesian setting. We propose a Bayesian elimination algorithm and derive an upper bound on its probability of misidentifying the optimal arm. The bound reflects the quality of the pr…

    Submitted 15 June, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

  9. arXiv:2206.10713

    cs.LG stat.ML

    Beyond Uniform Lipschitz Condition in Differentially Private Optimization

    Authors: Rudrajit Das, Satyen Kale, Zheng Xu, Tong Zhang, Sujay Sanghavi

    Abstract: Most prior results on differentially private stochastic gradient descent (DP-SGD) are derived under the simplistic assumption of uniform Lipschitzness, i.e., the per-sample gradients are uniformly bounded. We generalize uniform Lipschitzness by assuming that the per-sample gradients have sample-dependent upper bounds, i.e., per-sample Lipschitz constants, which themselves may be unbounded. We prov…

    Submitted 5 June, 2023; v1 submitted 21 June, 2022; originally announced June 2022.

    Comments: To appear in ICML 2023

  10. arXiv:2205.11078

    stat.ML cs.LG math.ST

    Beyond EM Algorithm on Over-specified Two-Component Location-Scale Gaussian Mixtures

    Authors: Tongzheng Ren, Fuheng Cui, Sujay Sanghavi, Nhat Ho

    Abstract: The Expectation-Maximization (EM) algorithm has been predominantly used to approximate the maximum likelihood estimation of the location-scale Gaussian mixtures. However, when the models are over-specified, namely, the chosen number of components to fit the data is larger than the unknown true number of components, EM needs a polynomial number of iterations in terms of the sample size to reach the…

    Submitted 23 May, 2022; originally announced May 2022.

    Comments: 38 pages, 4 figures. Tongzheng Ren and Fuheng Cui contributed equally to this work

  11. arXiv:2205.07999

    stat.ML cs.LG math.OC math.ST

    An Exponentially Increasing Step-size for Parameter Estimation in Statistical Models

    Authors: Nhat Ho, Tongzheng Ren, Sujay Sanghavi, Purnamrita Sarkar, Rachel Ward

    Abstract: Using gradient descent (GD) with fixed or decaying step-size is a standard practice in unconstrained optimization problems. However, when the loss function is only locally convex, such a step-size schedule artificially slows GD down as it cannot explore the flat curvature of the loss function. To overcome that issue, we propose to exponentially increase the step-size of the GD algorithm. Under hom…

    Submitted 1 February, 2023; v1 submitted 16 May, 2022; originally announced May 2022.

    Comments: 37 pages. The authors are listed in alphabetical order
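
    A minimal sketch of gradient descent with an exponentially increasing step size $\eta_t = \eta_0 \alpha^t$ ($\alpha > 1$), run on the toy loss $f(\theta) = \theta^4/4$, whose curvature vanishes at the minimizer $\theta = 0$. The toy loss, the constants, and the function names are assumptions for illustration; the paper's exact schedule and stopping rule are not reproduced here.

```python
from typing import Callable


def gd_exponential_step(
    grad: Callable[[float], float],
    x0: float,
    eta0: float,
    alpha: float,
    steps: int,
) -> float:
    """Scalar gradient descent with step size eta_t = eta0 * alpha**t.

    alpha = 1.0 gives ordinary fixed-step gradient descent.
    """
    x, eta = x0, eta0
    for _ in range(steps):
        x -= eta * grad(x)
        eta *= alpha  # grow the step size geometrically
    return x


def grad_quartic(x: float) -> float:
    """Gradient of f(x) = x**4 / 4, a loss that is very flat around its minimizer 0."""
    return x ** 3


x_fixed = gd_exponential_step(grad_quartic, x0=1.0, eta0=0.1, alpha=1.0, steps=100)
x_growing = gd_exponential_step(grad_quartic, x0=1.0, eta0=0.1, alpha=1.1, steps=100)
```

    On this loss the fixed step size shrinks the iterate only polynomially (roughly like $1/\sqrt{t}$), while the growing step size compensates for the vanishing curvature and shrinks it roughly geometrically. The schedule cannot run indefinitely on every loss: if the loss has curvature $L > 0$ at the minimizer, the iterates diverge once $\eta_t$ exceeds $2/L$, so the number of steps has to be limited.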

  12. arXiv:2203.12577

    cs.LG stat.ML

    Minimax Regret for Cascading Bandits

    Authors: Daniel Vial, Sujay Sanghavi, Sanjay Shakkottai, R. Srikant

    Abstract: Cascading bandits is a natural and popular model that frames the task of learning to rank from Bernoulli click feedback in a bandit setting. For the case of unstructured rewards, we prove matching upper and lower bounds for the problem-independent (i.e., gap-free) regret, both of which strictly improve the best known. A key observation is that the hard instances of this problem are those with smal…

    Submitted 10 October, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

    Journal ref: Conference on Neural Information Processing Systems (NeurIPS) 2022

  13. arXiv:2202.04219

    stat.ML cs.LG math.ST

    Improving Computational Complexity in Statistical Models with Second-Order Information

    Authors: Tongzheng Ren, Jiacheng Zhuo, Sujay Sanghavi, Nhat Ho

    Abstract: It is known that when the statistical models are singular, i.e., the Fisher information matrix at the true parameter is degenerate, the fixed step-size gradient descent algorithm takes a polynomial number of steps in terms of the sample size $n$ to converge to a final statistical radius around the true parameter, which can be unsatisfactory for the application. To further improve that computational…

    Submitted 13 April, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

    Comments: 27 pages, 2 figures. Fixing a bug in the proof of Lemma 7

  14. arXiv:2110.07810

    cs.LG math.ST stat.ML

    Towards Statistical and Computational Complexities of Polyak Step Size Gradient Descent

    Authors: Tongzheng Ren, Fuheng Cui, Alexia Atsidakou, Sujay Sanghavi, Nhat Ho

    Abstract: We study the statistical and computational complexities of the Polyak step size gradient descent algorithm under generalized smoothness and Lojasiewicz conditions of the population loss function, namely, the limit of the empirical loss function when the sample size goes to infinity, and the stability between the gradients of the empirical and population loss functions, namely, the polynomial growt…

    Submitted 14 October, 2021; originally announced October 2021.

    Comments: First three authors contributed equally. 40 pages, 4 figures
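
    A minimal sketch of gradient descent with the Polyak step size $\eta_t = (f(x_t) - f^*)/\|\nabla f(x_t)\|^2$. It assumes the optimal value $f^*$ is known, which the abstract does not state; the function names and the toy losses are hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def polyak_gd(
    f: Callable[[Vector], float],
    grad: Callable[[Vector], Vector],
    f_star: float,
    x0: Vector,
    steps: int,
) -> Vector:
    """Gradient descent with eta_t = (f(x_t) - f_star) / ||grad f(x_t)||^2."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        g_sq = sum(gi * gi for gi in g)
        if g_sq == 0.0:  # exact stationary point: stop instead of dividing by zero
            break
        eta = (f(x) - f_star) / g_sq  # Polyak step size
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x


# Toy singular loss f(x) = x^4 (zero curvature at the minimizer 0, f* = 0).
x_quartic = polyak_gd(lambda v: v[0] ** 4, lambda v: [4.0 * v[0] ** 3], 0.0, [1.0], steps=10)
```

    On $f(x) = x^4$ the Polyak step works out to $\eta_t = 1/(16 x_t^2)$, so every update is $x_{t+1} = 0.75\,x_t$: the step size grows automatically as the curvature flattens, giving geometric convergence where a fixed step size would converge only polynomially.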

  15. arXiv:2106.08882

    cs.LG cs.DC math.OC stat.ML

    Robust Training in High Dimensions via Block Coordinate Geometric Median Descent

    Authors: Anish Acharya, Abolfazl Hashemi, Prateek Jain, Sujay Sanghavi, Inderjit S. Dhillon, Ufuk Topcu

    Abstract: Geometric median (GM) is a classical method in statistics for achieving a robust estimation of the uncorrupted data; under gross corruption, it achieves the optimal breakdown point of 0.5. However, its computational complexity makes it infeasible for robustifying stochastic gradient descent (SGD) for high-dimensional optimization problems. In this paper, we show that by applying G…

    Submitted 16 June, 2021; originally announced June 2021.
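
    The geometric median of points $p_1, \dots, p_m$ minimizes $\sum_i \|z - p_i\|_2$. A minimal sketch of computing it with Weiszfeld's classical fixed-point iteration follows; Weiszfeld's algorithm is a swapped-in standard method, not the paper's block coordinate scheme, and the function name and toy data are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def geometric_median(points: List[Point], iters: int = 200, eps: float = 1e-12) -> List[float]:
    """Weiszfeld iteration: z <- sum_i w_i p_i / sum_i w_i with w_i = 1 / ||z - p_i||."""
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    z = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    for _ in range(iters):
        num = [0.0] * dim
        den = 0.0
        for p in points:
            # Floor the distance so a point coinciding with z cannot cause a division by zero.
            w = 1.0 / max(math.dist(p, z), eps)
            den += w
            for d in range(dim):
                num[d] += w * p[d]
        z = [v / den for v in num]
    return z


# One grossly corrupted value among five one-dimensional samples.
samples = [[0.0], [1.0], [2.0], [3.0], [1000.0]]
gm = geometric_median(samples)
```

    Here the mean is dragged to 201.2 by the single outlier, while the geometric median (in one dimension, the ordinary median) stays at 2.0.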

  16. arXiv:2106.07094

    cs.LG cs.DC eess.SP math.OC stat.ML

    On the Convergence of Differentially Private Federated Learning on Non-Lipschitz Objectives, and with Normalized Client Updates

    Authors: Rudrajit Das, Abolfazl Hashemi, Sujay Sanghavi, Inderjit S. Dhillon

    Abstract: There is a dearth of convergence results for differentially private federated learning (FL) with non-Lipschitz objective functions (i.e., when gradient norms are not bounded). The primary reason for this is that the clipping operation (i.e., projection onto an $\ell_2$ ball of a fixed radius called the clipping threshold) for bounding the sensitivity of the average update to each client's update i…

    Submitted 15 April, 2022; v1 submitted 13 June, 2021; originally announced June 2021.
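
    The clipping operation described above, projection onto an $\ell_2$ ball of radius $C$, has the closed form $\mathrm{clip}_C(u) = u \cdot \min(1, C/\|u\|_2)$. A minimal sketch with hypothetical function names; the privacy noise and the normalized-update variant studied in the paper are not shown.

```python
import math
from typing import List

Vector = List[float]


def clip(u: Vector, c: float) -> Vector:
    """Project u onto the l2 ball of radius c (the clipping threshold)."""
    norm = math.sqrt(sum(x * x for x in u))
    if norm <= c:
        return list(u)  # already inside the ball: unchanged
    scale = c / norm
    return [x * scale for x in u]


def average_clipped_updates(updates: List[Vector], c: float) -> Vector:
    """Coordinate-wise average of the clipped client updates."""
    clipped = [clip(u, c) for u in updates]
    n = len(clipped)
    return [sum(col) / n for col in zip(*clipped)]
```

    Because every clipped update has norm at most $C$, replacing one of $n$ client updates moves the average by at most $2C/n$, which is what bounds the sensitivity. The price is that updates longer than $C$ are shrunk, which can bias the average when gradient norms are not bounded.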

  17. arXiv:2103.14077

    stat.ML cs.LG

    Nearly Horizon-Free Offline Reinforcement Learning

    Authors: Tongzheng Ren, Jialian Li, Bo Dai, Simon S. Du, Sujay Sanghavi

    Abstract: We revisit offline reinforcement learning on episodic time-homogeneous Markov Decision Processes (MDP). For tabular MDP with $S$ states and $A$ actions, or linear MDP with anchor points and feature dimension $d$, given the collected $K$ episodes data with minimum visiting probability of (anchor) state-action pairs $d_m$, we obtain nearly horizon $H$-free sample complexity bounds for offline reinfo…

    Submitted 9 February, 2022; v1 submitted 25 March, 2021; originally announced March 2021.

    Comments: NeurIPS 2021

  18. arXiv:2012.04061

    stat.ML cs.DC cs.LG math.OC

    Faster Non-Convex Federated Learning via Global and Local Momentum

    Authors: Rudrajit Das, Anish Acharya, Abolfazl Hashemi, Sujay Sanghavi, Inderjit S. Dhillon, Ufuk Topcu

    Abstract: We propose FedGLOMO, a novel federated learning (FL) algorithm with an iteration complexity of $\mathcal{O}(\varepsilon^{-1.5})$ to converge to an $\varepsilon$-stationary point (i.e., $\mathbb{E}[\|\nabla f(\bm{x})\|^2] \leq \varepsilon$) for smooth non-convex functions -- under arbitrary client heterogeneity and compressed communication -- compared to the $\mathcal{O}(\varepsilon^{-2})$ complexity of most prior works. Our key…

    Submitted 24 October, 2021; v1 submitted 7 December, 2020; originally announced December 2020.

  19. arXiv:2011.14066

    stat.ML cs.LG

    On Generalization of Adaptive Methods for Over-parameterized Linear Regression

    Authors: Vatsal Shah, Soumya Basu, Anastasios Kyrillidis, Sujay Sanghavi

    Abstract: Over-parameterization and adaptive methods have played a crucial role in the success of deep learning in the last decade. The widespread use of over-parameterization has forced us to rethink generalization by bringing forth new phenomena, such as implicit regularization of optimization algorithms and double descent with training progression. A series of recent works have started to shed light on t…

    Submitted 27 November, 2020; originally announced November 2020.

    Comments: arXiv admin note: substantial text overlap with arXiv:1811.07055

  20. arXiv:2011.10643

    cs.LG cs.DC math.OC stat.ML

    On the Benefits of Multiple Gossip Steps in Communication-Constrained Decentralized Optimization

    Authors: Abolfazl Hashemi, Anish Acharya, Rudrajit Das, Haris Vikalo, Sujay Sanghavi, Inderjit Dhillon

    Abstract: In decentralized optimization, it is common algorithmic practice to have nodes interleave (local) gradient descent iterations with gossip (i.e., averaging over the network) steps. Motivated by the training of large-scale machine learning models, it is also increasingly common to require that messages be lossy compressed versions of the local parameters. In this paper, we show that, in such co…

    Submitted 20 November, 2020; originally announced November 2020.
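
    A minimal sketch of pure gossip averaging on a ring of $n$ nodes, where each node replaces its value with the average of itself and its two neighbours. Repeated gossip steps preserve the network-wide mean and drive every node toward it. The ring topology, the uniform weights of 1/3, and the names are assumptions; the local gradient steps and the lossy compression studied in the paper are omitted.

```python
from typing import List


def gossip_round(x: List[float]) -> List[float]:
    """One gossip step on a ring (n >= 3): each node averages itself and its two neighbours."""
    n = len(x)
    return [(x[(i - 1) % n] + x[i] + x[(i + 1) % n]) / 3.0 for i in range(n)]


def gossip(x: List[float], steps: int) -> List[float]:
    """Apply `steps` consecutive gossip rounds."""
    for _ in range(steps):
        x = gossip_round(x)
    return x


values = gossip([1.0, 5.0, -2.0, 8.0, 3.0], steps=50)  # initial mean is 3.0
```

    Because the averaging matrix is symmetric and doubly stochastic, each round shrinks the nodes' disagreement (in the $\ell_2$ norm) by a factor of at most its second-largest eigenvalue modulus, about 0.54 for this five-node ring, so a few extra gossip rounds already bring the nodes much closer to consensus.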

  21. arXiv:2004.00198

    cs.LG stat.ML

    Extreme Multi-label Classification from Aggregated Labels

    Authors: Yanyao Shen, Hsiang-fu Yu, Sujay Sanghavi, Inderjit Dhillon

    Abstract: Extreme multi-label classification (XMC) is the problem of finding the relevant labels for an input, from a very large universe of possible labels. We consider XMC in the setting where labels are available only for groups of samples - but not for individual ones. Current XMC approaches are not built for such multi-instance multi-label (MIML) training data, and MIML approaches do not scale to XMC s…

    Submitted 31 March, 2020; originally announced April 2020.

  22. arXiv:2001.03316

    stat.ML cs.LG

    Choosing the Sample with Lowest Loss makes SGD Robust

    Authors: Vatsal Shah, Xiaoxia Wu, Sujay Sanghavi

    Abstract: The presence of outliers can potentially significantly skew the parameters of machine learning models trained via stochastic gradient descent (SGD). In this paper we propose a simple variant of the simple SGD method: in each step, first choose a set of k samples, then from these choose the one with the smallest current loss, and do an SGD-like update with this chosen sample. Vanilla SGD correspond…

    Submitted 10 January, 2020; originally announced January 2020.
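
    A minimal sketch of the update rule in the abstract: draw $k$ samples, keep the one with the smallest current loss, and take an SGD step on it. The scalar least-squares model, the synthetic data (every tenth label corrupted to a different slope), and all names and constants are assumptions for illustration.

```python
import random
from typing import Callable, List, Tuple

Sample = Tuple[float, float]  # (x, y)


def sq_loss(w: float, s: Sample) -> float:
    """Squared loss of the scalar model y ~ w * x on one sample."""
    x, y = s
    return (w * x - y) ** 2


def sq_grad(w: float, s: Sample) -> float:
    """Derivative of sq_loss with respect to w."""
    x, y = s
    return 2.0 * x * (w * x - y)


def min_loss_sgd(
    data: List[Sample],
    loss: Callable[[float, Sample], float],
    grad: Callable[[float, Sample], float],
    w0: float,
    k: int,
    lr: float,
    steps: int,
    seed: int = 0,
) -> float:
    """Each step: draw k distinct samples, keep the one with the smallest current loss,
    and take an SGD step on it alone. k = 1 is vanilla SGD."""
    rng = random.Random(seed)
    w = w0
    for _ in range(steps):
        candidates = rng.sample(data, k)
        chosen = min(candidates, key=lambda s: loss(w, s))
        w -= lr * grad(w, chosen)
    return w


# Clean samples follow y = 2x; every tenth sample is corrupted to y = -10x.
data = [(0.5 + i / 100, (2.0 if i % 10 else -10.0) * (0.5 + i / 100)) for i in range(100)]
w_min_loss = min_loss_sgd(data, sq_loss, sq_grad, w0=0.0, k=5, lr=0.05, steps=500)
```

    With $k = 1$ the same function is vanilla SGD, whose iterates are pulled toward the corrupted slope. With $k = 5$ on this data, a corrupted sample is chosen only when all five candidates are corrupted, so the iterate settles at the clean slope of 2.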

  23. arXiv:1911.03034

    cs.LG stat.ML

    Interaction Hard Thresholding: Consistent Sparse Quadratic Regression in Sub-quadratic Time and Space

    Authors: Shuo Yang, Yanyao Shen, Sujay Sanghavi

    Abstract: Quadratic regression involves modeling the response as a (generalized) linear function of not only the features $x^{j_1}$ but also of quadratic terms $x^{j_1}x^{j_2}$. The inclusion of such higher-order "interaction terms" in regression often provides an easy way to increase accuracy in already-high-dimensional problems. However, this explodes the problem dimension from linear $O(p)$ to quadratic…

    Submitted 7 November, 2019; originally announced November 2019.

    Comments: Accepted by NeurIPS 2019

  24. arXiv:1909.01812

    cs.LG cs.DS math.ST stat.ML

    Learning Distributions Generated by One-Layer ReLU Networks

    Authors: Shanshan Wu, Alexandros G. Dimakis, Sujay Sanghavi

    Abstract: We consider the problem of estimating the parameters of a $d$-dimensional rectified Gaussian distribution from i.i.d. samples. A rectified Gaussian distribution is defined by passing a standard Gaussian distribution through a one-layer ReLU neural network. We give a simple algorithm to estimate the parameters (i.e., the weight matrix and bias vector of the ReLU neural network) up to an error…

    Submitted 19 September, 2019; v1 submitted 4 September, 2019; originally announced September 2019.

    Comments: NeurIPS 2019

  25. arXiv:1907.11975

    cs.LG stat.ML

    Blocking Bandits

    Authors: Soumya Basu, Rajat Sen, Sujay Sanghavi, Sanjay Shakkottai

    Abstract: We consider a novel stochastic multi-armed bandit setting, where playing an arm makes it unavailable for a fixed number of time slots thereafter. This models situations where reusing an arm too often is undesirable (e.g. making the same product recommendation repeatedly) or infeasible (e.g. compute job scheduling on machines). We show that with prior knowledge of the rewards and delays of all the…

    Submitted 27 July, 2019; originally announced July 2019.

  26. arXiv:1902.03653

    cs.LG stat.ML

    Iterative Least Trimmed Squares for Mixed Linear Regression

    Authors: Yanyao Shen, Sujay Sanghavi

    Abstract: Given a linear regression setting, Iterative Least Trimmed Squares (ILTS) involves alternating between (a) selecting the subset of samples with lowest current loss, and (b) re-fitting the linear model only on that subset. Both steps are very fast and simple. In this paper we analyze ILTS in the setting of mixed linear regression with corruptions (MLR-C). We first establish deterministic conditions…

    Submitted 12 November, 2019; v1 submitted 10 February, 2019; originally announced February 2019.

    Comments: Accepted by NeurIPS 2019
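
    A minimal sketch of the two ILTS steps on a scalar linear model through the origin: (a) keep the fraction of samples with the lowest current squared loss, (b) refit by least squares on that subset. The paper studies mixed linear regression with corruptions; this sketch covers only the simpler single-line case, and the names, keep fraction, and data are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y)


def fit_through_origin(samples: List[Sample]) -> float:
    """Least-squares slope minimizing sum (w*x - y)^2, i.e. sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in samples) / sum(x * x for x, _ in samples)


def ilts(data: List[Sample], keep_fraction: float, w0: float, iters: int) -> float:
    """Iterative least trimmed squares for y ~ w * x."""
    m = max(1, int(keep_fraction * len(data)))
    w = w0
    for _ in range(iters):
        # (a) select the m samples with the lowest current loss
        kept = sorted(data, key=lambda s: (w * s[0] - s[1]) ** 2)[:m]
        # (b) refit only on the selected subset
        w = fit_through_origin(kept)
    return w


# Clean samples follow y = 2x; every tenth sample is corrupted to y = -10x.
data = [(0.5 + i / 100, (2.0 if i % 10 else -10.0) * (0.5 + i / 100)) for i in range(100)]
w_ilts = ilts(data, keep_fraction=0.8, w0=0.0, iters=5)
w_full = fit_through_origin(data)
```

    With 10% of the samples corrupted and 80% kept, the first trimming step at $w = 0$ already discards every corrupted sample, so the refit slope is 2, while ordinary least squares on all samples is pulled well below it.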

  27. PruneTrain: Fast Neural Network Training by Dynamic Sparse Model Reconfiguration

    Authors: Sangkug Lym, Esha Choukse, Siavash Zangeneh, Wei Wen, Sujay Sanghavi, Mattan Erez

    Abstract: State-of-the-art convolutional neural networks (CNNs) used in vision applications have large models with numerous weights. Training these models is very compute- and memory-resource intensive. Much research has been done on pruning or compressing these models to reduce the cost of inference, but little work has addressed the costs of training. We focus precisely on accelerating training. We propos…

    Submitted 9 December, 2019; v1 submitted 26 January, 2019; originally announced January 2019.

  28. arXiv:1811.07055

    stat.ML cs.LG

    Minimum weight norm models do not always generalize well for over-parameterized problems

    Authors: Vatsal Shah, Anastasios Kyrillidis, Sujay Sanghavi

    Abstract: This work is substituted by the paper in arXiv:2011.14066. Stochastic gradient descent is the de facto algorithm for training deep neural networks (DNNs). Despite its popularity, it still requires fine tuning in order to achieve its best performance. This has led to the development of adaptive methods, that claim automatic hyper-parameter optimization. Recently, researchers have studied both algo…

    Submitted 1 December, 2020; v1 submitted 16 November, 2018; originally announced November 2018.

    Comments: This work is substituted by the paper in arXiv:2011.14066

  29. arXiv:1810.11905

    cs.LG cs.DS math.ST stat.ML

    Sparse Logistic Regression Learns All Discrete Pairwise Graphical Models

    Authors: Shanshan Wu, Sujay Sanghavi, Alexandros G. Dimakis

    Abstract: We characterize the effectiveness of a classical algorithm for recovering the Markov graph of a general discrete pairwise graphical model from i.i.d. samples. The algorithm is (appropriately regularized) maximum conditional log-likelihood, which involves solving a convex program for each node; for Ising models this is $\ell_1$-constrained logistic regression, while for more general alphabets an…

    Submitted 18 June, 2019; v1 submitted 28 October, 2018; originally announced October 2018.

    Comments: 30 pages, 3 figures

  30. arXiv:1810.11874

    cs.LG stat.ML

    Learning with Bad Training Data via Iterative Trimmed Loss Minimization

    Authors: Yanyao Shen, Sujay Sanghavi

    Abstract: In this paper, we study a simple and generic framework to tackle the problem of learning model parameters when a fraction of the training samples are corrupted. We first make a simple observation: in a variety of such settings, the evolution of training accuracy (as a function of training epochs) is different for clean and bad samples. Based on this we propose to iteratively minimize the trimmed l…

    Submitted 18 February, 2019; v1 submitted 28 October, 2018; originally announced October 2018.

  31. arXiv:1806.10175

    stat.ML cs.IT cs.LG

    Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling

    Authors: Shanshan Wu, Alexandros G. Dimakis, Sujay Sanghavi, Felix X. Yu, Daniel Holtmann-Rice, Dmitry Storcheus, Afshin Rostamizadeh, Sanjiv Kumar

    Abstract: Linear encoding of sparse vectors is widely popular, but is commonly data-independent -- missing any possible extra (but a priori unknown) structure beyond sparsity. In this paper we present a new method to learn linear encoders that adapt to data, while still performing well with the widely used $\ell_1$ decoder. The convex $\ell_1$ decoder prevents gradient propagation as needed in standard grad…

    Submitted 2 July, 2019; v1 submitted 26 June, 2018; originally announced June 2018.

    Comments: 17 pages, 7 tables, 8 figures, published in ICML 2019; part of this work was done while Shanshan was an intern at Google Research, New York

  32. arXiv:1806.07944

    cs.SI cs.LG stat.ML

    Searching for a Single Community in a Graph

    Authors: Avik Ray, Sujay Sanghavi, Sanjay Shakkottai

    Abstract: In standard graph clustering/community detection, one is interested in partitioning the graph into more densely connected subsets of nodes. In contrast, the "search" problem of this paper aims to only find the nodes in a "single" such community, the target, out of the many communities that may exist. To do so, we are given suitable side information about the target; for example, a very small numb…

    Submitted 24 May, 2018; originally announced June 2018.

    Comments: ACM Journal on Modeling and Performance Evaluation of Computing Systems (TOMPECS) [to appear]

  33. arXiv:1703.02682

    stat.ML cs.IT cs.LG

    Sparse Quadratic Logistic Regression in Sub-quadratic Time

    Authors: Karthikeyan Shanmugam, Murat Kocaoglu, Alexandros G. Dimakis, Sujay Sanghavi

    Abstract: We consider support recovery in the quadratic logistic regression setting - where the target depends on both p linear terms $x_i$ and up to $p^2$ quadratic terms $x_i x_j$. Quadratic terms enable prediction/modeling of higher-order effects between features and the target, but when incorporated naively may involve solving a very large regression problem. We consider the sparse case, where at most…

    Submitted 7 March, 2017; originally announced March 2017.

  34. arXiv:1610.06656

    stat.ML cs.DS cs.IT cs.LG

    Single Pass PCA of Matrix Products

    Authors: Shanshan Wu, Srinadh Bhojanapalli, Sujay Sanghavi, Alexandros G. Dimakis

    Abstract: In this paper we present a new algorithm for computing a low rank approximation of the product $A^TB$ by taking only a single pass of the two matrices $A$ and $B$. The straightforward way to do this is to (a) first sketch $A$ and $B$ individually, and then (b) find the top components using PCA on the sketch. Our algorithm in contrast retains additional summary information about $A,B$ (e.g. row and…

    Submitted 26 October, 2016; v1 submitted 20 October, 2016; originally announced October 2016.

    Comments: 24 pages, 4 figures, NIPS 2016

  35. arXiv:1610.00843

    stat.ML cs.LG

    The Search Problem in Mixture Models

    Authors: Avik Ray, Joe Neeman, Sujay Sanghavi, Sanjay Shakkottai

    Abstract: We consider the task of learning the parameters of a single component of a mixture model, for the case when we are given side information about that component, we call this the "search problem" in mixture models. We would like to solve this with computational and sample complexity lower than solving the overall original problem, where one learns parameters of all components. Our main…

    Submitted 24 February, 2018; v1 submitted 4 October, 2016; originally announced October 2016.

  36. arXiv:1609.03240

    stat.ML cs.IT cs.LG math.NA math.OC

    Non-square matrix sensing without spurious local minima via the Burer-Monteiro approach

    Authors: Dohyung Park, Anastasios Kyrillidis, Constantine Caramanis, Sujay Sanghavi

    Abstract: We consider the non-square matrix sensing problem, under restricted isometry property (RIP) assumptions. We focus on the non-convex formulation, where any rank-$r$ matrix $X \in \mathbb{R}^{m \times n}$ is represented as $UV^\top$, where $U \in \mathbb{R}^{m \times r}$ and $V \in \mathbb{R}^{n \times r}$. In this paper, we complement recent findings on the non-convex geometry of the analogous PSD…

    Submitted 26 September, 2016; v1 submitted 11 September, 2016; originally announced September 2016.

    Comments: 14 pages, no figures

  37. arXiv:1608.05749

    cs.LG cs.IT math.ST stat.ML

    Solving a Mixture of Many Random Linear Equations by Tensor Decomposition and Alternating Minimization

    Authors: Xinyang Yi, Constantine Caramanis, Sujay Sanghavi

    Abstract: We consider the problem of solving mixed random linear equations with $k$ components. This is the noiseless setting of mixed linear regression. The goal is to estimate multiple linear models from mixed samples in the case where the labels (which sample corresponds to which model) are not observed. We give a tractable algorithm for the mixed linear equation problem, and show that under some technic…

    Submitted 19 August, 2016; originally announced August 2016.

    Comments: 39 pages, 2 figures

  38. arXiv:1606.01316

    stat.ML cs.DS cs.IT math.NA math.OC

    Provable Burer-Monteiro factorization for a class of norm-constrained matrix problems

    Authors: Dohyung Park, Anastasios Kyrillidis, Srinadh Bhojanapalli, Constantine Caramanis, Sujay Sanghavi

    Abstract: We study the projected gradient descent method on low-rank matrix problems with a strongly convex objective. We use the Burer-Monteiro factorization approach to implicitly enforce low-rankness; such factorization introduces non-convexity in the objective. We focus on constraint sets that include both positive semi-definite (PSD) constraints and specific matrix norm-constraints. Such criteria appea…

    Submitted 1 October, 2016; v1 submitted 3 June, 2016; originally announced June 2016.

    Comments: 28 pages

  39. arXiv:1603.06861

    stat.ML cs.IT cs.LG math.OC

    Trading-off variance and complexity in stochastic gradient descent

    Authors: Vatsal Shah, Megasthenis Asteris, Anastasios Kyrillidis, Sujay Sanghavi

    Abstract: Stochastic gradient descent is the method of choice for large-scale machine learning problems, by virtue of its light complexity per iteration. However, it lags behind its non-stochastic counterparts with respect to the convergence rate, due to high variance introduced by the stochastic updates. The popular Stochastic Variance-Reduced Gradient (SVRG) method mitigates this shortcoming, introducing…

    Submitted 22 March, 2016; originally announced March 2016.

    Comments: 14 pages, 13 figures, first edition on 9th of October 2015

  40. arXiv:1509.03917

    stat.ML cs.DS cs.IT cs.LG math.NA math.OC

    Dropping Convexity for Faster Semi-definite Optimization

    Authors: Srinadh Bhojanapalli, Anastasios Kyrillidis, Sujay Sanghavi

    Abstract: We study the minimization of a convex function $f(X)$ over the set of $n\times n$ positive semi-definite matrices, but when the problem is recast as $\min_U g(U) := f(UU^\top)$, with $U \in \mathbb{R}^{n \times r}$ and $r \leq n$. We study the performance of gradient descent on $g$---which we refer to as Factored Gradient Descent (FGD)---under standard assumptions on the original function $f$. W…

    Submitted 15 April, 2016; v1 submitted 13 September, 2015; originally announced September 2015.

    Comments: 40 pages
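
    A minimal sketch of FGD, i.e. plain gradient descent on $g(U) = f(UU^\top)$. When $\nabla f(X)$ is symmetric, the chain rule gives $\nabla g(U) = 2\,\nabla f(UU^\top)\,U$. The objective $f(X) = \tfrac{1}{2}\|X - M\|_F^2$ with a rank-1 PSD target $M$, the step size, the initialization, and all names are assumptions for illustration, not the paper's settings.

```python
from typing import Callable, List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Dense matrix product A @ B for lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def fgd(grad_f: Callable[[Matrix], Matrix], U0: Matrix, eta: float, steps: int) -> Matrix:
    """Gradient descent on g(U) = f(U U^T), assuming grad_f returns a symmetric matrix,
    so that grad g(U) = 2 * grad_f(U U^T) @ U."""
    U = [row[:] for row in U0]
    for _ in range(steps):
        G = matmul(grad_f(matmul(U, transpose(U))), U)
        U = [[u - eta * 2.0 * g for u, g in zip(u_row, g_row)] for u_row, g_row in zip(U, G)]
    return U


# Hypothetical target M = v v^T with v = (1, 2); f(X) = 0.5 * ||X - M||_F^2 has gradient X - M.
M = [[1.0, 2.0], [2.0, 4.0]]


def grad_f(X: Matrix) -> Matrix:
    return [[x - m for x, m in zip(x_row, m_row)] for x_row, m_row in zip(X, M)]


U = fgd(grad_f, U0=[[0.1], [0.1]], eta=0.02, steps=500)  # n = 2, r = 1
X = matmul(U, transpose(U))
```

    The iterate converges to $U = \pm(1, 2)^\top$, so $UU^\top = M$; the factorization determines $U$ only up to sign (up to an orthogonal rotation when $r > 1$).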

  41. arXiv:1507.04457

    stat.ML cs.LG

    Preference Completion: Large-scale Collaborative Ranking from Pairwise Comparisons

    Authors: Dohyung Park, Joe Neeman, ** Zhang, Sujay Sanghavi, Inderjit S. Dhillon

    Abstract: In this paper we consider the collaborative ranking setting: a pool of users each provides a small number of pairwise preferences between $d$ possible items; from these we need to predict preferences of the users for items they have not yet seen. We do so by fitting a rank $r$ score matrix to the pairwise data, and provide two main contributions: (a) we show that an algorithm based on convex optim…

    Submitted 16 July, 2015; originally announced July 2015.

  42. arXiv:1506.07868

    math.NA math.OC stat.ML

    The local convexity of solving systems of quadratic equations

    Authors: Chris D. White, Sujay Sanghavi, Rachel Ward

    Abstract: This paper considers the recovery of a rank $r$ positive semidefinite matrix $X X^T\in\mathbb{R}^{n\times n}$ from $m$ scalar measurements of the form $y_i := a_i^T X X^T a_i$ (i.e., quadratic measurements of $X$). Such problems arise in a variety of applications, including covariance sketching of high-dimensional data streams, quadratic regression, quantum state tomography, among others. A natura…

    Submitted 1 June, 2016; v1 submitted 25 June, 2015; originally announced June 2015.

    Comments: 36 pages, 3 figures
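
    The measurement model above can be evaluated without forming the $n \times n$ matrix $XX^T$, since $a_i^T X X^T a_i = \|X^T a_i\|_2^2$. A minimal sketch with hypothetical names and toy numbers:

```python
from typing import List

Matrix = List[List[float]]  # X stored as n rows of length r


def quadratic_measurement(X: Matrix, a: List[float]) -> float:
    """y = a^T X X^T a, computed as ||X^T a||^2 in O(n r) operations."""
    r = len(X[0])
    xt_a = [sum(a[j] * X[j][k] for j in range(len(a))) for k in range(r)]
    return sum(v * v for v in xt_a)


X = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # n = 3, r = 2
y = quadratic_measurement(X, [1.0, 1.0, 1.0])  # X^T a = (2, 3), so y = 13
```

    Because $y_i$ depends on $X$ only through $XX^T$, replacing $X$ by $XQ$ for any orthogonal $Q$ leaves every measurement unchanged, so $X$ itself can be recovered only up to such a rotation.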

  43. arXiv:1506.02348

    cs.LG stat.ML

    Convergence Rates of Active Learning for Maximum Likelihood Estimation

    Authors: Kamalika Chaudhuri, Sham Kakade, Praneeth Netrapalli, Sujay Sanghavi

    Abstract: An active learner is given a class of models, a large set of unlabeled examples, and the ability to interactively query labels of a subset of these examples; the goal of the learner is to learn a model in the class that fits the data well. Previous theoretical work has rigorously characterized label complexity of active learning, but most of this work has focused on the PAC or the agnostic PAC m…

    Submitted 8 June, 2015; originally announced June 2015.

  44. arXiv:1502.05023

    stat.ML cs.DS cs.IT cs.LG

    A New Sampling Technique for Tensors

    Authors: Srinadh Bhojanapalli, Sujay Sanghavi

    Abstract: In this paper we propose new techniques to sample arbitrary third-order tensors, with an objective of speeding up tensor algorithms that have recently gained popularity in machine learning. Our main contribution is a new way to select, in a biased random way, only $O(n^{1.5}/\epsilon^2)$ of the possible $n^3$ elements while still achieving each of the three goals: {\em (a) tensor sparsification}: for…

    Submitted 19 February, 2015; v1 submitted 17 February, 2015; originally announced February 2015.

    Comments: 29 pages, 3 figures

  45. arXiv:1410.8864

    stat.ML cs.IT cs.LG

    Greedy Subspace Clustering

    Authors: Dohyung Park, Constantine Caramanis, Sujay Sanghavi

    Abstract: We consider the problem of subspace clustering: given points that lie on or near the union of many low-dimensional linear subspaces, recover the subspaces. To this end, one first identifies sets of points close to the same subspace and uses the sets to estimate the subspaces. As the geometric structure of the clusters (linear subspaces) forbids proper performance of general distance based approach…

    Submitted 31 October, 2014; originally announced October 2014.

    Comments: To appear in NIPS 2014

  46. arXiv:1410.7660

    cs.IT cs.LG stat.ML

    Non-convex Robust PCA

    Authors: Praneeth Netrapalli, U N Niranjan, Sujay Sanghavi, Animashree Anandkumar, Prateek Jain

    Abstract: We propose a new method for robust PCA -- the task of recovering a low-rank matrix from sparse corruptions that are of unknown value and support. Our method involves alternating between projecting appropriate residuals onto the set of low-rank matrices, and the set of sparse matrices; each projection is {\em non-convex} but easy to compute. In spite of this non-convexity, we establish exact recove…

    Submitted 28 October, 2014; originally announced October 2014.

    Comments: Extended abstract to appear in NIPS 2014
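    The two projections named in the abstract are both cheap: a truncated SVD for the set of rank-$r$ matrices and entrywise hard thresholding for the set of sparse matrices. The sketch below alternates them with a fixed target rank and a fixed, hypothetical threshold `zeta`. The abstract is truncated, and the published method may schedule ranks and thresholds differently, so treat this as a simplified illustration on synthetic data rather than the paper's algorithm.

```python
import numpy as np

def project_low_rank(Y, r):
    """Best rank-r approximation of Y (projection onto rank-r matrices) via truncated SVD."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

def hard_threshold(Y, zeta):
    """Keep entries with |Y_ij| > zeta and zero the rest (projection onto sparse matrices)."""
    return np.where(np.abs(Y) > zeta, Y, 0.0)

def alternating_projections(M, r, zeta, iters=50):
    """Alternate: L <- rank-r part of M - S, then S <- large entries of M - L."""
    S = np.zeros_like(M)
    for _ in range(iters):
        L = project_low_rank(M - S, r)
        S = hard_threshold(M - L, zeta)
    return L, S

# Synthetic example: rank-2 matrix plus a few large corruptions (about 0.5% of entries).
rng = np.random.default_rng(1)
n, r = 100, 2
L_true = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
mask = rng.random((n, n)) < 0.005
signs = rng.choice([-1.0, 1.0], size=(n, n))
S_true = np.where(mask, signs * rng.uniform(15.0, 25.0, size=(n, n)), 0.0)
M = L_true + S_true

L_hat, S_hat = alternating_projections(M, r, zeta=10.0)  # zeta is a hypothetical choice
rel_err = np.linalg.norm(L_hat - L_true) / np.linalg.norm(L_true)
print(f"relative error in L: {rel_err:.2e}")
```

    The fixed threshold works here only because the synthetic corruptions are much larger than the low-rank residuals; with smaller or unknown corruption magnitudes, a threshold that adapts across iterations is the safer design.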

  47. arXiv:1410.3886

    cs.DS cs.LG stat.ML

    Tighter Low-rank Approximation via Sampling the Leveraged Element

    Authors: Srinadh Bhojanapalli, Prateek Jain, Sujay Sanghavi

    Abstract: In this work, we propose a new randomized algorithm for computing a low-rank approximation to a given matrix. Taking an approach different from existing literature, our method first involves a specific biased sampling, with an element being chosen based on the leverage scores of its row and column, and then involves weighted alternating minimization over the factored form of the intended low-rank…

    Submitted 14 October, 2014; originally announced October 2014.

    Comments: 36 pages, 3 figures, Extended abstract to appear in the proceedings of ACM-SIAM Symposium on Discrete Algorithms (SODA15)
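    The first stage described here, choosing an element "based on the leverage scores of its row and column", can be sketched as follows. Rank-$r$ leverage scores are standard: $\mu_i = \|U_r[i,:]\|^2$ and $\nu_j = \|V_r[j,:]\|^2$, each summing to $r$. How the paper combines them into a sampling probability is not stated in the truncated abstract, so $p_{ij} \propto \mu_i + \nu_j$ below is a hypothetical choice, and the weighted alternating minimization stage is not shown.

```python
import numpy as np

def rank_r_leverage_scores(A, r):
    """Row and column leverage scores of the top-r singular subspaces of A."""
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    row_lev = np.sum(U[:, :r] ** 2, axis=1)    # mu_i = ||U_r[i, :]||^2, sums to r
    col_lev = np.sum(Vt[:r, :] ** 2, axis=0)   # nu_j = ||V_r[j, :]||^2, sums to r
    return row_lev, col_lev

def sample_entries(A, r, num_samples, rng):
    """Draw entries with the hypothetical probabilities p_ij proportional to mu_i + nu_j."""
    mu, nu = rank_r_leverage_scores(A, r)
    P = mu[:, None] + nu[None, :]
    P = P / P.sum()
    flat = rng.choice(P.size, size=num_samples, replace=True, p=P.ravel())
    rows, cols = np.unravel_index(flat, A.shape)
    return rows, cols, P

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 4)) @ rng.standard_normal((4, 40))
rows, cols, P = sample_entries(A, r=4, num_samples=200, rng=rng)
```

    Sampling with replacement keeps the sketch simple; rows or columns with high leverage simply receive more of the budget.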

  48. arXiv:1310.3745

    stat.ML

    Alternating Minimization for Mixed Linear Regression

    Authors: Xinyang Yi, Constantine Caramanis, Sujay Sanghavi

    Abstract: Mixed linear regression involves the recovery of two (or more) unknown vectors from unlabeled linear measurements; that is, where each sample comes from exactly one of the vectors, but we do not know which one. It is a classic problem, and the natural and empirically most popular approach to its solution has been the EM algorithm. As in other settings, this is prone to bad local minima; however, e…

    Submitted 7 February, 2014; v1 submitted 14 October, 2013; originally announced October 2013.
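    For two components, alternating minimization repeats two steps: assign each sample to the vector whose prediction is closer, then refit each vector by least squares on its assigned samples. The sketch below shows only that alternation on noiseless synthetic data. Because the abstract stresses that such methods can get stuck in bad local minima, the start matters; here a warm start near the true vectors is an assumption standing in for whatever initialization the paper proposes (its description is cut off).

```python
import numpy as np

def altmin_mixed_lr(X, y, b1, b2, iters=20):
    """Alternating minimization for a mixture of two linear regressions."""
    for _ in range(iters):
        # Step 1: assign each sample to the vector that currently explains it better.
        in1 = np.abs(y - X @ b1) <= np.abs(y - X @ b2)
        # Step 2: refit each vector by least squares on its assigned samples.
        b1 = np.linalg.lstsq(X[in1], y[in1], rcond=None)[0]
        b2 = np.linalg.lstsq(X[~in1], y[~in1], rcond=None)[0]
    return b1, b2

rng = np.random.default_rng(2)
n, d = 400, 5
beta1, beta2 = rng.standard_normal(d), rng.standard_normal(d)
X = rng.standard_normal((n, d))
from_first = rng.random(n) < 0.5            # hidden labels, never shown to the algorithm
y = np.where(from_first, X @ beta1, X @ beta2)

# Assumed warm start near the truth; the paper's initialization is not reproduced here.
b1_0 = beta1 + 0.1 * rng.standard_normal(d)
b2_0 = beta2 + 0.1 * rng.standard_normal(d)
b1_hat, b2_hat = altmin_mixed_lr(X, y, b1_0, b2_0)
print(np.linalg.norm(b1_hat - beta1), np.linalg.norm(b2_hat - beta2))
```

    In the noiseless case, once every sample is assigned correctly, the least-squares refit returns the true vectors exactly, so the iteration stops changing.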

  49. arXiv:1306.2979

    stat.ML cs.IT cs.LG

    Completing Any Low-rank Matrix, Provably

    Authors: Yudong Chen, Srinadh Bhojanapalli, Sujay Sanghavi, Rachel Ward

    Abstract: Matrix completion, i.e., the exact and provable recovery of a low-rank matrix from a small subset of its elements, is currently only known to be possible if the matrix satisfies a restrictive structural constraint---known as {\em incoherence}---on its row and column spaces. In these cases, the subset of elements is sampled uniformly at random. In this paper, we show that {\em any} rank-$r$…

    Submitted 21 July, 2014; v1 submitted 12 June, 2013; originally announced June 2013.

    Comments: Added a new necessary condition (Theorem 6) and a result on completion of row coherent matrices (Corollary 4). Partial results appeared in the International Conference on Machine Learning 2014, under the title 'Coherent Matrix Completion'. (34 pages, 4 figures)

  50. arXiv:1306.0160

    stat.ML cs.IT cs.LG

    Phase Retrieval using Alternating Minimization

    Authors: Praneeth Netrapalli, Prateek Jain, Sujay Sanghavi

    Abstract: Phase retrieval problems involve solving linear equations, but with missing sign (or phase, for complex numbers) information. More than four decades after it was first proposed, the seminal error reduction algorithm of (Gerchberg and Saxton 1972) and (Fienup 1982) is still the popular choice for solving many variants of this problem. The algorithm is based on alternating minimization; i.e. it alte…

    Submitted 12 June, 2015; v1 submitted 1 June, 2013; originally announced June 2013.

    Comments: Accepted for publication in IEEE Transactions on Signal Processing
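    In the real-valued case the missing phase is a sign, so alternating minimization reduces to two steps: guess the signs $p_i = \mathrm{sign}(a_i^\top x)$ from the current estimate, then solve the least-squares problem $\min_x \|Ax - p \circ y\|$ given those signs. The sketch below assumes real Gaussian measurement vectors, noiseless magnitudes $y_i = |a_i^\top x|$, and a spectral initialization (top eigenvector of $\frac{1}{m}\sum_i y_i^2 a_i a_i^\top$) chosen here as a plausible starting point; it is not the paper's exact procedure.

```python
import numpy as np

def altmin_phase(A, y, iters=100):
    """Alternating minimization for real phase retrieval from y_i = |a_i^T x|."""
    m = A.shape[0]
    # Assumed spectral initialization: top eigenvector of (1/m) sum_i y_i^2 a_i a_i^T,
    # rescaled so that ||x||^2 matches mean(y^2) (valid for Gaussian a_i).
    Y = (A.T * y ** 2) @ A / m
    _, eigvecs = np.linalg.eigh(Y)          # eigenvalues come in ascending order
    x = eigvecs[:, -1] * np.sqrt(np.mean(y ** 2))
    for _ in range(iters):
        p = np.sign(A @ x)                               # fill in the missing signs
        x = np.linalg.lstsq(A, p * y, rcond=None)[0]     # least squares given the signs
    return x

rng = np.random.default_rng(3)
n, m = 20, 600
x_true = rng.standard_normal(n)
A = rng.standard_normal((m, n))
y = np.abs(A @ x_true)
x_hat = altmin_phase(A, y)
# x is only identifiable up to a global sign, so compare against both +x and -x.
err = min(np.linalg.norm(x_hat - x_true), np.linalg.norm(x_hat + x_true)) / np.linalg.norm(x_true)
print(f"relative error up to sign: {err:.2e}")
```

    The global sign ambiguity is inherent: $x$ and $-x$ produce identical magnitudes $y$, so no method can tell them apart.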