Provable Inductive Robust PCA via Iterative Hard Thresholding

Authors: U. N. Niranjan, Arun Rajkumar, Theja Tulabandhula

Abstract: The robust PCA problem, in which an input data matrix is the superposition of a low-rank matrix and a sparse matrix and the aim is to separate the low-rank and sparse components, is a well-studied problem in machine learning. A natural question is whether, as in the inductive setting, we can hope to do better when features are also provided as input. Answering this in the affirmative, the main goal of this paper is to study the robust PCA problem while incorporating feature information. In contrast to previous works in which recovery guarantees are based on the convex relaxation of the problem, we propose a simple iterative algorithm based on hard-thresholding of appropriate residuals. Under weaker assumptions than previous works, we prove the global convergence of our iterative procedure; moreover, it admits a much faster convergence rate and lower computational complexity per iteration. In practice, through systematic synthetic and real data simulations, we confirm our theoretical findings regarding improvements obtained by using feature information.

Submitted 4 July, 2017; v1 submitted 2 April, 2017; originally announced April 2017.

arXiv:1702.02661 [pdf, other]

Inductive Pairwise Ranking: Going Beyond the n log(n) Barrier

Authors: U. N. Niranjan, Arun Rajkumar

Abstract: We study the problem of ranking a set of items from non-actively chosen pairwise preferences where each item has associated feature information. We propose and characterize a very broad class of preference matrices giving rise to the Feature Low Rank (FLR) model, which subsumes several models ranging from the classic Bradley-Terry-Luce (BTL) (Bradley and Terry 1952) and Thurstone (Thurstone 1927) models to the recently proposed blade-chest (Chen and Joachims 2016) and generic low-rank preference (Rajkumar and Agarwal 2016) models. We use the technique of matrix completion in the presence of side information to develop the Inductive Pairwise Ranking (IPR) algorithm that provably learns a good ranking under the FLR model, in a sample-efficient manner. In practice, through systematic synthetic simulations, we confirm our theoretical findings regarding improvements in the sample complexity due to the use of feature information. Moreover, on popular real-world preference learning datasets, with as little as 10% sampling of the pairwise comparisons, our method recovers a good ranking.

Submitted 8 February, 2017; originally announced February 2017.

arXiv:1606.05696 [pdf, other]

doi 10.1109/HiPC.2016.031

Tensor Contractions with Extended BLAS Kernels on CPU and GPU

Authors: Yang Shi, U. N. Niranjan, Animashree Anandkumar, Cris Cecka

Abstract: Tensor contractions constitute a key computational ingredient of numerical multi-linear algebra. However, as the order and dimension of tensors grow, the time and space complexities of tensor-based computations grow quickly. Existing approaches for tensor contractions typically involve explicit copy and transpose operations. In this paper, we propose and evaluate a new BLAS-like primitive STRIDEDBATCHEDGEMM that is capable of performing a wide range of tensor contractions on CPU and GPU efficiently. Through systematic benchmarking, we demonstrate the advantages of our approach over conventional approaches. Concretely, we implement the Tucker decomposition and show that using our kernels yields a 100x speedup compared to the implementation using existing state-of-the-art libraries.

Submitted 2 October, 2016; v1 submitted 17 June, 2016; originally announced June 2016.
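
To illustrate the idea in the abstract above of contracting tensors without explicit copy or transpose steps, here is a minimal pure-Python sketch of a strided batched matrix multiply. The function names, argument order, and column-major layout are assumptions made for this illustration; they are not the paper's or any library's actual interface, and the triple loop stands in for an optimized kernel.

```python
def strided_batched_gemm(m, n, k, a, lda, stride_a, b, ldb, stride_b,
                         c, ldc, stride_c, batch):
    """Compute C_p = A_p @ B_p for p in range(batch) on flat column-major lists.

    Matrix p of each operand starts at offset p * stride_*; lda, ldb and ldc
    are the distances between consecutive columns. Hypothetical interface.
    """
    for p in range(batch):
        a0, b0, c0 = p * stride_a, p * stride_b, p * stride_c
        for j in range(n):
            for i in range(m):
                acc = 0.0
                for l in range(k):
                    acc += a[a0 + i + l * lda] * b[b0 + l + j * ldb]
                c[c0 + i + j * ldc] = acc
    return c


def contract_mk_kpn(a, b, m, k, n, p):
    """C[m, n, p] = sum_k A[m, k] * B[k, p, n], first index fastest in memory."""
    c = [0.0] * (m * n * p)
    return strided_batched_gemm(
        m, n, k,
        a, m, 0,          # A is shared by every batch entry, so its stride is 0
        b, k * p, k,      # slice B[:, p, :] starts at p*k with column stride k*p
        c, m, m * n,      # slice C[:, :, p] is a contiguous m-by-n block
        p,
    )
```

In this sketch the slice `B[:, p, :]` is never copied: choosing `ldb = k * p` and `stride_b = k` lets the same inner loop read it in place, which is the kind of saving the abstract attributes to a strided batched primitive.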

arXiv:1510.04747 [pdf, ps, other]

Tensor vs Matrix Methods: Robust Tensor Decomposition under Block Sparse Perturbations

Authors: Animashree Anandkumar, Prateek Jain, Yang Shi, U. N. Niranjan

Abstract: Robust tensor CP decomposition involves decomposing a tensor into low-rank and sparse components. We propose a novel non-convex iterative algorithm with guaranteed recovery. It alternates between low-rank CP decomposition through gradient ascent (a variant of the tensor power method), and hard thresholding of the residual. We prove convergence to the globally optimal solution under natural incoherence conditions on the low-rank component, and a bounded level of sparse perturbations. We compare our method with natural baselines which apply robust matrix PCA either to the flattened tensor, or to the matrix slices of the tensor. Our method can provably handle a far greater level of perturbation when the sparse tensor is block-structured. This naturally occurs in many applications such as the activity detection task in videos. Our experiments validate these findings. Thus, we establish that tensor methods can tolerate a higher level of gross corruptions compared to matrix methods.

Submitted 27 April, 2016; v1 submitted 15 October, 2015; originally announced October 2015.
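
The abstract above refers to the tensor power method. As a rough illustration only, the sketch below runs the plain power update u <- T(I, u, u) / ||T(I, u, u)|| on a small symmetric third-order tensor stored as nested lists. It is the textbook update, not the paper's gradient-ascent variant, and it omits the hard-thresholding step entirely; all function names are hypothetical.

```python
def tensor_apply(t, u):
    """Return the vector T(I, u, u): entry i is sum_{j,k} t[i][j][k] * u[j] * u[k]."""
    n = len(u)
    return [
        sum(t[i][j][k] * u[j] * u[k] for j in range(n) for k in range(n))
        for i in range(n)
    ]


def normalize(vec):
    """Scale vec to unit Euclidean norm."""
    norm = sum(x * x for x in vec) ** 0.5
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in vec]


def tensor_power_iteration(t, u0, steps=20):
    """Run the plain tensor power update from u0; return (eigenvalue, vector)."""
    u = normalize(u0)
    for _ in range(steps):
        u = normalize(tensor_apply(t, u))
    # The eigenvalue estimate is T(u, u, u) for the final unit vector u.
    eigenvalue = sum(ui * wi for ui, wi in zip(u, tensor_apply(t, u)))
    return eigenvalue, u


# Toy tensor 2 * e1 (x) e1 (x) e1 + 1 * e2 (x) e2 (x) e2 in two dimensions.
toy = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
toy[0][0][0] = 2.0
toy[1][1][1] = 1.0
value, vector = tensor_power_iteration(toy, [0.8, 0.6])
```

For this orthogonally decomposable toy tensor and this starting point, the iterates move toward the component with weight 2; which component is found depends on the initialization in general.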

arXiv:1410.7660 [pdf, other]

Non-convex Robust PCA

Authors: Praneeth Netrapalli, U. N. Niranjan, Sujay Sanghavi, Animashree Anandkumar, Prateek Jain

Abstract: We propose a new method for robust PCA, the task of recovering a low-rank matrix from sparse corruptions that are of unknown value and support. Our method involves alternating between projecting appropriate residuals onto the set of low-rank matrices, and the set of sparse matrices; each projection is non-convex but easy to compute. In spite of this non-convexity, we establish exact recovery of the low-rank matrix, under the same conditions that are required by existing methods (which are based on convex optimization). For an $m \times n$ input matrix ($m \leq n$), our method has a running time of $O(r^2mn)$ per iteration, and needs $O(\log(1/\epsilon))$ iterations to reach an accuracy of $\epsilon$. This is close to the running time of simple PCA via the power method, which requires $O(rmn)$ per iteration, and $O(\log(1/\epsilon))$ iterations. In contrast, existing methods for robust PCA, which are based on convex optimization, have $O(m^2n)$ complexity per iteration, and take $O(1/\epsilon)$ iterations, i.e., exponentially more iterations for the same accuracy. Experiments on both synthetic and real data establish the improved speed and accuracy of our method over existing convex implementations.

Submitted 28 October, 2014; originally announced October 2014.

Comments: Extended abstract to appear in NIPS 2014
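
The alternating scheme described in the abstract above (hard-threshold the residual to estimate the sparse part, then project onto low-rank matrices) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' algorithm: the rank is fixed at 1, the threshold `zeta` is a fixed user-supplied value, and the rank-1 projection uses a plain power iteration started from the all-ones vector. The paper's threshold choice and rank handling are not reproduced here, and all function names are hypothetical.

```python
def hard_threshold(mat, zeta):
    """Keep entries with magnitude above zeta; set the rest to zero."""
    return [[x if abs(x) > zeta else 0.0 for x in row] for row in mat]


def rank_one_projection(mat, power_steps=100):
    """Best rank-1 approximation via power iteration on mat^T mat.

    Assumes the all-ones start is not orthogonal to the top right singular vector.
    """
    rows, cols = len(mat), len(mat[0])
    v = [1.0] * cols
    for _ in range(power_steps):
        u = [sum(mat[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        w = [sum(mat[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return [[0.0] * cols for _ in range(rows)]
        v = [x / norm for x in w]
    # With v a unit vector, (mat v) v^T equals sigma * u1 * v1^T at convergence.
    u = [sum(mat[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return [[u[i] * v[j] for j in range(cols)] for i in range(rows)]


def alternating_projections(m, zeta, iterations=50):
    """Alternate sparse and rank-1 projections of the residuals of m."""
    rows, cols = len(m), len(m[0])
    low = [[0.0] * cols for _ in range(rows)]
    sparse = [[0.0] * cols for _ in range(rows)]
    for _ in range(iterations):
        residual = [[m[i][j] - low[i][j] for j in range(cols)] for i in range(rows)]
        sparse = hard_threshold(residual, zeta)
        cleaned = [[m[i][j] - sparse[i][j] for j in range(cols)] for i in range(rows)]
        low = rank_one_projection(cleaned)
    return low, sparse


# Toy input: an all-ones 4x4 matrix with one corrupted entry of size 10.
observed = [[1.0] * 4 for _ in range(4)]
observed[0][0] = 11.0
low_rank, corruption = alternating_projections(observed, zeta=3.0)
```

On this toy input the low-rank estimate approaches the all-ones matrix and the sparse estimate isolates the single corrupted entry. Real data would need a principled threshold and rank, which this sketch does not attempt.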

arXiv:1309.0787 [pdf, ps, other]

Online Tensor Methods for Learning Latent Variable Models

Authors: Furong Huang, U. N. Niranjan, Mohammad Umar Hakeem, Animashree Anandkumar

Abstract: We introduce an online tensor decomposition based approach for two latent variable modeling problems, namely (1) community detection, in which we learn the latent communities that the social actors in social networks belong to, and (2) topic modeling, in which we infer hidden topics of text articles. We consider decomposition of moment tensors using stochastic gradient descent. We conduct optimization of multilinear operations in SGD and avoid directly forming the tensors, to save computational and storage costs. We present optimized algorithms on two platforms. Our GPU-based implementation exploits the parallelism of SIMD architectures to allow for maximum speed-up by a careful optimization of storage and data transfer, whereas our CPU-based implementation uses efficient sparse matrix computations and is suitable for large sparse datasets. For the community detection problem, we demonstrate accuracy and computational efficiency on Facebook, Yelp and DBLP datasets, and for the topic modeling problem, we also demonstrate good performance on the New York Times dataset. We compare our results to state-of-the-art algorithms such as the variational method, and report a gain in accuracy and a gain of several orders of magnitude in execution time.

Submitted 3 October, 2015; v1 submitted 3 September, 2013; originally announced September 2013.

Comments: JMLR 2014

Showing 1–6 of 6 results for author: Niranjan, U N