
Showing 51–72 of 72 results for author: Suresh, A T

  1. arXiv:1811.08417  [pdf, other]

    cs.LG cs.CL stat.ML

    WEST: Word Encoded Sequence Transducers

    Authors: Ehsan Variani, Ananda Theertha Suresh, Mitchel Weintraub

    Abstract: Most of the parameters in large vocabulary models are used in the embedding layer to map categorical features to vectors and in the softmax layer for classification weights. This is a bottleneck in memory-constrained on-device training applications like federated learning and on-device inference applications like automatic speech recognition (ASR). One way of compressing the embedding and softmax layers i…

    Submitted 20 November, 2018; originally announced November 2018.

    Comments: 12 pages

  2. arXiv:1805.10559  [pdf, other]

    stat.ML cs.CR cs.LG

    cpSGD: Communication-efficient and differentially-private distributed SGD

    Authors: Naman Agarwal, Ananda Theertha Suresh, Felix Yu, Sanjiv Kumar, H. Brendan McMahan

    Abstract: Distributed stochastic gradient descent is an important subroutine in distributed learning. A setting of particular interest is when the clients are mobile devices, where two important concerns are communication efficiency and the privacy of the clients. Several recent works have focused on reducing the communication cost or introducing privacy guarantees, but none of the proposed communication ef…

    Submitted 26 May, 2018; originally announced May 2018.

  3. arXiv:1711.05448  [pdf, other]

    stat.ML cs.CL cs.LG

    Lattice Rescoring Strategies for Long Short Term Memory Language Models in Speech Recognition

    Authors: Shankar Kumar, Michael Nirschl, Daniel Holtmann-Rice, Hank Liao, Ananda Theertha Suresh, Felix Yu

    Abstract: Recurrent neural network (RNN) language models (LMs) and Long Short Term Memory (LSTM) LMs, a variant of RNN LMs, have been shown to outperform traditional N-gram LMs on speech recognition tasks. However, these models are computationally more expensive than N-gram LMs for decoding, and thus, challenging to integrate into speech recognizers. Recent research has proposed the use of lattice-rescoring…

    Submitted 15 November, 2017; originally announced November 2017.

    Comments: Accepted at ASRU 2017

    Journal ref: Proceedings of ASRU 2017

  4. arXiv:1709.06138  [pdf, other]

    stat.ML cs.AI cs.IT cs.LG

    Model-Powered Conditional Independence Test

    Authors: Rajat Sen, Ananda Theertha Suresh, Karthikeyan Shanmugam, Alexandros G. Dimakis, Sanjay Shakkottai

    Abstract: We consider the problem of non-parametric Conditional Independence testing (CI testing) for continuous random variables. Given i.i.d. samples from the joint distribution $f(x,y,z)$ of continuous random vectors $X$, $Y$ and $Z$, we determine whether $X \perp Y | Z$. We approach this by converting the conditional independence test into a classification problem. This allows us to harness very powerful cl…

    Submitted 18 September, 2017; originally announced September 2017.

    Comments: 19 Pages, 2 figures, Accepted for publication in NIPS 2017

  5. arXiv:1705.05366  [pdf, ps, other]

    cs.LG

    Maximum Selection and Ranking under Noisy Comparisons

    Authors: Moein Falahatgar, Alon Orlitsky, Venkatadheeraj Pichapati, Ananda Theertha Suresh

    Abstract: We consider $(ε,δ)$-PAC maximum-selection and ranking for general probabilistic models whose comparison probabilities satisfy strong stochastic transitivity and stochastic triangle inequality. Modifying the popular knockout tournament, we propose a maximum-selection algorithm that uses $\mathcal{O}\left(\frac{n}{ε^2}\log \frac{1}δ\right)$ comparisons, a number tight up to a constant factor. We th…

    Submitted 15 May, 2017; originally announced May 2017.

  6. arXiv:1705.05006  [pdf, ps, other]

    cs.IT

    Minimax Risk for Missing Mass Estimation

    Authors: Nikhilesh Rajaraman, Andrew Thangaraj, Ananda Theertha Suresh

    Abstract: The problem of estimating the missing mass or total probability of unseen elements in a sequence of $n$ random samples is considered under the squared error loss function. The worst-case risk of the popular Good-Turing estimator is shown to be between $0.6080/n$ and $0.6179/n$. The minimax risk is shown to be lower bounded by $0.25/n$. This appears to be the first such published result on minimax…

    Submitted 14 May, 2017; originally announced May 2017.

    Comments: IEEE International Symposium on Information Theory 2017, Aachen, Germany

  7. arXiv:1702.05574  [pdf, ps, other]

    math.ST cs.IT stat.ML

    Sample complexity of population recovery

    Authors: Yury Polyanskiy, Ananda Theertha Suresh, Yihong Wu

    Abstract: The problem of population recovery refers to estimating a distribution based on incomplete or corrupted samples. Consider a random poll of sample size $n$ conducted on a population of individuals, where each pollee is asked to answer $d$ binary questions. We consider one of the two polling impediments: (a) in lossy population recovery, a pollee may skip each question with probability $ε$, (b) in n…

    Submitted 29 April, 2020; v1 submitted 18 February, 2017; originally announced February 2017.

    Comments: Earlier versions (incl. the one in proceedings) had a mistake in Prop. 9 that propagated to Theorem 1 (lower bound) and Lemma 12. This version (v3) fixes those

  8. arXiv:1611.02960  [pdf, other]

    cs.IT cs.DS cs.LG

    A Unified Maximum Likelihood Approach for Optimal Distribution Property Estimation

    Authors: Jayadev Acharya, Hirakendu Das, Alon Orlitsky, Ananda Theertha Suresh

    Abstract: The advent of data science has spurred interest in estimating properties of distributions over large alphabets. Fundamental symmetric properties such as support size, support coverage, entropy, and proximity to uniformity, received most attention, with each property estimated using a different technique and often intricate analysis tools. We prove that for all these properties, a single, simple,…

    Submitted 28 November, 2016; v1 submitted 9 November, 2016; originally announced November 2016.

  9. arXiv:1611.00429  [pdf, ps, other]

    cs.LG

    Distributed Mean Estimation with Limited Communication

    Authors: Ananda Theertha Suresh, Felix X. Yu, Sanjiv Kumar, H. Brendan McMahan

    Abstract: Motivated by the need for distributed learning and optimization algorithms with low communication cost, we study communication efficient algorithms for distributed mean estimation. Unlike previous works, we make no probabilistic assumptions on the data. We first show that for $d$ dimensional data with $n$ clients, a naive stochastic binary rounding approach yields a mean squared error (MSE) of…

    Submitted 25 September, 2017; v1 submitted 1 November, 2016; originally announced November 2016.

  10. arXiv:1610.09072  [pdf, other]

    cs.LG stat.ML

    Orthogonal Random Features

    Authors: Felix X. Yu, Ananda Theertha Suresh, Krzysztof Choromanski, Daniel Holtmann-Rice, Sanjiv Kumar

    Abstract: We present an intriguing discovery related to Random Fourier Features: in Gaussian kernel approximation, replacing the random Gaussian matrix by a properly scaled random orthogonal matrix significantly decreases kernel approximation error. We call this technique Orthogonal Random Features (ORF), and provide theoretical and empirical justification for this behavior. Motivated by this discovery, we…

    Submitted 27 October, 2016; originally announced October 2016.

    Comments: NIPS 2016

  11. arXiv:1610.05492  [pdf, other]

    cs.LG

    Federated Learning: Strategies for Improving Communication Efficiency

    Authors: Jakub Konečný, H. Brendan McMahan, Felix X. Yu, Peter Richtárik, Ananda Theertha Suresh, Dave Bacon

    Abstract: Federated Learning is a machine learning setting where the goal is to train a high-quality centralized model while training data remains distributed over a large number of clients each with unreliable and relatively slow network connections. We consider learning algorithms for this setting where on each round, each client independently computes an update to the current model based on its local dat…

    Submitted 30 October, 2017; v1 submitted 18 October, 2016; originally announced October 2016.

  12. arXiv:1606.02786  [pdf, ps, other]

    cs.DS cs.IT

    Maximum Selection and Sorting with Adversarial Comparators and an Application to Density Estimation

    Authors: Jayadev Acharya, Moein Falahatgar, Ashkan Jafarpour, Alon Orlitsky, Ananda Theertha Suresh

    Abstract: We study maximum selection and sorting of $n$ numbers using pairwise comparators that output the larger of their two inputs if the inputs are more than a given threshold apart, and output an adversarially-chosen input otherwise. We consider two adversarial models: a non-adaptive adversary that decides on the outcomes in advance based solely on the inputs, and an adaptive adversary that can decide…

    Submitted 8 June, 2016; originally announced June 2016.

  13. arXiv:1511.07428  [pdf, other]

    math.ST stat.ML

    Estimating the number of unseen species: A bird in the hand is worth $\log n$ in the bush

    Authors: Alon Orlitsky, Ananda Theertha Suresh, Yihong Wu

    Abstract: Estimating the number of unseen species is an important problem in many scientific endeavors. Its most popular formulation, introduced by Fisher, uses $n$ samples to predict the number $U$ of hitherto unseen species that would be observed if $t\cdot n$ new samples were collected. Of considerable interest is the largest ratio $t$ between the number of new and existing samples for which $U$ can be a…

    Submitted 2 March, 2016; v1 submitted 23 November, 2015; originally announced November 2015.

  14. arXiv:1504.08070  [pdf, ps, other]

    cs.IT math.ST

    Universal Compression of Power-Law Distributions

    Authors: Moein Falahatgar, Ashkan Jafarpour, Alon Orlitsky, Venkatadheeraj Pichapati, Ananda Theertha Suresh

    Abstract: English words and the outputs of many other natural processes are well-known to follow a Zipf distribution. Yet this thoroughly-established property has never been shown to help compress or predict these important processes. We show that the expected redundancy of Zipf distributions of order $α>1$ is roughly the $1/α$ power of the expected redundancy of unrestricted distributions. Hence for these…

    Submitted 30 April, 2015; v1 submitted 29 April, 2015; originally announced April 2015.

    Comments: 20 pages

  15. arXiv:1504.04103  [pdf, ps, other]

    cs.DS cs.CC cs.LG math.ST

    Faster Algorithms for Testing under Conditional Sampling

    Authors: Moein Falahatgar, Ashkan Jafarpour, Alon Orlitsky, Venkatadheeraj Pichapati, Ananda Theertha Suresh

    Abstract: There has been considerable recent interest in distribution-tests whose run-time and sample requirements are sublinear in the domain-size $k$. We study two of the most important tests under the conditional-sampling model where each query specifies a subset $S$ of the domain, and the response is a sample drawn from $S$ according to the underlying distribution. For identity testing, which asks whe…

    Submitted 16 April, 2015; originally announced April 2015.

    Comments: 31 pages

  16. arXiv:1503.07940  [pdf, other]

    cs.IT cs.DS cs.LG math.ST

    Competitive Distribution Estimation

    Authors: Alon Orlitsky, Ananda Theertha Suresh

    Abstract: Estimating an unknown distribution from its samples is a fundamental problem in statistics. The common, min-max, formulation of this goal considers the performance of the best estimator over all distributions in a class. It shows that with $n$ samples, distributions over $k$ symbols can be learned to a KL divergence that decreases to zero with the sample size $n$, but grows unboundedly with the al…

    Submitted 26 March, 2015; originally announced March 2015.

    Comments: 15 pages

  17. arXiv:1502.07288  [pdf, other]

    cs.IT cs.DS cs.FL

    Automata and Graph Compression

    Authors: Mehryar Mohri, Michael Riley, Ananda Theertha Suresh

    Abstract: We present a theoretical framework for the compression of automata, which are widely used in speech processing and other natural language processing tasks. The framework extends to graph compression. Similar to stationary ergodic processes, we formulate a probabilistic process of graph and automata generation that captures real world phenomena and provide a universal compression scheme LZA for thi…

    Submitted 25 February, 2015; originally announced February 2015.

    Comments: 15 pages

  18. arXiv:1501.01689  [pdf, ps, other]

    cs.DS cs.IT cs.LG

    Sparse Solutions to Nonnegative Linear Systems and Applications

    Authors: Aditya Bhaskara, Ananda Theertha Suresh, Morteza Zadimoghaddam

    Abstract: We give an efficient algorithm for finding sparse approximate solutions to linear systems of equations with nonnegative coefficients. Unlike most known results for sparse recovery, we do not require {\em any} assumption on the matrix other than non-negativity. Our algorithm is combinatorial in nature, inspired by techniques for the set cover problem, as well as the multiplicative weight update met…

    Submitted 7 January, 2015; originally announced January 2015.

    Comments: 22 pages

  19. arXiv:1408.1000  [pdf, other]

    cs.IT cs.DS cs.LG

    Estimating Renyi Entropy of Discrete Distributions

    Authors: Jayadev Acharya, Alon Orlitsky, Ananda Theertha Suresh, Himanshu Tyagi

    Abstract: It was recently shown that estimating the Shannon entropy $H({\rm p})$ of a discrete $k$-symbol distribution ${\rm p}$ requires $Θ(k/\log k)$ samples, a number that grows near-linearly in the support size. In many applications $H({\rm p})$ can be replaced by the more general Rényi entropy of order $α$, $H_α({\rm p})$. We determine the number of samples needed to estimate $H_α({\rm p})$ for all…

    Submitted 10 March, 2016; v1 submitted 2 August, 2014; originally announced August 2014.

  20. arXiv:1405.7460  [pdf, ps, other]

    cs.IT cs.LG

    Universal Compression of Envelope Classes: Tight Characterization via Poisson Sampling

    Authors: Jayadev Acharya, Ashkan Jafarpour, Alon Orlitsky, Ananda Theertha Suresh

    Abstract: The Poisson-sampling technique eliminates dependencies among symbol appearances in a random sequence. It has been used to simplify the analysis and strengthen the performance guarantees of randomized algorithms. Applying this method to universal compression, we relate the redundancies of fixed-length and Poisson-sampled sequences, use the relation to derive a simple single-letter formula that appr…

    Submitted 29 May, 2014; originally announced May 2014.

  21. arXiv:1402.4746  [pdf, ps, other]

    cs.LG cs.DS cs.IT stat.ML

    Near-optimal-sample estimators for spherical Gaussian mixtures

    Authors: Jayadev Acharya, Ashkan Jafarpour, Alon Orlitsky, Ananda Theertha Suresh

    Abstract: Statistical and machine-learning algorithms are frequently applied to high-dimensional data. In many of these applications data is scarce, and often much more costly than computation time. We provide the first sample-efficient polynomial-time estimator for high-dimensional spherical Gaussian mixtures. For mixtures of any $k$ $d$-dimensional spherical Gaussians, we derive an intuitive spectral-es…

    Submitted 19 February, 2014; originally announced February 2014.

  22. Strong Secrecy for Erasure Wiretap Channels

    Authors: Ananda T. Suresh, Arunkumar Subramanian, Andrew Thangaraj, Matthieu Bloch, Steven McLaughlin

    Abstract: We show that duals of certain low-density parity-check (LDPC) codes, when used in a standard coset coding scheme, provide strong secrecy over the binary erasure wiretap channel (BEWC). This result hinges on a stopping set analysis of ensembles of LDPC codes with block length $n$ and girth $\geq 2k$, for some $k \geq 2$. We show that if the minimum left degree of the ensemble is $l_\mathrm{min}$, t…

    Submitted 30 April, 2010; originally announced April 2010.

    Comments: Submitted to the Information Theory Workshop (ITW) 2010, Dublin