Showing 1–44 of 44 results for author: Kamath, G

  1. arXiv:2406.17814

    stat.ML cs.DS cs.IT cs.LG math.ST

    Distribution Learnability and Robustness

    Authors: Shai Ben-David, Alex Bie, Gautam Kamath, Tosca Lechner

    Abstract: We examine the relationship between learnability and robust (or agnostic) learnability for the problem of distribution learning. We show that, contrary to other learning settings (e.g., PAC learning of function classes), realizable learnability of a class of probability distributions does not imply its agnostic learnability. We go on to examine what type of data corruption can disrupt the learnabi…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: In NeurIPS 2023

  2. arXiv:2405.20769

    cs.CR cs.DS cs.LG stat.ML

    Avoiding Pitfalls for Privacy Accounting of Subsampled Mechanisms under Composition

    Authors: Christian Janos Lebeda, Matthew Regehr, Gautam Kamath, Thomas Steinke

    Abstract: We consider the problem of computing tight privacy guarantees for the composition of subsampled differentially private mechanisms. Recent algorithms can numerically compute the privacy parameters to arbitrary precision but must be carefully applied. Our main contribution is to address two common points of confusion. First, some privacy accountants assume that the privacy guarantees for the compo…

    Submitted 27 May, 2024; originally announced May 2024.

  3. arXiv:2405.20405

    cs.DS cs.CR cs.IT cs.LG stat.ML

    Private Mean Estimation with Person-Level Differential Privacy

    Authors: Sushant Agarwal, Gautam Kamath, Mahbod Majid, Argyris Mouzakis, Rose Silver, Jonathan Ullman

    Abstract: We study differentially private (DP) mean estimation in the case where each person holds multiple samples. Commonly referred to as the "user-level" setting, DP here requires the usual notion of distributional stability when all of a person's datapoints can be modified. Informally, if $n$ people each have $m$ samples from an unknown $d$-dimensional distribution with bounded $k$-th moments, we show…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 67 pages, 3 figures

  4. arXiv:2402.00267

    cs.DS cs.CR stat.ML

    Not All Learnable Distribution Classes are Privately Learnable

    Authors: Mark Bun, Gautam Kamath, Argyris Mouzakis, Vikrant Singhal

    Abstract: We give an example of a class of distributions that is learnable in total variation distance with a finite number of samples, but not learnable under $(\varepsilon, δ)$-differential privacy. This refutes a conjecture of Ashtiani.

    Submitted 5 February, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

    Comments: To appear in ALT 2024. Added a minor clarification to the construction and an acknowledgement of the Fields Institute

  5. arXiv:2308.06239

    cs.LG cs.CR stat.ML

    Private Distribution Learning with Public Data: The View from Sample Compression

    Authors: Shai Ben-David, Alex Bie, Clément L. Canonne, Gautam Kamath, Vikrant Singhal

    Abstract: We study the problem of private distribution learning with access to public data. In this setup, which we refer to as public-private learning, the learner is given public and private samples drawn from an unknown distribution $p$ belonging to a class $\mathcal Q$, with the goal of outputting an estimate of $p$ while adhering to privacy constraints (here, pure differential privacy) only with respec…

    Submitted 14 August, 2023; v1 submitted 11 August, 2023; originally announced August 2023.

    Comments: 31 pages

  6. arXiv:2303.01256

    stat.ML cs.CR cs.CV cs.DS cs.LG

    Choosing Public Datasets for Private Machine Learning via Gradient Subspace Distance

    Authors: Xin Gu, Gautam Kamath, Zhiwei Steven Wu

    Abstract: Differentially private stochastic gradient descent privatizes model training by injecting noise into each iteration, where the noise magnitude increases with the number of model parameters. Recent works suggest that we can reduce the noise by leveraging public data for private machine learning, by projecting gradients onto a subspace prescribed by the public data. However, given a choice of public…

    Submitted 2 March, 2023; originally announced March 2023.

  7. arXiv:2301.13334

    math.ST cs.CR cs.DS stat.ML

    A Bias-Variance-Privacy Trilemma for Statistical Estimation

    Authors: Gautam Kamath, Argyris Mouzakis, Matthew Regehr, Vikrant Singhal, Thomas Steinke, Jonathan Ullman

    Abstract: The canonical algorithm for differentially private mean estimation is to first clip the samples to a bounded range and then add noise to their empirical mean. Clipping controls the sensitivity and, hence, the variance of the noise that we add for privacy. But clipping also introduces statistical bias. We prove that this tradeoff is inherent: no algorithm can simultaneously have low bias, low varia…

    Submitted 28 February, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

  8. arXiv:2212.06470

    cs.LG cs.CR stat.ML

    Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining

    Authors: Florian Tramèr, Gautam Kamath, Nicholas Carlini

    Abstract: The performance of differentially private machine learning can be boosted significantly by leveraging the transfer learning capabilities of non-private models pretrained on large public datasets. We critically review this approach. We primarily question whether the use of large Web-scraped datasets should be viewed as differential-privacy-preserving. We caution that publicizing these models pret…

    Submitted 2 June, 2024; v1 submitted 13 December, 2022; originally announced December 2022.

    Comments: ICML 2024

  9. arXiv:2212.05015

    cs.DS cs.CR cs.IT stat.ML

    Robustness Implies Privacy in Statistical Estimation

    Authors: Samuel B. Hopkins, Gautam Kamath, Mahbod Majid, Shyam Narayanan

    Abstract: We study the relationship between adversarial robustness and differential privacy in high-dimensional algorithmic statistics. We give the first black-box reduction from privacy to robustness which can produce private estimators with optimal tradeoffs among sample complexity, accuracy, and privacy for a wide range of fundamental high-dimensional parameter estimation problems, including mean and cov…

    Submitted 15 June, 2024; v1 submitted 9 December, 2022; originally announced December 2022.

    Comments: 90 pages, 2 tables. Appeared in STOC, 2023

  10. arXiv:2208.07984

    cs.LG cs.CR stat.ML

    Private Estimation with Public Data

    Authors: Alex Bie, Gautam Kamath, Vikrant Singhal

    Abstract: We initiate the study of differentially private (DP) estimation with access to a small amount of public data. For private estimation of d-dimensional Gaussians, we assume that the public data comes from a Gaussian that may have vanishing similarity in total variation distance with the underlying Gaussian of the private data. We show that under the constraints of pure or concentrated DP, d+1 public…

    Submitted 5 April, 2023; v1 submitted 16 August, 2022; originally announced August 2022.

    Comments: 55 pages; updated funding acknowledgement + simulation results from NeurIPS 2022 camera-ready

  11. arXiv:2206.02617

    cs.LG cs.CR cs.DS stat.ML

    Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent

    Authors: Da Yu, Gautam Kamath, Janardhan Kulkarni, Tie-Yan Liu, Jian Yin, Huishuai Zhang

    Abstract: Differentially private stochastic gradient descent (DP-SGD) is the workhorse algorithm for recent advances in private deep learning. It provides a single privacy guarantee to all datapoints in the dataset. We propose output-specific $(\varepsilon,δ)$-DP to characterize privacy guarantees for individual examples when releasing models trained by DP-SGD. We also design an efficient algorithm to inves…

    Submitted 2 September, 2023; v1 submitted 6 June, 2022; originally announced June 2022.

    Comments: Published in Transactions on Machine Learning Research (TMLR)

  12. arXiv:2205.08532

    cs.DS cs.CR stat.ML

    New Lower Bounds for Private Estimation and a Generalized Fingerprinting Lemma

    Authors: Gautam Kamath, Argyris Mouzakis, Vikrant Singhal

    Abstract: We prove new lower bounds for statistical estimation tasks under the constraint of $(\varepsilon, δ)$-differential privacy. First, we provide tight lower bounds for private covariance estimation of Gaussian distributions. We show that estimating the covariance matrix in Frobenius norm requires $Ω(d^2)$ samples, and in spectral norm requires $Ω(d^{3/2})$ samples, both matching upper bounds up to lo…

    Submitted 28 March, 2023; v1 submitted 17 May, 2022; originally announced May 2022.

    Comments: NeurIPS 2022. Minor correction to the discussion of independent work

  13. Efficient Mean Estimation with Pure Differential Privacy via a Sum-of-Squares Exponential Mechanism

    Authors: Samuel B. Hopkins, Gautam Kamath, Mahbod Majid

    Abstract: We give the first polynomial-time algorithm to estimate the mean of a $d$-variate probability distribution with bounded covariance from $\tilde{O}(d)$ independent samples subject to pure differential privacy. Prior algorithms for this problem either incur exponential running time, require $Ω(d^{1.5})$ samples, or satisfy only the weaker concentrated or approximate differential privacy conditions.…

    Submitted 2 June, 2022; v1 submitted 25 November, 2021; originally announced November 2021.

    Comments: 66 pages, STOC 2022

  14. arXiv:2111.05320

    cs.DS cs.IT math.ST stat.ML

    Robust Estimation for Random Graphs

    Authors: Jayadev Acharya, Ayush Jain, Gautam Kamath, Ananda Theertha Suresh, Huanyu Zhang

    Abstract: We study the problem of robustly estimating the parameter $p$ of an Erdős-Rényi random graph on $n$ nodes, where a $γ$ fraction of nodes may be adversarially corrupted. After showing the deficiencies of canonical estimators, we design a computationally-efficient spectral algorithm which estimates $p$ up to accuracy $\tilde O(\sqrt{p(1-p)}/n + γ\sqrt{p(1-p)} /\sqrt{n}+ γ/n)$ for $γ< 1/60$. Furtherm…

    Submitted 15 February, 2022; v1 submitted 9 November, 2021; originally announced November 2021.

  15. arXiv:2111.04906

    stat.ML cs.CR cs.LG

    The Role of Adaptive Optimizers for Honest Private Hyperparameter Selection

    Authors: Shubhankar Mohapatra, Sajin Sasy, Xi He, Gautam Kamath, Om Thakkar

    Abstract: Hyperparameter optimization is a ubiquitous challenge in machine learning, and the performance of a trained model depends crucially upon their effective selection. While a rich set of tools exist for this purpose, there are currently no practical hyperparameter selection methods under the constraint of differential privacy (DP). We study honest hyperparameter selection for differentially private m…

    Submitted 8 November, 2021; originally announced November 2021.

  16. arXiv:2111.04609

    stat.ML cs.CR cs.DS cs.IT cs.LG

    A Private and Computationally-Efficient Estimator for Unbounded Gaussians

    Authors: Gautam Kamath, Argyris Mouzakis, Vikrant Singhal, Thomas Steinke, Jonathan Ullman

    Abstract: We give the first polynomial-time, polynomial-sample, differentially private estimator for the mean and covariance of an arbitrary Gaussian distribution $\mathcal{N}(μ,Σ)$ in $\mathbb{R}^d$. All previous estimators are either nonconstructive, with unbounded running time, or require the user to specify a priori bounds on the parameters $μ$ and $Σ$. The primary new technical tool in our algorithm is…

    Submitted 11 February, 2022; v1 submitted 8 November, 2021; originally announced November 2021.

  17. arXiv:2110.14465

    stat.ME cs.CR math.ST

    Unbiased Statistical Estimation and Valid Confidence Intervals Under Differential Privacy

    Authors: Christian Covington, Xi He, James Honaker, Gautam Kamath

    Abstract: We present a method for producing unbiased parameter estimates and valid confidence intervals under the constraints of differential privacy, a formal framework for limiting individual information leakage from sensitive data. Prior work in this area is limited in that it is tailored to calculating confidence intervals for specific statistical procedures, such as mean estimation or simple linear reg…

    Submitted 14 February, 2024; v1 submitted 27 October, 2021; originally announced October 2021.

  18. arXiv:2110.06500

    cs.LG cs.CL cs.CR stat.ML

    Differentially Private Fine-tuning of Language Models

    Authors: Da Yu, Saurabh Naik, Arturs Backurs, Sivakanth Gopi, Huseyin A. Inan, Gautam Kamath, Janardhan Kulkarni, Yin Tat Lee, Andre Manoel, Lukas Wutschitz, Sergey Yekhanin, Huishuai Zhang

    Abstract: We give simpler, sparser, and faster algorithms for differentially private fine-tuning of large-scale pre-trained language models, which achieve the state-of-the-art privacy versus utility tradeoffs on many standard NLP tasks. We propose a meta-framework for this problem, inspired by the recent success of highly parameter-efficient methods for fine-tuning. Our experiments show that differentially…

    Submitted 14 July, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: ICLR 2022. Code available at https://github.com/huseyinatahaninan/Differentially-Private-Fine-tuning-of-Language-Models

  19. arXiv:2106.13414

    cs.DS cs.IT math.PR math.ST stat.ML

    The Price of Tolerance in Distribution Testing

    Authors: Clément L. Canonne, Ayush Jain, Gautam Kamath, Jerry Li

    Abstract: We revisit the problem of tolerant distribution testing. That is, given samples from an unknown distribution $p$ over $\{1, \dots, n\}$, is it $\varepsilon_1$-close to or $\varepsilon_2$-far from a reference distribution $q$ (in total variation distance)? Despite significant interest over the past decade, this problem is well understood only in the extreme cases. In the noiseless setting (i.e.,…

    Submitted 8 November, 2021; v1 submitted 24 June, 2021; originally announced June 2021.

    Comments: Added a result on instance-optimal testing, and further discussion in the introduction

  20. arXiv:2106.01336

    cs.LG cs.CR cs.DS math.OC stat.ML

    Improved Rates for Differentially Private Stochastic Convex Optimization with Heavy-Tailed Data

    Authors: Gautam Kamath, Xingtu Liu, Huanyu Zhang

    Abstract: We study stochastic convex optimization with heavy-tailed data under the constraint of differential privacy (DP). Most prior work on this problem is restricted to the case where the loss function is Lipschitz. Instead, as introduced by Wang, Xiao, Devadas, and Xu \cite{WangXDX20}, we study general convex loss functions with the assumption that the distribution of gradients has bounded $k$-th momen…

    Submitted 1 November, 2022; v1 submitted 2 June, 2021; originally announced June 2021.

  21. arXiv:2104.09732

    stat.ML cs.LG

    Knowledge Distillation as Semiparametric Inference

    Authors: Tri Dao, Govinda M Kamath, Vasilis Syrgkanis, Lester Mackey

    Abstract: A popular approach to model compression is to train an inexpensive student model to mimic the class probabilities of a highly accurate but cumbersome teacher model. Surprisingly, this two-step knowledge distillation process often leads to higher accuracy than training the student directly on labeled data. To explain and enhance this phenomenon, we cast knowledge distillation as a semiparametric in…

    Submitted 19 April, 2021; originally announced April 2021.

  22. arXiv:2011.04832

    cs.LG cs.IT q-bio.GN stat.ML

    Adaptive Learning of Rank-One Models for Efficient Pairwise Sequence Alignment

    Authors: Govinda M. Kamath, Tavor Z. Baharav, Ilan Shomorony

    Abstract: Pairwise alignment of DNA sequencing data is a ubiquitous task in bioinformatics and typically represents a heavy computational burden. State-of-the-art approaches to speed up this task use hashing to identify short segments (k-mers) that are shared by pairs of reads, which can then be used to estimate alignment scores. However, when the number of reads is large, accurately estimating alignment sc…

    Submitted 12 February, 2021; v1 submitted 9 November, 2020; originally announced November 2020.

    Comments: NeurIPS 2020

  23. arXiv:2010.09929

    stat.ML cs.CR cs.DS cs.IT cs.LG

    On the Sample Complexity of Privately Learning Unbounded High-Dimensional Gaussians

    Authors: Ishaq Aden-Ali, Hassan Ashtiani, Gautam Kamath

    Abstract: We provide sample complexity upper bounds for agnostically learning multivariate Gaussians under the constraint of approximate differential privacy. These are the first finite sample upper bounds for general Gaussians which do not impose restrictions on the parameters of the distribution. Our bounds are near-optimal in the case when the covariance is known to be the identity, and conjectured to be…

    Submitted 19 October, 2020; originally announced October 2020.

  24. arXiv:2006.06618

    stat.ML cs.CR cs.DS cs.IT cs.LG math.ST

    CoinPress: Practical Private Mean and Covariance Estimation

    Authors: Sourav Biswas, Yihe Dong, Gautam Kamath, Jonathan Ullman

    Abstract: We present simple differentially private estimators for the mean and covariance of multivariate sub-Gaussian data that are accurate at small sample sizes. We demonstrate the effectiveness of our algorithms both theoretically and empirically using synthetic and real-world datasets -- showing that their asymptotic error rates match the state-of-the-art theoretical bounds, and that they concretely ou…

    Submitted 9 October, 2022; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: Code is available at https://github.com/twistedcubic/coin-press

  25. arXiv:2005.00010

    stat.ML cs.CR cs.DS cs.IT cs.LG

    A Primer on Private Statistics

    Authors: Gautam Kamath, Jonathan Ullman

    Abstract: Differentially private statistical estimation has seen a flurry of developments over the last several years. Study has been divided into two schools of thought, focusing on empirical statistics versus population statistics. We suggest that these two lines of work are more similar than different by giving examples of methods that were initially framed for empirical statistics, but can be applied ju…

    Submitted 30 April, 2020; originally announced May 2020.

    Comments: 20 pages. Comments welcome

  26. arXiv:2004.00010

    cs.DS cs.CR stat.ML

    The Discrete Gaussian for Differential Privacy

    Authors: Clément L. Canonne, Gautam Kamath, Thomas Steinke

    Abstract: A key tool for building differentially private systems is adding Gaussian noise to the output of a function evaluated on a sensitive dataset. Unfortunately, using a continuous distribution presents several practical challenges. First and foremost, finite computers cannot exactly represent samples from continuous distributions, and previous work has demonstrated that seemingly innocuous numerical e…

    Submitted 18 January, 2021; v1 submitted 31 March, 2020; originally announced April 2020.

    Comments: Improved time analysis, and generalisation to the multivariate case

  27. arXiv:2002.12321

    stat.ML cs.CR cs.DS cs.LG math.ST stat.ME

    PAPRIKA: Private Online False Discovery Rate Control

    Authors: Wanrong Zhang, Gautam Kamath, Rachel Cummings

    Abstract: In hypothesis testing, a false discovery occurs when a hypothesis is incorrectly rejected due to noise in the sample. When adaptively testing multiple hypotheses, the probability of a false discovery increases as more tests are performed. Thus the problem of False Discovery Rate (FDR) control is to find a procedure for testing multiple hypotheses that accounts for this effect in determining the se…

    Submitted 20 October, 2020; v1 submitted 27 February, 2020; originally announced February 2020.

  28. arXiv:2002.09465

    cs.DS cs.CR cs.IT cs.LG stat.ML

    Locally Private Hypothesis Selection

    Authors: Sivakanth Gopi, Gautam Kamath, Janardhan Kulkarni, Aleksandar Nikolov, Zhiwei Steven Wu, Huanyu Zhang

    Abstract: We initiate the study of hypothesis selection under local differential privacy. Given samples from an unknown probability distribution $p$ and a set of $k$ probability distributions $\mathcal{Q}$, we aim to output, under the constraints of $\varepsilon$-local differential privacy, a distribution from $\mathcal{Q}$ whose total variation distance to $p$ is comparable to the best such distribution. T…

    Submitted 19 June, 2020; v1 submitted 21 February, 2020; originally announced February 2020.

    Comments: To appear in COLT 2020

  29. arXiv:2002.09464

    cs.DS cs.CR cs.IT cs.LG stat.ML

    Private Mean Estimation of Heavy-Tailed Distributions

    Authors: Gautam Kamath, Vikrant Singhal, Jonathan Ullman

    Abstract: We give new upper and lower bounds on the minimax sample complexity of differentially private mean estimation of distributions with bounded $k$-th moments. Roughly speaking, in the univariate case, we show that $n = Θ\left(\frac{1}{α^2} + \frac{1}{α^{\frac{k}{k-1}}\varepsilon}\right)$ samples are necessary and sufficient to estimate the mean to $α$-accuracy under $\varepsilon$-differential privacy…

    Submitted 16 February, 2021; v1 submitted 21 February, 2020; originally announced February 2020.

    Comments: Appeared in COLT 2020

  30. arXiv:2002.09463

    cs.DS cs.CR cs.LG stat.ML

    Privately Learning Markov Random Fields

    Authors: Huanyu Zhang, Gautam Kamath, Janardhan Kulkarni, Zhiwei Steven Wu

    Abstract: We consider the problem of learning Markov Random Fields (including the prototypical example, the Ising model) under the constraint of differential privacy. Our learning goals include both structure learning, where we try to estimate the underlying graph structure of the model, as well as the harder goal of parameter learning, in which we additionally estimate the parameter on each edge. We provid…

    Submitted 14 August, 2020; v1 submitted 21 February, 2020; originally announced February 2020.

  31. arXiv:1909.03951

    cs.DS cs.CR cs.IT cs.LG stat.ML

    Differentially Private Algorithms for Learning Mixtures of Separated Gaussians

    Authors: Gautam Kamath, Or Sheffet, Vikrant Singhal, Jonathan Ullman

    Abstract: Learning the parameters of Gaussian mixture models is a fundamental and widely studied problem with numerous applications. In this work, we give new algorithms for learning the parameters of a high-dimensional, well separated, Gaussian mixture model subject to the strong constraint of differential privacy. In particular, we give a differentially private analogue of the algorithm of Achlioptas and…

    Submitted 15 October, 2019; v1 submitted 9 September, 2019; originally announced September 2019.

    Comments: To appear in NeurIPS 2019

  32. arXiv:1905.13229

    cs.DS cs.CR cs.LG stat.ML

    Private Hypothesis Selection

    Authors: Mark Bun, Gautam Kamath, Thomas Steinke, Zhiwei Steven Wu

    Abstract: We provide a differentially private algorithm for hypothesis selection. Given samples from an unknown probability distribution $P$ and a set of $m$ probability distributions $\mathcal{H}$, the goal is to output, in a $\varepsilon$-differentially private manner, a distribution from $\mathcal{H}$ whose total variation distance to $P$ is comparable to that of the best such distribution (which we deno…

    Submitted 4 January, 2021; v1 submitted 30 May, 2019; originally announced May 2019.

    Comments: Appeared in NeurIPS 2019. Final version to appear in IEEE Transactions on Information Theory

  33. arXiv:1905.11947

    cs.DS cs.CR cs.IT cs.LG stat.ML

    Private Identity Testing for High-Dimensional Distributions

    Authors: Clément L. Canonne, Gautam Kamath, Audra McMillan, Jonathan Ullman, Lydia Zakynthinou

    Abstract: In this work we present novel differentially private identity (goodness-of-fit) testers for natural and widely studied classes of multivariate product distributions: Gaussians in $\mathbb{R}^d$ with known covariance and product distributions over $\{\pm 1\}^{d}$. Our testers have improved sample complexity compared to those derived from previous techniques, and are the first testers whose sample c…

    Submitted 3 March, 2022; v1 submitted 28 May, 2019; originally announced May 2019.

    Comments: Discussing a mistake in the proof of one of the algorithms (Theorem 1.2, computationally inefficient tester), and pointing to follow-up work by Narayanan (2022) who improves upon our results and fixes this mistake

  34. arXiv:1811.11148

    cs.DS cs.CR cs.IT cs.LG stat.ML

    The Structure of Optimal Private Tests for Simple Hypotheses

    Authors: Clément L. Canonne, Gautam Kamath, Audra McMillan, Adam Smith, Jonathan Ullman

    Abstract: Hypothesis testing plays a central role in statistical inference, and is used in many settings where privacy concerns are paramount. This work answers a basic question about privately testing simple hypotheses: given two distributions $P$ and $Q$, and a privacy level $\varepsilon$, how many i.i.d. samples are needed to distinguish $P$ from $Q$ subject to $\varepsilon$-differential privacy, and wha…

    Submitted 2 April, 2019; v1 submitted 27 November, 2018; originally announced November 2018.

    Comments: To appear in STOC 2019

  35. arXiv:1805.08321

    cs.LG cs.DS cs.IT stat.CO stat.ML

    Bandit-Based Monte Carlo Optimization for Nearest Neighbors

    Authors: Vivek Bagaria, Tavor Z. Baharav, Govinda M. Kamath, David N. Tse

    Abstract: The celebrated Monte Carlo method estimates an expensive-to-compute quantity by random sampling. Bandit-based Monte Carlo optimization is a general technique for computing the minimum of many such expensive-to-compute quantities by adaptive random sampling. The technique converts an optimization problem into a statistical estimation problem which is then solved via multi-armed bandits. We apply th…

    Submitted 28 April, 2021; v1 submitted 21 May, 2018; originally announced May 2018.

    Comments: Accepted to the IEEE Journal on Selected Areas in Information Theory (JSAIT) - Special Issue on Sequential, Active, and Reinforcement Learning

  36. arXiv:1805.00216

    cs.DS cs.CR cs.LG stat.ML

    Privately Learning High-Dimensional Distributions

    Authors: Gautam Kamath, Jerry Li, Vikrant Singhal, Jonathan Ullman

    Abstract: We present novel, computationally efficient, and differentially private algorithms for two fundamental high-dimensional learning problems: learning a multivariate Gaussian and learning a product distribution over the Boolean hypercube in total variation distance. The sample complexity of our algorithms nearly matches the sample complexity of the optimal non-private learners for these tasks in a wi…

    Submitted 30 May, 2019; v1 submitted 1 May, 2018; originally announced May 2018.

    Comments: To appear in COLT 2019

  37. arXiv:1803.02815

    cs.LG cs.AI cs.DS stat.ML

    Sever: A Robust Meta-Algorithm for Stochastic Optimization

    Authors: Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Jacob Steinhardt, Alistair Stewart

    Abstract: In high dimensions, most machine learning methods are brittle to even a small fraction of structured outliers. To address this, we introduce a new meta-algorithm that can take in a base learner such as least squares or stochastic gradient descent, and harden the learner to be resistant to outliers. Our method, Sever, possesses strong theoretical guarantees yet is also highly scalable -- beyond run…

    Submitted 29 May, 2019; v1 submitted 7 March, 2018; originally announced March 2018.

    Comments: To appear in ICML 2019

  38. arXiv:1802.07229

    cs.LG cs.DS stat.ML

    Actively Avoiding Nonsense in Generative Models

    Authors: Steve Hanneke, Adam Kalai, Gautam Kamath, Christos Tzamos

    Abstract: A generative model may generate utter nonsense when it is fit to maximize the likelihood of observed data. This happens due to "model error," i.e., when the true data generating distribution does not fit within the class of generative models being learned. To address this, we propose a model of active distribution learning using a binary invalidity oracle that identifies some examples as clearly i…

    Submitted 20 February, 2018; originally announced February 2018.

  39. arXiv:1711.00817

    stat.ML cs.DS cs.IT cs.LG

    Medoids in almost linear time via multi-armed bandits

    Authors: Vivek Bagaria, Govinda M. Kamath, Vasilis Ntranos, Martin J. Zhang, David Tse

    Abstract: Computing the medoid of a large number of points in high-dimensional space is an increasingly common operation in many data science problems. We present an algorithm Med-dit which uses O(n log n) distance evaluations to compute the medoid with high probability. Med-dit is based on a connection with the multi-armed bandit problem. We evaluate the performance of Med-dit empirically on the Netflix-pr…

    Submitted 7 November, 2017; v1 submitted 2 November, 2017; originally announced November 2017.

  40. arXiv:1710.04170

    math.PR cs.LG math-ph math.ST stat.ML

    Concentration of Multilinear Functions of the Ising Model with Applications to Network Data

    Authors: Constantinos Daskalakis, Nishanth Dikkala, Gautam Kamath

    Abstract: We prove near-tight concentration of measure for polynomial functions of the Ising model under high temperature. For any degree $d$, we show that a degree-$d$ polynomial of a $n$-spin Ising model exhibits exponential tails that scale as $\exp(-r^{2/d})$ at radius $r=\tilde{Ω}_d(n^{d/2})$. Our concentration radius is optimal up to logarithmic factors for constant $d$, improving known results by polyn…

    Submitted 11 October, 2017; originally announced October 2017.

    Comments: To appear in NIPS 2017

  41. arXiv:1704.03866

    cs.DS cs.IT cs.LG math.ST stat.ML

    Robustly Learning a Gaussian: Getting Optimal Error, Efficiently

    Authors: Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Ankur Moitra, Alistair Stewart

    Abstract: We study the fundamental problem of learning the parameters of a high-dimensional Gaussian in the presence of noise -- where an $\varepsilon$-fraction of our samples were chosen by an adversary. We give robust estimators that achieve estimation error $O(\varepsilon)$ in the total variation distance, which is optimal up to a universal constant that is independent of the dimension. In the case whe…

    Submitted 5 November, 2017; v1 submitted 12 April, 2017; originally announced April 2017.

    Comments: To appear in SODA 2018

  42. arXiv:1703.00893

    cs.LG cs.DS cs.IT stat.ML

    Being Robust (in High Dimensions) Can Be Practical

    Authors: Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Ankur Moitra, Alistair Stewart

    Abstract: Robust estimation is much more challenging in high dimensions than it is in one dimension: Most techniques either lead to intractable optimization problems or estimators that can tolerate only a tiny fraction of errors. Recent work in theoretical computer science has shown that, in appropriate distributional models, it is possible to robustly estimate the mean and covariance with polynomial time a…

    Submitted 13 March, 2018; v1 submitted 2 March, 2017; originally announced March 2017.

    Comments: Appeared in ICML 2017

  43. arXiv:1604.06443

    cs.DS cs.IT cs.LG math.ST stat.ML

    Robust Estimators in High Dimensions without the Computational Intractability

    Authors: Ilias Diakonikolas, Gautam Kamath, Daniel Kane, Jerry Li, Ankur Moitra, Alistair Stewart

    Abstract: We study high-dimensional distribution learning in an agnostic setting where an adversary is allowed to arbitrarily corrupt an $\varepsilon$-fraction of the samples. Such questions have a rich history spanning statistics, machine learning and theoretical computer science. Even in the most basic settings, the only known approaches are either computationally inefficient or lose dimension-dependent f…

    Submitted 14 March, 2019; v1 submitted 21 April, 2016; originally announced April 2016.

  44. arXiv:1502.01975

    cs.IT cs.CE q-bio.GN stat.AP

    Optimal Haplotype Assembly from High-Throughput Mate-Pair Reads

    Authors: Govinda M. Kamath, Eren Şaşoğlu, David Tse

    Abstract: Humans have $23$ pairs of homologous chromosomes. The homologous pairs are almost identical pairs of chromosomes. For the most part, differences in homologous chromosomes occur at certain documented positions called single nucleotide polymorphisms (SNPs). A haplotype of an individual is the pair of sequences of SNPs on the two homologous chromosomes. In this paper, we study the problem of inferring…

    Submitted 6 February, 2015; originally announced February 2015.

    Comments: 10 pages, 4 figures, Submitted to ISIT 2015
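
Illustrative note on entry 7 (arXiv:2301.13334): the abstract describes the canonical private mean estimator as "clip the samples to a bounded range and then add noise to their empirical mean". The sketch below is one possible reading of that description, not code from the paper. It assumes scalar data, a publicly known sample count, replace-one neighboring datasets, and Laplace noise for pure $\varepsilon$-differential privacy. The function names and the clipping bounds `lower` and `upper` are hypothetical.

```python
import random


def clipped_mean(samples, lower, upper):
    """Empirical mean after clipping every sample to [lower, upper]."""
    clipped = [min(max(x, lower), upper) for x in samples]
    return sum(clipped) / len(clipped)


def private_clipped_mean(samples, lower, upper, epsilon, rng=None):
    """Clip-then-noise mean estimate (assumed Laplace mechanism).

    Replacing one sample moves the clipped mean by at most
    (upper - lower) / n, so Laplace noise with scale
    sensitivity / epsilon is added to the clipped mean.
    """
    rng = rng or random.Random()
    n = len(samples)
    sensitivity = (upper - lower) / n
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return clipped_mean(samples, lower, upper) + noise


if __name__ == "__main__":
    data = [0.0, 1.0, 100.0]
    # Clipping to [0, 10] pulls the outlier in: the clipped mean is 11/3,
    # while the unclipped mean is 101/3. This gap is the bias the abstract
    # refers to.
    print(clipped_mean(data, 0.0, 10.0))
    print(private_clipped_mean(data, 0.0, 10.0, epsilon=1.0))
```

A tighter clipping range lowers the sensitivity and therefore the noise scale, but it moves the clipped mean further from the true mean whenever samples fall outside the range.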