Search | arXiv e-print repository

arXiv:2405.20405 [pdf, other]

Private Mean Estimation with Person-Level Differential Privacy

Authors: Sushant Agarwal, Gautam Kamath, Mahbod Majid, Argyris Mouzakis, Rose Silver, Jonathan Ullman

Abstract: We study differentially private (DP) mean estimation in the case where each person holds multiple samples. Commonly referred to as the "user-level" setting, DP here requires the usual notion of distributional stability when all of a person's datapoints can be modified. Informally, if $n$ people each have $m$ samples from an unknown $d$-dimensional distribution with bounded $k$-th moments, we show… ▽ More We study differentially private (DP) mean estimation in the case where each person holds multiple samples. Commonly referred to as the "user-level" setting, DP here requires the usual notion of distributional stability when all of a person's datapoints can be modified. Informally, if $n$ people each have $m$ samples from an unknown $d$-dimensional distribution with bounded $k$-th moments, we show that \[n = \tilde Θ\left(\frac{d}{α^2 m} + \frac{d }{ αm^{1/2} \varepsilon} + \frac{d}{α^{k/(k-1)} m \varepsilon} + \frac{d}{\varepsilon}\right)\] people are necessary and sufficient to estimate the mean up to distance $α$ in $\ell_2$-norm under $\varepsilon$-differential privacy (and its common relaxations). In the multivariate setting, we give computationally efficient algorithms under approximate DP (with slightly degraded sample complexity) and computationally inefficient algorithms under pure DP, and our nearly matching lower bounds hold for the most permissive case of approximate DP. Our computationally efficient estimators are based on the well known noisy-clipped-mean approach, but the analysis for our setting requires new bounds on the tails of sums of independent, vector-valued, bounded-moments random variables, and a new argument for bounding the bias introduced by clip**. △ Less

Submitted 30 May, 2024; originally announced May 2024.

Comments: 67 pages, 3 figures

arXiv:2402.00267 [pdf, ps, other]

Not All Learnable Distribution Classes are Privately Learnable

Authors: Mark Bun, Gautam Kamath, Argyris Mouzakis, Vikrant Singhal

Abstract: We give an example of a class of distributions that is learnable in total variation distance with a finite number of samples, but not learnable under $(\varepsilon, δ)$-differential privacy. This refutes a conjecture of Ashtiani. We give an example of a class of distributions that is learnable in total variation distance with a finite number of samples, but not learnable under $(\varepsilon, δ)$-differential privacy. This refutes a conjecture of Ashtiani. △ Less

Submitted 5 February, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

Comments: To appear in ALT 2024. Added a minor clarification to the construction and an acknowledgement of the Fields Institute

arXiv:2301.13334 [pdf, ps, other]

A Bias-Variance-Privacy Trilemma for Statistical Estimation

Authors: Gautam Kamath, Argyris Mouzakis, Matthew Regehr, Vikrant Singhal, Thomas Steinke, Jonathan Ullman

Abstract: The canonical algorithm for differentially private mean estimation is to first clip the samples to a bounded range and then add noise to their empirical mean. Clip** controls the sensitivity and, hence, the variance of the noise that we add for privacy. But clip** also introduces statistical bias. We prove that this tradeoff is inherent: no algorithm can simultaneously have low bias, low varia… ▽ More The canonical algorithm for differentially private mean estimation is to first clip the samples to a bounded range and then add noise to their empirical mean. Clip** controls the sensitivity and, hence, the variance of the noise that we add for privacy. But clip** also introduces statistical bias. We prove that this tradeoff is inherent: no algorithm can simultaneously have low bias, low variance, and low privacy loss for arbitrary distributions. On the positive side, we show that unbiased mean estimation is possible under approximate differential privacy if we assume that the distribution is symmetric. Furthermore, we show that, even if we assume that the data is sampled from a Gaussian, unbiased mean estimation is impossible under pure or concentrated differential privacy. △ Less

Submitted 28 February, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

arXiv:2205.08532 [pdf, ps, other]

New Lower Bounds for Private Estimation and a Generalized Fingerprinting Lemma

Authors: Gautam Kamath, Argyris Mouzakis, Vikrant Singhal

Abstract: We prove new lower bounds for statistical estimation tasks under the constraint of $(\varepsilon, δ)$-differential privacy. First, we provide tight lower bounds for private covariance estimation of Gaussian distributions. We show that estimating the covariance matrix in Frobenius norm requires $Ω(d^2)$ samples, and in spectral norm requires $Ω(d^{3/2})$ samples, both matching upper bounds up to lo… ▽ More We prove new lower bounds for statistical estimation tasks under the constraint of $(\varepsilon, δ)$-differential privacy. First, we provide tight lower bounds for private covariance estimation of Gaussian distributions. We show that estimating the covariance matrix in Frobenius norm requires $Ω(d^2)$ samples, and in spectral norm requires $Ω(d^{3/2})$ samples, both matching upper bounds up to logarithmic factors. The latter bound verifies the existence of a conjectured statistical gap between the private and the non-private sample complexities for spectral estimation of Gaussian covariances. We prove these bounds via our main technical contribution, a broad generalization of the fingerprinting method to exponential families. Additionally, using the private Assouad method of Acharya, Sun, and Zhang, we show a tight $Ω(d/(α^2 \varepsilon))$ lower bound for estimating the mean of a distribution with bounded covariance to $α$-error in $\ell_2$-distance. Prior known lower bounds for all these problems were either polynomially weaker or held under the stricter condition of $(\varepsilon, 0)$-differential privacy. △ Less

Submitted 28 March, 2023; v1 submitted 17 May, 2022; originally announced May 2022.

Comments: NeurIPS 2022. Minor correction to the discussion of independent work

arXiv:2111.04609 [pdf, ps, other]

A Private and Computationally-Efficient Estimator for Unbounded Gaussians

Authors: Gautam Kamath, Argyris Mouzakis, Vikrant Singhal, Thomas Steinke, Jonathan Ullman

Abstract: We give the first polynomial-time, polynomial-sample, differentially private estimator for the mean and covariance of an arbitrary Gaussian distribution $\mathcal{N}(μ,Σ)$ in $\mathbb{R}^d$. All previous estimators are either nonconstructive, with unbounded running time, or require the user to specify a priori bounds on the parameters $μ$ and $Σ$. The primary new technical tool in our algorithm is… ▽ More We give the first polynomial-time, polynomial-sample, differentially private estimator for the mean and covariance of an arbitrary Gaussian distribution $\mathcal{N}(μ,Σ)$ in $\mathbb{R}^d$. All previous estimators are either nonconstructive, with unbounded running time, or require the user to specify a priori bounds on the parameters $μ$ and $Σ$. The primary new technical tool in our algorithm is a new differentially private preconditioner that takes samples from an arbitrary Gaussian $\mathcal{N}(0,Σ)$ and returns a matrix $A$ such that $A ΣA^T$ has constant condition number. △ Less

Submitted 11 February, 2022; v1 submitted 8 November, 2021; originally announced November 2021.

Showing 1–5 of 5 results for author: Mouzakis, A