-
Private Mean Estimation with Person-Level Differential Privacy
Authors:
Sushant Agarwal,
Gautam Kamath,
Mahbod Majid,
Argyris Mouzakis,
Rose Silver,
Jonathan Ullman
Abstract:
We study differentially private (DP) mean estimation in the case where each person holds multiple samples. Commonly referred to as the "user-level" setting, DP here requires the usual notion of distributional stability when all of a person's datapoints can be modified. Informally, if $n$ people each have $m$ samples from an unknown $d$-dimensional distribution with bounded $k$-th moments, we show that
\[n = \tilde{Θ}\left(\frac{d}{α^2 m} + \frac{d}{α m^{1/2} \varepsilon} + \frac{d}{α^{k/(k-1)} m \varepsilon} + \frac{d}{\varepsilon}\right)\]
people are necessary and sufficient to estimate the mean up to distance $α$ in $\ell_2$-norm under $\varepsilon$-differential privacy (and its common relaxations). In the multivariate setting, we give computationally efficient algorithms under approximate DP (with slightly degraded sample complexity) and computationally inefficient algorithms under pure DP, and our nearly matching lower bounds hold for the most permissive case of approximate DP. Our computationally efficient estimators are based on the well known noisy-clipped-mean approach, but the analysis for our setting requires new bounds on the tails of sums of independent, vector-valued, bounded-moments random variables, and a new argument for bounding the bias introduced by clipping.
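To get a feel for how the four terms in the bound above trade off, the following minimal sketch evaluates each one for given parameters. The function name and argument names are hypothetical, and because $\tilde Θ$ hides constants and logarithmic factors, the outputs indicate scaling only, not actual sample sizes.

```python
import math


def person_level_bound_terms(d, m, alpha, epsilon, k):
    """Evaluate the four terms of the stated bound on n, in order.

    Illustration only: constants and log factors hidden by the
    tilde-Theta notation are dropped, so only relative scaling is meaningful.
    """
    if k <= 1:
        raise ValueError("the exponent k/(k-1) requires k > 1")
    return [
        d / (alpha ** 2 * m),                       # d / (alpha^2 m)
        d / (alpha * math.sqrt(m) * epsilon),       # d / (alpha m^{1/2} eps)
        d / (alpha ** (k / (k - 1)) * m * epsilon), # d / (alpha^{k/(k-1)} m eps)
        d / epsilon,                                # d / eps
    ]


terms = person_level_bound_terms(d=10, m=4, alpha=0.5, epsilon=1.0, k=2)
dominant = max(terms)  # the sum is within a factor 4 of its largest term
```

Only the first three terms shrink as the number of samples per person $m$ grows; the last term, $d/\varepsilon$, does not depend on $m$.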
Submitted 30 May, 2024;
originally announced May 2024.
-
Not All Learnable Distribution Classes are Privately Learnable
Authors:
Mark Bun,
Gautam Kamath,
Argyris Mouzakis,
Vikrant Singhal
Abstract:
We give an example of a class of distributions that is learnable in total variation distance with a finite number of samples, but not learnable under $(\varepsilon, δ)$-differential privacy. This refutes a conjecture of Ashtiani.
Submitted 5 February, 2024; v1 submitted 31 January, 2024;
originally announced February 2024.
-
A Bias-Variance-Privacy Trilemma for Statistical Estimation
Authors:
Gautam Kamath,
Argyris Mouzakis,
Matthew Regehr,
Vikrant Singhal,
Thomas Steinke,
Jonathan Ullman
Abstract:
The canonical algorithm for differentially private mean estimation is to first clip the samples to a bounded range and then add noise to their empirical mean. Clipping controls the sensitivity and, hence, the variance of the noise that we add for privacy. But clipping also introduces statistical bias. We prove that this tradeoff is inherent: no algorithm can simultaneously have low bias, low variance, and low privacy loss for arbitrary distributions.
On the positive side, we show that unbiased mean estimation is possible under approximate differential privacy if we assume that the distribution is symmetric. Furthermore, we show that, even if we assume that the data is sampled from a Gaussian, unbiased mean estimation is impossible under pure or concentrated differential privacy.
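The clip-then-add-noise recipe described in this abstract can be sketched in one dimension as follows. This is a minimal illustration, not the paper's estimator: the function name, the clipping range `[lo, hi]`, and the choice of Laplace noise under the replace-one-sample neighboring relation are all assumptions made here, and naive floating-point noise sampling like this is not suitable for production use.

```python
import random


def noisy_clipped_mean(samples, lo, hi, epsilon, rng=None):
    """Clip each sample to [lo, hi], average, then add Laplace noise.

    Assumes neighboring datasets differ by replacing one sample, so the
    clipped mean has sensitivity (hi - lo) / n.
    """
    if not samples:
        raise ValueError("need at least one sample")
    if hi <= lo or epsilon <= 0:
        raise ValueError("need lo < hi and epsilon > 0")
    rng = rng or random.Random()
    n = len(samples)
    clipped_mean = sum(min(max(x, lo), hi) for x in samples) / n
    scale = (hi - lo) / (n * epsilon)  # Laplace scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return clipped_mean + noise


# Skewed data: one large outlier among many small values.
data = [0.1] * 99 + [100.0]
empirical_mean = sum(data) / len(data)
estimate = noisy_clipped_mean(data, lo=0.0, hi=1.0, epsilon=1.0,
                              rng=random.Random(0))
```

On this skewed example the clipped mean is far below the empirical mean regardless of the noise, which is the bias the abstract refers to; widening `[lo, hi]` reduces that bias but increases the noise scale.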
Submitted 28 February, 2023; v1 submitted 30 January, 2023;
originally announced January 2023.
-
New Lower Bounds for Private Estimation and a Generalized Fingerprinting Lemma
Authors:
Gautam Kamath,
Argyris Mouzakis,
Vikrant Singhal
Abstract:
We prove new lower bounds for statistical estimation tasks under the constraint of $(\varepsilon, δ)$-differential privacy. First, we provide tight lower bounds for private covariance estimation of Gaussian distributions. We show that estimating the covariance matrix in Frobenius norm requires $Ω(d^2)$ samples, and in spectral norm requires $Ω(d^{3/2})$ samples, both matching upper bounds up to logarithmic factors. The latter bound verifies the existence of a conjectured statistical gap between the private and the non-private sample complexities for spectral estimation of Gaussian covariances. We prove these bounds via our main technical contribution, a broad generalization of the fingerprinting method to exponential families. Additionally, using the private Assouad method of Acharya, Sun, and Zhang, we show a tight $Ω(d/(α^2 \varepsilon))$ lower bound for estimating the mean of a distribution with bounded covariance to $α$-error in $\ell_2$-distance. Prior known lower bounds for all these problems were either polynomially weaker or held under the stricter condition of $(\varepsilon, 0)$-differential privacy.
Submitted 28 March, 2023; v1 submitted 17 May, 2022;
originally announced May 2022.
-
A Private and Computationally-Efficient Estimator for Unbounded Gaussians
Authors:
Gautam Kamath,
Argyris Mouzakis,
Vikrant Singhal,
Thomas Steinke,
Jonathan Ullman
Abstract:
We give the first polynomial-time, polynomial-sample, differentially private estimator for the mean and covariance of an arbitrary Gaussian distribution $\mathcal{N}(μ,Σ)$ in $\mathbb{R}^d$. All previous estimators are either nonconstructive, with unbounded running time, or require the user to specify a priori bounds on the parameters $μ$ and $Σ$. The primary new technical tool in our algorithm is a new differentially private preconditioner that takes samples from an arbitrary Gaussian $\mathcal{N}(0,Σ)$ and returns a matrix $A$ such that $A ΣA^T$ has constant condition number.
Submitted 11 February, 2022; v1 submitted 8 November, 2021;
originally announced November 2021.