Search | arXiv e-print repository

Private Geometric Median

Authors: Mahdi Haghifam, Thomas Steinke, Jonathan Ullman

Abstract: In this paper, we study differentially private (DP) algorithms for computing the geometric median (GM) of a dataset: Given $n$ points, $x_1,\dots,x_n$ in $\mathbb{R}^d$, the goal is to find a point $θ$ that minimizes the sum of the Euclidean distances to these points, i.e., $\sum_{i=1}^{n} \|θ- x_i\|_2$. Off-the-shelf methods, such as DP-GD, require strong a priori knowledge locating the data with… ▽ More In this paper, we study differentially private (DP) algorithms for computing the geometric median (GM) of a dataset: Given $n$ points, $x_1,\dots,x_n$ in $\mathbb{R}^d$, the goal is to find a point $θ$ that minimizes the sum of the Euclidean distances to these points, i.e., $\sum_{i=1}^{n} \|θ- x_i\|_2$. Off-the-shelf methods, such as DP-GD, require strong a priori knowledge locating the data within a ball of radius $R$, and the excess risk of the algorithm depends linearly on $R$. In this paper, we ask: can we design an efficient and private algorithm with an excess error guarantee that scales with the (unknown) radius containing the majority of the datapoints? Our main contribution is a pair of polynomial-time DP algorithms for the task of private GM with an excess error guarantee that scales with the effective diameter of the datapoints. Additionally, we propose an inefficient algorithm based on the inverse smooth sensitivity mechanism, which satisfies the more restrictive notion of pure DP. We complement our results with a lower bound and demonstrate the optimality of our polynomial-time algorithms in terms of sample complexity. △ Less

Submitted 11 June, 2024; originally announced June 2024.

Comments: 36 pages

arXiv:2405.20769 [pdf, other]

Avoiding Pitfalls for Privacy Accounting of Subsampled Mechanisms under Composition

Authors: Christian Janos Lebeda, Matthew Regehr, Gautam Kamath, Thomas Steinke

Abstract: We consider the problem of computing tight privacy guarantees for the composition of subsampled differentially private mechanisms. Recent algorithms can numerically compute the privacy parameters to arbitrary precision but must be carefully applied. Our main contribution is to address two common points of confusion. First, some privacy accountants assume that the privacy guarantees for the compo… ▽ More We consider the problem of computing tight privacy guarantees for the composition of subsampled differentially private mechanisms. Recent algorithms can numerically compute the privacy parameters to arbitrary precision but must be carefully applied. Our main contribution is to address two common points of confusion. First, some privacy accountants assume that the privacy guarantees for the composition of a subsampled mechanism are determined by self-composing the worst-case datasets for the uncomposed mechanism. We show that this is not true in general. Second, Poisson subsampling is sometimes assumed to have similar privacy guarantees compared to sampling without replacement. We show that the privacy guarantees may in fact differ significantly between the two sampling schemes. In particular, we give an example of hyperparameters that result in $\varepsilon \approx 1$ for Poisson subsampling and $\varepsilon > 10$ for sampling without replacement. This occurs for some parameters that could realistically be chosen for DP-SGD. △ Less

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2404.16706 [pdf, other]

Efficient and Near-Optimal Noise Generation for Streaming Differential Privacy

Authors: Krishnamurthy Dvijotham, H. Brendan McMahan, Krishna Pillutla, Thomas Steinke, Abhradeep Thakurta

Abstract: In the task of differentially private (DP) continual counting, we receive a stream of increments and our goal is to output an approximate running total of these increments, without revealing too much about any specific increment. Despite its simplicity, differentially private continual counting has attracted significant attention both in theory and in practice. Existing algorithms for differential… ▽ More In the task of differentially private (DP) continual counting, we receive a stream of increments and our goal is to output an approximate running total of these increments, without revealing too much about any specific increment. Despite its simplicity, differentially private continual counting has attracted significant attention both in theory and in practice. Existing algorithms for differentially private continual counting are either inefficient in terms of their space usage or add an excessive amount of noise, inducing suboptimal utility. The most practical DP continual counting algorithms add carefully correlated Gaussian noise to the values. The task of choosing the covariance for this noise can be expressed in terms of factoring the lower-triangular matrix of ones (which computes prefix sums). We present two approaches from this class (for different parameter regimes) that achieve near-optimal utility for DP continual counting and only require logarithmic or polylogarithmic space (and time). Our first approach is based on a space-efficient streaming matrix multiplication algorithm for a class of Toeplitz matrices. We show that to instantiate this algorithm for DP continual counting, it is sufficient to find a low-degree rational function that approximates the square root on a circle in the complex plane. We then apply and extend tools from approximation theory to achieve this. We also derive efficient closed-forms for the objective function for arbitrarily many steps, and show direct numerical optimization yields a highly practical solution to the problem. Our second approach combines our first approach with a recursive construction similar to the binary tree mechanism. △ Less

Submitted 6 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

arXiv:2403.06634 [pdf, other]

Stealing Part of a Production Language Model

Authors: Nicholas Carlini, Daniel Paleka, Krishnamurthy Dj Dvijotham, Thomas Steinke, Jonathan Hayase, A. Feder Cooper, Katherine Lee, Matthew Jagielski, Milad Nasr, Arthur Conmy, Itay Yona, Eric Wallace, David Rolnick, Florian Tramèr

Abstract: We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, our attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under \… ▽ More We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, our attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under \$20 USD, our attack extracts the entire projection matrix of OpenAI's Ada and Babbage language models. We thereby confirm, for the first time, that these black-box models have a hidden dimension of 1024 and 2048, respectively. We also recover the exact hidden dimension size of the gpt-3.5-turbo model, and estimate it would cost under $2,000 in queries to recover the entire projection matrix. We conclude with potential defenses and mitigations, and discuss the implications of possible future work that could extend our attack. △ Less

Submitted 9 July, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

arXiv:2311.07206 [pdf, other]

Efficient adaptivity for simulating cardiac electrophysiology with spectral deferred correction methods

Authors: Fatemeh Chegini, Thomas Steinke, Martin Weiser

Abstract: The locality of solution features in cardiac electrophysiology simulations calls for adaptive methods. Due to the overhead incurred by established mesh refinement and coarsening, however, such approaches failed in accelerating the computations. Here we investigate a different route to spatial adaptivity that is based on nested subset selection for algebraic degrees of freedom in spectral deferred… ▽ More The locality of solution features in cardiac electrophysiology simulations calls for adaptive methods. Due to the overhead incurred by established mesh refinement and coarsening, however, such approaches failed in accelerating the computations. Here we investigate a different route to spatial adaptivity that is based on nested subset selection for algebraic degrees of freedom in spectral deferred correction methods. This combination of algebraic adaptivity and iterative solvers for higher order collocation time step** realizes a multirate integration with minimal overhead. This leads to moderate but significant speedups in both monodomain and cell-by-cell models of cardiac excitation, as demonstrated at four numerical examples. △ Less

Submitted 13 November, 2023; originally announced November 2023.

Comments: 12 pages, 12 figures

MSC Class: 65M60; 65M70; 65M50; 92C30

arXiv:2310.15526 [pdf, other]

Privacy Amplification for Matrix Mechanisms

Authors: Christopher A. Choquette-Choo, Arun Ganesh, Thomas Steinke, Abhradeep Thakurta

Abstract: Privacy amplification exploits randomness in data selection to provide tighter differential privacy (DP) guarantees. This analysis is key to DP-SGD's success in machine learning, but, is not readily applicable to the newer state-of-the-art algorithms. This is because these algorithms, known as DP-FTRL, use the matrix mechanism to add correlated noise instead of independent noise as in DP-SGD. In… ▽ More Privacy amplification exploits randomness in data selection to provide tighter differential privacy (DP) guarantees. This analysis is key to DP-SGD's success in machine learning, but, is not readily applicable to the newer state-of-the-art algorithms. This is because these algorithms, known as DP-FTRL, use the matrix mechanism to add correlated noise instead of independent noise as in DP-SGD. In this paper, we propose "MMCC", the first algorithm to analyze privacy amplification via sampling for any generic matrix mechanism. MMCC is nearly tight in that it approaches a lower bound as $ε\to0$. To analyze correlated outputs in MMCC, we prove that they can be analyzed as if they were independent, by conditioning them on prior outputs. Our "conditional composition theorem" has broad utility: we use it to show that the noise added to binary-tree-DP-FTRL can asymptotically match the noise added to DP-SGD with amplification. Our amplification algorithm also has practical empirical utility: we show it leads to significant improvement in the privacy-utility trade-offs for DP-FTRL algorithms on standard benchmarks. △ Less

Submitted 4 May, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

Comments: Appearing in ICLR 2024. Changes made to match the conference version of the paper

arXiv:2310.06771 [pdf, other]

Correlated Noise Provably Beats Independent Noise for Differentially Private Learning

Authors: Christopher A. Choquette-Choo, Krishnamurthy Dvijotham, Krishna Pillutla, Arun Ganesh, Thomas Steinke, Abhradeep Thakurta

Abstract: Differentially private learning algorithms inject noise into the learning process. While the most common private learning algorithm, DP-SGD, adds independent Gaussian noise in each iteration, recent work on matrix factorization mechanisms has shown empirically that introducing correlations in the noise can greatly improve their utility. We characterize the asymptotic learning utility for any choic… ▽ More Differentially private learning algorithms inject noise into the learning process. While the most common private learning algorithm, DP-SGD, adds independent Gaussian noise in each iteration, recent work on matrix factorization mechanisms has shown empirically that introducing correlations in the noise can greatly improve their utility. We characterize the asymptotic learning utility for any choice of the correlation function, giving precise analytical bounds for linear regression and as the solution to a convex program for general convex functions. We show, using these bounds, how correlated noise provably improves upon vanilla DP-SGD as a function of problem parameters such as the effective dimension and condition number. Moreover, our analytical expression for the near-optimal correlation function circumvents the cubic complexity of the semi-definite program used to optimize the noise correlation matrix in previous work. We validate our theory with experiments on private deep learning. Our work matches or outperforms prior work while being efficient both in terms of compute and memory. △ Less

Submitted 7 May, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

Comments: Christopher A. Choquette-Choo, Krishnamurthy Dvijotham, and Krishna Pillutla contributed equally

Journal ref: ICLR 2024

arXiv:2308.12947 [pdf, ps, other]

Counting Distinct Elements Under Person-Level Differential Privacy

Authors: Alexander Knop, Thomas Steinke

Abstract: We study the problem of counting the number of distinct elements in a dataset subject to the constraint of differential privacy. We consider the challenging setting of person-level DP (a.k.a. user-level DP) where each person may contribute an unbounded number of items and hence the sensitivity is unbounded. Our approach is to compute a bounded-sensitivity version of this query, which reduces to… ▽ More We study the problem of counting the number of distinct elements in a dataset subject to the constraint of differential privacy. We consider the challenging setting of person-level DP (a.k.a. user-level DP) where each person may contribute an unbounded number of items and hence the sensitivity is unbounded. Our approach is to compute a bounded-sensitivity version of this query, which reduces to solving a max-flow problem. The sensitivity bound is optimized to balance the noise we must add to privatize the answer against the error of the approximation of the bounded-sensitivity query to the true number of unique elements. △ Less

Submitted 27 October, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

arXiv:2305.13440 [pdf, ps, other]

Differentially Private Medians and Interior Points for Non-Pathological Data

Authors: Maryam Aliakbarpour, Rose Silver, Thomas Steinke, Jonathan Ullman

Abstract: We construct differentially private estimators with low sample complexity that estimate the median of an arbitrary distribution over $\mathbb{R}$ satisfying very mild moment conditions. Our result stands in contrast to the surprising negative result of Bun et al. (FOCS 2015) that showed there is no differentially private estimator with any finite sample complexity that returns any non-trivial appr… ▽ More We construct differentially private estimators with low sample complexity that estimate the median of an arbitrary distribution over $\mathbb{R}$ satisfying very mild moment conditions. Our result stands in contrast to the surprising negative result of Bun et al. (FOCS 2015) that showed there is no differentially private estimator with any finite sample complexity that returns any non-trivial approximation to the median of an arbitrary distribution. △ Less

Submitted 22 May, 2023; originally announced May 2023.

arXiv:2305.13209 [pdf, other]

Faster Differentially Private Convex Optimization via Second-Order Methods

Authors: Arun Ganesh, Mahdi Haghifam, Thomas Steinke, Abhradeep Thakurta

Abstract: Differentially private (stochastic) gradient descent is the workhorse of DP private machine learning in both the convex and non-convex settings. Without privacy constraints, second-order methods, like Newton's method, converge faster than first-order methods like gradient descent. In this work, we investigate the prospect of using the second-order information from the loss function to accelerate D… ▽ More Differentially private (stochastic) gradient descent is the workhorse of DP private machine learning in both the convex and non-convex settings. Without privacy constraints, second-order methods, like Newton's method, converge faster than first-order methods like gradient descent. In this work, we investigate the prospect of using the second-order information from the loss function to accelerate DP convex optimization. We first develop a private variant of the regularized cubic Newton method of Nesterov and Polyak, and show that for the class of strongly convex loss functions, our algorithm has quadratic convergence and achieves the optimal excess loss. We then design a practical second-order DP algorithm for the unconstrained logistic regression problem. We theoretically and empirically study the performance of our algorithm. Empirical results show our algorithm consistently achieves the best excess loss compared to other baselines and is 10-40x faster than DP-GD/DP-SGD. △ Less

Submitted 22 May, 2023; originally announced May 2023.

arXiv:2305.08846 [pdf, other]

Privacy Auditing with One (1) Training Run

Authors: Thomas Steinke, Milad Nasr, Matthew Jagielski

Abstract: We propose a scheme for auditing differentially private machine learning systems with a single training run. This exploits the parallelism of being able to add or remove multiple training examples independently. We analyze this using the connection between differential privacy and statistical generalization, which avoids the cost of group privacy. Our auditing scheme requires minimal assumptions a… ▽ More We propose a scheme for auditing differentially private machine learning systems with a single training run. This exploits the parallelism of being able to add or remove multiple training examples independently. We analyze this using the connection between differential privacy and statistical generalization, which avoids the cost of group privacy. Our auditing scheme requires minimal assumptions about the algorithm and can be applied in the black-box or white-box setting. △ Less

Submitted 15 May, 2023; originally announced May 2023.

arXiv:2303.18086 [pdf, other]

Differentially Private Stream Processing at Scale

Authors: Bing Zhang, Vadym Doroshenko, Peter Kairouz, Thomas Steinke, Abhradeep Thakurta, Ziyin Ma, Eidan Cohen, Himani Apte, Jodi Spacek

Abstract: We design, to the best of our knowledge, the first differentially private (DP) stream aggregation processing system at scale. Our system -- Differential Privacy SQL Pipelines (DP-SQLP) -- is built using a streaming framework similar to Spark streaming, and is built on top of the Spanner database and the F1 query engine from Google. Towards designing DP-SQLP we make both algorithmic and systemic… ▽ More We design, to the best of our knowledge, the first differentially private (DP) stream aggregation processing system at scale. Our system -- Differential Privacy SQL Pipelines (DP-SQLP) -- is built using a streaming framework similar to Spark streaming, and is built on top of the Spanner database and the F1 query engine from Google. Towards designing DP-SQLP we make both algorithmic and systemic advances, namely, we (i) design a novel (user-level) DP key selection algorithm that can operate on an unbounded set of possible keys, and can scale to one billion keys that users have contributed, (ii) design a preemptive execution scheme for DP key selection that avoids enumerating all the keys at each triggering time, and (iii) use algorithmic techniques from DP continual observation to release a continual DP histogram of user contributions to different keys over the stream length. We empirically demonstrate the efficacy by obtaining at least $16\times$ reduction in error over meaningful baselines we consider. We implemented a streaming differentially private user impressions for Google Shop** with DP-SQLP. The streaming DP algorithms are further applied to Google Trends. △ Less

Submitted 4 April, 2024; v1 submitted 31 March, 2023; originally announced March 2023.

arXiv:2302.09483 [pdf, other]

Why Is Public Pretraining Necessary for Private Model Training?

Authors: Arun Ganesh, Mahdi Haghifam, Milad Nasr, Sewoong Oh, Thomas Steinke, Om Thakkar, Abhradeep Thakurta, Lun Wang

Abstract: In the privacy-utility tradeoff of a model trained on benchmark language and vision tasks, remarkable improvements have been widely reported with the use of pretraining on publicly available data. This is in part due to the benefits of transfer learning, which is the standard motivation for pretraining in non-private settings. However, the stark contrast in the improvement achieved through pretrai… ▽ More In the privacy-utility tradeoff of a model trained on benchmark language and vision tasks, remarkable improvements have been widely reported with the use of pretraining on publicly available data. This is in part due to the benefits of transfer learning, which is the standard motivation for pretraining in non-private settings. However, the stark contrast in the improvement achieved through pretraining under privacy compared to non-private settings suggests that there may be a deeper, distinct cause driving these gains. To explain this phenomenon, we hypothesize that the non-convex loss landscape of a model training necessitates an optimization algorithm to go through two phases. In the first, the algorithm needs to select a good "basin" in the loss landscape. In the second, the algorithm solves an easy optimization within that basin. The former is a harder problem to solve with private data, while the latter is harder to solve with public data due to a distribution shift or data scarcity. Guided by this intuition, we provide theoretical constructions that provably demonstrate the separation between private training with and without public pretraining. Further, systematic experiments on CIFAR10 and LibriSpeech provide supporting evidence for our hypothesis. △ Less

Submitted 19 February, 2023; originally announced February 2023.

arXiv:2302.07956 [pdf, other]

Tight Auditing of Differentially Private Machine Learning

Authors: Milad Nasr, Jamie Hayes, Thomas Steinke, Borja Balle, Florian Tramèr, Matthew Jagielski, Nicholas Carlini, Andreas Terzis

Abstract: Auditing mechanisms for differential privacy use probabilistic means to empirically estimate the privacy level of an algorithm. For private machine learning, existing auditing mechanisms are tight: the empirical privacy estimate (nearly) matches the algorithm's provable privacy guarantee. But these auditing techniques suffer from two limitations. First, they only give tight estimates under implaus… ▽ More Auditing mechanisms for differential privacy use probabilistic means to empirically estimate the privacy level of an algorithm. For private machine learning, existing auditing mechanisms are tight: the empirical privacy estimate (nearly) matches the algorithm's provable privacy guarantee. But these auditing techniques suffer from two limitations. First, they only give tight estimates under implausible worst-case assumptions (e.g., a fully adversarial dataset). Second, they require thousands or millions of training runs to produce non-trivial statistical estimates of the privacy leakage. This work addresses both issues. We design an improved auditing scheme that yields tight privacy estimates for natural (not adversarially crafted) datasets -- if the adversary can see all model updates during training. Prior auditing works rely on the same assumption, which is permitted under the standard differential privacy threat model. This threat model is also applicable, e.g., in federated learning settings. Moreover, our auditing scheme requires only two training runs (instead of thousands) to produce tight privacy estimates, by adapting recent advances in tight composition theorems for differential privacy. We demonstrate the utility of our improved auditing schemes by surfacing implementation bugs in private machine learning code that eluded prior auditing techniques. △ Less

Submitted 15 February, 2023; originally announced February 2023.

arXiv:2301.13334 [pdf, ps, other]

A Bias-Variance-Privacy Trilemma for Statistical Estimation

Authors: Gautam Kamath, Argyris Mouzakis, Matthew Regehr, Vikrant Singhal, Thomas Steinke, Jonathan Ullman

Abstract: The canonical algorithm for differentially private mean estimation is to first clip the samples to a bounded range and then add noise to their empirical mean. Clip** controls the sensitivity and, hence, the variance of the noise that we add for privacy. But clip** also introduces statistical bias. We prove that this tradeoff is inherent: no algorithm can simultaneously have low bias, low varia… ▽ More The canonical algorithm for differentially private mean estimation is to first clip the samples to a bounded range and then add noise to their empirical mean. Clip** controls the sensitivity and, hence, the variance of the noise that we add for privacy. But clip** also introduces statistical bias. We prove that this tradeoff is inherent: no algorithm can simultaneously have low bias, low variance, and low privacy loss for arbitrary distributions. On the positive side, we show that unbiased mean estimation is possible under approximate differential privacy if we assume that the distribution is symmetric. Furthermore, we show that, even if we assume that the data is sampled from a Gaussian, unbiased mean estimation is impossible under pure or concentrated differential privacy. △ Less

Submitted 28 February, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

arXiv:2210.00597 [pdf, other]

Composition of Differential Privacy & Privacy Amplification by Subsampling

Authors: Thomas Steinke

Abstract: This chapter is meant to be part of the book "Differential Privacy for Artificial Intelligence Applications." We give an introduction to the most important property of differential privacy -- composition: running multiple independent analyses on the data of a set of people will still be differentially private as long as each of the analyses is private on its own -- as well as the related topic of… ▽ More This chapter is meant to be part of the book "Differential Privacy for Artificial Intelligence Applications." We give an introduction to the most important property of differential privacy -- composition: running multiple independent analyses on the data of a set of people will still be differentially private as long as each of the analyses is private on its own -- as well as the related topic of privacy amplification by subsampling. This chapter introduces the basic concepts and gives proofs of the key results needed to apply these tools in practice. △ Less

Submitted 26 October, 2022; v1 submitted 2 October, 2022; originally announced October 2022.

arXiv:2209.04053 [pdf, ps, other]

Algorithms with More Granular Differential Privacy Guarantees

Authors: Badih Ghazi, Ravi Kumar, Pasin Manurangsi, Thomas Steinke

Abstract: Differential privacy is often applied with a privacy parameter that is larger than the theory suggests is ideal; various informal justifications for tolerating large privacy parameters have been proposed. In this work, we consider partial differential privacy (DP), which allows quantifying the privacy guarantee on a per-attribute basis. In this framework, we study several basic data analysis and l… ▽ More Differential privacy is often applied with a privacy parameter that is larger than the theory suggests is ideal; various informal justifications for tolerating large privacy parameters have been proposed. In this work, we consider partial differential privacy (DP), which allows quantifying the privacy guarantee on a per-attribute basis. In this framework, we study several basic data analysis and learning tasks, and design algorithms whose per-attribute privacy parameter is smaller that the best possible privacy parameter for the entire record of a person (i.e., all the attributes). △ Less

Submitted 8 September, 2022; originally announced September 2022.

arXiv:2202.12219 [pdf, other]

Debugging Differential Privacy: A Case Study for Privacy Auditing

Authors: Florian Tramer, Andreas Terzis, Thomas Steinke, Shuang Song, Matthew Jagielski, Nicholas Carlini

Abstract: Differential Privacy can provide provable privacy guarantees for training data in machine learning. However, the presence of proofs does not preclude the presence of errors. Inspired by recent advances in auditing which have been used for estimating lower bounds on differentially private algorithms, here we show that auditing can also be used to find flaws in (purportedly) differentially private s… ▽ More Differential Privacy can provide provable privacy guarantees for training data in machine learning. However, the presence of proofs does not preclude the presence of errors. Inspired by recent advances in auditing which have been used for estimating lower bounds on differentially private algorithms, here we show that auditing can also be used to find flaws in (purportedly) differentially private schemes. In this case study, we audit a recent open source implementation of a differentially private deep learning algorithm and find, with 99.99999999% confidence, that the implementation does not satisfy the claimed differential privacy guarantee. △ Less

Submitted 28 March, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

arXiv:2112.00193 [pdf, other]

Public Data-Assisted Mirror Descent for Private Model Training

Authors: Ehsan Amid, Arun Ganesh, Rajiv Mathews, Swaroop Ramaswamy, Shuang Song, Thomas Steinke, Vinith M. Suriyakumar, Om Thakkar, Abhradeep Thakurta

Abstract: In this paper, we revisit the problem of using in-distribution public data to improve the privacy/utility trade-offs for differentially private (DP) model training. (Here, public data refers to auxiliary data sets that have no privacy concerns.) We design a natural variant of DP mirror descent, where the DP gradients of the private/sensitive data act as the linear term, and the loss generated by t… ▽ More In this paper, we revisit the problem of using in-distribution public data to improve the privacy/utility trade-offs for differentially private (DP) model training. (Here, public data refers to auxiliary data sets that have no privacy concerns.) We design a natural variant of DP mirror descent, where the DP gradients of the private/sensitive data act as the linear term, and the loss generated by the public data as the mirror map. We show that, for linear regression with feature vectors drawn from a non-isotropic sub-Gaussian distribution, our algorithm, PDA-DPMD (a variant of mirror descent), provides population risk guarantees that are asymptotically better than the best known guarantees under DP (without having access to public data), when the number of public data samples ($n_{\sf pub}$) is sufficiently large. We further show that our algorithm has natural "noise stability" properties that control the variance due to noise added to ensure DP. We demonstrate the efficacy of our algorithm by showing privacy/utility trade-offs on four benchmark datasets (StackOverflow, WikiText-2, CIFAR-10, and EMNIST). We show that our algorithm not only significantly improves over traditional DP-SGD, which does not have access to public data, but to our knowledge is the first to improve over DP-SGD on models that have been pre-trained with public data. △ Less

Submitted 27 March, 2022; v1 submitted 30 November, 2021; originally announced December 2021.

Comments: 20 pages, 8 figures, 3 tables

arXiv:2111.04609 [pdf, ps, other]

A Private and Computationally-Efficient Estimator for Unbounded Gaussians

Authors: Gautam Kamath, Argyris Mouzakis, Vikrant Singhal, Thomas Steinke, Jonathan Ullman

Abstract: We give the first polynomial-time, polynomial-sample, differentially private estimator for the mean and covariance of an arbitrary Gaussian distribution $\mathcal{N}(μ,Σ)$ in $\mathbb{R}^d$. All previous estimators are either nonconstructive, with unbounded running time, or require the user to specify a priori bounds on the parameters $μ$ and $Σ$. The primary new technical tool in our algorithm is… ▽ More We give the first polynomial-time, polynomial-sample, differentially private estimator for the mean and covariance of an arbitrary Gaussian distribution $\mathcal{N}(μ,Σ)$ in $\mathbb{R}^d$. All previous estimators are either nonconstructive, with unbounded running time, or require the user to specify a priori bounds on the parameters $μ$ and $Σ$. The primary new technical tool in our algorithm is a new differentially private preconditioner that takes samples from an arbitrary Gaussian $\mathcal{N}(0,Σ)$ and returns a matrix $A$ such that $A ΣA^T$ has constant condition number. △ Less

Submitted 11 February, 2022; v1 submitted 8 November, 2021; originally announced November 2021.

arXiv:2110.03620 [pdf, other]

Hyperparameter Tuning with Renyi Differential Privacy

Authors: Nicolas Papernot, Thomas Steinke

Abstract: For many differentially private algorithms, such as the prominent noisy stochastic gradient descent (DP-SGD), the analysis needed to bound the privacy leakage of a single training run is well understood. However, few studies have reasoned about the privacy leakage resulting from the multiple training runs needed to fine tune the value of the training algorithm's hyperparameters. In this work, we f… ▽ More For many differentially private algorithms, such as the prominent noisy stochastic gradient descent (DP-SGD), the analysis needed to bound the privacy leakage of a single training run is well understood. However, few studies have reasoned about the privacy leakage resulting from the multiple training runs needed to fine tune the value of the training algorithm's hyperparameters. In this work, we first illustrate how simply setting hyperparameters based on non-private training runs can leak private information. Motivated by this observation, we then provide privacy guarantees for hyperparameter search procedures within the framework of Renyi Differential Privacy. Our results improve and extend the work of Liu and Talwar (STOC 2019). Our analysis supports our previous observation that tuning hyperparameters does indeed leak private information, but we prove that, under certain assumptions, this leakage is modest, as long as each candidate training run needed to select hyperparameters is itself differentially private. △ Less

Submitted 14 March, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

arXiv:2106.09683 [pdf, other]

PAC-Bayes, MAC-Bayes and Conditional Mutual Information: Fast rate bounds that handle general VC classes

Authors: Peter Grünwald, Thomas Steinke, Lydia Zakynthinou

Abstract: We give a novel, unified derivation of conditional PAC-Bayesian and mutual information (MI) generalization bounds. We derive conditional MI bounds as an instance, with special choice of prior, of conditional MAC-Bayesian (Mean Approximately Correct) bounds, itself derived from conditional PAC-Bayesian bounds, where `conditional' means that one can use priors conditioned on a joint training and gho… ▽ More We give a novel, unified derivation of conditional PAC-Bayesian and mutual information (MI) generalization bounds. We derive conditional MI bounds as an instance, with special choice of prior, of conditional MAC-Bayesian (Mean Approximately Correct) bounds, itself derived from conditional PAC-Bayesian bounds, where `conditional' means that one can use priors conditioned on a joint training and ghost sample. This allows us to get nontrivial PAC-Bayes and MI-style bounds for general VC classes, something recently shown to be impossible with standard PAC-Bayesian/MI bounds. Second, it allows us to get faster rates of order $O \left(({\text{KL}}/n)^γ\right)$ for $γ> 1/2$ if a Bernstein condition holds and for exp-concave losses (with $γ=1$), which is impossible with both standard PAC-Bayes generalization and MI bounds. Our work extends the recent work by Steinke and Zakynthinou [2020] who handle MI with VC but neither PAC-Bayes nor fast rates, the recent work of Hellström and Durisi [2020] who extend the latter to the PAC-Bayes setting via a unifying exponential inequality, and Mhammedi et al. [2019] who initiated fast rate PAC-Bayes generalization error bounds but handle neither MI nor general VC classes. △ Less

Submitted 17 June, 2021; originally announced June 2021.

Comments: 24 pages, accepted for publication at COLT 2021

arXiv:2106.00001 [pdf, other]

Privately Learning Subspaces

Authors: Vikrant Singhal, Thomas Steinke

Abstract: Private data analysis suffers a costly curse of dimensionality. However, the data often has an underlying low-dimensional structure. For example, when optimizing via gradient descent, the gradients often lie in or near a low-dimensional subspace. If that low-dimensional structure can be identified, then we can avoid paying (in terms of privacy or accuracy) for the high ambient dimension. We pres… ▽ More Private data analysis suffers a costly curse of dimensionality. However, the data often has an underlying low-dimensional structure. For example, when optimizing via gradient descent, the gradients often lie in or near a low-dimensional subspace. If that low-dimensional structure can be identified, then we can avoid paying (in terms of privacy or accuracy) for the high ambient dimension. We present differentially private algorithms that take input data sampled from a low-dimensional linear subspace (possibly with a small amount of error) and output that subspace (or an approximation to it). These algorithms can serve as a pre-processing step for other procedures. △ Less

Submitted 10 August, 2021; v1 submitted 28 May, 2021; originally announced June 2021.

arXiv:2105.07260 [pdf, ps, other]

The Permute-and-Flip Mechanism is Identical to Report-Noisy-Max with Exponential Noise

Authors: Zeyu Ding, Daniel Kifer, Sayed M. Saghaian N. E., Thomas Steinke, Yuxin Wang, Yingtai Xiao, Danfeng Zhang

Abstract: The permute-and-flip mechanism is a recently proposed differentially private selection algorithm that was shown to outperform the exponential mechanism. In this paper, we show that permute-and-flip is equivalent to the well-known report noisy max algorithm with exponential noise. The permute-and-flip mechanism is a recently proposed differentially private selection algorithm that was shown to outperform the exponential mechanism. In this paper, we show that permute-and-flip is equivalent to the well-known report noisy max algorithm with exponential noise. △ Less

Submitted 5 June, 2021; v1 submitted 15 May, 2021; originally announced May 2021.

arXiv:2102.08598 [pdf, other]

Leveraging Public Data for Practical Private Query Release

Authors: Terrance Liu, Giuseppe Vietri, Thomas Steinke, Jonathan Ullman, Zhiwei Steven Wu

Abstract: In many statistical problems, incorporating priors can significantly improve performance. However, the use of prior knowledge in differentially private query release has remained underexplored, despite such priors commonly being available in the form of public datasets, such as previous US Census releases. With the goal of releasing statistics about a private dataset, we present PMW^Pub, which --… ▽ More In many statistical problems, incorporating priors can significantly improve performance. However, the use of prior knowledge in differentially private query release has remained underexplored, despite such priors commonly being available in the form of public datasets, such as previous US Census releases. With the goal of releasing statistics about a private dataset, we present PMW^Pub, which -- unlike existing baselines -- leverages public data drawn from a related distribution as prior information. We provide a theoretical analysis and an empirical evaluation on the American Community Survey (ACS) and ADULT datasets, which shows that our method outperforms state-of-the-art methods. Furthermore, PMW^Pub scales well to high-dimensional data domains, where running many existing methods would be computationally infeasible. △ Less

Submitted 10 June, 2021; v1 submitted 17 February, 2021; originally announced February 2021.

Comments: ICML 2021

arXiv:2102.06387 [pdf, other]

The Distributed Discrete Gaussian Mechanism for Federated Learning with Secure Aggregation

Authors: Peter Kairouz, Ziyu Liu, Thomas Steinke

Abstract: We consider training models on private data that are distributed across user devices. To ensure privacy, we add on-device noise and use secure aggregation so that only the noisy sum is revealed to the server. We present a comprehensive end-to-end system, which appropriately discretizes the data and adds discrete Gaussian noise before performing secure aggregation. We provide a novel privacy analys… ▽ More We consider training models on private data that are distributed across user devices. To ensure privacy, we add on-device noise and use secure aggregation so that only the noisy sum is revealed to the server. We present a comprehensive end-to-end system, which appropriately discretizes the data and adds discrete Gaussian noise before performing secure aggregation. We provide a novel privacy analysis for sums of discrete Gaussians and carefully analyze the effects of data quantization and modular summation arithmetic. Our theoretical guarantees highlight the complex tension between communication, privacy, and accuracy. Our extensive experimental results demonstrate that our solution is essentially able to match the accuracy to central differential privacy with less than 16 bits of precision per value. △ Less

Submitted 8 September, 2022; v1 submitted 12 February, 2021; originally announced February 2021.

Comments: International Conference on Machine Learning (ICML), 2021

arXiv:2009.05401 [pdf, ps, other]

Multi-Central Differential Privacy

Authors: Thomas Steinke

Abstract: Differential privacy is typically studied in the central model where a trusted "aggregator" holds the sensitive data of all the individuals and is responsible for protecting their privacy. A popular alternative is the local model in which the aggregator is untrusted and instead each individual is responsible for their own privacy. The decentralized privacy guarantee of the local model comes at a h… ▽ More Differential privacy is typically studied in the central model where a trusted "aggregator" holds the sensitive data of all the individuals and is responsible for protecting their privacy. A popular alternative is the local model in which the aggregator is untrusted and instead each individual is responsible for their own privacy. The decentralized privacy guarantee of the local model comes at a high price in statistical utility or computational complexity. Thus intermediate models such as the shuffled model and pan privacy have been studied in an attempt to attain the best of both worlds. In this note, we propose an intermediate trust model for differential privacy, which we call the multi-central model. Here there are multiple aggregators and we only assume that they do not collude nefariously. This model relaxes the trust requirements of the central model while avoiding the price of the local model. We motivate this model and provide some simple and efficient algorithms for it. We argue that this model is a promising direction for further research. △ Less

Submitted 11 September, 2020; originally announced September 2020.

Comments: Short working paper (10 pages) - comments welcome

arXiv:2007.05453 [pdf, other]

New Oracle-Efficient Algorithms for Private Synthetic Data Release

Authors: Giuseppe Vietri, Grace Tian, Mark Bun, Thomas Steinke, Zhiwei Steven Wu

Abstract: We present three new algorithms for constructing differentially private synthetic data---a sanitized version of a sensitive dataset that approximately preserves the answers to a large collection of statistical queries. All three algorithms are \emph{oracle-efficient} in the sense that they are computationally efficient when given access to an optimization oracle. Such an oracle can be implemented… ▽ More We present three new algorithms for constructing differentially private synthetic data---a sanitized version of a sensitive dataset that approximately preserves the answers to a large collection of statistical queries. All three algorithms are \emph{oracle-efficient} in the sense that they are computationally efficient when given access to an optimization oracle. Such an oracle can be implemented using many existing (non-private) optimization tools such as sophisticated integer program solvers. While the accuracy of the synthetic data is contingent on the oracle's optimization performance, the algorithms satisfy differential privacy even in the worst case. For all three algorithms, we provide theoretical guarantees for both accuracy and privacy. Through empirical evaluation, we demonstrate that our methods scale well with both the dimensionality of the data and the number of queries. Compared to the state-of-the-art method High-Dimensional Matrix Mechanism \cite{McKennaMHM18}, our algorithms provide better accuracy in the large workload and high privacy regime (corresponding to low privacy loss $\varepsilon$). △ Less

Submitted 10 July, 2020; originally announced July 2020.

arXiv:2006.06783 [pdf, other]

Evading Curse of Dimensionality in Unconstrained Private GLMs via Private Gradient Descent

Authors: Shuang Song, Thomas Steinke, Om Thakkar, Abhradeep Thakurta

Abstract: We revisit the well-studied problem of differentially private empirical risk minimization (ERM). We show that for unconstrained convex generalized linear models (GLMs), one can obtain an excess empirical risk of $\tilde O\left(\sqrt{\texttt{rank}}/εn\right)$, where ${\texttt{rank}}$ is the rank of the feature matrix in the GLM problem, $n$ is the number of data samples, and $ε$ is the privacy para… ▽ More We revisit the well-studied problem of differentially private empirical risk minimization (ERM). We show that for unconstrained convex generalized linear models (GLMs), one can obtain an excess empirical risk of $\tilde O\left(\sqrt{\texttt{rank}}/εn\right)$, where ${\texttt{rank}}$ is the rank of the feature matrix in the GLM problem, $n$ is the number of data samples, and $ε$ is the privacy parameter. This bound is attained via differentially private gradient descent (DP-GD). Furthermore, via the first lower bound for unconstrained private ERM, we show that our upper bound is tight. In sharp contrast to the constrained ERM setting, there is no dependence on the dimensionality of the ambient model space ($p$). (Notice that ${\texttt{rank}}\leq \min\{n, p\}$.) Besides, we obtain an analogous excess population risk bound which depends on ${\texttt{rank}}$ instead of $p$. For the smooth non-convex GLM setting (i.e., where the objective function is non-convex but preserves the GLM structure), we further show that DP-GD attains a dimension-independent convergence of $\tilde O\left(\sqrt{\texttt{rank}}/εn\right)$ to a first-order-stationary-point of the underlying objective. Finally, we show that for convex GLMs, a variant of DP-GD commonly used in practice (which involves clip** the individual gradients) also exhibits the same dimension-independent convergence to the minimum of a well-defined objective. To that end, we provide a structural lemma that characterizes the effect of clip** on the optimization profile of DP-GD. △ Less

Submitted 2 March, 2021; v1 submitted 11 June, 2020; originally announced June 2020.

arXiv:2004.00010 [pdf, other]

The Discrete Gaussian for Differential Privacy

Authors: Clément L. Canonne, Gautam Kamath, Thomas Steinke

Abstract: A key tool for building differentially private systems is adding Gaussian noise to the output of a function evaluated on a sensitive dataset. Unfortunately, using a continuous distribution presents several practical challenges. First and foremost, finite computers cannot exactly represent samples from continuous distributions, and previous work has demonstrated that seemingly innocuous numerical e… ▽ More A key tool for building differentially private systems is adding Gaussian noise to the output of a function evaluated on a sensitive dataset. Unfortunately, using a continuous distribution presents several practical challenges. First and foremost, finite computers cannot exactly represent samples from continuous distributions, and previous work has demonstrated that seemingly innocuous numerical errors can entirely destroy privacy. Moreover, when the underlying data is itself discrete (e.g., population counts), adding continuous noise makes the result less interpretable. With these shortcomings in mind, we introduce and analyze the discrete Gaussian in the context of differential privacy. Specifically, we theoretically and experimentally show that adding discrete Gaussian noise provides essentially the same privacy and accuracy guarantees as the addition of continuous Gaussian noise. We also present an simple and efficient algorithm for exact sampling from this distribution. This demonstrates its applicability for privately answering counting queries, or more generally, low-sensitivity integer-valued queries. △ Less

Submitted 18 January, 2021; v1 submitted 31 March, 2020; originally announced April 2020.

Comments: Improved time analysis, and generalisation to the multivariate case

arXiv:2001.09122 [pdf, other]

Reasoning About Generalization via Conditional Mutual Information

Authors: Thomas Steinke, Lydia Zakynthinou

Abstract: We provide an information-theoretic framework for studying the generalization properties of machine learning algorithms. Our framework ties together existing approaches, including uniform convergence bounds and recent methods for adaptive data analysis. Specifically, we use Conditional Mutual Information (CMI) to quantify how well the input (i.e., the training data) can be recognized given the out… ▽ More We provide an information-theoretic framework for studying the generalization properties of machine learning algorithms. Our framework ties together existing approaches, including uniform convergence bounds and recent methods for adaptive data analysis. Specifically, we use Conditional Mutual Information (CMI) to quantify how well the input (i.e., the training data) can be recognized given the output (i.e., the trained model) of the learning algorithm. We show that bounds on CMI can be obtained from VC dimension, compression schemes, differential privacy, and other methods. We then show that bounded CMI implies various forms of generalization. △ Less

Submitted 18 June, 2020; v1 submitted 24 January, 2020; originally announced January 2020.

Comments: 58 pages. Changes from previous version: Added discussion on related work and updated references. Simplified part of the proof of Theorem 4.10

arXiv:1906.02830 [pdf, other]

Average-Case Averages: Private Algorithms for Smooth Sensitivity and Mean Estimation

Authors: Mark Bun, Thomas Steinke

Abstract: The simplest and most widely applied method for guaranteeing differential privacy is to add instance-independent noise to a statistic of interest that is scaled to its global sensitivity. However, global sensitivity is a worst-case notion that is often too conservative for realized dataset instances. We provide methods for scaling noise in an instance-dependent way and demonstrate that they provid… ▽ More The simplest and most widely applied method for guaranteeing differential privacy is to add instance-independent noise to a statistic of interest that is scaled to its global sensitivity. However, global sensitivity is a worst-case notion that is often too conservative for realized dataset instances. We provide methods for scaling noise in an instance-dependent way and demonstrate that they provide greater accuracy under average-case distributional assumptions. Specifically, we consider the basic problem of privately estimating the mean of a real distribution from i.i.d.~samples. The standard empirical mean estimator can have arbitrarily-high global sensitivity. We propose the trimmed mean estimator, which interpolates between the mean and the median, as a way of attaining much lower sensitivity on average while losing very little in terms of statistical accuracy. To privately estimate the trimmed mean, we revisit the smooth sensitivity framework of Nissim, Raskhodnikova, and Smith (STOC 2007), which provides a framework for using instance-dependent sensitivity. We propose three new additive noise distributions which provide concentrated differential privacy when scaled to smooth sensitivity. We provide theoretical and experimental evidence showing that our noise distributions compare favorably to others in the literature, in particular, when applied to the mean estimation problem. △ Less

Submitted 6 June, 2019; originally announced June 2019.

arXiv:1905.13229 [pdf, ps, other]

Private Hypothesis Selection

Authors: Mark Bun, Gautam Kamath, Thomas Steinke, Zhiwei Steven Wu

Abstract: We provide a differentially private algorithm for hypothesis selection. Given samples from an unknown probability distribution $P$ and a set of $m$ probability distributions $\mathcal{H}$, the goal is to output, in a $\varepsilon$-differentially private manner, a distribution from $\mathcal{H}$ whose total variation distance to $P$ is comparable to that of the best such distribution (which we deno… ▽ More We provide a differentially private algorithm for hypothesis selection. Given samples from an unknown probability distribution $P$ and a set of $m$ probability distributions $\mathcal{H}$, the goal is to output, in a $\varepsilon$-differentially private manner, a distribution from $\mathcal{H}$ whose total variation distance to $P$ is comparable to that of the best such distribution (which we denote by $α$). The sample complexity of our basic algorithm is $O\left(\frac{\log m}{α^2} + \frac{\log m}{α\varepsilon}\right)$, representing a minimal cost for privacy when compared to the non-private algorithm. We also can handle infinite hypothesis classes $\mathcal{H}$ by relaxing to $(\varepsilon,δ)$-differential privacy. We apply our hypothesis selection algorithm to give learning algorithms for a number of natural distribution classes, including Gaussians, product distributions, sums of independent random variables, piecewise polynomials, and mixture classes. Our hypothesis selection procedure allows us to generically convert a cover for a class to a learning algorithm, complementing known learning lower bounds which are in terms of the size of the packing number of the class. As the covering and packing numbers are often closely related, for constant $α$, our algorithms achieve the optimal sample complexity for many classes of interest. Finally, we describe an application to private distribution-free PAC learning. △ Less

Submitted 4 January, 2021; v1 submitted 30 May, 2019; originally announced May 2019.

Comments: Appeared in NeurIPS 2019. Final version to appear in IEEE Transactions on Information Theory

arXiv:1812.03224 [pdf, other]

A Hybrid Approach to Privacy-Preserving Federated Learning

Authors: Stacey Truex, Nathalie Baracaldo, Ali Anwar, Thomas Steinke, Heiko Ludwig, Rui Zhang, Yi Zhou

Abstract: Federated learning facilitates the collaborative training of models without the sharing of raw data. However, recent attacks demonstrate that simply maintaining data locality during training processes does not provide sufficient privacy guarantees. Rather, we need a federated learning system capable of preventing inference over both the messages exchanged during training and the final trained mode… ▽ More Federated learning facilitates the collaborative training of models without the sharing of raw data. However, recent attacks demonstrate that simply maintaining data locality during training processes does not provide sufficient privacy guarantees. Rather, we need a federated learning system capable of preventing inference over both the messages exchanged during training and the final trained model while ensuring the resulting model also has acceptable predictive accuracy. Existing federated learning approaches either use secure multiparty computation (SMC) which is vulnerable to inference or differential privacy which can lead to low accuracy given a large number of parties with relatively small amounts of data each. In this paper, we present an alternative approach that utilizes both differential privacy and SMC to balance these trade-offs. Combining differential privacy with secure multiparty computation enables us to reduce the growth of noise injection as the number of parties increases without sacrificing privacy while maintaining a pre-defined rate of trust. Our system is therefore a scalable approach that protects against inference threats and produces models with high accuracy. Additionally, our system can be used to train a variety of machine learning models, which we validate with experimental results on 3 different machine learning algorithms. Our experiments demonstrate that our approach out-performs state of the art solutions. △ Less

Submitted 14 August, 2019; v1 submitted 7 December, 2018; originally announced December 2018.

arXiv:1811.03763 [pdf, ps, other]

Towards Instance-Optimal Private Query Release

Authors: Jaroslaw Blasiok, Mark Bun, Aleksandar Nikolov, Thomas Steinke

Abstract: We study efficient mechanisms for the query release problem in differential privacy: given a workload of $m$ statistical queries, output approximate answers to the queries while satisfying the constraints of differential privacy. In particular, we are interested in mechanisms that optimally adapt to the given workload. Building on the projection mechanism of Nikolov, Talwar, and Zhang, and using t… ▽ More We study efficient mechanisms for the query release problem in differential privacy: given a workload of $m$ statistical queries, output approximate answers to the queries while satisfying the constraints of differential privacy. In particular, we are interested in mechanisms that optimally adapt to the given workload. Building on the projection mechanism of Nikolov, Talwar, and Zhang, and using the ideas behind Dudley's chaining inequality, we propose new efficient algorithms for the query release problem, and prove that they achieve optimal sample complexity for the given workload (up to constant factors, in certain parameter regimes) with respect to the class of mechanisms that satisfy concentrated differential privacy. We also give variants of our algorithms that satisfy local differential privacy, and prove that they also achieve optimal sample complexity among all local sequentially interactive private mechanisms. △ Less

Submitted 8 November, 2018; originally announced November 2018.

Comments: To appear in SODA 2019

arXiv:1806.06100 [pdf, ps, other]

The Limits of Post-Selection Generalization

Authors: Kobbi Nissim, Adam Smith, Thomas Steinke, Uri Stemmer, Jonathan Ullman

Abstract: While statistics and machine learning offers numerous methods for ensuring generalization, these methods often fail in the presence of adaptivity---the common practice in which the choice of analysis depends on previous interactions with the same dataset. A recent line of work has introduced powerful, general purpose algorithms that ensure post hoc generalization (also called robust or post-select… ▽ More While statistics and machine learning offers numerous methods for ensuring generalization, these methods often fail in the presence of adaptivity---the common practice in which the choice of analysis depends on previous interactions with the same dataset. A recent line of work has introduced powerful, general purpose algorithms that ensure post hoc generalization (also called robust or post-selection generalization), which says that, given the output of the algorithm, it is hard to find any statistic for which the data differs significantly from the population it came from. In this work we show several limitations on the power of algorithms satisfying post hoc generalization. First, we show a tight lower bound on the error of any algorithm that satisfies post hoc generalization and answers adaptively chosen statistical queries, showing a strong barrier to progress in post selection data analysis. Second, we show that post hoc generalization is not closed under composition, despite many examples of such algorithms exhibiting strong composition properties. △ Less

Submitted 15 June, 2018; originally announced June 2018.

arXiv:1712.07196 [pdf, ps, other]

Calibrating Noise to Variance in Adaptive Data Analysis

Authors: Vitaly Feldman, Thomas Steinke

Abstract: Datasets are often used multiple times and each successive analysis may depend on the outcome of previous analyses. Standard techniques for ensuring generalization and statistical validity do not account for this adaptive dependence. A recent line of work studies the challenges that arise from such adaptive data reuse by considering the problem of answering a sequence of "queries" about the data d… ▽ More Datasets are often used multiple times and each successive analysis may depend on the outcome of previous analyses. Standard techniques for ensuring generalization and statistical validity do not account for this adaptive dependence. A recent line of work studies the challenges that arise from such adaptive data reuse by considering the problem of answering a sequence of "queries" about the data distribution where each query may depend arbitrarily on answers to previous queries. The strongest results obtained for this problem rely on differential privacy -- a strong notion of algorithmic stability with the important property that it "composes" well when data is reused. However the notion is rather strict, as it requires stability under replacement of an arbitrary data element. The simplest algorithm is to add Gaussian (or Laplace) noise to distort the empirical answers. However, analysing this technique using differential privacy yields suboptimal accuracy guarantees when the queries have low variance. Here we propose a relaxed notion of stability that also composes adaptively. We demonstrate that a simple and natural algorithm based on adding noise scaled to the standard deviation of the query provides our notion of stability. This implies an algorithm that can answer statistical queries about the dataset with substantially improved accuracy guarantees for low-variance queries. The only previous approach that provides such accuracy guarantees is based on a more involved differentially private median-of-means algorithm and its analysis exploits stronger "group" stability of the algorithm. △ Less

Submitted 11 June, 2018; v1 submitted 19 December, 2017; originally announced December 2017.

Comments: Accepted for presentation at Conference on Learning Theory (COLT) 2018

arXiv:1706.05069 [pdf, ps, other]

Generalization for Adaptively-chosen Estimators via Stable Median

Authors: Vitaly Feldman, Thomas Steinke

Abstract: Datasets are often reused to perform multiple statistical analyses in an adaptive way, in which each analysis may depend on the outcomes of previous analyses on the same dataset. Standard statistical guarantees do not account for these dependencies and little is known about how to provably avoid overfitting and false discovery in the adaptive setting. We consider a natural formalization of this pr… ▽ More Datasets are often reused to perform multiple statistical analyses in an adaptive way, in which each analysis may depend on the outcomes of previous analyses on the same dataset. Standard statistical guarantees do not account for these dependencies and little is known about how to provably avoid overfitting and false discovery in the adaptive setting. We consider a natural formalization of this problem in which the goal is to design an algorithm that, given a limited number of i.i.d.~samples from an unknown distribution, can answer adaptively-chosen queries about that distribution. We present an algorithm that estimates the expectations of $k$ arbitrary adaptively-chosen real-valued estimators using a number of samples that scales as $\sqrt{k}$. The answers given by our algorithm are essentially as accurate as if fresh samples were used to evaluate each estimator. In contrast, prior work yields error guarantees that scale with the worst-case sensitivity of each estimator. We also give a version of our algorithm that can be used to verify answers to such queries where the sample complexity depends logarithmically on the number of queries $k$ (as in the reusable holdout technique). Our algorithm is based on a simple approximate median algorithm that satisfies the strong stability guarantees of differential privacy. Our techniques provide a new approach for analyzing the generalization guarantees of differentially private algorithms. △ Less

Submitted 15 June, 2017; originally announced June 2017.

Comments: To appear in Conference on Learning Theory (COLT) 2017

arXiv:1704.03024 [pdf, other]

Tight Lower Bounds for Differentially Private Selection

Authors: Thomas Steinke, Jonathan Ullman

Abstract: A pervasive task in the differential privacy literature is to select the $k$ items of "highest quality" out of a set of $d$ items, where the quality of each item depends on a sensitive dataset that must be protected. Variants of this task arise naturally in fundamental problems like feature selection and hypothesis testing, and also as subroutines for many sophisticated differentially private algo… ▽ More A pervasive task in the differential privacy literature is to select the $k$ items of "highest quality" out of a set of $d$ items, where the quality of each item depends on a sensitive dataset that must be protected. Variants of this task arise naturally in fundamental problems like feature selection and hypothesis testing, and also as subroutines for many sophisticated differentially private algorithms. The standard approaches to these tasks---repeated use of the exponential mechanism or the sparse vector technique---approximately solve this problem given a dataset of $n = O(\sqrt{k}\log d)$ samples. We provide a tight lower bound for some very simple variants of the private selection problem. Our lower bound shows that a sample of size $n = Ω(\sqrt{k} \log d)$ is required even to achieve a very minimal accuracy guarantee. Our results are based on an extension of the fingerprinting method to sparse selection problems. Previously, the fingerprinting method has been used to provide tight lower bounds for answering an entire set of $d$ queries, but often only some much smaller set of $k$ queries are relevant. Our extension allows us to prove lower bounds that depend on both the number of relevant queries and the total number of queries. △ Less

Submitted 10 April, 2017; originally announced April 2017.

arXiv:1701.03493 [pdf, other]

Subgaussian Tail Bounds via Stability Arguments

Authors: Thomas Steinke, Jonathan Ullman

Abstract: Sums of independent, bounded random variables concentrate around their expectation approximately as well a Gaussian of the same variance. Well known results of this form include the Bernstein, Hoeffding, and Chernoff inequalities and many others. We present an alternative proof of these tail bounds based on what we call a stability argument, which avoids bounding the moment generating function or… ▽ More Sums of independent, bounded random variables concentrate around their expectation approximately as well a Gaussian of the same variance. Well known results of this form include the Bernstein, Hoeffding, and Chernoff inequalities and many others. We present an alternative proof of these tail bounds based on what we call a stability argument, which avoids bounding the moment generating function or higher-order moments of the distribution. Our stability argument is inspired by recent work on the generalization properties of differential privacy and their connection to adaptive data analysis (Bassily et al., STOC 2016). △ Less

Submitted 21 April, 2017; v1 submitted 12 January, 2017; originally announced January 2017.

arXiv:1605.02065 [pdf, other]

Concentrated Differential Privacy: Simplifications, Extensions, and Lower Bounds

Authors: Mark Bun, Thomas Steinke

Abstract: "Concentrated differential privacy" was recently introduced by Dwork and Rothblum as a relaxation of differential privacy, which permits sharper analyses of many privacy-preserving computations. We present an alternative formulation of the concept of concentrated differential privacy in terms of the Renyi divergence between the distributions obtained by running an algorithm on neighboring inputs.… ▽ More "Concentrated differential privacy" was recently introduced by Dwork and Rothblum as a relaxation of differential privacy, which permits sharper analyses of many privacy-preserving computations. We present an alternative formulation of the concept of concentrated differential privacy in terms of the Renyi divergence between the distributions obtained by running an algorithm on neighboring inputs. With this reformulation in hand, we prove sharper quantitative results, establish lower bounds, and raise a few new questions. We also unify this approach with approximate differential privacy by giving an appropriate definition of "approximate concentrated differential privacy." △ Less

Submitted 6 May, 2016; originally announced May 2016.

arXiv:1604.04618 [pdf, other]

Make Up Your Mind: The Price of Online Queries in Differential Privacy

Authors: Mark Bun, Thomas Steinke, Jonathan Ullman

Abstract: We consider the problem of answering queries about a sensitive dataset subject to differential privacy. The queries may be chosen adversarially from a larger set Q of allowable queries in one of three ways, which we list in order from easiest to hardest to answer: Offline: The queries are chosen all at once and the differentially private mechanism answers the queries in a single batch. Online:… ▽ More We consider the problem of answering queries about a sensitive dataset subject to differential privacy. The queries may be chosen adversarially from a larger set Q of allowable queries in one of three ways, which we list in order from easiest to hardest to answer: Offline: The queries are chosen all at once and the differentially private mechanism answers the queries in a single batch. Online: The queries are chosen all at once, but the mechanism only receives the queries in a streaming fashion and must answer each query before seeing the next query. Adaptive: The queries are chosen one at a time and the mechanism must answer each query before the next query is chosen. In particular, each query may depend on the answers given to previous queries. Many differentially private mechanisms are just as efficient in the adaptive model as they are in the offline model. Meanwhile, most lower bounds for differential privacy hold in the offline setting. This suggests that the three models may be equivalent. We prove that these models are all, in fact, distinct. Specifically, we show that there is a family of statistical queries such that exponentially more queries from this family can be answered in the offline model than in the online model. We also exhibit a family of search queries such that exponentially more queries from this family can be answered in the online model than in the adaptive model. We also investigate whether such separations might hold for simple queries like threshold queries over the real line. △ Less

Submitted 15 April, 2016; originally announced April 2016.

arXiv:1511.02513 [pdf, other]

Algorithmic Stability for Adaptive Data Analysis

Authors: Raef Bassily, Kobbi Nissim, Adam Smith, Thomas Steinke, Uri Stemmer, Jonathan Ullman

Abstract: Adaptivity is an important feature of data analysis---the choice of questions to ask about a dataset often depends on previous interactions with the same dataset. However, statistical validity is typically studied in a nonadaptive model, where all questions are specified before the dataset is drawn. Recent work by Dwork et al. (STOC, 2015) and Hardt and Ullman (FOCS, 2014) initiated the formal stu… ▽ More Adaptivity is an important feature of data analysis---the choice of questions to ask about a dataset often depends on previous interactions with the same dataset. However, statistical validity is typically studied in a nonadaptive model, where all questions are specified before the dataset is drawn. Recent work by Dwork et al. (STOC, 2015) and Hardt and Ullman (FOCS, 2014) initiated the formal study of this problem, and gave the first upper and lower bounds on the achievable generalization error for adaptive data analysis. Specifically, suppose there is an unknown distribution $\mathbf{P}$ and a set of $n$ independent samples $\mathbf{x}$ is drawn from $\mathbf{P}$. We seek an algorithm that, given $\mathbf{x}$ as input, accurately answers a sequence of adaptively chosen queries about the unknown distribution $\mathbf{P}$. How many samples $n$ must we draw from the distribution, as a function of the type of queries, the number of queries, and the desired level of accuracy? In this work we make two new contributions: (i) We give upper bounds on the number of samples $n$ that are needed to answer statistical queries. The bounds improve and simplify the work of Dwork et al. (STOC, 2015), and have been applied in subsequent work by those authors (Science, 2015, NIPS, 2015). (ii) We prove the first upper bounds on the number of samples required to answer more general families of queries. These include arbitrary low-sensitivity queries and an important class of optimization queries. As in Dwork et al., our algorithms are based on a connection with algorithmic stability in the form of differential privacy. We extend their work by giving a quantitatively optimal, more general, and simpler proof of their main theorem that stability implies low generalization error. We also study weaker stability guarantees such as bounded KL divergence and total variation distance. △ Less

Submitted 8 November, 2015; originally announced November 2015.

Comments: This work unifies and subsumes the two arXiv manuscripts arXiv:1503.04843 and arXiv:1504.05800

arXiv:1504.04675 [pdf, other]

Pseudorandomness for Read-Once, Constant-Depth Circuits

Authors: Sitan Chen, Thomas Steinke, Salil Vadhan

Abstract: For Boolean functions computed by read-once, depth-$D$ circuits with unbounded fan-in over the de Morgan basis, we present an explicit pseudorandom generator with seed length $\tilde{O}(\log^{D+1} n)$. The previous best seed length known for this model was $\tilde{O}(\log^{D+4} n)$, obtained by Trevisan and Xue (CCC `13) for all of $AC^0$ (not just read-once). Our work makes use of Fourier analyti… ▽ More For Boolean functions computed by read-once, depth-$D$ circuits with unbounded fan-in over the de Morgan basis, we present an explicit pseudorandom generator with seed length $\tilde{O}(\log^{D+1} n)$. The previous best seed length known for this model was $\tilde{O}(\log^{D+4} n)$, obtained by Trevisan and Xue (CCC `13) for all of $AC^0$ (not just read-once). Our work makes use of Fourier analytic techniques for pseudorandomness introduced by Reingold, Steinke, and Vadhan (RANDOM `13) to show that the generator of Gopalan et al. (FOCS `12) fools read-once $AC^0$. To this end, we prove a new Fourier growth bound for read-once circuits, namely that for every $F: \{0,1\}^n\to\{0,1\}$ computed by a read-once, depth-$D$ circuit, \begin{equation*}\sum_{s\subseteq[n], |s|=k}|\hat{F}[s]|\le O(\log^{D-1}n)^k,\end{equation*} where $\hat{F}$ denotes the Fourier transform of $F$ over $\mathbb{Z}^n_2$. △ Less

Submitted 18 September, 2015; v1 submitted 17 April, 2015; originally announced April 2015.

arXiv:1503.04843

More General Queries and Less Generalization Error in Adaptive Data Analysis

Authors: Raef Bassily, Adam Smith, Thomas Steinke, Jonathan Ullman

Abstract: Adaptivity is an important feature of data analysis---typically the choice of questions asked about a dataset depends on previous interactions with the same dataset. However, generalization error is typically bounded in a non-adaptive model, where all questions are specified before the dataset is drawn. Recent work by Dwork et al. (STOC '15) and Hardt and Ullman (FOCS '14) initiated the formal stu… ▽ More Adaptivity is an important feature of data analysis---typically the choice of questions asked about a dataset depends on previous interactions with the same dataset. However, generalization error is typically bounded in a non-adaptive model, where all questions are specified before the dataset is drawn. Recent work by Dwork et al. (STOC '15) and Hardt and Ullman (FOCS '14) initiated the formal study of this problem, and gave the first upper and lower bounds on the achievable generalization error for adaptive data analysis. Specifically, suppose there is an unknown distribution $\mathcal{P}$ and a set of $n$ independent samples $x$ is drawn from $\mathcal{P}$. We seek an algorithm that, given $x$ as input, "accurately" answers a sequence of adaptively chosen "queries" about the unknown distribution $\mathcal{P}$. How many samples $n$ must we draw from the distribution, as a function of the type of queries, the number of queries, and the desired level of accuracy? In this work we make two new contributions towards resolving this question: *We give upper bounds on the number of samples $n$ that are needed to answer statistical queries that improve over the bounds of Dwork et al. *We prove the first upper bounds on the number of samples required to answer more general families of queries. These include arbitrary low-sensitivity queries and the important class of convex risk minimization queries. As in Dwork et al., our algorithms are based on a connection between differential privacy and generalization error, but we feel that our analysis is simpler and more modular, which may be useful for studying these questions in the future. △ Less

Submitted 9 November, 2015; v1 submitted 16 March, 2015; originally announced March 2015.

Comments: This paper was merged with another manuscript and is now subsumed by arXiv:1511.02513

arXiv:1501.06095 [pdf, other]

Between Pure and Approximate Differential Privacy

Authors: Thomas Steinke, Jonathan Ullman

Abstract: We show a new lower bound on the sample complexity of $(\varepsilon, δ)$-differentially private algorithms that accurately answer statistical queries on high-dimensional databases. The novelty of our bound is that it depends optimally on the parameter $δ$, which loosely corresponds to the probability that the algorithm fails to be private, and is the first to smoothly interpolate between approxima… ▽ More We show a new lower bound on the sample complexity of $(\varepsilon, δ)$-differentially private algorithms that accurately answer statistical queries on high-dimensional databases. The novelty of our bound is that it depends optimally on the parameter $δ$, which loosely corresponds to the probability that the algorithm fails to be private, and is the first to smoothly interpolate between approximate differential privacy ($δ> 0$) and pure differential privacy ($δ= 0$). Specifically, we consider a database $D \in \{\pm1\}^{n \times d}$ and its \emph{one-way marginals}, which are the $d$ queries of the form "What fraction of individual records have the $i$-th bit set to $+1$?" We show that in order to answer all of these queries to within error $\pm α$ (on average) while satisfying $(\varepsilon, δ)$-differential privacy, it is necessary that $$ n \geq Ω\left( \frac{\sqrt{d \log(1/δ)}}{α\varepsilon} \right), $$ which is optimal up to constant factors. To prove our lower bound, we build on the connection between \emph{fingerprinting codes} and lower bounds in differential privacy (Bun, Ullman, and Vadhan, STOC'14). In addition to our lower bound, we give new purely and approximately differentially private algorithms for answering arbitrary statistical queries that improve on the sample complexity of the standard Laplace and Gaussian mechanisms for achieving worst-case accuracy guarantees by a logarithmic factor. △ Less

Submitted 24 January, 2015; originally announced January 2015.

arXiv:1412.2457 [pdf, ps, other]

Weighted Polynomial Approximations: Limits for Learning and Pseudorandomness

Authors: Mark Bun, Thomas Steinke

Abstract: Polynomial approximations to boolean functions have led to many positive results in computer science. In particular, polynomial approximations to the sign function underly algorithms for agnostically learning halfspaces, as well as pseudorandom generators for halfspaces. In this work, we investigate the limits of these techniques by proving inapproximability results for the sign function. Firstl… ▽ More Polynomial approximations to boolean functions have led to many positive results in computer science. In particular, polynomial approximations to the sign function underly algorithms for agnostically learning halfspaces, as well as pseudorandom generators for halfspaces. In this work, we investigate the limits of these techniques by proving inapproximability results for the sign function. Firstly, the polynomial regression algorithm of Kalai et al. (SIAM J. Comput. 2008) shows that halfspaces can be learned with respect to log-concave distributions on $\mathbb{R}^n$ in the challenging agnostic learning model. The power of this algorithm relies on the fact that under log-concave distributions, halfspaces can be approximated arbitrarily well by low-degree polynomials. We ask whether this technique can be extended beyond log-concave distributions, and establish a negative result. We show that polynomials of any degree cannot approximate the sign function to within arbitrarily low error for a large class of non-log-concave distributions on the real line, including those with densities proportional to $\exp(-|x|^{0.99})$. Secondly, we investigate the derandomization of Chernoff-type concentration inequalities. Chernoff-type tail bounds on sums of independent random variables have pervasive applications in theoretical computer science. Schmidt et al. (SIAM J. Discrete Math. 1995) showed that these inequalities can be established for sums of random variables with only $O(\log(1/δ))$-wise independence, for a tail probability of $δ$. We show that their results are tight up to constant factors. These results rely on techniques from weighted approximation theory, which studies how well functions on the real line can be approximated by polynomials under various distributions. We believe that these techniques will have further applications in other areas of computer science. △ Less

Submitted 8 December, 2014; originally announced December 2014.

Comments: 22 pages

arXiv:1410.1228 [pdf, other]

Interactive Fingerprinting Codes and the Hardness of Preventing False Discovery

Authors: Thomas Steinke, Jonathan Ullman

Abstract: We show an essentially tight bound on the number of adaptively chosen statistical queries that a computationally efficient algorithm can answer accurately given $n$ samples from an unknown distribution. A statistical query asks for the expectation of a predicate over the underlying distribution, and an answer to a statistical query is accurate if it is "close" to the correct expectation over the d… ▽ More We show an essentially tight bound on the number of adaptively chosen statistical queries that a computationally efficient algorithm can answer accurately given $n$ samples from an unknown distribution. A statistical query asks for the expectation of a predicate over the underlying distribution, and an answer to a statistical query is accurate if it is "close" to the correct expectation over the distribution. This question was recently studied by Dwork et al., who showed how to answer $\tildeΩ(n^2)$ queries efficiently, and also by Hardt and Ullman, who showed that answering $\tilde{O}(n^3)$ queries is hard. We close the gap between the two bounds and show that, under a standard hardness assumption, there is no computationally efficient algorithm that, given $n$ samples from an unknown distribution, can give valid answers to $O(n^2)$ adaptively chosen statistical queries. An implication of our results is that computationally efficient algorithms for answering arbitrary, adaptively chosen statistical queries may as well be differentially private. We obtain our results using a new connection between the problem of answering adaptively chosen statistical queries and a combinatorial object called an interactive fingerprinting code. In order to optimize our hardness result, we give a new Fourier-analytic approach to analyzing fingerprinting codes that is simpler, more flexible, and yields better parameters than previous constructions. △ Less

Submitted 20 February, 2015; v1 submitted 5 October, 2014; originally announced October 2014.

arXiv:1405.7028 [pdf, ps, other]

Pseudorandomness and Fourier Growth Bounds for Width 3 Branching Programs

Authors: Thomas Steinke, Salil Vadhan, Andrew Wan

Abstract: We present an explicit pseudorandom generator for oblivious, read-once, width-$3$ branching programs, which can read their input bits in any order. The generator has seed length $\tilde{O}( \log^3 n ).$ The previously best known seed length for this model is $n^{1/2+o(1)}$ due to Impagliazzo, Meka, and Zuckerman (FOCS '12). Our work generalizes a recent result of Reingold, Steinke, and Vadhan (RAN… ▽ More We present an explicit pseudorandom generator for oblivious, read-once, width-$3$ branching programs, which can read their input bits in any order. The generator has seed length $\tilde{O}( \log^3 n ).$ The previously best known seed length for this model is $n^{1/2+o(1)}$ due to Impagliazzo, Meka, and Zuckerman (FOCS '12). Our work generalizes a recent result of Reingold, Steinke, and Vadhan (RANDOM '13) for \textit{permutation} branching programs. The main technical novelty underlying our generator is a new bound on the Fourier growth of width-3, oblivious, read-once branching programs. Specifically, we show that for any $f:\{0,1\}^n\rightarrow \{0,1\}$ computed by such a branching program, and $k\in [n],$ $$\sum_{s\subseteq [n]: |s|=k} \left| \hat{f}[s] \right| \leq n^2 \cdot (O(\log n))^k,$$ where $\widehat{f}[s] = \mathbb{E}\left[f[U] \cdot (-1)^{s \cdot U}\right]$ is the standard Fourier transform over $\mathbb{Z}_2^n$. The base $O(\log n)$ of the Fourier growth is tight up to a factor of $\log \log n$. △ Less

Submitted 27 May, 2014; originally announced May 2014.

Comments: arXiv admin note: text overlap with arXiv:1306.3004

arXiv:1306.3004 [pdf, ps, other]

Pseudorandomness for Regular Branching Programs via Fourier Analysis

Authors: Omer Reingold, Thomas Steinke, Salil Vadhan

Abstract: We present an explicit pseudorandom generator for oblivious, read-once, permutation branching programs of constant width that can read their input bits in any order. The seed length is $O(\log^2 n)$, where $n$ is the length of the branching program. The previous best seed length known for this model was $n^{1/2+o(1)}$, which follows as a special case of a generator due to Impagliazzo, Meka, and Zu… ▽ More We present an explicit pseudorandom generator for oblivious, read-once, permutation branching programs of constant width that can read their input bits in any order. The seed length is $O(\log^2 n)$, where $n$ is the length of the branching program. The previous best seed length known for this model was $n^{1/2+o(1)}$, which follows as a special case of a generator due to Impagliazzo, Meka, and Zuckerman (FOCS 2012) (which gives a seed length of $s^{1/2+o(1)}$ for arbitrary branching programs of size $s$). Our techniques also give seed length $n^{1/2+o(1)}$ for general oblivious, read-once branching programs of width $2^{n^{o(1)}}$, which is incomparable to the results of Impagliazzo et al.Our pseudorandom generator is similar to the one used by Gopalan et al. (FOCS 2012) for read-once CNFs, but the analysis is quite different; ours is based on Fourier analysis of branching programs. In particular, we show that an oblivious, read-once, regular branching program of width $w$ has Fourier mass at most $(2w^2)^k$ at level $k$, independent of the length of the program. △ Less

Submitted 19 June, 2013; v1 submitted 12 June, 2013; originally announced June 2013.

Comments: RANDOM 2013

Showing 1–50 of 52 results for author: Steinke, T