-
Exponential Separations in Local Differential Privacy
Authors:
Matthew Joseph,
Jieming Mao,
Aaron Roth
Abstract:
We prove a general connection between the communication complexity of two-player games and the sample complexity of their multi-player locally private analogues. We use this connection to prove sample complexity lower bounds for locally differentially private protocols as straightforward corollaries of results from communication complexity. In particular, we 1) use a communication lower bound for the hidden layers problem to prove an exponential sample complexity separation between sequentially and fully interactive locally private protocols, and 2) use a communication lower bound for the pointer chasing problem to prove an exponential sample complexity separation between $k$ round and $k+1$ round sequentially interactive locally private protocols, for every $k$.
Submitted 29 October, 2019; v1 submitted 1 July, 2019;
originally announced July 2019.
-
The Role of Interactivity in Local Differential Privacy
Authors:
Matthew Joseph,
Jieming Mao,
Seth Neel,
Aaron Roth
Abstract:
We study the power of interactivity in local differential privacy. First, we focus on the difference between fully interactive and sequentially interactive protocols. Sequentially interactive protocols may query users adaptively in sequence, but they cannot return to previously queried users. The vast majority of existing lower bounds for local differential privacy apply only to sequentially interactive protocols, and before this paper it was not known whether fully interactive protocols were more powerful. We resolve this question. First, we classify locally private protocols by their compositionality, the multiplicative factor $k \geq 1$ by which the sum of a protocol's single-round privacy parameters exceeds its overall privacy guarantee. We then show how to efficiently transform any fully interactive $k$-compositional protocol into an equivalent sequentially interactive protocol with an $O(k)$ blowup in sample complexity. Next, we show that our reduction is tight by exhibiting a family of problems such that for any $k$, there is a fully interactive $k$-compositional protocol which solves the problem, while no sequentially interactive protocol can solve the problem without at least an $\tilde{\Omega}(k)$ factor more examples. We then turn our attention to hypothesis testing problems. We show that for a large class of compound hypothesis testing problems, which include all simple hypothesis testing problems as a special case, a simple noninteractive test is optimal among the class of all (possibly fully interactive) tests.
Submitted 8 November, 2019; v1 submitted 6 April, 2019;
originally announced April 2019.
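The compositionality notion in the abstract above can be written out as a formula. This is a sketch read directly off the abstract's wording: the symbols $\varepsilon_t$ (privacy parameter of round $t$), $T$ (number of rounds), and $\varepsilon$ (overall privacy guarantee) are notation assumed here, and the paper's formal definition may differ in details such as how the sum is taken over users.

```latex
\[
  k \;=\; \frac{1}{\varepsilon} \sum_{t=1}^{T} \varepsilon_t ,
  \qquad k \geq 1 .
\]
```

Under this reading, the $O(k)$ sample-complexity blowup of the reduction grows linearly in this ratio.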
-
Locally Private Gaussian Estimation
Authors:
Matthew Joseph,
Janardhan Kulkarni,
Jieming Mao,
Zhiwei Steven Wu
Abstract:
We study a basic private estimation problem: each of $n$ users draws a single i.i.d. sample from an unknown Gaussian distribution, and the goal is to estimate the mean of this Gaussian distribution while satisfying local differential privacy for each user. Informally, local differential privacy requires that each data point is individually and independently privatized before it is passed to a learning algorithm. Locally private Gaussian estimation is therefore difficult because the data domain is unbounded: users may draw arbitrarily different inputs, but local differential privacy nonetheless mandates that different users have (worst-case) similar privatized output distributions. We provide both adaptive two-round solutions and nonadaptive one-round solutions for locally private Gaussian estimation. We then partially match these upper bounds with an information-theoretic lower bound. This lower bound shows that our accuracy guarantees are tight up to logarithmic factors for all sequentially interactive $(\varepsilon,\delta)$-locally private protocols.
Submitted 27 October, 2019; v1 submitted 20 November, 2018;
originally announced November 2018.
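The local privatization step described in the abstract above can be illustrated with a minimal sketch. It is not the paper's protocol: it assumes a clipping bound `bound` is known in advance (the abstract stresses that the real difficulty is the unbounded domain), and the names `clip`, `privatize`, and `estimate_mean` are hypothetical. Each user clips their own sample and adds Laplace noise before reporting, which is a standard way to obtain $\varepsilon$-local differential privacy for bounded values.

```python
import random


def clip(x, bound):
    """Clamp x to [-bound, bound]; clipped values differ by at most 2 * bound."""
    return max(-bound, min(bound, x))


def privatize(x, epsilon, bound, rng):
    """Return one user's noisy report of their clipped sample.

    Assumption: `bound` is public and fixed before any data is seen.
    """
    scale = 2.0 * bound / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and scale `scale`.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return clip(x, bound) + noise


def estimate_mean(samples, epsilon, bound, rng):
    """Average the independently privatized reports, one per user."""
    reports = [privatize(x, epsilon, bound, rng) for x in samples]
    return sum(reports) / len(reports)
```

The aggregator only ever sees the noisy reports, so the privacy guarantee holds per user regardless of what the aggregator does with them.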
-
Predicting property damage from tornadoes with zero-inflated neural networks
Authors:
Jeremy Diaz,
Maxwell Joseph
Abstract:
Tornadoes are the most violent of all atmospheric storms. In a typical year, the United States experiences hundreds of tornadoes with associated damages on the order of one billion dollars. Community preparation and resilience would benefit from accurate predictions of these economic losses, particularly as populations in tornado-prone areas increase in density and extent. Here, we use a zero-inflated modeling approach and artificial neural networks to predict tornado-induced property damage using publicly available data. We developed a neural network that predicts whether a tornado will cause property damage (out-of-sample accuracy = 0.821 and area under the receiver operating characteristic curve, AUROC, = 0.872). Conditional on a tornado causing damage, another neural network predicts the amount of damage (out-of-sample mean squared error = 0.0918 and $R^2$ = 0.432). When used together, these two models function as a zero-inflated log-normal regression with hidden layers. From the best-performing models, we provide static and interactive gridded maps of monthly predicted probabilities of damage and property damages for the year 2019. Two primary weaknesses are that (1) model fitting requires log-scale data, which leads to large natural-scale residuals, and (2) beginning tornado coordinates were used rather than tornado paths. Ultimately, this is the first known study to directly model tornado-induced property damages, and all data, code, and tools are publicly available. The predictive capacity of this model along with an interactive interface may provide an opportunity for science-informed tornado disaster planning.
Submitted 19 July, 2019; v1 submitted 9 July, 2018;
originally announced July 2018.
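The way the two networks in the abstract above combine into a zero-inflated log-normal prediction can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: `p_damage` stands in for the output of the first network, `mu_log` and `sigma_log` for a natural-log-scale mean and standard deviation associated with the second, and the function name `expected_damage` is hypothetical.

```python
import math


def expected_damage(p_damage, mu_log, sigma_log):
    """Expected natural-scale damage under a zero-inflated log-normal model.

    With probability 1 - p_damage the damage is zero; otherwise its natural
    log is assumed normal with mean mu_log and standard deviation sigma_log.
    """
    if not 0.0 <= p_damage <= 1.0:
        raise ValueError("p_damage must be a probability in [0, 1]")
    # The mean of a log-normal variable is exp(mu + sigma^2 / 2).
    return p_damage * math.exp(mu_log + 0.5 * sigma_log ** 2)
```

Because the conditional prediction is exponentiated, small log-scale errors become large natural-scale errors, which matches the first weakness the abstract names.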
-
A Convex Framework for Fair Regression
Authors:
Richard Berk,
Hoda Heidari,
Shahin Jabbari,
Matthew Joseph,
Michael Kearns,
Jamie Morgenstern,
Seth Neel,
Aaron Roth
Abstract:
We introduce a flexible family of fairness regularizers for (linear and logistic) regression problems. These regularizers all enjoy convexity, permitting fast optimization, and they span the range from notions of group fairness to strong individual fairness. By varying the weight on the fairness regularizer, we can compute the efficient frontier of the accuracy-fairness trade-off on any given dataset, and we measure the severity of this trade-off via a numerical quantity we call the Price of Fairness (PoF). The centerpiece of our results is an extensive comparative study of the PoF across six different datasets in which fairness is a primary consideration.
Submitted 7 June, 2017;
originally announced June 2017.
-
Fairness in Learning: Classic and Contextual Bandits
Authors:
Matthew Joseph,
Michael Kearns,
Jamie Morgenstern,
Aaron Roth
Abstract:
We introduce the study of fairness in multi-armed bandit problems. Our fairness definition can be interpreted as demanding that given a pool of applicants (say, for college admission or mortgages), a worse applicant is never favored over a better one, despite a learning algorithm's uncertainty over the true payoffs. We prove results of two types.
First, in the important special case of the classic stochastic bandits problem (i.e., in which there are no contexts), we provide a provably fair algorithm based on "chained" confidence intervals, and provide a cumulative regret bound with a cubic dependence on the number of arms. We further show that any fair algorithm must have such a dependence. When combined with regret bounds for standard non-fair algorithms such as UCB, this proves a strong separation between fair and unfair learning, which extends to the general contextual case.
In the general contextual case, we prove a tight connection between fairness and the KWIK (Knows What It Knows) learning model: a KWIK algorithm for a class of functions can be transformed into a provably fair contextual bandit algorithm, and conversely any fair contextual bandit algorithm can be transformed into a KWIK learning algorithm. This tight connection allows us to provide a provably fair algorithm for the linear contextual bandit problem with a polynomial dependence on the dimension, and to show (for a different class of functions) a worst-case exponential gap in regret between fair and non-fair learning algorithms.
Submitted 7 November, 2016; v1 submitted 23 May, 2016;
originally announced May 2016.
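The "chained" confidence intervals mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that chaining means transitive overlap of intervals starting from the arm with the highest upper bound; the function name `chained_set` and the `(lower, upper)` tuple representation are hypothetical, and the paper's algorithm may differ in how intervals are built and updated.

```python
def chained_set(intervals):
    """Return indices of arms chained to the arm with the highest upper bound.

    intervals: list of (lower, upper) confidence bounds, one pair per arm.
    Two arms are linked when their intervals overlap; chaining is the
    transitive closure of that link.
    """
    if not intervals:
        return set()
    top = max(range(len(intervals)), key=lambda i: intervals[i][1])
    active = {top}
    changed = True
    while changed:
        changed = False
        for i, (lo_i, hi_i) in enumerate(intervals):
            if i in active:
                continue
            # Arm i joins the chain if it overlaps any arm already in it.
            if any(lo_i <= intervals[j][1] and intervals[j][0] <= hi_i
                   for j in active):
                active.add(i)
                changed = True
    return active
```

One natural use, under the same assumption, is to pick uniformly at random among the returned arms, since the data cannot yet rule out that any of them is the best.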