Search | arXiv e-print repository

arXiv:2405.19522 [pdf]

Artificial Intelligence Index Report 2024

Authors: Nestor Maslej, Loredana Fattorini, Raymond Perrault, Vanessa Parli, Anka Reuel, Erik Brynjolfsson, John Etchemendy, Katrina Ligett, Terah Lyons, James Manyika, Juan Carlos Niebles, Yoav Shoham, Russell Wald, Jack Clark

Abstract: The 2024 Index is our most comprehensive to date and arrives at an important moment when AI's influence on society has never been more pronounced. This year, we have broadened our scope to more extensively cover essential trends such as technical advancements in AI, public perceptions of the technology, and the geopolitical dynamics surrounding its development. Featuring more original data than ever before, this edition introduces new estimates on AI training costs, detailed analyses of the responsible AI landscape, and an entirely new chapter dedicated to AI's impact on science and medicine. The AI Index report tracks, collates, distills, and visualizes data related to artificial intelligence (AI). Our mission is to provide unbiased, rigorously vetted, broadly sourced data in order for policymakers, researchers, executives, journalists, and the general public to develop a more thorough and nuanced understanding of the complex field of AI. The AI Index is recognized globally as one of the most credible and authoritative sources for data and insights on artificial intelligence. Previous editions have been cited in major newspapers, including The New York Times, Bloomberg, and The Guardian, have amassed hundreds of academic citations, and have been referenced by high-level policymakers in the United States, the United Kingdom, and the European Union, among other places. This year's edition surpasses all previous ones in size, scale, and scope, reflecting the growing significance that AI is coming to hold in all of our lives.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2402.14005 [pdf, other]

Information Elicitation in Agency Games

Authors: Serena Wang, Michael I. Jordan, Katrina Ligett, R. Preston McAfee

Abstract: Rapid progress in scalable, commoditized tools for data collection and data processing has made it possible for firms and policymakers to employ ever more complex metrics as guides for decision-making. These developments have highlighted a prevailing challenge -- deciding *which* metrics to compute. In particular, a firm's ability to compute a wider range of existing metrics does not address the problem of *unknown unknowns*, which reflects informational limitations on the part of the firm. To guide the choice of metrics in the face of this informational problem, we turn to the evaluated agents themselves, who may have more information than a principal about how to measure outcomes effectively. We model this interaction as a simple agency game, where we ask: *When does an agent have an incentive to reveal the observability of a cost-correlated variable to the principal?* There are two effects: better information reduces the agent's information rents but also makes some projects go forward that otherwise would fail. We show that the agent prefers to reveal information that exposes a strong enough differentiation between high and low costs. Expanding the agent's action space to include the ability to *garble* their information, we show that the agent often prefers to garble over full revelation. Still, giving the agent the ability to garble can lead to higher total welfare. Our model has analogies with price discrimination, and we leverage some of these synergies to analyze total welfare.

Submitted 15 April, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

arXiv:2310.03715 [pdf]

Artificial Intelligence Index Report 2023

Authors: Nestor Maslej, Loredana Fattorini, Erik Brynjolfsson, John Etchemendy, Katrina Ligett, Terah Lyons, James Manyika, Helen Ngo, Juan Carlos Niebles, Vanessa Parli, Yoav Shoham, Russell Wald, Jack Clark, Raymond Perrault

Abstract: Welcome to the sixth edition of the AI Index Report. This year, the report introduces more original data than any previous edition, including a new chapter on AI public opinion, a more thorough technical performance chapter, original analysis about large language and multimodal models, detailed trends in global AI legislation records, a study of the environmental impact of AI systems, and more. The AI Index Report tracks, collates, distills, and visualizes data related to artificial intelligence. Our mission is to provide unbiased, rigorously vetted, broadly sourced data in order for policymakers, researchers, executives, journalists, and the general public to develop a more thorough and nuanced understanding of the complex field of AI. The report aims to be the world's most credible and authoritative source for data and insights about AI.

Submitted 5 October, 2023; originally announced October 2023.

arXiv:2301.06206 [pdf, ps, other]

Efficiency in Collective Decision-Making via Quadratic Transfers

Authors: Jon X. Eguia, Nicole Immorlica, Steven P. Lalley, Katrina Ligett, Glen Weyl, Dimitrios Xefteris

Abstract: Consider the following collective choice problem: a group of budget constrained agents must choose one of several alternatives. Is there a budget balanced mechanism that: i) does not depend on the specific characteristics of the group, ii) does not require unaffordable transfers, and iii) implements utilitarianism if the agents' preferences are quasilinear and their private information? We study the following procedure: every agent can express any intensity of support or opposition to each alternative, by transferring to the rest of the agents wealth equal to the square of the intensity expressed; and the outcome is determined by the sums of the expressed intensities. We prove that as the group grows large, in every equilibrium of this quadratic-transfers mechanism, each agent's transfer converges to zero, and the probability that the efficient outcome is chosen converges to one.

Submitted 15 January, 2023; originally announced January 2023.

arXiv:2106.10761 [pdf, ps, other]

Generalization in the Face of Adaptivity: A Bayesian Perspective

Authors: Moshe Shenfeld, Katrina Ligett

Abstract: Repeated use of a data sample via adaptively chosen queries can rapidly lead to overfitting, wherein the empirical evaluation of queries on the sample significantly deviates from their mean with respect to the underlying data distribution. It turns out that simple noise addition algorithms suffice to prevent this issue, and differential privacy-based analysis of these algorithms shows that they can handle an asymptotically optimal number of queries. However, differential privacy's worst-case nature entails scaling such noise to the range of the queries even for highly-concentrated queries, or introducing more complex algorithms. In this paper, we prove that straightforward noise-addition algorithms already provide variance-dependent guarantees that also extend to unbounded queries. This improvement stems from a novel characterization that illuminates the core problem of adaptive data analysis. We show that the harm of adaptivity results from the covariance between the new query and a Bayes factor-based measure of how much information about the data sample was encoded in the responses given to past queries. We then leverage this characterization to introduce a new data-dependent stability notion that can bound this covariance.

Submitted 3 April, 2024; v1 submitted 20 June, 2021; originally announced June 2021.

Journal ref: Advances in Neural Information Processing Systems, 36 (2024)

arXiv:2002.07024 [pdf, ps, other]

Gaming Helps! Learning from Strategic Interactions in Natural Dynamics

Authors: Yahav Bechavod, Katrina Ligett, Zhiwei Steven Wu, Juba Ziani

Abstract: We consider an online regression setting in which individuals adapt to the regression model: arriving individuals are aware of the current model, and invest strategically in modifying their own features so as to improve the predicted score that the current model assigns to them. Such feature manipulation has been observed in various scenarios -- from credit assessment to school admissions -- posing a challenge for the learner. Surprisingly, we find that such strategic manipulations may in fact help the learner recover the meaningful variables -- that is, the features that, when changed, affect the true label (as opposed to non-meaningful features that have no effect). We show that even simple behavior on the learner's part allows her to simultaneously i) accurately recover the meaningful features, and ii) incentivize agents to invest in these meaningful features, providing incentives for improvement.

Submitted 28 February, 2021; v1 submitted 17 February, 2020; originally announced February 2020.

Comments: The Conference version of this paper is to appear in the Proceedings of AISTATS 2021. 27 pages

arXiv:2002.05660 [pdf, other]

Learn to Expect the Unexpected: Probably Approximately Correct Domain Generalization

Authors: Vikas K. Garg, Adam Kalai, Katrina Ligett, Zhiwei Steven Wu

Abstract: Domain generalization is the problem of machine learning when the training data and the test data come from different data domains. We present a simple theoretical model of learning to generalize across domains in which there is a meta-distribution over data distributions, and those data distributions may even have different supports. In our model, the training data given to a learning algorithm consists of multiple datasets each from a single domain drawn in turn from the meta-distribution. We study this model in three different problem settings---a multi-domain Massart noise setting, a decision tree multi-dataset setting, and a feature selection setting, and find that computationally efficient, polynomial-sample domain generalization is possible in each. Experiments demonstrate that our feature selection algorithm indeed ignores spurious correlations and improves generalization.

Submitted 13 February, 2020; originally announced February 2020.

arXiv:1911.10137 [pdf, ps, other]

Privately Learning Thresholds: Closing the Exponential Gap

Authors: Haim Kaplan, Katrina Ligett, Yishay Mansour, Moni Naor, Uri Stemmer

Abstract: We study the sample complexity of learning threshold functions under the constraint of differential privacy. It is assumed that each labeled example in the training data is the information of one individual and we would like to come up with a generalizing hypothesis $h$ while guaranteeing differential privacy for the individuals. Intuitively, this means that any single labeled example in the training data should not have a significant effect on the choice of the hypothesis. This problem has received much attention recently; unlike the non-private case, where the sample complexity is independent of the domain size and just depends on the desired accuracy and confidence, for private learning the sample complexity must depend on the domain size $X$ (even for approximate differential privacy). Alon et al. (STOC 2019) showed a lower bound of $\Omega(\log^*|X|)$ on the sample complexity and Bun et al. (FOCS 2015) presented an approximate-private learner with sample complexity $\tilde{O}\left(2^{\log^*|X|}\right)$. In this work we reduce this gap significantly, almost settling the sample complexity. We first present a new upper bound (algorithm) of $\tilde{O}\left(\left(\log^*|X|\right)^2\right)$ on the sample complexity and then present an improved version with sample complexity $\tilde{O}\left(\left(\log^*|X|\right)^{1.5}\right)$. Our algorithm is constructed for the related interior point problem, where the goal is to find a point between the largest and smallest input elements. It is based on selecting an input-dependent hash function and using it to embed the database into a domain whose size is reduced logarithmically; this results in a new database, an interior point of which can be used to generate an interior point in the original database in a differentially private manner.

Submitted 22 November, 2019; originally announced November 2019.

arXiv:1909.03577 [pdf, other]

A New Analysis of Differential Privacy's Generalization Guarantees

Authors: Christopher Jung, Katrina Ligett, Seth Neel, Aaron Roth, Saeed Sharifi-Malvajerdi, Moshe Shenfeld

Abstract: We give a new proof of the "transfer theorem" underlying adaptive data analysis: that any mechanism for answering adaptively chosen statistical queries that is differentially private and sample-accurate is also accurate out-of-sample. Our new proof is elementary and gives structural insights that we expect will be useful elsewhere. We show: 1) that differential privacy ensures that the expectation of any query on the posterior distribution on datasets induced by the transcript of the interaction is close to its true value on the data distribution, and 2) sample accuracy on its own ensures that any query answer produced by the mechanism is close to its posterior expectation with high probability. This second claim follows from a thought experiment in which we imagine that the dataset is resampled from the posterior distribution after the mechanism has committed to its answers. The transfer theorem then follows by summing these two bounds, and in particular, avoids the "monitor argument" used to derive high probability bounds in prior work. An upshot of our new proof technique is that the concrete bounds we obtain are substantially better than the best previously known bounds, even though the improvements are in the constants, rather than the asymptotics (which are known to be tight). As we show, our new bounds outperform the naive "sample-splitting" baseline at dramatically smaller dataset sizes compared to the previous state of the art, bringing techniques from this literature closer to practicality.

Submitted 3 June, 2024; v1 submitted 8 September, 2019; originally announced September 2019.

arXiv:1906.00930 [pdf, ps, other]

A necessary and sufficient stability notion for adaptive generalization

Authors: Katrina Ligett, Moshe Shenfeld

Abstract: We introduce a new notion of the stability of computations, which holds under post-processing and adaptive composition. We show that the notion is both necessary and sufficient to ensure generalization in the face of adaptivity, for any computations that respond to bounded-sensitivity linear queries while providing accuracy with respect to the data sample set. The stability notion is based on quantifying the effect of observing a computation's outputs on the posterior over the data sample elements. We show a separation between this stability notion and previously studied notions and observe that all differentially private algorithms also satisfy this notion.

Submitted 28 December, 2019; v1 submitted 3 June, 2019; originally announced June 2019.

Journal ref: In Advances in Neural Information Processing Systems 2019 (pp. 11481-11490)

arXiv:1904.11875 [pdf, other]

Learning to Prune: Speeding up Repeated Computations

Authors: Daniel Alabi, Adam Tauman Kalai, Katrina Ligett, Cameron Musco, Christos Tzamos, Ellen Vitercik

Abstract: It is common to encounter situations where one must solve a sequence of similar computational problems. Running a standard algorithm with worst-case runtime guarantees on each instance will fail to take advantage of valuable structure shared across the problem instances. For example, when a commuter drives from work to home, there are typically only a handful of routes that will ever be the shortest path. A naive algorithm that does not exploit this common structure may spend most of its time checking roads that will never be in the shortest path. More generally, we can often ignore large swaths of the search space that will likely never contain an optimal solution. We present an algorithm that learns to maximally prune the search space on repeated computations, thereby reducing runtime while provably outputting the correct solution each period with high probability. Our algorithm employs a simple explore-exploit technique resembling those used in online algorithms, though our setting is quite different. We prove that, with respect to our model of pruning search spaces, our approach is optimal up to constant factors. Finally, we illustrate the applicability of our model and algorithm to three classic problems: shortest-path routing, string search, and linear programming. We present experiments confirming that our simple algorithm is effective at significantly reducing the runtime of solving repeated computations.

Submitted 26 April, 2019; originally announced April 2019.

arXiv:1902.02242 [pdf, ps, other]

Equal Opportunity in Online Classification with Partial Feedback

Authors: Yahav Bechavod, Katrina Ligett, Aaron Roth, Bo Waggoner, Zhiwei Steven Wu

Abstract: We study an online classification problem with partial feedback in which individuals arrive one at a time from a fixed but unknown distribution, and must be classified as positive or negative. Our algorithm only observes the true label of an individual if they are given a positive classification. This setting captures many classification problems for which fairness is a concern: for example, in criminal recidivism prediction, recidivism is only observed if the inmate is released; in lending applications, loan repayment is only observed if the loan is granted. We require that our algorithms satisfy common statistical fairness constraints (such as equalizing false positive or negative rates -- introduced as "equal opportunity" in Hardt et al. (2016)) at every round, with respect to the underlying distribution. We give upper and lower bounds characterizing the cost of this constraint in terms of the regret rate (and show that it is mild), and give an oracle efficient algorithm that achieves the upper bound.

Submitted 16 April, 2020; v1 submitted 6 February, 2019; originally announced February 2019.

Comments: The Conference version of this paper appears in the Proceedings of NeurIPS 2019. 29 pages

arXiv:1809.04224 [pdf, other]

Access to Population-Level Signaling as a Source of Inequality

Authors: Nicole Immorlica, Katrina Ligett, Juba Ziani

Abstract: We identify and explore differential access to population-level signaling (also known as information design) as a source of unequal access to opportunity. A population-level signaler has potentially noisy observations of a binary type for each member of a population and, based on this, produces a signal about each member. A decision-maker infers types from signals and accepts those individuals whose type is high in expectation. We assume the signaler of the disadvantaged population reveals her observations to the decision-maker, whereas the signaler of the advantaged population forms signals strategically. We study the expected utility of the populations as measured by the fraction of accepted members, as well as the false positive rates (FPR) and false negative rates (FNR). We first show the intuitive results that for a fixed environment, the advantaged population has higher expected utility, higher FPR, and lower FNR, than the disadvantaged one (despite having identical population quality), and that more accurate observations improve the expected utility of the advantaged population while harming that of the disadvantaged one. We next explore the introduction of a publicly-observable signal, such as a test score, as a potential intervention. Our main finding is that this natural intervention, intended to reduce the inequality between the populations' utilities, may actually exacerbate it in settings where observations and test scores are noisy.

Submitted 11 September, 2018; originally announced September 2018.

arXiv:1802.07407 [pdf, ps, other]

Third-Party Data Providers Ruin Simple Mechanisms

Authors: Yang Cai, Federico Echenique, Hu Fu, Katrina Ligett, Adam Wierman, Juba Ziani

Abstract: Motivated by the growing prominence of third-party data providers in online marketplaces, this paper studies the impact of the presence of third-party data providers on mechanism design. When no data provider is present, it has been shown that simple mechanisms are "good enough" -- they can achieve a constant fraction of the revenue of optimal mechanisms. The results in this paper demonstrate that this is no longer true in the presence of a third-party data provider who can provide the bidder with a signal that is correlated with the item type. Specifically, even with a single seller, a single bidder, and a single item of uncertain type for sale, the strategies of pricing each item-type separately (the analog of item pricing for multi-item auctions) and bundling all item-types under a single price (the analog of grand bundling) can both simultaneously be a logarithmic factor worse than the optimal revenue. Further, in the presence of a data provider, item-type partitioning mechanisms---a more general class of mechanisms which divide item-types into disjoint groups and offer prices for each group---still cannot achieve within a $\log \log$ factor of the optimal revenue. Thus, our results highlight that the presence of a data-provider forces the use of more complicated mechanisms in order to achieve a constant fraction of the optimal revenue.

Submitted 16 February, 2020; v1 submitted 20 February, 2018; originally announced February 2018.

arXiv:1707.00044 [pdf, other]

Penalizing Unfairness in Binary Classification

Authors: Yahav Bechavod, Katrina Ligett

Abstract: We present a new approach for mitigating unfairness in learned classifiers. In particular, we focus on binary classification tasks over individuals from two populations, where, as our criterion for fairness, we wish to achieve similar false positive rates in both populations, and similar false negative rates in both populations. As a proof of concept, we implement our approach and empirically evaluate its ability to achieve both fairness and accuracy, using datasets from the fields of criminal risk assessment, credit, lending, and college admissions.

Submitted 8 March, 2018; v1 submitted 30 June, 2017; originally announced July 2017.

arXiv:1705.10829 [pdf, other]

Accuracy First: Selecting a Differential Privacy Level for Accuracy-Constrained ERM

Authors: Katrina Ligett, Seth Neel, Aaron Roth, Bo Waggoner, Z. Steven Wu

Abstract: Traditional approaches to differential privacy assume a fixed privacy requirement $\varepsilon$ for a computation, and attempt to maximize the accuracy of the computation subject to the privacy constraint. As differential privacy is increasingly deployed in practical settings, it may often be that there is instead a fixed accuracy requirement for a given computation and the data analyst would like to maximize the privacy of the computation subject to the accuracy constraint. This raises the question of how to find and run a maximally private empirical risk minimizer subject to a given accuracy requirement. We propose a general "noise reduction" framework that can apply to a variety of private empirical risk minimization (ERM) algorithms, using them to "search" the space of privacy levels to find the empirically strongest one that meets the accuracy constraint, incurring only logarithmic overhead in the number of privacy levels searched. The privacy analysis of our algorithm leads naturally to a version of differential privacy where the privacy parameters are dependent on the data, which we term ex-post privacy, and which is related to the recently introduced notion of privacy odometers. We also give an ex-post privacy analysis of the classical AboveThreshold privacy tool, modifying it to allow for queries chosen depending on the database. Finally, we apply our approach to two common objectives, regularized linear and logistic regression, and empirically compare our noise reduction methods to (i) inverting the theoretical utility guarantees of standard private ERM algorithms and (ii) a stronger, empirical baseline based on binary search.

Submitted 30 May, 2017; originally announced May 2017.

Comments: 24 pages single-column

arXiv:1604.02676 [pdf, ps, other]

Approximating Nash Equilibria in Tree Polymatrix Games

Authors: Siddharth Barman, Katrina Ligett, Georgios Piliouras

Abstract: We develop a quasi-polynomial time Las Vegas algorithm for approximating Nash equilibria in polymatrix games over trees, under a mild renormalizing assumption. Our result, in particular, leads to an expected polynomial-time algorithm for computing approximate Nash equilibria of tree polymatrix games in which the number of actions per player is a fixed constant. Further, for trees with constant degree, the running time of the algorithm matches the best known upper bound for approximating Nash equilibria in bimatrix games (Lipton, Markakis, and Mehta 2003). Notably, this work closely complements the hardness result of Rubinstein (2015), which establishes the inapproximability of Nash equilibria in polymatrix games over constant-degree bipartite graphs with two actions per player.

Submitted 10 April, 2016; originally announced April 2016.

Comments: Appeared in the proceedings of the 8th International Symposium on Algorithmic Game Theory (SAGT), 2015. 11 pages

arXiv:1603.07319 [pdf, other]

Putting Peer Prediction Under the Micro(economic)scope and Making Truth-telling Focal

Authors: Yuqing Kong, Grant Schoenebeck, Katrina Ligett

Abstract: Peer-prediction is a (meta-)mechanism which, given any proper scoring rule, produces a mechanism to elicit privately-held, non-verifiable information from self-interested agents. Formally, truth-telling is a strict Nash equilibrium of the mechanism. Unfortunately, there may be other equilibria as well (including uninformative equilibria where all players simply report the same fixed signal, regardless of their true signal) and, typically, the truth-telling equilibrium does not have the highest expected payoff. The main result of this paper is to show that, in the symmetric binary setting, by tweaking peer-prediction, in part by carefully selecting the proper scoring rule it is based on, we can make the truth-telling equilibrium focal---that is, truth-telling has higher expected payoff than any other equilibrium. Along the way, we prove the following: in the setting where agents receive binary signals we 1) classify all equilibria of the peer-prediction mechanism; 2) introduce a new technical tool for understanding scoring rules, which allows us to make truth-telling pay better than any other informative equilibrium; 3) leverage this tool to provide an optimal version of the previous result; that is, we optimize the gap between the expected payoff of truth-telling and other informative equilibria; and 4) show that with a slight modification to the peer prediction framework, we can, in general, make the truth-telling equilibrium focal---that is, truth-telling pays more than any other equilibrium (including the uninformative equilibria).

Submitted 23 March, 2016; originally announced March 2016.

arXiv:1603.01318 [pdf, other]

Efficiently characterizing games consistent with perturbed equilibrium observations

Authors: Juba Ziani, Venkat Chandrasekaran, Katrina Ligett

Abstract: We study the problem of characterizing the set of games that are consistent with observed equilibrium play. Our contribution is to develop and analyze a new methodology based on convex optimization to address this problem for many classes of games and observation models of interest. Our approach provides a sharp, computationally efficient characterization of the extent to which a particular set of observations constrains the space of games that could have generated them. This allows us to solve a number of variants of this problem as well as to quantify the power of games from particular classes (e.g., zero-sum, potential, linearly parameterized) to explain player behavior. We illustrate our approach with numerical simulations.

Submitted 22 March, 2017; v1 submitted 3 March, 2016; originally announced March 2016.

arXiv:1602.07726 [pdf, ps, other]

Adaptive Learning with Robust Generalization Guarantees

Authors: Rachel Cummings, Katrina Ligett, Kobbi Nissim, Aaron Roth, Zhiwei Steven Wu

Abstract: The traditional notion of generalization---i.e., learning a hypothesis whose empirical error is close to its true error---is surprisingly brittle. As has recently been noted in [DFH+15b], even if several algorithms have this guarantee in isolation, the guarantee need not hold if the algorithms are composed adaptively. In this paper, we study three notions of generalization---increasing in strength---that are robust to postprocessing and amenable to adaptive composition, and examine the relationships between them. We call the weakest such notion Robust Generalization. A second, intermediate, notion is the stability guarantee known as differential privacy. The strongest guarantee we consider we call Perfect Generalization. We prove that every hypothesis class that is PAC learnable is also PAC learnable in a robustly generalizing fashion, with almost the same sample complexity. It was previously known that differentially private algorithms satisfy robust generalization. In this paper, we show that robust generalization is a strictly weaker concept, and that there is a learning task that can be carried out subject to robust generalization guarantees, yet cannot be carried out subject to differential privacy. We also show that perfect generalization is a strictly stronger guarantee than differential privacy, but that, nevertheless, many learning tasks can be carried out subject to the guarantees of perfect generalization.

Submitted 1 June, 2016; v1 submitted 24 February, 2016; originally announced February 2016.

arXiv:1508.03769 [pdf, ps, other]

A Tale of Two Metrics: Simultaneous Bounds on Competitiveness and Regret

Authors: Lachlan L. H. Andrew, Siddharth Barman, Katrina Ligett, Minghong Lin, Adam Meyerson, Alan Roytman, Adam Wierman

Abstract: We consider algorithms for "smoothed online convex optimization" problems, a variant of the class of online convex optimization problems that is strongly related to metrical task systems. Prior literature on these problems has focused on two performance metrics: regret and the competitive ratio. There exist known algorithms with sublinear regret and known algorithms with constant competitive ratios; however, no known algorithm achieves both simultaneously. We show that this is due to a fundamental incompatibility between these two metrics - no algorithm (deterministic or randomized) can achieve sublinear regret and a constant competitive ratio, even in the case when the objective functions are linear. However, we also exhibit an algorithm that, for the important special case of one-dimensional decision spaces, provides sublinear regret while maintaining a competitive ratio that grows arbitrarily slowly.

Submitted 15 August, 2015; originally announced August 2015.

arXiv:1508.03735 [pdf, ps, other]

Coordination Complexity: Small Information Coordinating Large Populations

Authors: Rachel Cummings, Katrina Ligett, Jaikumar Radhakrishnan, Aaron Roth, Zhiwei Steven Wu

Abstract: We initiate the study of a quantity that we call coordination complexity. In a distributed optimization problem, the information defining a problem instance is distributed among $n$ parties, who need to each choose an action, which jointly will form a solution to the optimization problem. The coordination complexity represents the minimal amount of information that a centralized coordinator, who has full knowledge of the problem instance, needs to broadcast in order to coordinate the $n$ parties to play a nearly optimal solution. We show that upper bounds on the coordination complexity of a problem imply the existence of good jointly differentially private algorithms for solving that problem, which in turn are known to upper bound the price of anarchy in certain games with dynamically changing populations. We show several results. We fully characterize the coordination complexity for the problem of computing a many-to-one matching in a bipartite graph by giving almost matching lower and upper bounds. Our upper bound in fact extends much more generally, to the problem of solving a linearly separable convex program. We also give a different upper bound technique, which we use to bound the coordination complexity of coordinating a Nash equilibrium in a routing game, and of computing a stable matching.

Submitted 5 January, 2016; v1 submitted 15 August, 2015; originally announced August 2015.

arXiv:1508.03080 [pdf, other]

The Strange Case of Privacy in Equilibrium Models

Authors: Rachel Cummings, Katrina Ligett, Mallesh M. Pai, Aaron Roth

Abstract: We study how privacy technologies affect user and advertiser behavior in a simple economic model of targeted advertising. In our model, a consumer first decides whether or not to buy a good, and then an advertiser chooses an advertisement to show the consumer. The consumer's value for the good is correlated with her type, which determines which ad the advertiser would prefer to show to her---and hence, the advertiser would like to use information about the consumer's purchase decision to target the ad that he shows. In our model, the advertiser is given only a differentially private signal about the consumer's behavior---which can range from no signal at all to a perfect signal, as we vary the differential privacy parameter. This allows us to study equilibrium behavior as a function of the level of privacy provided to the consumer. We show that this behavior can be highly counter-intuitive, and that the effect of adding privacy in equilibrium can be completely different from what we would expect if we ignored equilibrium incentives. Specifically, we show that increasing the level of privacy can actually increase the amount of information about the consumer's type contained in the signal the advertiser receives, lead to decreased utility for the consumer, and increased profit for the advertiser, and that generally these quantities can be non-monotonic and even discontinuous in the privacy level of the signal.

Submitted 12 August, 2015; originally announced August 2015.

arXiv:1506.03489 [pdf, ps, other]

Truthful Linear Regression

Authors: Rachel Cummings, Stratis Ioannidis, Katrina Ligett

Abstract: We consider the problem of fitting a linear model to data held by individuals who are concerned about their privacy. Incentivizing most players to truthfully report their data to the analyst constrains our design to mechanisms that provide a privacy guarantee to the participants; we use differential privacy to model individuals' privacy losses. This immediately poses a problem, as differentially private computation of a linear model necessarily produces a biased estimation, and existing approaches to design mechanisms to elicit data from privacy-sensitive individuals do not generalize well to biased estimators. We overcome this challenge through an appropriate design of the computation and payment scheme.

Submitted 10 June, 2015; originally announced June 2015.

Comments: To appear in Proceedings of the 28th Annual Conference on Learning Theory (COLT 2015)

arXiv:1504.06314 [pdf, ps, other]

Finding Any Nontrivial Coarse Correlated Equilibrium Is Hard

Authors: Siddharth Barman, Katrina Ligett

Abstract: One of the most appealing aspects of the (coarse) correlated equilibrium concept is that natural dynamics quickly arrive at approximations of such equilibria, even in games with many players. In addition, there exist polynomial-time algorithms that compute exact (coarse) correlated equilibria. In light of these results, a natural question is how good are the (coarse) correlated equilibria that can arise from any efficient algorithm or dynamics. In this paper we address this question, and establish strong negative results. In particular, we show that in multiplayer games that have a succinct representation, it is NP-hard to compute any coarse correlated equilibrium (or approximate coarse correlated equilibrium) with welfare strictly better than the worst possible. The focus on succinct games ensures that the underlying complexity question is interesting; many multiplayer games of interest are in fact succinct. Our results imply that, while one can efficiently compute a coarse correlated equilibrium, one cannot provide any nontrivial welfare guarantee for the resulting equilibrium, unless P=NP. We show that analogous hardness results hold for correlated equilibria, and persist under the egalitarian objective or Pareto optimality. To complement the hardness results, we develop an algorithmic framework that identifies settings in which we can efficiently compute an approximate correlated equilibrium with near-optimal welfare.
We use this framework to develop an efficient algorithm for computing an approximate correlated equilibrium with near-optimal welfare in aggregative games.

Submitted 23 April, 2015; originally announced April 2015.

Comments: 21 pages

ACM Class: F.2.0

arXiv:1408.1429 [pdf, ps, other]

Achieving Target Equilibria in Network Routing Games without Knowing the Latency Functions

Authors: Umang Bhaskar, Katrina Ligett, Leonard J. Schulman, Chaitanya Swamy

Abstract: The analysis of network routing games typically assumes, right at the onset, precise and detailed information about the latency functions. Such information may, however, be unavailable or difficult to obtain. Moreover, one is often primarily interested in enforcing a desired target flow as the equilibrium by suitably influencing player behavior in the routing game. We ask whether one can achieve target flows as equilibria without knowing the underlying latency functions. Our main result gives a crisp positive answer to this question. We show that, under fairly general settings, one can efficiently compute edge tolls that induce a given target multicommodity flow in a nonatomic routing game using a polynomial number of queries to an oracle that takes candidate tolls as input and returns the resulting equilibrium flow. This result is obtained via a novel application of the ellipsoid method. Our algorithm extends easily to many other settings, such as (i) when certain edges cannot be tolled or there is an upper bound on the total toll paid by a user, and (ii) general nonatomic congestion games. We obtain tighter bounds on the query complexity for series-parallel networks, and single-commodity routing games with linear latency functions, and complement these with a query-complexity lower bound. We also obtain strong positive results for Stackelberg routing to achieve target equilibria in series-parallel graphs.
Our results build upon various new techniques that we develop pertaining to the computation of, and connections between, different notions of approximate equilibrium; properties of multicommodity flows and tolls in series-parallel graphs; and sensitivity of equilibrium flow with respect to tolls. Our results demonstrate that one can indeed circumvent the potentially-onerous task of modeling latency functions, and yet obtain meaningful results for the underlying routing game.

Submitted 6 August, 2014; originally announced August 2014.

Comments: 36 pages, 3 figures

arXiv:1404.6003 [pdf, ps, other]

Buying Private Data without Verification

Authors: Arpita Ghosh, Katrina Ligett, Aaron Roth, Grant Schoenebeck

Abstract: We consider the problem of designing a survey to aggregate non-verifiable information from a privacy-sensitive population: an analyst wants to compute some aggregate statistic from the private bits held by each member of a population, but cannot verify the correctness of the bits reported by participants in his survey. Individuals in the population are strategic agents with a cost for privacy, i.e., they not only account for the payments they expect to receive from the mechanism, but also their privacy costs from any information revealed about them by the mechanism's outcome---the computed statistic as well as the payments---to determine their utilities. How can the analyst design payments to obtain an accurate estimate of the population statistic when individuals strategically decide both whether to participate and whether to truthfully report their sensitive information? We design a differentially private peer-prediction mechanism that supports accurate estimation of the population statistic as a Bayes-Nash equilibrium in settings where agents have explicit preferences for privacy. The mechanism requires knowledge of the marginal prior distribution on bits $b_i$, but does not need full knowledge of the marginal distribution on the costs $c_i$, instead requiring only an approximate upper bound. Our mechanism guarantees $ε$-differential privacy to each agent $i$ against any adversary who can observe the statistical estimate output by the mechanism, as well as the payments made to the $n-1$ other agents $j\neq i$.
Finally, we show that with slightly more structured assumptions on the privacy cost functions of each agent, the cost of running the survey goes to $0$ as the number of agents diverges.

Submitted 23 April, 2014; originally announced April 2014.

Comments: Appears in EC 2014

arXiv:1307.3794 [pdf, ps, other]

The Network Improvement Problem for Equilibrium Routing

Authors: Umang Bhaskar, Katrina Ligett, Leonard J. Schulman

Abstract: In routing games, agents pick their routes through a network to minimize their own delay. A primary concern for the network designer in routing games is the average agent delay at equilibrium. A number of methods to control this average delay have received substantial attention, including network tolls, Stackelberg routing, and edge removal. A related approach with arguably greater practical relevance is that of making investments in improvements to the edges of the network, so that, for a given investment budget, the average delay at equilibrium in the improved network is minimized. This problem has received considerable attention in the literature on transportation research and a number of different algorithms have been studied. To our knowledge, none of this work gives guarantees on the output quality of any polynomial-time algorithm. We study a model for this problem introduced in transportation research literature, and present both hardness results and algorithms that obtain nearly optimal performance guarantees.
- We first show that a simple algorithm obtains good approximation guarantees for the problem. Despite its simplicity, we show that for affine delays the approximation ratio of 4/3 obtained by the algorithm cannot be improved.
- To obtain better results, we then consider restricted topologies. For graphs consisting of parallel paths with affine delay functions we give an optimal algorithm. However, for graphs that consist of a series of parallel links, we show the problem is weakly NP-hard.
- Finally, we consider the problem in series-parallel graphs, and give an FPTAS for this case.
Our work thus formalizes the intuition held by transportation researchers that the network improvement problem is hard, and presents topology-dependent algorithms that have provably tight approximation guarantees.

Submitted 10 November, 2013; v1 submitted 14 July, 2013; originally announced July 2013.

Comments: 27 pages (including abstract), 3 figures

ACM Class: G.2.0

arXiv:1202.4741 [pdf, ps, other]

Take it or Leave it: Running a Survey when Privacy Comes at a Cost

Authors: Katrina Ligett, Aaron Roth

Abstract: In this paper, we consider the problem of estimating a potentially sensitive (individually stigmatizing) statistic on a population. In our model, individuals are concerned about their privacy, and experience some cost as a function of their privacy loss. Nevertheless, they would be willing to participate in the survey if they were compensated for their privacy cost. These cost functions are not publicly known, however, nor do we make Bayesian assumptions about their form or distribution. Individuals are rational and will misreport their costs for privacy if doing so is in their best interest. Ghosh and Roth recently showed in this setting, when costs for privacy loss may be correlated with private types, if individuals value differential privacy, no individually rational direct revelation mechanism can compute any non-trivial estimate of the population statistic. In this paper, we circumvent this impossibility result by proposing a modified notion of how individuals experience cost as a function of their privacy loss, and by giving a mechanism which does not operate by direct revelation. Instead, our mechanism has the ability to randomly approach individuals from a population and offer them a take-it-or-leave-it offer. This is intended to model the abilities of a surveyor who may stand on a street corner and approach passers-by.

Submitted 26 February, 2012; v1 submitted 21 February, 2012; originally announced February 2012.

arXiv:1109.2229 [pdf, ps, other]

A Learning Theory Approach to Non-Interactive Database Privacy

Authors: Avrim Blum, Katrina Ligett, Aaron Roth

Abstract: In this paper we demonstrate that, ignoring computational constraints, it is possible to privately release synthetic databases that are useful for large classes of queries -- much larger in size than the database itself. Specifically, we give a mechanism that privately releases synthetic data for a class of queries over a discrete domain with error that grows as a function of the size of the smallest net approximately representing the answers to that class of queries. We show that this in particular implies a mechanism for counting queries that gives error guarantees that grow only with the VC-dimension of the class of queries, which itself grows only logarithmically with the size of the query class. We also show that it is not possible to privately release even simple classes of queries (such as intervals and their generalizations) over continuous domains. Despite this, we give a privacy-preserving polynomial time algorithm that releases information useful for all halfspace queries, given a slight relaxation of the utility guarantee. This algorithm does not release synthetic data, but instead another data structure capable of representing an answer for each query. We also give an efficient algorithm for releasing synthetic data for the class of interval queries and axis-aligned rectangles of constant dimension. Finally, inspired by learning theory, we introduce a new notion of data privacy, which we call distributional privacy, and show that it is strictly stronger than the prevailing privacy notion, differential privacy.

Submitted 10 September, 2011; originally announced September 2011.

Comments: Full Version. Extended Abstract appeared in STOC 2008

arXiv:1012.4763 [pdf, other]

A simple and practical algorithm for differentially private data release

Authors: Moritz Hardt, Katrina Ligett, Frank McSherry

Abstract: We present new theoretical results on differentially private data release useful with respect to any target class of counting queries, coupled with experimental results on a variety of real world data sets. Specifically, we study a simple combination of the multiplicative weights approach of [Hardt and Rothblum, 2010] with the exponential mechanism of [McSherry and Talwar, 2007]. The multiplicative weights framework allows us to maintain and improve a distribution approximating a given data set with respect to a set of counting queries. We use the exponential mechanism to select those queries most incorrectly tracked by the current distribution. Combining the two, we quickly approach a distribution that agrees with the data set on the given set of queries up to small error. The resulting algorithm and its analysis is simple, but nevertheless improves upon previous work in terms of both error and running time. We also empirically demonstrate the practicality of our approach on several data sets commonly used in the statistical community for contingency table release.

Submitted 15 March, 2012; v1 submitted 21 December, 2010; originally announced December 2010.

Comments: rewritten, with much more extensive experimental validation than the previous version

arXiv:1010.2705 [pdf, ps, other]

Privacy-Compatibility For General Utility Metrics

Authors: Robert Kleinberg, Katrina Ligett

Abstract: In this note, we present a complete characterization of the utility metrics that allow for non-trivial differential privacy guarantees.

Submitted 13 October, 2010; originally announced October 2010.

arXiv:1003.0469 [pdf, ps, other]

Information-Sharing and Privacy in Social Networks

Authors: Jon Kleinberg, Katrina Ligett

Abstract: We present a new model for reasoning about the way information is shared among friends in a social network, and the resulting ways in which it spreads. Our model formalizes the intuition that revealing personal information in social settings involves a trade-off between the benefits of sharing information with friends, and the risks that additional gossiping will propagate it to people with whom one is not on friendly terms. We study the behavior of rational agents in such a situation, and we characterize the existence and computability of stable information-sharing networks, in which agents do not have an incentive to change the partners with whom they share information. We analyze the implications of these stable networks for social welfare, and the resulting fragmentation of the social network.

Submitted 1 March, 2010; originally announced March 2010.

arXiv:0903.4510 [pdf, ps, other]

Differentially Private Combinatorial Optimization

Authors: Anupam Gupta, Katrina Ligett, Frank McSherry, Aaron Roth, Kunal Talwar

Abstract: Consider the following problem: given a metric space, some of whose points are "clients", open a set of at most $k$ facilities to minimize the average distance from the clients to these facilities. This is just the well-studied $k$-median problem, for which many approximation algorithms and hardness results are known. Note that the objective function encourages opening facilities in areas where there are many clients, and given a solution, it is often possible to get a good idea of where the clients are located. However, this poses the following quandary: what if the identity of the clients is sensitive information that we would like to keep private? Is it even possible to design good algorithms for this problem that preserve the privacy of the clients? In this paper, we initiate a systematic study of algorithms for discrete optimization problems in the framework of differential privacy (which formalizes the idea of protecting the privacy of individual input elements). We show that many such problems indeed have good approximation algorithms that preserve differential privacy; this is even in cases where it is impossible to preserve cryptographic definitions of privacy while computing any non-trivial approximation to even the value of an optimal solution, let alone the entire solution. Apart from the $k$-median problem, we study the problems of vertex and set cover, min-cut, facility location, Steiner tree, and the recently introduced submodular maximization problem, "Combinatorial Public Projects" (CPP).

Submitted 10 November, 2009; v1 submitted 26 March, 2009; originally announced March 2009.

Comments: 28 pages

arXiv:0901.1365 [pdf, ps, other]

Differential Privacy with Compression

Authors: Shuheng Zhou, Katrina Ligett, Larry Wasserman

Abstract: This work studies formal utility and privacy guarantees for a simple multiplicative database transformation, where the data are compressed by a random linear or affine transformation, reducing the number of data records substantially, while preserving the number of original input variables. We provide an analysis framework inspired by a recent concept known as differential privacy (Dwork 06). Our goal is to show that, despite the general difficulty of achieving the differential privacy guarantee, it is possible to publish synthetic data that are useful for a number of common statistical learning applications. This includes high dimensional sparse regression (Zhou et al. 07), principal component analysis (PCA), and other statistical measures (Liu et al. 06) based on the covariance of the initial data.

Submitted 10 January, 2009; originally announced January 2009.

Comments: 14 pages

Showing 1–35 of 35 results for author: Ligett, K