Showing 1–48 of 48 results for author: Machanavajjhala, A

  1. arXiv:2310.12827  [pdf, other]

    cs.DB cs.CR

    Privately Answering Queries on Skewed Data via Per Record Differential Privacy

    Authors: Jeremy Seeman, William Sexton, David Pujol, Ashwin Machanavajjhala

    Abstract: We consider the problem of the private release of statistics (like aggregate payrolls) where it is critical to preserve the contribution made by a small number of outlying large entities. We propose a privacy formalism, per-record zero concentrated differential privacy (PzCDP), where the privacy loss associated with each record is a public function of that record's value. Unlike other formalisms w…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: 14 pages, 5 figures

  2. arXiv:2309.08574  [pdf, other]

    cs.DB cs.CR

    DP-PQD: Privately Detecting Per-Query Gaps In Synthetic Data Generated By Black-Box Mechanisms

    Authors: Shweta Patwa, Danyu Sun, Amir Gilad, Ashwin Machanavajjhala, Sudeepa Roy

    Abstract: Synthetic data generation methods, and in particular, private synthetic data generation methods, are gaining popularity as a means to make copies of sensitive databases that can be shared widely for research and data analysis. Some of the fundamental operations in data analysis include analyzing aggregated statistics, e.g., count, sum, or median, on a subset of data satisfying some conditions. Whe…

    Submitted 15 September, 2023; originally announced September 2023.

  3. arXiv:2308.16298  [pdf, other]

    cs.CR

    Publishing Wikipedia usage data with strong privacy guarantees

    Authors: Temilola Adeleye, Skye Berghel, Damien Desfontaines, Michael Hay, Isaac Johnson, Cléo Lemoisson, Ashwin Machanavajjhala, Tom Magerlein, Gabriele Modena, David Pujol, Daniel Simmons-Marengo, Hal Triedman

    Abstract: For almost 20 years, the Wikimedia Foundation has been publishing statistics about how many people visited each Wikipedia page on each day. This data helps Wikipedia editors determine where to focus their efforts to improve the online encyclopedia, and enables academic research. In June 2023, the Wikimedia Foundation, helped by Tumult Labs, addressed a long-standing request from Wikipedia editors…

    Submitted 1 September, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

    Comments: 11 pages, 10 figures, Theory and Practice of Differential Privacy (TPDP) 2023

  4. arXiv:2212.10310  [pdf, other]

    cs.CR cs.CY cs.DB

    PreFair: Privately Generating Justifiably Fair Synthetic Data

    Authors: David Pujol, Amir Gilad, Ashwin Machanavajjhala

    Abstract: When a database is protected by Differential Privacy (DP), its usability is limited in scope. In this scenario, generating a synthetic version of the data that mimics the properties of the private data allows users to perform any operation on the synthetic data, while maintaining the privacy of the original data. Therefore, multiple works have been devoted to devising systems for DP synthetic data…

    Submitted 27 March, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: 15 pages, 11 figures

  5. arXiv:2212.09884  [pdf, other]

    cs.CY cs.CR cs.DB

    Multi-Analyst Differential Privacy for Online Query Answering

    Authors: David Pujol, Albert Sun, Brandon Fain, Ashwin Machanavajjhala

    Abstract: Most differentially private mechanisms are designed for the use of a single analyst. In reality, however, there are often multiple stakeholders with different and possibly conflicting priorities that must share the same privacy loss budget. This motivates the problem of equitable budget-sharing for multi-analyst differential privacy. Our previous work defined desiderata that any mechanism in this…

    Submitted 19 December, 2022; originally announced December 2022.

    Comments: 11 pages, 3 figures

  6. arXiv:2212.04133  [pdf, other]

    cs.CR

    Tumult Analytics: a robust, easy-to-use, scalable, and expressive framework for differential privacy

    Authors: Skye Berghel, Philip Bohannon, Damien Desfontaines, Charles Estes, Sam Haney, Luke Hartman, Michael Hay, Ashwin Machanavajjhala, Tom Magerlein, Gerome Miklau, Amritha Pai, William Sexton, Ruchit Shrestha

    Abstract: In this short paper, we outline the design of Tumult Analytics, a Python framework for differential privacy used at institutions such as the U.S. Census Bureau, the Wikimedia Foundation, or the Internal Revenue Service.

    Submitted 8 December, 2022; originally announced December 2022.

  7. arXiv:2209.03310  [pdf, other]

    cs.CR stat.ME

    Bayesian and Frequentist Semantics for Common Variations of Differential Privacy: Applications to the 2020 Census

    Authors: Daniel Kifer, John M. Abowd, Robert Ashmead, Ryan Cumings-Menon, Philip Leclerc, Ashwin Machanavajjhala, William Sexton, Pavel Zhuravlev

    Abstract: The purpose of this paper is to guide interpretation of the semantic privacy guarantees for some of the major variations of differential privacy, which include pure, approximate, Rényi, zero-concentrated, and $f$ differential privacy. We interpret privacy-loss accounting parameters, frequentist semantics, and Bayesian semantics (including new results). The driving application is the interpretation…

    Submitted 7 September, 2022; originally announced September 2022.

  8. arXiv:2209.01286  [pdf, other]

    cs.DB

    DPXPlain: Privately Explaining Aggregate Query Answers

    Authors: Yuchao Tao, Amir Gilad, Ashwin Machanavajjhala, Sudeepa Roy

    Abstract: Differential privacy (DP) is the state-of-the-art and rigorous notion of privacy for answering aggregate database queries while preserving the privacy of sensitive information in the data. In today's era of data analysis, however, it poses new challenges for users to understand the trends and anomalies observed in the query results: Is the unexpected answer due to the data itself, or is it due to…

    Submitted 2 September, 2022; originally announced September 2022.

  9. arXiv:2204.08986  [pdf, other]

    cs.CR econ.EM stat.AP

    The 2020 Census Disclosure Avoidance System TopDown Algorithm

    Authors: John M. Abowd, Robert Ashmead, Ryan Cumings-Menon, Simson Garfinkel, Micah Heineck, Christine Heiss, Robert Johns, Daniel Kifer, Philip Leclerc, Ashwin Machanavajjhala, Brett Moran, William Sexton, Matthew Spence, Pavel Zhuravlev

    Abstract: The Census TopDown Algorithm (TDA) is a disclosure avoidance system using differential privacy for privacy-loss accounting. The algorithm ingests the final, edited version of the 2020 Census data and the final tabulation geographic definitions. The algorithm then creates noisy versions of key queries on the data, referred to as measurements, using zero-Concentrated Differential Privacy. Another ke…

    Submitted 19 April, 2022; originally announced April 2022.

  10. arXiv:2203.05084  [pdf, other]

    cs.DB cs.CR

    IncShrink: Architecting Efficient Outsourced Databases using Incremental MPC and Differential Privacy

    Authors: Chenghong Wang, Johes Bater, Kartik Nayak, Ashwin Machanavajjhala

    Abstract: In this paper, we consider secure outsourced growing databases that support view-based query answering. These databases allow untrusted servers to privately maintain a materialized view, such that they can use only the materialized view to process query requests instead of accessing the original data from which the view was derived. To tackle this, we devise a novel view-based secure outsourced gr…

    Submitted 9 March, 2022; originally announced March 2022.

  11. arXiv:2112.09238  [pdf, other]

    cs.CR

    Benchmarking Differentially Private Synthetic Data Generation Algorithms

    Authors: Yuchao Tao, Ryan McKenna, Michael Hay, Ashwin Machanavajjhala, Gerome Miklau

    Abstract: This work presents a systematic benchmark of differentially private synthetic data generation algorithms that can generate tabular data. Utility of the synthetic data is evaluated by measuring whether the synthetic data preserve the distribution of individual and pairs of attributes, pairwise correlation as well as on the accuracy of an ML classification model. In a comprehensive empirical evaluat…

    Submitted 15 February, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

  12. Equity and Privacy: More Than Just a Tradeoff

    Authors: David Pujol, Ashwin Machanavajjhala

    Abstract: While the entire field of privacy preserving data analytics is focused on the privacy-utility tradeoff, recent work has shown that privacy preserving data publishing can introduce different levels of utility across different population groups. It is important to understand this new tradeoff between privacy and equity as privacy technology is being deployed in situations where the data products wil…

    Submitted 8 November, 2021; originally announced November 2021.

    Comments: 3 pages, 1 figure. Published in IEEE Security & Privacy (Volume: 19, Issue: 6, Nov.-Dec. 2021)

  13. arXiv:2107.10659  [pdf, ps, other]

    cs.CR cs.DB stat.AP

    Differentially Private Algorithms for 2020 Census Detailed DHC Race & Ethnicity

    Authors: Sam Haney, William Sexton, Ashwin Machanavajjhala, Michael Hay, Gerome Miklau

    Abstract: This article describes proposed differentially private (DP) algorithms that the US Census Bureau is considering to release the Detailed Demographic and Housing Characteristics (DHC) Race & Ethnicity tabulations as part of the 2020 Census. The tabulations contain statistics (counts) of demographic and housing characteristics of the entire population of the US crossed with detailed races and tribe…

    Submitted 22 July, 2021; originally announced July 2021.

    Comments: Presented at Theory and Practice of Differential Privacy Workshop (TPDP) 2021

  14. arXiv:2106.12118  [pdf, other]

    cs.DB cs.CR

    HDMM: Optimizing error of high-dimensional statistical queries under differential privacy

    Authors: Ryan McKenna, Gerome Miklau, Michael Hay, Ashwin Machanavajjhala

    Abstract: In this work we describe the High-Dimensional Matrix Mechanism (HDMM), a differentially private algorithm for answering a workload of predicate counting queries. HDMM represents query workloads using a compact implicit matrix representation and exploits this representation to efficiently optimize over (a subset of) the space of differentially private algorithms for one that is unbiased and answers…

    Submitted 22 June, 2021; originally announced June 2021.

    Comments: arXiv admin note: text overlap with arXiv:1808.03537

  15. arXiv:2106.05131  [pdf, other]

    cs.DS cs.CR

    Prior-Aware Distribution Estimation for Differential Privacy

    Authors: Yuchao Tao, Johes Bater, Ashwin Machanavajjhala

    Abstract: Joint distribution estimation of a dataset under differential privacy is a fundamental problem for many privacy-focused applications, such as query answering, machine learning tasks and synthetic data generation. In this work, we examine the joint distribution estimation problem given two data points: 1) differentially private answers of a workload computed over private data and 2) a prior empiric…

    Submitted 9 June, 2021; originally announced June 2021.

  16. DP-Sync: Hiding Update Patterns in Secure Outsourced Databases with Differential Privacy

    Authors: Chenghong Wang, Johes Bater, Kartik Nayak, Ashwin Machanavajjhala

    Abstract: In this paper, we have introduced a new type of leakage associated with modern encrypted databases called update pattern leakage. We formalize the definition and security model of DP-Sync with DP update patterns. We also proposed the framework DP-Sync, which extends existing encrypted database schemes to DP-Sync with DP update patterns. DP-Sync guarantees that the entire data update history over t…

    Submitted 6 April, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

  17. arXiv:2103.14435  [pdf, other]

    cs.DB

    Synthesizing Linked Data Under Cardinality and Integrity Constraints

    Authors: Amir Gilad, Shweta Patwa, Ashwin Machanavajjhala

    Abstract: The generation of synthetic data is useful in multiple aspects, from testing applications to benchmarking to privacy preservation. Generating the links between relations, subject to cardinality constraints (CCs) and integrity constraints (ICs) is an important aspect of this problem. Given instances of two relations, where one has a foreign key dependence on the other and is missing its foreign key…

    Submitted 26 March, 2021; originally announced March 2021.

  18. Budget Sharing for Multi-Analyst Differential Privacy

    Authors: David Pujol, Yikai Wu, Brandon Fain, Ashwin Machanavajjhala

    Abstract: Large organizations that collect data about populations (like the US Census Bureau) release summary statistics that are used by multiple stakeholders for resource allocation and policy making problems. These organizations are also legally required to protect the privacy of individuals from whom they collect data. Differential Privacy (DP) provides a solution to release useful summary data while pr…

    Submitted 4 November, 2021; v1 submitted 2 November, 2020; originally announced November 2020.

    Comments: 13 pages, 5 figures. Proceedings of the VLDB Endowment (PVLDB) Vol. 14 No. 10. Presented at the International Conference on Very Large Data Bases (VLDB) 2021

    ACM Class: H.2.7

    Journal ref: PVLDB, 14(10): 1805-1817, 2021

  19. arXiv:2004.08887  [pdf, other]

    cs.CR

    DP-Cryptography: Marrying Differential Privacy and Cryptography in Emerging Applications

    Authors: Sameer Wagh, Xi He, Ashwin Machanavajjhala, Prateek Mittal

    Abstract: Differential privacy (DP) has arisen as the state-of-the-art metric for quantifying individual privacy when sensitive data are analyzed, and it is starting to see practical deployment in organizations such as the US Census Bureau, Apple, Google, etc. There are two popular models for deploying differential privacy - standard differential privacy (SDP), where a trusted server aggregates all the data…

    Submitted 19 April, 2020; originally announced April 2020.

  20. Computing Local Sensitivities of Counting Queries with Joins

    Authors: Yuchao Tao, Xi He, Ashwin Machanavajjhala, Sudeepa Roy

    Abstract: Local sensitivity of a query Q given a database instance D, i.e. how much the output Q(D) changes when a tuple is added to D or deleted from D, has many applications including query analysis, outlier detection, and in differential privacy. However, it is NP-hard to find local sensitivity of a conjunctive query in terms of the size of the query, even for the class of acyclic queries. Although the c…

    Submitted 9 April, 2020; originally announced April 2020.

    Comments: To be published in Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data

  21. arXiv:1908.10268  [pdf, other]

    cs.DB cs.CR

    Answering Summation Queries for Numerical Attributes under Differential Privacy

    Authors: Yikai Wu, David Pujol, Ios Kotsogiannis, Ashwin Machanavajjhala

    Abstract: In this work we explore the problem of answering a set of sum queries under Differential Privacy. This is a little understood, non-trivial problem especially in the case of numerical domains. We show that traditional techniques from the literature are not always the best choice and a more rigorous approach is necessary to develop low error algorithms.

    Submitted 27 August, 2019; originally announced August 2019.

    Comments: TPDP 2019, 7 pages

  22. arXiv:1907.02159  [pdf, other]

    cs.LG cs.CR stat.ML

    Capacity Bounded Differential Privacy

    Authors: Kamalika Chaudhuri, Jacob Imola, Ashwin Machanavajjhala

    Abstract: Differential privacy, a notion of algorithmic stability, is a gold standard for measuring the additional risk an algorithm's output poses to the privacy of a single record in the dataset. Differential privacy is defined as the distance between the output distribution of an algorithm on neighboring datasets that differ in one entry. In this work, we present a novel relaxation of differential privac…

    Submitted 3 July, 2019; originally announced July 2019.

    Comments: 10 pages, 2 figures, NeurIPS 2019

  23. arXiv:1905.12744  [pdf, other]

    cs.DB

    Fair Decision Making using Privacy-Protected Data

    Authors: Satya Kuppam, Ryan McKenna, David Pujol, Michael Hay, Ashwin Machanavajjhala, Gerome Miklau

    Abstract: Data collected about individuals is regularly used to make decisions that impact those same individuals. We consider settings where sensitive personal data is used to decide who will receive resources or benefits. While it is well known that there is a tradeoff between protecting privacy and the accuracy of decisions, we initiate a first-of-its-kind study into the impact of formally private mechan…

    Submitted 24 January, 2020; v1 submitted 29 May, 2019; originally announced May 2019.

    Comments: 12 pages, 4 figures

  24. arXiv:1902.07756  [pdf, other]

    cs.CR

    Crypt$ε$: Crypto-Assisted Differential Privacy on Untrusted Servers

    Authors: Amrita Roy Chowdhury, Chenghong Wang, Xi He, Ashwin Machanavajjhala, Somesh Jha

    Abstract: Differential privacy (DP) has steadily become the de-facto standard for achieving privacy in data analysis, which is typically implemented either in the "central" or "local" model. The local model has been more popular for commercial deployments as it does not require a trusted data collector. This increased privacy, however, comes at a cost of utility and algorithmic expressibility as compared to…

    Submitted 10 March, 2020; v1 submitted 20 February, 2019; originally announced February 2019.

  25. arXiv:1810.01816  [pdf, other]

    cs.DB

    Shrinkwrap: Differentially-Private Query Processing in Private Data Federations

    Authors: Johes Bater, Xi He, William Ehrich, Ashwin Machanavajjhala, Jennie Rogers

    Abstract: A private data federation is a set of autonomous databases that share a unified query interface offering in-situ evaluation of SQL queries over the union of the sensitive data of its members. Owing to privacy concerns, these systems do not have a trusted data collector that can see all their data and their member databases cannot learn about individual records of other engines. Federations current…

    Submitted 3 October, 2018; originally announced October 2018.

  26. Ektelo: A Framework for Defining Differentially-Private Computations

    Authors: Dan Zhang, Ryan McKenna, Ios Kotsogiannis, George Bissias, Michael Hay, Ashwin Machanavajjhala, Gerome Miklau

    Abstract: The adoption of differential privacy is growing but the complexity of designing private, efficient and accurate algorithms is still high. We propose a novel programming framework and system, Ektelo, for implementing both existing and new privacy algorithms. For the task of answering linear counting queries, we show that nearly all existing algorithms can be composed from operators, each conforming…

    Submitted 24 May, 2019; v1 submitted 10 August, 2018; originally announced August 2018.

    Comments: Journal version under submission

  27. Optimizing error of high-dimensional statistical queries under differential privacy

    Authors: Ryan McKenna, Gerome Miklau, Michael Hay, Ashwin Machanavajjhala

    Abstract: Differentially private algorithms for answering sets of predicate counting queries on a sensitive database have many applications. Organizations that collect individual-level data, such as statistical agencies and medical institutions, use them to safely release summary tabulations. However, existing techniques are accurate only on a narrow class of query workloads, or are extremely slow, especial…

    Submitted 10 August, 2018; originally announced August 2018.

    Journal ref: PVLDB, 11 (10): 1206-1219, 2018

  28. arXiv:1804.00370  [pdf, other]

    cs.DB

    Differentially Private Hierarchical Count-of-Counts Histograms

    Authors: Yu-Hsuan Kuo, Cho-Chun Chiu, Daniel Kifer, Michael Hay, Ashwin Machanavajjhala

    Abstract: We consider the problem of privately releasing a class of queries that we call hierarchical count-of-counts histograms. Count-of-counts histograms partition the rows of an input table into groups (e.g., group of people in the same household), and for every integer j report the number of groups of size j. Hierarchical count-of-counts queries report count-of-counts histograms at different granularit…

    Submitted 13 September, 2018; v1 submitted 1 April, 2018; originally announced April 2018.

    Comments: 13 pages

  29. arXiv:1712.10266  [pdf, other]

    cs.DB

    APEx: Accuracy-Aware Differentially Private Data Exploration

    Authors: Chang Ge, Xi He, Ihab F. Ilyas, Ashwin Machanavajjhala

    Abstract: Organizations are increasingly interested in allowing external data scientists to explore their sensitive datasets. Due to the popularity of differential privacy, data owners want the data exploration to ensure provable privacy guarantees. However, current systems for answering queries with differential privacy place an inordinate burden on the data analysts to understand differential privacy, man…

    Submitted 10 May, 2019; v1 submitted 29 December, 2017; originally announced December 2017.

    Comments: Full version of the ACM SIGMOD 2019 paper

  30. arXiv:1712.05888  [pdf, other]

    cs.CR

    One-sided Differential Privacy

    Authors: Stelios Doudalis, Ios Kotsogiannis, Samuel Haney, Ashwin Machanavajjhala, Sharad Mehrotra

    Abstract: In this paper, we study the problem of privacy-preserving data sharing, wherein only a subset of the records in a database are sensitive, possibly based on predefined privacy policies. Existing solutions, viz, differential privacy (DP), are over-pessimistic and treat all information as sensitive. Alternatively, techniques, like access control and personalized differential privacy, reveal all non-s…

    Submitted 15 December, 2017; originally announced December 2017.

  31. arXiv:1705.09561  [pdf, other]

    stat.ME

    Differentially private significance tests for regression coefficients

    Authors: Andrés F. Barrientos, Jerome P. Reiter, Ashwin Machanavajjhala, Yan Chen

    Abstract: Many data producers seek to provide users access to confidential data without unduly compromising data subjects' privacy and confidentiality. One general strategy is to require users to do analyses without seeing the confidential data; for example, analysts only get access to synthetic data or query systems that provide disclosure-protected outputs of statistical models. With synthetic data or red…

    Submitted 11 June, 2018; v1 submitted 26 May, 2017; originally announced May 2017.

  32. arXiv:1705.07872  [pdf, other]

    stat.AP

    Providing Access to Confidential Research Data Through Synthesis and Verification: An Application to Data on Employees of the U.S. Federal Government

    Authors: Andrés F. Barrientos, Alexander Bolton, Tom Balmat, Jerome P. Reiter, John M. de Figueiredo, Ashwin Machanavajjhala, Yan Chen, Charley Kneifel, Mark DeLong

    Abstract: Data stewards seeking to provide access to large-scale social science data face a difficult challenge. They have to share data in ways that protect privacy and confidentiality, are informative for many analyses and purposes, and are relatively straightforward to use by data analysts. One approach suggested in the literature is that data stewards generate and release synthetic data, i.e., data simu…

    Submitted 16 June, 2018; v1 submitted 22 May, 2017; originally announced May 2017.

  33. arXiv:1702.00535  [pdf, other]

    cs.DB cs.CR

    Composing Differential Privacy and Secure Computation: A case study on scaling private record linkage

    Authors: Xi He, Ashwin Machanavajjhala, Cheryl Flynn, Divesh Srivastava

    Abstract: Private record linkage (PRL) is the problem of identifying pairs of records that are similar as per an input matching rule from databases held by two parties that do not trust one another. We identify three key desiderata that a PRL solution must ensure: 1) perfect precision and high recall of matching pairs, 2) a proof of end-to-end privacy, and 3) communication and computational costs that scale…

    Submitted 1 September, 2017; v1 submitted 1 February, 2017; originally announced February 2017.

  34. arXiv:1701.00752  [pdf]

    cs.CY cs.CR

    Privacy-Preserving Data Analysis for the Federal Statistical Agencies

    Authors: John Abowd, Lorenzo Alvisi, Cynthia Dwork, Sampath Kannan, Ashwin Machanavajjhala, Jerome Reiter

    Abstract: Government statistical agencies collect enormously valuable data on the nation's population and business activities. Wide access to these data enables evidence-based policy making, supports new research that improves society, facilitates training for students in data science, and provides resources for the public to better understand and participate in their society. These data also affect the pri…

    Submitted 3 January, 2017; originally announced January 2017.

    Comments: A Computing Community Consortium (CCC) white paper, 7 pages

  35. arXiv:1512.04817  [pdf, other]

    cs.DB cs.CR

    Principled Evaluation of Differentially Private Algorithms using DPBench

    Authors: Michael Hay, Ashwin Machanavajjhala, Gerome Miklau, Yan Chen, Dan Zhang

    Abstract: Differential privacy has become the dominant standard in the research community for strong privacy protection. There has been a flood of research into query answering algorithms that meet this standard. Algorithms are becoming increasingly complex, and in particular, the performance of many emerging algorithms is data dependent, meaning the distribution of the noise added to query answers ma…

    Submitted 15 December, 2015; originally announced December 2015.

  36. arXiv:1508.07306  [pdf, ps, other]

    cs.DB cs.CR

    On the Privacy Properties of Variants on the Sparse Vector Technique

    Authors: Yan Chen, Ashwin Machanavajjhala

    Abstract: The sparse vector technique is a powerful differentially private primitive that allows an analyst to check whether queries in a stream are greater or lesser than a threshold. This technique has a unique property -- the algorithm works by adding noise with a finite variance to the queries and the threshold, and guarantees privacy that only degrades with (a) the maximum sensitivity of any one query…

    Submitted 28 August, 2015; originally announced August 2015.

    Comments: 8 pages

  37. arXiv:1411.5428  [pdf, ps, other]

    cs.LG

    Differentially Private Algorithms for Empirical Machine Learning

    Authors: Ben Stoddard, Yan Chen, Ashwin Machanavajjhala

    Abstract: An important use of private data is to build machine learning classifiers. While there is a burgeoning literature on differentially private classification algorithms, we find that they are not practical in real applications due to two reasons. First, existing differentially private classifiers provide poor accuracy on real world datasets. Second, there is no known differentially private algorithm…

    Submitted 21 November, 2014; v1 submitted 19 November, 2014; originally announced November 2014.

  38. arXiv:1404.3722  [pdf, other]

    cs.DB cs.CR

    Design of Policy-Aware Differentially Private Algorithms

    Authors: Samuel Haney, Ashwin Machanavajjhala, Bolin Ding

    Abstract: The problem of designing error optimal differentially private algorithms is well studied. Recent work applying differential privacy to real world settings has used variants of differential privacy that appropriately modify the notion of neighboring databases. The problem of designing error optimal algorithms for such variants of differential privacy is open. In this paper, we show a novel transfo…

    Submitted 20 November, 2015; v1 submitted 14 April, 2014; originally announced April 2014.

  39. Blowfish Privacy: Tuning Privacy-Utility Trade-offs using Policies

    Authors: Xi He, Ashwin Machanavajjhala, Bolin Ding

    Abstract: Privacy definitions provide ways for trading-off the privacy of individuals in a statistical database for the utility of downstream analysis of the data. In this paper, we present Blowfish, a class of privacy definitions inspired by the Pufferfish framework, that provides a rich interface for this trade-off. In particular, we allow data publishers to extend differential privacy using a policy, whi…

    Submitted 23 June, 2014; v1 submitted 13 December, 2013; originally announced December 2013.

    Comments: Full version of the paper at SIGMOD'14 Snowbird, Utah USA

  40. arXiv:1302.6556  [pdf, other]

    cs.DB

    On Sharing Private Data with Multiple Non-Colluding Adversaries

    Authors: Theodoros Rekatsinas, Amol Deshpande, Ashwin Machanavajjhala

    Abstract: We present SPARSI, a theoretical framework for partitioning sensitive data across multiple non-colluding adversaries. Most work in privacy-aware data sharing has considered disclosing summaries where the aggregate information about the data is preserved, but sensitive user information is protected. Nonetheless, there are applications, including online advertising, cloud computing and crowdsourcing…

    Submitted 11 March, 2013; v1 submitted 26 February, 2013; originally announced February 2013.

    Comments: 14 pages, 6 figures, 2 tables

  41. arXiv:1205.0435  [pdf, other]

    cs.DB cs.SI physics.soc-ph

    Scalable Social Coordination using Enmeshed Queries

    Authors: Jianjun Chen, Ashwin Machanavajjhala, George Varghese

    Abstract: Social coordination allows users to move beyond awareness of their friends to efficiently coordinating physical activities with others. While specific forms of social coordination can be seen in tools such as Evite, Meetup and Groupon, we introduce a more general model using what we call enmeshed queries. An enmeshed query allows users to declaratively specify an intent to coordinate by specifying…

    Submitted 17 August, 2012; v1 submitted 2 May, 2012; originally announced May 2012.

    Comments: 11 pages, 9 figures

  42. arXiv:1203.6406  [pdf, other]

    cs.DB

    An Analysis of Structured Data on the Web

    Authors: Nilesh Dalvi, Ashwin Machanavajjhala, Bo Pang

    Abstract: In this paper, we analyze the nature and distribution of structured data on the Web. Web-scale information extraction, or the problem of creating structured tables using extraction from the entire web, is gathering lots of research interest. We perform a study to understand and quantify the value of Web-scale extraction, and how structured information is distributed amongst top aggregator websites…

    Submitted 28 March, 2012; originally announced March 2012.

    Comments: VLDB2012

    Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 7, pp. 680-691 (2012)

  43. arXiv:1203.5387  [pdf, ps, other]

    cs.DS cs.DB

    Finding Connected Components on Map-reduce in Logarithmic Rounds

    Authors: Vibhor Rastogi, Ashwin Machanavajjhala, Laukik Chitnis, Anish Das Sarma

    Abstract: Given a large graph G = (V,E) with millions of nodes and edges, how do we compute its connected components efficiently? Recent work addresses this problem in map-reduce, where a fundamental trade-off exists between the number of map-reduce rounds and the communication of each round. Denoting d the diameter of the graph, and n the number of nodes in the largest component, all prior map-reduce techn…

    Submitted 12 November, 2012; v1 submitted 24 March, 2012; originally announced March 2012.

  44. arXiv:1111.3689  [pdf, other]

    cs.DB

    CBLOCK: An Automatic Blocking Mechanism for Large-Scale De-duplication Tasks

    Authors: Anish Das Sarma, Ankur Jain, Ashwin Machanavajjhala, Philip Bohannon

    Abstract: De-duplication (identification of distinct records referring to the same real-world entity) is a well-known challenge in data integration. Since very large datasets prohibit the comparison of every pair of records, blocking has been identified as a technique of dividing the dataset for pairwise comparisons, thereby trading off recall of identified duplicates for efficiency. Tra…

    Submitted 15 November, 2011; originally announced November 2011.

  45. arXiv:1105.4254  [pdf]

    cs.DB cs.CR cs.SI

    Personalized Social Recommendations - Accurate or Private?

    Authors: Ashwin Machanavajjhala, Aleksandra Korolova, Atish Das Sarma

    Abstract: With the recent surge of social networks like Facebook, new forms of recommendations have become possible - personalized recommendations of ads, content, and even new friend and product connections based on one's social interactions. Since recommendations may use sensitive social information, it is speculated that these recommendations are associated with privacy risks. The main contribution of th…

    Submitted 21 May, 2011; originally announced May 2011.

    Comments: VLDB2011

    Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 4, No. 7, pp. 440-450 (2011)

  46. arXiv:1004.5600  [pdf, other]

    cs.DS

    On the (Im)possibility of Preserving Utility and Privacy in Personalized Social Recommendations

    Authors: Ashwin Machanavajjhala, Aleksandra Korolova, Atish Das Sarma

    Abstract: With the recent surge of social networks like Facebook, new forms of recommendations have become possible -- personalized recommendations of ads, content, and even new social and product connections based on one's social interactions. In this paper, we study whether "social recommendations", or recommendations that utilize a user's social network, can be made without disclosing sensitive links b…

    Submitted 30 April, 2010; originally announced April 2010.

  47. arXiv:0904.0682  [pdf, ps, other]

    cs.DB cs.IR

    Privacy in Search Logs

    Authors: Michaela Goetz, Ashwin Machanavajjhala, Guozhang Wang, Xiaokui Xiao, Johannes Gehrke

    Abstract: Search engine companies collect the "database of intentions", the histories of their users' search queries. These search logs are a gold mine for researchers. Search engine companies, however, are wary of publishing search logs in order not to disclose sensitive information. In this paper we analyze algorithms for publishing frequent keywords, queries and clicks of a search log. We first show how…

    Submitted 11 May, 2011; v1 submitted 4 April, 2009; originally announced April 2009.

  48. arXiv:0705.2787  [pdf, ps, other]

    cs.DB

    Worst-Case Background Knowledge for Privacy-Preserving Data Publishing

    Authors: David J. Martin, Daniel Kifer, Ashwin Machanavajjhala, Johannes Gehrke, Joseph Y. Halpern

    Abstract: Recent work has shown the necessity of considering an attacker's background knowledge when reasoning about privacy in data publishing. However, in practice, the data publisher does not know what background knowledge the attacker possesses. Thus, it is important to consider the worst-case. In this paper, we initiate a formal study of worst-case background knowledge. We propose a language that can…

    Submitted 18 May, 2007; originally announced May 2007.

    Comments: 10 pages