
Showing 1–4 of 4 results for author: Bohannon, P

  1. arXiv:2212.04133  [pdf, other]

    cs.CR

    Tumult Analytics: a robust, easy-to-use, scalable, and expressive framework for differential privacy

    Authors: Skye Berghel, Philip Bohannon, Damien Desfontaines, Charles Estes, Sam Haney, Luke Hartman, Michael Hay, Ashwin Machanavajjhala, Tom Magerlein, Gerome Miklau, Amritha Pai, William Sexton, Ruchit Shrestha

    Abstract: In this short paper, we outline the design of Tumult Analytics, a Python framework for differential privacy used at institutions such as the U.S. Census Bureau, the Wikimedia Foundation, or the Internal Revenue Service.

    Submitted 8 December, 2022; originally announced December 2022.

  2. arXiv:1111.7170  [pdf, other]

    cs.DB

    REX: Explaining Relationships between Entity Pairs

    Authors: Lujun Fang, Anish Das Sarma, Cong Yu, Philip Bohannon

    Abstract: Knowledge bases of entities and relations (either constructed manually or automatically) are behind many real world search engines, including those at Yahoo!, Microsoft, and Google. Those knowledge bases can be viewed as graphs with nodes representing entities and edges representing (primary) relationships, and various studies have been conducted on how to leverage them to answer entity seeking qu…

    Submitted 30 November, 2011; originally announced November 2011.

    Comments: VLDB2012

    Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 3, pp. 241-252 (2011)

  3. arXiv:1111.3689  [pdf, other]

    cs.DB

    CBLOCK: An Automatic Blocking Mechanism for Large-Scale De-duplication Tasks

    Authors: Anish Das Sarma, Ankur Jain, Ashwin Machanavajjhala, Philip Bohannon

    Abstract: De-duplication (identification of distinct records referring to the same real-world entity) is a well-known challenge in data integration. Since very large datasets prohibit the comparison of every pair of records, blocking has been identified as a technique of dividing the dataset for pairwise comparisons, thereby trading off recall of identified duplicates for efficiency. Tra…

    Submitted 15 November, 2011; originally announced November 2011.

  4. arXiv:1004.1614  [pdf, other]

    cs.DB

    PROBER: Ad-Hoc Debugging of Extraction and Integration Pipelines

    Authors: Anish Das Sarma, Alpa Jain, Philip Bohannon

    Abstract: Complex information extraction (IE) pipelines assembled by plumbing together off-the-shelf operators, specially customized operators, and operators re-used from other text processing pipelines are becoming an integral component of most text processing frameworks. A critical task faced by the IE pipeline user is to run a post-mortem analysis on the output. Due to the diverse nature of extraction op…

    Submitted 9 April, 2010; originally announced April 2010.

    Comments: 10 pages