Showing 1–21 of 21 results for author: Deutch, D

  1. arXiv:2308.05588  [pdf, other]

    cs.DB

    Banzhaf Values for Facts in Query Answering

    Authors: Omer Abramovich, Daniel Deutch, Nave Frost, Ahmet Kara, Dan Olteanu

    Abstract: Quantifying the contribution of database facts to query answers has been studied as means of explanation. The Banzhaf value, originally developed in Game Theory, is a natural measure of fact contribution, yet its efficient computation for select-project-join-union queries is challenging. In this paper, we introduce three algorithms to compute the Banzhaf value of database facts: an exact algorithm…

    Submitted 10 August, 2023; originally announced August 2023.

  2. arXiv:2304.13007  [pdf, other]

    cs.CL cs.AI

    Answering Questions by Meta-Reasoning over Multiple Chains of Thought

    Authors: Ori Yoran, Tomer Wolfson, Ben Bogin, Uri Katz, Daniel Deutch, Jonathan Berant

    Abstract: Modern systems for multi-hop question answering (QA) typically break questions into a sequence of reasoning steps, termed chain-of-thought (CoT), before arriving at a final answer. Often, multiple chains are sampled and aggregated through a voting mechanism over the final answers, but the intermediate steps themselves are discarded. While such approaches improve performance, they do not consider t…

    Submitted 17 October, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

    Comments: Accepted for publication in The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Author's final version

  3. arXiv:2209.06260  [pdf, other]

    cs.DB

    FEDEX: An Explainability Framework for Data Exploration Steps

    Authors: Daniel Deutch, Amir Gilad, Tova Milo, Amit Mualem, Amit Somech

    Abstract: When exploring a new dataset, Data Scientists often apply analysis queries, look for insights in the resulting dataframe, and repeat to apply further queries. We propose in this paper a novel solution that assists data scientists in this laborious process. In a nutshell, our solution pinpoints the most interesting (sets of) rows in each obtained dataframe. Uniquely, our definition of interest is b…

    Submitted 13 September, 2022; originally announced September 2022.

    Comments: Full version of the VLDB paper with the same title

  4. arXiv:2112.08874  [pdf, other]

    cs.DB

    Computing the Shapley Value of Facts in Query Answering

    Authors: Daniel Deutch, Nave Frost, Benny Kimelfeld, Mikaël Monet

    Abstract: The Shapley value is a game-theoretic notion for wealth distribution that is nowadays extensively used to explain complex data-intensive computation, for instance, in network analysis or machine learning. Recent theoretical works show that query evaluation over relational databases fits well in this explanation paradigm. Yet, these works fall short of providing practical solutions to the computati…

    Submitted 2 January, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

  5. arXiv:2112.06311  [pdf, other]

    cs.CL cs.AI cs.DB

    Weakly Supervised Text-to-SQL Parsing through Question Decomposition

    Authors: Tomer Wolfson, Daniel Deutch, Jonathan Berant

    Abstract: Text-to-SQL parsers are crucial in enabling non-experts to effortlessly query relational data. Training such parsers, by contrast, generally requires expertise in annotating natural language (NL) utterances with corresponding SQL queries. In this work, we propose a weak supervision approach for training text-to-SQL parsers. We take advantage of the recently proposed question meaning representation…

    Submitted 26 April, 2022; v1 submitted 12 December, 2021; originally announced December 2021.

    Comments: Accepted for publication in Findings of NAACL 2022. Author's final version

  6. arXiv:2103.00288  [pdf, other]

    cs.DB

    On Optimizing the Trade-off between Privacy and Utility in Data Provenance

    Authors: Daniel Deutch, Ariel Frankenthal, Amir Gilad, Yuval Moskovitch

    Abstract: Organizations that collect and analyze data may wish or be mandated by regulation to justify and explain their analysis results. At the same time, the logic that they have followed to analyze the data, i.e., their queries, may be proprietary and confidential. Data provenance, a record of the transformations that data underwent, was extensively studied as means of explanations. In contrast, only a…

    Submitted 27 February, 2021; originally announced March 2021.

  7. arXiv:2007.05463  [pdf, other]

    cs.DB

    Equivalence-Invariant Algebraic Provenance for Hyperplane Update Queries

    Authors: Pierre Bourhis, Daniel Deutch, Yuval Moskovitch

    Abstract: The algebraic approach for provenance tracking, originating in the semiring model of Green et al., has proven useful as an abstract way of handling metadata. Commutative Semirings were shown to be the "correct" algebraic structure for Union of Conjunctive Queries, in the sense that its use allows provenance to be invariant under certain expected query equivalence axioms. In this paper we present…

    Submitted 10 July, 2020; originally announced July 2020.

    Journal ref: Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], pages: 415--429

  8. arXiv:2007.05400  [pdf, other]

    cs.DB

    Hypothetical Reasoning via Provenance Abstraction

    Authors: Daniel Deutch, Yuval Moskovitch, Noam Rinetzky

    Abstract: Data analytics often involves hypothetical reasoning: repeatedly modifying the data and observing the induced effect on the computation result of a data-centric application. Previous work has shown that fine-grained data provenance can help make such an analysis more efficient: instead of a costly re-execution of the underlying application, hypothetical scenarios are applied to a pre-computed prov…

    Submitted 10 July, 2020; originally announced July 2020.

    Journal ref: Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, pages 537--554

  9. arXiv:2007.05389  [pdf, other]

    cs.DB

    COBRA: Compression via Abstraction of Provenance for Hypothetical Reasoning

    Authors: Daniel Deutch, Yuval Moskovitch, Noam Rinetzky

    Abstract: Data analytics often involves hypothetical reasoning: repeatedly modifying the data and observing the induced effect on the computation result of a data-centric application. Recent work has proposed to leverage ideas from data provenance tracking towards supporting efficient hypothetical reasoning: instead of a costly re-execution of the underlying application, one may assign values to a pre-compu…

    Submitted 10 July, 2020; originally announced July 2020.

    Journal ref: 2019 IEEE 35th International Conference on Data Engineering (ICDE), Macao, Macao, 2019, pp. 2016--2019

  10. Explaining Natural Language Query Results

    Authors: Daniel Deutch, Nave Frost, Amir Gilad

    Abstract: Multiple lines of research have developed Natural Language (NL) interfaces for formulating database queries. We build upon this work, but focus on presenting a highly detailed form of the answers in NL. The answers that we present are importantly based on the provenance of tuples in the query result, detailing not only the results but also their explanations. We develop a novel method for transfor…

    Submitted 8 July, 2020; originally announced July 2020.

    Journal ref: The VLDB Journal 29, pp. 485--508 (2020)

  11. Just in Time: Personal Temporal Insights for Altering Model Decisions

    Authors: Naama Boer, Daniel Deutch, Nave Frost, Tova Milo

    Abstract: The interpretability of complex Machine Learning models is coming to be a critical social concern, as they are increasingly used in human-related decision-making processes such as resume filtering or loan applications. Individuals receiving an undesired classification are likely to call for an explanation -- preferably one that specifies what they should do in order to alter that decision when the…

    Submitted 8 July, 2020; originally announced July 2020.

    Journal ref: 2019 IEEE 35th International Conference on Data Engineering (ICDE), Macao, Macao, 2019, pp. 1988--1991

  12. T-REx: Table Repair Explanations

    Authors: Daniel Deutch, Nave Frost, Amir Gilad, Oren Sheffer

    Abstract: Data repair is a common and crucial step in many frameworks today, as applications may use data from different sources and of different levels of credibility. Thus, this step has been the focus of many works, proposing diverse approaches. To assist users in understanding the output of such data repair algorithms, we propose T-REx, a system for providing data repair explanations through Shapley val…

    Submitted 8 July, 2020; originally announced July 2020.

    Journal ref: In Proceedings of the 2020 ACM SIGMOD. Association for Computing Machinery, New York, NY, USA, pages: 2765 to 2768 (2020)

  13. On Multiple Semantics for Declarative Database Repairs

    Authors: Amir Gilad, Daniel Deutch, Sudeepa Roy

    Abstract: We study the problem of database repairs through a rule-based framework that we refer to as Delta Rules. Delta Rules are highly expressive and allow specifying complex, cross-relations repair logic associated with Denial Constraints, Causal Rules, and allowing to capture Database Triggers of interest. We show that there are no one-size-fits-all semantics for repairs in this inclusive setting, and…

    Submitted 12 April, 2020; v1 submitted 10 April, 2020; originally announced April 2020.

    Journal ref: SIGMOD 2020

  14. arXiv:2001.11770  [pdf, other]

    cs.CL

    Break It Down: A Question Understanding Benchmark

    Authors: Tomer Wolfson, Mor Geva, Ankit Gupta, Matt Gardner, Yoav Goldberg, Daniel Deutch, Jonathan Berant

    Abstract: Understanding natural language questions entails the ability to break down a question into the requisite steps for computing its answer. In this work, we introduce a Question Decomposition Meaning Representation (QDMR) for questions. QDMR constitutes the ordered list of steps, expressed through natural language, that are necessary for answering a question. We develop a crowdsourcing pipeline, show…

    Submitted 31 January, 2020; originally announced January 2020.

    Comments: Accepted for publication in Transactions of the Association for Computational Linguistics (TACL), 2020. Author's final version

  15. arXiv:1808.04614  [pdf, other]

    cs.CL cs.AI

    Explaining Queries over Web Tables to Non-Experts

    Authors: Jonathan Berant, Daniel Deutch, Amir Globerson, Tova Milo, Tomer Wolfson

    Abstract: Designing a reliable natural language (NL) interface for querying tables has been a longtime goal of researchers in both the data management and natural language processing (NLP) communities. Such an interface receives as input an NL question, translates it into a formal query, executes the query and returns the results. Errors in the translation process are not uncommon, and users typically strug…

    Submitted 14 August, 2018; originally announced August 2018.

    Comments: Short paper version to appear in ICDE 2019

  16. arXiv:1801.06396  [pdf, ps, other]

    cs.DB

    Computing Possible and Certain Answers over Order-Incomplete Data

    Authors: Antoine Amarilli, Mouhamadou Lamine Ba, Daniel Deutch, Pierre Senellart

    Abstract: This paper studies the complexity of query evaluation for databases whose relations are partially ordered; the problem commonly arises when combining or transforming ordered data from multiple sources. We focus on queries in a useful fragment of SQL, namely positive relational algebra with aggregates, whose bag semantics we extend to the partially ordered setting. Our semantics leads to the study…

    Submitted 29 May, 2019; v1 submitted 19 January, 2018; originally announced January 2018.

    Comments: 55 pages, 56 references. Extended journal version of arXiv:1707.07222. Up to the stylesheet, page/environment numbering, and possible minor publisher-induced changes, this is the exact content of the journal paper that will appear in Theoretical Computer Science

  17. Possible and Certain Answers for Queries over Order-Incomplete Data

    Authors: Antoine Amarilli, Mouhamadou Lamine Ba, Daniel Deutch, Pierre Senellart

    Abstract: To combine and query ordered data from multiple sources, one needs to handle uncertainty about the possible orderings. Examples of such "order-incomplete" data include integrated event sequences such as log entries, lists of properties (e.g., hotels and restaurants) ranked by an unknown function reflecting relevance or customer ratings, and documents edited concurrently with an uncertain order on…

    Submitted 26 January, 2018; v1 submitted 22 July, 2017; originally announced July 2017.

    Comments: This paper is the full version with appendices of the TIME'17 article. See also the upcoming journal version: arXiv:1801.06396. Important note: This version (version 2) removes some results because we found a bug in their proofs. See Appendix G for detailed explanations. The journal version also omits the affected results (and does not contain Appendix G)

  18. arXiv:1602.03819  [pdf, other]

    cs.DB

    Query By Provenance

    Authors: Daniel Deutch, Amir Gilad

    Abstract: To assist non-specialists in formulating database queries, multiple frameworks that automatically infer queries from a set of examples have been proposed. While highly useful, a shortcoming of the approach is that if users can only provide a small set of examples, many inherently different queries may qualify, and only some of these actually match the user intentions. Our main observation is that…

    Submitted 16 May, 2016; v1 submitted 11 February, 2016; originally announced February 2016.

  19. arXiv:1201.0231  [pdf, other]

    cs.DB

    Putting Lipstick on Pig: Enabling Database-style Workflow Provenance

    Authors: Yael Amsterdamer, Susan B. Davidson, Daniel Deutch, Tova Milo, Julia Stoyanovich, Val Tannen

    Abstract: Workflow provenance typically assumes that each module is a "black-box", so that each output depends on all inputs (coarse-grained dependencies). Furthermore, it does not model the internal state of a module, which can change between repeated executions. In practice, however, an output may depend on only a small subset of the inputs (fine-grained dependencies) as well as on the internal state of t…

    Submitted 31 December, 2011; originally announced January 2012.

    Comments: VLDB2012

    Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 4, pp. 346-357 (2011)

  20. arXiv:1105.2255  [pdf, ps, other]

    cs.DB

    On the Limitations of Provenance for Queries With Difference

    Authors: Yael Amsterdamer, Daniel Deutch, Val Tannen

    Abstract: The annotation of the results of database transformations was shown to be very effective for various applications. Until recently, most works in this context focused on positive query languages. The provenance semirings is a particular approach that was proven effective for these languages, and it was shown that when propagating provenance with semirings, the expected equivalence axioms of the cor…

    Submitted 11 May, 2011; originally announced May 2011.

    Comments: TAPP 2011

  21. arXiv:1101.1110  [pdf, ps, other]

    cs.DB

    Provenance for Aggregate Queries

    Authors: Yael Amsterdamer, Daniel Deutch, Val Tannen

    Abstract: We study in this paper provenance information for queries with aggregation. Provenance information was studied in the context of various query languages that do not allow for aggregation, and recent work has suggested to capture provenance by annotating the different database tuples with elements of a commutative semiring and propagating the annotations through query evaluation. We show that aggre…

    Submitted 5 January, 2011; originally announced January 2011.