Showing 1–40 of 40 results for author: Gatterbauer, W

  1. arXiv:2404.00007

    cs.DB cs.LO

    A Comprehensive Tutorial on over 100 Years of Diagrammatic Representations of Logical Statements and Relational Queries

    Authors: Wolfgang Gatterbauer

    Abstract: Query formulation is increasingly performed by systems that need to guess a user's intent (e.g. via spoken word interfaces). But how can a user know that the computational agent is returning answers to the "right" query? More generally, given that relational queries can become pretty complicated, how can we help users understand relational queries, whether human-generated or automatically generate…

    Submitted 4 March, 2024; originally announced April 2024.

    Comments: 6 pages, 2 figures, preprint of ICDE 2024 tutorial. arXiv admin note: substantial text overlap with arXiv:2308.10319

  2. arXiv:2401.04758

    cs.DB cs.HC cs.LO

    On The Reasonable Effectiveness of Relational Diagrams: Explaining Relational Query Patterns and the Pattern Expressiveness of Relational Languages

    Authors: Wolfgang Gatterbauer, Cody Dunne

    Abstract: Comparing relational languages by their logical expressiveness is well understood. Less well understood is how to compare relational languages by their ability to represent relational query patterns. Indeed, what are query patterns other than "a certain way of writing a query"? And how can query patterns be defined across procedural and declarative languages, irrespective of their syntax? To the b…

    Submitted 9 January, 2024; originally announced January 2024.

    Comments: 71 pages, 49 figures, full version of SIGMOD 2024 paper of same title: https://doi.org/10.1145/3639316. arXiv admin note: text overlap with arXiv:2203.07284

  3. arXiv:2401.00013

    cs.SI cs.DB cs.DS cs.LG

    HITSnDIFFs: From Truth Discovery to Ability Discovery by Recovering Matrices with the Consecutive Ones Property

    Authors: Zixuan Chen, Subhodeep Mitra, R Ravi, Wolfgang Gatterbauer

    Abstract: We analyze a general problem in a crowd-sourced setting where one user asks a question (also called item) and other users return answers (also called labels) for this question. Different from existing crowdsourcing work which focuses on finding the most appropriate label for the question (the "truth"), our problem is to determine a ranking of the users based on their ability to answer questions.…

    Submitted 21 December, 2023; originally announced January 2024.

    Comments: 22 pages, 14 figures, long version of ICDE 2024 conference paper

  4. arXiv:2308.10319

    cs.DB cs.HC cs.LO cs.PL

    A Tutorial on Visual Representations of Relational Queries

    Authors: Wolfgang Gatterbauer

    Abstract: Query formulation is increasingly performed by systems that need to guess a user's intent (e.g. via spoken word interfaces). But how can a user know that the computational agent is returning answers to the "right" query? More generally, given that relational queries can become pretty complicated, how can we help users understand existing relational queries, whether human-generated or automatically…

    Submitted 20 August, 2023; originally announced August 2023.

    Comments: 4 page tutorial paper at VLDB 2023, tutorial web page with slides to be posted in time: https://northeastern-datalab.github.io/visual-query-representation-tutorial/. arXiv admin note: text overlap with arXiv:2208.01613

  5. arXiv:2307.00465

    cs.LG cs.LO

    Towards Unbiased Exploration in Partial Label Learning

    Authors: Zsolt Zombori, Agapi Rissaki, Kristóf Szabó, Wolfgang Gatterbauer, Michael Benedikt

    Abstract: We consider learning a probabilistic classifier from partially-labelled supervision (inputs denoted with multiple possibilities) using standard neural architectures with a softmax as the final layer. We identify a bias phenomenon that can arise from the softmax layer in even simple architectures that prevents proper exploration of alternative options, making the dynamics of gradient descent overly…

    Submitted 1 July, 2023; originally announced July 2023.

  6. Efficient Computation of Quantiles over Joins

    Authors: Nikolaos Tziavelis, Nofar Carmeli, Wolfgang Gatterbauer, Benny Kimelfeld, Mirek Riedewald

    Abstract: We present efficient algorithms for Quantile Join Queries, abbreviated as %JQ. A %JQ asks for the answer at a specified relative position (e.g., 50% for the median) under some ordering over the answers to a Join Query (JQ). Our goal is to avoid materializing the set of all join answers, and to achieve quasilinear time in the size of the database, regardless of the total number of answers. A recent…

    Submitted 25 May, 2023; originally announced May 2023.

  7. A Unified Approach for Resilience and Causal Responsibility with Integer Linear Programming (ILP) and LP Relaxations

    Authors: Neha Makhija, Wolfgang Gatterbauer

    Abstract: Resilience is one of the key algorithmic problems underlying various forms of reverse data management (such as view maintenance, deletion propagation, and various interventions for fairness): What is the minimal number of tuples to delete from a database in order to remove all answers from a query? A long-open question is determining those conjunctive queries (CQs) for which this problem can be so…

    Submitted 20 October, 2023; v1 submitted 17 December, 2022; originally announced December 2022.

    Comments: 43 pages, 15 figures

    Journal ref: Proc. ACM Manag. Data 1, 4 (SIGMOD), Article 228 (December 2023), 43 pages

  8. arXiv:2209.13589

    cs.DB

    SANTOS: Relationship-based Semantic Table Union Search

    Authors: Aamod Khatiwada, Grace Fan, Roee Shraga, Zixuan Chen, Wolfgang Gatterbauer, Renée J. Miller, Mirek Riedewald

    Abstract: Existing techniques for unionable table search define unionability using metadata (tables must have the same or similar schemas) or column-based metrics (for example, the values in a table should be drawn from the same domain). In this work, we introduce the use of semantic relationships between pairs of columns in a table to improve the accuracy of union search. Consequently, we introduce a new n…

    Submitted 27 September, 2022; originally announced September 2022.

    Comments: 15 pages, 10 figures, to appear at SIGMOD 2023

  9. arXiv:2208.01613

    cs.DB cs.HC

    Principles of Query Visualization

    Authors: Wolfgang Gatterbauer, Cody Dunne, H. V. Jagadish, Mirek Riedewald

    Abstract: Query Visualization (QV) is the problem of transforming a given query into a graphical representation that helps humans understand its meaning. This task is notably different from designing a Visual Query Language (VQL) that helps a user compose a query. This article discusses the principles of relational query visualization and its potential for simplifying user interactions with relational data.

    Submitted 2 August, 2022; originally announced August 2022.

    Comments: 20 pages, 12 figures, preprint for IEEE Data Engineering Bulletin

  10. arXiv:2205.05649

    cs.DB cs.DS cs.LO

    Any-k Algorithms for Enumerating Ranked Answers to Conjunctive Queries

    Authors: Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald

    Abstract: We study ranked enumeration for Conjunctive Queries (CQs) where the answers are ordered by a given ranking function (e.g., an ORDER BY clause in SQL). We develop "any-k" algorithms, which, without knowing the number k of desired answers, push down the ranking into joins by carefully ordering the computation of intermediate tuples and avoiding materialization of join answers until they are needed.…

    Submitted 12 October, 2023; v1 submitted 11 May, 2022; originally announced May 2022.

  11. arXiv:2203.07284

    cs.DB cs.LO cs.PL

    Relational Diagrams: a pattern-preserving diagrammatic representation of non-disjunctive Relational Queries

    Authors: Wolfgang Gatterbauer, Cody Dunne, Mirek Riedewald

    Abstract: Analyzing relational languages by their logical expressiveness is well understood. Something not well understood or even formalized is the vague concept of relational query patterns. What are query patterns? And how can we reason about query patterns across different relational languages, irrespective of their syntax and their procedural or declarative nature? In this paper, we formalize the conce…

    Submitted 14 March, 2022; originally announced March 2022.

    Comments: 23 pages, 29 figures

  12. arXiv:2105.14307

    cs.DB

    Minimally Factorizing the Provenance of Self-join Free Conjunctive Queries

    Authors: Neha Makhija, Wolfgang Gatterbauer

    Abstract: We consider the problem of finding the minimal-size factorization of the provenance of self-join-free conjunctive queries, i.e., we want to find a formula that minimizes the number of variable repetitions. This problem is equivalent to solving the fundamental Boolean formula factorization problem for the restricted setting of the provenance formulas of self-join free queries. While general Boolean…

    Submitted 14 May, 2024; v1 submitted 29 May, 2021; originally announced May 2021.

    Comments: 57 pages, 38 figures

  13. arXiv:2103.09940

    cs.DB

    DomainNet: Homograph Detection for Data Lake Disambiguation

    Authors: Aristotelis Leventidis, Laura Di Rocco, Wolfgang Gatterbauer, Renée J. Miller, Mirek Riedewald

    Abstract: Modern data lakes are deeply heterogeneous in the vocabulary that is used to describe data. We study a problem of disambiguation in data lakes: how can we determine if a data value occurring more than once in the lake has different meanings and is therefore a homograph? While word and entity disambiguation have been well studied in computational linguistics, data management and data science, we sh…

    Submitted 22 March, 2021; v1 submitted 17 March, 2021; originally announced March 2021.

    Comments: Full version of paper appearing in EDBT 2021

  14. Beyond Equi-joins: Ranking, Enumeration and Factorization

    Authors: Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald

    Abstract: We study theta-joins in general and join predicates with conjunctions and disjunctions of inequalities in particular, focusing on ranked enumeration where the answers are returned incrementally in an order dictated by a given ranking function. Our approach achieves strong time and space complexity properties: with $n$ denoting the number of tuples in the database, we guarantee for acyclic full joi…

    Submitted 30 August, 2021; v1 submitted 28 January, 2021; originally announced January 2021.

    Comments: 21 pages

    Journal ref: PVLDB, 14(11):2599-2612, 2021

  15. arXiv:2012.11965

    cs.DB cs.DS

    Tractable Orders for Direct Access to Ranked Answers of Conjunctive Queries

    Authors: Nofar Carmeli, Nikolaos Tziavelis, Wolfgang Gatterbauer, Benny Kimelfeld, Mirek Riedewald

    Abstract: We study the question of when we can provide direct access to the k-th answer to a Conjunctive Query (CQ) according to a specified order over the answers in time logarithmic in the size of the database, following a preprocessing step that constructs a data structure in time quasilinear in database size. Specifically, we embark on the challenge of identifying the tractable answer orderings, that is…

    Submitted 28 November, 2022; v1 submitted 22 December, 2020; originally announced December 2020.

    Comments: 44 pages

  16. Optimal Join Algorithms Meet Top-k

    Authors: Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald

    Abstract: Top-k queries have been studied intensively in the database community and they are an important means to reduce query cost when only the "best" or "most interesting" results are needed instead of the full output. While some optimality results exist, e.g., the famous Threshold Algorithm, they hold only in a fairly limited model of computation that does not account for the cost incurred by large int…

    Submitted 1 May, 2020; originally announced May 2020.

    Comments: To be published in Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (SIGMOD'20), June 14-19, 2020, Portland, OR, USA, 7 pages

  17. arXiv:2004.11375

    cs.DB cs.HC cs.LO

    QueryVis: Logic-based diagrams help users understand complicated SQL queries faster

    Authors: Aristotelis Leventidis, Jiahui Zhang, Cody Dunne, Wolfgang Gatterbauer, H. V. Jagadish, Mirek Riedewald

    Abstract: Understanding the meaning of existing SQL queries is critical for code maintenance and reuse. Yet SQL can be hard to read, even for expert users or the original creator of a query. We conjecture that it is possible to capture the logical intent of queries in automatically-generated visual diagrams that can help users understand the meaning of queries faster and more accurately than SQL text…

    Submitted 23 April, 2020; originally announced April 2020.

    Comments: Full version of paper appearing in SIGMOD 2020

  18. Near-Optimal Distributed Band-Joins through Recursive Partitioning

    Authors: Rundong Li, Wolfgang Gatterbauer, Mirek Riedewald

    Abstract: We consider running-time optimization for band-joins in a distributed system, e.g., the cloud. To balance load across worker machines, input has to be partitioned, which causes duplication. We explore how to resolve this tension between maximum load per worker and input duplication for band-joins between two relations. Previous work suffered from high optimization cost or considered partitionings…

    Submitted 13 April, 2020; originally announced April 2020.

  19. arXiv:2003.02829

    cs.LG cs.DB cs.SI stat.ML

    Factorized Graph Representations for Semi-Supervised Learning from Sparse Data

    Authors: Krishna Kumar P., Paul Langton, Wolfgang Gatterbauer

    Abstract: Node classification is an important problem in graph data management. It is commonly solved by various label propagation methods that work iteratively starting from a few labeled seed nodes. For graphs with arbitrary compatibilities between classes, these methods crucially depend on knowing the compatibility matrix that must be provided by either domain experts or heuristics. Can we instead direct…

    Submitted 5 March, 2020; originally announced March 2020.

    Comments: SIGMOD 2020 (Extended version)

  20. arXiv:1911.05582

    cs.DB cs.DS

    Optimal Algorithms for Ranked Enumeration of Answers to Full Conjunctive Queries

    Authors: Nikolaos Tziavelis, Deepak Ajwani, Wolfgang Gatterbauer, Mirek Riedewald, Xiaofeng Yang

    Abstract: We study ranked enumeration of join-query results according to very general orders defined by selective dioids. Our main contribution is a framework for ranked enumeration over a class of dynamic programming problems that generalizes seemingly different problems that had been studied in isolation. To this end, we extend classic algorithms that find the k-shortest paths in a weighted graph. For ful…

    Submitted 11 September, 2020; v1 submitted 13 November, 2019; originally announced November 2019.

    Comments: 50 pages, 19 figures

  21. arXiv:1907.01129

    cs.DB cs.CC

    New Results for the Complexity of Resilience for Binary Conjunctive Queries with Self-Joins

    Authors: Cibele Freire, Wolfgang Gatterbauer, Neil Immerman, Alexandra Meliou

    Abstract: The resilience of a Boolean query is the minimum number of tuples that need to be deleted from the input tables in order to make the query false. A solution to this problem immediately translates into a solution for the more widely known problem of deletion propagation with source-side effects. In this paper, we give several novel results on the hardness of the resilience problem for…

    Submitted 15 June, 2020; v1 submitted 1 July, 2019; originally announced July 2019.

    Comments: 23 pages, 19 figures, included a new section

  22. arXiv:1806.10078

    cs.DB

    A General Framework for Anytime Approximation in Probabilistic Databases

    Authors: Maarten Van den Heuvel, Floris Geerts, Wolfgang Gatterbauer, Martin Theobald

    Abstract: Anytime approximation algorithms that compute the probabilities of queries over probabilistic databases can be of great use to statistical learning tasks. Those approaches have been based so far on either (i) sampling or (ii) branch-and-bound with model-based bounds. We present here a more general branch-and-bound framework that extends the possible bounds by using 'dissociation', which yields tig…

    Submitted 3 July, 2018; v1 submitted 26 June, 2018; originally announced June 2018.

    Comments: 3 pages, 2 figures, submitted to StarAI 2018 Workshop

  23. arXiv:1802.06060

    cs.SI cs.DB cs.DS

    Any-k: Anytime Top-k Tree Pattern Retrieval in Labeled Graphs

    Authors: Xiaofeng Yang, Deepak Ajwani, Wolfgang Gatterbauer, Patrick K. Nicholson, Mirek Riedewald, Alessandra Sala

    Abstract: Many problems in areas as diverse as recommendation systems, social network analysis, semantic search, and distributed root cause analysis can be modeled as pattern search on labeled graphs (also called "heterogeneous information networks" or HINs). Given a large graph and a query pattern with node and edge label constraints, a fundamental challenge is to find the top-k matches according to a rank…

    Submitted 10 April, 2018; v1 submitted 16 February, 2018; originally announced February 2018.

    Comments: To appear in WWW 2018

  24. arXiv:1612.04794

    cs.DS

    Algorithms for Automatic Ranking of Participants and Tasks in an Anonymized Contest

    Authors: Yang Jiao, R. Ravi, Wolfgang Gatterbauer

    Abstract: We introduce a new set of problems based on the Chain Editing problem. In our version of Chain Editing, we are given a set of anonymous participants and a set of undisclosed tasks that every participant attempts. For each participant-task pair, we know whether the participant has succeeded at the task or not. We assume that participants vary in their ability to solve tasks, and that tasks vary in…

    Submitted 20 December, 2016; v1 submitted 14 December, 2016; originally announced December 2016.

    Comments: 21 pages, 5 figures, preprint, full version of paper to appear in WALCOM 2017

  25. arXiv:1512.00537

    cs.DB

    Fault-Tolerant Entity Resolution with the Crowd

    Authors: Anja Gruenheid, Besmira Nushi, Tim Kraska, Wolfgang Gatterbauer, Donald Kossmann

    Abstract: In recent years, crowdsourcing has increasingly been applied as a means to enhance data quality. Although the crowd generates insightful information especially for complex problems such as entity resolution (ER), the output quality of crowd workers is often noisy. That is, workers may unintentionally generate false or contradicting data even for simple tasks. The challenge that we address in this paper…

    Submitted 1 December, 2015; originally announced December 2015.

  26. arXiv:1507.00674

    cs.DB cs.CC

    A Characterization of the Complexity of Resilience and Responsibility for Self-join-free Conjunctive Queries

    Authors: Cibele Freire, Wolfgang Gatterbauer, Neil Immerman, Alexandra Meliou

    Abstract: Several research thrusts in the area of data management have focused on understanding how changes in the data affect the output of a view or standing query. Example applications are explaining query results, propagating updates through views, and anonymizing datasets. These applications usually rely on understanding how interventions in a database impact the output of a query. An important aspect…

    Submitted 2 July, 2015; originally announced July 2015.

    Comments: 36 pages, 13 figures

  27. arXiv:1502.04956

    cs.AI cs.LG cs.SI

    The Linearization of Belief Propagation on Pairwise Markov Networks

    Authors: Wolfgang Gatterbauer

    Abstract: Belief Propagation (BP) is a widely used approximation for exact probabilistic inference in graphical models, such as Markov Random Fields (MRFs). In graphs with cycles, however, no exact convergence guarantees for BP are known, in general. For the case when all edges in the MRF carry the same symmetric, doubly stochastic potential, recent works have proposed to approximate BP by linearizing the u…

    Submitted 27 December, 2016; v1 submitted 17 February, 2015; originally announced February 2015.

    Comments: Full version of AAAI 2017 paper with same title (23 pages, 9 figures)

  28. arXiv:1412.3100

    cs.LG cs.DB

    Semi-Supervised Learning with Heterophily

    Authors: Wolfgang Gatterbauer

    Abstract: We derive a family of linear inference algorithms that generalize existing graph-based label propagation algorithms by allowing them to propagate generalized assumptions about "attraction" or "compatibility" between classes of neighboring nodes (in particular those that involve heterophily between nodes where "opposites attract"). We thus call this formulation Semi-Supervised Learning with Heterop…

    Submitted 27 December, 2016; v1 submitted 9 December, 2014; originally announced December 2014.

    Comments: 17 pages, 13 figures

  29. arXiv:1412.1069

    cs.DB cs.AI

    Approximate Lifted Inference with Probabilistic Databases

    Authors: Wolfgang Gatterbauer, Dan Suciu

    Abstract: This paper proposes a new approach for approximate evaluation of #P-hard queries with probabilistic databases. In our approach, every query is evaluated entirely in the database engine by evaluating a fixed number of query plans, each providing an upper bound on the true probability, then taking their minimum. We provide an algorithm that takes into account important schema information to enumerat…

    Submitted 2 December, 2014; originally announced December 2014.

    Comments: 12 pages, 5 figures, pre-print for a paper appearing in VLDB 2015. arXiv admin note: text overlap with arXiv:1310.6257

  30. arXiv:1409.6052

    cs.AI cs.DB

    Oblivious Bounds on the Probability of Boolean Functions

    Authors: Wolfgang Gatterbauer, Dan Suciu

    Abstract: This paper develops upper and lower bounds for the probability of Boolean functions by treating multiple occurrences of variables as independent and assigning them new individual probabilities. We call this approach dissociation and give an exact characterization of optimal oblivious bounds, i.e. when the new probabilities are chosen independent of the probabilities of all other variables. Our mot…

    Submitted 21 September, 2014; originally announced September 2014.

    Comments: 34 pages, 14 figures, supersedes: arXiv:1105.2813

    Journal ref: Pre-print for ACM Transactions on Database Systems, January 2014, Vol 39, No 1, Article 5

  31. arXiv:1406.7288

    cs.DB cs.AI

    Linearized and Single-Pass Belief Propagation

    Authors: Wolfgang Gatterbauer, Stephan Günnemann, Danai Koutra, Christos Faloutsos

    Abstract: How can we tell when accounts are fake or real in a social network? And how can we tell which accounts belong to liberal, conservative or centrist users? Often, we can answer such questions and label nodes in a network based on the labels of their neighbors and appropriate assumptions of homophily ("birds of a feather flock together") or heterophily ("opposites attract"). One of the most widely us…

    Submitted 16 October, 2014; v1 submitted 27 June, 2014; originally announced June 2014.

    Comments: 17 pages, 11 figures, 4 algorithms. Includes following major changes since v1: renaming of "turbo BP" to "single-pass BP", convergence criteria now give sufficient *and* necessary conditions, more detailed experiments, more detailed comparison with prior BP convergence results, overall improved exposition

  32. arXiv:1310.6257

    cs.DB cs.AI

    Dissociation and Propagation for Approximate Lifted Inference with Standard Relational Database Management Systems

    Authors: Wolfgang Gatterbauer, Dan Suciu

    Abstract: Probabilistic inference over large data sets is a challenging data management problem since exact inference is generally #P-hard and is most often solved approximately with sampling-based methods today. This paper proposes an alternative approach for approximate evaluation of conjunctive queries with standard relational databases: In our approach, every query is evaluated entirely in the database…

    Submitted 14 June, 2016; v1 submitted 23 October, 2013; originally announced October 2013.

    Comments: 33 pages, 27 figures, pre-print for VLDBJ full version of arXiv:1412.1069 [PVLDB 8(5):629-640, 2015: "Approximate lifted inference with probabilistic databases", http://www.vldb.org/pvldb/vol8/p629-gatterbauer.pdf ]. Former working title: "Dissociation and Propagation for Efficient Query Evaluation over Probabilistic Databases"

  33. arXiv:1105.4395

    cs.DB

    Default-all is dangerous!

    Authors: Wolfgang Gatterbauer, Alexandra Meliou, Dan Suciu

    Abstract: We show that the default-all propagation scheme for database annotations is dangerous. Dangerous here means that it can propagate annotations to the query output which are semantically irrelevant to the query the user asked. This is the result of considering all relationally equivalent queries and returning the union of their where-provenance in an attempt to define a propagation scheme that is in…

    Submitted 22 May, 2011; originally announced May 2011.

    Comments: 4 pages, 6 figures, preprint of paper appearing in Proceedings of TaPP '11 (3rd USENIX Workshop on the Theory and Practice of Provenance); for details see the project page: http://db.cs.washington.edu/causality/

  34. arXiv:1105.2813

    cs.AI cs.DB cs.LO

    Optimal Upper and Lower Bounds for Boolean Expressions by Dissociation

    Authors: Wolfgang Gatterbauer, Dan Suciu

    Abstract: This paper develops upper and lower bounds for the probability of Boolean expressions by treating multiple occurrences of variables as independent and assigning them new individual probabilities. Our technique generalizes and extends the underlying idea of a number of recent approaches which are varyingly called node splitting, variable renaming, variable splitting, or dissociation for probabilist…

    Submitted 13 May, 2011; originally announced May 2011.

    Comments: 7 pages, 2 figures; for details see the project page: http://LaPushDB.com/

  35. arXiv:1012.3502

    cs.IR cs.DB physics.data-an

    Rules of Thumb for Information Acquisition from Large and Redundant Data

    Authors: Wolfgang Gatterbauer

    Abstract: We develop an abstract model of information acquisition from redundant data. We assume a random sampling process from data which provide information with bias and are interested in the fraction of information we expect to learn as a function of (i) the sampled fraction (recall) and (ii) varying bias of information (redundancy distributions). We develop two rules of thumb with varying robustness. We…

    Submitted 15 December, 2010; originally announced December 2010.

    Comments: 40 pages, 17 figures; for details see the project page: http://uniquerecall.com

    Journal ref: Full version of upcoming ECIR 2011 conference paper

  36. Data Conflict Resolution Using Trust Mappings

    Authors: Wolfgang Gatterbauer, Dan Suciu

    Abstract: In massively collaborative projects such as scientific or community databases, users often need to agree or disagree on the content of individual data items. On the other hand, trust relationships often exist between users, allowing them to accept or reject other users' beliefs by default. As those trust relationships become complex, however, it becomes difficult to define and compute a consistent…

    Submitted 15 December, 2010; originally announced December 2010.

    Comments: 20 pages, 19 figures

    Report number: University of Washington CSE Technical Report 09-11-01

    Journal ref: Full version of SIGMOD 2010 conference paper, pp. 219-230

  37. arXiv:1009.2021

    cs.DB cs.AI

    The Complexity of Causality and Responsibility for Query Answers and non-Answers

    Authors: Alexandra Meliou, Wolfgang Gatterbauer, Katherine F. Moore, Dan Suciu

    Abstract: An answer to a query has a well-defined lineage expression (alternatively called how-provenance) that explains how the answer was derived. Recent work has also shown how to compute the lineage of a non-answer to a query. However, the cause of an answer or non-answer is a more subtle notion and consists, in general, of only a fragment of the lineage. In this paper, we adapt Halpern, Pearl, and Choc…

    Submitted 29 September, 2011; v1 submitted 10 September, 2010; originally announced September 2010.

    Comments: 15 pages, 12 figures, PVLDB 2011

  38. arXiv:0912.5340

    cs.DB cs.AI

    Why so? or Why no? Functional Causality for Explaining Query Answers

    Authors: Alexandra Meliou, Wolfgang Gatterbauer, Katherine F. Moore, Dan Suciu

    Abstract: In this paper, we propose causality as a unified framework to explain query answers and non-answers, thus generalizing and extending several previously proposed approaches of provenance and missing query result explanations. We develop our framework starting from the well-studied definition of actual causes by Halpern and Pearl. After identifying some undesirable characteristics of the origina…

    Submitted 29 December, 2009; originally announced December 2009.

    Comments: 18 pages, 15 figures

    Report number: University of Washington CSE Technical Report 09-12-01

    ACM Class: H.2.1

  39. arXiv:0912.5241

    cs.DB cs.AI

    Believe It or Not: Adding Belief Annotations to Databases

    Authors: Wolfgang Gatterbauer, Magdalena Balazinska, Nodira Khoussainova, Dan Suciu

    Abstract: We propose a database model that allows users to annotate data with belief statements. Our motivation comes from scientific database applications where a community of users is working together to assemble, revise, and curate a shared data repository. As the community accumulates knowledge and the database content evolves over time, it may contain conflicting information and members can disagree…

    Submitted 30 December, 2009; originally announced December 2009.

    Comments: 17 pages, 10 figures

    Report number: University of Washington CSE Technical Report 08-12-01

    ACM Class: H.2.1

    Journal ref: Full version of: VLDB 2009 conference version; PVLDB 2(1):1-12 (2009)

  40. arXiv:0909.1778

    cs.DB

    A Case for A Collaborative Query Management System

    Authors: Nodira Khoussainova, Magda Balazinska, Wolfgang Gatterbauer, YongChul Kwon, Dan Suciu

    Abstract: Over the past 40 years, database management systems (DBMSs) have evolved to provide a sophisticated variety of data management capabilities. At the same time, tools for managing queries over the data have remained relatively primitive. One reason for this is that queries are typically issued through applications. They are thus debugged once and re-used repeatedly. This mode of interaction, howev…

    Submitted 9 September, 2009; originally announced September 2009.

    Comments: CIDR 2009