Showing 1–50 of 57 results for author: Bertossi, L

Searching in archive cs.

  1. arXiv:2401.12731  [pdf, other]

    cs.AI cs.LG cs.LO

    The Distributional Uncertainty of the SHAP score in Explainable Machine Learning

    Authors: Santiago Cifuentes, Leopoldo Bertossi, Nina Pardal, Sergio Abriola, Maria Vanina Martinez, Miguel Romero

    Abstract: Attribution scores reflect how important the feature values in an input entity are for the output of a machine learning model. One of the most popular attribution scores is the SHAP score, which is an instantiation of the general Shapley value used in coalition game theory. The definition of this score relies on a probability distribution on the entity population. Since the exact distribution is g… ▽ More

    Submitted 23 January, 2024; originally announced January 2024.

    MSC Class: 68T37; 68T27

  2. The Shapley Value in Database Management

    Authors: Leopoldo Bertossi, Benny Kimelfeld, Ester Livshits, Mikaël Monet

    Abstract: Attribution scores can be applied in data management to quantify the contribution of individual items to conclusions from the data, as part of the explanation of what led to these conclusions. In Artificial Intelligence, Machine Learning, and Data Management, some of the common scores are deployments of the Shapley value, a formula for profit sharing in cooperative game theory. Since its invention… ▽ More

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: 12 pages, including references. This is the authors' version of the corresponding SIGMOD Record article

    Journal ref: SIGMOD Rec. 52(2): 6-17 (2023)

  3. arXiv:2308.00184  [pdf, other]

    cs.DB cs.AI cs.LG

    Attribution-Scores in Data Management and Explainable Machine Learning

    Authors: Leopoldo Bertossi

    Abstract: We describe recent research on the use of actual causality in the definition of responsibility scores as explanations for query answers in databases, and for outcomes from classification models in machine learning. In the case of databases, useful connections with database repairs are illustrated and exploited. Repairs are also used to give a quantitative measure of the consistency of a database.… ▽ More

    Submitted 31 July, 2023; originally announced August 2023.

    Comments: Paper associated to ADBIS23 tutorial. To appear. arXiv admin note: substantial text overlap with arXiv:2303.02829, arXiv:2106.10562

  4. arXiv:2306.09374  [pdf, other]

    cs.DB cs.AI

    From Database Repairs to Causality in Databases and Beyond

    Authors: Leopoldo Bertossi

    Abstract: We describe some recent approaches to score-based explanations for query answers in databases. The focus is on work done by the author and collaborators. Special emphasis is placed on the use of counterfactual reasoning for score specification and computation. Several examples that illustrate the flexibility of these methods are shown.

    Submitted 15 June, 2023; originally announced June 2023.

    Comments: Contributed paper associated to keynote presentation at BDA 2022. To appear in special issue of Springer TLDKS. arXiv admin note: substantial text overlap with arXiv:2106.10562

  5. arXiv:2303.06516  [pdf, other]

    cs.AI cs.DB cs.LG

    Efficient Computation of Shap Explanation Scores for Neural Network Classifiers via Knowledge Compilation

    Authors: Leopoldo Bertossi, Jorge E. Leon

    Abstract: The use of Shap scores has become widespread in Explainable AI. However, their computation is in general intractable, in particular when done with a black-box classifier, such as a neural network. Recent research has unveiled classes of open-box Boolean Circuit classifiers for which Shap can be computed efficiently. We show how to transform binary neural networks into those circuits for efficient Sh… ▽ More

    Submitted 22 July, 2023; v1 submitted 11 March, 2023; originally announced March 2023.

    Comments: Substantial revision of previous version with the same title. To appear in conference proceedings. It replaces the previously uploaded paper "Opening Up the Neural Network Classifier for Shap Score Computation", by the same authors

  6. arXiv:2303.02829  [pdf, other]

    cs.AI cs.DB cs.LG

    Attribution-Scores and Causal Counterfactuals as Explanations in Artificial Intelligence

    Authors: Leopoldo Bertossi

    Abstract: In this expository article we highlight the relevance of explanations for artificial intelligence, in general, and for the newer developments in {\em explainable AI}, referring to origins and connections of and among different approaches. We describe in simple terms, explanations in data management and machine learning that are based on attribution-scores, and counterfactuals as found in the area… ▽ More

    Submitted 22 March, 2023; v1 submitted 5 March, 2023; originally announced March 2023.

    Comments: Submitted as chapter contribution. In this version some additional comments were added, and some wrong equation references corrected

  7. arXiv:2209.12110  [pdf, ps, other]

    cs.LO cs.AI cs.DB

    Answer-Set Programs for Repair Updates and Counterfactual Interventions

    Authors: Leopoldo Bertossi

    Abstract: We briefly describe -- mainly through very simple examples -- different kinds of answer-set programs with annotations that have been proposed for specifying: database repairs and consistent query answering; secrecy view and query evaluation with them; counterfactual interventions for causality in databases; and counterfactual-based explanations in machine learning.

    Submitted 24 September, 2022; originally announced September 2022.

    Comments: Submitted to Festschrift volume

  8. arXiv:2108.11004  [pdf, ps, other]

    cs.AI cs.LO

    Reasoning about Counterfactuals and Explanations: Problems, Results and Directions

    Authors: Leopoldo Bertossi

    Abstract: There are some recent approaches and results about the use of answer-set programming for specifying counterfactual interventions on entities under classification, and reasoning about them. These approaches are flexible and modular in that they allow the seamless addition of domain knowledge. Reasoning is enabled by query answering from the answer-set program. The programs can be used to specify an… ▽ More

    Submitted 24 August, 2021; originally announced August 2021.

    Comments: To appear in informal proceedings of 2nd Workshop on Explainable Logic-Based Knowledge Representation (XLoKR 2021), co-located with KR 2021. arXiv admin note: substantial text overlap with arXiv:2107.10159

  9. arXiv:2108.08423  [pdf, ps, other]

    cs.DB cs.AI cs.LO

    Second-Order Specifications and Quantifier Elimination for Consistent Query Answering in Databases

    Authors: Leopoldo Bertossi

    Abstract: Consistent answers to a query from a possibly inconsistent database are answers that are simultaneously retrieved from every possible repair of the database. Repairs are consistent instances that minimally differ from the original inconsistent instance. It has been shown before that database repairs can be specified as the stable models of a disjunctive logic program. In this paper we show how to… ▽ More

    Submitted 18 October, 2021; v1 submitted 18 August, 2021; originally announced August 2021.

    Comments: A couple of minor mistakes corrected, and some explanations added

  10. arXiv:2108.00903  [pdf, other]

    cs.DB cs.AI cs.LO

    Extending Sticky-Datalog+/- via Finite-Position Selection Functions: Tractability, Algorithms, and Optimization

    Authors: Leopoldo Bertossi, Mostafa Milani

    Abstract: Weakly-Sticky(WS) Datalog+/- is an expressive member of the family of Datalog+/- program classes that is defined on the basis of the conditions of stickiness and weak-acyclicity. Conjunctive query answering (QA) over the WS programs has been investigated, and its tractability in data complexity has been established. However, the design and implementation of practical QA algorithms and their optimi… ▽ More

    Submitted 2 August, 2021; v1 submitted 2 August, 2021; originally announced August 2021.

    Comments: Journal submission

  11. arXiv:2107.10159  [pdf, other]

    cs.AI cs.LG cs.LO

    Answer-Set Programs for Reasoning about Counterfactual Interventions and Responsibility Scores for Classification

    Authors: Leopoldo Bertossi, Gabriela Reyes

    Abstract: We describe how answer-set programs can be used to declaratively specify counterfactual interventions on entities under classification, and reason about them. In particular, they can be used to define and compute responsibility scores as attribution-based explanations for outcomes from classification models. The approach allows for the inclusion of domain knowledge and supports query answering. A… ▽ More

    Submitted 1 September, 2021; v1 submitted 21 July, 2021; originally announced July 2021.

    Comments: Revised for camera ready. Extended version with appendices of paper to appear in IJCLR'21. arXiv admin note: text overlap with arXiv:2106.10562

  12. arXiv:2106.10562  [pdf, other]

    cs.AI cs.DB cs.LG cs.LO

    Score-Based Explanations in Data Management and Machine Learning: An Answer-Set Programming Approach to Counterfactual Analysis

    Authors: Leopoldo Bertossi

    Abstract: We describe some recent approaches to score-based explanations for query answers in databases and outcomes from classification models in machine learning. The focus is on work done by the author and collaborators. Special emphasis is placed on declarative approaches based on answer-set programming to the use of counterfactual reasoning for score specification and computation. Several examples that… ▽ More

    Submitted 19 September, 2021; v1 submitted 19 June, 2021; originally announced June 2021.

    Comments: Revised version for camera ready. Typos corrected, new references, and a new section with background material added. Paper associated to forthcoming short course at Fall School. arXiv admin note: text overlap with arXiv:2007.12799

  13. arXiv:2104.08015  [pdf, other]

    cs.AI cs.CC

    On the Complexity of SHAP-Score-Based Explanations: Tractability via Knowledge Compilation and Non-Approximability Results

    Authors: Marcelo Arenas, Pablo Barceló, Leopoldo Bertossi, Mikaël Monet

    Abstract: In Machine Learning, the $\mathsf{SHAP}$-score is a version of the Shapley value that is used to explain the result of a learned model on a specific entity by assigning a score to every feature. While in general computing Shapley values is an intractable problem, we prove a strong positive result stating that the $\mathsf{SHAP}$-score can be computed in polynomial time over deterministic and decom… ▽ More

    Submitted 30 March, 2023; v1 submitted 16 April, 2021; originally announced April 2021.

    Comments: Up to the formatting, this is the exact content of the paper in Journal of Machine Learning Research (JMLR)

  14. arXiv:2011.07423  [pdf, other]

    cs.AI cs.DB cs.LG cs.LO

    Declarative Approaches to Counterfactual Explanations for Classification

    Authors: Leopoldo Bertossi

    Abstract: We propose answer-set programs that specify and compute counterfactual interventions on entities that are input on a classification model. In relation to the outcome of the model, the resulting counterfactual entities serve as a basis for the definition and computation of causality-based explanation scores for the feature values in the entity under classification, namely "responsibility scores". T… ▽ More

    Submitted 7 December, 2021; v1 submitted 14 November, 2020; originally announced November 2020.

    Comments: Camera-ready of journal version, with some final additions and revisions. Revised and considerably extended version of a RuleML-RR'20 paper [arXiv:2004.13237]. Submitted by invitation

  15. arXiv:2007.14045  [pdf, ps, other]

    cs.AI cs.CC

    The Tractability of SHAP-Score-Based Explanations over Deterministic and Decomposable Boolean Circuits

    Authors: Marcelo Arenas, Pablo Barceló, Leopoldo Bertossi, Mikaël Monet

    Abstract: Scores based on Shapley values are widely used for providing explanations to classification results over machine learning models. A prime example of this is the influential SHAP-score, a version of the Shapley value that can help explain the result of a learned model on a specific entity by assigning a score to every feature. While in general computing Shapley values is a computationally intractab… ▽ More

    Submitted 3 April, 2021; v1 submitted 28 July, 2020; originally announced July 2020.

    Comments: 17 pages, including 8 pages of main text. arXiv version of the AAAI'21 conference paper. Except for the addition of the technical appendix, the content is the same as the AAAI one

  16. arXiv:2007.12799  [pdf, other]

    cs.DB cs.AI cs.LG

    Score-Based Explanations in Data Management and Machine Learning

    Authors: Leopoldo Bertossi

    Abstract: We describe some approaches to explanations for observed outcomes in data management and machine learning. They are based on the assignment of numerical scores to predefined and potentially relevant inputs. More specifically, we consider explanations for query answers in databases, and for results from classification models. The described approaches are mostly of a causal and counterfactual nature… ▽ More

    Submitted 18 August, 2020; v1 submitted 24 July, 2020; originally announced July 2020.

    Comments: Companion paper for a tutorial at the Scalable Uncertainty Management Conference (SUM'20). To appear in Proc. SUM'20. Minor fixes made

  17. arXiv:2004.13237  [pdf, ps, other]

    cs.LG cs.DB cs.LO stat.ML

    An ASP-Based Approach to Counterfactual Explanations for Classification

    Authors: Leopoldo Bertossi

    Abstract: We propose answer-set programs that specify and compute counterfactual interventions as a basis for causality-based explanations to decisions produced by classification models. They can be applied with black-box models and models that can be specified as logic programs, such as rule-based classifiers. The main focus is on the specification and computation of maximum responsibility causal explanati… ▽ More

    Submitted 15 June, 2020; v1 submitted 27 April, 2020; originally announced April 2020.

    Comments: Revised and extended version. To appear in Proc. RuleML+RR, 2020

  18. arXiv:2003.06868  [pdf, other]

    cs.LG cs.AI cs.DB stat.ML

    Causality-based Explanation of Classification Outcomes

    Authors: Leopoldo Bertossi, Jordan Li, Maximilian Schleich, Dan Suciu, Zografoula Vagena

    Abstract: We propose a simple definition of an explanation for the outcome of a classifier based on concepts from causality. We compare it with previously proposed notions of explanation, and study their complexity. We conduct an experimental evaluation with two real datasets from the financial domain.

    Submitted 25 May, 2020; v1 submitted 15 March, 2020; originally announced March 2020.

    Comments: 16 pages, 6 figures, 1 table

  19. The Shapley Value of Tuples in Query Answering

    Authors: Ester Livshits, Leopoldo Bertossi, Benny Kimelfeld, Moshe Sebag

    Abstract: We investigate the application of the Shapley value to quantifying the contribution of a tuple to a query answer. The Shapley value is a widely known numerical measure in cooperative game theory and in many applications of game theory for assessing the contribution of a player to a coalition game. It has been established already in the 1950s, and is theoretically justified by being the very single… ▽ More

    Submitted 1 September, 2021; v1 submitted 18 April, 2019; originally announced April 2019.

    Journal ref: Logical Methods in Computer Science, Volume 17, Issue 3 (September 2, 2021) lmcs:6942

  20. arXiv:1809.10286  [pdf, other]

    cs.DB cs.AI

    Repair-Based Degrees of Database Inconsistency: Computation and Complexity

    Authors: Leopoldo Bertossi

    Abstract: We propose a generic numerical measure of the inconsistency of a database with respect to a set of integrity constraints. It is based on an abstract repair semantics. In particular, an inconsistency measure associated to cardinality-repairs is investigated in detail. More specifically, it is shown that it can be computed via answer-set programs, but sometimes its computation can be intractable in… ▽ More

    Submitted 22 January, 2019; v1 submitted 26 September, 2018; originally announced September 2018.

    Comments: Some editing made and some new paragraphs added

  21. arXiv:1804.08834  [pdf, ps, other]

    cs.DB cs.AI cs.LO

    Measuring and Computing Database Inconsistency via Repairs

    Authors: Leopoldo Bertossi

    Abstract: We propose a generic numerical measure of inconsistency of a database with respect to a set of integrity constraints. It is based on an abstract repair semantics. A particular inconsistency measure associated to cardinality-repairs is investigated; and we show that it can be computed via answer-set programs. Keywords: Integrity constraints in databases, inconsistent databases, database repairs,… ▽ More

    Submitted 12 July, 2018; v1 submitted 24 April, 2018; originally announced April 2018.

    Comments: Submission as short paper; to appear in Proc. Scalable Uncertainty Management, SUM 2018. Abstract and keywords added

  22. arXiv:1803.06445  [pdf, other]

    cs.DB cs.AI cs.LO

    Datalog: Bag Semantics via Set Semantics

    Authors: Leopoldo Bertossi, Georg Gottlob, Reinhard Pichler

    Abstract: Duplicates in data management are common and problematic. In this work, we present a translation of Datalog under bag semantics into a well-behaved extension of Datalog, the so-called {\em warded Datalog}$^\pm$, under set semantics. From a theoretical point of view, this allows us to reason on bag semantics by making use of the well-established theoretical foundations of set semantics. From a prac… ▽ More

    Submitted 12 February, 2019; v1 submitted 16 March, 2018; originally announced March 2018.

    Comments: Extended version of paper appearing in Proc. ICDT 2019

  23. arXiv:1712.01001  [pdf, other]

    cs.DB cs.AI cs.LO

    Specifying and Computing Causes for Query Answers in Databases via Database Repairs and Repair Programs

    Authors: Leopoldo Bertossi

    Abstract: A correspondence between database tuples as causes for query answers in databases and tuple-based repairs of inconsistent databases with respect to denial constraints has already been established. In this work, answer-set programs that specify repairs of databases are used as a basis for solving computational and reasoning problems about causes. Here, causes are also introduced at the attribute le… ▽ More

    Submitted 28 September, 2020; v1 submitted 4 December, 2017; originally announced December 2017.

    Comments: To appear in "Knowledge and Information Systems" journal. This is the final version, and a much revised, corrected and extended version of: Bertossi, L. "Characterizing and Computing Causes for Query Answers in Databases from Database Repairs and Repair Programs". Proc. FoIKs, 2018, Springer LNCS 10833, pp. 55-76

  24. arXiv:1704.05136  [pdf, ps, other]

    cs.DB cs.AI

    The Causality/Repair Connection in Databases: Causality-Programs

    Authors: Leopoldo Bertossi

    Abstract: In this work, answer-set programs that specify repairs of databases are used as a basis for solving computational and reasoning problems about causes for query answers from databases.

    Submitted 26 June, 2017; v1 submitted 17 April, 2017; originally announced April 2017.

    Comments: To appear in Proc. SUM'17 as short paper, 7-pages

  25. arXiv:1704.00115  [pdf, other]

    cs.DB cs.AI

    Ontological Multidimensional Data Models and Contextual Data Quality

    Authors: Leopoldo Bertossi, Mostafa Milani

    Abstract: Data quality assessment and data cleaning are context-dependent activities. Motivated by this observation, we propose the Ontological Multidimensional Data Model (OMD model), which can be used to model and represent contexts as logic-based ontologies. The data under assessment is mapped into the context, for additional analysis, processing, and quality data extraction. The resulting contexts allow… ▽ More

    Submitted 13 August, 2017; v1 submitted 31 March, 2017; originally announced April 2017.

    Comments: Journal submission (revised version addressing reviewers' observations) Extended version of RuleML'15 paper

  26. arXiv:1703.03524  [pdf, other]

    cs.DB cs.AI

    The Ontological Multidimensional Data Model

    Authors: Leopoldo Bertossi, Mostafa Milani

    Abstract: In this extended abstract we describe, mainly by examples, the main elements of the Ontological Multidimensional Data Model, which considerably extends a relational reconstruction of the multidimensional data model proposed by Hurtado and Mendelzon by means of tuple-generating dependencies, equality-generating dependencies, and negative constraints as found in Datalog+-. We briefly mention some go… ▽ More

    Submitted 3 May, 2017; v1 submitted 9 March, 2017; originally announced March 2017.

    Comments: Extended abstract. This version with minor revisions and slightly extended. To appear in Proc. AMW'17

  27. arXiv:1611.06951  [pdf, ps, other]

    cs.DB cs.AI

    Enforcing Relational Matching Dependencies with Datalog for Entity Resolution

    Authors: Zeinab Bahmani, Leopoldo Bertossi

    Abstract: Entity resolution (ER) is about identifying and merging records in a database that represent the same real-world entity. Matching dependencies (MDs) have been introduced and investigated as declarative rules that specify ER policies. An ER process induced by MDs over a dirty instance leads to multiple clean instances, in general. General "answer sets programs" have been proposed to specify the MD-… ▽ More

    Submitted 25 February, 2017; v1 submitted 21 November, 2016; originally announced November 2016.

    Comments: New revisions applied. To appear in Proc. FLAIRS'17

  28. arXiv:1611.01711  [pdf, other]

    cs.DB cs.AI

    Causes for Query Answers from Databases: Datalog Abduction, View-Updates, and Integrity Constraints

    Authors: Leopoldo Bertossi, Babak Salimi

    Abstract: Causality has been recently introduced in databases, to model, characterize, and possibly compute causes for query answers. Connections between QA-causality and consistency-based diagnosis and database repairs (wrt. integrity constraint violations) have already been established. In this work we establish precise connections between QA-causality and both abductive diagnosis and the view-update prob… ▽ More

    Submitted 31 July, 2017; v1 submitted 5 November, 2016; originally announced November 2016.

    Comments: To appear in International Journal of Approximate Reasoning. Extended version of "Flairs'16" and "UAI'15 WS on Causality" papers

  29. arXiv:1608.04142  [pdf, ps, other]

    cs.DB

    Contexts and Data Quality Assessment

    Authors: Leopoldo Bertossi, Flavio Rizzolo

    Abstract: The quality of data is context dependent. Starting from this intuition and experience, we propose and develop a conceptual framework that captures in formal terms the notion of "context-dependent data quality". We start by proposing a generic and abstract notion of context, and also of its uses, in general and in data management in particular. On this basis, we investigate "data quality assessment… ▽ More

    Submitted 14 August, 2016; originally announced August 2016.

  30. arXiv:1607.02682  [pdf, ps, other]

    cs.DB cs.AI

    Extending Weakly-Sticky Datalog+/-: Query-Answering Tractability and Optimizations

    Authors: Mostafa Milani, Leopoldo Bertossi

    Abstract: Weakly-sticky (WS) Datalog+/- is an expressive member of the family of Datalog+/- programs that is based on the syntactic notions of stickiness and weak-acyclicity. Query answering over the WS programs has been investigated, but there is still much work to do on the design and implementation of practical query answering (QA) algorithms and their optimizations. Here, we study sticky and WS programs… ▽ More

    Submitted 9 July, 2016; originally announced July 2016.

    Comments: Extended version of RR'16 paper

  31. Consistency and Trust in Peer Data Exchange Systems

    Authors: Leopoldo Bertossi, Loreto Bravo

    Abstract: We propose and investigate a semantics for "peer data exchange systems" where different peers are related by data exchange constraints and trust relationships. These two elements plus the data at the peers' sites and their local integrity constraints are made compatible via a semantics that characterizes sets of "solution instances" for the peers. They are the intended -possibly virtual- instances… ▽ More

    Submitted 6 June, 2016; originally announced June 2016.

    Comments: To appear in Theory and Practice of Logic Programming (TPLP). It includes appendix that will be published only in electronic format

  32. arXiv:1605.07159  [pdf, other]

    cs.DB

    Complexity of Consistent Query Answering in Databases under Cardinality-Based and Incremental Repair Semantics (extended version)

    Authors: Andrei Lopatenko, Leopoldo Bertossi

    Abstract: A database D may be inconsistent wrt a given set IC of integrity constraints. Consistent Query Answering (CQA) is the problem of computing from D the answers to a query that are consistent wrt IC . Consistent answers are invariant under all the repairs of D, i.e. the consistent instances that minimally depart from D. Three classes of repair have been considered in the literature: those that minimi… ▽ More

    Submitted 23 May, 2016; originally announced May 2016.

    Comments: This paper, without the proofs provided here, arXiv:cs/0604002, appeared in the Proc. of ICDT 2007. This version contains all the proofs in correlation with the results reported in the ICDT paper (as opposed to a previous arXiv CoRR posting related to the same paper). One proof was corrected, and a corollary was added

  33. arXiv:1604.06770  [pdf, ps, other]

    cs.DB cs.AI

    A Hybrid Approach to Query Answering under Expressive Datalog+/-

    Authors: Mostafa Milani, Andrea Cali, Leopoldo Bertossi

    Abstract: Datalog+/- is a family of ontology languages that combine good computational properties with high expressive power. Datalog+/- languages are provably able to capture the most relevant Semantic Web languages. In this paper we consider the class of weakly-sticky (WS) Datalog+/- programs, which allow for certain useful forms of joins in rule bodies as well as extending the well-known class of weakly-… ▽ More

    Submitted 25 July, 2016; v1 submitted 22 April, 2016; originally announced April 2016.

    Comments: Extended version of RR'16 paper, to appear

  34. arXiv:1603.02705  [pdf, ps, other]

    cs.DB

    Quantifying Causal Effects on Query Answering in Databases

    Authors: Babak Salimi, Leopoldo Bertossi, Dan Suciu, Guy Van den Broeck

    Abstract: The notion of actual causation, as formalized by Halpern and Pearl, has been recently applied to relational databases, to characterize and compute actual causes for possibly unexpected answers to monotone queries. Causes take the form of database tuples, and can be ranked according to their causal responsibility, a numerical measure of their relevance as a cause to the query answer. In this work w… ▽ More

    Submitted 24 April, 2016; v1 submitted 8 March, 2016; originally announced March 2016.

    Comments: To appear in Proc. TAPP'16

    ACM Class: H.2; I.2

  35. arXiv:1602.06458  [pdf, other]

    cs.DB cs.AI

    Causes for Query Answers from Databases, Datalog Abduction and View-Updates: The Presence of Integrity Constraints

    Authors: Babak Salimi, Leopoldo Bertossi

    Abstract: Causality has been recently introduced in databases, to model, characterize and possibly compute causes for query results (answers). Connections between query-answer causality, consistency-based diagnosis, database repairs (wrt. integrity constraint violations), abductive diagnosis and the view-update problem have been established. In this work we further investigate connections between query-answe… ▽ More

    Submitted 20 February, 2016; originally announced February 2016.

    Comments: To appear in Proceedings Flairs, 2016

  36. arXiv:1602.02334  [pdf, other]

    cs.DB cs.AI cs.LG

    ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution

    Authors: Zeinab Bahmani, Leopoldo Bertossi, Nikolaos Vasiloglou

    Abstract: Entity resolution (ER), an important and common data cleaning problem, is about detecting data duplicate representations for the same external entities, and merging them into single representations. Relatively recently, declarative rules called "matching dependencies" (MDs) have been proposed for specifying similarity conditions under which attribute values in database records are merged. In this… ▽ More

    Submitted 18 January, 2017; v1 submitted 6 February, 2016; originally announced February 2016.

    Comments: Final journal version, with some minor technical corrections. Extended version of arXiv:1508.06013

  37. arXiv:1508.06013  [pdf, other]

    cs.DB cs.AI cs.LG

    ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution

    Authors: Zeinab Bahmani, Leopoldo Bertossi, Nikolaos Vasiloglou

    Abstract: Entity resolution (ER), an important and common data cleaning problem, is about detecting data duplicate representations for the same external entities, and merging them into single representations. Relatively recently, declarative rules called matching dependencies (MDs) have been proposed for specifying similarity conditions under which attribute values in database records are merged. In this wo… ▽ More

    Submitted 24 August, 2015; originally announced August 2015.

    Comments: To appear in Proc. SUM, 2015

    Journal ref: Proc. SUM'15, 2015, Springer LNAI 9310, pp. 399-414

  38. arXiv:1507.00257  [pdf, ps, other]

    cs.DB cs.AI cs.LO

    From Causes for Database Queries to Repairs and Model-Based Diagnosis and Back

    Authors: Leopoldo Bertossi, Babak Salimi

    Abstract: In this work we establish and investigate connections between causes for query answers in databases, database repairs wrt. denial constraints, and consistency-based diagnosis. The first two are relatively new research areas in databases, and the third one is an established subject in knowledge representation. We show how to obtain database repairs from causes, and the other way around. Causality p… ▽ More

    Submitted 23 October, 2016; v1 submitted 1 July, 2015; originally announced July 2015.

    Comments: To appear in Theory of Computing Systems. By invitation to special issue with extended papers from ICDT 2015 (paper arXiv:1412.4311)

  39. arXiv:1506.04299  [pdf, ps, other]

    cs.DB cs.AI

    Query-Answer Causality in Databases: Abductive Diagnosis and View-Updates

    Authors: Babak Salimi, Leopoldo Bertossi

    Abstract: Causality has been recently introduced in databases, to model, characterize and possibly compute causes for query results (answers). Connections between query causality and consistency-based diagnosis and database repairs (wrt. integrity constraint violations) have been established in the literature. In this work we establish connections between query causality and abductive diagnosis and the view-… ▽ More

    Submitted 19 September, 2015; v1 submitted 13 June, 2015; originally announced June 2015.

    Comments: To appear in Proc. UAI Causal Inference Workshop, 2015. One example was fixed

  40. arXiv:1504.03386  [pdf, ps, other]

    cs.DB cs.AI cs.LO

    Tractable Query Answering and Optimization for Extensions of Weakly-Sticky Datalog+-

    Authors: Mostafa Milani, Leopoldo Bertossi

    Abstract: We consider a semantic class, weakly-chase-sticky (WChS), and a syntactic subclass, jointly-weakly-sticky (JWS), of Datalog+- programs. Both extend that of weakly-sticky (WS) programs, which appear in our applications to data quality. For WChS programs we propose a practical, polynomial-time query answering algorithm (QAA). We establish that the two classes are closed under magic-sets rewritings.…

    Submitted 13 April, 2015; originally announced April 2015.

    Comments: To appear in Proc. Alberto Mendelzon WS on Foundations of Data Management (AMW15)

  41. arXiv:1412.4311  [pdf, other]

    cs.DB

    From Causes for Database Queries to Repairs and Model-Based Diagnosis and Back

    Authors: Babak Salimi, Leopoldo Bertossi

    Abstract: In this work we establish and investigate connections between causality for query answers in databases, database repairs wrt. denial constraints, and consistency-based diagnosis. The first two are relatively new problems in databases, and the third one is an established subject in knowledge representation. We show how to obtain database repairs from causes and the other way around. Causality probl…

    Submitted 13 December, 2014; originally announced December 2014.

    Comments: Extended version of paper to appear in Proceedings of ICDT 2015

  42. arXiv:1405.4228  [pdf, ps, other]

    cs.DB

    Unifying Causality, Diagnosis, Repairs and View-Updates in Databases

    Authors: Leopoldo Bertossi, Babak Salimi

    Abstract: In this work we establish and point out connections between the notion of query-answer causality in databases and database repairs, model-based diagnosis in its consistency-based and abductive versions, and database updates through views. The mutual relationships among these areas of data management and knowledge representation shed light on each of them and help to share notions and results they…

    Submitted 28 June, 2014; v1 submitted 16 May, 2014; originally announced May 2014.

    Comments: On-line Proc. First International Workshop on Big Uncertain Data (BUDA 2014). Co-located with ACM PODS 2014. arXiv admin note: text overlap with arXiv:1404.6857

  43. arXiv:1404.6857  [pdf, ps, other]

    cs.DB

    Causality in Databases: The Diagnosis and Repair Connections

    Authors: Babak Salimi, Leopoldo Bertossi

    Abstract: In this work we establish and investigate the connections between causality for query answers in databases, database repairs wrt. denial constraints, and consistency-based diagnosis. The first two are relatively new problems in databases, and the third one is an established subject in knowledge representation. We show how to obtain database repairs from causes and the other way around. The vast bo…

    Submitted 28 June, 2014; v1 submitted 27 April, 2014; originally announced April 2014.

    Comments: Proc. 15th International Workshop on Non-Monotonic Reasoning (NMR 2014)

  44. arXiv:1312.7373  [pdf, ps, other]

    cs.DB

    Extending Contexts with Ontologies for Multidimensional Data Quality Assessment

    Authors: Mostafa Milani, Leopoldo Bertossi, Sina Ariyan

    Abstract: Data quality and data cleaning are context dependent activities. Starting from this observation, in previous work a context model for the assessment of the quality of a database instance was proposed. In that framework, the context takes the form of a possibly virtual database or data integration system into which a database instance under quality assessment is mapped, for additional analysis and…

    Submitted 20 January, 2014; v1 submitted 27 December, 2013; originally announced December 2013.

    Comments: To appear in Proc. 5th International Workshop on Data Engineering meets the Semantic Web (DESWeb). In conjunction with ICDE 2014

  45. arXiv:1309.1884  [pdf, ps, other]

    cs.DB cs.CC cs.LO

    Tractable vs. Intractable Cases of Matching Dependencies for Query Answering under Entity Resolution

    Authors: Leopoldo Bertossi, Jaffer Gardezi

    Abstract: Matching Dependencies (MDs) are a relatively recent proposal for declarative entity resolution. They are rules that specify, on the basis of similarities satisfied by values in a database, what values should be considered duplicates, and have to be matched. On the basis of a chase-like procedure for MD enforcement, we can obtain clean (duplicate-free) instances; actually possibly several of them.…

    Submitted 6 April, 2014; v1 submitted 7 September, 2013; originally announced September 2013.

  46. arXiv:1304.7854  [pdf, ps, other]

    cs.DB

    On the Complexity of Query Answering under Matching Dependencies for Entity Resolution

    Authors: Leopoldo Bertossi, Jaffer Gardezi

    Abstract: Matching Dependencies (MDs) are a relatively recent proposal for declarative entity resolution. They are rules that specify, given the similarities satisfied by values in a database, what values should be considered duplicates, and have to be matched. On the basis of a chase-like procedure for MD enforcement, we can obtain clean (duplicate-free) instances; actually possibly several of them. The re…

    Submitted 26 May, 2013; v1 submitted 30 April, 2013; originally announced April 2013.

    Comments: To appear in Proc. of the Alberto Mendelzon International Workshop on Foundations of Data Management (AMW 2013)

  47. arXiv:1112.5908  [pdf, ps, other]

    cs.DB cs.LO

    Query Answering under Matching Dependencies for Data Cleaning: Complexity and Algorithms

    Authors: Jaffer Gardezi, Leopoldo Bertossi

    Abstract: Matching dependencies (MDs) have been recently introduced as declarative rules for entity resolution (ER), i.e. for identifying and resolving duplicates in a relational instance $D$. A set of MDs can be used as the basis for a possibly non-deterministic mechanism that computes a duplicate-free instance from $D$. The possible results of this process are the clean, "minimally resolved instances" (MRIs…

    Submitted 26 December, 2011; originally announced December 2011.

    Comments: Conference submission, 2011

  48. arXiv:1106.1478  [pdf, ps, other]

    cs.DB

    Consistent Query Answering under Spatial Semantic Constraints

    Authors: M. Andrea Rodríguez, Leopoldo Bertossi, Monica Caniupan

    Abstract: Consistent query answering is an inconsistency tolerant approach to obtaining semantically correct answers from a database that may be inconsistent with respect to its integrity constraints. In this work we formalize the notion of consistent query answer for spatial databases and spatial semantic integrity constraints. In order to do this, we first characterize conflicting spatial data, and next,…

    Submitted 7 June, 2011; originally announced June 2011.

    Comments: Journal submission, 2010

  49. arXiv:1105.1364  [pdf, ps, other]

    cs.DB cs.LO

    Achieving Data Privacy through Secrecy Views and Null-Based Virtual Updates

    Authors: Leopoldo Bertossi, Lechen Li

    Abstract: There may be sensitive information in a relational database, and we might want to keep it hidden from a user or group thereof. In this work, sensitive data is characterized as the contents of a set of secrecy views. For a user without permission to access that sensitive data, the database instance he queries is updated to make the contents of the views empty or contain only tuples with null values…

    Submitted 5 April, 2012; v1 submitted 6 May, 2011; originally announced May 2011.

    Comments: Minor revisions of journal resubmission, 2012

    ACM Class: H.2; F.4.1

    Journal ref: IEEE Transactions on Knowledge and Data Engineering, 2013, 25(5):987-1000

  50. arXiv:1008.4627  [pdf, ps, other]

    cs.DB

    Matching Dependencies with Arbitrary Attribute Values: Semantics, Query Answering and Integrity Constraints

    Authors: Jaffer Gardezi, Leopoldo Bertossi, Iluju Kiringa

    Abstract: Matching dependencies (MDs) were introduced to specify the identification or matching of certain attribute values in pairs of database tuples when some similarity conditions are satisfied. Their enforcement can be seen as a natural generalization of entity resolution. In what we call the "pure case" of MDs, any value from the underlying data domain can be used for the value in common that does the…

    Submitted 26 August, 2010; originally announced August 2010.

    Comments: 13 pages, double column, 2 figures

    ACM Class: H.2; H.2.0; H.2.3