-
Counting Solutions to Conjunctive Queries: Structural and Hybrid Tractability
Authors:
Hubie Chen,
Gianluigi Greco,
Stefan Mengel,
Francesco Scarcello
Abstract:
Counting the number of answers to conjunctive queries is a fundamental problem in databases that, under standard assumptions, does not have an efficient solution. The issue is inherently #P-hard, extending even to classes of acyclic instances.
To address this, we pinpoint tractable classes by examining the structural properties of instances and introducing the novel concept of #-hypertree decomposition. We establish the feasibility of counting answers in polynomial time for classes of queries featuring bounded #-hypertree width. Additionally, employing novel techniques from the realm of fixed-parameter computational complexity, we prove that, for bounded arity queries, the bounded #-hypertree width property precisely delineates the frontier of tractability for the counting problem. This result closes an important gap in our understanding of the complexity of such a basic problem for conjunctive queries and, equivalently, for constraint satisfaction problems (CSPs).
Drawing upon #-hypertree decompositions, a "hybrid" decomposition method emerges. This approach leverages both the structural characteristics of the query and properties intrinsic to the input database, including keys or other (weaker) degree constraints that limit the permissible combinations of values. Intuitively, these features may introduce distinct structural properties that elude identification through the "worst-possible database" perspective inherent in purely structural methods.
Submitted 24 November, 2023;
originally announced November 2023.
-
A characterization of efficiently compilable constraint languages
Authors:
Christoph Berkholz,
Stefan Mengel,
Hermann Wilhelm
Abstract:
A central task in knowledge compilation is to compile a CNF-SAT instance into a succinct representation format that allows efficient operations such as testing satisfiability, counting, or enumerating all solutions. Useful representation formats studied in this area range from ordered binary decision diagrams (OBDDs) to circuits in decomposable negation normal form (DNNFs).
While it is known that there exist CNF formulas that require exponential size representations, the situation is less well studied for other types of constraints than Boolean disjunctive clauses. The constraint satisfaction problem (CSP) is a powerful framework that generalizes CNF-SAT by allowing arbitrary sets of constraints over any finite domain. The main goal of our work is to understand for which type of constraints (also called the constraint language) it is possible to efficiently compute representations of polynomial size. We answer this question completely and prove two tight characterizations of efficiently compilable constraint languages, depending on whether the target format is structured.
We first identify the combinatorial property of "strong blockwise decomposability" and show that if a constraint language has this property, we can compute DNNF representations of linear size. For all other constraint languages we construct families of CSP-instances that provably require DNNFs of exponential size. For a subclass of "strong uniformly blockwise decomposable" constraint languages we obtain a similar dichotomy for structured DNNFs. In fact, strong (uniform) blockwise decomposability even allows efficient compilation into multi-valued analogs of OBDDs and FBDDs, respectively. Thus, we get complete characterizations for all knowledge compilation classes between O(B)DDs and DNNFs.
Submitted 16 November, 2023;
originally announced November 2023.
-
Subtractive Mixture Models via Squaring: Representation and Learning
Authors:
Lorenzo Loconte,
Aleksanteri M. Sladek,
Stefan Mengel,
Martin Trapp,
Arno Solin,
Nicolas Gillis,
Antonio Vergari
Abstract:
Mixture models are traditionally represented and learned by adding several distributions as components. Allowing mixtures to subtract probability mass or density can drastically reduce the number of components needed to model complex distributions. However, learning such subtractive mixtures while ensuring they still encode a non-negative function is challenging. We investigate how to learn and perform inference on deep subtractive mixtures by squaring them. We do this in the framework of probabilistic circuits, which enable us to represent tensorized mixtures and generalize several other subtractive models. We theoretically prove that the class of squared circuits allowing subtractions can be exponentially more expressive than traditional additive mixtures; and, we empirically show this increased expressiveness on a series of real-world distribution estimation tasks.
Submitted 26 April, 2024; v1 submitted 1 October, 2023;
originally announced October 2023.
-
Bounds on BDD-Based Bucket Elimination
Authors:
Stefan Mengel
Abstract:
We study BDD-based bucket elimination, an approach to satisfiability testing using variable elimination which has seen several practical implementations in the past. We prove that it allows solving the standard pigeonhole principle formulas efficiently, when allowing different orders for variable elimination and BDD-representations, a variant of bucket elimination that was recently introduced. Furthermore, we show that this upper bound is somewhat brittle as for formulas which we get from the pigeonhole principle by restriction, i.e., fixing some of the variables, the same approach with the same variable orders has exponential runtime. We also show that the more common implementation of bucket elimination using the same order for variable elimination and the BDDs has exponential runtime for the pigeonhole principle when using either of the two orders from our upper bound, which suggests that the combination of both is the key to efficiency in the setting.
Submitted 1 June, 2023;
originally announced June 2023.
-
Skyline Operators for Document Spanners
Authors:
Antoine Amarilli,
Benny Kimelfeld,
Sébastien Labbé,
Stefan Mengel
Abstract:
When extracting a relation of spans (intervals) from a text document, a common practice is to filter out tuples of the relation that are deemed dominated by others. The domination rule is defined as a partial order that varies along different systems and tasks. For example, we may state that a tuple is dominated by tuples which extend it by assigning additional attributes, or assigning larger intervals. The result of filtering the relation would then be the skyline according to this partial order. As this filtering may remove most of the extracted tuples, we study whether we can improve the performance of the extraction by compiling the domination rule into the extractor.
To this aim, we introduce the skyline operator for declarative information extraction tasks expressed as document spanners. We show that this operator can be expressed via regular operations when the domination partial order can itself be expressed as a regular spanner, which covers several natural domination rules. Yet, we show that the skyline operator incurs a computational cost (under combined complexity). First, there are cases where the operator requires an exponential blowup on the number of states needed to represent the spanner as a sequential variable-set automaton. Second, the evaluation may become computationally hard. Our analysis more precisely identifies classes of domination rules for which the combined complexity is tractable or intractable.
Submitted 4 March, 2024; v1 submitted 12 April, 2023;
originally announced April 2023.
-
No Efficient Disjunction or Conjunction of Switch-Lists
Authors:
Stefan Mengel
Abstract:
It is shown that disjunction of two switch-lists can blow up the representation size exponentially. Since switch-lists can be negated without any increase in size, this shows that conjunction of switch-lists also leads to an exponential blow-up in general.
Submitted 9 March, 2022;
originally announced March 2022.
-
Tight Fine-Grained Bounds for Direct Access on Join Queries
Authors:
Karl Bringmann,
Nofar Carmeli,
Stefan Mengel
Abstract:
We consider the task of lexicographic direct access to query answers. That is, we want to simulate an array containing the answers of a join query sorted in a lexicographic order chosen by the user. A recent dichotomy showed for which queries and orders this task can be done in polylogarithmic access time after quasilinear preprocessing, but this dichotomy does not tell us how much time is required in the cases classified as hard. We determine the preprocessing time needed to achieve polylogarithmic access time for all join queries and all lexicographical orders. To this end, we propose a decomposition-based general algorithm for direct access on join queries. We then explore its optimality by proving lower bounds for the preprocessing time based on the hardness of a certain online Set-Disjointness problem, which shows that our algorithm's bounds are tight for all lexicographic orders on join queries. Then, we prove the hardness of Set-Disjointness based on the Zero-Clique Conjecture, which is an established conjecture from fine-grained complexity theory. Interestingly, while proving our lower bound, we show that self-joins do not affect the complexity of direct access (up to logarithmic factors). Our algorithm can also be used to solve queries with projections and relaxed order requirements, though in these cases, its running time is not necessarily optimal. We also show that similar techniques to those used in our lower bounds can be used to prove that, for enumerating answers to Loomis-Whitney joins, it is not possible to significantly improve upon trivially computing all answers at preprocessing. This, in turn, gives further evidence (based on the Zero-Clique Conjecture) to the enumeration hardness of self-join free cyclic joins with respect to linear preprocessing and constant delay.
Submitted 3 May, 2024; v1 submitted 7 January, 2022;
originally announced January 2022.
-
Lower Bounds on Intermediate Results in Bottom-Up Knowledge Compilation
Authors:
Alexis de Colnet,
Stefan Mengel
Abstract:
Bottom-up knowledge compilation is a paradigm for generating representations of functions by iteratively conjoining constraints using a so-called apply function. When the input is not efficiently compilable into a language - generally a class of circuits - because optimal compiled representations are provably large, the problem is not the compilation algorithm as much as the choice of a language too restrictive for the input. In contrast, in this paper, we look at CNF formulas for which very small circuits exist and look at the efficiency of their bottom-up compilation in one of the most general languages, namely that of structured decomposable negation normal forms (str-DNNF). We prove that, while the inputs have constant size representations as str-DNNF, any bottom-up compilation in the general setting where conjunction and structure modification are allowed takes exponential time and space, since large intermediate results have to be produced. This unconditionally proves that the inefficiency of bottom-up compilation resides in the bottom-up paradigm itself.
Submitted 23 December, 2021;
originally announced December 2021.
-
A short note on the counting complexity of conjunctive queries
Authors:
Stefan Mengel
Abstract:
This note closes a minor gap in the literature on the counting complexity of conjunctive queries by showing that queries that are not free-connex do not have a linear time counting algorithm under standard complexity assumptions. More generally, it is shown that the so-called quantified star size is a lower bound for the exponent in the runtime of any counting algorithm for conjunctive queries.
Submitted 2 December, 2021;
originally announced December 2021.
-
A Compilation of Succinctness Results for Arithmetic Circuits
Authors:
Alexis de Colnet,
Stefan Mengel
Abstract:
Arithmetic circuits (AC) are circuits over the real numbers with 0/1-valued input variables whose gates compute the sum or the product of their inputs. Positive AC -- that is, AC representing non-negative functions -- subsume many interesting probabilistic models such as probabilistic sentential decision diagrams (PSDDs) or sum-product networks (SPNs) on indicator variables. Efficient algorithms for many operations useful in probabilistic reasoning on these models critically depend on imposing structural restrictions on the underlying AC. Generally, adding structural restrictions yields new tractable operations but increases the size of the AC. In this paper we study the relative succinctness of classes of AC with different combinations of common restrictions. Building on existing results for Boolean circuits, we derive an unconditional succinctness map for classes of monotone AC -- that is, AC whose constant labels are non-negative reals -- respecting relevant combinations of the restrictions we consider. We extend a small part of the map to classes of positive AC. Those are known to generally be exponentially more succinct than their monotone counterparts, but we observe here that for so-called deterministic circuits there is no difference between the monotone and the positive setting, which allows us to lift some of our results. We end the paper with some insights on the relative succinctness of positive AC by showing exponential lower bounds on the representations of certain functions in positive AC respecting structured decomposability.
Submitted 25 October, 2021;
originally announced October 2021.
-
Proof Complexity of Symbolic QBF Reasoning
Authors:
Stefan Mengel,
Friedrich Slivovsky
Abstract:
We introduce and investigate symbolic proof systems for Quantified Boolean Formulas (QBF) operating on Ordered Binary Decision Diagrams (OBDDs). These systems capture QBF solvers that perform symbolic quantifier elimination, and as such admit short proofs of formulas of bounded path-width and quantifier complexity. As a consequence, we obtain exponential separations from standard clausal proof systems, specifically (long-distance) QU-Resolution and IR-Calc.
We further develop a lower bound technique for symbolic QBF proof systems based on strategy extraction that lifts known lower bounds from communication complexity. This allows us to derive strong lower bounds against symbolic QBF proof systems that are independent of the variable ordering of the underlying OBDDs, and that hold even if the proof system is allowed access to an NP-oracle.
Submitted 6 April, 2021;
originally announced April 2021.
-
Characterizing Tseitin-formulas with short regular resolution refutations
Authors:
Alexis de Colnet,
Stefan Mengel
Abstract:
Tseitin-formulas are systems of parity constraints whose structure is described by a graph. These formulas have been studied extensively in proof complexity as hard instances in many proof systems. In this paper, we prove that a class of unsatisfiable Tseitin-formulas of bounded degree has regular resolution refutations of polynomial length if and only if the treewidth of all underlying graphs $G$ for that class is in $O(\log|V(G)|)$. To do so, we show that any regular resolution refutation of an unsatisfiable Tseitin-formula with graph $G$ of bounded degree has length $2^{\Omega(tw(G))}/|V(G)|$, thus essentially matching the known $2^{O(tw(G))}\mathrm{poly}(|V(G)|)$ upper bound. Our proof first connects the length of regular resolution refutations of unsatisfiable Tseitin-formulas to the size of representations of \textit{satisfiable} Tseitin-formulas in decomposable negation normal form (DNNF). Then we prove that for every graph $G$ of bounded degree, every DNNF-representation of every satisfiable Tseitin-formula with graph $G$ must have size $2^{\Omega(tw(G))}$, which yields our lower bound for regular resolution.
Submitted 17 March, 2021;
originally announced March 2021.
-
An extended Knowledge Compilation Map for Conditional Preference Statements-based and Generalized Additive Utilities-based Languages
Authors:
Hélène Fargier,
Stefan Mengel,
Jérôme Mengin
Abstract:
Conditional preference statements have been used to compactly represent preferences over combinatorial domains. They are at the core of CP-nets and their generalizations, and lexicographic preference trees. Several works have addressed the complexity of some queries (optimization, dominance in particular). We extend in this paper some of these results, and study other queries which have not been addressed so far, like equivalence, and transformations, like conditioning and variable elimination, thereby contributing to a knowledge compilation map for languages based on conditional preference statements. We also study the expressiveness and complexity of queries and transformations for generalized additive utilities.
Submitted 23 January, 2024; v1 submitted 8 February, 2021;
originally announced February 2021.
-
On Irrelevant Literals in Pseudo-Boolean Constraint Learning
Authors:
Daniel Le Berre,
Pierre Marquis,
Stefan Mengel,
Romain Wallon
Abstract:
Learning pseudo-Boolean (PB) constraints in PB solvers exploiting cutting planes based inference is not as well understood as clause learning in conflict-driven clause learning solvers. In this paper, we show that PB constraints derived using cutting planes may contain \emph{irrelevant literals}, i.e., literals whose assigned values (whatever they are) never change the truth value of the constraint. Such literals may lead the solver to infer constraints that are weaker than they should be, impacting the size of the proof built by the solver, and thus also affecting its performance. This suggests that current implementations of PB solvers based on cutting planes should be reconsidered to prevent the generation of irrelevant literals. Indeed, detecting and removing irrelevant literals is too expensive in practice to be considered as an option (the associated problem is NP-hard).
Submitted 8 December, 2020;
originally announced December 2020.
-
Lower Bounds for Approximate Knowledge Compilation
Authors:
Alexis de Colnet,
Stefan Mengel
Abstract:
Knowledge compilation studies the trade-off between succinctness and efficiency of different representation languages. For many languages, there are known strong lower bounds on the representation size, but recent work shows that, for some languages, one can bypass these bounds using approximate compilation. The idea is to compile an approximation of the knowledge for which the number of errors can be controlled. We focus on circuits in deterministic decomposable negation normal form (d-DNNF), a compilation language suitable in contexts such as probabilistic reasoning, as it supports efficient model counting and probabilistic inference. Moreover, there are known size lower bounds for d-DNNF which by relaxing to approximation one might be able to avoid. In this paper we formalize two notions of approximation: weak approximation which has been studied before in the decision diagram literature and strong approximation which has been used in recent algorithmic results. We then show lower bounds for approximation by d-DNNF, complementing the positive results from the literature.
Submitted 27 November, 2020;
originally announced November 2020.
-
Constant-Delay Enumeration for Nondeterministic Document Spanners
Authors:
Antoine Amarilli,
Pierre Bourhis,
Stefan Mengel,
Matthias Niewerth
Abstract:
We consider the information extraction framework known as document spanners, and study the problem of efficiently computing the results of the extraction from an input document, where the extraction task is described as a sequential variable-set automaton (VA). We pose this problem in the setting of enumeration algorithms, where we can first run a preprocessing phase and must then produce the results with a small delay between any two consecutive results. Our goal is to have an algorithm which is tractable in combined complexity, i.e., in the sizes of the input document and the VA; while ensuring the best possible data complexity bounds in the input document size, i.e., constant delay in the document size. Several recent works at PODS'18 proposed such algorithms but with linear delay in the document size or with an exponential dependency in the size of the (generally nondeterministic) input VA. In particular, Florenzano et al. suggest that our desired runtime guarantees cannot be met for general sequential VAs. We refute this and show that, given a nondeterministic sequential VA and an input document, we can enumerate the mappings of the VA on the document with the following bounds: the preprocessing is linear in the document size and polynomial in the size of the VA, and the delay is independent of the document and polynomial in the size of the VA. The resulting algorithm thus achieves tractability in combined complexity and the best possible data complexity bounds. Moreover, it is rather easy to describe, in particular for the restricted case of so-called extended VAs. Finally, we evaluate our algorithm empirically using a prototype implementation.
Submitted 7 December, 2020; v1 submitted 5 March, 2020;
originally announced March 2020.
-
Graph Width Measures for CNF-Encodings with Auxiliary Variables
Authors:
Stefan Mengel,
Romain Wallon
Abstract:
We consider bounded width CNF-formulas where the width is measured by popular graph width measures on graphs associated to CNF-formulas. Such restricted graph classes, in particular those of bounded treewidth, have been extensively studied for their uses in the design of algorithms for various computational problems on CNF-formulas. Here we consider the expressivity of these formulas in the model of clausal encodings with auxiliary variables. We first show that bounding the width for many of the measures from the literature leads to a dramatic loss of expressivity, restricting the formulas to those of low communication complexity. We then show that the width of optimal encodings with respect to different measures is strongly linked: there are two classes of width measures, one containing primal treewidth and the other incidence cliquewidth, such that in each class the width of optimal encodings only differs by constant factors. Moreover, between the two classes the width differs at most by a factor logarithmic in the number of variables. Both these results are in stark contrast to the setting without auxiliary variables where all width measures we consider here differ by more than constant factors and in many cases even by linear factors.
Submitted 22 January, 2020; v1 submitted 9 May, 2019;
originally announced May 2019.
-
Enumeration on Trees with Tractable Combined Complexity and Efficient Updates
Authors:
Antoine Amarilli,
Pierre Bourhis,
Stefan Mengel,
Matthias Niewerth
Abstract:
We give an algorithm to enumerate the results on trees of monadic second-order (MSO) queries represented by nondeterministic tree automata. After linear time preprocessing (in the input tree), we can enumerate answers with linear delay (in each answer). We allow updates on the tree to take place at any time, and we can then restart the enumeration after logarithmic time in the tree. Further, all our combined complexities are polynomial in the automaton.
Our result follows our previous circuit-based enumeration algorithms based on deterministic tree automata, and is also inspired by our earlier result on words and nondeterministic sequential extended variable-set automata in the context of document spanners. We extend these results and combine them with a recent tree balancing scheme by Niewerth, so that our enumeration structure supports updates to the underlying tree in logarithmic time (with leaf insertions, leaf deletions, and node relabelings). Our result implies that, for MSO queries with free first-order variables, we can enumerate the results with linear preprocessing and constant-delay and update the underlying tree in logarithmic time, which improves on several known results for words and trees.
Building on lower bounds from data structure research, we also show unconditionally that up to a doubly logarithmic factor the update time of our algorithm is optimal. Thus, unlike other settings, there can be no algorithm with constant update time.
Submitted 27 August, 2019; v1 submitted 22 December, 2018;
originally announced December 2018.
-
Constant-Delay Enumeration for Nondeterministic Document Spanners
Authors:
Antoine Amarilli,
Pierre Bourhis,
Stefan Mengel,
Matthias Niewerth
Abstract:
We consider the information extraction framework known as document spanners, and study the problem of efficiently computing the results of the extraction from an input document, where the extraction task is described as a sequential variable-set automaton (VA). We pose this problem in the setting of enumeration algorithms, where we can first run a preprocessing phase and must then produce the results with a small delay between any two consecutive results. Our goal is to have an algorithm which is tractable in combined complexity, i.e., in the sizes of the input document and the VA, while ensuring the best possible data complexity bounds in the input document size, i.e., constant delay in the document size. Several recent works at PODS'18 proposed such algorithms but with linear delay in the document size or with an exponential dependency in the size of the (generally nondeterministic) input VA. In particular, Florenzano et al. suggest that our desired runtime guarantees cannot be met for general sequential VAs. We refute this and show that, given a nondeterministic sequential VA and an input document, we can enumerate the mappings of the VA on the document with the following bounds: the preprocessing is linear in the document size and polynomial in the size of the VA, and the delay is independent of the document and polynomial in the size of the VA. The resulting algorithm thus achieves tractability in combined complexity and the best possible data complexity bounds. Moreover, it is rather easy to describe, in particular for the restricted case of so-called extended VAs.
Submitted 7 December, 2020; v1 submitted 24 July, 2018;
originally announced July 2018.
-
Knowledge Compilation, Width and Quantification
Authors:
Florent Capelli,
Stefan Mengel
Abstract:
We generalize many results concerning the tractability of SAT and #SAT on bounded-treewidth CNF formulas in the context of Quantified Boolean Formulas (QBF). To this end, we start by studying the notion of width for OBDD and observe that the blow-up in size while existentially or universally projecting a block of variables in an OBDD only affects its width. We then generalize this notion of width to the more general representation of structured (deterministic) DNNF and give a similar algorithm to existentially or universally project a block of variables. Using a well-known algorithm transforming bounded-treewidth CNF formulas into deterministic DNNF, we are able to generalize this connection to quantified CNF, which gives us as a byproduct that one can count the number of models of a bounded treewidth and bounded quantifier alternation quantified CNF in FPT time. We also give an extensive study of bounded width d-DNNF and prove the optimality of several of our results.
Submitted 11 July, 2018;
originally announced July 2018.
-
QBF as an Alternative to Courcelle's Theorem
Authors:
Michael Lampis,
Stefan Mengel,
Valia Mitsou
Abstract:
We propose reductions to quantified Boolean formulas (QBF) as a new approach to showing fixed-parameter linear algorithms for problems parameterized by treewidth. We demonstrate the feasibility of this approach by giving new algorithms for several well-known problems from artificial intelligence that are in general complete for the second level of the polynomial hierarchy. By reduction from QBF we show that all resulting algorithms are essentially optimal in their dependence on the treewidth. Most of the problems that we consider were already known to be fixed-parameter linear by using Courcelle's Theorem or dynamic programming, but we argue that our approach has clear advantages over these techniques: on the one hand, in contrast to Courcelle's Theorem, we get concrete and tight guarantees for the runtime dependence on the treewidth. On the other hand, we avoid tedious dynamic programming and, after showing some normalization results for CNF-formulas, our upper bounds often boil down to a few lines.
Submitted 22 May, 2018;
originally announced May 2018.
-
On tractable query evaluation for SPARQL
Authors:
Stefan Mengel,
Sebastian Skritek
Abstract:
Despite much work within the last decade on foundational properties of SPARQL - the standard query language for RDF data - rather little is known about the exact limits of tractability for this language. In particular, this is the case for SPARQL queries that contain the OPTIONAL-operator, even though it is one of the most intensively studied features of SPARQL. The aim of our work is to provide a more thorough picture of tractable classes of SPARQL queries.
In general, SPARQL query evaluation is PSPACE-complete in combined complexity, and it remains PSPACE-hard already for queries containing only the OPTIONAL-operator. To amend this situation, research has focused on "well-designed SPARQL queries" and their recent generalization "weakly well-designed SPARQL queries". For these two fragments the evaluation problem is coNP-complete in the absence of projection and $\Sigma_2^P$-complete otherwise. Moreover, they have been shown to contain most SPARQL queries asked in practical settings.
In this paper, we study tractable classes of weakly well-designed queries in parameterized complexity considering the equivalent formulation as pattern trees. We give a complete characterization of the tractable classes in the case without projection. Moreover, we show a characterization of all tractable classes of simple well-designed pattern trees in the presence of projection.
Submitted 24 December, 2017;
originally announced December 2017.
-
Enumeration on Trees under Relabelings
Authors:
Antoine Amarilli,
Pierre Bourhis,
Stefan Mengel
Abstract:
We study how to evaluate MSO queries with free variables on trees, within the framework of enumeration algorithms. Previous work has shown how to enumerate answers with linear-time preprocessing and delay linear in the size of each output, i.e., constant-delay for free first-order variables. We extend this result to support relabelings, a restricted kind of update operations on trees which allows us to change the node labels. Our main result shows that we can enumerate the answers of MSO queries on trees with linear-time preprocessing and delay linear in each answer, while supporting node relabelings in logarithmic time. To prove this, we reuse the circuit-based enumeration structure from our earlier work, and develop techniques to maintain its index under node relabelings. We also show how enumeration under relabelings can be applied to evaluate practical query languages, such as aggregate, group-by, and parameterized queries.
Submitted 31 May, 2018; v1 submitted 18 September, 2017;
originally announced September 2017.
-
A Circuit-Based Approach to Efficient Enumeration
Authors:
Antoine Amarilli,
Pierre Bourhis,
Louis Jachiet,
Stefan Mengel
Abstract:
We study the problem of enumerating the satisfying valuations of a circuit while bounding the delay, i.e., the time needed to compute each successive valuation. We focus on the class of structured d-DNNF circuits originally introduced in knowledge compilation, a sub-area of artificial intelligence. We propose an algorithm for these circuits that enumerates valuations with linear preprocessing and delay linear in the Hamming weight of each valuation. Moreover, valuations of constant Hamming weight can be enumerated with linear preprocessing and constant delay.
Our results yield a framework for efficient enumeration that applies to all problems whose solutions can be compiled to structured d-DNNFs. In particular, we use it to recapture classical results in database theory, for factorized database representations and for MSO evaluation. This gives an independent proof of constant-delay enumeration for MSO formulae with first-order free variables on bounded-treewidth structures.
Submitted 5 May, 2017; v1 submitted 18 February, 2017;
originally announced February 2017.
-
On the relative power of reduction notions in arithmetic circuit complexity
Authors:
Christian Ikenmeyer,
Stefan Mengel
Abstract:
We show that the two main reduction notions in arithmetic circuit complexity, p-projections and c-reductions, differ in power. We do so by showing unconditionally that there are polynomials that are VNP-complete under c-reductions but not under p-projections. We also show that the question of which polynomials are VNP-complete under which type of reductions depends on the underlying field.
Submitted 19 September, 2016;
originally announced September 2016.
-
Lower Bounds on the mim-width of Some Graph Classes
Authors:
Stefan Mengel
Abstract:
mim-width is a recent graph width measure that has seen applications in graph algorithms and problems related to propositional satisfiability. In this paper, we show linear lower bounds for the mim-width of strongly chordal split graphs, co-comparability graphs and circle graphs. This improves and refines lower bounds that were known before, some of them only conditionally. In the case of strongly chordal graphs not even a conditional lower bound was known before. All of the bounds given are optimal up to constants.
Submitted 4 January, 2017; v1 submitted 4 August, 2016;
originally announced August 2016.
-
Parameterized Compilation Lower Bounds for Restricted CNF-formulas
Authors:
Stefan Mengel
Abstract:
We show unconditional parameterized lower bounds in the area of knowledge compilation, more specifically on the size of circuits in decomposable negation normal form (DNNF) that encode CNF-formulas restricted by several graph width measures. In particular, we show that
- there are CNF formulas of size $n$ and modular incidence treewidth $k$ whose smallest DNNF-encoding has size $n^{Ω(k)}$, and
- there are CNF formulas of size $n$ and incidence neighborhood diversity $k$ whose smallest DNNF-encoding has size $n^{Ω(\sqrt{k})}$.
These results complement recent upper bounds for compiling CNF into DNNF and strengthen, quantitatively and qualitatively, known conditional lower bounds for cliquewidth. Moreover, they show that, unlike for many graph problems, the parameters considered here behave significantly differently from treewidth.
Submitted 22 April, 2016;
originally announced April 2016.
-
Counting Answers to Existential Positive Queries: A Complexity Classification
Authors:
Hubie Chen,
Stefan Mengel
Abstract:
Existential positive formulas form a fragment of first-order logic that includes and is semantically equivalent to unions of conjunctive queries, one of the most important and well-studied classes of queries in database theory. We consider the complexity of counting the number of answers to existential positive formulas on finite structures and give a trichotomy theorem on query classes, in the setting of bounded arity. This theorem generalizes and unifies several known results on the complexity of conjunctive queries and unions of conjunctive queries.
Submitted 20 April, 2016; v1 submitted 13 January, 2016;
originally announced January 2016.
-
Minimal Distance of Propositional Models
Authors:
Mike Behrisch,
Miki Hermann,
Stefan Mengel,
Gernot Salzer
Abstract:
We investigate the complexity of three optimization problems in Boolean propositional logic related to information theory: Given a conjunctive formula over a set of relations, find a satisfying assignment with minimal Hamming distance to a given assignment that satisfies the formula ($\mathsf{NearestOtherSolution}$, $\mathsf{NOSol}$) or that does not need to satisfy it ($\mathsf{NearestSolution}$, $\mathsf{NSol}$). The third problem asks for two satisfying assignments with a minimal Hamming distance among all such assignments ($\mathsf{MinSolutionDistance}$, $\mathsf{MSD}$).
For all three problems we give complete classifications with respect to the relations admitted in the formula. We give polynomial time algorithms for several classes of constraint languages. For all other cases we prove hardness or completeness regarding APX, APX, NPO, or equivalence to well-known hard optimization problems.
Submitted 8 June, 2017; v1 submitted 24 February, 2015;
originally announced February 2015.
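The three problems named in the abstract above can be made concrete with a brute-force sketch. This is only an illustration of the problem definitions, not the paper's algorithms: it enumerates all assignments, so it is exponential in the number of variables. The encoding of a formula as a list of `(relation, scope)` pairs and all function names are assumptions introduced here.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of positions in which two equal-length assignments differ."""
    return sum(x != y for x, y in zip(a, b))


def solutions(n, constraints):
    """All assignments in {0,1}^n satisfying every constraint.

    Hypothetical encoding: each constraint is (relation, scope), where
    relation is a set of 0/1 tuples and scope is a tuple of variable indices.
    """
    return [a for a in product((0, 1), repeat=n)
            if all(tuple(a[i] for i in scope) in rel
                   for rel, scope in constraints)]


def nearest_solution(n, constraints, m):
    """NSol: a solution closest to m; m itself need not be a solution."""
    sols = solutions(n, constraints)
    return min(sols, key=lambda s: hamming(s, m)) if sols else None


def nearest_other_solution(n, constraints, m):
    """NOSol: a solution different from m that is closest to m.

    The problem statement assumes m is a solution; this is not checked here.
    """
    others = [s for s in solutions(n, constraints) if s != m]
    return min(others, key=lambda s: hamming(s, m)) if others else None


def min_solution_distance(n, constraints):
    """MSD: smallest Hamming distance between two distinct solutions."""
    sols = solutions(n, constraints)
    if len(sols) < 2:
        return None
    return min(hamming(a, b) for a, b in combinations(sols, 2))


# Example formula: (x0 XOR x1) AND (x1 XOR x2).
XOR = {(0, 1), (1, 0)}
FORMULA = [(XOR, (0, 1)), (XOR, (1, 2))]
```

The example formula has exactly two solutions, `(0, 1, 0)` and `(1, 0, 1)`, which differ in every position.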
-
The Logic of Counting Query Answers
Authors:
Hubie Chen,
Stefan Mengel
Abstract:
We consider the problem of counting the number of answers to a first-order formula on a finite structure. We present and study an extension of first-order logic in which algorithms for this counting problem can be naturally and conveniently expressed, in senses that are made precise and that are motivated by the wish to understand tractable cases of the counting problem.
Submitted 20 April, 2017; v1 submitted 28 January, 2015;
originally announced January 2015.
-
A Strongly Exponential Separation of DNNFs from CNF Formulas
Authors:
Simone Bova,
Florent Capelli,
Stefan Mengel,
Friedrich Slivovsky
Abstract:
Decomposable Negation Normal Forms (DNNFs) are Boolean circuits in negation normal form where the subcircuits leading into each AND gate are defined on disjoint sets of variables. We prove a strongly exponential lower bound on the size of DNNFs for a class of CNF formulas built from expander graphs. As a corollary, we obtain a strongly exponential separation between DNNFs and CNF formulas in prime implicates form. This settles an open problem in the area of knowledge compilation (Darwiche and Marquis, 2002).
Submitted 19 February, 2015; v1 submitted 7 November, 2014;
originally announced November 2014.
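The decomposability condition stated in the abstract above (the subcircuits feeding each AND gate mention disjoint sets of variables) can be checked mechanically. The following is a minimal sketch under assumptions made here: circuits are written as nested tuples, `("lit", name, polarity)` for a literal and `("and", ...)` or `("or", ...)` for gates, and shared subcircuits are simply repeated, so the check treats the circuit as a tree. None of these names come from the paper.

```python
def variables(node):
    """Set of variable names mentioned below a node of the circuit."""
    if node[0] == "lit":
        return {node[1]}
    result = set()
    for child in node[1:]:
        result |= variables(child)
    return result


def is_decomposable(node):
    """True if the inputs of every AND gate are over pairwise disjoint variables."""
    if node[0] == "lit":
        return True
    children = node[1:]
    if not all(is_decomposable(child) for child in children):
        return False
    if node[0] == "and":
        seen = set()
        for child in children:
            child_vars = variables(child)
            if seen & child_vars:
                # Two inputs of this AND gate share a variable.
                return False
            seen |= child_vars
    return True


# (x AND y) OR (NOT x AND z): each AND gate has variable-disjoint inputs.
GOOD = ("or",
        ("and", ("lit", "x", True), ("lit", "y", True)),
        ("and", ("lit", "x", False), ("lit", "z", True)))

# (x OR y) AND (NOT x OR z): a CNF whose top AND gate shares x between inputs.
BAD = ("and",
       ("or", ("lit", "x", True), ("lit", "y", True)),
       ("or", ("lit", "x", False), ("lit", "z", True)))
```

The second circuit is a small CNF formula, which illustrates why CNF formulas are generally not DNNFs as written: clauses typically share variables.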
-
A Trichotomy in the Complexity of Counting Answers to Conjunctive Queries
Authors:
Hubie Chen,
Stefan Mengel
Abstract:
Conjunctive queries are basic and heavily studied database queries; in relational algebra, they are the select-project-join queries. In this article, we study the fundamental problem of counting, given a conjunctive query and a relational database, the number of answers to the query on the database. In particular, we study the complexity of this problem relative to sets of conjunctive queries. We present a trichotomy theorem, which shows essentially that this problem on a set of conjunctive queries is either tractable, equivalent to the parameterized CLIQUE problem, or as hard as the parameterized counting CLIQUE problem; the criteria describing which of these situations occurs are simply stated, in terms of graph-theoretic conditions.
Submitted 21 January, 2015; v1 submitted 5 August, 2014;
originally announced August 2014.
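To make the counting problem from the abstract above concrete, here is a naive sketch that counts answers by trying every assignment of domain values to the query's variables. It only illustrates the problem statement (in particular, that quantified variables are projected away before counting) and is not an algorithm from the paper. The query encoding as a list of `(relation_name, variables)` atoms and the function name are assumptions made here.

```python
from itertools import product


def count_answers(free_vars, atoms, database, domain):
    """Count distinct assignments to free_vars that extend to a match of all atoms.

    Hypothetical encoding: atoms is a list of (relation_name, tuple_of_variables),
    database maps relation names to sets of tuples, domain is the set of values.
    Variables not listed in free_vars are existentially quantified.
    """
    all_vars = sorted({v for _, vs in atoms for v in vs} | set(free_vars))
    answers = set()
    for values in product(sorted(domain), repeat=len(all_vars)):
        assignment = dict(zip(all_vars, values))
        if all(tuple(assignment[v] for v in vs) in database[rel]
               for rel, vs in atoms):
            # Project onto the free variables; duplicates collapse in the set.
            answers.add(tuple(assignment[v] for v in free_vars))
    return len(answers)


DB = {"E": {(1, 2), (1, 3), (2, 3)}}
DOMAIN = {1, 2, 3}
```

With the query `E(x, y)`, keeping both variables free counts the three edges, while quantifying `y` counts only the two distinct sources, since the two matches for `x = 1` collapse into one answer.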
-
Understanding model counting for $β$-acyclic CNF-formulas
Authors:
Johann Brault-Baron,
Florent Capelli,
Stefan Mengel
Abstract:
We extend the knowledge about so-called structural restrictions of $\mathrm{\#SAT}$ by giving a polynomial time algorithm for $β$-acyclic $\mathrm{\#SAT}$. In contrast to previous algorithms in the area, our algorithm does not proceed by dynamic programming but works along an elimination order, solving a weighted version of constraint satisfaction. Moreover, we give evidence that this deviation from more standard algorithms is not a coincidence, but that there is likely no dynamic programming algorithm of the usual style for $β$-acyclic $\mathrm{\#SAT}$.
Submitted 23 May, 2014;
originally announced May 2014.
-
Hypergraph Acyclicity and Propositional Model Counting
Authors:
Florent Capelli,
Arnaud Durand,
Stefan Mengel
Abstract:
We show that the propositional model counting problem #SAT for CNF-formulas with hypergraphs that allow a disjoint branches decomposition can be solved in polynomial time. We show that this class of hypergraphs is incomparable to hypergraphs of bounded incidence cliquewidth, which were the biggest class of hypergraphs for which #SAT was known to be solvable in polynomial time so far. Furthermore, we present a polynomial time algorithm that computes a disjoint branches decomposition of a given hypergraph if it exists and rejects otherwise. Finally, we show that some slight extensions of the class of hypergraphs with disjoint branches decompositions lead to intractable #SAT, leaving open how to generalize the counting result of this paper.
Submitted 24 January, 2014;
originally announced January 2014.
-
Structural Tractability of Counting of Solutions to Conjunctive Queries
Authors:
Arnaud Durand,
Stefan Mengel
Abstract:
In this paper we explore the problem of counting solutions to conjunctive queries. We consider a parameter called the quantified star size of a formula $\varphi$ which measures how the free variables are spread in $\varphi$. We show that for conjunctive queries that admit nice decomposition properties (such as being of bounded treewidth or generalized hypertree width) bounded quantified star size exactly characterizes the classes of queries for which counting the number of solutions is tractable. This also allows us to fully characterize the conjunctive queries for which counting the solutions is tractable in the case of bounded arity. To illustrate the applicability of our results, we also show that computing the quantified star size of a formula is possible in time $n^{O(k)}$ for queries of generalized hypertree width $k$. Furthermore, quantified star size is even fixed-parameter tractable parameterized by some other width measures, while it is $\mathsf{W}[1]$-hard for generalized hypertree width and thus unlikely to be fixed-parameter tractable. We finally show how to compute an approximation of quantified star size in polynomial time where the approximation ratio depends on the width of the input.
Submitted 8 March, 2013;
originally announced March 2013.
-
Arithmetic Branching Programs with Memory
Authors:
Stefan Mengel
Abstract:
We extend the well-known characterization of $\mathsf{VP}_{ws}$ as the class of polynomials computed by polynomial size arithmetic branching programs to other complexity classes. In order to do so we add additional memory to the computation of branching programs to make them more expressive. We show that allowing different types of memory in branching programs increases the computational power even for constant width programs. In particular, this leads to very natural and robust characterizations of $\mathsf{VP}$ and $\mathsf{VNP}$ by branching programs with memory.
Submitted 8 March, 2013;
originally announced March 2013.
-
The arithmetic complexity of tensor contractions
Authors:
Florent Capelli,
Arnaud Durand,
Stefan Mengel
Abstract:
We investigate the algebraic complexity of tensor calculus. We consider a generalization of iterated matrix product to tensors and show that the resulting formulas exactly capture VP, the class of polynomial families efficiently computable by arithmetic circuits. This gives a natural and robust characterization of this complexity class that despite its naturalness is not very well understood so far.
Submitted 21 September, 2012;
originally announced September 2012.
-
Monomials in arithmetic circuits: Complete problems in the counting hierarchy
Authors:
Hervé Fournier,
Guillaume Malod,
Stefan Mengel
Abstract:
We consider the complexity of two questions on polynomials given by arithmetic circuits: testing whether a monomial is present and counting the number of monomials. We show that these problems are complete for subclasses of the counting hierarchy which had few or no known natural complete problems. We also study these questions for circuits computing multilinear polynomials.
Submitted 27 March, 2012; v1 submitted 28 October, 2011;
originally announced October 2011.
-
The Complexity of Weighted Counting for Acyclic Conjunctive Queries
Authors:
Arnaud Durand,
Stefan Mengel
Abstract:
This paper is a study of weighted counting of the solutions of acyclic conjunctive queries ($\mathsf{ACQ}$). The unweighted quantifier-free version of this problem is known to be tractable (for combined complexity), but it is also known that introducing even a single quantified variable makes it $\#\mathsf{P}$-hard. We first show that weighted counting for quantifier-free $\mathsf{ACQ}$ is still tractable and that even minimalistic extensions of the problem lead to hard cases. We then introduce a new parameter for quantified queries that allows us to isolate a large island of tractability. We show that, up to a standard assumption from parameterized complexity, this parameter fully characterizes tractable subclasses for counting weighted solutions of $\mathsf{ACQ}$ queries. Thus we completely determine the tractability frontier for weighted counting for $\mathsf{ACQ}$.
Submitted 7 December, 2011; v1 submitted 19 October, 2011;
originally announced October 2011.