-
Using Color Refinement to Boost Enumeration and Counting for Acyclic CQs of Binary Schemas
Authors:
Cristian Riveros,
Benjamin Scheidt,
Nicole Schweikardt
Abstract:
We present an index structure, called the color-index, to boost the evaluation of acyclic conjunctive queries (ACQs) over binary schemas. The color-index is based on the color refinement algorithm, a widely used subroutine for graph isomorphism testing algorithms. Given a database $D$, we use a suitable version of the color refinement algorithm to produce a stable coloring of $D$, an assignment from the active domain of $D$ to a set of colors $C_D$. The main ingredient of the color-index is a particular database $D_c$ whose active domain is $C_D$ and whose size is at most $|D|$. Using the color-index, we can evaluate any free-connex ACQ $Q$ over $D$ with preprocessing time $O(|Q| \cdot |D_c|)$ and constant delay enumeration. Furthermore, we can also count the number of results of $Q$ over $D$ in time $O(|Q| \cdot |D_c|)$. Given that $|D_c|$ could be much smaller than $|D|$ (even constant-size for some families of databases), the color-index is the first index structure for evaluating free-connex ACQs that allows efficient enumeration and counting with performance that may be strictly smaller than the database size.
Submitted 20 May, 2024;
originally announced May 2024.
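The stable coloring mentioned in the abstract above can be illustrated with a minimal sketch of plain color refinement on a database with binary relations. This is the textbook procedure, not necessarily the "suitable version" the authors use; the function name `color_refinement` and the fact encoding `(relation, source, target)` are assumptions made for this illustration.

```python
def color_refinement(nodes, facts):
    """Compute a stable coloring of the active domain.

    nodes: iterable of domain elements (must be re-iterable).
    facts: set of (relation_name, u, v) triples over a binary schema
           (hypothetical encoding chosen for this sketch).
    """
    color = {v: 0 for v in nodes}
    while True:
        signature = {}
        for v in nodes:
            # Multiset of (relation, neighbour color) over outgoing and incoming facts.
            out = sorted((r, color[w]) for (r, u, w) in facts if u == v)
            inc = sorted((r, color[u]) for (r, u, w) in facts if w == v)
            signature[v] = (color[v], tuple(out), tuple(inc))
        # Rename the distinct signatures to small integers.
        palette = {s: i for i, s in enumerate(sorted(set(signature.values())))}
        new_color = {v: palette[signature[v]] for v in nodes}
        # Each round only splits classes, so an unchanged class count means stability.
        if len(set(new_color.values())) == len(set(color.values())):
            return new_color
        color = new_color


# A directed 6-cycle: every element looks the same, so one color suffices.
cycle_facts = {("E", i, (i + 1) % 6) for i in range(6)}
cycle_colors = color_refinement(range(6), cycle_facts)

# A directed path a -> b -> c: start, middle, and end get different colors.
path_facts = {("E", "a", "b"), ("E", "b", "c")}
path_colors = color_refinement(["a", "b", "c"], path_facts)
```

On the cycle the coloring uses a single color regardless of the cycle's length, which is the kind of database family for which a color-based index could plausibly stay constant-size.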
-
A framework for extraction and transformation of documents
Authors:
Cristian Riveros,
Markus L. Schmid,
Nicole Schweikardt
Abstract:
We present a theoretical framework for the extraction and transformation of text documents. We propose to use a two-phase process where the first phase extracts span-tuples from a document, and the second phase maps the content of the span-tuples into new documents. We base the extraction phase on the framework of document spanners and the transformation phase on the theory of polyregular functions, the class of regular string-to-string functions with polynomial growth.
For supporting practical extract-transform scenarios, we propose an extension of document spanners described by regex formulas from span-tuples to so-called multispan-tuples, where variables are mapped to sets of spans. We prove that this extension, called regex multispanners, has the same desirable properties as standard spanners described by regex formulas. In our framework, an Extract-Transform (ET) program is given by a regex multispanner followed by a polyregular function.
In this paper, we study the expressibility and evaluation problem of ET programs when the transformation function is linear, called linear ET programs. We show that linear ET programs are equally expressive as non-deterministic streaming string transducers under bag semantics. Moreover, we show that linear ET programs are closed under composition. Finally, we present an enumeration algorithm for evaluating every linear ET program over a document with linear time preprocessing and constant delay.
Submitted 20 May, 2024;
originally announced May 2024.
-
Counting Homomorphisms from Hypergraphs of Bounded Generalised Hypertree Width: A Logical Characterisation
Authors:
Benjamin Scheidt,
Nicole Schweikardt
Abstract:
We introduce the 2-sorted counting logic $GC^k$ that expresses properties of hypergraphs. This logic has k variables available to address hyperedges, an unbounded number of variables to address vertices, and atomic formulas E(e,v) to express that a vertex v is contained in a hyperedge e. We show that two hypergraphs H, H' satisfy the same sentences of the logic $GC^k$ if, and only if, they are homomorphism indistinguishable over the class of hypergraphs of generalised hypertree width at most k. Here, H, H' are called homomorphism indistinguishable over a class C if for every hypergraph G in C the number of homomorphisms from G to H equals the number of homomorphisms from G to H'. This result can be viewed as a generalisation (from graphs to hypergraphs) of a result by Dvořák (2010) stating that any two (undirected, simple, finite) graphs H, H' are indistinguishable by the (k+1)-variable counting logic $C^{k+1}$ if, and only if, they are homomorphism indistinguishable on the class of graphs of tree width at most k.
Submitted 21 August, 2023; v1 submitted 20 March, 2023;
originally announced March 2023.
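Homomorphism indistinguishability, as defined in the abstract above, can be made concrete with a brute-force sketch for ordinary graphs (the special case the abstract attributes to Dvořák, not the hypergraph setting of the paper). The helper names `sym` and `count_homs` are assumptions for this illustration, and the enumeration is exponential in the size of G.

```python
from itertools import product


def sym(edges):
    """Encode an undirected graph as a symmetric set of ordered pairs."""
    return {(u, v) for u, v in edges} | {(v, u) for u, v in edges}


def count_homs(g_nodes, g_edges, h_nodes, h_edges):
    """Count maps h from V(G) to V(H) that send every edge of G to an edge of H."""
    h_set = set(h_edges)
    count = 0
    for image in product(h_nodes, repeat=len(g_nodes)):
        h = dict(zip(g_nodes, image))
        if all((h[u], h[v]) in h_set for (u, v) in g_edges):
            count += 1
    return count


# Two 2-regular graphs on six vertices: a 6-cycle and two disjoint triangles.
six_cycle = sym([(i, (i + 1) % 6) for i in range(6)])
two_triangles = sym([(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)])

# Pattern graphs: a single edge (tree width 1) and a triangle (tree width 2).
edge = sym([(0, 1)])
triangle = sym([(0, 1), (1, 2), (2, 0)])

edge_into_cycle = count_homs([0, 1], edge, range(6), six_cycle)              # 12
edge_into_triangles = count_homs([0, 1], edge, range(6), two_triangles)      # 12
tri_into_cycle = count_homs([0, 1, 2], triangle, range(6), six_cycle)        # 0
tri_into_triangles = count_homs([0, 1, 2], triangle, range(6), two_triangles)  # 12
```

The single edge yields the same count for both target graphs, while the triangle separates them, so these two graphs are not homomorphism indistinguishable over graphs of tree width at most 2.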
-
Spanner Evaluation over SLP-Compressed Documents
Authors:
Markus L. Schmid,
Nicole Schweikardt
Abstract:
We consider the problem of evaluating regular spanners over compressed documents, i.e., we wish to solve evaluation tasks directly on the compressed data, without decompression. As compressed forms of the documents we use straight-line programs (SLPs) -- a lossless compression scheme for textual data widely used in different areas of theoretical computer science and particularly well-suited for algorithmics on compressed data. In terms of data complexity, our results are as follows. For a regular spanner M and an SLP S that represents a document D, we can solve the tasks of model checking and of checking non-emptiness in time O(size(S)). Computing the set M(D) of all span-tuples extracted from D can be done in time O(size(S) size(M(D))), and enumeration of M(D) can be done with linear preprocessing O(size(S)) and a delay of O(depth(S)), where depth(S) is the depth of S's derivation tree. Note that size(S) can be exponentially smaller than the document's size |D|; and, due to known balancing results for SLPs, we can always assume that depth(S) = O(log(|D|)) independent of D's compressibility. Hence, our enumeration algorithm has a delay logarithmic in the size of the non-compressed data and a preprocessing time that is at best (i.e., in the case of highly compressible documents) also logarithmic, but at worst still linear. Therefore, in a big-data perspective, our enumeration algorithm for SLP-compressed documents may nevertheless beat the known linear preprocessing and constant delay algorithms for non-compressed documents.
Submitted 25 January, 2021;
originally announced January 2021.
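To illustrate why size(S) can be exponentially smaller than |D| and why depth(S) bounds the cost of a single access, here is a minimal sketch of a straight-line program. It is not the spanner evaluation algorithm of the paper; the rule encoding and the names `make_slp_tools`, `length`, and `char_at` are assumptions for this illustration.

```python
from functools import lru_cache


def make_slp_tools(rules):
    """rules maps a nonterminal to either a single terminal character (str)
    or a pair (left, right) of nonterminals (hypothetical encoding)."""

    @lru_cache(maxsize=None)
    def length(sym):
        # Length of the derived string, computed once per nonterminal.
        rhs = rules[sym]
        if isinstance(rhs, str):
            return 1
        left, right = rhs
        return length(left) + length(right)

    def char_at(sym, pos):
        # Walk down the derivation tree; the number of steps is at most depth(S).
        while True:
            rhs = rules[sym]
            if isinstance(rhs, str):
                return rhs
            left, right = rhs
            if pos < length(left):
                sym = left
            else:
                pos -= length(left)
                sym = right

    return length, char_at


# 33 rules deriving the document (ab)^(2^30), which has 2^31 characters.
rules = {"A": "a", "B": "b", "X0": ("A", "B")}
for i in range(1, 31):
    rules[f"X{i}"] = (f"X{i - 1}", f"X{i - 1}")

length, char_at = make_slp_tools(rules)
doc_length = length("X30")                 # 2**31, without decompressing
last_char = char_at("X30", doc_length - 1)  # "b"
```

The document is never materialised: its length and any single character are obtained from the rules alone.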
-
A Purely Regular Approach to Non-Regular Core Spanners
Authors:
Markus L. Schmid,
Nicole Schweikardt
Abstract:
The regular spanners (characterised by vset-automata) are closed under the algebraic operations of union, join and projection, and have desirable algorithmic properties. The core spanners (introduced by Fagin, Kimelfeld, Reiss, and Vansummeren (PODS 2013, JACM 2015) as a formalisation of the core functionality of the query language AQL used in IBM's SystemT) additionally need string-equality selections and it has been shown by Freydenberger and Holldack (ICDT 2016, Theory of Computing Systems 2018) that this leads to high complexity and even undecidability of the typical problems in static analysis and query evaluation. We propose an alternative approach to core spanners: by incorporating the string-equality selections directly into the regular language that represents the underlying regular spanner (instead of treating it as an algebraic operation on the table extracted by the regular spanner), we obtain a fragment of core spanners that, while having slightly weaker expressive power than the full class of core spanners, arguably still covers the intuitive applications of string-equality selections for information extraction and has much better upper complexity bounds of the typical problems in static analysis and query evaluation.
Submitted 12 February, 2024; v1 submitted 26 October, 2020;
originally announced October 2020.
-
Enumerating Answers to First-Order Queries over Databases of Low Degree
Authors:
Arnaud Durand,
Nicole Schweikardt,
Luc Segoufin
Abstract:
A class of relational databases has low degree if for all $δ>0$, all but finitely many databases in the class have degree at most $n^δ$, where $n$ is the size of the database. Typical examples are databases of bounded degree or of degree bounded by $\log n$.
It is known that over a class of databases having low degree, first-order Boolean queries can be checked in pseudo-linear time, i.e., for all $ε>0$ in time bounded by $n^{1+ε}$. We generalize this result by considering query evaluation.
We show that counting the number of answers to a query can be done in pseudo-linear time and that after a pseudo-linear time preprocessing we can test in constant time whether a given tuple is a solution to a query or enumerate the answers to a query with constant delay.
Submitted 9 May, 2022; v1 submitted 16 October, 2020;
originally announced October 2020.
-
Learning Concepts Described by Weight Aggregation Logic
Authors:
Steffen van Bergerem,
Nicole Schweikardt
Abstract:
We consider weighted structures, which extend ordinary relational structures by assigning weights, i.e., elements from a particular group or ring, to tuples present in the structure. We introduce an extension of first-order logic that allows one to aggregate weights of tuples, compare such aggregates, and use them to build more complex formulas. We provide locality properties of fragments of this logic including Feferman-Vaught decompositions and a Gaifman normal form for a fragment called FOW1, as well as a localisation theorem for a larger fragment called FOWA1. This fragment can express concepts from various machine learning scenarios. Using the locality properties, we show that concepts definable in FOWA1 over a weighted background structure of at most polylogarithmic degree are agnostically PAC-learnable in polylogarithmic time after pseudo-linear time preprocessing.
Submitted 22 September, 2020;
originally announced September 2020.
-
Constant delay enumeration with FPT-preprocessing for conjunctive queries of bounded submodular width
Authors:
Christoph Berkholz,
Nicole Schweikardt
Abstract:
Marx (STOC 2010, J. ACM 2013) introduced the notion of submodular width of a conjunctive query (CQ) and showed that for any class $Φ$ of Boolean CQs of bounded submodular width, the model-checking problem for $Φ$ on the class of all finite structures is fixed-parameter tractable (FPT). Note that for non-Boolean queries, the size of the query result may be far too large to be computed entirely within FPT time. We investigate the free-connex variant of submodular width and generalise Marx's result to non-Boolean queries as follows: For every class $Φ$ of CQs of bounded free-connex submodular width, within FPT-preprocessing time we can build a data structure that allows enumerating, without repetition and with constant delay, all tuples of the query result. Our proof builds upon Marx's splitting routine to decompose the query result into a union of results; but we have to tackle the additional technical difficulty of ensuring that these can be enumerated efficiently.
Submitted 2 March, 2020;
originally announced March 2020.
-
Answering (Unions of) Conjunctive Queries using Random Access and Random-Order Enumeration
Authors:
Nofar Carmeli,
Shai Zeevi,
Christoph Berkholz,
Benny Kimelfeld,
Nicole Schweikardt
Abstract:
As data analytics becomes more crucial to digital systems, so grows the importance of characterizing the database queries that admit a more efficient evaluation. We consider the tractability yardstick of answer enumeration with a polylogarithmic delay after a linear-time preprocessing phase. Such an evaluation is obtained by constructing, in the preprocessing phase, a data structure that supports polylogarithmic-delay enumeration. In this paper, we seek a structure that supports the more demanding task of a "random permutation": polylogarithmic-delay enumeration in truly random order. Enumeration of this kind is required if downstream applications assume that the intermediate results are representative of the whole result set in a statistically valuable manner. An even more demanding task is that of a "random access": polylogarithmic-time retrieval of an answer whose position is given.
We establish that the free-connex acyclic CQs are tractable in all three senses: enumeration, random-order enumeration, and random access; and in the absence of self-joins, it follows from past results that every other CQ is intractable by each of the three (under some fine-grained complexity assumptions). However, the three yardsticks are separated in the case of a union of CQs (UCQ): while a union of free-connex acyclic CQs has a tractable enumeration, it may (provably) admit no random access. For such UCQs we devise a random-order enumeration whose delay is logarithmic in expectation. We also identify a subclass of UCQs for which we can provide random access with polylogarithmic access time. Finally, we present an implementation and an empirical study that show a considerable practical superiority of our random-order enumeration approach over state-of-the-art alternatives.
Submitted 23 December, 2019;
originally announced December 2019.
-
Measuring Congruence on High Dimensional Time Series
Authors:
Jörg P. Bachmann,
Johann-Christoph Freytag,
Benjamin Hauskeller,
Nicole Schweikardt
Abstract:
A time series is a sequence of data items; typical examples are videos, stock ticker data, or streams of temperature measurements. Quite some research has been devoted to comparing and indexing simple time series, i.e., time series where the data items are real numbers or integers. However, for many application scenarios, the data items of a time series are not simple, but high-dimensional data points. Motivated by an application scenario dealing with motion gesture recognition, we develop a distance measure (which we call congruence distance) that serves as a model for the approximate congruency of two multi-dimensional time series. This distance measure generalizes the classical notion of congruence from point sets to multi-dimensional time series. We show that, given two input time series $S$ and $T$, computing the congruence distance of $S$ and $T$ is NP-hard. Afterwards, we present two algorithms that compute an approximation of the congruence distance. We provide theoretical bounds that relate these approximations with the exact congruence distance.
Submitted 31 May, 2018; v1 submitted 27 May, 2018;
originally announced May 2018.
-
Answering UCQs under updates and in the presence of integrity constraints
Authors:
Christoph Berkholz,
Jens Keppeler,
Nicole Schweikardt
Abstract:
We investigate the query evaluation problem for fixed queries over fully dynamic databases where tuples can be inserted or deleted. The task is to design a dynamic data structure that can immediately report the new result of a fixed query after every database update. We consider unions of conjunctive queries (UCQs) and focus on the query evaluation tasks testing (decide whether an input tuple belongs to the query result), enumeration (enumerate, without repetition, all tuples in the query result), and counting (output the number of tuples in the query result).
We identify three increasingly restrictive classes of UCQs which we call t-hierarchical, q-hierarchical, and exhaustively q-hierarchical UCQs. Our main results provide the following dichotomies: If the query's homomorphic core is t-hierarchical (q-hierarchical, exhaustively q-hierarchical), then the testing (enumeration, counting) problem can be solved with constant update time and constant testing time (delay, counting time). Otherwise, it cannot be solved with sublinear update time and sublinear testing time (delay, counting time), unless the OV-conjecture and/or the OMv-conjecture fails.
We also study the complexity of query evaluation in the dynamic setting in the presence of integrity constraints, and we obtain corresponding dichotomy results for the special case of small domain constraints (i.e., constraints which state that all values in a particular column of a relation belong to a fixed domain of constant size).
Submitted 28 September, 2017;
originally announced September 2017.
-
First-Order Query Evaluation with Cardinality Conditions
Authors:
Martin Grohe,
Nicole Schweikardt
Abstract:
We study an extension of first-order logic that allows one to express cardinality conditions similarly to SQL's COUNT operator. The corresponding logic FOC(P) was introduced by Kuske and Schweikardt (LICS'17), who showed that query evaluation for this logic is fixed-parameter tractable on classes of structures (or databases) of bounded degree. In the present paper, we first show that the fixed-parameter tractability of FOC(P) cannot even be generalised to very simple classes of structures of unbounded degree such as unranked trees or strings with a linear order relation.
Then we identify a fragment FOC1(P) of FOC(P) which is still sufficiently strong to express standard applications of SQL's COUNT operator. Our main result shows that query evaluation for FOC1(P) is fixed-parameter tractable with almost linear running time on nowhere dense classes of structures. As a corollary, we also obtain a fixed-parameter tractable algorithm for counting the number of tuples satisfying a query over nowhere dense classes of structures.
Submitted 19 July, 2017;
originally announced July 2017.
-
First-Order Logic with Counting: At Least, Weak Hanf Normal Forms Always Exist and Can Be Computed!
Authors:
Dietrich Kuske,
Nicole Schweikardt
Abstract:
We introduce the logic FOCN(P) which extends first-order logic by counting and by numerical predicates from a set P, and which can be viewed as a natural generalisation of various counting logics that have been studied in the literature.
We obtain a locality result showing that every FOCN(P)-formula can be transformed into a formula in Hanf normal form that is equivalent on all finite structures of degree at most d. A formula is in Hanf normal form if it is a Boolean combination of formulas describing the neighbourhood around its tuple of free variables and arithmetic sentences with predicates from P over atomic statements describing the number of realisations of a type with a single centre. The transformation into Hanf normal form can be achieved in time elementary in $d$ and the size of the input formula. From this locality result, we infer the following applications: (*) The Hanf-locality rank of first-order formulas of bounded quantifier alternation depth only grows polynomially with the formula size. (*) The model checking problem for the fragment FOC(P) of FOCN(P) on structures of bounded degree is fixed-parameter tractable (with elementary parameter dependence). (*) The query evaluation problem for fixed queries from FOC(P) over fully dynamic databases of degree at most d can be solved efficiently: there is a dynamic algorithm that can enumerate the tuples in the query result with constant delay, and that allows to compute the size of the query result and to test if a given tuple belongs to the query result within constant time after every database update.
Submitted 3 March, 2017;
originally announced March 2017.
-
Answering FO+MOD queries under updates on bounded degree databases
Authors:
Christoph Berkholz,
Jens Keppeler,
Nicole Schweikardt
Abstract:
We investigate the query evaluation problem for fixed queries over fully dynamic databases, where tuples can be inserted or deleted. The task is to design a dynamic algorithm that immediately reports the new result of a fixed query after every database update. We consider queries in first-order logic (FO) and its extension with modulo-counting quantifiers (FO+MOD), and show that they can be efficiently evaluated under updates, provided that the dynamic database does not exceed a certain degree bound.
In particular, we construct a data structure that allows one to answer a Boolean FO+MOD query and to compute the size of the result of a non-Boolean query within constant time after every database update. Furthermore, after every update we are able to immediately enumerate the new query result with constant delay between the output tuples. The time needed to build the data structure is linear in the size of the database. Our results extend earlier work on the evaluation of first-order queries on static databases of bounded degree and rely on an effective Hanf normal form for FO+MOD recently obtained by Heimberg, Kuske, and Schweikardt (LICS 2016).
Submitted 28 February, 2017;
originally announced February 2017.
-
Answering Conjunctive Queries under Updates
Authors:
Christoph Berkholz,
Jens Keppeler,
Nicole Schweikardt
Abstract:
We consider the task of enumerating and counting answers to $k$-ary conjunctive queries against relational databases that may be updated by inserting or deleting tuples. We exhibit a new notion of q-hierarchical conjunctive queries and show that these can be maintained efficiently in the following sense. During a linear time preprocessing phase, we can build a data structure that enables constant delay enumeration of the query results; and when the database is updated, we can update the data structure and restart the enumeration phase within constant time. For the special case of self-join free conjunctive queries we obtain a dichotomy: if a query is not q-hierarchical, then query enumeration with sublinear$^\ast$ delay and sublinear update time (and arbitrary preprocessing time) is impossible.
For answering Boolean conjunctive queries and for the more general problem of counting the number of solutions of k-ary queries we obtain complete dichotomies: if the query's homomorphic core is q-hierarchical, then the size of the query result can be computed in linear time and maintained with constant update time. Otherwise, the size of the query result cannot be maintained with sublinear update time. All our lower bounds rely on the OMv-conjecture, a conjecture on the hardness of online matrix-vector multiplication that has recently emerged in the field of fine-grained complexity to characterise the hardness of dynamic problems. The lower bound for the counting problem additionally relies on the orthogonal vectors conjecture, which in turn is implied by the strong exponential time hypothesis.
$^\ast)$ By sublinear we mean $O(n^{1-\varepsilon})$ for some $\varepsilon>0$, where $n$ is the size of the active domain of the current database.
Submitted 21 February, 2017;
originally announced February 2017.
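The abstract above names the q-hierarchical property without defining it. The sketch below checks the property under the definition commonly given for it, which is an assumption here and should be verified against the paper: writing atoms(x) for the set of atoms containing variable x, any two variables must have atom sets that are comparable or disjoint, and whenever atoms(x) is a proper subset of atoms(y) and x is free, y must be free as well. The function name and the query encoding are likewise hypothetical.

```python
from itertools import combinations


def is_q_hierarchical(atoms, free):
    """atoms: list of tuples of variable names, one tuple per atom.
    free: set of free (output) variables.
    Implements the assumed definition stated in the lead-in."""
    variables = sorted({v for atom in atoms for v in atom})
    occ = {v: {i for i, atom in enumerate(atoms) if v in atom} for v in variables}
    for x, y in combinations(variables, 2):
        ax, ay = occ[x], occ[y]
        # Hierarchical condition: atom sets are nested or disjoint.
        if not (ax <= ay or ay <= ax or not (ax & ay)):
            return False
        # Free variables must not sit strictly below a quantified variable.
        if ax < ay and x in free and y not in free:
            return False
        if ay < ax and y in free and x not in free:
            return False
    return True


# E(x, y) AND T(y) with only x free: atoms(x) is strictly inside atoms(y), y is quantified.
q1 = is_q_hierarchical([("x", "y"), ("y",)], free={"x"})               # False
# The same body with both variables free.
q2 = is_q_hierarchical([("x", "y"), ("y",)], free={"x", "y"})          # True
# S(x) AND E(x, y) AND T(y): atom sets overlap without being nested.
q3 = is_q_hierarchical([("x",), ("x", "y"), ("y",)], free={"x", "y"})  # False
```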
-
On the locality of arb-invariant first-order formulas with modulo counting quantifiers
Authors:
Frederik Harwath,
Nicole Schweikardt
Abstract:
We study Gaifman locality and Hanf locality of an extension of first-order logic with modulo p counting quantifiers (FO+MOD_p, for short) with arbitrary numerical predicates. We require that the validity of formulas is independent of the particular interpretation of the numerical predicates and refer to such formulas as arb-invariant formulas. This paper gives a detailed picture of locality and non-locality properties of arb-invariant FO+MOD_p. For example, on the class of all finite structures, for any p >= 2, arb-invariant FO+MOD_p is neither Hanf nor Gaifman local with respect to a sublinear locality radius. However, in case that p is an odd prime power, it is weakly Gaifman local with a polylogarithmic locality radius. And when restricting attention to the class of string structures, for odd prime powers p, arb-invariant FO+MOD_p is both Hanf and Gaifman local with a polylogarithmic locality radius. Our negative results build on examples of order-invariant FO+MOD_p formulas presented in Niemistö's PhD thesis. Our positive results make use of the close connection between FO+MOD_p and Boolean circuits built from NOT-gates and AND-, OR-, and MOD_p- gates of arbitrary fan-in.
Submitted 27 December, 2016; v1 submitted 23 November, 2016;
originally announced November 2016.
-
Monadic Datalog Containment on Trees Using the Descendant-Axis
Authors:
André Frochaux,
Nicole Schweikardt
Abstract:
In their AMW14-paper, Frochaux, Grohe, and Schweikardt showed that the query containment problem for monadic datalog on finite unranked labeled trees is Exptime-complete when (a) considering unordered trees using the child-axis, and when (b) considering ordered trees using the axes firstchild, nextsibling, and child. Furthermore, when the descendant-axis is also allowed, the query containment problem was shown to be solvable in 2-fold exponential time, but it remained open to determine the problem's exact complexity in the presence of the descendant-axis. The present paper closes this gap by showing that, in the presence of the descendant-axis, the problem is 2Exptime-hard.
Submitted 22 August, 2016;
originally announced August 2016.
-
Preservation and decomposition theorems for bounded degree structures
Authors:
Frederik Harwath,
Lucas Heimberg,
Nicole Schweikardt
Abstract:
We provide elementary algorithms for two preservation theorems for first-order sentences (FO) on the class C_d of all finite structures of degree at most d: For each FO-sentence that is preserved under extensions (homomorphisms) on C_d, a C_d-equivalent existential (existential-positive) FO-sentence can be constructed in 5-fold (4-fold) exponential time. This is complemented by lower bounds showing that a 3-fold exponential blow-up of the computed existential (existential-positive) sentence is unavoidable. Both algorithms can be extended (while maintaining the upper and lower bounds on their time complexity) to input first-order sentences with modulo m counting quantifiers (FO+MOD_m). Furthermore, we show that for an input FO-formula, a C_d-equivalent Feferman-Vaught decomposition can be computed in 3-fold exponential time. We also provide a matching lower bound.
Submitted 27 December, 2015; v1 submitted 18 November, 2015;
originally announced November 2015.
-
Monadic Datalog Containment on Trees
Authors:
André Frochaux,
Martin Grohe,
Nicole Schweikardt
Abstract:
We show that the query containment problem for monadic datalog on finite unranked labeled trees can be solved in 2-fold exponential time when (a) considering unordered trees using the axes child and descendant, and when (b) considering ordered trees using the axes firstchild, nextsibling, child, and descendant. When omitting the descendant-axis, we obtain that in both cases the problem is EXPTIME-complete.
Submitted 2 April, 2014;
originally announced April 2014.
-
A note on monadic datalog on unranked trees
Authors:
André Frochaux,
Nicole Schweikardt
Abstract:
In the article 'Recursive queries on trees and data trees' (ICDT'13), Abiteboul et al. asked whether the containment problem for monadic datalog over unordered unranked labeled trees using the child relation and the descendant relation is decidable. This note gives a positive answer to this question, as well as an overview of the relative expressive power of monadic datalog on various representations of unranked trees.
Submitted 4 October, 2013;
originally announced October 2013.
-
A note on the expressive power of linear orders
Authors:
Thomas Schwentick,
Nicole Schweikardt
Abstract:
This article shows that there exist two particular linear orders such that first-order logic with these two linear orders has the same expressive power as first-order logic with the Bit-predicate FO(Bit). As a corollary we obtain that there also exists a built-in permutation such that first-order logic with a linear order and this permutation is as expressive as FO(Bit).
Submitted 12 December, 2011; v1 submitted 25 November, 2011;
originally announced November 2011.
-
Lower Bounds for Multi-Pass Processing of Multiple Data Streams
Authors:
Nicole Schweikardt
Abstract:
This paper gives a brief overview of computation models for data stream processing, and it introduces a new model for multi-pass processing of multiple streams, the so-called mp2s-automata. Two algorithms for solving the set disjointness problem with these automata are presented. The main technical contribution of this paper is the proof of a lower bound on the size of memory and the number of heads that are required for solving the set disjointness problem with mp2s-automata.
Submitted 10 February, 2009;
originally announced February 2009.
-
Randomized Computations on Large Data Sets: Tight Lower Bounds
Authors:
Martin Grohe,
Andre Hernich,
Nicole Schweikardt
Abstract:
We study the randomized version of a computation model (introduced by Grohe, Koch, and Schweikardt (ICALP'05); Grohe and Schweikardt (PODS'05)) that restricts random access to external memory and internal memory space. Essentially, this model can be viewed as a powerful version of a data stream model that puts no cost on sequential scans of external memory (like other models for data streams) and, in addition (like other external memory models, but unlike streaming models), admits several large external memory devices that can be read and written to in parallel.
We obtain tight lower bounds for the decision problems set equality, multiset equality, and checksort. More precisely, we show that any randomized one-sided-error bounded Monte Carlo algorithm for these problems must perform Omega(log N) random accesses to external memory devices, provided that the internal memory size is at most O(N^(1/4)/log N), where N denotes the size of the input data.
From the lower bound on the set equality problem we can infer lower bounds on the worst case data complexity of query evaluation for the languages XQuery, XPath, and relational algebra on streaming data. More precisely, we show that there exist queries in XQuery, XPath, and relational algebra, such that any (randomized) Las Vegas algorithm that evaluates these queries must perform Omega(log N) random accesses to external memory devices, provided that the internal memory size is at most O(N^(1/4)/log N).
Submitted 15 March, 2007;
originally announced March 2007.
-
Reversal Complexity Revisited
Authors:
Andre Hernich,
Nicole Schweikardt
Abstract:
We study a generalized version of reversal bounded Turing machines where, apart from several tapes on which the number of head reversals is bounded by r(n), there are several further tapes on which head reversals remain unrestricted, but size is bounded by s(n). Recently, such machines were introduced as a formalization of a computation model that restricts random access to external memory and internal memory space. Here, each of the tapes with a restriction on the head reversals corresponds to an external memory device, and the tapes of restricted size model internal memory. We use ST(r(n),s(n),O(1)) to denote the class of all problems that can be solved by deterministic Turing machines that comply with the above resource bounds. Similarly, NST and RST, respectively, are used for the corresponding nondeterministic and randomized classes.
While previous papers focused on lower bounds for particular problems, including sorting, the set equality problem, and several query evaluation problems, the present paper addresses the relations between the (R,N)ST-classes and classical complexity classes and investigates the structural complexity of the (R,N)ST-classes. Our main results are (1) a trade-off between internal memory space and external memory head reversals, (2) correspondences between the (R,N)ST-classes and "classical" time-bounded, space-bounded, reversal-bounded, and circuit complexity classes, and (3) hierarchies of (R)ST-classes in terms of increasing numbers of head reversals on external memory tapes.
Submitted 7 August, 2006;
originally announced August 2006.
-
Tight Lower Bounds for Query Processing on Streaming and External Memory Data
Authors:
Martin Grohe,
Christoph Koch,
Nicole Schweikardt
Abstract:
We study a clean machine model for external memory and stream processing. We show that the number of scans of the external data induces a strict hierarchy (as long as work space is sufficiently small, e.g., polylogarithmic in the size of the input). We also show that neither joins nor sorting are feasible if the product of the number $r(n)$ of scans of the external memory and the size $s(n)$ of the internal memory buffers is sufficiently small, e.g., of size $o(\sqrt[5]{n})$. We also establish tight bounds for the complexity of XPath evaluation and filtering.
Submitted 29 April, 2005;
originally announced May 2005.
-
The succinctness of first-order logic on linear orders
Authors:
Martin Grohe,
Nicole Schweikardt
Abstract:
Succinctness is a natural measure for comparing the strength of different logics. Intuitively, a logic L_1 is more succinct than another logic L_2 if all properties that can be expressed in L_2 can be expressed in L_1 by formulas of (approximately) the same size, but some properties can be expressed in L_1 by (significantly) smaller formulas.
We study the succinctness of logics on linear orders. Our first theorem is concerned with the finite variable fragments of first-order logic. We prove that: (i) Up to a polynomial factor, the 2- and the 3-variable fragments of first-order logic on linear orders have the same succinctness. (ii) The 4-variable fragment is exponentially more succinct than the 3-variable fragment.
Our second main result compares the succinctness of first-order logic on linear orders with that of monadic second-order logic. We prove that the fragment of monadic second-order logic that has the same expressiveness as first-order logic on linear orders is non-elementarily more succinct than first-order logic.
Submitted 8 March, 2006; v1 submitted 9 February, 2005;
originally announced February 2005.
-
Schema-based Scheduling of Event Processors and Buffer Minimization for Queries on Structured Data Streams
Authors:
Christoph Koch,
Stefanie Scherzinger,
Nicole Schweikardt,
Bernhard Stegmaier
Abstract:
We introduce an extension of the XQuery language, FluX, that supports event-based query processing and the conscious handling of main memory buffers. Purely event-based queries of this language can be executed on streaming XML data in a very direct way. We then develop an algorithm that efficiently rewrites XQueries into the event-based FluX language. This algorithm uses order constraints from a DTD to schedule event handlers and thus to minimize the amount of buffering required for evaluating a query. We discuss the various technical aspects of query optimization and query evaluation within our framework. This is complemented with an experimental evaluation of our approach.
Submitted 7 June, 2004;
originally announced June 2004.
-
An Ehrenfeucht-Fraisse Game Approach to Collapse Results in Database Theory
Authors:
Nicole Schweikardt
Abstract:
We present a new Ehrenfeucht-Fraisse game approach to collapse results in database theory and we show that, in principle, this approach suffices to prove every natural generic collapse result. Following this approach we can deal with certain infinite databases where previous, highly involved methods fail. We prove the natural generic collapse for Z-embeddable databases over any linearly ordered context structure with arbitrary monadic predicates, and for N-embeddable databases over the context structure (R,<,+,Mon_Q,Groups). Here, N, Z, R, denote the sets of natural numbers, integers, and real numbers, respectively. Groups is the collection of all subgroups of (R,+) that contain Z, and Mon_Q is the collection of all subsets of a particular infinite subset Q of N. Restricting the complexity of the formulas that may be used to formulate queries to Boolean combinations of purely existential first-order formulas, we even obtain the collapse for N-embeddable databases over any linearly ordered context structure with arbitrary predicates. Finally, we develop the notion of N-representable databases, which is a natural generalization of the classical notion of finitely representable databases. We show that natural generic collapse results for N-embeddable databases can be lifted to the larger class of N-representable databases. To obtain, in particular, the collapse result for (N,<,+,Mon_Q), we explicitly construct a winning strategy for the duplicator in the presence of the built-in addition relation +. This, as a side product, also leads to an Ehrenfeucht-Fraisse game proof of the theorem of Ginsburg and Spanier, stating that the spectra of FO(<,+)-sentences are semi-linear.
Submitted 20 December, 2002;
originally announced December 2002.
-
Arithmetic, First-Order Logic, and Counting Quantifiers
Authors:
Nicole Schweikardt
Abstract:
This paper gives a thorough overview of what is known about first-order logic with counting quantifiers and with arithmetic predicates. As a main theorem we show that Presburger arithmetic is closed under unary counting quantifiers. Precisely, this means that for every first-order formula phi(y,z_1,...,z_k) over the signature {<,+} there is a first-order formula psi(x,z_1,...,z_k) which expresses over the structure <Nat,<,+> (respectively, over initial segments of this structure) that the variable x is interpreted exactly by the number of possible interpretations of the variable y for which the formula phi(y,z_1,...,z_k) is satisfied. Applying this theorem, we obtain an easy proof of Ruhl's result that reachability (and similarly, connectivity) in finite graphs is not expressible in first-order logic with unary counting quantifiers and addition. Furthermore, the above result on Presburger arithmetic helps to show the failure of a particular version of the Crane Beach conjecture.
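As a rough illustration of what closure under unary counting quantifiers means, the following Python sketch brute-forces one small instance over initial segments of the natural numbers. The formulas `phi` and `psi` and the helper names are hypothetical choices made for this example, not taken from the paper: `phi(y, z)` says "y is even and y < z" (expressible with < and + only), and `psi(x, z)` is a counting-free candidate that is meant to hold exactly when x is the number of such y.

```python
from typing import Callable


def count_witnesses(phi: Callable[[int, int], bool], z: int, bound: int) -> int:
    """Number of y in {0, ..., bound} for which phi(y, z) holds."""
    return sum(1 for y in range(bound + 1) if phi(y, z))


def phi(y: int, z: int) -> bool:
    # Hypothetical example formula over {<, +}:
    #   (exists w: w + w = y) and y < z
    return y < z and any(w + w == y for w in range(y + 1))


def psi(x: int, z: int) -> bool:
    # Hypothetical counting-free formula over {<, +}:
    #   x + x = z  or  x + x = z + 1
    return x + x == z or x + x == z + 1


def psi_expresses_count(bound: int) -> bool:
    """Check, on the initial segment {0, ..., bound}, that psi(x, z) holds
    exactly when x equals the number of y satisfying phi(y, z)."""
    for z in range(bound + 1):
        n = count_witnesses(phi, z, bound)
        for x in range(bound + 1):
            if psi(x, z) != (x == n):
                return False
    return True
```

Calling `psi_expresses_count(30)` checks the equivalence for all z up to 30. This only tests one hand-picked pair of formulas; the theorem stated in the abstract asserts that a suitable `psi` exists for every first-order `phi` over {<,+}.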
Submitted 19 November, 2002;
originally announced November 2002.