-
Parikh's Theorem Made Symbolic
Authors:
Matthew Hague,
Artur Jeż,
Anthony W. Lin
Abstract:
Parikh's Theorem is a fundamental result in automata theory with numerous applications in computer science: software verification (e.g. infinite-state verification, string constraints, and theory of arrays), verification of cryptographic protocols (e.g. using Horn clauses modulo equational theories) and database querying (e.g. evaluating path-queries in graph databases). Parikh's Theorem states that the letter-counting abstraction of a language recognized by finite automata or context-free grammars is definable in Presburger Arithmetic. Unfortunately, real-world applications typically require large alphabets - which are well-known to be not amenable to explicit treatment of the alphabets.
Symbolic automata have proven in the last decade to be an effective algorithmic framework for handling large finite or even infinite alphabets. A symbolic automaton employs an effective boolean algebra, which offers a symbolic representation of character sets and often lends itself to an exponentially more succinct representation of a language. Instead of letter-counting, Parikh's Theorem for symbolic automata amounts to counting the number of times different predicates are satisfied by an input sequence. Unfortunately, naively applying Parikh's Theorem from classical automata theory to symbolic automata yields existential Presburger formulas of exponential size. We provide a new construction for Parikh's Theorem for symbolic automata and grammars, which avoids this exponential blowup: our algorithm computes an existential formula in polynomial-time over (quantifier-free) Presburger and the base theory. In fact, our algorithm extends to the model of parametric symbolic grammars, which are one of the most expressive models of languages over infinite alphabets. We have implemented our algorithm and show it can be used to solve string constraints that are difficult to solve by existing solvers.
Submitted 7 November, 2023;
originally announced November 2023.
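To illustrate the two abstractions contrasted in the abstract above, here is a minimal Python sketch of letter counting for an explicit alphabet and predicate counting for a symbolic one. The function names `parikh_image` and `predicate_counts` and the sample predicates are hypothetical, chosen for illustration only; this is not the paper's construction, which produces Presburger formulas rather than counting over a single input.

```python
from collections import Counter

def parikh_image(word):
    """Return the letter-counting abstraction (Parikh image) of one word."""
    return Counter(word)

def predicate_counts(sequence, predicates):
    """For each named predicate, count the positions of the sequence that satisfy it."""
    return {
        name: sum(1 for item in sequence if pred(item))
        for name, pred in predicates.items()
    }

# Classical case: a small, explicit alphabet {a, b}.
image = parikh_image("abba")

# Symbolic case: the alphabet is the integers and predicates play the role of letters.
# The two predicates below are assumptions made for this example.
counts = predicate_counts(
    [3, -1, 4, 0, 8],
    {"even": lambda x: x % 2 == 0, "positive": lambda x: x > 0},
)
```

Note that predicates may overlap (4 and 8 are both even and positive), which is one reason a direct reduction to the classical letter-counting setting can blow up.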
-
Decision Procedures for Sequence Theories (Technical Report)
Authors:
Artur Jeż,
Anthony W. Lin,
Oliver Markgraf,
Philipp Rümmer
Abstract:
Sequence theories are an extension of theories of strings with an infinite alphabet of letters, together with a corresponding alphabet theory (e.g. linear integer arithmetic). Sequences are natural abstractions of extendable arrays, which permit a wealth of operations including append, map, split, and concatenation. In spite of the growing amount of tool support for theories of sequences by leading SMT-solvers, little is known about the decidability of sequence theories, which is in stark contrast to the state of the theories of strings. We show that the decidable theory of strings with concatenation and regular constraints can be extended to the world of sequences over an alphabet theory that forms a Boolean algebra, while preserving decidability. In particular, decidability holds when regular constraints are interpreted as parametric automata (which extend both symbolic automata and variable automata), but fails when interpreted as register automata (even over the alphabet theory of equality). When length constraints are added, the problem is Turing-equivalent to word equations with length (and regular) constraints. Similar investigations are conducted in the presence of symbolic transducers, which naturally model sequence functions like map, split, filter, etc. We have developed a new sequence solver, SeCo, based on parametric automata, and show its efficacy on two classes of benchmarks: (i) invariant checking on array-manipulating programs and parameterized systems, and (ii) benchmarks on symbolic register automata.
Submitted 31 July, 2023;
originally announced August 2023.
-
Space-efficient conversions from SLPs
Authors:
Travis Gagie,
Adrián Goga,
Artur Jeż,
Gonzalo Navarro
Abstract:
We give algorithms that, given a straight-line program (SLP) with $g$ rules that generates (only) a text $T [1..n]$, builds within $O(g)$ space the Lempel-Ziv (LZ) parse of $T$ (of $z$ phrases) in time $O(n\log^2 n)$ or in time $O(gz\log^2(n/z))$. We also show how to build a locally consistent grammar (LCG) of optimal size $g_{lc} = O(δ\log\frac{n}δ)$ from the SLP within $O(g+g_{lc})$ space and in $O(n\log g)$ time, where $δ$ is the substring complexity measure of $T$. Finally, we show how to build the LZ parse of $T$ from such a LCG within $O(g_{lc})$ space and in time $O(z\log^2 n \log^2(n/z))$. All our results hold with high probability.
Submitted 10 October, 2023; v1 submitted 5 December, 2022;
originally announced December 2022.
-
Solving one variable word equations in the free group in cubic time
Authors:
Robert Ferens,
Artur Jeż
Abstract:
A word equation with one variable in a free group is given as $U = V$, where both $U$ and $V$ are words over the alphabet of generators of the free group and $X, X^{-1}$, for a fixed variable $X$. An element of the free group is a solution when substituting it for $X$ yields a true equality (interpreted in the free group) of left- and right-hand sides. It is known that the set of all solutions of a given word equation with one variable is a finite union of sets of the form $\{αw^i β\: : \: i \in \mathbb Z \}$, where $α, w, β$ are reduced words over the alphabet of generators, and a polynomial-time algorithm (of a high degree) computing this set is known. We provide a cubic time algorithm for this problem, which also shows that the set of solutions consists of at most a quadratic number of the above-mentioned sets. The algorithm uses only simple tools of word combinatorics and group theory and is simple to state. Its analysis is involved and focuses on the combinatorics of occurrences of powers of a word within a larger word.
Submitted 15 January, 2021;
originally announced January 2021.
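The notion of a solution used in the abstract above can be checked mechanically for a candidate word. The following Python sketch assumes a hypothetical encoding: generators are lowercase letters, their inverses are the matching uppercase letters, and the tokens `"VAR"` and `"VAR_INV"` stand for $X$ and $X^{-1}$. It only verifies a candidate; it is not the cubic-time algorithm of the paper.

```python
def inverse(letter):
    # Assumed encoding: 'a' is a generator and 'A' is its inverse.
    return letter.swapcase()

def reduce_word(word):
    """Freely reduce a word by cancelling adjacent pairs x x^{-1}."""
    stack = []
    for letter in word:
        if stack and stack[-1] == inverse(letter):
            stack.pop()
        else:
            stack.append(letter)
    return "".join(stack)

def invert(word):
    """Inverse of a word: reverse it and invert every letter."""
    return "".join(inverse(letter) for letter in reversed(word))

def substitute(side, value):
    """Replace the variable tokens in one side of the equation and reduce the result."""
    parts = []
    for token in side:
        if token == "VAR":
            parts.append(value)
        elif token == "VAR_INV":
            parts.append(invert(value))
        else:
            parts.append(token)
    return reduce_word("".join(parts))

def is_solution(lhs, rhs, value):
    """True if substituting value for X makes both sides equal in the free group."""
    return substitute(lhs, value) == substitute(rhs, value)

# Example equation X a = a X, whose solutions are exactly the powers of a.
lhs, rhs = ["VAR", "a"], ["a", "VAR"]
```

For this example the solution set has the shape $\{αw^iβ : i \in \mathbb Z\}$ with empty $α$ and $β$ and $w = a$; negative exponents are written with uppercase letters in the assumed encoding.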
-
The smallest grammar problem revisited
Authors:
Hideo Bannai,
Momoko Hirayama,
Danny Hucke,
Shunsuke Inenaga,
Artur Jez,
Markus Lohrey,
Carl Philipp Reh
Abstract:
In a seminal paper of Charikar et al. on the smallest grammar problem, the authors derive upper and lower bounds on the approximation ratios for several grammar-based compressors, but in all cases there is a gap between the lower and upper bound. Here the gaps for $\mathsf{LZ78}$ and $\mathsf{BISECTION}$ are closed by showing that the approximation ratio of $\mathsf{LZ78}$ is $Θ( (n/\log n)^{2/3})$, whereas the approximation ratio of $\mathsf{BISECTION}$ is $Θ(\sqrt{n/\log n})$. In addition, the lower bound for $\mathsf{RePair}$ is improved from $Ω(\sqrt{\log n})$ to $Ω(\log n/\log\log n)$. Finally, results of Arpe and Reischuk relating grammar-based compression for arbitrary alphabets and binary alphabets are improved.
Submitted 18 August, 2019;
originally announced August 2019.
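For readers unfamiliar with $\mathsf{LZ78}$, one of the compressors analysed in the entry above, the following Python sketch shows the textbook parsing scheme: each phrase is the longest previously seen phrase extended by one letter. The function names are hypothetical, and the handling of a leftover suffix (emitting it with an empty letter) is one common convention assumed here. Each phrase can be read as a grammar rule of the form "earlier phrase followed by a letter", which is how the parse is viewed as a grammar-based compressor.

```python
def lz78_parse(text):
    """Return LZ78 phrases as (index of longest earlier phrase, next letter) pairs."""
    dictionary = {"": 0}  # phrase -> index; index 0 is the empty phrase
    phrases = []
    current = ""
    for letter in text:
        if current + letter in dictionary:
            current += letter  # keep extending the match
        else:
            phrases.append((dictionary[current], letter))
            dictionary[current + letter] = len(dictionary)
            current = ""
    if current:
        # The remaining suffix is already a phrase: emit it with an empty letter.
        phrases.append((dictionary[current], ""))
    return phrases

def lz78_decode(phrases):
    """Rebuild the text from the phrase list produced by lz78_parse."""
    table = [""]
    for index, letter in phrases:
        table.append(table[index] + letter)
    return "".join(table[1:])
```

A round trip such as `lz78_decode(lz78_parse("abababab"))` returns the original string.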
-
Balancing Straight-Line Programs
Authors:
Moses Ganardi,
Artur Jeż,
Markus Lohrey
Abstract:
It is shown that a context-free grammar of size $m$ that produces a single string $w$ (such a grammar is also called a string straight-line program) can be transformed in linear time into a context-free grammar for $w$ of size $\mathcal{O}(m)$, whose unique derivation tree has depth $\mathcal{O}(\log |w|)$. This solves an open problem in the area of grammar-based compression. Similar results are shown for two formalisms for grammar-based tree compression: top dags and forest straight-line programs. These balancing results are all deduced from a single meta theorem stating that the depth of an algebraic circuit over an algebra with a certain finite base property can be reduced to $\mathcal{O}(\log n)$ at the cost of a constant multiplicative size increase. Here, $n$ refers to the size of the unfolding (or unravelling) of the circuit.
Submitted 1 July, 2020; v1 submitted 10 February, 2019;
originally announced February 2019.
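To make the quantities in the abstract above concrete, the following Python sketch computes the length of the derived string and the depth of the derivation tree for a straight-line program. The representation (a dictionary mapping each nonterminal to either a terminal string or a pair of nonterminals) and the names are assumptions made for illustration. The sketch does not perform the balancing transformation; it only contrasts an unbalanced grammar with a balanced one for the same string.

```python
from functools import lru_cache

def slp_length_and_depth(rules, start):
    """Return (length of derived string, depth of derivation tree) for an SLP.

    rules maps a nonterminal to a terminal string or to a pair of nonterminals.
    """
    @lru_cache(maxsize=None)
    def visit(symbol):
        rhs = rules[symbol]
        if isinstance(rhs, str):
            return len(rhs), 0
        (len_left, depth_left), (len_right, depth_right) = visit(rhs[0]), visit(rhs[1])
        return len_left + len_right, 1 + max(depth_left, depth_right)

    return visit(start)

# Unbalanced "caterpillar" SLP for a^8: X_i -> X_{i-1} X_0, so depth grows linearly.
caterpillar = {"X0": "a"}
for i in range(1, 8):
    caterpillar[f"X{i}"] = (f"X{i-1}", "X0")

# Balanced SLP for the same string a^8: Y_i -> Y_{i-1} Y_{i-1}, so depth is log2(8) = 3.
doubling = {"Y0": "a"}
for i in range(1, 4):
    doubling[f"Y{i}"] = (f"Y{i-1}", f"Y{i-1}")
```

Here `slp_length_and_depth(caterpillar, "X7")` gives length 8 at depth 7, while `slp_length_and_depth(doubling, "Y3")` gives length 8 at depth 3.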
-
Approximation ratio of RePair
Authors:
Danny Hucke,
Artur Jez,
Markus Lohrey
Abstract:
In a seminal paper of Charikar et al. on the smallest grammar problem, the authors derive upper and lower bounds on the approximation ratios for several grammar-based compressors. Here we improve the lower bound for the famous RePair algorithm from $Ω(\sqrt{\log n})$ to $Ω(\log n/\log\log n)$. The family of words used in our proof is defined over a binary alphabet, while the lower bound from Charikar et al. needs an alphabet of logarithmic size in the length of the provided words.
Submitted 17 March, 2017;
originally announced March 2017.
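As background for the entry above, the following Python sketch shows a simplified, quadratic-time variant of the RePair idea: repeatedly replace the most frequent pair of adjacent symbols by a fresh nonterminal until no pair occurs twice. The names `repair` and `expand` and the nonterminal naming scheme `N0, N1, ...` are hypothetical. Two simplifications are assumed: pair frequencies are counted with overlaps (so a pair such as `aa` in `aaa` is counted twice), and ties are broken arbitrarily, so the output may differ from a careful RePair implementation.

```python
from collections import Counter

def repair(text):
    """Return (compressed sequence, rules) for a simplified RePair-style compressor."""
    seq = list(text)
    rules = {}
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:
            break
        fresh = f"N{len(rules)}"  # fresh nonterminal; input letters are single characters
        rules[fresh] = pair
        out, i = [], 0
        while i < len(seq):
            # Greedy left-to-right replacement of non-overlapping occurrences.
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(fresh)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

def expand(symbol, rules):
    """Expand a symbol back into the string it derives."""
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)
```

On the input `"abababab"` the sketch first introduces a rule for `ab` and then a rule for two copies of that nonterminal, leaving a start sequence of two symbols.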
-
Word equations in linear space
Authors:
Artur Jeż
Abstract:
Satisfiability of word equations is an important problem in the intersection of formal languages and algebra: Given two sequences consisting of letters and variables we are to decide whether there is a substitution for the variables that turns this equation into true equality of strings. The exact computational complexity of this problem remains unknown, with the best lower and upper bounds being, respectively, NP and PSPACE. Recently, the novel technique of recompression was applied to this problem, simplifying the known proofs and lowering the space complexity to (nondeterministic) O(n log n). In this paper we show that satisfiability of word equations is in nondeterministic linear space, thus the language of satisfiable word equations is context-sensitive, and by the famous Immerman-Szelepcsenyi theorem: the language of unsatisfiable word equations is also context-sensitive. We use the known recompression-based algorithm and additionally employ Huffman coding for letters. The proof, however, uses analysis of how the fragments of the equation depend on each other as well as a new strategy for nondeterministic choices of the algorithm, which uses several new ideas to limit the space occupied by the letters.
Submitted 16 October, 2020; v1 submitted 2 February, 2017;
originally announced February 2017.
-
Solutions of Word Equations over Partially Commutative Structures
Authors:
Volker Diekert,
Artur Jeż,
Manfred Kufleitner
Abstract:
We give NSPACE(n log n) algorithms solving the following decision problems. Satisfiability: Is the given equation over a free partially commutative monoid with involution (resp. a free partially commutative group) solvable? Finiteness: Are there only finitely many solutions of such an equation? PSPACE algorithms with worse complexities for the first problem are known, but so far, a PSPACE algorithm for the second problem was out of reach. Our results are much stronger: Given such an equation, its solutions form an EDT0L language effectively representable in NSPACE(n log n). In particular, we give an effective description of the set of all solutions for equations with constraints in free partially commutative monoids and groups.
Submitted 9 March, 2016;
originally announced March 2016.
-
Constructing small tree grammars and small circuits for formulas
Authors:
Moses Ganardi,
Danny Hucke,
Artur Jez,
Markus Lohrey,
Eric Noeth
Abstract:
It is shown that every tree of size $n$ over a fixed set of $σ$ different ranked symbols can be decomposed (in linear time as well as in logspace) into $O\big(\frac{n}{\log_σn}\big) = O\big(\frac{n \log σ}{\log n}\big)$ many hierarchically defined pieces. Formally, such a hierarchical decomposition has the form of a straight-line linear context-free tree grammar of size $O\big(\frac{n}{\log_σn}\big)$, which can be used as a compressed representation of the input tree. This generalizes an analogous result for strings. Previous grammar-based tree compressors were not analyzed for the worst-case size of the computed grammar, except for the top dag of Bille et al., for which only the weaker upper bound of $O\big(\frac{n}{\log_σ^{0.19} n}\big)$ (which was very recently improved to $O\big(\frac{n \cdot \log \log_σn}{\log_σn}\big)$ by Hübschle-Schneider and Raman) for unranked and unlabelled trees has been derived. The main result is used to show that every arithmetical formula of size $n$, in which only $m \leq n$ different variables occur, can be transformed (in linear time as well as in logspace) into an arithmetical circuit of size $O\big(\frac{n \cdot \log m}{\log n}\big)$ and depth $O(\log n)$. This refines a classical result of Brent from 1974, according to which an arithmetical formula of size $n$ can be transformed into a logarithmic depth circuit of size $O(n)$.
Submitted 21 September, 2015; v1 submitted 16 July, 2014;
originally announced July 2014.
-
Finding All Solutions of Equations in Free Groups and Monoids with Involution
Authors:
Volker Diekert,
Artur Jeż,
Wojciech Plandowski
Abstract:
The aim of this paper is to present a PSPACE algorithm which yields a finite graph of exponential size and which describes the set of all solutions of equations in free groups as well as the set of all solutions of equations in free monoids with involution in the presence of rational constraints. This became possible due to the recently invented recompression technique of the second author.
He successfully applied the recompression technique for pure word equations without involution or rational constraints. In particular, his method could not be used as a black box for free groups (even without rational constraints). Actually, the presence of an involution (inverse elements) and rational constraints complicates the situation and some additional analysis is necessary. Still, the recompression technique is general enough to accommodate both extensions. In the end, it simplifies proofs that solving word equations is in PSPACE (Plandowski 1999) and the corresponding result for equations in free groups with rational constraints (Diekert, Hagenah and Gutierrez 2001). As a byproduct we obtain a direct proof that it is decidable in PSPACE whether or not the solution set is finite.
Submitted 21 May, 2014; v1 submitted 20 May, 2014;
originally announced May 2014.
-
A really simple approximation of smallest grammar
Authors:
Artur Jeż
Abstract:
In this paper we present a really simple linear-time algorithm constructing a context-free grammar of size O(g log (N/g)) for the input string, where N is the size of the input string and g the size of the optimal grammar generating this string. The algorithm works for arbitrary size alphabets, but the running time is linear assuming that the alphabet Σ of the input string can be identified with numbers from {1, ..., N^c} for some constant c. Algorithms with such an approximation guarantee and running time are known; however, all of them were non-trivial and their analyses were involved. The algorithm presented here computes the LZ77 factorisation and transforms it in phases to a grammar. In each phase it maintains an LZ77-like factorisation of the word with at most l factors as well as additional O(l) letters, where l was the size of the original LZ77 factorisation. In one phase, in a greedy way (by a left-to-right sweep and with the help of the factorisation), we choose a set of pairs of consecutive letters to be replaced with new symbols, i.e. nonterminals of the constructed grammar. We choose at least 2/3 of the letters in the word and there are O(l) many different pairs among them. Hence there are O(log N) phases, each of which introduces O(l) nonterminals to the grammar. A more precise analysis yields a bound O(l log(N/l)). As l ≤ g, this yields the desired bound O(g log(N/g)).
Submitted 18 March, 2014;
originally announced March 2014.
-
Context unification is in PSPACE
Authors:
Artur Jeż
Abstract:
Contexts are terms with one `hole', i.e. a place in which we can substitute an argument. In context unification we are given an equation over terms with variables representing contexts and ask about the satisfiability of this equation. Context unification is a natural subvariant of second-order unification, which is undecidable, and a generalization of word equations, which are decidable, at the same time. It is the unique problem between those two whose decidability is uncertain (for already almost two decades). In this paper we show that the context unification is in PSPACE. The result holds under a (usual) assumption that the first-order signature is finite.
This result is obtained by an extension of the recompression technique, recently developed by the author and used in particular to obtain a new PSPACE algorithm for satisfiability of word equations, to context unification. The recompression is based on performing simple compression rules (replacing pairs of neighbouring function symbols), which are (conceptually) applied on the solution of the context equation and modifying the equation in a way so that such compression steps can be in fact performed directly on the equation, without the knowledge of the actual solution.
Submitted 8 November, 2013; v1 submitted 16 October, 2013;
originally announced October 2013.
-
Approximation of smallest linear tree grammar
Authors:
Artur Jeż,
Markus Lohrey
Abstract:
A simple linear-time algorithm for constructing a linear context-free tree grammar of size O(rg + r g log (n/r g)) for a given input tree T of size n is presented, where g is the size of a minimal linear context-free tree grammar for T, and r is the maximal rank of symbols in T (which is a constant in many applications). This is the first example of a grammar-based tree compression algorithm with a good, i.e. logarithmic in terms of the size of the input tree, approximation ratio. The analysis of the algorithm uses an extension of the recompression technique from strings to trees.
Submitted 6 October, 2018; v1 submitted 19 September, 2013;
originally announced September 2013.
-
One-variable word equations in linear time
Authors:
Artur Jeż
Abstract:
In this paper we consider word equations with one variable (and arbitrarily many appearances of it). A recent technique of recompression, which is applicable to general word equations, is shown to be suitable also in this case. While in the general case it is non-deterministic, it determinises in the case of one variable and the obtained running time is O(n + #_X log n), where #_X is the number of appearances of the variable in the equation. This matches the previously-best algorithm due to Dąbrowski and Plandowski. Then, using a couple of heuristics as well as a more detailed time analysis, the running time is lowered to O(n) in the RAM model. Unfortunately no new properties of solutions are shown.
Submitted 22 January, 2014; v1 submitted 14 February, 2013;
originally announced February 2013.
-
Approximation of grammar-based compression via recompression
Authors:
Artur Jeż
Abstract:
In this paper we present a simple linear-time algorithm constructing a context-free grammar of size O(g log(N/g)) for the input string, where N is the size of the input string and g the size of the optimal grammar generating this string. The algorithm works for arbitrary size alphabets, but the running time is linear assuming that the alphabet Σ of the input string can be identified with numbers from {1, ..., N^c} for some constant c. Otherwise, an additional cost of O(n log|Σ|) is needed.
Algorithms with such approximation guarantees and running time are known, the novelty of this paper is a particular simplicity of the algorithm as well as the analysis of the algorithm, which uses a general technique of recompression recently introduced by the author. Furthermore, contrary to the previous results, this work does not use the LZ representation of the input string in the construction, nor in the analysis.
Submitted 7 November, 2013; v1 submitted 24 January, 2013;
originally announced January 2013.
-
Recompression: a simple and powerful technique for word equations
Authors:
Artur Jeż
Abstract:
In this paper we present an application of a simple technique of local recompression, previously developed by the author in the context of compressed membership problems and compressed pattern matching, to word equations. The technique is based on local modification of variables (replacing X by aX or Xa) and iterative replacement of pairs of letters appearing in the equation by a `fresh' letter, which can be seen as a bottom-up compression of the solution of the given word equation, to be more specific, building an SLP (Straight-Line Programme) for the solution of the word equation.
Using this technique we give new, independent and self-contained proofs of most of the known results for word equations. To be more specific, the presented (nondeterministic) algorithm runs in O(n log n) space and in time polynomial in log N, where N is the size of the length-minimal solution of the word equation. The presented algorithm can be easily generalised to a generator of all solutions of the given word equation (without increasing the space usage). Furthermore, a further analysis of the algorithm yields a doubly exponential upper bound on the size of the length-minimal solution. The presented algorithm does not use the exponential bound on the exponent of periodicity. Conversely, the analysis of the algorithm yields an independent proof of the exponential bound on the exponent of periodicity.
We believe that the presented algorithm, its idea and analysis are far simpler than all previously applied. Furthermore, thanks to it we can obtain a unified and simple approach to most of known results for word equations.
As a small additional result we show that for O(1) variables (with arbitrarily many appearances in the equation) word equations can be solved in linear space, i.e. they are context-sensitive.
Submitted 18 March, 2014; v1 submitted 16 March, 2012;
originally announced March 2012.
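The letter-level step mentioned in the abstract above, replacing every occurrence of a pair of letters by a fresh letter, can be sketched in Python as follows. The function name `compress_pair` and the representation of words as lists of symbols are assumptions made for illustration. The sketch operates on a plain word only; it does not handle variables or the local modification of variables (replacing X by aX or Xa) that the technique uses when a pair crosses a variable boundary.

```python
def compress_pair(word, a, b, fresh):
    """Replace every occurrence of the pair a b (with a != b) by the fresh letter.

    word is a list of symbols; a new list is returned.
    """
    if a == b:
        raise ValueError("this sketch only handles pairs of two different letters")
    out, i = [], 0
    while i < len(word):
        if i + 1 < len(word) and word[i] == a and word[i + 1] == b:
            out.append(fresh)
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out

# Because a != b, occurrences of the pair cannot overlap, so the result is unique.
compressed = compress_pair(list("abcab"), "a", "b", "c1")
```

Recording each replacement as a rule `c1 -> a b` and iterating over further pairs builds, bottom-up, a straight-line program for the original word.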
-
Faster fully compressed pattern matching by recompression
Authors:
Artur Jeż
Abstract:
In this paper, a fully compressed pattern matching problem is studied. The compression is represented by straight-line programs (SLPs), i.e. context-free grammars generating exactly one string; the term fully means that both the pattern and the text are given in the compressed form. The problem is approached using a recently developed technique of local recompression: the SLPs are refactored, so that substrings of the pattern and text are encoded in both SLPs in the same way. To this end, the SLPs are locally decompressed and then recompressed in a uniform way.
This technique yields an O((n+m)log M) algorithm for compressed pattern matching, assuming that M fits in O(1) machine words, where n (m) is the size of the compressed representation of the text (pattern, respectively), while M is the size of the decompressed pattern. If only m+n fits in O(1) machine words, the running time increases to O((n+m)log M log(n+m)). The previous best algorithm due to Lifshits had O(n^2m) running time.
Submitted 25 June, 2013; v1 submitted 14 November, 2011;
originally announced November 2011.
-
Compressed Membership for NFA (DFA) with Compressed Labels is in NP (P)
Authors:
Artur Jeż
Abstract:
In this paper, a compressed membership problem for finite automata, both deterministic and non-deterministic, with compressed transition labels is studied. The compression is represented by straight-line programs (SLPs), i.e. context-free grammars generating exactly one string. A novel technique of dealing with SLPs is introduced: the SLPs are recompressed, so that substrings of the input text are encoded in SLPs labelling the transitions of the NFA (DFA) in the same way, as in the SLP representing the input text. To this end, the SLPs are locally decompressed and then recompressed in a uniform way. Furthermore, such recompression induces only small changes in the automaton, in particular, the size of the automaton remains polynomial.
Using this technique it is shown that the compressed membership for NFA with compressed labels is in NP, thus confirming the conjecture of Plandowski and Rytter and extending the partial result of Lohrey and Mathissen; as it is already known, that this problem is NP-hard, we settle its exact computational complexity. Moreover, the same technique applied to the compressed membership for DFA with compressed labels yields that this problem is in P; for this problem, only trivial upper-bound PSPACE was known.
Submitted 11 October, 2011;
originally announced October 2011.
-
On minimising automata with errors
Authors:
Paweł Gawrychowski,
Artur Jeż,
Andreas Maletti
Abstract:
The problem of k-minimisation for a DFA M is the computation of a smallest DFA N (where the size |M| of a DFA M is the size of the domain of the transition function) such that their recognized languages differ only on words of length less than k. The previously best algorithm, which runs in time O(|M| log^2 n) where n is the number of states, is extended to DFAs with partial transition functions. Moreover, a faster O(|M| log n) algorithm for DFAs that recognise finite languages is presented. In comparison to the previous algorithm for total DFAs, the new algorithm is much simpler and allows the calculation of a k-minimal DFA for each k in parallel. Secondly, it is demonstrated that calculating the least number of introduced errors is hard: Given a DFA M and numbers k and m, it is NP-hard to decide whether there exists a k-minimal DFA N differing from DFA M on at most m words. A similar result holds for hyper-minimisation of DFAs in general: Given a DFA M and numbers s and m, it is NP-hard to decide whether there exists a DFA N with at most s states such that DFA M and N differ on at most m words.
Submitted 28 February, 2011;
originally announced February 2011.
-
On equations over sets of integers
Authors:
Artur Jeż,
Alexander Okhotin
Abstract:
Systems of equations with sets of integers as unknowns are considered. It is shown that the class of sets representable by unique solutions of equations using the operations of union and addition $S+T=\{m+n \mid m \in S, \: n \in T\}$ and with ultimately periodic constants is exactly the class of hyper-arithmetical sets. Equations using addition only can represent every hyper-arithmetical set under a simple encoding. All hyper-arithmetical sets can also be represented by equations over sets of natural numbers equipped with union, addition and subtraction $S \mathbin{\dot{-}} T=\{m-n \mid m \in S, \: n \in T, \: m \geq n\}$. Testing whether a given system has a solution is $\Sigma^1_1$-complete for each model. These results, in particular, settle the expressive power of the most general types of language equations, as well as equations over subsets of free groups.
Submitted 3 February, 2010; v1 submitted 17 January, 2010;
originally announced January 2010.
-
Online validation of the pi and pi' failure functions
Authors:
Pawel Gawrychowski,
Artur Jez,
Lukasz Jez
Abstract:
Let pi_w denote the failure function of the Morris-Pratt algorithm for a word w. In this paper we study the following problem: given an integer array A[1..n], is there a word w over an arbitrary alphabet such that A[i]=pi_w[i] for all i? Moreover, what is the minimum required cardinality of the alphabet? We give a real-time linear algorithm for this problem in the unit-cost RAM model with Θ(log n) bits word size. Our algorithm also returns a word w over a minimal alphabet such that pi_w = A, and uses just o(n) words of memory. Then we consider the function pi' instead of pi and give an online O(n log n) algorithm for this case. This is the first polynomial algorithm for the online version of this problem.
Submitted 13 April, 2009; v1 submitted 19 January, 2009;
originally announced January 2009.
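For readers unfamiliar with the failure function pi_w mentioned in the abstract above, the following is a minimal sketch of the standard Morris-Pratt computation for a given word. It covers only the forward direction (word to array); it is not the validation algorithm of the paper, which goes from an array A back to a word. The function name `failure_function` is an assumption for this example, and the list is 0-indexed, so `pi[i]` here corresponds to pi_w[i+1] in the abstract's 1-indexed notation.

```python
def failure_function(w):
    """Morris-Pratt failure function of w (0-indexed).

    pi[i] is the length of the longest proper border of w[:i + 1],
    i.e. the longest string that is both a proper prefix and a
    proper suffix of w[:i + 1].
    """
    pi = [0] * len(w)
    k = 0  # length of the border currently being extended
    for i in range(1, len(w)):
        # Fall back through shorter borders until one can be extended.
        while k > 0 and w[i] != w[k]:
            k = pi[k - 1]
        if w[i] == w[k]:
            k += 1
        pi[i] = k
    return pi


if __name__ == "__main__":
    print(failure_function("abacaba"))
```

An array such as `[0, 0, 1, 0, 1, 2, 3]` is therefore a valid failure function, since it is produced by the word `abacaba`, whereas an array such as `[0, 2]` is not valid for any word, because a proper border of a two-letter word has length at most 1.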
-
Generalized Whac-a-Mole
Authors:
Marcin Bienkowski,
Marek Chrobak,
Christoph Durr,
Mathilde Hurand,
Artur Jez,
Lukasz Jez,
Jakub Lopuszanski,
Grzegorz Stachowiak
Abstract:
We consider online competitive algorithms for the problem of collecting weighted items from a dynamic set S, when items are added to or deleted from S over time. The objective is to maximize the total weight of collected items. We study the general version, as well as variants with various restrictions, including the following: the uniform case, when all items have the same weight, the decremental sets, when all items are present at the beginning and only deletion operations are allowed, and dynamic queues, where the dynamic set is ordered and only its prefixes can be deleted (with no restriction on insertions). The dynamic queue case is a generalization of bounded-delay packet scheduling (also referred to as buffer management). We present several upper and lower bounds on the competitive ratio for these variants.
Submitted 16 February, 2008; v1 submitted 12 February, 2008;
originally announced February 2008.