-
On the Representation of Block Languages
Authors:
Guilherme Duarte,
Nelma Moreira,
Luca Prigioniero,
Rogério Reis
Abstract:
In this paper we consider block languages, namely sets of words having the same length, and we propose a new representation for these languages. In particular, given an alphabet of size $k$ and a length $\ell$, these languages can be represented by bitmaps of size $k^\ell$, in which each bit indicates whether the corresponding word, according to the lexicographical order, belongs to the language (bit equal to 1) or not (bit equal to 0). This representation turns out to be a good tool for the investigation of several properties of block languages, making proofs simpler and reasoning clearer. After showing how to convert bitmaps into minimal deterministic and nondeterministic finite automata, we use this representation as a tool to study the deterministic and nondeterministic state complexity of block languages, as well as the costs of basic operations on block languages, in terms of the sizes of the equivalent finite automata.
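The Python below is a minimal sketch of the bitmap idea, not the authors' implementation. It reads a word as a base-$k$ number to get its position in lexicographic order, then builds the $k^\ell$-bit bitmap of a block language. The helper names (`word_index`, `to_bitmap`, `from_bitmap`, `bm_union`, ...) are hypothetical, and the alphabet is assumed to be given in sorted order. Within $\Sigma^\ell$, union, intersection and complement reduce to bitwise OR, AND and NOT.

```python
from itertools import product

def word_index(word, alphabet):
    """Lexicographic rank of `word` among all words of its length (read as a base-k number)."""
    pos = {c: i for i, c in enumerate(alphabet)}
    idx = 0
    for c in word:
        idx = idx * len(alphabet) + pos[c]
    return idx

def to_bitmap(language, alphabet, length):
    """Bitmap of size k**length: bit i is 1 iff the i-th word (lexicographic order) is in the language."""
    bits = [0] * (len(alphabet) ** length)
    for w in language:
        if len(w) != length:
            raise ValueError("a block language contains words of a single length")
        bits[word_index(w, alphabet)] = 1
    return bits

def from_bitmap(bits, alphabet, length):
    """Recover the language; itertools.product enumerates words in lexicographic order."""
    words = ("".join(t) for t in product(alphabet, repeat=length))
    return {w for w, b in zip(words, bits) if b}

# Boolean operations inside Sigma^length become bitwise operations on bitmaps.
def bm_union(x, y):
    return [a | b for a, b in zip(x, y)]

def bm_inter(x, y):
    return [a & b for a, b in zip(x, y)]

def bm_compl(x):
    return [1 - a for a in x]
```

For example, over the alphabet `ab` with $\ell = 2$ the words are ordered `aa`, `ab`, `ba`, `bb`, so the language `{ab, ba}` has bitmap `0110`.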
Submitted 17 April, 2024;
originally announced April 2024.
-
Approximate NFA Universality and Related Problems Motivated by Information Theory
Authors:
Stavros Konstantinidis,
Mitja Mastnak,
Nelma Moreira,
Rogério Reis
Abstract:
In coding and information theory, it is desirable to construct maximal codes that can be either variable-length codes or error control codes of fixed length. However, deciding code maximality boils down to deciding whether a given NFA is universal, and this is a hard problem (including the case of whether the NFA accepts all words of a fixed length). On the other hand, it is acceptable to know whether a code is `approximately' maximal, which then boils down to whether a given NFA is `approximately' universal. Here we introduce the notion of a $(1-ε)$-universal automaton and present polynomial randomized approximation algorithms to test NFA universality and related hard automata problems, for certain natural probability distributions on the set of words. We also conclude that the randomization aspect is necessary, as approximate universality remains hard for any fixed polynomially computable $ε$.
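The sketch below illustrates the sampling idea under assumptions that are ours, not the paper's. Words of a fixed length are drawn uniformly, and `eps` (tolerated fraction of rejected words) and `dlt` (failure probability) are hypothetical parameter names. The sample size follows from a simple bound. Suppose more than an `eps` fraction of the words is rejected. Then all $m$ samples are accepted with probability at most $(1-ε)^m \le e^{-εm} \le δ$, for $m = \lceil \ln(1/δ)/ε \rceil$.

```python
import math
import random

def nfa_accepts(delta, initial, finals, word):
    """Subset simulation of an NFA given as a dict (state, symbol) -> set of states."""
    current = set(initial)
    for c in word:
        current = {q for p in current for q in delta.get((p, c), ())}
    return bool(current & finals)

def approx_universal(delta, initial, finals, alphabet, length, eps, dlt, seed=0):
    """Randomized test of universality on the words of a fixed length (uniform sampling).

    A rejected sample is a certain witness of non-universality. If more than an eps
    fraction of the words is rejected, this returns (True, None) with probability <= dlt.
    """
    rng = random.Random(seed)
    m = math.ceil(math.log(1 / dlt) / eps)
    for _ in range(m):
        w = "".join(rng.choice(alphabet) for _ in range(length))
        if not nfa_accepts(delta, initial, finals, w):
            return False, w
    return True, None
```

A `False` answer always comes with a rejected word. A `True` answer is only a probabilistic statement, which matches the "approximately universal" reading above.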
Submitted 11 April, 2022; v1 submitted 3 February, 2022;
originally announced February 2022.
-
On the Uniform Distribution of Regular Expressions
Authors:
Sabine Broda,
António Machiavelo,
Nelma Moreira,
Rogério Reis
Abstract:
Although regular expressions do not correspond univocally to regular languages, it is still worthwhile to study their properties and algorithms. For average-case analysis one often relies on uniform random generation using a specific grammar for regular expressions, which can represent regular languages with more or less redundancy. Generators that are uniform on the set of expressions are not necessarily uniform on the set of regular languages. Nevertheless, it is not obvious whether asymptotic estimates obtained by considering the whole set of regular expressions differ from those obtained using a more refined set that avoids some large class of equivalent expressions. In this paper we study a set of expressions that avoid a given absorbing pattern. It is shown that, although this set is significantly smaller than the standard one, the asymptotic average estimates for the size of the Glushkov automaton for these expressions do not differ from the standard case.
Submitted 24 March, 2021;
originally announced March 2021.
-
The computational power of parsing expression grammars
Authors:
Bruno Loff,
Nelma Moreira,
Rogério Reis
Abstract:
We study the computational power of parsing expression grammars (PEGs). We begin by constructing PEGs with unexpected behaviour, and surprising new examples of languages with PEGs, including the language of palindromes whose length is a power of two, and a binary-counting language. We then propose a new computational model, the scaffolding automaton, and prove that it exactly characterises the computational power of parsing expression grammars (PEGs).
Using this characterisation we show that:
(*) PEGs have unexpected power and semantics. We present several PEGs with surprising behaviour, and languages which, unexpectedly, have PEGs, including a PEG for the language of palindromes whose length is a power of two.
(*) PEGs are computationally `universal', in the following sense: take any computable function $f:\{0,1\}^\ast\to \{0,1\}^\ast$; then there exists a computable function $g: \{0,1\}^\ast \to \mathbb{N}$ such that $\{ f(x) \#^{g(x)} x \mid x \in \{0,1\}^\ast \}$ has a PEG.
(*) There can be no pumping lemma for PEGs. There is no total computable function $A$ with the following property: for every well-formed PEG $G$, there exists $n_0$ such that for every string $x \in \mathcal{L}(G)$ of size $|x| \ge n_0$, the output $y = A(G, x)$ is in $\mathcal{L}(G)$ and has $|y| > |x|$.
(*) PEGs are strongly non real-time for Turing machines. There exists a language with a PEG, such that neither it nor its reverse can be recognised by any multi-tape online Turing machine which is allowed to do only $o(n/\log n)$ steps after reading each input symbol.
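A small, well-known example shows that PEGs go beyond context-free languages: the language $\{a^n b^n c^n \mid n \ge 1\}$. The sketch below hand-codes one standard PEG for it as a recursive-descent recogniser in Python. The grammar formulation and the function names are illustrative assumptions, not taken from the paper. The and-predicate `&(A 'c')` checks that the a's and b's are balanced without consuming input. PEG optionals are greedy and never backtrack into a sub-match that already succeeded.

```python
def peg_anbncn(s):
    """Recognise s with the PEG
        S <- &(A 'c') 'a'+ B !.
        A <- 'a' A? 'b'
        B <- 'b' B? 'c'
    Parsing functions return the position after a successful match, or None on failure.
    """
    def lit(ch, i):
        return i + 1 if i < len(s) and s[i] == ch else None

    def nested(open_ch, close_ch, i):
        # A when (open_ch, close_ch) = ('a', 'b'); B when it is ('b', 'c')
        j = lit(open_ch, i)
        if j is None:
            return None
        k = nested(open_ch, close_ch, j)  # optional recursive part: greedy, no backtracking
        if k is not None:
            j = k
        return lit(close_ch, j)

    # &(A 'c'): lookahead that consumes nothing
    j = nested("a", "b", 0)
    if j is None or lit("c", j) is None:
        return False
    # 'a'+
    i = 0
    while i < len(s) and s[i] == "a":
        i += 1
    if i == 0:
        return False
    # B, then !. (end of input)
    i = nested("b", "c", i)
    return i is not None and i == len(s)
```

The lookahead compares the a's with the b's, and `B` compares the b's with the c's. Together they force all three counts to be equal, which no context-free grammar can do.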
Submitted 14 February, 2020; v1 submitted 21 February, 2019;
originally announced February 2019.
-
Regular Expressions and Transducers over Alphabet-invariant and User-defined Labels
Authors:
Stavros Konstantinidis,
Nelma Moreira,
Rogerio Reis,
Joshua Young
Abstract:
We are interested in regular expressions and transducers that represent word relations in an alphabet-invariant way, for example, the set of all word pairs (u,v) where v is a prefix of u, independently of what the alphabet is. Current software systems for formal language objects do not have a mechanism to define such objects. We define transducers in which transition labels involve what we call set specifications, some of which are alphabet invariant. In fact, we give a broader definition of automata-type objects, called labelled graphs, where each transition label can be any string, as long as that string represents a subset of a certain monoid. Then, the behaviour of the labelled graph is a subset of that monoid. We do the same for regular expressions. We obtain extensions of a few classic algorithmic constructions on ordinary regular expressions and transducers at the broad level of labelled graphs, in such a way that the computational efficiency of the extended constructions is not sacrificed. For regular expressions with set specs we obtain the corresponding partial derivative automata. For transducers with set specs we obtain further algorithms that can be applied to questions about independent regular languages, in particular the witness version of the independent property satisfaction question.
Submitted 4 May, 2018;
originally announced May 2018.
-
Channels with Synchronization/Substitution Errors and Computation of Error Control Codes
Authors:
Stavros Konstantinidis,
Nelma Moreira,
Rogerio Reis
Abstract:
We introduce the concept of an \ff-maximal error-detecting block code, for some parameter \ff{} between 0 and 1, in order to formalize the situation where a block code is close to maximal with respect to being error-detecting. Our motivation for this is that constructing a maximal error-detecting code is a computationally hard problem. We present a randomized algorithm that takes as input two positive integers $N,\ell$, a probability value \ff, and a specification of the errors permitted in some application, and generates an error-detecting, or error-correcting, block code having up to $N$ codewords of length $\ell$. If the algorithm finds fewer than $N$ codewords, then those codewords constitute a code that is \ff-maximal with high probability. The error specification (also called channel) is modelled as a transducer, which allows one to model any rational combination of substitution and synchronization errors. We also present some elements of our implementation of various error-detecting properties and their associated methods. Then, we show several tests of the implemented randomized algorithm on various channels. A methodological contribution is the presentation of how various desirable error combinations can be expressed formally and processed algorithmically.
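Here is a simplified sketch of a randomized block-code construction. It assumes a substitution-only channel, so detecting up to `min_dist - 1` substitutions is just a pairwise Hamming-distance condition. The paper instead models the channel as a transducer, which also covers synchronization errors. The names `random_code` and `max_misses` are hypothetical. The stopping rule (give up after many consecutive rejected candidates) only mirrors the intuition behind near-maximality; it does not reproduce the paper's probabilistic guarantee.

```python
import random

def hamming(u, v):
    """Number of positions where two words of equal length differ."""
    return sum(a != b for a, b in zip(u, v))

def random_code(N, length, min_dist=2, alphabet="01", max_misses=200, seed=0):
    """Greedy randomized block code with pairwise Hamming distance >= min_dist.

    Assumption: substitution-only channel (not the transducer model of the paper).
    Stops after N codewords or after max_misses consecutive rejected candidates.
    """
    rng = random.Random(seed)
    code, misses = [], 0
    while len(code) < N and misses < max_misses:
        w = "".join(rng.choice(alphabet) for _ in range(length))
        # A repeated codeword has distance 0 and is rejected as well.
        if all(hamming(w, c) >= min_dist for c in code):
            code.append(w)
            misses = 0
        else:
            misses += 1
    return code
```

When the loop stops before reaching `N` codewords, many random candidates in a row were incompatible with the code. That is the informal reason such a code is likely to be close to maximal.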
Submitted 28 July, 2016; v1 submitted 23 January, 2016;
originally announced January 2016.
-
Formalization of context-free language theory
Authors:
Marcus V. M. Ramos,
Ruy J. G. B. de Queiroz,
Nelma Moreira,
José Carlos Bacelar Almeida
Abstract:
Context-free language theory is a subject of high importance in computer language processing technology as well as in formal language theory. This paper presents a formalization, using the Coq proof assistant, of fundamental results related to context-free grammars and languages. These include closure properties (union, concatenation and Kleene star), grammar simplification (elimination of useless symbols, inaccessible symbols, empty rules and unit rules) and the existence of a Chomsky Normal Form for context-free grammars.
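The grammar-simplification step of eliminating useless symbols fits in a few lines of Python. The paper formalizes it in Coq; this is only an informal illustration. The grammar encoding is our assumption: a dict from nonterminals to lists of right-hand-side tuples, where any symbol that is not a key counts as a terminal. As in the textbook construction, non-generating symbols are removed first and unreachable ones second. Doing it in the other order can leave useless symbols behind.

```python
def generating(g):
    """Nonterminals that derive some terminal string (least fixed point)."""
    gen, changed = set(), True
    while changed:
        changed = False
        for nt, prods in g.items():
            if nt not in gen and any(all(s in gen or s not in g for s in p) for p in prods):
                gen.add(nt)
                changed = True
    return gen

def remove_useless(g, start):
    """Drop non-generating symbols first, then symbols unreachable from `start`."""
    gen = generating(g)
    g1 = {nt: [p for p in prods if all(s in gen or s not in g for s in p)]
          for nt, prods in g.items() if nt in gen}
    if start not in g1:
        return {}  # the grammar generates the empty language
    reach, stack = {start}, [start]
    while stack:
        for p in g1[stack.pop()]:
            for s in p:
                if s in g1 and s not in reach:
                    reach.add(s)
                    stack.append(s)
    return {nt: prods for nt, prods in g1.items() if nt in reach}
```

For example, consider `S -> A B | a`, `A -> a`, `B -> B b`, `C -> c`. Here `B` is non-generating, so `S -> A B` is dropped. That leaves `A` unreachable, and `C` was unreachable from the start, so only `S -> a` remains.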
Submitted 30 October, 2015;
originally announced October 2015.
-
Formalization of the pumping lemma for context-free languages
Authors:
Marcus V. M. Ramos,
Ruy J. G. B. de Queiroz,
Nelma Moreira,
José Carlos Bacelar Almeida
Abstract:
Context-free languages (CFLs) are highly important in computer language processing technology as well as in formal language theory. The Pumping Lemma is a property that is valid for all context-free languages, and is used to show the existence of non-context-free languages. This paper presents a formalization, using the Coq proof assistant, of the Pumping Lemma for context-free languages.
Submitted 15 October, 2015;
originally announced October 2015.
-
A Survey on Operational State Complexity
Authors:
Yuan Gao,
Nelma Moreira,
Rogério Reis,
Sheng Yu
Abstract:
Descriptional complexity is the study of the conciseness of the various models representing formal languages. The state complexity of a regular language is the size, measured by the number of states, of the smallest finite automaton, either deterministic or nondeterministic, that recognises it. Operational state complexity is the study of the state complexity of operations over languages. In this survey, we review the state complexities of individual regularity-preserving language operations on regular and some subregular languages. Then we revisit the state complexities of the combination of individual operations. We also review methods of estimation and approximation of state complexity of more complex combined operations.
Submitted 10 September, 2015;
originally announced September 2015.
-
Symbolic Manipulation of Code Properties
Authors:
Stavros Konstantinidis,
Casey Meijer,
Nelma Moreira,
Rogério Reis
Abstract:
The FAdo system is a symbolic manipulator of formal language objects, implemented in Python. In this work, we extend its capabilities by implementing methods to manipulate transducers, and we go one level higher than existing formal language systems by implementing methods to manipulate objects representing classes of independent languages (widely known as code properties). Our methods allow users to define their own code properties and combine them with each other or with fixed properties such as prefix codes, suffix codes, error-detecting codes, etc. The satisfaction and maximality decision questions are solvable for any of the definable properties. The new online system LaSer allows users to query about code properties and obtain the answer in batch mode. Our work is founded on independence theory as well as the theory of rational relations and transducers, and contributes improved algorithms on these objects.
Submitted 1 August, 2016; v1 submitted 18 April, 2015;
originally announced April 2015.
-
Partial Derivative Automaton for Regular Expressions with Shuffle
Authors:
Sabine Broda,
António Machiavelo,
Nelma Moreira,
Rogério Reis
Abstract:
We generalize the partial derivative automaton to regular expressions with shuffle and study its size in the worst and in the average case. The number of states of the partial derivative automaton is, in the worst case, at most 2^m, where m is the number of letters in the expression, while asymptotically and on average it is no more than (4/3)^m.
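The Python below sketches the construction; it is not the paper's code. It computes partial derivatives of regular expressions extended with a shuffle operator ⧢. The shuffle rule is ∂_a(r ⧢ s) = ∂_a(r) ⧢ s ∪ r ⧢ ∂_a(s), used alongside the usual rules for union, concatenation and star. The tuple encoding and the simplifications ε·r = r·ε = r and ε ⧢ r = r ⧢ ε = r are our assumptions. The automaton's states are the expressions reachable from the initial one.

```python
EPS = ("eps",)

def sym(a):
    return ("sym", a)

def alt(r, s):
    return ("alt", r, s)

def star(r):
    return ("star", r)

def cat(r, s):
    # simplify eps.s = s and r.eps = r so derivative terms stay small
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ("cat", r, s)

def shuf(r, s):
    # simplify eps ⧢ s = s and r ⧢ eps = r
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ("shuf", r, s)

def nullable(r):
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag == "sym":
        return False
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])  # cat and shuf

def pd(a, r):
    """Set of partial derivatives of r with respect to the letter a."""
    tag = r[0]
    if tag == "eps":
        return frozenset()
    if tag == "sym":
        return frozenset([EPS]) if r[1] == a else frozenset()
    if tag == "alt":
        return pd(a, r[1]) | pd(a, r[2])
    if tag == "star":
        return frozenset(cat(x, r) for x in pd(a, r[1]))
    if tag == "cat":
        res = frozenset(cat(x, r[2]) for x in pd(a, r[1]))
        if nullable(r[1]):
            res = res | pd(a, r[2])
        return res
    # shuffle: derive either operand, leave the other untouched
    return (frozenset(shuf(x, r[2]) for x in pd(a, r[1]))
            | frozenset(shuf(r[1], y) for y in pd(a, r[2])))

def pd_states(r, alphabet):
    """States of the partial derivative automaton: closure of {r} under pd."""
    seen, todo = {r}, [r]
    while todo:
        q = todo.pop()
        for a in alphabet:
            for x in pd(a, q):
                if x not in seen:
                    seen.add(x)
                    todo.append(x)
    return seen

def accepts(r, word):
    current = {r}
    for a in word:
        current = {x for q in current for x in pd(a, q)}
    return any(nullable(q) for q in current)
```

For example, `a ⧢ b` yields four states: `a ⧢ b`, `a`, `b` and `ε`. It accepts exactly `ab` and `ba`.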
Submitted 1 March, 2015;
originally announced March 2015.
-
Distinguishability Operations and Closures on Regular Languages
Authors:
Cezar Câmpeanu,
Nelma Moreira,
Rogério Reis
Abstract:
Given a regular language $L$, we study the language $\mathsf{D}(L)$ of words that distinguish between pairs of different left-quotients of $L$. We characterize this distinguishability operation, show that its iteration always has a fixed point, and generalize this result to operations derived from closure operators and Boolean operators. We give an upper bound for the state complexity of the distinguishability operation, and prove its tightness. We show that the set of minimal words that can be used to distinguish between different left-quotients of a language $L$ has at most $n-1$ elements, where $n$ is the state complexity of $L$, and we also study the properties of its iteration. We generalize the results for the languages of words that distinguish between pairs of different right-quotients and two-sided quotients of a language $L$.
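When $L$ is given by a complete minimal DFA, membership in $\mathsf{D}(L)$ has a simple operational reading. Distinct states correspond exactly to distinct left-quotients, and $w$ lies in the quotient of state $q$ iff $\delta(q, w)$ is final. So $w$ distinguishes some pair of quotients iff the set of states reached from all states by $w$ contains both a final and a non-final state. The Python sketch below tests this; the function name and DFA encoding are our assumptions, not the paper's.

```python
def in_D(delta, states, finals, word):
    """Membership of `word` in D(L), for L given by a complete minimal DFA.

    delta maps (state, symbol) -> state. The image of the whole state set under
    `word` must contain both a final and a non-final state.
    """
    image = set(states)
    for c in word:
        image = {delta[(q, c)] for q in image}
    return bool(image & finals) and bool(image - finals)
```

For $L = a^*$ over $\{a, b\}$ (a final state plus a sink), every word of $a^*$ is in $\mathsf{D}(L)$. Any word containing `b` sends both states to the sink and is not.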
Submitted 10 December, 2014; v1 submitted 1 July, 2014;
originally announced July 2014.
-
Symmetric Groups and Quotient Complexity of Boolean Operations
Authors:
Jason Bell,
Janusz Brzozowski,
Nelma Moreira,
Rogério Reis
Abstract:
The quotient complexity of a regular language L is the number of left quotients of L, which is the same as the state complexity of L. Suppose that L and L' are binary regular languages with quotient complexities m and n, and that the transition semigroups of the minimal deterministic automata accepting L and L' are the symmetric groups S_m and S_n of degrees m and n, respectively. Denote by o any binary boolean operation that is not a constant and not a function of one argument only. For m,n >= 2 with (m,n) not in {(2,2),(3,4),(4,3),(4,4)} we prove that the quotient complexity of L o L' is mn if and only if either (a) m is not equal to n or (b) m=n and the bases (ordered pairs of generators) of S_m and S_n are not conjugate. For (m,n)\in {(2,2),(3,4),(4,3),(4,4)} we give examples to show that this need not hold. In proving these results we generalize the notion of uniform minimality to direct products of automata. We also establish a non-trivial connection between complexity of boolean operations and group theory.
Submitted 7 October, 2013;
originally announced October 2013.
-
Incomplete Transition Complexity of Basic Operations on Finite Languages
Authors:
Eva Maia,
Nelma Moreira,
Rogério Reis
Abstract:
The state complexity of basic operations on finite languages (considering complete DFAs) has been studied in the literature. In this paper we study the incomplete (deterministic) state and transition complexity on finite languages of boolean operations, concatenation, star, and reversal. For all operations we give tight upper bounds for both description measures. We correct the published state complexity of concatenation for complete DFAs and provide a tight upper bound for the case when the right automaton is larger than the left one. For all binary operations the tightness is proved using families of languages with a variable alphabet size. In general the operational complexities depend not only on the complexities of the operands but also on other refined measures.
Submitted 4 February, 2013;
originally announced February 2013.
-
Deciding KAT and Hoare Logic with Derivatives
Authors:
Ricardo Almeida,
Sabine Broda,
Nelma Moreira
Abstract:
Kleene algebra with tests (KAT) is an equational system for program verification, which is the combination of Boolean algebra (BA) and Kleene algebra (KA), the algebra of regular expressions. In particular, KAT subsumes the propositional fragment of Hoare logic (PHL), which is a formal system for the specification and verification of programs and is currently the basis of most tools for checking program correctness. Both the equational theory of KAT and the encoding of PHL in KAT are known to be decidable. In this paper we present a new decision procedure for the equivalence of two KAT expressions based on the notion of partial derivatives. We also introduce the notion of derivative modulo particular sets of equations. With this we extend the previous procedure for deciding PHL. Some experimental results are also presented.
Submitted 8 October, 2012;
originally announced October 2012.
-
Small NFAs from Regular Expressions: Some Experimental Results
Authors:
Hugo Gouveia,
Nelma Moreira,
Rogério Reis
Abstract:
Regular expressions (REs), because of their succinctness and clear syntax, are the common choice to represent regular languages. However, efficient pattern matching or word recognition depends on the size of the equivalent nondeterministic finite automata (NFAs). We present the implementation of several algorithms for constructing small epsilon-free NFAs from REs within the FAdo system, and a comparison of regular expression measures and NFA sizes based on experimental results obtained from uniformly random generated REs. For this analysis, nonredundant REs and reduced REs in star normal form were considered.
Submitted 18 September, 2010;
originally announced September 2010.
-
State Elimination Ordering Strategies: Some Experimental Results
Authors:
Nelma Moreira,
Davide Nabais,
Rogério Reis
Abstract:
Recently, the problem of obtaining a short regular expression equivalent to a given finite automaton has been intensively investigated. Algorithms for converting finite automata to regular expressions have an exponential blow-up in the worst-case. To overcome this, simple heuristic methods have been proposed.
In this paper we analyse some of the heuristics presented in the literature and propose new ones. We also present some experimental comparative results based on uniformly random generated deterministic finite automata.
Submitted 10 August, 2010;
originally announced August 2010.
-
Exact generation of acyclic deterministic finite automata
Authors:
Marco Almeida,
Nelma Moreira,
Rogério Reis
Abstract:
We give a canonical representation for trim acyclic deterministic finite automata (Adfa) with n states over an alphabet of k symbols. Using this normal form, we present a backtracking algorithm for the exact generation of Adfas. This algorithm is a nontrivial adaptation of the algorithm for the exact generation of minimal acyclic deterministic finite automata, presented by Almeida et al.
Submitted 23 August, 2009;
originally announced August 2009.
-
Testing the Equivalence of Regular Languages
Authors:
Marco Almeida,
Nelma Moreira,
Rogério Reis
Abstract:
The minimal deterministic finite automaton is generally used to determine the equality of regular languages. Antimirov and Mosses proposed a rewrite system for deciding the equivalence of regular expressions, of which Almeida et al. presented an improved variant. Hopcroft and Karp proposed an almost linear algorithm for testing the equivalence of two deterministic finite automata that avoids minimisation. In this paper we improve the best-case running time, present an extension of this algorithm to non-deterministic finite automata, and establish a relationship between this algorithm and the one proposed in Almeida et al. We also present some experimental comparative results. All these algorithms are closely related to the recent coalgebraic approach to automata proposed by Rutten.
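For reference, the classic Hopcroft-Karp idea for complete DFAs can be sketched with a union-find structure over the disjoint union of the two state sets. Merge the initial states, propagate the merge along each symbol, and fail as soon as a merged pair disagrees on finality. This is a generic textbook sketch in Python, not the improved or NFA-extended versions described above; the DFA encoding and function names are assumptions.

```python
def hk_equivalent(dfa1, dfa2, alphabet):
    """Hopcroft-Karp style equivalence test for two complete DFAs over the same alphabet.

    Each DFA is (initial, finals, delta) with delta[(state, symbol)] -> state.
    States are tagged 1 or 2 so both automata share one union-find; no minimisation.
    """
    (i1, f1, d1), (i2, f2, d2) = dfa1, dfa2
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    union((1, i1), (2, i2))
    stack = [(i1, i2)]
    while stack:
        p, q = stack.pop()
        if (p in f1) != (q in f2):
            return False  # some word leads to acceptance in only one DFA
        for a in alphabet:
            p2, q2 = d1[(p, a)], d2[(q, a)]
            if find((1, p2)) != find((2, q2)):
                union((1, p2), (2, q2))
                stack.append((p2, q2))
    return True
```

For example, "even number of a's" as a 2-state counter is equivalent to a 4-state counter modulo 4 with final states {0, 2}. It is not equivalent to a 3-state counter modulo 3.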
Submitted 29 July, 2009;
originally announced July 2009.
-
Aspects of enumeration and generation with a string automata representation
Authors:
Marco Almeida,
Nelma Moreira,
Rogério Reis
Abstract:
In general, the representation of combinatorial objects is decisive for the feasibility of several enumerative tasks. In this work, we show how a (unique) string representation for (complete) initially-connected deterministic automata (ICDFAs) with n states over an alphabet of k symbols can be used for counting, exact enumeration, sampling and optimal coding, not only the set of ICDFAs but, to some extent, the set of regular languages. An exact generation algorithm can be used to partition the set of ICDFAs in order to parallelize the counting of minimal automata (and thus of regular languages). We also present a uniform random generator for ICDFAs that uses a table of pre-calculated values. Based on the same table it is also possible to obtain an optimal coding for ICDFAs.
Submitted 21 June, 2009;
originally announced June 2009.
-
On the Representation of Finite Automata
Authors:
Rogério Reis,
Nelma Moreira,
Marco Almeida
Abstract:
We give a unique string representation, up to isomorphism, for initially connected deterministic finite automata (ICDFAs) with n states over an alphabet of k symbols. We show how to generate all these strings for each n and k, and how their enumeration provides an alternative way to obtain the exact number of ICDFAs.
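The idea behind such a string representation can be sketched as follows, assuming complete ICDFAs and a fixed ordering of the alphabet. Renumber the states in breadth-first order from the initial state, then list the targets of all transitions in that order. Isomorphic automata yield the same list. The exact encoding in the paper may differ (for instance, in how final states are handled), and the function name is hypothetical.

```python
from collections import deque

def canonical_string(initial, delta, alphabet):
    """Integer list representing a complete ICDFA up to isomorphism.

    States are renumbered in breadth-first order from `initial`, visiting symbols in
    the given alphabet order; the output lists, state by state and symbol by symbol,
    the new number of each transition's target. Final states are ignored here.
    """
    number = {initial: 0}
    queue = deque([initial])
    out = []
    while queue:
        q = queue.popleft()
        for a in alphabet:
            t = delta[(q, a)]
            if t not in number:
                number[t] = len(number)  # next unused number, in discovery order
                queue.append(t)
            out.append(number[t])
    return out
```

For example, take the two-state automaton with transitions p -a-> q, p -b-> p, q -a-> p, q -b-> q and initial state p. It is encoded as `[1, 0, 0, 1]`, whatever the original state names are.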
Submitted 13 June, 2009;
originally announced June 2009.
-
Constraint Categorial Grammars
Authors:
Luis Damas,
Nelma Moreira
Abstract:
Although unification can be used to implement a weak form of $β$-reduction, several linguistic phenomena are better handled by using some form of $λ$-calculus. In this paper we present a higher order feature description calculus based on a typed $λ$-calculus. We show how the techniques used in CLG for resolving complex feature constraints can be efficiently extended. CCLG is a simple formalism, based on categorial grammars, designed to test the practical feasibility of such a calculus.
Submitted 4 July, 1995;
originally announced July 1995.