
State Complexity of Pattern Matching in Regular Languages

Authors: Janusz A. Brzozowski, Sylvie Davies, Abhishek Madan

Abstract: In a simple pattern matching problem one has a pattern $w$ and a text $t$, which are words over a finite alphabet $\Sigma$. One may ask whether $w$ occurs in $t$, and if so, where? More generally, we may have a set $P$ of patterns and a set $T$ of texts, where $P$ and $T$ are regular languages. We are interested whether any word of $T$ begins with a word of $P$, ends with a word of $P$, has a word of $P$ as a factor, or has a word of $P$ as a subsequence. Thus we are interested in the languages $(P\Sigma^*)\cap T$, $(\Sigma^*P)\cap T$, $(\Sigma^* P\Sigma^*)\cap T$, and $(\Sigma^* \mathbin{\operatorname{shu}} P)\cap T$, where $\operatorname{shu}$ is the shuffle operation. The state complexity $\kappa(L)$ of a regular language $L$ is the number of states in the minimal deterministic finite automaton recognizing $L$. We derive the following upper bounds on the state complexities of our pattern-matching languages, where $\kappa(P)\le m$, and $\kappa(T)\le n$: $\kappa((P\Sigma^*)\cap T) \le mn$; $\kappa((\Sigma^*P)\cap T) \le 2^{m-1}n$; $\kappa((\Sigma^*P\Sigma^*)\cap T) \le (2^{m-2}+1)n$; and $\kappa((\Sigma^*\mathbin{\operatorname{shu}} P)\cap T) \le (2^{m-2}+1)n$. We prove that these bounds are tight, and that to meet them, the alphabet must have at least two letters in the first three cases, and at least $m-1$ letters in the last case. We also consider the special case where $P$ is a single word $w$, and obtain the following tight upper bounds: $\kappa((w\Sigma^*)\cap T_n) \le m+n-1$; $\kappa((\Sigma^*w)\cap T_n) \le (m-1)n-(m-2)$; $\kappa((\Sigma^*w\Sigma^*)\cap T_n) \le (m-1)n$; and $\kappa((\Sigma^*\mathbin{\operatorname{shu}} w)\cap T_n) \le (m-1)n$. For unary languages, we have a tight upper bound of $m+n-2$ in all eight of the aforementioned cases.

Submitted 4 November, 2018; v1 submitted 12 June, 2018; originally announced June 2018.

Comments: 30 pages, 17 figures
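
The single-word prefix bound $m+n-1$ from the abstract above can be checked on a small instance. The following is a minimal sketch, not the paper's construction: the helper `state_complexity`, the word `w = "ab"`, and the text language (words whose number of `a`s is congruent to $n-1$ modulo $n$) are all assumptions chosen for illustration. It counts the states of a minimal DFA by discarding unreachable states and then refining a partition of the rest (Moore's algorithm).

```python
def state_complexity(alphabet, start, delta, is_final):
    """Number of states of the minimal complete DFA equivalent to the given one."""
    # Step 1: collect the states reachable from the start state.
    reachable, stack = {start}, [start]
    while stack:
        q = stack.pop()
        for a in alphabet:
            r = delta(q, a)
            if r not in reachable:
                reachable.add(r)
                stack.append(r)
    # Step 2: Moore refinement, starting from the final / non-final split.
    block = {q: int(is_final(q)) for q in reachable}
    while True:
        ids, refined = {}, {}
        for q in reachable:
            sig = (block[q],) + tuple(block[delta(q, a)] for a in alphabet)
            refined[q] = ids.setdefault(sig, len(ids))
        if len(ids) == len(set(block.values())):
            return len(ids)  # partition is stable
        block = refined

# Illustrative instance (assumed, not taken from the paper).
SIGMA = "ab"
w = "ab"
n = 3
SINK = len(w) + 1

def step_word(q, a, absorbing):
    """DFA for {w} (absorbing=False) or for w Sigma^* (absorbing=True)."""
    if q == len(w):
        return q if absorbing else SINK
    if q < len(w) and w[q] == a:
        return q + 1
    return SINK

def step_text(q, a):
    """DFA for T: counts occurrences of 'a' modulo n."""
    return (q + 1) % n if a == "a" else q

kappa_w = state_complexity(SIGMA, 0, lambda q, a: step_word(q, a, False),
                           lambda q: q == len(w))
kappa_t = state_complexity(SIGMA, 0, step_text, lambda q: q == n - 1)
# Product automaton recognizing (w Sigma^*) ∩ T.
kappa_match = state_complexity(
    SIGMA, (0, 0),
    lambda s, a: (step_word(s[0], a, True), step_text(s[1], a)),
    lambda s: s[0] == len(w) and s[1] == n - 1)

print(kappa_w, kappa_t, kappa_match, kappa_w + kappa_t - 1)
```

Here $m$ is taken to be the state complexity of $\{w\}$ computed by the sketch itself, so the comparison `kappa_match <= kappa_w + kappa_t - 1` does not rely on a hand-derived value of $m$.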

arXiv:1711.09149 [pdf, ps, other]

Most Complex Deterministic Union-Free Regular Languages

Authors: Janusz A. Brzozowski, Sylvie Davies

Abstract: A regular language $L$ is union-free if it can be represented by a regular expression without the union operation. A union-free language is deterministic if it can be accepted by a deterministic one-cycle-free-path finite automaton; this is an automaton which has one final state and exactly one cycle-free path from any state to the final state. Jirásková and Masopust proved that the state complexities of the basic operations reversal, star, product, and boolean operations in deterministic union-free languages are exactly the same as those in the class of all regular languages. To prove that the bounds are met they used five types of automata, involving eight types of transformations of the set of states of the automata. We show that for each $n\ge 3$ there exists one ternary witness of state complexity $n$ that meets the bound for reversal and product. Moreover, the restrictions of this witness to binary alphabets meet the bounds for star and boolean operations. We also show that the tight upper bounds on the state complexity of binary operations that take arguments over different alphabets are the same as those for arbitrary regular languages. Furthermore, we prove that the maximal syntactic semigroup of a union-free language has $n^n$ elements, as in the case of regular languages, and that the maximal state complexities of atoms of union-free languages are the same as those for regular languages. Finally, we prove that there exists a most complex union-free language that meets the bounds for all these complexity measures. Altogether this proves that the complexity measures above cannot distinguish union-free languages from regular languages.

Submitted 2 January, 2018; v1 submitted 24 November, 2017; originally announced November 2017.

Comments: 12 pages, 3 Figures. This version corrects an error in the proof of Theorem 1 (7c). arXiv admin note: text overlap with arXiv:1701.03944

arXiv:1702.05024 [pdf, ps, other]

Towards a Theory of Complexity of Regular Languages

Authors: Janusz A. Brzozowski

Abstract: We survey recent results concerning the complexity of regular languages represented by their minimal deterministic finite automata. In addition to the quotient complexity of the language -- which is the number of its (left) quotients, and is the same as its state complexity -- we also consider the size of its syntactic semigroup and the quotient complexity of its atoms -- basic components of every regular language. We then turn to the study of the quotient/state complexity of common operations on regular languages: reversal, (Kleene) star, product (concatenation) and boolean operations. We examine relations among these complexity measures. We discuss several subclasses of regular languages defined by convexity. In many, but not all, cases there exist "most complex" languages, languages satisfying all these complexity measures.

Submitted 16 February, 2017; originally announced February 2017.

Comments: 27 pages, 4 figures

arXiv:1701.03944 [pdf, ps, other]

Most Complex Non-Returning Regular Languages

Authors: Janusz A. Brzozowski, Sylvie Davies

Abstract: A regular language $L$ is non-returning if in the minimal deterministic finite automaton accepting it there are no transitions into the initial state. Eom, Han and Jirásková derived upper bounds on the state complexity of boolean operations and Kleene star, and proved that these bounds are tight using two different binary witnesses. They derived upper bounds for concatenation and reversal using three different ternary witnesses. These five witnesses use a total of six different transformations. We show that for each $n\ge 4$ there exists a ternary witness of state complexity $n$ that meets the bound for reversal and that at least three letters are needed to meet this bound. Moreover, the restrictions of this witness to binary alphabets meet the bounds for product, star, and boolean operations. We also derive tight upper bounds on the state complexity of binary operations that take arguments with different alphabets. We prove that the maximal syntactic semigroup of a non-returning language has $(n-1)^n$ elements and requires at least $\binom{n}{2}$ generators. We find the maximal state complexities of atoms of non-returning languages. Finally, we show that there exists a most complex non-returning language that meets the bounds for all these complexity measures.

Submitted 14 January, 2017; originally announced January 2017.

Comments: 22 pages, 6 figures

arXiv:1509.06032 [pdf, ps, other]

Syntactic complexity of regular ideals

Authors: Janusz A. Brzozowski, Marek Szykuła, Yuli Ye

Abstract: The state complexity of a regular language is the number of states in a minimal deterministic finite automaton accepting the language. The syntactic complexity of a regular language is the cardinality of its syntactic semigroup. The syntactic complexity of a subclass of regular languages is the worst-case syntactic complexity taken as a function of the state complexity $n$ of languages in that class. We prove that $n^{n-1}$, $n^{n-1}+n-1$, and $n^{n-2}+(n-2)2^{n-2}+1$ are tight upper bounds on the syntactic complexities of right ideals and prefix-closed languages, left ideals and suffix-closed languages, and two-sided ideals and factor-closed languages, respectively. Moreover, we show that the transition semigroups meeting the upper bounds for all three types of ideals are unique, and the numbers of generators (4, 5, and 6, respectively) cannot be reduced.

Submitted 13 January, 2017; v1 submitted 20 September, 2015; originally announced September 2015.

Comments: 26 pages, 13 figures, 1 table. arXiv admin note: text overlap with arXiv:1403.2090
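
Syntactic complexity, as defined in the abstract above, can be computed for small automata by enumeration, since the syntactic semigroup of a regular language is isomorphic to the transition semigroup of its minimal DFA. The sketch below is an illustration under stated assumptions: the function `transition_semigroup` and the three generators on states $\{0,1,2\}$ are hypothetical choices, not the ideal witnesses from the paper. The generators (a cycle, a transposition, and a map merging two states) produce the full transformation semigroup, which has $n^n$ elements, the unrestricted regular-language maximum that the ideal bounds are measured against.

```python
def transition_semigroup(generators):
    """All transformations induced by non-empty words over the generators.

    Each transformation is a tuple t with t[q] = image of state q.
    """
    seen = set(generators)
    frontier = list(seen)
    while frontier:
        t = frontier.pop()
        for g in generators:
            # Compose: apply t first, then the generator g.
            u = tuple(g[x] for x in t)
            if u not in seen:
                seen.add(u)
                frontier.append(u)
    return seen

# Assumed example generators on n = 3 states.
cycle = (1, 2, 0)   # 0 -> 1 -> 2 -> 0
swap = (1, 0, 2)    # exchanges states 0 and 1
merge = (0, 0, 2)   # sends state 1 to state 0, fixes the rest

semigroup = transition_semigroup([cycle, swap, merge])
print(len(semigroup))
```

For $n = 3$ the enumeration should contain every one of the $3^3$ maps on three states. Replacing the generators with the letter actions of a specific minimal DFA would give that language's syntactic complexity instead.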

Showing 1–5 of 5 results for author: Brzozowski, J A