$k$-Universality of Regular Languages

Authors: Duncan Adamson, Pamela Fleischmann, Annika Huch, Tore Koß, Florin Manea, Dirk Nowotka

Abstract: A subsequence of a word $w$ is a word $u$ such that $u = w[i_1] w[i_2] \dots w[i_{k}]$, for some set of indices $1 \leq i_1 < i_2 < \dots < i_k \leq \lvert w\rvert$. A word $w$ is $k$-subsequence universal over an alphabet $Σ$ if every word in $Σ^k$ appears in $w$ as a subsequence. In this paper, we study the intersection between the set of $k$-subsequence universal words over some alphabet $Σ$ and regular languages over $Σ$. We call a regular language $L$ \emph{$k$-$\exists$-subsequence universal} if there exists a $k$-subsequence universal word in $L$, and \emph{$k$-$\forall$-subsequence universal} if every word of $L$ is $k$-subsequence universal. We give algorithms solving the problems of deciding if a given regular language, represented by a finite automaton recognising it, is \emph{$k$-$\exists$-subsequence universal} and, respectively, if it is \emph{$k$-$\forall$-subsequence universal}, for a given $k$. The algorithms are FPT w.r.t.~the size of the input alphabet, and their run-time does not depend on $k$; they run in polynomial time in the number $n$ of states of the input automaton when the size of the input alphabet is $O(\log n)$. Moreover, we show that the problem of deciding if a given regular language is \emph{$k$-$\exists$-subsequence universal} is NP-complete, when the language is over a large alphabet.
Further, we provide algorithms for counting the number of $k$-subsequence universal words (paths) accepted by a given deterministic (respectively, nondeterministic) finite automaton, and ranking an input word (path) within the set of $k$-subsequence universal words accepted by a given finite automaton.
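
The definition of $k$-subsequence universality for a single word can be illustrated with a short sketch. It is not one of the automaton algorithms from the paper; it uses the well-known greedy arch factorisation, which yields the largest $k$ for which a word is $k$-subsequence universal. The function name `universality_index` and the choice to skip letters outside the alphabet are assumptions made for illustration.

```python
def universality_index(w: str, alphabet: set) -> int:
    """Return the largest k such that w is k-subsequence universal
    over `alphabet`, using the greedy arch factorisation.

    Assumption (illustrative): letters of w outside `alphabet` are skipped.
    """
    if not alphabet:
        raise ValueError("alphabet must be non-empty")
    seen = set()      # letters seen in the current, still incomplete arch
    arches = 0        # number of completed arches so far
    for ch in w:
        if ch not in alphabet:
            continue
        seen.add(ch)
        if len(seen) == len(alphabet):
            # Every letter has occurred: the current arch is complete.
            arches += 1
            seen.clear()
    return arches


def is_k_universal(w: str, alphabet: set, k: int) -> bool:
    """w is k-subsequence universal iff it contains at least k arches."""
    return universality_index(w, alphabet) >= k
```

For example, `"abba"` over `{a, b}` splits into the arches `ab` and `ba`, so it is 2-subsequence universal, while `"aabb"` has only one arch and does not contain `ba` as a subsequence.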

Submitted 17 November, 2023; originally announced November 2023.

arXiv:2308.08374 [pdf, other]

Matching Patterns with Variables Under Simon's Congruence

Authors: Pamela Fleischmann, Sungmin Kim, Tore Koß, Florin Manea, Dirk Nowotka, Stefan Siemer, Max Wiedenhöft

Abstract: We introduce and investigate a series of matching problems for patterns with variables under Simon's congruence. Our results provide a thorough picture of these problems' computational complexity.

Submitted 16 August, 2023; originally announced August 2023.

ACM Class: F.4.3; E.1

arXiv:2304.05270 [pdf, ps, other]

Longest Common Subsequence with Gap Constraints

Authors: Duncan Adamson, Maria Kosche, Tore Koß, Florin Manea, Stefan Siemer

Abstract: We consider the longest common subsequence problem in the context of subsequences with gap constraints. In particular, following Day et al. 2022, we consider the setting when the distance (i.e., the gap) between two consecutive symbols of the subsequence has to be between a lower and an upper bound (which may depend on the position of those symbols in the subsequence or on the symbols bordering the gap) as well as the case where the entire subsequence is found in a bounded range (defined by a single upper bound), considered by Kosche et al. 2022. In all these cases, we present efficient algorithms for determining the length of the longest common constrained subsequence between two given strings.

Submitted 2 June, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

arXiv:2208.14722 [pdf, ps, other]

Combinatorial Algorithms for Subsequence Matching: A Survey

Authors: Maria Kosche, Tore Koß, Florin Manea, Stefan Siemer

Abstract: In this paper we provide an overview of a series of recent results regarding algorithms for searching for subsequences in words or for the analysis of the sets of subsequences occurring in a word.

Submitted 10 October, 2022; v1 submitted 31 August, 2022; originally announced August 2022.

Comments: This is a revised version of the paper with the same title which appeared in the Proceedings of NCMA 2022, EPTCS 367, 2022, pp. 11-27 (DOI: 10.4204/EPTCS.367.2). The revision consists in citing a series of relevant references which were not covered in the initial version, and commenting on how they relate to the results we survey. arXiv admin note: text overlap with arXiv:2206.13896

arXiv:2207.09201 [pdf, ps, other]

Subsequences in Bounded Ranges: Matching and Analysis Problems

Authors: Maria Kosche, Tore Koß, Florin Manea, Viktoriya Pak

Abstract: In this paper, we consider a variant of the classical algorithmic problem of checking whether a given word $v$ is a subsequence of another word $w$. More precisely, we consider the problem of deciding, given a number $p$ (defining a range-bound) and two words $v$ and $w$, whether there exists a factor $w[i:i+p-1]$ (or, in other words, a range of length $p$) of $w$ having $v$ as subsequence (i.e., $v$ occurs as a subsequence in the bounded range $w[i:i+p-1]$). We give matching upper and lower quadratic bounds for the time complexity of this problem. Further, we consider a series of algorithmic problems in this setting, in which, for given integers $k$, $p$ and a word $w$, we analyse the set $p$-Subseq$_{k}(w)$ of all words of length $k$ which occur as subsequence of some factor of length $p$ of $w$. Among these, we consider the $k$-universality problem, the $k$-equivalence problem, as well as problems related to absent subsequences. Surprisingly, unlike the case of the classical model of subsequences in words where such problems have efficient solutions in general, we show that most of these problems become intractable in the new setting when subsequences in bounded ranges are considered. Finally, we provide an example of how some of our results can be applied to subsequence matching problems for circular words.
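
The basic matching problem described above can be sketched as a straightforward window scan. This is a naive baseline written for illustration, not the algorithm from the paper; the function names are hypothetical, indices are 0-based, and, as an assumption, a word shorter than $p$ is treated as a single window.

```python
def is_subsequence(v: str, w: str) -> bool:
    """Greedy left-to-right test: does v occur as a subsequence of w?"""
    it = iter(w)
    # `ch in it` consumes the iterator up to and including the first match.
    return all(ch in it for ch in v)


def occurs_in_bounded_range(v: str, w: str, p: int) -> bool:
    """Check whether some factor of w of length p has v as a subsequence.

    Assumption (illustrative): if len(w) < p, the whole word is the only window.
    """
    if p < 0:
        raise ValueError("p must be non-negative")
    last_start = max(1, len(w) - p + 1)
    return any(is_subsequence(v, w[i:i + p]) for i in range(last_start))
```

Each of the at most $|w|$ windows is scanned in $O(p)$ time, so the sketch runs in $O(|w| \cdot p)$ time overall.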

Submitted 22 September, 2022; v1 submitted 19 July, 2022; originally announced July 2022.

Comments: Extended version of a paper which will appear in the proceedings of the 16th International Conference on Reachability Problems, RP 2022

arXiv:2108.13968 [pdf, other]

Absent Subsequences in Words

Authors: Maria Kosche, Tore Koß, Florin Manea, Stefan Siemer

Abstract: An absent factor of a string $w$ is a string $u$ which does not occur as a contiguous substring (a.k.a. factor) inside $w$. We extend this well-studied notion and define absent subsequences: a string $u$ is an absent subsequence of a string $w$ if $u$ does not occur as subsequence (a.k.a. scattered factor) inside $w$. Of particular interest to us are minimal absent subsequences, i.e., absent subsequences whose every subsequence is not absent, and shortest absent subsequences, i.e., absent subsequences of minimal length. We show a series of combinatorial and algorithmic results regarding these two notions. For instance: we give combinatorial characterisations of the sets of minimal and, respectively, shortest absent subsequences in a word, as well as compact representations of these sets; we show how we can test efficiently if a string is a shortest or minimal absent subsequence in a word, and we give efficient algorithms computing the lexicographically smallest absent subsequence of each kind; also, we show how a data structure for answering shortest absent subsequence-queries for the factors of a given string can be efficiently computed.
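
As a small illustration of the notion of a shortest absent subsequence, the sketch below builds one such word from the greedy arch factorisation: it takes the letter that completes each arch and then appends a letter missing from the remaining suffix. It returns some shortest absent subsequence, not necessarily the lexicographically smallest one, and it is not presented as the construction used in the paper. The function name and the assumption that every letter of `w` belongs to `alphabet` are illustrative.

```python
def shortest_absent_subsequence(w: str, alphabet: set) -> str:
    """Return one shortest word over `alphabet` that is NOT a subsequence of w.

    Assumption (illustrative): every letter of w belongs to `alphabet`.
    """
    if not alphabet:
        raise ValueError("alphabet must be non-empty")
    seen = set()   # letters seen in the current, still incomplete arch
    result = []    # the letter completing each arch, in order
    for ch in w:
        seen.add(ch)
        if len(seen) == len(alphabet):
            # ch occurs in this arch only at its last position.
            result.append(ch)
            seen.clear()
    # The rest after the last complete arch misses at least one letter.
    missing = next(c for c in sorted(alphabet) if c not in seen)
    result.append(missing)
    return "".join(result)
```

The returned word has length one more than the number of arches: any leftmost embedding is forced to the end of each arch in turn and then fails on the final letter.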

Submitted 11 October, 2023; v1 submitted 31 August, 2021; originally announced August 2021.

Comments: An extended abstract appeared in the proceedings of the 15th International Conference on Reachability Problems RP2021

Journal ref: Fundamenta Informaticae, Volume 189, Issues 3-4: Reachability Problems 2020 and 2021 (October 14, 2023) fi:9221

arXiv:2007.09192 [pdf, ps, other]

The Edit Distance to $k$-Subsequence Universality

Authors: Pamela Fleischmann, Maria Kosche, Tore Koß, Florin Manea, Stefan Siemer

Abstract: A word $u$ is a subsequence of another word $w$ if $u$ can be obtained from $w$ by deleting some of its letters. The word $w$ with alph$(w)=Σ$ is called $k$-subsequence universal if the set of subsequences of length $k$ of $w$ contains all possible words of length $k$ over $Σ$. We propose a series of efficient algorithms computing the minimal number of edit operations (insertion, deletion, substitution) one needs to apply to a given word in order to reach the set of $k$-subsequence universal words.

Submitted 17 July, 2020; originally announced July 2020.

arXiv:2005.01112 [pdf, other]

doi 10.4230/LIPIcs.STACS.2021.34

Efficiently Testing Simon's Congruence

Authors: Pawel Gawrychowski, Maria Kosche, Tore Koss, Florin Manea, Stefan Siemer

Abstract: Simon's congruence $\sim_k$ is defined as follows: two words are $\sim_k$-equivalent if they have the same set of subsequences of length at most $k$. We propose an algorithm which computes, given two words $s$ and $t$, the largest $k$ for which $s\sim_k t$. Our algorithm runs in linear time $O(|s|+|t|)$ when the input words are over the integer alphabet $\{1,\ldots,|s|+|t|\}$ (or other alphabets which can be sorted in linear time). This approach leads to an optimal algorithm in the case of general alphabets as well. Our results are based on a novel combinatorial approach and a series of efficient data structures.
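
To make the definition of $\sim_k$ concrete, here is a brute-force sketch that enumerates all subsequences of length at most $k$ and compares the two sets. It takes exponential time and is meant only for experimenting with short words; it is not the linear-time algorithm described in the abstract. The function names, and the convention of returning `None` when the two words are equal (and hence equivalent for every $k$), are assumptions made for illustration.

```python
from itertools import combinations


def subsequences_up_to(w: str, k: int) -> set:
    """All distinct subsequences of w of length at most k (including the empty word)."""
    return {"".join(c) for r in range(k + 1) for c in combinations(w, r)}


def simon_equivalent(s: str, t: str, k: int) -> bool:
    """s ~_k t iff s and t have the same subsequences of length at most k."""
    return subsequences_up_to(s, k) == subsequences_up_to(t, k)


def largest_simon_k(s: str, t: str):
    """Largest k with s ~_k t, or None if s == t (equivalent for every k)."""
    if s == t:
        return None
    k = 0
    # For s != t the loop stops at the latest when k + 1 reaches max(len(s), len(t)).
    while simon_equivalent(s, t, k + 1):
        k += 1
    return k
```

For instance, `"ab"` and `"ba"` share the subsequences of length at most 1 but differ on length 2, so the largest $k$ is 1.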

Submitted 15 March, 2021; v1 submitted 3 May, 2020; originally announced May 2020.

Showing 1–8 of 8 results for author: Koss, T