Search | arXiv e-print repository

Subsequences With Generalised Gap Constraints: Upper and Lower Complexity Bounds

Authors: Florin Manea, Jonas Richardsen, Markus L. Schmid

Abstract: For two strings u, v over some alphabet A, we investigate the problem of embedding u into w as a subsequence under the presence of generalised gap constraints. A generalised gap constraint is a triple (i, j, C_{i, j}), where 1 <= i < j <= |u| and C_{i, j} is a subset of A^*. Embedding u as a subsequence into v such that (i, j, C_{i, j}) is satisfied means that if u[i] and u[j] are mapped to v[k] a… ▽ More For two strings u, v over some alphabet A, we investigate the problem of embedding u into w as a subsequence under the presence of generalised gap constraints. A generalised gap constraint is a triple (i, j, C_{i, j}), where 1 <= i < j <= |u| and C_{i, j} is a subset of A^*. Embedding u as a subsequence into v such that (i, j, C_{i, j}) is satisfied means that if u[i] and u[j] are mapped to v[k] and v[l], respectively, then the induced gap v[k + 1..l - 1] must be a string from C_{i, j}. This generalises the setting recently investigated in [Day et al., ISAAC 2022], where only gap constraints of the form C_{i, i + 1} are considered, as well as the setting from [Kosche et al., RP 2022], where only gap constraints of the form C_{1, |u|} are considered. We show that subsequence matching under generalised gap constraints is NP-hard, and we complement this general lower bound with a thorough (parameterised) complexity analysis. Moreover, we identify several efficiently solvable subclasses that result from restricting the interval structure induced by the generalised gap constraints. △ Less

Submitted 16 April, 2024; originally announced April 2024.

arXiv:2401.17159 [pdf, other]

Layered and Staged Monte Carlo Tree Search for SMT Strategy Synthesis

Authors: Zhengyang Lu, Stefan Siemer, Piyush Jha, Joel Day, Florin Manea, Vijay Ganesh

Abstract: Modern SMT solvers, such as Z3, offer user-controllable strategies, enabling users to tailor solving strategies for their unique set of instances, thus dramatically enhancing solver performance for their use case. However, this approach of strategy customization presents a significant challenge: handcrafting an optimized strategy for a class of SMT instances remains a complex and demanding task fo… ▽ More Modern SMT solvers, such as Z3, offer user-controllable strategies, enabling users to tailor solving strategies for their unique set of instances, thus dramatically enhancing solver performance for their use case. However, this approach of strategy customization presents a significant challenge: handcrafting an optimized strategy for a class of SMT instances remains a complex and demanding task for both solver developers and users alike. In this paper, we address this problem of automatic SMT strategy synthesis via a novel Monte Carlo Tree Search (MCTS) based method. Our method treats strategy synthesis as a sequential decision-making process, whose search tree corresponds to the strategy space, and employs MCTS to navigate this vast search space. The key innovations that enable our method to identify effective strategies, while kee** costs low, are the ideas of layered and staged MCTS search. These novel heuristics allow for a deeper and more efficient exploration of the strategy space, enabling us to synthesize more effective strategies than the default ones in state-of-the-art (SOTA) SMT solvers. We implement our method, dubbed Z3alpha, as part of the Z3 SMT solver. Through extensive evaluations across six important SMT logics, Z3alpha demonstrates superior performance compared to the SOTA synthesis tool FastSMT, the default Z3 solver, and the CVC5 solver on most benchmarks. Remarkably, on a challenging QF_BV benchmark set, Z3alpha solves 42.7% more instances than the default strategy in the Z3 SMT solver. △ Less

Submitted 30 April, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

Comments: Accepted at IJCAI 2024

arXiv:2401.02163 [pdf, other]

Enumerating m-Length Walks in Directed Graphs with Constant Delay

Authors: Duncan Adamson, Pawel Gawrychowski, Florin Manea

Abstract: In this paper, we provide a novel enumeration algorithm for the set of all walks of a given length within a directed graph. Our algorithm has worst-case constant delay between outputting succinct representations of such walks, after a preprocessing step requiring linear time relative to the size of the graph. We apply these results to the problem of enumerating succinct representations of the stri… ▽ More In this paper, we provide a novel enumeration algorithm for the set of all walks of a given length within a directed graph. Our algorithm has worst-case constant delay between outputting succinct representations of such walks, after a preprocessing step requiring linear time relative to the size of the graph. We apply these results to the problem of enumerating succinct representations of the strings of a given length from a prefix-closed regular language (languages accepted by a finite automaton which has final states only). △ Less

Submitted 5 January, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

arXiv:2311.10658 [pdf, other]

$k$-Universality of Regular Languages

Authors: Duncan Adamson, Pamela Fleischmann, Annika Huch, Tore Koß, Florin Manea, Dirk Nowotka

Abstract: A subsequence of a word $w$ is a word $u$ such that $u = w[i_1] w[i_2] \dots w[i_{k}]$, for some set of indices $1 \leq i_1 < i_2 < \dots < i_k \leq \lvert w\rvert$. A word $w$ is $k$-subsequence universal over an alphabet $Σ$ if every word in $Σ^k$ appears in $w$ as a subsequence. In this paper, we study the intersection between the set of $k$-subsequence universal words over some alphabet $Σ$ an… ▽ More A subsequence of a word $w$ is a word $u$ such that $u = w[i_1] w[i_2] \dots w[i_{k}]$, for some set of indices $1 \leq i_1 < i_2 < \dots < i_k \leq \lvert w\rvert$. A word $w$ is $k$-subsequence universal over an alphabet $Σ$ if every word in $Σ^k$ appears in $w$ as a subsequence. In this paper, we study the intersection between the set of $k$-subsequence universal words over some alphabet $Σ$ and regular languages over $Σ$. We call a regular language $L$ \emph{$k$-$\exists$-subsequence universal} if there exists a $k$-subsequence universal word in $L$, and \emph{$k$-$\forall$-subsequence universal} if every word of $L$ is $k$-subsequence universal. We give algorithms solving the problems of deciding if a given regular language, represented by a finite automaton recognising it, is \emph{$k$-$\exists$-subsequence universal} and, respectively, if it is \emph{$k$-$\forall$-subsequence universal}, for a given $k$. The algorithms are FPT w.r.t.~the size of the input alphabet, and their run-time does not depend on $k$; they run in polynomial time in the number $n$ of states of the input automaton when the size of the input alphabet is $O(\log n)$. Moreover, we show that the problem of deciding if a given regular language is \emph{$k$-$\exists$-subsequence universal} is NP-complete, when the language is over a large alphabet. Further, we provide algorithms for counting the number of $k$-subsequence universal words (paths) accepted by a given deterministic (respectively, nondeterministic) finite automaton, and ranking an input word (path) within the set of $k$-subsequence universal words accepted by a given finite automaton. △ Less

Submitted 17 November, 2023; originally announced November 2023.

arXiv:2308.08374 [pdf, other]

Matching Patterns with Variables Under Simon's Congruence

Authors: Pamela Fleischmann, Sungmin Kim, Tore Koß, Florin Manea, Dirk Nowotka, Stefan Siemer, Max Wiedenhöft

Abstract: We introduce and investigate a series of matching problems for patterns with variables under Simon's congruence. Our results provide a thorough picture of these problems' computational complexity. We introduce and investigate a series of matching problems for patterns with variables under Simon's congruence. Our results provide a thorough picture of these problems' computational complexity. △ Less

Submitted 16 August, 2023; originally announced August 2023.

ACM Class: F.4.3; E.1

arXiv:2306.14593 [pdf, other]

Semënov Arithmetic, Affine VASS, and String Constraints

Authors: Andrei Draghici, Christoph Haase, Florin Manea

Abstract: We study extensions of Semënov arithmetic, the first-order theory of the structure $(\mathbb{N}, +, 2^x)$. It is well-knonw that this theory becomes undecidable when extended with regular predicates over tuples of number strings, such as the Büchi $V_2$-predicate. We therefore restrict ourselves to the existential theory of Semënov arithmetic and show that this theory is decidable in EXPSPACE when… ▽ More We study extensions of Semënov arithmetic, the first-order theory of the structure $(\mathbb{N}, +, 2^x)$. It is well-knonw that this theory becomes undecidable when extended with regular predicates over tuples of number strings, such as the Büchi $V_2$-predicate. We therefore restrict ourselves to the existential theory of Semënov arithmetic and show that this theory is decidable in EXPSPACE when extended with arbitrary regular predicates over tuples of number strings. Our approach relies on a reduction to the language emptiness problem for a restricted class of affine vector addition systems with states, which we show decidable in EXPSPACE. As an application of our results, we settle an open problem from the literature and show decidability of a class of string constraints involving length constraints. △ Less

Submitted 26 June, 2023; originally announced June 2023.

Comments: 19 pages, one figure

arXiv:2304.05270 [pdf, ps, other]

Longest Common Subsequence with Gap Constraints

Authors: Duncan Adamson, Maria Kosche, Tore Koß, Florin Manea, Stefan Siemer

Abstract: We consider the longest common subsequence problem in the context of subsequences with gap constraints. In particular, following Day et al. 2022, we consider the setting when the distance (i. e., the gap) between two consecutive symbols of the subsequence has to be between a lower and an upper bound (which may depend on the position of those symbols in the subsequence or on the symbols bordering t… ▽ More We consider the longest common subsequence problem in the context of subsequences with gap constraints. In particular, following Day et al. 2022, we consider the setting when the distance (i. e., the gap) between two consecutive symbols of the subsequence has to be between a lower and an upper bound (which may depend on the position of those symbols in the subsequence or on the symbols bordering the gap) as well as the case where the entire subsequence is found in a bounded range (defined by a single upper bound), considered by Kosche et al. 2022. In all these cases, we present effcient algorithms for determining the length of the longest common constrained subsequence between two given strings. △ Less

Submitted 2 June, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

arXiv:2208.14722 [pdf, ps, other]

Combinatorial Algorithms for Subsequence Matching: A Survey

Authors: Maria Kosche, Tore Koß, Florin Manea, Stefan Siemer

Abstract: In this paper we provide an overview of a series of recent results regarding algorithms for searching for subsequences in words or for the analysis of the sets of subsequences occurring in a word. In this paper we provide an overview of a series of recent results regarding algorithms for searching for subsequences in words or for the analysis of the sets of subsequences occurring in a word. △ Less

Submitted 10 October, 2022; v1 submitted 31 August, 2022; originally announced August 2022.

Comments: This is a revised version of the paper with the same title which appeared in the Proceedings of NCMA 2022, EPTCS 367, 2022, pp. 11-27 (DOI: 10.4204/EPTCS.367.2). The revision consists in citing a series of relevant references which were not covered in the initial version, and commenting on how they relate to the results we survey. arXiv admin note: text overlap with arXiv:2206.13896

arXiv:2208.08806 [pdf, other]

A Generic Information Extraction System for String Constraints

Authors: Joel D. Day, Adrian Kröger, Mitja Kulczynski, Florin Manea, Dirk Nowotka, Danny Bøgsted Poulsen

Abstract: String constraint solving, and the underlying theory of word equations, are highly interesting research topics both for practitioners and theoreticians working in the wide area of satisfiability modulo theories. As string constraint solving algorithms, a.k.a. string solvers, gained a more prominent role in the formal analysis of string-heavy programs, especially in connection to symbolic code exec… ▽ More String constraint solving, and the underlying theory of word equations, are highly interesting research topics both for practitioners and theoreticians working in the wide area of satisfiability modulo theories. As string constraint solving algorithms, a.k.a. string solvers, gained a more prominent role in the formal analysis of string-heavy programs, especially in connection to symbolic code execution and security protocol verification, we can witness an ever-growing number of benchmarks collecting string solving instances from real-world applications as well as an ever-growing need for more efficient and reliable solvers, especially for the aforementioned real-world instances. Thus, it seems that the string solving area (and the developers, theoreticians, and end-users active in it) could greatly benefit from a better understanding and processing of the existing string solving benchmarks. In this context, we propose SMTQUERY: an SMT-LIB benchmark analysis tool for string constraints. SMTQUERY is implemented in Python 3, and offers a collection of analysis and information extraction tools for a comprehensive data base of string benchmarks (presented in SMT-LIB format), based on an SQL-centred language called QLANG. △ Less

Submitted 18 August, 2022; originally announced August 2022.

arXiv:2207.09201 [pdf, ps, other]

Subsequences in Bounded Ranges: Matching and Analysis Problems

Authors: Maria Kosche, Tore Koß, Florin Manea, Viktoriya Pak

Abstract: In this paper, we consider a variant of the classical algorithmic problem of checking whether a given word $v$ is a subsequence of another word $w$. More precisely, we consider the problem of deciding, given a number $p$ (defining a range-bound) and two words $v$ and $w$, whether there exists a factor $w[i:i+p-1]$ (or, in other words, a range of length $p$) of $w$ having $v$ as subsequence (i.\,e.… ▽ More In this paper, we consider a variant of the classical algorithmic problem of checking whether a given word $v$ is a subsequence of another word $w$. More precisely, we consider the problem of deciding, given a number $p$ (defining a range-bound) and two words $v$ and $w$, whether there exists a factor $w[i:i+p-1]$ (or, in other words, a range of length $p$) of $w$ having $v$ as subsequence (i.\,e., $v$ occurs as a subsequence in the bounded range $w[i:i+p-1]$). We give matching upper and lower quadratic bounds for the time complexity of this problem. Further, we consider a series of algorithmic problems in this setting, in which, for given integers $k$, $p$ and a word $w$, we analyse the set $p$-Subseq$_{k}(w)$ of all words of length $k$ which occur as subsequence of some factor of length $p$ of $w$. Among these, we consider the $k$-universality problem, the $k$-equivalence problem, as well as problems related to absent subsequences. Surprisingly, unlike the case of the classical model of subsequences in words where such problems have efficient solutions in general, we show that most of these problems become intractable in the new setting when subsequences in bounded ranges are considered. Finally, we provide an example of how some of our results can be applied to subsequence matching problems for circular words. △ Less

Submitted 22 September, 2022; v1 submitted 19 July, 2022; originally announced July 2022.

Comments: Extended version of a paper which will appear in the proceedings of the 16th International Conference on Reachability Problems, RP 2022

arXiv:2207.07477 [pdf, ps, other]

Matching Patterns with Variables Under Edit Distance

Authors: Paweł Gawrychowski, Florin Manea, Stefan Siemer

Abstract: A pattern $α$ is a string of variables and terminal letters. We say that $α$ matches a word $w$, consisting only of terminal letters, if $w$ can be obtained by replacing the variables of $α$ by terminal words. The matching problem, i.e., deciding whether a given pattern matches a given word, was heavily investigated: it is NP-complete in general, but can be solved efficiently for classes of patter… ▽ More A pattern $α$ is a string of variables and terminal letters. We say that $α$ matches a word $w$, consisting only of terminal letters, if $w$ can be obtained by replacing the variables of $α$ by terminal words. The matching problem, i.e., deciding whether a given pattern matches a given word, was heavily investigated: it is NP-complete in general, but can be solved efficiently for classes of patterns with restricted structure. If we are interested in what is the minimum Hamming distance between $w$ and any word $u$ obtained by replacing the variables of $α$ by terminal words (so matching under Hamming distance), one can devise efficient algorithms and matching conditional lower bounds for the class of regular patterns (in which no variable occurs twice), as well as for classes of patterns where we allow unbounded repetitions of variables, but restrict the structure of the pattern, i.e., the way the occurrences of different variables can be interleaved. Moreover, under Hamming distance, if a variable occurs more than once and its occurrences can be interleaved arbitrarily with those of other variables, even if each of these occurs just once, the matching problem is intractable. In this paper, we consider the problem of matching patterns with variables under edit distance. We still obtain efficient algorithms and matching conditional lower bounds for the class of regular patterns, but show that the problem becomes, in this case, intractable already for unary patterns, consisting of repeated occurrences of a single variable interleaved with terminals. △ Less

Submitted 15 July, 2022; originally announced July 2022.

arXiv:2206.13896 [pdf, other]

Subsequences With Gap Constraints: Complexity Bounds for Matching and Analysis Problems

Authors: Joel D. Day, Maria Kosche, Florin Manea, Markus L. Schmid

Abstract: We consider subsequences with gap constraints, i.e., length-k subsequences p that can be embedded into a string w such that the induced gaps (i.e., the factors of w between the positions to which p is mapped to) satisfy given gap constraints $gc = (C_1, C_2, ..., C_{k-1})$; we call p a gc-subsequence of w. In the case where the gap constraints gc are defined by lower and upper length bounds… ▽ More We consider subsequences with gap constraints, i.e., length-k subsequences p that can be embedded into a string w such that the induced gaps (i.e., the factors of w between the positions to which p is mapped to) satisfy given gap constraints $gc = (C_1, C_2, ..., C_{k-1})$; we call p a gc-subsequence of w. In the case where the gap constraints gc are defined by lower and upper length bounds $C_i = (L^-_i, L^+_i) \in \mathbb{N}^2$ and/or regular languages $C_i \in REG$, we prove tight (conditional on the orthogonal vectors (OV) hypothesis) complexity bounds for checking whether a given p is a gc-subsequence of a string w. We also consider the whole set of all gc-subsequences of a string, and investigate the complexity of the universality, equivalence and containment problems for these sets of gc-subsequences. △ Less

Submitted 28 June, 2022; originally announced June 2022.

arXiv:2205.00475 [pdf, other]

Formal Languages via Theories over Strings

Authors: Joel D. Day, Vijay Ganesh, Nathan Grewal, Florin Manea

Abstract: We investigate the properties of formal languages expressible in terms of formulas over quantifier-free theories of word equations, arithmetic over length constraints, and language membership predicates for the classes of regular, visibly pushdown, and deterministic context-free languages. In total, we consider 20 distinct theories and decidability questions for problems such as emptiness and univ… ▽ More We investigate the properties of formal languages expressible in terms of formulas over quantifier-free theories of word equations, arithmetic over length constraints, and language membership predicates for the classes of regular, visibly pushdown, and deterministic context-free languages. In total, we consider 20 distinct theories and decidability questions for problems such as emptiness and universality for formal languages over them. First, we discuss their relative expressive power and observe a rough division into two hierarchies based on whether or not word equations are present. Second, we consider the decidability status of several important decision problems, such as emptiness and universality. Note that the emptiness problem is equivalent to the satisfiability problem over the corresponding theory. Third, we consider the problem of whether a language in one theory is expressible in another and show several negative results in which this problem is undecidable. These results are particularly relevant in the context of normal forms in both practical and theoretical aspects of string solving. △ Less

Submitted 1 May, 2022; originally announced May 2022.

arXiv:2108.13968 [pdf, other]

Absent Subsequences in Words

Authors: Maria Kosche, Tore Koß, Florin Manea, Stefan Siemer

Abstract: An absent factor of a string $w$ is a string $u$ which does not occur as a contiguous substring (a.k.a. factor) inside $w$. We extend this well-studied notion and define absent subsequences: a string $u$ is an absent subsequence of a string $w$ if $u$ does not occur as subsequence (a.k.a. scattered factor) inside $w$. Of particular interest to us are minimal absent subsequences, i.e., absent subse… ▽ More An absent factor of a string $w$ is a string $u$ which does not occur as a contiguous substring (a.k.a. factor) inside $w$. We extend this well-studied notion and define absent subsequences: a string $u$ is an absent subsequence of a string $w$ if $u$ does not occur as subsequence (a.k.a. scattered factor) inside $w$. Of particular interest to us are minimal absent subsequences, i.e., absent subsequences whose every subsequence is not absent, and shortest absent subsequences, i.e., absent subsequences of minimal length. We show a series of combinatorial and algorithmic results regarding these two notions. For instance: we give combinatorial characterisations of the sets of minimal and, respectively, shortest absent subsequences in a word, as well as compact representations of these sets; we show how we can test efficiently if a string is a shortest or minimal absent subsequence in a word, and we give efficient algorithms computing the lexicographically smallest absent subsequence of each kind; also, we show how a data structure for answering shortest absent subsequence-queries for the factors of a given string can be efficiently computed. △ Less

Submitted 11 October, 2023; v1 submitted 31 August, 2021; originally announced August 2021.

Comments: An extended abstract appeared in the proceedings of the 15th International Conference on Reachability Problems RP2021

Journal ref: Fundamenta Informaticae, Volume 189, Issues 3-4: Reachability Problems 2020 and 2021 (October 14, 2023) fi:9221

arXiv:2106.06249 [pdf, other]

Matching Patterns with Variables under Hamming Distance

Authors: Paweł Gawrychowski, Florin Manea, Stefan Siemer

Abstract: A pattern $α$ is a string of variables and terminal letters. We say that $α$ matches a word $w$, consisting only of terminal letters, if $w$ can be obtained by replacing the variables of $α$ by terminal words. The matching problem, i.e., deciding whether a given pattern matches a given word, was heavily investigated: it is NP-complete in general, but can be solved efficiently for classes of patter… ▽ More A pattern $α$ is a string of variables and terminal letters. We say that $α$ matches a word $w$, consisting only of terminal letters, if $w$ can be obtained by replacing the variables of $α$ by terminal words. The matching problem, i.e., deciding whether a given pattern matches a given word, was heavily investigated: it is NP-complete in general, but can be solved efficiently for classes of patterns with restricted structure. In this paper, we approach this problem in a generalized setting, by considering approximate pattern matching under Hamming distance. More precisely, we are interested in what is the minimum Hamming distance between $w$ and any word $u$ obtained by replacing the variables of $α$ by terminal words. Firstly, we address the class of regular patterns (in which no variable occurs twice) and propose efficient algorithms for this problem, as well as matching conditional lower bounds. We show that the problem can still be solved efficiently if we allow repeated variables, but restrict the way the different variables can be interleaved according to a locality parameter. However, as soon as we allow a variable to occur more than once and its occurrences can be interleaved arbitrarily with those of other variables, even if none of them occurs more than once, the problem becomes intractable. △ Less

Submitted 11 June, 2021; originally announced June 2021.

arXiv:2105.07220 [pdf, ps, other]

String Theories involving Regular Membership Predicates: From Practice to Theory and Back

Authors: Murphy Berzish, Joel D. Day, Vijay Ganesh, Mitja Kulczynski, Florin Manea, Federico Mora, Dirk Nowotka

Abstract: Widespread use of string solvers in formal analysis of string-heavy programs has led to a growing demand for more efficient and reliable techniques which can be applied in this context, especially for real-world cases. Designing an algorithm for the (generally undecidable) satisfiability problem for systems of string constraints requires a thorough understanding of the structure of constraints pre… ▽ More Widespread use of string solvers in formal analysis of string-heavy programs has led to a growing demand for more efficient and reliable techniques which can be applied in this context, especially for real-world cases. Designing an algorithm for the (generally undecidable) satisfiability problem for systems of string constraints requires a thorough understanding of the structure of constraints present in the targeted cases. In this paper, we investigate benchmarks presented in the literature containing regular expression membership predicates, extract different first order logic theories, and prove their decidability, resp. undecidability. Notably, the most common theories in real-world benchmarks are PSPACE-complete and directly lead to the implementation of a more efficient algorithm to solving string constraints. △ Less

Submitted 15 May, 2021; originally announced May 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:2010.07253

arXiv:2010.07253 [pdf, ps, other]

An SMT Solver for Regular Expressions and Linear Arithmetic over String Length

Authors: Murphy Berzish, Mitja Kulczynski, Federico Mora, Florin Manea, Joel D. Day, Dirk Nowotka, Vijay Ganesh

Abstract: We present a novel length-aware solving algorithm for the quantifier-free first-order theory over regex membership predicate and linear arithmetic over string length. We implement and evaluate this algorithm and related heuristics in the Z3 theorem prover. A crucial insight that underpins our algorithm is that real-world instances contain a wealth of information about upper and lower bounds on len… ▽ More We present a novel length-aware solving algorithm for the quantifier-free first-order theory over regex membership predicate and linear arithmetic over string length. We implement and evaluate this algorithm and related heuristics in the Z3 theorem prover. A crucial insight that underpins our algorithm is that real-world instances contain a wealth of information about upper and lower bounds on lengths of strings under constraints, and such information can be used very effectively to simplify operations on automata representing regular expressions. Additionally, we present a number of novel general heuristics, such as the prefix/suffix method, that can be used in conjunction with a variety of regex solving algorithms, making them more efficient. We showcase the power of our algorithm and heuristics via an extensive empirical evaluation over a large and diverse benchmark of 57256 regex-heavy instances, almost 75% of which are derived from industrial applications or contributed by other solver developers. Our solver outperforms five other state-of-the-art string solvers, namely, CVC4, OSTRICH, Z3seq, Z3str3, and Z3-Trau, over this benchmark, in particular achieving a 2.4x speedup over CVC4, 4.4x speedup over Z3seq, 6.4x speedup over Z3-Trau, 9.1x speedup over Z3str3, and 13x speedup over OSTRICH. △ Less

Submitted 7 May, 2021; v1 submitted 14 October, 2020; originally announced October 2020.

Comments: 25 pages (main body 21 pages). 7 figures, 6 tables

arXiv:2008.03516 [pdf, ps, other]

Blocksequences of k-local Words

Authors: Pamela Fleischmann, Lukas Haschke, Florin Manea, Dirk Nowotka, Cedric Tsatia Tsida, Judith Wiedenbeck

Abstract: The locality of words is a relatively young structural complexity measure, introduced by Day et al. in 2017 in order to define classes of patterns with variables which can be matched in polynomial time. The main tool used to compute the locality of a word is called marking sequence: an ordering of the distinct letters occurring in the respective order. Once a marking sequence is defined, the lette… ▽ More The locality of words is a relatively young structural complexity measure, introduced by Day et al. in 2017 in order to define classes of patterns with variables which can be matched in polynomial time. The main tool used to compute the locality of a word is called marking sequence: an ordering of the distinct letters occurring in the respective order. Once a marking sequence is defined, the letters of the word are marked in steps: in the ith marking step, all occurrences of the ith letter of the marking sequence are marked. As such, after each marking step, the word can be seen as a sequence of blocks of marked letters separated by blocks of non-marked letters. By kee** track of the evolution of the marked blocks of the word through the marking defined by a marking sequence, one defines the blocksequence of the respective marking sequence. We first show that the words sharing the same blocksequence are only loosely connected, so we consider the stronger notion of extended blocksequence, which stores additional information on the form of each single marked block. In this context, we present a series of combinatorial results for words sharing the extended blocksequence. △ Less

Submitted 17 August, 2020; v1 submitted 8 August, 2020; originally announced August 2020.

MSC Class: 68Q45 ACM Class: F.4.3

arXiv:2007.09192 [pdf, ps, other]

The Edit Distance to $k$-Subsequence Universality

Authors: Pamela Fleischmann, Maria Kosche, Tore Koß, Florin Manea, Stefan Siemer

Abstract: A word $u$ is a subsequence of another word $w$ if $u$ can be obtained from $w$ by deleting some of its letters. The word $w$ with alph$(w)=Σ$ is called $k$-subsequence universal if the set of subsequences of length $k$ of $w$ contains all possible words of length $k$ over $Σ$. We propose a series of efficient algorithms computing the minimal number of edit operations (insertion, deletion, substit… ▽ More A word $u$ is a subsequence of another word $w$ if $u$ can be obtained from $w$ by deleting some of its letters. The word $w$ with alph$(w)=Σ$ is called $k$-subsequence universal if the set of subsequences of length $k$ of $w$ contains all possible words of length $k$ over $Σ$. We propose a series of efficient algorithms computing the minimal number of edit operations (insertion, deletion, substitution) one needs to apply to a given word in order to reach the set of $k$-subsequence universal words. △ Less

Submitted 17 July, 2020; originally announced July 2020.

arXiv:2005.01112 [pdf, other]

doi 10.4230/LIPIcs.STACS.2021.34

Efficiently Testing Simon's Congruence

Authors: Pawel Gawrychowski, Maria Kosche, Tore Koss, Florin Manea, Stefan Siemer

Abstract: Simon's congruence $\sim_k$ is defined as follows: two words are $\sim_k$-equivalent if they have the same set of subsequences of length at most $k$. We propose an algorithm which computes, given two words $s$ and $t$, the largest $k$ for which $s\sim_k t$. Our algorithm runs in linear time $O(|s|+|t|)$ when the input words are over the integer alphabet $\{1,\ldots,|s|+|t|\}$ (or other alphabets w… ▽ More Simon's congruence $\sim_k$ is defined as follows: two words are $\sim_k$-equivalent if they have the same set of subsequences of length at most $k$. We propose an algorithm which computes, given two words $s$ and $t$, the largest $k$ for which $s\sim_k t$. Our algorithm runs in linear time $O(|s|+|t|)$ when the input words are over the integer alphabet $\{1,\ldots,|s|+|t|\}$ (or other alphabets which can be sorted in linear time). This approach leads to an optimal algorithm in the case of general alphabets as well. Our results are based on a novel combinatorial approach and a series of efficient data structures. △ Less

Submitted 15 March, 2021; v1 submitted 3 May, 2020; originally announced May 2020.

arXiv:2003.04629 [pdf, other]

Scattered Factor-Universality of Words

Authors: Laura Barker, Pamela Fleischmann, Katharina Harwardt, Florin Manea, Dirk Nowotka

Abstract: A word $u=u_1\dots u_n$ is a scattered factor of a word $w$ if $u$ can be obtained from $w$ by deleting some of its letters: there exist the (potentially empty) words $v_0,v_1,..,v_n$ such that $w = v_0u_1v_1...u_nv_n$. The set of all scattered factors up to length $k$ of a word is called its full $k$-spectrum. Firstly, we show an algorithm deciding whether the $k$-spectra for given $k$ of two wor… ▽ More A word $u=u_1\dots u_n$ is a scattered factor of a word $w$ if $u$ can be obtained from $w$ by deleting some of its letters: there exist the (potentially empty) words $v_0,v_1,..,v_n$ such that $w = v_0u_1v_1...u_nv_n$. The set of all scattered factors up to length $k$ of a word is called its full $k$-spectrum. Firstly, we show an algorithm deciding whether the $k$-spectra for given $k$ of two words are equal or not, running in optimal time. Secondly, we consider a notion of scattered-factors universality: the word $w$, with $\letters(w)=Σ$, is called $k$-universal if its $k$-spectrum includes all words of length $k$ over the alphabet $Σ$; we extend this notion to $k$-circular universality. After a series of preliminary combinatorial results, we present an algorithm computing, for a given $k'$-universal word $w$ the minimal $i$ such that $w^i$ is $k$-universal for some $k>k'$. Several other connected problems~are~also~considered. △ Less

Submitted 10 March, 2020; originally announced March 2020.

arXiv:2001.11218 [pdf, ps, other]

Reconstructing Words from Right-Bounded-Block Words

Authors: Pamela Fleischmann, Marie Lejeune, Florin Manea, Dirk Nowotka, Michel Rigo

Abstract: A reconstruction problem of words from scattered factors asks for the minimal information, like multisets of scattered factors of a given length or the number of occurrences of scattered factors from a given set, necessary to uniquely determine a word. We show that a word $w \in \{a, b\}^{*}$ can be reconstructed from the number of occurrences of at most $\min(|w|_a, |w|_b)+ 1$ scattered factors o… ▽ More A reconstruction problem of words from scattered factors asks for the minimal information, like multisets of scattered factors of a given length or the number of occurrences of scattered factors from a given set, necessary to uniquely determine a word. We show that a word $w \in \{a, b\}^{*}$ can be reconstructed from the number of occurrences of at most $\min(|w|_a, |w|_b)+ 1$ scattered factors of the form $a^{i} b$. Moreover, we generalize the result to alphabets of the form $\{1,\ldots,q\}$ by showing that at most $ \sum^{q-1}_{i=1} |w|_i (q-i+1)$ scattered factors suffices to reconstruct $w$. Both results improve on the upper bounds known so far. Complexity time bounds on reconstruction algorithms are also considered here. △ Less

Submitted 16 March, 2020; v1 submitted 30 January, 2020; originally announced January 2020.

arXiv:1906.11718 [pdf, ps, other]

On Solving Word Equations Using SAT

Authors: Joel D. Day, Thorsten Ehlers, Mitja Kulczynski, Florin Manea, Dirk Nowotka, Danny Bøgsted Poulsen

Abstract: We present Woorpje, a string solver for bounded word equations (i.e., equations where the length of each variable is upper bounded by a given integer). Our algorithm works by reformulating the satisfiability of bounded word equations as a reachability problem for nondeterministic finite automata, and then carefully encoding this as a propositional satisfiability problem, which we then solve using… ▽ More We present Woorpje, a string solver for bounded word equations (i.e., equations where the length of each variable is upper bounded by a given integer). Our algorithm works by reformulating the satisfiability of bounded word equations as a reachability problem for nondeterministic finite automata, and then carefully encoding this as a propositional satisfiability problem, which we then solve using the well-known Glucose SAT-solver. This approach has the advantage of allowing for the natural inclusion of additional linear length constraints. Our solver obtains reliable and competitive results and, remarkably, discovered several cases where state-of-the-art solvers exhibit a faulty behaviour. △ Less

Submitted 27 June, 2019; originally announced June 2019.

arXiv:1906.06965 [pdf, ps, other]

Matching Patterns with Variables

Authors: Florin Manea, Markus L. Schmid

Abstract: A pattern p (i.e., a string of variables and terminals) matches a word w, if w can be obtained by uniformly replacing the variables of p by terminal words. The respective matching problem, i.e., deciding whether or not a given pattern matches a given word, is generally NP-complete, but can be solved in polynomial-time for classes of patterns with restricted structure. In this paper we overview a s… ▽ More A pattern p (i.e., a string of variables and terminals) matches a word w, if w can be obtained by uniformly replacing the variables of p by terminal words. The respective matching problem, i.e., deciding whether or not a given pattern matches a given word, is generally NP-complete, but can be solved in polynomial-time for classes of patterns with restricted structure. In this paper we overview a series of recent results related to efficient matching for patterns with variables, as well as a series of extensions of this problem. △ Less

Submitted 29 July, 2019; v1 submitted 17 June, 2019; originally announced June 2019.

arXiv:1906.00715 [pdf, ps, other]

On Modelling the Avoidability of Patterns as CSP

Authors: Thorsten Ehlers, Florin Manea, Dirk Nowotka, Kamellia Reshadi

Abstract: Solving avoidability problems in the area of string combinatorics often requires, in an initial step, the construction, via a computer program, of a very long word that does not contain any word that matches a given pattern. It is well known that this is a computationally hard task. Despite being rather straightforward that, ultimately, all such tasks can be formalized as constraints satisfaction… ▽ More Solving avoidability problems in the area of string combinatorics often requires, in an initial step, the construction, via a computer program, of a very long word that does not contain any word that matches a given pattern. It is well known that this is a computationally hard task. Despite being rather straightforward that, ultimately, all such tasks can be formalized as constraints satisfaction problems, no unified approach to solving them was proposed so far, and very diverse ad-hoc methods were used. We aim to fill this gap: we show how several relevant avoidability problems can be modelled, and consequently solved, in an uniform way as constraint satisfaction problems, using the framework of MiniZinc. The main advantage of this approach is that one is now required only to formulate the avoidability problem in the MiniZinc language, and then the actual search for a solution does not have to be implemented ad-hoc, being instead carried out by a standard CSP-solver. △ Less

Submitted 3 June, 2019; originally announced June 2019.

arXiv:1904.09125 [pdf, ps, other]

k-Spectra of weakly-c-Balanced Words

Authors: Joel D. Day, Pamela Fleischmann, Florin Manea, Dirk Nowotka

Abstract: A word $u$ is a scattered factor of $w$ if $u$ can be obtained from $w$ by deleting some of its letters. That is, there exist the (potentially empty) words $u_1,u_2,..., u_n$, and $v_0,v_1,..,v_n$ such that $u = u_1u_2...u_n$ and $w = v_0u_1v_1u_2v_2...u_nv_n$. We consider the set of length-$k$ scattered factors of a given word w, called here $k$-spectrum and denoted $\ScatFact_k(w)$. We prove a s… ▽ More A word $u$ is a scattered factor of $w$ if $u$ can be obtained from $w$ by deleting some of its letters. That is, there exist the (potentially empty) words $u_1,u_2,..., u_n$, and $v_0,v_1,..,v_n$ such that $u = u_1u_2...u_n$ and $w = v_0u_1v_1u_2v_2...u_nv_n$. We consider the set of length-$k$ scattered factors of a given word w, called here $k$-spectrum and denoted $\ScatFact_k(w)$. We prove a series of properties of the sets $\ScatFact_k(w)$ for binary strictly balanced and, respectively, $c$-balanced words $w$, i.e., words over a two-letter alphabet where the number of occurrences of each letter is the same, or, respectively, one letter has $c$-more occurrences than the other. In particular, we consider the question which cardinalities $n= |\ScatFact_k(w)|$ are obtainable, for a positive integer $k$, when $w$ is either a strictly balanced binary word of length $2k$, or a $c$-balanced binary word of length $2k-c$. We also consider the problem of reconstructing words from their $k$-spectra. △ Less

Submitted 24 May, 2019; v1 submitted 19 April, 2019; originally announced April 2019.

arXiv:1902.10983 [pdf, other]

Graph and String Parameters: Connections Between Pathwidth, Cutwidth and the Locality Number

Authors: Katrin Casel, Joel D. Day, Pamela Fleischmann, Tomasz Kociumaka, Florin Manea, Markus L. Schmid

Abstract: We investigate the locality number, a recently introduced structural parameter for strings (with applications in pattern matching with variables), and its connection to two important graph-parameters, cutwidth and pathwidth. These connections allow us to show that computing the locality number is NP-hard, but fixed-parameter tractable, if parameterised by the locality number or by the alphabet siz… ▽ More We investigate the locality number, a recently introduced structural parameter for strings (with applications in pattern matching with variables), and its connection to two important graph-parameters, cutwidth and pathwidth. These connections allow us to show that computing the locality number is NP-hard, but fixed-parameter tractable, if parameterised by the locality number or by the alphabet size, which has been formulated as open problems in the literature. Moreover, the locality number can be approximated with ratio O(sqrt(log(opt)) log(n)). An important aspect of our work -- that is relevant in its own right and of independent interest -- is that we identify connections between the string parameter of the locality number on the one hand, and the famous graph parameters of cutwidth and pathwidth, on the other hand. These two parameters have been jointly investigated in the literature and are arguably among the most central graph parameters that are based on "linearisations" of graphs. In this way, we also identify a direct approximation preserving reduction from cutwidth to pathwidth, which shows that any polynomial f(opt,|V|)-approximation algorithm for pathwidth yields a polynomial 2f(2 opt,h)-approximation algorithm for cutwidth on multigraphs (where h is the number of edges). In particular, this translates known approximation ratios for pathwidth into new approximation ratios for cutwidth, namely O(sqrt(log(opt)) log(h)) and O(sqrt(log(opt)) opt) for (multi) graphs with h edges. △ Less

Submitted 25 April, 2024; v1 submitted 28 February, 2019; originally announced February 2019.

arXiv:1810.07422 [pdf, other]

Fast and Longest Rollercoasters

Authors: Paweł Gawrychowski, Florin Manea, Radosław Serafin

Abstract: For $k\geq 3$, a k-rollercoaster is a sequence of numbers whose every maximal contiguous subsequence, that is increasing or decreasing, has length at least $k$; $3$-rollercoasters are called simply rollercoasters. Given a sequence of distinct numbers, we are interested in computing its maximum-length (not necessarily contiguous) subsequence that is a $k$-rollercoaster. Biedl et al. [ICALP 2018] ha… ▽ More For $k\geq 3$, a k-rollercoaster is a sequence of numbers whose every maximal contiguous subsequence, that is increasing or decreasing, has length at least $k$; $3$-rollercoasters are called simply rollercoasters. Given a sequence of distinct numbers, we are interested in computing its maximum-length (not necessarily contiguous) subsequence that is a $k$-rollercoaster. Biedl et al. [ICALP 2018] have shown that each sequence of $n$ distinct real numbers contains a rollercoaster of length at least $\lceil n/2\rceil$ for $n>7$, and that a longest rollercoaster contained in such a sequence can be computed in $O(n\log n)$-time. They have also shown that every sequence of $n\geq (k-1)^2+1$ distinct real numbers contains a $k$-rollercoaster of length at least $\frac{n}{2(k-1)}-\frac{3k}{2}$, and gave an $O(nk\log n)$-time algorithm computing a longest $k$-rollercoaster in a sequence of length $n$. In this paper, we give an $O(nk^2)$-time algorithm computing the length of a longest $k$-rollercoaster contained in a sequence of $n$ distinct real numbers; hence, for constant $k$, our algorithm computes the length of a longest $k$-rollercoaster in optimal linear time. The algorithm can be easily adapted to output the respective $k$-rollercoaster. In particular, this improves the results of Biedl et al. [ICALP 2018], by showing that a longest rollercoaster can be computed in optimal linear time. We also present an algorithm computing the length of a longest $k$-rollercoaster in $O(n \log^2 n)$-time, that is, subquadratic even for large values of $k\leq n$. Again, the rollercoaster can be easily retrieved. Finally, we show an $Ω(n \log k)$ lower bound for the number of comparisons in any comparison-based algorithm computing the length of a longest $k$-rollercoaster. △ Less

Submitted 7 August, 2019; v1 submitted 17 October, 2018; originally announced October 2018.

ACM Class: F.2.2

arXiv:1802.00523 [pdf, ps, other]

The Satisfiability of Extended Word Equations: The Boundary Between Decidability and Undecidability

Authors: Joel Day, Vijay Ganesh, Paul He, Florin Manea, Dirk Nowotka

Abstract: The study of word equations (or the existential theory of equations over free monoids) is a central topic in mathematics and theoretical computer science. The problem of deciding whether a given word equation has a solution was shown to be decidable by Makanin in the late 1970s, and since then considerable work has been done on this topic. In recent years, this decidability question has gained cri… ▽ More The study of word equations (or the existential theory of equations over free monoids) is a central topic in mathematics and theoretical computer science. The problem of deciding whether a given word equation has a solution was shown to be decidable by Makanin in the late 1970s, and since then considerable work has been done on this topic. In recent years, this decidability question has gained critical importance in the context of string SMT solvers for security analysis. Further, many extensions (e.g., quantifier-free word equations with linear arithmetic over the length function) and fragments (e.g., restrictions on the number of variables) of this theory are important from a theoretical point of view, as well as for program analysis applications. Motivated by these considerations, we prove several new results and thus shed light on the boundary between decidability and undecidability for many fragments and extensions of the first order theory of word equations. △ Less

Submitted 1 February, 2018; originally announced February 2018.

arXiv:1801.08565 [pdf, other]

Rollercoasters and Caterpillars

Authors: Therese Biedl, Ahmad Biniaz, Robert Cummings, Anna Lubiw, Florin Manea, Dirk Nowotka, Jeffrey Shallit

Abstract: A rollercoaster is a sequence of real numbers for which every maximal contiguous subsequence, that is increasing or decreasing, has length at least three. By translating this sequence to a set of points in the plane, a rollercoaster can be defined as a polygonal path for which every maximal sub-path, with positive- or negative-slope edges, has at least three points. Given a sequence of distinct re… ▽ More A rollercoaster is a sequence of real numbers for which every maximal contiguous subsequence, that is increasing or decreasing, has length at least three. By translating this sequence to a set of points in the plane, a rollercoaster can be defined as a polygonal path for which every maximal sub-path, with positive- or negative-slope edges, has at least three points. Given a sequence of distinct real numbers, the rollercoaster problem asks for a maximum-length subsequence that is a rollercoaster. It was conjectured that every sequence of $n$ distinct real numbers contains a rollercoaster of length at least $\lceil n/2\rceil$ for $n>7$, while the best known lower bound is $Ω(n/\log n)$. In this paper we prove this conjecture. Our proof is constructive and implies a linear-time algorithm for computing a rollercoaster of this length. Extending the $O(n\log n)$-time algorithm for computing a longest increasing subsequence, we show how to compute a maximum-length rollercoaster within the same time bound. A maximum-length rollercoaster in a permutation of $\{1,\dots,n\}$ can be computed in $O(n \log \log n)$ time. The search for rollercoasters was motivated by orthogeodesic point-set embedding of caterpillars. A caterpillar is a tree such that deleting the leaves gives a path, called the spine. A top-view caterpillar is one of degree 4 such that the two leaves adjacent to each vertex lie on opposite sides of the spine. As an application of our result on rollercoasters, we are able to find a planar drawing of every $n$-node top-view caterpillar on every set of $\frac{25}{3}n$ points in the plane, such that each edge is an orthogonal path with one bend. This improves the previous best known upper bound on the number of required points, which is $O(n \log n)$. We also show that such a drawing can be obtained in linear time, provided that the points are given in sorted order. △ Less

Submitted 25 January, 2018; originally announced January 2018.

Comments: 17 pages

arXiv:1702.07922 [pdf, other]

The Hardness of Solving Simple Word Equations

Authors: Joel D. Day, Florin Manea, Dirk Nowotka

Abstract: We investigate the class of regular-ordered word equations. In such equations, each variable occurs at most once in each side and the order of the variables occurring in both sides is the preserved (the variables can be, however, separated by potentially distinct constant factors). Surprisingly, we obtain that solving such simple equations, even when the sides contain exactly the same variables, i… ▽ More We investigate the class of regular-ordered word equations. In such equations, each variable occurs at most once in each side and the order of the variables occurring in both sides is the preserved (the variables can be, however, separated by potentially distinct constant factors). Surprisingly, we obtain that solving such simple equations, even when the sides contain exactly the same variables, is NP-hard. By considerations regarding the combinatorial structure of the minimal solutions of the more general quadratic equations we obtain that the satisfiability problem for regular-ordered equations is in NP. Finally, we also show that a related class of simple word equations, that generalises one-variable equations, is in P. △ Less

Submitted 28 February, 2017; v1 submitted 25 February, 2017; originally announced February 2017.

arXiv:1604.00054 [pdf, other]

Detecting One-variable Patterns

Authors: Dmitry Kosolobov, Florin Manea, Dirk Nowotka

Abstract: Given a pattern $p = s_1x_1s_2x_2\cdots s_{r-1}x_{r-1}s_r$ such that $x_1,x_2,\ldots,x_{r-1}\in\{x,\overset{{}_{\leftarrow}}{x}\}$, where $x$ is a variable and $\overset{{}_{\leftarrow}}{x}$ its reversal, and $s_1,s_2,\ldots,s_r$ are strings that contain no variables, we describe an algorithm that constructs in $O(rn)$ time a compact representation of all $P$ instances of $p$ in an input string of… ▽ More Given a pattern $p = s_1x_1s_2x_2\cdots s_{r-1}x_{r-1}s_r$ such that $x_1,x_2,\ldots,x_{r-1}\in\{x,\overset{{}_{\leftarrow}}{x}\}$, where $x$ is a variable and $\overset{{}_{\leftarrow}}{x}$ its reversal, and $s_1,s_2,\ldots,s_r$ are strings that contain no variables, we describe an algorithm that constructs in $O(rn)$ time a compact representation of all $P$ instances of $p$ in an input string of length $n$ over a polynomially bounded integer alphabet, so that one can report those instances in $O(P)$ time. △ Less

Submitted 18 July, 2017; v1 submitted 31 March, 2016; originally announced April 2016.

Comments: 16 pages (+13 pages of Appendix), 4 figures, accepted to SPIRE 2017

arXiv:1511.07180 [pdf, other]

doi 10.23638/DMTCS-19-4-4

Longest Gapped Repeats and Palindromes

Authors: Marius Dumitran, Paweł Gawrychowski, Florin Manea

Abstract: A gapped repeat (respectively, palindrome) occurring in a word $w$ is a factor $uvu$ (respectively, $u^Rvu$) of $w$. In such a repeat (palindrome) $u$ is called the arm of the repeat (respectively, palindrome), while $v$ is called the gap. We show how to compute efficiently, for every position $i$ of the word $w$, the longest gapped repeat and palindrome occurring at that position, provided that t… ▽ More A gapped repeat (respectively, palindrome) occurring in a word $w$ is a factor $uvu$ (respectively, $u^Rvu$) of $w$. In such a repeat (palindrome) $u$ is called the arm of the repeat (respectively, palindrome), while $v$ is called the gap. We show how to compute efficiently, for every position $i$ of the word $w$, the longest gapped repeat and palindrome occurring at that position, provided that the length of the gap is subject to various types of restrictions. That is, that for each position $i$ we compute the longest prefix $u$ of $w[i..n]$ such that $uv$ (respectively, $u^Rv$) is a suffix of $w[1..i-1]$ (defining thus a gapped repeat $uvu$ -- respectively, palindrome $u^Rvu$), and the length of $v$ is subject to the aforementioned restrictions. △ Less

Submitted 11 October, 2017; v1 submitted 23 November, 2015; originally announced November 2015.

Comments: This is an extension of the conference papers "Longest $α$-Gapped Repeat and Palindrome", presented by the second and third authors at FCT 2015, and "Longest Gapped Repeats and Palindromes", presented by the first and third authors at MFCS 2015

Journal ref: Discrete Mathematics & Theoretical Computer Science, Vol. 19 no. 4, FCT '15, special issue FCT'15 (October 13, 2017) dmtcs:1337

arXiv:1509.09237 [pdf, other]

Efficiently Finding All Maximal $α$-gapped Repeats

Authors: Paweł Gawrychowski, Tomohiro I, Shunsuke Inenaga, Dominik Köppl, Florin Manea

Abstract: For $α\geq 1$, an $α$-gapped repeat in a word $w$ is a factor $uvu$ of $w$ such that $|uv|\leq α|u|$; the two factors $u$ in such a repeat are called arms, while the factor $v$ is called gap. Such a repeat is called maximal if its arms cannot be extended simultaneously with the same symbol to the right or, respectively, to the left. In this paper we show that the number of maximal $α$-gapped repea… ▽ More For $α\geq 1$, an $α$-gapped repeat in a word $w$ is a factor $uvu$ of $w$ such that $|uv|\leq α|u|$; the two factors $u$ in such a repeat are called arms, while the factor $v$ is called gap. Such a repeat is called maximal if its arms cannot be extended simultaneously with the same symbol to the right or, respectively, to the left. In this paper we show that the number of maximal $α$-gapped repeats that may occur in a word is upper bounded by $18αn$. This allows us to construct an algorithm finding all the maximal $α$-gapped repeats of a word in $O(αn)$; this is optimal, in the worst case, as there are words that have $Θ(αn)$ maximal $α$-gapped repeats. Our techniques can be extended to get comparable results in the case of $α$-gapped palindromes, i.e., factors $uvu^\mathrm{T}$ with $|uv|\leq α|u|$. △ Less

Submitted 30 September, 2015; originally announced September 2015.

arXiv:1509.00622 [pdf, ps, other]

Testing k-binomial equivalence

Authors: Dominik D. Freydenberger, Pawel Gawrychowski, Juhani Karhumäki, Florin Manea, Wojciech Rytter

Abstract: Two words $w_1$ and $w_2$ are said to be $k$-binomial equivalent if every non-empty word $x$ of length at most $k$ over the alphabet of $w_1$ and $w_2$ appears as a scattered factor of $w_1$ exactly as many times as it appears as a scattered factor of $w_2$. We give two different polynomial-time algorithms testing the $k$-binomial equivalence of two words. The first one is deterministic (but the d… ▽ More Two words $w_1$ and $w_2$ are said to be $k$-binomial equivalent if every non-empty word $x$ of length at most $k$ over the alphabet of $w_1$ and $w_2$ appears as a scattered factor of $w_1$ exactly as many times as it appears as a scattered factor of $w_2$. We give two different polynomial-time algorithms testing the $k$-binomial equivalence of two words. The first one is deterministic (but the degree of the corresponding polynomial is too high) and the second one is randomised (it is more direct and more efficient). These are the first known algorithms for the problem which run in polynomial time. △ Less

Submitted 12 October, 2015; v1 submitted 2 September, 2015; originally announced September 2015.

Journal ref: "Multidisciplinary Creativity: homage to Gheorghe Paun on his 65th birthday", Pg. 239--248, Ed. Spandugino, Bucharest, Romania, ISBN: 978-606-8401-63-8, 2015

arXiv:1008.1649 [pdf, other]

doi 10.4204/EPTCS.31.9

Accepting Hybrid Networks of Evolutionary Processors with Special Topologies and Small Communication

Authors: Jürgen Dassow, Florin Manea

Abstract: Starting from the fact that complete Accepting Hybrid Networks of Evolutionary Processors allow much communication between the nodes and are far from network structures used in practice, we propose in this paper three network topologies that restrict the communication: star networks, ring networks, and grid networks. We show that ring-AHNEPs can simulate 2-tag systems, thus we deduce the existence… ▽ More Starting from the fact that complete Accepting Hybrid Networks of Evolutionary Processors allow much communication between the nodes and are far from network structures used in practice, we propose in this paper three network topologies that restrict the communication: star networks, ring networks, and grid networks. We show that ring-AHNEPs can simulate 2-tag systems, thus we deduce the existence of a universal ring-AHNEP. For star networks or grid networks, we show a more general result; that is, each recursively enumerable language can be accepted efficiently by a star- or grid-AHNEP. We also present bounds for the size of these star and grid networks. As a consequence we get that each recursively enumerable can be accepted by networks with at most 13 communication channels and by networks where each node communicates with at most three other nodes. △ Less

Submitted 10 August, 2010; originally announced August 2010.

Comments: In Proceedings DCFS 2010, arXiv:1008.1270

Journal ref: EPTCS 31, 2010, pp. 68-77

arXiv:0907.5130 [pdf, ps, other]

doi 10.4204/EPTCS.3.16

Small Universal Accepting Networks of Evolutionary Processors with Filtered Connections

Authors: Remco Loos, Florin Manea, Victor Mitrana

Abstract: In this paper, we present some results regarding the size complexity of Accepting Networks of Evolutionary Processors with Filtered Connections (ANEPFCs). We show that there are universal ANEPFCs of size 10, by devising a method for simulating 2-Tag Systems. This result significantly improves the known upper bound for the size of universal ANEPFCs which is 18. We also propose a new, computatio… ▽ More In this paper, we present some results regarding the size complexity of Accepting Networks of Evolutionary Processors with Filtered Connections (ANEPFCs). We show that there are universal ANEPFCs of size 10, by devising a method for simulating 2-Tag Systems. This result significantly improves the known upper bound for the size of universal ANEPFCs which is 18. We also propose a new, computationally and descriptionally efficient simulation of nondeterministic Turing machines by ANEPFCs. More precisely, we describe (informally, due to space limitations) how ANEPFCs with 16 nodes can simulate in O(f(n)) time any nondeterministic Turing machine of time complexity f(n). Thus the known upper bound for the number of nodes in a network simulating an arbitrary Turing machine is decreased from 26 to 16. △ Less

Submitted 29 July, 2009; originally announced July 2009.

Journal ref: EPTCS 3, 2009, pp. 173-182

Showing 1–37 of 37 results for author: Manea, F