-
On the cyclic regularities of strings
Authors:
Oluwole Ajala,
Miznah Alshammary,
Mai Alzamel,
Jia Gao,
Costas Iliopoulos,
Jakub Radoszewski,
Wojciech Rytter,
Bruce Watson
Abstract:
Regularities in strings are often related to periods and covers, which have been studied extensively, and algorithms for their efficient computation have broad application. In this paper we concentrate on computing cyclic regularities of strings. In particular, we propose several efficient algorithms for computing: (i) cyclic periodicity; (ii) all cyclic periodicity; (iii) maximal local cyclic periodicity; (iv) cyclic covers.
Submitted 5 August, 2019;
originally announced August 2019.
-
Quasi-Linear-Time Algorithm for Longest Common Circular Factor
Authors:
Mai Alzamel,
Maxime Crochemore,
Costas S. Iliopoulos,
Tomasz Kociumaka,
Jakub Radoszewski,
Wojciech Rytter,
Juliusz Straszyński,
Tomasz Waleń,
Wiktor Zuba
Abstract:
We introduce the Longest Common Circular Factor (LCCF) problem in which, given strings $S$ and $T$ of length $n$, we are to compute the longest factor of $S$ whose cyclic shift occurs as a factor of $T$. It is a new similarity measure, an extension of the classic Longest Common Factor. We show how to solve the LCCF problem in $O(n \log^5 n)$ time.
Submitted 31 January, 2019;
originally announced January 2019.
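To make the LCCF definition above concrete, here is a minimal brute-force sketch. It is not the quasi-linear algorithm of the paper; it only illustrates what is being computed. The function name `longest_common_circular_factor` is hypothetical, and the sketch accepts strings of different lengths for convenience.

```python
def longest_common_circular_factor(s: str, t: str) -> str:
    """Brute force (illustrative only): longest factor of s that has
    a cyclic shift occurring as a factor of t."""
    for length in range(min(len(s), len(t)), 0, -1):
        # All factors of t with the current length.
        t_factors = {t[i:i + length] for i in range(len(t) - length + 1)}
        for i in range(len(s) - length + 1):
            u = s[i:i + length]
            doubled = u + u
            # Every cyclic shift of u is a length-|u| window of u + u.
            if any(doubled[k:k + length] in t_factors for k in range(length)):
                return u
    return ""
```

For example, with `s = "aab"` and `t = "bba"` the sketch returns `"ab"`, because its cyclic shift `"ba"` occurs in `t` and no length-3 factor qualifies.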
-
How to answer a small batch of RMQs or LCA queries in practice
Authors:
Mai Alzamel,
Panagiotis Charalampopoulos,
Costas S. Iliopoulos,
Solon P. Pissis
Abstract:
In the Range Minimum Query (RMQ) problem, we are given an array $A$ of $n$ numbers and we are asked to answer queries of the following type: for indices $i$ and $j$ between $0$ and $n-1$, query $\text{RMQ}_A(i,j)$ returns the index of a minimum element in the subarray $A[i..j]$. Answering a small batch of RMQs is a core computational task in many real-world applications, in particular due to the connection with the Lowest Common Ancestor (LCA) problem. By a small batch, we mean that the number $q$ of queries is $o(n)$ and we have them all at hand. It is therefore not relevant to build an $\Omega(n)$-sized data structure or spend $\Omega(n)$ time to build a more succinct one. It is well-known, among practitioners and elsewhere, that these data structures for online querying carry high constants in their pre-processing and querying time. We would thus like to answer this batch efficiently in practice. By efficiently in practice, we mean that we (ultimately) want to spend $n + \mathcal{O}(q)$ time and $\mathcal{O}(q)$ space. We write $n$ to stress that the number of operations per entry of $A$ should be a very small constant. Here we show how existing algorithms can be easily modified to satisfy these conditions. The presented experimental results highlight the practicality of this new scheme. The most significant improvement obtained is for answering a small batch of LCA queries. A library implementation of the presented algorithms is made available.
Submitted 12 May, 2017;
originally announced May 2017.
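The query semantics defined in the abstract above can be sketched as follows. This is a naive reference that scans each queried range directly; it does not achieve the $n + \mathcal{O}(q)$ bound and is not the scheme from the paper. The names `rmq` and `answer_batch` are hypothetical, and breaking ties by the leftmost index is an assumption (the abstract only asks for the index of a minimum element).

```python
def rmq(a, i, j):
    """Index of a minimum element of a[i..j] (inclusive).
    Ties are broken by the leftmost index (an assumption of this sketch)."""
    return min(range(i, j + 1), key=a.__getitem__)


def answer_batch(a, queries):
    """Answer a batch of (i, j) queries by direct scanning."""
    return [rmq(a, i, j) for i, j in queries]
```

For instance, `answer_batch([3, 1, 4, 1, 5, 9, 2, 6], [(0, 7), (2, 4), (4, 7)])` yields `[1, 3, 6]`.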
-
Faster algorithms for 1-mappability of a sequence
Authors:
Mai Alzamel,
Panagiotis Charalampopoulos,
Costas S. Iliopoulos,
Solon P. Pissis,
Jakub Radoszewski,
Wing-Kin Sung
Abstract:
In the k-mappability problem, we are given a string x of length n and integers m and k, and we are asked to count, for each length-m factor y of x, the number of other factors of length m of x that are at Hamming distance at most k from y. We focus here on the version of the problem where k = 1. The fastest known algorithm for k = 1 requires time O(mn log n / log log n) and space O(n). We present two algorithms that require worst-case time O(mn) and O(n log^2 n), respectively, and space O(n), thus greatly improving the state of the art. Moreover, we present an algorithm that requires average-case time and space O(n) for integer alphabets if m = Ω(log n / log σ), where σ is the alphabet size.
Submitted 11 May, 2017;
originally announced May 2017.
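The k-mappability count defined in the abstract above can be illustrated with a quadratic brute-force sketch. It compares every pair of length-m factors and is far slower than the algorithms the paper presents; it serves only to pin down the output. The function name `k_mappability` is hypothetical, and "other factors" is interpreted here as factors starting at other positions.

```python
def k_mappability(x: str, m: int, k: int = 1) -> list[int]:
    """For each starting position i, count the other length-m factors of x
    (by position) within Hamming distance k of x[i:i+m]."""
    factors = [x[i:i + m] for i in range(len(x) - m + 1)]

    def hamming(u: str, v: str) -> int:
        # Number of mismatching positions between equal-length strings.
        return sum(a != b for a, b in zip(u, v))

    return [
        sum(1 for j, z in enumerate(factors) if j != i and hamming(y, z) <= k)
        for i, y in enumerate(factors)
    ]
```

For `x = "aabaa"` and `m = 2`, the factors are `aa`, `ab`, `ba`, `aa`, and the sketch returns `[3, 2, 2, 3]` for `k = 1`.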
-
Palindromic Decompositions with Gaps and Errors
Authors:
Michał Adamczyk,
Mai Alzamel,
Panagiotis Charalampopoulos,
Costas S. Iliopoulos,
Jakub Radoszewski
Abstract:
Identifying palindromes in sequences has been an interesting line of research in combinatorics on words and also in computational biology, after the discovery of the relation of palindromes in the DNA sequence with the HIV virus. Efficient algorithms for the factorization of sequences into palindromes and maximal palindromes have been devised in recent years. We extend these studies by allowing gaps in decompositions and errors in palindromes, and also imposing a lower bound on the length of acceptable palindromes.
We first present an algorithm for obtaining a palindromic decomposition of a string of length n with the minimal total gap length in time O(n log n * g) and space O(n g), where g is the number of allowed gaps in the decomposition. We then consider a decomposition of the string into maximal δ-palindromes (i.e. palindromes with δ errors under the edit or Hamming distance) and g allowed gaps. We present an algorithm to obtain such a decomposition with the minimal total gap length in time O(n (g + δ)) and space O(n g).
Submitted 27 March, 2017;
originally announced March 2017.