-
Algorithms for Galois Words: Detection, Factorization, and Rotation
Authors:
Diptarama Hendrian,
Dominik Köppl,
Ryo Yoshinaka,
Ayumi Shinohara
Abstract:
Lyndon words are extensively studied in combinatorics on words -- they play a crucial role in upper-bounding the number of runs a word can have [Bannai+, SIAM J. Comput.'17]. We can determine Lyndon words, factorize a word into Lyndon words in lexicographically non-increasing order, and find the Lyndon rotation of a word, all in linear time within constant additional working space. A recent line of research emerged from the question of what happens when we change the lexicographic order, which is at the heart of the definition of Lyndon words. In particular, the alternating order, where the order of all odd positions becomes reversed, has been recently proposed. While a Lyndon word is, among all its cyclic rotations, the smallest one with respect to the lexicographic order, a Galois word exhibits the same property by exchanging the lexicographic order with the alternating order. Unfortunately, this exchange has a large impact on the properties Galois words exhibit, which makes it a nontrivial task to translate results from Lyndon words to Galois words. Up until now, it has only been conjectured that linear-time algorithms with constant additional working space in the spirit of Duval's algorithm are possible for computing the Galois factorization or the Galois rotation.
Here, we affirm this conjecture as follows. Given a word $T$ of length $n$, we can determine whether $T$ is a Galois word, in $O(n)$ time with constant additional working space. Within the same complexities, we can also determine the Galois rotation of $T$, and compute the Galois factorization of $T$ online. The last result settles Open Problem 1 in [Dolce et al., TCS 2019] for Galois words.
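To make the alternating order concrete, here is a naive $O(n^2)$ Python sketch; it is not the paper's $O(n)$ constant-space algorithm. It assumes 1-indexed positions with the letter order reversed at odd positions, as the abstract describes, and compares equal-length rotations only. The names `alt_less`, `is_galois`, and `galois_rotation` are hypothetical.

```python
def alt_less(u: str, v: str) -> bool:
    """True if u < v in the alternating order (u, v of equal length).

    Assumed convention: at odd 1-indexed positions the letter order is
    reversed; at even positions it is the usual order.
    """
    for i, (a, b) in enumerate(zip(u, v), start=1):
        if a != b:
            return a > b if i % 2 == 1 else a < b
    return False  # equal words: neither is strictly smaller


def rotations(w: str):
    """All cyclic rotations of w, starting with w itself."""
    return [w[i:] + w[:i] for i in range(len(w))]


def is_galois(w: str) -> bool:
    """Naive check: w is strictly smaller than every proper rotation."""
    return len(w) > 0 and all(alt_less(w, r) for r in rotations(w)[1:])


def galois_rotation(w: str) -> str:
    """Naive search for the smallest rotation in the alternating order."""
    best = w
    for r in rotations(w)[1:]:
        if alt_less(r, best):
            best = r
    return best
```

Under this convention `is_galois("ba")` holds while `is_galois("ab")` does not, the reverse of the Lyndon case. Strictness also rules out non-primitive words such as `"aa"`.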
Submitted 23 April, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Breaking a Barrier in Constructing Compact Indexes for Parameterized Pattern Matching
Authors:
Kento Iseri,
Tomohiro I,
Diptarama Hendrian,
Dominik Köppl,
Ryo Yoshinaka,
Ayumi Shinohara
Abstract:
A parameterized string (p-string) is a string over an alphabet $(Σ_{s} \cup Σ_{p})$, where $Σ_{s}$ and $Σ_{p}$ are disjoint alphabets for static symbols (s-symbols) and for parameter symbols (p-symbols), respectively. Two p-strings $x$ and $y$ are said to parameterized match (p-match) if and only if $x$ can be transformed into $y$ by applying a bijection on $Σ_{p}$ to every occurrence of p-symbols in $x$. The indexing problem for p-matching is to preprocess a p-string $T$ of length $n$ so that we can efficiently find the occurrences of substrings of $T$ that p-match with a given pattern. Extending the Burrows-Wheeler Transform (BWT) based index for exact string pattern matching, Ganguly et al. [SODA 2017] proposed the first compact index (named pBWT) for p-matching, and posed an open problem on how to construct it in compact space, i.e., in $O(n \lg |Σ_{s} \cup Σ_{p}|)$ bits of space. Hashimoto et al. [SPIRE 2022] partially solved this problem by showing how to construct some components of pBWTs for $T$ in $O(n \frac{|Σ_{p}| \lg n}{\lg \lg n})$ time in an online manner while reading the symbols of $T$ from right to left. In this paper, we improve the time complexity to $O(n \frac{\lg |Σ_{p}| \lg n}{\lg \lg n})$. We remark that removing the multiplicative factor of $|Σ_{p}|$ from the complexity is of great interest because it has not been achieved for over a decade in the construction of related data structures like parameterized suffix arrays even in the offline setting. We also show that our data structure can support backward search, a core procedure of BWT-based indexes, at any stage of the online construction, making it the first compact index for p-matching that can be constructed in compact space and even in an online manner.
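The p-match relation can be checked directly from its definition. The sketch below builds the bijection on $Σ_p$ incrementally in both directions. The helper name `p_match` is hypothetical, s-symbols are passed as a set, and every other symbol is treated as a p-symbol. It illustrates the definition only and is not part of the pBWT construction.

```python
def p_match(x: str, y: str, static: set) -> bool:
    """Check whether p-strings x and y parameterized-match.

    s-symbols (members of `static`) must agree position by position; the
    p-symbols of x must map to those of y through a bijection.
    """
    if len(x) != len(y):
        return False
    forward, backward = {}, {}  # x-symbol -> y-symbol and its inverse
    for a, b in zip(x, y):
        if a in static or b in static:
            if a != b:  # s-symbols are fixed by the bijection
                return False
            continue
        # setdefault records a new pair or returns the existing image
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True
```

For example, with `static = {"A", "B"}`, the strings `"xAyx"` and `"yAzy"` p-match via x→y, y→z.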
Submitted 11 August, 2023;
originally announced August 2023.
-
Efficient Parameterized Pattern Matching in Sublinear Space
Authors:
Haruki Ideguchi,
Diptarama Hendrian,
Ryo Yoshinaka,
Ayumi Shinohara
Abstract:
The parameterized matching problem is a variant of string matching, which is to search for all parameterized occurrences of a pattern $P$ in a text $T$. In considering matching algorithms, the combinatorial properties of strings, especially periodicity, play an important role. In this paper, we analyze the properties of periods of parameterized strings and propose a generalization of Galil and Seiferas's exact matching algorithm (1980) to parameterized matching, which runs in $O(π|T|+|P|)$ time and $O(\log|P| + |Π|)$ space in addition to the input space, where $Π$ is the parameter alphabet and $π$ is the number of parameter characters appearing in $P$ plus one.
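A common baseline for this problem uses Baker's prev-encoding: two p-strings p-match exactly when their prev-encodings are equal. The naive $O(|T||P|)$ search below is a sketch for comparison only and is not the Galil-Seiferas-style algorithm of the paper. The names `prev_encode` and `p_occurrences` are hypothetical, symbols are assumed to be single characters, and the static symbols are given as a set.

```python
def prev_encode(s, static):
    """Prev-encoding: keep static symbols; replace each parameter symbol by
    the distance to its previous occurrence (0 on its first occurrence)."""
    last = {}
    out = []
    for i, c in enumerate(s):
        if c in static:
            out.append(c)
        else:
            out.append(i - last[c] if c in last else 0)
            last[c] = i
    return out


def p_occurrences(text, pattern, static):
    """Naive search: start positions of windows of text that p-match pattern."""
    m = len(pattern)
    target = prev_encode(pattern, static)
    return [i for i in range(len(text) - m + 1)
            if prev_encode(text[i:i + m], static) == target]
```

Each window is re-encoded from scratch, so the search costs $O(|P|)$ per text position.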
Submitted 19 June, 2023;
originally announced June 2023.
-
Efficient Non-isomorphic Graph Enumeration Algorithms for Subclasses of Perfect Graphs
Authors:
Jun Kawahara,
Toshiki Saitoh,
Hirokazu Takeda,
Ryo Yoshinaka,
Yui Yoshioka
Abstract:
Intersection graphs are well-studied in the area of graph algorithms. Some intersection graph classes are known to have algorithms enumerating all unlabeled graphs by reverse search. Since these algorithms output graphs one by one and the numbers of graphs in these classes are vast, they work only for a small number of vertices. Binary decision diagrams (BDDs) are compact data structures for various types of data and useful for solving optimization and enumeration problems. This study proposes enumeration algorithms for five intersection graph classes, which admit $\mathrm{O}(n)$-bit string representations for their member graphs. Our algorithm for each class enumerates all unlabeled graphs with $n$ vertices over BDDs representing the binary strings in time polynomial in $n$. Moreover, our algorithms are extended to enumerate those with constraints on the maximum (bi)clique size and/or the number of edges.
Submitted 14 December, 2022;
originally announced December 2022.
-
Inferring Strings from Position Heaps in Linear Time
Authors:
Koshiro Kumagai,
Diptarama Hendrian,
Ryo Yoshinaka,
Ayumi Shinohara
Abstract:
Position heaps are index structures of text strings used for the string matching problem. They are rooted trees whose edges and nodes are labeled and numbered, respectively. This paper is concerned with variants of the inverse problem of position heap construction and gives linear-time algorithms for those problems. The basic problem is to restore a text string from a rooted tree with labeled edges and numbered nodes. In the variant problems, the input trees may miss edge labels or node numbers which we must restore as well.
Submitted 12 December, 2022; v1 submitted 26 September, 2022;
originally announced September 2022.
-
Computing the Parameterized Burrows--Wheeler Transform Online
Authors:
Daiki Hashimoto,
Diptarama Hendrian,
Dominik Köppl,
Ryo Yoshinaka,
Ayumi Shinohara
Abstract:
Parameterized strings are a generalization of strings in that their characters are drawn from two different alphabets, where one is considered to be the alphabet of static characters and the other to be the alphabet of parameter characters. Two parameterized strings are a parameterized match if there is a bijection over all characters such that the bijection transforms one string to the other while keeping the static characters (i.e., it behaves as the identity on the static alphabet). Ganguly et al. [SODA 2017] proposed the parameterized Burrows--Wheeler transform (pBWT) as a variant of the Burrows--Wheeler transform for space-efficient parameterized pattern matching. In this paper, we propose an algorithm for computing the pBWT online by reading the characters of a given input string one-by-one from right to left. Our algorithm works in $O(|Π| \log n / \log \log n)$ amortized time for each input character, where $n$ and $Π$ denote the size of the input string and the alphabet of the parameter characters, respectively.
Submitted 30 August, 2022; v1 submitted 30 June, 2022;
originally announced June 2022.
-
Parallel algorithm for pattern matching problems under substring consistent equivalence relations
Authors:
Davaajav Jargalsaikhan,
Diptarama Hendrian,
Ryo Yoshinaka,
Ayumi Shinohara
Abstract:
Given a text and a pattern over an alphabet, the pattern matching problem searches for all occurrences of the pattern in the text. An equivalence relation $\approx$ is called a substring consistent equivalence relation (SCER), if for two strings $X$ and $Y$, $X \approx Y$ implies $|X| = |Y|$ and $X[i:j] \approx Y[i:j]$ for all $1 \le i \le j \le |X|$. In this paper, we propose an efficient parallel algorithm for pattern matching under any SCER using the "duel-and-sweep" paradigm. For a pattern of length $m$ and a text of length $n$, our algorithm runs in $O(ξ_m^\mathrm{t} \log^2 m)$ time and $O(ξ_m^\mathrm{w} \cdot n \log^2 m)$ work, with $O(τ_n^\mathrm{t} + ξ_m^\mathrm{t} \log^2 m)$ time and $O(τ_n^\mathrm{w} + ξ_m^\mathrm{w} \cdot m \log^2 m)$ work preprocessing on the Priority Concurrent Read Concurrent Write Parallel Random-Access Machines (P-CRCW PRAM).
Submitted 27 July, 2022; v1 submitted 26 February, 2022;
originally announced February 2022.
-
Sorting Balls and Water: Equivalence and Computational Complexity
Authors:
Takehiro Ito,
Jun Kawahara,
Shin-ichi Minato,
Yota Otachi,
Toshiki Saitoh,
Akira Suzuki,
Ryuhei Uehara,
Takeaki Uno,
Katsuhisa Yamanaka,
Ryo Yoshinaka
Abstract:
Various forms of sorting problems have been studied over the years. Recently, two kinds of sorting puzzle apps have become popular. In these puzzles, we are given a set of bins filled with colored units, balls or water, and some empty bins. These puzzles allow us to move colored units from one bin to another when the colors involved match in some way or the target bin is empty. The goal of these puzzles is to sort all the colored units. We investigate the computational complexity of these puzzles. We first show that these two puzzles are essentially the same from the viewpoint of solvability. That is, an instance is sortable by ball-moves if and only if it is sortable by water-moves. We also show that every yes-instance has a solution of polynomial length, which implies that these puzzles belong to NP. We then show that these puzzles are NP-complete. For some special cases, we give polynomial-time algorithms. We finally consider the number of empty bins sufficient for making all instances solvable and give non-trivial upper and lower bounds in terms of the number of filled bins and the capacity of bins.
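The ball-move version can be explored exhaustively on tiny instances. The sketch below assumes the usual ball-sort rules: the top ball may move onto a bin that is not full and is either empty or topped by the same color. It takes as the goal that every nonempty bin is monochromatic with no color split across bins. Water moves are not modeled, and `ball_moves`, `is_sorted`, and `solve` are hypothetical names. Breadth-first search returns a shortest move sequence or `None`; the state space grows exponentially in general.

```python
from collections import deque


def ball_moves(state, capacity):
    """Yield every state reachable by one ball move (assumed rules)."""
    for i, src in enumerate(state):
        if not src:
            continue
        for j, dst in enumerate(state):
            if i == j or len(dst) >= capacity:
                continue
            if dst and dst[-1] != src[-1]:  # colors must match unless empty
                continue
            bins = list(state)
            bins[i] = src[:-1]
            bins[j] = dst + (src[-1],)
            yield tuple(bins)


def is_sorted(state):
    """Assumed goal: nonempty bins are monochromatic, no color in two bins."""
    seen = set()
    for b in state:
        if not b:
            continue
        if len(set(b)) != 1 or b[0] in seen:
            return False
        seen.add(b[0])
    return True


def solve(state, capacity):
    """BFS over states; bins are listed bottom to top. Returns the list of
    states after each move of a shortest solution, or None if unsolvable."""
    start = tuple(tuple(b) for b in state)
    parent = {start: None}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if is_sorted(cur):
            path = []
            while parent[cur] is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for nxt in ball_moves(cur, capacity):
            if nxt not in parent:
                parent[nxt] = cur
                queue.append(nxt)
    return None
```

With two full bins `("a", "b")` and `("b", "a")` of capacity 2, the instance needs one empty bin. With it, three moves suffice; without it, no move is possible.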
Submitted 18 February, 2022;
originally announced February 2022.
-
Fixed-Treewidth-Efficient Algorithms for Edge-Deletion to Intersection Graph Classes
Authors:
Toshiki Saitoh,
Ryo Yoshinaka,
Hans L. Bodlaender
Abstract:
For a graph class $\mathcal{C}$, the $\mathcal{C}$-Edge-Deletion problem asks for a given graph $G$ to delete the minimum number of edges from $G$ in order to obtain a graph in $\mathcal{C}$. We study the $\mathcal{C}$-Edge-Deletion problem for $\mathcal{C}$ the permutation graphs, interval graphs, and other related graph classes. It follows from Courcelle's Theorem that these problems are fixed parameter tractable when parameterized by treewidth. In this paper, we present concrete FPT algorithms for these problems. By giving explicit algorithms and analyzing these in detail, we obtain algorithms that are significantly faster than the algorithms obtained by using Courcelle's theorem.
Submitted 12 November, 2021; v1 submitted 7 July, 2020;
originally announced July 2020.
-
Grammar compression with probabilistic context-free grammar
Authors:
Hiroaki Naganuma,
Diptarama Hendrian,
Ryo Yoshinaka,
Ayumi Shinohara,
Naoki Kobayashi
Abstract:
We propose a new approach for universal lossless text compression, based on grammar compression. In the literature, a target string $T$ has been compressed as a context-free grammar $G$ in Chomsky normal form satisfying $L(G) = \{T\}$. Such a grammar is often called a \emph{straight-line program} (SLP). In this paper, we consider a probabilistic grammar $G$ that generates $T$, but not necessarily as a unique element of $L(G)$. In order to recover the original text $T$ unambiguously, we keep both the grammar $G$ and the derivation tree of $T$ from the start symbol in $G$, in compressed form. We show some simple evidence that our proposal is indeed more efficient than SLPs for certain texts, both from theoretical and practical points of view.
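For comparison, a straight-line program can be stored as a mapping from each nonterminal to either one terminal or a pair of nonterminals. The sketch below decompresses an SLP and computes the derived length with memoization. It shows how $k$ doubling rules describe a string of length $2^k$. The names `expand`, `expanded_length`, and `EXAMPLE` are hypothetical. The code illustrates the SLP baseline only, not the probabilistic-grammar encoding proposed in the paper.

```python
def expand(slp, symbol):
    """Derive the unique string generated from `symbol`."""
    rhs = slp[symbol]
    if isinstance(rhs, str):   # rule X -> a
        return rhs
    left, right = rhs          # rule X -> Y Z
    return expand(slp, left) + expand(slp, right)


def expanded_length(slp, symbol, memo=None):
    """Length of the derived string, computed without expanding it."""
    if memo is None:
        memo = {}
    if symbol not in memo:
        rhs = slp[symbol]
        if isinstance(rhs, str):
            memo[symbol] = 1
        else:
            memo[symbol] = (expanded_length(slp, rhs[0], memo)
                            + expanded_length(slp, rhs[1], memo))
    return memo[symbol]


# X3 derives "ab", X4 derives "abab", X5 derives "ababab".
EXAMPLE = {"X1": "a", "X2": "b", "X3": ("X1", "X2"),
           "X4": ("X3", "X3"), "X5": ("X4", "X3")}
```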
Submitted 18 March, 2020;
originally announced March 2020.
-
Fast and linear-time string matching algorithms based on the distances of $q$-gram occurrences
Authors:
Satoshi Kobayashi,
Diptarama Hendrian,
Ryo Yoshinaka,
Ayumi Shinohara
Abstract:
Given a text $T$ of length $n$ and a pattern $P$ of length $m$, the string matching problem is a task to find all occurrences of $P$ in $T$. In this study, we propose an algorithm that solves this problem in $O((n + m)q)$ time considering the distance between two adjacent occurrences of the same $q$-gram contained in $P$. We also propose a theoretical improvement of it which runs in $O(n + m)$ time, though it is not necessarily faster in practice. We compare the execution times of our and existing algorithms on various kinds of real and artificial datasets such as an English text, a genome sequence and a Fibonacci string. The experimental results show that our algorithm is as fast as the state-of-the-art algorithms in many cases, particularly when a pattern frequently appears in a text.
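The quantity the abstract builds on, the distance between two adjacent occurrences of the same $q$-gram in $P$, can be tabulated in one pass. The sketch below (hypothetical name `qgram_prev_distances`) records, for each $q$-gram start in $P$, the distance back to the previous start of the same $q$-gram. It does not reproduce how the paper's algorithm turns these distances into shifts.

```python
def qgram_prev_distances(pattern: str, q: int):
    """For each q-gram start i in pattern, return i minus the previous start
    of the same q-gram, or None if this is its first occurrence."""
    last = {}   # q-gram -> most recent start position
    dist = []
    for i in range(len(pattern) - q + 1):
        g = pattern[i:i + q]
        dist.append(i - last[g] if g in last else None)
        last[g] = i
    return dist
```

For `"abcabcab"` with $q = 2$, the $q$-grams `ab`, `bc`, `ca` first appear at positions 0 to 2 and then recur at distance 3.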
Submitted 12 April, 2020; v1 submitted 18 February, 2020;
originally announced February 2020.
-
Parameterized DAWGs: efficient constructions and bidirectional pattern searches
Authors:
Katsuhito Nakashima,
Noriki Fujisato,
Diptarama Hendrian,
Yuto Nakashima,
Ryo Yoshinaka,
Shunsuke Inenaga,
Hideo Bannai,
Ayumi Shinohara,
Masayuki Takeda
Abstract:
Two strings $x$ and $y$ over $Σ\cup Π$ of equal length are said to \emph{parameterized match} (\emph{p-match}) if there is a renaming bijection $f:Σ\cup Π\rightarrow Σ\cup Π$ that is identity on $Σ$ and transforms $x$ to $y$ (or vice versa). The \emph{p-matching} problem is to look for substrings in a text that p-match a given pattern. In this paper, we propose \emph{parameterized suffix automata} (\emph{p-suffix automata}) and \emph{parameterized directed acyclic word graphs} (\emph{PDAWGs}) which are the p-matching versions of suffix automata and DAWGs. While suffix automata and DAWGs are equivalent for standard strings, we show that p-suffix automata can have $Θ(n^2)$ nodes and edges but PDAWGs have only $O(n)$ nodes and edges, where $n$ is the length of an input string. We also give an $O(n |Π| \log (|Π| + |Σ|))$-time $O(n)$-space algorithm that builds the PDAWG in a left-to-right online manner. As a byproduct, it is shown that the \emph{parameterized suffix tree} for the reversed string can also be built in the same time and space, in a right-to-left online manner. This duality also leads us to two further efficient algorithms for p-matching: Given the parameterized suffix tree for the reversal of the input string $T$, one can build the PDAWG of $T$ in $O(n)$ time in an offline manner; One can perform \emph{bidirectional} p-matching in $O(m \log (|Π|+|Σ|) + \mathit{occ})$ time using $O(n)$ space, where $m$ denotes the pattern length and $\mathit{occ}$ is the number of pattern occurrences in the text $T$.
Submitted 16 September, 2022; v1 submitted 17 February, 2020;
originally announced February 2020.
-
Computing Covers under Substring Consistent Equivalence Relations
Authors:
Natsumi Kikuchi,
Diptarama Hendrian,
Ryo Yoshinaka,
Ayumi Shinohara
Abstract:
Covers are a kind of quasiperiodicity in strings. A string $C$ is a cover of another string $T$ if every position of $T$ lies inside some occurrence of $C$ in $T$. The shortest and longest cover arrays of $T$ store the lengths of the shortest and longest covers of each prefix of $T$, respectively. The literature has proposed linear-time algorithms that compute the longest and shortest cover arrays taking border arrays as input. An equivalence relation $\approx$ over strings is called a substring consistent equivalence relation (SCER) iff $X \approx Y$ implies (1) $|X| = |Y|$ and (2) $X[i:j] \approx Y[i:j]$ for all $1 \le i \le j \le |X|$. In this paper, we generalize the notion of covers to SCERs and prove that existing algorithms to compute the shortest cover array and the longest cover array of a string $T$ under the identity relation work for any SCER when given the accordingly generalized border arrays.
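The cover definition and the shortest cover array can be checked by brute force under the identity relation. The sketch below is cubic-time and is not the linear-time border-array algorithm. It relies on the fact that any cover must be a prefix, since position 0 has to be covered. The names `is_cover` and `shortest_cover_array` are hypothetical, and an SCER version would replace string equality with $\approx$.

```python
def is_cover(c: str, t: str) -> bool:
    """True if every position of t lies inside some occurrence of c in t."""
    if not c or not t:
        return False
    covered_until = 0  # first position not yet known to be covered
    for i in range(len(t) - len(c) + 1):
        if t.startswith(c, i):
            if i > covered_until:  # gap before this occurrence
                return False
            covered_until = i + len(c)
    return covered_until == len(t)


def shortest_cover_array(t: str):
    """Naive: entry j-1 is the length of the shortest cover of t[:j]."""
    return [next(k for k in range(1, j + 1) if is_cover(t[:k], t[:j]))
            for j in range(1, len(t) + 1)]
```

For `"abaaba"` the array is `[1, 2, 3, 4, 5, 3]`: the whole string is covered by two occurrences of `"aba"`.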
Submitted 30 July, 2020; v1 submitted 16 February, 2020;
originally announced February 2020.
-
Query Learning Algorithm for Residual Symbolic Finite Automata
Authors:
Kaizaburo Chubachi,
Diptarama Hendrian,
Ryo Yoshinaka,
Ayumi Shinohara
Abstract:
We propose a query learning algorithm for residual symbolic finite automata (RSFAs). Symbolic finite automata (SFAs) are finite automata whose transitions are labeled by predicates over a Boolean algebra, in which a large collection of characters leading to the same transition may be represented by a single predicate. Residual finite automata (RFAs) are a special type of non-deterministic finite automata that can be exponentially smaller than the minimum deterministic finite automata and have a favorable property for learning algorithms. RSFAs have the properties of both SFAs and RFAs and can have a more succinct representation of transitions and fewer states than RFAs and deterministic SFAs accepting the same language. The implementation of our algorithm efficiently learns RSFAs over a huge alphabet and outperforms an existing learning algorithm for deterministic SFAs. The result also shows that the benefit of non-determinism in efficiency is even larger in learning SFAs than in learning non-symbolic automata.
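To illustrate predicate-labeled transitions, the sketch below runs a possibly non-deterministic SFA by tracking the set of reachable states. It uses plain Python callables as predicates, standing in for an assumed Boolean algebra over characters. The names `sfa_accepts` and `LAST_IS_DIGIT` are hypothetical, and the query learning algorithm itself is not shown.

```python
from typing import Callable, Hashable, Iterable, Tuple

# (source, predicate, target): a character c with predicate(c) True may move
# the automaton from source to target.
Transition = Tuple[Hashable, Callable[[str], bool], Hashable]


def sfa_accepts(transitions: Iterable[Transition], initial, finals, word: str) -> bool:
    """Simulate the automaton on word by subset simulation."""
    transitions = list(transitions)
    current = set(initial)
    for c in word:
        current = {dst for src, pred, dst in transitions
                   if src in current and pred(c)}
        if not current:
            return False
    return bool(current & set(finals))


# Strings whose last character is a digit: state 0 loops on every character
# and may also move to the accepting state 1 on a digit.
LAST_IS_DIGIT = [
    (0, lambda c: True, 0),
    (0, str.isdigit, 1),
]
```

Two predicates describe this language over the whole Unicode alphabet, where an ordinary automaton would need one transition per character.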
Submitted 17 September, 2019; v1 submitted 20 February, 2019;
originally announced February 2019.
-
An Extension of Linear-size Suffix Tries for Parameterized Strings
Authors:
Katsuhito Nakashima,
Diptarama Hendrian,
Ryo Yoshinaka,
Ayumi Shinohara
Abstract:
In this paper, we propose a new indexing structure for parameterized strings which we call PLSTs, by generalizing linear-size suffix tries for ordinary strings. Two parameterized strings are said to match if there is a bijection on the symbol set that makes the two coincide. PLSTs are applicable to the parameterized pattern matching problem, which is to decide whether the input parameterized text has a substring that matches the input parameterized pattern. The size of PLSTs is linear in the text size, with which our algorithm solves the parameterized pattern matching problem in linear time in the pattern size. PLSTs can be seen as a compacted version of parameterized suffix tries and a combination of linear-size suffix tries and parameterized suffix trees. We experimentally show that PLSTs are more space efficient than parameterized suffix trees for highly repetitive strings.
Submitted 4 September, 2019; v1 submitted 1 February, 2019;
originally announced February 2019.
-
Enumerating Cryptarithms Using Deterministic Finite Automata
Authors:
Yuki Nozaki,
Diptarama Hendrian,
Ryo Yoshinaka,
Takashi Horiyama,
Ayumi Shinohara
Abstract:
A cryptarithm is a mathematical puzzle where, given an arithmetic equation written with letters rather than numerals, a player must discover an assignment of numerals to letters that makes the equation hold true. In this paper, we propose a method to construct a DFA that accepts cryptarithms that admit (unique) solutions for each base. We implemented the method and constructed a DFA for bases $k \le 7$. Those DFAs can be used as complete catalogues of cryptarithms, whose applications include enumerating cryptarithm instances and counting the exact numbers $G_k(n)$ of cryptarithm instances with $n$ digits that admit base-$k$ solutions. Moreover, explicit formulas for $G_2(n)$ and $G_3(n)$ are given.
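As a baseline for what such catalogues record, the solutions of a single addition cryptarithm can be counted by brute force. The sketch below assumes two common conventions: distinct letters get distinct digits, and no word has a leading zero. The names `word_value` and `count_solutions` are hypothetical. The exhaustive search over digit permutations is practical only for small bases and few letters.

```python
from itertools import permutations


def word_value(word, assign, base):
    """Numeric value of word under a letter-to-digit assignment."""
    v = 0
    for ch in word:
        v = v * base + assign[ch]
    return v


def count_solutions(addends, result, base=10):
    """Count injective assignments with nonzero leading digits (assumed
    convention) such that the sum of the addends equals result in `base`."""
    words = list(addends) + [result]
    letters = sorted(set("".join(words)))
    if len(letters) > base:
        return 0
    leading = {w[0] for w in words}
    count = 0
    for digits in permutations(range(base), len(letters)):
        assign = dict(zip(letters, digits))
        if any(assign[ch] == 0 for ch in leading):
            continue
        total = sum(word_value(w, assign, base) for w in addends)
        if total == word_value(result, assign, base):
            count += 1
    return count
```

For example, `A + A = B` has four solutions in base 10 (A from 1 to 4) but only one in base 3 (A = 1, B = 2).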
Submitted 26 July, 2018;
originally announced July 2018.
-
Linear-Time Online Algorithm Inferring the Shortest Path from a Walk
Authors:
Shintaro Narisada,
Diptarama Hendrian,
Ryo Yoshinaka,
Ayumi Shinohara
Abstract:
We consider the problem of inferring an edge-labeled graph from the sequence of edge labels seen in a walk of that graph. It has been known that this problem is solvable in $O(n \log n)$ time when the targets are path or cycle graphs. This paper presents an online algorithm for this restricted case that runs in $O(n)$ time, based on Manacher's algorithm for computing all the maximal palindromes in a string.
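Manacher's algorithm, which the paper builds on, computes the maximal palindrome at every center in linear time. The sketch below is a standard formulation with separator and sentinel characters. It assumes that `^`, `#`, and `$` do not occur in the input. It returns one value per center (gaps and characters alternate), each equal to the length of the maximal palindrome there in the original string. The walk-inference step itself is not shown.

```python
def manacher(s: str):
    """Maximal-palindrome length at each of the 2n+1 centers of s."""
    if not s:
        return [0]
    # '#' separators make even- and odd-length palindromes uniform; the
    # distinct sentinels '^' and '$' stop expansion without bounds checks.
    t = "^#" + "#".join(s) + "#$"
    p = [0] * len(t)
    center = right = 0  # palindrome reaching furthest to the right so far
    for i in range(1, len(t) - 1):
        if i < right:
            # Reuse the mirrored center's radius, capped at the known boundary.
            p[i] = min(right - i, p[2 * center - i])
        while t[i + p[i] + 1] == t[i - p[i] - 1]:
            p[i] += 1
        if i + p[i] > right:
            center, right = i, i + p[i]
    return p[1:-1]
```

Each expansion step pushes `right` further, which bounds the total work by $O(n)$.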
Submitted 20 February, 2019; v1 submitted 26 June, 2018;
originally announced June 2018.
-
Enumerating All Subgraphs without Forbidden Induced Subgraphs via Multivalued Decision Diagrams
Authors:
Jun Kawahara,
Toshiki Saitoh,
Hirofumi Suzuki,
Ryo Yoshinaka
Abstract:
We propose a general method performed over multivalued decision diagrams that enumerates all subgraphs of an input graph that are characterized by input forbidden induced subgraphs. Our method combines elaborations of classical set operations and a developing construction technique, called frontier-based search, for multivalued decision diagrams. Using the algorithm, we enumerated all the chordal graphs of size at most 10 on multivalued decision diagrams.
Submitted 11 April, 2018;
originally announced April 2018.
-
Efficient Dynamic Dictionary Matching with DAWGs and AC-automata
Authors:
Diptarama Hendrian,
Shunsuke Inenaga,
Ryo Yoshinaka,
Ayumi Shinohara
Abstract:
Dictionary matching is the task of finding all occurrences of the patterns in a set $D$ (called a dictionary) in a text $T$. The Aho-Corasick automaton (AC-automaton) is a data structure that enables us to solve the dictionary matching problem in $O(d\log σ)$ preprocessing time and $O(n\log σ+occ)$ matching time, where $d$ is the total length of the patterns in $D$, $n$ is the length of the text, $σ$ is the alphabet size, and $occ$ is the total number of occurrences of all the patterns in the text. Dynamic dictionary matching is a variant where patterns may be dynamically inserted into and deleted from $D$. This problem is called semi-dynamic dictionary matching if only insertions are allowed. In this paper, we propose two efficient algorithms. For a pattern of length $m$, our first algorithm supports insertions in $O(m\log σ+\log d/\log\log d)$ time and pattern matching in $O(n\log σ+occ)$ time for the semi-dynamic setting and, with some modifications, supports both insertions and deletions in $O(σm+\log d/\log\log d)$ time and pattern matching in $O(n(\log d/\log\log d+\log σ)+occ(\log d/\log\log d))$ time for the dynamic setting. This algorithm is based on the directed acyclic word graph. Our second algorithm, which is based on the AC-automaton, supports insertions in $O(m\log σ+u_f+u_o)$ time for the semi-dynamic setting and supports both insertions and deletions in $O(σm+u_f+u_o)$ time for the dynamic setting, where $u_f$ and $u_o$ respectively denote the numbers of states in which the failure function and the output function need to be updated. This algorithm performs pattern matching in $O(n\log σ+occ)$ time for both settings. Our algorithm achieves optimal update time for AC-automaton-based methods over constant-size alphabets, since any algorithm that explicitly maintains the AC-automaton requires $Ω(m+u_f+u_o)$ update time.
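For reference, the static AC-automaton can be built as a trie plus failure links computed in breadth-first order, with output sets merged along those links. The sketch below is the classic static construction with dictionary-based transitions. The names `build_ac` and `ac_search` are hypothetical. It supports neither insertions nor deletions and is neither of the dynamic algorithms proposed in the paper.

```python
from collections import deque


def build_ac(patterns):
    """Build goto (trie), fail, and output tables of an AC-automaton."""
    goto = [{}]   # goto[s][ch] -> child state; state 0 is the root
    fail = [0]    # failure link of each state
    out = [[]]    # patterns recognized on reaching each state
    for pat in patterns:
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(pat)
    # BFS: depth-1 states keep fail = 0; a deeper state follows its parent's
    # failure chain until some state has an edge labeled ch.
    queue = deque(goto[0].values())
    while queue:
        r = queue.popleft()
        for ch, s in goto[r].items():
            queue.append(s)
            f = fail[r]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[s] = goto[f].get(ch, 0)
            out[s] = out[s] + out[fail[s]]  # fail[s] is shallower, already final
    return goto, fail, out


def ac_search(text, patterns):
    """Return (start, pattern) for every occurrence of every pattern."""
    goto, fail, out = build_ac(patterns)
    s = 0
    hits = []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for pat in out[s]:
            hits.append((i - len(pat) + 1, pat))
    return hits
```

On the text `"ushers"` with patterns `he`, `she`, `his`, `hers`, the search reports `she` at 1 and both `he` and `hers` at 2.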
Submitted 20 February, 2019; v1 submitted 9 October, 2017;
originally announced October 2017.
-
New Variants of Pattern Matching with Constants and Variables
Authors:
Yuki Igarashi,
Diptarama,
Ryo Yoshinaka,
Ayumi Shinohara
Abstract:
Given a text and a pattern over two types of symbols called constants and variables, the parameterized pattern matching problem is to find all occurrences of substrings of the text that the pattern matches by substituting a variable in the text for each variable in the pattern, where the substitution should be injective. The function matching problem is a variant of it that lifts the injection constraint. In this paper, we discuss variants of those problems, where one can substitute a constant or a variable for each variable of the pattern. We give two kinds of algorithms for both problems, a convolution-based method and an extended KMP-based method, and analyze their complexity.
Submitted 26 May, 2017;
originally announced May 2017.
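A brute-force checker may help pin down the two problem variants. The Python sketch below assumes that a window of the text matches when some map from pattern variables to text symbols (constants or variables), applied with pattern constants left unchanged, turns the pattern into the window, and that the injective variant additionally forbids two pattern variables from mapping to the same symbol. It runs in $O(nm)$ time and is neither the convolution-based nor the KMP-based method from the paper; `find_matches`, `matches_at`, and `pattern_vars` are hypothetical names.

```python
def matches_at(pattern, window, pattern_vars, injective):
    """Check whether some substitution maps `pattern` onto `window`.

    Assumed semantics (hypothetical): pattern constants must equal the text
    symbol; each pattern variable is consistently replaced by one text symbol,
    which may be a constant or a variable.
    """
    sub = {}   # pattern variable -> text symbol it stands for
    used = {}  # text symbol -> pattern variable already mapped to it
    for p, t in zip(pattern, window):
        if p not in pattern_vars:
            if p != t:           # constants must match exactly
                return False
        elif p in sub:
            if sub[p] != t:      # a variable must be substituted consistently
                return False
        else:
            if injective and t in used:
                return False     # injectivity: distinct variables, distinct images
            sub[p] = t
            used[t] = p
    return True


def find_matches(text, pattern, pattern_vars, injective=False):
    """Return every start position where the pattern matches (naive O(nm))."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1)
            if matches_at(pattern, text[i:i + m], pattern_vars, injective)]
```

With pattern `"XY"` and both symbols treated as variables, the text `"aab"` matches at positions 0 and 1 in the function-matching sense, but only at position 1 when injectivity is required, because `X` and `Y` cannot both stand for `a`.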
-
Duel and sweep algorithm for order-preserving pattern matching
Authors:
Davaajav Jargalsaikhan,
Diptarama,
Ryo Yoshinaka,
Ayumi Shinohara
Abstract:
Given a text $T$ and a pattern $P$ over alphabet $Σ$, the classic exact matching problem searches for all occurrences of pattern $P$ in text $T$. Unlike the exact matching problem, order-preserving pattern matching (OPPM) considers the relative order of elements rather than their actual values. In this paper, we propose an efficient algorithm for the OPPM problem using the "duel-and-sweep" paradigm. Our algorithm runs in $O(n + m\log m)$ time in general, where $n=|T|$ and $m=|P|$, and in $O(n + m)$ time under the assumption that the characters in a string can be sorted in time linear in the string size. We also perform experiments showing that our algorithm is faster than the KMP-based algorithm. Lastly, we introduce two-dimensional order-preserving pattern matching and give a duel-and-sweep algorithm for it that runs in $O(n^2)$ time for the duel stage and $O(n^2 m)$ time for the sweep stage, with $O(m^3)$ preprocessing time.
Submitted 26 May, 2017;
originally announced May 2017.
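To make the matching criterion concrete: a window of $T$ is an order-preserving occurrence of $P$ exactly when both sequences have the same rank sequence, ties included. The naive Python baseline below, with hypothetical names `rank_signature` and `oppm_naive`, checks every window in $O(nm\log m)$ time; it is not the duel-and-sweep algorithm, which avoids recomputing ranks for each window.

```python
def rank_signature(seq):
    """Replace each element by its rank among the distinct values of seq.

    Two sequences are order-isomorphic iff their signatures are equal,
    since equal values get equal ranks and smaller values get smaller ranks.
    """
    ranks = {v: r for r, v in enumerate(sorted(set(seq)))}
    return [ranks[v] for v in seq]


def oppm_naive(text, pattern):
    """Return all start positions of order-preserving occurrences of pattern."""
    m = len(pattern)
    target = rank_signature(pattern)
    return [i for i in range(len(text) - m + 1)
            if rank_signature(text[i:i + m]) == target]
```

For instance, the pattern `[1, 3, 2]` (low, high, middle) occurs in `[10, 22, 15, 30, 20, 18, 27]` at positions 0 (`10, 22, 15`) and 2 (`15, 30, 20`).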
-
The Time Complexity of Permutation Routing via Matching, Token Swapping and a Variant
Authors:
Jun Kawahara,
Toshiki Saitoh,
Ryo Yoshinaka
Abstract:
The problems of Permutation Routing via Matching and Token Swapping are reconfiguration problems on graphs. This paper is concerned with the complexity of those problems and of a colored variant. For a given graph where each vertex has a unique token on it, those problems require finding a shortest way to transform one token placement into another by swapping tokens on adjacent vertices. While all pairs of tokens on a matching can be exchanged at once in Permutation Routing via Matching, Token Swapping allows only one pair of tokens to be swapped at a time. In the colored version, vertices and tokens are colored, and the goal is to relocate tokens so that each vertex has a token of the same color. We investigate the time complexity of several restricted cases of those problems and show when those problems become tractable and when they remain intractable.
Submitted 12 September, 2017; v1 submitted 9 December, 2016;
originally announced December 2016.
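For tiny instances, the Token Swapping distance can be computed by breadth-first search over all token placements, which also covers the colored variant when several tokens share a color. The sketch below is exponential in the number of vertices and only makes the problem definition concrete; `token_swapping_distance` is a hypothetical name, and a placement is a tuple whose entry at index $v$ is the token (or color) on vertex $v$.

```python
from collections import deque


def token_swapping_distance(edges, start, goal):
    """Minimum number of swaps along edges turning placement `start` into `goal`.

    Hypothetical brute-force helper: start[v] and goal[v] give the token (or
    color) on vertex v. BFS may explore up to n! placements, so use tiny graphs
    only. Returns None when `goal` is unreachable.
    """
    start, goal = tuple(start), tuple(goal)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            return dist[cur]
        for u, v in edges:
            # Token Swapping: exactly one adjacent pair is exchanged per step.
            nxt = list(cur)
            nxt[u], nxt[v] = nxt[v], nxt[u]
            nxt = tuple(nxt)
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return None
```

On the path 0-1-2, reversing the tokens `(0, 1, 2)` takes 3 swaps, while on the triangle a single swap suffices. Replacing the inner loop by one that applies all swaps of a matching at once would model Permutation Routing via Matching instead.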
-
Longest Common Subsequence in at Least $k$ Length Order-Isomorphic Substrings
Authors:
Yohei Ueki,
Diptarama,
Masatoshi Kurihara,
Yoshiaki Matsuoka,
Kazuyuki Narisawa,
Ryo Yoshinaka,
Hideo Bannai,
Shunsuke Inenaga,
Ayumi Shinohara
Abstract:
We consider the longest common subsequence (LCS) problem with the restriction that the common subsequence must consist of substrings of length at least $k$. First, we show an $O(mn)$ time algorithm for the problem, which gives a better worst-case running time than existing algorithms, where $m$ and $n$ are the lengths of the input strings. Our main focus, however, is the LCS in at least $k$ length order-isomorphic substrings problem. We show that this problem can also be solved in $O(mn)$ worst-case time by an easy-to-implement algorithm.
Submitted 6 February, 2017; v1 submitted 12 September, 2016;
originally announced September 2016.
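One way to reach $O(mn)$ time for the plain (not order-isomorphic) problem is a dynamic program that, besides the usual LCS table, tracks the longest common suffix at each cell and the best solution whose last block ends exactly there, so that extending a block by one character costs constant time. The Python sketch below follows that idea under the assumption $k\ge1$; it is not necessarily the authors' algorithm, does not cover the order-isomorphic variant, and `lcs_at_least_k` is a hypothetical name.

```python
def lcs_at_least_k(x, y, k):
    """Length of the longest common subsequence of x and y that is a
    concatenation of common substrings (blocks) of length >= k; assumes k >= 1."""
    m, n = len(x), len(y)
    suf = [[0] * (n + 1) for _ in range(m + 1)]   # longest common suffix of x[:i], y[:j]
    best = [[0] * (n + 1) for _ in range(m + 1)]  # best value with a block ending at (i, j)
    dp = [[0] * (n + 1) for _ in range(m + 1)]    # answer for prefixes x[:i], y[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
            if x[i - 1] == y[j - 1]:
                suf[i][j] = suf[i - 1][j - 1] + 1
            if suf[i][j] >= k:
                # Start a fresh block of length exactly k ending here ...
                cand = dp[i - k][j - k] + k
                if suf[i][j] > k:
                    # ... or extend the best block that ended at (i-1, j-1) by one.
                    cand = max(cand, best[i - 1][j - 1] + 1)
                best[i][j] = cand
                dp[i][j] = max(dp[i][j], cand)
    return dp[m][n]
```

For `x = "abcde"` and `y = "abxde"` with $k=2$, the blocks `ab` and `de` give length 4, while with $k=3$ no common substring is long enough and the result is 0. The `best` table is what removes the inner loop over all admissible block lengths.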
-
Micro-Clustering: Finding Small Clusters in Large Diversity
Authors:
Takeaki Uno,
Hiroki Maegawa,
Takanobu Nakahara,
Yukinobu Hamuro,
Ryo Yoshinaka,
Makoto Tatsuta
Abstract:
We address an unsupervised soft-clustering problem called micro-clustering. The aim is to enumerate all groups composed of records that are strongly related to each other, whereas standard clustering methods separate records at sparse regions. Formulating micro-clustering is non-trivial. Clique mining in a similarity graph is a typical approach, but it yields a huge number of mutually similar cliques. We propose a new concept, data polishing. The huge number of solutions can be attributed to the groups not being clearly defined in the data, that is, their boundaries are blurred by noise, uncertainty, false information, missing data, and so on. Data polishing clarifies the groups by perturbing the data. Specifically, dense subgraphs that would correspond to clusters are replaced by cliques. The clusters then appear as maximal cliques, so the number of maximal cliques is drastically reduced. We also propose an efficient algorithm applicable even to large-scale data. Computational experiments showed the efficiency of our algorithm: the number of solutions is small (e.g., 1,000), the members of each group are deeply related, and the computation time is short.
Submitted 6 June, 2016; v1 submitted 11 July, 2015;
originally announced July 2015.
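The abstract does not state the exact polishing rule, so the following Python sketch assumes one plausible formulation: in each round, vertices $u$ and $v$ become adjacent exactly when the Jaccard similarity of their closed neighborhoods is at least a threshold `theta`, and rounds repeat until the graph stops changing. Maximal cliques are then listed with the Bron-Kerbosch algorithm with pivoting. All names (`polish`, `maximal_cliques`, `from_edges`, `theta`) are hypothetical, and the code illustrates the idea of replacing dense subgraphs by cliques rather than the authors' implementation.

```python
def from_edges(edges):
    """Build an undirected adjacency dict {vertex: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def polish(adj, theta=0.5, max_rounds=10):
    """Hypothetical polishing rule (an assumption, not taken from the paper):
    connect u and v exactly when |N[u] & N[v]| / |N[u] | N[v]| >= theta, where
    N[.] is the closed neighborhood; repeat until stable or max_rounds is hit."""
    nodes = list(adj)
    for _ in range(max_rounds):
        closed = {v: adj[v] | {v} for v in nodes}
        new_adj = {v: set() for v in nodes}
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                inter = len(closed[u] & closed[v])
                union = len(closed[u] | closed[v])
                if inter >= theta * union:
                    new_adj[u].add(v)
                    new_adj[v].add(u)
        if new_adj == adj:   # fixed point reached
            break
        adj = new_adj
    return adj


def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting; yields each maximal clique as a frozenset."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the vertex covering most candidates to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())
```

On a graph made of two near-cliques on four vertices each (one edge missing in each) joined by a single bridge edge, the original graph has five maximal cliques, whereas the polished graph with `theta=0.5` has two, one per group.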