Search | arXiv e-print repository

doi 10.4230/LIPIcs.CPM.2024.1

Algorithms for Galois Words: Detection, Factorization, and Rotation

Authors: Diptarama Hendrian, Dominik Köppl, Ryo Yoshinaka, Ayumi Shinohara

Abstract: Lyndon words are extensively studied in combinatorics on words -- they play a crucial role in upper bounding the number of runs a word can have [Bannai+, SIAM J. Comput.'17]. We can determine Lyndon words, factorize a word into Lyndon words in lexicographically non-increasing order, and find the Lyndon rotation of a word, all in linear time within constant additional working space. A recent research interest emerged from the question of what happens when we change the lexicographic order, which is at the heart of the definition of Lyndon words. In particular, the alternating order, where the order of all odd positions becomes reversed, has been recently proposed. While a Lyndon word is, among all its cyclic rotations, the smallest one with respect to the lexicographic order, a Galois word exhibits the same property by exchanging the lexicographic order with the alternating order. Unfortunately, this exchange has a large impact on the properties Galois words exhibit, which makes it a nontrivial task to translate results from Lyndon words to Galois words. Up until now, it has only been conjectured that linear-time algorithms with constant additional working space in the spirit of Duval's algorithm are possible for computing the Galois factorization or the Galois rotation. Here, we affirm this conjecture as follows. Given a word $T$ of length $n$, we can determine whether $T$ is a Galois word, in $O(n)$ time with constant additional working space.
Within the same complexities, we can also determine the Galois rotation of $T$, and compute the Galois factorization of $T$ online. The last result settles Open Problem~1 in [Dolce et al., TCS 2019] for Galois words.

Submitted 23 April, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: 16 pages, 3 figures, accepted to CPM 2024
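
The abstract above takes Duval's algorithm for Lyndon words as its starting point. As a point of reference only, here is a minimal Python sketch of the classical Lyndon factorization under the ordinary lexicographic order. It is not the Galois (alternating-order) algorithm of the paper, and the function names `lyndon_factorization` and `is_lyndon` are illustrative choices.

```python
def lyndon_factorization(s):
    """Classical Duval factorization under the plain lexicographic order.

    Returns the Lyndon factors of s in lexicographically non-increasing order.
    The scan uses three indices (i, j, k); only the output list needs
    extra memory.
    """
    n = len(s)
    i = 0
    factors = []
    while i < n:
        j = i + 1  # position being compared
        k = i      # position it is compared against
        while j < n and s[k] <= s[j]:
            if s[k] < s[j]:
                k = i       # the candidate factor is extended as one Lyndon word
            else:
                k += 1      # the candidate continues periodically
            j += 1
        period = j - k
        while i <= k:
            factors.append(s[i:i + period])
            i += period
    return factors


def is_lyndon(s):
    """A non-empty word is Lyndon iff its factorization is the word itself."""
    return len(s) > 0 and lyndon_factorization(s) == [s]
```

For example, `lyndon_factorization("banana")` yields the factors `b`, `an`, `an`, `a`, whose concatenation is the input.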

arXiv:2308.05977 [pdf, other]

Breaking a Barrier in Constructing Compact Indexes for Parameterized Pattern Matching

Authors: Kento Iseri, Tomohiro I, Diptarama Hendrian, Dominik Köppl, Ryo Yoshinaka, Ayumi Shinohara

Abstract: A parameterized string (p-string) is a string over an alphabet $(Σ_{s} \cup Σ_{p})$, where $Σ_{s}$ and $Σ_{p}$ are disjoint alphabets for static symbols (s-symbols) and for parameter symbols (p-symbols), respectively. Two p-strings $x$ and $y$ are said to parameterized match (p-match) if and only if $x$ can be transformed into $y$ by applying a bijection on $Σ_{p}$ to every occurrence of p-symbols in $x$. The indexing problem for p-matching is to preprocess a p-string $T$ of length $n$ so that we can efficiently find the occurrences of substrings of $T$ that p-match with a given pattern. Extending the Burrows-Wheeler Transform (BWT) based index for exact string pattern matching, Ganguly et al. [SODA 2017] proposed the first compact index (named pBWT) for p-matching, and posed an open problem on how to construct it in compact space, i.e., in $O(n \lg |Σ_{s} \cup Σ_{p}|)$ bits of space. Hashimoto et al. [SPIRE 2022] partially solved this problem by showing how to construct some components of pBWTs for $T$ in $O(n \frac{|Σ_{p}| \lg n}{\lg \lg n})$ time in an online manner while reading the symbols of $T$ from right to left. In this paper, we improve the time complexity to $O(n \frac{\lg |Σ_{p}| \lg n}{\lg \lg n})$. We remark that removing the multiplicative factor of $|Σ_{p}|$ from the complexity is of great interest because it has not been achieved for over a decade in the construction of related data structures like parameterized suffix arrays even in the offline setting.
We also show that our data structure can support backward search, a core procedure of BWT-based indexes, at any stage of the online construction, making it the first compact index for p-matching that can be constructed in compact space and even in an online manner.

Submitted 11 August, 2023; originally announced August 2023.
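
The definition of p-matching in the abstract above can be checked directly with the well-known prev-encoding: each p-symbol is replaced by the distance to its previous occurrence (0 for a first occurrence), and two p-strings of equal length p-match exactly when their encodings coincide. The sketch below is a naive illustration of that definition, not the pBWT index; `prev_encode`, `p_match`, and the `params` argument are names assumed for this example.

```python
def prev_encode(s, params):
    """Prev-encoding of a p-string.

    s      : sequence of symbols
    params : set of parameter symbols (everything else is static)
    Static symbols are kept (tagged "s"); each parameter symbol becomes the
    distance to its previous occurrence, or 0 if there is none (tagged "p").
    """
    last_seen = {}
    encoded = []
    for i, c in enumerate(s):
        if c in params:
            encoded.append(("p", i - last_seen[c] if c in last_seen else 0))
            last_seen[c] = i
        else:
            encoded.append(("s", c))
    return encoded


def p_match(x, y, params):
    """True iff x and y p-match, i.e. their prev-encodings are equal."""
    return len(x) == len(y) and prev_encode(x, params) == prev_encode(y, params)
```

With `params = {"x", "y", "z"}`, the strings `xaybx` and `yazby` p-match (rename x to y and y to z), while `xayby` and `xaxbx` do not, because no bijection can map the two distinct symbols x and y to the same symbol.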

arXiv:2306.10714 [pdf, ps, other]

Efficient Parameterized Pattern Matching in Sublinear Space

Authors: Haruki Ideguchi, Diptarama Hendrian, Ryo Yoshinaka, Ayumi Shinohara

Abstract: The parameterized matching problem is a variant of string matching, which is to search for all parameterized occurrences of a pattern $P$ in a text $T$. In considering matching algorithms, the combinatorial properties of strings, especially periodicity, play an important role. In this paper, we analyze the properties of periods of parameterized strings and propose a generalization of Galil and Seiferas's exact matching algorithm (1980) into parameterized matching, which runs in $O(π|T|+|P|)$ time and $O(\log{|P|}+|{\rmΠ}|)$ space in addition to the input space, where ${\rmΠ}$ is the parameter alphabet and $π$ is the number of parameter characters appearing in $P$ plus one.

Submitted 19 June, 2023; originally announced June 2023.

arXiv:2209.12405 [pdf, ps, other]

Inferring Strings from Position Heaps in Linear Time

Authors: Koshiro Kumagai, Diptarama Hendrian, Ryo Yoshinaka, Ayumi Shinohara

Abstract: Position heaps are index structures of text strings used for the string matching problem. They are rooted trees whose edges and nodes are labeled and numbered, respectively. This paper is concerned with variants of the inverse problem of position heap construction and gives linear-time algorithms for those problems. The basic problem is to restore a text string from a rooted tree with labeled edges and numbered nodes. In the variant problems, the input trees may miss edge labels or node numbers which we must restore as well.

Submitted 12 December, 2022; v1 submitted 26 September, 2022; originally announced September 2022.

Comments: 10 pages, 5 figures

arXiv:2206.15100 [pdf, ps, other]

Computing the Parameterized Burrows--Wheeler Transform Online

Authors: Daiki Hashimoto, Diptarama Hendrian, Dominik Köppl, Ryo Yoshinaka, Ayumi Shinohara

Abstract: Parameterized strings are a generalization of strings in that their characters are drawn from two different alphabets, where one is considered to be the alphabet of static characters and the other to be the alphabet of parameter characters. Two parameterized strings are a parameterized match if there is a bijection over all characters such that the bijection transforms one string to the other while keeping the static characters (i.e., it behaves as the identity on the static alphabet). Ganguly et al. [SODA 2017] proposed the parameterized Burrows--Wheeler transform (pBWT) as a variant of the Burrows--Wheeler transform for space-efficient parameterized pattern matching. In this paper, we propose an algorithm for computing the pBWT online by reading the characters of a given input string one by one from right to left. Our algorithm works in $O(|Π| \log n / \log \log n)$ amortized time for each input character, where $n$ and $Π$ denote the size of the input string and the alphabet of the parameter characters, respectively.

Submitted 30 August, 2022; v1 submitted 30 June, 2022; originally announced June 2022.

Comments: 13 pages, accepted to SPIRE 2022

arXiv:2202.13284 [pdf, other]

Parallel algorithm for pattern matching problems under substring consistent equivalence relations

Authors: Davaajav Jargalsaikhan, Diptarama Hendrian, Ryo Yoshinaka, Ayumi Shinohara

Abstract: Given a text and a pattern over an alphabet, the pattern matching problem searches for all occurrences of the pattern in the text. An equivalence relation $\approx$ is called a substring consistent equivalence relation (SCER), if for two strings $X$ and $Y$, $X \approx Y$ implies $|X| = |Y|$ and $X[i:j] \approx Y[i:j]$ for all $1 \le i \le j \le |X|$. In this paper, we propose an efficient parallel algorithm for pattern matching under any SCER using the "duel-and-sweep" paradigm. For a pattern of length $m$ and a text of length $n$, our algorithm runs in $O(ξ_m^\mathrm{t} \log^2 m)$ time and $O(ξ_m^\mathrm{w} \cdot n \log^2 m)$ work, with $O(τ_n^\mathrm{t} + ξ_m^\mathrm{t} \log^2 m)$ time and $O(τ_n^\mathrm{w} + ξ_m^\mathrm{w} \cdot m \log^2 m)$ work preprocessing on the Priority Concurrent Read Concurrent Write Parallel Random-Access Machines (P-CRCW PRAM).

Submitted 27 July, 2022; v1 submitted 26 February, 2022; originally announced February 2022.

arXiv:2004.12590 [pdf, other]

doi 10.4230/LIPIcs.CPM.2020.23

In-Place Bijective Burrows-Wheeler Transforms

Authors: Dominik Köppl, Daiki Hashimoto, Diptarama Hendrian, Ayumi Shinohara

Abstract: One of the most well-known variants of the Burrows-Wheeler transform (BWT) [Burrows and Wheeler, 1994] is the bijective BWT (BBWT) [Gil and Scott, arXiv 2012], which applies the extended BWT (EBWT) [Mantaci et al., TCS 2007] to the multiset of Lyndon factors of a given text. Since the EBWT is invertible, the BBWT is a bijective transform in the sense that the inverse image of the EBWT restores this multiset of Lyndon factors such that the original text can be obtained by sorting these factors in non-increasing order. In this paper, we present algorithms constructing or inverting the BBWT in-place using quadratic time. We also present conversions from the BBWT to the BWT, or vice versa, either (a) in-place using quadratic time, or (b) in the run-length compressed setting using $O(n \lg r / \lg \lg r)$ time with $O(r \lg n)$ bits of words, where $r$ is the sum of character runs in the BWT and the BBWT.

Submitted 27 April, 2020; originally announced April 2020.

Comments: In proceedings of CPM 2020

arXiv:2003.08097 [pdf, other]

Grammar compression with probabilistic context-free grammar

Authors: Hiroaki Naganuma, Diptarama Hendrian, Ryo Yoshinaka, Ayumi Shinohara, Naoki Kobayashi

Abstract: We propose a new approach for universal lossless text compression, based on grammar compression. In the literature, a target string $T$ has been compressed as a context-free grammar $G$ in Chomsky normal form satisfying $L(G) = \{T\}$. Such a grammar is often called a \emph{straight-line program} (SLP). In this paper, we consider a probabilistic grammar $G$ that generates $T$, but not necessarily as a unique element of $L(G)$. In order to recover the original text $T$ unambiguously, we keep both the grammar $G$ and the derivation tree of $T$ from the start symbol in $G$, in compressed form. We show some simple evidence that our proposal is indeed more efficient than SLPs for certain texts, both from theoretical and practical points of view.

Submitted 18 March, 2020; originally announced March 2020.

Comments: 11 pages, 3 figures, accepted for poster presentation at DCC 2020

arXiv:2002.08004 [pdf, ps, other]

Fast and linear-time string matching algorithms based on the distances of $q$-gram occurrences

Authors: Satoshi Kobayashi, Diptarama Hendrian, Ryo Yoshinaka, Ayumi Shinohara

Abstract: Given a text $T$ of length $n$ and a pattern $P$ of length $m$, the string matching problem is the task of finding all occurrences of $P$ in $T$. In this study, we propose an algorithm that solves this problem in $O((n + m)q)$ time considering the distance between two adjacent occurrences of the same $q$-gram contained in $P$. We also propose a theoretical improvement of it which runs in $O(n + m)$ time, though it is not necessarily faster in practice. We compare the execution times of our and existing algorithms on various kinds of real and artificial datasets such as an English text, a genome sequence and a Fibonacci string. The experimental results show that our algorithm is as fast as the state-of-the-art algorithms in many cases, particularly when a pattern frequently appears in a text.

Submitted 12 April, 2020; v1 submitted 18 February, 2020; originally announced February 2020.

Comments: 14 pages, accepted to SEA 2020

arXiv:2002.06796 [pdf, other]

Detecting $k$-(Sub-)Cadences and Equidistant Subsequence Occurrences

Authors: Mitsuru Funakoshi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Ayumi Shinohara

Abstract: The equidistant subsequence pattern matching problem is considered. Given a pattern string $P$ and a text string $T$, we say that $P$ is an \emph{equidistant subsequence} of $T$ if $P$ is a subsequence of the text such that consecutive symbols of $P$ in the occurrence are equally spaced. We can consider the problem of equidistant subsequences as generalizations of (sub-)cadences. We give bit-parallel algorithms that yield $o(n^2)$ time algorithms for finding $k$-(sub-)cadences and equidistant subsequences. Furthermore, $O(n\log^2 n)$ and $O(n\log n)$ time algorithms, respectively for equidistant and Abelian equidistant matching for the case $|P| = 3$, are shown. The algorithms make use of a technique that was recently introduced which can efficiently compute convolutions with linear constraints.

Submitted 17 February, 2020; originally announced February 2020.
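
To make the definition of an equidistant subsequence in the abstract above concrete, the following naive Python sketch enumerates every (start, gap) pair at which $P$ occurs in $T$ with equally spaced symbols. It is a brute-force check of the definition, not the bit-parallel or convolution-based algorithms of the paper; the function name and the restriction to $|P| \ge 2$ are assumptions made for this illustration.

```python
def equidistant_occurrences(p, t):
    """All (start, gap) pairs with t[start + k*gap] == p[k] for every k.

    Brute force over all starts and gaps. Assumes len(p) >= 2 so that
    the gap is well defined.
    """
    m, n = len(p), len(t)
    if m < 2:
        raise ValueError("this sketch assumes a pattern of length >= 2")
    hits = []
    for start in range(n):
        # Largest gap that keeps the last pattern symbol inside the text.
        max_gap = (n - 1 - start) // (m - 1)
        for gap in range(1, max_gap + 1):
            if all(t[start + k * gap] == p[k] for k in range(m)):
                hits.append((start, gap))
    return hits
```

For instance, `abc` occurs in `axbxc` only at start 0 with gap 2.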

arXiv:2002.06786 [pdf, other]

doi 10.1016/j.tcs.2022.09.008

Parameterized DAWGs: efficient constructions and bidirectional pattern searches

Authors: Katsuhito Nakashima, Noriki Fujisato, Diptarama Hendrian, Yuto Nakashima, Ryo Yoshinaka, Shunsuke Inenaga, Hideo Bannai, Ayumi Shinohara, Masayuki Takeda

Abstract: Two strings $x$ and $y$ over $Σ\cup Π$ of equal length are said to \emph{parameterized match} (\emph{p-match}) if there is a renaming bijection $f:Σ\cup Π\rightarrow Σ\cup Π$ that is identity on $Σ$ and transforms $x$ to $y$ (or vice versa). The \emph{p-matching} problem is to look for substrings in a text that p-match a given pattern. In this paper, we propose \emph{parameterized suffix automata} (\emph{p-suffix automata}) and \emph{parameterized directed acyclic word graphs} (\emph{PDAWGs}) which are the p-matching versions of suffix automata and DAWGs. While suffix automata and DAWGs are equivalent for standard strings, we show that p-suffix automata can have $Θ(n^2)$ nodes and edges but PDAWGs have only $O(n)$ nodes and edges, where $n$ is the length of an input string. We also give an $O(n |Π| \log (|Π| + |Σ|))$-time $O(n)$-space algorithm that builds the PDAWG in a left-to-right online manner. As a byproduct, it is shown that the \emph{parameterized suffix tree} for the reversed string can also be built in the same time and space, in a right-to-left online manner. This duality also leads us to two further efficient algorithms for p-matching: given the parameterized suffix tree for the reversal of the input string $T$, one can build the PDAWG of $T$ in $O(n)$ time in an offline manner; one can perform \emph{bidirectional} p-matching in $O(m \log (|Π|+|Σ|) + \mathit{occ})$ time using $O(n)$ space, where $m$ denotes the pattern length and $\mathit{occ}$ is the number of pattern occurrences in the text $T$.

Submitted 16 September, 2022; v1 submitted 17 February, 2020; originally announced February 2020.

Comments: 28 pages, 7 figures

Journal ref: Theoretical Computer Science (2022)

arXiv:2002.06764 [pdf, ps, other]

Computing Covers under Substring Consistent Equivalence Relations

Authors: Natsumi Kikuchi, Diptarama Hendrian, Ryo Yoshinaka, Ayumi Shinohara

Abstract: Covers are a kind of quasiperiodicity in strings. A string $C$ is a cover of another string $T$ if any position of $T$ is inside some occurrence of $C$ in $T$. The shortest and longest cover arrays of $T$ have the lengths of the shortest and longest covers of each prefix of $T$, respectively. The literature has proposed linear-time algorithms computing longest and shortest cover arrays taking border arrays as input. An equivalence relation $\approx$ over strings is called a substring consistent equivalence relation (SCER) iff $X \approx Y$ implies (1) $|X| = |Y|$ and (2) $X[i:j] \approx Y[i:j]$ for all $1 \le i \le j \le |X|$. In this paper, we generalize the notion of covers for SCERs and prove that existing algorithms to compute the shortest cover array and the longest cover array of a string $T$ under the identity relation will work for any SCER taking the accordingly generalized border arrays.

Submitted 30 July, 2020; v1 submitted 16 February, 2020; originally announced February 2020.

Comments: 16 pages
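
The notion of a cover defined in the abstract above (under the identity relation, i.e. ordinary string equality) can be illustrated with a short Python sketch. This is a naive check of the definition, not the linear-time border-array algorithms the abstract refers to; `is_cover` and `shortest_cover` are hypothetical helper names.

```python
def is_cover(c, t):
    """True iff every position of t lies inside some occurrence of c in t."""
    if not c or len(c) > len(t):
        return False
    covered_up_to = 0  # first position of t not yet covered
    for i in range(len(t) - len(c) + 1):
        if t[i:i + len(c)] == c:
            if i > covered_up_to:
                return False  # a gap was left uncovered
            covered_up_to = i + len(c)
    return covered_up_to == len(t)


def shortest_cover(t):
    """Shortest prefix of t that covers t (t always covers itself)."""
    for length in range(1, len(t) + 1):
        if is_cover(t[:length], t):
            return t[:length]
    return t  # only reached for the empty string
```

For example, `aba` covers `ababa` through its occurrences at positions 0 and 2, but it does not cover `abaab`.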

arXiv:1902.07417 [pdf, other]

doi 10.4204/EPTCS.305.10

Query Learning Algorithm for Residual Symbolic Finite Automata

Authors: Kaizaburo Chubachi, Diptarama Hendrian, Ryo Yoshinaka, Ayumi Shinohara

Abstract: We propose a query learning algorithm for residual symbolic finite automata (RSFAs). Symbolic finite automata (SFAs) are finite automata whose transitions are labeled by predicates over a Boolean algebra, in which a big collection of characters leading to the same transition may be represented by a single predicate. Residual finite automata (RFAs) are a special type of non-deterministic finite automata which can be exponentially smaller than the minimum deterministic finite automata and have a favorable property for learning algorithms. RSFAs have both properties of SFAs and RFAs and can have more succinct representation of transitions and fewer states than RFAs and deterministic SFAs accepting the same language. The implementation of our algorithm efficiently learns RSFAs over a huge alphabet and outperforms an existing learning algorithm for deterministic SFAs. The result also shows that the benefit of non-determinism in efficiency is even larger in learning SFAs than non-symbolic automata.

Submitted 17 September, 2019; v1 submitted 20 February, 2019; originally announced February 2019.

Comments: In Proceedings GandALF 2019, arXiv:1909.05979

Journal ref: EPTCS 305, 2019, pp. 140-153

arXiv:1902.00216 [pdf, other]

An Extension of Linear-size Suffix Tries for Parameterized Strings

Authors: Katsuhito Nakashima, Diptarama Hendrian, Ryo Yoshinaka, Ayumi Shinohara

Abstract: In this paper, we propose a new indexing structure for parameterized strings which we call PLSTs, by generalizing linear-size suffix tries for ordinary strings. Two parameterized strings are said to match if there is a bijection on the symbol set that makes the two coincide. PLSTs are applicable to the parameterized pattern matching problem, which is to decide whether the input parameterized text has a substring that matches the input parameterized pattern. The size of PLSTs is linear in the text size, with which our algorithm solves the parameterized pattern matching problem in linear time in the pattern size. PLSTs can be seen as a compacted version of parameterized suffix tries and a combination of linear-size suffix tries and parameterized suffix trees. We experimentally show that PLSTs are more space efficient than parameterized suffix trees for highly repetitive strings.

Submitted 4 September, 2019; v1 submitted 1 February, 2019; originally announced February 2019.

Comments: 13 pages, 6 figures

arXiv:1807.11580 [pdf, ps, other]

Enumerating Cryptarithms Using Deterministic Finite Automata

Authors: Yuki Nozaki, Diptarama Hendrian, Ryo Yoshinaka, Takashi Horiyama, Ayumi Shinohara

Abstract: A cryptarithm is a mathematical puzzle where, given an arithmetic equation written with letters rather than numerals, a player must discover an assignment of numerals on letters that makes the equation hold true. In this paper, we propose a method to construct a DFA that accepts cryptarithms that admit (unique) solutions for each base. We implemented the method and constructed a DFA for bases $k \le 7$. Those DFAs can be used as complete catalogues of cryptarithms, whose applications include enumeration of and counting the exact numbers $G_k(n)$ of cryptarithm instances with $n$ digits that admit base-$k$ solutions. Moreover, explicit formulas for $G_2(n)$ and $G_3(n)$ are given.

Submitted 26 July, 2018; originally announced July 2018.

arXiv:1806.09806 [pdf, other]

Linear-Time Online Algorithm Inferring the Shortest Path from a Walk

Authors: Shintaro Narisada, Diptarama Hendrian, Ryo Yoshinaka, Ayumi Shinohara

Abstract: We consider the problem of inferring an edge-labeled graph from the sequence of edge labels seen in a walk of that graph. It has been known that this problem is solvable in $O(n \log n)$ time when the targets are path or cycle graphs. This paper presents an online algorithm for the problem of this restricted case that runs in $O(n)$ time, based on Manacher's algorithm for computing all the maximal palindromes in a string.

Submitted 20 February, 2019; v1 submitted 26 June, 2018; originally announced June 2018.

Comments: 31 pages, 7 figures, extended version of the proceeding paper in SPIRE 2018

arXiv:1805.04929 [pdf, other]

doi 10.1103/PhysRevC.98.014317

Towards an energy measurement of the internal conversion electron in the de-excitation of the Th-229 isomer

Authors: Simon Stellmer, Yudai Shigekawa, Veronika Rosecker, Georgy A. Kazakov, Yoshitaka Kasamatsu, Yuki Yasuda, Atsushi Shinohara, Thorsten Schumm

Abstract: The first excited isomeric state of Th-229 has an exceptionally low energy of only a few eV and could form the gateway to high-precision laser spectroscopy of nuclei. The excitation energy of the isomeric state has been inferred from precision gamma spectroscopy, but its uncertainty is still too large to commence laser spectroscopy. Reducing this uncertainty is one of the most pressing challenges in the field. Here we present an approach to infer the energy of the isomer from spectroscopy of the electron which is emitted when the isomer de-excites through internal conversion (IC). The experiment builds on U-233, which decays to Th-229 and populates the isomeric state with a 2% fraction. A film of U-233 is covered by a stopping layer of few-nm thickness and placed between an alpha detector and an electron detector, such that the alpha particle and the IC electron can be detected in coincidence. Retarding field electrodes allow for an energy measurement. In the present design, the signal of the Th-229m IC electrons is masked by low-energy electrons emitted from the surface of the metallic stopping layer. We perform reference measurements with U-232 and U-234 to study systematic effects, and we study various means to reduce the background of low-energy electrons. Our study gives guidelines to the design of an experiment that is capable of detecting the IC electrons and measuring the isomer energy.

Submitted 13 May, 2018; originally announced May 2018.

Comments: 11 pages, 8 figures

Journal ref: Phys. Rev. C 98, 014317 (2018)

arXiv:1710.03395 [pdf, other]

doi 10.1016/j.tcs.2018.04.016

Efficient Dynamic Dictionary Matching with DAWGs and AC-automata

Authors: Diptarama Hendrian, Shunsuke Inenaga, Ryo Yoshinaka, Ayumi Shinohara

Abstract: Dictionary matching is the task of finding all occurrences of patterns in a set $D$ (called a dictionary) in a text $T$. The Aho-Corasick-automaton (AC-automaton) is a data structure which enables us to solve the dictionary matching problem in $O(d\logσ)$ preprocessing time and $O(n\logσ+occ)$ matching time, where $d$ is the total length of the patterns in $D$, $n$ is the length of the text, $σ$ is the alphabet size, and $occ$ is the total number of occurrences of all the patterns in the text. Dynamic dictionary matching is a variant where patterns may dynamically be inserted into and deleted from $D$. This problem is called semi-dynamic dictionary matching if only insertions are allowed. In this paper, we propose two efficient algorithms. For a pattern of length $m$, our first algorithm supports insertions in $O(m\logσ+\log d/\log\log d)$ time and pattern matching in $O(n\logσ+occ)$ time for the semi-dynamic setting and supports both insertions and deletions in $O(σm+\log d/\log\log d)$ time and pattern matching in $O(n(\log d/\log\log d+\logσ)+occ(\log d/\log\log d))$ time for the dynamic setting by some modifications. This algorithm is based on the directed acyclic word graph. Our second algorithm, which is based on the AC-automaton, supports insertions in $O(m\log σ+u_f+u_o)$ time for the semi-dynamic setting and supports both insertions and deletions in $O(σm+u_f+u_o)$ time for the dynamic setting, where $u_f$ and $u_o$ respectively denote the numbers of states in which the failure function and the output function need to be updated.
This algorithm performs pattern matching in $O(n\logσ+occ)$ time for both settings. Our algorithm achieves optimal update time for AC-automaton based methods over constant-size alphabets, since any algorithm which explicitly maintains the AC-automaton requires $Ω(m+u_f+u_o)$ update time.

Submitted 20 February, 2019; v1 submitted 9 October, 2017; originally announced October 2017.

Comments: 20 pages, 4 figures

arXiv:1705.09504 [pdf, other]

New Variants of Pattern Matching with Constants and Variables

Authors: Yuki Igarashi, Diptarama, Ryo Yoshinaka, Ayumi Shinohara

Abstract: Given a text and a pattern over two types of symbols called constants and variables, the parameterized pattern matching problem is to find all occurrences of substrings of the text that the pattern matches by substituting a variable in the text for each variable in the pattern, where the substitution should be injective. The function matching problem is a variant of it that lifts the injectivity constraint. In this paper, we discuss variants of those problems, where one can substitute a constant or a variable for each variable of the pattern. We give two kinds of algorithms for both problems, a convolution-based method and an extended KMP-based method, and analyze their complexity.

Submitted 26 May, 2017; originally announced May 2017.

Comments: 15 pages, 2 figures

arXiv:1705.09438 [pdf, ps, other]

Duel and sweep algorithm for order-preserving pattern matching

Authors: Davaajav Jargalsaikhan, Diptarama, Ryo Yoshinaka, Ayumi Shinohara

Abstract: Given a text $T$ and a pattern $P$ over alphabet $Σ$, the classic exact matching problem searches for all occurrences of pattern $P$ in text $T$. Unlike the exact matching problem, order-preserving pattern matching (OPPM) considers the relative order of elements, rather than their real values. In this paper, we propose an efficient algorithm for the OPPM problem using the "duel-and-sweep" paradigm. Our algorithm runs in $O(n + m\log m)$ time in general and $O(n + m)$ time under an assumption that the characters in a string can be sorted in linear time with respect to the string size. We also perform experiments and show that our algorithm is faster than the KMP-based algorithm. Last, we introduce two-dimensional order-preserving pattern matching and give a duel-and-sweep algorithm that runs in $O(n^2)$ time for the duel stage and $O(n^2 m)$ time for the sweeping stage, with $O(m^3)$ preprocessing time.
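
To make the matching criterion concrete, the following naive sketch reports every text window whose elements have the same relative order as the pattern. It is only a brute-force baseline that re-ranks each window, not the duel-and-sweep algorithm of the paper; the helper names `rank_pattern` and `oppm_naive` are hypothetical, and equal values are assumed to receive equal ranks.

```python
def rank_pattern(seq):
    """Replace each element by its rank among the distinct values of seq."""
    order = {v: r for r, v in enumerate(sorted(set(seq)))}
    return [order[v] for v in seq]


def oppm_naive(text, pattern):
    """Return start positions of windows that are order-isomorphic to pattern."""
    m = len(pattern)
    target = rank_pattern(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if rank_pattern(text[i:i + m]) == target
    ]
```

With `text = [10, 20, 15, 30, 25, 5]` and `pattern = [1, 3, 2]`, the windows starting at positions 0 and 2 match, since both follow the shape "low, high, middle". The sketch spends $O(m\log m)$ time per window, which is what the linear-time methods improve upon.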

Submitted 26 May, 2017; originally announced May 2017.

Comments: 13 pages, 5 figures

arXiv:1702.02321 [pdf, other]

Position Heaps for Parameterized Strings

Authors: Diptarama, Takashi Katsura, Yuhei Otomo, Kazuyuki Narisawa, Ayumi Shinohara

Abstract: We propose a new indexing structure for parameterized strings, called the parameterized position heap. The parameterized position heap is applicable to the parameterized pattern matching problem, where the pattern matches a substring of the text if there exists a bijective mapping from the symbols of the pattern to the symbols of the substring. We propose an online construction algorithm for the parameterized position heap of a text and show that our algorithm runs in linear time with respect to the text size. We also show that by using the parameterized position heap, we can find all occurrences of a pattern in the text in linear time with respect to the product of the pattern size and the alphabet size.
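
The matching condition stated in the abstract, a bijective mapping between pattern symbols and substring symbols, can be checked directly as in the sketch below. This is a brute-force scan for illustration and does not use a position heap or any other index; the function names `p_match` and `p_occurrences` are hypothetical, and every symbol is treated as a parameter (no fixed constants).

```python
def p_match(window, pattern):
    """True if some bijection on symbols maps pattern onto window position by position."""
    fwd, bwd = {}, {}   # pattern symbol -> window symbol, and the inverse map
    for a, b in zip(pattern, window):
        # setdefault records the pairing on first sight and returns the
        # existing partner otherwise, so any conflict breaks the bijection.
        if fwd.setdefault(a, b) != b or bwd.setdefault(b, a) != a:
            return False
    return True


def p_occurrences(text, pattern):
    """Return start positions where pattern parameterized-matches text."""
    m = len(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if p_match(text[i:i + m], pattern)
    ]
```

For instance, the pattern `xyx` matches `abacbcb` at positions 0, 3 and 4 (`aba`, `cbc`, `bcb`), while `bac` is rejected because `x` would have to map to both `b` and `c`.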

Submitted 17 April, 2017; v1 submitted 8 February, 2017; originally announced February 2017.

Comments: 14 pages, 4 figures, accepted to CPM 2017

ACM Class: F.2.2

arXiv:1609.03668 [pdf, other]

Longest Common Subsequence in at Least $k$ Length Order-Isomorphic Substrings

Authors: Yohei Ueki, Diptarama, Masatoshi Kurihara, Yoshiaki Matsuoka, Kazuyuki Narisawa, Ryo Yoshinaka, Hideo Bannai, Shunsuke Inenaga, Ayumi Shinohara

Abstract: We consider the longest common subsequence (LCS) problem with the restriction that the common subsequence is required to consist of substrings of length at least $k$. First, we show an $O(mn)$ time algorithm for the problem which gives a better worst-case running time than existing algorithms, where $m$ and $n$ are lengths of the input strings. Furthermore, we mainly consider the LCS in at least $k$ length order-isomorphic substrings problem. We show that the problem can also be solved in $O(mn)$ worst-case time by an easy-to-implement algorithm.

Submitted 6 February, 2017; v1 submitted 12 September, 2016; originally announced September 2016.

Comments: 14 pages, 7 figures, contains erratum to Springer's version (SOFSEM 2017)

arXiv:1609.03000 [pdf, other]

doi 10.1016/j.tcs.2019.10.025

Efficient computation of longest single-arm-gapped palindromes in a string

Authors: Shintaro Narisada, Diptarama Hendrian, Kazuyuki Narisawa, Shunsuke Inenaga, Ayumi Shinohara

Abstract: In this paper, we introduce new types of approximate palindromes called single-arm-gapped palindromes (SAGPs for short). A SAGP contains a gap in either its left or right arm, and has the form of either $wguc u^R w^R$ or $wuc u^Rgw^R$, where $w$ and $u$ are non-empty strings, $w^R$ and $u^R$ are respectively the reversed strings of $w$ and $u$, $g$ is a string called a gap, and $c$ is either a single character or the empty string. Here we call $wu$ and $u^R w^R$ the arms of the SAGP, and $|wu|$ the length of the arm. We classify SAGPs into two groups: those which have $ucu^R$ as a maximal palindrome (type-1), and the others (type-2). We propose several algorithms to compute type-1 SAGPs with longest arms occurring in a given string, based on suffix arrays. Then, we propose a linear-time algorithm to compute all type-1 SAGPs with longest arms, based on suffix trees. Also, we show how to compute type-2 SAGPs with longest arms in linear time. We also perform some preliminary experiments to show the practical performance of the proposed methods.

Submitted 31 October, 2019; v1 submitted 10 September, 2016; originally announced September 2016.

Comments: 19 pages, 11 figures

Journal ref: Theoretical Computer Science, 2019

arXiv:1304.7067 [pdf, ps, other]

Detecting regularities on grammar-compressed strings

Authors: Tomohiro I, Wataru Matsubara, Kouji Shimohira, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Kazuyuki Narisawa, Ayumi Shinohara

Abstract: We solve the problems of detecting and counting various forms of regularities in a string represented as a Straight Line Program (SLP). Given an SLP of size $n$ that represents a string $s$ of length $N$, our algorithm computes all runs and squares in $s$ in $O(n^3h)$ time and $O(n^2)$ space, where $h$ is the height of the derivation tree of the SLP. We also show an algorithm to compute all gapped-palindromes in $O(n^3h + gnh\log N)$ time and $O(n^2)$ space, where $g$ is the length of the gap. The key technique of the above solution also allows us to compute the periods and covers of the string in $O(n^2 h)$ time and $O(nh(n+\log^2 N))$ time, respectively.

Submitted 26 April, 2013; originally announced April 2013.

arXiv:0804.1214 [pdf, ps, other]

New Lower Bounds for the Maximum Number of Runs in a String

Authors: Kazuhiko Kusano, Wataru Matsubara, Akira Ishino, Hideo Bannai, Ayumi Shinohara

Abstract: We show a new lower bound for the maximum number of runs in a string. We prove that for any e > 0, (a - e)n is an asymptotic lower bound, where a = 56733/60064 = 0.944542. It is superior to the previous bound 0.927 given by Franek et al. Moreover, our construction of the strings and the proof is much simpler than theirs.
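
The quantity being bounded can be illustrated with a brute-force run counter. The sketch assumes the standard definition, which the abstract does not spell out: a run is a maximal substring `s[start..end]` whose smallest period `p` satisfies `end - start + 1 >= 2p`. The function names are hypothetical, and the code is meant for small inputs only; it is unrelated to the string construction used in the paper.

```python
def smallest_period(w):
    """Smallest p >= 1 such that w[k] == w[k + p] for every valid k."""
    return next(
        p
        for p in range(1, len(w) + 1)
        if all(w[k] == w[k + p] for k in range(len(w) - p))
    )


def runs(s):
    """Return all runs of s as sorted (start, end, period) triples, 0-indexed, inclusive."""
    n = len(s)
    found = set()
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            if s[i] != s[i + p]:
                i += 1
                continue
            # Extend the block of positions j with s[j] == s[j + p].
            j = i
            while j + p < n and s[j] == s[j + p]:
                j += 1
            start, end = i, j + p - 1   # maximal interval with period p
            long_enough = end - start + 1 >= 2 * p
            if long_enough and smallest_period(s[start:end + 1]) == p:
                found.add((start, end, p))
            i = j + 1
    return sorted(found)
```

On `mississippi` this finds four runs: `ss` twice, `pp`, and `ississi` with period 3. Dividing `len(runs(s))` by `len(s)` gives the ratio whose supremum the lower bound above addresses.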

Submitted 8 April, 2008; originally announced April 2008.

ACM Class: G.2.1

Showing 1–25 of 25 results for author: Shinohara, A