Showing 1–13 of 13 results for author: Nishimoto, T

  1. arXiv:2404.07510  [pdf, other]

    cs.DS

    Dynamic Suffix Array in Optimal Compressed Space

    Authors: Takaaki Nishimoto, Yasuo Tabei

    Abstract: Big data, encompassing extensive datasets, has seen rapid expansion, notably with a considerable portion being textual data, including strings and texts. Simple compression methods and standard data structures prove inadequate for processing these datasets, as they require decompression for usage or consume extensive memory resources. Consequently, this motivation has led to the development of com…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: Abstract shortened to fit ArXiv requirements

  2. arXiv:2207.02571  [pdf, other]

    cs.DS

    Computing NP-hard Repetitiveness Measures via MAX-SAT

    Authors: Hideo Bannai, Keisuke Goto, Masakazu Ishihata, Shunsuke Kanda, Dominik Köppl, Takaaki Nishimoto

    Abstract: Repetitiveness measures reveal profound characteristics of datasets, and give rise to compressed data structures and algorithms working in compressed space. Alas, the computation of some of these measures is NP-hard, and straight-forward computation is infeasible for datasets of even small sizes. Three such measures are the smallest size of a string attractor, the smallest size of a bidirectional…

    Submitted 12 July, 2022; v1 submitted 6 July, 2022; originally announced July 2022.

    Comments: paper accepted to ESA 2022 (plus Appendix); corrected attribution of Python program for computing https://oeis.org/A339391

  3. arXiv:2202.07885  [pdf, other]

    cs.DS

    An Optimal-Time RLBWT Construction in BWT-runs Bounded Space

    Authors: Takaaki Nishimoto, Shunsuke Kanda, Yasuo Tabei

    Abstract: The compression of highly repetitive strings (i.e., strings with many repetitions) has been a central research topic in string processing, and quite a few compression methods for these strings have been proposed thus far. Among them, an efficient compression format gathering increasing attention is the run-length Burrows--Wheeler transform (RLBWT), which is a run-length encoded BWT as a reversible…

    Submitted 16 February, 2022; originally announced February 2022.

  4. arXiv:2104.09985  [pdf, other]

    cs.DM math.CO

    A Separation of $γ$ and $b$ via Thue--Morse Words

    Authors: Hideo Bannai, Mitsuru Funakoshi, Tomohiro I, Dominik Koeppl, Takuya Mieno, Takaaki Nishimoto

    Abstract: We prove that for $n\geq 2$, the size $b(t_n)$ of the smallest bidirectional scheme for the $n$th Thue--Morse word $t_n$ is $n+2$. Since Kutsukake et al. [SPIRE 2020] show that the size $γ(t_n)$ of the smallest string attractor for $t_n$ is $4$ for $n \geq 4$, this shows for the first time that there is a separation between the size of the smallest string attractor $γ$ and the size of the smallest…

    Submitted 19 April, 2021; originally announced April 2021.

  5. arXiv:2006.05104  [pdf, other]

    cs.DS

    Optimal-Time Queries on BWT-runs Compressed Indexes

    Authors: Takaaki Nishimoto, Yasuo Tabei

    Abstract: Indexing highly repetitive strings (i.e., strings with many repetitions) for fast queries has become a central research topic in string processing, because it has a wide variety of applications in bioinformatics and natural language processing. Although a substantial number of indexes for highly repetitive strings have been proposed thus far, developing compressed indexes that support various quer…

    Submitted 16 April, 2021; v1 submitted 9 June, 2020; originally announced June 2020.

  6. arXiv:2004.01493  [pdf, other]

    cs.DS

    R-enum: Enumeration of Characteristic Substrings in BWT-runs Bounded Space

    Authors: Takaaki Nishimoto, Yasuo Tabei

    Abstract: Enumerating characteristic substrings (e.g., maximal repeats, minimal unique substrings, and minimal absent words) in a given string has been an important research topic because there are a wide variety of applications in various areas such as string processing and computational biology. Although several enumeration algorithms for characteristic substrings have been proposed, they are not space-ef…

    Submitted 2 March, 2021; v1 submitted 3 April, 2020; originally announced April 2020.

    Comments: The content of the paper is significantly different from the previous version

  7. arXiv:1902.05224  [pdf, other]

    cs.DS

    Conversion from RLBWT to LZ77

    Authors: Takaaki Nishimoto, Yasuo Tabei

    Abstract: Converting a compressed format of a string into another compressed format without an explicit decompression is one of the central research topics in string processing. We discuss the problem of converting the run-length Burrows-Wheeler Transform (RLBWT) of a string to Lempel-Ziv 77 (LZ77) phrases of the reversed string. The first results with Policriti and Prezza's conversion algorithm [Algorithmi…

    Submitted 14 February, 2019; originally announced February 2019.

  8. arXiv:1812.04261  [pdf, other]

    cs.DS

    LZRR: LZ77 Parsing with Right Reference

    Authors: Takaaki Nishimoto, Yasuo Tabei

    Abstract: Lossless data compression has been widely studied in computer science. One of the most widely used lossless data compressions is Lempel-Ziv (LZ) 77 parsing, which achieves a high compression ratio. Bidirectional (a.k.a. macro) parsing is a lossless data compression and computes a sequence of phrases copied from another substring (target phrase) on either the left or the right position in an input s…

    Submitted 11 December, 2018; originally announced December 2018.

  9. arXiv:1711.02855  [pdf, ps, other]

    cs.DS

    A compressed dynamic self-index for highly repetitive text collections

    Authors: Takaaki Nishimoto, Yoshimasa Takabatake, Yasuo Tabei

    Abstract: We present a novel compressed dynamic self-index for highly repetitive text collections. Signature encoding is a compressed dynamic self-index for highly repetitive texts and has a large disadvantage that the pattern search for short patterns is slow. We improve this disadvantage for faster pattern search by leveraging an idea behind truncated suffix tree and present the first compressed dynamic s…

    Submitted 24 April, 2018; v1 submitted 8 November, 2017; originally announced November 2017.

  10. arXiv:1702.07458  [pdf, other]

    cs.DS

    Small-space encoding LCE data structure with constant-time queries

    Authors: Yuka Tanimura, Takaaki Nishimoto, Hideo Bannai, Shunsuke Inenaga, Masayuki Takeda

    Abstract: The \emph{longest common extension} (\emph{LCE}) problem is to preprocess a given string $w$ of length $n$ so that the length of the longest common prefix between suffixes of $w$ that start at any two given positions is answered quickly. In this paper, we present a data structure of $O(z τ^2 + \frac{n}τ)$ words of space which answers LCE queries in $O(1)$ time and can be built in $O(n \log σ)$ tim…

    Submitted 23 February, 2017; originally announced February 2017.

  11. arXiv:1605.09558  [pdf, ps, other]

    cs.DS

    Dynamic index and LZ factorization in compressed space

    Authors: Takaaki Nishimoto, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Abstract: In this paper, we propose a new \emph{dynamic compressed index} of $O(w)$ space for a dynamic text $T$, where $w = O(\min(z \log N \log^*M, N))$ is the size of the signature encoding of $T$, $z$ is the size of the Lempel-Ziv77 (LZ77) factorization of $T$, $N$ is the length of $T$, and $M \geq 3N$ is an integer that can be handled in constant time under word RAM model. Our index supports searching…

    Submitted 19 July, 2016; v1 submitted 31 May, 2016; originally announced May 2016.

    Comments: arXiv admin note: substantial text overlap with arXiv:1605.01488; text overlap with arXiv:1504.06954

  12. arXiv:1605.01488  [pdf, ps, other]

    cs.DS

    Fully dynamic data structure for LCE queries in compressed space

    Authors: Takaaki Nishimoto, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Abstract: A Longest Common Extension (LCE) query on a text $T$ of length $N$ asks for the length of the longest common prefix of suffixes starting at given two positions. We show that the signature encoding $\mathcal{G}$ of size $w = O(\min(z \log N \log^* M, N))$ [Mehlhorn et al., Algorithmica 17(2):183-198, 1997] of $T$, which can be seen as a compressed representation of $T$, has a capability to support…

    Submitted 26 June, 2016; v1 submitted 5 May, 2016; originally announced May 2016.

    Comments: arXiv admin note: text overlap with arXiv:1504.06954

  13. arXiv:1504.06954  [pdf, ps, other]

    cs.DS

    Dynamic index, LZ factorization, and LCE queries in compressed space

    Authors: Takaaki Nishimoto, I Tomohiro, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Abstract: In this paper, we present the following results: (1) We propose a new \emph{dynamic compressed index} of $O(w)$ space, that supports searching for a pattern $P$ in the current text in $O(|P| f(M,w) + \log w \log |P| \log^* M (\log N + \log |P| \log^* M) + \mathit{occ} \log N)$ time and insertion/deletion of a substring of length $y$ in $O((y+ \log N\log^* M)\log w \log N \log^* M)$ time, where…

    Submitted 6 April, 2016; v1 submitted 27 April, 2015; originally announced April 2015.