Showing 1–40 of 40 results for author: I., T

  1. arXiv:2406.15011  [pdf, other]

    cs.DS

    Space-efficient SLP Encoding for $O(\log N)$-time Random Access

    Authors: Akito Takasaka, Tomohiro I

    Abstract: A Straight-Line Program (SLP) $G$ for a string $T$ is a context-free grammar (CFG) that derives $T$ only, which can be considered as a compressed representation of $T$. In this paper, we show how to encode $G$ in $n \lceil \lg N \rceil + (n + n') \lceil \lg (n+σ) \rceil + 4n - 2n' + o(n)$ bits to support random access queries of extracting $T[p..q]$ in worst-case $O(\log N + q - p)$ time, where…

    Submitted 21 June, 2024; originally announced June 2024.

  2. arXiv:2308.05977  [pdf, other]

    cs.DS

    Breaking a Barrier in Constructing Compact Indexes for Parameterized Pattern Matching

    Authors: Kento Iseri, Tomohiro I, Diptarama Hendrian, Dominik Köppl, Ryo Yoshinaka, Ayumi Shinohara

    Abstract: A parameterized string (p-string) is a string over an alphabet $(Σ_{s} \cup Σ_{p})$, where $Σ_{s}$ and $Σ_{p}$ are disjoint alphabets for static symbols (s-symbols) and for parameter symbols (p-symbols), respectively. Two p-strings $x$ and $y$ are said to parameterized match (p-match) if and only if $x$ can be transformed into $y$ by applying a bijection on $Σ_{p}$ to every occurrence of p-symbols…

    Submitted 11 August, 2023; originally announced August 2023.

  3. arXiv:2206.12600  [pdf, other]

    cs.DS

    PalFM-index: FM-index for Palindrome Pattern Matching

    Authors: Shinya Nagashita, Tomohiro I

    Abstract: Palindrome pattern matching (pal-matching) is a kind of generalized pattern matching, in which two strings $x$ and $y$ of the same length are considered to match (pal-match) if they have the same palindromic structures, i.e., for any possible $1 \le i < j \le |x| = |y|$, $x[i..j]$ is a palindrome if and only if $y[i..j]$ is a palindrome. The pal-matching problem is the problem of searching for, in…

    Submitted 14 April, 2023; v1 submitted 25 June, 2022; originally announced June 2022.

    Comments: Accepted to 34th Annual Symposium on Combinatorial Pattern Matching (CPM) 2023

  4. arXiv:2205.12421  [pdf, other]

    cs.DS

    Substring Complexities on Run-length Compressed Strings

    Authors: Akiyoshi Kawamoto, Tomohiro I

    Abstract: Let $S_{T}(k)$ denote the set of distinct substrings of length $k$ in a string $T$; the $k$-th substring complexity is then defined as its cardinality $|S_{T}(k)|$. Recently, $δ = \max \{ |S_{T}(k)| / k : k \ge 1 \}$ was shown to be a good compressibility measure of highly-repetitive strings. In this paper, given $T$ of length $n$ in the run-length compressed form of size $r$, we show that $δ$ can b…

    Submitted 24 May, 2022; originally announced May 2022.

  5. LCP-dropout: Compression-based Multiple Subword Segmentation for Neural Machine Translation

    Authors: Keita Nonaka, Kazutaka Yamanouchi, Tomohiro I, Tsuyoshi Okita, Kazutaka Shimada, Hiroshi Sakamoto

    Abstract: In this study, we propose a simple and effective preprocessing method for subword segmentation based on a data compression algorithm. Compression-based subword segmentation has recently attracted significant attention as a preprocessing method for training data in Neural Machine Translation. Among them, BPE/BPE-dropout is one of the fastest and most effective methods compared to conventional approa…

    Submitted 19 March, 2022; v1 submitted 28 February, 2022; originally announced February 2022.

    Comments: 12 pages

    Journal ref: Electronics 11(7), Article number 1014, 2022

  6. arXiv:2202.07189  [pdf, other]

    cs.DS

    Longest (Sub-)Periodic Subsequence

    Authors: Hideo Bannai, Tomohiro I, Dominik Köppl

    Abstract: We present an algorithm computing the longest periodic subsequence of a string of length $n$ in $O(n^7)$ time with $O(n^4)$ words of space. We obtain improvements, down to $O(n^3)$ time and $O(n^2)$ words of space, when restricting the exponents or when extending the search to allow the reported subsequence to be subperiodic.

    Submitted 14 February, 2022; originally announced February 2022.

  7. arXiv:2201.06773  [pdf, other]

    cs.DS

    Computing Longest (Common) Lyndon Subsequences

    Authors: Hideo Bannai, Tomohiro I, Tomasz Kociumaka, Dominik Köppl, Simon J. Puglisi

    Abstract: Given a string $T$ with length $n$ whose characters are drawn from an ordered alphabet of size $σ$, its longest Lyndon subsequence is a longest subsequence of $T$ that is a Lyndon word. We propose algorithms for finding such a subsequence in $O(n^3)$ time with $O(n)$ space, or online in $O(n^3 σ)$ space and time. Our first result can be extended to find the longest common Lyndon subsequence of two…

    Submitted 18 January, 2022; originally announced January 2022.

  8. Privacy-Preserving Feature Selection with Fully Homomorphic Encryption

    Authors: Shinji Ono, Jun Takata, Masaharu Kataoka, Tomohiro I, Kilho Shin, Hiroshi Sakamoto

    Abstract: For the feature selection problem, we propose an efficient privacy-preserving algorithm. Let $D$, $F$, and $C$ be data, feature, and class sets, respectively, where the feature value $x(F_i)$ and the class label $x(C)$ are given for each $x\in D$ and $F_i \in F$. For a triple $(D,F,C)$, the feature selection problem is to find a consistent and minimal subset $F' \subseteq F$, where `consistent' me…

    Submitted 1 June, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

    Comments: 14 pages

    Journal ref: Algorithms 15(7), Article number 229, 2022

  9. arXiv:2104.09985  [pdf, other]

    cs.DM math.CO

    A Separation of $γ$ and $b$ via Thue--Morse Words

    Authors: Hideo Bannai, Mitsuru Funakoshi, Tomohiro I, Dominik Koeppl, Takuya Mieno, Takaaki Nishimoto

    Abstract: We prove that for $n\geq 2$, the size $b(t_n)$ of the smallest bidirectional scheme for the $n$th Thue--Morse word $t_n$ is $n+2$. Since Kutsukake et al. [SPIRE 2020] show that the size $γ(t_n)$ of the smallest string attractor for $t_n$ is $4$ for $n \geq 4$, this shows for the first time that there is a separation between the size of the smallest string attractor $γ$ and the size of the smallest…

    Submitted 19 April, 2021; originally announced April 2021.

  10. arXiv:2104.08751  [pdf, other]

    cs.DS

    Load-Balancing Succinct B Trees

    Authors: Tomohiro I, Dominik Köppl

    Abstract: We propose a B tree representation storing $n$ keys, each of $k$ bits, in either (a) $nk + O(nk / \lg n)$ bits or (b) $nk + O(nk \lg \lg n/ \lg n)$ bits of space supporting all B tree operations in either (a) $O(\lg n )$ time or (b) $O(\lg n / \lg \lg n)$ time, respectively. We can augment each node with an aggregate value such as the minimum value within its subtree, and maintain these aggregate…

    Submitted 18 April, 2021; originally announced April 2021.

  11. arXiv:2011.05610  [pdf, ps, other]

    cs.DS

    PHONI: Streamed Matching Statistics with Multi-Genome References

    Authors: Christina Boucher, Travis Gagie, Tomohiro I, Dominik Köppl, Ben Langmead, Giovanni Manzini, Gonzalo Navarro, Alejandro Pacheco, Massimiliano Rossi

    Abstract: Computing the matching statistics of patterns with respect to a text is a fundamental task in bioinformatics, but a formidable one when the text is a highly compressed genomic database. Bannai et al. gave an efficient solution for this case, which Rossi et al. recently implemented, but it uses two passes over the patterns and buffers a pointer for each character during the first pass. In this pape…

    Submitted 11 February, 2021; v1 submitted 11 November, 2020; originally announced November 2020.

    Comments: Our code is available at https://github.com/koeppl/phoni

  12. arXiv:2010.11132  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Sentence Boundary Augmentation For Neural Machine Translation Robustness

    Authors: Daniel Li, Te I, Naveen Arivazhagan, Colin Cherry, Dirk Padfield

    Abstract: Neural Machine Translation (NMT) models have demonstrated strong state of the art performance on translation tasks where well-formed training and evaluation data are provided, but they remain sensitive to inputs that include errors of various types. Specifically, in the context of long-form speech translation systems, where the input transcripts come from Automatic Speech Recognition (ASR), the NM…

    Submitted 21 October, 2020; originally announced October 2020.

    Comments: 5 pages, 4 figures

  13. arXiv:2002.03947  [pdf, other]

    cond-mat.mes-hall quant-ph

    Design of a Single-Shot Electron detector with sub-electron sensitivity for electron flying qubit operation

    Authors: Glattli D. C., Nath J., Taktak I., Roulleau P., Bauerle C., Waintal X.

    Abstract: The recent realization of coherent single-electron sources in ballistic conductors lets us envision performing time-resolved electronic interferometry experiments analogous to quantum optics experiments. One could eventually use propagating electronic excitations as flying qubits. However an important missing brick is the single-shot electron detection which would enable a complete quantum informati…

    Submitted 10 February, 2020; originally announced February 2020.

    Comments: 7 pages, 3 figures

  14. arXiv:1912.03393  [pdf, other]

    cs.CL cs.AI cs.LG

    Re-Translation Strategies For Long Form, Simultaneous, Spoken Language Translation

    Authors: Naveen Arivazhagan, Colin Cherry, Te I, Wolfgang Macherey, Pallavi Baljekar, George Foster

    Abstract: We investigate the problem of simultaneous machine translation of long-form speech content. We target a continuous speech-to-text scenario, generating translated captions for a live audio feed, such as a lecture or play-by-play commentary. As this scenario allows for revisions to our incremental translations, we adopt a re-translation approach to simultaneous translation, where the source is repea…

    Submitted 7 April, 2020; v1 submitted 6 December, 2019; originally announced December 2019.

    Comments: ICASSP 2020

  15. arXiv:1911.10719  [pdf, other]

    cs.CR

    Faster Privacy-Preserving Computation of Edit Distance with Moves

    Authors: Yohei Yoshimoto, Masaharu Kataoka, Yoshimasa Takabatake, Tomohiro I, Kilho Shin, Hiroshi Sakamoto

    Abstract: We consider an efficient two-party protocol for securely computing the similarity of strings w.r.t. an extended edit distance measure. Here, two parties possessing strings $x$ and $y$, respectively, want to jointly compute an approximate value for $\mathrm{EDM}(x,y)$, the minimum number of edit operations including substring moves needed to transform $x$ into $y$, without revealing any private inf…

    Submitted 28 November, 2019; v1 submitted 25 November, 2019; originally announced November 2019.

    Comments: to appear in WALCOM 2020

    MSC Class: D.4.6; E.3 ACM Class: D.4.6; E.3

  16. arXiv:1910.07145  [pdf, other]

    cs.DS

    Practical Random Access to SLP-Compressed Texts

    Authors: Travis Gagie, Tomohiro I, Giovanni Manzini, Gonzalo Navarro, Hiroshi Sakamoto, Louisa Seelbach Benkner, Yoshimasa Takabatake

    Abstract: Grammar-based compression is a popular and powerful approach to compressing repetitive texts but until recently its relatively poor time-space trade-offs during real-life construction made it impractical for truly massive datasets such as genomic databases. In a recent paper (SPIRE 2019) we showed how simple pre-processing can dramatically improve those trade-offs, and in this paper we turn our at…

    Submitted 19 July, 2020; v1 submitted 15 October, 2019; originally announced October 2019.

    Comments: Accepted to SPIRE 2020

  17. arXiv:1908.04933  [pdf, ps, other]

    cs.DS

    Re-Pair In Small Space

    Authors: Dominik Köppl, Tomohiro I, Isamu Furuya, Yoshimasa Takabatake, Kensuke Sakai, Keisuke Goto

    Abstract: Re-Pair is a grammar compression scheme with favorably good compression rates. The computation of Re-Pair comes with the cost of maintaining large frequency tables, which makes it hard to compute Re-Pair on large scale data sets. As a solution for this problem we present, given a text of length $n$ whose characters are drawn from an integer alphabet, an…

    Submitted 16 November, 2019; v1 submitted 13 August, 2019; originally announced August 2019.

  18. arXiv:1907.03615  [pdf]

    quant-ph

    Relaxation and pumping of quantum oscillator nonresonantly coupled with the other oscillator

    Authors: Trubilko A. I., Basharov A. M.

    Abstract: The paper shows mechanisms of both the pumping and energy decay of an "isolated" oscillator. The oscillator is only non-resonantly coupled with the adjacent oscillator which resonantly interacts with the thermal bath environment. Under these conditions the "isolated" oscillator begins interacting with the thermal bath environment of the adjacent oscillator. The conclusion is based on the kinetic e…

    Submitted 8 July, 2019; originally announced July 2019.

    Comments: 8 pages, 1 figure

  19. arXiv:1906.00809  [pdf, ps, other]

    cs.DS

    Rpair: Rescaling RePair with Rsync

    Authors: Travis Gagie, Tomohiro I, Giovanni Manzini, Gonzalo Navarro, Hiroshi Sakamoto, Yoshimasa Takabatake

    Abstract: Data compression is a powerful tool for managing massive but repetitive datasets, especially schemes such as grammar-based compression that support computation over the data without decompressing it. In the best case such a scheme takes a dataset so big that it must be stored on disk and shrinks it enough that it can be stored and processed in internal memory. Even then, however, the scheme is ess…

    Submitted 3 June, 2019; originally announced June 2019.

  20. arXiv:1811.01472  [pdf, other]

    cs.DS

    RePair in Compressed Space and Time

    Authors: Kensuke Sakai, Tatsuya Ohno, Keisuke Goto, Yoshimasa Takabatake, Tomohiro I, Hiroshi Sakamoto

    Abstract: Given a string $T$ of length $N$, the goal of grammar compression is to construct a small context-free grammar generating only $T$. Among existing grammar compression methods, RePair (recursive pairing) [Larsson and Moffat, 1999] is notable for achieving good compression ratios in practice. Although the original paper already achieved a time-optimal algorithm to compute the RePair grammar RePair(…

    Submitted 4 November, 2018; originally announced November 2018.

  21. arXiv:1806.00198  [pdf, ps, other]

    cs.DS

    Block Palindromes: A New Generalization of Palindromes

    Authors: Keisuke Goto, Tomohiro I, Hideo Bannai, Shunsuke Inenaga

    Abstract: We study a new generalization of palindromes and gapped palindromes called block palindromes. A block palindrome is a string that becomes a palindrome when identical substrings are replaced with a distinct character. We investigate several properties of block palindromes and in particular, study substrings of a string which are block palindromes. In so doing, we introduce the notion of a \emph{max…

    Submitted 6 August, 2018; v1 submitted 1 June, 2018; originally announced June 2018.

    Comments: 7 pages

  22. arXiv:1802.10355  [pdf, ps, other]

    cs.FL

    Improved Upper Bounds on all Maximal $α$-gapped Repeats and Palindromes

    Authors: Tomohiro I, Dominik Köppl

    Abstract: We show that the number of all maximal $α$-gapped repeats and palindromes of a word of length $n$ is at most $3(π^2/6 + 5/2) αn$ and $7 (π^2 / 6 + 1/2) αn - 5 n - 1$, respectively.

    Submitted 28 February, 2018; originally announced February 2018.

  23. arXiv:1802.05906  [pdf, other]

    cs.DS

    Refining the $r$-index

    Authors: Hideo Bannai, Travis Gagie, Tomohiro I

    Abstract: Gagie, Navarro and Prezza's $r$-index (SODA, 2018) promises to speed up DNA alignment and variation calling by allowing us to index entire genomic databases, provided certain obstacles can be overcome. In this paper we first strengthen and simplify Policriti and Prezza's Toehold Lemma (DCC '16; Algorithmica, 2017), which inspired the $r$-index and plays an important role in its implementation. We…

    Submitted 4 July, 2019; v1 submitted 16 February, 2018; originally announced February 2018.

    Comments: An extended version of the paper presented at CPM 2018 under the title "Online LZ77 parsing and matching statistics with RLBWTs"

  24. arXiv:1704.05233  [pdf, other]

    cs.DS

    A Faster Implementation of Online Run-Length Burrows-Wheeler Transform

    Authors: Tatsuya Ohno, Yoshimasa Takabatake, Tomohiro I, Hiroshi Sakamoto

    Abstract: Run-length encoding Burrows-Wheeler Transformed strings, resulting in Run-Length BWT (RLBWT), is a powerful tool for processing highly repetitive strings. We propose a new algorithm for online RLBWT working in run-compressed space, which runs in $O(n\lg r)$ time and $O(r\lg n)$ bits of space, where $n$ is the length of input string $S$ received so far and $r$ is the number of runs in the BWT of th…

    Submitted 14 October, 2017; v1 submitted 18 April, 2017; originally announced April 2017.

    Comments: In Proc. IWOCA2017

  25. arXiv:1611.05359  [pdf, other]

    cs.DS

    Longest Common Extensions with Recompression

    Authors: Tomohiro I

    Abstract: Given two positions $i$ and $j$ in a string $T$ of length $N$, a longest common extension (LCE) query asks for the length of the longest common prefix between suffixes beginning at $i$ and $j$. A compressed LCE data structure is a data structure that stores $T$ in a compressed form while supporting fast LCE queries. In this article we show that the recompression technique is a powerful tool for co…

    Submitted 20 November, 2016; v1 submitted 16 November, 2016; originally announced November 2016.

  26. arXiv:1608.06028  [pdf, ps, other]

    cond-mat.str-el quant-ph

    Steady States of Infinite-Size Dissipative Quantum Chains via Imaginary Time Evolution

    Authors: Adil A. Gangat, Te I, Ying-Jer Kao

    Abstract: Directly in the thermodynamic limit, we show how to combine imaginary and real time evolution of tensor networks to efficiently and accurately find the nonequilibrium steady states (NESS) of one-dimensional dissipative quantum lattices governed by the Lindblad master equation. The imaginary time evolution first bypasses any highly correlated portions of the real-time evolution trajectory by direct…

    Submitted 6 December, 2016; v1 submitted 21 August, 2016; originally announced August 2016.

    Comments: 5+3 pages, 5 figures, 2 tables

    Journal ref: Phys. Rev. Lett. 119, 010501 (2017)

  27. arXiv:1605.09558  [pdf, ps, other]

    cs.DS

    Dynamic index and LZ factorization in compressed space

    Authors: Takaaki Nishimoto, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Abstract: In this paper, we propose a new \emph{dynamic compressed index} of $O(w)$ space for a dynamic text $T$, where $w = O(\min(z \log N \log^*M, N))$ is the size of the signature encoding of $T$, $z$ is the size of the Lempel-Ziv77 (LZ77) factorization of $T$, $N$ is the length of $T$, and $M \geq 3N$ is an integer that can be handled in constant time under word RAM model. Our index supports searching…

    Submitted 19 July, 2016; v1 submitted 31 May, 2016; originally announced May 2016.

    Comments: arXiv admin note: substantial text overlap with arXiv:1605.01488; text overlap with arXiv:1504.06954

  28. arXiv:1605.01488  [pdf, ps, other]

    cs.DS

    Fully dynamic data structure for LCE queries in compressed space

    Authors: Takaaki Nishimoto, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Abstract: A Longest Common Extension (LCE) query on a text $T$ of length $N$ asks for the length of the longest common prefix of suffixes starting at two given positions. We show that the signature encoding $\mathcal{G}$ of size $w = O(\min(z \log N \log^* M, N))$ [Mehlhorn et al., Algorithmica 17(2):183-198, 1997] of $T$, which can be seen as a compressed representation of $T$, has a capability to support…

    Submitted 26 June, 2016; v1 submitted 5 May, 2016; originally announced May 2016.

    Comments: arXiv admin note: text overlap with arXiv:1504.06954

  29. arXiv:1601.07670  [pdf, ps, other]

    cs.DS

    Deterministic sub-linear space LCE data structures with efficient construction

    Authors: Yuka Tanimura, Tomohiro I, Hideo Bannai, Shunsuke Inenaga, Simon J. Puglisi, Masayuki Takeda

    Abstract: Given a string $S$ of $n$ symbols, a longest common extension query $\mathsf{LCE}(i,j)$ asks for the length of the longest common prefix of the $i$th and $j$th suffixes of $S$. LCE queries have several important applications in string processing, perhaps most notably to suffix sorting. Recently, Bille et al. (J. Discrete Algorithms 25:42-50, 2014, Proc. CPM 2015: 65-76) described several data stru…

    Submitted 29 January, 2016; v1 submitted 28 January, 2016; originally announced January 2016.

    Comments: updated title

  30. arXiv:1509.09237  [pdf, other]

    cs.DS

    Efficiently Finding All Maximal $α$-gapped Repeats

    Authors: Paweł Gawrychowski, Tomohiro I, Shunsuke Inenaga, Dominik Köppl, Florin Manea

    Abstract: For $α\geq 1$, an $α$-gapped repeat in a word $w$ is a factor $uvu$ of $w$ such that $|uv|\leq α|u|$; the two factors $u$ in such a repeat are called arms, while the factor $v$ is called the gap. Such a repeat is called maximal if its arms cannot be extended simultaneously with the same symbol to the right or, respectively, to the left. In this paper we show that the number of maximal $α$-gapped repea…

    Submitted 30 September, 2015; originally announced September 2015.

  31. arXiv:1509.07417  [pdf, other]

    cs.DS

    Deterministic Sparse Suffix Sorting in the Restore Model

    Authors: Johannes Fischer, Tomohiro I, Dominik Köppl

    Abstract: Given a text $T$ of length $n$, we propose a deterministic online algorithm computing the sparse suffix array and the sparse longest common prefix array of $T$ in $O(c \sqrt{\lg n} + m \lg m \lg n \lg^* n)$ time with $O(m)$ words of space under the premise that the space of $T$ is rewritable, where $m \le n$ is the number of suffixes to be sorted (provided online and arbitrarily), and $c$ is the n…

    Submitted 28 February, 2018; v1 submitted 24 September, 2015; originally announced September 2015.

  32. arXiv:1504.02605  [pdf, ps, other]

    cs.DS

    Lempel Ziv Computation In Small Space (LZ-CISS)

    Authors: Johannes Fischer, Tomohiro I, Dominik Köppl

    Abstract: For both the Lempel Ziv 77- and 78-factorization we propose algorithms generating the respective factorization using $(1+ε) n \lg n + O(n)$ bits (for any positive constant $ε\le 1$) working space (including the space for the output) for any text of size $n$ over an integer alphabet in $O(n / ε^{2})$ time.

    Submitted 10 April, 2015; originally announced April 2015.

    Comments: Full Version of CPM 2015 paper

  33. Beyond the Runs Theorem

    Authors: Johannes Fischer, Štěpán Holub, Tomohiro I, Moshe Lewenstein

    Abstract: Recently, a short and elegant proof was presented showing that a binary word of length $n$ contains at most $n-3$ runs. Here we show, using the same technique and a computer search, that the number of runs in a binary word of length $n$ is at most $\frac{22}{23}n<0.957n$.

    Submitted 30 April, 2015; v1 submitted 16 February, 2015; originally announced February 2015.

    Comments: New version with substantially improved bound and coauthors who carried out a similar research independently

    MSC Class: 68R15

    Journal ref: SPIRE 2015, LNCS 9309, 277-286

  34. arXiv:1501.06619  [pdf, ps, other]

    cs.DS

    Constructing LZ78 Tries and Position Heaps in Linear Time for Large Alphabets

    Authors: Yuto Nakashima, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Abstract: We present the first worst-case linear-time algorithm to compute the Lempel-Ziv 78 factorization of a given string over an integer alphabet. Our algorithm is based on nearest marked ancestor queries on the suffix tree of the given string. We also show that the same technique can be used to construct the position heap of a set of strings in worst-case linear time, when the set of strings is given a…

    Submitted 26 January, 2015; originally announced January 2015.

  35. Size calibration of strained epitaxial islands due to dipole-monopole interaction

    Authors: Tokar V. I., Dreyssé H.

    Abstract: Irreversible growth of strained epitaxial nanoislands has been studied with the use of the kinetic Monte Carlo (KMC) technique. It has been shown that the strain-inducing size misfit between the substrate and the overlayer produces long range dipole-monopole (d-m) interaction between the mobile adatoms and the islands. To simplify the account of the long range interactions in the KMC simulations,…

    Submitted 6 October, 2014; originally announced October 2014.

    Comments: 14 pages, 5 figures

  36. The "Runs" Theorem

    Authors: Hideo Bannai, Tomohiro I, Shunsuke Inenaga, Yuto Nakashima, Masayuki Takeda, Kazuya Tsuruta

    Abstract: We give a new characterization of maximal repetitions (or runs) in strings based on Lyndon words. The characterization leads to a proof of what was known as the "runs" conjecture (Kolpakov & Kucherov (FOCS '99)), which states that the maximum number of runs $ρ(n)$ in a string of length $n$ is less than $n$. The proof is remarkably simple, considering the numerous endeavors to tackle this problem…

    Submitted 3 June, 2015; v1 submitted 2 June, 2014; originally announced June 2014.

    Comments: simple proof with some more bounds

    Journal ref: SIAM J. Comput., 46(5), 1501-1514, 2017

  37. arXiv:1305.6095  [pdf, ps, other]

    cs.DS

    Faster Compact On-Line Lempel-Ziv Factorization

    Authors: Jun'ichi Yamamoto, Tomohiro I, Hideo Bannai, Shunsuke Inenaga, Masayuki Takeda

    Abstract: We present a new on-line algorithm for computing the Lempel-Ziv factorization of a string that runs in $O(N\log N)$ time and uses only $O(N\log σ)$ bits of working space, where $N$ is the length of the string and $σ$ is the size of the alphabet. This is a notable improvement compared to the performance of previous on-line algorithms using the same order of working space but running in either…

    Submitted 26 May, 2013; originally announced May 2013.

  38. arXiv:1304.7067  [pdf, ps, other]

    cs.DS

    Detecting regularities on grammar-compressed strings

    Authors: Tomohiro I, Wataru Matsubara, Kouji Shimohira, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda, Kazuyuki Narisawa, Ayumi Shinohara

    Abstract: We solve the problems of detecting and counting various forms of regularities in a string represented as a Straight Line Program (SLP). Given an SLP of size $n$ that represents a string $s$ of length $N$, our algorithm computes all runs and squares in $s$ in $O(n^3h)$ time and $O(n^2)$ space, where $h$ is the height of the derivation tree of the SLP. We also show an algorithm to compute all gapped-…

    Submitted 26 April, 2013; originally announced April 2013.

  39. arXiv:1304.7061  [pdf, ps, other]

    cs.DS

    Efficient Lyndon factorization of grammar compressed text

    Authors: Tomohiro I, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Abstract: We present an algorithm for computing the Lyndon factorization of a string that is given in grammar compressed form, namely, a Straight Line Program (SLP). The algorithm runs in $O(n^4 + mn^3h)$ time and $O(n^2)$ space, where $m$ is the size of the Lyndon factorization, $n$ is the size of the SLP, and $h$ is the height of the derivation tree of the SLP. Since the length of the decompressed string…

    Submitted 26 April, 2013; originally announced April 2013.

    Comments: CPM 2013

  40. arXiv:1303.3945  [pdf, ps, other]

    cs.DS

    Computing convolution on grammar-compressed text

    Authors: Toshiya Tanaka, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

    Abstract: The convolution between a text string $S$ of length $N$ and a pattern string $P$ of length $m$ can be computed in $O(N \log m)$ time by FFT. It is known that various types of approximate string matching problems are reducible to convolution. In this paper, we assume that the input text string is given in a compressed form, as a \emph{straight-line program (SLP)}, which is a context free grammar in…

    Submitted 16 March, 2013; originally announced March 2013.

    Comments: DCC 2013
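
Several abstracts in this listing (entries 25, 28, and 29) define a longest common extension (LCE) query: given two positions $i$ and $j$ in a string, return the length of the longest common prefix of the suffixes starting at $i$ and $j$. As a point of reference only, the query can be answered by a naive linear scan, sketched below. The function name `naive_lce` and the choice of 0-based positions are assumptions made for this illustration; this is not the compressed data structure of any listed paper.

```python
def naive_lce(text: str, i: int, j: int) -> int:
    """Return the length of the longest common prefix of text[i:] and text[j:].

    Positions are 0-based (an assumption of this sketch). The scan takes time
    proportional to the answer and uses no preprocessing.
    """
    n = len(text)
    if not (0 <= i <= n and 0 <= j <= n):
        raise IndexError("positions must lie within the text")
    length = 0
    # Extend the match one character at a time until a mismatch
    # or until either suffix is exhausted.
    while (i + length < n and j + length < n
           and text[i + length] == text[j + length]):
        length += 1
    return length


if __name__ == "__main__":
    # Suffixes "abracadabra" and "abra" share the prefix "abra".
    print(naive_lce("abracadabra", 0, 7))
```

The scan stores the text uncompressed and spends time linear in the answer per query, which is the baseline that compressed or sub-linear-space LCE structures aim to improve on.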