
Showing 1–21 of 21 results for author: Goto, K

Searching in archive cs.
  1. arXiv:2403.04951  [pdf, other]

    cs.DS

    NP-Completeness for the Space-Optimality of Double-Array Tries

    Authors: Hideo Bannai, Keisuke Goto, Shunsuke Kanda, Dominik Köppl

    Abstract: Indexing a set of strings for prefix search or membership queries is a fundamental task with many applications such as information retrieval or database systems. A classic abstract data type for modelling such an index is a trie. Due to the fundamental nature of this problem, it has sparked much interest, leading to a variety of trie implementations with different characteristics. A trie implement…

    Submitted 7 March, 2024; originally announced March 2024.

  2. arXiv:2303.03036  [pdf, other]

    stat.ML cs.LG

    Deep Clustering with a Constraint for Topological Invariance based on Symmetric InfoNCE

    Authors: Yuhui Zhang, Yuichiro Wada, Hiroki Waida, Kaito Goto, Yusaku Hino, Takafumi Kanamori

    Abstract: We consider the scenario of deep clustering, in which the available prior knowledge is limited. In this scenario, few existing state-of-the-art deep clustering methods can perform well for both non-complex topology and complex topology datasets. To address the problem, we propose a constraint utilizing symmetric InfoNCE, which helps an objective of deep clustering method in the scenario train the…

    Submitted 6 March, 2023; originally announced March 2023.

    Comments: 48 pages, 6 figures

  3. arXiv:2301.04295  [pdf, other]

    cs.DS

    Linear Time Online Algorithms for Constructing Linear-size Suffix Trie

    Authors: Diptarama Hendrian, Takuya Takagi, Shunsuke Inenaga, Keisuke Goto, Mitsuru Funakoshi

    Abstract: Suffix trees are fundamental data structures for various kinds of string processing. The suffix tree of a text string $T$ of length $n$ has $O(n)$ nodes and edges, and the string label of each edge is encoded by a pair of positions in $T$. Thus, even after the tree is built, the input string $T$ needs to be kept stored and random access to $T$ is still needed. The \emph{linear-size suffix trie…

    Submitted 4 December, 2023; v1 submitted 10 January, 2023; originally announced January 2023.

    Comments: 27 pages, 14 figures. arXiv admin note: text overlap with arXiv:1901.10045

  4. arXiv:2207.02571  [pdf, other]

    cs.DS

    Computing NP-hard Repetitiveness Measures via MAX-SAT

    Authors: Hideo Bannai, Keisuke Goto, Masakazu Ishihata, Shunsuke Kanda, Dominik Köppl, Takaaki Nishimoto

    Abstract: Repetitiveness measures reveal profound characteristics of datasets, and give rise to compressed data structures and algorithms working in compressed space. Alas, the computation of some of these measures is NP-hard, and straightforward computation is infeasible for datasets of even small sizes. Three such measures are the smallest size of a string attractor, the smallest size of a bidirectional…

    Submitted 12 July, 2022; v1 submitted 6 July, 2022; originally announced July 2022.

    Comments: paper accepted to ESA 2022 (plus Appendix); corrected attribution of Python program for computing https://oeis.org/A339391

  5. arXiv:2101.11906  [pdf, other]

    physics.data-an cs.LG hep-ex physics.ins-det

    Development of a Vertex Finding Algorithm using Recurrent Neural Network

    Authors: Kiichi Goto, Taikan Suehara, Tamaki Yoshioka, Masakazu Kurata, Hajime Nagahara, Yuta Nakashima, Noriko Takemura, Masako Iwasaki

    Abstract: Deep learning is a rapidly evolving technology with the potential to significantly improve the physics reach of collider experiments. In this study we developed a novel vertex-finding algorithm for future lepton colliders such as the International Linear Collider. We deploy two networks: one consists of simple fully-connected layers that look for vertex seeds from track pairs, and the other is a customized Recu…

    Submitted 19 November, 2022; v1 submitted 28 January, 2021; originally announced January 2021.

    Comments: 16 pages, 9 figures

    Journal ref: Nucl. Instrum. Methods Phys. Res. 1047 (2023) 167836

  6. arXiv:2006.04326  [pdf, ps, other]

    eess.AS cs.SD

    Semi-Supervised Contrastive Learning with Generalized Contrastive Loss and Its Application to Speaker Recognition

    Authors: Nakamasa Inoue, Keita Goto

    Abstract: This paper introduces a semi-supervised contrastive learning framework and its application to text-independent speaker verification. The proposed framework employs generalized contrastive loss (GCL). GCL unifies losses from two different learning frameworks, supervised metric learning and unsupervised contrastive learning, and thus it naturally determines the loss for semi-supervised learning. In…

    Submitted 7 June, 2020; originally announced June 2020.

  7. arXiv:2004.08015  [pdf, other]

    cs.DB

    Efficient Constrained Pattern Mining Using Dynamic Item Ordering for Explainable Classification

    Authors: Hiroaki Iwashita, Takuya Takagi, Hirofumi Suzuki, Keisuke Goto, Kotaro Ohori, Hiroki Arimura

    Abstract: Learning of interpretable classification models has been attracting much attention for the last few years. Discovery of succinct and contrasting patterns that can highlight the differences between the two classes is very important. Such patterns are useful for human experts, and can be used to construct powerful classifiers. In this paper, we consider mining of minimal emerging patterns from high-…

    Submitted 16 April, 2020; originally announced April 2020.

  8. arXiv:1908.04933  [pdf, ps, other]

    cs.DS

    Re-Pair In Small Space

    Authors: Dominik Köppl, Tomohiro I, Isamu Furuya, Yoshimasa Takabatake, Kensuke Sakai, Keisuke Goto

    Abstract: Re-Pair is a grammar compression scheme with good compression rates. The computation of Re-Pair comes with the cost of maintaining large frequency tables, which makes it hard to compute Re-Pair on large-scale data sets. As a solution for this problem we present, given a text of length $n$ whose characters are drawn from an integer alphabet, an…

    Submitted 16 November, 2019; v1 submitted 13 August, 2019; originally announced August 2019.

  9. arXiv:1811.01472  [pdf, other]

    cs.DS

    RePair in Compressed Space and Time

    Authors: Kensuke Sakai, Tatsuya Ohno, Keisuke Goto, Yoshimasa Takabatake, Tomohiro I, Hiroshi Sakamoto

    Abstract: Given a string $T$ of length $N$, the goal of grammar compression is to construct a small context-free grammar generating only $T$. Among existing grammar compression methods, RePair (recursive pairing) [Larsson and Moffat, 1999] is notable for achieving good compression ratios in practice. Although the original paper already achieved a time-optimal algorithm to compute the RePair grammar RePair(…

    Submitted 4 November, 2018; originally announced November 2018.

  10. arXiv:1806.00198  [pdf, ps, other]

    cs.DS

    Block Palindromes: A New Generalization of Palindromes

    Authors: Keisuke Goto, Tomohiro I, Hideo Bannai, Shunsuke Inenaga

    Abstract: We study a new generalization of palindromes and gapped palindromes called block palindromes. A block palindrome is a string that becomes a palindrome when identical substrings are replaced with a distinct character. We investigate several properties of block palindromes and in particular, study substrings of a string which are block palindromes. In so doing, we introduce the notion of a \emph{max…

    Submitted 6 August, 2018; v1 submitted 1 June, 2018; originally announced June 2018.

    Comments: 7 pages

  11. Data-Driven Analysis of Pareto Set Topology

    Authors: Naoki Hamada, Keisuke Goto

    Abstract: When and why can evolutionary multi-objective optimization (EMO) algorithms cover the entire Pareto set? That is a major concern for EMO researchers and practitioners. A recent theoretical study revealed that (roughly speaking) if the Pareto set forms a topological simplex (a curved line, a curved triangle, a curved tetrahedron, etc.), then decomposition-based EMO algorithms can cover the entire P…

    Submitted 19 April, 2018; originally announced April 2018.

    Comments: 8 pages, accepted at GECCO'18 as a full paper

  12. In-Place Initializable Arrays

    Authors: Takashi Katoh, Keisuke Goto

    Abstract: An initializable array is an array that supports the read and write operations for any element and the initialization of the entire array. This paper proposes a simple in-place algorithm to implement an initializable array of length $N$ containing $\ell \in O(w)$-bit entries in $N \ell + 1$ bits on the word RAM model with $w$-bit word size, i.e., the proposed array requires only 1 extra bit on to…

    Submitted 21 December, 2021; v1 submitted 26 September, 2017; originally announced September 2017.

  13. arXiv:1705.09779  [pdf, ps, other]

    cs.DS

    Linear-size CDAWG: new repetition-aware indexing and grammar compression

    Authors: Takuya Takagi, Keisuke Goto, Yuta Fujishige, Shunsuke Inenaga, Hiroki Arimura

    Abstract: In this paper, we propose a novel approach to combine \emph{compact directed acyclic word graphs} (CDAWGs) and grammar-based compression. This leads us to an efficient self-index, called Linear-size CDAWGs (L-CDAWGs), which can be represented with $O(\tilde e_T \log n)$ bits of space allowing for $O(\log n)$-time random and $O(1)$-time sequential accesses to edge labels, and $O(m \log σ + occ)$-tim…

    Submitted 27 July, 2017; v1 submitted 27 May, 2017; originally announced May 2017.

    Comments: 12 pages, 2 figures

  14. arXiv:1703.01009  [pdf, other]

    cs.DS

    Optimal Time and Space Construction of Suffix Arrays and LCP Arrays for Integer Alphabets

    Authors: Keisuke Goto

    Abstract: Suffix arrays and LCP arrays are among the most fundamental data structures widely used for various kinds of string processing. We consider two problems for a read-only string of length $N$ over an integer alphabet $[1, \dots, σ]$ for $1 \leq σ \leq N$, where the string contains $σ$ distinct characters: the construction of the suffix array, and a simultaneous construction of both the suffix array and LC…

    Submitted 13 July, 2019; v1 submitted 2 March, 2017; originally announced March 2017.

  15. arXiv:1310.1448  [pdf, ps, other]

    cs.DS

    Space Efficient Linear Time Lempel-Ziv Factorization on Constant Size Alphabets

    Authors: Keisuke Goto, Hideo Bannai

    Abstract: We present a new algorithm for computing the Lempel-Ziv Factorization (LZ77) of a given string of length $N$ in linear time, that utilizes only $N\log N + O(1)$ bits of working space, i.e., a single integer array, for constant size integer alphabets. This greatly improves the previous best space requirement for linear time LZ77 factorization (Kärkkäinen et al. CPM 2013), which requires two integer…

    Submitted 5 October, 2013; originally announced October 2013.

  16. arXiv:1211.3642  [pdf, ps, other]

    cs.DS

    Simpler and Faster Lempel Ziv Factorization

    Authors: Keisuke Goto, Hideo Bannai

    Abstract: We present a new, simple, and efficient approach for computing the Lempel-Ziv (LZ77) factorization of a string in linear time, based on suffix arrays. Computational experiments on various data sets show that our approach constantly outperforms the currently fastest algorithm LZ OG (Ohlebusch and Gog 2011), and can be up to 2 to 3 times faster in the processing after obtaining the suffix array, whi…

    Submitted 18 January, 2013; v1 submitted 15 November, 2012; originally announced November 2012.

  17. Speeding-up $q$-gram mining on grammar-based compressed texts

    Authors: Keisuke Goto, Hideo Bannai, Shunsuke Inenaga, Masayuki Takeda

    Abstract: We present an efficient algorithm for calculating $q$-gram frequencies on strings represented in compressed form, namely, as a straight line program (SLP). Given an SLP $\mathcal{T}$ of size $n$ that represents string $T$, the algorithm computes the occurrence frequencies of all $q$-grams in $T$, by reducing the problem to the weighted $q$-gram frequencies problem on a trie-like structure of size…

    Submitted 15 February, 2012; originally announced February 2012.

  18. arXiv:1107.3022  [pdf, ps, other]

    cs.DS

    Computing q-gram Non-overlapping Frequencies on SLP Compressed Texts

    Authors: Keisuke Goto, Hideo Bannai, Shunsuke Inenaga, Masayuki Takeda

    Abstract: Length-$q$ substrings, or $q$-grams, can represent important characteristics of text data, and determining the frequencies of all $q$-grams contained in the data is an important problem with many applications in the field of data mining and machine learning. In this paper, we consider the problem of calculating the {\em non-overlapping frequencies} of all $q$-grams in a text given in compressed fo…

    Submitted 15 July, 2011; originally announced July 2011.

  19. arXiv:1107.3019  [pdf, ps, other]

    cs.DS

    Computing q-gram Frequencies on Collage Systems

    Authors: Keisuke Goto, Hideo Bannai, Shunsuke Inenaga, Masayuki Takeda

    Abstract: Collage systems are a general framework for representing outputs of various text compression algorithms. We consider the all $q$-gram frequency problem on a compressed string represented as a collage system, and present an $O((q+h\log n)n)$-time $O(qn)$-space algorithm for calculating the frequencies for all $q$-grams that occur in the string. Here, $n$ and $h$ are respectively the size and height o…

    Submitted 15 July, 2011; originally announced July 2011.

  20. arXiv:1107.2729  [pdf, ps, other]

    cs.DS

    Restructuring Compressed Texts without Explicit Decompression

    Authors: Keisuke Goto, Shirou Maruyama, Shunsuke Inenaga, Hideo Bannai, Hiroshi Sakamoto, Masayuki Takeda

    Abstract: We consider the problem of {\em restructuring} compressed texts without explicit decompression. We present algorithms which allow conversions from compressed representations of a string $T$ produced by any grammar-based compression algorithm, to representations produced by several specific compression algorithms including LZ77, LZ78, run length encoding, and some grammar based compression algorith…

    Submitted 14 July, 2011; originally announced July 2011.

  21. arXiv:1103.3114  [pdf, ps, other]

    cs.DS

    Fast $q$-gram Mining on SLP Compressed Strings

    Authors: Keisuke Goto, Hideo Bannai, Shunsuke Inenaga, Masayuki Takeda

    Abstract: We present simple and efficient algorithms for calculating $q$-gram frequencies on strings represented in compressed form, namely, as a straight line program (SLP). Given an SLP of size $n$ that represents string $T$, we present an $O(qn)$ time and space algorithm that computes the occurrence frequencies of $q$-grams in $T$. Computational experiments show that our algorithm and its variation are p…

    Submitted 13 July, 2011; v1 submitted 16 March, 2011; originally announced March 2011.