Showing 1–16 of 16 results for author: Sakamoto, H

  1. LCP-dropout: Compression-based Multiple Subword Segmentation for Neural Machine Translation

    Authors: Keita Nonaka, Kazutaka Yamanouchi, Tomohiro I, Tsuyoshi Okita, Kazutaka Shimada, Hiroshi Sakamoto

    Abstract: In this study, we propose a simple and effective preprocessing method for subword segmentation based on a data compression algorithm. Compression-based subword segmentation has recently attracted significant attention as a preprocessing method for training data in Neural Machine Translation. Among them, BPE/BPE-dropout is one of the fastest and most effective methods compared to conventional approa…

    Submitted 19 March, 2022; v1 submitted 28 February, 2022; originally announced February 2022.

    Comments: 12 pages

    Journal ref: Electronics 11(7), Article number 1014, 2022

  2. Privacy-Preserving Feature Selection with Fully Homomorphic Encryption

    Authors: Shinji Ono, Jun Takata, Masaharu Kataoka, Tomohiro I, Kilho Shin, Hiroshi Sakamoto

    Abstract: For the feature selection problem, we propose an efficient privacy-preserving algorithm. Let $D$, $F$, and $C$ be data, feature, and class sets, respectively, where the feature value $x(F_i)$ and the class label $x(C)$ are given for each $x\in D$ and $F_i \in F$. For a triple $(D,F,C)$, the feature selection problem is to find a consistent and minimal subset $F' \subseteq F$, where `consistent' me…

    Submitted 1 June, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

    Comments: 14 pages

    Journal ref: Algorithms 15(7), Article number 229, 2022

  3. arXiv:1911.10719

    cs.CR

    Faster Privacy-Preserving Computation of Edit Distance with Moves

    Authors: Yohei Yoshimoto, Masaharu Kataoka, Yoshimasa Takabatake, Tomohiro I, Kilho Shin, Hiroshi Sakamoto

    Abstract: We consider an efficient two-party protocol for securely computing the similarity of strings w.r.t. an extended edit distance measure. Here, two parties possessing strings $x$ and $y$, respectively, want to jointly compute an approximate value for $\mathrm{EDM}(x,y)$, the minimum number of edit operations including substring moves needed to transform $x$ into $y$, without revealing any private inf…

    Submitted 28 November, 2019; v1 submitted 25 November, 2019; originally announced November 2019.

    Comments: to appear in WALCOM 2020

    MSC Class: D.4.6; E.3

    ACM Class: D.4.6; E.3

  4. arXiv:1910.07145

    cs.DS

    Practical Random Access to SLP-Compressed Texts

    Authors: Travis Gagie, Tomohiro I, Giovanni Manzini, Gonzalo Navarro, Hiroshi Sakamoto, Louisa Seelbach Benkner, Yoshimasa Takabatake

    Abstract: Grammar-based compression is a popular and powerful approach to compressing repetitive texts but until recently its relatively poor time-space trade-offs during real-life construction made it impractical for truly massive datasets such as genomic databases. In a recent paper (SPIRE 2019) we showed how simple pre-processing can dramatically improve those trade-offs, and in this paper we turn our at…

    Submitted 19 July, 2020; v1 submitted 15 October, 2019; originally announced October 2019.

    Comments: Accepted to SPIRE 2020

  5. arXiv:1906.00809

    cs.DS

    Rpair: Rescaling RePair with Rsync

    Authors: Travis Gagie, Tomohiro I, Giovanni Manzini, Gonzalo Navarro, Hiroshi Sakamoto, Yoshimasa Takabatake

    Abstract: Data compression is a powerful tool for managing massive but repetitive datasets, especially schemes such as grammar-based compression that support computation over the data without decompressing it. In the best case such a scheme takes a dataset so big that it must be stored on disk and shrinks it enough that it can be stored and processed in internal memory. Even then, however, the scheme is ess…

    Submitted 3 June, 2019; originally announced June 2019.

  6. arXiv:1811.01472

    cs.DS

    RePair in Compressed Space and Time

    Authors: Kensuke Sakai, Tatsuya Ohno, Keisuke Goto, Yoshimasa Takabatake, Tomohiro I, Hiroshi Sakamoto

    Abstract: Given a string $T$ of length $N$, the goal of grammar compression is to construct a small context-free grammar generating only $T$. Among existing grammar compression methods, RePair (recursive pairing) [Larsson and Moffat, 1999] is notable for achieving good compression ratios in practice. Although the original paper already achieved a time-optimal algorithm to compute the RePair grammar RePair(…

    Submitted 4 November, 2018; originally announced November 2018.

  7. arXiv:1704.05233

    cs.DS

    A Faster Implementation of Online Run-Length Burrows-Wheeler Transform

    Authors: Tatsuya Ohno, Yoshimasa Takabatake, Tomohiro I, Hiroshi Sakamoto

    Abstract: Run-length encoding Burrows-Wheeler Transformed strings, resulting in Run-Length BWT (RLBWT), is a powerful tool for processing highly repetitive strings. We propose a new algorithm for online RLBWT working in run-compressed space, which runs in $O(n\lg r)$ time and $O(r\lg n)$ bits of space, where $n$ is the length of input string $S$ received so far and $r$ is the number of runs in the BWT of th…

    Submitted 14 October, 2017; v1 submitted 18 April, 2017; originally announced April 2017.

    Comments: In Proc. IWOCA2017

  8. arXiv:1607.04446

    cs.DS

    Online Grammar Compression for Frequent Pattern Discovery

    Authors: Shouhei Fukunaga, Yoshimasa Takabatake, I Tomohiro, Hiroshi Sakamoto

    Abstract: Various grammar compression algorithms have been proposed in the last decade. A grammar compression is a restricted CFG deriving the string deterministically. An efficient grammar compression develops a smaller CFG by finding duplicated patterns and removing them. This process is just a frequent pattern discovery by grammatical inference. While we can get any frequent pattern in linear time using…

    Submitted 30 August, 2016; v1 submitted 15 July, 2016; originally announced July 2016.

    Comments: 14 pages

  9. arXiv:1602.06688

    cs.DS

    siEDM: an efficient string index and search algorithm for edit distance with moves

    Authors: Yoshimasa Takabatake, Kenta Nakashima, Tetsuji Kuboyama, Yasuo Tabei, Hiroshi Sakamoto

    Abstract: Although several self-indexes for highly repetitive text collections exist, developing an index and search algorithm with editing operations remains a challenge. Edit distance with moves (EDM) is a string-to-string distance measure that includes substring moves in addition to ordinal editing operations to turn one string into another. Although the problem of computing EDM is intractable, it has a…

    Submitted 8 April, 2016; v1 submitted 22 February, 2016; originally announced February 2016.

    Comments: 23 pages

  10. arXiv:1507.00805

    cs.DS

    Online Self-Indexed Grammar Compression

    Authors: Yoshimasa Takabatake, Yasuo Tabei, Hiroshi Sakamoto

    Abstract: Although several grammar-based self-indexes have been proposed thus far, their applicability is limited to offline settings where whole input texts are prepared, thus requiring to rebuild index structures for given additional inputs, which is often the case in the big data era. In this paper, we present the first online self-indexed grammar compression named OESP-index that can gradually build the…

    Submitted 6 July, 2015; v1 submitted 2 July, 2015; originally announced July 2015.

    Comments: To appear in the Proceedings of the 22nd edition of the International Symposium on String Processing and Information Retrieval (SPIRE2015)

  11. arXiv:1408.0467

    cs.DS

    Online Pattern Matching for String Edit Distance with Moves

    Authors: Yoshimasa Takabatake, Yasuo Tabei, Hiroshi Sakamoto

    Abstract: Edit distance with moves (EDM) is a string-to-string distance measure that includes substring moves in addition to ordinal editing operations to turn one string to the other. Although optimizing EDM is intractable, it has many applications especially in error detections. Edit sensitive parsing (ESP) is an efficient parsing algorithm that guarantees an upper bound of parsing discrepancies between d…

    Submitted 26 August, 2014; v1 submitted 3 August, 2014; originally announced August 2014.

    Comments: This paper has been accepted to the 21st edition of the International Symposium on String Processing and Information Retrieval (SPIRE2014)

  12. arXiv:1404.4972

    cs.DS

    Improved ESP-index: a practical self-index for highly repetitive texts

    Authors: Yoshimasa Takabatake, Yasuo Tabei, Hiroshi Sakamoto

    Abstract: While several self-indexes for highly repetitive texts exist, developing a practical self-index applicable to real world repetitive texts remains a challenge. ESP-index is a grammar-based self-index on the notion of edit-sensitive parsing (ESP), an efficient parsing algorithm that guarantees upper bounds of parsing discrepancies between different appearances of the same subtexts in a text. Althoug…

    Submitted 27 April, 2014; v1 submitted 19 April, 2014; originally announced April 2014.

    Comments: This is the full version of a proceeding accepted to the 11th International Symposium on Experimental Algorithms (SEA2014)

  13. arXiv:1304.0917

    cs.DS

    A Succinct Grammar Compression

    Authors: Yasuo Tabei, Yoshimasa Takabatake, Hiroshi Sakamoto

    Abstract: We solve an open problem related to an optimal encoding of a straight line program (SLP), a canonical form of grammar compression deriving a single string deterministically. We show that an information-theoretic lower bound for representing an SLP with $n$ symbols requires at least $2n+\log n!+o(n)$ bits. We then present a succinct representation of an SLP; this representation is asymptotically equivale…

    Submitted 14 June, 2013; v1 submitted 3 April, 2013; originally announced April 2013.

    Comments: The paper is accepted to 24th Annual Symposium on Combinatorial Pattern Matching (CPM2013)

  14. arXiv:1107.2729

    cs.DS

    Restructuring Compressed Texts without Explicit Decompression

    Authors: Keisuke Goto, Shirou Maruyama, Shunsuke Inenaga, Hideo Bannai, Hiroshi Sakamoto, Masayuki Takeda

    Abstract: We consider the problem of restructuring compressed texts without explicit decompression. We present algorithms which allow conversions from compressed representations of a string $T$ produced by any grammar-based compression algorithm, to representations produced by several specific compression algorithms including LZ77, LZ78, run length encoding, and some grammar based compression algorith…

    Submitted 14 July, 2011; originally announced July 2011.

  15. arXiv:1101.0080

    cs.DS

    A Searchable Compressed Edit-Sensitive Parsing

    Authors: Naoya Kishiue, Masaya Nakahara, Shirou Maruyama, Hiroshi Sakamoto

    Abstract: Practical data structures for the edit-sensitive parsing (ESP) are proposed. Given a string $S$, its ESP tree is equivalent to a context-free grammar $G$ generating just $S$, which is represented by a DAG. Using the succinct data structures for trees and permutations, $G$ is decomposed to two LOUDS bit strings and single array in $(1+\varepsilon)n\log n+4n+o(n)$ bits for any $0<\varepsilon<1$ and the number $n$ of variables in $G$.…

    Submitted 9 January, 2011; v1 submitted 30 December, 2010; originally announced January 2011.

    Comments: 16 pages, 14 figures

  16. arXiv:cs/0306051

    cs.DC

    A data Grid testbed environment in Gigabit WAN with HPSS

    Authors: Atsushi Manabe, Kohki Ishikawa, Yoshihiko Itoh, Setsuya Kawabata, Tetsuro Mashimo, Youhei Morita, Hiroshi Sakamoto, Takashi Sasaki, Hiroyuki Sato, Junichi Tanaka, Ikuo Ueda, Yoshiyuki Watase, Satomi Yamamoto, Shigeo Yashiro

    Abstract: For data analysis of large-scale experiments such as LHC Atlas and other Japanese high energy and nuclear physics projects, we have constructed a Grid test bed at ICEPP and KEK. These institutes are connected to a national scientific gigabit network backbone called SuperSINET. In our test bed, we have installed NorduGrid middleware based on Globus, and connected 120TB HPSS at KEK as a large scale…

    Submitted 3 September, 2003; v1 submitted 12 June, 2003; originally announced June 2003.

    Comments: Talk from the 2003 Computing in High Energy and Nuclear Physics (CHEP03), La Jolla, Ca, USA, March 2003, 5 pages, LaTeX, 9 figures, PSN THCT002

    ACM Class: C.2.4; J.2; H.3.4