-
Tunnelling conductance of $d+ip$-wave superconductor
Authors:
Yuhi Takabatake,
Shu-Ichiro Suzuki,
Yukio Tanaka
Abstract:
We theoretically investigate the tunneling conductance of the $d+ip$-wave superconductor which has recently been proposed to be realised at the (110) surface of a high-$T_c$ cuprate superconductor. Utilizing the quasiclassical Eilenberger theory, we obtain the self-consistent pair potentials and the differential conductance of the normal-metal/$d+ip$-wave superconductor junction. We demonstrate that the zero-bias peak of a $d$-wave superconductor is robust against the spin-triplet $p$-wave surface subdominant order even though it is fragile against the spin-singlet $s$-wave one. Comparing our numerical results and the experimental results, we conclude that the spin-triplet $p$-wave surface subdominant order is feasible.
Submitted 29 March, 2021; v1 submitted 15 November, 2020;
originally announced November 2020.
-
Faster Privacy-Preserving Computation of Edit Distance with Moves
Authors:
Yohei Yoshimoto,
Masaharu Kataoka,
Yoshimasa Takabatake,
Tomohiro I,
Kilho Shin,
Hiroshi Sakamoto
Abstract:
We consider an efficient two-party protocol for securely computing the similarity of strings w.r.t. an extended edit distance measure. Here, two parties possessing strings $x$ and $y$, respectively, want to jointly compute an approximate value for $\mathrm{EDM}(x,y)$, the minimum number of edit operations including substring moves needed to transform $x$ into $y$, without revealing any private information. Recently, the first secure two-party protocol for this was proposed, based on homomorphic encryption, but this approach is not suitable for long strings due to its high communication and round complexities. In this paper, we propose an improved algorithm that significantly reduces the round complexity without sacrificing its cryptographic strength. We examine the performance of our algorithm for DNA sequences compared to the previous one.
Submitted 28 November, 2019; v1 submitted 25 November, 2019;
originally announced November 2019.
-
Practical Random Access to SLP-Compressed Texts
Authors:
Travis Gagie,
Tomohiro I,
Giovanni Manzini,
Gonzalo Navarro,
Hiroshi Sakamoto,
Louisa Seelbach Benkner,
Yoshimasa Takabatake
Abstract:
Grammar-based compression is a popular and powerful approach to compressing repetitive texts but until recently its relatively poor time-space trade-offs during real-life construction made it impractical for truly massive datasets such as genomic databases. In a recent paper (SPIRE 2019) we showed how simple pre-processing can dramatically improve those trade-offs, and in this paper we turn our attention to one of the features that make grammar-based compression so attractive: the possibility of supporting fast random access. This is an essential primitive in many algorithms that process grammar-compressed texts without decompressing them and so many theoretical bounds have been published about it, but experimentation has lagged behind. We give a new encoding of grammars that is about as small as the practical state of the art (Maruyama et al., SPIRE 2013) but with significantly faster queries.
Submitted 19 July, 2020; v1 submitted 15 October, 2019;
originally announced October 2019.
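The random-access primitive mentioned in the entry above can be illustrated with a minimal sketch. This is not the encoding from the paper: it assumes a straight-line program stored as a plain dictionary of binary rules, and the names `expansion_lengths` and `access`, as well as the toy grammar in the usage note, are hypothetical.

```python
def expansion_lengths(rules):
    """Map every nonterminal to the length of the string it derives.

    `rules` maps a nonterminal to a pair (left, right). Any symbol that is
    not a key of `rules` is treated as a terminal of length 1.
    """
    lengths = {}

    def length(sym):
        if sym not in rules:
            return 1
        if sym not in lengths:
            left, right = rules[sym]
            lengths[sym] = length(left) + length(right)
        return lengths[sym]

    for sym in rules:
        length(sym)
    return lengths


def access(rules, lengths, start, i):
    """Return the i-th character (0-based) of the string derived from `start`
    by descending the derivation tree, without decompressing the text."""
    total = lengths.get(start, 1)
    if not 0 <= i < total:
        raise IndexError("position out of range")
    sym = start
    while sym in rules:
        left, right = rules[sym]
        left_len = lengths.get(left, 1)
        if i < left_len:
            sym = left          # target lies in the left child
        else:
            i -= left_len       # skip the left child entirely
            sym = right
    return sym
```

For example, with the assumed rules `{"X1": ("a", "b"), "X2": ("X1", "X1"), "X3": ("X2", "a")}`, the symbol `X3` derives `ababa`, and each `access` call walks one root-to-leaf path, so its cost is bounded by the height of the grammar rather than the text length.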
-
Re-Pair In Small Space
Authors:
Dominik Köppl,
Tomohiro I,
Isamu Furuya,
Yoshimasa Takabatake,
Kensuke Sakai,
Keisuke Goto
Abstract:
Re-Pair is a grammar compression scheme with favorably good compression rates. The computation of Re-Pair comes with the cost of maintaining large frequency tables, which makes it hard to compute Re-Pair on large scale data sets. As a solution for this problem we present, given a text of length $n$ whose characters are drawn from an integer alphabet, an $O(n^2) \cap O(n^2 \lg \log_τn \lg \lg \lg n / \log_τn)$ time algorithm computing Re-Pair in $n \lg \max(n,τ)$ bits of space including the text space, where $τ$ is the number of terminals and non-terminals. The algorithm works in the restore model, supporting the recovery of the original input in the time for the Re-Pair computation with $O(\lg n)$ additional bits of working space. We give variants of our solution working in parallel or in the external memory model.
Submitted 16 November, 2019; v1 submitted 13 August, 2019;
originally announced August 2019.
-
Rpair: Rescaling RePair with Rsync
Authors:
Travis Gagie,
Tomohiro I,
Giovanni Manzini,
Gonzalo Navarro,
Hiroshi Sakamoto,
Yoshimasa Takabatake
Abstract:
Data compression is a powerful tool for managing massive but repetitive datasets, especially schemes such as grammar-based compression that support computation over the data without decompressing it. In the best case such a scheme takes a dataset so big that it must be stored on disk and shrinks it enough that it can be stored and processed in internal memory. Even then, however, the scheme is essentially useless unless it can be built on the original dataset reasonably quickly while keeping the dataset on disk. In this paper we show how we can preprocess such datasets with context-triggered piecewise hashing such that afterwards we can apply RePair and other grammar-based compressors more easily. We first give our algorithm, then show how a variant of it can be used to approximate the LZ77 parse, then leverage that to prove theoretical bounds on compression, and finally give experimental evidence that our approach is competitive in practice.
Submitted 3 June, 2019;
originally announced June 2019.
-
RePair in Compressed Space and Time
Authors:
Kensuke Sakai,
Tatsuya Ohno,
Keisuke Goto,
Yoshimasa Takabatake,
Tomohiro I,
Hiroshi Sakamoto
Abstract:
Given a string $T$ of length $N$, the goal of grammar compression is to construct a small context-free grammar generating only $T$. Among existing grammar compression methods, RePair (recursive pairing) [Larsson and Moffat, 1999] is notable for achieving good compression ratios in practice. Although the original paper already achieved a time-optimal algorithm to compute the RePair grammar RePair($T$) in expected $O(N)$ time, the study to reduce its working space is still active so that it is applicable to large-scale data. In this paper, we propose the first RePair algorithm working in compressed space, i.e., potentially $o(N)$ space for highly compressible texts. The key idea is to give a new way to restructure an arbitrary grammar $S$ for $T$ into RePair($T$) in compressed space and time. Based on the recompression technique, we propose an algorithm for RePair($T$) in $O(\min(N, nm \log N))$ space and expected $O(\min(N, nm \log N) m)$ time or $O(\min(N, nm \log N) \log \log N)$ time, where $n$ is the size of $S$ and $m$ is the number of variables in RePair($T$). We implemented our algorithm running in $O(\min(N, nm \log N) m)$ time and show it can actually run in compressed space. We also present a new approach to reduce the peak memory usage of existing RePair algorithms combining with our algorithms, and show that the new approach outperforms, both in computation time and space, the most space efficient linear-time RePair implementation to date.
Submitted 4 November, 2018;
originally announced November 2018.
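For readers unfamiliar with RePair, the basic idea referred to in the entry above (repeatedly replacing the most frequent adjacent pair with a fresh nonterminal) can be sketched as follows. This is a deliberately naive, quadratic-time version that keeps the whole text in memory; it is not the compressed-space algorithm of the paper, and the function names `count_pairs`, `repair`, and `expand` are assumptions made for illustration.

```python
from collections import Counter


def count_pairs(seq):
    """Count non-overlapping occurrences of each adjacent pair in `seq`."""
    counts = Counter()
    prev_pair = None
    skip = False
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        if skip and pair == prev_pair:
            # Second half of an overlap such as "aaa": do not count it twice.
            skip = False
            continue
        counts[pair] += 1
        prev_pair = pair
        skip = pair[0] == pair[1]
    return counts


def repair(text):
    """Return (sequence, rules); integer symbols are nonterminals."""
    seq = list(text)
    rules = {}
    next_id = 0
    while True:
        counts = count_pairs(seq)
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:
            break  # no pair repeats, so no further rule pays off
        rules[next_id] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_id)  # replace the pair, left to right
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        next_id += 1
    return seq, rules


def expand(seq, rules):
    """Invert `repair` by expanding every nonterminal back to characters."""
    out = []
    stack = list(reversed(seq))
    while stack:
        sym = stack.pop()
        if isinstance(sym, int):
            left, right = rules[sym]
            stack.append(right)
            stack.append(left)
        else:
            out.append(sym)
    return "".join(out)
```

A round trip such as `expand(*repair("abracadabra abracadabra"))` returns the original string. The pair counts recomputed in every round stand in for the frequency tables whose size the space-efficient algorithms aim to avoid.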
-
A compressed dynamic self-index for highly repetitive text collections
Authors:
Takaaki Nishimoto,
Yoshimasa Takabatake,
Yasuo Tabei
Abstract:
We present a novel compressed dynamic self-index for highly repetitive text collections. Signature encoding is a compressed dynamic self-index for highly repetitive texts, but it has the major disadvantage that pattern search for short patterns is slow. We address this disadvantage by leveraging an idea behind the truncated suffix tree and present the first compressed dynamic self-index, named TST-index, that supports not only fast pattern search but also dynamic update operations on the index for highly repetitive texts. Experiments using a benchmark dataset of highly repetitive texts show that the pattern search of TST-index is significantly improved.
Submitted 24 April, 2018; v1 submitted 8 November, 2017;
originally announced November 2017.
-
A Faster Implementation of Online Run-Length Burrows-Wheeler Transform
Authors:
Tatsuya Ohno,
Yoshimasa Takabatake,
Tomohiro I,
Hiroshi Sakamoto
Abstract:
Run-length encoding Burrows-Wheeler Transformed strings, resulting in Run-Length BWT (RLBWT), is a powerful tool for processing highly repetitive strings. We propose a new algorithm for online RLBWT working in run-compressed space, which runs in $O(n\lg r)$ time and $O(r\lg n)$ bits of space, where $n$ is the length of input string $S$ received so far and $r$ is the number of runs in the BWT of the reversed $S$. We improve the state-of-the-art algorithm for online RLBWT in terms of empirical construction time. Adopting the dynamic list for maintaining a total order, we can replace rank queries in a dynamic wavelet tree on a run-length compressed string by the direct comparison of labels in a dynamic list. The empirical results for various benchmarks show the efficiency of our algorithm, especially for highly repetitive strings.
Submitted 14 October, 2017; v1 submitted 18 April, 2017;
originally announced April 2017.
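As a small illustration of what an RLBWT is (see the entry above), the sketch below computes the BWT offline by sorting all rotations and then run-length encodes it. This is a textbook construction, not the online run-compressed algorithm of the paper; the sentinel `$` and the function names are assumptions.

```python
from itertools import groupby


def bwt(s, sentinel="$"):
    """Burrows-Wheeler transform via sorted rotations (quadratic space)."""
    if sentinel in s:
        raise ValueError("input must not contain the sentinel")
    t = s + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rot[-1] for rot in rotations)


def run_length_encode(s):
    """Collapse maximal runs of equal characters into (char, length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


transformed = bwt("banana")              # "annb$aa"
runs = run_length_encode(transformed)    # 5 runs for 7 characters
```

The number of pairs in `runs` corresponds to the parameter $r$ in the bounds above; for highly repetitive inputs it tends to be far smaller than the text length.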
-
Online Grammar Compression for Frequent Pattern Discovery
Authors:
Shouhei Fukunaga,
Yoshimasa Takabatake,
I Tomohiro,
Hiroshi Sakamoto
Abstract:
Various grammar compression algorithms have been proposed in the last decade. A grammar compression is a restricted CFG deriving the string deterministically. An efficient grammar compression develops a smaller CFG by finding duplicated patterns and removing them. This process is just a frequent pattern discovery by grammatical inference. While we can get any frequent pattern in linear time using a preprocessed string, a huge working space is required for longer patterns, and the whole string must be loaded into the memory preliminarily. We propose an online algorithm approximating this problem within a compressed space. The main contribution is an improvement of the previously best known approximation ratio $Ω(\frac{1}{\lg^2m})$ to $Ω(\frac{1}{\lg^*N\lg m})$ where $m$ is the length of an optimal pattern in a string of length $N$ and $\lg^*$ is the iteration of the logarithm base $2$. For a sufficiently large $N$, $\lg^*N$ is practically constant. The experimental results show that our algorithm extracts nearly optimal patterns and achieves a significant improvement in memory consumption compared to the offline algorithm.
Submitted 30 August, 2016; v1 submitted 15 July, 2016;
originally announced July 2016.
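The remark in the entry above that $\lg^*N$ is practically constant can be checked with a short sketch of the iterated logarithm. The function name `log_star` is an assumption; it counts how many times the base-2 logarithm must be applied before the value drops to at most 1.

```python
import math


def log_star(n):
    """Iterated logarithm base 2: number of log2 applications until <= 1."""
    count = 0
    x = n
    while x > 1:
        x = math.log2(x)
        count += 1
    return count
```

Under this definition, `log_star(16)` is 3 and `log_star(65536)` is 4, and the value only reaches 5 at $N = 2^{65536}$, which is far larger than any realistic input length.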
-
siEDM: an efficient string index and search algorithm for edit distance with moves
Authors:
Yoshimasa Takabatake,
Kenta Nakashima,
Tetsuji Kuboyama,
Yasuo Tabei,
Hiroshi Sakamoto
Abstract:
Although several self-indexes for highly repetitive text collections exist, developing an index and search algorithm with editing operations remains a challenge. Edit distance with moves (EDM) is a string-to-string distance measure that includes substring moves in addition to ordinal editing operations to turn one string into another. Although the problem of computing EDM is intractable, it has a wide range of potential applications, especially in approximate string retrieval. Despite the importance of computing EDM, there has been no efficient method for indexing and searching large text collections based on the EDM measure. We propose the first algorithm, named string index for edit distance with moves (siEDM), for indexing and searching strings with EDM. The siEDM algorithm builds an index structure by leveraging the idea behind the edit sensitive parsing (ESP), an efficient algorithm enabling approximately computing EDM with guarantees of upper and lower bounds for the exact EDM. siEDM efficiently prunes the space for searching query strings by the proposed method, which enables fast query searches with the same guarantee as ESP. We experimentally tested the ability of siEDM to index and search strings on benchmark datasets, and we showed siEDM's efficiency.
Submitted 8 April, 2016; v1 submitted 22 February, 2016;
originally announced February 2016.
-
Online Self-Indexed Grammar Compression
Authors:
Yoshimasa Takabatake,
Yasuo Tabei,
Hiroshi Sakamoto
Abstract:
Although several grammar-based self-indexes have been proposed thus far, their applicability is limited to offline settings where whole input texts are prepared, thus requiring to rebuild index structures for given additional inputs, which is often the case in the big data era. In this paper, we present the first online self-indexed grammar compression named OESP-index that can gradually build the index structure by reading input characters one-by-one. Such a property is another advantage which enables saving a working space for construction, because we do not need to store input texts in memory. We experimentally test OESP-index on the ability to build index structures and search query texts, and we show OESP-index's efficiency, especially space-efficiency for building index structures.
Submitted 6 July, 2015; v1 submitted 2 July, 2015;
originally announced July 2015.
-
Online Pattern Matching for String Edit Distance with Moves
Authors:
Yoshimasa Takabatake,
Yasuo Tabei,
Hiroshi Sakamoto
Abstract:
Edit distance with moves (EDM) is a string-to-string distance measure that includes substring moves in addition to ordinal editing operations to turn one string into the other. Although optimizing EDM is intractable, it has many applications, especially in error detection. Edit sensitive parsing (ESP) is an efficient parsing algorithm that guarantees an upper bound of parsing discrepancies between different appearances of the same substrings in a string. ESP can be used for computing an approximate EDM as the L1 distance between characteristic vectors built by node labels in parsing trees. However, ESP is not applicable to streaming text data where the whole text is unknown in advance. We present an online ESP (OESP) that enables online pattern matching for EDM. OESP builds a parse tree for a streaming text and computes the L1 distance between characteristic vectors in an online manner. For the space-efficient computation of EDM, OESP directly encodes the parse tree into a succinct representation by leveraging the idea behind recent results of a dynamic succinct tree. We experimentally test OESP on the ability to compute EDM in an online manner on benchmark datasets, and we show OESP's efficiency.
Submitted 26 August, 2014; v1 submitted 3 August, 2014;
originally announced August 2014.
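The last step of the ESP-based approximation described in the entry above, taking the L1 distance between characteristic vectors of node labels, can be sketched as follows. The sketch assumes the parse trees have already been built and that each is given simply as a list of node labels; the parsing itself is not shown, and the function name `l1_distance` and the example labels are hypothetical.

```python
from collections import Counter


def l1_distance(labels_x, labels_y):
    """L1 distance between the label-frequency vectors of two parse trees."""
    cx, cy = Counter(labels_x), Counter(labels_y)
    # A Counter returns 0 for labels it has not seen.
    return sum(abs(cx[label] - cy[label]) for label in cx.keys() | cy.keys())


# Labels "A" and "B" occur in the first tree, "A" and "C" in the second.
distance = l1_distance(["A", "A", "B"], ["A", "C"])  # |2-1| + |1-0| + |0-1| = 3
```

Storing only label frequencies is what makes this representation compatible with streaming input: the counts can be updated as new nodes are produced.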
-
Improved ESP-index: a practical self-index for highly repetitive texts
Authors:
Yoshimasa Takabatake,
Yasuo Tabei,
Hiroshi Sakamoto
Abstract:
While several self-indexes for highly repetitive texts exist, developing a practical self-index applicable to real world repetitive texts remains a challenge. ESP-index is a grammar-based self-index on the notion of edit-sensitive parsing (ESP), an efficient parsing algorithm that guarantees upper bounds of parsing discrepancies between different appearances of the same subtexts in a text. Although ESP-index performs efficient top-down searches of query texts, it has a serious issue on binary searches for finding appearances of variables for a query text, which resulted in slowing down the query searches. We present an improved ESP-index (ESP-index-I) by leveraging the idea behind succinct data structures for large alphabets. While ESP-index-I keeps the same types of efficiencies as ESP-index about the top-down searches, it avoids the binary searches using fast rank/select operations. We experimentally test ESP-index-I on the ability to search query texts and extract subtexts from real world repetitive texts on a large-scale, and we show that ESP-index-I performs better than other possible approaches.
Submitted 27 April, 2014; v1 submitted 19 April, 2014;
originally announced April 2014.
-
A Succinct Grammar Compression
Authors:
Yasuo Tabei,
Yoshimasa Takabatake,
Hiroshi Sakamoto
Abstract:
We solve an open problem related to an optimal encoding of a straight line program (SLP), a canonical form of grammar compression deriving a single string deterministically. We show that an information-theoretic lower bound for representing an SLP with $n$ symbols requires at least $2n + \log n! + o(n)$ bits. We then present a succinct representation of an SLP; this representation is asymptotically equivalent to the lower bound. The space is at most $2n \log \rho\,(1 + o(1))$ bits for $\rho \leq 2\sqrt{n}$, while supporting random access to any production rule of an SLP in $O(\log \log n)$ time. In addition, we present a novel dynamic data structure associating a digram with a unique symbol. Such a data structure is called a naming function and has been implemented using a hash table that has a space-time tradeoff. Thus, the memory space is mainly occupied by the hash table during the development of production rules. Alternatively, we build a dynamic data structure for the naming function by leveraging the idea behind the wavelet tree. The space is strictly bounded by $2n \log n\,(1 + o(1))$ bits, while supporting $O(\log n)$ query and update time.
Submitted 14 June, 2013; v1 submitted 3 April, 2013;
originally announced April 2013.