-
Almost Linear Size Edit Distance Sketch
Authors:
Michal Koucký,
Michael Saks
Abstract:
Edit distance is an important measure of string similarity. It counts the number of insertions, deletions and substitutions one has to make to a string $x$ to get a string $y$. In this paper we design an almost linear-size sketching scheme for computing edit distance up to a given threshold $k$. The scheme consists of two algorithms, a sketching algorithm and a recovery algorithm. The sketching algorithm depends on the parameter $k$ and takes as input a string $x$ and a public random string $ρ$ and computes a sketch $sk_ρ(x;k)$, which is a digested version of $x$. The recovery algorithm is given two sketches $sk_ρ(x;k)$ and $sk_ρ(y;k)$ as well as the public random string $ρ$ used to create the two sketches, and (with high probability) if the edit distance $ED(x,y)$ between $x$ and $y$ is at most $k$, will output $ED(x,y)$ together with an optimal sequence of edit operations that transforms $x$ to $y$, and if $ED(x,y) > k$ will output LARGE. The size of the sketch output by the sketching algorithm on input $x$ is $k{2^{O(\sqrt{\log(n)\log\log(n)})}}$ (where $n$ is an upper bound on the length of $x$). The sketching and recovery algorithms both run in time polynomial in $n$. The dependence of sketch size on $k$ is information-theoretically optimal and improves over the quadratic dependence on $k$ in schemes of Kociumaka, Porat and Starikovskaya (FOCS'2021), and Bhattacharya and Koucký (STOC'2023).
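To make the recovery algorithm's output contract concrete (the exact distance when $ED(x,y) \le k$, and LARGE otherwise), here is a minimal Python sketch. It works on the full strings using a banded dynamic program, not on sketches, so it says nothing about the paper's construction. The function name `bounded_edit_distance` and the string value `"LARGE"` are assumptions made for illustration.

```python
def bounded_edit_distance(x: str, y: str, k: int):
    """Return ED(x, y) if it is at most k, otherwise the string "LARGE".

    Illustrative only: this reads both strings in full and restricts the
    classic dynamic program to the diagonal band |i - j| <= k.
    """
    n, m = len(x), len(y)
    if abs(n - m) > k:
        return "LARGE"
    INF = k + 1  # every value above k is collapsed to k + 1
    # prev[j] = distance between x[:i-1] and y[:j], capped at INF
    prev = [j if j <= k else INF for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [INF] * (m + 1)  # cells outside the band stay at INF
        if i <= k:
            cur[0] = i
        lo, hi = max(1, i - k), min(m, i + k)
        for j in range(lo, hi + 1):
            cost = 0 if x[i - 1] == y[j - 1] else 1
            best = min(prev[j - 1] + cost,  # match or substitution
                       prev[j] + 1,         # delete x[i-1]
                       cur[j - 1] + 1)      # insert y[j-1]
            cur[j] = min(best, INF)
        prev = cur
    return prev[m] if prev[m] <= k else "LARGE"


print(bounded_edit_distance("kitten", "sitting", 3))
print(bounded_edit_distance("kitten", "sitting", 2))
```

The band has width $2k+1$, so this baseline takes time $O(nk)$ with both strings in hand; the sketching scheme aims to give the same answer from two short digests.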
Submitted 17 June, 2024;
originally announced June 2024.
-
Nearly Optimal List Labeling
Authors:
Michael A. Bender,
Alex Conway,
Martín Farach-Colton,
Hanna Komlós,
Michal Koucký,
William Kuszmaul,
Michael Saks
Abstract:
The list-labeling problem captures the basic task of storing a dynamically changing set of up to $n$ elements in sorted order in an array of size $m = (1 + Θ(1))n$. The goal is to support insertions and deletions while moving around elements within the array as little as possible.
Until recently, the best known upper bound stood at $O(\log^2 n)$ amortized cost. This bound, which was first established in 1981, was finally improved two years ago, when a randomized $O(\log^{3/2} n)$ expected-cost algorithm was discovered. The best randomized lower bound for this problem remains $Ω(\log n)$, and closing this gap is considered to be a major open problem in data structures.
In this paper, we present the See-Saw Algorithm, a randomized list-labeling solution that achieves a nearly optimal bound of $O(\log n \operatorname{polyloglog} n)$ amortized expected cost. This bound is achieved despite at least three lower bounds showing that this type of result is impossible for large classes of solutions.
Submitted 1 May, 2024;
originally announced May 2024.
-
Streaming $k$-edit approximate pattern matching via string decomposition
Authors:
Sudatta Bhattacharya,
Michal Koucký
Abstract:
In this paper we give an algorithm for streaming $k$-edit approximate pattern matching which uses space $\widetilde{O}(k^2)$ and time $\widetilde{O}(k^2)$ per arriving symbol. This improves substantially on the recent algorithm of Kociumaka, Porat and Starikovskaya (2022) which uses space $\widetilde{O}(k^5)$ and time $\widetilde{O}(k^8)$ per arriving symbol. In the $k$-edit approximate pattern matching problem we get a pattern $P$ and text $T$ and we want to identify all substrings of the text $T$ that are at edit distance at most $k$ from $P$. In the streaming version of this problem both the pattern and the text arrive in a streaming fashion symbol by symbol and after each symbol of the text we need to report whether there is a current suffix of the text with edit distance at most $k$ from $P$. We measure the total space needed by the algorithm and time needed per arriving symbol.
Submitted 30 April, 2023;
originally announced May 2023.
-
Locally consistent decomposition of strings with applications to edit distance sketching
Authors:
Sudatta Bhattacharya,
Michal Koucký
Abstract:
In this paper we provide a new locally consistent decomposition of strings. Each string $x$ is decomposed into blocks that can be described by grammars of size $\widetilde{O}(k)$ (using some amount of randomness). If we take two strings $x$ and $y$ of edit distance at most $k$ then their block decomposition uses the same number of grammars and the $i$-th grammar of $x$ is the same as the $i$-th grammar of $y$ except for at most $k$ indices $i$. The edit distance of $x$ and $y$ equals the sum of edit distances of pairs of blocks where $x$ and $y$ differ. Our decomposition can be used to design a sketch of size $\widetilde{O}(k^2)$ for edit distance, and also a rolling sketch for edit distance of size $\widetilde{O}(k^2)$. The rolling sketch allows updating the sketched string by appending a symbol or removing a symbol from the beginning of the string.
Submitted 27 November, 2023; v1 submitted 9 February, 2023;
originally announced February 2023.
-
A Separator Theorem for Hypergraphs and a CSP-SAT Algorithm
Authors:
Michal Koucký,
Vojtěch Rödl,
Navid Talebanfard
Abstract:
We show that for every $r \ge 2$ there exists $ε_r > 0$ such that any $r$-uniform hypergraph with $m$ edges and maximum vertex degree $o(\sqrt{m})$ contains a set of at most $(\frac{1}{2} - ε_r)m$ edges the removal of which breaks the hypergraph into connected components with at most $m/2$ edges. We use this to give an algorithm running in time $d^{(1 - ε_r)m}$ that decides satisfiability of $m$-variable $(d, k)$-CSPs in which every variable appears in at most $r$ constraints, where $ε_r$ depends only on $r$ and $k\in o(\sqrt{m})$. Furthermore our algorithm solves the corresponding #CSP-SAT and Max-CSP-SAT of these CSPs. We also show that CNF representations of unsatisfiable $(2, k)$-CSPs with variable frequency $r$ can be refuted in tree-like resolution in size $2^{(1 - ε_r)m}$. Furthermore for Tseitin formulas on graphs with degree at most $k$ (which are $(2, k)$-CSPs) we give a deterministic algorithm finding such a refutation.
Submitted 10 December, 2021; v1 submitted 14 May, 2021;
originally announced May 2021.
-
Sorting Short Integers
Authors:
Michal Koucký,
Karel Král
Abstract:
We build boolean circuits of size $O(nm^2)$ and depth $O(\log(n) + m \log(m))$ for sorting $n$ integers each of $m$ bits. We also build circuits that sort $n$ integers each of $m$ bits according to their first $k$ bits that are of size $O(nmk(1 + \log^*(n) - \log^*(m)))$ and depth $O(\log^{3}(n))$. This improves on the result of Asharov et al. arXiv:2010.09884 and resolves some of their open questions.
Submitted 7 May, 2021; v1 submitted 19 February, 2021;
originally announced February 2021.
-
Data Structures Lower Bounds and Popular Conjectures
Authors:
Pavel Dvořák,
Michal Koucký,
Karel Král,
Veronika Slívová
Abstract:
In this paper, we investigate the relative power of several conjectures that have recently attracted a lot of interest. We establish a connection between the Network Coding Conjecture (NCC) of Li and Li and several data-structure-like problems such as non-adaptive function inversion of Hellman and the well-studied problem of polynomial evaluation and interpolation. In turn these data structure problems imply super-linear circuit lower bounds for explicit functions such as integer sorting and multi-point polynomial evaluation.
Submitted 18 February, 2021;
originally announced February 2021.
-
Barrington Plays Cards: The Complexity of Card-based Protocols
Authors:
Pavel Dvořák,
Michal Koucký
Abstract:
In this paper we study the computational complexity of functions that have efficient card-based protocols. Card-based protocols were proposed by den Boer [EUROCRYPT '89] as a means for secure two-party computation. Our contribution is two-fold: We classify a large class of protocols with respect to the computational complexity of functions they compute, and we propose other encodings of inputs which require fewer cards than the usual 2-card representation.
Submitted 16 October, 2020;
originally announced October 2020.
-
Constant factor approximations to edit distance on far input pairs in nearly linear time
Authors:
Michal Koucký,
Michael E. Saks
Abstract:
For any $T \geq 1$, there are constants $R=R(T) \geq 1$ and $ζ=ζ(T)>0$ and a randomized algorithm that takes as input an integer $n$ and two strings $x,y$ of length at most $n$, runs in time $O(n^{1+\frac{1}{T}})$, and outputs an upper bound $U$ on the edit distance $ED(x,y)$ that, with high probability, satisfies $U \leq R(ED(x,y)+n^{1-ζ})$. In particular, on any input with $ED(x,y) \geq n^{1-ζ}$ the algorithm outputs a constant factor approximation with high probability.
A similar result has been proven independently by Brakensiek and Rubinstein (2019).
Submitted 9 May, 2019; v1 submitted 10 April, 2019;
originally announced April 2019.
-
Stronger Lower Bounds for Online ORAM
Authors:
Pavel Hubáček,
Michal Koucký,
Karel Král,
Veronika Slívová
Abstract:
Oblivious RAM (ORAM), introduced in the context of software protection by Goldreich and Ostrovsky [JACM'96], aims at obfuscating the memory access pattern induced by a RAM computation. Ideally, the memory access pattern of an ORAM should be independent of the data being processed. Since the work of Goldreich and Ostrovsky, it was believed that there is an inherent $ Ω(\log n) $ bandwidth overhead in any ORAM working with memory of size $ n $. Larsen and Nielsen [CRYPTO'18] were the first to give a general $ Ω(\log n) $ lower bound for any online ORAM, i.e., an ORAM that must process its inputs in an online manner.
In this work, we revisit the lower bound of Larsen and Nielsen, which was proved under the assumption that the adversarial server knows exactly which server accesses correspond to which input operation. We give an $Ω(\log n) $ lower bound for the bandwidth overhead of any online ORAM even when the adversary has no access to this information. For many known constructions of ORAM this information is provided implicitly as each input operation induces an access sequence of roughly the same length. Thus, they are subject to the lower bound of Larsen and Nielsen. Our results rule out a broader class of constructions and specifically, they imply that obfuscating the boundaries between the input operations does not help in building a more efficient ORAM.
As our main technical contribution and to handle the lack of structure, we study the properties of access graphs induced naturally by the memory access pattern of an ORAM computation. We identify a particular graph property that can be efficiently tested and that all access graphs of ORAM computation must satisfy with high probability. This property is reminiscent of the Larsen-Nielsen property but it is substantially less structured; that is, it is more generic.
Submitted 23 September, 2019; v1 submitted 8 March, 2019;
originally announced March 2019.
-
Approximating Edit Distance Within Constant Factor in Truly Sub-Quadratic Time
Authors:
Diptarka Chakraborty,
Debarati Das,
Elazar Goldenberg,
Michal Koucky,
Michael Saks
Abstract:
Edit distance is a measure of similarity of two strings based on the minimum number of character insertions, deletions, and substitutions required to transform one string into the other. The edit distance can be computed exactly using a dynamic programming algorithm that runs in quadratic time. Andoni, Krauthgamer, and Onak (2010) gave a nearly linear time algorithm that approximates edit distance within an approximation factor $\text{poly}(\log n)$.
In this paper, we provide an algorithm with running time $\tilde{O}(n^{2-2/7})$ that approximates the edit distance within a constant factor.
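For reference, the quadratic-time dynamic program mentioned in the abstract can be sketched as follows. This is the textbook exact algorithm, not the sub-quadratic approximation algorithm of the paper; the function name `edit_distance` is chosen here for illustration.

```python
def edit_distance(x: str, y: str) -> int:
    """Exact edit distance via the classic O(|x| * |y|) dynamic program."""
    # prev[j] = edit distance between the processed prefix of x and y[:j]
    prev = list(range(len(y) + 1))
    for i, a in enumerate(x, start=1):
        cur = [i]  # distance from x[:i] to the empty string
        for j, b in enumerate(y, start=1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # match or substitute
        prev = cur
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

Only two rows of the table are kept at a time, so the space is linear in $|y|$ while the time stays quadratic.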
Submitted 15 February, 2021; v1 submitted 8 October, 2018;
originally announced October 2018.
-
Approximate Online Pattern Matching in Sub-linear Time
Authors:
Diptarka Chakraborty,
Debarati Das,
Michal Koucky
Abstract:
We consider the approximate pattern matching problem under edit distance. In this problem we are given a pattern $P$ of length $w$ and a text $T$ of length $n$ over some alphabet $Σ$, and a positive integer $k$. The goal is to find all the positions $j$ in $T$ such that there is a substring of $T$ ending at $j$ which has edit distance at most $k$ from the pattern $P$. Recall, the edit distance between two strings is the minimum number of character insertions, deletions, and substitutions required to transform one string into the other. For a position $t$ in $\{1,...,n\}$, let $k_t$ be the smallest edit distance between $P$ and any substring of $T$ ending at $t$. In this paper we give a constant factor approximation to the sequence $k_1,k_2,...,k_{n}$. We consider both offline and online settings.
In the offline setting, where both $P$ and $T$ are available, we present an algorithm that for all $t$ in $\{1,...,n\}$, computes the value of $k_t$ approximately within a constant factor. The worst case running time of our algorithm is $O(n w^{3/4})$. As a consequence we break the $O(nw)$-time barrier for this problem.
In the online setting, we are given $P$ and then $T$ arrives one symbol at a time. We design an algorithm that upon arrival of the $t$-th symbol of $T$ computes $k_t$ approximately within $O(1)$-multiplicative factor and $w^{8/9}$-additive error. Our algorithm takes $O(w^{1-(7/54)})$ amortized time per symbol arrival and takes $O(w^{1-(1/54)})$ additional space apart from storing the pattern $P$.
Both of our algorithms are randomized and produce the correct answer with high probability. To the best of our knowledge this is the first worst-case sub-linear (in the length of the pattern) time and sub-linear succinct space algorithm for the online approximate pattern matching problem.
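The sequence $k_1,\dots,k_n$ defined above can be computed exactly in $O(nw)$ time by the standard column-by-column dynamic program (often attributed to Sellers), which is the barrier the abstract refers to. The Python sketch below shows that exact baseline only, not the paper's approximation algorithms. The function name is an assumption, and the empty substring is allowed as a candidate, so every $k_t$ is at most $w$.

```python
def suffix_match_distances(pattern: str, text: str) -> list:
    """Return [k_1, ..., k_n], where k_t is the smallest edit distance
    between `pattern` and any substring of `text` ending at position t."""
    w = len(pattern)
    # col[i] = smallest distance between pattern[:i] and a substring of the
    # text ending at the current position (before any symbol: the empty one)
    col = list(range(w + 1))
    result = []
    for c in text:
        new = [0]  # the empty pattern prefix matches the empty substring
        for i in range(1, w + 1):
            new.append(min(col[i - 1] + (pattern[i - 1] != c),  # align with c
                           col[i] + 1,       # c is an extra text symbol
                           new[i - 1] + 1))  # pattern[i-1] is unmatched
        col = new
        result.append(col[w])
    return result


print(suffix_match_distances("abc", "xabcx"))  # [3, 2, 1, 0, 1]
```

Because each column depends only on the previous one, the same loop also works online, at $O(w)$ time per arriving symbol.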
Submitted 5 November, 2018; v1 submitted 8 October, 2018;
originally announced October 2018.
-
Improved bounds on Fourier entropy and Min-entropy
Authors:
Srinivasan Arunachalam,
Sourav Chakraborty,
Michal Koucký,
Nitin Saurabh,
Ronald de Wolf
Abstract:
Given a Boolean function $f:\{-1,1\}^n\to \{-1,1\}$, the Fourier distribution assigns probability $\widehat{f}(S)^2$ to $S\subseteq [n]$. The Fourier Entropy-Influence (FEI) conjecture of Friedgut and Kalai asks if there exists a universal constant $C>0$ such that $H(\hat{f}^2)\leq C Inf(f)$, where $H(\hat{f}^2)$ is the Shannon entropy of the Fourier distribution of $f$ and $Inf(f)$ is the total influence of $f$.
1) We consider the weaker Fourier Min-entropy-Influence (FMEI) conjecture. This asks if $H_{\infty}(\hat{f}^2)\leq C Inf(f)$, where $H_{\infty}(\hat{f}^2)$ is the min-entropy of the Fourier distribution. We show $H_{\infty}(\hat{f}^2)\leq 2C_{\min}^\oplus(f)$, where $C_{\min}^\oplus(f)$ is the minimum parity certificate complexity of $f$. We also show that for every $ε\geq 0$, we have $H_{\infty}(\hat{f}^2)\leq 2\log (\|\hat{f}\|_{1,ε}/(1-ε))$, where $\|\hat{f}\|_{1,ε}$ is the approximate spectral norm of $f$. As a corollary, we verify the FMEI conjecture for the class of read-$k$ $DNF$s (for constant $k$).
2) We show that $H(\hat{f}^2)\leq 2 aUC^\oplus(f)$, where $aUC^\oplus(f)$ is the average unambiguous parity certificate complexity of $f$. This improves upon Chakraborty et al. An important consequence of the FEI conjecture is the long-standing Mansour's conjecture. We show that a weaker version of FEI already implies Mansour's conjecture: is $H(\hat{f}^2)\leq C \min\{C^0(f),C^1(f)\}$?, where $C^0(f), C^1(f)$ are the 0- and 1-certificate complexities of $f$, respectively.
3) We study what FEI implies about the structure of polynomials that 1/3-approximate a Boolean function. We pose a conjecture (which is implied by FEI): no "flat" degree-$d$ polynomial of sparsity $2^{ω(d)}$ can 1/3-approximate a Boolean function. We prove this conjecture unconditionally for a particular class of polynomials.
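The quantities in the FEI conjecture can be computed by brute force for small $n$. The Python sketch below does this for the 3-bit majority function; it is an illustration of the definitions only. The helper names are assumptions, subsets $S$ are encoded as 0/1 tuples, and the total influence is computed through the standard identity $Inf(f)=\sum_S |S|\,\widehat{f}(S)^2$.

```python
from itertools import product
from math import log2


def fourier_coefficients(f, n):
    """Map each subset S (a 0/1 tuple) to f^(S) = E_x[f(x) * prod_{i in S} x_i]."""
    points = list(product([-1, 1], repeat=n))
    coeffs = {}
    for S in product([0, 1], repeat=n):
        total = 0
        for x in points:
            chi = 1  # value of the character chi_S at x
            for xi, si in zip(x, S):
                if si:
                    chi *= xi
            total += f(x) * chi
        coeffs[S] = total / len(points)
    return coeffs


def fourier_entropy(coeffs):
    """Shannon entropy (in bits) of the distribution S -> f^(S)^2."""
    return sum(-p * log2(p) for p in (c * c for c in coeffs.values()) if p > 0)


def total_influence(coeffs):
    """Inf(f) = sum over S of |S| * f^(S)^2."""
    return sum(sum(S) * c * c for S, c in coeffs.items())


def majority3(x):
    return 1 if sum(x) > 0 else -1


coeffs = fourier_coefficients(majority3, 3)
print(fourier_entropy(coeffs), total_influence(coeffs))  # 2.0 1.5
```

For majority on three bits the Fourier weight is spread evenly over four sets, giving entropy 2 and total influence 1.5, so the ratio $H/Inf$ is $4/3$ for this function.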
Submitted 17 September, 2021; v1 submitted 26 September, 2018;
originally announced September 2018.
-
Lower bounds for Combinatorial Algorithms for Boolean Matrix Multiplication
Authors:
Debarati Das,
Michal Koucký,
Michael Saks
Abstract:
In this paper we propose models of combinatorial algorithms for the Boolean Matrix Multiplication (BMM), and prove lower bounds on computing BMM in these models. First, we give a relatively relaxed combinatorial model which is an extension of the model by Angluin (1976), and we prove that the time required by any algorithm for the BMM is at least $Ω(n^3 / 2^{O( \sqrt{ \log n })})$. Subsequently, we propose a more general model capable of simulating the "Four Russians Algorithm". We prove a lower bound of $Ω(n^{7/3} / 2^{O(\sqrt{ \log n })})$ for the BMM under this model. We use a special class of graphs, called $(r,t)$-graphs, originally discovered by Ruzsa and Szemerédi (1978), along with randomization, to construct matrices that are hard instances for our combinatorial models.
Submitted 16 January, 2018;
originally announced January 2018.
-
Optimal Quasi-Gray Codes: The Alphabet Matters
Authors:
Diptarka Chakraborty,
Debarati Das,
Michal Koucký,
Nitin Saurabh
Abstract:
A quasi-Gray code of dimension $n$ and length $\ell$ over an alphabet $Σ$ is a sequence of distinct words $w_1,w_2,\dots,w_\ell$ from $Σ^n$ such that any two consecutive words differ in at most $c$ coordinates, for some fixed constant $c>0$. In this paper we are interested in the read and write complexity of quasi-Gray codes in the bit-probe model, where we measure the number of symbols read and written in order to transform any word $w_i$ into its successor $w_{i+1}$.
We present a construction of quasi-Gray codes of dimension $n$ and length $3^n$ over the ternary alphabet $\{0,1,2\}$ with worst-case read complexity $O(\log n)$ and write complexity $2$. This generalizes to arbitrary odd-size alphabets. For the binary alphabet, we present quasi-Gray codes of dimension $n$ and length at least $2^n - 20n$ with worst-case read complexity $6+\log n$ and write complexity $2$. This complements a recent result by Raskin [Raskin '17] who shows that any quasi-Gray code over the binary alphabet of length $2^n$ has read complexity $Ω(n)$.
Our results significantly improve on previously known constructions and for the odd-size alphabets we break the $Ω(n)$ worst-case barrier for space-optimal (non-redundant) quasi-Gray codes with constant number of writes. We obtain our results via a novel application of algebraic tools together with the principles of catalytic computation [Buhrman et al. '14, Ben-Or and Cleve '92, Barrington '89, Coppersmith and Grossman '75].
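As a point of comparison, the classical binary reflected Gray code is a quasi-Gray code of dimension $n$ and length $2^n$ with $c = 1$. The Python sketch below generates it and checks the defining property; it is not one of the constructions from the paper and does not model read or write complexity in the bit-probe model. The function names are chosen for illustration.

```python
def gray_code(n: int) -> list:
    """Binary reflected Gray code: the i-th word is i XOR (i >> 1)."""
    return [i ^ (i >> 1) for i in range(1 << n)]


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which the words a and b differ."""
    return bin(a ^ b).count("1")


words = gray_code(4)
# All 2^n words are distinct and consecutive words differ in one coordinate.
print(len(set(words)) == 16)
print(all(hamming(u, v) == 1 for u, v in zip(words, words[1:])))
```

Finding which bit to flip in this code requires, in the worst case, looking at many coordinates of the current word, which is the kind of cost the read-complexity measure captures.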
Submitted 17 July, 2018; v1 submitted 5 December, 2017;
originally announced December 2017.
-
Simulation Theorems via Pseudorandom Properties
Authors:
Arkadev Chattopadhyay,
Michal Koucký,
Bruno Loff,
Sagnik Mukhopadhyay
Abstract:
We generalize the deterministic simulation theorem of Raz and McKenzie [RM99] to any gadget which satisfies a certain hitting property. We prove that inner-product and gap-Hamming satisfy this property, and as a corollary we obtain a deterministic simulation theorem for these gadgets, where the gadget's input-size is logarithmic in the input-size of the outer function. This answers an open question posed by Göös, Pitassi and Watson [GPW15]. Our result also implies the previous results for the Indexing gadget, with better parameters than was previously known. A preliminary version of the results obtained in this work appeared in [CKL+17].
Submitted 22 April, 2017;
originally announced April 2017.
-
Streaming Algorithms For Computing Edit Distance Without Exploiting Suffix Trees
Authors:
Diptarka Chakraborty,
Elazar Goldenberg,
Michal Koucký
Abstract:
The edit distance is a way of quantifying how similar two strings are to one another by counting the minimum number of character insertions, deletions, and substitutions required to transform one string into the other.
In this paper we study the computational problem of computing the edit distance between a pair of strings where their distance is bounded by a parameter $k\ll n$. We present two streaming algorithms for computing edit distance: One runs in time $O(n+k^2)$ and the other $n+O(k^3)$. By writing $n+O(k^3)$ we want to emphasize that the number of operations per input symbol is a small constant. In particular, the running time does not depend on the alphabet size, and the algorithm should be easy to implement.
Previously a streaming algorithm with running time $O(n+k^4)$ was given in the paper by the current authors (STOC'16). The best off-line algorithm runs in time $O(n+k^2)$ (Landau et al., 1998) which is known to be optimal under the Strong Exponential Time Hypothesis.
Submitted 13 July, 2016;
originally announced July 2016.
-
The Big Match in Small Space
Authors:
Kristoffer Arnsfelt Hansen,
Rasmus Ibsen-Jensen,
Michal Koucký
Abstract:
In this paper we study how to play (stochastic) games optimally using little space. We focus on repeated games with absorbing states, a type of two-player, zero-sum concurrent mean-payoff game. The prototypical example of these games is the well-known Big Match of Gillette (1957). These games may not allow optimal strategies but they always have ε-optimal strategies. In this paper we design ε-optimal strategies for Player 1 in these games that use only O(log log T) space. Furthermore, we construct strategies for Player 1 that use space s(T), for an arbitrarily small unbounded non-decreasing function s, and which guarantee an ε-optimal value for Player 1 in the limit superior sense. The previously known strategies use space Ω(log T) and it was known that no strategy can use constant space if it is ε-optimal even in the limit superior sense. We also give a complementary lower bound. Furthermore, we also show that no Markov strategy, even extended with finite memory, can ensure value greater than 0 in the Big Match, answering a question posed by Abraham Neyman.
Submitted 26 April, 2016;
originally announced April 2016.
-
A communication game related to the sensitivity conjecture
Authors:
Justin Gilmer,
Michal Koucký,
Michael Saks
Abstract:
One of the major outstanding foundational problems about boolean functions is the sensitivity conjecture, which (in one of its many forms) asserts that the degree of a boolean function (i.e. the minimum degree of a real polynomial that interpolates the function) is bounded above by some fixed power of its sensitivity (which is the maximum vertex degree of the graph defined on the inputs where two inputs are adjacent if they differ in exactly one coordinate and their function values are different). We propose an attack on the sensitivity conjecture in terms of a novel two-player communication game. A lower bound of the form $n^{Ω(1)}$ on the cost of this game would imply the sensitivity conjecture.
To investigate the problem of bounding the cost of the game, three natural (stronger) variants of the question are considered. For two of these variants, protocols are presented that show that the hoped-for lower bound does not hold. These protocols satisfy a certain monotonicity property, and (in contrast to the situation for the two variants) we show that the cost of any monotone protocol satisfies a strong lower bound.
There is an easy upper bound of $\sqrt{n}$ on the cost of the game. We also improve slightly on this upper bound.
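The two measures related by the sensitivity conjecture, sensitivity and degree, can be computed by brute force for small functions. The Python sketch below does so for functions on $\{0,1\}^n$ with values in $\{0,1\}$; it illustrates the definitions only and is unrelated to the communication game. The helper names are assumptions, and the degree is read off the unique multilinear polynomial, whose coefficients are obtained by Möbius inversion over subsets.

```python
from itertools import product


def sensitivity(f, n):
    """Maximum over inputs x of the number of coordinates whose flip changes f(x)."""
    best = 0
    for x in product([0, 1], repeat=n):
        flips = 0
        for i in range(n):
            y = x[:i] + (1 - x[i],) + x[i + 1:]  # x with coordinate i flipped
            if f(y) != f(x):
                flips += 1
        best = max(best, flips)
    return best


def degree(f, n):
    """Degree of the unique multilinear real polynomial that agrees with f."""
    deg = 0
    for S in product([0, 1], repeat=n):
        # Coefficient of prod_{i in S} x_i: signed sum of f over subsets T of S.
        coeff = 0
        for T in product([0, 1], repeat=n):
            if all(t <= s for t, s in zip(T, S)):
                coeff += (-1) ** (sum(S) - sum(T)) * f(T)
        if coeff != 0:
            deg = max(deg, sum(S))
    return deg


def or3(x):
    return int(any(x))


print(sensitivity(or3, 3), degree(or3, 3))  # 3 3
```

Both loops are exponential in $n$, so this is only useful for checking small examples by hand.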
Submitted 24 November, 2015;
originally announced November 2015.
-
On Online Labeling with Polynomially Many Labels
Authors:
Martin Babka,
Jan Bulánek,
Vladimír Čunát,
Michal Koucký,
Michael Saks
Abstract:
In the online labeling problem with parameters n and m we are presented with a sequence of n keys from a totally ordered universe U and must assign each arriving key a label from the label set {1,2,...,m} so that the order of labels (strictly) respects the ordering on U. As new keys arrive it may be necessary to change the labels of some items; such changes may be done at any time at unit cost for each change. The goal is to minimize the total cost. An alternative formulation of this problem is the file maintenance problem, in which the items, instead of being labeled, are maintained in sorted order in an array of length m, and we pay unit cost for moving an item.
For the case m=cn for constant c>1, there are known algorithms that use at most O(n log(n)^2) relabelings in total [Itai, Konheim, Rodeh, 1981], and it was shown recently that this is asymptotically optimal [Bulánek, Koucký, Saks, 2012]. For the case of m=Θ(n^C) for C>1, algorithms are known that use O(n log n) relabelings. A matching lower bound was claimed in [Dietz, Seiferas, Zhang, 2004]. That proof involved two distinct steps: a lower bound for a problem they call prefix bucketing and a reduction from prefix bucketing to online labeling. The reduction seems to be incorrect, leaving a (seemingly significant) gap in the proof. In this paper we close the gap by presenting a correct reduction to prefix bucketing. Furthermore we give a simplified and improved analysis of the prefix bucketing lower bound. This improvement allows us to extend the lower bounds for online labeling to the case where the number m of labels is superpolynomial in n. In particular, for superpolynomial m we get an asymptotically optimal lower bound Ω((n log n) / (log log m - log log n)).
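The cost model of the online labeling problem can be illustrated with a deliberately naive Python sketch. It is not any of the algorithms cited above and does not attain their bounds: it places a new key at the midpoint of the free label gap when one exists, and otherwise relabels all keys evenly over {1,...,m}. The class name `NaiveOnlineLabeler` and its fields are assumptions made for illustration, and keys are assumed to be distinct.

```python
import bisect


class NaiveOnlineLabeler:
    """Keeps labels in {1,...,m} strictly increasing in key order; counts label changes."""

    def __init__(self, m):
        self.m = m
        self.keys = []    # keys in sorted order
        self.labels = []  # labels[i] is the label of keys[i]
        self.cost = 0     # total number of label assignments and changes

    def insert(self, key):
        if len(self.keys) >= self.m:
            raise ValueError("no free label left")
        i = bisect.bisect_left(self.keys, key)
        lo = self.labels[i - 1] if i > 0 else 0
        hi = self.labels[i] if i < len(self.keys) else self.m + 1
        self.keys.insert(i, key)
        if hi - lo >= 2:
            # A free label exists strictly between the neighbours: unit cost.
            self.labels.insert(i, (lo + hi) // 2)
            self.cost += 1
        else:
            # No room: spread all keys evenly and pay for every changed label.
            self.labels.insert(i, None)
            self._relabel_evenly()

    def _relabel_evenly(self):
        n = len(self.keys)
        new = [((j + 1) * (self.m + 1)) // (n + 1) for j in range(n)]
        self.cost += sum(1 for old, cur in zip(self.labels, new) if old != cur)
        self.labels = new


if __name__ == "__main__":
    labeler = NaiveOnlineLabeler(m=8)
    for key in (10, 9, 8, 7):  # adversarial order: always insert at the front
        labeler.insert(key)
    print(labeler.labels, labeler.cost)
```

Inserting keys in decreasing order quickly exhausts the gap at the front of the label space and forces a full relabeling, which is the kind of adversarial behavior that the lower bounds in the abstract quantify.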
Submitted 11 October, 2012;
originally announced October 2012.
-
Exact Algorithms for Solving Stochastic Games
Authors:
Kristoffer Arnsfelt Hansen,
Michal Koucky,
Niels Lauritzen,
Peter Bro Miltersen,
Elias Tsigaridas
Abstract:
Shapley's discounted stochastic games, Everett's recursive games and Gillette's undiscounted stochastic games are classical models of game theory describing two-player zero-sum games of potentially infinite duration. We describe algorithms for exactly solving these games.
Submitted 17 February, 2012;
originally announced February 2012.
-
Tight lower bounds for online labeling problem
Authors:
Jan Bulánek,
Michal Koucký,
Michael Saks
Abstract:
We consider the file maintenance problem (also called the online labeling problem) in which n integer items from the set {1,...,r} are to be stored in an array of size m >= n. The items are presented sequentially in an arbitrary order, and must be stored in the array in sorted order (but not necessarily in consecutive locations in the array). Each new item must be stored in the array before the next item is received. If r<=m then we can simply store item j in location j but if r>m then we may have to shift the location of stored items to make space for a newly arrived item. The algorithm is charged each time an item is stored in the array, or moved to a new location. The goal is to minimize the total number of such moves done by the algorithm. This problem is non-trivial when n<=m<r.
In the case that m=Cn for some C>1, algorithms for this problem with cost O(log(n)^2) per item have been given [IKR81, Wil92, BCD+02]. When m=n, algorithms with cost O(log(n)^3) per item were given [Zha93, BS07]. In this paper we prove lower bounds that show that these algorithms are optimal, up to constant factors. Previously, the only lower bound known for this range of parameters was a lower bound of Ω(log(n)^2) for the restricted class of smooth algorithms [DSZ05a, Zha93].
We also provide an algorithm for the sparse case: If the number of items is polylogarithmic in the array size then the problem can be solved in amortized constant time per item.
Submitted 23 December, 2011;
originally announced December 2011.
-
Derandomizing from Random Strings
Authors:
Harry Buhrman,
Lance Fortnow,
Michal Koucký,
Bruno Loff
Abstract:
In this paper we show that BPP is truth-table reducible to the set of Kolmogorov random strings R_K. It was previously known that PSPACE, and hence BPP, is Turing-reducible to R_K. The earlier proof relied on the adaptivity of the Turing-reduction to find a Kolmogorov-random string of polynomial length using the set R_K as oracle. Our new non-adaptive result relies on a new fundamental fact about the set R_K, namely each initial segment of the characteristic sequence of R_K is not compressible by recursive means. As a partial converse to our claim we show that strings of high Kolmogorov-complexity when used as advice are not much more useful than randomly chosen strings.
Submitted 16 December, 2009;
originally announced December 2009.
-
Many Random Walks Are Faster Than One
Authors:
Noga Alon,
Chen Avin,
Michal Koucky,
Gady Kozma,
Zvi Lotker,
Mark R. Tuttle
Abstract:
We pose a new and intriguing question motivated by distributed computing regarding random walks on graphs: How long does it take for several independent random walks, starting from the same vertex, to cover an entire graph? We study the cover time - the expected time required to visit every node in a graph at least once - and we show that for a large collection of interesting graphs, running many random walks in parallel yields a speed-up in the cover time that is linear in the number of parallel walks. We demonstrate that an exponential speed-up is sometimes possible, but that some natural graphs allow only a logarithmic speed-up. A problem related to ours (in which the walks start from some probability distribution on vertices) was previously studied in the context of space-efficient algorithms for undirected s-t connectivity and our results yield, in certain cases, an improvement upon some of the earlier bounds.
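The quantity studied here can be estimated empirically. The following Python sketch simulates k independent simple random walks started at the same vertex and counts the synchronous rounds until every vertex has been visited. The function names `cover_rounds` and `cycle`, the adjacency-list representation, and the choice of a cycle as the test graph are assumptions made for illustration; the sketch only samples cover times and proves nothing about speed-ups.

```python
import random


def cycle(n):
    """Adjacency lists of the n-vertex cycle (hypothetical example graph)."""
    return {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}


def cover_rounds(adj, start, k, rng):
    """Rounds until k parallel walks from `start` have jointly visited every vertex.

    Assumes the graph is connected; otherwise the loop never terminates.
    """
    visited = {start}
    positions = [start] * k
    rounds = 0
    while len(visited) < len(adj):
        rounds += 1
        for w in range(k):
            # Each walk moves to a uniformly random neighbour, independently.
            positions[w] = rng.choice(adj[positions[w]])
            visited.add(positions[w])
    return rounds


def mean_cover_rounds(adj, start, k, trials, rng):
    """Monte Carlo estimate of the expected cover time with k walks."""
    return sum(cover_rounds(adj, start, k, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    rng = random.Random(0)
    graph = cycle(16)
    for k in (1, 2, 4, 8):
        print(k, mean_cover_rounds(graph, 0, k, 200, rng))
```

Comparing the printed averages across values of k on different graph families gives a rough feel for the speed-up regimes described in the abstract.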
Submitted 20 November, 2007; v1 submitted 3 May, 2007;
originally announced May 2007.