-
The Dynamic k-Mismatch Problem
Authors:
Raphaël Clifford,
Paweł Gawrychowski,
Tomasz Kociumaka,
Daniel P. Martin,
Przemysław Uznański
Abstract:
The text-to-pattern Hamming distances problem asks to compute the Hamming distances between a given pattern of length $m$ and all length-$m$ substrings of a given text of length $n\ge m$. We focus on the $k$-mismatch version of the problem, where a distance needs to be returned only if it does not exceed a threshold $k$. We assume $n\le 2m$ (in general, one can partition the text into overlap**…
▽ More
The text-to-pattern Hamming distances problem asks to compute the Hamming distances between a given pattern of length $m$ and all length-$m$ substrings of a given text of length $n\ge m$. We focus on the $k$-mismatch version of the problem, where a distance needs to be returned only if it does not exceed a threshold $k$. We assume $n\le 2m$ (in general, one can partition the text into overlap** blocks). In this work, we show data structures for the dynamic version of this problem supporting two operations: An update performs a single-letter substitution in the pattern or the text, and a query, given an index $i$, returns the Hamming distance between the pattern and the text substring starting at position $i$, or reports that it exceeds $k$.
First, we show a data structure with $\tilde{O}(1)$ update and $\tilde{O}(k)$ query time. Then we show that $\tilde{O}(k)$ update and $\tilde{O}(1)$ query time is also possible. These two provide an optimal trade-off for the dynamic $k$-mismatch problem with $k \le \sqrt{n}$: we prove that, conditioned on the strong 3SUM conjecture, one cannot simultaneously achieve $k^{1-Ω(1)}$ time for all operations.
For $k\ge \sqrt{n}$, we give another lower bound, conditioned on the Online Matrix-Vector conjecture, that excludes algorithms taking $n^{1/2-Ω(1)}$ time per operation. This is tight for constant-sized alphabets: Clifford et al. (STACS 2018) achieved $\tilde{O}(\sqrt{n})$ time per operation in that case, but with $\tilde{O}(n^{3/4})$ time per operation for large alphabets. We improve and extend this result with an algorithm that, given $1\le x\le k$, achieves update time $\tilde{O}(\frac{n}{k} +\sqrt{\frac{nk}{x}})$ and query time $\tilde{O}(x)$. In particular, for $k\ge \sqrt{n}$, an appropriate choice of $x$ yields $\tilde{O}(\sqrt[3]{nk})$ time per operation, which is $\tilde{O}(n^{2/3})$ when no threshold $k$ is provided.
△ Less
Submitted 28 March, 2022; v1 submitted 13 May, 2021;
originally announced May 2021.
-
Faster classical Boson Sampling
Authors:
Peter Clifford,
Raphaël Clifford
Abstract:
Since its introduction Boson Sampling has been the subject of intense study in the world of quantum computing. The task is to sample independently from the set of all $n \times n$ submatrices built from possibly repeated rows of a larger $m \times n$ complex matrix according to a probability distribution related to the permanents of the submatrices. Experimental systems exploiting quantum photonic…
▽ More
Since its introduction Boson Sampling has been the subject of intense study in the world of quantum computing. The task is to sample independently from the set of all $n \times n$ submatrices built from possibly repeated rows of a larger $m \times n$ complex matrix according to a probability distribution related to the permanents of the submatrices. Experimental systems exploiting quantum photonic effects can in principle perform the task at great speed. In the framework of classical computing, Aaronson and Arkhipov (2011) showed that exact Boson Sampling problem cannot be solved in polynomial time unless the polynomial hierarchy collapses to the third level. Indeed for a number of years the fastest known exact classical algorithm ran in $O({m+n-1 \choose n} n 2^n )$ time per sample, emphasising the potential speed advantage of quantum computation. The advantage was reduced by Clifford and Clifford (2018) who gave a significantly faster classical solution taking $O(n 2^n + \operatorname{poly}(m,n))$ time and linear space, matching the complexity of computing the permanent of a single matrix when $m$ is polynomial in $n$.
We continue by presenting an algorithm for Boson Sampling whose average-case time complexity is much faster when $m$ is proportional to $n$. In particular, when $m = n$ our algorithm runs in approximately $O(n\cdot1.69^n)$ time on average. This result further increases the problem size needed to establish quantum computational supremacy via Boson Sampling.
△ Less
Submitted 26 May, 2020; v1 submitted 7 May, 2020;
originally announced May 2020.
-
RLE edit distance in near optimal time
Authors:
Raphaël Clifford,
Paweł Gawrychowski,
Tomasz Kociumaka,
Daniel P. Martin,
Przemysław Uznański
Abstract:
We show that the edit distance between two run-length encoded strings of compressed lengths $m$ and $n$ respectively, can be computed in $\mathcal{O}(mn\log(mn))$ time. This improves the previous record by a factor of $\mathcal{O}(n/\log(mn))$. The running time of our algorithm is within subpolynomial factors of being optimal, subject to the standard SETH-hardness assumption. This effectively clos…
▽ More
We show that the edit distance between two run-length encoded strings of compressed lengths $m$ and $n$ respectively, can be computed in $\mathcal{O}(mn\log(mn))$ time. This improves the previous record by a factor of $\mathcal{O}(n/\log(mn))$. The running time of our algorithm is within subpolynomial factors of being optimal, subject to the standard SETH-hardness assumption. This effectively closes a line of algorithmic research first started in 1993.
△ Less
Submitted 3 May, 2019;
originally announced May 2019.
-
Upper and lower bounds for dynamic data structures on strings
Authors:
Raphael Clifford,
Allan Grønlund,
Kasper Green Larsen,
Tatiana Starikovskaya
Abstract:
We consider a range of simply stated dynamic data structure problems on strings. An update changes one symbol in the input and a query asks us to compute some function of the pattern of length $m$ and a substring of a longer text. We give both conditional and unconditional lower bounds for variants of exact matching with wildcards, inner product, and Hamming distance computation via a sequence of…
▽ More
We consider a range of simply stated dynamic data structure problems on strings. An update changes one symbol in the input and a query asks us to compute some function of the pattern of length $m$ and a substring of a longer text. We give both conditional and unconditional lower bounds for variants of exact matching with wildcards, inner product, and Hamming distance computation via a sequence of reductions. As an example, we show that there does not exist an $O(m^{1/2-\varepsilon})$ time algorithm for a large range of these problems unless the online Boolean matrix-vector multiplication conjecture is false. We also provide nearly matching upper bounds for most of the problems we consider.
△ Less
Submitted 19 February, 2018;
originally announced February 2018.
-
The streaming $k$-mismatch problem
Authors:
Raphaël Clifford,
Tomasz Kociumaka,
Ely Porat
Abstract:
We consider the streaming complexity of a fundamental task in approximate pattern matching: the $k$-mismatch problem. It asks to compute Hamming distances between a pattern of length $n$ and all length-$n$ substrings of a text for which the Hamming distance does not exceed a given threshold $k$. In our problem formulation, we report not only the Hamming distance but also, on demand, the full \emph…
▽ More
We consider the streaming complexity of a fundamental task in approximate pattern matching: the $k$-mismatch problem. It asks to compute Hamming distances between a pattern of length $n$ and all length-$n$ substrings of a text for which the Hamming distance does not exceed a given threshold $k$. In our problem formulation, we report not only the Hamming distance but also, on demand, the full \emph{mismatch information}, that is the list of mismatched pairs of symbols and their indices. The twin challenges of streaming pattern matching derive from the need both to achieve small working space and also to guarantee that every arriving input symbol is processed quickly.
We present a streaming algorithm for the $k$-mismatch problem which uses $O(k\log{n}\log\frac{n}{k})$ bits of space and spends \ourcomplexity time on each symbol of the input stream, which consists of the pattern followed by the text. The running time almost matches the classic offline solution and the space usage is within a logarithmic factor of optimal.
Our new algorithm therefore effectively resolves and also extends an open problem first posed in FOCS'09. En route to this solution, we also give a deterministic $O( k (\log \frac{n}{k} + \log |Σ|) )$-bit encoding of all the alignments with Hamming distance at most $k$ of a length-$n$ pattern within a text of length $O(n)$. This secondary result provides an optimal solution to a natural communication complexity problem which may be of independent interest.
△ Less
Submitted 9 April, 2018; v1 submitted 17 August, 2017;
originally announced August 2017.
-
The Classical Complexity of Boson Sampling
Authors:
Peter Clifford,
Raphaël Clifford
Abstract:
We study the classical complexity of the exact Boson Sampling problem where the objective is to produce provably correct random samples from a particular quantum mechanical distribution. The computational framework was proposed by Aaronson and Arkhipov in 2011 as an attainable demonstration of `quantum supremacy', that is a practical quantum computing experiment able to produce output at a speed b…
▽ More
We study the classical complexity of the exact Boson Sampling problem where the objective is to produce provably correct random samples from a particular quantum mechanical distribution. The computational framework was proposed by Aaronson and Arkhipov in 2011 as an attainable demonstration of `quantum supremacy', that is a practical quantum computing experiment able to produce output at a speed beyond the reach of classical (that is non-quantum) computer hardware. Since its introduction Boson Sampling has been the subject of intense international research in the world of quantum computing. On the face of it, the problem is challenging for classical computation. Aaronson and Arkhipov show that exact Boson Sampling is not efficiently solvable by a classical computer unless $P^{\#P} = BPP^{NP}$ and the polynomial hierarchy collapses to the third level.
The fastest known exact classical algorithm for the standard Boson Sampling problem takes $O({m + n -1 \choose n} n 2^n )$ time to produce samples for a system with input size $n$ and $m$ output modes, making it infeasible for anything but the smallest values of $n$ and $m$. We give an algorithm that is much faster, running in $O(n 2^n + \operatorname{poly}(m,n))$ time and $O(m)$ additional space. The algorithm is simple to implement and has low constant factor overheads. As a consequence our classical algorithm is able to solve the exact Boson Sampling problem for system sizes far beyond current photonic quantum computing experimentation, thereby significantly reducing the likelihood of achieving near-term quantum supremacy in the context of Boson Sampling.
△ Less
Submitted 17 October, 2017; v1 submitted 5 June, 2017;
originally announced June 2017.
-
No imminent quantum supremacy by boson sampling
Authors:
Alex Neville,
Chris Sparrow,
Raphaël Clifford,
Eric Johnston,
Patrick M. Birchall,
Ashley Montanaro,
Anthony Laing
Abstract:
It is predicted that quantum computers will dramatically outperform their conventional counterparts. However, large-scale universal quantum computers are yet to be built. Boson sampling is a rudimentary quantum algorithm tailored to the platform of photons in linear optics, which has sparked interest as a rapid way to demonstrate this quantum supremacy. Photon statistics are governed by intractabl…
▽ More
It is predicted that quantum computers will dramatically outperform their conventional counterparts. However, large-scale universal quantum computers are yet to be built. Boson sampling is a rudimentary quantum algorithm tailored to the platform of photons in linear optics, which has sparked interest as a rapid way to demonstrate this quantum supremacy. Photon statistics are governed by intractable matrix functions known as permanents, which suggests that sampling from the distribution obtained by injecting photons into a linear-optical network could be solved more quickly by a photonic experiment than by a classical computer. The contrast between the apparently awesome challenge faced by any classical sampling algorithm and the apparently near-term experimental resources required for a large boson sampling experiment has raised expectations that quantum supremacy by boson sampling is on the horizon. Here we present classical boson sampling algorithms and theoretical analyses of prospects for scaling boson sampling experiments, showing that near-term quantum supremacy via boson sampling is unlikely. While the largest boson sampling experiments reported so far are with 5 photons, our classical algorithm, based on Metropolised independence sampling (MIS), allowed the boson sampling problem to be solved for 30 photons with standard computing hardware. We argue that the impact of experimental photon losses means that demonstrating quantum supremacy by boson sampling would require a step change in technology.
△ Less
Submitted 1 May, 2017;
originally announced May 2017.
-
Approximate Hamming distance in a stream
Authors:
Raphael Clifford,
Tatiana Starikovskaya
Abstract:
We consider the problem of computing a $(1+ε)$-approximation of the Hamming distance between a pattern of length $n$ and successive substrings of a stream. We first look at the one-way randomised communication complexity of this problem, giving Alice the first half of the stream and Bob the second half. We show the following: (1) If Alice and Bob both share the pattern then there is an…
▽ More
We consider the problem of computing a $(1+ε)$-approximation of the Hamming distance between a pattern of length $n$ and successive substrings of a stream. We first look at the one-way randomised communication complexity of this problem, giving Alice the first half of the stream and Bob the second half. We show the following: (1) If Alice and Bob both share the pattern then there is an $O(ε^{-4} \log^2 n)$ bit randomised one-way communication protocol. (2) If only Alice has the pattern then there is an $O(ε^{-2}\sqrt{n}\log n)$ bit randomised one-way communication protocol.
We then go on to develop small space streaming algorithms for $(1+ε)$-approximate Hamming distance which give worst case running time guarantees per arriving symbol. (1) For binary input alphabets there is an $O(ε^{-3} \sqrt{n} \log^{2} n)$ space and $O(ε^{-2} \log{n})$ time streaming $(1+ε)$-approximate Hamming distance algorithm. (2) For general input alphabets there is an $O(ε^{-5} \sqrt{n} \log^{4} n)$ space and $O(ε^{-4} \log^3 {n})$ time streaming $(1+ε)$-approximate Hamming distance algorithm.
△ Less
Submitted 23 February, 2016;
originally announced February 2016.
-
The k-mismatch problem revisited
Authors:
Raphaël Clifford,
Allyx Fontaine,
Ely Porat,
Benjamin Sach,
Tatiana Starikovskaya
Abstract:
We revisit the complexity of one of the most basic problems in pattern matching. In the k-mismatch problem we must compute the Hamming distance between a pattern of length m and every m-length substring of a text of length n, as long as that Hamming distance is at most k. Where the Hamming distance is greater than k at some alignment of the pattern and text, we simply output "No".
We study this…
▽ More
We revisit the complexity of one of the most basic problems in pattern matching. In the k-mismatch problem we must compute the Hamming distance between a pattern of length m and every m-length substring of a text of length n, as long as that Hamming distance is at most k. Where the Hamming distance is greater than k at some alignment of the pattern and text, we simply output "No".
We study this problem in both the standard offline setting and also as a streaming problem. In the streaming k-mismatch problem the text arrives one symbol at a time and we must give an output before processing any future symbols. Our main results are as follows:
1) Our first result is a deterministic $O(n k^2\log{k} / m+n \text{polylog} m)$ time offline algorithm for k-mismatch on a text of length n. This is a factor of k improvement over the fastest previous result of this form from SODA 2000 by Amihood Amir et al.
2) We then give a randomised and online algorithm which runs in the same time complexity but requires only $O(k^2\text{polylog} {m})$ space in total.
3) Next we give a randomised $(1+ε)$-approximation algorithm for the streaming k-mismatch problem which uses $O(k^2\text{polylog} m / ε^2)$ space and runs in $O(\text{polylog} m / ε^2)$ worst-case time per arriving symbol.
4) Finally we combine our new results to derive a randomised $O(k^2\text{polylog} {m})$ space algorithm for the streaming k-mismatch problem which runs in $O(\sqrt{k}\log{k} + \text{polylog} {m})$ worst-case time per arriving symbol. This improves the best previous space complexity for streaming k-mismatch from FOCS 2009 by Benny Porat and Ely Porat by a factor of k. We also improve the time complexity of this previous result by an even greater factor to match the fastest known offline algorithm (up to logarithmic factors).
△ Less
Submitted 27 August, 2015; v1 submitted 4 August, 2015;
originally announced August 2015.
-
Dictionary matching in a stream
Authors:
Raphael Clifford,
Allyx Fontaine,
Ely Porat,
Benjamin Sach,
Tatiana Starikovskaya
Abstract:
We consider the problem of dictionary matching in a stream. Given a set of strings, known as a dictionary, and a stream of characters arriving one at a time, the task is to report each time some string in our dictionary occurs in the stream. We present a randomised algorithm which takes O(log log(k + m)) time per arriving character and uses O(k log m) words of space, where k is the number of strin…
▽ More
We consider the problem of dictionary matching in a stream. Given a set of strings, known as a dictionary, and a stream of characters arriving one at a time, the task is to report each time some string in our dictionary occurs in the stream. We present a randomised algorithm which takes O(log log(k + m)) time per arriving character and uses O(k log m) words of space, where k is the number of strings in the dictionary and m is the length of the longest string in the dictionary.
△ Less
Submitted 23 April, 2015;
originally announced April 2015.
-
New Unconditional Hardness Results for Dynamic and Online Problems
Authors:
Raphael Clifford,
Allan Grønlund,
Kasper Green Larsen
Abstract:
There has been a resurgence of interest in lower bounds whose truth rests on the conjectured hardness of well known computational problems. These conditional lower bounds have become important and popular due to the painfully slow progress on proving strong unconditional lower bounds. Nevertheless, the long term goal is to replace these conditional bounds with unconditional ones. In this paper we…
▽ More
There has been a resurgence of interest in lower bounds whose truth rests on the conjectured hardness of well known computational problems. These conditional lower bounds have become important and popular due to the painfully slow progress on proving strong unconditional lower bounds. Nevertheless, the long term goal is to replace these conditional bounds with unconditional ones. In this paper we make progress in this direction by studying the cell probe complexity of two conjectured to be hard problems of particular importance: matrix-vector multiplication and a version of dynamic set disjointness known as Patrascu's Multiphase Problem. We give improved unconditional lower bounds for these problems as well as introducing new proof techniques of independent interest. These include a technique capable of proving strong threshold lower bounds of the following form: If we insist on having a very fast query time, then the update time has to be slow enough to compute a lookup table with the answer to every possible query. This is the first time a lower bound of this type has been proven.
△ Less
Submitted 8 April, 2015;
originally announced April 2015.
-
The complexity of computation in bit streams
Authors:
Raphael Clifford,
Markus Jalsenius,
Benjamin Sach
Abstract:
We revisit the complexity of online computation in the cell probe model. We consider a class of problems where we are first given a fixed pattern or vector $F$ of $n$ symbols and then one symbol arrives at a time in a stream. After each symbol has arrived we must output some function of $F$ and the $n$-length suffix of the arriving stream. Cell probe bounds of $Ω(δ\lg{n}/w)$ have previously been s…
▽ More
We revisit the complexity of online computation in the cell probe model. We consider a class of problems where we are first given a fixed pattern or vector $F$ of $n$ symbols and then one symbol arrives at a time in a stream. After each symbol has arrived we must output some function of $F$ and the $n$-length suffix of the arriving stream. Cell probe bounds of $Ω(δ\lg{n}/w)$ have previously been shown for both convolution and Hamming distance in this setting, where $δ$ is the size of a symbol in bits and $w\inΩ(\lg{n})$ is the cell size in bits. However, when $δ$ is a constant, as it is in many natural situations, these previous results no longer give us non-trivial bounds.
We introduce a new lop-sided information transfer proof technique which enables us to prove meaningful lower bounds even for constant size input alphabets. We use our new framework to prove an amortised cell probe lower bound of $Ω(\lg^2 n/(w\cdot \lg \lg n))$ time per arriving bit for an online version of a well studied problem known as pattern matching with address errors. This is the first non-trivial cell probe lower bound for any online problem on bit streams that still holds when the cell sizes are large. We also show the same bound for online convolution conditioned on a new combinatorial conjecture related to Toeplitz matrices.
△ Less
Submitted 3 April, 2015;
originally announced April 2015.
-
Time Bounds for Streaming Problems
Authors:
Raphael Clifford,
Markus Jalsenius,
Benjamin Sach
Abstract:
We give tight cell-probe bounds for the time to compute convolution, multiplication and Hamming distance in a stream. The cell probe model is a particularly strong computational model and subsumes, for example, the popular word RAM model.
We first consider online convolution where the task is to output the inner product between a fixed $n$-dimensional vector and a vector of the $n$ most recent v…
▽ More
We give tight cell-probe bounds for the time to compute convolution, multiplication and Hamming distance in a stream. The cell probe model is a particularly strong computational model and subsumes, for example, the popular word RAM model.
We first consider online convolution where the task is to output the inner product between a fixed $n$-dimensional vector and a vector of the $n$ most recent values from a stream. One symbol of the stream arrives at a time and the each output must be computed before the next symbols arrives.
Next we show bounds for online multiplication where the stream consists of pairs of digits, one from each of two $n$ digit numbers that are to be multiplied. One pair arrives at a time and the task is to output a single new digit from the product before the next pair of digits arrives.
Finally we look at the online Hamming distance problem where the Hamming distance is outputted instead of the inner product.
For each of these three problems, we give a lower bound of $Ω(\fracδ{w}\log n)$ time on average per output, where $δ$ is the number of bits needed to represent an input symbol and $w$ is the cell or word size. We argue that these bound are in fact tight within the cell probe model.
△ Less
Submitted 22 December, 2014;
originally announced December 2014.
-
Cell-Probe Bounds for Online Edit Distance and Other Pattern Matching Problems
Authors:
Raphael Clifford,
Markus Jalsenius,
Benjamin Sach
Abstract:
We give cell-probe bounds for the computation of edit distance, Hamming distance, convolution and longest common subsequence in a stream. In this model, a fixed string of $n$ symbols is given and one $δ$-bit symbol arrives at a time in a stream. After each symbol arrives, the distance between the fixed string and a suffix of most recent symbols of the stream is reported. The cell-probe model is pe…
▽ More
We give cell-probe bounds for the computation of edit distance, Hamming distance, convolution and longest common subsequence in a stream. In this model, a fixed string of $n$ symbols is given and one $δ$-bit symbol arrives at a time in a stream. After each symbol arrives, the distance between the fixed string and a suffix of most recent symbols of the stream is reported. The cell-probe model is perhaps the strongest model of computation for showing data structure lower bounds, subsuming in particular the popular word-RAM model.
* We first give an $Ω((δ\log n)/(w+\log\log n))$ lower bound for the time to give each output for both online Hamming distance and convolution, where $w$ is the word size. This bound relies on a new encoding scheme and for the first time holds even when $w$ is as small as a single bit.
* We then consider the online edit distance and longest common subsequence problems in the bit-probe model ($w=1$) with a constant sized input alphabet. We give a lower bound of $Ω(\sqrt{\log n}/(\log\log n)^{3/2})$ which applies for both problems. This second set of results relies both on our new encoding scheme as well as a carefully constructed hard distribution.
* Finally, for the online edit distance problem we show that there is an $O((\log n)^2/w)$ upper bound in the cell-probe model. This bound gives a contrast to our new lower bound and also establishes an exponential gap between the known cell-probe and RAM model complexities.
△ Less
Submitted 24 July, 2014;
originally announced July 2014.
-
Element Distinctness, Frequency Moments, and Sliding Windows
Authors:
Paul Beame,
Raphael Clifford,
Widad Machmouchi
Abstract:
We derive new time-space tradeoff lower bounds and algorithms for exactly computing statistics of input data, including frequency moments, element distinctness, and order statistics, that are simple to calculate for sorted data. We develop a randomized algorithm for the element distinctness problem whose time T and space S satisfy T in O (n^{3/2}/S^{1/2}), smaller than previous lower bounds for co…
▽ More
We derive new time-space tradeoff lower bounds and algorithms for exactly computing statistics of input data, including frequency moments, element distinctness, and order statistics, that are simple to calculate for sorted data. We develop a randomized algorithm for the element distinctness problem whose time T and space S satisfy T in O (n^{3/2}/S^{1/2}), smaller than previous lower bounds for comparison-based algorithms, showing that element distinctness is strictly easier than sorting for randomized branching programs. This algorithm is based on a new time and space efficient algorithm for finding all collisions of a function f from a finite set to itself that are reachable by iterating f from a given set of starting points. We further show that our element distinctness algorithm can be extended at only a polylogarithmic factor cost to solve the element distinctness problem over sliding windows, where the task is to take an input of length 2n-1 and produce an output for each window of length n, giving n outputs in total. In contrast, we show a time-space tradeoff lower bound of T in Omega(n^2/S) for randomized branching programs to compute the number of distinct elements over sliding windows. The same lower bound holds for computing the low-order bit of F_0 and computing any frequency moment F_k, k neq 1. This shows that those frequency moments and the decision problem F_0 mod 2 are strictly harder than element distinctness. We complement this lower bound with a T in O(n^2/S) comparison-based deterministic RAM algorithm for exactly computing F_k over sliding windows, nearly matching both our lower bound for the sliding-window version and the comparison-based lower bounds for the single-window version. We further exhibit a quantum algorithm for F_0 over sliding windows with T in O(n^{3/2}/S^{1/2}). Finally, we consider the computations of order statistics over sliding windows.
△ Less
Submitted 14 September, 2013;
originally announced September 2013.
-
Sliding Windows with Limited Storage
Authors:
Paul Beame,
Raphael Clifford,
Widad Machmouchi
Abstract:
We consider time-space tradeoffs for exactly computing frequency moments and order statistics over sliding windows. Given an input of length 2n-1, the task is to output the function of each window of length n, giving n outputs in total. Computations over sliding windows are related to direct sum problems except that inputs to instances almost completely overlap.
We show an average case and ran…
▽ More
We consider time-space tradeoffs for exactly computing frequency moments and order statistics over sliding windows. Given an input of length 2n-1, the task is to output the function of each window of length n, giving n outputs in total. Computations over sliding windows are related to direct sum problems except that inputs to instances almost completely overlap.
We show an average case and randomized time-space tradeoff lower bound of TS in Omega(n^2) for multi-way branching programs, and hence standard RAM and word-RAM models, to compute the number of distinct elements, F_0, in sliding windows over alphabet [n]. The same lower bound holds for computing the low-order bit of F_0 and computing any frequency moment F_k for k not equal to 1. We complement this lower bound with a TS in \tilde O(n^2) deterministic RAM algorithm for exactly computing F_k in sliding windows.
We show time-space separations between the complexity of sliding-window element distinctness and that of sliding-window $F_0\bmod 2$ computation. In particular for alphabet [n] there is a very simple errorless sliding-window algorithm for element distinctness that runs in O(n) time on average and uses O(log{n}) space.
We show that any algorithm for a single element distinctness instance can be extended to an algorithm for the sliding-window version of element distinctness with at most a polylogarithmic increase in the time-space product.
Finally, we show that the sliding-window computation of order statistics such as the maximum and minimum can be computed with only a logarithmic increase in time, but that a TS in Omega(n^2) lower bound holds for sliding-window computation of order statistics such as the median, a nearly linear increase in time when space is small.
△ Less
Submitted 17 September, 2013; v1 submitted 18 December, 2012;
originally announced December 2012.
-
Tight Cell-Probe Bounds for Online Hamming Distance Computation
Authors:
Raphael Clifford,
Markus Jalsenius,
Benjamin Sach
Abstract:
We show tight bounds for online Hamming distance computation in the cell-probe model with word size w. The task is to output the Hamming distance between a fixed string of length n and the last n symbols of a stream. We give a lower bound of Omega((d/w)*log n) time on average per output, where d is the number of bits needed to represent an input symbol. We argue that this bound is tight within the…
▽ More
We show tight bounds for online Hamming distance computation in the cell-probe model with word size w. The task is to output the Hamming distance between a fixed string of length n and the last n symbols of a stream. We give a lower bound of Omega((d/w)*log n) time on average per output, where d is the number of bits needed to represent an input symbol. We argue that this bound is tight within the model. The lower bound holds under randomisation and amortisation.
△ Less
Submitted 15 October, 2012; v1 submitted 8 July, 2012;
originally announced July 2012.
-
Pattern Matching in Multiple Streams
Authors:
Raphael Clifford,
Markus Jalsenius,
Ely Porat,
Benjamin Sach
Abstract:
We investigate the problem of deterministic pattern matching in multiple streams. In this model, one symbol arrives at a time and is associated with one of s streaming texts. The task at each time step is to report if there is a new match between a fixed pattern of length m and a newly updated stream. As is usual in the streaming context, the goal is to use as little space as possible while still…
▽ More
We investigate the problem of deterministic pattern matching in multiple streams. In this model, one symbol arrives at a time and is associated with one of s streaming texts. The task at each time step is to report if there is a new match between a fixed pattern of length m and a newly updated stream. As is usual in the streaming context, the goal is to use as little space as possible while still reporting matches quickly. We give almost matching upper and lower space bounds for three distinct pattern matching problems. For exact matching we show that the problem can be solved in constant time per arriving symbol and O(m+s) words of space. For the k-mismatch and k-difference problems we give O(k) time solutions that require O(m+ks) words of space. In all three cases we also give space lower bounds which show our methods are optimal up to a single logarithmic factor. Finally we set out a number of open problems related to this new model for pattern matching.
△ Less
Submitted 25 April, 2012; v1 submitted 15 February, 2012;
originally announced February 2012.
-
Pattern Matching under Polynomial Transformation
Authors:
Ayelet Butman,
Peter Clifford,
Raphael Clifford,
Markus Jalsenius,
Noa Lewenstein,
Benny Porat,
Ely Porat,
Benjamin Sach
Abstract:
We consider a class of pattern matching problems where a normalising transformation is applied at every alignment. Normalised pattern matching plays a key role in fields as diverse as image processing and musical information processing where application specific transformations are often applied to the input. By considering the class of polynomial transformations of the input, we provide fast algo…
▽ More
We consider a class of pattern matching problems where a normalising transformation is applied at every alignment. Normalised pattern matching plays a key role in fields as diverse as image processing and musical information processing where application specific transformations are often applied to the input. By considering the class of polynomial transformations of the input, we provide fast algorithms and the first lower bounds for both new and old problems. Given a pattern of length m and a longer text of length n where both are assumed to contain integer values only, we first show O(n log m) time algorithms for pattern matching under linear transformations even when wildcard symbols can occur in the input. We then show how to extend the technique to polynomial transformations of arbitrary degree. Next we consider the problem of finding the minimum Hamming distance under polynomial transformation. We show that, for any epsilon>0, there cannot exist an O(n m^(1-epsilon)) time algorithm for additive and linear transformations conditional on the hardness of the classic 3SUM problem. Finally, we consider a version of the Hamming distance problem under additive transformations with a bound k on the maximum distance that need be reported. We give a deterministic O(nk log k) time solution which we then improve by careful use of randomisation to O(n sqrt(k log k) log n) time for sufficiently small k. Our randomised solution outputs the correct answer at every position with high probability.
△ Less
Submitted 28 October, 2011; v1 submitted 7 September, 2011;
originally announced September 2011.
-
Space Lower Bounds for Online Pattern Matching
Authors:
Raphael Clifford,
Markus Jalsenius,
Ely Porat,
Benjamin Sach
Abstract:
We present space lower bounds for online pattern matching under a number of different distance measures. Given a pattern of length m and a text that arrives one character at a time, the online pattern matching problem is to report the distance between the pattern and a sliding window of the text as soon as the new character arrives. We require that the correct answer is given at each position with…
▽ More
We present space lower bounds for online pattern matching under a number of different distance measures. Given a pattern of length m and a text that arrives one character at a time, the online pattern matching problem is to report the distance between the pattern and a sliding window of the text as soon as the new character arrives. We require that the correct answer is given at each position with constant probability. We give Omega(m) bit space lower bounds for L_1, L_2, L_\infty, Hamming, edit and swap distances as well as for any algorithm that computes the cross-correlation/convolution. We then show a dichotomy between distance functions that have wildcard-like properties and those that do not. In the former case which includes, as an example, pattern matching with character classes, we give Omega(m) bit space lower bounds. For other distance functions, we show that there exist space bounds of Omega(log m) and O(log^2 m) bits. Finally we discuss space lower bounds for non-binary inputs and show how in some cases they can be improved.
△ Less
Submitted 22 June, 2011;
originally announced June 2011.
-
Tight Cell-Probe Bounds for Online Integer Multiplication and Convolution
Authors:
Raphael Clifford,
Markus Jalsenius
Abstract:
We show tight bounds for both online integer multiplication and convolution in the cell-probe model with word size w. For the multiplication problem, one pair of digits, each from one of two n digit numbers that are to be multiplied, is given as input at step i. The online algorithm outputs a single new digit from the product of the numbers before step i+1. We give a Theta((d/w)*log n) bound on av…
▽ More
We show tight bounds for both online integer multiplication and convolution in the cell-probe model with word size w. For the multiplication problem, one pair of digits, each from one of two n digit numbers that are to be multiplied, is given as input at step i. The online algorithm outputs a single new digit from the product of the numbers before step i+1. We give a Theta((d/w)*log n) bound on average per output digit for this problem where 2^d is the maximum value of a digit. In the convolution problem, we are given a fixed vector V of length n and we consider a stream in which numbers arrive one at a time. We output the inner product of V and the vector that consists of the last n numbers of the stream. We show a Theta((d/w)*log n) bound for the number of probes required per new number in the stream. All the bounds presented hold under randomisation and amortisation. Multiplication and convolution are central problems in the study of algorithms which also have the widest range of practical applications.
△ Less
Submitted 23 February, 2012; v1 submitted 4 January, 2011;
originally announced January 2011.
-
Restricted Common Superstring and Restricted Common Supersequence
Authors:
Raphaël Clifford,
Zvi Gotthilf,
Moshe Lewenstein,
Alexandru Popa
Abstract:
The {\em shortest common superstring} and the {\em shortest common supersequence} are two well studied problems having a wide range of applications. In this paper we consider both problems with resource constraints, denoted as the Restricted Common Superstring (shortly \textit{RCSstr}) problem and the Restricted Common Supersequence (shortly \textit{RCSseq}). In the \textit{RCSstr} (\textit{RCSseq…
▽ More
The {\em shortest common superstring} and the {\em shortest common supersequence} are two well studied problems having a wide range of applications. In this paper we consider both problems with resource constraints, denoted as the Restricted Common Superstring (shortly \textit{RCSstr}) problem and the Restricted Common Supersequence (shortly \textit{RCSseq}). In the \textit{RCSstr} (\textit{RCSseq}) problem we are given a set $S$ of $n$ strings, $s_1$, $s_2$, $\ldots$, $s_n$, and a multiset $t = \{t_1, t_2, \dots, t_m\}$, and the goal is to find a permutation $π: \{1, \dots, m\} \to \{1, \dots, m\}$ to maximize the number of strings in $S$ that are substrings (subsequences) of $π(t) = t_{π(1)}t_{π(2)}...t_{π(m)}$ (we call this ordering of the multiset, $π(t)$, a permutation of $t$). We first show that in its most general setting the \textit{RCSstr} problem is {\em NP-complete} and hard to approximate within a factor of $n^{1-ε}$, for any $ε> 0$, unless P = NP. Afterwards, we present two separate reductions to show that the \textit{RCSstr} problem remains NP-Hard even in the case where the elements of $t$ are drawn from a binary alphabet or for the case where all input strings are of length two. We then present some approximation results for several variants of the \textit{RCSstr} problem. In the second part of this paper, we turn to the \textit{RCSseq} problem, where we present some hardness results, tight lower bounds and approximation algorithms.
△ Less
Submitted 27 June, 2010; v1 submitted 3 April, 2010;
originally announced April 2010.
-
The Complexity of Flood Filling Games
Authors:
Raphael Clifford,
Markus Jalsenius,
Ashley Montanaro,
Benjamin Sach
Abstract:
We study the complexity of the popular one player combinatorial game known as Flood-It. In this game the player is given an n by n board of tiles where each tile is allocated one of c colours. The goal is to make the colours of all tiles equal via the shortest possible sequence of flooding operations. In the standard version, a flooding operation consists of the player choosing a colour k, which t…
▽ More
We study the complexity of the popular one player combinatorial game known as Flood-It. In this game the player is given an n by n board of tiles where each tile is allocated one of c colours. The goal is to make the colours of all tiles equal via the shortest possible sequence of flooding operations. In the standard version, a flooding operation consists of the player choosing a colour k, which then changes the colour of all the tiles in the monochromatic region connected to the top left tile to k. After this operation has been performed, neighbouring regions which are already of the chosen colour k will then also become connected, thereby extending the monochromatic region of the board. We show that finding the minimum number of flooding operations is NP-hard for c>=3 and that this even holds when the player can perform flooding operations from any position on the board. However, we show that this "free" variant is in P for c=2. We also prove that for an unbounded number of colours, Flood-It remains NP-hard for boards of height at least 3, but is in P for boards of height 2. Next we show how a c-1 approximation and a randomised 2c/3 approximation algorithm can be derived, and that no polynomial time constant factor, independent of c, approximation algorithm exists unless P=NP. We then investigate how many moves are required for the "most demanding" n by n boards (those requiring the most moves) and show that the number grows as fast as Theta(n*c^0.5). Finally, we consider boards where the colours of the tiles are chosen at random and show that for c>=2, the number of moves required to flood the whole board is Omega(n) with high probability.
△ Less
Submitted 9 June, 2011; v1 submitted 25 January, 2010;
originally announced January 2010.
-
An Empirical Study of Cache-Oblivious Priority Queues and their Application to the Shortest Path Problem
Authors:
Benjamin Sach,
Raphaël Clifford
Abstract:
In recent years the Cache-Oblivious model of external memory computation has provided an attractive theoretical basis for the analysis of algorithms on massive datasets. Much progress has been made in discovering algorithms that are asymptotically optimal or near optimal. However, to date there are still relatively few successful experimental studies. In this paper we compare two different Cache…
▽ More
In recent years the Cache-Oblivious model of external memory computation has provided an attractive theoretical basis for the analysis of algorithms on massive datasets. Much progress has been made in discovering algorithms that are asymptotically optimal or near optimal. However, to date there are still relatively few successful experimental studies. In this paper we compare two different Cache-Oblivious priority queues based on the Funnel and Bucket Heap and apply them to the single source shortest path problem on graphs with positive edge weights. Our results show that when RAM is limited and data is swap** to external storage, the Cache-Oblivious priority queues achieve orders of magnitude speedups over standard internal memory techniques. However, for the single source shortest path problem both on simulated and real world graph data, these speedups are markedly lower due to the time required to access the graph adjacency list itself.
△ Less
Submitted 7 February, 2008;
originally announced February 2008.
-
Scheduling Algorithms for Procrastinators
Authors:
Michael A. Bender,
Raphael Clifford,
Kostas Tsichlas
Abstract:
This paper presents scheduling algorithms for procrastinators, where the speed that a procrastinator executes a job increases as the due date approaches. We give optimal off-line scheduling policies for linearly increasing speed functions. We then explain the computational/numerical issues involved in implementing this policy. We next explore the online setting, showing that there exist adversar…
▽ More
This paper presents scheduling algorithms for procrastinators, where the speed that a procrastinator executes a job increases as the due date approaches. We give optimal off-line scheduling policies for linearly increasing speed functions. We then explain the computational/numerical issues involved in implementing this policy. We next explore the online setting, showing that there exist adversaries that force any online scheduling policy to miss due dates. This impossibility result motivates the problem of minimizing the maximum interval stretch of any job; the interval stretch of a job is the job's flow time divided by the job's due date minus release time. We show that several common scheduling strategies, including the "hit-the-highest-nail" strategy beloved by procrastinators, have arbitrarily large maximum interval stretch. Then we give the "thrashing" scheduling policy and show that it is a Θ(1) approximation algorithm for the maximum interval stretch.
△ Less
Submitted 14 August, 2007; v1 submitted 14 June, 2006;
originally announced June 2006.