-
On Hardness of Jumbled Indexing
Authors:
Amihood Amir,
Timothy Chan,
Moshe Lewenstein,
Noa Lewenstein
Abstract:
Jumbled indexing is the problem of indexing a text $T$ for queries that ask whether there is a substring of $T$ matching a pattern represented as a Parikh vector, i.e., the vector of frequency counts for each character. Jumbled indexing has garnered a lot of interest in the last four years. There is a naive algorithm that preprocesses all answers in $O(n^2|Σ|)$ time allowing quick queries afterwards, and there is another naive algorithm that requires no preprocessing but has $O(n\log|Σ|)$ query time. Despite a tremendous amount of effort there has been little improvement over these running times.
In this paper we provide good reason for this. We show that, under a 3SUM-hardness assumption, jumbled indexing for alphabets of size $ω(1)$ requires $Ω(n^{2-ε})$ preprocessing time or $Ω(n^{1-δ})$ query time for any $ε,δ>0$. In fact, under a stronger 3SUM-hardness assumption, for any constant alphabet size $r\ge 3$ there exist describable fixed constants $ε_r$ and $δ_r$ such that jumbled indexing requires $Ω(n^{2-ε_r})$ preprocessing time or $Ω(n^{1-δ_r})$ query time.
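As a rough illustration of the no-preprocessing baseline mentioned in the abstract, the following is a minimal sketch of a sliding-window jumbled query. It is not taken from the paper: the function name `jumbled_match` and the dictionary representation of the Parikh vector are assumptions made for this example, and it uses hash-based counters rather than the structure behind the stated $O(n\log|Σ|)$ bound.

```python
from collections import Counter


def jumbled_match(text, parikh):
    """Return True if some substring of `text` has exactly the counts in `parikh`.

    `parikh` is a hypothetical dict mapping characters to required counts.
    """
    target = {c: k for c, k in parikh.items() if k > 0}
    m = sum(target.values())          # every candidate substring has length m
    if m == 0:
        return True                   # the empty substring matches the zero vector
    if m > len(text):
        return False

    window = Counter(text[:m])

    def differs(c):
        # True when the window count of c disagrees with the target count
        return window.get(c, 0) != target.get(c, 0)

    # Number of characters whose counts currently disagree with the target.
    mismatched = sum(1 for c in set(window) | set(target) if differs(c))
    if mismatched == 0:
        return True

    for i in range(m, len(text)):
        # Slide the window: add text[i], drop text[i - m].
        for c, delta in ((text[i], 1), (text[i - m], -1)):
            before = differs(c)
            window[c] += delta
            mismatched += differs(c) - before
        if mismatched == 0:
            return True
    return False
```

Each query scans the text once, which is the behaviour the abstract contrasts with the $O(n^2|Σ|)$ precomputation of all answers.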
Submitted 1 May, 2014;
originally announced May 2014.
-
Managing Unbounded-Length Keys in Comparison-Driven Data Structures with Applications to On-Line Indexing
Authors:
Amihood Amir,
Gianni Franceschini,
Roberto Grossi,
Tsvi Kopelowitz,
Moshe Lewenstein,
Noa Lewenstein
Abstract:
This paper presents a general technique for optimally transforming any dynamic data structure that operates on atomic and indivisible keys by constant-time comparisons into a data structure that handles unbounded-length keys whose comparison cost is not a constant. Examples of these keys are strings, multi-dimensional points, multiple-precision numbers, multi-key data (e.g., records), XML paths, URL addresses, etc. The technique is more general than what has been done in previous work, as no particular exploitation of the underlying structure of the keys is required. The only requirement is that the insertion of a key must identify its predecessor or its successor.
Using the proposed technique, an online suffix tree can be constructed in worst-case time $O(\log n)$ per input symbol (as opposed to amortized $O(\log n)$ time per symbol, achieved by previously known algorithms). To our knowledge, our algorithm is the first that achieves $O(\log n)$ worst-case time per input symbol. Searching for a pattern of length $m$ in the resulting suffix tree takes $O(\min(m\log |Σ|, m + \log n) + tocc)$ time, where $tocc$ is the number of occurrences of the pattern. The paper also describes more applications and shows how to obtain alternative methods for dealing with suffix sorting, dynamic lowest common ancestors and order maintenance.
Submitted 3 June, 2013;
originally announced June 2013.
-
Pattern Matching under Polynomial Transformation
Authors:
Ayelet Butman,
Peter Clifford,
Raphael Clifford,
Markus Jalsenius,
Noa Lewenstein,
Benny Porat,
Ely Porat,
Benjamin Sach
Abstract:
We consider a class of pattern matching problems where a normalising transformation is applied at every alignment. Normalised pattern matching plays a key role in fields as diverse as image processing and musical information processing, where application-specific transformations are often applied to the input. By considering the class of polynomial transformations of the input, we provide fast algorithms and the first lower bounds for both new and old problems. Given a pattern of length m and a longer text of length n, where both are assumed to contain integer values only, we first show O(n log m) time algorithms for pattern matching under linear transformations even when wildcard symbols can occur in the input. We then show how to extend the technique to polynomial transformations of arbitrary degree. Next we consider the problem of finding the minimum Hamming distance under polynomial transformation. We show that, for any epsilon>0, there cannot exist an O(n m^(1-epsilon)) time algorithm for additive and linear transformations conditional on the hardness of the classic 3SUM problem. Finally, we consider a version of the Hamming distance problem under additive transformations with a bound k on the maximum distance that need be reported. We give a deterministic O(nk log k) time solution, which we then improve by careful use of randomisation to O(n sqrt(k log k) log n) time for sufficiently small k. Our randomised solution outputs the correct answer at every position with high probability.
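To make the minimum Hamming distance problem under additive transformations concrete, here is a minimal sketch of a naive O(nm) baseline. It is not one of the algorithms from the paper, and the function name `min_hamming_additive` is an assumption for this example. At each alignment, the best additive shift is the most frequent difference between text and pattern values, so the minimum distance is m minus that frequency.

```python
from collections import Counter


def min_hamming_additive(text, pattern):
    """For each alignment i, return the minimum over shifts c of the number of
    positions j with text[i + j] != pattern[j] + c.

    Naive O(n * m) baseline; assumes integer inputs and a non-empty pattern.
    """
    n, m = len(text), len(pattern)
    distances = []
    for i in range(n - m + 1):
        # Count how often each candidate shift c = text[i + j] - pattern[j] occurs.
        shifts = Counter(text[i + j] - pattern[j] for j in range(m))
        # The most frequent shift matches the most positions at this alignment.
        best = max(shifts.values())
        distances.append(m - best)
    return distances
```

For example, with text `[1, 2, 3, 5]` and pattern `[0, 1, 2]`, the first alignment matches exactly under the shift c = 1, while the second needs one mismatch.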
Submitted 28 October, 2011; v1 submitted 7 September, 2011;
originally announced September 2011.