
Explicit Good Codes Approaching Distance 1 in Ulam Metric

Authors: Elazar Goldenberg, Mursalin Habib, Karthik C. S

Abstract: The Ulam distance of two permutations on $[n]$ is $n$ minus the length of their longest common subsequence. In this paper, we show that for every $\varepsilon>0$, there exists some $α>0$, and an infinite set $Γ\subseteq \mathbb{N}$, such that for all $n\in Γ$, there is an explicit set $C_n$ of $(n!)^α$ many permutations on $[n]$, such that every pair of permutations in $C_n$ has pairwise Ulam distance at least $(1-\varepsilon)\cdot n$. Moreover, we can compute the $i^{\text{th}}$ permutation in $C_n$ in poly$(n)$ time and can also decode in poly$(n)$ time, a permutation $π$ on $[n]$ to its closest permutation $π^*$ in $C_n$, if the Ulam distance of $π$ and $π^*$ is less than $\frac{(1-\varepsilon)\cdot n}{4}$. Previously, it was implicitly known by combining works of Goldreich and Wigderson [Israel Journal of Mathematics'23] and Farnoud, Skachek, and Milenkovic [IEEE Transactions on Information Theory'13] in a black-box manner, that it is possible to explicitly construct $(n!)^{Ω(1)}$ many permutations on $[n]$, such that every pair of them have pairwise Ulam distance at least $\frac{n}{6}\cdot (1-\varepsilon)$, for any $\varepsilon>0$, and the bound on the distance can be improved to $\frac{n}{4}\cdot (1-\varepsilon)$ if the construction of Goldreich and Wigderson is directly analyzed in the Ulam metric.
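
As a small illustration of the definition in this abstract (Ulam distance equals $n$ minus the length of the longest common subsequence), the sketch below computes the distance between two permutations. It is not taken from the paper: the function names are hypothetical, and it relies on the standard observation that the LCS of two permutations reduces to a longest increasing subsequence after relabeling.

```python
from bisect import bisect_left


def lcs_length_permutations(p, q):
    """Length of the longest common subsequence of two permutations.

    Relabel each symbol of p by its position in q; a common subsequence
    then corresponds to an increasing subsequence of the relabeled list.
    """
    position_in_q = {value: index for index, value in enumerate(q)}
    relabeled = [position_in_q[value] for value in p]

    # Patience-sorting style longest increasing subsequence, O(n log n).
    tails = []
    for x in relabeled:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)


def ulam_distance(p, q):
    """Ulam distance: n minus the LCS length (hypothetical helper name)."""
    if sorted(p) != sorted(q) or len(set(p)) != len(p):
        raise ValueError("inputs must be permutations of the same set")
    return len(p) - lcs_length_permutations(p, q)
```

For example, moving a single symbol from the front of a permutation to the back yields Ulam distance 1, while reversing a permutation on $[n]$ yields distance $n-1$.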

Submitted 11 May, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

arXiv:2305.16878 [pdf, ps, other]

Can You Solve Closest String Faster than Exhaustive Search?

Authors: Amir Abboud, Nick Fischer, Elazar Goldenberg, Karthik C. S., Ron Safier

Abstract: We study the fundamental problem of finding the best string to represent a given set, in the form of the Closest String problem: Given a set $X \subseteq Σ^d$ of $n$ strings, find the string $x^*$ minimizing the radius of the smallest Hamming ball around $x^*$ that encloses all the strings in $X$. In this paper, we investigate whether the Closest String problem admits algorithms that are faster than the trivial exhaustive search algorithm. We obtain the following results for the two natural versions of the problem: $\bullet$ In the continuous Closest String problem, the goal is to find the solution string $x^*$ anywhere in $Σ^d$. For binary strings, the exhaustive search algorithm runs in time $O(2^d poly(nd))$ and we prove that it cannot be improved to time $O(2^{(1-ε) d} poly(nd))$, for any $ε> 0$, unless the Strong Exponential Time Hypothesis fails. $\bullet$ In the discrete Closest String problem, $x^*$ is required to be in the input set $X$. While this problem is clearly in polynomial time, its fine-grained complexity has been pinpointed to be quadratic time $n^{2 \pm o(1)}$ whenever the dimension is $ω(\log n) < d < n^{o(1)}$. We complement this known hardness result with new algorithms, proving essentially that whenever $d$ falls out of this hard range, the discrete Closest String problem can be solved faster than exhaustive search. In the small-$d$ regime, our algorithm is based on a novel application of the inclusion-exclusion principle.
Interestingly, all of our results apply (and some are even stronger) to the natural dual of the Closest String problem, called the Remotest String problem, where the task is to find a string maximizing the Hamming distance to all the strings in $X$.
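
The two exhaustive-search baselines that this abstract measures against can be sketched as follows. This is only the trivial search, not the paper's new algorithms; the function names are hypothetical and the binary alphabet default is an assumption made for the example.

```python
from itertools import product


def hamming(x, y):
    """Number of positions where two equal-length strings differ."""
    return sum(a != b for a, b in zip(x, y))


def radius(center, strings):
    """Radius of the smallest Hamming ball around center covering all strings."""
    return max(hamming(center, s) for s in strings)


def discrete_closest_string(strings):
    """Discrete version: the center must be one of the inputs, O(n^2 d) time."""
    return min(strings, key=lambda c: radius(c, strings))


def continuous_closest_string(strings, alphabet="01"):
    """Continuous version: try every string of length d, |alphabet|^d candidates."""
    d = len(strings[0])
    candidates = ("".join(t) for t in product(alphabet, repeat=d))
    return min(candidates, key=lambda c: radius(c, strings))
```

On the input `["000", "011", "101"]` the best discrete center has radius 2, whereas the continuous search finds `"001"` with radius 1, which shows that the two versions can differ.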

Submitted 29 May, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

arXiv:2211.12496 [pdf, other]

An Algorithmic Bridge Between Hamming and Levenshtein Distances

Authors: Elazar Goldenberg, Tomasz Kociumaka, Robert Krauthgamer, Barna Saha

Abstract: The edit distance between strings classically assigns unit cost to every character insertion, deletion, and substitution, whereas the Hamming distance only allows substitutions. In many real-life scenarios, insertions and deletions (abbreviated indels) appear frequently but significantly less so than substitutions. To model this, we consider substitutions being cheaper than indels, with cost $1/a$ for a parameter $a\ge 1$. This basic variant, denoted $ED_a$, bridges classical edit distance ($a=1$) with Hamming distance ($a\to\infty$), leading to interesting algorithmic challenges: Does the time complexity of computing $ED_a$ interpolate between that of Hamming distance (linear time) and edit distance (quadratic time)? What about approximating $ED_a$? We first present a simple deterministic exact algorithm for $ED_a$ and further prove that it is near-optimal assuming the Orthogonal Vectors Conjecture. Our main result is a randomized algorithm computing a $(1+ε)$-approximation of $ED_a(X,Y)$, given strings $X,Y$ of total length $n$ and a bound $k\ge ED_a(X,Y)$. For simplicity, let us focus on $k\ge 1$ and a constant $ε> 0$; then, our algorithm takes $\tilde{O}(n/a + ak^3)$ time. Unless $a=\tilde{O}(1)$ and for small enough $k$, this running time is sublinear in $n$. We also consider a very natural version that asks to find a $(k_I, k_S)$-alignment -- an alignment with at most $k_I$ indels and $k_S$ substitutions.
In this setting, we give an exact algorithm and, more importantly, an $\tilde{O}(nk_I/k_S + k_S\cdot k_I^3)$-time $(1,1+ε)$-bicriteria approximation algorithm. The latter solution is based on the techniques we develop for $ED_a$ for $a=Θ(k_S / k_I)$. These bounds are in stark contrast to unit-cost edit distance, where state-of-the-art algorithms are far from achieving $(1+ε)$-approximation in sublinear time, even for a favorable choice of $k$.
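
To make the cost model $ED_a$ concrete, here is a minimal sketch of the textbook quadratic-time dynamic program with indel cost 1 and substitution cost $1/a$. It is an assumption-laden illustration, not the exact or approximate algorithm from the paper; the name `ed_a` is hypothetical, and exact rational arithmetic is used to avoid floating-point rounding.

```python
from fractions import Fraction


def ed_a(x, y, a):
    """Weighted edit distance: indels cost 1, substitutions cost 1/a (a >= 1)."""
    if a < 1:
        raise ValueError("the parameter a must be at least 1")
    sub_cost = Fraction(1) / Fraction(a)

    # prev[j] holds the cost of aligning the processed prefix of x with y[:j].
    prev = [Fraction(j) for j in range(len(y) + 1)]
    for i, cx in enumerate(x, start=1):
        cur = [Fraction(i)]
        for j, cy in enumerate(y, start=1):
            match_or_sub = prev[j - 1] + (0 if cx == cy else sub_cost)
            delete = prev[j] + 1      # drop cx from x
            insert = cur[j - 1] + 1   # insert cy into x
            cur.append(min(match_or_sub, delete, insert))
        prev = cur
    return prev[-1]
```

With $a=1$ this coincides with the classical edit distance, and as $a$ grows, substitutions become nearly free relative to indels, which matches the bridging behavior described in the abstract.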

Submitted 22 November, 2022; originally announced November 2022.

Comments: The full version of a paper accepted to ITCS 2023; abstract shortened to meet arXiv requirements

ACM Class: F.2.2

arXiv:2111.12706 [pdf, ps, other]

Gap Edit Distance via Non-Adaptive Queries: Simple and Optimal

Authors: Elazar Goldenberg, Tomasz Kociumaka, Robert Krauthgamer, Barna Saha

Abstract: We study the problem of approximating edit distance in sublinear time. This is formalized as the $(k,k^c)$-Gap Edit Distance problem, where the input is a pair of strings $X,Y$ and parameters $k,c>1$, and the goal is to return YES if $ED(X,Y)\leq k$, NO if $ED(X,Y)> k^c$, and an arbitrary answer when $k < ED(X,Y) \le k^c$. Recent years have witnessed significant interest in designing sublinear-time algorithms for Gap Edit Distance. In this work, we resolve the non-adaptive query complexity of Gap Edit Distance for the entire range of parameters, improving over a sequence of previous results. Specifically, we design a non-adaptive algorithm with query complexity $\tilde{O}(n/k^{c-0.5})$, and we further prove that this bound is optimal up to polylogarithmic factors. Our algorithm also achieves optimal time complexity $\tilde{O}(n/k^{c-0.5})$ whenever $c\geq 1.5$. For $1<c<1.5$, the running time of our algorithm is $\tilde{O}(n/k^{2c-1})$. In the restricted case of $k^c=Ω(n)$, this matches a known result [Batu, Ergün, Kilian, Magen, Raskhodnikova, Rubinfeld, and Sami; STOC 2003], and in all other (nontrivial) cases, our running time is strictly better than all previous algorithms, including the adaptive ones. However, an independent work of Bringmann, Cassis, Fischer, and Nakos [STOC 2022] provides an adaptive algorithm that bypasses the non-adaptive lower bound, but only for small enough $k$ and $c$.

Submitted 2 October, 2022; v1 submitted 24 November, 2021; originally announced November 2021.

Comments: Accepted to FOCS 2022

arXiv:2108.09115 [pdf, ps, other]

Does Preprocessing help in Fast Sequence Comparisons?

Authors: Elazar Goldenberg, Aviad Rubinstein, Barna Saha

Abstract: We study edit distance computation with preprocessing: the preprocessing algorithm acts on each string separately, and then the query algorithm takes as input the two preprocessed strings. This model is inspired by scenarios where we would like to compute edit distance between many pairs in the same pool of strings. Our results include: Permutation-LCS: If the LCS between two permutations has length $n-k$, we can compute it exactly with $O(n \log(n))$ preprocessing and $O(k \log(n))$ query time. Small edit distance: For general strings, if their edit distance is at most $k$, we can compute it exactly with $O(n\log(n))$ preprocessing and $O(k^2 \log(n))$ query time. Approximate edit distance: For the most general input, we can approximate the edit distance to within factor $(7+o(1))$ with preprocessing time $\tilde{O}(n^2)$ and query time $\tilde{O}(n^{1.5+o(1)})$. All of these results significantly improve over the state of the art in edit distance computation without preprocessing. Interestingly, by combining ideas from our algorithms with preprocessing, we provide new improved results for approximating edit distance without preprocessing in subquadratic time.

Submitted 20 August, 2021; originally announced August 2021.

ACM Class: F.2.2

arXiv:1910.00901 [pdf, ps, other]

Sublinear Algorithms for Gap Edit Distance

Authors: Elazar Goldenberg, Robert Krauthgamer, Barna Saha

Abstract: The edit distance is a way of quantifying how similar two strings are to one another by counting the minimum number of character insertions, deletions, and substitutions required to transform one string into the other. A simple dynamic programming algorithm computes the edit distance between two strings of length $n$ in $O(n^2)$ time, and a more sophisticated algorithm runs in time $O(n+t^2)$ when the edit distance is $t$ [Landau, Myers and Schmidt, SICOMP 1998]. In pursuit of obtaining faster running time, the last couple of decades have seen a flurry of research on approximating edit distance, including polylogarithmic approximation in near-linear time [Andoni, Krauthgamer and Onak, FOCS 2010], and a constant-factor approximation in subquadratic time [Chakraborty, Das, Goldenberg, Koucký and Saks, FOCS 2018]. We study sublinear-time algorithms for small edit distance, which was investigated extensively because of its numerous applications. Our main result is an algorithm for distinguishing whether the edit distance is at most $t$ or at least $t^2$ (the quadratic gap problem) in time $\tilde{O}(\frac{n}{t}+t^3)$. This time bound is sublinear roughly for all $t$ in $[ω(1), o(n^{1/3})]$, which was not known before. The best previous algorithms solve this problem in sublinear time only for $t=ω(n^{1/3})$ [Andoni and Onak, STOC 2009]. Our algorithm is based on a new approach that adaptively switches between uniform sampling and reading contiguous blocks of the input strings. In contrast, all previous algorithms choose which coordinates to query non-adaptively.
Moreover, it can be extended to solve the $t$ vs $t^{2-ε}$ gap problem in time $\tilde{O}(\frac{n}{t^{1-ε}}+t^3)$.

Submitted 2 October, 2019; originally announced October 2019.

arXiv:1908.10248 [pdf, ps, other]

Hardness Amplification of Optimization Problems

Authors: Elazar Goldenberg, Karthik C. S.

Abstract: In this paper, we prove a general hardness amplification scheme for optimization problems based on the technique of direct products. We say that an optimization problem $Π$ is direct product feasible if it is possible to efficiently aggregate any $k$ instances of $Π$ and form one large instance of $Π$ such that given an optimal feasible solution to the larger instance, we can efficiently find optimal feasible solutions to all the $k$ smaller instances. Given a direct product feasible optimization problem $Π$, our hardness amplification theorem may be informally stated as follows: If there is a distribution $\mathcal{D}$ over instances of $Π$ of size $n$ such that every randomized algorithm running in time $t(n)$ fails to solve $Π$ on $\frac{1}{α(n)}$ fraction of inputs sampled from $\mathcal{D}$, then, assuming some relationships on $α(n)$ and $t(n)$, there is a distribution $\mathcal{D}'$ over instances of $Π$ of size $O(n\cdot α(n))$ such that every randomized algorithm running in time $\frac{t(n)}{poly(α(n))}$ fails to solve $Π$ on $\frac{99}{100}$ fraction of inputs sampled from $\mathcal{D}'$. As a consequence of the above theorem, we show hardness amplification of problems in various classes such as NP-hard problems like Max-Clique, Knapsack, and Max-SAT, problems in P such as Longest Common Subsequence, Edit Distance, Matrix Multiplication, and even problems in TFNP such as Factoring and computing Nash equilibrium.

Submitted 27 August, 2019; originally announced August 2019.

arXiv:1901.06220 [pdf, ps, other]

Towards a General Direct Product Testing Theorem

Authors: Elazar Goldenberg, Karthik C. S.

Abstract: The Direct Product encoding of a string $a\in \{0,1\}^n$ on an underlying domain $V\subseteq \binom{n}{k}$, is a function DP$_V(a)$ which gets as input a set $S\in V$ and outputs $a$ restricted to $S$. In the Direct Product Testing Problem, we are given a function $F:V\to \{0,1\}^k$, and our goal is to test whether $F$ is close to a direct product encoding, i.e., whether there exists some $a\in \{0,1\}^n$ such that on most sets $S$, we have $F(S)=$DP$_V(a)(S)$. A natural test is as follows: select a pair $(S,S')\in V$ according to some underlying distribution over $V\times V$, query $F$ on this pair, and check for consistency on their intersection. Note that the above distribution may be viewed as a weighted graph over the vertex set $V$ and is referred to as a test graph. The testability of direct products was studied over various specific domains and test graphs (for example, see Dinur-Steurer [CCC'14]; Dinur-Kaufman [FOCS'17]). In this paper, we study the testability of direct products in a general setting, addressing the question: what properties of the domain and the test graph allow one to prove a direct product testing theorem? Towards this goal, we introduce the notion of coordinate expansion of a test graph. Roughly speaking, a test graph is a coordinate expander if it has global and local expansion, and has certain nice intersection properties on sampling. We show that whenever the test graph has coordinate expansion, it admits a direct product testing theorem.
Additionally, for every $k$ and $n$ we provide a direct product domain $V\subseteq \binom{n}{k}$ of size $n$, called the Sliding Window domain, for which we prove direct product testability.
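
The encoding and the pairwise consistency check described in this abstract can be sketched as below. The abstract does not define the Sliding Window domain, so the cyclic windows used here are an assumption made purely for illustration, as are all function names; the sketch checks one pair at a time and says nothing about the test distribution analyzed in the paper.

```python
def sliding_windows(n, k):
    """Assumed domain: the n cyclic windows {i, i+1, ..., i+k-1} taken mod n."""
    return [tuple(sorted((i + j) % n for j in range(k))) for i in range(n)]


def dp_encode(a, domain):
    """Direct product encoding: map each set S to the string a restricted to S."""
    return {S: tuple(a[i] for i in S) for S in domain}


def consistent(F, S, T):
    """Check that F(S) and F(T) agree on every coordinate in S ∩ T."""
    values_on_s = dict(zip(S, F[S]))
    values_on_t = dict(zip(T, F[T]))
    return all(values_on_s[i] == values_on_t[i] for i in set(S) & set(T))
```

An honest encoding passes the check on every pair, while flipping all bits of a single answer is caught by any pair whose intersection with that set is nonempty.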

Submitted 18 January, 2019; originally announced January 2019.

arXiv:1810.03664 [pdf, ps, other]

doi 10.1145/3422823

Approximating Edit Distance Within Constant Factor in Truly Sub-Quadratic Time

Authors: Diptarka Chakraborty, Debarati Das, Elazar Goldenberg, Michal Koucky, Michael Saks

Abstract: Edit distance is a measure of similarity of two strings based on the minimum number of character insertions, deletions, and substitutions required to transform one string into the other. The edit distance can be computed exactly using a dynamic programming algorithm that runs in quadratic time. Andoni, Krauthgamer, and Onak (2010) gave a nearly linear time algorithm that approximates edit distance within an approximation factor $\text{poly}(\log n)$. In this paper, we provide an algorithm with running time $\tilde{O}(n^{2-2/7})$ that approximates the edit distance within a constant factor.

Submitted 15 February, 2021; v1 submitted 8 October, 2018; originally announced October 2018.

ACM Class: F.2.0

Journal ref: Journal of the ACM, Volume 67, Issue 6, October 2020, Article No.: 36, Page number: 1-22

arXiv:1607.03718 [pdf, ps, other]

Streaming Algorithms For Computing Edit Distance Without Exploiting Suffix Trees

Authors: Diptarka Chakraborty, Elazar Goldenberg, Michal Koucký

Abstract: The edit distance is a way of quantifying how similar two strings are to one another by counting the minimum number of character insertions, deletions, and substitutions required to transform one string into the other. In this paper we study the computational problem of computing the edit distance between a pair of strings where their distance is bounded by a parameter $k\ll n$. We present two streaming algorithms for computing edit distance: One runs in time $O(n+k^2)$ and the other $n+O(k^3)$. By writing $n+O(k^3)$ we want to emphasize that the number of operations per input symbol is a small constant. In particular, the running time does not depend on the alphabet size, and the algorithm should be easy to implement. Previously a streaming algorithm with running time $O(n+k^4)$ was given in the paper by the current authors (STOC'16). The best off-line algorithm runs in time $O(n+k^2)$ (Landau et al., 1998) which is known to be optimal under the Strong Exponential Time Hypothesis.
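
For intuition about the bounded-distance setting in this abstract, the sketch below shows the classical banded dynamic program, which only fills cells within $k$ of the diagonal and runs in $O(nk)$ time. It is a simpler, slower, non-streaming technique swapped in for illustration, not either of the paper's algorithms, and the function name is hypothetical.

```python
def bounded_edit_distance(x, y, k):
    """Return the edit distance of x and y if it is at most k, else None."""
    n, m = len(x), len(y)
    if abs(n - m) > k:
        return None  # the length difference alone already forces more than k edits

    too_far = k + 1  # stands in for every value known to exceed k
    prev = {j: j for j in range(min(m, k) + 1)}  # row 0, restricted to the band
    for i in range(1, n + 1):
        cur = {}
        for j in range(max(0, i - k), min(m, i + k) + 1):
            if j == 0:
                cur[j] = i  # delete the first i characters of x
                continue
            best = prev.get(j - 1, too_far) + (x[i - 1] != y[j - 1])
            best = min(best, prev.get(j, too_far) + 1, cur.get(j - 1, too_far) + 1)
            cur[j] = min(best, too_far)  # cap so out-of-band costs stay bounded
        prev = cur

    distance = prev.get(m, too_far)
    return distance if distance <= k else None
```

Cells outside the band correspond to prefix pairs whose lengths differ by more than $k$, so treating them as "too far" does not change any answer that is at most $k$.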

Submitted 13 July, 2016; originally announced July 2016.

arXiv:cs/0606126 [pdf]

May We Have Your Attention: Analysis of a Selective Attention Task

Authors: Eldan Goldenberg, Jacob R. Garcowski, Randall D. Beer

Abstract: In this paper we present a deeper analysis than has previously been carried out of a selective attention problem, and the evolution of continuous-time recurrent neural networks to solve it. We show that the task has a rich structure, and agents must solve a variety of subproblems to perform well. We consider the relationship between the complexity of an agent and the ease with which it can evolve behavior that generalizes well across subproblems, and demonstrate a shaping protocol that improves generalization.

Submitted 29 June, 2006; originally announced June 2006.

Comments: In S. Schaal, A. Ijspeert, A. Billard, S. Vijayakumar, J. Hallam & J-A. Meyer (Eds.), From Animals to Animats 8: Proceedings of the Eighth International Conference on the Simulation of Adaptive Behavior (pp 49-56). MIT Press

Showing 1–11 of 11 results for author: Goldenberg, E