-
Pathway: a fast and flexible unified stream data processing framework for analytical and Machine Learning applications
Authors:
Michal Bartoszkiewicz,
Jan Chorowski,
Adrian Kosowski,
Jakub Kowalski,
Sergey Kulik,
Mateusz Lewandowski,
Krzysztof Nowicki,
Kamil Piechowiak,
Olivier Ruas,
Zuzanna Stamirowska,
Przemyslaw Uznanski
Abstract:
We present Pathway, a new unified data processing framework that can run workloads on both bounded and unbounded data streams. The framework was created with the original motivation of resolving challenges faced when analyzing and processing data from the physical economy, including streams of data generated by IoT and enterprise systems. These required rapid reaction while calling for the application of advanced computation paradigms (machine-learning-powered analytics, contextual analysis, and other elements of complex event processing). Pathway is equipped with a Table API tailored for Python and Python/SQL workflows, and is powered by a distributed incremental dataflow in Rust. We describe the system and present benchmarking results which demonstrate its capabilities in both batch and streaming contexts, where it is able to surpass state-of-the-art industry frameworks in both scenarios. We also discuss streaming use cases handled by Pathway which cannot be easily resolved with state-of-the-art industry frameworks, such as streaming iterative graph algorithms (PageRank, etc.).
Submitted 12 July, 2023;
originally announced July 2023.
-
Noisy searching: simple, fast and correct
Authors:
Dariusz Dereniowski,
Aleksander Łukasiewicz,
Przemysław Uznański
Abstract:
This work considers the problem of noisy binary search in a sorted array. The noise is modeled by a parameter $p$ that dictates that a comparison can be incorrect with probability $p$, independently of other queries. We state two types of upper bounds on the number of queries: the worst-case and expected query complexity scenarios. The bounds improve the ones known to date, i.e., our algorithms require fewer queries. Additionally, they have simpler statements, and work for the full range of parameters. All query complexities for the expected query scenarios are tight up to lower order terms. For the problem where the target prior is uniform over all possible inputs, we provide an algorithm with expected complexity upper-bounded by $(\log_2 n + \log_2 δ^{-1} + 3)/I(p)$, where $n$ is the domain size, $0\le p < 1/2$ is the noise ratio, $δ>0$ is the failure probability, and $I(p)$ is the information gain function. As a side effect, we close some correctness issues regarding previous work. Also, en route, we obtain new and improved query complexities for the search generalized to arbitrary graphs. This paper continues and improves upon the lines of research of Burnashev-Zigangirov [Prob. Per. Informatsii, 1974], Ben-Or and Hassidim [FOCS 2008], Gu and Xu [STOC 2023], and Emamjomeh-Zadeh et al. [STOC 2016], Dereniowski et al. [SOSA@SODA 2019].
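To get a feel for the stated bound $(\log_2 n + \log_2 δ^{-1} + 3)/I(p)$, the following minimal sketch evaluates it numerically. The abstract does not define $I(p)$; the code assumes $I(p) = 1 - H(p)$ with $H$ the binary entropy, and the function names are illustrative only.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def information_gain(p: float) -> float:
    # ASSUMPTION: I(p) = 1 - H(p); the abstract only names the function.
    return 1.0 - binary_entropy(p)


def expected_query_bound(n: int, delta: float, p: float) -> float:
    """Evaluate (log2 n + log2(1/delta) + 3) / I(p) for 0 <= p < 1/2."""
    if n < 1 or delta <= 0 or not (0 <= p < 0.5):
        raise ValueError("need n >= 1, delta > 0 and 0 <= p < 1/2")
    return (math.log2(n) + math.log2(1 / delta) + 3) / information_gain(p)
```

With $p = 0$ the denominator is 1, so the value reduces to $\log_2 n + \log_2 δ^{-1} + 3$; as $p$ approaches $1/2$ the denominator shrinks and the bound grows without limit.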
Submitted 15 July, 2023; v1 submitted 12 July, 2021;
originally announced July 2021.
-
A time and space optimal stable population protocol solving exact majority
Authors:
David Doty,
Mahsa Eftekhari,
Leszek Gąsieniec,
Eric Severson,
Grzegorz Stachowiak,
Przemysław Uznański
Abstract:
We study population protocols, a model of distributed computing appropriate for modeling well-mixed chemical reaction networks and other physical systems where agents exchange information in pairwise interactions, but have no control over their schedule of interaction partners. The well-studied *majority* problem is that of determining in an initial population of $n$ agents, each with one of two opinions $A$ or $B$, whether there are more $A$, more $B$, or a tie. A *stable* protocol solves this problem with probability 1 by eventually entering a configuration in which all agents agree on a correct consensus decision of $\mathsf{A}$, $\mathsf{B}$, or $\mathsf{T}$, from which the consensus cannot change. We describe a protocol that solves this problem using $O(\log n)$ states ($\log \log n + O(1)$ bits of memory) and optimal expected time $O(\log n)$. The number of states $O(\log n)$ is known to be optimal for the class of polylogarithmic time stable protocols that are "output dominant" and "monotone". These are two natural constraints satisfied by our protocol, making it simultaneously time- and state-optimal for that class. We introduce a key technique called a "fixed resolution clock" to achieve partial synchronization.
Our protocol is *nonuniform*: the transition function has the value $\left \lceil {\log n} \right \rceil$ encoded in it. We show that the protocol can be modified to be uniform, while increasing the state complexity to $Θ(\log n \log \log n)$.
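As a point of reference for the model, here is a small simulation of the classic four-state stable exact-majority protocol under a uniformly random scheduler. This is a swapped-in textbook protocol, not the $O(\log n)$-state protocol of the abstract: it is stable but slow, and it never reports ties. All names are illustrative.

```python
import random

# "A"/"B" are strong opinions, "a"/"b" are weak ones. The difference
# (#A - #B) between strong agents is invariant under these rules.
RULES = {
    ("A", "B"): ("a", "b"),  # opposite strong opinions cancel out
    ("B", "A"): ("b", "a"),
    ("A", "b"): ("A", "a"),  # a strong agent converts a weak opponent
    ("b", "A"): ("a", "A"),
    ("B", "a"): ("B", "b"),
    ("a", "B"): ("b", "B"),
}


def simulate_majority(num_a, num_b, seed=0, max_interactions=10**6):
    """Return (winner, parallel_time); winner is None if no consensus was seen."""
    rng = random.Random(seed)
    agents = ["A"] * num_a + ["B"] * num_b
    n = len(agents)
    for step in range(1, max_interactions + 1):
        i, j = rng.sample(range(n), 2)  # random ordered pair of distinct agents
        pair = (agents[i], agents[j])
        if pair in RULES:
            agents[i], agents[j] = RULES[pair]
        outputs = {state.upper() for state in agents}
        if len(outputs) == 1:
            # Parallel time = number of interactions divided by n.
            return outputs.pop(), step / n
    return None, max_interactions / n
```

Because the strong-opinion difference is preserved, a consensus observed from a non-tied start is always on the initial majority; a tied start simply runs until the interaction cap.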
Submitted 20 January, 2022; v1 submitted 4 June, 2021;
originally announced June 2021.
-
The Dynamic k-Mismatch Problem
Authors:
Raphaël Clifford,
Paweł Gawrychowski,
Tomasz Kociumaka,
Daniel P. Martin,
Przemysław Uznański
Abstract:
The text-to-pattern Hamming distances problem asks to compute the Hamming distances between a given pattern of length $m$ and all length-$m$ substrings of a given text of length $n\ge m$. We focus on the $k$-mismatch version of the problem, where a distance needs to be returned only if it does not exceed a threshold $k$. We assume $n\le 2m$ (in general, one can partition the text into overlapping blocks). In this work, we show data structures for the dynamic version of this problem supporting two operations: An update performs a single-letter substitution in the pattern or the text, and a query, given an index $i$, returns the Hamming distance between the pattern and the text substring starting at position $i$, or reports that it exceeds $k$.
First, we show a data structure with $\tilde{O}(1)$ update and $\tilde{O}(k)$ query time. Then we show that $\tilde{O}(k)$ update and $\tilde{O}(1)$ query time is also possible. These two provide an optimal trade-off for the dynamic $k$-mismatch problem with $k \le \sqrt{n}$: we prove that, conditioned on the strong 3SUM conjecture, one cannot simultaneously achieve $k^{1-Ω(1)}$ time for all operations.
For $k\ge \sqrt{n}$, we give another lower bound, conditioned on the Online Matrix-Vector conjecture, that excludes algorithms taking $n^{1/2-Ω(1)}$ time per operation. This is tight for constant-sized alphabets: Clifford et al. (STACS 2018) achieved $\tilde{O}(\sqrt{n})$ time per operation in that case, but with $\tilde{O}(n^{3/4})$ time per operation for large alphabets. We improve and extend this result with an algorithm that, given $1\le x\le k$, achieves update time $\tilde{O}(\frac{n}{k} +\sqrt{\frac{nk}{x}})$ and query time $\tilde{O}(x)$. In particular, for $k\ge \sqrt{n}$, an appropriate choice of $x$ yields $\tilde{O}(\sqrt[3]{nk})$ time per operation, which is $\tilde{O}(n^{2/3})$ when no threshold $k$ is provided.
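The update/query interface described above can be pinned down with a naive baseline: $O(1)$ updates and $O(m)$ queries with early exit once the threshold is exceeded. This is only a reference sketch of the interface, not any of the data structures from the abstract; the class and method names are hypothetical.

```python
class NaiveDynamicKMismatch:
    """Baseline for dynamic k-mismatch: O(1) update, O(m) worst-case query."""

    def __init__(self, text, pattern, k):
        self.text = list(text)
        self.pattern = list(pattern)
        self.k = k

    def update_text(self, pos, letter):
        # Single-letter substitution in the text.
        self.text[pos] = letter

    def update_pattern(self, pos, letter):
        # Single-letter substitution in the pattern.
        self.pattern[pos] = letter

    def query(self, i):
        """Hamming distance at alignment i, or None if it exceeds k."""
        mismatches = 0
        for offset, letter in enumerate(self.pattern):
            if self.text[i + offset] != letter:
                mismatches += 1
                if mismatches > self.k:
                    return None  # report "more than k" without counting further
        return mismatches
```

Such a baseline is mainly useful as a correctness oracle when testing a faster structure against random update and query sequences.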
Submitted 28 March, 2022; v1 submitted 13 May, 2021;
originally announced May 2021.
-
Time and Space Optimal Exact Majority Population Protocols
Authors:
Leszek Gąsieniec,
Grzegorz Stachowiak,
Przemysław Uznański
Abstract:
In this paper we study population protocols governed by the {\em random scheduler}, which uniformly at random selects pairwise interactions between $n$ agents. The main result of this paper is the first time and space optimal {\em exact majority population protocol} which also works with high probability. The new protocol operates in the optimal {\em parallel time} $O(\log n),$ which is equivalent to $O(n\log n)$ sequential {\em pairwise interactions}, where each agent utilises the optimal number of $O(\log n)$ states.
The time optimality of the new majority protocol is possible thanks to the novel concept of fixed-resolution phase clocks introduced and analysed in this paper. The new phase clock allows one to count approximately constant parallel time in population protocols.
Submitted 26 June, 2021; v1 submitted 14 November, 2020;
originally announced November 2020.
-
Cardinality estimation using Gumbel distribution
Authors:
Aleksander Łukasiewicz,
Przemysław Uznański
Abstract:
Cardinality estimation is the task of approximating the number of distinct elements in a large dataset with possibly repeating elements. LogLog and HyperLogLog (cf. Durand and Flajolet [ESA 2003], Flajolet et al. [Discrete Math Theor. 2007]) are small-space sketching schemes for cardinality estimation, which both have strong theoretical guarantees of performance and are highly effective in practice. This makes them a highly popular solution with many implementations in big-data systems (e.g. Algebird, Apache DataSketches, BigQuery, Presto and Redis). However, despite their simple and elegant formulation, the analyses of both LogLog and HyperLogLog are extremely involved, spanning tens of pages of analytic combinatorics and complex function analysis.
We propose a modification to both LogLog and HyperLogLog that replaces discrete geometric distribution with a continuous Gumbel distribution. This leads to a very short, simple and elementary analysis of estimation guarantees, and smoother behavior of the estimator.
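The idea of sketching with Gumbel variates can be illustrated with a toy estimator. It relies on a standard fact: the maximum of $n$ independent Gumbel(0, 1) variates is distributed as Gumbel($\ln n$, 1), whose mean is $\ln n + γ$. The sketch below is an assumption-laden illustration, not the paper's construction: it uses full-precision floating-point registers, hashes every item once per register, and averages the registers directly.

```python
import hashlib
import math

EULER_GAMMA = 0.5772156649015329


def _gumbel(register: int, item: str) -> float:
    """Deterministic Gumbel(0, 1) variate derived from a hash of (register, item)."""
    digest = hashlib.sha256(f"{register}:{item}".encode()).digest()
    x = int.from_bytes(digest[:8], "big") >> 12  # keep 52 pseudo-random bits
    u = (x + 0.5) / 2**52                        # uniform, strictly inside (0, 1)
    return -math.log(-math.log(u))


class GumbelSketch:
    def __init__(self, num_registers=128):
        self.registers = [-math.inf] * num_registers

    def add(self, item: str) -> None:
        # Each register keeps the maximum variate seen; duplicates hash to the
        # same variates, so re-adding an item never changes the sketch.
        for r in range(len(self.registers)):
            g = _gumbel(r, item)
            if g > self.registers[r]:
                self.registers[r] = g

    def estimate(self) -> float:
        # Each register has mean ln(n) + gamma, so invert the average.
        mean = sum(self.registers) / len(self.registers)
        return math.exp(mean - EULER_GAMMA)
```

Updating every register per item costs time linear in the number of registers; practical schemes avoid this, which is outside the scope of this sketch.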
Submitted 17 August, 2020;
originally announced August 2020.
-
All-Pairs LCA in DAGs: Breaking through the $O(n^{2.5})$ barrier
Authors:
Fabrizio Grandoni,
Giuseppe F. Italiano,
Aleksander Łukasiewicz,
Nikos Parotsidis,
Przemysław Uznański
Abstract:
Let $G=(V,E)$ be an $n$-vertex directed acyclic graph (DAG). A lowest common ancestor (LCA) of two vertices $u$ and $v$ is a common ancestor $w$ of $u$ and $v$ such that no descendant of $w$ has the same property. In this paper, we consider the problem of computing an LCA, if any, for all pairs of vertices in a DAG. The fastest known algorithms for this problem exploit fast matrix multiplication subroutines and have running times ranging from $O(n^{2.687})$ [Bender et al.~SODA'01] down to $O(n^{2.615})$ [Kowaluk and Lingas~ICALP'05] and $O(n^{2.569})$ [Czumaj et al.~TCS'07]. Somewhat surprisingly, all those bounds would still be $Ω(n^{2.5})$ even if matrix multiplication could be solved optimally (i.e., $ω=2$). This appears to be an inherent barrier for all the currently known approaches, which raises the natural question of whether one could break through the $O(n^{2.5})$ barrier for this problem.
In this paper, we answer this question affirmatively: in particular, we present an $\tilde O(n^{2.447})$ ($\tilde O(n^{7/3})$ for $ω=2$) algorithm for finding an LCA for all pairs of vertices in a DAG, which represents the first improvement on the running times for this problem in the last 13 years. A key tool in our approach is a fast algorithm to partition the vertex set of the transitive closure of $G$ into a collection of $O(\ell)$ chains and $O(n/\ell)$ antichains, for a given parameter $\ell$. As usual, a chain is a path while an antichain is an independent set. We then find, for all pairs of vertices, a \emph{candidate} LCA among the chain and antichain vertices, separately. The first set is obtained via a reduction to min-max matrix multiplication. The computation of the second set can be reduced to Boolean matrix multiplication similarly to previous results on this problem. We finally combine the two solutions together in a careful (non-obvious) manner.
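The definition of an LCA in a DAG can be made concrete with a brute-force baseline that works directly from ancestor sets. It is far slower than any algorithm mentioned above and uses no matrix multiplication; it also assumes that every vertex counts as its own ancestor, a convention the abstract does not state. Function names are illustrative.

```python
def ancestor_sets(n, edges):
    """anc[v] = vertices with a directed path to v, including v itself."""
    parents = [[] for _ in range(n)]
    for u, v in edges:
        parents[v].append(u)
    anc = []
    for v in range(n):
        seen, stack = {v}, [v]
        while stack:
            x = stack.pop()
            for p in parents[x]:
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        anc.append(seen)
    return anc


def lowest_common_ancestors(anc, u, v):
    """All LCAs of u and v: common ancestors with no common-ancestor descendant."""
    common = anc[u] & anc[v]
    return {
        w for w in common
        if not any(x != w and w in anc[x] for x in common)
    }
```

Unlike in trees, a pair of vertices in a DAG may have several LCAs or none, which is why the function returns a set.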
Submitted 13 November, 2020; v1 submitted 17 July, 2020;
originally announced July 2020.
-
Improved Circular $k$-Mismatch Sketches
Authors:
Shay Golan,
Tomasz Kociumaka,
Tsvi Kopelowitz,
Ely Porat,
Przemysław Uznański
Abstract:
The shift distance $\mathsf{sh}(S_1,S_2)$ between two strings $S_1$ and $S_2$ of the same length is defined as the minimum Hamming distance between $S_1$ and any rotation (cyclic shift) of $S_2$. We study the problem of sketching the shift distance, which is the following communication complexity problem: Strings $S_1$ and $S_2$ of length $n$ are given to two identical players (encoders), who independently compute sketches (summaries) $\mathtt{sk}(S_1)$ and $\mathtt{sk}(S_2)$, respectively, so that upon receiving the two sketches, a third player (decoder) is able to compute (or approximate) $\mathsf{sh}(S_1,S_2)$ with high probability.
This paper primarily focuses on the more general $k$-mismatch version of the problem, where the decoder is allowed to declare a failure if $\mathsf{sh}(S_1,S_2)>k$, where $k$ is a parameter known to all parties. Andoni et al. (STOC'13) introduced exact circular $k$-mismatch sketches of size $\widetilde{O}(k+D(n))$, where $D(n)$ is the number of divisors of $n$. Andoni et al. also showed that their sketch size is optimal in the class of linear homomorphic sketches.
We circumvent this lower bound by designing a (non-linear) exact circular $k$-mismatch sketch of size $\widetilde{O}(k)$; this size matches communication-complexity lower bounds. We also design a $(1\pm \varepsilon)$-approximate circular $k$-mismatch sketch of size $\widetilde{O}(\min(\varepsilon^{-2}\sqrt{k}, \varepsilon^{-1.5}\sqrt{n}))$, which improves upon an $\widetilde{O}(\varepsilon^{-2}\sqrt{n})$-size sketch of Crouch and McGregor (APPROX'11).
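For reference, the shift distance $\mathsf{sh}(S_1,S_2)$ itself can be computed directly from its definition in quadratic time when both strings are available in one place. The sketch below does exactly that; it involves no sketching or communication, and the function names are illustrative.

```python
def hamming(s, t):
    """Number of positions where two equal-length strings differ."""
    return sum(a != b for a, b in zip(s, t))


def shift_distance(s1, s2):
    """Minimum Hamming distance between s1 and any cyclic rotation of s2."""
    if len(s1) != len(s2):
        raise ValueError("strings must have the same length")
    if not s1:
        return 0
    return min(hamming(s1, s2[r:] + s2[:r]) for r in range(len(s2)))
```

A decoder that receives only $\mathtt{sk}(S_1)$ and $\mathtt{sk}(S_2)$ has to reproduce this value (or declare failure above $k$) without seeing the strings.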
Submitted 24 June, 2020;
originally announced June 2020.
-
An Efficient Noisy Binary Search in Graphs via Median Approximation
Authors:
Dariusz Dereniowski,
Aleksander Łukasiewicz,
Przemysław Uznański
Abstract:
Consider a generalization of the classical binary search problem in linearly sorted data to the graph-theoretic setting. The goal is to design an adaptive query algorithm, called a strategy, that identifies an initially unknown target vertex in a graph by asking queries. Each query is conducted as follows: the strategy selects a vertex $q$ and receives a reply $v$: if $q$ is the target, then $v=q$, and if $q$ is not the target, then $v$ is a neighbor of $q$ that lies on a shortest path to the target. Furthermore, there is a noise parameter $0\leq p<\frac{1}{2}$, which means that each reply can be incorrect with probability $p$. The optimization criterion to be minimized is the overall number of queries asked by the strategy, called the query complexity. The query complexity is well understood to be $O(\varepsilon^{-2}\log n)$ for general graphs, where $n$ is the order of the graph and $\varepsilon=\frac{1}{2}-p$. However, implementing such a strategy is computationally expensive, with each query requiring possibly $O(n^2)$ operations.
In this work we propose two efficient strategies that keep the optimal query complexity. The first strategy achieves an overall complexity of $O(\varepsilon^{-1}n\log n)$ per query. The second strategy is dedicated to graphs of small diameter $D$ and maximum degree $Δ$ and has an average complexity of $O(n+\varepsilon^{-2}DΔ\log n)$ per query. We stress that we develop an algorithmic tool of graph median approximation that is of independent interest: the median can be efficiently approximated by finding a vertex minimizing the sum of distances to a randomly sampled vertex subset of size $O(\varepsilon^{-2}\log n)$.
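The median-approximation idea in the last sentence can be sketched as follows: sample some vertices, run a BFS from each, and return the vertex with the smallest total distance to the sample. The sketch assumes an unweighted, connected graph given as an adjacency dictionary and samples vertices uniformly with replacement; the sampling distribution and the constant hidden in $O(\varepsilon^{-2}\log n)$ are not given in the abstract, so `sample_size` is left to the caller.

```python
import random
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def approximate_median(adj, sample):
    """Vertex minimizing the sum of distances to the sampled vertices."""
    total = {v: 0 for v in adj}
    for s in sample:
        dist = bfs_distances(adj, s)
        for v in adj:
            total[v] += dist[v]  # assumes the graph is connected
    return min(total, key=total.get)


def sampled_median(adj, sample_size, seed=0):
    # ASSUMPTION: uniform sampling with replacement.
    rng = random.Random(seed)
    vertices = list(adj)
    sample = [rng.choice(vertices) for _ in range(sample_size)]
    return approximate_median(adj, sample)
```

Each call costs one BFS per sampled vertex, so the running time is the sample size times the graph size rather than all-pairs distances.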
Submitted 30 April, 2020;
originally announced May 2020.
-
Robust Comparison in Population Protocols
Authors:
Dan Alistarh,
Martin Töpfer,
Przemysław Uznański
Abstract:
There has recently been a surge of interest in the computational and complexity properties of the population model, which assumes $n$ anonymous, computationally-bounded nodes, interacting at random, and attempting to jointly compute global predicates. Significant work has gone towards investigating majority and consensus dynamics in this model: assuming that each node is initially in one of two states $X$ or $Y$, determine which state had higher initial count.
In this paper, we consider a natural generalization of majority/consensus, which we call comparison. We are given two baseline states, $X_0$ and $Y_0$, present in any initial configuration in fixed, possibly small counts. Importantly, one of these states has higher count than the other: we will assume $|X_0| \ge C |Y_0|$ for some constant $C$. The challenge is to design a protocol which can quickly and reliably decide on which of the baseline states $X_0$ and $Y_0$ has higher initial count.
We propose a simple algorithm solving comparison: the baseline algorithm uses $O(\log n)$ states per node, and converges in $O(\log n)$ (parallel) time, with high probability, to a state where the whole population votes on opinions $X$ or $Y$ at rates proportional to the initial $|X_0|$ vs. $|Y_0|$ concentrations. We then describe how such output can be used to solve comparison. The algorithm is self-stabilizing, in the sense that it converges to the correct decision even if the relative counts of baseline states $X_0$ and $Y_0$ change dynamically during the execution, and leak-robust, in the sense that it can withstand spurious faulty reactions. Our analysis relies on a new martingale concentration result which relates the evolution of a population protocol to its expected (steady-state) analysis, which should be broadly applicable in the context of population protocols and opinion dynamics.
Submitted 11 January, 2022; v1 submitted 13 March, 2020;
originally announced March 2020.
-
Approximating Text-to-Pattern Distance via Dimensionality Reduction
Authors:
Przemysław Uznański
Abstract:
Text-to-pattern distance is a fundamental problem in string matching, where given a pattern of length $m$ and a text of length $n$, over an integer alphabet, we are asked to compute the distance between the pattern and the text at every location. The distance function can be e.g. Hamming distance or $\ell_p$ distance for some parameter $p > 0$. Almost all state-of-the-art exact and approximate algorithms developed in the past $\sim 40$ years used FFT as a black box. In this work we present $\widetilde{O}(n/\varepsilon^2)$-time algorithms for $(1\pm\varepsilon)$-approximation of $\ell_2$ distances, and an $\widetilde{O}(n/\varepsilon^3)$-time algorithm for approximation of Hamming and $\ell_1$ distances, all without use of FFT. This is independent of the very recent development by Chan et al. [STOC 2020], where an $O(n/\varepsilon^2)$ algorithm for Hamming distances not using FFT was presented; although their algorithm is much more "combinatorial", our techniques apply to norms other than Hamming.
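The problem statement can be made concrete with the exact $O(nm)$ baseline that any approximation is measured against. The sketch below treats $p = 0$ as the Hamming distance and otherwise returns $(\sum_j |t_{i+j} - p_j|^p)^{1/p}$; it is a plain definition-level reference, not the dimensionality-reduction algorithm of the abstract, and the function name is illustrative.

```python
def text_to_pattern_distances(text, pattern, p):
    """Exact distance between the pattern and every length-m window of the text.

    p == 0 gives the Hamming distance; p > 0 gives the l_p distance.
    """
    m = len(pattern)
    distances = []
    for i in range(len(text) - m + 1):
        window = text[i:i + m]
        if p == 0:
            distances.append(sum(a != b for a, b in zip(window, pattern)))
        else:
            power_sum = sum(abs(a - b) ** p for a, b in zip(window, pattern))
            distances.append(power_sum ** (1 / p))
    return distances
```

A $(1\pm\varepsilon)$-approximation algorithm must return, for every alignment, a value within a factor $1\pm\varepsilon$ of the corresponding entry of this list.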
Submitted 1 May, 2020; v1 submitted 9 February, 2020;
originally announced February 2020.
-
$L_p$ Pattern Matching in a Stream
Authors:
Tatiana Starikovskaya,
Michal Svagerka,
Przemysław Uznański
Abstract:
We consider the problem of computing the distance between a pattern of length $n$ and all $n$-length subwords of a text in the streaming model. In the streaming setting, only the Hamming distance ($L_0$) has been studied. It is known that computing the exact Hamming distance between a pattern and a streaming text requires $Ω(n)$ space (folklore). Therefore, to develop sublinear-space solutions, one must relax the requirements. One possibility to do so is to compute only the distances bounded by a threshold $k$, see~[SODA'19, Clifford, Kociumaka, Porat] and references therein. The motivation for this variant of the problem is that we are interested in subwords of the text that are similar to the pattern, i.e. in subwords such that the distance between them and the pattern is relatively small. On the other hand, the main application of the streaming setting is processing large-scale data, such as biological data. Recent advances in hardware technology allow generating such data at a very high speed, but unfortunately, the produced data may contain about 10\% of noise~[Biol. Direct.'07, Klebanov and Yakovlev]. To analyse such data, it is not sufficient to consider small distances only. A possible workaround for this issue is the $(1\pm\varepsilon)$-approximation. This line of research was initiated in [ICALP'16, Clifford and Starikovskaya] who gave a $(1\pm\varepsilon)$-approximation algorithm with space~$\tilde{O}(\varepsilon^{-5}\sqrt{n})$. In this work, we show a suite of new streaming algorithms for computing the Hamming, $L_1$, $L_2$ and general $L_p$ ($0 < p < 2$) distances between the pattern and the text. Our results significantly extend the previous result in this setting. In particular, for the Hamming distance and for the $L_p$ distance when $0 < p \le 1$ we show a streaming algorithm that uses $\tilde{O}(\varepsilon^{-2}\sqrt{n})$ space for polynomial-size alphabets.
Submitted 8 November, 2020; v1 submitted 9 July, 2019;
originally announced July 2019.
-
RLE edit distance in near optimal time
Authors:
Raphaël Clifford,
Paweł Gawrychowski,
Tomasz Kociumaka,
Daniel P. Martin,
Przemysław Uznański
Abstract:
We show that the edit distance between two run-length encoded strings of compressed lengths $m$ and $n$ respectively, can be computed in $\mathcal{O}(mn\log(mn))$ time. This improves the previous record by a factor of $\mathcal{O}(n/\log(mn))$. The running time of our algorithm is within subpolynomial factors of being optimal, subject to the standard SETH-hardness assumption. This effectively closes a line of algorithmic research first started in 1993.
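To fix the setting, the sketch below run-length encodes a string and computes the edit distance by the textbook dynamic program on the decoded strings. This baseline runs in time proportional to the product of the uncompressed lengths, which is exactly the cost that algorithms working on the $m$ and $n$ runs aim to avoid; it is not the algorithm of the abstract, and the function names are illustrative.

```python
from itertools import groupby


def rle_encode(s):
    """Run-length encoding as a list of (letter, run length) pairs."""
    return [(ch, len(list(run))) for ch, run in groupby(s)]


def rle_decode(runs):
    return "".join(ch * count for ch, count in runs)


def edit_distance(a, b):
    """Textbook O(|a||b|) dynamic program with unit-cost edits."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def rle_edit_distance_baseline(runs_a, runs_b):
    # Decompress first, then run the standard DP (no use of run structure).
    return edit_distance(rle_decode(runs_a), rle_decode(runs_b))
```

A string such as `"a" * 10**6` has a single run, so the gap between compressed and uncompressed lengths can be arbitrarily large.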
Submitted 3 May, 2019;
originally announced May 2019.
-
Hardness of Exact Distance Queries in Sparse Graphs Through Hub Labeling
Authors:
Adrian Kosowski,
Przemysław Uznański,
Laurent Viennot
Abstract:
A distance labeling scheme is an assignment of bit-labels to the vertices of an undirected, unweighted graph such that the distance between any pair of vertices can be decoded solely from their labels. An important class of distance labeling schemes is that of hub labelings, where a node $v \in G$ stores its distance to the so-called hubs $S_v \subseteq V$, chosen so that for any $u,v \in V$ there is $w \in S_u \cap S_v$ belonging to some shortest $uv$ path. Notice that for most existing graph classes, the best known distance labeling constructions use a hub labeling scheme at some point, at least as a key building block. Our interest lies in hub labelings of sparse graphs, i.e., those with $|E(G)| = O(n)$, for which we show a lower bound of $\frac{n}{2^{O(\sqrt{\log n})}}$ for the average size of the hubsets. Additionally, we show a hub-labeling construction for sparse graphs of average size $O(\frac{n}{RS(n)^{c}})$ for some $0 < c < 1$, where $RS(n)$ is the so-called Ruzsa-Szemerédi function, linked to the structure of induced matchings in dense graphs. This implies that further improving the lower bound on hub labeling size to $\frac{n}{2^{(\log n)^{o(1)}}}$ would require a breakthrough in the study of lower bounds on $RS(n)$, which have resisted substantial improvement in the last 70 years. For general distance labeling of sparse graphs, we show a lower bound of $\frac{1}{2^{O(\sqrt{\log n})}} SumIndex(n)$, where $SumIndex(n)$ is the communication complexity of the Sum-Index problem over $Z_n$. Our results suggest that the best achievable hub-label size and distance-label size in sparse graphs may be $Θ(\frac{n}{2^{(\log n)^c}})$ for some $0<c < 1$.
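The decoding step of a hub labeling follows directly from the definition: since some common hub lies on a shortest $uv$ path, the distance is the minimum of $d(u,w) + d(w,v)$ over common hubs $w$. The sketch below shows only this decoding, with labels represented as dictionaries from hub to distance (a representation chosen here for illustration); it does not construct hubsets.

```python
import math


def hub_distance(label_u, label_v):
    """Decode d(u, v) from two hub labels {hub: distance to hub}."""
    best = math.inf
    for hub, dist_u in label_u.items():
        dist_v = label_v.get(hub)
        if dist_v is not None:
            best = min(best, dist_u + dist_v)
    return best  # math.inf if the labels share no hub
```

For example, on the path 0 - 1 - 2 the labels `{0: {0: 0, 1: 1}, 1: {1: 0}, 2: {1: 1, 2: 0}}` form a valid hub labeling, with vertex 1 serving as the shared hub for every pair.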
Submitted 21 June, 2019; v1 submitted 19 February, 2019;
originally announced February 2019.
-
Approximating Approximate Pattern Matching
Authors:
Jan Studený,
Przemysław Uznański
Abstract:
Given a text $T$ of length $n$ and a pattern $P$ of length $m$, the approximate pattern matching problem asks for computation of a particular \emph{distance} function between $P$ and every $m$-substring of $T$. We consider a $(1\pm\varepsilon)$ multiplicative approximation variant of this problem, for $\ell_p$ distance function. In this paper, we describe two $(1+\varepsilon)$-approximate algorithms with a runtime of $\widetilde{O}(\frac{n}{\varepsilon})$ for all (constant) non-negative values of $p$. For constant $p \ge 1$ we show a deterministic $(1+\varepsilon)$-approximation algorithm. Previously, such run time was known only for the case of $\ell_1$ distance, by Gawrychowski and Uznański [ICALP 2018] and only with a randomized algorithm. For constant $0 \le p \le 1$ we show a randomized algorithm for the $\ell_p$, thereby providing a smooth tradeoff between algorithms of Kopelowitz and Porat [FOCS~2015, SOSA~2018] for Hamming distance (case of $p=0$) and of Gawrychowski and Uznański for $\ell_1$ distance.
Submitted 23 July, 2019; v1 submitted 3 October, 2018;
originally announced October 2018.
-
Faster Algorithms for All-Pairs Bounded Min-Cuts
Authors:
Amir Abboud,
Loukas Georgiadis,
Giuseppe F. Italiano,
Robert Krauthgamer,
Nikos Parotsidis,
Ohad Trabelsi,
Przemysław Uznański,
Daniel Wolleb-Graf
Abstract:
The All-Pairs Min-Cut problem (aka All-Pairs Max-Flow) asks to compute a minimum $s$-$t$ cut (or just its value) for all pairs of vertices $s,t$. We study this problem in directed graphs with unit edge/vertex capacities (corresponding to edge/vertex connectivity). Our focus is on the $k$-bounded case, where the algorithm has to find all pairs with min-cut value less than $k$, and report only those. The most basic case $k=1$ is the Transitive Closure (TC) problem, which can be solved in graphs with $n$ vertices and $m$ edges in time $O(mn)$ combinatorially, and in time $O(n^ω)$ where $ω<2.38$ is the matrix-multiplication exponent. These time bounds are conjectured to be optimal.
We present new algorithms and conditional lower bounds that advance the frontier for larger $k$, as follows: (i) A randomized algorithm for vertex capacities that runs in time $O((nk)^ω)$. (ii) Two deterministic algorithms for edge capacities (which is more general) that work in DAGs and further report a minimum cut for each pair. The first algorithm is combinatorial (does not involve matrix multiplication) and runs in time $O(2^{O(k^2)}\cdot mn)$. The second algorithm can be faster on dense DAGs and runs in time $O((k\log n)^{4^k+o(k)} n^ω)$. (iii) The first super-cubic lower bound of $n^{ω-1-o(1)} k^2$ time under the $4$-Clique conjecture, which holds even in the simplest case of DAGs with unit vertex capacities. It improves on the previous (SETH-based) lower bounds even in the unbounded setting $k=n$. For combinatorial algorithms, our reduction implies an $n^{2-o(1)} k^2$ conditional lower bound. Thus, we identify new settings where the complexity of the problem is (conditionally) higher than that of TC.
Submitted 21 February, 2019; v1 submitted 16 July, 2018;
originally announced July 2018.
-
A Framework for Searching in Graphs in the Presence of Errors
Authors:
Dariusz Dereniowski,
Stefan Tiegel,
Przemysław Uznański,
Daniel Wolleb-Graf
Abstract:
We consider the problem of searching for an unknown target vertex $t$ in a (possibly edge-weighted) graph. Each \emph{vertex-query} points to a vertex $v$ and the response either admits $v$ is the target or provides any neighbor $s\not=v$ that lies on a shortest path from $v$ to $t$. This model has been introduced for trees by Onak and Parys [FOCS 2006] and for general graphs by Emamjomeh-Zadeh et al. [STOC 2016]. In the latter, the authors provide algorithms for the error-less case and for the independent noise model (where each query independently receives an erroneous answer with known probability $p<1/2$ and a correct one with probability $1-p$).
We study this problem in both the adversarial-error and independent-noise models. First, we show an algorithm that needs $\frac{\log_2 n}{1 - H(r)}$ queries against \emph{adversarial} errors, where the adversary's error rate is bounded by a known constant $r<1/2$. Our algorithm is in fact a simplification of previous work, and our refinement lies in invoking an amortization argument. We then show that our algorithm, coupled with a Chernoff bound argument, leads to an algorithm for independent noise that is simpler and has a query complexity that is both simpler and asymptotically better than that of Emamjomeh-Zadeh et al. [STOC 2016].
Our approach has a wide range of applications. First, it improves and simplifies the Robust Interactive Learning framework proposed by Emamjomeh-Zadeh et al. [NIPS 2017]. Secondly, performing an analogous analysis for \emph{edge-queries} (where a query to edge $e$ returns its endpoint that is closer to the target), we recover (as a special case) a noisy binary search algorithm that is asymptotically optimal, matching the complexity of Feige et al. [SIAM J. Comput. 1994]. Thirdly, we improve and simplify the existing algorithm for searching \emph{unbounded} domains due to Aslam and Dhagat [STOC 1991].
Submitted 5 March, 2020; v1 submitted 5 April, 2018;
originally announced April 2018.
-
Population Protocols Are Fast
Authors:
Adrian Kosowski,
Przemysław Uznański
Abstract:
A population protocol describes a set of state change rules for a population of $n$ indistinguishable finite-state agents (automata), undergoing random pairwise interactions. Within this very basic framework, it is possible to resolve a number of fundamental tasks in distributed computing, including: leader election, aggregate and threshold functions on the population, such as majority computation, and plurality consensus. For the first time, we show that solutions to all of these problems can be obtained \emph{quickly} using finite-state protocols. For any input, the designed finite-state protocols converge under a fair random scheduler to an output which is correct with high probability in expected $O(\mathrm{poly} \log n)$ parallel time. In the same setting, we also show protocols which always reach a valid solution, in expected parallel time $O(n^\varepsilon)$, where the number of states of the interacting automata depends only on the choice of $\varepsilon>0$. The stated time bounds hold for \emph{any} semi-linear predicate computable in the population protocol framework.
The key ingredient of our result is the decentralized design of a hierarchy of phase-clocks, which tick at different rates, with the rates of adjacent clocks separated by a factor of $Θ(\log n)$. The construction of this clock hierarchy relies on a new protocol composition technique, combined with an adapted analysis of a self-organizing process of oscillatory dynamics. This clock hierarchy is used to provide nested synchronization primitives, which allow us to view the population in a global manner and design protocols using a high-level imperative programming language with a (limited) capacity for loops and branching instructions.
Submitted 16 April, 2018; v1 submitted 19 February, 2018;
originally announced February 2018.
-
Almost logarithmic-time space optimal leader election in population protocols
Authors:
Leszek Gąsieniec,
Grzegorz Stachowiak,
Przemysław Uznański
Abstract:
The model of population protocols refers to a large collection of simple indistinguishable entities, frequently called {\em agents}. The agents communicate and perform computation through pairwise interactions. We study fast and space efficient leader election in population of cardinality $n$ governed by a random scheduler, where during each time step the scheduler uniformly at random selects for interaction exactly one pair of agents.
We propose the first $o(\log^2 n)$-time leader election protocol. Our solution operates in expected parallel time $O(\log n\log\log n)$, which is equivalent to $O(n \log n\log\log n)$ pairwise interactions. This is the fastest currently known leader election algorithm in which each agent utilises an asymptotically optimal number of $O(\log\log n)$ states.
The new protocol incorporates and amalgamates successfully the power of assorted {\em synthetic coins} with variable rate {\em phase clocks}.
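To make the random-scheduler model concrete, here is a small simulation sketch. It does not implement the protocol of this paper; it uses the classic two-state rule (when two leaders meet, one becomes a follower), which is known to be far slower. The function name `elect_leader` and the `seed` parameter are assumptions for illustration.

```python
import random

def elect_leader(n, seed=0):
    """Simulate the two-state baseline protocol under a uniform random scheduler.

    Returns (is_leader, interactions), where interactions counts pairwise meetings."""
    rng = random.Random(seed)
    is_leader = [True] * n          # every agent starts as a leader candidate
    leaders = n
    interactions = 0
    while leaders > 1:
        # The scheduler selects exactly one pair of distinct agents per time step.
        a, b = rng.sample(range(n), 2)
        interactions += 1
        if is_leader[a] and is_leader[b]:
            is_leader[b] = False    # the responder gives up leadership
            leaders -= 1
    return is_leader, interactions

is_leader, interactions = elect_leader(100, seed=1)
parallel_time = interactions / 100  # parallel time = interactions divided by n
```

The last line reflects the conversion used in the abstract: $T$ units of parallel time correspond to $nT$ pairwise interactions.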
Submitted 13 May, 2018; v1 submitted 19 February, 2018;
originally announced February 2018.
-
Faster Approximate(d) Text-to-Pattern L1 Distance
Authors:
Przemysław Uznański
Abstract:
The problem of finding the \emph{distance} between a \emph{pattern} of length $m$ and a \emph{text} of length $n$ is a typical way of generalizing pattern matching to incorporate a dissimilarity score. For both Hamming and $L_1$ distances, only superlinear upper bounds of $\widetilde{O}(n\sqrt{m})$ are known, which prompts the question of relaxing the problem: either by asking for a $(1 \pm \varepsilon)$-approximate distance (every distance is reported up to a multiplicative factor), or a $k$-approximated distance (distances exceeding $k$ are reported as $\infty$). We focus on the $L_1$ distance, for which we show new algorithms achieving complexities of $\widetilde{O}(\varepsilon^{-1} n)$ and $\widetilde{O}((m+k\sqrt{m}) \cdot n/m)$, respectively. This is a significant improvement upon previous algorithms with runtime $\widetilde{O}(\varepsilon^{-2} n)$ of Lipsky and Porat [Algorithmica 2011] and $\widetilde{O}(n\sqrt{k})$ of Amir, Lipsky, Porat and Umanski [CPM 2005].
Submitted 2 May, 2018; v1 submitted 27 January, 2018;
originally announced January 2018.
-
Hamming distance completeness and sparse matrix multiplication
Authors:
Daniel Graf,
Karim Labib,
Przemysław Uznański
Abstract:
We show that a broad class of $(+,\diamond)$ vector products (for binary integer functions $\diamond$) are equivalent under one-to-polylog reductions to the computation of the Hamming distance. Examples include: the dominance product, the threshold product and $\ell_{2p+1}$ distances for constant $p$. Our results imply equivalence (up to polylog factors) between the complexity of computing All Pairs: Hamming Distances, $\ell_{2p+1}$ Distances, Dominance Products and Threshold Products. As a consequence, Yuster's~(SODA'09) algorithm improves not only Matoušek's (IPL'91), but also the results of Indyk, Lewenstein, Lipsky and Porat (ICALP'04) and Min, Kao and Zhu (COCOON'09). Furthermore, our reductions apply to the pattern matching setting, showing equivalence (up to polylog factors) between pattern matching under Hamming Distance, $\ell_{2p+1}$ Distance, Dominance Product and Threshold Product, with current best upper bounds due to results of Abrahamson (SICOMP'87), Amir and Farach (Ann.~Math.~Artif.~Intell.'91), Atallah and Duket (IPL'11), Clifford, Clifford and Iliopoulos (CPM'05) and Amir, Lipsky, Porat and Umanski (CPM'05). The resulting algorithms for $\ell_{2p+1}$ Pattern Matching and All Pairs $\ell_{2p+1}$, for $2p+1 = 3,5,7,\dots$ are new.
Additionally, we show that the complexity of AllPairsHammingDistances (and thus of other aforementioned AllPairs- problems) is within polylog from the time it takes to multiply matrices $n \times (n\cdot d)$ and $(n\cdot d) \times n$, each with $(n\cdot d)$ non-zero entries. This means that the current upper bounds by Yuster (SODA'09) cannot be improved without improving the sparse matrix multiplication algorithm by Yuster and Zwick~(ACM TALG'05) and vice versa.
Submitted 3 May, 2018; v1 submitted 10 November, 2017;
originally announced November 2017.
-
Energy Constrained Depth First Search
Authors:
Shantanu Das,
Dariusz Dereniowski,
Przemysław Uznański
Abstract:
Depth first search is a natural algorithmic technique for constructing a closed route that visits all vertices of a graph. The length of such a route equals, in an edge-weighted tree, twice the total weight of all edges of the tree and this is asymptotically optimal over all exploration strategies. This paper considers a variant of such search strategies where the length of each route is bounded by a positive integer $B$ (e.g. due to limited energy resources of the searcher). The objective is to cover all the edges of a tree $T$ using the minimum number of routes, each starting and ending at the root and each being of length at most $B$. To this end, we analyze the following natural greedy tree traversal process that is based on decomposing a depth first search traversal into a sequence of limited length routes. Given an arbitrary depth first search traversal $R$ of the tree $T$, we cover $R$ with routes $R_1,\ldots,R_l$, each of length at most $B$ such that: $R_i$ starts at the root, reaches directly the farthest point of $R$ visited by $R_{i-1}$, then $R_i$ continues along the path $R$ as far as possible, and finally $R_i$ returns to the root. We call the above algorithm \emph{piecemeal-DFS} and we prove that it achieves the asymptotically minimal number of routes $l$, regardless of the choice of $R$. Our analysis also shows that the total length of the traversal (and thus the traversal time) of piecemeal-DFS is asymptotically minimum over all energy-constrained exploration strategies. The fact that $R$ can be chosen arbitrarily means that the exploration strategy can be constructed in an online fashion when the input tree $T$ is not known in advance. Surprisingly, our results show that depth first search is efficient for energy constrained exploration of trees, even though it is known that the same does not hold for energy constrained exploration of arbitrary graphs.
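The greedy decomposition described above can be sketched as follows. This is a simplified illustration, not the paper's exact procedure: it assumes that a route may only be interrupted at a vertex (never in the middle of an edge), that the DFS traversal $R$ is given as a list of vertices `walk` starting and ending at the root, and that `depth` maps each vertex to its weighted distance from the root. The names `piecemeal_dfs`, `walk`, `depth` and `budget` are hypothetical.

```python
def piecemeal_dfs(walk, depth, budget):
    """Split a closed DFS walk into routes of length at most `budget`.

    Each route goes from the root directly to the resume point, follows the
    walk as far as the budget allows, then returns to the root.
    Returns a list of (start_index, end_index, route_length) triples."""
    routes = []
    pos = 0
    last = len(walk) - 1
    while pos < last:
        used = depth[walk[pos]]       # cost of reaching the resume point from the root
        start = pos
        while pos < last:
            nxt = walk[pos + 1]
            # Consecutive walk vertices are adjacent in the tree, so the edge
            # weight equals the difference of their depths.
            step = abs(depth[nxt] - depth[walk[pos]])
            if used + step + depth[nxt] > budget:
                break                 # could not get back to the root in time
            used += step
            pos += 1
        if pos == start:
            raise ValueError("budget too small to make progress")
        routes.append((start, pos, used + depth[walk[pos]]))
    return routes

# Star with root 0 and three unit-weight leaves, budget B = 4.
routes = piecemeal_dfs([0, 1, 0, 2, 0, 3, 0], {0: 0, 1: 1, 2: 1, 3: 1}, 4)
# routes == [(0, 4, 4), (4, 6, 2)]
```

In this toy instance the first route covers two leaves and exhausts the budget, and the second route covers the remaining leaf, so two routes suffice.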
Submitted 17 February, 2018; v1 submitted 28 September, 2017;
originally announced September 2017.
-
Distributed Colour Reduction Revisited
Authors:
Jukka Kohonen,
Janne H. Korhonen,
Christopher Purcell,
Jukka Suomela,
Przemysław Uznański
Abstract:
We give a new, simple distributed algorithm for graph colouring in paths and cycles. Our algorithm is fast and self-contained, it does not need any globally consistent orientation, and it reduces the number of colours from $10^{100}$ to $3$ in three iterations.
Submitted 4 September, 2017;
originally announced September 2017.
-
Robust Detection in Leak-Prone Population Protocols
Authors:
Dan Alistarh,
Bartłomiej Dudek,
Adrian Kosowski,
David Soloveichik,
Przemysław Uznański
Abstract:
In contrast to electronic computation, chemical computation is noisy and susceptible to a variety of sources of error, which has prevented the construction of robust complex systems. To be effective, chemical algorithms must be designed with an appropriate error model in mind. Here we consider the model of chemical reaction networks that preserve molecular count (population protocols), and ask whether computation can be made robust to a natural model of unintended "leak" reactions. Our definition of leak is motivated by both the particular spurious behavior seen when implementing chemical reaction networks with DNA strand displacement cascades, as well as the unavoidable side reactions in any implementation due to the basic laws of chemistry. We develop a new "Robust Detection" algorithm for the problem of fast (logarithmic time) single molecule detection, and prove that it is robust to this general model of leaks. Besides potential applications in single molecule detection, the error-correction ideas developed here might enable a new class of robust-by-design chemical algorithms. Our analysis is based on a non-standard hybrid argument, combining ideas from discrete analysis of population protocols with classic Markov chain techniques.
Submitted 16 August, 2019; v1 submitted 29 June, 2017;
originally announced June 2017.
-
Optimal trade-offs for pattern matching with $k$ mismatches
Authors:
Paweł Gawrychowski,
Przemysław Uznański
Abstract:
Given a pattern of length $m$ and a text of length $n$, the goal in $k$-mismatch pattern matching is to compute, for every $m$-substring of the text, the exact Hamming distance to the pattern or report that it exceeds $k$. This can be solved in either $\widetilde{O}(n \sqrt{k})$ time as shown by Amir et al. [J. Algorithms 2004] or $\widetilde{O}((m + k^2) \cdot n/m)$ time due to a result of Clifford et al. [SODA 2016]. We provide a smooth time trade-off between these two bounds by designing an algorithm working in time $\widetilde{O}( (m + k \sqrt{m}) \cdot n/m)$. We complement this with a matching conditional lower bound, showing that a significantly faster combinatorial algorithm is not possible, unless the combinatorial matrix multiplication conjecture fails.
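For readers who want the problem statement in executable form, the following is a naive baseline sketch of $k$-mismatch pattern matching. It is not the algorithm of the paper and runs in $O(nm)$ time in the worst case; the function name `k_mismatch` and the use of `None` to signal "exceeds $k$" are assumptions for illustration.

```python
def k_mismatch(text, pattern, k):
    """For every length-m window of text, return its Hamming distance to
    pattern if it is at most k, and None otherwise."""
    n, m = len(text), len(pattern)
    result = []
    for i in range(n - m + 1):
        mismatches = 0
        for a, b in zip(text[i:i + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > k:
                    break             # early exit: this window already exceeds k
        result.append(mismatches if mismatches <= k else None)
    return result

print(k_mismatch("abcabd", "abd", 1))  # [1, None, None, 0]
```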
Submitted 5 April, 2017;
originally announced April 2017.
-
Approximation Strategies for Generalized Binary Search in Weighted Trees
Authors:
Dariusz Dereniowski,
Adrian Kosowski,
Przemyslaw Uznanski,
Mengchuan Zou
Abstract:
We consider the following generalization of the binary search problem. A search strategy is required to locate an unknown target node $t$ in a given tree $T$. Upon querying a node $v$ of the tree, the strategy receives as a reply an indication of the connected component of $T\setminus\{v\}$ containing the target $t$. The cost of querying each node is given by a known non-negative weight function, and the considered objective is to minimize the total query cost for a worst-case choice of the target. Designing an optimal strategy for a weighted tree search instance is known to be strongly NP-hard, in contrast to the unweighted variant of the problem which can be solved optimally in linear time. Here, we show that weighted tree search admits a quasi-polynomial time approximation scheme: for any $0 < \varepsilon < 1$, there exists a $(1+\varepsilon)$-approximation strategy with a computation time of $n^{O(\log n / \varepsilon^2)}$. Thus, the problem is not APX-hard, unless $NP \subseteq DTIME(n^{O(\log n)})$. By applying a generic reduction, we obtain as a corollary that the studied problem admits a polynomial-time $O(\sqrt{\log n})$-approximation. This improves previous $\hat O(\log n)$-approximation approaches, where the $\hat O$-notation disregards $O(\mathrm{poly}\log\log n)$-factors.
Submitted 27 February, 2017;
originally announced February 2017.
-
LCL problems on grids
Authors:
Sebastian Brandt,
Juho Hirvonen,
Janne H. Korhonen,
Tuomo Lempiäinen,
Patric R. J. Östergård,
Christopher Purcell,
Joel Rybicki,
Jukka Suomela,
Przemysław Uznański
Abstract:
LCLs or locally checkable labelling problems (e.g. maximal independent set, maximal matching, and vertex colouring) in the LOCAL model of computation are very well-understood in cycles (toroidal 1-dimensional grids): every problem has a complexity of $O(1)$, $Θ(\log^* n)$, or $Θ(n)$, and the design of optimal algorithms can be fully automated.
This work develops the complexity theory of LCL problems for toroidal 2-dimensional grids. The complexity classes are the same as in the 1-dimensional case: $O(1)$, $Θ(\log^* n)$, and $Θ(n)$. However, given an LCL problem it is undecidable whether its complexity is $Θ(\log^* n)$ or $Θ(n)$ in 2-dimensional grids.
Nevertheless, if we correctly guess that the complexity of a problem is $Θ(\log^* n)$, we can completely automate the design of optimal algorithms. For any problem we can find an algorithm that is of a normal form $A' \circ S_k$, where $A'$ is a finite function, $S_k$ is an algorithm for finding a maximal independent set in $k$th power of the grid, and $k$ is a constant.
Finally, partially with the help of automated design tools, we classify the complexity of several concrete LCL problems related to colourings and orientations.
Submitted 24 May, 2017; v1 submitted 17 February, 2017;
originally announced February 2017.
-
Ergodic Effects in Token Circulation
Authors:
Adrian Kosowski,
Przemysław Uznański
Abstract:
We consider a dynamical process in a network which distributes all particles (tokens) located at a node among its neighbors, in a round-robin manner.
We show that in the recurrent state of this dynamics (i.e., disregarding a polynomially long initialization phase of the system), the number of particles located on a given edge, averaged over an interval of time, is tightly concentrated around the average particle density in the system. Formally, for a system of $k$ particles in a graph of $m$ edges, during any interval of length $T$, this time-averaged value is $k/m \pm \widetilde{O}(1/T)$, whenever $\gcd(m,k) = \widetilde{O}(1)$ (and so, e.g., whenever $m$ is a prime number). To achieve these bounds, we link the behavior of the studied dynamics to ergodic properties of traversals based on Eulerian circuits on a symmetric directed graph. These results are proved through sum set methods and are likely to be of independent interest.
As a corollary, we also obtain bounds on the \emph{idleness} of the studied dynamics, i.e., on the longest possible time between two consecutive appearances of a token on an edge, taken over all edges. Designing trajectories for $k$ tokens in a way which minimizes idleness is fundamental to the study of the patrolling problem in networks. Our results immediately imply a bound of $\widetilde{O}(m/k)$ on the idleness of the studied process, showing that it is a distributed $\widetilde{O}(1)$-competitive solution to the patrolling task, for all of the covered cases. Our work also provides some further insights that may be interesting in load-balancing applications.
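The round-robin dynamics studied here (commonly known as the rotor-router model) can be simulated in a few lines. The sketch below assumes synchronous rounds, a graph given as adjacency lists with no isolated nodes, and one pointer per node that remembers where the round-robin left off; the names `rotor_router_step`, `adj`, `tokens` and `pointer` are hypothetical.

```python
def rotor_router_step(adj, tokens, pointer):
    """Perform one synchronous round: every node sends all of its tokens to
    its neighbors one by one in round-robin order. Updates `pointer` in place
    and returns the new token counts per node."""
    new_tokens = [0] * len(adj)
    for v, count in enumerate(tokens):
        degree = len(adj[v])
        for j in range(count):
            target = adj[v][(pointer[v] + j) % degree]
            new_tokens[target] += 1
        # The next round resumes from the neighbor after the last one served.
        pointer[v] = (pointer[v] + count) % degree
    return new_tokens

# Cycle on 5 nodes with k = 3 tokens initially placed at node 0.
n = 5
adj = [[(v + 1) % n, (v - 1) % n] for v in range(n)]
tokens = [3, 0, 0, 0, 0]
pointer = [0] * n
for _ in range(10):
    tokens = rotor_router_step(adj, tokens, pointer)
```

Counting how often a token crosses a fixed edge over a window of $T$ rounds in such a simulation gives an empirical view of the time-averaged load discussed in the abstract.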
Submitted 3 November, 2017; v1 submitted 29 December, 2016;
originally announced December 2016.
-
All-Pairs 2-Reachability in $\mathcal{O}(n^ω\log n)$ Time
Authors:
Loukas Georgiadis,
Daniel Graf,
Giuseppe F. Italiano,
Nikos Parotsidis,
Przemysław Uznański
Abstract:
In the $2$-reachability problem we are given a directed graph $G$ and we wish to determine if there are two (edge or vertex) disjoint paths from $u$ to $v$, for a given pair of vertices $u$ and $v$. In this paper, we present an algorithm that computes $2$-reachability information for all pairs of vertices in $\mathcal{O}(n^ω\log n)$ time, where $n$ is the number of vertices and $ω$ is the matrix multiplication exponent. Hence, we show that the running time of all-pairs $2$-reachability is only within a $\log$ factor of transitive closure.
Moreover, our algorithm produces a witness (i.e., a separating edge or a separating vertex) for all pairs of vertices where $2$-reachability does not hold. By processing these witnesses, we can compute all the edge- and vertex-dominator trees of $G$ in $\mathcal{O}(n^2)$ additional time, which in turn enables us to answer various connectivity queries in $\mathcal{O}(1)$ time. For instance, we can test in constant time if there is a path from $u$ to $v$ avoiding an edge $e$, for any pair of query vertices $u$ and $v$, and any query edge $e$, or if there is a path from $u$ to $v$ avoiding a vertex $w$, for any query vertices $u$, $v$, and $w$.
Submitted 26 July, 2017; v1 submitted 23 December, 2016;
originally announced December 2016.
-
A note on distance labeling in planar graphs
Authors:
Paweł Gawrychowski,
Przemysław Uznański
Abstract:
A distance labeling scheme is an assignment of labels, that is binary strings, to all nodes of a graph, so that the distance between any two nodes can be computed from their labels and the labels are as short as possible. A major open problem is to determine the complexity of distance labeling in unweighted and undirected planar graphs. It is known that, in such a graph on $n$ nodes, some labels must consist of $Ω(n^{1/3})$ bits, but the best known labeling scheme uses labels of length $O(\sqrt{n}\log n)$ [Gavoille, Peleg, Pérennes, and Raz, J. Algorithms, 2004]. We show that, in fact, labels of length $O(\sqrt{n})$ are enough.
Submitted 20 November, 2016;
originally announced November 2016.
-
Tight Tradeoffs for Real-Time Approximation of Longest Palindromes in Streams
Authors:
Paweł Gawrychowski,
Oleg Merkurev,
Arseny M. Shur,
Przemysław Uznański
Abstract:
We consider computing a longest palindrome in the streaming model, where the symbols arrive one-by-one and we do not have random access to the input. While computing the answer exactly using sublinear space is not possible in such a setting, one can still hope for a good approximation guarantee. Our contribution is twofold. First, we provide lower bounds on the space requirements for randomized approximation algorithms processing inputs of length $n$. We rule out Las Vegas algorithms, as they cannot achieve sublinear space complexity. For Monte Carlo algorithms, we prove a lower bound of $Ω( M \log\min\{|Σ|,M\})$ bits of memory; here $M=n/E$ for approximating the answer with additive error $E$, and $M= \frac{\log n}{\log (1+\varepsilon)}$ for approximating the answer with multiplicative error $(1 + \varepsilon)$. Second, we design three real-time algorithms for this problem. Our Monte Carlo approximation algorithms for both additive and multiplicative versions of the problem use $O(M)$ words of memory. Thus the obtained lower bounds are asymptotically tight up to a logarithmic factor. The third algorithm is deterministic and finds a longest palindrome exactly if it is short. This algorithm can be run in parallel with a Monte Carlo algorithm to obtain better results in practice. Overall, both the time and space complexity of finding a longest palindrome in a stream are essentially settled.
Submitted 10 October, 2016;
originally announced October 2016.
-
Randomized algorithms for finding a majority element
Authors:
Paweł Gawrychowski,
Jukka Suomela,
Przemysław Uznański
Abstract:
Given $n$ colored balls, we want to detect if more than $\lfloor n/2\rfloor$ of them have the same color, and if so find one ball with such majority color. We are only allowed to choose two balls and compare their colors, and the goal is to minimize the total number of such operations. A well-known exercise is to show how to find such a ball with only $2n$ comparisons while using only a logarithmic number of bits for bookkeeping. The resulting algorithm is called the Boyer--Moore majority vote algorithm. It is known that any deterministic method needs $\lceil 3n/2\rceil-2$ comparisons in the worst case, and this is tight. However, it is not clear what is the required number of comparisons if we allow randomization. We construct a randomized algorithm which always correctly finds a ball of the majority color (or detects that there is none) using, with high probability, only $7n/6+o(n)$ comparisons. We also prove that the expected number of comparisons used by any such randomized method is at least $1.019n$.
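The Boyer--Moore majority vote algorithm mentioned above can be sketched as follows. This is the deterministic textbook baseline, not the randomized algorithm of the paper. It only ever compares the colors of two balls, keeps one candidate index and one counter, and uses fewer than $2n$ comparisons over its two passes. The function name `majority` is an assumption for illustration.

```python
def majority(balls):
    """Return the index of a ball of the majority color, or None if no color
    occurs more than floor(n/2) times."""
    candidate = None
    count = 0
    # Pass 1: pair off balls of different colors; a majority color survives.
    for i in range(len(balls)):
        if count == 0:
            candidate, count = i, 1
        elif balls[i] == balls[candidate]:
            count += 1
        else:
            count -= 1
    if candidate is None:
        return None
    # Pass 2: verify the candidate, since pass 1 alone may return a non-majority color.
    occurrences = sum(1 for b in balls if b == balls[candidate])
    return candidate if occurrences > len(balls) // 2 else None

balls = ["r", "b", "r", "r", "g"]
print(balls[majority(balls)])  # r
```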
Submitted 28 April, 2016; v1 submitted 4 March, 2016;
originally announced March 2016.
-
Prime Factorization of the Kirchhoff Polynomial: Compact Enumeration of Arborescences
Authors:
Matúš Mihalák,
Przemysław Uznański,
Pencho Yordanov
Abstract:
We study the problem of enumerating all rooted directed spanning trees (arborescences) of a directed graph (digraph) $G=(V,E)$ of $n$ vertices. An arborescence $A$ consisting of edges $e_1,\ldots,e_{n-1}$ can be represented as a monomial $e_1\cdot e_2 \cdots e_{n-1}$ in variables $e \in E$. All arborescences $\mathsf{arb}(G)$ of a digraph then define the Kirchhoff polynomial $\sum_{A \in \mathsf{arb}(G)} \prod_{e\in A} e$. We show how to compute a compact representation of the Kirchhoff polynomial -- its prime factorization, and how it relates to combinatorial properties of digraphs such as strong connectivity and vertex domination. In particular, we provide digraph decomposition rules that correspond to factorization steps of the polynomial, and also give necessary and sufficient primality conditions of the resulting factors expressed by connectivity properties of the corresponding decomposed components. Thereby, we obtain a linear time algorithm for decomposing a digraph into components corresponding to factors of the initial polynomial, and a guarantee that no finer factorization is possible. The decomposition serves as a starting point for a recursive deletion-contraction algorithm, and also as a preprocessing phase for iterative enumeration algorithms. Both approaches produce a compressed output and retain some structural properties in the resulting polynomial. This proves advantageous in practical applications such as calculating steady states on digraphs governed by Laplacian dynamics, or computing the greatest common divisor of Kirchhoff polynomials. Finally, we initiate the study of a class of digraphs which allow for a practical enumeration of arborescences. Using our decomposition rules we observe that various digraphs from real-world applications fall into this class or are structurally similar to it.
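To illustrate what the Kirchhoff polynomial enumerates, the following brute-force sketch lists the arborescences of a tiny digraph; each returned edge tuple corresponds to one monomial. It is exponential-time and unrelated to the paper's factorization method. Several conventions here are assumptions: the root is fixed, edges point away from the root, edges are given as distinct `(u, v)` pairs, and the names `arborescences` and `reaches_root` are hypothetical.

```python
from itertools import product

def reaches_root(v, parent, root, limit):
    """Follow parent pointers from v; True if the root is reached within limit steps."""
    for _ in range(limit):
        if v == root:
            return True
        v = parent[v]
    return v == root

def arborescences(vertices, edges, root):
    """Enumerate spanning arborescences rooted at `root` (edges directed away from it)."""
    incoming = {v: [e for e in edges if e[1] == v] for v in vertices if v != root}
    found = []
    # Pick exactly one incoming edge per non-root vertex, then reject choices with cycles.
    for choice in product(*incoming.values()):
        parent = {v: u for (u, v) in choice}
        if all(reaches_root(v, parent, root, len(vertices)) for v in parent):
            found.append(tuple(sorted(choice)))
    return found

trees = arborescences([0, 1, 2], [(0, 1), (0, 2), (1, 2), (2, 1)], root=0)
# Three arborescences, i.e. the polynomial e01*e02 + e01*e12 + e02*e21.
```

In this example the polynomial can be regrouped as $e_{01}(e_{02}+e_{12}) + e_{02}e_{21}$, which hints at why a compact representation is preferable to listing every monomial.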
Submitted 28 July, 2015;
originally announced July 2015.
-
Sublinear-Space Distance Labeling using Hubs
Authors:
Paweł Gawrychowski,
Adrian Kosowski,
Przemysław Uznański
Abstract:
A distance labeling scheme is an assignment of bit-labels to the vertices of an undirected, unweighted graph such that the distance between any pair of vertices can be decoded solely from their labels. We propose a series of new labeling schemes within the framework of so-called hub labeling (HL, also known as landmark labeling or 2-hop-cover labeling), in which each node $u$ stores its distance to all nodes from an appropriately chosen set of hubs $S(u) \subseteq V$. For a queried pair of nodes $(u,v)$, the length of a shortest $u-v$-path passing through a hub node from $S(u)\cap S(v)$ is then used as an upper bound on the distance between $u$ and $v$.
We present a hub labeling which allows us to decode exact distances in sparse graphs using labels of size sublinear in the number of nodes. For graphs with at most $n$ nodes and average degree $Δ$, the tradeoff between label bit size $L$ and query decoding time $T$ for our approach is given by $L = O(n \log \log_Δ T / \log_Δ T)$, for any $T \leq n$. Our simple approach is thus the first sublinear-space distance labeling for sparse graphs that simultaneously admits small decoding time (for constant $Δ$, we can achieve any $T=ω(1)$ while maintaining $L=o(n)$), and it also provides an improvement in terms of label size with respect to previous slower approaches.
By using similar techniques, we then present a $2$-additive labeling scheme for general graphs, i.e., one in which the decoder provides a 2-additive-approximation of the distance between any pair of nodes. We achieve almost the same label size-time tradeoff $L = O(n \log^2 \log T / \log T)$, for any $T \leq n$. To our knowledge, this is the first additive scheme with constant absolute error to use labels of sublinear size. The corresponding decoding time is then small (any $T=ω(1)$ is sufficient).
Submitted 20 July, 2016; v1 submitted 22 July, 2015;
originally announced July 2015.
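The hub-based query described in the abstract above can be sketched as follows. This is a minimal illustration, not the labeling scheme from the paper: the name `hub_distance`, the dictionary representation of labels, and the toy hub sets are all assumptions made for the example. Each label maps a hub to the stored distance, and the query returns the length of the shortest path through a common hub.

```python
from math import inf


def hub_distance(label_u, label_v):
    """Return the best distance through a hub shared by both labels.

    Each label is a dict {hub: distance from the node to that hub}.
    The result is an upper bound on the true distance; it is exact
    when some shortest u-v path passes through a common hub.
    """
    # Iterate over the smaller label to keep the query cheap.
    if len(label_u) > len(label_v):
        label_u, label_v = label_v, label_u
    best = inf
    for hub, dist_u in label_u.items():
        dist_v = label_v.get(hub)
        if dist_v is not None:
            best = min(best, dist_u + dist_v)
    return best


# Hypothetical labels for the path graph 0 - 1 - 2 - 3,
# with nodes 1 and 2 used as hubs by every node.
labels = {
    0: {1: 1, 2: 2},
    1: {1: 0, 2: 1},
    2: {1: 1, 2: 0},
    3: {1: 2, 2: 1},
}
```

With these toy labels every pair of nodes has a shortest path through hub 1 or hub 2, so the query is exact; the difficulty addressed by the paper is choosing hub sets so that labels stay sublinear in size.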
-
All Permutations Supersequence is coNP-complete
Authors:
Przemysław Uznański
Abstract:
We prove that deciding whether a given input word contains as a subsequence every possible permutation of integers $\{1,2,\ldots,n\}$ is coNP-complete. The coNP-completeness holds even when given the guarantee that the input word contains as subsequences all sequences of length $n-1$ over the same set of integers. We also show NP-completeness of a related problem of Partially Non-crossing Perfect Matching in Bipartite Graphs, i.e., to find a perfect matching in an ordered bipartite graph where edges of the matching incident to selected vertices (even only from one side) are non-crossing.
Submitted 8 July, 2015; v1 submitted 16 June, 2015;
originally announced June 2015.
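To make the decision problem in the abstract above concrete, here is a brute-force checker. It is only a sketch for small $n$ (the function names are hypothetical and it tries all $n!$ permutations); given the coNP-completeness result, no efficient general algorithm should be expected.

```python
from itertools import permutations


def is_subsequence(pattern, word):
    """True if pattern occurs in word as a (not necessarily contiguous) subsequence."""
    remaining = iter(word)
    # Each membership test consumes the iterator up to the matched symbol.
    return all(symbol in remaining for symbol in pattern)


def contains_all_permutations(word, n):
    """True if every permutation of 1..n is a subsequence of word."""
    return all(is_subsequence(p, word) for p in permutations(range(1, n + 1)))
```

For example, the word `[1, 2, 1]` contains both permutations of $\{1,2\}$, while `[1, 2]` misses the permutation `(2, 1)`, which serves as a short certificate for the "no" answer.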
-
On Convergence and Threshold Properties of Discrete Lotka-Volterra Population Protocols
Authors:
Jurek Czyzowicz,
Leszek Gasieniec,
Adrian Kosowski,
Evangelos Kranakis,
Paul G. Spirakis,
Przemyslaw Uznanski
Abstract:
In this work we focus on a natural class of population protocols whose dynamics are modelled by the discrete version of Lotka-Volterra equations. In such protocols, when an agent $a$ of type (species) $i$ interacts with an agent $b$ of type (species) $j$ with $a$ as the initiator, then $b$'s type becomes $i$ with probability $P_{ij}$. In such an interaction, we think of $a$ as the predator, $b$ as the prey, and the type of the prey is either converted to that of the predator or stays as is. Such protocols capture the dynamics of some opinion spreading models and generalize the well-known Rock-Paper-Scissors discrete dynamics. We consider the pairwise interactions among agents that are scheduled uniformly at random. We start by considering the convergence time and show that any Lotka-Volterra-type protocol on an $n$-agent population converges to some absorbing state in time polynomial in $n$, w.h.p., when any pair of agents is allowed to interact. By contrast, when the interaction graph is a star, even the Rock-Paper-Scissors protocol requires exponential time to converge. We then study threshold effects exhibited by Lotka-Volterra-type protocols with 3 or more species under interactions between any pair of agents. We start by presenting a simple 4-type protocol in which the probability difference of reaching the two possible absorbing states is strongly amplified by the ratio of the initial populations of the two other types, which are transient, but "control" convergence. We then prove that the Rock-Paper-Scissors protocol reaches each of its three possible absorbing states with almost equal probability, starting from any configuration satisfying some sub-linear lower bound on the initial size of each species. That is, Rock-Paper-Scissors is a realization of a "coin-flip consensus" in a distributed system. Some of our techniques may be of independent value.
Submitted 31 March, 2015;
originally announced March 2015.
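The interaction rule from the abstract above can be simulated directly. The sketch below is an illustration under stated assumptions, not the authors' code: the function name, the matrix encoding of $P_{ij}$, and the `max_steps` guard are hypothetical, every pair of agents may interact, and the loop stops only when a single type remains (which is the absorbing condition for Rock-Paper-Scissors but not necessarily for every matrix $P$).

```python
import random


def simulate_protocol(counts, P, rng, max_steps=10**6):
    """Run a discrete Lotka-Volterra-type protocol with a uniform random scheduler.

    counts[i] is the initial number of agents of type i, and P[i][j] is the
    probability that a responder of type j adopts type i from the initiator.
    Returns (final counts, number of interactions performed).
    """
    types = [t for t, c in enumerate(counts) for _ in range(c)]
    alive = list(counts)
    n = len(types)
    for step in range(max_steps):
        if sum(1 for c in alive if c > 0) == 1:
            return alive, step
        a, b = rng.sample(range(n), 2)  # a is the initiator (predator), b the prey
        i, j = types[a], types[b]
        if i != j and rng.random() < P[i][j]:
            types[b] = i
            alive[i] += 1
            alive[j] -= 1
    return alive, max_steps


# Rock (0) beats scissors (2), paper (1) beats rock (0), scissors (2) beats paper (1).
RPS = [
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
```

Calling `simulate_protocol([10, 10, 10], RPS, random.Random(0))` runs one trajectory on 30 agents; repeating it over many seeds gives an empirical view of how often each species wins.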
-
Time and space optimality of rotor-router graph exploration
Authors:
Artur Menc,
Dominik Pająk,
Przemysław Uznański
Abstract:
We consider the problem of exploration of an anonymous, port-labeled, undirected graph with $n$ nodes and $m$ edges and diameter $D$, by a single mobile agent. Initially the agent knows neither the graph topology nor any of the global parameters. Moreover, the agent does not know the incoming port when entering a vertex. Each vertex is endowed with memory that can be read and modified by the agent upon its visit to that node. However, the agent has no operational memory, i.e., it cannot carry any state while traversing an edge. In such a model at least $\log_2 d$ bits are needed at each vertex of degree $d$ for the agent to be able to traverse each graph edge. This number of bits is always sufficient to explore any graph in time $O(mD)$ using the Rotor-Router algorithm. We show that even if the available node memory is unlimited, time $Ω(n^3)$ is sometimes required for any algorithm. This shows that Rotor-Router is asymptotically optimal on worst-case graphs. Secondly, we show that for the case of the path, the Rotor-Router attains exactly optimal time.
Submitted 29 November, 2015; v1 submitted 19 February, 2015;
originally announced February 2015.
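The single-agent Rotor-Router mentioned in the abstract above is short enough to sketch. The code below is an illustrative simulation under assumptions of my own: the graph is connected and given as adjacency lists whose order plays the role of the port ordering, all pointers start at port 0, and the function (a hypothetical name) reports the number of steps until every node has been visited rather than until every edge has been traversed.

```python
def rotor_router_cover_time(adj, start):
    """Simulate a single rotor-router walk and count steps until all nodes are visited.

    adj[v] lists the neighbours of v in port order; the only state is one
    pointer per node, so the agent itself carries no memory.
    """
    pointer = [0] * len(adj)
    visited = {start}
    v, steps = start, 0
    while len(visited) < len(adj):
        port = pointer[v]
        pointer[v] = (port + 1) % len(adj[v])  # advance the rotor in round-robin order
        v = adj[v][port]                       # leave through the port the rotor pointed at
        visited.add(v)
        steps += 1
    return steps
```

On the path `[[1], [0, 2], [1]]` started at node 0, the agent bounces back once from the middle node before reaching the far end, which is the kind of behaviour the $O(mD)$ bound accounts for.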
-
Tight tradeoffs for approximating palindromes in streams
Authors:
Paweł Gawrychowski,
Przemysław Uznański
Abstract:
We consider computing the longest palindrome in a text of length $n$ in the streaming model, where the characters arrive one-by-one, and we do not have random access to the input. While computing the answer exactly using sublinear memory is not possible in such a setting, one can still hope for a good approximation guarantee.
We focus on the two most natural variants, where we aim for either additive or multiplicative approximation of the length of the longest palindrome. We first show that there is no point in considering Las Vegas algorithms in such a setting, as they cannot achieve sublinear space complexity. For Monte Carlo algorithms, we provide a lower bound of $Ω(\frac{n}{E})$ bits for approximating the answer with additive error $E$, and $Ω(\frac{\log n}{\log(1+\varepsilon)})$ bits for approximating the answer with multiplicative error $(1+\varepsilon)$ for the binary alphabet. Then, we construct a generic Monte Carlo algorithm, which by choosing the parameters appropriately achieves space complexity matching up to a logarithmic factor for both variants. This substantially improves the previous results by Berenbrink et al. (STACS 2014) and essentially settles the space complexity.
Submitted 7 April, 2016; v1 submitted 23 October, 2014;
originally announced October 2014.
-
Lock-in Problem for Parallel Rotor-router Walks
Authors:
Jérémie Chalopin,
Shantanu Das,
Pawel Gawrychowski,
Adrian Kosowski,
Arnaud Labourel,
Przemyslaw Uznański
Abstract:
The rotor-router model, also called the Propp machine, was introduced as a deterministic alternative to the random walk. In this model, a group of identical tokens are initially placed at nodes of the graph. Each node maintains a cyclic ordering of the outgoing arcs, and during consecutive turns the tokens are propagated along arcs chosen according to this ordering in round-robin fashion. The behavior of the model is fully deterministic. Yanovski et al. (2003) proved that a single rotor-router walk on any graph with $m$ edges and diameter $D$ stabilizes to a traversal of an Eulerian circuit on the set of all $2m$ directed arcs on the edge set of the graph, and that such periodic behaviour of the system is achieved after an initial transient phase of at most $2mD$ steps. The case of multiple parallel rotor-routers was studied experimentally, leading Yanovski et al. to the conjecture that a system of $k > 1$ parallel walks also stabilizes with a period of length at most $2m$ steps. In this work we disprove this conjecture, showing that the period of parallel rotor-router walks can, in fact, be superpolynomial in the size of the graph. On the positive side, we provide a characterization of the periodic behavior of parallel rotor-router walks, in terms of a structural property of stable states called a subcycle decomposition. This property provides us with the tools to efficiently detect whether a given system configuration corresponds to the transient or to the limit behavior of the system. Moreover, we provide polynomial upper bounds of $O(m^4 D^2 + mD \log k)$ and $O(m^5 k^2)$ on the number of steps it takes for the system to stabilize. Thus, we are able to predict any future behavior of the system using an algorithm that takes polynomial time and space.
In addition, we show that there exists a separation between the stabilization time of the single-walk and multiple-walk rotor-router systems, and that for some graphs the latter can be asymptotically larger even for the case of $k = 2$ walks.
Submitted 28 May, 2015; v1 submitted 10 July, 2014;
originally announced July 2014.
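The parallel model in the abstract above can be explored experimentally with a small simulator. This is a brute-force sketch, not the polynomial-time method from the paper: it stores every configuration it has seen, and the function name, the adjacency-list port ordering, the initial pointers at port 0, and the choice to define a configuration as (pointers, token counts per node) are all assumptions made for illustration.

```python
def parallel_rotor_router_period(adj, tokens):
    """Simulate synchronous parallel rotor-router walks until a configuration repeats.

    adj[v] lists the neighbours of v in cyclic port order and tokens[v] is the
    number of indistinguishable tokens initially at v.
    Returns (length of the transient phase, length of the period).
    """
    pointer = [0] * len(adj)
    tokens = list(tokens)
    seen = {}
    step = 0
    while True:
        config = (tuple(pointer), tuple(tokens))
        if config in seen:
            return seen[config], step - seen[config]
        seen[config] = step
        moved = [0] * len(adj)
        for v, count in enumerate(tokens):
            # Tokens at v leave one by one along consecutive ports.
            for _ in range(count):
                moved[adj[v][pointer[v]]] += 1
                pointer[v] = (pointer[v] + 1) % len(adj[v])
        tokens = moved
        step += 1
```

For a single token on the path `[[1], [0, 2], [1]]` the simulator reports a period of 4, matching the Eulerian circuit on the $2m = 4$ arcs; passing several tokens in `tokens` lets one observe how the period changes with $k$.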
-
Rendezvous of Distance-aware Mobile Agents in Unknown Graphs
Authors:
Shantanu Das,
Dariusz Dereniowski,
Adrian Kosowski,
Przemyslaw Uznanski
Abstract:
We study the problem of rendezvous of two mobile agents starting at distinct locations in an unknown graph. The agents have distinct labels and walk in synchronous steps. However, the graph is unlabelled and the agents have no means of marking the nodes of the graph and cannot communicate with or see each other until they meet at a node. When the graph is very large we want the time to rendezvous to be independent of the graph size and to depend only on the initial distance between the agents and some local parameters such as the degree of the vertices, and the size of the agent's label. It is well known that even for simple graphs of degree $Δ$, the rendezvous time can be exponential in $Δ$ in the worst case. In this paper, we introduce a new version of the rendezvous problem where each agent is equipped with a device that measures its distance to the other agent after every step. We show that these distance-aware agents are able to rendezvous in any unknown graph, in time polynomial in all the local parameters such as the degree of the nodes, the initial distance $D$ and the size of the smaller of the two agent labels $l = \min(l_1, l_2)$. Our algorithm has a time complexity of $O(Δ(D+\log{l}))$ and we show an almost matching lower bound of $Ω(Δ(D+\log{l}/\log Δ))$ on the time complexity of any rendezvous algorithm in our scenario. Further, this lower bound extends existing lower bounds for the general rendezvous problem without distance awareness.
Submitted 11 June, 2014;
originally announced June 2014.
-
Improved Analysis of Deterministic Load-Balancing Schemes
Authors:
Petra Berenbrink,
Ralf Klasing,
Adrian Kosowski,
Frederik Mallmann-Trenn,
Przemyslaw Uznanski
Abstract:
We consider the problem of deterministic load balancing of tokens in the discrete model. A set of $n$ processors is connected into a $d$-regular undirected network. In every time step, each processor exchanges some of its tokens with each of its neighbors in the network. The goal is to minimize the discrepancy between the number of tokens on the most-loaded and the least-loaded processor as quickly as possible.
Rabani et al. (1998) present a general technique for the analysis of a wide class of discrete load balancing algorithms. Their approach is to characterize the deviation between the actual loads of a discrete balancing algorithm with the distribution generated by a related Markov chain. The Markov chain can also be regarded as the underlying model of a continuous diffusion algorithm. Rabani et al. showed that after time $T = O(\log (Kn)/μ)$, any algorithm of their class achieves a discrepancy of $O(d\log n/μ)$, where $μ$ is the spectral gap of the transition matrix of the graph, and $K$ is the initial load discrepancy in the system.
In this work we identify some natural additional conditions on deterministic balancing algorithms, resulting in a class of algorithms reaching a smaller discrepancy. This class contains well-known algorithms, e.g., the Rotor-Router.
Specifically, we introduce the notion of cumulatively fair load-balancing algorithms where in any interval of consecutive time steps, the total number of tokens sent out over an edge by a node is the same (up to constants) for all adjacent edges. We prove that algorithms which are cumulatively fair and where every node retains a sufficient part of its load in each step, achieve a discrepancy of $O(\min\{d\sqrt{\log n/μ},d\sqrt{n}\})$ in time $O(T)$. We also show that in general neither of these assumptions may be omitted without increasing discrepancy. We then show by a combinatorial potential reduction argument that any cumulatively fair scheme satisfying some additional assumptions achieves a discrepancy of $O(d)$ almost as quickly as the continuous diffusion process. This positive result applies to some of the simplest and most natural discrete load balancing schemes.
Submitted 22 February, 2015; v1 submitted 16 April, 2014;
originally announced April 2014.
-
Order-preserving pattern matching with k mismatches
Authors:
Pawel Gawrychowski,
Przemyslaw Uznanski
Abstract:
We study a generalization of the recently introduced order-preserving pattern matching, where instead of looking for an exact copy of the pattern, we only require that the relative order between the elements is the same. In our variant, we additionally allow up to $k$ mismatches between the pattern and the text, and the goal is to construct an efficient algorithm for small values of $k$. For a pattern of length $m$ and a text of length $n$, our algorithm detects an order-preserving occurrence with up to $k$ mismatches in $O(n(\log\log m + k\log\log k))$ time.
Submitted 5 March, 2014; v1 submitted 25 September, 2013;
originally announced September 2013.
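A naive baseline helps pin down the problem in the abstract above. The sketch below is not the paper's algorithm: it assumes that all values are distinct and that a mismatch is a position that may be deleted from both the pattern and the aligned text window, and the function names are hypothetical. Under those assumptions, sorting the positions by pattern value reduces the count of mismatches in one window to a longest increasing subsequence computation, giving roughly $O(nm\log m)$ time overall.

```python
from bisect import bisect_left


def op_mismatches(pattern, window):
    """Minimum number of aligned positions to delete so the rest is order-isomorphic."""
    # Visit positions in increasing order of pattern value; the kept positions
    # must then have increasing window values, i.e. form an increasing subsequence.
    order = sorted(range(len(pattern)), key=pattern.__getitem__)
    tails = []  # tails[i] = smallest last value of an increasing subsequence of length i + 1
    for idx in order:
        value = window[idx]
        pos = bisect_left(tails, value)
        if pos == len(tails):
            tails.append(value)
        else:
            tails[pos] = value
    return len(pattern) - len(tails)


def op_occurrences(pattern, text, k):
    """Start positions where the text window matches the pattern with at most k mismatches."""
    m = len(pattern)
    return [s for s in range(len(text) - m + 1)
            if op_mismatches(pattern, text[s:s + m]) <= k]
```

For the pattern `[1, 2, 3]` and text `[5, 1, 7, 9]`, only the window starting at position 1 is increasing, while the window at position 0 becomes order-isomorphic after deleting one position.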