-
Towards Testing Monotonicity of Distributions Over General Posets
Authors:
Maryam Aliakbarpour,
Themis Gouleakis,
John Peebles,
Ronitt Rubinfeld,
Anak Yodpinyanee
Abstract:
In this work, we consider the sample complexity required for testing the monotonicity of distributions over partial orders. A distribution $p$ over a poset is monotone if, for any pair of domain elements $x$ and $y$ such that $x \preceq y$, $p(x) \leq p(y)$. To understand the sample complexity of this problem, we introduce a new property called bigness over a finite domain, where the distribution…
▽ More
In this work, we consider the sample complexity required for testing the monotonicity of distributions over partial orders. A distribution $p$ over a poset is monotone if, for any pair of domain elements $x$ and $y$ such that $x \preceq y$, $p(x) \leq p(y)$. To understand the sample complexity of this problem, we introduce a new property called bigness over a finite domain, where the distribution is $T$-big if the minimum probability for any domain element is at least $T$. We establish a lower bound of $Ω(n/\log n)$ for testing bigness of distributions on domains of size $n$. We then build on these lower bounds to give $Ω(n/\log{n})$ lower bounds for testing monotonicity over a matching poset of size $n$ and significantly improved lower bounds over the hypercube poset. We give sublinear sample complexity bounds for testing bigness and for testing monotonicity over the matching poset.
We then give a number of tools for analyzing upper bounds on the sample complexity of
the monotonicity testing problem.
△ Less
Submitted 6 July, 2019;
originally announced July 2019.
-
Local Computation Algorithms for Spanners
Authors:
Merav Parter,
Ronitt Rubinfeld,
Ali Vakilian,
Anak Yodpinyanee
Abstract:
A graph spanner is a fundamental graph structure that faithfully preserves the pairwise distances in the input graph up to a small multiplicative stretch. The common objective in the computation of spanners is to achieve the best-known existential size-stretch trade-off efficiently.
Classical models and algorithmic analysis of graph spanners essentially assume that the algorithm can read the inp…
▽ More
A graph spanner is a fundamental graph structure that faithfully preserves the pairwise distances in the input graph up to a small multiplicative stretch. The common objective in the computation of spanners is to achieve the best-known existential size-stretch trade-off efficiently.
Classical models and algorithmic analysis of graph spanners essentially assume that the algorithm can read the input graph, construct the desired spanner, and write the answer to the output tape. However, when considering massive graphs containing millions or even billions of nodes not only the input graph, but also the output spanner might be too large for a single processor to store.
To tackle this challenge, we initiate the study of local computation algorithms (LCAs) for graph spanners in general graphs, where the algorithm should locally decide whether a given edge $(u,v) \in E$ belongs to the output spanner. Such LCAs give the user the `illusion' that a specific sparse spanner for the graph is maintained, without ever fully computing it. We present the following results:
-For general $n$-vertex graphs and $r \in \{2,3\}$, there exists an LCA for $(2r-1)$-spanners with $\widetilde{O}(n^{1+1/r})$ edges and sublinear probe complexity of $\widetilde{O}(n^{1-1/2r})$. These size/stretch tradeoffs are best possible (up to polylogarithmic factors).
-For every $k \geq 1$ and $n$-vertex graph with maximum degree $Δ$, there exists an LCA for $O(k^2)$ spanners with $\widetilde{O}(n^{1+1/k})$ edges, probe complexity of $\widetilde{O}(Δ^4 n^{2/3})$, and random seed of size $\mathrm{polylog}(n)$. This improves upon, and extends the work of [Lenzen-Levi, 2018].
We also complement our results by providing a polynomial lower bound on the probe complexity of LCAs for graph spanners that holds even for the simpler task of computing a sparse connected subgraph with $o(m)$ edges.
△ Less
Submitted 21 February, 2019;
originally announced February 2019.
-
Set Cover in Sub-linear Time
Authors:
Piotr Indyk,
Sepideh Mahabadi,
Ronitt Rubinfeld,
Ali Vakilian,
Anak Yodpinyanee
Abstract:
We study the classic set cover problem from the perspective of sub-linear algorithms. Given access to a collection of $m$ sets over $n$ elements in the query model, we show that sub-linear algorithms derived from existing techniques have almost tight query complexities.
On one hand, first we show an adaptation of the streaming algorithm presented in Har-Peled et al. [2016] to the sub-linear quer…
▽ More
We study the classic set cover problem from the perspective of sub-linear algorithms. Given access to a collection of $m$ sets over $n$ elements in the query model, we show that sub-linear algorithms derived from existing techniques have almost tight query complexities.
On one hand, first we show an adaptation of the streaming algorithm presented in Har-Peled et al. [2016] to the sub-linear query model, that returns an $α$-approximate cover using $\tilde{O}(m(n/k)^{1/(α-1)} + nk)$ queries to the input, where $k$ denotes the value of a minimum set cover. We then complement this upper bound by proving that for lower values of $k$, the required number of queries is $\tildeΩ(m(n/k)^{1/(2α)})$, even for estimating the optimal cover size. Moreover, we prove that even checking whether a given collection of sets covers all the elements would require $Ω(nk)$ queries. These two lower bounds provide strong evidence that the upper bound is almost tight for certain values of the parameter $k$.
On the other hand, we show that this bound is not optimal for larger values of the parameter $k$, as there exists a $(1+\varepsilon)$-approximation algorithm with $\tilde{O}(mn/k\varepsilon^2)$ queries. We show that this bound is essentially tight for sufficiently small constant $\varepsilon$, by establishing a lower bound of $\tildeΩ(mn/k)$ query complexity.
△ Less
Submitted 9 February, 2019;
originally announced February 2019.
-
Local Access to Huge Random Objects through Partial Sampling
Authors:
Amartya Shankha Biswas,
Ronitt Rubinfeld,
Anak Yodpinyanee
Abstract:
Consider an algorithm performing a computation on a huge random object. Is it necessary to generate the entire object up front, or is it possible to provide query access to the object and sample it incrementally "on-the-fly"? Such an implementation should emulate the object by answering queries in a manner consistent with a random instance sampled from the true distribution.
Our first set of res…
▽ More
Consider an algorithm performing a computation on a huge random object. Is it necessary to generate the entire object up front, or is it possible to provide query access to the object and sample it incrementally "on-the-fly"? Such an implementation should emulate the object by answering queries in a manner consistent with a random instance sampled from the true distribution.
Our first set of results focus on undirected graphs with independent edge probabilities, under certain assumptions. Then, we use this to obtain the first efficient implementations for the Erdos-Renyi model and the Stochastic Block model. As in previous local-access implementations for random graphs, we support Vertex-Pair and Next-Neighbor queries. We also introduce a new Random-Neighbor query.
Next, we show how to implement random Catalan objects, specifically focusing on Dyck paths (always positive random walks on the integer line). Here, we support Height queries to find the position of the walk, and First-Return queries to find the time when the walk returns to a specified height. This in turn can be used to implement Next-Neighbor queries on random rooted/binary trees, and Matching-Bracket queries on random well bracketed expressions.
Finally, we define a new model that: (1) allows multiple independent simultaneous instantiations of the same implementation to be consistent with each other without communication (2) allows us to generate a richer class of random objects that do not have a succinct description. Specifically, we study uniformly random valid $q$-colorings of an input graph $G$ with max degree $Δ$. The distribution over valid colorings is specified via a "huge" underlying graph $G$, that is far too large to be read in sub-linear time. Instead, we access $G$ through local neighborhood probes. We are able to answer queries to the color of any vertex in sub-linear time for $q > 9Δ$.
△ Less
Submitted 5 December, 2020; v1 submitted 29 November, 2017;
originally announced November 2017.
-
Even $1 \times n$ Edge-Matching and Jigsaw Puzzles are Really Hard
Authors:
Jeffrey Bosboom,
Erik D. Demaine,
Martin L. Demaine,
Adam Hesterberg,
Pasin Manurangsi,
Anak Yodpinyanee
Abstract:
We prove the computational intractability of rotating and placing $n$ square tiles into a $1 \times n$ array such that adjacent tiles are compatible--either equal edge colors, as in edge-matching puzzles, or matching tab/pocket shapes, as in jigsaw puzzles. Beyond basic NP-hardness, we prove that it is NP-hard even to approximately maximize the number of placed tiles (allowing blanks), while satis…
▽ More
We prove the computational intractability of rotating and placing $n$ square tiles into a $1 \times n$ array such that adjacent tiles are compatible--either equal edge colors, as in edge-matching puzzles, or matching tab/pocket shapes, as in jigsaw puzzles. Beyond basic NP-hardness, we prove that it is NP-hard even to approximately maximize the number of placed tiles (allowing blanks), while satisfying the compatibility constraint between nonblank tiles, within a factor of 0.9999999851. (On the other hand, there is an easy $1 \over 2$-approximation.) This is the first (correct) proof of inapproximability for edge-matching and jigsaw puzzles. Along the way, we prove NP-hardness of distinguishing, for a directed graph on $n$ nodes, between having a Hamiltonian path (length $n-1$) and having at most $0.999999284 (n-1)$ edges that form a vertex-disjoint union of paths. We use this gap hardness and gap-preserving reductions to establish similar gap hardness for $1 \times n$ jigsaw and edge-matching puzzles.
△ Less
Submitted 31 December, 2016;
originally announced January 2017.
-
Sublinear-Time Algorithms for Counting Star Subgraphs with Applications to Join Selectivity Estimation
Authors:
Maryam Aliakbarpour,
Amartya Shankha Biswas,
Themistoklis Gouleakis,
John Peebles,
Ronitt Rubinfeld,
Anak Yodpinyanee
Abstract:
We study the problem of estimating the value of sums of the form $S_p \triangleq \sum \binom{x_i}{p}$ when one has the ability to sample $x_i \geq 0$ with probability proportional to its magnitude. When $p=2$, this problem is equivalent to estimating the selectivity of a self-join query in database systems when one can sample rows randomly. We also study the special case when $\{x_i\}$ is the degr…
▽ More
We study the problem of estimating the value of sums of the form $S_p \triangleq \sum \binom{x_i}{p}$ when one has the ability to sample $x_i \geq 0$ with probability proportional to its magnitude. When $p=2$, this problem is equivalent to estimating the selectivity of a self-join query in database systems when one can sample rows randomly. We also study the special case when $\{x_i\}$ is the degree sequence of a graph, which corresponds to counting the number of $p$-stars in a graph when one has the ability to sample edges randomly.
Our algorithm for a $(1 \pm \varepsilon)$-multiplicative approximation of $S_p$ has query and time complexities $Ø(\frac{m \log \log n}{ε^2 S_p^{1/p}})$. Here, $m=\sum x_i/2$ is the number of edges in the graph, or equivalently, half the number of records in the database table. Similarly, $n$ is the number of vertices in the graph and the number of unique values in the database table. We also provide tight lower bounds (up to polylogarithmic factors) in almost all cases, even when $\{x_i\}$ is a degree sequence and one is allowed to use the structure of the graph to try to get a better estimate. We are not aware of any prior lower bounds on the problem of join selectivity estimation.
For the graph problem, prior work which assumed the ability to sample only \emph{vertices} uniformly gave algorithms with matching lower bounds [Gonen, Ron, and Shavitt. \textit{SIAM J. Comput.}, 25 (2011), pp. 1365-1411]. With the ability to sample edges randomly, we show that one can achieve faster algorithms for approximating the number of star subgraphs, bypassing the lower bounds in this prior work. For example, in the regime where $S_p\leq n$, and $p=2$, our upper bound is $\tilde{O}(n/S_p^{1/2})$, in contrast to their $Ω(n/S_p^{1/3})$ lower bound when no random edge queries are available.
△ Less
Submitted 16 January, 2016;
originally announced January 2016.
-
Dissection with the Fewest Pieces is Hard, Even to Approximate
Authors:
Jeffrey Bosboom,
Erik D. Demaine,
Martin L. Demaine,
Jayson Lynch,
Pasin Manurangsi,
Mikhail Rudoy,
Anak Yodpinyanee
Abstract:
We prove that it is NP-hard to dissect one simple orthogonal polygon into another using a given number of pieces, as is approximating the fewest pieces to within a factor of $1+1/1080-\varepsilon$.
We prove that it is NP-hard to dissect one simple orthogonal polygon into another using a given number of pieces, as is approximating the fewest pieces to within a factor of $1+1/1080-\varepsilon$.
△ Less
Submitted 21 December, 2015;
originally announced December 2015.
-
Local Computation Algorithms for Graphs of Non-Constant Degrees
Authors:
Reut Levi,
Ronitt Rubinfeld,
Anak Yodpinyanee
Abstract:
In the model of \emph{local computation algorithms} (LCAs), we aim to compute the queried part of the output by examining only a small (sublinear) portion of the input. Many recently developed LCAs on graph problems achieve time and space complexities with very low dependence on $n$, the number of vertices. Nonetheless, these complexities are generally at least exponential in $d$, the upper bound…
▽ More
In the model of \emph{local computation algorithms} (LCAs), we aim to compute the queried part of the output by examining only a small (sublinear) portion of the input. Many recently developed LCAs on graph problems achieve time and space complexities with very low dependence on $n$, the number of vertices. Nonetheless, these complexities are generally at least exponential in $d$, the upper bound on the degree of the input graph. Instead, we consider the case where parameter $d$ can be moderately dependent on $n$, and aim for complexities with subexponential dependence on $d$, while maintaining polylogarithmic dependence on $n$. We present: a randomized LCA for computing maximal independent sets whose time and space complexities are quasi-polynomial in $d$ and polylogarithmic in $n$; for constant $ε> 0$, a randomized LCA that provides a $(1-ε)$-approximation to maximum matching whose time and space complexities are polynomial in $d$ and polylogarithmic in $n$.
△ Less
Submitted 13 February, 2015;
originally announced February 2015.