-
No Complete Problem for Constant-Cost Randomized Communication
Authors:
Yuting Fang,
Lianna Hambardzumyan,
Nathaniel Harms,
Pooya Hatami
Abstract:
We prove that the class of communication problems with public-coin randomized constant-cost protocols, called $BPP^0$, does not contain a complete problem. In other words, there is no randomized constant-cost problem $Q \in BPP^0$, such that all other problems $P \in BPP^0$ can be computed by a constant-cost deterministic protocol with access to an oracle for $Q$. We also show that the $k$-Hamming Distance problems form an infinite hierarchy within $BPP^0$. Previously, it was known only that Equality is not complete for $BPP^0$. We introduce a new technique, using Ramsey theory, that can prove lower bounds against arbitrary oracles in $BPP^0$, and more generally, we show that $k$-Hamming Distance matrices cannot be expressed as a Boolean combination of any constant number of matrices which forbid large Greater-Than subproblems.
Submitted 31 March, 2024;
originally announced April 2024.
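As an illustration of what a constant-cost public-coin protocol looks like, here is a minimal sketch of the textbook random-parity protocol for Equality, the problem the abstract names as previously known not to be complete. The function name `equality_protocol` and the parameter `rounds` are hypothetical choices for this sketch, not taken from the paper.

```python
import random

def equality_protocol(x, y, rounds=10, rng=None):
    """Public-coin sketch for Equality on bit lists x and y.

    Alice sends one parity bit per round, so the cost is `rounds` bits
    regardless of the input length. Equal inputs are always accepted;
    unequal inputs are wrongly accepted with probability 2**-rounds.
    """
    rng = rng or random.Random()
    n = len(x)
    assert len(y) == n
    for _ in range(rounds):
        # Shared random string, visible to both players (public coins).
        r = [rng.randrange(2) for _ in range(n)]
        alice_bit = sum(a & b for a, b in zip(x, r)) % 2
        bob_bit = sum(a & b for a, b in zip(y, r)) % 2
        if alice_bit != bob_bit:
            return False  # the inputs certainly differ
    return True
```

The communication depends only on the target error probability, not on the length of the inputs, which is the sense in which the cost is constant.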
-
Distribution Testing with a Confused Collector
Authors:
Renato Ferreira Pinto Jr.,
Nathaniel Harms
Abstract:
We are interested in testing properties of distributions with systematically mislabeled samples. Our goal is to make decisions about unknown probability distributions, using a sample that has been collected by a confused collector, such as a machine-learning classifier that has not learned to distinguish all elements of the domain. The confused collector holds an unknown clustering of the domain and an input distribution $μ$, and provides two oracles: a sample oracle which produces a sample from $μ$ that has been labeled according to the clustering; and a label-query oracle which returns the label of a query point $x$ according to the clustering.
Our first set of results shows that identity, uniformity, and equivalence of distributions can be tested efficiently, under the earth-mover distance, with remarkably weak conditions on the confused collector, even when the unknown clustering is adversarial. This requires defining a variant of the distribution testing task (inspired by the recent testable learning framework of Rubinfeld & Vasilyan), where the algorithm should test a joint property of the distribution and its clustering. As an example, we get efficient testers when the distribution tester is allowed to reject if it detects that the confused collector clustering is "far" from being a decision tree.
The second set of results shows that we can sometimes do significantly better when the clustering is random instead of adversarial. For certain one-dimensional random clusterings, we show that uniformity can be tested under the TV distance using $\widetilde O\left(\frac{\sqrt n}{ρ^{3/2} ε^2}\right)$ samples and zero queries, where $ρ\in (0,1]$ controls the "resolution" of the clustering. We improve this to $O\left(\frac{\sqrt n}{ρε^2}\right)$ when queries are allowed.
Submitted 23 November, 2023;
originally announced November 2023.
-
Randomized Communication and Implicit Representations for Matrices and Graphs of Small Sign-Rank
Authors:
Nathaniel Harms,
Viktor Zamaraev
Abstract:
We prove a characterization of the structural conditions on matrices of sign-rank 3 and unit disk graphs (UDGs) which permit constant-cost public-coin randomized communication protocols. Therefore, under these conditions, these graphs also admit implicit representations.
The sign-rank of a matrix $M \in \{\pm 1\}^{N \times N}$ is the smallest rank of a matrix $R$ such that $M_{i,j} = \mathrm{sign}(R_{i,j})$ for all $i,j \in [N]$; equivalently, it is the smallest dimension $d$ in which $M$ can be represented as a point-halfspace incidence matrix with halfspaces through the origin, and it is essentially equivalent to the unbounded-error communication complexity. Matrices of sign-rank 3 can achieve the maximum possible bounded-error randomized communication complexity $Θ(\log N)$, and meanwhile the existence of implicit representations for graphs of bounded sign-rank (including UDGs, which have sign-rank 4) has been open since at least 2003. We prove that matrices of sign-rank 3, and UDGs, have constant randomized communication complexity if and only if they do not encode arbitrarily large instances of the Greater-Than communication problem, or, equivalently, if they do not contain arbitrarily large half-graphs as semi-induced subgraphs. This also establishes the existence of implicit representations for these graphs under the same conditions.
Submitted 10 July, 2023;
originally announced July 2023.
-
Testing and Learning Convex Sets in the Ternary Hypercube
Authors:
Hadley Black,
Eric Blais,
Nathaniel Harms
Abstract:
We study the problems of testing and learning high-dimensional discrete convex sets. The simplest high-dimensional discrete domain where convexity is a non-trivial property is the ternary hypercube, $\{-1,0,1\}^n$. The goal of this work is to understand structural combinatorial properties of convex sets in this domain and to determine the complexity of the testing and learning problems. We obtain the following results.
Structural: We prove nearly tight bounds on the edge boundary of convex sets in $\{0,\pm 1\}^n$, showing that the maximum edge boundary of a convex set is $\widetilde Θ(n^{3/4}) \cdot 3^n$, or equivalently that every convex set has influence $\widetilde{O}(n^{3/4})$ and a convex set exists with influence $Ω(n^{3/4})$.
Learning and sample-based testing: We prove upper and lower bounds of $3^{\widetilde{O}(n^{3/4})}$ and $3^{Ω(\sqrt{n})}$ for the task of learning convex sets under the uniform distribution from random examples. The analysis of the learning algorithm relies on our upper bound on the influence. Both the upper and lower bound also hold for the problem of sample-based testing with two-sided error. For sample-based testing with one-sided error we show that the sample-complexity is $3^{Θ(n)}$.
Testing with queries: We prove nearly matching upper and lower bounds of $3^{\widetilde Θ(\sqrt{n})}$ for one-sided error testing of convex sets with non-adaptive queries.
Submitted 18 November, 2023; v1 submitted 4 May, 2023;
originally announced May 2023.
-
Distribution Testing Under the Parity Trace
Authors:
Renato Ferreira Pinto Jr.,
Nathaniel Harms
Abstract:
Distribution testing is a fundamental statistical task with many applications, but we are interested in a variety of problems where systematic mislabelings of the sample prevent us from applying the existing theory. To apply distribution testing to these problems, we introduce distribution testing under the parity trace, where the algorithm receives an ordered sample $S$ that reveals only the least significant bit of each element. This abstraction reveals connections between the following three problems of interest, allowing new upper and lower bounds:
1. In distribution testing with a confused collector, the collector of the sample may be incapable of distinguishing between nearby elements of a domain (e.g. a machine learning classifier). We prove bounds for distribution testing with a confused collector on domains structured as a cycle or a path.
2. Recent work on the fundamental testing vs. learning question established tight lower bounds on distribution-free sample-based property testing by reduction from distribution testing, but the tightness is limited to symmetric properties. The parity trace allows a broader family of equivalences to non-symmetric properties, while recovering and strengthening many of the previous results with a different technique.
3. We give the first results for property testing in the well-studied trace reconstruction model, where the goal is to test whether an unknown string $x$ satisfies some property or is far from satisfying that property, given only independent random traces of $x$.
Our main technical result is a tight bound of $\widetilde Θ\left((n/ε)^{4/5} + \sqrt n/ε^2\right)$ for testing uniformity of distributions over $[n]$ under the parity trace, leading also to results for the problems above.
Submitted 3 April, 2023;
originally announced April 2023.
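The parity trace described in this abstract can be sketched in a few lines. This sketch assumes that "ordered sample" means the sample sorted in increasing order, and that the domain is $[n] = \{1, \dots, n\}$; the function names `parity_trace` and `draw_parity_trace` are hypothetical.

```python
import random

def parity_trace(sample):
    """Sort the sample, then reveal only the least significant bit of each element."""
    return [x % 2 for x in sorted(sample)]

def draw_parity_trace(weights, m, rng=None):
    """Draw m independent samples from a distribution over {1, ..., n}
    given by `weights`, and return only their parity trace."""
    rng = rng or random.Random()
    n = len(weights)
    sample = rng.choices(range(1, n + 1), weights=weights, k=m)
    return parity_trace(sample)
```

For example, the sample `[5, 2, 2, 7, 4]` sorts to `[2, 2, 4, 5, 7]`, so the tester sees only `[0, 0, 0, 1, 1]`.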
-
Graphs with minimum fractional domatic number
Authors:
Maximilien Gadouleau,
Nathaniel Harms,
George B. Mertzios,
Viktor Zamaraev
Abstract:
The domatic number of a graph is the maximum number of vertex-disjoint dominating sets that partition the vertex set of the graph. In this paper we consider the fractional variant of this notion. Graphs with fractional domatic number 1 are exactly the graphs that contain an isolated vertex. Furthermore, it is known that all other graphs have fractional domatic number at least 2. In this note we characterize graphs with fractional domatic number 2. More specifically, we show that a graph without isolated vertices has fractional domatic number 2 if and only if it has a vertex of degree 1 or a connected component isomorphic to a 4-cycle. We conjecture that if the fractional domatic number is more than 2, then it is at least 7/3.
Submitted 13 October, 2023; v1 submitted 22 February, 2023;
originally announced February 2023.
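The characterization stated in this abstract is easy to check directly. The sketch below assumes a simple undirected graph given as a dictionary from each vertex to its set of neighbours; the function name is hypothetical, and the code only tests the stated condition rather than computing a fractional domatic number.

```python
def has_fractional_domatic_number_two(adj):
    """Check the stated condition: some vertex has degree 1, or some
    connected component is a 4-cycle. Requires no isolated vertices."""
    if any(len(nbrs) == 0 for nbrs in adj.values()):
        raise ValueError("graph has an isolated vertex")
    if any(len(nbrs) == 1 for nbrs in adj.values()):
        return True
    seen = set()
    for start in adj:
        if start in seen:
            continue
        # Collect the connected component of `start` by depth-first search.
        component, stack = {start}, [start]
        while stack:
            v = stack.pop()
            for u in adj[v]:
                if u not in component:
                    component.add(u)
                    stack.append(u)
        seen |= component
        # A connected 2-regular simple graph on 4 vertices is a 4-cycle.
        if len(component) == 4 and all(len(adj[v]) == 2 for v in component):
            return True
    return False
```

On a triangle or a 5-cycle the check returns `False`, which under the stated characterization means the fractional domatic number is strictly greater than 2.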
-
Optimal Adjacency Labels for Subgraphs of Cartesian Products
Authors:
Louis Esperet,
Nathaniel Harms,
Viktor Zamaraev
Abstract:
For any hereditary graph class $F$, we construct optimal adjacency labeling schemes for the classes of subgraphs and induced subgraphs of Cartesian products of graphs in $F$. As a consequence, we show that, if $F$ admits efficient adjacency labels (or, equivalently, small induced-universal graphs) meeting the information-theoretic minimum, then the classes of subgraphs and induced subgraphs of Cartesian products of graphs in $F$ do too. Our proof uses ideas from randomized communication complexity, hashing, and additive combinatorics, and improves upon recent results of Chepoi, Labourel, and Ratel [Journal of Graph Theory, 2020].
Submitted 22 April, 2024; v1 submitted 6 June, 2022;
originally announced June 2022.
-
Sketching Distances in Monotone Graph Classes
Authors:
Louis Esperet,
Nathaniel Harms,
Andrey Kupavskii
Abstract:
We study the two-player communication problem of determining whether two vertices $x, y$ are nearby in a graph $G$, with the goal of determining the graph structures that allow the problem to be solved with a constant-cost randomized protocol. Equivalently, we consider the problem of assigning constant-size random labels (sketches) to the vertices of a graph, which allow adjacency, exact distance thresholds, or approximate distance thresholds to be computed with high probability from the labels.
Our main results are that, for monotone classes of graphs: constant-size adjacency sketches exist if and only if the class has bounded arboricity; constant-size sketches for exact distance thresholds exist if and only if the class has bounded expansion; constant-size approximate distance threshold (ADT) sketches imply that the class has bounded expansion; any class of constant expansion (i.e. any proper minor-closed class) has constant-size ADT sketches; and a class may have arbitrarily small expansion without admitting constant-size ADT sketches.
Submitted 15 December, 2023; v1 submitted 18 February, 2022;
originally announced February 2022.
-
Randomized Communication and Implicit Graph Representations
Authors:
Nathaniel Harms,
Sebastian Wild,
Viktor Zamaraev
Abstract:
We study constant-cost randomized communication problems and relate them to implicit graph representations in structural graph theory. Specifically, constant-cost communication problems correspond to hereditary graph families that admit constant-size adjacency sketches, or equivalently constant-size probabilistic universal graphs (PUGs), and these graph families are a subset of families that admit adjacency labeling schemes of size $O(\log n)$, which are the subject of the well-studied implicit graph question (IGQ).
We initiate the study of the hereditary graph families that admit constant-size PUGs, with the two (equivalent) goals of (1) understanding randomized constant-cost communication problems, and (2) understanding a probabilistic version of the IGQ. For each family $\mathcal F$ studied in this paper (including the monogenic bipartite families, product graphs, interval and permutation graphs, families of bounded twin-width, and others), it holds that the subfamilies $\mathcal H \subseteq \mathcal F$ admit constant-size PUGs (i.e. adjacency sketches) if and only if they are stable (i.e. they forbid a half-graph as a semi-induced subgraph).
The correspondence between communication problems and hereditary graph families allows for a new method of constructing adjacency labeling schemes. By this method, we show that the induced subgraphs of any Cartesian products are positive examples to the IGQ. We prove that this probabilistic construction cannot be derandomized by using an Equality oracle, i.e. the Equality oracle cannot simulate the $k$-Hamming Distance communication protocol. We also obtain constant-size sketches for deciding $\mathsf{dist}(x, y) \le k$ for vertices $x$, $y$ in any stable graph family with bounded twin-width. This generalizes to constant-size sketches for deciding first-order formulas over the same graphs.
Submitted 18 July, 2023; v1 submitted 5 November, 2021;
originally announced November 2021.
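The stability condition in this abstract (forbidding a half-graph as a semi-induced subgraph) can be made concrete with a brute-force check on a bipartite adjacency matrix. This is only a sketch for small inputs: it assumes the convention that the half-graph of size $k$ has rows $r_1, \dots, r_k$ and columns $c_1, \dots, c_k$ with an edge between $r_i$ and $c_j$ exactly when $i \le j$ (some authors use $i < j$), and all function names are hypothetical.

```python
from itertools import permutations

def has_half_graph(M, k):
    """Brute-force search for rows r_1..r_k and columns c_1..c_k of the
    0/1 matrix M such that M[r_i][c_j] == 1 exactly when i <= j."""
    rows, cols = range(len(M)), range(len(M[0]))
    for rs in permutations(rows, k):
        for cs in permutations(cols, k):
            if all((M[rs[i]][cs[j]] == 1) == (i <= j)
                   for i in range(k) for j in range(k)):
                return True
    return False

def greater_than_matrix(n):
    """Matrix of the Greater-Than problem on {0, ..., n-1}: entry is 1 iff i <= j."""
    return [[1 if i <= j else 0 for j in range(n)] for i in range(n)]

def equality_matrix(n):
    """Matrix of the Equality problem on {0, ..., n-1}: the identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]
```

Under this convention the Greater-Than matrix on $n$ points contains a half-graph of size $n$, while the Equality matrix contains none of size 2, because no row of the identity matrix has two ones.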
-
VC Dimension and Distribution-Free Sample-Based Testing
Authors:
Eric Blais,
Renato Ferreira Pinto Jr.,
Nathaniel Harms
Abstract:
We consider the problem of determining which classes of functions can be tested more efficiently than they can be learned, in the distribution-free sample-based model that corresponds to the standard PAC learning setting. Our main result shows that while VC dimension by itself does not always provide tight bounds on the number of samples required to test a class of functions in this model, it can be combined with a closely-related variant that we call "lower VC" (or LVC) dimension to obtain strong lower bounds on this sample complexity.
We use this result to obtain strong and in many cases nearly optimal lower bounds on the sample complexity for testing unions of intervals, halfspaces, intersections of halfspaces, polynomial threshold functions, and decision trees. Conversely, we show that two natural classes of functions, juntas and monotone functions, can be tested with a number of samples that is polynomially smaller than the number of samples required for PAC learning.
Finally, we also use the connection between VC dimension and property testing to establish new lower bounds for testing radius clusterability and testing feasibility of linear constraint systems.
Submitted 7 December, 2020;
originally announced December 2020.
-
Downsampling for Testing and Learning in Product Distributions
Authors:
Nathaniel Harms,
Yuichi Yoshida
Abstract:
We study distribution-free property testing and learning problems where the unknown probability distribution is a product distribution over $\mathbb{R}^d$. For many important classes of functions, such as intersections of halfspaces, polynomial threshold functions, convex sets, and $k$-alternating functions, the known algorithms either have complexity that depends on the support size of the distribution, or are proven to work only for specific examples of product distributions. We introduce a general method, which we call downsampling, that resolves these issues. Downsampling uses a notion of "rectilinear isoperimetry" for product distributions, which further strengthens the connection between isoperimetry, testing, and learning. Using this technique, we attain new efficient distribution-free algorithms under product distributions on $\mathbb{R}^d$:
1. A simpler proof for non-adaptive, one-sided monotonicity testing of functions $[n]^d \to \{0,1\}$, and improved sample complexity for testing monotonicity over unknown product distributions, from $O(d^7)$ [Black, Chakrabarty, & Seshadhri, SODA 2020] to $\widetilde O(d^3)$.
2. Polynomial-time agnostic learning algorithms for functions of a constant number of halfspaces, and constant-degree polynomial threshold functions.
3. An $\exp(O(d \log(dk)))$-time agnostic learning algorithm, and an $\exp(O(d \log(dk)))$-sample tolerant tester, for functions of $k$ convex sets; and a $2^{\widetilde O(d)}$ sample-based one-sided tester for convex sets.
4. An $\exp(\widetilde O(k \sqrt d))$-time agnostic learning algorithm for $k$-alternating functions, and a sample-based tolerant tester with the same complexity.
Submitted 15 November, 2021; v1 submitted 14 July, 2020;
originally announced July 2020.
-
Universal Communication, Universal Graphs, and Graph Labeling
Authors:
Nathaniel Harms
Abstract:
We introduce a communication model called universal SMP, in which Alice and Bob receive a function $f$ belonging to a family $\mathcal{F}$, and inputs $x$ and $y$. Alice and Bob use shared randomness to send a message to a third party who cannot see $f, x, y$, or the shared randomness, and must decide $f(x,y)$. Our main application of universal SMP is to relate communication complexity to graph labeling, where the goal is to give a short label to each vertex in a graph, so that adjacency or other functions of two vertices $x$ and $y$ can be determined from the labels $\ell(x),\ell(y)$. We give a universal SMP protocol using $O(k^2)$ bits of communication for deciding whether two vertices have distance at most $k$ on distributive lattices (generalizing the $k$-Hamming Distance problem in communication complexity), and explain how this implies an $O(k^2\log n)$ labeling scheme for determining $\mathrm{dist}(x,y) \leq k$ on distributive lattices with size $n$; in contrast, we show that a universal SMP protocol for determining $\mathrm{dist}(x,y) \leq 2$ in modular lattices (a superset of distributive lattices) has super-constant $Ω(n^{1/4})$ communication cost. On the other hand, we demonstrate that many graph families known to have efficient adjacency labeling schemes, such as trees, low-arboricity graphs, and planar graphs, admit constant-cost communication protocols for adjacency. Trees also have an $O(k)$ protocol for deciding $\mathrm{dist}(x,y) \leq k$ and planar graphs have an $O(1)$ protocol for $\mathrm{dist}(x,y) \leq 2$, which implies a new $O(\log n)$ labeling scheme for the same problem on planar graphs.
Submitted 9 November, 2019;
originally announced November 2019.
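To make the graph labeling goal in this abstract concrete, here is a minimal sketch of the classical deterministic adjacency labeling scheme for rooted trees, where each vertex stores its own identifier and its parent's. This is the standard folklore scheme, not the universal SMP protocol from the paper, and the function names are hypothetical.

```python
def tree_labels(parent):
    """parent maps each vertex to its parent (None for the root).
    Each label is (own id, parent id), i.e. about 2 * log2(n) bits."""
    return {v: (v, p) for v, p in parent.items()}

def adjacent(label_x, label_y):
    """Decide adjacency from the two labels alone, without seeing the tree."""
    x, px = label_x
    y, py = label_y
    return px == y or py == x
```

Two vertices of a tree are adjacent exactly when one is the parent of the other, so the decoder needs nothing beyond the two labels $\ell(x), \ell(y)$.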
-
Testing Halfspaces over Rotation-Invariant Distributions
Authors:
Nathaniel Harms
Abstract:
We present an algorithm for testing halfspaces over arbitrary, unknown rotation-invariant distributions. Using $\tilde O(\sqrt{n}ε^{-7})$ random examples of an unknown function $f$, the algorithm determines with high probability whether $f$ is of the form $f(x) = \mathrm{sign}(\sum_i w_i x_i - t)$ or is $ε$-far from all such functions. This sample size is significantly smaller than the well-known requirement of $Ω(n)$ samples for learning halfspaces, and known lower bounds imply that our sample size is optimal (in its dependence on $n$) up to logarithmic factors. The algorithm is distribution-free in the sense that it requires no knowledge of the distribution aside from the promise of rotation invariance. To prove the correctness of this algorithm we present a theorem relating the distance between a function and a halfspace to the distance between their centers of mass, that applies to arbitrary distributions.
Submitted 31 October, 2018;
originally announced November 2018.