-
An Analysis of the t-SNE Algorithm for Data Visualization
Authors:
Sanjeev Arora,
Wei Hu,
Pravesh K. Kothari
Abstract:
A first line of attack in exploratory data analysis is data visualization, i.e., generating a 2-dimensional representation of data that makes clusters of similar points visually identifiable. Standard Johnson-Lindenstrauss dimensionality reduction does not produce data visualizations. The t-SNE heuristic of van der Maaten and Hinton, which is based on non-convex optimization, has become the de facto standard for visualization in a wide range of applications.
This work gives a formal framework for the problem of data visualization - finding a 2-dimensional embedding of clusterable data that correctly separates individual clusters to make them visually identifiable. We then give a rigorous analysis of the performance of t-SNE under a natural, deterministic condition on the "ground-truth" clusters (similar to conditions assumed in earlier analyses of clustering) in the underlying data. These are the first provable guarantees on t-SNE for constructing good data visualizations.
We show that our deterministic condition is satisfied by considerably general probabilistic generative models for clusterable data such as mixtures of well-separated log-concave distributions. Finally, we give theoretical evidence that t-SNE provably succeeds in partially recovering cluster structure even when the above deterministic condition is not met.
Submitted 6 June, 2018; v1 submitted 5 March, 2018;
originally announced March 2018.
-
Surprise in Elections
Authors:
Palash Dey,
Pravesh K. Kothari,
Swaprava Nath
Abstract:
Elections involving a very large voter population often lead to outcomes that surprise many. This is particularly important for the elections in which results affect the economy of a sizable population. A better prediction of the true outcome helps reduce the surprise and keeps the voters prepared. This paper starts from the basic observation that individuals in the underlying population build estimates of the distribution of preferences of the whole population based on their local neighborhoods. The outcome of the election leads to a surprise if these local estimates contradict the outcome of the election for some fixed voting rule. To get a quantitative understanding, we propose a simple mathematical model of the setting where the individuals in the population and their connections (through geographical proximity, social networks etc.) are described by a random graph with connection probabilities that are biased based on the preferences of the individuals. Each individual also has some estimate of the bias in their connections.
We show that the election outcome leads to a surprise if the discrepancy between the estimated bias and the true bias in the local connections exceeds a certain threshold, and confirm the phenomenon that surprising outcomes are associated only with \emph{closely contested elections}. We compare standard voting rules based on their performance on surprise and show that they behave differently for different parts of the population. This also hints at an impossibility: no single voting rule will be less surprising for \emph{all} parts of a population. Finally, we experiment with the UK-EU referendum (a.k.a. Brexit) dataset, which attests to some of our theoretical predictions.
Submitted 30 January, 2018;
originally announced January 2018.
-
Outlier-robust moment-estimation via sum-of-squares
Authors:
Pravesh K. Kothari,
David Steurer
Abstract:
We develop efficient algorithms for estimating low-degree moments of unknown distributions in the presence of adversarial outliers. The guarantees of our algorithms improve in many cases significantly over the best previous ones, obtained in recent works of Diakonikolas et al., Lai et al., and Charikar et al. We also show that the guarantees of our algorithms match information-theoretic lower bounds for the class of distributions we consider. These improved guarantees allow us to give improved algorithms for independent component analysis and learning mixtures of Gaussians in the presence of outliers.
Our algorithms are based on a standard sum-of-squares relaxation of the following conceptually-simple optimization problem: Among all distributions whose moments are bounded in the same way as for the unknown distribution, find the one that is closest in statistical distance to the empirical distribution of the adversarially-corrupted sample.
Submitted 23 December, 2017; v1 submitted 30 November, 2017;
originally announced November 2017.
-
Better Agnostic Clustering Via Relaxed Tensor Norms
Authors:
Pravesh K. Kothari,
Jacob Steinhardt
Abstract:
We develop a new family of convex relaxations for $k$-means clustering based on sum-of-squares norms, a relaxation of the injective tensor norm that is efficiently computable using the Sum-of-Squares algorithm. We give an algorithm based on this relaxation that recovers a faithful approximation to the true means in the given data whenever the low-degree moments of the points in each cluster have bounded sum-of-squares norms.
We then prove a sharp upper bound on the sum-of-squares norms for moment tensors of any distribution that satisfies the \emph{Poincaré inequality}. The Poincaré inequality is a central inequality in probability theory, and a large class of distributions satisfy it, including Gaussians, product distributions, strongly log-concave distributions, and any sum or uniformly continuous transformation of such distributions.
As an immediate corollary, for any $γ> 0$, we obtain an efficient algorithm for learning the means of a mixture of $k$ arbitrary Poincaré distributions in $\mathbb{R}^d$ in time $d^{O(1/γ)}$ so long as the means have separation $Ω(k^γ)$. This in particular yields an algorithm for learning Gaussian mixtures with separation $Ω(k^γ)$, thus partially resolving an open problem of Regev and Vijayaraghavan (2017).
Our algorithm works even in the outlier-robust setting where an $ε$ fraction of arbitrary outliers are added to the data, as long as the fraction of outliers is smaller than the smallest cluster. We, therefore, obtain results in the strong agnostic setting where, in addition to not knowing the distribution family, the data itself may be arbitrarily corrupted.
Submitted 20 November, 2017;
originally announced November 2017.
-
The power of sum-of-squares for detecting hidden structures
Authors:
Samuel B. Hopkins,
Pravesh K. Kothari,
Aaron Potechin,
Prasad Raghavendra,
Tselil Schramm,
David Steurer
Abstract:
We study planted problems---finding hidden structures in random noisy inputs---through the lens of the sum-of-squares semidefinite programming hierarchy (SoS). This family of powerful semidefinite programs has recently yielded many new algorithms for planted problems, often achieving the best known polynomial-time guarantees in terms of accuracy of recovered solutions and robustness to noise. One theme in recent work is the design of spectral algorithms which match the guarantees of SoS algorithms for planted problems. Classical spectral algorithms are often unable to accomplish this: the twist in these new spectral algorithms is the use of spectral structure of matrices whose entries are low-degree polynomials of the input variables. We prove that for a wide class of planted problems, including refuting random constraint satisfaction problems, tensor and sparse PCA, densest-k-subgraph, community detection in stochastic block models, planted clique, and others, eigenvalues of degree-d matrix polynomials are as powerful as SoS semidefinite programs of roughly degree d. For such problems it is therefore always possible to match the guarantees of SoS without solving a large semidefinite program. Using related ideas on SoS algorithms and low-degree matrix polynomials (and inspired by recent work on SoS and the planted clique problem by Barak et al.), we prove new nearly-tight SoS lower bounds for the tensor and sparse principal component analysis problems. Our lower bounds for sparse principal component analysis are the first to suggest that going beyond existing algorithms for this problem may require sub-exponential time.
Submitted 13 October, 2017;
originally announced October 2017.
-
Engineering Enhanced Thermal Transport in Layered Nanomaterials
Authors:
Abhinav Malhotra,
Kartik Kothari,
Martin Maldovan
Abstract:
A comprehensive rational thermal material design paradigm requires the ability to reduce and enhance the thermal conductivities of nanomaterials. In contrast to the existing ability to reduce the thermal conductivity, methods for enhancing heat conduction are currently limited. Enhancing the nanoscale thermal conductivity could bring radical improvements in the performance of electronics, optoelectronics, and photovoltaic systems. Here, we show that enhanced thermal conductivities can be achieved in semiconductor nanostructures by rationally engineering phonon spectral coupling between materials. By embedding a germanium film between silicon layers, we show that its thermal conductivity can be increased by more than 100% at room temperature relative to a free-standing thin film. The injection of phonons from the cladding silicon layers creates the observed enhancement in thermal conductivity. We study the key factors underlying the phonon injection mechanism and find that the surface roughness and layer thicknesses play a determining role. The findings presented in this letter will allow for the creation of nanomaterials with an increased thermal conductivity.
Submitted 5 October, 2017;
originally announced October 2017.
-
Agnostic Learning by Refuting
Authors:
Pravesh K. Kothari,
Roi Livni
Abstract:
The sample complexity of learning a Boolean-valued function class is precisely characterized by its Rademacher complexity. This has little bearing, however, on the sample complexity of \emph{efficient} agnostic learning.
We introduce \emph{refutation complexity}, a natural computational analog of Rademacher complexity of a Boolean concept class and show that it exactly characterizes the sample complexity of \emph{efficient} agnostic learning. Informally, refutation complexity of a class $\mathcal{C}$ is the minimum number of example-label pairs required to efficiently distinguish between the case that the labels correlate with the evaluation of some member of $\mathcal{C}$ (\emph{structure}) and the case where the labels are i.i.d. Rademacher random variables (\emph{noise}). The easy direction of this relationship was implicitly used in the recent framework for improper PAC learning lower bounds of Daniely and co-authors via connections to the hardness of refuting random constraint satisfaction problems. Our work can be seen as making the relationship between agnostic learning and refutation implicit in their work into an explicit equivalence. In a recent, independent work, Salil Vadhan discovered a similar relationship between refutation and PAC-learning in the realizable (i.e. noiseless) case.
Submitted 30 November, 2017; v1 submitted 12 September, 2017;
originally announced September 2017.
-
Quantum entanglement, sum of squares, and the log rank conjecture
Authors:
Boaz Barak,
Pravesh Kothari,
David Steurer
Abstract:
For every $ε>0$, we give an $\exp(\tilde{O}(\sqrt{n}/ε^2))$-time algorithm for the $1$ vs $1-ε$ \emph{Best Separable State (BSS)} problem of distinguishing, given an $n^2\times n^2$ matrix $\mathcal{M}$ corresponding to a quantum measurement, between the case that there is a separable (i.e., non-entangled) state $ρ$ that $\mathcal{M}$ accepts with probability $1$, and the case that every separable state is accepted with probability at most $1-ε$. Equivalently, our algorithm takes the description of a subspace $\mathcal{W} \subseteq \mathbb{F}^{n^2}$ (where $\mathbb{F}$ can be either the real or complex field) and distinguishes between the case that $\mathcal{W}$ contains a rank one matrix, and the case that every rank one matrix is at least $ε$ far (in $\ell_2$ distance) from $\mathcal{W}$.
To the best of our knowledge, this is the first improvement over the brute-force $\exp(n)$-time algorithm for this problem. Our algorithm is based on the \emph{sum-of-squares} hierarchy and its analysis is inspired by Lovett's proof (STOC '14, JACM '16) that the communication complexity of every rank-$n$ Boolean matrix is bounded by $\tilde{O}(\sqrt{n})$.
Submitted 9 July, 2017; v1 submitted 23 January, 2017;
originally announced January 2017.
-
Sum of squares lower bounds for refuting any CSP
Authors:
Pravesh K. Kothari,
Ryuhei Mori,
Ryan O'Donnell,
David Witmer
Abstract:
Let $P:\{0,1\}^k \to \{0,1\}$ be a nontrivial $k$-ary predicate. Consider a random instance of the constraint satisfaction problem $\mathrm{CSP}(P)$ on $n$ variables with $Δn$ constraints, each being $P$ applied to $k$ randomly chosen literals. Provided the constraint density satisfies $Δ\gg 1$, such an instance is unsatisfiable with high probability. The \emph{refutation} problem is to efficiently find a proof of unsatisfiability.
We show that whenever the predicate $P$ supports a $t$-\emph{wise uniform} probability distribution on its satisfying assignments, the sum of squares (SOS) algorithm of degree $d = Θ(\frac{n}{Δ^{2/(t-1)} \log Δ})$ (which runs in time $n^{O(d)}$) \emph{cannot} refute a random instance of $\mathrm{CSP}(P)$. In particular, the polynomial-time SOS algorithm requires $\widetilde{Ω}(n^{(t+1)/2})$ constraints to refute random instances of $\mathrm{CSP}(P)$ when $P$ supports a $t$-wise uniform distribution on its satisfying assignments. Together with recent work of Lee et al. [LRS15], our result also implies that \emph{any} polynomial-size semidefinite programming relaxation for refutation requires at least $\widetilde{Ω}(n^{(t+1)/2})$ constraints.
Our results (which also extend with no change to CSPs over larger alphabets) subsume all previously known lower bounds for semialgebraic refutation of random CSPs. For every constraint predicate $P$, they give a three-way hardness tradeoff between the density of constraints, the SOS degree (hence running time), and the strength of the refutation. By recent algorithmic results of Allen et al. [AOW15] and Raghavendra et al. [RRS16], this full three-way tradeoff is \emph{tight}, up to lower-order factors.
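The step from the degree bound to the constraint count can be unpacked with a short calculation. This is only a back-of-the-envelope reading of the statement above, not the paper's argument; the symbol $m = Δn$ for the number of constraints is introduced here for convenience, and $t \ge 2$ is assumed so that the exponent $2/(t-1)$ is defined.

```latex
% Degree-d SOS fails to refute as long as d is below n / (Δ^{2/(t-1)} \log Δ),
% up to constants. A polynomial-time algorithm has d = O(1), so refutation
% can only become possible once this threshold drops to a constant:
\[
  \frac{n}{Δ^{2/(t-1)} \log Δ} \lesssim 1
  \quad\Longleftrightarrow\quad
  Δ^{2/(t-1)} \log Δ \gtrsim n
  \quad\Longrightarrow\quad
  Δ \ge \widetilde{Ω}\bigl(n^{(t-1)/2}\bigr).
\]
% Multiplying the density by n gives the number of constraints:
\[
  m = Δ n \ge \widetilde{Ω}\bigl(n^{(t-1)/2}\bigr) \cdot n
    = \widetilde{Ω}\bigl(n^{(t+1)/2}\bigr).
\]
```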
Submitted 16 January, 2017;
originally announced January 2017.
-
Approximating Rectangles by Juntas and Weakly-Exponential Lower Bounds for LP Relaxations of CSPs
Authors:
Pravesh K. Kothari,
Raghu Meka,
Prasad Raghavendra
Abstract:
We show that for constraint satisfaction problems (CSPs), sub-exponential size linear programming relaxations are as powerful as $n^{Ω(1)}$-rounds of the Sherali-Adams linear programming hierarchy. As a corollary, we obtain sub-exponential size lower bounds for linear programming relaxations that beat random guessing for many CSPs such as MAX-CUT and MAX-3SAT. This is a nearly exponential improvement over previous results: it was previously only known that linear programs of size $n^{o(\log n)}$ cannot beat random guessing for any CSP (Chan-Lee-Raghavendra-Steurer 2013).
Our bounds are obtained by exploiting and extending the recent progress in communication complexity for "lifting" query lower bounds to communication problems. The main ingredient in our results is a new structural result on "high-entropy rectangles" that may be of independent interest in communication complexity.
Submitted 30 December, 2017; v1 submitted 9 October, 2016;
originally announced October 2016.
-
Localization and instability in sheared granular materials: Role of friction and vibration
Authors:
Konik R. Kothari,
Ahmed Elbanna
Abstract:
Shear banding and stick-slip instabilities have been long observed in sheared granular materials. Yet, their microscopic underpinnings, interdependencies and variability under different loading conditions have not been fully explored. Here, we use a non-equilibrium thermodynamics model, the Shear Transformation Zone theory, to investigate the dynamics of strain localization and its connection to stability of sliding in sheared, dry, granular materials. We consider frictional and frictionless grains as well as presence and absence of acoustic vibrations. Our results suggest that at low and intermediate strain rates, persistent shear bands develop only in the absence of vibrations. Vibrations tend to fluidize the granular network and de-localize slip at these rates. Stick-slip is only observed for frictional grains and it is confined to the shear band. At high strain rates, stick-slip disappears and the different systems exhibit similar stress-slip response. Changing the vibration intensity, duration or time of application alters the system response and may cause long-lasting rheological changes. We analyse these observations in terms of possible transitions between rate strengthening and rate weakening response facilitated by a competition between shear induced dilation and vibration induced compaction. We discuss the implications of our results on dynamic triggering, quiescence and strength evolution in gouge filled fault zones.
Submitted 14 January, 2017; v1 submitted 18 July, 2016;
originally announced July 2016.
-
A Nearly Tight Sum-of-Squares Lower Bound for the Planted Clique Problem
Authors:
Boaz Barak,
Samuel B. Hopkins,
Jonathan Kelner,
Pravesh K. Kothari,
Ankur Moitra,
Aaron Potechin
Abstract:
We prove that with high probability over the choice of a random graph $G$ from the Erdős-Rényi distribution $G(n,1/2)$, the $n^{O(d)}$-time degree $d$ Sum-of-Squares semidefinite programming relaxation for the clique problem will give a value of at least $n^{1/2-c(d/\log n)^{1/2}}$ for some constant $c>0$. This yields a nearly tight $n^{1/2 - o(1)}$ bound on the value of this program for any degree $d = o(\log n)$. Moreover we introduce a new framework that we call \emph{pseudo-calibration} to construct Sum of Squares lower bounds. This framework is inspired by taking a computational analog of Bayesian probability theory. It yields a general recipe for constructing good pseudo-distributions (i.e., dual certificates for the Sum-of-Squares semidefinite program), and sheds further light on the ways in which this hierarchy differs from others.
Submitted 12 April, 2016; v1 submitted 11 April, 2016;
originally announced April 2016.
-
SoS and Planted Clique: Tight Analysis of MPW Moments at all Degrees and an Optimal Lower Bound at Degree Four
Authors:
Samuel B. Hopkins,
Pravesh K. Kothari,
Aaron Potechin
Abstract:
The problem of finding large cliques in random graphs and its "planted" variant, where one wants to recover a clique of size $ω\gg \log{(n)}$ added to an Erdős-Rényi graph $G \sim G(n,\frac{1}{2})$, have been intensely studied. Nevertheless, existing polynomial time algorithms can only recover planted cliques of size $ω= Ω(\sqrt{n})$. By contrast, information theoretically, one can recover planted cliques so long as $ω\gg \log{(n)}$. In this work, we continue the investigation of algorithms from the sum of squares hierarchy for solving the planted clique problem begun by Meka, Potechin, and Wigderson (MPW, 2015) and Deshpande and Montanari (DM, 2015). Our main results improve upon both these previous works by showing:
1. Degree four SoS does not recover the planted clique unless $ω\gg \sqrt n poly \log n$, improving upon the bound $ω\gg n^{1/3}$ due to DM. A similar result was obtained independently by Raghavendra and Schramm (2015).
2. For $2 < d = o(\sqrt{\log{(n)}})$, degree $2d$ SoS does not recover the planted clique unless $ω\gg n^{1/(d + 1)} /(2^d poly \log n)$, improving upon the bound due to MPW.
Our proof for the second result is based on a fine spectral analysis of the certificate used in the prior works MPW, DM, and Feige and Krauthgamer (2003) by decomposing it along an appropriately chosen basis. Along the way, we develop combinatorial tools to analyze the spectrum of random matrices with dependent entries and to understand the symmetries in the eigenspaces of the set symmetric matrices inspired by work of Grigoriev (2001).
An argument of Kelner shows that the first result cannot be proved using the same certificate. Rather, our proof involves constructing and analyzing a new certificate that yields the nearly tight lower bound by "correcting" the certificate of previous works.
Submitted 18 July, 2015;
originally announced July 2015.
-
Communication with Contextual Uncertainty
Authors:
Badih Ghazi,
Ilan Komargodski,
Pravesh Kothari,
Madhu Sudan
Abstract:
We introduce a simple model illustrating the role of context in communication and the challenge posed by uncertainty of knowledge of context. We consider a variant of distributional communication complexity where Alice gets some information $x$ and Bob gets $y$, where $(x,y)$ is drawn from a known distribution, and Bob wishes to compute some function $g(x,y)$ (with high probability over $(x,y)$). In our variant, Alice does not know $g$, but only knows some function $f$ which is an approximation of $g$. Thus, the function being computed forms the context for the communication, and knowing it imperfectly models (mild) uncertainty in this context.
A naive solution would be for Alice and Bob to first agree on some common function $h$ that is close to both $f$ and $g$ and then use a protocol for $h$ to compute $h(x,y)$. We show that any such agreement leads to a large overhead in communication ruling out such a universal solution.
In contrast, we show that if $g$ has a one-way communication protocol with complexity $k$ in the standard setting, then it has a communication protocol with complexity $O(k \cdot (1+I))$ in the uncertain setting, where $I$ denotes the mutual information between $x$ and $y$. In the particular case where the input distribution is a product distribution, the protocol in the uncertain setting only incurs a constant factor blow-up in communication and error.
Furthermore, we show that the dependence on the mutual information $I$ is required. Namely, we construct a class of functions along with a non-product distribution over $(x,y)$ for which the communication complexity is a single bit in the standard setting but at least $Ω(\sqrt{n})$ bits in the uncertain setting.
Submitted 19 July, 2015; v1 submitted 19 April, 2015;
originally announced April 2015.
-
Sum of Squares Lower Bounds from Pairwise Independence
Authors:
Boaz Barak,
Siu On Chan,
Pravesh Kothari
Abstract:
We prove that for every $ε>0$ and predicate $P:\{0,1\}^k\rightarrow \{0,1\}$ that supports a pairwise independent distribution, there exists an instance $\mathcal{I}$ of the $\mathsf{Max}P$ constraint satisfaction problem on $n$ variables such that no assignment can satisfy more than a $\tfrac{|P^{-1}(1)|}{2^k}+ε$ fraction of $\mathcal{I}$'s constraints but the degree $Ω(n)$ Sum of Squares semidefinite programming hierarchy cannot certify that $\mathcal{I}$ is unsatisfiable. Similar results were previously only known for weaker hierarchies.
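To make the hypothesis on $P$ concrete, the following sketch checks by brute force whether the uniform distribution on a given set of assignments is pairwise uniform, and whether that set lies inside the satisfying assignments of a predicate. It assumes that "pairwise independent" here means every pair of coordinates is uniform on $\{0,1\}^2$; the function names and the 3-OR example are illustrative choices, not taken from the paper. Note that it tests one particular candidate distribution, so a failure does not show that no suitable distribution exists.

```python
from itertools import combinations, product


def satisfying_assignments(predicate, k):
    """All x in {0,1}^k with predicate(x) true."""
    return [x for x in product((0, 1), repeat=k) if predicate(x)]


def is_pairwise_uniform(support, k):
    """True if the uniform distribution on `support` makes every pair of
    coordinates uniform on {0,1}^2 (each pattern has probability 1/4)."""
    if not support:
        return False
    total = len(support)
    for i, j in combinations(range(k), 2):
        for a, b in product((0, 1), repeat=2):
            count = sum(1 for x in support if x[i] == a and x[j] == b)
            if 4 * count != total:
                return False
    return True


def or3(x):
    """3-OR predicate (illustrative example)."""
    return any(x)


# Candidate distribution: uniform on the odd-parity strings of length 3.
odd_parity = [x for x in product((0, 1), repeat=3) if sum(x) % 2 == 1]

# Every odd-parity string satisfies 3-OR, and the distribution is pairwise
# uniform, so 3-OR supports a pairwise uniform distribution.
supported = all(or3(x) for x in odd_parity) and is_pairwise_uniform(odd_parity, 3)

# The uniform distribution on all seven satisfying assignments of 3-OR is
# not pairwise uniform, which shows the choice of distribution matters.
naive = is_pairwise_uniform(satisfying_assignments(or3, 3), 3)
```

The check is exponential in $k$, which is harmless for the constant-arity predicates considered here.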
Submitted 26 March, 2015; v1 submitted 4 January, 2015;
originally announced January 2015.
-
Almost Optimal Pseudorandom Generators for Spherical Caps
Authors:
Pravesh Kothari,
Raghu Meka
Abstract:
Halfspaces or linear threshold functions are widely studied in complexity theory, learning theory and algorithm design. In this work we study the natural problem of constructing pseudorandom generators (PRGs) for halfspaces over the sphere, aka spherical caps, which besides being interesting and basic geometric objects, also arise frequently in the analysis of various randomized algorithms (e.g., randomized rounding). We give an explicit PRG which fools spherical caps within error $ε$ and has an almost optimal seed-length of $O(\log n + \log(1/ε) \cdot \log\log(1/ε))$. For an inverse-polynomially growing error $ε$, our generator has a seed-length optimal up to a factor of $O( \log \log {(n)})$. The most efficient PRG previously known (due to Kane, 2012) requires a seed-length of $Ω(\log^{3/2}{(n)})$ in this setting. We also obtain similar constructions to fool halfspaces with respect to the Gaussian distribution.
Our construction and analysis are significantly different from previous works on PRGs for halfspaces and build on the iterative dimension reduction ideas of Kane et al. (2011) and Celis et al. (2013), the \emph{classical moment problem} from probability theory and explicit constructions of \emph{orthogonal designs} based on the seminal work of Bourgain and Gamburd (2011) on expansion in Lie groups.
Submitted 26 March, 2015; v1 submitted 23 November, 2014;
originally announced November 2014.
-
Tight Bounds on $\ell_1$ Approximation and Learning of Self-Bounding Functions
Authors:
Vitaly Feldman,
Pravesh Kothari,
Jan Vondrák
Abstract:
We study the complexity of learning and approximation of self-bounding functions over the uniform distribution on the Boolean hypercube $\{0,1\}^n$. Informally, a function $f:\{0,1\}^n \rightarrow \mathbb{R}$ is self-bounding if for every $x \in \{0,1\}^n$, $f(x)$ upper bounds the sum of all the $n$ marginal decreases in the value of the function at $x$. Self-bounding functions include such well-known classes of functions as submodular and fractionally-subadditive (XOS) functions. They were introduced by Boucheron et al. (2000) in the context of concentration of measure inequalities. Our main result is a nearly tight $\ell_1$-approximation of self-bounding functions by low-degree juntas. Specifically, all self-bounding functions can be $ε$-approximated in $\ell_1$ by a polynomial of degree $\tilde{O}(1/ε)$ over $2^{\tilde{O}(1/ε)}$ variables. We show that both the degree and junta-size are optimal up to logarithmic terms. Previous techniques considered stronger $\ell_2$ approximation and proved nearly tight bounds of $Θ(1/ε^{2})$ on the degree and $2^{Θ(1/ε^2)}$ on the number of variables. Our bounds rely on the analysis of noise stability of self-bounding functions together with a stronger connection between noise stability and $\ell_1$ approximation by low-degree polynomials. This technique can also be used to get tighter bounds on $\ell_1$ approximation by low-degree polynomials and a faster learning algorithm for halfspaces.
These results lead to improved and in several cases almost tight bounds for PAC and agnostic learning of self-bounding functions relative to the uniform distribution. In particular, assuming hardness of learning juntas, we show that PAC and agnostic learning of self-bounding functions have complexity of $n^{\tilde{Θ}(1/ε)}$.
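The informal definition of a self-bounding function can be checked directly on small examples. The sketch below tests only the condition as worded in this abstract (the sum of the $n$ marginal decreases at $x$ is at most $f(x)$); the formal definition in the literature may impose further requirements, and the helper names and example functions are illustrative assumptions.

```python
from itertools import product


def marginal_decrease(f, x, i):
    """f(x) minus the smallest value of f reachable by resetting coordinate i."""
    lowest = min(f(x[:i] + (b,) + x[i + 1:]) for b in (0, 1))
    return f(x) - lowest


def satisfies_informal_self_bounding(f, n, tol=1e-12):
    """Brute-force check over {0,1}^n that, at every point x, the sum of the
    n marginal decreases is at most f(x)."""
    for x in product((0, 1), repeat=n):
        total = sum(marginal_decrease(f, x, i) for i in range(n))
        if total > f(x) + tol:
            return False
    return True


def boolean_or(x):
    """OR of the coordinates (a simple coverage-style function)."""
    return max(x)


def hamming_weight(x):
    """Number of ones; the condition holds with equality."""
    return sum(x)


def squared_weight(x):
    """Square of the Hamming weight; violates the condition at the all-ones point."""
    return sum(x) ** 2


or_ok = satisfies_informal_self_bounding(boolean_or, 3)
weight_ok = satisfies_informal_self_bounding(hamming_weight, 3)
square_ok = satisfies_informal_self_bounding(squared_weight, 3)
```

For the squared weight with $n = 3$, the all-ones point has value 9 while each of the three marginal decreases is $9 - 4 = 5$, so the sum 15 exceeds the function value.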
Submitted 1 June, 2019; v1 submitted 18 April, 2014;
originally announced April 2014.