Search | arXiv e-print repository

Efficient Certificates of Anti-Concentration Beyond Gaussians

Authors: Ainesh Bakshi, Pravesh Kothari, Goutham Rajendran, Madhur Tulsiani, Aravindan Vijayaraghavan

Abstract: A set of high dimensional points $X=\{x_1, x_2,\ldots, x_n\} \subset R^d$ in isotropic position is said to be $δ$-anti concentrated if for every direction $v$, the fraction of points in $X$ satisfying $|\langle x_i,v \rangle |\leq δ$ is at most $O(δ)$. Motivated by applications to list-decodable learning and clustering, recent works have considered the problem of constructing efficient certificates of anti-concentration in the average case, when the set of points $X$ corresponds to samples from a Gaussian distribution. Their certificates played a crucial role in several subsequent works in algorithmic robust statistics on list-decodable learning and settling the robust learnability of arbitrary Gaussian mixtures, yet remain limited to rotationally invariant distributions. This work presents a new (and arguably the most natural) formulation for anti-concentration. Using this formulation, we give quasi-polynomial time verifiable sum-of-squares certificates of anti-concentration that hold for a wide class of non-Gaussian distributions including anti-concentrated bounded product distributions and uniform distributions over $L_p$ balls (and their affine transformations). Consequently, our method upgrades and extends results in algorithmic robust statistics, e.g., list-decodable learning and clustering, to such distributions. Our approach constructs a canonical integer program for anti-concentration and analyzes a sum-of-squares relaxation of it, independent of the intended application.
We rely on duality and analyze a pseudo-expectation on large subsets of the input points that take a small value in some direction. Our analysis uses the method of polynomial reweightings to reduce the problem to analyzing only analytically dense or sparse directions.
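
To make the definition of $δ$-anti-concentration concrete, the following minimal Python sketch computes, for one fixed direction $v$, the fraction of points with $|\langle x_i, v \rangle| \leq δ$. The function name `slab_fraction` and the toy point set are illustrative assumptions, not part of the paper. The sketch checks a single direction only; the definition quantifies over every direction, which is exactly what the certificates in the abstract address.

```python
import math

def slab_fraction(points, direction, delta):
    """Fraction of points x with |<x, v>| <= delta, where v is the
    unit vector along `direction` (hypothetical helper for illustration)."""
    norm = math.sqrt(sum(c * c for c in direction))
    if norm == 0:
        raise ValueError("direction must be nonzero")
    unit = [c / norm for c in direction]
    hits = 0
    for x in points:
        inner = sum(a * b for a, b in zip(x, unit))
        if abs(inner) <= delta:
            hits += 1
    return hits / len(points)

# Toy data: four points on the coordinate axes in R^2.
points = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
print(slab_fraction(points, (1.0, 0.0), 0.5))
```

For the direction $(1,0)$ two of the four toy points have inner product $0$, so half of the points fall in the slab of width $0.5$.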

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.10238 [pdf, other]

Rounding Large Independent Sets on Expanders

Authors: Mitali Bafna, Jun-Ting Hsieh, Pravesh K. Kothari

Abstract: We develop a new approach for approximating large independent sets when the input graph is a one-sided spectral expander - that is, the uniform random walk matrix of the graph has the second eigenvalue bounded away from 1. Consequently, we obtain a polynomial time algorithm to find linear-sized independent sets in one-sided expanders that are almost $3$-colorable or are promised to contain an independent set of size $(1/2-ε)n$. Our second result above can be refined to require only a weaker vertex expansion property with an efficient certificate. Somewhat surprisingly, we observe that the analogous task of finding a linear-sized independent set in almost $4$-colorable one-sided expanders (even when the second eigenvalue is $o_n(1)$) is NP-hard, assuming the Unique Games Conjecture. All prior algorithms that beat the worst-case guarantees for this problem rely on bottom eigenspace enumeration techniques (following the classical spectral methods of Alon and Kahale) and require two-sided expansion, meaning a bounded number of negative eigenvalues of magnitude $Ω(1)$. Such techniques naturally extend to almost $k$-colorable graphs for any constant $k$, in contrast to analogous guarantees on one-sided expanders, which are Unique Games-hard to achieve for $k \geq 4$. Our rounding builds on the method of simulating multiple samples from a pseudodistribution introduced by Barak et al. for rounding Unique Games instances.
The key to our analysis is a new clustering property of large independent sets in expanding graphs - every large independent set has a larger-than-expected intersection with some member of a small list - and its formalization in the low-degree sum-of-squares proof system.

Submitted 16 May, 2024; originally announced May 2024.

Comments: 57 pages, 3 figures

arXiv:2404.14159 [pdf, ps, other]

Semirandom Planted Clique and the Restricted Isometry Property

Authors: Jarosław Błasiok, Rares-Darius Buhai, Pravesh K. Kothari, David Steurer

Abstract: We give a simple, greedy $O(n^{ω+0.5})=O(n^{2.872})$-time algorithm to list-decode planted cliques in a semirandom model introduced in [CSV17] (following [FK01]) that succeeds whenever the size of the planted clique is $k\geq O(\sqrt{n} \log^2 n)$. In the model, the edges touching the vertices in the planted $k$-clique are drawn independently with probability $p=1/2$ while the edges not touching the planted clique are chosen by an adversary in response to the random choices. Our result shows that the computational threshold in the semirandom setting is within a $O(\log^2 n)$ factor of the information-theoretic one [Ste17] thus resolving an open question of Steinhardt. This threshold also essentially matches the conjectured computational threshold for the well-studied special case of fully random planted clique. All previous algorithms [CSV17, MMT20, BKS23] in this model are based on rather sophisticated rounding algorithms for entropy-constrained semidefinite programming relaxations and their sum-of-squares strengthenings and the best known guarantee is a $n^{O(1/ε)}$-time algorithm to list-decode planted cliques of size $k \geq \tilde{O}(n^{1/2+ε})$. In particular, the guarantee trivializes to quasi-polynomial time if the planted clique is of size $O(\sqrt{n} \operatorname{polylog} n)$. Our algorithm achieves an almost optimal guarantee with a surprisingly simple greedy algorithm.
The prior state-of-the-art algorithmic result above is based on a reduction to certifying bounds on the size of unbalanced bicliques in random graphs -- closely related to certifying the restricted isometry property (RIP) of certain random matrices and known to be hard in the low-degree polynomial model. Our key idea is a new approach that relies on the truth of -- but not efficient certificates for -- RIP of a new class of matrices built from the input graphs.

Submitted 22 April, 2024; originally announced April 2024.

Comments: 21 pages

arXiv:2404.06513 [pdf, ps, other]

Superpolynomial Lower Bounds for Smooth 3-LCCs and Sharp Bounds for Designs

Authors: Pravesh K. Kothari, Peter Manohar

Abstract: We give improved lower bounds for binary $3$-query locally correctable codes (3-LCCs) $C \colon \{0,1\}^k \rightarrow \{0,1\}^n$. Specifically, we prove: (1) If $C$ is a linear design 3-LCC, then $n \geq 2^{(1 - o(1))\sqrt{k} }$. A design 3-LCC has the additional property that the correcting sets for every codeword bit form a perfect matching and every pair of codeword bits is queried an equal number of times across all matchings. Our bound is tight up to a factor $\sqrt{8}$ in the exponent of $2$, as the best construction of binary $3$-LCCs (obtained by taking Reed-Muller codes on $\mathbb{F}_4$ and applying a natural projection map) is a design $3$-LCC with $n \leq 2^{\sqrt{8 k}}$. Up to a $\sqrt{8}$ factor, this resolves the Hamada conjecture on the maximum $\mathbb{F}_2$-codimension of a $4$-design. (2) If $C$ is a smooth, non-linear $3$-LCC with near-perfect completeness, then, $n \geq k^{Ω(\log k)}$. (3) If $C$ is a smooth, non-linear $3$-LCC with completeness $1 - \varepsilon$, then $n \geq \tilde{Ω}(k^{\frac{1}{2\varepsilon}})$. In particular, when $\varepsilon$ is a small constant, this implies a lower bound for general non-linear LCCs that beats the prior best $n \geq \tilde{Ω}(k^3)$ lower bound of [AGKM23] by a polynomial factor. Our design LCC lower bound is obtained via a fine-grained analysis of the Kikuchi matrix method applied to a variant of the matrix used in [KM23].
Our lower bounds for non-linear codes are obtained by designing a from-scratch reduction from nonlinear $3$-LCCs to a system of "chain polynomial equations": polynomial equations with similar structure to the long chain derivations that arise in the lower bounds for linear $3$-LCCs [KM23].

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2401.11590 [pdf, ps, other]

Small Even Covers, Locally Decodable Codes and Restricted Subgraphs of Edge-Colored Kikuchi Graphs

Authors: Jun-Ting Hsieh, Pravesh K. Kothari, Sidhanth Mohanty, David Munhá Correia, Benny Sudakov

Abstract: Given a $k$-uniform hypergraph $H$ on $n$ vertices, an even cover in $H$ is a collection of hyperedges that touch each vertex an even number of times. Even covers are a generalization of cycles in graphs and are equivalent to linearly dependent subsets of a system of linear equations modulo $2$. As a result, they arise naturally in the context of well-studied questions in coding theory and refuting unsatisfiable $k$-SAT formulas. Analogous to the irregular Moore bound of Alon, Hoory, and Linial (2002), in 2008, Feige conjectured an extremal trade-off between the number of hyperedges and the length of the smallest even cover in a $k$-uniform hypergraph. This conjecture was recently settled up to a multiplicative logarithmic factor in the number of hyperedges (Guruswami, Kothari, and Manohar 2022 and Hsieh, Kothari, and Mohanty 2023). These works introduce the new technique that relates hypergraph even covers to cycles in the associated \emph{Kikuchi} graphs. Their analysis of these Kikuchi graphs, especially for odd $k$, is rather involved and relies on matrix concentration inequalities. In this work, we give a simple and purely combinatorial argument that recovers the best-known bound for Feige's conjecture for even $k$. We also introduce a novel variant of a Kikuchi graph which together with this argument improves the logarithmic factor in the best-known bounds for odd $k$.
As an application of our ideas, we also give a purely combinatorial proof of the improved lower bounds (Alrabiah, Guruswami, Kothari and Manohar, 2023) on 3-query binary linear locally decodable codes.
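
The definition of an even cover at the start of this abstract can be checked mechanically. The sketch below is only an illustration: the name `is_even_cover` and the example hypergraph are assumptions, and the code treats the empty collection as not being an even cover, which is a convention chosen here rather than something stated in the abstract.

```python
from collections import Counter

def is_even_cover(hyperedges):
    """Return True if the nonempty collection of hyperedges touches
    every vertex an even number of times (hypothetical helper)."""
    if not hyperedges:
        return False  # convention assumed here: the empty collection does not count
    degree = Counter()
    for edge in hyperedges:
        degree.update(set(edge))  # each hyperedge touches a vertex at most once
    return all(count % 2 == 0 for count in degree.values())

# Four hyperedges of a 3-uniform hypergraph on vertices 1..6.
# Every vertex lies in exactly two of them, so they form an even cover.
cover = [(1, 2, 3), (1, 4, 5), (2, 4, 6), (3, 5, 6)]
print(is_even_cover(cover))
print(is_even_cover(cover[:3]))
```

Reading each hyperedge as the set of variables in a linear equation modulo $2$, the left-hand sides of the four equations in `cover` sum to zero, which is the linear dependence mentioned in the abstract.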

Submitted 21 January, 2024; originally announced January 2024.

Comments: 19 pages

arXiv:2311.13490 [pdf, other]

Benchmarking Toxic Molecule Classification using Graph Neural Networks and Few Shot Learning

Authors: Bhavya Mehta, Kush Kothari, Reshmika Nambiar, Seema Shrawne

Abstract: Traditional methods like Graph Convolutional Networks (GCNs) face challenges with limited data and class imbalance, leading to suboptimal performance in graph classification tasks during toxicity prediction of molecules as a whole. To address these issues, we harness the power of Graph Isomorphic Networks, Multi Headed Attention and Free Large-scale Adversarial Augmentation separately on Graphs for precisely capturing the structural data of molecules and their toxicological properties. Additionally, we incorporate Few-Shot Learning to improve the model's generalization with limited annotated samples. Extensive experiments on a diverse toxicology dataset demonstrate that our method achieves an impressive state-of-the-art AUC-ROC value of 0.816, surpassing the baseline GCN model by 11.4%. This highlights the significance of our proposed methodology and Few Shot Learning in advancing Toxic Molecular Classification, with the potential to enhance drug discovery and environmental risk assessment processes.

Submitted 22 November, 2023; originally announced November 2023.

arXiv:2311.12737 [pdf, other]

doi 10.1109/ICCCNT56998.2023.10307388

Exploring Graph Classification Techniques Under Low Data Constraints: A Comprehensive Study

Authors: Kush Kothari, Bhavya Mehta, Reshmika Nambiar, Seema Shrawne

Abstract: This survey paper presents a brief overview of recent research on graph data augmentation and few-shot learning. It covers various techniques for graph data augmentation, including node and edge perturbation, graph coarsening, and graph generation, as well as the latest developments in few-shot learning, such as meta-learning and model-agnostic meta-learning. The paper explores these areas in depth and delves into further subclassifications. Rule-based approaches and learning-based approaches are surveyed under graph augmentation techniques. Few-Shot Learning on graphs is also studied in terms of metric learning techniques and optimization-based techniques. In all, this paper provides an extensive array of techniques that can be employed in solving graph processing problems faced in low-data scenarios.

Submitted 21 November, 2023; originally announced November 2023.

arXiv:2311.00558 [pdf, other]

An Exponential Lower Bound for Linear 3-Query Locally Correctable Codes

Authors: Pravesh K. Kothari, Peter Manohar

Abstract: We prove that the blocklength $n$ of a linear $3$-query locally correctable code (LCC) $\mathcal{L} \colon {\mathbb F}^k \to {\mathbb F}^n$ with distance $δ$ must be at least $n \geq 2^{Ω\left(\left(\frac{δ^2 k}{(|{\mathbb F}|-1)^2}\right)^{1/8}\right)}$. In particular, the blocklength of a linear $3$-query LCC with constant distance over any small field grows exponentially with $k$. This improves on the best prior lower bound of $n \geq \tilde{Ω}(k^3)$ [AGKM23], which holds even for the weaker setting of $3$-query locally decodable codes (LDCs), and comes close to matching the best-known construction of $3$-query LCCs based on binary Reed-Muller codes, which achieve $n \leq 2^{O(k^{1/2})}$. Because there is a $3$-query LDC with a strictly subexponential blocklength [Yek08, Efr09], as a corollary we obtain the first strong separation between $q$-query LCCs and LDCs for any constant $q \geq 3$. Our proof is based on a new upgrade of the method of spectral refutations via Kikuchi matrices developed in recent works [GKM22, HKM23, AGKM23] that reduces establishing (non-)existence of combinatorial objects to proving unsatisfiability of associated XOR instances. Our key conceptual idea is to apply this method with XOR instances obtained via long-chain derivations, a structured variant of low-width resolution for XOR formulas from proof complexity [Gri01, Sch08].

Submitted 1 November, 2023; originally announced November 2023.

arXiv:2310.05651 [pdf, other]

FENCE: Fairplay Ensuring Network Chain Entity for Real-Time Multiple ID Detection at Scale In Fantasy Sports

Authors: Akriti Upreti, Kartavya Kothari, Utkarsh Thukral, Vishal Verma

Abstract: Dream11 takes pride in being a unique platform that enables over 190 million fantasy sports users to demonstrate their skills and connect deeper with their favorite sports. While managing such a scale, one issue we are faced with is duplicate/multiple account creation in the system. This is done by some users with the intent of abusing the platform, typically for bonus offers. The challenge is to detect these multiple accounts before it is too late. We propose a graph-based solution to solve this problem in which we first predict edges/associations between users. Using the edge information we highlight clusters of colluding multiple accounts. In this paper, we talk about our distributed ML system which is deployed to serve and support the inferences from our detection models. The challenge is to do this in real-time in order to take corrective actions. A core part of this setup also involves human-in-the-loop components for validation, feedback, and ground-truth labeling.

Submitted 9 October, 2023; originally announced October 2023.

Comments: 7 pages, 7 figures, accepted in AIML Systems 2023

ACM Class: I.2.1

arXiv:2310.00393 [pdf, ps, other]

New SDP Roundings and Certifiable Approximation for Cubic Optimization

Authors: Jun-Ting Hsieh, Pravesh K. Kothari, Lucas Pesenti, Luca Trevisan

Abstract: We give new rounding schemes for SDP relaxations for the problems of maximizing cubic polynomials over the unit sphere and the $n$-dimensional hypercube. In both cases, the resulting algorithms yield a $O(\sqrt{n/k})$ multiplicative approximation in $2^{O(k)} \text{poly}(n)$ time. In particular, we obtain a $O(\sqrt{n/\log n})$ approximation in polynomial time. For the unit sphere, this improves on the rounding algorithms of Bhattiprolu et al. [BGG+17] that need quasi-polynomial time to obtain a similar approximation guarantee. Over the $n$-dimensional hypercube, our results match the guarantee of a search algorithm of Khot and Naor [KN08] that obtains a similar approximation ratio via techniques from convex geometry. Unlike their method, our algorithm obtains an upper bound on the integrality gap of SDP relaxations for the problem and as a result, also yields a certificate on the optimum value of the input instance. Our results naturally generalize to homogeneous polynomials of higher degree and imply improved algorithms for approximating satisfiable instances of Max-3SAT. Our main motivation is the stark lack of rounding techniques for SDP relaxations of higher degree polynomial optimization in sharp contrast to a rich theory of SDP roundings for the quadratic case.
Our rounding algorithms introduce two new ideas: 1) a new polynomial reweighting based method to round sum-of-squares relaxations of higher degree polynomial maximization problems, and 2) a general technique to compress such relaxations down to substantially smaller SDPs by relying on an explicit construction of certain hitting sets. We hope that our work will inspire improved rounding algorithms for polynomial optimization and related problems.

Submitted 30 September, 2023; originally announced October 2023.

arXiv:2309.16897 [pdf, other]

Efficient Algorithms for Semirandom Planted CSPs at the Refutation Threshold

Authors: Venkatesan Guruswami, Jun-Ting Hsieh, Pravesh K. Kothari, Peter Manohar

Abstract: We present an efficient algorithm to solve semirandom planted instances of any Boolean constraint satisfaction problem (CSP). The semirandom model is a hybrid between worst-case and average-case input models, where the input is generated by (1) choosing an arbitrary planted assignment $x^*$, (2) choosing an arbitrary clause structure, and (3) choosing literal negations for each clause from an arbitrary distribution "shifted by $x^*$" so that $x^*$ satisfies each constraint. For an $n$ variable semirandom planted instance of a $k$-arity CSP, our algorithm runs in polynomial time and outputs an assignment that satisfies all but a $o(1)$-fraction of constraints, provided that the instance has at least $\tilde{O}(n^{k/2})$ constraints. This matches, up to $\operatorname{polylog}(n)$ factors, the clause threshold for algorithms that solve fully random planted CSPs [FPV15], as well as algorithms that refute random and semirandom CSPs [AOW15, AGK21]. Our result shows that despite having worst-case clause structure, the randomness in the literal patterns makes semirandom planted CSPs significantly easier than worst-case, where analogous results require $O(n^k)$ constraints [AKK95, FLP16]. Perhaps surprisingly, our algorithm follows a significantly different conceptual framework when compared to the recent resolution of semirandom CSP refutation.
This turns out to be inherent and, at a technical level, can be attributed to the need for relative spectral approximation of certain random matrices - reminiscent of the classical spectral sparsification - which ensures that an SDP can certify the uniqueness of the planted assignment. In contrast, in the refutation setting, it suffices to obtain a weaker guarantee of absolute upper bounds on the spectral norm of related matrices.

Submitted 28 September, 2023; originally announced September 2023.

Comments: FOCS 2023

arXiv:2308.15403 [pdf, ps, other]

A Near-Cubic Lower Bound for 3-Query Locally Decodable Codes from Semirandom CSP Refutation

Authors: Omar Alrabiah, Venkatesan Guruswami, Pravesh K. Kothari, Peter Manohar

Abstract: A code $C \colon \{0,1\}^k \to \{0,1\}^n$ is a $q$-locally decodable code ($q$-LDC) if one can recover any chosen bit $b_i$ of the message $b \in \{0,1\}^k$ with good confidence by randomly querying the encoding $x := C(b)$ on at most $q$ coordinates. Existing constructions of $2$-LDCs achieve $n = \exp(O(k))$, and lower bounds show that this is in fact tight. However, when $q = 3$, far less is known: the best constructions achieve $n = \exp(k^{o(1)})$, while the best known results only show a quadratic lower bound $n \geq \tilde{Ω}(k^2)$ on the blocklength. In this paper, we prove a near-cubic lower bound of $n \geq \tilde{Ω}(k^3)$ on the blocklength of $3$-query LDCs. This improves on the best known prior works by a polynomial factor in $k$. Our proof relies on a new connection between LDCs and refuting constraint satisfaction problems with limited randomness. Our quantitative improvement builds on the new techniques for refuting semirandom instances of CSPs developed in [GKM22, HKM23] and, in particular, relies on bounding the spectral norm of appropriate Kikuchi matrices.
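
As a concrete instance of the $q$-LDC definition with $q = 2$ and $n = 2^k$, the sketch below uses the Hadamard code, the textbook example of a $2$-query LDC. The abstract does not name this code, so its use here is an assumption made for illustration, as are the function names. The decoder recovers $b_i$ from the two positions $a$ and $a \oplus e_i$ for a uniformly random $a$; on an uncorrupted codeword the answer is always correct.

```python
import random

def hadamard_encode(message):
    """Encode k bits as the 2^k parities <message, a> mod 2, one per
    a in {0,1}^k (a is represented as an integer bitmask)."""
    k = len(message)
    codeword = []
    for a in range(2 ** k):
        bit = 0
        for j in range(k):
            if (a >> j) & 1:
                bit ^= message[j]
        codeword.append(bit)
    return codeword

def decode_bit(codeword, k, i, rng=random):
    """Recover message bit i with two queries: positions a and a XOR e_i."""
    a = rng.randrange(2 ** k)
    return codeword[a] ^ codeword[a ^ (1 << i)]

message = [1, 0, 1, 1]
codeword = hadamard_encode(message)
decoded = [decode_bit(codeword, len(message), i) for i in range(len(message))]
print(len(codeword), decoded == message)
```

Each query is individually uniform over the codeword positions, which is why the decoder still succeeds with good probability when a small fraction of positions is corrupted.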

Submitted 29 August, 2023; originally announced August 2023.

arXiv:2307.05954 [pdf, other]

Ellipsoid Fitting Up to a Constant

Authors: Jun-Ting Hsieh, Pravesh K. Kothari, Aaron Potechin, Jeff Xu

Abstract: In [Sau11,SPW13], Saunderson, Parrilo and Willsky asked the following elegant geometric question: what is the largest $m= m(d)$ such that there is an ellipsoid in $\mathbb{R}^d$ that passes through $v_1, v_2, \ldots, v_m$ with high probability when the $v_i$s are chosen independently from the standard Gaussian distribution $N(0,I_{d})$. The existence of such an ellipsoid is equivalent to the existence of a positive semidefinite matrix $X$ such that $v_i^{\top}X v_i =1$ for every $1 \leq i \leq m$ - a natural example of a random semidefinite program. SPW conjectured that $m= (1-o(1)) d^2/4$ with high probability. Very recently, Potechin, Turner, Venkat and Wein and Kane and Diakonikolas proved that $m \geq d^2/\log^{O(1)}(d)$ via certain explicit constructions. In this work, we give a substantially tighter analysis of their construction to prove that $m \geq d^2/C$ for an absolute constant $C>0$. This resolves one direction of the SPW conjecture up to a constant. Our analysis proceeds via the method of Graphical Matrix Decomposition that has recently been used to analyze correlated random matrices arising in various areas [BHK+19]. Our key new technical tool is a refined method to prove singular value upper bounds on certain correlated random matrices that are tight up to absolute dimension-independent constants. In contrast, all previous methods that analyze such matrices lose logarithmic factors in the dimension.

Submitted 12 July, 2023; originally announced July 2023.

Comments: ICALP 2023

arXiv:2303.00252 [pdf, ps, other]

Is Planted Coloring Easier than Planted Clique?

Authors: Pravesh K. Kothari, Santosh S. Vempala, Alexander S. Wein, Jeff Xu

Abstract: We study the computational complexity of two related problems: recovering a planted $q$-coloring in $G(n,1/2)$, and finding efficiently verifiable witnesses of non-$q$-colorability (a.k.a. refutations) in $G(n,1/2)$. Our main results show hardness for both these problems in a restricted-but-powerful class of algorithms based on computing low-degree polynomials in the inputs. The problem of recovering a planted $q$-coloring is equivalent to recovering $q$ disjoint planted cliques that cover all the vertices -- a potentially easier variant of the well-studied planted clique problem. Our first result shows that this variant is as hard as the original planted clique problem in the low-degree polynomial model of computation: each clique needs to have size $k \gg \sqrt{n}$ for efficient recovery to be possible. For the related variant where the cliques cover a $(1-ε)$-fraction of the vertices, we also show hardness by reduction from planted clique. Our second result shows that refuting $q$-colorability of $G(n,1/2)$ is hard in the low-degree polynomial model when $q \gg n^{2/3}$ but easy when $q \lesssim n^{1/2}$, and we leave closing this gap for future work. Our proof is more subtle than similar results for planted clique and involves constructing a non-standard distribution over $q$-colorable graphs. We note that while related to several prior works, this is the first work that explicitly formulates refutation problems in the low-degree polynomial model.
The proofs of our main results involve showing low-degree hardness of hypothesis testing between an appropriately constructed pair of distributions. For refutation, we show completeness of this approach: in the low-degree model, the refutation task is precisely as hard as the hardest associated testing problem, i.e., proving hardness of refutation amounts to finding a "hard" distribution.

Submitted 1 March, 2023; originally announced March 2023.

Comments: 23 pages

arXiv:2302.12289 [pdf, other]

Beyond Moments: Robustly Learning Affine Transformations with Asymptotically Optimal Error

Authors: He Jia, Pravesh K. Kothari, Santosh S. Vempala

Abstract: We present a polynomial-time algorithm for robustly learning an unknown affine transformation of the standard hypercube from samples, an important and well-studied setting for independent component analysis (ICA). Specifically, given an $ε$-corrupted sample from a distribution $D$ obtained by applying an unknown affine transformation $x \rightarrow Ax+s$ to the uniform distribution on a $d$-dimensional hypercube $[-1,1]^d$, our algorithm constructs $\hat{A}, \hat{s}$ such that the total variation distance of the distribution $\hat{D}$ from $D$ is $O(ε)$ using poly$(d)$ time and samples. Total variation distance is the information-theoretically strongest possible notion of distance in our setting and our recovery guarantees in this distance are optimal up to the absolute constant factor multiplying $ε$. In particular, if the columns of $A$ are normalized to be unit length, our total variation distance guarantee implies a bound on the sum of the $\ell_2$ distances between the column vectors of $A$ and $\hat{A}$, $\sum_{i =1}^d \|a_i-\hat{a}_i\|_2 = O(ε)$. In contrast, the strongest known prior results only yield a $ε^{O(1)}$ (relative) bound on the distance between individual $a_i$'s and their estimates and translate into an $O(dε)$ bound on the total variation distance. Our key innovation is a new approach to ICA (even to outlier-free ICA) that circumvents the difficulties in the classical method of moments and instead relies on a new geometric certificate of correctness of an affine transformation.
Our algorithm is based on a new method that iteratively improves an estimate of the unknown affine transformation whenever the requirements of the certificate are not met.

Submitted 23 February, 2023; originally announced February 2023.

arXiv:2212.08018 [pdf, ps, other]

Privately Estimating a Gaussian: Efficient, Robust and Optimal

Authors: Daniel Alabi, Pravesh K. Kothari, Pranay Tankala, Prayaag Venkat, Fred Zhang

Abstract: In this work, we give efficient algorithms for privately estimating a Gaussian distribution in both pure and approximate differential privacy (DP) models with optimal dependence on the dimension in the sample complexity. In the pure DP setting, we give an efficient algorithm that estimates an unknown $d$-dimensional Gaussian distribution up to an arbitrarily tiny total variation error using $\widetilde{O}(d^2 \log κ)$ samples while tolerating a constant fraction of adversarial outliers. Here, $κ$ is the condition number of the target covariance matrix. The sample bound matches the best non-private estimators in the dependence on the dimension (up to a polylogarithmic factor). We prove a new lower bound on differentially private covariance estimation to show that the dependence on the condition number $κ$ in the above sample bound is also tight. Prior to our work, only identifiability results (yielding inefficient super-polynomial time algorithms) were known for the problem. In the approximate DP setting, we give an efficient algorithm to estimate an unknown Gaussian distribution up to an arbitrarily tiny total variation error using $\widetilde{O}(d^2)$ samples while tolerating a constant fraction of adversarial outliers. Prior to our work, all efficient approximate DP algorithms incurred a super-quadratic sample cost or were not outlier-robust. For the special case of mean estimation, our algorithm achieves the optimal sample complexity of $\widetilde O(d)$, improving on a $\widetilde O(d^{1.5})$ bound from prior work.
Our pure DP algorithm relies on a recursive private preconditioning subroutine that utilizes the recent work on private mean estimation [Hopkins et al., 2022]. Our approximate DP algorithms are based on a substantial upgrade of the method of stabilizing convex relaxations introduced in [Kothari et al., 2022].

Submitted 1 June, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

arXiv:2212.05619 [pdf, ps, other]

Algorithms approaching the threshold for semi-random planted clique

Authors: Rares-Darius Buhai, Pravesh K. Kothari, David Steurer

Abstract: We design new polynomial-time algorithms for recovering planted cliques in the semi-random graph model introduced by Feige and Kilian 2001. The previous best algorithms for this model succeed if the planted clique has size at least $n^{2/3}$ in a graph with $n$ vertices (Mehta, McKenzie, Trevisan 2019 and Charikar, Steinhardt, Valiant 2017). Our algorithms work for planted-clique sizes approaching $n^{1/2}$ -- the information-theoretic threshold in the semi-random model (Steinhardt 2017) and a conjectured computational threshold even in the easier fully-random model. This result comes close to resolving open questions by Feige 2019 and Steinhardt 2017. Our algorithms are based on higher constant degree sum-of-squares relaxation and rely on a new conceptual connection that translates certificates of upper bounds on biclique numbers in unbalanced bipartite Erdős--Rényi random graphs into algorithms for semi-random planted clique. The use of a higher-constant degree sum-of-squares is essential in our setting: we prove a lower bound on the basic SDP for certifying bicliques that shows that the basic SDP cannot succeed for planted cliques of size $k =o(n^{2/3})$. We also provide some evidence that the information-computation trade-off of our current algorithms may be inherent by proving an average-case lower bound for unbalanced bicliques in the low-degree-polynomials model.

Submitted 6 June, 2023; v1 submitted 11 December, 2022; originally announced December 2022.

Comments: 51 pages, the arxiv landing page contains a shortened abstract

ACM Class: F.2

arXiv:2211.14312 [pdf]

Karyotype AI for Precision Oncology

Authors: Zahra Shamsi, Drew Bryant, Jacob Wilson, Xiaoyu Qu, Avinava Dubey, Konik Kothari, Mostafa Dehghani, Mariya Chavarha, Valerii Likhosherstov, Brian Williams, Michael Frumkin, Fred Appelbaum, Krzysztof Choromanski, Ali Bashir, Min Fang

Abstract: Chromosome analysis is essential for diagnosing genetic disorders. For hematologic malignancies, identification of somatic clonal aberrations by karyotype analysis remains the standard of care. However, karyotyping is costly and time-consuming because of the largely manual process and the expertise required in identifying and annotating aberrations. Efforts to automate karyotype analysis to date fell short in aberration detection. Using a training set of ~10k patient specimens and ~50k karyograms from over 5 years from the Fred Hutchinson Cancer Center, we created a labeled set of images representing individual chromosomes. These individual chromosomes were used to train and assess deep learning models for classifying the 24 human chromosomes and identifying chromosomal aberrations. The top-accuracy models utilized the recently introduced Topological Vision Transformers (TopViTs) with 2-level-block-Toeplitz masking, to incorporate structural inductive bias. TopViT outperformed CNN (Inception) models with >99.3% accuracy for chromosome identification, and exhibited accuracies >99% for aberration detection in most aberrations. Notably, we were able to show high-quality performance even in "few shot" learning scenarios. Incorporating the definition of clonality substantially improved both precision and recall (sensitivity). When applied to "zero shot" scenarios, the model captured aberrations without training, with perfect precision at >50% recall.
Together these results show that modern deep learning models can approach expert-level performance for chromosome aberration detection. To our knowledge, this is the first study demonstrating the downstream effectiveness of TopViTs. These results open up exciting opportunities for not only expediting patient results but providing a scalable technology for early screening of low-abundance chromosomal lesions.

Submitted 19 October, 2023; v1 submitted 19 November, 2022; originally announced November 2022.

arXiv:2211.13312 [pdf, ps, other]

A Moment-Matching Approach to Testable Learning and a New Characterization of Rademacher Complexity

Authors: Aravind Gollakota, Adam R. Klivans, Pravesh K. Kothari

Abstract: A remarkable recent paper by Rubinfeld and Vasilyan (2022) initiated the study of \emph{testable learning}, where the goal is to replace hard-to-verify distributional assumptions (such as Gaussianity) with efficiently testable ones and to require that the learner succeed whenever the unknown distribution passes the corresponding test. In this model, they gave an efficient algorithm for learning halfspaces under testable assumptions that are provably satisfied by Gaussians. In this paper we give a powerful new approach for developing algorithms for testable learning using tools from moment matching and metric distances in probability. We obtain efficient testable learners for any concept class that admits low-degree \emph{sandwiching polynomials}, capturing most important examples for which we have ordinary agnostic learners. We recover the results of Rubinfeld and Vasilyan as a corollary of our techniques while achieving improved, near-optimal sample complexity bounds for a broad range of concept classes and distributions. Surprisingly, we show that the information-theoretic sample complexity of testable learning is tightly characterized by the Rademacher complexity of the concept class, one of the most well-studied measures in statistical learning theory. In particular, uniform convergence is necessary and sufficient for testable learning. This leads to a fundamental separation from (ordinary) distribution-specific agnostic learning, where uniform convergence is sufficient but not necessary.

Submitted 23 November, 2022; originally announced November 2022.

Comments: 34 pages

arXiv:2211.10525 [pdf, other]

Differentiable Uncalibrated Imaging

Authors: Sidharth Gupta, Konik Kothari, Valentin Debarnot, Ivan Dokmanić

Abstract: We propose a differentiable imaging framework to address uncertainty in measurement coordinates such as sensor locations and projection angles. We formulate the problem as measurement interpolation at unknown nodes supervised through the forward operator. To solve it we apply implicit neural networks, also known as neural fields, which are naturally differentiable with respect to the input coordinates. We also develop differentiable spline interpolators which perform as well as neural networks, require less time to optimize and have well-understood properties. Differentiability is key as it allows us to jointly fit a measurement representation, optimize over the uncertain measurement coordinates, and perform image reconstruction which in turn ensures consistent calibration. We apply our approach to 2D and 3D computed tomography, and show that it produces improved reconstructions compared to baselines that do not account for the lack of calibration. The flexibility of the proposed framework makes it easy to extend to almost arbitrary imaging problems.

Submitted 20 December, 2023; v1 submitted 18 November, 2022; originally announced November 2022.

arXiv:2208.00122 [pdf, ps, other]

Polynomial-Time Power-Sum Decomposition of Polynomials

Authors: Mitali Bafna, Jun-Ting Hsieh, Pravesh K. Kothari, Jeff Xu

Abstract: We give efficient algorithms for finding power-sum decomposition of an input polynomial $P(x)= \sum_{i\leq m} p_i(x)^d$ with component $p_i$s. The case of linear $p_i$s is equivalent to the well-studied tensor decomposition problem while the quadratic case occurs naturally in studying identifiability of non-spherical Gaussian mixtures from low-order moments. Unlike tensor decomposition, both the unique identifiability and algorithms for this problem are not well-understood. For the simplest setting of quadratic $p_i$s and $d=3$, prior work of Ge, Huang and Kakade yields an algorithm only when $m \leq \tilde{O}(\sqrt{n})$. On the other hand, the more general recent result of Garg, Kayal and Saha builds an algebraic approach to handle any $m=n^{O(1)}$ components but only when $d$ is large enough (while yielding no bounds for $d=3$ or even $d=100$) and only handles an inverse exponential noise. Our results obtain a substantial quantitative improvement on both the prior works above even in the base case of $d=3$ and quadratic $p_i$s. Specifically, our algorithm succeeds in decomposing a sum of $m \sim \tilde{O}(n)$ generic quadratic $p_i$s for $d=3$ and more generally the $d$th power-sum of $m \sim n^{2d/15}$ generic degree-$K$ polynomials for any $K \geq 2$. Our algorithm relies only on basic numerical linear algebraic primitives, is exact (i.e., obtains arbitrarily tiny error up to numerical precision), and handles an inverse polynomial noise when the $p_i$s have random Gaussian coefficients.
Our main tool is a new method for extracting the linear span of $p_i$s by studying the linear subspace of low-order partial derivatives of the input $P$. For establishing polynomial stability of our algorithm in average-case, we prove inverse polynomial bounds on the smallest singular value of certain correlated random matrices with low-degree polynomial entries that arise in our analyses.

Submitted 29 July, 2022; originally announced August 2022.

Comments: To appear in FOCS 2022

arXiv:2207.10850 [pdf, other]

A simple and sharper proof of the hypergraph Moore bound

Authors: Jun-Ting Hsieh, Pravesh K. Kothari, Sidhanth Mohanty

Abstract: The hypergraph Moore bound is an elegant statement that characterizes the extremal trade-off between the girth - the number of hyperedges in the smallest cycle or even cover (a subhypergraph with all degrees even) and size - the number of hyperedges in a hypergraph. For graphs (i.e., $2$-uniform hypergraphs), a bound tight up to the leading constant was proven in a classical work of Alon, Hoory and Linial [AHL02]. For hypergraphs of uniformity $k>2$, an appropriate generalization was conjectured by Feige [Fei08]. The conjecture was settled up to an additional $\log^{4k+1} n$ factor in the size in a recent work of Guruswami, Kothari and Manohar [GKM21]. Their argument relies on a connection between the existence of short even covers and the spectrum of a certain randomly signed Kikuchi matrix. Their analysis, especially for the case of odd $k$, is significantly complicated. In this work, we present a substantially simpler and shorter proof of the hypergraph Moore bound. Our key idea is the use of a new reweighted Kikuchi matrix and an edge deletion step that allows us to drop several involved steps in [GKM21]'s analysis such as combinatorial bucketing of rows of the Kikuchi matrix and the use of the Schudy-Sviridenko polynomial concentration. Our simpler proof also obtains tighter parameters: in particular, the argument gives a new proof of the classical Moore bound of [AHL02] with no loss (the proof in [GKM21] loses a $\log^3 n$ factor), and loses only a single logarithmic factor for all $k>2$-uniform hypergraphs.
As in [GKM21], our ideas naturally extend to yield a simpler proof of the full trade-off for strongly refuting smoothed instances of constraint satisfaction problems with similarly improved parameters.

Submitted 21 July, 2022; originally announced July 2022.

arXiv:2206.10942 [pdf, ps, other]

doi 10.1145/3519935.3520006

List-Decodable Covariance Estimation

Authors: Misha Ivkov, Pravesh K. Kothari

Abstract: We give the first polynomial time algorithm for \emph{list-decodable covariance estimation}. For any $α> 0$, our algorithm takes as input a sample $Y \subseteq \mathbb{R}^d$ of size $n\geq d^{\mathsf{poly}(1/α)}$ obtained by adversarially corrupting $(1-α)n$ points in an i.i.d. sample $X$ of size $n$ from the Gaussian distribution with unknown mean $μ_*$ and covariance $Σ_*$. In $n^{\mathsf{poly}(1/α)}$ time, it outputs a constant-size list of $k = k(α)= (1/α)^{\mathsf{poly}(1/α)}$ candidate parameters that, with high probability, contains a $(\hatμ,\hatΣ)$ such that the total variation distance $TV(\mathcal{N}(μ_*,Σ_*),\mathcal{N}(\hatμ,\hatΣ))<1-O_α(1)$. This is the statistically strongest notion of distance and implies multiplicative spectral and relative Frobenius distance approximation for parameters with dimension independent error. Our algorithm works more generally for $(1-α)$-corruptions of any distribution $D$ that possesses low-degree sum-of-squares certificates of two natural analytic properties: 1) anti-concentration of one-dimensional marginals and 2) hypercontractivity of degree 2 polynomials. Prior to our work, the only known results for estimating covariance in the list-decodable setting were for the special cases of list-decodable linear regression and subspace recovery due to Karmarkar, Klivans, and Kothari (2019), Raghavendra and Yau (2019 and 2020) and Bakshi and Kothari (2020). These results need superpolynomial time for obtaining any subconstant error in the underlying dimension.
Our result implies the first polynomial-time \emph{exact} algorithm for list-decodable linear regression and subspace recovery that allows, in particular, to obtain $2^{-\mathsf{poly}(d)}$ error in polynomial-time. Our result also implies an improved algorithm for clustering non-spherical mixtures.

Submitted 22 June, 2022; originally announced June 2022.

Comments: Abstract slightly clipped. To appear at STOC 2022

ACM Class: F.2.1

arXiv:2206.09204 [pdf, ps, other]

Approximating Max-Cut on Bounded Degree Graphs: Tighter Analysis of the FKL Algorithm

Authors: Jun-Ting Hsieh, Pravesh K. Kothari

Abstract: In this note, we describe a $α_{GW} + \tildeΩ(1/d^2)$-factor approximation algorithm for Max-Cut on weighted graphs of degree $\leq d$. Here, $α_{GW}\approx 0.878$ is the worst-case approximation ratio of the Goemans-Williamson rounding for Max-Cut. This improves on previous results for unweighted graphs by Feige, Karpinski, and Langberg and Florén. Our guarantee is obtained by a tighter analysis of the solution obtained by applying a natural local improvement procedure to the Goemans-Williamson rounding of the basic SDP strengthened with triangle inequalities.

Submitted 18 June, 2022; originally announced June 2022.

arXiv:2205.06739 [pdf, ps, other]

Bypassing the XOR Trick: Stronger Certificates for Hypergraph Clique Number

Authors: Venkatesan Guruswami, Pravesh K. Kothari, Peter Manohar

Abstract: Let $\mathcal{H}(k,n,p)$ be the distribution on $k$-uniform hypergraphs where every subset of $[n]$ of size $k$ is included as a hyperedge with probability $p$ independently. In this work, we design and analyze a simple spectral algorithm that certifies a bound on the size of the largest clique, $ω(H)$, in hypergraphs $H \sim \mathcal{H}(k,n,p)$. For example, for any constant $p$, with high probability over the choice of the hypergraph, our spectral algorithm certifies a bound of $\tilde{O}(\sqrt{n})$ on the clique number in polynomial time. This matches, up to $\textrm{polylog}(n)$ factors, the best known certificate for the clique number in random graphs, which is the special case of $k = 2$. Prior to our work, the best known refutation algorithms [CGL04, AOW15] rely on a reduction to the problem of refuting random $k$-XOR via Feige's XOR trick [Fei02], and yield a polynomially worse bound of $\tilde{O}(n^{3/4})$ on the clique number when $p = O(1)$. Our algorithm bypasses the XOR trick and relies instead on a natural generalization of the Lovasz theta semidefinite programming relaxation for cliques in hypergraphs.

Submitted 13 May, 2022; originally announced May 2022.

arXiv:2204.07664 [pdf, other]

doi 10.1109/TCI.2023.3248949

Conditional Injective Flows for Bayesian Imaging

Authors: AmirEhsan Khorashadizadeh, Konik Kothari, Leonardo Salsi, Ali Aghababaei Harandi, Maarten de Hoop, Ivan Dokmanić

Abstract: Most deep learning models for computational imaging regress a single reconstructed image. In practice, however, ill-posedness, nonlinearity, model mismatch, and noise often conspire to make such point estimates misleading or insufficient. The Bayesian approach models images and (noisy) measurements as jointly distributed random vectors and aims to approximate the posterior distribution of unknowns. Recent variational inference methods based on conditional normalizing flows are a promising alternative to traditional MCMC methods, but they come with drawbacks: excessive memory and compute demands for moderate to high resolution images and underwhelming performance on hard nonlinear problems. In this work, we propose C-Trumpets -- conditional injective flows specifically designed for imaging problems, which greatly diminish these challenges. Injectivity reduces memory footprint and training time, while a low-dimensional latent space together with architectural innovations like fixed-volume-change layers and skip-connection revnet layers allows C-Trumpets to outperform regular conditional flow models on a variety of imaging and image restoration tasks, including limited-view CT and nonlinear inverse scattering, with a lower compute and memory budget. C-Trumpets enable fast approximation of point estimates like MMSE or MAP as well as physically-meaningful uncertainty quantification.

Submitted 3 April, 2023; v1 submitted 15 April, 2022; originally announced April 2022.

Comments: 23 pages, 23 figures

Journal ref: IEEE Transactions on Computational Imaging, vol. 9, pp. 224-237, 2023

arXiv:2112.03548 [pdf, ps, other]

Private Robust Estimation by Stabilizing Convex Relaxations

Authors: Pravesh K. Kothari, Pasin Manurangsi, Ameya Velingker

Abstract: We give the first polynomial time and sample $(ε, δ)$-differentially private (DP) algorithm to estimate the mean, covariance and higher moments in the presence of a constant fraction of adversarial outliers. Our algorithm succeeds for families of distributions that satisfy two well-studied properties in prior works on robust estimation: certifiable subgaussianity of directional moments and certifiable hypercontractivity of degree 2 polynomials. Our recovery guarantees hold in the "right affine-invariant norms": Mahalanobis distance for mean, multiplicative spectral and relative Frobenius distance guarantees for covariance and injective norms for higher moments. Prior works obtained private robust algorithms for mean estimation of subgaussian distributions with bounded covariance. For covariance estimation, ours is the first efficient algorithm (even in the absence of outliers) that succeeds without any condition-number assumptions. Our algorithms arise from a new framework that provides a general blueprint for modifying convex relaxations for robust estimation to satisfy strong worst-case stability guarantees in the appropriate parameter norms whenever the algorithms produce witnesses of correctness in their run. We verify such guarantees for a modification of standard sum-of-squares (SoS) semidefinite programming relaxations for robust estimation. Our privacy guarantees are obtained by combining stability guarantees with a new "estimate dependent" noise injection mechanism in which noise scales with the eigenvalues of the estimated covariance.
We believe this framework will be useful more generally in obtaining DP counterparts of robust estimators. Independently of our work, Ashtiani and Liaw [AL21] also obtained a polynomial time and sample private robust estimation algorithm for Gaussian distributions.

Submitted 7 December, 2021; originally announced December 2021.

arXiv:2110.11853 [pdf, ps, other]

Polynomial-Time Sum-of-Squares Can Robustly Estimate Mean and Covariance of Gaussians Optimally

Authors: Pravesh K. Kothari, Peter Manohar, Brian Hu Zhang

Abstract: In this work, we revisit the problem of estimating the mean and covariance of an unknown $d$-dimensional Gaussian distribution in the presence of an $\varepsilon$-fraction of adversarial outliers. The pioneering work of [DKK+16] gave a polynomial time algorithm for this task with optimal $\tilde{O}(\varepsilon)$ error using $n = \textrm{poly}(d, 1/\varepsilon)$ samples. On the other hand, [KS17b] introduced a general framework for robust moment estimation via a canonical sum-of-squares relaxation that succeeds for the more general class of certifiably subgaussian and certifiably hypercontractive [BK20] distributions. When specialized to Gaussians, this algorithm obtains the same $\tilde{O}(\varepsilon)$ error guarantee as [DKK+16] but incurs a super-polynomial sample complexity ($n = d^{O(\log(1/\varepsilon))}$) and running time ($n^{O(\log(1/\varepsilon))}$). This cost appears inherent to their analysis as it relies only on sum-of-squares certificates of upper bounds on directional moments while the analysis in [DKK+16] relies on lower bounds on directional moments inferred from algebraic relationships between moments of Gaussian distributions. We give a new, simple analysis of the same canonical sum-of-squares relaxation used in [KS17b, BK20] and show that for Gaussian distributions, their algorithm achieves the same error, sample complexity and running time guarantees as the specialized algorithm in [DKK+16]. Our key innovation is a new argument that allows using moment lower bounds without having sum-of-squares certificates for them.
We believe that our proof technique will likely be useful in developing further robust estimation algorithms.

Submitted 22 October, 2021; originally announced October 2021.

arXiv:2110.08677 [pdf, ps, other]

Algorithmic Thresholds for Refuting Random Polynomial Systems

Authors: Jun-Ting Hsieh, Pravesh K. Kothari

Abstract: Consider a system of $m$ polynomial equations $\{p_i(x) = b_i\}_{i \leq m}$ of degree $D\geq 2$ in $n$-dimensional variable $x \in \mathbb{R}^n$ such that the coefficients of every $p_i$ and the $b_i$s are chosen independently at random from some continuous distribution. We study the basic question of determining the smallest $m$ -- the algorithmic threshold -- for which efficient algorithms can find refutations (i.e. certificates of unsatisfiability) for such systems. This setting generalizes problems such as refuting random SAT instances, low-rank matrix sensing and certifying pseudo-randomness of Goldreich's candidate generators and generalizations. We show that for every $d \in \mathbb{N}$, the $(n+m)^{O(d)}$-time canonical sum-of-squares (SoS) relaxation refutes such a system with high probability whenever $m \geq O(n) \cdot (\frac{n}{d})^{D-1}$. We prove a lower bound in the restricted low-degree polynomial model of computation which suggests that this trade-off between SoS degree and the number of equations is nearly tight for all $d$. We also confirm the predictions of this lower bound in a limited setting by showing a lower bound on the canonical degree-$4$ sum-of-squares relaxation for refuting random quadratic polynomials. Together, our results provide evidence for an algorithmic threshold for the problem at $m \gtrsim \widetilde{O}(n) \cdot n^{(1-δ)(D-1)}$ for $2^{n^δ}$-time algorithms for all $δ$.

Submitted 16 October, 2021; originally announced October 2021.

arXiv:2109.04415 [pdf, other]

Algorithms and Certificates for Boolean CSP Refutation: "Smoothed is no harder than Random"

Authors: Venkatesan Guruswami, Pravesh K. Kothari, Peter Manohar

Abstract: We present an algorithm for strongly refuting smoothed instances of all Boolean CSPs. The smoothed model is a hybrid between worst and average-case input models, where the input is an arbitrary instance of the CSP with only the negation patterns of the literals re-randomized with some small probability. For an $n$-variable smoothed instance of a $k$-arity CSP, our algorithm runs in $n^{O(\ell)}$ time, and succeeds with high probability in bounding the optimum fraction of satisfiable constraints away from $1$, provided that the number of constraints is at least $\tilde{O}(n) (\frac{n}{\ell})^{\frac{k}{2} - 1}$. This matches, up to polylogarithmic factors in $n$, the trade-off between running time and the number of constraints of the state-of-the-art algorithms for refuting fully random instances of CSPs [RRS17]. We also make a surprising new connection between our algorithm and even covers in hypergraphs, which we use to positively resolve Feige's 2008 conjecture, an extremal combinatorics conjecture on the existence of even covers in sufficiently dense hypergraphs that generalizes the well-known Moore bound for the girth of graphs. As a corollary, we show that polynomial-size refutation witnesses exist for arbitrary smoothed CSP instances with number of constraints a polynomial factor below the "spectral threshold" of $n^{k/2}$, extending the celebrated result for random 3-SAT of Feige, Kim and Ofek [FKO06].

Submitted 3 September, 2023; v1 submitted 9 September, 2021; originally announced September 2021.

arXiv:2107.02320 [pdf, ps, other]

Memory-Sample Lower Bounds for Learning Parity with Noise

Authors: Sumegha Garg, Pravesh K. Kothari, Pengda Liu, Ran Raz

Abstract: In this work, we show, for the well-studied problem of learning parity under noise, where a learner tries to learn $x=(x_1,\ldots,x_n) \in \{0,1\}^n$ from a stream of random linear equations over $\mathrm{F}_2$ that are correct with probability $\frac{1}{2}+\varepsilon$ and flipped with probability $\frac{1}{2}-\varepsilon$, that any learning algorithm requires either a memory of size $Ω(n^2/\varepsilon)$ or an exponential number of samples. In fact, we study memory-sample lower bounds for a large class of learning problems, as characterized by [GRT'18], when the samples are noisy. A matrix $M: A \times X \rightarrow \{-1,1\}$ corresponds to the following learning problem with error parameter $\varepsilon$: an unknown element $x \in X$ is chosen uniformly at random. A learner tries to learn $x$ from a stream of samples, $(a_1, b_1), (a_2, b_2) \ldots$, where for every $i$, $a_i \in A$ is chosen uniformly at random and $b_i = M(a_i,x)$ with probability $1/2+\varepsilon$ and $b_i = -M(a_i,x)$ with probability $1/2-\varepsilon$ ($0<\varepsilon< \frac{1}{2}$). Assume that $k,\ell, r$ are such that any submatrix of $M$ of at least $2^{-k} \cdot |A|$ rows and at least $2^{-\ell} \cdot |X|$ columns, has a bias of at most $2^{-r}$. We show that any learning algorithm for the learning problem corresponding to $M$, with error, requires either a memory of size at least $Ω\left(\frac{k \cdot \ell}{\varepsilon} \right)$, or at least $2^{Ω(r)}$ samples.
In particular, this shows that for a large class of learning problems, same as those in [GRT'18], any learning algorithm requires either a memory of size at least $Ω\left(\frac{(\log |X|) \cdot (\log |A|)}{\varepsilon}\right)$ or an exponential number of noisy samples. Our proof is based on adapting the arguments in [Raz'17,GRT'18] to the noisy case.
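
Illustrative sketch (not from the paper): the noisy-parity sample stream described in the abstract can be simulated as below. The function name `lpn_samples`, its parameters, and the use of Python's `random` module are assumptions made for illustration only; the abstract specifies just the distribution of the samples.

```python
import random

def lpn_samples(x, eps, num_samples, rng=None):
    """Yield (a, b) pairs for a hidden bit vector x.

    a is uniform in {0,1}^n; b equals <a, x> mod 2 with probability
    1/2 + eps and is flipped with probability 1/2 - eps.
    (Hypothetical helper, named here for illustration.)
    """
    rng = rng or random.Random()
    n = len(x)
    for _ in range(num_samples):
        a = [rng.randrange(2) for _ in range(n)]
        parity = sum(ai & xi for ai, xi in zip(a, x)) % 2
        # Keep the correct label with probability 1/2 + eps.
        correct = rng.random() < 0.5 + eps
        b = parity if correct else 1 - parity
        yield a, b
```

A learner in this model would consume the generator one pair at a time; the lower bound in the abstract concerns how much state such a learner must keep between pairs.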

Submitted 5 July, 2021; originally announced July 2021.

Comments: 19 pages. To appear in RANDOM 2021. arXiv admin note: substantial text overlap with arXiv:1708.02639

ACM Class: F.2.3

arXiv:2105.07517 [pdf, ps, other]

A Stress-Free Sum-of-Squares Lower Bound for Coloring

Authors: Pravesh K. Kothari, Peter Manohar

Abstract: We prove that with high probability over the choice of a random graph $G$ from the Erdős-Rényi distribution $G(n,1/2)$, a natural $n^{O(\varepsilon^2 \log n)}$-time, degree $O(\varepsilon^2 \log n)$ sum-of-squares semidefinite program cannot refute the existence of a valid $k$-coloring of $G$ for $k = n^{1/2 +\varepsilon}$. Our result implies that the refutation guarantee of the basic semidefinite program (a close variant of the Lovász theta function) cannot be appreciably improved by a natural $o(\log n)$-degree sum-of-squares strengthening, and this is tight up to a $n^{o(1)}$ slack in $k$. To the best of our knowledge, this is the first lower bound for coloring $G(n,1/2)$ for even a single round strengthening of the basic SDP in any SDP hierarchy. Our proof relies on a new variant of instance-preserving non-pointwise complete reduction within SoS from coloring a graph to finding large independent sets in it. Our proof is (perhaps surprisingly) short, simple and does not require complicated spectral norm bounds on random matrices with dependent entries that have been otherwise necessary in the proofs of many similar results [BHK+16, HKP+17, KB19, GJJ+20, MRX20]. Our result formally holds for a constraint system where vertices are allowed to belong to multiple color classes; we leave the extension to the formally stronger formulation of coloring, where vertices must belong to unique color classes, as an outstanding open problem.

Submitted 16 May, 2021; originally announced May 2021.

arXiv:2102.10461 [pdf, other]

Trumpets: Injective Flows for Inference and Inverse Problems

Authors: Konik Kothari, AmirEhsan Khorashadizadeh, Maarten de Hoop, Ivan Dokmanić

Abstract: We propose injective generative models called Trumpets that generalize invertible normalizing flows. The proposed generators progressively increase dimension from a low-dimensional latent space. We demonstrate that Trumpets can be trained orders of magnitude faster than standard flows while yielding samples of comparable or better quality. They retain many of the advantages of the standard flows such as training based on maximum likelihood and a fast, exact inverse of the generator. Since Trumpets are injective and have fast inverses, they can be effectively used for downstream Bayesian inference. To wit, we use Trumpet priors for maximum a posteriori estimation in the context of image reconstruction from compressive measurements, outperforming competitive baselines in terms of reconstruction quality and speed. We then propose an efficient method for posterior characterization and uncertainty quantification with Trumpets by taking advantage of the low-dimensional latent space.

Submitted 20 February, 2021; originally announced February 2021.

Comments: 16 pages

Journal ref: Uncertainty in Artificial Intelligence (UAI 2021)

arXiv:2012.02119 [pdf, other]

Robustly Learning Mixtures of $k$ Arbitrary Gaussians

Authors: Ainesh Bakshi, Ilias Diakonikolas, He Jia, Daniel M. Kane, Pravesh K. Kothari, Santosh S. Vempala

Abstract: We give a polynomial-time algorithm for the problem of robustly estimating a mixture of $k$ arbitrary Gaussians in $\mathbb{R}^d$, for any fixed $k$, in the presence of a constant fraction of arbitrary corruptions. This resolves the main open problem in several previous works on algorithmic robust statistics, which addressed the special cases of robustly estimating (a) a single Gaussian, (b) a mixture of TV-distance separated Gaussians, and (c) a uniform mixture of two Gaussians. Our main tools are an efficient \emph{partial clustering} algorithm that relies on the sum-of-squares method, and a novel \emph{tensor decomposition} algorithm that allows errors in both Frobenius norm and low-rank terms.

Submitted 7 June, 2021; v1 submitted 3 December, 2020; originally announced December 2020.

Comments: This version extends the previous one to yield 1) robust proper learning algorithm with poly(eps) error and 2) an information theoretic argument proving that the same algorithms in fact also yield parameter recovery guarantees. The updates are included in Sections 7,8, and 9 and the main result from the previous version (Thm 1.4) is presented and proved in Section 6

arXiv:2011.06585 [pdf, ps, other]

Sparse PCA: Algorithms, Adversarial Perturbations and Certificates

Authors: Tommaso d'Orsi, Pravesh K. Kothari, Gleb Novikov, David Steurer

Abstract: We study efficient algorithms for Sparse PCA in standard statistical models (spiked covariance in its Wishart form). Our goal is to achieve optimal recovery guarantees while being resilient to small perturbations. Despite a long history of prior works, including explicit studies of perturbation resilience, the best known algorithmic guarantees for Sparse PCA are fragile and break down under small adversarial perturbations. We observe a basic connection between perturbation resilience and \emph{certifying algorithms} that are based on certificates of upper bounds on sparse eigenvalues of random matrices. In contrast to other techniques, such certifying algorithms, including the brute-force maximum likelihood estimator, are automatically robust against small adversarial perturbation. We use this connection to obtain the first polynomial-time algorithms for this problem that are resilient against additive adversarial perturbations by obtaining new efficient certificates for upper bounds on sparse eigenvalues of random matrices. Our algorithms are based either on basic semidefinite programming or on its low-degree sum-of-squares strengthening depending on the parameter regimes. Their guarantees either match or approach the best known guarantees of \emph{fragile} algorithms in terms of sparsity of the unknown vector, number of samples and the ambient dimension.
To complement our algorithmic results, we prove rigorous lower bounds matching the gap between fragile and robust polynomial-time algorithms in a natural computational model based on low-degree polynomials (closely related to the pseudo-calibration technique for sum-of-squares lower bounds) that is known to capture the best known guarantees for related statistical estimation problems. The combination of these results provides formal evidence of an inherent price to pay to achieve robustness.

Submitted 12 November, 2020; originally announced November 2020.

arXiv:2009.08032 [pdf, ps, other]

Strongly refuting all semi-random Boolean CSPs

Authors: Jackson Abascal, Venkatesan Guruswami, Pravesh K. Kothari

Abstract: We give an efficient algorithm to strongly refute \emph{semi-random} instances of all Boolean constraint satisfaction problems. The number of constraints required by our algorithm matches (up to polylogarithmic factors) the best-known bounds for efficient refutation of fully random instances. Our main technical contribution is an algorithm to strongly refute semi-random instances of the Boolean $k$-XOR problem on $n$ variables that have $\widetilde{O}(n^{k/2})$ constraints. (In a semi-random $k$-XOR instance, the equations can be arbitrary and only the right-hand sides are random.) One of our key insights is to identify a simple combinatorial property of random XOR instances that makes spectral refutation work. Our approach involves taking an instance that does not satisfy this property (i.e., is \emph{not} pseudorandom) and reducing it to a partitioned collection of $2$-XOR instances. We analyze these subinstances using a carefully chosen quadratic form as a proxy, which in turn is bounded via a combination of spectral methods and semidefinite programming. The analysis of our spectral bounds relies only on an off-the-shelf matrix Bernstein inequality. Even for the purely random case, this leads to a shorter proof compared to the ones in the literature that rely on problem-specific trace-moment computations.

Submitted 16 September, 2020; originally announced September 2020.

Comments: 31 Pages

ACM Class: F.2.2

arXiv:2006.09969 [pdf, ps, other]

Playing Unique Games on Certified Small-Set Expanders

Authors: Mitali Bafna, Boaz Barak, Pravesh Kothari, Tselil Schramm, David Steurer

Abstract: We give an algorithm for solving unique games (UG) instances whenever low-degree sum-of-squares proofs certify good bounds on the small-set-expansion of the underlying constraint graph via a hypercontractive inequality. Our algorithm is in fact more versatile, and succeeds even when the constraint graph is not a small-set expander as long as the structure of non-expanding small sets is (informally speaking) "characterized" by a low-degree sum-of-squares proof. Our results are obtained by rounding \emph{low-entropy} solutions -- measured via a new global potential function -- to sum-of-squares (SoS) semidefinite programs. This technique adds to the (currently short) list of general tools for analyzing SoS relaxations for \emph{worst-case} optimization problems. As corollaries, we obtain the first polynomial-time algorithms for solving any UG instance where the constraint graph is either the \emph{noisy hypercube}, the \emph{short code} or the \emph{Johnson} graph. The prior best algorithm for such instances was the eigenvalue enumeration algorithm of Arora, Barak, and Steurer (2010) which requires quasi-polynomial time for the noisy hypercube and nearly-exponential time for the short code and Johnson graphs. All of our results achieve an approximation of $1-ε$ vs $δ$ for UG instances, where $ε>0$ and $δ> 0$ depend on the expansion parameters of the graph but are independent of the alphabet size.

Submitted 26 June, 2021; v1 submitted 17 June, 2020; originally announced June 2020.

Comments: To appear in STOC 2021

arXiv:2006.08464 [pdf, other]

Globally Injective ReLU Networks

Authors: Michael Puthawala, Konik Kothari, Matti Lassas, Ivan Dokmanić, Maarten de Hoop

Abstract: Injectivity plays an important role in generative models where it enables inference; in inverse problems and compressed sensing with generative priors it is a precursor to well posedness. We establish sharp characterizations of injectivity of fully-connected and convolutional ReLU layers and networks. First, through a layerwise analysis, we show that an expansivity factor of two is necessary and sufficient for injectivity by constructing appropriate weight matrices. We show that global injectivity with iid Gaussian matrices, a commonly used tractable model, requires larger expansivity between 3.4 and 10.5. We also characterize the stability of inverting an injective network via worst-case Lipschitz constants of the inverse. We then use arguments from differential topology to study injectivity of deep networks and prove that any Lipschitz map can be approximated by an injective ReLU network. Finally, using an argument based on random projections, we show that an end-to-end -- rather than layerwise -- doubling of the dimension suffices for injectivity. Our results establish a theoretical basis for the study of nonlinear inverse and inference problems using neural networks.

Submitted 8 October, 2021; v1 submitted 15 June, 2020; originally announced June 2020.

Comments: 48 pages, 18 figures, submitted to JMLR

arXiv:2006.05854 [pdf, other]

Learning the geometry of wave-based imaging

Authors: Konik Kothari, Maarten de Hoop, Ivan Dokmanić

Abstract: We propose a general physics-based deep learning architecture for wave-based imaging problems. A key difficulty in imaging problems with a varying background wave speed is that the medium "bends" the waves differently depending on their position and direction. This space-bending geometry makes the equivariance to translations of convolutional networks an undesired inductive bias. We build an interpretable neural architecture inspired by Fourier integral operators (FIOs) which approximate the wave physics. FIOs model a wide range of imaging modalities, from seismology and radar to Doppler and ultrasound. We focus on learning the geometry of wave propagation captured by FIOs, which is implicit in the data, via a loss based on optimal transport. The proposed FIONet performs significantly better than the usual baselines on a number of imaging inverse problems, especially in out-of-distribution tests.

Submitted 10 November, 2020; v1 submitted 10 June, 2020; originally announced June 2020.

Comments: Accepted as spotlight presentation to NeurIPS '20

arXiv:2005.02970 [pdf, other]

Outlier-Robust Clustering of Non-Spherical Mixtures

Authors: Ainesh Bakshi, Pravesh Kothari

Abstract: We give the first outlier-robust efficient algorithm for clustering a mixture of $k$ statistically separated $d$-dimensional Gaussians ($k$-GMMs). Concretely, our algorithm takes as input an $ε$-corrupted sample from a $k$-GMM and whp in $d^{\text{poly}(k/η)}$ time, outputs an approximate clustering that misclassifies at most $k^{O(k)}(ε+η)$ fraction of the points whenever every pair of mixture components are separated by $1-\exp(-\text{poly}(k/η)^k)$ in total variation (TV) distance. Such a result was not previously known even for $k=2$. TV separation is the statistically weakest possible notion of separation and captures important special cases such as mixed linear regression and subspace clustering. Our main conceptual contribution is to distill simple analytic properties - (certifiable) hypercontractivity and bounded variance of degree 2 polynomials and anti-concentration of linear projections - that are necessary and sufficient for mixture models to be (efficiently) clusterable. As a consequence, our results extend to clustering mixtures of arbitrary affine transforms of the uniform distribution on the $d$-dimensional unit sphere. Even the information-theoretic clusterability of separated distributions satisfying these two analytic assumptions was not known prior to our work and is likely to be of independent interest. Our algorithms build on the recent sequence of works relying on certifiable anti-concentration first introduced in the works of Karmalkar, Klivans, and Kothari, and of Raghavendra and Yau, in 2019.
Our techniques expand the sum-of-squares toolkit to show robust certifiability of TV-separated Gaussian clusters in data. This involves giving a low-degree sum-of-squares proof of statements that relate parameter (i.e. mean and covariances) distance to total variation distance by relying only on hypercontractivity and anti-concentration.

Submitted 14 December, 2020; v1 submitted 6 May, 2020; originally announced May 2020.

Comments: This version fixes a few typos and includes detailed proofs of the certifiable bounded variance property in Section 8 for natural distributions classes (fixing an issue with a generic lemma that proved such a property for a class of distributions in the previous version)

arXiv:2002.07235 [pdf, ps, other]

Time-Space Tradeoffs for Distinguishing Distributions and Applications to Security of Goldreich's PRG

Authors: Sumegha Garg, Pravesh K. Kothari, Ran Raz

Abstract: In this work, we establish lower-bounds against memory bounded algorithms for distinguishing between natural pairs of related distributions from samples that arrive in a streaming setting. In our first result, we show that any algorithm that distinguishes between uniform distribution on $\{0,1\}^n$ and uniform distribution on an $n/2$-dimensional linear subspace of $\{0,1\}^n$ with non-negligible advantage needs $2^{Ω(n)}$ samples or $Ω(n^2)$ memory. Our second result applies to distinguishing outputs of Goldreich's local pseudorandom generator from the uniform distribution on the output domain. Specifically, Goldreich's pseudorandom generator $G$ fixes a predicate $P:\{0,1\}^k \rightarrow \{0,1\}$ and a collection of subsets $S_1, S_2, \ldots, S_m \subseteq [n]$ of size $k$. For any seed $x \in \{0,1\}^n$, it outputs $P(x_{S_1}), P(x_{S_2}), \ldots, P(x_{S_m})$ where $x_{S_i}$ is the projection of $x$ to the coordinates in $S_i$. We prove that whenever $P$ is $t$-resilient (all non-zero Fourier coefficients of $(-1)^P$ are of degree $t$ or higher), then no algorithm, with $<n^ε$ memory, can distinguish the output of $G$ from the uniform distribution on $\{0,1\}^m$ with a large inverse polynomial advantage, for stretch $m \le \left(\frac{n}{t}\right)^{\frac{(1-ε)}{36}\cdot t}$ (barring some restrictions on $k$). The lower bound holds in the streaming model where at each time step $i$, $S_i\subseteq [n]$ is a randomly chosen (ordered) subset of size $k$ and the distinguisher sees either $P(x_{S_i})$ or a uniformly random bit along with $S_i$.
Our proof builds on the recently developed machinery for proving time-space trade-offs (Raz 2016 and follow-ups) for search/learning problems.
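
Illustrative sketch (not from the paper): the definition of Goldreich's generator given in the abstract, $G(x) = (P(x_{S_1}), \ldots, P(x_{S_m}))$, can be written out directly. The names `goldreich_prg`, `random_subsets`, and `xor_predicate` are hypothetical, indices are 0-based rather than ranging over $[n]$, and the XOR predicate is chosen only to keep the example short, not as a recommended choice of $P$.

```python
import random

def goldreich_prg(x, subsets, predicate):
    """Return [P(x_{S_1}), ..., P(x_{S_m})] for seed x and ordered subsets S_i."""
    return [predicate(tuple(x[j] for j in S)) for S in subsets]

def random_subsets(n, k, m, rng):
    """Draw m ordered subsets of {0, ..., n-1}, each of size k (distinct indices)."""
    return [rng.sample(range(n), k) for _ in range(m)]

def xor_predicate(bits):
    """Placeholder predicate P: parity of the k input bits."""
    return sum(bits) % 2

# Usage: a seed of n = 5 bits stretched to m = 8 output bits with k = 3.
rng = random.Random(0)
seed = [1, 0, 1, 1, 0]
output = goldreich_prg(seed, random_subsets(5, 3, 8, rng), xor_predicate)
```

In the streaming model of the abstract, a distinguisher would see each pair $(S_i, P(x_{S_i}))$ one at a time instead of the whole output list.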

Submitted 17 February, 2020; originally announced February 2020.

Comments: 35 pages

arXiv:2002.05139 [pdf, ps, other]

List-Decodable Subspace Recovery: Dimension Independent Error in Polynomial Time

Authors: Ainesh Bakshi, Pravesh K. Kothari

Abstract: In list-decodable subspace recovery, the input is a collection of $n$ points, $αn$ of which (for some $α\ll 1/2$) are drawn i.i.d. from a distribution $\mathcal{D}$ with an isotropic rank $r$ covariance $Π_*$ (the \emph{inliers}), and the rest are arbitrary, potentially adversarial outliers. The goal is to recover a $O(1/α)$ size list of candidate covariances that contains a $\hatΠ$ close to $Π_*$. Two recent independent works (Raghavendra-Yau, Bakshi-Kothari 2020) gave the first efficient algorithm for this problem. These results, however, obtain an error that grows with the dimension (linearly in [RY] and logarithmically in [BK], at the cost of quasi-polynomial running time) and rely on \emph{certifiable anti-concentration} - a relatively strict condition satisfied essentially only by the Gaussian distribution. In this work, we improve on these results on all three fronts: \emph{dimension-independent} error via a faster fixed-polynomial running time under less restrictive distributional assumptions. Specifically, we give a $poly(1/α) d^{O(1)}$ time algorithm that outputs a list containing a $\hatΠ$ satisfying $\|\hatΠ -Π_*\|_F \leq O(1/α)$. Our result only needs $\mathcal{D}$ to have \emph{certifiably hypercontractive} degree 2 polynomials. As a result, in addition to Gaussians, our algorithm applies to the uniform distribution on the hypercube and $q$-ary cubes and arbitrary product distributions with subgaussian marginals.
Prior work (Raghavendra and Yau, 2020) had identified such distributions as potential hard examples as such distributions do not exhibit strong enough anti-concentration. When $\mathcal{D}$ satisfies certifiable anti-concentration, we obtain a stronger error guarantee of $\|\hatΠ-Π_*\|_F \leq η$ for any arbitrary $η> 0$ in $d^{O(poly(1/α) + \log (1/η))}$ time.

Submitted 7 January, 2021; v1 submitted 12 February, 2020; originally announced February 2020.

Comments: To appear in SODA 2021. This version fixes an issue in a technical claim bounding the variance of degree 2 polynomials and improves exposition

ACM Class: F.2.2

arXiv:1905.05679 [pdf, ps, other]

List-Decodable Linear Regression

Authors: Sushrut Karmalkar, Adam R. Klivans, Pravesh K. Kothari

Abstract: We give the first polynomial-time algorithm for robust regression in the list-decodable setting where an adversary can corrupt a greater than $1/2$ fraction of examples. For any $α< 1$, our algorithm takes as input a sample $\{(x_i,y_i)\}_{i \leq n}$ of $n$ linear equations where $αn$ of the equations satisfy $y_i = \langle x_i,\ell^*\rangle +ζ$ for some small noise $ζ$ and $(1-α)n$ of the equations are {\em arbitrarily} chosen. It outputs a list $L$ of size $O(1/α)$ - a fixed constant - that contains an $\ell$ that is close to $\ell^*$. Our algorithm succeeds whenever the inliers are chosen from a \emph{certifiably} anti-concentrated distribution $D$. In particular, this gives a $(d/α)^{O(1/α^8)}$ time algorithm to find a $O(1/α)$ size list when the inlier distribution is standard Gaussian. For discrete product distributions that are anti-concentrated only in \emph{regular} directions, we give an algorithm that achieves a similar guarantee under the promise that $\ell^*$ has all coordinates of the same magnitude. To complement our result, we prove that the anti-concentration assumption on the inliers is information-theoretically necessary. Our algorithm is based on a new framework for list-decodable learning that strengthens the `identifiability to algorithms' paradigm based on the sum-of-squares method. In an independent and concurrent work, Raghavendra and Yau also used the Sum-of-Squares method to give a similar result for list-decodable regression.

Submitted 30 May, 2019; v1 submitted 14 May, 2019; originally announced May 2019.

Comments: 28 Pages

arXiv:1905.05231 [pdf, ps, other]

doi 10.1109/FOCS.2019.00023

Approximation Schemes for a Unit-Demand Buyer with Independent Items via Symmetries

Authors: Pravesh Kothari, Divyarthi Mohan, Ariel Schvartzman, Sahil Singla, S. Matthew Weinberg

Abstract: We consider a revenue-maximizing seller with $n$ items facing a single buyer. We introduce the notion of symmetric menu complexity of a mechanism, which counts the number of distinct options the buyer may purchase, up to permutations of the items. Our main result is that a mechanism of quasi-polynomial symmetric menu complexity suffices to guarantee a $(1-\varepsilon)$-approximation when the buyer is unit-demand over independent items, even when the value distribution is unbounded, and that this mechanism can be found in quasi-polynomial time. Our key technical result is a polynomial time, (symmetric) menu-complexity-preserving black-box reduction from achieving a $(1-\varepsilon)$-approximation for unbounded valuations that are subadditive over independent items to achieving a $(1-O(\varepsilon))$-approximation when the values are bounded (and still subadditive over independent items). We further apply this reduction to deduce approximation schemes for a suite of valuation classes beyond our main result. Finally, we show that selling separately (which has exponential menu complexity) can be approximated up to a $(1-\varepsilon)$ factor with a menu of efficient-linear $(f(\varepsilon) \cdot n)$ symmetric menu complexity.

Submitted 19 November, 2019; v1 submitted 13 May, 2019; originally announced May 2019.

Comments: FOCS 2019

arXiv:1902.04782 [pdf, ps, other]

On the Expressive Power of Kernel Methods and the Efficiency of Kernel Learning by Association Schemes

Authors: Pravesh K. Kothari, Roi Livni

Abstract: We study the expressive power of kernel methods and the algorithmic feasibility of multiple kernel learning for a special rich class of kernels. Specifically, we define \emph{Euclidean kernels}, a diverse class that includes most, if not all, families of kernels studied in the literature such as polynomial kernels and radial basis functions. We then describe the geometric and spectral structure of this family of kernels over the hypercube (and to some extent for any compact domain). Our structural results allow us to prove meaningful limitations on the expressive power of the class as well as derive several efficient algorithms for learning kernels over different domains.

Submitted 13 February, 2019; originally announced February 2019.

arXiv:1806.09426 [pdf, ps, other]

Sum-of-Squares meets Nash: Optimal Lower Bounds for Finding any Equilibrium

Authors: Pravesh K. Kothari, Ruta Mehta

Abstract: Several works have shown unconditional hardness (via integrality gaps) of computing equilibria using strong hierarchies of convex relaxations. Such results however only apply to the problem of computing equilibria that optimize a certain objective function and not to the (arguably more fundamental) task of finding \emph{any} equilibrium. We present an algorithmic model based on the sum-of-squares (SoS) hierarchy that allows escaping this inherent limitation of integrality gaps. In this model, algorithms access the input game only through a relaxed solution to the natural SoS relaxation for computing equilibria. They can then adaptively construct a list of candidate solutions and invoke a verification oracle to check if any candidate on the list is a solution. This model captures most well-studied approximation algorithms such as those for Max-Cut, Sparsest Cut, and Unique-Games. The state-of-the-art algorithms for computing exact and approximate equilibria in two-player, n-strategy games are captured in this model and require that at least one of i) size (~ running time) of the SoS relaxation or ii) the size of the list of candidates, be at least $2^{Ω(n)}$ and $n^{Ω(\log{(n)})}$ respectively. Our main result shows a lower bound that matches these upper bounds up to constant factors in the exponent. This can be interpreted as an unconditional confirmation, in our restricted algorithmic framework, of Rubinstein's recent conditional hardness \cite{Rub} for computing approximate equilibria.
Our proof strategy involves constructing a family of games that all share a common sum-of-squares solution but every (approximate) equilibrium of one game is far from every (approximate) equilibrium of any other game in the family.

Submitted 25 June, 2018; originally announced June 2018.

ACM Class: F.2.2

Journal ref: Proceedings of STOC 2018

arXiv:1805.11718 [pdf, other]

Random mesh projectors for inverse problems

Authors: Sidharth Gupta, Konik Kothari, Maarten V. de Hoop, Ivan Dokmanić

Abstract: We propose a new learning-based approach to solve ill-posed inverse problems in imaging. We address the case where ground truth training samples are rare and the problem is severely ill-posed, both because of the underlying physics and because we can only get few measurements. This setting is common in geophysical imaging and remote sensing. We show that in this case the common approach to directly learn the mapping from the measured data to the reconstruction becomes unstable. Instead, we propose to first learn an ensemble of simpler mappings from the data to projections of the unknown image into random piecewise-constant subspaces. We then combine the projections to form a final reconstruction by solving a deconvolution-like problem. We show experimentally that the proposed method is more robust to measurement noise and corruptions not seen during training than a directly learned inverse.

Submitted 5 December, 2018; v1 submitted 29 May, 2018; originally announced May 2018.

Comments: S. Gupta and K. Kothari contributed equally

arXiv:1804.08662 [pdf, ps, other]

Small-Set Expansion in Shortcode Graph and the 2-to-2 Conjecture

Authors: Boaz Barak, Pravesh K. Kothari, David Steurer

Abstract: Dinur, Khot, Kindler, Minzer and Safra (2016) recently showed that the (imperfect completeness variant of) Khot's 2 to 2 games conjecture follows from a combinatorial hypothesis about the soundness of a certain "Grassmannian agreement tester". In this work, we show that the hypothesis of Dinur et al. follows from a conjecture we call the "Inverse Shortcode Hypothesis" characterizing the non-expanding sets of the degree-two shortcode graph. We also show the latter conjecture is equivalent to a characterization of the non-expanding sets in the Grassmann graph, as hypothesized by a follow-up paper of Dinur et al. (2017). Following our work, Khot, Minzer and Safra (2018) proved the "Inverse Shortcode Hypothesis". Combining their proof with our result and the reduction of Dinur et al. (2016) completes the proof of the 2 to 2 conjecture with imperfect completeness. Moreover, we believe that the shortcode graph provides a useful view of both the hypothesis and the reduction, and might be useful in extending it further.

Submitted 23 April, 2018; originally announced April 2018.

Comments: 13 pages

arXiv:1803.03241 [pdf, ps, other]

Efficient Algorithms for Outlier-Robust Regression

Authors: Adam Klivans, Pravesh K. Kothari, Raghu Meka

Abstract: We give the first polynomial-time algorithm for performing linear or polynomial regression resilient to adversarial corruptions in both examples and labels. Given a sufficiently large (polynomial-size) training set drawn i.i.d. from distribution D and subsequently corrupted on some fraction of points, our algorithm outputs a linear function whose squared error is close to the squared error of the best-fitting linear function with respect to D, assuming that the marginal distribution of D over the input space is \emph{certifiably hypercontractive}. This natural property is satisfied by many well-studied distributions such as Gaussian distributions, strongly log-concave distributions, and the uniform distribution on the hypercube, among others. We also give a simple statistical lower bound showing that some distributional assumption is necessary to succeed in this setting. These results are the first of their kind and were not known to be even information-theoretically possible prior to our work. Our approach is based on the sum-of-squares (SoS) method and is inspired by the recent applications of the method for parameter recovery problems in unsupervised learning. Our algorithm can be seen as a natural convex relaxation of the following conceptually simple non-convex optimization problem: find a linear function and a large subset of the input corrupted sample such that the least squares loss of the function over the subset is minimized over all possible large subsets.

Submitted 4 June, 2020; v1 submitted 8 March, 2018; originally announced March 2018.

Comments: 27 pages. Appeared in COLT 2018. This update removes Lemma 6.2 that erroneously claimed an information-theoretic lower bound on error rate as a function of fraction of outliers

arXiv:1803.01768 [pdf, other]

An Analysis of the t-SNE Algorithm for Data Visualization

Authors: Sanjeev Arora, Wei Hu, Pravesh K. Kothari

Abstract: A first line of attack in exploratory data analysis is data visualization, i.e., generating a 2-dimensional representation of data that makes clusters of similar points visually identifiable. Standard Johnson-Lindenstrauss dimensionality reduction does not produce data visualizations. The t-SNE heuristic of van der Maaten and Hinton, which is based on non-convex optimization, has become the de facto standard for visualization in a wide range of applications. This work gives a formal framework for the problem of data visualization - finding a 2-dimensional embedding of clusterable data that correctly separates individual clusters to make them visually identifiable. We then give a rigorous analysis of the performance of t-SNE under a natural, deterministic condition on the "ground-truth" clusters (similar to conditions assumed in earlier analyses of clustering) in the underlying data. These are the first provable guarantees on t-SNE for constructing good data visualizations. We show that our deterministic condition is satisfied by considerably general probabilistic generative models for clusterable data such as mixtures of well-separated log-concave distributions. Finally, we give theoretical evidence that t-SNE provably succeeds in partially recovering cluster structure even when the above deterministic condition is not met.

Submitted 6 June, 2018; v1 submitted 5 March, 2018; originally announced March 2018.

Comments: In Conference on Learning Theory (COLT) 2018

Showing 1–50 of 64 results for author: Kothari, K