
Showing 1–50 of 64 results for author: Kothari, K

Searching in archive cs.
  1. arXiv:2405.15084  [pdf, other]

    cs.DS cs.LG stat.ML

    Efficient Certificates of Anti-Concentration Beyond Gaussians

    Authors: Ainesh Bakshi, Pravesh Kothari, Goutham Rajendran, Madhur Tulsiani, Aravindan Vijayaraghavan

    Abstract: A set of high dimensional points $X=\{x_1, x_2,\ldots, x_n\} \subset R^d$ in isotropic position is said to be $δ$-anti concentrated if for every direction $v$, the fraction of points in $X$ satisfying $|\langle x_i,v \rangle |\leq δ$ is at most $O(δ)$. Motivated by applications to list-decodable learning and clustering, recent works have considered the problem of constructing efficient certificate…

    Submitted 23 May, 2024; originally announced May 2024.

  2. arXiv:2405.10238  [pdf, other]

    cs.DS cs.CC

    Rounding Large Independent Sets on Expanders

    Authors: Mitali Bafna, Jun-Ting Hsieh, Pravesh K. Kothari

    Abstract: We develop a new approach for approximating large independent sets when the input graph is a one-sided spectral expander - that is, the uniform random walk matrix of the graph has the second eigenvalue bounded away from 1. Consequently, we obtain a polynomial time algorithm to find linear-sized independent sets in one-sided expanders that are almost $3$-colorable or are promised to contain an inde…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: 57 pages, 3 figures

  3. arXiv:2404.14159  [pdf, ps, other]

    cs.DS

    Semirandom Planted Clique and the Restricted Isometry Property

    Authors: Jarosław Błasiok, Rares-Darius Buhai, Pravesh K. Kothari, David Steurer

    Abstract: We give a simple, greedy $O(n^{ω+0.5})=O(n^{2.872})$-time algorithm to list-decode planted cliques in a semirandom model introduced in [CSV17] (following [FK01]) that succeeds whenever the size of the planted clique is $k\geq O(\sqrt{n} \log^2 n)$. In the model, the edges touching the vertices in the planted $k$-clique are drawn independently with probability $p=1/2$ while the edges not touching t…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 21 pages

  4. arXiv:2404.06513  [pdf, ps, other]

    cs.CC

    Superpolynomial Lower Bounds for Smooth 3-LCCs and Sharp Bounds for Designs

    Authors: Pravesh K. Kothari, Peter Manohar

    Abstract: We give improved lower bounds for binary $3$-query locally correctable codes (3-LCCs) $C \colon \{0,1\}^k \rightarrow \{0,1\}^n$. Specifically, we prove: (1) If $C$ is a linear design 3-LCC, then $n \geq 2^{(1 - o(1))\sqrt{k} }$. A design 3-LCC has the additional property that the correcting sets for every codeword bit form a perfect matching and every pair of codeword bits is queried an equal n…

    Submitted 9 April, 2024; originally announced April 2024.

  5. arXiv:2401.11590  [pdf, ps, other]

    cs.CC math.CO

    Small Even Covers, Locally Decodable Codes and Restricted Subgraphs of Edge-Colored Kikuchi Graphs

    Authors: Jun-Ting Hsieh, Pravesh K. Kothari, Sidhanth Mohanty, David Munhá Correia, Benny Sudakov

    Abstract: Given a $k$-uniform hypergraph $H$ on $n$ vertices, an even cover in $H$ is a collection of hyperedges that touch each vertex an even number of times. Even covers are a generalization of cycles in graphs and are equivalent to linearly dependent subsets of a system of linear equations modulo $2$. As a result, they arise naturally in the context of well-studied questions in coding theory and refutin…

    Submitted 21 January, 2024; originally announced January 2024.

    Comments: 19 pages

  6. arXiv:2311.13490  [pdf, other]

    q-bio.QM cs.LG

    Benchmarking Toxic Molecule Classification using Graph Neural Networks and Few Shot Learning

    Authors: Bhavya Mehta, Kush Kothari, Reshmika Nambiar, Seema Shrawne

    Abstract: Traditional methods like Graph Convolutional Networks (GCNs) face challenges with limited data and class imbalance, leading to suboptimal performance in graph classification tasks during toxicity prediction of molecules as a whole. To address these issues, we harness the power of Graph Isomorphic Networks, Multi Headed Attention and Free Large-scale Adversarial Augmentation separately on Graphs fo…

    Submitted 22 November, 2023; originally announced November 2023.

  7. Exploring Graph Classification Techniques Under Low Data Constraints: A Comprehensive Study

    Authors: Kush Kothari, Bhavya Mehta, Reshmika Nambiar, Seema Shrawne

    Abstract: This survey paper presents a brief overview of recent research on graph data augmentation and few-shot learning. It covers various techniques for graph data augmentation, including node and edge perturbation, graph coarsening, and graph generation, as well as the latest developments in few-shot learning, such as meta-learning and model-agnostic meta-learning. The paper explores these areas in dept…

    Submitted 21 November, 2023; originally announced November 2023.

  8. arXiv:2311.00558  [pdf, other]

    cs.CC

    An Exponential Lower Bound for Linear 3-Query Locally Correctable Codes

    Authors: Pravesh K. Kothari, Peter Manohar

    Abstract: We prove that the blocklength $n$ of a linear $3$-query locally correctable code (LCC) $\mathcal{L} \colon {\mathbb F}^k \to {\mathbb F}^n$ with distance $δ$ must be at least $n \geq 2^{Ω\left(\left(\frac{δ^2 k}{(|{\mathbb F}|-1)^2}\right)^{1/8}\right)}$. In particular, the blocklength of a linear $3$-query LCC with constant distance over any small field grows exponentially with $k$. This improves…

    Submitted 1 November, 2023; originally announced November 2023.

  9. arXiv:2310.05651  [pdf, other]

    cs.LG cs.AI

    FENCE: Fairplay Ensuring Network Chain Entity for Real-Time Multiple ID Detection at Scale In Fantasy Sports

    Authors: Akriti Upreti, Kartavya Kothari, Utkarsh Thukral, Vishal Verma

    Abstract: Dream11 takes pride in being a unique platform that enables over 190 million fantasy sports users to demonstrate their skills and connect deeper with their favorite sports. While managing such a scale, one issue we are faced with is duplicate/multiple account creation in the system. This is done by some users with the intent of abusing the platform, typically for bonus offers. The challenge is to…

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: 7 pages, 7 figures, accepted in AIML Systems 2023

    ACM Class: I.2.1

  10. arXiv:2310.00393  [pdf, ps, other]

    cs.DS cs.CC

    New SDP Roundings and Certifiable Approximation for Cubic Optimization

    Authors: Jun-Ting Hsieh, Pravesh K. Kothari, Lucas Pesenti, Luca Trevisan

    Abstract: We give new rounding schemes for SDP relaxations for the problems of maximizing cubic polynomials over the unit sphere and the $n$-dimensional hypercube. In both cases, the resulting algorithms yield a $O(\sqrt{n/k})$ multiplicative approximation in $2^{O(k)} \text{poly}(n)$ time. In particular, we obtain a $O(\sqrt{n/\log n})$ approximation in polynomial time. For the unit sphere, this improves o…

    Submitted 30 September, 2023; originally announced October 2023.

  11. arXiv:2309.16897  [pdf, other]

    cs.CC cs.DS

    Efficient Algorithms for Semirandom Planted CSPs at the Refutation Threshold

    Authors: Venkatesan Guruswami, Jun-Ting Hsieh, Pravesh K. Kothari, Peter Manohar

    Abstract: We present an efficient algorithm to solve semirandom planted instances of any Boolean constraint satisfaction problem (CSP). The semirandom model is a hybrid between worst-case and average-case input models, where the input is generated by (1) choosing an arbitrary planted assignment $x^*$, (2) choosing an arbitrary clause structure, and (3) choosing literal negations for each clause from an arbi…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: FOCS 2023

  12. arXiv:2308.15403  [pdf, ps, other]

    cs.CC cs.IT

    A Near-Cubic Lower Bound for 3-Query Locally Decodable Codes from Semirandom CSP Refutation

    Authors: Omar Alrabiah, Venkatesan Guruswami, Pravesh K. Kothari, Peter Manohar

    Abstract: A code $C \colon \{0,1\}^k \to \{0,1\}^n$ is a $q$-locally decodable code ($q$-LDC) if one can recover any chosen bit $b_i$ of the message $b \in \{0,1\}^k$ with good confidence by randomly querying the encoding $x := C(b)$ on at most $q$ coordinates. Existing constructions of $2$-LDCs achieve $n = \exp(O(k))$, and lower bounds show that this is in fact tight. However, when $q = 3$, far less is kn…

    Submitted 29 August, 2023; originally announced August 2023.

  13. arXiv:2307.05954  [pdf, other]

    math.PR cs.CC cs.DS

    Ellipsoid Fitting Up to a Constant

    Authors: Jun-Ting Hsieh, Pravesh K. Kothari, Aaron Potechin, Jeff Xu

    Abstract: In [Sau11,SPW13], Saunderson, Parrilo and Willsky asked the following elegant geometric question: what is the largest $m= m(d)$ such that there is an ellipsoid in $\mathbb{R}^d$ that passes through $v_1, v_2, \ldots, v_m$ with high probability when the $v_i$s are chosen independently from the standard Gaussian distribution $N(0,I_{d})$. The existence of such an ellipsoid is equivalent to the exist…

    Submitted 12 July, 2023; originally announced July 2023.

    Comments: ICALP 2023

  14. arXiv:2303.00252  [pdf, ps, other]

    cs.CC cs.DS

    Is Planted Coloring Easier than Planted Clique?

    Authors: Pravesh K. Kothari, Santosh S. Vempala, Alexander S. Wein, Jeff Xu

    Abstract: We study the computational complexity of two related problems: recovering a planted $q$-coloring in $G(n,1/2)$, and finding efficiently verifiable witnesses of non-$q$-colorability (a.k.a. refutations) in $G(n,1/2)$. Our main results show hardness for both these problems in a restricted-but-powerful class of algorithms based on computing low-degree polynomials in the inputs. The problem of recov…

    Submitted 1 March, 2023; originally announced March 2023.

    Comments: 23 pages

  15. arXiv:2302.12289  [pdf, other]

    cs.DS cs.LG math.ST stat.ML

    Beyond Moments: Robustly Learning Affine Transformations with Asymptotically Optimal Error

    Authors: He Jia, Pravesh K. Kothari, Santosh S. Vempala

    Abstract: We present a polynomial-time algorithm for robustly learning an unknown affine transformation of the standard hypercube from samples, an important and well-studied setting for independent component analysis (ICA). Specifically, given an $ε$-corrupted sample from a distribution $D$ obtained by applying an unknown affine transformation $x \rightarrow Ax+s$ to the uniform distribution on a $d$-dimens…

    Submitted 23 February, 2023; originally announced February 2023.

  16. arXiv:2212.08018  [pdf, ps, other]

    cs.DS cs.CR cs.IT stat.ML

    Privately Estimating a Gaussian: Efficient, Robust and Optimal

    Authors: Daniel Alabi, Pravesh K. Kothari, Pranay Tankala, Prayaag Venkat, Fred Zhang

    Abstract: In this work, we give efficient algorithms for privately estimating a Gaussian distribution in both pure and approximate differential privacy (DP) models with optimal dependence on the dimension in the sample complexity. In the pure DP setting, we give an efficient algorithm that estimates an unknown $d$-dimensional Gaussian distribution up to an arbitrary tiny total variation error using…

    Submitted 1 June, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

  17. arXiv:2212.05619  [pdf, ps, other]

    cs.DS

    Algorithms approaching the threshold for semi-random planted clique

    Authors: Rares-Darius Buhai, Pravesh K. Kothari, David Steurer

    Abstract: We design new polynomial-time algorithms for recovering planted cliques in the semi-random graph model introduced by Feige and Kilian 2001. The previous best algorithms for this model succeed if the planted clique has size at least $n^{2/3}$ in a graph with $n$ vertices (Mehta, Mckenzie, Trevisan 2019 and Charikar, Steinhardt, Valiant 2017). Our algorithms work for planted-clique sizes approaching…

    Submitted 6 June, 2023; v1 submitted 11 December, 2022; originally announced December 2022.

    Comments: 51 pages, the arxiv landing page contains a shortened abstract

    ACM Class: F.2

  18. arXiv:2211.14312  [pdf]

    q-bio.QM cs.CV cs.LG eess.IV

    Karyotype AI for Precision Oncology

    Authors: Zahra Shamsi, Drew Bryant, Jacob Wilson, Xiaoyu Qu, Avinava Dubey, Konik Kothari, Mostafa Dehghani, Mariya Chavarha, Valerii Likhosherstov, Brian Williams, Michael Frumkin, Fred Appelbaum, Krzysztof Choromanski, Ali Bashir, Min Fang

    Abstract: Chromosome analysis is essential for diagnosing genetic disorders. For hematologic malignancies, identification of somatic clonal aberrations by karyotype analysis remains the standard of care. However, karyotyping is costly and time-consuming because of the largely manual process and the expertise required in identifying and annotating aberrations. Efforts to automate karyotype analysis to date f…

    Submitted 19 October, 2023; v1 submitted 19 November, 2022; originally announced November 2022.

  19. arXiv:2211.13312  [pdf, ps, other]

    cs.LG cs.CC stat.ML

    A Moment-Matching Approach to Testable Learning and a New Characterization of Rademacher Complexity

    Authors: Aravind Gollakota, Adam R. Klivans, Pravesh K. Kothari

    Abstract: A remarkable recent paper by Rubinfeld and Vasilyan (2022) initiated the study of \emph{testable learning}, where the goal is to replace hard-to-verify distributional assumptions (such as Gaussianity) with efficiently testable ones and to require that the learner succeed whenever the unknown distribution passes the corresponding test. In this model, they gave an efficient algorithm for learning ha…

    Submitted 23 November, 2022; originally announced November 2022.

    Comments: 34 pages

  20. arXiv:2211.10525  [pdf, other]

    eess.IV cs.LG

    Differentiable Uncalibrated Imaging

    Authors: Sidharth Gupta, Konik Kothari, Valentin Debarnot, Ivan Dokmanić

    Abstract: We propose a differentiable imaging framework to address uncertainty in measurement coordinates such as sensor locations and projection angles. We formulate the problem as measurement interpolation at unknown nodes supervised through the forward operator. To solve it we apply implicit neural networks, also known as neural fields, which are naturally differentiable with respect to the input coordin…

    Submitted 20 December, 2023; v1 submitted 18 November, 2022; originally announced November 2022.

  21. arXiv:2208.00122  [pdf, ps, other]

    cs.DS cs.CC

    Polynomial-Time Power-Sum Decomposition of Polynomials

    Authors: Mitali Bafna, Jun-Ting Hsieh, Pravesh K. Kothari, Jeff Xu

    Abstract: We give efficient algorithms for finding power-sum decomposition of an input polynomial $P(x)= \sum_{i\leq m} p_i(x)^d$ with component $p_i$s. The case of linear $p_i$s is equivalent to the well-studied tensor decomposition problem while the quadratic case occurs naturally in studying identifiability of non-spherical Gaussian mixtures from low-order moments. Unlike tensor decomposition, both the…

    Submitted 29 July, 2022; originally announced August 2022.

    Comments: To appear in FOCS 2022

  22. arXiv:2207.10850  [pdf, other]

    math.CO cs.DM cs.DS

    A simple and sharper proof of the hypergraph Moore bound

    Authors: Jun-Ting Hsieh, Pravesh K. Kothari, Sidhanth Mohanty

    Abstract: The hypergraph Moore bound is an elegant statement that characterizes the extremal trade-off between the girth - the number of hyperedges in the smallest cycle or even cover (a subhypergraph with all degrees even) and size - the number of hyperedges in a hypergraph. For graphs (i.e., $2$-uniform hypergraphs), a bound tight up to the leading constant was proven in a classical work of Alon, Hoory an…

    Submitted 21 July, 2022; originally announced July 2022.

  23. arXiv:2206.10942  [pdf, ps, other]

    cs.DS cs.LG math.ST stat.ML

    List-Decodable Covariance Estimation

    Authors: Misha Ivkov, Pravesh K. Kothari

    Abstract: We give the first polynomial time algorithm for \emph{list-decodable covariance estimation}. For any $α> 0$, our algorithm takes input a sample $Y \subseteq \mathbb{R}^d$ of size $n\geq d^{\mathsf{poly}(1/α)}$ obtained by adversarially corrupting an $(1-α)n$ points in an i.i.d. sample $X$ of size $n$ from the Gaussian distribution with unknown mean $μ_*$ and covariance $Σ_*$. In…

    Submitted 22 June, 2022; originally announced June 2022.

    Comments: Abstract slightly clipped. To appear at STOC 2022

    ACM Class: F.2.1

  24. arXiv:2206.09204  [pdf, ps, other]

    cs.DS cs.CC

    Approximating Max-Cut on Bounded Degree Graphs: Tighter Analysis of the FKL Algorithm

    Authors: Jun-Ting Hsieh, Pravesh K. Kothari

    Abstract: In this note, we describe a $α_{GW} + \tildeΩ(1/d^2)$-factor approximation algorithm for Max-Cut on weighted graphs of degree $\leq d$. Here, $α_{GW}\approx 0.878$ is the worst-case approximation ratio of the Goemans-Williamson rounding for Max-Cut. This improves on previous results for unweighted graphs by Feige, Karpinski, and Langberg and Florén. Our guarantee is obtained by a tighter analysis…

    Submitted 18 June, 2022; originally announced June 2022.

  25. arXiv:2205.06739  [pdf, ps, other]

    cs.DS

    Bypassing the XOR Trick: Stronger Certificates for Hypergraph Clique Number

    Authors: Venkatesan Guruswami, Pravesh K. Kothari, Peter Manohar

    Abstract: Let $\mathcal{H}(k,n,p)$ be the distribution on $k$-uniform hypergraphs where every subset of $[n]$ of size $k$ is included as an hyperedge with probability $p$ independently. In this work, we design and analyze a simple spectral algorithm that certifies a bound on the size of the largest clique, $ω(H)$, in hypergraphs $H \sim \mathcal{H}(k,n,p)$. For example, for any constant $p$, with high proba…

    Submitted 13 May, 2022; originally announced May 2022.

  26. Conditional Injective Flows for Bayesian Imaging

    Authors: AmirEhsan Khorashadizadeh, Konik Kothari, Leonardo Salsi, Ali Aghababaei Harandi, Maarten de Hoop, Ivan Dokmanić

    Abstract: Most deep learning models for computational imaging regress a single reconstructed image. In practice, however, ill-posedness, nonlinearity, model mismatch, and noise often conspire to make such point estimates misleading or insufficient. The Bayesian approach models images and (noisy) measurements as jointly distributed random vectors and aims to approximate the posterior distribution of unknowns…

    Submitted 3 April, 2023; v1 submitted 15 April, 2022; originally announced April 2022.

    Comments: 23 pages, 23 figures

    Journal ref: IEEE Transactions on Computational Imaging, vol. 9, pp. 224-237, 2023

  27. arXiv:2112.03548  [pdf, ps, other]

    stat.ML cs.CR cs.DS cs.IT cs.LG

    Private Robust Estimation by Stabilizing Convex Relaxations

    Authors: Pravesh K. Kothari, Pasin Manurangsi, Ameya Velingker

    Abstract: We give the first polynomial time and sample $(ε, δ)$-differentially private (DP) algorithm to estimate the mean, covariance and higher moments in the presence of a constant fraction of adversarial outliers. Our algorithm succeeds for families of distributions that satisfy two well-studied properties in prior works on robust estimation: certifiable subgaussianity of directional moments and certifi…

    Submitted 7 December, 2021; originally announced December 2021.

  28. arXiv:2110.11853  [pdf, ps, other]

    cs.DS math.ST

    Polynomial-Time Sum-of-Squares Can Robustly Estimate Mean and Covariance of Gaussians Optimally

    Authors: Pravesh K. Kothari, Peter Manohar, Brian Hu Zhang

    Abstract: In this work, we revisit the problem of estimating the mean and covariance of an unknown $d$-dimensional Gaussian distribution in the presence of an $\varepsilon$-fraction of adversarial outliers. The pioneering work of [DKK+16] gave a polynomial time algorithm for this task with optimal $\tilde{O}(\varepsilon)$ error using $n = \textrm{poly}(d, 1/\varepsilon)$ samples. On the other hand, [KS17b…

    Submitted 22 October, 2021; originally announced October 2021.

  29. arXiv:2110.08677  [pdf, ps, other]

    cs.CC cs.DS

    Algorithmic Thresholds for Refuting Random Polynomial Systems

    Authors: Jun-Ting Hsieh, Pravesh K. Kothari

    Abstract: Consider a system of $m$ polynomial equations $\{p_i(x) = b_i\}_{i \leq m}$ of degree $D\geq 2$ in $n$-dimensional variable $x \in \mathbb{R}^n$ such that each coefficient of every $p_i$ and $b_i$s are chosen at random and independently from some continuous distribution. We study the basic question of determining the smallest $m$ -- the algorithmic threshold -- for which efficient algorithms can f…

    Submitted 16 October, 2021; originally announced October 2021.

  30. arXiv:2109.04415  [pdf, other]

    cs.CC cs.DS

    Algorithms and Certificates for Boolean CSP Refutation: "Smoothed is no harder than Random"

    Authors: Venkatesan Guruswami, Pravesh K. Kothari, Peter Manohar

    Abstract: We present an algorithm for strongly refuting smoothed instances of all Boolean CSPs. The smoothed model is a hybrid between worst and average-case input models, where the input is an arbitrary instance of the CSP with only the negation patterns of the literals re-randomized with some small probability. For an $n$-variable smoothed instance of a $k$-arity CSP, our algorithm runs in $n^{O(\ell)}$ t…

    Submitted 3 September, 2023; v1 submitted 9 September, 2021; originally announced September 2021.

  31. arXiv:2107.02320  [pdf, ps, other]

    cs.LG cs.CC

    Memory-Sample Lower Bounds for Learning Parity with Noise

    Authors: Sumegha Garg, Pravesh K. Kothari, Pengda Liu, Ran Raz

    Abstract: In this work, we show, for the well-studied problem of learning parity under noise, where a learner tries to learn $x=(x_1,\ldots,x_n) \in \{0,1\}^n$ from a stream of random linear equations over $\mathrm{F}_2$ that are correct with probability $\frac{1}{2}+\varepsilon$ and flipped with probability $\frac{1}{2}-\varepsilon$, that any learning algorithm requires either a memory of size…

    Submitted 5 July, 2021; originally announced July 2021.

    Comments: 19 pages. To appear in RANDOM 2021. arXiv admin note: substantial text overlap with arXiv:1708.02639

    ACM Class: F.2.3

  32. arXiv:2105.07517  [pdf, ps, other]

    cs.CC

    A Stress-Free Sum-of-Squares Lower Bound for Coloring

    Authors: Pravesh K. Kothari, Peter Manohar

    Abstract: We prove that with high probability over the choice of a random graph $G$ from the Erdős-Rényi distribution $G(n,1/2)$, a natural $n^{O(\varepsilon^2 \log n)}$-time, degree $O(\varepsilon^2 \log n)$ sum-of-squares semidefinite program cannot refute the existence of a valid $k$-coloring of $G$ for $k = n^{1/2 +\varepsilon}$. Our result implies that the refutation guarantee of the basic semidefinite…

    Submitted 16 May, 2021; originally announced May 2021.

  33. arXiv:2102.10461  [pdf, other]

    cs.LG cs.AI eess.SP

    Trumpets: Injective Flows for Inference and Inverse Problems

    Authors: Konik Kothari, AmirEhsan Khorashadizadeh, Maarten de Hoop, Ivan Dokmanić

    Abstract: We propose injective generative models called Trumpets that generalize invertible normalizing flows. The proposed generators progressively increase dimension from a low-dimensional latent space. We demonstrate that Trumpets can be trained orders of magnitudes faster than standard flows while yielding samples of comparable or better quality. They retain many of the advantages of the standard flows…

    Submitted 20 February, 2021; originally announced February 2021.

    Comments: 16 pages

    Journal ref: Uncertainty in Artificial Intelligence (UAI 2021)

  34. arXiv:2012.02119  [pdf, other]

    cs.DS cs.LG math.ST stat.ML

    Robustly Learning Mixtures of $k$ Arbitrary Gaussians

    Authors: Ainesh Bakshi, Ilias Diakonikolas, He Jia, Daniel M. Kane, Pravesh K. Kothari, Santosh S. Vempala

    Abstract: We give a polynomial-time algorithm for the problem of robustly estimating a mixture of $k$ arbitrary Gaussians in $\mathbb{R}^d$, for any fixed $k$, in the presence of a constant fraction of arbitrary corruptions. This resolves the main open problem in several previous works on algorithmic robust statistics, which addressed the special cases of robustly estimating (a) a single Gaussian, (b) a mix…

    Submitted 7 June, 2021; v1 submitted 3 December, 2020; originally announced December 2020.

    Comments: This version extends the previous one to yield 1) robust proper learning algorithm with poly(eps) error and 2) an information theoretic argument proving that the same algorithms in fact also yield parameter recovery guarantees. The updates are included in Sections 7,8, and 9 and the main result from the previous version (Thm 1.4) is presented and proved in Section 6

  35. arXiv:2011.06585  [pdf, ps, other]

    cs.LG cs.DS

    Sparse PCA: Algorithms, Adversarial Perturbations and Certificates

    Authors: Tommaso d'Orsi, Pravesh K. Kothari, Gleb Novikov, David Steurer

    Abstract: We study efficient algorithms for Sparse PCA in standard statistical models (spiked covariance in its Wishart form). Our goal is to achieve optimal recovery guarantees while being resilient to small perturbations. Despite a long history of prior works, including explicit studies of perturbation resilience, the best known algorithmic guarantees for Sparse PCA are fragile and break down under small…

    Submitted 12 November, 2020; originally announced November 2020.

  36. arXiv:2009.08032  [pdf, ps, other]

    cs.CC cs.DS

    Strongly refuting all semi-random Boolean CSPs

    Authors: Jackson Abascal, Venkatesan Guruswami, Pravesh K. Kothari

    Abstract: We give an efficient algorithm to strongly refute \emph{semi-random} instances of all Boolean constraint satisfaction problems. The number of constraints required by our algorithm matches (up to polylogarithmic factors) the best-known bounds for efficient refutation of fully random instances. Our main technical contribution is an algorithm to strongly refute semi-random instances of the Boolean…

    Submitted 16 September, 2020; originally announced September 2020.

    Comments: 31 Pages

    ACM Class: F.2.2

  37. arXiv:2006.09969  [pdf, ps, other]

    cs.CC

    Playing Unique Games on Certified Small-Set Expanders

    Authors: Mitali Bafna, Boaz Barak, Pravesh Kothari, Tselil Schramm, David Steurer

    Abstract: We give an algorithm for solving unique games (UG) instances whenever low-degree sum-of-squares proofs certify good bounds on the small-set-expansion of the underlying constraint graph via a hypercontractive inequality. Our algorithm is in fact more versatile, and succeeds even when the constraint graph is not a small-set expander as long as the structure of non-expanding small sets is (informally…

    Submitted 26 June, 2021; v1 submitted 17 June, 2020; originally announced June 2020.

    Comments: To appear in STOC 2021

  38. arXiv:2006.08464  [pdf, other]

    cs.LG stat.ML

    Globally Injective ReLU Networks

    Authors: Michael Puthawala, Konik Kothari, Matti Lassas, Ivan Dokmanić, Maarten de Hoop

    Abstract: Injectivity plays an important role in generative models where it enables inference; in inverse problems and compressed sensing with generative priors it is a precursor to well posedness. We establish sharp characterizations of injectivity of fully-connected and convolutional ReLU layers and networks. First, through a layerwise analysis, we show that an expansivity factor of two is necessary and s…

    Submitted 8 October, 2021; v1 submitted 15 June, 2020; originally announced June 2020.

    Comments: 48 pages, 18 figures, submitted to JMLR

  39. arXiv:2006.05854  [pdf, other]

    eess.IV cs.LG

    Learning the geometry of wave-based imaging

    Authors: Konik Kothari, Maarten de Hoop, Ivan Dokmanić

    Abstract: We propose a general physics-based deep learning architecture for wave-based imaging problems. A key difficulty in imaging problems with a varying background wave speed is that the medium "bends" the waves differently depending on their position and direction. This space-bending geometry makes the equivariance to translations of convolutional networks an undesired inductive bias. We build an inter…

    Submitted 10 November, 2020; v1 submitted 10 June, 2020; originally announced June 2020.

    Comments: Accepted as spotlight presentation to NeurIPS '20

  40. arXiv:2005.02970  [pdf, other]

    cs.DS stat.ML

    Outlier-Robust Clustering of Non-Spherical Mixtures

    Authors: Ainesh Bakshi, Pravesh Kothari

    Abstract: We give the first outlier-robust efficient algorithm for clustering a mixture of $k$ statistically separated d-dimensional Gaussians (k-GMMs). Concretely, our algorithm takes input an $ε$-corrupted sample from a $k$-GMM and whp in $d^{\text{poly}(k/η)}$ time, outputs an approximate clustering that misclassifies at most $k^{O(k)}(ε+η)$ fraction of the points whenever every pair of mixture component…

    Submitted 14 December, 2020; v1 submitted 6 May, 2020; originally announced May 2020.

    Comments: This version fixes a few typos and includes detailed proofs of the certifiable bounded variance property in Section 8 for natural distributions classes (fixing an issue with a generic lemma that proved such a property for a class of distributions in the previous version)

  41. arXiv:2002.07235  [pdf, ps, other]

    cs.CC cs.CR

    Time-Space Tradeoffs for Distinguishing Distributions and Applications to Security of Goldreich's PRG

    Authors: Sumegha Garg, Pravesh K. Kothari, Ran Raz

    Abstract: In this work, we establish lower-bounds against memory bounded algorithms for distinguishing between natural pairs of related distributions from samples that arrive in a streaming setting. In our first result, we show that any algorithm that distinguishes between uniform distribution on $\{0,1\}^n$ and uniform distribution on an $n/2$-dimensional linear subspace of $\{0,1\}^n$ with non-negligibl…

    Submitted 17 February, 2020; originally announced February 2020.

    Comments: 35 pages

  42. arXiv:2002.05139  [pdf, ps, other]

    cs.DS cs.LG stat.ML

    List-Decodable Subspace Recovery: Dimension Independent Error in Polynomial Time

    Authors: Ainesh Bakshi, Pravesh K. Kothari

    Abstract: In list-decodable subspace recovery, the input is a collection of $n$ points $αn$ (for some $α\ll 1/2$) of which are drawn i.i.d. from a distribution $\mathcal{D}$ with a isotropic rank $r$ covariance $Π_*$ (the \emph{inliers}) and the rest are arbitrary, potential adversarial outliers. The goal is to recover a $O(1/α)$ size list of candidate covariances that contains a $\hatΠ$ close to $Π_*$. Two…

    Submitted 7 January, 2021; v1 submitted 12 February, 2020; originally announced February 2020.

    Comments: To appear in SODA 2021. This version fixes an issue in a technical claim bounding the variance of degree 2 polynomials and improves exposition

    ACM Class: F.2.2

  43. arXiv:1905.05679  [pdf, ps, other]

    cs.DS cs.LG stat.ML

    List-Decodable Linear Regression

    Authors: Sushrut Karmalkar, Adam R. Klivans, Pravesh K. Kothari

    Abstract: We give the first polynomial-time algorithm for robust regression in the list-decodable setting where an adversary can corrupt a greater than $1/2$ fraction of examples. For any $α< 1$, our algorithm takes as input a sample $\{(x_i,y_i)\}_{i \leq n}$ of $n$ linear equations where $αn$ of the equations satisfy $y_i = \langle x_i,\ell^*\rangle +ζ$ for some small noise $ζ$ and $(1-α)n$ of the equat…

    Submitted 30 May, 2019; v1 submitted 14 May, 2019; originally announced May 2019.

    Comments: 28 Pages

  44. Approximation Schemes for a Unit-Demand Buyer with Independent Items via Symmetries

    Authors: Pravesh Kothari, Divyarthi Mohan, Ariel Schvartzman, Sahil Singla, S. Matthew Weinberg

    Abstract: We consider a revenue-maximizing seller with $n$ items facing a single buyer. We introduce the notion of symmetric menu complexity of a mechanism, which counts the number of distinct options the buyer may purchase, up to permutations of the items. Our main result is that a mechanism of quasi-polynomial symmetric menu complexity suffices to guarantee a $(1-\varepsilon)$-approximation when the buyer…

    Submitted 19 November, 2019; v1 submitted 13 May, 2019; originally announced May 2019.

    Comments: FOCS 2019

  45. arXiv:1902.04782  [pdf, ps, other]

    cs.LG stat.ML

    On the Expressive Power of Kernel Methods and the Efficiency of Kernel Learning by Association Schemes

    Authors: Pravesh K. Kothari, Roi Livni

    Abstract: We study the expressive power of kernel methods and the algorithmic feasibility of multiple kernel learning for a special rich class of kernels. Specifically, we define \emph{Euclidean kernels}, a diverse class that includes most, if not all, families of kernels studied in the literature, such as polynomial kernels and radial basis functions. We then describe the geometric and spectral structure of t…

    Submitted 13 February, 2019; originally announced February 2019.

  46. arXiv:1806.09426  [pdf, ps, other]

    cs.CC

    Sum-of-Squares meets Nash: Optimal Lower Bounds for Finding any Equilibrium

    Authors: Pravesh K. Kothari, Ruta Mehta

    Abstract: Several works have shown unconditional hardness (via integrality gaps) of computing equilibria using strong hierarchies of convex relaxations. Such results, however, only apply to the problem of computing equilibria that optimize a certain objective function and not to the (arguably more fundamental) task of finding \emph{any} equilibrium. We present an algorithmic model based on the sum-of-square…

    Submitted 25 June, 2018; originally announced June 2018.

    ACM Class: F.2.2

    Journal ref: Proceedings of STOC 2018

  47. arXiv:1805.11718  [pdf, other]

    cs.CV cs.LG eess.IV stat.ML

    Random mesh projectors for inverse problems

    Authors: Sidharth Gupta, Konik Kothari, Maarten V. de Hoop, Ivan Dokmanić

    Abstract: We propose a new learning-based approach to solve ill-posed inverse problems in imaging. We address the case where ground truth training samples are rare and the problem is severely ill-posed, both because of the underlying physics and because we can only get a few measurements. This setting is common in geophysical imaging and remote sensing. We show that in this case the common approach to direct…

    Submitted 5 December, 2018; v1 submitted 29 May, 2018; originally announced May 2018.

    Comments: S. Gupta and K. Kothari contributed equally

  48. arXiv:1804.08662  [pdf, ps, other]

    cs.CC

    Small-Set Expansion in Shortcode Graph and the 2-to-2 Conjecture

    Authors: Boaz Barak, Pravesh K. Kothari, David Steurer

    Abstract: Dinur, Khot, Kindler, Minzer and Safra (2016) recently showed that the (imperfect completeness variant of) Khot's 2-to-2 games conjecture follows from a combinatorial hypothesis about the soundness of a certain "Grassmannian agreement tester". In this work, we show that the hypothesis of Dinur et al. follows from a conjecture we call the "Inverse Shortcode Hypothesis" characterizing the non-expand…

    Submitted 23 April, 2018; originally announced April 2018.

    Comments: 13 pages

  49. arXiv:1803.03241  [pdf, ps, other]

    cs.LG cs.AI cs.DS stat.ML

    Efficient Algorithms for Outlier-Robust Regression

    Authors: Adam Klivans, Pravesh K. Kothari, Raghu Meka

    Abstract: We give the first polynomial-time algorithm for performing linear or polynomial regression resilient to adversarial corruptions in both examples and labels. Given a sufficiently large (polynomial-size) training set drawn i.i.d. from a distribution $D$ and subsequently corrupted on some fraction of points, our algorithm outputs a linear function whose squared error is close to the squared error of th…

    Submitted 4 June, 2020; v1 submitted 8 March, 2018; originally announced March 2018.

    Comments: 27 pages. Appeared in COLT 2018. This update removes Lemma 6.2 that erroneously claimed an information-theoretic lower bound on error rate as a function of fraction of outliers

  50. arXiv:1803.01768  [pdf, other]

    cs.LG

    An Analysis of the t-SNE Algorithm for Data Visualization

    Authors: Sanjeev Arora, Wei Hu, Pravesh K. Kothari

    Abstract: A first line of attack in exploratory data analysis is data visualization, i.e., generating a 2-dimensional representation of data that makes clusters of similar points visually identifiable. Standard Johnson-Lindenstrauss dimensionality reduction does not produce data visualizations. The t-SNE heuristic of van der Maaten and Hinton, which is based on non-convex optimization, has become the de fac…

    Submitted 6 June, 2018; v1 submitted 5 March, 2018; originally announced March 2018.

    Comments: In Conference on Learning Theory (COLT) 2018