
Showing 1–26 of 26 results for author: Waingarten, E

  1. arXiv:2403.05041

    cs.DS

    Data-Dependent LSH for the Earth Mover's Distance

    Authors: Rajesh Jayaram, Erik Waingarten, Tian Zhang

    Abstract: We give new data-dependent locality sensitive hashing schemes (LSH) for the Earth Mover's Distance ($\mathsf{EMD}$), and as a result, improve the best approximation for nearest neighbor search under $\mathsf{EMD}$ by a quadratic factor. Here, the metric $\mathsf{EMD}_s(\mathbb{R}^d,\ell_p)$ consists of sets of $s$ vectors in $\mathbb{R}^d$, and for any two sets $x,y$ of $s$ vectors the distance…

    Submitted 7 March, 2024; originally announced March 2024.

  2. arXiv:2401.02562

    cs.DS

    A Quasi-Monte Carlo Data Structure for Smooth Kernel Evaluations

    Authors: Moses Charikar, Michael Kapralov, Erik Waingarten

    Abstract: In the kernel density estimation (KDE) problem one is given a kernel $K(x, y)$ and a dataset $P$ of points in a Euclidean space, and must prepare a data structure that can quickly answer density queries: given a point $q$, output a $(1+ε)$-approximation to $μ:=\frac1{|P|}\sum_{p\in P} K(p, q)$. The classical approach to KDE is the celebrated fast multipole method of [Greengard and Rokhlin]. The fa…

    Submitted 4 January, 2024; originally announced January 2024.

  3. arXiv:2310.16752

    cs.LG cs.DS

    Simple, Scalable and Effective Clustering via One-Dimensional Projections

    Authors: Moses Charikar, Monika Henzinger, Lunjia Hu, Maxmilian Vötsch, Erik Waingarten

    Abstract: Clustering is a fundamental problem in unsupervised machine learning with many applications in data analysis. Popular clustering algorithms such as Lloyd's algorithm and $k$-means++ can take $Ω(ndk)$ time when clustering $n$ points in a $d$-dimensional space (represented by an $n\times d$ matrix $X$) into $k$ clusters. In applications with moderate to large $k$, the multiplicative $k$ factor can b…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: 41 pages, 6 figures, to appear in NeurIPS 2023

  4. arXiv:2307.10042

    cs.DS

    Fast Algorithms for a New Relaxation of Optimal Transport

    Authors: Moses Charikar, Beidi Chen, Christopher Re, Erik Waingarten

    Abstract: We introduce a new class of objectives for optimal transport computations of datasets in high-dimensional Euclidean spaces. The new objectives are parametrized by $ρ\geq 1$, and provide a metric space $\mathcal{R}_ρ(\cdot, \cdot)$ for discrete probability distributions in $\mathbb{R}^d$. As $ρ$ approaches $1$, the metric approaches the Earth Mover's distance, but for $ρ$ larger than (but close to)…

    Submitted 14 July, 2023; originally announced July 2023.

    Comments: in COLT 2023

  5. arXiv:2307.03043

    cs.DS cs.CG cs.GR cs.LG

    A Near-Linear Time Algorithm for the Chamfer Distance

    Authors: Ainesh Bakshi, Piotr Indyk, Rajesh Jayaram, Sandeep Silwal, Erik Waingarten

    Abstract: For any two point sets $A,B \subset \mathbb{R}^d$ of size up to $n$, the Chamfer distance from $A$ to $B$ is defined as $\text{CH}(A,B)=\sum_{a \in A} \min_{b \in B} d_X(a,b)$, where $d_X$ is the underlying distance measure (e.g., the Euclidean or Manhattan distance). The Chamfer distance is a popular measure of dissimilarity between point clouds, used in many machine learning, computer vision, an…

    Submitted 6 July, 2023; originally announced July 2023.

  6. arXiv:2212.06546

    cs.DS

    Streaming Euclidean MST to a Constant Factor

    Authors: Vincent Cohen-Addad, Xi Chen, Rajesh Jayaram, Amit Levi, Erik Waingarten

    Abstract: We study streaming algorithms for the fundamental geometric problem of computing the cost of the Euclidean Minimum Spanning Tree (MST) on an $n$-point set $X \subset \mathbb{R}^d$. In the streaming model, the points in $X$ can be added and removed arbitrarily, and the goal is to maintain an approximation in small space. In low dimensions, $(1+ε)$ approximations are possible in sublinear space [Fra…

    Submitted 13 December, 2022; originally announced December 2022.

  7. arXiv:2205.09804

    cs.DS cs.IT cs.LG

    Estimation of Entropy in Constant Space with Improved Sample Complexity

    Authors: Maryam Aliakbarpour, Andrew McGregor, Jelani Nelson, Erik Waingarten

    Abstract: Recent work of Acharya et al. (NeurIPS 2019) showed how to estimate the entropy of a distribution $\mathcal D$ over an alphabet of size $k$ up to $\pmε$ additive error by streaming over $(k/ε^3) \cdot \text{polylog}(1/ε)$ i.i.d. samples and using only $O(1)$ words of memory. In this work, we give a new constant memory scheme that reduces the sample complexity to $(k/ε^2)\cdot \text{polylog}(1/ε)$.…

    Submitted 19 May, 2022; originally announced May 2022.

  8. arXiv:2205.00371

    cs.DS

    The Johnson-Lindenstrauss Lemma for Clustering and Subspace Approximation: From Coresets to Dimension Reduction

    Authors: Moses Charikar, Erik Waingarten

    Abstract: We study the effect of Johnson-Lindenstrauss transforms in various projective clustering problems, generalizing recent results which only applied to center-based clustering [MMR19]. We ask the general question: for a Euclidean optimization problem and an accuracy parameter $ε\in (0, 1)$, what is the smallest target dimension $t \in \mathbb{N}$ such that a Johnson-Lindenstrauss transform…

    Submitted 10 July, 2023; v1 submitted 30 April, 2022; originally announced May 2022.

  9. arXiv:2204.12358

    cs.DS

    Polylogarithmic Sketches for Clustering

    Authors: Moses Charikar, Erik Waingarten

    Abstract: Given $n$ points in $\ell_p^d$, we consider the problem of partitioning points into $k$ clusters with associated centers. The cost of a clustering is the sum of $p^{\text{th}}$ powers of distances of points to their cluster centers. For $p \in [1,2]$, we design sketches of size poly$(\log(nd),k,1/ε)$ such that the cost of the optimal clustering can be estimated to within factor $1+ε$, despite the…

    Submitted 26 April, 2022; originally announced April 2022.

    Comments: ICALP 2022

  10. arXiv:2111.03528

    cs.DS

    New Streaming Algorithms for High Dimensional EMD and MST

    Authors: Xi Chen, Rajesh Jayaram, Amit Levi, Erik Waingarten

    Abstract: We study streaming algorithms for two fundamental geometric problems: computing the cost of a Minimum Spanning Tree (MST) of an $n$-point set $X \subset \{1,2,\dots,Δ\}^d$, and computing the Earth Mover Distance (EMD) between two multi-sets $A,B \subset \{1,2,\dots,Δ\}^d$ of size $n$. We consider the turnstile model, where points can be added and removed. We give a one-pass streaming algorithm for…

    Submitted 5 November, 2021; originally announced November 2021.

  11. arXiv:2004.12496

    cs.DS cs.DM cs.LG math.PR math.ST

    Learning and Testing Junta Distributions with Subcube Conditioning

    Authors: Xi Chen, Rajesh Jayaram, Amit Levi, Erik Waingarten

    Abstract: We study the problems of learning and testing junta distributions on $\{-1,1\}^n$ with respect to the uniform distribution, where a distribution $p$ is a $k$-junta if its probability mass function $p(x)$ depends on a subset of at most $k$ variables. The main contribution is an algorithm for finding relevant coordinates in a $k$-junta distribution with subcube conditioning [BC18, CCKLW20]. We give…

    Submitted 26 April, 2020; originally announced April 2020.

  12. arXiv:1911.07357

    cs.DS cs.IT cs.LG math.PR math.ST

    Random Restrictions of High-Dimensional Distributions and Uniformity Testing with Subcube Conditioning

    Authors: Clément L. Canonne, Xi Chen, Gautam Kamath, Amit Levi, Erik Waingarten

    Abstract: We give a nearly-optimal algorithm for testing uniformity of distributions supported on $\{-1,1\}^n$, which makes $\tilde O (\sqrt{n}/\varepsilon^2)$ queries to a subcube conditional sampling oracle (Bhattacharyya and Chakraborty (2018)). The key technical component is a natural notion of random restriction for distributions on $\{-1,1\}^n$, and a quantitative analysis of how such a restriction af…

    Submitted 4 February, 2021; v1 submitted 17 November, 2019; originally announced November 2019.

    Comments: Added Remark 4.4, which discusses the time complexity (the algorithms are polynomial-time, based on an observation from [CJLW20]); removed the log log log n factor from the Gaussian testing algorithm. These changes reflect those included in the conference version (SODA'21)

  13. Approximating the Distance to Monotonicity of Boolean Functions

    Authors: Ramesh Krishnan S. Pallavoor, Sofya Raskhodnikova, Erik Waingarten

    Abstract: We design a nonadaptive algorithm that, given oracle access to a function $f: \{0,1\}^n \to \{0,1\}$ which is $α$-far from monotone, makes poly$(n, 1/α)$ queries and returns an estimate that, with high probability, is an $\widetilde{O}(\sqrt{n})$-approximation to the distance of $f$ to monotonicity. The analysis of our algorithm relies on an improvement to the directed isoperimetric inequality of…

    Submitted 25 February, 2021; v1 submitted 15 November, 2019; originally announced November 2019.

    Comments: To be published in Random Structures & Algorithms

  14. arXiv:1911.01169

    cs.DS cs.DM math.CO

    Optimal Adaptive Detection of Monotone Patterns

    Authors: Omri Ben-Eliezer, Shoham Letzter, Erik Waingarten

    Abstract: We investigate adaptive sublinear algorithms for detecting monotone patterns in an array. Given fixed $2 \leq k \in \mathbb{N}$ and $\varepsilon > 0$, consider the problem of finding a length-$k$ increasing subsequence in an array $f \colon [n] \to \mathbb{R}$, provided that $f$ is $\varepsilon$-far from free of such subsequences. Recently, it was shown that the non-adaptive query complexity of th…

    Submitted 4 November, 2019; originally announced November 2019.

  15. arXiv:1910.01749

    cs.DS cs.DM

    Finding monotone patterns in sublinear time

    Authors: Omri Ben-Eliezer, Clément L. Canonne, Shoham Letzter, Erik Waingarten

    Abstract: We study the problem of finding monotone subsequences in an array from the viewpoint of sublinear algorithms. For fixed $k \in \mathbb{N}$ and $\varepsilon > 0$, we show that the non-adaptive query complexity of finding a length-$k$ monotone subsequence of $f \colon [n] \to \mathbb{R}$, assuming that $f$ is $\varepsilon$-far from free of such subsequences, is…

    Submitted 3 October, 2019; originally announced October 2019.

  16. arXiv:1907.04381

    cs.DS

    Nearly optimal edge estimation with independent set queries

    Authors: Xi Chen, Amit Levi, Erik Waingarten

    Abstract: We study the problem of estimating the number of edges of an unknown, undirected graph $G=([n],E)$ with access to an independent set oracle. When queried about a subset $S\subseteq [n]$ of vertices, the independent set oracle answers whether $S$ is an independent set in $G$ or not. Our first main result is an algorithm that computes a $(1+ε)$-approximation of the number of edges $m$ of the graph us…

    Submitted 9 July, 2019; originally announced July 2019.

  17. arXiv:1904.05309

    cs.DS

    Testing Unateness Nearly Optimally

    Authors: Xi Chen, Erik Waingarten

    Abstract: We present an $\tilde{O}(n^{2/3}/ε^2)$-query algorithm that tests whether an unknown Boolean function $f\colon\{0,1\}^n\rightarrow \{0,1\}$ is unate (i.e., every variable is either non-decreasing or non-increasing) or $ε$-far from unate. The upper bound is nearly optimal given the $\tildeΩ(n^{2/3})$ lower bound of [CWX17a]. The algorithm builds on a novel use of the binary search procedure and its…

    Submitted 10 April, 2019; originally announced April 2019.

  18. arXiv:1902.02459

    cs.DS

    On Mean Estimation for General Norms with Statistical Queries

    Authors: Jerry Li, Aleksandar Nikolov, Ilya Razenshteyn, Erik Waingarten

    Abstract: We study the problem of mean estimation for high-dimensional distributions, assuming access to a statistical query oracle for the distribution. For a normed space $X = (\mathbb{R}^d, \|\cdot\|_X)$ and a distribution supported on vectors $x \in \mathbb{R}^d$ with $\|x\|_{X} \leq 1$, the task is to output an estimate $\hatμ \in \mathbb{R}^d$ which is $ε$-close in the distance induced by…

    Submitted 6 February, 2019; originally announced February 2019.

  19. arXiv:1805.01074

    cs.CC cs.DS

    Lower Bounds for Tolerant Junta and Unateness Testing via Rejection Sampling of Graphs

    Authors: Amit Levi, Erik Waingarten

    Abstract: We introduce a new model for testing graph properties which we call the \emph{rejection sampling model}. We show that testing bipartiteness of $n$-node graphs using rejection sampling queries requires complexity $\widetildeΩ(n^2)$. Via reductions from the rejection sampling model, we give three new lower bounds for tolerant testing of Boolean functions of the form $f\colon\{0,1\}^n\to \{0,1\}$:…

    Submitted 2 May, 2018; originally announced May 2018.

  20. arXiv:1708.05786

    cs.CC

    Boolean Unateness Testing with $\widetilde{O}(n^{3/4})$ Adaptive Queries

    Authors: Xi Chen, Erik Waingarten, Jinyu Xie

    Abstract: We give an adaptive algorithm which tests whether an unknown Boolean function $f\colon \{0, 1\}^n \to\{0, 1\}$ is unate, i.e. every variable of $f$ is either non-decreasing or non-increasing, or $ε$-far from unate with one-sided error using $\widetilde{O}(n^{3/4}/ε^2)$ queries. This improves on the best adaptive $O(n/ε)$-query algorithm from Baleshzar, Chakrabarty, Pallavoor, Raskhodnikova and Ses…

    Submitted 18 August, 2017; originally announced August 2017.

  21. arXiv:1706.05556

    cs.CC

    Adaptivity is exponentially powerful for testing monotonicity of halfspaces

    Authors: Xi Chen, Rocco A. Servedio, Li-Yang Tan, Erik Waingarten

    Abstract: We give a $\mathrm{poly}(\log n, 1/ε)$-query adaptive algorithm for testing whether an unknown Boolean function $f: \{-1,1\}^n \to \{-1,1\}$, which is promised to be a halfspace, is monotone versus $ε$-far from monotone. Since non-adaptive algorithms are known to require almost $Ω(n^{1/2})$ queries to test whether an unknown halfspace is monotone versus far from monotone, this shows that adaptivit…

    Submitted 17 June, 2017; originally announced June 2017.

  22. arXiv:1704.06314

    cs.CC

    Settling the query complexity of non-adaptive junta testing

    Authors: Xi Chen, Rocco A. Servedio, Li-Yang Tan, Erik Waingarten, Jinyu Xie

    Abstract: We prove that any non-adaptive algorithm that tests whether an unknown Boolean function $f: \{0, 1\}^n\to \{0, 1\}$ is a $k$-junta or $ε$-far from every $k$-junta must make $\widetildeΩ(k^{3/2} / ε)$ many queries for a wide range of parameters $k$ and $ε$. Our result dramatically improves previous lower bounds from [BGSMdW13, STW15], and is essentially optimal given Blais's non-adaptive junta test…

    Submitted 20 April, 2017; originally announced April 2017.

  23. arXiv:1702.06997

    cs.CC

    Beyond Talagrand Functions: New Lower Bounds for Testing Monotonicity and Unateness

    Authors: Xi Chen, Erik Waingarten, Jinyu Xie

    Abstract: We prove a lower bound of $\tildeΩ(n^{1/3})$ for the query complexity of any two-sided and adaptive algorithm that tests whether an unknown Boolean function $f:\{0,1\}^n\rightarrow \{0,1\}$ is monotone or far from monotone. This improves the recent bound of $\tildeΩ(n^{1/4})$ for the same problem by Belovs and Blais [BB15]. Our result builds on a new family of random Boolean functions that can be…

    Submitted 18 August, 2017; v1 submitted 22 February, 2017; originally announced February 2017.

  24. arXiv:1611.06222

    cs.DS cs.CG cs.LG math.MG

    Approximate Near Neighbors for General Symmetric Norms

    Authors: Alexandr Andoni, Huy L. Nguyen, Aleksandar Nikolov, Ilya Razenshteyn, Erik Waingarten

    Abstract: We show that every symmetric normed space admits an efficient nearest neighbor search data structure with doubly-logarithmic approximation. Specifically, for every $n$, $d = n^{o(1)}$, and every $d$-dimensional symmetric norm $\|\cdot\|$, there exists a data structure for $\mathrm{poly}(\log \log n)$-approximate nearest neighbor search over $\|\cdot\|$ for $n$-point datasets achieving $n^{o(1)}$ q…

    Submitted 24 July, 2017; v1 submitted 18 November, 2016; originally announced November 2016.

    Comments: 27 pages, 1 figure

  25. arXiv:1608.03580

    cs.DS cs.CC cs.CG cs.IR

    Optimal Hashing-based Time-Space Trade-offs for Approximate Near Neighbors

    Authors: Alexandr Andoni, Thijs Laarhoven, Ilya Razenshteyn, Erik Waingarten

    Abstract: [See the paper for the full abstract.] We show tight upper and lower bounds for time-space trade-offs for the $c$-Approximate Near Neighbor Search problem. For the $d$-dimensional Euclidean space and $n$-point datasets, we develop a data structure with space $n^{1 + ρ_u + o(1)} + O(dn)$ and query time $n^{ρ_q + o(1)} + d n^{o(1)}$ for every $ρ_u, ρ_q \geq 0$ such that: \begin{equation} c^2 \sqrt…

    Submitted 21 May, 2017; v1 submitted 11 August, 2016; originally announced August 2016.

    Comments: 62 pages, 5 figures; a merger of arXiv:1511.07527 [cs.DS] and arXiv:1605.02701 [cs.DS], which subsumes both of the preprints. New version contains more elaborated proofs and fixed some typos

  26. arXiv:1605.02701

    cs.DS cs.CC cs.CG cs.IT

    Lower Bounds on Time-Space Trade-Offs for Approximate Near Neighbors

    Authors: Alexandr Andoni, Thijs Laarhoven, Ilya Razenshteyn, Erik Waingarten

    Abstract: We show tight lower bounds for the entire trade-off between space and query time for the Approximate Near Neighbor search problem. Our lower bounds hold in a restricted model of computation, which captures all hashing-based approaches. In particular, our lower bound matches the upper bound recently shown in [Laarhoven 2015] for the random instance on a Euclidean sphere (which we show in fact extend…

    Submitted 18 August, 2016; v1 submitted 9 May, 2016; originally announced May 2016.

    Comments: 47 pages, 2 figures; v2: substantially revised introduction, lots of small corrections; subsumed by arXiv:1608.03580 [cs.DS] (along with arXiv:1511.07527 [cs.DS])
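Entries 1 and 10 work with the Earth Mover's Distance between equal-size point sets. The abstract's own definition is cut off, so the sketch below assumes the standard one: the minimum, over all one-to-one matchings of the two sets, of the summed $\ell_p$ distances. It is a brute-force reference that tries every matching and is only usable for a handful of points; the function name `emd_bruteforce` is hypothetical and none of the papers' sketching or hashing machinery is reproduced.

```python
from itertools import permutations
from typing import Sequence

Point = Sequence[float]


def emd_bruteforce(A: Sequence[Point], B: Sequence[Point], p: float = 2.0) -> float:
    """min over bijections pi of sum_i ||A[i] - B[pi(i)]||_p (assumed standard EMD).

    Tries all |A|! matchings, so it is only feasible for very small sets.
    """
    if len(A) != len(B):
        raise ValueError("A and B must have the same size")

    def lp(x: Point, y: Point) -> float:
        # l_p distance between two points of equal dimension
        return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

    return min(
        (sum(lp(a, B[j]) for a, j in zip(A, perm)) for perm in permutations(range(len(B)))),
        default=0.0,
    )


A = [(0.0, 0.0), (2.0, 0.0)]
B = [(1.0, 0.0), (3.0, 0.0)]
print(emd_bruteforce(A, B))  # 2.0: match (0,0)->(1,0) and (2,0)->(3,0)
```

A polynomial-time exact alternative would solve the same matching with the Hungarian algorithm; brute force is used here only to keep the definition transparent.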
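Entry 2 defines the KDE query value $μ = \frac{1}{|P|}\sum_{p \in P} K(p, q)$. Evaluating it exactly costs one kernel evaluation per data point, which is the per-query work a KDE data structure is meant to avoid. The sketch below is that exact baseline under an assumed Gaussian kernel $K(x, y) = e^{-\|x - y\|_2^2}$; `kde_exact` and `gaussian_kernel` are hypothetical names, and this is not the paper's quasi-Monte Carlo method.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]


def gaussian_kernel(x: Point, y: Point) -> float:
    """Assumed example kernel K(x, y) = exp(-||x - y||_2^2)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)))


def kde_exact(
    P: Sequence[Point],
    q: Point,
    kernel: Callable[[Point, Point], float] = gaussian_kernel,
) -> float:
    """mu = (1/|P|) * sum_{p in P} K(p, q), using |P| kernel evaluations."""
    if not P:
        raise ValueError("P must be non-empty")
    return sum(kernel(p, q) for p in P) / len(P)


P = [(0.0, 0.0), (1.0, 0.0)]
print(kde_exact(P, (0.0, 0.0)))  # (1 + e^{-1}) / 2, about 0.684
```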
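The Chamfer distance in entry 5 is given explicitly as $\text{CH}(A,B)=\sum_{a \in A} \min_{b \in B} d_X(a,b)$, so it can be evaluated directly. The sketch below is the naive baseline that compares every pair of points, so its running time is quadratic in $n$; it is not the paper's near-linear algorithm, and the function names are hypothetical. The example also shows that the measure is not symmetric.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean (l_2) distance."""
    return math.dist(a, b)


def manhattan(a: Point, b: Point) -> float:
    """Manhattan (l_1) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chamfer(
    A: Sequence[Point],
    B: Sequence[Point],
    dist: Callable[[Point, Point], float] = euclidean,
) -> float:
    """CH(A, B) = sum over a in A of min over b in B of dist(a, b).

    Brute force: |A| * |B| distance evaluations.
    """
    if not B:
        raise ValueError("B must be non-empty")
    return sum(min(dist(a, b) for b in B) for a in A)


A = [(0.0, 0.0), (3.0, 4.0)]
B = [(0.0, 0.0)]
print(chamfer(A, B))  # 0 + 5 = 5.0
print(chamfer(B, A))  # 0.0, so CH(A, B) != CH(B, A) in general
```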
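Entries 6 and 10 estimate the cost of a Euclidean minimum spanning tree in a stream where points are inserted and deleted. For reference, the exact offline value can be computed by storing every point and running Prim's algorithm on the complete graph, which is exactly the memory footprint the streaming algorithms avoid. This is a minimal sketch with a hypothetical function name, not the streaming method.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def euclidean_mst_cost(X: Sequence[Point]) -> float:
    """Total weight of a minimum spanning tree of the complete Euclidean graph on X.

    Dense Prim's algorithm: O(n^2) distance evaluations, O(n) extra memory.
    """
    n = len(X)
    if n <= 1:
        return 0.0
    in_tree = [False] * n
    best: List[float] = [math.inf] * n  # cheapest known edge from each vertex into the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # add the non-tree vertex with the cheapest connecting edge
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u]
        # relax edges from the newly added vertex
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(X[u], X[v]))
    return total


print(euclidean_mst_cost([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]))  # 2.0
```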
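Entry 8 asks how small the target dimension $t$ of a Johnson-Lindenstrauss transform can be while still preserving a problem's cost. As an illustration of what such a transform is, the sketch below uses one standard construction, a $t \times d$ matrix with i.i.d. $N(0, 1/t)$ entries, and checks that pairwise distances are roughly preserved. The construction, the dimensions $d = 50$ and $t = 400$, and the function names are assumptions for illustration; the sketch says nothing about the paper's bounds on $t$.

```python
import math
import random
from typing import List, Sequence


def gaussian_jl_matrix(d: int, t: int, seed: int = 0) -> List[List[float]]:
    """t x d matrix with i.i.d. N(0, 1/t) entries, so E||Gx||^2 = ||x||^2."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(t)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(t)]


def project(G: List[List[float]], x: Sequence[float]) -> List[float]:
    """Return G x, mapping a d-dimensional vector to t dimensions."""
    return [sum(g * xi for g, xi in zip(row, x)) for row in G]


d, t = 50, 400
rng = random.Random(1)
X = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(5)]
G = gaussian_jl_matrix(d, t)
Y = [project(G, x) for x in X]

# ratio of projected to original distance for every pair of points
ratios = [
    math.dist(Y[i], Y[j]) / math.dist(X[i], X[j])
    for i in range(len(X))
    for j in range(i + 1, len(X))
]
print(min(ratios), max(ratios))  # both close to 1
```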
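Entry 9 defines the cost of a clustering of points in $\ell_p^d$ as the sum of $p$-th powers of distances from points to their cluster centers. For a fixed set of centers, sending each point to its nearest center is the cheapest assignment, so the cost reduces to $\sum_x \min_c \|x - c\|_p^p$. The sketch below computes that exact cost from the full point set (the paper's contribution is estimating the optimum from a small sketch instead); the function names are hypothetical.

```python
from typing import Sequence

Point = Sequence[float]


def lp_power(x: Point, c: Point, p: float) -> float:
    """||x - c||_p^p = sum_i |x_i - c_i|^p."""
    return sum(abs(a - b) ** p for a, b in zip(x, c))


def clustering_cost(points: Sequence[Point], centers: Sequence[Point], p: float) -> float:
    """Sum over points of the p-th power of the l_p distance to the nearest center."""
    if not centers:
        raise ValueError("need at least one center")
    return sum(min(lp_power(x, c, p) for c in centers) for x in points)


pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0)]
ctrs = [(0.0, 0.0), (10.0, 0.0)]
print(clustering_cost(pts, ctrs, p=1))  # 0 + 2 + 0 = 2.0
print(clustering_cost(pts, ctrs, p=2))  # 0 + 4 + 0 = 4.0
```

With $p = 2$ this is the usual $k$-means objective for the given centers.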
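Entries 14 and 15 study finding a length-$k$ increasing subsequence while querying only a sublinear number of array positions. As a point of comparison, reading the entire array lets one decide the question exactly with the patience-sorting computation of the longest increasing subsequence in $O(n \log n)$ time. The sketch below takes "increasing" to mean strictly increasing (an assumption) and uses hypothetical function names.

```python
from bisect import bisect_left
from typing import List, Sequence


def longest_increasing_length(f: Sequence[float]) -> int:
    """Length of the longest strictly increasing subsequence of f (patience sorting).

    tails[j] is the smallest possible last value of an increasing subsequence of
    length j + 1 seen so far; each element either extends tails or lowers one entry.
    """
    tails: List[float] = []
    for value in f:
        j = bisect_left(tails, value)
        if j == len(tails):
            tails.append(value)
        else:
            tails[j] = value
    return len(tails)


def has_increasing_pattern(f: Sequence[float], k: int) -> bool:
    """True iff f contains a strictly increasing subsequence of length k."""
    return longest_increasing_length(f) >= k


arr = [3, 1, 2, 5, 4]
print(longest_increasing_length(arr))  # 3, e.g. 1, 2, 5
print(has_increasing_pattern(arr, 3), has_increasing_pattern(arr, 4))  # True False
```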
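Entry 16 assumes access to an independent set oracle that, given $S \subseteq [n]$, reports whether $S$ is independent in $G$. The sketch below models such an oracle and the obvious exact baseline: a pair $\{u, v\}$ fails to be independent exactly when $uv$ is an edge, so one query per pair counts all edges using $n(n-1)/2$ queries. The paper's point is to approximate $m$ with far fewer queries; the class and function names here are hypothetical.

```python
from itertools import combinations
from typing import Iterable, Tuple


class IndependentSetOracle:
    """Answers whether a vertex subset S of an undirected graph G = ([n], E) is independent."""

    def __init__(self, n: int, edges: Iterable[Tuple[int, int]]):
        self.n = n
        self._edges = {frozenset(e) for e in edges}
        self.queries = 0  # number of oracle calls made so far

    def is_independent(self, S: Iterable[int]) -> bool:
        self.queries += 1
        return not any(frozenset(pair) in self._edges for pair in combinations(S, 2))


def count_edges_naive(oracle: IndependentSetOracle) -> int:
    """Exact edge count using one query per vertex pair of [n] = {1, ..., n}."""
    vertices = range(1, oracle.n + 1)
    return sum(1 for u, v in combinations(vertices, 2) if not oracle.is_independent((u, v)))


oracle = IndependentSetOracle(4, [(1, 2), (2, 3), (1, 3)])  # triangle plus isolated vertex 4
print(count_edges_naive(oracle), oracle.queries)  # 3 6
```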
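Entries 17, 20 and 23 concern testing whether $f\colon\{0,1\}^n\to\{0,1\}$ is unate, i.e. every variable is either non-decreasing or non-increasing. The testers use far fewer than $2^n$ queries; the exhaustive check below reads every input and is only meant to make the definition concrete for tiny $n$. The function name `is_unate` is hypothetical.

```python
from itertools import product
from typing import Callable, Tuple

BoolFn = Callable[[Tuple[int, ...]], int]


def is_unate(f: BoolFn, n: int) -> bool:
    """True iff every variable of f: {0,1}^n -> {0,1} is non-decreasing or non-increasing.

    For each coordinate i, compare f(x) with f(x with bit i set to 1) over all x with
    x_i = 0; variable i fails if it increases f somewhere and decreases it elsewhere.
    """
    for i in range(n):
        up = down = False
        for x in product((0, 1), repeat=n):
            if x[i] == 1:
                continue
            y = x[:i] + (1,) + x[i + 1:]
            if f(x) < f(y):
                up = True
            elif f(x) > f(y):
                down = True
        if up and down:
            return False
    return True


print(is_unate(lambda x: x[0] & (1 - x[1]), 2))  # True: x0 non-decreasing, x1 non-increasing
print(is_unate(lambda x: x[0] ^ x[1], 2))        # False: XOR is not unate
```

Monotone functions are the special case in which every variable is non-decreasing.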
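Entry 22 concerns testing whether a Boolean function is a $k$-junta, meaning it depends on at most $k$ of its variables (entry 11 uses the analogous notion for distributions). A variable is relevant when flipping it changes $f$ on at least one input. The exhaustive sketch below finds the relevant variables by checking every input, which is feasible only for small $n$ and is not a query-efficient tester; the function names are hypothetical.

```python
from itertools import product
from typing import Callable, Set, Tuple

BoolFn = Callable[[Tuple[int, ...]], int]


def relevant_variables(f: BoolFn, n: int) -> Set[int]:
    """Coordinates i such that flipping x_i changes f(x) for at least one input x."""
    relevant: Set[int] = set()
    for x in product((0, 1), repeat=n):
        for i in range(n):
            y = x[:i] + (1 - x[i],) + x[i + 1:]
            if f(x) != f(y):
                relevant.add(i)
    return relevant


def is_k_junta(f: BoolFn, n: int, k: int) -> bool:
    """f is a k-junta iff it depends on at most k of its n variables."""
    return len(relevant_variables(f, n)) <= k


def maj3(x: Tuple[int, ...]) -> int:
    """Majority of the first three bits; the remaining bits are ignored."""
    return int(x[0] + x[1] + x[2] >= 2)


print(sorted(relevant_variables(maj3, 5)))             # [0, 1, 2]
print(is_k_junta(maj3, 5, 3), is_k_junta(maj3, 5, 2))  # True False
```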