
Showing 1–31 of 31 results for author: Alman, J

  1. arXiv:2407.02372  [pdf, ps, other]

    cs.DS math.NA

    Finer-Grained Hardness of Kernel Density Estimation

    Authors: Josh Alman, Yunfeng Guan

    Abstract: In batch Kernel Density Estimation (KDE) for a kernel function $f$, we are given as input $2n$ points $x^{(1)}, \cdots, x^{(n)}, y^{(1)}, \cdots, y^{(n)}$ in dimension $m$, as well as a vector $w \in \mathbb{R}^n$. These inputs implicitly define the $n \times n$ kernel matrix $K$ given by $K[i,j] = f(x^{(i)}, y^{(j)})$. The goal is to compute a vector $v$ which approximates $K w$ with…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 30 pages, to appear in the 39th Computational Complexity Conference (CCC 2024)
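    For reference, computing $Kw$ exactly means evaluating all $n^2$ kernel entries, which takes $O(n^2 m)$ time; this quadratic baseline is what faster approximation algorithms would need to beat. A minimal Python sketch, assuming a Gaussian kernel $f(x, y) = e^{-\|x - y\|_2^2}$ purely for illustration (the abstract allows any kernel $f$, and the names `gaussian_kernel` and `batch_kde_naive` are hypothetical):

```python
import numpy as np

def gaussian_kernel(x: np.ndarray, y: np.ndarray) -> float:
    """Illustrative kernel choice (an assumption): f(x, y) = exp(-||x - y||_2^2)."""
    return float(np.exp(-np.sum((x - y) ** 2)))

def batch_kde_naive(xs: np.ndarray, ys: np.ndarray, w: np.ndarray, f=gaussian_kernel) -> np.ndarray:
    """Return K @ w exactly, where K[i, j] = f(xs[i], ys[j]).

    xs and ys have shape (n, m) and w has length n. Every one of the
    n^2 kernel entries is evaluated, so the cost is O(n^2 m) time.
    """
    n = xs.shape[0]
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = f(xs[i], ys[j])
    return K @ w
```

    When all points coincide, every kernel entry equals 1, so each coordinate of the result is simply the sum of the entries of $w$.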

  2. arXiv:2404.16349  [pdf, ps, other]

    cs.DS cs.CC

    More Asymmetry Yields Faster Matrix Multiplication

    Authors: Josh Alman, Ran Duan, Virginia Vassilevska Williams, Yinzhan Xu, Zixuan Xu, Renfei Zhou

    Abstract: We present a new improvement on the laser method for designing fast matrix multiplication algorithms. The new method further develops the recent advances by [Duan, Wu, Zhou FOCS 2023] and [Vassilevska Williams, Xu, Xu, Zhou SODA 2024]. Surprisingly, the new improvement is achieved by incorporating more asymmetry in the analysis, circumventing a fundamental tool of prior work that requires two of th…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 44 pages. arXiv admin note: text overlap with arXiv:2307.07970

  3. arXiv:2402.04497  [pdf, ps, other]

    cs.LG cs.CC cs.CL cs.DS

    The Fine-Grained Complexity of Gradient Computation for Training Large Language Models

    Authors: Josh Alman, Zhao Song

    Abstract: Large language models (LLMs) have made fundamental contributions over the last few years. To train an LLM, one needs to alternately run 'forward' computations and 'backward' computations. The forward computation can be viewed as attention function evaluation, and the backward computation can be viewed as a gradient computation. In previous work by [Alman and Song, NeurIPS 2023], it was proved…

    Submitted 6 February, 2024; originally announced February 2024.

  4. arXiv:2311.01630  [pdf, other]

    cs.DS

    Generalizations of Matrix Multiplication can solve the Light Bulb Problem

    Authors: Josh Alman, Hengjie Zhang

    Abstract: In the light bulb problem, one is given uniformly random vectors $x_1, \ldots, x_n, y_1, \ldots, y_n \in \{-1,1\}^d$. They are all chosen independently except a planted pair $(x_{i^*}, y_{j^*})$ is chosen with correlation $ρ>0$. The goal is to find the planted pair. This problem was introduced over 30 years ago by L. Valiant, and is known to have many applications in data analysis, statistics, and…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: abstract shortened for arxiv
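    To make the setup concrete, here is a hypothetical generator of planted instances together with the brute-force search over all $n^2$ pairs, which takes $O(n^2 d)$ time; it is the naive baseline, not the algorithm of the paper. It assumes one particular way of planting correlation $ρ$: each coordinate of $y_{j^*}$ copies the matching coordinate of $x_{i^*}$ with probability $ρ$ and is otherwise left uniformly random, so $\mathbb{E}[\langle x_{i^*}, y_{j^*}\rangle] = ρ d$. The names `planted_instance` and `brute_force_pair` are illustrative.

```python
import numpy as np

def planted_instance(n, d, rho, rng):
    """Hypothetical generator: uniform +-1 vectors plus one planted correlated pair."""
    x = rng.choice([-1, 1], size=(n, d))
    y = rng.choice([-1, 1], size=(n, d))
    i_star, j_star = int(rng.integers(n)), int(rng.integers(n))
    # Each coordinate of y[j*] copies x[i*] with probability rho (assumed planting
    # scheme), so the expected inner product of the planted pair is rho * d.
    copy = rng.random(d) < rho
    y[j_star, copy] = x[i_star, copy]
    return x, y, (i_star, j_star)

def brute_force_pair(x, y):
    """Return the pair (i, j) maximizing <x_i, y_j> by checking all n^2 pairs."""
    scores = x @ y.T  # n x n matrix of inner products, O(n^2 d) time
    i, j = np.unravel_index(np.argmax(scores), scores.shape)
    return int(i), int(j)
```

    With, say, $n = 200$, $d = 2000$ and $ρ = 0.5$, the planted inner product is around 1000 while unplanted pairs stay in the low hundreds, so the brute-force search recovers the planted pair with overwhelming probability.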

  5. arXiv:2310.04064  [pdf, ps, other]

    cs.DS cs.CC cs.CL cs.LG stat.ML

    How to Capture Higher-order Correlations? Generalizing Matrix Softmax Attention to Kronecker Computation

    Authors: Josh Alman, Zhao Song

    Abstract: In the classical transformer attention scheme, we are given three $n \times d$ size matrices $Q, K, V$ (the query, key, and value tokens), and the goal is to compute a new $n \times d$ size matrix $D^{-1} \exp(QK^\top) V$ where $D = \mathrm{diag}( \exp(QK^\top) {\bf 1}_n )$. In this work, we study a generalization of attention which captures triple-wise correlations. This generalization is able to…

    Submitted 6 October, 2023; originally announced October 2023.
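    The classical attention formula above can be evaluated directly, which materializes the $n \times n$ matrix $\exp(QK^\top)$ (entrywise exponential) and costs $O(n^2 d)$ time. A minimal NumPy sketch of this naive baseline, not of the paper's Kronecker generalization; the name `attention_naive` is hypothetical, and it omits the row-max subtraction commonly used for numerical stability:

```python
import numpy as np

def attention_naive(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Compute D^{-1} exp(Q K^T) V, where D = diag(exp(Q K^T) 1_n).

    Q, K, V have shape (n, d); exp is applied entrywise, so forming the
    intermediate n x n matrix costs O(n^2 d) time.
    """
    A = np.exp(Q @ K.T)                 # n x n matrix of unnormalized weights
    row_sums = A @ np.ones(A.shape[1])  # the diagonal of D
    return (A / row_sums[:, None]) @ V  # D^{-1} A V
```

    With $Q = 0$ every weight equals 1, so each output row is the average of the rows of $V$.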

  6. arXiv:2309.04683  [pdf, other]

    cs.CC

    Tensor Ranks and the Fine-Grained Complexity of Dynamic Programming

    Authors: Josh Alman, Ethan Turok, Hantao Yu, Hengzhi Zhang

    Abstract: Generalizing work of Künnemann, Paturi, and Schneider [ICALP 2017], we study a wide class of high-dimensional dynamic programming (DP) problems in which one must find the shortest path between two points in a high-dimensional grid given a tensor of transition costs between nodes in the grid. This captures many classical problems which are solved using DP such as the knapsack problem, the airplane…

    Submitted 2 January, 2024; v1 submitted 9 September, 2023; originally announced September 2023.

  7. arXiv:2302.13214  [pdf, ps, other]

    cs.LG cs.CC cs.DS stat.ML

    Fast Attention Requires Bounded Entries

    Authors: Josh Alman, Zhao Song

    Abstract: In modern machine learning, inner product attention computation is a fundamental task for training large language models such as Transformer, GPT-1, BERT, GPT-2, GPT-3 and ChatGPT. Formally, in this problem, one is given as input three matrices $Q, K, V \in [-B,B]^{n \times d}$, and the goal is to construct the matrix…

    Submitted 9 May, 2023; v1 submitted 25 February, 2023; originally announced February 2023.

  8. arXiv:2302.11476  [pdf, ps, other]

    cs.CC cs.DM

    Matrix Multiplication and Number On the Forehead Communication

    Authors: Josh Alman, Jarosław Błasiok

    Abstract: Three-player Number On the Forehead communication may be thought of as a three-player Number In the Hand promise model, in which each player is given the inputs that are supposedly on the other two players' heads, and promised that they are consistent with the inputs of the other players. The set of all allowed inputs under this promise may be thought of as an order-3 tensor. We surprisingly ob…

    Submitted 22 February, 2023; originally announced February 2023.

  9. arXiv:2211.14227  [pdf, ps, other]

    cs.LG cs.DS stat.ML

    Bypass Exponential Time Preprocessing: Fast Neural Network Training via Weight-Data Correlation Preprocessing

    Authors: Josh Alman, Jiehao Liang, Zhao Song, Ruizhe Zhang, Danyang Zhuo

    Abstract: Over the last decade, deep neural networks have transformed our society, and they are already widely applied in various machine learning applications. State-of-the-art deep neural networks are becoming larger in size every year to deliver increasing model accuracy, and as a result, model training consumes substantial computing resources and will only consume more in the future. Using current training…

    Submitted 25 November, 2022; originally announced November 2022.

  10. arXiv:2211.06459  [pdf, other]

    cs.DS cs.CC

    Faster Walsh-Hadamard and Discrete Fourier Transforms From Matrix Non-Rigidity

    Authors: Josh Alman, Kevin Rao

    Abstract: We give algorithms with lower arithmetic operation counts for both the Walsh-Hadamard Transform (WHT) and the Discrete Fourier Transform (DFT) on inputs of power-of-2 size $N$. For the WHT, our new algorithm has an operation count of $\frac{23}{24}N \log N + O(N)$. To our knowledge, this gives the first improvement on the $N \log N$ operation count of the simple, folklore Fast Walsh-Hadamard Tra…

    Submitted 14 June, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

    Comments: 42 pages
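    For comparison, the simple folklore Fast Walsh-Hadamard Transform mentioned in the abstract runs $\log_2 N$ rounds of $N/2$ butterflies, each costing one addition and one subtraction, for $N \log_2 N$ operations in total. A minimal Python sketch of that baseline (not the paper's improved algorithm), for the unnormalized Sylvester-ordered Hadamard matrix, which also counts the operations it performs; the name `fwht` is illustrative:

```python
def fwht(a):
    """Unnormalized fast Walsh-Hadamard transform of a list of length N = 2^k.

    Returns (transformed list, number of additions/subtractions performed).
    """
    a = list(a)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of 2"
    ops = 0
    h = 1
    while h < n:  # log2(n) rounds
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):  # n/2 butterflies per round
                u, v = a[i], a[i + h]
                a[i], a[i + h] = u + v, u - v
                ops += 2  # one addition and one subtraction
        h *= 2
    return a, ops
```

    For $N = 8$ the count is $8 \cdot 3 = 24$, matching $N \log_2 N$; the paper's result lowers the leading constant from 1 to $\frac{23}{24}$.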

  11. arXiv:2211.05217  [pdf, other]

    cs.DS

    Smaller Low-Depth Circuits for Kronecker Powers

    Authors: Josh Alman, Yunfeng Guan, Ashwin Padaki

    Abstract: We give new, smaller constructions of constant-depth linear circuits for computing any matrix which is the Kronecker power of a fixed matrix. A standard argument (e.g., the mixed product property of Kronecker products, or a generalization of the Fast Walsh-Hadamard transform) shows that any such $N \times N$ matrix has a depth-2 circuit of size $O(N^{1.5})$. We improve on this for all such matrice…

    Submitted 9 November, 2022; originally announced November 2022.

    Comments: 36 pages, to appear in the 34th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2023)
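    The standard argument can be checked numerically. Assuming $N = q^{2k}$ for a fixed $q \times q$ matrix $M$, set $A = M^{\otimes k}$, a $\sqrt{N} \times \sqrt{N}$ matrix. The mixed product property gives $M^{\otimes 2k} = A \otimes A = (A \otimes I_{\sqrt{N}})(I_{\sqrt{N}} \otimes A)$, and each factor has at most $\sqrt{N} \cdot N = N^{1.5}$ nonzero entries, so using one factor per layer yields a depth-2 linear circuit of size $O(N^{1.5})$. A NumPy sketch (helper names `kron_power` and `depth2_factors` are hypothetical) that builds the two factors:

```python
import numpy as np
from functools import reduce

def kron_power(M, k):
    """Return the k-fold Kronecker power of M (k >= 1)."""
    return reduce(np.kron, [M] * k)

def depth2_factors(M, k):
    """Factor the 2k-fold Kronecker power of M as (A (x) I)(I (x) A), A = M^{(x) k}.

    Each factor is one layer of a depth-2 linear circuit; its number of
    nonzero entries (wires) is at most sqrt(N) * N = N^1.5.
    """
    A = kron_power(M, k)
    I = np.eye(A.shape[0])
    return np.kron(A, I), np.kron(I, A)
```

    For the $2 \times 2$ Hadamard matrix and $k = 2$ (so $N = 16$), each factor has exactly $64 = 16^{1.5}$ nonzeros.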

  12. arXiv:2211.04643  [pdf, ps, other]

    cs.DS

    Faster Walsh-Hadamard Transform and Matrix Multiplication over Finite Fields using Lookup Tables

    Authors: Josh Alman

    Abstract: We use lookup tables to design faster algorithms for important algebraic problems over finite fields. These faster algorithms, which only use arithmetic operations and lookup table operations, may help to explain the difficulty of determining the complexities of these important problems. Our results over a constant-sized finite field are as follows. The Walsh-Hadamard transform of a vector of le…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: 10 pages, to appear in the 6th Symposium on Simplicity in Algorithms (SOSA 2023)

  13. arXiv:2205.06249  [pdf, ps, other]

    cs.CC cs.DS math.CA

    Optimal-Degree Polynomial Approximations for Exponentials and Gaussian Kernel Density Estimation

    Authors: Amol Aggarwal, Josh Alman

    Abstract: For any real numbers $B \ge 1$ and $δ\in (0, 1)$ and function $f: [0, B] \rightarrow \mathbb{R}$, let $d_{B; δ} (f) \in \mathbb{Z}_{> 0}$ denote the minimum degree of a polynomial $p(x)$ satisfying $\sup_{x \in [0, B]} \big| p(x) - f(x) \big| < δ$. In this paper, we provide precise asymptotics for $d_{B; δ} (e^{-x})$ and $d_{B; δ} (e^{x})$ in terms of both $B$ and $δ$, improving both the previousl…

    Submitted 12 May, 2022; originally announced May 2022.

    Comments: 27 pages, to appear in the 37th Computational Complexity Conference (CCC 2022)
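    The quantity $d_{B;δ}(f)$ can be probed numerically. A rough sketch, assuming Chebyshev interpolation as the approximating scheme: the smallest degree at which the Chebyshev interpolant of $f$ on $[0, B]$ has error below $δ$ on a dense grid (a numerical stand-in for the true supremum) is, up to grid error, an upper bound on $d_{B;δ}(f)$. It is not the exact minimax degree analyzed in the paper. It uses NumPy's `Chebyshev.interpolate`; the function name `chebyshev_degree_upper_bound` and its defaults are hypothetical.

```python
import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

def chebyshev_degree_upper_bound(f, B, delta, max_deg=200, grid=20001):
    """Smallest d such that the degree-d Chebyshev interpolant of f on [0, B]
    has max error < delta on a dense grid (a proxy for sup |p - f|).

    Returns None if no degree up to max_deg suffices.
    """
    xs = np.linspace(0.0, B, grid)
    fx = f(xs)
    for d in range(max_deg + 1):
        p = Chebyshev.interpolate(f, d, domain=[0.0, B])
        if np.max(np.abs(p(xs) - fx)) < delta:
            return d
    return None
```

    Running it for $f(x) = e^{-x}$ with a fixed $δ$ and increasing $B$ shows the required degree growing with $B$, which is the kind of dependence on $B$ and $δ$ whose precise asymptotics the paper determines.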

  14. arXiv:2204.10819  [pdf, ps, other]

    cs.DS

    Parameterized Sensitivity Oracles and Dynamic Algorithms using Exterior Algebras

    Authors: Josh Alman, Dean Hirsch

    Abstract: We design the first efficient sensitivity oracles and dynamic algorithms for a variety of parameterized problems. Our main approach is to modify the algebraic coding technique from static parameterized algorithm design, which had not previously been used in a dynamic context. We particularly build off of the 'extensor coding' method of Brand, Dell and Husfeldt [STOC'18], employing properties of th…

    Submitted 18 June, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

  15. arXiv:2102.11992  [pdf, ps, other]

    cs.DS cs.CC math.CO

    Kronecker Products, Low-Depth Circuits, and Matrix Rigidity

    Authors: Josh Alman

    Abstract: For a matrix $M$ and a positive integer $r$, the rank $r$ rigidity of $M$ is the smallest number of entries of $M$ which one must change to make its rank at most $r$. There are many known applications of rigidity lower bounds to a variety of areas in complexity theory, but fewer known applications of rigidity upper bounds. In this paper, we use rigidity upper bounds to prove new upper bounds in a…

    Submitted 23 February, 2021; originally announced February 2021.

    Comments: 40 pages, to appear in STOC 2021
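    The definition of rank-$r$ rigidity can be made concrete by brute force on tiny matrices. The sketch below assumes the field GF(2), so that changing an entry means flipping a bit and the search is finite (the abstract does not fix a field here). It tries every set of $k$ entries for $k = 0, 1, 2, \ldots$, so it takes exponential time and is only usable for very small matrices. The function names are hypothetical.

```python
from itertools import combinations

def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are given as integer bitmasks."""
    pivots = {}  # leading bit position -> basis row with that leading bit
    for r in rows:
        while r:
            top = r.bit_length() - 1
            if top not in pivots:
                pivots[top] = r  # r is independent of the rows seen so far
                break
            r ^= pivots[top]  # clear the leading bit and keep reducing
    return len(pivots)

def rigidity_gf2(matrix, r):
    """Brute-force rank-r rigidity over GF(2): the fewest entry flips needed to
    bring the rank down to at most r. Exponential time; tiny inputs only."""
    rows = [sum(bit << j for j, bit in enumerate(row)) for row in matrix]
    cells = [(i, j) for i in range(len(matrix)) for j in range(len(matrix[0]))]
    for k in range(len(cells) + 1):
        for changed in combinations(cells, k):
            trial = list(rows)
            for i, j in changed:
                trial[i] ^= 1 << j  # flip entry (i, j)
            if gf2_rank(trial) <= r:
                return k
```

    For the $4 \times 4$ identity, the rank-1 rigidity is 3: each change lowers the rank by at most 1, and zeroing three diagonal entries suffices.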

  16. arXiv:2011.11503  [pdf, ps, other]

    cs.CG cs.LG math.MG

    Metric Transforms and Low Rank Matrices via Representation Theory of the Real Hyperrectangle

    Authors: Josh Alman, Timothy Chu, Gary Miller, Shyam Narayanan, Mark Sellke, Zhao Song

    Abstract: In this paper, we develop a new technique which we call representation theory of the real hyperrectangle, which describes how to compute the eigenvectors and eigenvalues of certain matrices arising from hyperrectangles. We show that these matrices arise naturally when analyzing a number of different algorithmic tasks such as kernel methods, neural network training, natural language processing, and…

    Submitted 4 August, 2021; v1 submitted 23 November, 2020; originally announced November 2020.

  17. arXiv:2011.02466  [pdf, ps, other]

    cs.DS cs.CC cs.LG stat.ML

    Algorithms and Hardness for Linear Algebra on Geometric Graphs

    Authors: Josh Alman, Timothy Chu, Aaron Schild, Zhao Song

    Abstract: For a function $\mathsf{K} : \mathbb{R}^{d} \times \mathbb{R}^{d} \to \mathbb{R}_{\geq 0}$, and a set $P = \{ x_1, \ldots, x_n\} \subset \mathbb{R}^d$ of $n$ points, the $\mathsf{K}$ graph $G_P$ of $P$ is the complete graph on $n$ nodes where the weight between nodes $i$ and $j$ is given by $\mathsf{K}(x_i, x_j)$. In this paper, we initiate the study of when efficient spectral graph theory is poss…

    Submitted 4 November, 2020; originally announced November 2020.

    Comments: FOCS 2020

  18. arXiv:2010.05846  [pdf, ps, other]

    cs.DS cs.CC math.CO

    A Refined Laser Method and Faster Matrix Multiplication

    Authors: Josh Alman, Virginia Vassilevska Williams

    Abstract: The complexity of matrix multiplication is measured in terms of $ω$, the smallest real number such that two $n\times n$ matrices can be multiplied using $O(n^{ω+ε})$ field operations for all $ε>0$; the best bound until now is $ω<2.37287$ [Le Gall'14]. All bounds on $ω$ since 1986 have been obtained using the so-called laser method, a way to lower-bound the 'value' of a tensor in designing matrix m…

    Submitted 12 October, 2020; originally announced October 2020.

    Comments: 29 pages, to appear in the 32nd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2021)
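    As a concrete illustration of how an algorithm yields a bound on $ω$ (this is the classical starting point, not the laser method refined in the paper): Strassen's algorithm multiplies $2 \times 2$ block matrices using 7 block products instead of 8, giving the recurrence $T(n) = 7T(n/2) + O(n^2)$, hence $T(n) = O(n^{\log_2 7})$ and $ω \le \log_2 7 \approx 2.807$. A minimal NumPy sketch for $n$ a power of 2, with a hypothetical `cutoff` below which it falls back to ordinary multiplication:

```python
import numpy as np

def strassen(A, B, cutoff=32):
    """Multiply n x n matrices (n a power of 2) with Strassen's 7 block products."""
    n = A.shape[0]
    if n <= cutoff:
        return A @ B  # ordinary multiplication for small blocks
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # The seven recursive products (instead of the naive eight).
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    # Recombine the products into the four output blocks using additions only.
    C = np.empty((n, n), dtype=np.result_type(A, B))
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C
```

    On integer matrices the result matches `A @ B` exactly, since only additions, subtractions and multiplications are used.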

  19. arXiv:1911.01351  [pdf, other]

    cs.DS

    Faster Update Time for Turnstile Streaming Algorithms

    Authors: Josh Alman, Huacheng Yu

    Abstract: In this paper, we present a new algorithm for maintaining linear sketches in turnstile streams with faster update time. As an application, we show that $\log n$ Count sketches or CountMin sketches with a constant number of columns (i.e., buckets) can be implicitly maintained in worst-case $O(\log^{0.582} n)$ update time using $O(\log n)$ words of space, on a standard word…

    Submitted 4 November, 2019; originally announced November 2019.

    Comments: To appear in SODA 2020
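    For context, a basic Count sketch keeps a table of counters with a small number of buckets per row. A turnstile update $(i, \Delta)$, where $\Delta$ may be negative, adds $\pm\Delta$ to one bucket in every row, so the straightforward update time grows with the number of rows maintained; the paper's contribution is maintaining many such sketches faster than this. The Python sketch below shows only the standard update and median-based point query, not the paper's scheme. The hash functions $((a i + b) \bmod p) \bmod \text{buckets}$ are a simple illustrative choice, and the class name `CountSketch` is hypothetical.

```python
import random
import statistics

class CountSketch:
    """Count sketch: a rows x buckets table; each row hashes an item to one bucket and a +-1 sign."""

    PRIME = (1 << 61) - 1  # Mersenne prime for the illustrative hash functions

    def __init__(self, rows, buckets, seed=0):
        rng = random.Random(seed)
        self.buckets = buckets
        self.table = [[0] * buckets for _ in range(rows)]
        # Per row: (a, b) for the bucket hash and (c, d) for the sign hash.
        self.params = [tuple(rng.randrange(1, self.PRIME) for _ in range(4)) for _ in range(rows)]

    def _hash(self, row, item):
        a, b, c, d = self.params[row]
        bucket = ((a * item + b) % self.PRIME) % self.buckets
        sign = 1 if ((c * item + d) % self.PRIME) % 2 == 0 else -1
        return bucket, sign

    def update(self, item, delta):
        """Turnstile update x[item] += delta (delta may be negative); touches every row."""
        for row in range(len(self.table)):
            bucket, sign = self._hash(row, item)
            self.table[row][bucket] += sign * delta

    def estimate(self, item):
        """Median over rows of the sign-corrected counter for item."""
        values = []
        for row in range(len(self.table)):
            bucket, sign = self._hash(row, item)
            values.append(sign * self.table[row][bucket])
        return statistics.median(values)
```

    Because the sketch is linear, inserting and later deleting the same amount of an item leaves the table unchanged, which is what makes it suitable for turnstile streams.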

  20. arXiv:1812.08731  [pdf, ps, other]

    cs.CC cs.DS math.CO

    Limits on the Universal Method for Matrix Multiplication

    Authors: Josh Alman

    Abstract: In this work, we prove limitations on the known methods for designing matrix multiplication algorithms. Alman and Vassilevska Williams recently defined the Universal Method, which substantially generalizes all the known approaches including Strassen's Laser Method and Cohn and Umans' Group Theoretic Method. We prove concrete lower bounds on the algorithms one can design by applying the Universal M…

    Submitted 1 May, 2019; v1 submitted 20 December, 2018; originally announced December 2018.

    Comments: 25 pages, to appear in 34th Computational Complexity Conference (CCC 2019)

  21. arXiv:1810.08671  [pdf, ps, other]

    cs.CC cs.DS

    Limits on All Known (and Some Unknown) Approaches to Matrix Multiplication

    Authors: Josh Alman, Virginia Vassilevska Williams

    Abstract: We study the known techniques for designing Matrix Multiplication algorithms. The two main approaches are the Laser method of Strassen, and the Group theoretic approach of Cohn and Umans. We define a generalization based on zeroing outs which subsumes these two approaches, which we call the Solar method, and an even more general method based on monomial degenerations, which we call the Galactic me…

    Submitted 19 October, 2018; originally announced October 2018.

    Comments: 32 pages. A preliminary version appeared in the 59th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2018)

  22. arXiv:1810.06740  [pdf, ps, other]

    cs.DS

    An Illuminating Algorithm for the Light Bulb Problem

    Authors: Josh Alman

    Abstract: The Light Bulb Problem is one of the most basic problems in data analysis. One is given as input $n$ vectors in $\{-1,1\}^d$, which are all independently and uniformly random, except for a planted pair of vectors with inner product at least $ρ\cdot d$ for some constant $ρ> 0$. The task is to find the planted pair. The most straightforward algorithm leads to a runtime of $Ω(n^2)$. Algorithms based…

    Submitted 15 October, 2018; originally announced October 2018.

    Comments: 10 pages. To appear in the 2nd Symposium on Simplicity in Algorithms (SOSA 2019)

  23. arXiv:1712.07246  [pdf, other]

    cs.CC cs.DS

    Further limitations of the known approaches for matrix multiplication

    Authors: Josh Alman, Virginia Vassilevska Williams

    Abstract: We consider the techniques behind the current best algorithms for matrix multiplication. Our results are threefold. (1) We provide a unifying framework, showing that all known matrix multiplication running times since 1986 can be achieved from a single very natural tensor: the structural tensor $T_q$ of addition modulo an integer $q$. (2) We show that if one applies a generalization of the kn…

    Submitted 19 December, 2017; originally announced December 2017.

    Comments: 16 pages. To appear in 9th Innovations in Theoretical Computer Science Conference (ITCS 2018)

  24. arXiv:1707.00362  [pdf, other]

    cs.DS cs.CC

    Dynamic Parameterized Problems and Algorithms

    Authors: Josh Alman, Matthias Mnich, Virginia Vassilevska Williams

    Abstract: Fixed-parameter algorithms and kernelization are two powerful methods to solve $\mathsf{NP}$-hard problems. Yet, so far those algorithms have been largely restricted to static inputs. In this paper we provide fixed-parameter algorithms and kernelizations for fundamental $\mathsf{NP}$-hard problems with dynamic inputs. We consider a variety of parameterized graph and hitting set problems which ar…

    Submitted 2 July, 2017; originally announced July 2017.

    Comments: 40 pages, appears in ICALP 2017

  25. arXiv:1704.06185  [pdf, ps, other]

    cs.DS cs.CC

    Cell-Probe Lower Bounds from Online Communication Complexity

    Authors: Josh Alman, Joshua R. Wang, Huacheng Yu

    Abstract: In this work, we introduce an online model for communication complexity. Analogous to how online algorithms receive their input piece-by-piece, our model presents one of the players, Bob, his input piece-by-piece, and has the players Alice and Bob cooperate to compute a result each time before the next piece is revealed to Bob. This model has a closer and more natural correspondence to dynamic dat…

    Submitted 15 November, 2017; v1 submitted 20 April, 2017; originally announced April 2017.

  26. arXiv:1611.05558  [pdf, ps, other]

    cs.CC

    Probabilistic Rank and Matrix Rigidity

    Authors: Josh Alman, Ryan Williams

    Abstract: We consider a notion of probabilistic rank and probabilistic sign-rank of a matrix, which measures the extent to which a matrix can be probabilistically represented by low-rank matrices. We demonstrate several connections with matrix rigidity, communication complexity, and circuit lower bounds, including: The Walsh-Hadamard Transform is Not Very Rigid. We give surprising upper bounds on the rigi…

    Submitted 7 January, 2017; v1 submitted 16 November, 2016; originally announced November 2016.

    Comments: 21 pages

    Report number: In ACM Symposium on Theory of Computing (STOC), 2017

  27. arXiv:1608.04355  [pdf, ps, other]

    cs.DS cs.CC

    Polynomial Representations of Threshold Functions and Algorithmic Applications

    Authors: Josh Alman, Timothy M. Chan, Ryan Williams

    Abstract: We design new polynomials for representing threshold functions in three different regimes: probabilistic polynomials of low degree, which need far less randomness than previous constructions, polynomial threshold functions (PTFs) with "nice" threshold behavior and degree almost as low as the probabilistic polynomials, and a new notion of probabilistic PTFs where we combine the above techniques to…

    Submitted 15 August, 2016; originally announced August 2016.

    Comments: 30 pages. To appear in 57th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2016)

  28. arXiv:1507.05106  [pdf, ps, other]

    cs.DS cs.CC math.CO

    Probabilistic Polynomials and Hamming Nearest Neighbors

    Authors: Josh Alman, Ryan Williams

    Abstract: We show how to compute any symmetric Boolean function on $n$ variables over any field (as well as the integers) with a probabilistic polynomial of degree $O(\sqrt{n \log(1/ε)})$ and error at most $ε$. The degree dependence on $n$ and $ε$ is optimal, matching a lower bound of Razborov (1987) and Smolensky (1987) for the MAJORITY function. The proof is constructive: a low-degree polynomial can be ef…

    Submitted 17 July, 2015; originally announced July 2015.

    Comments: 16 pages. To appear in 56th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2015)

  29. arXiv:1309.3011  [pdf, other]

    math.CO math.RA math.RT

    Circular Planar Electrical Networks II: Positivity Phenomena

    Authors: Joshua Alman, Carl Lian, Brandon Tran

    Abstract: Curtis-Ingerman-Morrow characterize response matrices for circular planar electrical networks as symmetric square matrices with row sums zero and non-negative circular minors. In this paper, we study this positivity phenomenon more closely, from both algebraic and combinatorial perspectives. Extending work of Postnikov, we introduce electrical positroids, which are the sets of circular minors whic…

    Submitted 11 September, 2013; originally announced September 2013.

    Comments: 30 pages, 4 figures

    MSC Class: 05E10; 13F60; 05B35
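    Two of the properties named in the abstract, symmetry and zero row sums, can be checked on a small example. The sketch assumes the usual construction in which the response matrix is the Schur complement of the network's Kirchhoff (weighted Laplacian) matrix onto the boundary vertices, eliminating the interior vertices; it does not check the non-negativity of circular minors. The function name `response_matrix` and its argument layout (boundary vertices numbered first) are hypothetical, and the interior block is assumed invertible, which holds when every interior vertex is connected to the boundary.

```python
import numpy as np

def response_matrix(n_boundary, edges, n_vertices):
    """Response matrix of a network whose first n_boundary vertices are boundary vertices.

    edges: list of (u, v, conductance). Builds the Kirchhoff matrix K and returns
    the Schur complement K_BB - K_BI K_II^{-1} K_IB (interior vertices eliminated).
    """
    K = np.zeros((n_vertices, n_vertices))
    for u, v, c in edges:
        K[u, u] += c
        K[v, v] += c
        K[u, v] -= c
        K[v, u] -= c
    b = n_boundary
    K_BB, K_BI = K[:b, :b], K[:b, b:]
    K_IB, K_II = K[b:, :b], K[b:, b:]
    if K_II.size == 0:  # no interior vertices: nothing to eliminate
        return K_BB
    return K_BB - K_BI @ np.linalg.solve(K_II, K_IB)
```

    For a star with one interior vertex joined to three boundary vertices by conductances $c = (1, 2, 3)$, the result is $\mathrm{diag}(c) - c c^\top / 6$, which is symmetric with zero row sums.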

  30. arXiv:1309.2697  [pdf, other]

    math.CO

    Circular Planar Electrical Networks I: The Electrical Poset EP_{n}

    Authors: Joshua Alman, Carl Lian, Brandon Tran

    Abstract: Following de Verdière-Gitler-Vertigan and Curtis-Ingerman-Morrow, we prove a host of new results on circular planar electrical networks. We introduce a poset EP_{n} of electrical networks with n boundary vertices, giving two equivalent characterizations, one combinatorial and the other topological. We then investigate various properties of the EP_{n}, proving that it is graded by number of edges o…

    Submitted 10 September, 2013; originally announced September 2013.

    Comments: 29 pages, 12 figures

    MSC Class: 06A07; 05A15; 05A16

  31. arXiv:1309.0751  [pdf, ps, other]

    math.CO math.RT

    Laurent Phenomenon Sequences

    Authors: Joshua Alman, Cesar Cuenca, Jiaoyang Huang

    Abstract: In this paper, we undertake a systematic study of recurrences $x_{m+n}x_{m} = P(x_{m+1}, \ldots, x_{m+n-1})$ which exhibit the Laurent phenomenon. Some of the most famous among these sequences come from the Somos and the Gale-Robinson recurrences. Our approach is based on finding period 1 seeds of Laurent phenomenon algebras of Lam-Pylyavskyy. We completely classify polynomials $P$ that generate period 1…

    Submitted 4 October, 2013; v1 submitted 3 September, 2013; originally announced September 2013.

    Comments: 38 pages

    MSC Class: 05E10
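    The Somos-4 recurrence $x_{m+4}x_m = x_{m+3}x_{m+1} + x_{m+2}^2$ is an instance of the form above with $n = 4$ and $P(x_{m+1}, x_{m+2}, x_{m+3}) = x_{m+1}x_{m+3} + x_{m+2}^2$. The Laurent phenomenon implies that, starting from $x_0 = x_1 = x_2 = x_3 = 1$, every term is an integer even though each step divides by $x_m$. A short Python sketch (the name `somos4` is illustrative) checks this with exact rational arithmetic:

```python
from fractions import Fraction

def somos4(count, initial=(1, 1, 1, 1)):
    """First `count` terms of Somos-4: x[m+4] = (x[m+3] * x[m+1] + x[m+2]**2) / x[m].

    Terms are kept as exact Fractions so any non-integral term would be visible.
    """
    xs = [Fraction(v) for v in initial]
    while len(xs) < count:
        m = len(xs) - 4
        xs.append((xs[m + 3] * xs[m + 1] + xs[m + 2] ** 2) / xs[m])
    return xs[:count]
```

    `[int(t) for t in somos4(12)]` gives `[1, 1, 1, 1, 2, 3, 7, 23, 59, 314, 1529, 8209]`, and every term has denominator 1.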