-
Finer-Grained Hardness of Kernel Density Estimation
Authors:
Josh Alman,
Yunfeng Guan
Abstract:
In batch Kernel Density Estimation (KDE) for a kernel function $f$, we are given as input $2n$ points $x^{(1)}, \cdots, x^{(n)}, y^{(1)}, \cdots, y^{(n)}$ in dimension $m$, as well as a vector $w \in \mathbb{R}^n$. These inputs implicitly define the $n \times n$ kernel matrix $K$ given by $K[i,j] = f(x^{(i)}, y^{(j)})$. The goal is to compute a vector $v$ which approximates $K w$ with $\| Kw - v\|_\infty < \varepsilon \|w\|_1$. A recent line of work has proved fine-grained lower bounds conditioned on SETH. Backurs et al. first showed the hardness of KDE for Gaussian-like kernels with high dimension $m = Ω(\log n)$ and large scale $B = Ω(\log n)$. Alman et al. later developed new reductions in roughly this same parameter regime, leading to lower bounds for more general kernels, but only for very small error $\varepsilon < 2^{- \log^{Ω(1)} (n)}$.
In this paper, we refine the approach of Alman et al. to show new lower bounds in all parameter regimes, closing gaps between the known algorithms and lower bounds. In the setting where $m = C\log n$ and $B = o(\log n)$, we prove Gaussian KDE requires $n^{2-o(1)}$ time to achieve additive error $\varepsilon < Ω(m/B)^{-m}$, matching the performance of the polynomial method up to low-order terms. In the low dimensional setting $m = o(\log n)$, we show that Gaussian KDE requires $n^{2-o(1)}$ time to achieve $\varepsilon$ such that $\log \log (\varepsilon^{-1}) > \tilde Ω((\log n)/m)$, matching the error bound achievable by FMM up to low-order terms. To our knowledge, no nontrivial lower bound was previously known in this regime.
Our new lower bounds make use of an intricate analysis of a special case of the kernel matrix -- the `counting matrix'. As a key technical lemma, we give a novel approach to bounding the entries of its inverse by using Schur polynomials from algebraic combinatorics.
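For concreteness, the batch KDE problem defined in this abstract can be solved exactly by the straightforward quadratic-time algorithm, which is the baseline that the $n^{2-o(1)}$ lower bounds are measured against. The sketch below is only illustrative: it assumes the Gaussian kernel $f(x, y) = e^{-\|x - y\|_2^2}$ (the abstract does not fix a normalization), and the function names are hypothetical.

```python
import math

def gaussian_kernel(x, y):
    # Assumed kernel for illustration: f(x, y) = exp(-||x - y||_2^2).
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)))

def batch_kde_naive(xs, ys, w):
    """Return K w exactly, where K[i][j] = f(xs[i], ys[j]).

    Uses O(n^2 * m) time for n points in dimension m.
    """
    return [sum(gaussian_kernel(x, y) * wj for y, wj in zip(ys, w)) for x in xs]
```

Since this computes $Kw$ exactly, its output trivially meets the error requirement for any $\varepsilon > 0$; the question studied in the paper is when approximation allows substantially less than quadratic time.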
Submitted 2 July, 2024;
originally announced July 2024.
-
More Asymmetry Yields Faster Matrix Multiplication
Authors:
Josh Alman,
Ran Duan,
Virginia Vassilevska Williams,
Yinzhan Xu,
Zixuan Xu,
Renfei Zhou
Abstract:
We present a new improvement on the laser method for designing fast matrix multiplication algorithms. The new method further develops the recent advances by [Duan, Wu, Zhou FOCS 2023] and [Vassilevska Williams, Xu, Xu, Zhou SODA 2024]. Surprisingly, the new improvement is achieved by incorporating more asymmetry in the analysis, circumventing a fundamental tool of prior work that requires two of the three dimensions to be treated identically. The method yields a new bound on the square matrix multiplication exponent $ω<2.371339$, improved from the previous bound of $ω<2.371552$. We also improve the bounds of the exponents for multiplying rectangular matrices of various shapes.
Submitted 25 April, 2024;
originally announced April 2024.
-
The Fine-Grained Complexity of Gradient Computation for Training Large Language Models
Authors:
Josh Alman,
Zhao Song
Abstract:
Large language models (LLMs) have made fundamental contributions over the last few years. To train an LLM, one needs to alternately run `forward' computations and `backward' computations. The forward computation can be viewed as attention function evaluation, and the backward computation can be viewed as a gradient computation. In previous work by [Alman and Song, NeurIPS 2023], it was proved that the forward step can be performed in almost-linear time in certain parameter regimes, but that there is no truly sub-quadratic time algorithm in the remaining parameter regimes unless the popular hypothesis SETH is false. In this work, we show nearly identical results for the harder-seeming problem of computing the gradient of the loss function of a one-layer attention network, and thus for the entire process of LLM training. This completely characterizes the fine-grained complexity of every step of LLM training.
Submitted 6 February, 2024;
originally announced February 2024.
-
Generalizations of Matrix Multiplication can solve the Light Bulb Problem
Authors:
Josh Alman,
Hengjie Zhang
Abstract:
In the light bulb problem, one is given uniformly random vectors $x_1, \ldots, x_n, y_1, \ldots, y_n \in \{-1,1\}^d$. They are all chosen independently except a planted pair $(x_{i^*}, y_{j^*})$ is chosen with correlation $ρ>0$. The goal is to find the planted pair. This problem was introduced over 30 years ago by L.~Valiant, and is known to have many applications in data analysis, statistics, and learning theory.
The naive algorithm runs in $Ω(n^2)$ time, and algorithms based on Locality-Sensitive Hashing approach quadratic time as $ρ\to 0$. In 2012, G.~Valiant gave a breakthrough algorithm using fast matrix multiplication that runs in time $O(n^{(5-ω)/(4-ω)}) < O(n^{1.615})$, no matter how small $ρ>0$ is. This was subsequently refined by Karppa, Kaski, and Kohonen in 2016 to $O(n^{2 ω/ 3}) < O(n^{1.582})$.
In this paper, we propose a new approach which can replace the matrix multiplication tensor with other tensors. Those tensors can omit some terms one is supposed to compute, and include additional error terms. Our new approach can make use of any tensors which previously had no known algorithmic applications, including tensors which arise naturally as intermediate steps in border rank methods and in the Laser method.
We further show that our approach can be combined with locality-sensitive hashing to design an algorithm whose running time improves as $ρ$ gets larger. To our knowledge, this is the first algorithm which combines fast matrix multiplication with hashing for the light bulb problem or any closest pair problem, and it leads to faster algorithms for small $ρ>0$.
We also introduce a new tensor $T_{2112}$, which has the same size as the $2 \times 2$ matrix multiplication tensor, but leads to a faster algorithm for the light bulb problem than Strassen's algorithm.
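As a point of reference for the running times quoted in this abstract, the naive quadratic-time baseline simply checks all $n^2$ pairs and returns the one with the largest inner product. The sketch below shows that baseline only, not the paper's tensor-based algorithm; the function name is hypothetical, and it assumes $d$ is large enough (relative to $\log n$ and $ρ^{-2}$) that the planted pair has the largest inner product with high probability.

```python
def light_bulb_naive(xs, ys):
    """Return the index pair (i, j) maximizing the inner product <xs[i], ys[j]>.

    Naive baseline: examines all n^2 pairs, each in O(d) time.
    """
    best_pair, best_score = None, None
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            score = sum(a * b for a, b in zip(x, y))
            if best_score is None or score > best_score:
                best_pair, best_score = (i, j), score
    return best_pair
```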
Submitted 2 November, 2023;
originally announced November 2023.
-
How to Capture Higher-order Correlations? Generalizing Matrix Softmax Attention to Kronecker Computation
Authors:
Josh Alman,
Zhao Song
Abstract:
In the classical transformer attention scheme, we are given three $n \times d$ size matrices $Q, K, V$ (the query, key, and value tokens), and the goal is to compute a new $n \times d$ size matrix $D^{-1} \exp(QK^\top) V$ where $D = \mathrm{diag}( \exp(QK^\top) {\bf 1}_n )$. In this work, we study a generalization of attention which captures triple-wise correlations. This generalization is able to solve problems about detecting triple-wise connections that were shown to be impossible for transformers. The potential downside of this generalization is that it appears as though computations are even more difficult, since the straightforward algorithm requires cubic time in $n$. However, we show that in the bounded-entry setting (which arises in practice, and which is well-studied in both theory and practice), there is actually a near-linear time algorithm. More precisely, we show that bounded entries are both necessary and sufficient for quickly performing generalized computations:
$\bullet$ On the positive side, if all entries of the input matrices are bounded above by $o(\sqrt[3]{\log n})$ then we show how to approximate the ``tensor-type'' attention matrix in $n^{1+o(1)}$ time.
$\bullet$ On the negative side, we show that if the entries of the input matrices may be as large as $Ω(\sqrt[3]{\log n})$, then there is no algorithm that runs faster than $n^{3-o(1)}$ (assuming the Strong Exponential Time Hypothesis from fine-grained complexity theory).
We also show that our construction, algorithms, and lower bounds naturally generalize to higher-order tensors and correlations. Interestingly, the higher the order of the tensors, the lower the bound on the entries needs to be for an efficient algorithm. Our results thus yield a natural tradeoff between the boundedness of the entries, and order of the tensor one may use for more expressive, efficient attention computation.
Submitted 6 October, 2023;
originally announced October 2023.
-
Tensor Ranks and the Fine-Grained Complexity of Dynamic Programming
Authors:
Josh Alman,
Ethan Turok,
Hantao Yu,
Hengzhi Zhang
Abstract:
Generalizing work of Künnemann, Paturi, and Schneider [ICALP 2017], we study a wide class of high-dimensional dynamic programming (DP) problems in which one must find the shortest path between two points in a high-dimensional grid given a tensor of transition costs between nodes in the grid. This captures many classical problems which are solved using DP such as the knapsack problem, the airplane refueling problem, and the minimal-weight polygon triangulation problem. We observe that for many of these problems, the tensor naturally has low tensor rank or low slice rank.
We then give new algorithms and a web of fine-grained reductions to tightly determine the complexity of these problems. For instance, we show that a polynomial speedup over the DP algorithm is possible when the tensor rank is a constant or the slice rank is 1, but that such a speedup is impossible if the tensor rank is slightly super-constant (assuming SETH) or the slice rank is at least 3 (assuming the APSP conjecture). We find that this characterizes the known complexities for many of these problems, and in some cases leads to new faster algorithms.
Submitted 2 January, 2024; v1 submitted 9 September, 2023;
originally announced September 2023.
-
Fast Attention Requires Bounded Entries
Authors:
Josh Alman,
Zhao Song
Abstract:
In modern machine learning, inner product attention computation is a fundamental task for training large language models such as Transformer, GPT-1, BERT, GPT-2, GPT-3 and ChatGPT. Formally, in this problem, one is given as input three matrices $Q, K, V \in [-B,B]^{n \times d}$, and the goal is to construct the matrix $\mathrm{Att}(Q,K,V) := \mathrm{diag}(A {\bf 1}_n)^{-1} A V \in \mathbb{R}^{n \times d}$, where $A = \exp(QK^\top/d)$ is the `attention matrix', and $\exp$ is applied entry-wise. Straightforward methods for this problem explicitly compute the $n \times n$ attention matrix $A$, and hence require time $Ω(n^2)$ even when $d = n^{o(1)}$ is small.
In this paper, we investigate whether faster algorithms are possible by implicitly making use of the matrix $A$. We present two results, showing that there is a sharp transition at $B = Θ(\sqrt{\log n})$.
$\bullet$ If $d = O(\log n)$ and $B = o(\sqrt{\log n})$, there is an $n^{1+o(1)}$ time algorithm to approximate $\mathrm{Att}(Q,K,V)$ up to $1/\mathrm{poly}(n)$ additive error.
$\bullet$ If $d = O(\log n)$ and $B = Θ(\sqrt{\log n})$, assuming the Strong Exponential Time Hypothesis from fine-grained complexity theory, it is impossible to approximate $\mathrm{Att}(Q,K,V)$ up to $1/\mathrm{poly}(n)$ additive error in truly subquadratic time $n^{2 - Ω(1)}$.
This gives a theoretical explanation for the phenomenon observed in practice that attention computation is much more efficient when the input matrices have smaller entries.
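The "straightforward method" mentioned in this abstract can be sketched directly from the definition $\mathrm{Att}(Q,K,V) = \mathrm{diag}(A {\bf 1}_n)^{-1} A V$ with $A = \exp(QK^\top/d)$. The code below is a plain illustration of that definition under the assumption that matrices are given as lists of rows; the function name is hypothetical, and this is the quadratic-time baseline rather than the paper's $n^{1+o(1)}$ algorithm.

```python
import math

def attention_naive(Q, K, V):
    """Compute diag(A 1_n)^{-1} A V, where A = exp(Q K^T / d) entry-wise.

    Q, K, V are n x d matrices given as lists of rows. Each row of A is
    formed explicitly, so the running time is Theta(n^2 * d).
    """
    n, d = len(Q), len(Q[0])
    out = []
    for i in range(n):
        # Row i of the attention matrix A.
        row = [math.exp(sum(Q[i][k] * K[j][k] for k in range(d)) / d)
               for j in range(n)]
        total = sum(row)  # The i-th entry of A 1_n.
        out.append([sum(row[j] * V[j][c] for j in range(n)) / total
                    for c in range(d)])
    return out
```

When $Q$ is the all-zeros matrix, every entry of $A$ equals 1, so each output row is the average of the rows of $V$, which gives a quick sanity check.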
Submitted 9 May, 2023; v1 submitted 25 February, 2023;
originally announced February 2023.
-
Matrix Multiplication and Number On the Forehead Communication
Authors:
Josh Alman,
Jarosław Błasiok
Abstract:
Three-player Number On the Forehead communication may be thought of as a three-player Number In the Hand promise model, in which each player is given the inputs that are supposedly on the other two players' heads, and promised that they are consistent with the inputs of the other players. The set of all allowed inputs under this promise may be thought of as an order-3 tensor. We surprisingly observe that this tensor is exactly the matrix multiplication tensor, which is widely studied in the design of fast matrix multiplication algorithms.
Using this connection, we prove a number of results about both Number On the Forehead communication and matrix multiplication, each by using known results or techniques about the other. For example, we show how the Laser method, a key technique used to design the best matrix multiplication algorithms, can also be used to design communication protocols for a variety of problems. We also show how known lower bounds for Number On the Forehead communication can be used to bound properties of the matrix multiplication tensor such as its zeroing out subrank. Finally, we substantially generalize known methods based on slice-rank for studying communication, and show how they directly relate to the matrix multiplication exponent $ω$.
Submitted 22 February, 2023;
originally announced February 2023.
-
Bypass Exponential Time Preprocessing: Fast Neural Network Training via Weight-Data Correlation Preprocessing
Authors:
Josh Alman,
Jiehao Liang,
Zhao Song,
Ruizhe Zhang,
Danyang Zhuo
Abstract:
Over the last decade, deep neural networks have transformed our society, and they are already widely applied in various machine learning applications. State-of-the-art deep neural networks are becoming larger in size every year to deliver increasing model accuracy, and as a result, model training consumes substantial computing resources and will only consume more in the future. Using current training methods, in each iteration, to process a data point $x \in \mathbb{R}^d$ in a layer, we need to spend $Θ(md)$ time to evaluate all the $m$ neurons in the layer. This means processing the entire layer takes $Θ(nmd)$ time for $n$ data points. Recent work [Song, Yang and Zhang, NeurIPS 2021] reduces this time per iteration to $o(nmd)$, but requires exponential time to preprocess either the data or the neural network weights, making it unlikely to be useful in practice.
In this work, we present a new preprocessing method that simply stores the weight-data correlation in a tree data structure in order to quickly, dynamically detect which neurons fire at each iteration. Our method requires only $O(nmd)$ time in preprocessing and still achieves $o(nmd)$ time per iteration. We complement our new algorithm with a lower bound, proving that assuming a popular conjecture from complexity theory, one could not substantially speed up our algorithm for dynamic detection of firing neurons.
Submitted 25 November, 2022;
originally announced November 2022.
-
Faster Walsh-Hadamard and Discrete Fourier Transforms From Matrix Non-Rigidity
Authors:
Josh Alman,
Kevin Rao
Abstract:
We give algorithms with lower arithmetic operation counts for both the Walsh-Hadamard Transform (WHT) and the Discrete Fourier Transform (DFT) on inputs of power-of-2 size $N$.
For the WHT, our new algorithm has an operation count of $\frac{23}{24}N \log N + O(N)$. To our knowledge, this gives the first improvement on the $N \log N$ operation count of the simple, folklore Fast Walsh-Hadamard Transform algorithm.
For the DFT, our new FFT algorithm uses $\frac{15}{4}N \log N + O(N)$ real arithmetic operations. Our leading constant $\frac{15}{4} = 3.75$ improves on the leading constant of $5$ from the Cooley-Tukey algorithm from 1965, leading constant $4$ from the split-radix algorithm of Yavne from 1968, leading constant $\frac{34}{9}=3.777\ldots$ from a modification of the split-radix algorithm by Van Buskirk from 2004, and leading constant $3.76875$ from a theoretically optimized version of Van Buskirk's algorithm by Sergeev from 2017.
Our new WHT algorithm takes advantage of a recent line of work on the non-rigidity of the WHT: we decompose the WHT matrix as the sum of a low-rank matrix and a sparse matrix, and then analyze the structures of these matrices to achieve a lower operation count. Our new DFT algorithm comes from a novel reduction, showing that parts of the previous best FFT algorithms can be replaced by calls to an algorithm for the WHT. Replacing the folklore WHT algorithm with our new improved algorithm leads to our improved FFT.
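The "simple, folklore Fast Walsh-Hadamard Transform" that this abstract improves on can be sketched as follows. This is the standard butterfly algorithm, not the paper's new $\frac{23}{24}N \log N$ algorithm; the function name and the operation counter are illustrative additions, included to show where the $N \log N$ count comes from.

```python
def fwht(x):
    """Folklore fast Walsh-Hadamard transform of a list of power-of-2 length.

    Returns (transform, ops), where ops counts additions and subtractions.
    There are log2(N) levels, each with N/2 butterflies of 2 operations,
    so ops equals N * log2(N).
    """
    a = list(x)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of 2"
    ops = 0
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                u, v = a[i], a[i + h]
                a[i], a[i + h] = u + v, u - v  # One butterfly.
                ops += 2
        h *= 2
    return a, ops
```

For example, on an input of size $N = 8$ the counter reports $8 \cdot 3 = 24$ operations.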
Submitted 14 June, 2023; v1 submitted 11 November, 2022;
originally announced November 2022.
-
Smaller Low-Depth Circuits for Kronecker Powers
Authors:
Josh Alman,
Yunfeng Guan,
Ashwin Padaki
Abstract:
We give new, smaller constructions of constant-depth linear circuits for computing any matrix which is the Kronecker power of a fixed matrix. A standard argument (e.g., the mixed product property of Kronecker products, or a generalization of the Fast Walsh-Hadamard transform) shows that any such $N \times N$ matrix has a depth-2 circuit of size $O(N^{1.5})$. We improve on this for all such matrices, and especially for some such matrices of particular interest:
- For any integer $q > 1$ and any matrix which is the Kronecker power of a fixed $q \times q$ matrix, we construct a depth-2 circuit of size $O(N^{1.5 - a_q})$, where $a_q > 0$ is a positive constant depending only on $q$. No bound beating size $O(N^{1.5})$ was previously known for any $q>2$.
- For the case $q=2$, i.e., for any matrix which is the Kronecker power of a fixed $2 \times 2$ matrix, we construct a depth-2 circuit of size $O(N^{1.446})$, improving the prior best size $O(N^{1.493})$ [Alman, 2021].
- For the Walsh-Hadamard transform, we construct a depth-2 circuit of size $O(N^{1.443})$, improving the prior best size $O(N^{1.476})$ [Alman, 2021].
- For the disjointness matrix (the communication matrix of set disjointness, or equivalently, the matrix for the linear transform that evaluates a multilinear polynomial on all $0/1$ inputs), we construct a depth-2 circuit of size $O(N^{1.258})$, improving the prior best size $O(N^{1.272})$ [Jukna and Sergeev, 2013].
Our constructions also generalize to improving the standard construction for any depth $\leq O(\log N)$. Our main technical tool is an improved way to convert a nontrivial circuit for any matrix into a circuit for its Kronecker powers. Our new bounds provably could not be achieved using the approaches of prior work.
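The "standard argument" behind the $O(N^{1.5})$ baseline can be made concrete with the mixed product property: for an even Kronecker power, $M^{\otimes 2k} = (M^{\otimes k} \otimes I)(I \otimes M^{\otimes k})$, and each factor has at most $N^{1.5}$ nonzero entries. The sketch below checks this on a small case. It assumes the usual convention that the size of a depth-2 linear circuit is the total number of nonzero entries in the two factors, and all function names are hypothetical.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def matmul(A, B):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def nonzeros(A):
    return sum(1 for row in A for v in row if v != 0)

def kron_power(M, k):
    """The k-th Kronecker power of M (the 1 x 1 matrix [[1]] when k = 0)."""
    result = [[1]]
    for _ in range(k):
        result = kron(result, M)
    return result

def standard_depth2(M, k):
    """Factor the 2k-th Kronecker power of M as left * right."""
    half = kron_power(M, k)
    eye = identity(len(half))
    return kron(half, eye), kron(eye, half)
```

With the $2 \times 2$ Walsh-Hadamard matrix and $k = 2$, so that $N = 16$, the two factors multiply to the $16 \times 16$ transform and together contain $2 \cdot 16^{1.5} = 128$ nonzero entries.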
Submitted 9 November, 2022;
originally announced November 2022.
-
Faster Walsh-Hadamard Transform and Matrix Multiplication over Finite Fields using Lookup Tables
Authors:
Josh Alman
Abstract:
We use lookup tables to design faster algorithms for important algebraic problems over finite fields. These faster algorithms, which only use arithmetic operations and lookup table operations, may help to explain the difficulty of determining the complexities of these important problems. Our results over a constant-sized finite field are as follows.
The Walsh-Hadamard transform of a vector of length $N$ can be computed using $O(N \log N / \log \log N)$ bit operations. This generalizes to any transform defined as a Kronecker power of a fixed matrix. By comparison, the Fast Walsh-Hadamard transform (similar to the Fast Fourier transform) uses $O(N \log N)$ arithmetic operations, which is believed to be optimal up to constant factors.
Any algebraic algorithm for multiplying two $N \times N$ matrices using $O(N^ω)$ operations can be converted into an algorithm using $O(N^ω/ (\log N)^{ω/2 - 1})$ bit operations. For example, Strassen's algorithm can be converted into an algorithm using $O(N^{2.81} / (\log N)^{0.4})$ bit operations. It remains an open problem with practical implications to determine the smallest constant $c$ such that Strassen's algorithm can be implemented to use $c \cdot N^{2.81} + o(N^{2.81})$ arithmetic operations; using a lookup table allows one to save a super-constant factor in bit operations.
Submitted 8 November, 2022;
originally announced November 2022.
-
Optimal-Degree Polynomial Approximations for Exponentials and Gaussian Kernel Density Estimation
Authors:
Amol Aggarwal,
Josh Alman
Abstract:
For any real numbers $B \ge 1$ and $δ\in (0, 1)$ and function $f: [0, B] \rightarrow \mathbb{R}$, let $d_{B; δ} (f) \in \mathbb{Z}_{> 0}$ denote the minimum degree of a polynomial $p(x)$ satisfying $\sup_{x \in [0, B]} \big| p(x) - f(x) \big| < δ$. In this paper, we provide precise asymptotics for $d_{B; δ} (e^{-x})$ and $d_{B; δ} (e^{x})$ in terms of both $B$ and $δ$, improving both the previously known upper bounds and lower bounds. In particular, we show $$d_{B; δ} (e^{-x}) = Θ\left( \max \left\{ \sqrt{B \log(δ^{-1})}, \frac{\log(δ^{-1}) }{ \log(B^{-1} \log(δ^{-1}))} \right\}\right), \text{ and}$$ $$d_{B; δ} (e^{x}) = Θ\left( \max \left\{ B, \frac{\log(δ^{-1}) }{ \log(B^{-1} \log(δ^{-1}))} \right\}\right).$$
Polynomial approximations for $e^{-x}$ and $e^x$ have applications to the design of algorithms for many problems, and our degree bounds show both the power and limitations of these algorithms.
We focus in particular on the Batch Gaussian Kernel Density Estimation problem for $n$ sample points in $Θ(\log n)$ dimensions with error $δ= n^{-Θ(1)}$. We show that the running time one can achieve depends on the square of the diameter of the point set, $B$, with a transition at $B = Θ(\log n)$ mirroring the corresponding transition in $d_{B; δ} (e^{-x})$:
- When $B=o(\log n)$, we give the first algorithm running in time $n^{1 + o(1)}$.
- When $B = κ\log n$ for a small constant $κ>0$, we give an algorithm running in time $n^{1 + O(\log \log κ^{-1} /\log κ^{-1})}$. The $\log \log κ^{-1} /\log κ^{-1}$ term in the exponent comes from analyzing the behavior of the leading constant in our computation of $d_{B; δ} (e^{-x})$.
- When $B = ω(\log n)$, we show that time $n^{2 - o(1)}$ is necessary assuming SETH.
Submitted 12 May, 2022;
originally announced May 2022.
-
Parameterized Sensitivity Oracles and Dynamic Algorithms using Exterior Algebras
Authors:
Josh Alman,
Dean Hirsch
Abstract:
We design the first efficient sensitivity oracles and dynamic algorithms for a variety of parameterized problems. Our main approach is to modify the algebraic coding technique from static parameterized algorithm design, which had not previously been used in a dynamic context. We particularly build off of the `extensor coding' method of Brand, Dell and Husfeldt [STOC'18], employing properties of the exterior algebra over different fields.
For the $k$-Path detection problem for directed graphs, it is known that no efficient dynamic algorithm exists (under popular assumptions from fine-grained complexity). We circumvent this by designing an efficient sensitivity oracle, which preprocesses a directed graph on $n$ vertices in $2^k \mathrm{poly}(k) n^{ω+o(1)}$ time, such that, given $\ell$ updates (mixing edge insertions and deletions, and vertex deletions) to that input graph, it can decide in time $\ell^2 2^k \mathrm{poly}(k)$ and with high probability, whether the updated graph contains a path of length $k$. We also give a deterministic sensitivity oracle requiring $4^k \mathrm{poly}(k) n^{ω+o(1)}$ preprocessing time and $\ell^2 2^{ωk + o(k)}$ query time, and obtain a randomized sensitivity oracle for the task of approximately counting the number of $k$-paths. For $k$-Path detection in undirected graphs, we obtain a randomized sensitivity oracle with $O(1.66^k n^3)$ preprocessing time and $O(\ell^3 1.66^k)$ query time, and a better bound for undirected bipartite graphs.
In addition, we present the first fully dynamic algorithms for a variety of problems: $k$-Partial Cover, $m$-Set $k$-Packing, $t$-Dominating Set, $d$-Dimensional $k$-Matching, and Exact $k$-Partial Cover. For example, for $k$-Partial Cover we show a randomized dynamic algorithm with $2^k \mathrm{poly}(k)\,\mathrm{polylog}(n)$ update time, and a deterministic dynamic algorithm with $4^k \mathrm{poly}(k)\,\mathrm{polylog}(n)$ update time.
Submitted 18 June, 2022; v1 submitted 22 April, 2022;
originally announced April 2022.
-
Kronecker Products, Low-Depth Circuits, and Matrix Rigidity
Authors:
Josh Alman
Abstract:
For a matrix $M$ and a positive integer $r$, the rank $r$ rigidity of $M$ is the smallest number of entries of $M$ which one must change to make its rank at most $r$. There are many known applications of rigidity lower bounds to a variety of areas in complexity theory, but fewer known applications of rigidity upper bounds. In this paper, we use rigidity upper bounds to prove new upper bounds in a few different models of computation. Our results include:
$\bullet$ For any $d> 1$, and over any field $\mathbb{F}$, the $N \times N$ Walsh-Hadamard transform has a depth-$d$ linear circuit of size $O(d \cdot N^{1 + 0.96/d})$. This circumvents a known lower bound of $Ω(d \cdot N^{1 + 1/d})$ for circuits with bounded coefficients over $\mathbb{C}$ by Pudlák (2000), by using coefficients of magnitude polynomial in $N$. Our construction also generalizes to linear transformations given by a Kronecker power of any fixed $2 \times 2$ matrix.
$\bullet$ The $N \times N$ Walsh-Hadamard transform has a linear circuit of size $\leq (1.81 + o(1)) N \log_2 N$, improving on the bound of $\approx 1.88 N \log_2 N$ which one obtains from the standard fast Walsh-Hadamard transform.
$\bullet$ A new rigidity upper bound, showing that the following classes of matrices are not rigid enough to prove circuit lower bounds using Valiant's approach:
$-$ for any field $\mathbb{F}$ and any function $f : \{0,1\}^n \to \mathbb{F}$, the matrix $V_f \in \mathbb{F}^{2^n \times 2^n}$ given by, for any $x,y \in \{0,1\}^n$, $V_f[x,y] = f(x \wedge y)$, and
$-$ for any field $\mathbb{F}$ and any fixed-size matrices $M_1, \ldots, M_n \in \mathbb{F}^{q \times q}$, the Kronecker product $M_1 \otimes M_2 \otimes \cdots \otimes M_n$.
This generalizes recent results on non-rigidity, using a simpler approach which avoids needing the polynomial method.
Submitted 23 February, 2021;
originally announced February 2021.
-
Metric Transforms and Low Rank Matrices via Representation Theory of the Real Hyperrectangle
Authors:
Josh Alman,
Timothy Chu,
Gary Miller,
Shyam Narayanan,
Mark Sellke,
Zhao Song
Abstract:
In this paper, we develop a new technique which we call representation theory of the real hyperrectangle, which describes how to compute the eigenvectors and eigenvalues of certain matrices arising from hyperrectangles. We show that these matrices arise naturally when analyzing a number of different algorithmic tasks such as kernel methods, neural network training, natural language processing, and the design of algorithms using the polynomial method. We then use our new technique along with these connections to prove several new structural results in these areas, including:
$\bullet$ A function is a positive definite Manhattan kernel if and only if it is a completely monotone function. These kernels are widely used across machine learning; one example is the Laplace kernel which is widely used in machine learning for chemistry.
$\bullet$ A function transforms Manhattan distances to Manhattan distances if and only if it is a Bernstein function. This completes the theory of Manhattan to Manhattan metric transforms initiated by Assouad in 1980.
$\bullet$ A function applied entry-wise to any square matrix of rank $r$ always results in a matrix of rank $< 2^{r-1}$ if and only if it is a polynomial of sufficiently low degree. This gives a converse to a key lemma used by the polynomial method in algorithm design.
Our work includes a sophisticated combination of techniques from different fields, including metric embeddings, the polynomial method, and group representation theory.
Submitted 4 August, 2021; v1 submitted 23 November, 2020;
originally announced November 2020.
-
Algorithms and Hardness for Linear Algebra on Geometric Graphs
Authors:
Josh Alman,
Timothy Chu,
Aaron Schild,
Zhao Song
Abstract:
For a function $\mathsf{K} : \mathbb{R}^{d} \times \mathbb{R}^{d} \to \mathbb{R}_{\geq 0}$, and a set $P = \{ x_1, \ldots, x_n\} \subset \mathbb{R}^d$ of $n$ points, the $\mathsf{K}$ graph $G_P$ of $P$ is the complete graph on $n$ nodes where the weight between nodes $i$ and $j$ is given by $\mathsf{K}(x_i, x_j)$. In this paper, we initiate the study of when efficient spectral graph theory is possible on these graphs. We investigate whether or not it is possible to solve the following problems in $n^{1+o(1)}$ time for a $\mathsf{K}$-graph $G_P$ when $d < n^{o(1)}$:
$\bullet$ Multiply a given vector by the adjacency matrix or Laplacian matrix of $G_P$
$\bullet$ Find a spectral sparsifier of $G_P$
$\bullet$ Solve a Laplacian system in $G_P$'s Laplacian matrix
For each of these problems, we consider all functions of the form $\mathsf{K}(u,v) = f(\|u-v\|_2^2)$ for a function $f:\mathbb{R} \rightarrow \mathbb{R}$. We provide algorithms and comparable hardness results for many such $\mathsf{K}$, including the Gaussian kernel, Neural tangent kernels, and more. For example, in dimension $d = Ω(\log n)$, we show that there is a parameter associated with the function $f$ for which low parameter values imply $n^{1+o(1)}$ time algorithms for all three of these problems and high parameter values imply the nonexistence of subquadratic time algorithms assuming Strong Exponential Time Hypothesis ($\mathsf{SETH}$), given natural assumptions on $f$.
As part of our results, we also show that the exponential dependence on the dimension $d$ in the celebrated fast multipole method of Greengard and Rokhlin cannot be improved, assuming $\mathsf{SETH}$, for a broad class of functions $f$. To the best of our knowledge, this is the first formal limitation proven about fast multipole methods.
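As a point of reference for the first problem in the list above, the following Python sketch multiplies a vector by the adjacency and Laplacian matrices of a $\mathsf{K}$-graph in the straightforward quadratic way. It is the naive baseline, not one of the paper's algorithms. The helper names are hypothetical, and the Gaussian choice $f(z) = e^{-z}$ is one assumed instance of $\mathsf{K}(u,v) = f(\|u-v\|_2^2)$.

```python
import math


def sq_dist(u, v):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kernel_adjacency_matvec(points, vec, f):
    """Compute A @ vec where A[i][j] = f(||x_i - x_j||^2) for i != j and A[i][i] = 0."""
    n = len(points)
    out = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i != j:
                out[i] += f(sq_dist(points[i], points[j])) * vec[j]
    return out


def kernel_laplacian_matvec(points, vec, f):
    """Compute (D - A) @ vec, where D is the diagonal matrix of weighted degrees."""
    degrees = kernel_adjacency_matvec(points, [1.0] * len(points), f)
    av = kernel_adjacency_matvec(points, vec, f)
    return [d * x - y for d, x, y in zip(degrees, vec, av)]


def gaussian(z):
    """Assumed example kernel profile f(z) = exp(-z)."""
    return math.exp(-z)
```

Each call evaluates the kernel on all $n(n-1)$ ordered pairs, so the cost is quadratic in $n$; the abstract asks when this can be brought down to $n^{1+o(1)}$.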
Submitted 4 November, 2020;
originally announced November 2020.
-
A Refined Laser Method and Faster Matrix Multiplication
Authors:
Josh Alman,
Virginia Vassilevska Williams
Abstract:
The complexity of matrix multiplication is measured in terms of $ω$, the smallest real number such that two $n\times n$ matrices can be multiplied using $O(n^{ω+ε})$ field operations for all $ε>0$; the best bound until now is $ω<2.37287$ [Le Gall'14]. All bounds on $ω$ since 1986 have been obtained using the so-called laser method, a way to lower-bound the `value' of a tensor in designing matrix multiplication algorithms. The main result of this paper is a refinement of the laser method that improves the resulting value bound for most sufficiently large tensors. Thus, even before computing any specific values, it is clear that we achieve an improved bound on $ω$, and we indeed obtain the best bound on $ω$ to date: $$ω< 2.37286.$$ The improvement is of the same magnitude as the improvement that [Le Gall'14] obtained over the previous bound [Vassilevska W.'12]. Our improvement to the laser method is quite general, and we believe it will have further applications in arithmetic complexity.
Submitted 12 October, 2020;
originally announced October 2020.
-
Faster Update Time for Turnstile Streaming Algorithms
Authors:
Josh Alman,
Huacheng Yu
Abstract:
In this paper, we present a new algorithm for maintaining linear sketches in turnstile streams with faster update time. As an application, we show that $\log n$ \texttt{Count} sketches or \texttt{CountMin} sketches with a constant number of columns (i.e., buckets) can be implicitly maintained in \emph{worst-case} $O(\log^{0.582} n)$ update time using $O(\log n)$ words of space, on a standard word RAM with word-size $w=Θ(\log n)$. The exponent $0.582\approx 2ω/3-1$, where $ω$ is the current matrix multiplication exponent. Due to the numerous applications of linear sketches, our algorithm improves the update time for many streaming problems in turnstile streams, in the high success probability setting, without using more space, including $\ell_2$ norm estimation, $\ell_2$ heavy hitters, point query with $\ell_1$ or $\ell_2$ error, etc. Our algorithm generalizes, with the same update time and space, to maintaining $\log n$ linear sketches, where each sketch partitions the coordinates into $k<\log^{o(1)} n$ buckets using a $c$-wise independent hash function for constant $c$, and maintains the sum of coordinates for each bucket. Moreover, if arbitrary word operations are allowed, the update time can be further improved to $O(\log^{0.187} n)$, where $0.187\approx ω/2-1$. Our update algorithm is adaptive, and it circumvents the non-adaptive cell-probe lower bounds for turnstile streaming algorithms by Larsen, Nelson and Nguyễn (STOC'15).
On the other hand, our result also shows that proving an unconditional cell-probe lower bound for the update time seems very difficult, even if the space is restricted to be (nearly) the optimum. If $ω=2$, the cell-probe update time of our algorithm would be $\log^{o(1)} n$. Hence, proving any higher lower bound would imply $ω>2$.
Submitted 4 November, 2019;
originally announced November 2019.
-
Limits on the Universal Method for Matrix Multiplication
Authors:
Josh Alman
Abstract:
In this work, we prove limitations on the known methods for designing matrix multiplication algorithms. Alman and Vassilevska Williams recently defined the Universal Method, which substantially generalizes all the known approaches including Strassen's Laser Method and Cohn and Umans' Group Theoretic Method. We prove concrete lower bounds on the algorithms one can design by applying the Universal Method to many different tensors. Our proofs use new tools for upper bounding the asymptotic slice rank of a wide range of tensors. Our main result is that the Universal method applied to any Coppersmith-Winograd tensor $CW_q$ cannot yield a bound on $ω$, the exponent of matrix multiplication, better than $2.16805$. By comparison, it was previously only known that the weaker `Galactic Method' applied to $CW_q$ could not achieve an exponent of $2$.
We also study the Laser Method (which is, in principle, a highly special case of the Universal Method) and prove that it is "complete" for matrix multiplication algorithms: when it applies to a tensor $T$, it achieves $ω= 2$ if and only if it is possible for the Universal method applied to $T$ to achieve $ω= 2$. Hence, the Laser Method, which was originally used as an algorithmic tool, can also be seen as a lower bounding tool. For example, in their landmark paper, Coppersmith and Winograd achieved a bound of $ω\leq 2.376$, by applying the Laser Method to $CW_q$. By our result, the fact that they did not achieve $ω=2$ implies a lower bound on the Universal Method applied to $CW_q$. Indeed, if it were possible for the Universal Method applied to $CW_q$ to achieve $ω=2$, then Coppersmith and Winograd's application of the Laser Method would have achieved $ω=2$.
Submitted 1 May, 2019; v1 submitted 20 December, 2018;
originally announced December 2018.
-
Limits on All Known (and Some Unknown) Approaches to Matrix Multiplication
Authors:
Josh Alman,
Virginia Vassilevska Williams
Abstract:
We study the known techniques for designing Matrix Multiplication algorithms. The two main approaches are the Laser method of Strassen, and the Group theoretic approach of Cohn and Umans. We define a generalization based on zeroing outs which subsumes these two approaches, which we call the Solar method, and an even more general method based on monomial degenerations, which we call the Galactic method.
We then design a suite of techniques for proving lower bounds on the value of $ω$, the exponent of matrix multiplication, which can be achieved by algorithms using many tensors $T$ and the Galactic method. Some of our techniques exploit `local' properties of $T$, like finding a sub-tensor of $T$ which is so `weak' that $T$ itself couldn't be used to achieve a good bound on $ω$, while others exploit `global' properties, like $T$ being a monomial degeneration of the structural tensor of a group algebra.
Our main result is that there is a universal constant $\ell>2$ such that a large class of tensors generalizing the Coppersmith-Winograd tensor $CW_q$ cannot be used within the Galactic method to show a bound on $ω$ better than $\ell$, for any $q$. We give evidence that previous lower-bounding techniques were not strong enough to show this. We also prove a number of complementary results along the way, including that for any group $G$, the structural tensor of $\mathbb{C}[G]$ can be used to recover the best bound on $ω$ which the Coppersmith-Winograd approach gets using $CW_{|G|-2}$ as long as the asymptotic rank of the structural tensor is not too large.
Submitted 19 October, 2018;
originally announced October 2018.
-
An Illuminating Algorithm for the Light Bulb Problem
Authors:
Josh Alman
Abstract:
The Light Bulb Problem is one of the most basic problems in data analysis. One is given as input $n$ vectors in $\{-1,1\}^d$, which are all independently and uniformly random, except for a planted pair of vectors with inner product at least $ρ\cdot d$ for some constant $ρ> 0$. The task is to find the planted pair. The most straightforward algorithm leads to a runtime of $Ω(n^2)$. Algorithms based on techniques like Locality-Sensitive Hashing achieve runtimes of $n^{2 - O(ρ)}$; as $ρ$ gets small, these approach quadratic.
Building on prior work, we give a new algorithm for this problem which runs in time $O(n^{1.582} + nd),$ regardless of how small $ρ$ is. This matches the best known runtime due to Karppa et al. Our algorithm combines techniques from previous work on the Light Bulb Problem with the so-called `polynomial method in algorithm design,' and has a simpler analysis than previous work. Our algorithm is also easily derandomized, leading to a deterministic algorithm for the Light Bulb Problem with the same runtime of $O(n^{1.582} + nd),$ improving previous results.
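To make the problem statement concrete, the Python sketch below generates a Light Bulb instance and solves it with the straightforward quadratic search mentioned in the abstract. It is the brute-force baseline only, not the $O(n^{1.582} + nd)$ algorithm. The function names, the seeded generator, and the way the planted pair is constructed are assumptions made for illustration.

```python
import math
import random


def make_instance(n, d, rho, seed=0):
    """Return n random vectors in {-1,1}^d plus a planted pair with inner product >= rho*d."""
    rng = random.Random(seed)
    vectors = [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(n)]
    i, j = rng.sample(range(n), 2)
    # Agreeing on `agree` coordinates gives inner product 2*agree - d >= rho*d.
    agree = math.ceil(d * (1 + rho) / 2)
    flipped = set(rng.sample(range(d), d - agree))
    vectors[j] = [-x if k in flipped else x for k, x in enumerate(vectors[i])]
    return vectors, (min(i, j), max(i, j))


def brute_force_pair(vectors):
    """Check all pairs and return the indices of the pair with the largest inner product."""
    best, best_pair = None, None
    for a in range(len(vectors)):
        for b in range(a + 1, len(vectors)):
            ip = sum(x * y for x, y in zip(vectors[a], vectors[b]))
            if best is None or ip > best:
                best, best_pair = ip, (a, b)
    return best_pair
```

The search inspects $\binom{n}{2}$ pairs at cost $d$ each, which is the quadratic behaviour the faster algorithms are designed to avoid.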
Submitted 15 October, 2018;
originally announced October 2018.
-
Further limitations of the known approaches for matrix multiplication
Authors:
Josh Alman,
Virginia Vassilevska Williams
Abstract:
We consider the techniques behind the current best algorithms for matrix multiplication. Our results are threefold.
(1) We provide a unifying framework, showing that all known matrix multiplication running times since 1986 can be achieved from a single very natural tensor - the structural tensor $T_q$ of addition modulo an integer $q$.
(2) We show that if one applies a generalization of the known techniques (arbitrary zeroing out of tensor powers to obtain independent matrix products in order to use the asymptotic sum inequality of Schönhage) to an arbitrary monomial degeneration of $T_q$, then there is an explicit lower bound, depending on $q$, on the bound on the matrix multiplication exponent $ω$ that one can achieve. We also show upper bounds on the value $α$ that one can achieve, where $α$ is such that $n\times n^α\times n$ matrix multiplication can be computed in $n^{2+o(1)}$ time.
(3) We show that our lower bound on $ω$ approaches $2$ as $q$ goes to infinity. This suggests a promising approach to improving the bound on $ω$: for variable $q$, find a monomial degeneration of $T_q$ which, using the known techniques, produces an upper bound on $ω$ as a function of $q$. Then, take $q$ to infinity. It is not ruled out, and hence possible, that one can obtain $ω=2$ in this way.
Submitted 19 December, 2017;
originally announced December 2017.
-
Dynamic Parameterized Problems and Algorithms
Authors:
Josh Alman,
Matthias Mnich,
Virginia Vassilevska Williams
Abstract:
Fixed-parameter algorithms and kernelization are two powerful methods to solve $\mathsf{NP}$-hard problems. Yet, so far those algorithms have been largely restricted to static inputs.
In this paper we provide fixed-parameter algorithms and kernelizations for fundamental $\mathsf{NP}$-hard problems with dynamic inputs. We consider a variety of parameterized graph and hitting set problems which are known to have $f(k)n^{1+o(1)}$ time algorithms on inputs of size $n$, and we consider the question of whether there is a data structure that supports small updates (such as edge/vertex/set/element insertions and deletions) with an update time of $g(k)n^{o(1)}$; such an update time would be essentially optimal. Update and query times independent of $n$ are particularly desirable. Among many other results, we show that Feedback Vertex Set and $k$-Path admit dynamic algorithms with $f(k)\log^{O(1)}n$ update and query times for some function $f$ depending on the solution size $k$ only.
We complement our positive results by several conditional and unconditional lower bounds. For example, we show that unlike their undirected counterparts, Directed Feedback Vertex Set and Directed $k$-Path do not admit dynamic algorithms with $n^{o(1)}$ update and query times even for constant solution sizes $k\leq 3$, assuming popular hardness hypotheses. We also show that unconditionally, in the cell probe model, Directed Feedback Vertex Set cannot be solved with update time that is purely a function of $k$.
Submitted 2 July, 2017;
originally announced July 2017.
-
Cell-Probe Lower Bounds from Online Communication Complexity
Authors:
Josh Alman,
Joshua R. Wang,
Huacheng Yu
Abstract:
In this work, we introduce an online model for communication complexity. Analogous to how online algorithms receive their input piece-by-piece, our model presents one of the players, Bob, his input piece-by-piece, and has the players Alice and Bob cooperate to compute a result each time before the next piece is revealed to Bob. This model has a closer and more natural correspondence to dynamic data structures than classic communication models do, and hence presents a new perspective on data structures.
We first present a tight lower bound for the online set intersection problem in the online communication model, demonstrating a general approach for proving online communication lower bounds. The online communication model prevents a batching trick that classic communication complexity allows, and yields a stronger lower bound. We then apply the online communication model to prove data structure lower bounds for two dynamic data structure problems: the Group Range problem and the Dynamic Connectivity problem for forests. Both of the problems admit a worst case $O(\log n)$-time data structure. Using online communication complexity, we prove a tight cell-probe lower bound for each: spending $o(\log n)$ (even amortized) time per operation results in at best an $\exp(-δ^2 n)$ probability of correctly answering a $(1/2+δ)$-fraction of the $n$ queries.
Submitted 15 November, 2017; v1 submitted 20 April, 2017;
originally announced April 2017.
-
Probabilistic Rank and Matrix Rigidity
Authors:
Josh Alman,
Ryan Williams
Abstract:
We consider a notion of probabilistic rank and probabilistic sign-rank of a matrix, which measures the extent to which a matrix can be probabilistically represented by low-rank matrices. We demonstrate several connections with matrix rigidity, communication complexity, and circuit lower bounds, including:
The Walsh-Hadamard Transform is Not Very Rigid. We give surprising upper bounds on the rigidity of a family of matrices whose rigidity has been extensively studied, and was conjectured to be highly rigid. For the $2^n \times 2^n$ Walsh-Hadamard transform $H_n$ (a.k.a. Sylvester matrices, or the communication matrix of Inner Product mod 2), we show how to modify only $2^{εn}$ entries in each row and make the rank drop below $2^{n(1-Ω(ε^2/\log(1/ε)))}$, for all $ε> 0$, over any field. That is, it is not possible to prove arithmetic circuit lower bounds on Hadamard matrices, via L. Valiant's matrix rigidity approach. We also show non-trivial rigidity upper bounds for $H_n$ with smaller target rank.
Matrix Rigidity and Threshold Circuit Lower Bounds. We give new consequences of rigid matrices for Boolean circuit complexity. We show that explicit $n \times n$ Boolean matrices which maintain rank at least $2^{(\log n)^{1-δ}}$ after $n^2/2^{(\log n)^{δ/2}}$ modified entries would yield a function lacking sub-quadratic-size $AC^0$ circuits with two layers of arbitrary linear threshold gates. We also prove that explicit 0/1 matrices over $\mathbb{R}$ which are modestly more rigid than the best known rigidity lower bounds for sign-rank would imply strong lower bounds for the infamously difficult class $THR\circ THR$.
Submitted 7 January, 2017; v1 submitted 16 November, 2016;
originally announced November 2016.
-
Polynomial Representations of Threshold Functions and Algorithmic Applications
Authors:
Josh Alman,
Timothy M. Chan,
Ryan Williams
Abstract:
We design new polynomials for representing threshold functions in three different regimes: probabilistic polynomials of low degree, which need far less randomness than previous constructions, polynomial threshold functions (PTFs) with "nice" threshold behavior and degree almost as low as the probabilistic polynomials, and a new notion of probabilistic PTFs where we combine the above techniques to achieve even lower degree with similar "nice" threshold behavior. Utilizing these polynomial constructions, we design faster algorithms for a variety of problems:
$\bullet$ Offline Hamming Nearest (and Furthest) Neighbors: Given $n$ red and $n$ blue points in $d$-dimensional Hamming space for $d=c\log n$, we can find an (exact) nearest (or furthest) blue neighbor for every red point in randomized time $n^{2-1/O(\sqrt{c}\log^{2/3}c)}$ or deterministic time $n^{2-1/O(c\log^2c)}$. These also lead to faster MAX-SAT algorithms for sparse CNFs.
$\bullet$ Offline Approximate Nearest (and Furthest) Neighbors: Given $n$ red and $n$ blue points in $d$-dimensional $\ell_1$ or Euclidean space, we can find a $(1+ε)$-approximate nearest (or furthest) blue neighbor for each red point in randomized time near $dn+n^{2-Ω(ε^{1/3}/\log(1/ε))}$.
$\bullet$ SAT Algorithms and Lower Bounds for Circuits With Linear Threshold Functions: We give a satisfiability algorithm for $AC^0[m]\circ LTF\circ LTF$ circuits with a subquadratic number of linear threshold gates on the bottom layer, and a subexponential number of gates on the other layers, that runs in deterministic $2^{n-n^ε}$ time. This also implies new circuit lower bounds for threshold circuits. We also give a randomized $2^{n-n^ε}$-time SAT algorithm for subexponential-size $MAJ\circ AC^0\circ LTF\circ AC^0\circ LTF$ circuits, where the top $MAJ$ gate and middle $LTF$ gates have $O(n^{6/5-δ})$ fan-in.
Submitted 15 August, 2016;
originally announced August 2016.
-
Probabilistic Polynomials and Hamming Nearest Neighbors
Authors:
Josh Alman,
Ryan Williams
Abstract:
We show how to compute any symmetric Boolean function on $n$ variables over any field (as well as the integers) with a probabilistic polynomial of degree $O(\sqrt{n \log(1/ε)})$ and error at most $ε$. The degree dependence on $n$ and $ε$ is optimal, matching a lower bound of Razborov (1987) and Smolensky (1987) for the MAJORITY function. The proof is constructive: a low-degree polynomial can be efficiently sampled from the distribution.
This polynomial construction is combined with other algebraic ideas to give the first subquadratic time algorithm for computing a (worst-case) batch of Hamming distances in superlogarithmic dimensions, exactly. To illustrate, let $c(n) : \mathbb{N} \rightarrow \mathbb{N}$. Suppose we are given a database $D$ of $n$ vectors in $\{0,1\}^{c(n) \log n}$ and a collection of $n$ query vectors $Q$ in the same dimension. For all $u \in Q$, we wish to compute a $v \in D$ with minimum Hamming distance from $u$. We solve this problem in $n^{2-1/O(c(n) \log^2 c(n))}$ randomized time. Hence, the problem is in "truly subquadratic" time for $O(\log n)$ dimensions, and in subquadratic time for $d = o((\log^2 n)/(\log \log n)^2)$. We apply the algorithm to computing pairs with maximum inner product, closest pair in $\ell_1$ for vectors with bounded integer entries, and pairs with maximum Jaccard coefficients.
Submitted 17 July, 2015;
originally announced July 2015.
-
Circular Planar Electrical Networks II: Positivity Phenomena
Authors:
Joshua Alman,
Carl Lian,
Brandon Tran
Abstract:
Curtis-Ingerman-Morrow characterize response matrices for circular planar electrical networks as symmetric square matrices with row sums zero and non-negative circular minors. In this paper, we study this positivity phenomenon more closely, from both algebraic and combinatorial perspectives. Extending work of Postnikov, we introduce electrical positroids, which are the sets of circular minors which can simultaneously be positive in a response matrix. We give a self-contained axiomatic description of these electrical positroids. In the second part of the paper, we discuss a naturally arising example of a Laurent phenomenon algebra, as studied by Lam-Pylyavskyy. We investigate the clusters in this algebra, building off of initial work by Kenyon-Wilson, using an analogue of weak separation, as was originally introduced by Leclerc-Zelevinsky.
Submitted 11 September, 2013;
originally announced September 2013.
-
Circular Planar Electrical Networks I: The Electrical Poset EP_{n}
Authors:
Joshua Alman,
Carl Lian,
Brandon Tran
Abstract:
Following de Verdière-Gitler-Vertigan and Curtis-Ingerman-Morrow, we prove a host of new results on circular planar electrical networks. We introduce a poset $EP_n$ of electrical networks with $n$ boundary vertices, giving two equivalent characterizations, one combinatorial and the other topological. We then investigate various properties of $EP_n$, proving that it is graded by number of edges of critical representatives. Finally, we answer various enumerative questions related to $EP_n$, adapting methods of Callan and Stein-Everett.
Submitted 10 September, 2013;
originally announced September 2013.
-
Laurent Phenomenon Sequences
Authors:
Joshua Alman,
Cesar Cuenca,
Jiaoyang Huang
Abstract:
In this paper, we undertake a systematic study of recurrences $x_{m+n}x_{m} = P(x_{m+1}, \ldots, x_{m+n-1})$ which exhibit the Laurent phenomenon. Some of the most famous among these sequences come from the Somos and the Gale-Robinson recurrences. Our approach is based on finding period 1 seeds of Laurent phenomenon algebras of Lam-Pylyavskyy. We completely classify polynomials $P$ that generate period 1 seeds in the cases of $n=2,3$ and of mutual binomial seeds. We also find several other interesting families of polynomials $P$ whose generated sequences exhibit the Laurent phenomenon. Our classification for binomial seeds is a direct generalization of a result by Fordy and Marsh, that employs a new combinatorial gadget we call a double quiver.
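As a small illustration of the recurrence shape studied here, the Python sketch below iterates the Somos-4 recurrence, which has the form above with $n = 4$ and $P(x_{m+1}, x_{m+2}, x_{m+3}) = x_{m+1}x_{m+3} + x_{m+2}^2$. The helper name `laurent_sequence` is hypothetical, and exact rational arithmetic is used so that integrality of the terms can be checked rather than assumed.

```python
from fractions import Fraction


def laurent_sequence(P, seed, terms):
    """Iterate x_{m+n} = P(x_{m+1}, ..., x_{m+n-1}) / x_m from a seed of length n."""
    xs = [Fraction(s) for s in seed]
    n = len(seed)
    while len(xs) < terms:
        m = len(xs) - n
        xs.append(P(*xs[m + 1 : m + n]) / xs[m])
    return xs


def somos4(x1, x2, x3):
    """Somos-4 polynomial: x_{m+1} * x_{m+3} + x_{m+2}^2."""
    return x1 * x3 + x2 * x2


seq = laurent_sequence(somos4, [1, 1, 1, 1], 11)
print([int(x) for x in seq])  # [1, 1, 1, 1, 2, 3, 7, 23, 59, 314, 1529]
```

Although every step divides by an earlier term, each term computed here has denominator 1 when the seed is all ones, which is the kind of behaviour the Laurent phenomenon explains.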
Submitted 4 October, 2013; v1 submitted 3 September, 2013;
originally announced September 2013.