
Showing 1–20 of 20 results for author: Zandieh, A

  1. arXiv:2406.03482 [pdf, other]

    cs.LG cs.AI cs.CL cs.PF

    QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead

    Authors: Amir Zandieh, Majid Daliri, Insu Han

    Abstract: Serving LLMs requires substantial memory due to the storage requirements of Key-Value (KV) embeddings in the KV cache, which grows with sequence length. An effective approach to compress KV cache is quantization. However, traditional quantization methods face significant memory overhead due to the need to store quantization constants (at least a zero point and a scale) in full precision per data b…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 13 pages

  2. arXiv:2402.06082 [pdf, other]

    cs.LG cs.AI cs.DS

    SubGen: Token Generation in Sublinear Time and Memory

    Authors: Amir Zandieh, Insu Han, Vahab Mirrokni, Amin Karbasi

    Abstract: Despite the significant success of large language models (LLMs), their extensive memory requirements pose challenges for deploying them in long-context token generation. The substantial memory footprint of LLM decoders arises from the necessity to store all previous tokens in the attention module, a requirement imposed by key-value (KV) caching. In this work, our focus is on developing an efficien…

    Submitted 8 February, 2024; originally announced February 2024.

  3. arXiv:2310.05869 [pdf, other]

    cs.LG cs.AI

    HyperAttention: Long-context Attention in Near-Linear Time

    Authors: Insu Han, Rajesh Jayaram, Amin Karbasi, Vahab Mirrokni, David P. Woodruff, Amir Zandieh

    Abstract: We present an approximate attention mechanism named HyperAttention to address the computational challenges posed by the growing complexity of long contexts used in Large Language Models (LLMs). Recent work suggests that in the worst-case scenario, quadratic time is necessary unless the entries of the attention matrix are bounded or the matrix has low stable rank. We introduce two parameters which…

    Submitted 1 December, 2023; v1 submitted 9 October, 2023; originally announced October 2023.

  4. arXiv:2302.02451 [pdf, other]

    cs.LG cs.CV cs.DS

    KDEformer: Accelerating Transformers via Kernel Density Estimation

    Authors: Amir Zandieh, Insu Han, Majid Daliri, Amin Karbasi

    Abstract: The dot-product attention mechanism plays a crucial role in modern deep architectures (e.g., Transformer) for sequence modeling; however, naïve exact computation of this model incurs quadratic time and memory complexities in sequence length, hindering the training of long-sequence models. Critical bottlenecks are due to the computation of partition functions in the denominator of the softmax function as w…

    Submitted 29 June, 2023; v1 submitted 5 February, 2023; originally announced February 2023.

    Comments: 26 pages, 7 figures

  5. arXiv:2209.04121 [pdf, other]

    cs.LG cs.AI stat.ML

    Fast Neural Kernel Embeddings for General Activations

    Authors: Insu Han, Amir Zandieh, Jaehoon Lee, Roman Novak, Lechao Xiao, Amin Karbasi

    Abstract: The infinite-width limit has shed light on generalization and optimization aspects of deep learning by establishing connections between neural networks and kernel methods. Despite their importance, the utility of these kernel methods was limited in large-scale learning settings due to their (super-)quadratic runtime and memory complexities. Moreover, most prior works on neural kernels have focused on…

    Submitted 9 September, 2022; originally announced September 2022.

  6. arXiv:2202.12995 [pdf, other]

    math.NA cs.DS cs.LG eess.SP

    Near Optimal Reconstruction of Spherical Harmonic Expansions

    Authors: Amir Zandieh, Insu Han, Haim Avron

    Abstract: We propose an algorithm for robust recovery of the spherical harmonic expansion of functions defined on the d-dimensional unit sphere $\mathbb{S}^{d-1}$ using a near-optimal number of function evaluations. We show that for any $f \in L^2(\mathbb{S}^{d-1})$, the number of evaluations of $f$ needed to recover its degree-$q$ spherical harmonic expansion equals the dimension of the space of spherical…

    Submitted 25 February, 2022; originally announced February 2022.

  7. arXiv:2202.04515 [pdf, other]

    cs.LG cs.DS

    Leverage Score Sampling for Tensor Product Matrices in Input Sparsity Time

    Authors: David P. Woodruff, Amir Zandieh

    Abstract: We propose an input sparsity time sampling algorithm that can spectrally approximate the Gram matrix corresponding to the $q$-fold column-wise tensor product of $q$ matrices using a nearly optimal number of samples, improving upon all previously known methods by poly$(q)$ factors. Furthermore, for the important special case of the $q$-fold self-tensoring of a dataset, which is the feature matrix o…

    Submitted 24 June, 2022; v1 submitted 9 February, 2022; originally announced February 2022.

  8. arXiv:2202.03474 [pdf, other]

    cs.LG

    Random Gegenbauer Features for Scalable Kernel Methods

    Authors: Insu Han, Amir Zandieh, Haim Avron

    Abstract: We propose efficient random features for approximating a new and rich class of kernel functions that we refer to as Generalized Zonal Kernels (GZK). Our proposed GZK family generalizes the zonal kernels (i.e., dot-product kernels on the unit sphere) by introducing radial factors in their Gegenbauer series expansion, and includes a wide range of ubiquitous kernel functions such as the entirety of…

    Submitted 7 February, 2022; originally announced February 2022.

  9. arXiv:2107.07347 [pdf, other]

    cs.DS

    Traversing the FFT Computation Tree for Dimension-Independent Sparse Fourier Transforms

    Authors: Karl Bringmann, Michael Kapralov, Mikhail Makarov, Vasileios Nakos, Amir Yagudin, Amir Zandieh

    Abstract: We consider the well-studied Sparse Fourier transform problem, where one aims to quickly recover an approximately Fourier $k$-sparse vector $\widehat{x} \in \mathbb{C}^{n^d}$ from observing its time domain representation $x$. In the exact $k$-sparse case the best known dimension-independent algorithm runs in near cubic time in $k$ and it is unclear whether a faster algorithm like in low dimensions…

    Submitted 22 January, 2023; v1 submitted 15 July, 2021; originally announced July 2021.

  10. arXiv:2106.07880 [pdf, other]

    cs.LG cs.CV cs.DS

    Scaling Neural Tangent Kernels via Sketching and Random Features

    Authors: Amir Zandieh, Insu Han, Haim Avron, Neta Shoham, Chaewon Kim, Jinwoo Shin

    Abstract: The Neural Tangent Kernel (NTK) characterizes the behavior of infinitely-wide neural networks trained under least squares loss by gradient descent. Recent works also report that NTK regression can outperform finitely-wide neural networks trained on small-scale datasets. However, the computational complexity of kernel methods has limited its use in large-scale learning tasks. To accelerate learning…

    Submitted 8 December, 2021; v1 submitted 15 June, 2021; originally announced June 2021.

    Comments: This is a merger of arXiv:2104.01351, arXiv:2104.00415

  11. arXiv:2104.00415 [pdf, other]

    cs.LG cs.AI cs.DS

    Learning with Neural Tangent Kernels in Near Input Sparsity Time

    Authors: Amir Zandieh

    Abstract: The Neural Tangent Kernel (NTK) characterizes the behavior of infinitely wide neural nets trained under least squares loss by gradient descent. However, despite its importance, the super-quadratic runtime of kernel methods limits the use of NTK in large-scale learning tasks. To accelerate kernel machines with NTK, we propose a near input sparsity time algorithm that maps the input data to a random…

    Submitted 27 July, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

  12. arXiv:2007.03927 [pdf, ps, other]

    cs.DS

    Near Input Sparsity Time Kernel Embeddings via Adaptive Sampling

    Authors: David P. Woodruff, Amir Zandieh

    Abstract: To accelerate kernel methods, we propose a near input sparsity time algorithm for sampling the high-dimensional feature space implicitly defined by a kernel transformation. Our main contribution is an importance sampling method for subsampling the feature space of a degree $q$ tensoring of data points in almost input sparsity time, improving the recent oblivious sketching method of (Ahle et al., 2…

    Submitted 14 July, 2020; v1 submitted 8 July, 2020; originally announced July 2020.

  13. arXiv:2003.09756 [pdf, ps, other]

    stat.ML cs.DS cs.LG

    Scaling up Kernel Ridge Regression via Locality Sensitive Hashing

    Authors: Michael Kapralov, Navid Nouri, Ilya Razenshteyn, Ameya Velingker, Amir Zandieh

    Abstract: Random binning features, introduced in the seminal paper of Rahimi and Recht (2007), are an efficient method for approximating a kernel matrix using locality sensitive hashing. Random binning features provide a very simple and efficient way of approximating the Laplace kernel but unfortunately do not apply to many important classes of kernels, notably ones that generate smooth Gaussian processes,…

    Submitted 21 March, 2020; originally announced March 2020.

  14. arXiv:1909.01410 [pdf, ps, other]

    cs.DS

    Oblivious Sketching of High-Degree Polynomial Kernels

    Authors: Thomas D. Ahle, Michael Kapralov, Jakob B. T. Knudsen, Rasmus Pagh, Ameya Velingker, David Woodruff, Amir Zandieh

    Abstract: Kernel methods are fundamental tools in machine learning that allow detection of non-linear dependencies between data without explicitly constructing feature vectors in high dimensional spaces. A major disadvantage of kernel methods is their poor scalability: primitives such as kernel PCA or kernel ridge regression generally take prohibitively large quadratic space and (at least) quadratic time, a…

    Submitted 22 December, 2020; v1 submitted 3 September, 2019; originally announced September 2019.

  15. arXiv:1902.10633 [pdf, ps, other]

    cs.DS

    Dimension-independent Sparse Fourier Transform

    Authors: Michael Kapralov, Ameya Velingker, Amir Zandieh

    Abstract: The Discrete Fourier Transform (DFT) is a fundamental computational primitive, and the fastest known algorithm for computing the DFT is the FFT (Fast Fourier Transform) algorithm. One remarkable feature of FFT is the fact that its runtime depends only on the size $N$ of the input vector, but not on the dimensionality of the input domain: FFT runs in time $O(N\log N)$ irrespective of whether the DF…

    Submitted 27 February, 2019; originally announced February 2019.

  16. arXiv:1812.08723 [pdf, ps, other]

    cs.DS cs.LG eess.SP math.NA

    A Universal Sampling Method for Reconstructing Signals with Simple Fourier Transforms

    Authors: Haim Avron, Michael Kapralov, Cameron Musco, Christopher Musco, Ameya Velingker, Amir Zandieh

    Abstract: Reconstructing continuous signals from a small number of discrete samples is a fundamental problem across science and engineering. In practice, we are often interested in signals with 'simple' Fourier structure, such as bandlimited, multiband, and Fourier sparse signals. More broadly, any prior knowledge about a signal's Fourier power spectrum can constrain its complexity. Intuitively, signals wit…

    Submitted 20 December, 2018; originally announced December 2018.

  17. arXiv:1808.01842 [pdf, other]

    cs.LG stat.ML

    Beyond $1/2$-Approximation for Submodular Maximization on Massive Data Streams

    Authors: Ashkan Norouzi-Fard, Jakub Tarnawski, Slobodan Mitrović, Amir Zandieh, Aida Mousavifar, Ola Svensson

    Abstract: Many tasks in machine learning and data mining, such as data diversification, non-parametric learning, kernel machines, clustering etc., require extracting a small but representative summary from a massive dataset. Often, such problems can be posed as maximizing a submodular set function subject to a cardinality constraint. We consider this question in the streaming setting, where elements arrive…

    Submitted 6 August, 2018; originally announced August 2018.

    Journal ref: Proc. of 35th International Conference on Machine Learning (ICML), 2018, pages 3829-3838

  18. arXiv:1804.09893 [pdf, other]

    cs.LG cs.DS math.NA stat.ML

    Random Fourier Features for Kernel Ridge Regression: Approximation Bounds and Statistical Guarantees

    Authors: Haim Avron, Michael Kapralov, Cameron Musco, Christopher Musco, Ameya Velingker, Amir Zandieh

    Abstract: Random Fourier features is one of the most popular techniques for scaling up kernel methods, such as kernel ridge regression. However, despite impressive empirical results, the statistical properties of random Fourier features are still not well understood. In this paper we take steps toward filling this gap. Specifically, we approach random Fourier features from a spectral matrix approximation po…

    Submitted 21 May, 2018; v1 submitted 26 April, 2018; originally announced April 2018.

    Comments: An extended abstract of this work appears in the Proceedings of the 34th International Conference on Machine Learning (ICML 2017)

  19. arXiv:1702.01286 [pdf, other]

    cs.DS

    An Adaptive Sublinear-Time Block Sparse Fourier Transform

    Authors: Volkan Cevher, Michael Kapralov, Jonathan Scarlett, Amir Zandieh

    Abstract: The problem of approximately computing the $k$ dominant Fourier coefficients of a vector $X$ quickly, and using few samples in time domain, is known as the Sparse Fourier Transform (sparse FFT) problem. A long line of work on the sparse FFT has resulted in algorithms with $O(k\log n\log (n/k))$ runtime [Hassanieh et al., STOC'12] and $O(k\log n)$ sample complexity [Indyk et al., FOCS'14]. These re…

    Submitted 11 April, 2017; v1 submitted 4 February, 2017; originally announced February 2017.

  20. arXiv:1411.6587 [pdf]

    cs.IT

    Reconstruction of Sub-Nyquist Random Sampling for Sparse and Multi-Band Signals

    Authors: Amir Zandieh, Alireza Zareian, Masoumeh Azghani, Farokh Marvasti

    Abstract: As technology grows, higher frequency signals are required to be processed in various applications. In order to digitize such signals, conventional analog to digital convertors are facing implementation challenges due to the higher sampling rates. Hence, lower sampling rates (i.e., sub-Nyquist) are considered to be cost efficient. A well-known approach is to consider sparse signals that have fewer…

    Submitted 26 November, 2014; v1 submitted 8 November, 2014; originally announced November 2014.
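Result 1 (QJL) compresses the KV cache with a 1-bit quantized JL transform, but its abstract is truncated before the estimator is stated. The following is a minimal sketch of the general idea under stated assumptions: keys are multiplied by a shared Gaussian matrix and only the signs plus one norm per key are stored, while queries are projected but not quantized. The helper names (`gaussian_matrix`, `quantize_key`, `estimate_inner_product`) are hypothetical, and the estimator shown is a standard asymmetric sign-projection estimator; it is not claimed to match the paper exactly.

```python
import math
import random


def gaussian_matrix(m, d, rng):
    """Shared random projection S: an m x d matrix with i.i.d. N(0, 1) entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]


def project(S, v):
    """Return S @ v using plain Python lists."""
    return [sum(s * x for s, x in zip(row, v)) for row in S]


def quantize_key(S, k):
    """Hypothetical 1-bit key code: the signs of S @ k plus the key's L2 norm.

    Only one float (the norm) is kept per key; no per-block zero point
    or scale is stored.
    """
    code = [p >= 0.0 for p in project(S, k)]
    norm = math.sqrt(sum(x * x for x in k))
    return code, norm


def estimate_inner_product(S, q, code, norm):
    """Asymmetric estimate of <q, k>: the query is projected but not quantized.

    For a Gaussian row s, E[<s, q> * sign(<s, k>)] = sqrt(2/pi) * <q, k> / ||k||,
    so rescaling the average by sqrt(pi/2) * ||k|| makes the estimate unbiased.
    """
    m = len(code)
    acc = sum(p if bit else -p for p, bit in zip(project(S, q), code))
    return math.sqrt(math.pi / 2.0) * norm * acc / m
```

For example, with `m = 20000` rows, `k = [1.0, -0.5, 0.25, 2.0]` and `q = [0.5, 1.0, -1.0, 1.5]`, the estimate should land near the exact inner product 2.75; its standard deviation shrinks as $1/\sqrt{m}$.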
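Result 18 analyzes random Fourier features for kernel ridge regression. As background, a minimal sketch of the classical Rahimi and Recht construction for the Gaussian kernel $k(x, y) = \exp(-\|x - y\|^2 / (2\sigma^2))$ follows; it does not attempt to reproduce the paper's analysis or any refinement it proposes. The bandwidth `sigma`, the feature count, and the function names are illustrative assumptions.

```python
import math
import random


def make_rff(d, num_features, sigma, rng):
    """Sample frequencies w ~ N(0, sigma^-2 I) and phases b ~ U[0, 2*pi)."""
    ws = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(d)] for _ in range(num_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return ws, bs


def rff_map(x, ws, bs):
    """z(x)_i = sqrt(2/D) * cos(w_i . x + b_i); then z(x) . z(y) approximates k(x, y)."""
    scale = math.sqrt(2.0 / len(ws))
    return [scale * math.cos(sum(wj * xj for wj, xj in zip(w, x)) + b)
            for w, b in zip(ws, bs)]


def gaussian_kernel(x, y, sigma):
    """Exact Gaussian kernel, for comparison."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))
```

The design rests on two facts: averaging $2\cos(a + b)\cos(c + b)$ over a uniform phase $b$ gives $\cos(a - c)$, and $\mathbb{E}[\cos(w^\top(x - y))]$ for Gaussian $w$ is exactly the Gaussian kernel, so the feature inner product is an unbiased estimate whose error decays as $1/\sqrt{D}$.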
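Result 13 recalls that random binning features (Rahimi and Recht, 2007) approximate the Laplace kernel through locality sensitive hashing. A minimal sketch of that 2007 baseline, assuming the $L_1$ Laplace kernel $\exp(-\|x - y\|_1 / \sigma)$: each randomly shifted grid draws a per-dimension pitch $\delta \sim \mathrm{Gamma}(2, \sigma)$ (shape 2, scale $\sigma$) and a shift uniform in $[0, \delta)$, and the fraction of grids in which two points share a cell estimates the kernel. This is not the paper's construction for smoother kernels; all names are hypothetical.

```python
import math
import random


def sample_grids(d, num_grids, sigma, rng):
    """Each grid: per-dimension pitch ~ Gamma(shape=2, scale=sigma), shift ~ U[0, pitch)."""
    grids = []
    for _ in range(num_grids):
        pitches = [rng.gammavariate(2.0, sigma) for _ in range(d)]
        shifts = [rng.uniform(0.0, p) for p in pitches]
        grids.append((pitches, shifts))
    return grids


def bin_ids(x, grids):
    """Hash x to one cell per grid; the tuple of per-dimension indices is the bin id."""
    return [tuple(math.floor((xj - u) / p) for xj, p, u in zip(x, pitches, shifts))
            for pitches, shifts in grids]


def binning_estimate(x, y, grids):
    """Fraction of grids in which x and y land in the same cell."""
    bx, by = bin_ids(x, grids), bin_ids(y, grids)
    return sum(a == b for a, b in zip(bx, by)) / len(grids)


def laplace_kernel(x, y, sigma):
    """Exact L1 Laplace kernel exp(-||x - y||_1 / sigma), for comparison."""
    return math.exp(-sum(abs(a - b) for a, b in zip(x, y)) / sigma)
```

Why it works: in one dimension, a random shift separates two points at distance $\Delta < \delta$ with probability $\Delta / \delta$, and averaging $\max(0, 1 - \Delta/\delta)$ over $\delta \sim \mathrm{Gamma}(2, \sigma)$ gives $e^{-\Delta/\sigma}$; independent dimensions multiply. Treating each bin id as a one-hot coordinate turns the same computation into an explicit sparse feature map.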
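Result 15 observes that the FFT computes an $N$-point DFT in $O(N\log N)$ time regardless of the dimensionality of the input domain. A minimal illustration of that fact, using a textbook recursive radix-2 Cooley-Tukey FFT (power-of-two lengths assumed) and the row-column method for a 2-D grid, is sketched here; it is not the sparse FFT algorithm the paper develops, and the function names are hypothetical.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor combines the half-size transforms (butterfly step).
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def naive_dft(x):
    """O(N^2) reference DFT, used only to check the FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fft2(grid):
    """2-D DFT of an n1 x n2 grid: 1-D FFTs along every row, then every column."""
    rows = [fft(row) for row in grid]
    cols = [fft([rows[i][j] for i in range(len(rows))]) for j in range(len(rows[0]))]
    # cols[k2][k1] holds X[k1][k2]; transpose back to row-major order.
    return [[cols[j][i] for j in range(len(cols))] for i in range(len(rows))]
```

For an $n_1 \times n_2$ grid, the row pass costs $n_1$ transforms of length $n_2$ and the column pass $n_2$ transforms of length $n_1$, for a total of $O(N(\log n_1 + \log n_2)) = O(N \log N)$ with $N = n_1 n_2$, the same bound as a 1-D FFT of length $N$.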
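Results 3 and 4 target the quadratic cost of exact dot-product attention, and the KDEformer abstract singles out the partition functions in the softmax denominator. For reference, a naive exact implementation of the quantity those methods approximate (not either method) is sketched below; it omits the usual $1/\sqrt{d}$ scaling, masking, and batching, and the function name is hypothetical.

```python
import math


def exact_attention(Q, K, V):
    """Naive softmax attention: out_i = sum_j exp(q_i . k_j) v_j / Z_i.

    Z_i = sum_j exp(q_i . k_j) is the partition function (softmax denominator).
    Every query touches every key, so time is O(n^2 d) for n queries and n keys.
    """
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) for k in K]
        # Subtracting the max avoids overflow; the common factor cancels in the ratio.
        shift = max(scores)
        weights = [math.exp(s - shift) for s in scores]
        z = sum(weights)  # partition function, up to the factor exp(shift)
        out.append([sum(w * v[c] for w, v in zip(weights, V)) / z
                    for c in range(len(V[0]))])
    return out
```

The double loop over queries and keys is exactly the $n^2$ term; computing each $Z_i$ requires every score in row $i$, which is why approximating these denominators is a natural bottleneck to attack.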
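Result 14 concerns oblivious sketches for high-degree polynomial kernels. To make the target $\langle x, y\rangle^p$ concrete, the sketch below uses a deliberately simple data-oblivious estimator (products of Rademacher projections, in the style of random Maclaurin features), not the paper's sketch: for independent $\pm 1$ vectors $r_1, \dots, r_p$, $\mathbb{E}\big[\prod_j \langle r_j, x\rangle\langle r_j, y\rangle\big] = \langle x, y\rangle^p$. Its variance can grow quickly with the degree $p$. The function names and parameters are assumptions.

```python
import random


def make_sign_vectors(d, degree, num_features, rng):
    """For each feature, `degree` independent Rademacher (+/-1) vectors of length d."""
    return [[[rng.choice((-1.0, 1.0)) for _ in range(d)] for _ in range(degree)]
            for _ in range(num_features)]


def poly_features(x, signs):
    """z_i(x) = prod_j <r_ij, x> / sqrt(m); then <z(x), z(y)> estimates <x, y>^degree."""
    m = len(signs)
    feats = []
    for rs in signs:
        prod = 1.0
        for r in rs:
            prod *= sum(a * b for a, b in zip(r, x))
        feats.append(prod / m ** 0.5)
    return feats
```

Unbiasedness follows from independence across the $p$ factors and $\mathbb{E}[r r^\top] = I$ for a Rademacher vector $r$; the map is oblivious because the random vectors are drawn without looking at the data.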