Search | arXiv e-print repository

Capacity Analysis of Vector Symbolic Architectures

Authors: Kenneth L. Clarkson, Shashanka Ubaru, Elizabeth Yang

Abstract: Hyperdimensional computing (HDC) is a biologically-inspired framework which represents symbols with high-dimensional vectors, and uses vector operations to manipulate them. The ensemble of a particular vector space and a prescribed set of vector operations (including one addition-like for "bundling" and one outer-product-like for "binding") form a *vector symbolic architecture* (VSA). While VSAs h… ▽ More Hyperdimensional computing (HDC) is a biologically-inspired framework which represents symbols with high-dimensional vectors, and uses vector operations to manipulate them. The ensemble of a particular vector space and a prescribed set of vector operations (including one addition-like for "bundling" and one outer-product-like for "binding") form a *vector symbolic architecture* (VSA). While VSAs have been employed in numerous applications and have been studied empirically, many theoretical questions about VSAs remain open. We analyze the *representation capacities* of four common VSAs: MAP-I, MAP-B, and two VSAs based on sparse binary vectors. "Representation capacity' here refers to bounds on the dimensions of the VSA vectors required to perform certain symbolic tasks, such as testing for set membership $i \in S$ and estimating set intersection sizes $|X \cap Y|$ for two sets of symbols $X$ and $Y$, to a given degree of accuracy. We also analyze the ability of a novel variant of a Hopfield network (a simple model of associative memory) to perform some of the same tasks that are typically asked of VSAs. In addition to providing new bounds on VSA capacities, our analyses establish and leverage connections between VSAs, "sketching" (dimensionality reduction) algorithms, and Bloom filters. △ Less

Submitted 14 February, 2023; v1 submitted 24 January, 2023; originally announced January 2023.

arXiv:2211.15860 [pdf, other]

Bayesian Experimental Design for Symbolic Discovery

Authors: Kenneth L. Clarkson, Cristina Cornelio, Sanjeeb Dash, Joao Goncalves, Lior Horesh, Nimrod Megiddo

Abstract: This study concerns the formulation and application of Bayesian optimal experimental design to symbolic discovery, which is the inference from observational data of predictive models taking general functional forms. We apply constrained first-order methods to optimize an appropriate selection criterion, using Hamiltonian Monte Carlo to sample from the prior. A step for computing the predictive dis… ▽ More This study concerns the formulation and application of Bayesian optimal experimental design to symbolic discovery, which is the inference from observational data of predictive models taking general functional forms. We apply constrained first-order methods to optimize an appropriate selection criterion, using Hamiltonian Monte Carlo to sample from the prior. A step for computing the predictive distribution, involving convolution, is computed via either numerical integration, or via fast transform methods. △ Less

Submitted 28 November, 2022; originally announced November 2022.

arXiv:2209.09371 [pdf, other]

Topological data analysis on noisy quantum computers

Authors: Ismail Yunus Akhalwaya, Shashanka Ubaru, Kenneth L. Clarkson, Mark S. Squillante, Vishnu Jejjala, Yang-Hui He, Kugendran Naidoo, Vasileios Kalantzis, Lior Horesh

Abstract: Topological data analysis (TDA) is a powerful technique for extracting complex and valuable shape-related summaries of high-dimensional data. However, the computational demands of classical algorithms for computing TDA are exorbitant, and quickly become impractical for high-order characteristics. Quantum computers offer the potential of achieving significant speedup for certain computational probl… ▽ More Topological data analysis (TDA) is a powerful technique for extracting complex and valuable shape-related summaries of high-dimensional data. However, the computational demands of classical algorithms for computing TDA are exorbitant, and quickly become impractical for high-order characteristics. Quantum computers offer the potential of achieving significant speedup for certain computational problems. Indeed, TDA has been purported to be one such problem, yet, quantum computing algorithms proposed for the problem, such as the original Quantum TDA (QTDA) formulation by Lloyd, Garnerone and Zanardi, require fault-tolerance qualifications that are currently unavailable. In this study, we present NISQ-TDA, a fully implemented end-to-end quantum machine learning algorithm needing only a short circuit-depth, that is applicable to high-dimensional classical data, and with provable asymptotic speedup for certain classes of problems. The algorithm neither suffers from the data-loading problem nor does it need to store the input data on the quantum computer explicitly. The algorithm was successfully executed on quantum computing devices, as well as on noisy quantum simulators, applied to small datasets. Preliminary empirical results suggest that the algorithm is robust to noise. △ Less

Submitted 19 March, 2024; v1 submitted 19 September, 2022; originally announced September 2022.

Comments: This paper is a follow up to arXiv:2108.02811 with improved theoretical results and other additional results. This new version presents an improved runtime for the algorithm, and fixes an issue present in the previous version

Journal ref: In the Proceedings of The Twelfth International Conference on Learning Representations (ICLR 2024)

arXiv:2202.05120 [pdf, ps, other]

Low-Rank Approximation with $1/ε^{1/3}$ Matrix-Vector Products

Authors: Ainesh Bakshi, Kenneth L. Clarkson, David P. Woodruff

Abstract: We study iterative methods based on Krylov subspaces for low-rank approximation under any Schatten-$p$ norm. Here, given access to a matrix $A$ through matrix-vector products, an accuracy parameter $ε$, and a target rank $k$, the goal is to find a rank-$k$ matrix $Z$ with orthonormal columns such that $\| A(I -ZZ^\top)\|_{S_p} \leq (1+ε)\min_{U^\top U = I_k} \|A(I - U U^\top)\|_{S_p}$, where… ▽ More We study iterative methods based on Krylov subspaces for low-rank approximation under any Schatten-$p$ norm. Here, given access to a matrix $A$ through matrix-vector products, an accuracy parameter $ε$, and a target rank $k$, the goal is to find a rank-$k$ matrix $Z$ with orthonormal columns such that $\| A(I -ZZ^\top)\|_{S_p} \leq (1+ε)\min_{U^\top U = I_k} \|A(I - U U^\top)\|_{S_p}$, where $\|M\|_{S_p}$ denotes the $\ell_p$ norm of the the singular values of $M$. For the special cases of $p=2$ (Frobenius norm) and $p = \infty$ (Spectral norm), Musco and Musco (NeurIPS 2015) obtained an algorithm based on Krylov methods that uses $\tilde{O}(k/\sqrtε)$ matrix-vector products, improving on the naïve $\tilde{O}(k/ε)$ dependence obtainable by the power method, where $\tilde{O}$ suppresses poly$(\log(dk/ε))$ factors. Our main result is an algorithm that uses only $\tilde{O}(kp^{1/6}/ε^{1/3})$ matrix-vector products, and works for all $p \geq 1$. For $p = 2$ our bound improves the previous $\tilde{O}(k/ε^{1/2})$ bound to $\tilde{O}(k/ε^{1/3})$. Since the Schatten-$p$ and Schatten-$\infty$ norms are the same up to a $(1+ ε)$-factor when $p \geq (\log d)/ε$, our bound recovers the result of Musco and Musco for $p = \infty$. Further, we prove a matrix-vector query lower bound of $Ω(1/ε^{1/3})$ for any fixed constant $p \geq 1$, showing that surprisingly $\tildeΘ(1/ε^{1/3})$ is the optimal complexity for constant~$k$. To obtain our results, we introduce several new techniques, including optimizing over multiple Krylov subspaces simultaneously, and pinching inequalities for partitioned operators. Our lower bound for $p \in [1,2]$ uses the Araki-Lieb-Thirring trace inequality, whereas for $p>2$, we appeal to a norm-compression inequality for aligned partitioned operators. △ Less

Submitted 16 June, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

Comments: STOC '22

arXiv:2108.02811 [pdf, ps, other]

Quantum Topological Data Analysis with Linear Depth and Exponential Speedup

Authors: Shashanka Ubaru, Ismail Yunus Akhalwaya, Mark S. Squillante, Kenneth L. Clarkson, Lior Horesh

Abstract: Quantum computing offers the potential of exponential speedups for certain classical computations. Over the last decade, many quantum machine learning (QML) algorithms have been proposed as candidates for such exponential improvements. However, two issues unravel the hope of exponential speedup for some of these QML algorithms: the data-loading problem and, more recently, the stunning dequantizati… ▽ More Quantum computing offers the potential of exponential speedups for certain classical computations. Over the last decade, many quantum machine learning (QML) algorithms have been proposed as candidates for such exponential improvements. However, two issues unravel the hope of exponential speedup for some of these QML algorithms: the data-loading problem and, more recently, the stunning dequantization results of Tang et al. A third issue, namely the fault-tolerance requirements of most QML algorithms, has further hindered their practical realization. The quantum topological data analysis (QTDA) algorithm of Lloyd, Garnerone and Zanardi was one of the first QML algorithms that convincingly offered an expected exponential speedup. From the outset, it did not suffer from the data-loading problem. A recent result has also shown that the generalized problem solved by this algorithm is likely classically intractable, and would therefore be immune to any dequantization efforts. However, the QTDA algorithm of Lloyd et~al. has a time complexity of $O(n^4/(ε^2 δ))$ (where $n$ is the number of data points, $ε$ is the error tolerance, and $δ$ is the smallest nonzero eigenvalue of the restricted Laplacian) and requires fault-tolerant quantum computing, which has not yet been achieved. In this paper, we completely overhaul the QTDA algorithm to achieve an improved exponential speedup and depth complexity of $O(n\log(1/(δε)))$. Our approach includes three key innovations: (a) an efficient realization of the combinatorial Laplacian as a sum of Pauli operators; (b) a quantum rejection sampling approach to restrict the superposition to the simplices in the complex; and (c) a stochastic rank estimation method to estimate the Betti numbers. We present a theoretical error analysis, and the circuit and computational time and depth complexities for Betti number estimation. △ Less

Submitted 5 August, 2021; originally announced August 2021.

Comments: 27 pages

arXiv:2107.08090 [pdf, ps, other]

Near-Optimal Algorithms for Linear Algebra in the Current Matrix Multiplication Time

Authors: Nadiia Chepurko, Kenneth L. Clarkson, Praneeth Kacham, David P. Woodruff

Abstract: In the numerical linear algebra community, it was suggested that to obtain nearly optimal bounds for various problems such as rank computation, finding a maximal linearly independent subset of columns (a basis), regression, or low-rank approximation, a natural way would be to resolve the main open question of Nelson and Nguyen (FOCS, 2013). This question is regarding the logarithmic factors in the… ▽ More In the numerical linear algebra community, it was suggested that to obtain nearly optimal bounds for various problems such as rank computation, finding a maximal linearly independent subset of columns (a basis), regression, or low-rank approximation, a natural way would be to resolve the main open question of Nelson and Nguyen (FOCS, 2013). This question is regarding the logarithmic factors in the sketching dimension of existing oblivious subspace embeddings that achieve constant-factor approximation. We show how to bypass this question using a refined sketching technique, and obtain optimal or nearly optimal bounds for these problems. A key technique we use is an explicit map** of Indyk based on uncertainty principles and extractors, which after first applying known oblivious subspace embeddings, allows us to quickly spread out the mass of the vector so that sampling is now effective. We thereby avoid a logarithmic factor in the sketching dimension that is standard in bounds proven using the matrix Chernoff inequality. For the fundamental problems of rank computation and finding a basis, our algorithms improve Cheung, Kwok, and Lau (JACM, 2013), and are optimal to within a constant factor and a poly(log log(n))-factor, respectively. Further, for constant-factor regression and low-rank approximation we give the first optimal algorithms, for the current matrix multiplication exponent. △ Less

Submitted 2 November, 2021; v1 submitted 16 July, 2021; originally announced July 2021.

Comments: Accepted to SODA 2022

arXiv:2102.05758 [pdf, other]

Sparse graph based sketching for fast numerical linear algebra

Authors: Dong Hu, Shashanka Ubaru, Alex Gittens, Kenneth L. Clarkson, Lior Horesh, Vassilis Kalantzis

Abstract: In recent years, a variety of randomized constructions of sketching matrices have been devised, that have been used in fast algorithms for numerical linear algebra problems, such as least squares regression, low-rank approximation, and the approximation of leverage scores. A key property of sketching matrices is that of subspace embedding. In this paper, we study sketching matrices that are obtain… ▽ More In recent years, a variety of randomized constructions of sketching matrices have been devised, that have been used in fast algorithms for numerical linear algebra problems, such as least squares regression, low-rank approximation, and the approximation of leverage scores. A key property of sketching matrices is that of subspace embedding. In this paper, we study sketching matrices that are obtained from bipartite graphs that are sparse, i.e., have left degree~s that is small. In particular, we explore two popular classes of sparse graphs, namely, expander graphs and magical graphs. For a given subspace $\mathcal{U} \subseteq \mathbb{R}^n$ of dimension $k$, we show that the magical graph with left degree $s=2$ yields a $(1\pm ε)$ ${\ell}_2$-subspace embedding for $\mathcal{U}$, if the number of right vertices (the sketch size) $m = \mathcal{O}({k^2}/{ε^2})$. The expander graph with $s = \mathcal{O}({\log k}/ε)$ yields a subspace embedding for $m = \mathcal{O}({k \log k}/{ε^2})$. We also discuss the construction of sparse sketching matrices with reduced randomness using expanders based on error-correcting codes. Empirical results on various synthetic and real datasets show that these sparse graph sketching matrices work very well in practice. △ Less

Submitted 10 February, 2021; originally announced February 2021.

Journal ref: ICASSP 2021

arXiv:2101.02158 [pdf, other]

Order Embeddings from Merged Ontologies using Sketching

Authors: Kenneth L. Clarkson, Sanjana Sahayaraj

Abstract: We give a simple, low resource method to produce order embeddings from ontologies. Such embeddings map words to vectors so that order relations on the words, such as hypernymy/hyponymy, are represented in a direct way. Our method uses sketching techniques, in particular countsketch, for dimensionality reduction. We also study methods to merge ontologies, in particular those in medical domains, so… ▽ More We give a simple, low resource method to produce order embeddings from ontologies. Such embeddings map words to vectors so that order relations on the words, such as hypernymy/hyponymy, are represented in a direct way. Our method uses sketching techniques, in particular countsketch, for dimensionality reduction. We also study methods to merge ontologies, in particular those in medical domains, so that order relations are preserved. We give computational results for medical ontologies and for wordnet, showing that our merging techniques are effective and our embedding yields an accurate representation in both generic and specialised domains. △ Less

Submitted 6 January, 2021; originally announced January 2021.

arXiv:2011.04125 [pdf, ps, other]

Quantum-Inspired Algorithms from Randomized Numerical Linear Algebra

Authors: Nadiia Chepurko, Kenneth L. Clarkson, Lior Horesh, Honghao Lin, David P. Woodruff

Abstract: We create classical (non-quantum) dynamic data structures supporting queries for recommender systems and least-squares regression that are comparable to their quantum analogues. De-quantizing such algorithms has received a flurry of attention in recent years; we obtain sharper bounds for these problems. More significantly, we achieve these improvements by arguing that the previous quantum-inspired… ▽ More We create classical (non-quantum) dynamic data structures supporting queries for recommender systems and least-squares regression that are comparable to their quantum analogues. De-quantizing such algorithms has received a flurry of attention in recent years; we obtain sharper bounds for these problems. More significantly, we achieve these improvements by arguing that the previous quantum-inspired algorithms for these problems are doing leverage or ridge-leverage score sampling in disguise; these are powerful and standard techniques in randomized numerical linear algebra. With this recognition, we are able to employ the large body of work in numerical linear algebra to obtain algorithms for these problems that are simpler or faster (or both) than existing approaches. Our experiments demonstrate that the proposed data structures also work well on real-world datasets. △ Less

Submitted 28 June, 2022; v1 submitted 8 November, 2020; originally announced November 2020.

Comments: Minor edits to exposition

arXiv:2010.06392 [pdf, ps, other]

Projection techniques to update the truncated SVD of evolving matrices

Authors: Vassilis Kalantzis, Georgios Kollias, Shashanka Ubaru, Athanasios N. Nikolakopoulos, Lior Horesh, Kenneth L. Clarkson

Abstract: This paper considers the problem of updating the rank-k truncated Singular Value Decomposition (SVD) of matrices subject to the addition of new rows and/or columns over time. Such matrix problems represent an important computational kernel in applications such as Latent Semantic Indexing and Recommender Systems. Nonetheless, the proposed framework is purely algebraic and targets general updating p… ▽ More This paper considers the problem of updating the rank-k truncated Singular Value Decomposition (SVD) of matrices subject to the addition of new rows and/or columns over time. Such matrix problems represent an important computational kernel in applications such as Latent Semantic Indexing and Recommender Systems. Nonetheless, the proposed framework is purely algebraic and targets general updating problems. The algorithm presented in this paper undertakes a projection view-point and focuses on building a pair of subspaces which approximate the linear span of the sought singular vectors of the updated matrix. We discuss and analyze two different choices to form the projection subspaces. Results on matrices from real applications suggest that the proposed algorithm can lead to higher accuracy, especially for the singular triplets associated with the largest modulus singular values. Several practical details and key differences with other approaches are also discussed. △ Less

Submitted 13 October, 2020; originally announced October 2020.

Comments: 13 pages

arXiv:1905.05376 [pdf, ps, other]

Dimensionality Reduction for Tukey Regression

Authors: Kenneth L. Clarkson, Ruosong Wang, David P. Woodruff

Abstract: We give the first dimensionality reduction methods for the overconstrained Tukey regression problem. The Tukey loss function $\|y\|_M = \sum_i M(y_i)$ has $M(y_i) \approx |y_i|^p$ for residual errors $y_i$ smaller than a prescribed threshold $τ$, but $M(y_i)$ becomes constant for errors $|y_i| > τ$. Our results depend on a new structural result, proven constructively, showing that for any $d$-dime… ▽ More We give the first dimensionality reduction methods for the overconstrained Tukey regression problem. The Tukey loss function $\|y\|_M = \sum_i M(y_i)$ has $M(y_i) \approx |y_i|^p$ for residual errors $y_i$ smaller than a prescribed threshold $τ$, but $M(y_i)$ becomes constant for errors $|y_i| > τ$. Our results depend on a new structural result, proven constructively, showing that for any $d$-dimensional subspace $L \subset \mathbb{R}^n$, there is a fixed bounded-size subset of coordinates containing, for every $y \in L$, all the large coordinates, with respect to the Tukey loss function, of $y$. Our methods reduce a given Tukey regression problem to a smaller weighted version, whose solution is a provably good approximate solution to the original problem. Our reductions are fast, simple and easy to implement, and we give empirical results demonstrating their practicality, using existing heuristic solvers for the small versions. We also give exponential-time algorithms giving provably good solutions, and hardness results suggesting that a significant speedup in the worst case is unlikely. △ Less

Submitted 13 May, 2019; originally announced May 2019.

Comments: To appear in ICML 2019

arXiv:1902.00995 [pdf, ps, other]

Minimax experimental design: Bridging the gap between statistical and worst-case approaches to least squares regression

Authors: Michał Dereziński, Kenneth L. Clarkson, Michael W. Mahoney, Manfred K. Warmuth

Abstract: In experimental design, we are given a large collection of vectors, each with a hidden response value that we assume derives from an underlying linear model, and we wish to pick a small subset of the vectors such that querying the corresponding responses will lead to a good estimator of the model. A classical approach in statistics is to assume the responses are linear, plus zero-mean i.i.d. Gauss… ▽ More In experimental design, we are given a large collection of vectors, each with a hidden response value that we assume derives from an underlying linear model, and we wish to pick a small subset of the vectors such that querying the corresponding responses will lead to a good estimator of the model. A classical approach in statistics is to assume the responses are linear, plus zero-mean i.i.d. Gaussian noise, in which case the goal is to provide an unbiased estimator with smallest mean squared error (A-optimal design). A related approach, more common in computer science, is to assume the responses are arbitrary but fixed, in which case the goal is to estimate the least squares solution using few responses, as quickly as possible, for worst-case inputs. Despite many attempts, characterizing the relationship between these two approaches has proven elusive. We address this by proposing a framework for experimental design where the responses are produced by an arbitrary unknown distribution. We show that there is an efficient randomized experimental design procedure that achieves strong variance bounds for an unbiased estimator using few responses in this general model. Nearly tight bounds for the classical A-optimality criterion, as well as improved bounds for worst-case responses, emerge as special cases of this result. In the process, we develop a new algorithm for a joint sampling distribution called volume sampling, and we propose a new i.i.d. importance sampling method: inverse score sampling. A key novelty of our analysis is in develo** new expected error bounds for worst-case regression by controlling the tail behavior of i.i.d. sampling via the jointness of volume sampling. Our result motivates a new minimax-optimality criterion for experimental design which can be viewed as an extension of both A-optimal design and sampling for worst-case regression. △ Less

Submitted 3 February, 2019; originally announced February 2019.

arXiv:1807.09754 [pdf]

Data Infrastructure and Approaches for Ontology-Based Drug Repurposing

Authors: Stephen Boyer, Thomas Griffin, Sarath Swaminathan, Kenneth L. Clarkson, Dmitry Zubarev

Abstract: We report development of a data infrastructure for drug repurposing that takes advantage of two currently available chemical ontologies. The data infrastructure includes a database of compound- target associations augmented with molecular ontological labels. It also contains two computational tools for prediction of new associations. We describe two drug-repurposing systems: one, Nascent Ontologic… ▽ More We report development of a data infrastructure for drug repurposing that takes advantage of two currently available chemical ontologies. The data infrastructure includes a database of compound- target associations augmented with molecular ontological labels. It also contains two computational tools for prediction of new associations. We describe two drug-repurposing systems: one, Nascent Ontological Information Retrieval for Drug Repurposing (NOIR-DR), based on an information retrieval strategy, and another, based on non-negative matrix factorization together with compound similarity, that was inspired by recommender systems. We report the performance of both tools on a drug-repurposing task. △ Less

Submitted 12 July, 2018; originally announced July 2018.

Comments: 17 pages

arXiv:1611.03225 [pdf, ps, other]

Sharper Bounds for Regularized Data Fitting

Authors: Haim Avron, Kenneth L. Clarkson, David P. Woodruff

Abstract: We study matrix sketching methods for regularized variants of linear regression, low rank approximation, and canonical correlation analysis. Our main focus is on sketching techniques which preserve the objective function value for regularized problems, which is an area that has remained largely unexplored. We study regularization both in a fairly broad setting, and in the specific context of the p… ▽ More We study matrix sketching methods for regularized variants of linear regression, low rank approximation, and canonical correlation analysis. Our main focus is on sketching techniques which preserve the objective function value for regularized problems, which is an area that has remained largely unexplored. We study regularization both in a fairly broad setting, and in the specific context of the popular and widely used technique of ridge regularization; for the latter, as applied to each of these problems, we show algorithmic resource bounds in which the {\em statistical dimension} appears in places where in previous bounds the rank would appear. The statistical dimension is always smaller than the rank, and decreases as the amount of regularization increases. In particular, for the ridge low-rank approximation problem $\min_{Y,X} \lVert YX - A \rVert_F^2 + λ\lVert Y\rVert_F^2 + λ\lVert X \rVert_F^2$, where $Y\in\mathbb{R}^{n\times k}$ and $X\in\mathbb{R}^{k\times d}$, we give an approximation algorithm needing \[ O(\mathtt{nnz}(A)) + \tilde{O}((n+d)\varepsilon^{-1}k \min\{k, \varepsilon^{-1}\mathtt{sd}_λ(Y^*)\})+ \mathtt{poly}(\mathtt{sd}_λ(Y^*) \varepsilon^{-1}) \] time, where $s_λ(Y^*)\le k$ is the statistical dimension of $Y^*$, $Y^*$ is an optimal $Y$, $\varepsilon$ is an error parameter, and $\mathtt{nnz}(A)$ is the number of nonzero entries of $A$.This is faster than prior work, even when $λ=0$. We also study regularization in a much more general setting. For example, we obtain sketching-based algorithms for the low-rank approximation problem $\min_{X,Y} \lVert YX - A \rVert_F^2 + f(Y,X)$ where $f(\cdot,\cdot)$ is a regularizing function satisfying some very general conditions (chiefly, invariance under orthogonal transformations). △ Less

Submitted 26 June, 2017; v1 submitted 10 November, 2016; originally announced November 2016.

arXiv:1611.03220 [pdf, ps, other]

Faster Kernel Ridge Regression Using Sketching and Preconditioning

Authors: Haim Avron, Kenneth L. Clarkson, David P. Woodruff

Abstract: Kernel Ridge Regression (KRR) is a simple yet powerful technique for non-parametric regression whose computation amounts to solving a linear system. This system is usually dense and highly ill-conditioned. In addition, the dimensions of the matrix are the same as the number of data points, so direct methods are unrealistic for large-scale datasets. In this paper, we propose a preconditioning techn… ▽ More Kernel Ridge Regression (KRR) is a simple yet powerful technique for non-parametric regression whose computation amounts to solving a linear system. This system is usually dense and highly ill-conditioned. In addition, the dimensions of the matrix are the same as the number of data points, so direct methods are unrealistic for large-scale datasets. In this paper, we propose a preconditioning technique for accelerating the solution of the aforementioned linear system. The preconditioner is based on random feature maps, such as random Fourier features, which have recently emerged as a powerful technique for speeding up and scaling the training of kernel-based methods, such as kernel ridge regression, by resorting to approximations. However, random feature maps only provide crude approximations to the kernel function, so delivering state-of-the-art results by directly solving the approximated system requires the number of random features to be very large. We show that random feature maps can be much more effective in forming preconditioners, since under certain conditions a not-too-large number of random features is sufficient to yield an effective preconditioner. We empirically evaluate our method and show it is highly effective for datasets of up to one million training examples. △ Less

Submitted 15 July, 2017; v1 submitted 10 November, 2016; originally announced November 2016.

arXiv:1512.04226 [pdf, ps, other]

Random Sampling with Removal

Authors: Kenneth L. Clarkson, Bernd Gärtner, Johannes Lengler, May Szedlak

Abstract: We study randomized algorithms for constrained optimization, in abstract frameworks that include, in strictly increasing generality: convex programming; LP-type problems; violator spaces; and a setting we introduce, consistent spaces. Such algorithms typically involve a step of finding the optimal solution for a random sample of the constraints. They exploit the condition that, in finite dimension… ▽ More We study randomized algorithms for constrained optimization, in abstract frameworks that include, in strictly increasing generality: convex programming; LP-type problems; violator spaces; and a setting we introduce, consistent spaces. Such algorithms typically involve a step of finding the optimal solution for a random sample of the constraints. They exploit the condition that, in finite dimension $δ$, this sample optimum violates a provably small expected fraction of the non-sampled constraints, with the fraction decreasing in the sample size~$r$. We extend such algorithms by considering the technique of removal, where a fixed number $k$ of constraints are removed from the sample according to a fixed rule, with the goal of improving the solution quality. This may have the effect of increasing the number of violated non-sampled constraints. We study this increase, and bound it in a variety of general settings. This work is motivated by, and extends, results on removal as proposed for chance-constrained optimization. For many relevant values of $r$, $δ$, and $k$, we prove matching upper and lower bounds for the expected number of constraints violated by a random sample, after the removal of $k$ constraints. For a large range of values of $k$, the new upper bounds improve the previously best bounds for LP-type problems, which moreover had only been known in special cases, and not in the generality we consider. Moreover, we show that our results extend from finite to infinite spaces, for chance-constrained optimization. △ Less

Submitted 31 May, 2019; v1 submitted 14 December, 2015; originally announced December 2015.

arXiv:1510.06073 [pdf, ps, other]

Input Sparsity and Hardness for Robust Subspace Approximation

Authors: Kenneth L. Clarkson, David P. Woodruff

Abstract: In the subspace approximation problem, we seek a k-dimensional subspace F of R^d that minimizes the sum of p-th powers of Euclidean distances to a given set of n points a_1, ..., a_n in R^d, for p >= 1. More generally than minimizing sum_i dist(a_i,F)^p,we may wish to minimize sum_i M(dist(a_i,F)) for some loss function M(), for example, M-Estimators, which include the Huber and Tukey loss functio… ▽ More In the subspace approximation problem, we seek a k-dimensional subspace F of R^d that minimizes the sum of p-th powers of Euclidean distances to a given set of n points a_1, ..., a_n in R^d, for p >= 1. More generally than minimizing sum_i dist(a_i,F)^p,we may wish to minimize sum_i M(dist(a_i,F)) for some loss function M(), for example, M-Estimators, which include the Huber and Tukey loss functions. Such subspaces provide alternatives to the singular value decomposition (SVD), which is the p=2 case, finding such an F that minimizes the sum of squares of distances. For p in [1,2), and for typical M-Estimators, the minimizing $F$ gives a solution that is more robust to outliers than that provided by the SVD. We give several algorithmic and hardness results for these robust subspace approximation problems. We think of the n points as forming an n x d matrix A, and letting nnz(A) denote the number of non-zero entries of A. Our results hold for p in [1,2). We use poly(n) to denote n^{O(1)} as n -> infty. We obtain: (1) For minimizing sum_i dist(a_i,F)^p, we give an algorithm running in O(nnz(A) + (n+d)poly(k/eps) + exp(poly(k/eps))), (2) we show that the problem of minimizing sum_i dist(a_i, F)^p is NP-hard, even to output a (1+1/poly(d))-approximation, answering a question of Kannan and Vempala, and complementing prior results which held for p >2, (3) For loss functions for a wide class of M-Estimators, we give a problem-size reduction: for a parameter K=(log n)^{O(log k)}, our reduction takes O(nnz(A) log n + (n+d) poly(K/eps)) time to reduce the problem to a constrained version involving matrices whose dimensions are poly(K eps^{-1} log n). We also give bicriteria solutions, (4) Our techniques lead to the first O(nnz(A) + poly(d/eps)) time algorithms for (1+eps)-approximate regression for a wide class of convex M-Estimators. △ Less

Submitted 20 October, 2015; originally announced October 2015.

Comments: paper appeared in FOCS, 2015

arXiv:1211.0952 [pdf, other]

doi 10.1137/12089702X

Self-improving Algorithms for Coordinate-Wise Maxima and Convex Hulls

Authors: Kenneth L. Clarkson, Wolfgang Mulzer, C. Seshadhri

Abstract: Finding the coordinate-wise maxima and the convex hull of a planar point set are probably the most classic problems in computational geometry. We consider these problems in the self-improving setting. Here, we have $n$ distributions $\mathcal{D}_1, \ldots, \mathcal{D}_n$ of planar points. An input point set $(p_1, \ldots, p_n)$ is generated by taking an independent sample $p_i$ from each… ▽ More Finding the coordinate-wise maxima and the convex hull of a planar point set are probably the most classic problems in computational geometry. We consider these problems in the self-improving setting. Here, we have $n$ distributions $\mathcal{D}_1, \ldots, \mathcal{D}_n$ of planar points. An input point set $(p_1, \ldots, p_n)$ is generated by taking an independent sample $p_i$ from each $\mathcal{D}_i$, so the input is distributed according to the product $\mathcal{D} = \prod_i \mathcal{D}_i$. A self-improving algorithm repeatedly gets inputs from the distribution $\mathcal{D}$ (which is a priori unknown), and it tries to optimize its running time for $\mathcal{D}$. The algorithm uses the first few inputs to learn salient features of the distribution $\mathcal{D}$, before it becomes fine-tuned to $\mathcal{D}$. Let $\text{OPTMAX}_\mathcal{D}$ (resp. $\text{OPTCH}_\mathcal{D}$) be the expected depth of an \emph{optimal} linear comparison tree computing the maxima (resp. convex hull) for $\mathcal{D}$. Our maxima algorithm eventually achieves expected running time $O(\text{OPTMAX}_\mathcal{D} + n)$. Furthermore, we give a self-improving algorithm for convex hulls with expected running time $O(\text{OPTCH}_\mathcal{D} + n\log\log n)$. Our results require new tools for understanding linear comparison trees. In particular, we convert a general linear comparison tree to a restricted version that can then be related to the running time of our algorithms. Another interesting feature is an interleaved search procedure to determine the likeliest point to be extremal with minimal computation. This allows our algorithms to be competitive with the optimal algorithm for $\mathcal{D}$. △ Less

Submitted 11 January, 2014; v1 submitted 5 November, 2012; originally announced November 2012.

Comments: 39 pages, 17 figures; thoroughly revised presentation; preliminary versions appeared at SODA 2010 and SoCG 2012

Journal ref: SIAM Journal on Computing (SICOMP), 43(2), 2014, pp. 617-653

arXiv:1207.6365 [pdf, other]

Low Rank Approximation and Regression in Input Sparsity Time

Authors: Kenneth L. Clarkson, David P. Woodruff

Abstract: We design a new distribution over $\poly(r \eps^{-1}) \times n$ matrices $S$ so that for any fixed $n \times d$ matrix $A$ of rank $r$, with probability at least 9/10, $\norm{SAx}_2 = (1 \pm \eps)\norm{Ax}_2$ simultaneously for all $x \in \mathbb{R}^d$. Such a matrix $S$ is called a \emph{subspace embedding}. Furthermore, $SA$ can be computed in $\nnz(A) + \poly(d \eps^{-1})$ time, where… ▽ More We design a new distribution over $\poly(r \eps^{-1}) \times n$ matrices $S$ so that for any fixed $n \times d$ matrix $A$ of rank $r$, with probability at least 9/10, $\norm{SAx}_2 = (1 \pm \eps)\norm{Ax}_2$ simultaneously for all $x \in \mathbb{R}^d$. Such a matrix $S$ is called a \emph{subspace embedding}. Furthermore, $SA$ can be computed in $\nnz(A) + \poly(d \eps^{-1})$ time, where $\nnz(A)$ is the number of non-zero entries of $A$. This improves over all previous subspace embeddings, which required at least $Ω(nd \log d)$ time to achieve this property. We call our matrices $S$ \emph{sparse embedding matrices}. Using our sparse embedding matrices, we obtain the fastest known algorithms for $(1+\eps)$-approximation for overconstrained least-squares regression, low-rank approximation, approximating all leverage scores, and $\ell_p$-regression. The leading order term in the time complexity of our algorithms is $O(\nnz(A))$ or $O(\nnz(A)\log n)$. We optimize the low-order $\poly(d/\eps)$ terms in our running times (or for rank-$k$ approximation, the $n*\poly(k/eps)$ term), and show various tradeoffs. For instance, we also use our methods to design new preconditioners that improve the dependence on $\eps$ in least squares regression to $\log 1/\eps$. Finally, we provide preliminary experimental results which suggest that our algorithms are competitive in practice. △ Less

Submitted 5 April, 2013; v1 submitted 26 July, 2012; originally announced July 2012.

Comments: Included optimization of subspace embedding dimension from (d/eps)^4 to O~(d/eps)^2 in Section 4, by more careful analysis of perfect hashing, and minor improvements to regression / low rank approximation because of it

arXiv:1207.4684 [pdf, other]

The Fast Cauchy Transform and Faster Robust Linear Regression

Authors: Kenneth L. Clarkson, Petros Drineas, Malik Magdon-Ismail, Michael W. Mahoney, Xiangrui Meng, David P. Woodruff

Abstract: We provide fast algorithms for overconstrained $\ell_p$ regression and related problems: for an $n\times d$ input matrix $A$ and vector $b\in\mathbb{R}^n$, in $O(nd\log n)$ time we reduce the problem $\min_{x\in\mathbb{R}^d} \|Ax-b\|_p$ to the same problem with input matrix $\tilde A$ of dimension $s \times d$ and corresponding $\tilde b$ of dimension $s\times 1$. Here, $\tilde A$ and $\tilde b$ a… ▽ More We provide fast algorithms for overconstrained $\ell_p$ regression and related problems: for an $n\times d$ input matrix $A$ and vector $b\in\mathbb{R}^n$, in $O(nd\log n)$ time we reduce the problem $\min_{x\in\mathbb{R}^d} \|Ax-b\|_p$ to the same problem with input matrix $\tilde A$ of dimension $s \times d$ and corresponding $\tilde b$ of dimension $s\times 1$. Here, $\tilde A$ and $\tilde b$ are a coreset for the problem, consisting of sampled and rescaled rows of $A$ and $b$; and $s$ is independent of $n$ and polynomial in $d$. Our results improve on the best previous algorithms when $n\gg d$, for all $p\in[1,\infty)$ except $p=2$. We also provide a suite of improved results for finding well-conditioned bases via ellipsoidal rounding, illustrating tradeoffs between running time and conditioning quality, including a one-pass conditioning algorithm for general $\ell_p$ problems. We also provide an empirical evaluation of implementations of our algorithms for $p=1$, comparing them with related algorithms. Our empirical results show that, in the asymptotic regime, the theory is a very good guide to the practical performance of these algorithms. Our algorithms use our faster constructions of well-conditioned bases for $\ell_p$ spaces and, for $p=1$, a fast subspace embedding of independent interest that we call the Fast Cauchy Transform: a distribution over matrices $Π:\mathbb{R}^n\mapsto \mathbb{R}^{O(d\log d)}$, found obliviously to $A$, that approximately preserves the $\ell_1$ norms: that is, with large probability, simultaneously for all $x$, $\|Ax\|_1 \approx \|ΠAx\|_1$, with distortion $O(d^{2+η})$, for an arbitrarily small constant $η>0$; and, moreover, $ΠA$ can be computed in $O(nd\log d)$ time. The techniques underlying our Fast Cauchy Transform include fast Johnson-Lindenstrauss transforms, low-coherence matrices, and rescaling by Cauchy random variables. △ Less

Submitted 5 April, 2014; v1 submitted 19 July, 2012; originally announced July 2012.

Comments: 48 pages; substantially extended and revised; short version in SODA 2013

arXiv:1204.0824 [pdf, other]

doi 10.1137/12089702X

Self-improving Algorithms for Coordinate-wise Maxima

Authors: Kenneth L. Clarkson, Wolfgang Mulzer, C. Seshadhri

Abstract: Computing the coordinate-wise maxima of a planar point set is a classic and well-studied problem in computational geometry. We give an algorithm for this problem in the \emph{self-improving setting}. We have $n$ (unknown) independent distributions $\cD_1, \cD_2, ..., \cD_n$ of planar points. An input pointset $(p_1, p_2, ..., p_n)$ is generated by taking an independent sample $p_i$ from each… ▽ More Computing the coordinate-wise maxima of a planar point set is a classic and well-studied problem in computational geometry. We give an algorithm for this problem in the \emph{self-improving setting}. We have $n$ (unknown) independent distributions $\cD_1, \cD_2, ..., \cD_n$ of planar points. An input pointset $(p_1, p_2, ..., p_n)$ is generated by taking an independent sample $p_i$ from each $\cD_i$, so the input distribution $\cD$ is the product $\prod_i \cD_i$. A self-improving algorithm repeatedly gets input sets from the distribution $\cD$ (which is \emph{a priori} unknown) and tries to optimize its running time for $\cD$. Our algorithm uses the first few inputs to learn salient features of the distribution, and then becomes an optimal algorithm for distribution $\cD$. Let $\OPT_\cD$ denote the expected depth of an \emph{optimal} linear comparison tree computing the maxima for distribution $\cD$. Our algorithm eventually has an expected running time of $O(\text{OPT}_\cD + n)$, even though it did not know $\cD$ to begin with. Our result requires new tools to understand linear comparison trees for computing maxima. We show how to convert general linear comparison trees to very restricted versions, which can then be related to the running time of our algorithm. An interesting feature of our algorithm is an interleaved search, where the algorithm tries to determine the likeliest point to be maximal with minimal computation. This allows the running time to be truly optimal for the distribution $\cD$. △ Less

Submitted 3 April, 2012; originally announced April 2012.

Comments: To appear in Symposium of Computational Geometry 2012 (17 pages, 2 figures)

Journal ref: SIAM Journal on Computing (SICOMP), 43(2), 2014, pp. 617-653

arXiv:1010.4408 [pdf, other]

Sublinear Optimization for Machine Learning

Authors: Kenneth L. Clarkson, Elad Hazan, David P. Woodruff

Abstract: We give sublinear-time approximation algorithms for some optimization problems arising in machine learning, such as training linear classifiers and finding minimum enclosing balls. Our algorithms can be extended to some kernelized versions of these problems, such as SVDD, hard margin SVM, and L2-SVM, for which sublinear-time algorithms were not known before. These new algorithms use a combination… ▽ More We give sublinear-time approximation algorithms for some optimization problems arising in machine learning, such as training linear classifiers and finding minimum enclosing balls. Our algorithms can be extended to some kernelized versions of these problems, such as SVDD, hard margin SVM, and L2-SVM, for which sublinear-time algorithms were not known before. These new algorithms use a combination of a novel sampling techniques and a new multiplicative update algorithm. We give lower bounds which show the running times of many of our algorithms to be nearly best possible in the unit-cost RAM model. We also give implementations of our algorithms in the semi-streaming setting, obtaining the first low pass polylogarithmic space and sublinear time algorithms achieving arbitrary approximation factor. △ Less

Submitted 21 October, 2010; originally announced October 2010.

Comments: extended abstract appeared in FOCS 2010

arXiv:0909.0537 [pdf, ps, other]

On the Set Multi-Cover Problem in Geometric Settings

Authors: Chandra Chekuri, Kenneth L. Clarkson, Sariel Har-Peled

Abstract: We consider the set multi-cover problem in geometric settings. Given a set of points P and a collection of geometric shapes (or sets) F, we wish to find a minimum cardinality subset of F such that each point p in P is covered by (contained in) at least d(p) sets. Here d(p) is an integer demand (requirement) for p. When the demands d(p)=1 for all p, this is the standard set cover problem. The set… ▽ More We consider the set multi-cover problem in geometric settings. Given a set of points P and a collection of geometric shapes (or sets) F, we wish to find a minimum cardinality subset of F such that each point p in P is covered by (contained in) at least d(p) sets. Here d(p) is an integer demand (requirement) for p. When the demands d(p)=1 for all p, this is the standard set cover problem. The set cover problem in geometric settings admits an approximation ratio that is better than that for the general version. In this paper, we show that similar improvements can be obtained for the multi-cover problem as well. In particular, we obtain an O(log Opt) approximation for set systems of bounded VC-dimension, where Opt is the cardinality of an optimal solution, and an O(1) approximation for covering points by half-spaces in three dimensions and for some other classes of shapes. △ Less

Submitted 2 September, 2009; originally announced September 2009.

arXiv:0907.0884 [pdf, ps, other]

doi 10.1137/090766437

Self-Improving Algorithms

Authors: Nir Ailon, Bernard Chazelle, Kenneth L. Clarkson, Ding Liu, Wolfgang Mulzer, C. Seshadhri

Abstract: We investigate ways in which an algorithm can improve its expected performance by fine-tuning itself automatically with respect to an unknown input distribution D. We assume here that D is of product type. More precisely, suppose that we need to process a sequence I_1, I_2, ... of inputs I = (x_1, x_2, ..., x_n) of some fixed length n, where each x_i is drawn independently from some arbitrary, unk… ▽ More We investigate ways in which an algorithm can improve its expected performance by fine-tuning itself automatically with respect to an unknown input distribution D. We assume here that D is of product type. More precisely, suppose that we need to process a sequence I_1, I_2, ... of inputs I = (x_1, x_2, ..., x_n) of some fixed length n, where each x_i is drawn independently from some arbitrary, unknown distribution D_i. The goal is to design an algorithm for these inputs so that eventually the expected running time will be optimal for the input distribution D = D_1 * D_2 * ... * D_n. We give such self-improving algorithms for two problems: (i) sorting a sequence of numbers and (ii) computing the Delaunay triangulation of a planar point set. Both algorithms achieve optimal expected limiting complexity. The algorithms begin with a training phase during which they collect information about the input distribution, followed by a stationary regime in which the algorithms settle to their optimized incarnations. △ Less

Submitted 18 October, 2010; v1 submitted 5 July, 2009; originally announced July 2009.

Comments: 26 pages, 8 figures, preliminary versions appeared at SODA 2006 and SoCG 2008. Thorough revision to improve the presentation of the paper

ACM Class: F.2.2; D.1; F.1.1; I.2.6

Journal ref: SIAM Journal on Computing (SICOMP), 40(2), 2011, pp. 350-375

arXiv:cs/0501045 [pdf, ps, other]

Improved Approximation Algorithms for Geometric Set Cover

Authors: Kenneth L. Clarkson, Kasturi Varadarajan

Abstract: Given a collection S of subsets of some set U, and M a subset of U, the set cover problem is to find the smallest subcollection C of S such that M is a subset of the union of the sets in C. While the general problem is NP-hard to solve, even approximately, here we consider some geometric special cases, where usually U = R^d. Extending prior results, we show that approximation algorithms with pro… ▽ More Given a collection S of subsets of some set U, and M a subset of U, the set cover problem is to find the smallest subcollection C of S such that M is a subset of the union of the sets in C. While the general problem is NP-hard to solve, even approximately, here we consider some geometric special cases, where usually U = R^d. Extending prior results, we show that approximation algorithms with provable performance exist, under a certain general condition: that for a random subset R of S and function f(), there is a decomposition of the portion of U not covered by R into an expected f(|R|) regions, each region of a particular simple form. We show that under this condition, a cover of size O(f(|C|)) can be found. Our proof involves the generalization of shallow cuttings to more general geometric situations. We obtain constant-factor approximation algorithms for covering by unit cubes in R^3, for guarding a one-dimensional terrain, and for covering by similar-sized fat triangles in R^2. We also obtain improved approximation guarantees for fat triangles, of arbitrary size, and for a class of fat objects. △ Less

Submitted 20 January, 2005; originally announced January 2005.

ACM Class: F.2.2

Showing 1–25 of 25 results for author: Clarkson, K L