-
Randomized matrix-free quadrature for spectrum and spectral sum approximation
Authors:
Tyler Chen,
Thomas Trogdon,
Shashanka Ubaru
Abstract:
We study randomized matrix-free quadrature algorithms for spectrum and spectral sum approximation. The algorithms studied are characterized by the use of a Krylov subspace method to approximate independent and identically distributed samples of $\mathbf{v}^{\mathsf{H}} f(\mathbf{A}) \mathbf{v}$, where $\mathbf{v}$ is an isotropic random vector, $\mathbf{A}$ is a Hermitian matrix, and $f(\mathbf{A})$ is a matrix function. This class of algorithms includes the kernel polynomial method and stochastic Lanczos quadrature, two widely used methods for approximating spectra and spectral sums. Our analysis, discussion, and numerical examples provide a unified framework for understanding randomized matrix-free quadrature algorithms and shed light on the commonalities and tradeoffs between them. Moreover, this framework provides new insights into the practical implementation and use of these algorithms, particularly with regard to parameter selection in the kernel polynomial method.
Submitted 2 September, 2022; v1 submitted 4 April, 2022;
originally announced April 2022.
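As a rough illustration of the class of algorithms this abstract describes, the following Python sketch estimates $\operatorname{tr}(f(\mathbf{A}))$ by averaging Lanczos quadrature approximations of $\mathbf{v}^{\mathsf{H}} f(\mathbf{A}) \mathbf{v}$ over random vectors. It is not taken from the paper: the function names `lanczos` and `slq_trace`, the use of full reorthogonalization, the breakdown threshold, and the choice of real Gaussian vectors normalized to the unit sphere are all assumptions made for this example.

```python
import numpy as np

def lanczos(A, v, k):
    """Run up to k Lanczos steps on symmetric A from start vector v.

    Returns the diagonal (alpha) and off-diagonal (beta) entries of the
    tridiagonal matrix. Full reorthogonalization is used for simplicity
    (an assumption of this sketch, not a requirement of the method).
    """
    n = v.shape[0]
    Q = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(max(k - 1, 0))
    q = v / np.linalg.norm(v)
    for j in range(k):
        Q[:, j] = q
        w = A @ q
        alpha[j] = q @ w
        # Orthogonalize against every Lanczos vector computed so far.
        w = w - Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        if j == k - 1:
            break
        b = np.linalg.norm(w)
        if b < 1e-12:
            # Invariant subspace found: stop early with a smaller matrix.
            return alpha[: j + 1], beta[:j]
        beta[j] = b
        q = w / b
    return alpha, beta

def slq_trace(A, f, k, num_vecs, seed=0):
    """Estimate tr(f(A)) by averaging Lanczos quadrature estimates."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    samples = []
    for _ in range(num_vecs):
        # lanczos() normalizes v, so v is uniform on the unit sphere.
        v = rng.standard_normal(n)
        alpha, beta = lanczos(A, v, k)
        T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
        theta, S = np.linalg.eigh(T)
        weights = S[0, :] ** 2  # squared first components of eigenvectors
        # For a unit vector v, n * v^T f(A) v has expectation tr(f(A)).
        samples.append(n * np.sum(weights * f(theta)))
    return float(np.mean(samples))
```

For example, `slq_trace(A, np.log, k=20, num_vecs=50)` gives a randomized estimate of the log-determinant of a symmetric positive definite `A`; the accuracy depends on both `k` and `num_vecs`.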
-
Analysis of stochastic Lanczos quadrature for spectrum approximation
Authors:
Tyler Chen,
Thomas Trogdon,
Shashanka Ubaru
Abstract:
The cumulative empirical spectral measure (CESM) $Φ[\mathbf{A}] : \mathbb{R} \to [0,1]$ of an $n\times n$ symmetric matrix $\mathbf{A}$ is defined as the fraction of eigenvalues of $\mathbf{A}$ less than or equal to a given threshold, i.e., $Φ[\mathbf{A}](x) := \sum_{i=1}^{n} \frac{1}{n} {\large\unicode{x1D7D9}}[ λ_i[\mathbf{A}]\leq x]$. Spectral sums $\operatorname{tr}(f[\mathbf{A}])$ can be computed as the Riemann--Stieltjes integral of $f$ against $Φ[\mathbf{A}]$, so the task of estimating the CESM arises frequently in a number of applications, including machine learning. We present an error analysis for stochastic Lanczos quadrature (SLQ). We show that SLQ obtains an approximation to the CESM within a Wasserstein distance of $t \: | λ_{\text{max}}[\mathbf{A}] - λ_{\text{min}}[\mathbf{A}] |$ with probability at least $1-η$, by applying the Lanczos algorithm for $\lceil 12 t^{-1} + \frac{1}{2} \rceil$ iterations to $\lceil 4 ( n+2 )^{-1}t^{-2} \ln(2nη^{-1}) \rceil$ vectors sampled independently and uniformly from the unit sphere. We additionally provide (matrix-dependent) a posteriori error bounds for the Wasserstein and Kolmogorov--Smirnov distances between the output of this algorithm and the true CESM. The quality of our bounds is demonstrated using numerical experiments.
Submitted 10 June, 2021; v1 submitted 13 May, 2021;
originally announced May 2021.
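To make the quantities in this abstract concrete, the short Python sketch below evaluates the two parameter counts stated in the bound and computes the CESM directly from a list of eigenvalues. The function names `slq_parameters` and `cesm` are hypothetical, chosen for this example only; the formulas are copied from the abstract, and the CESM is computed from known eigenvalues rather than estimated.

```python
import math
import numpy as np

def slq_parameters(n, t, eta):
    """Parameter counts from the bound quoted in the abstract.

    Returns (Lanczos iterations, number of random vectors) for an n x n
    matrix, Wasserstein tolerance t (relative to the spectral width),
    and failure probability eta.
    """
    iterations = math.ceil(12.0 / t + 0.5)
    vectors = math.ceil(4.0 / ((n + 2) * t**2) * math.log(2.0 * n / eta))
    return iterations, vectors

def cesm(eigenvalues, x):
    """Fraction of eigenvalues less than or equal to x."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    return float(np.mean(eigenvalues <= x))

# With n = 1000, t = 0.1, eta = 0.01 the bound asks for 121 Lanczos
# iterations applied to 5 random vectors.
print(slq_parameters(1000, 0.1, 0.01))
```

Note how the number of vectors shrinks as $n$ grows for fixed $t$ and $η$, which follows from the $(n+2)^{-1}$ factor in the bound.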
-
Stopping time signatures for some algorithms in cryptography
Authors:
Percy Deift,
Stephen D. Miller,
Thomas Trogdon
Abstract:
We consider the normalized distribution of the overall running times of some cryptographic algorithms, and what information they reveal about the algorithms. Recent work of Deift, Menon, Olver, Pfrang, and Trogdon has shown that certain numerical algorithms applied to large random matrices exhibit a characteristic distribution of running times, which depends only on the algorithm but is independent of the choice of probability distributions for the matrices. Different algorithms often exhibit different running time distributions, and so the histograms for these running time distributions provide a time-signature for the algorithms, making it possible, in many cases, to distinguish one algorithm from another. In this paper we extend this analysis to cryptographic algorithms, and present examples of such algorithms with time-signatures that are indistinguishable, and others with time-signatures that are clearly distinct.
Submitted 20 May, 2019;
originally announced May 2019.
-
The conjugate gradient algorithm on well-conditioned Wishart matrices is almost deterministic
Authors:
Percy Deift,
Thomas Trogdon
Abstract:
We prove that the number of iterations required to solve a random positive definite linear system with the conjugate gradient algorithm is almost deterministic for large matrices. We treat the case of Wishart matrices $W = XX^*$ where $X$ is $n \times m$ and $n/m \sim d$ for $0 < d < 1$. Precisely, we prove that for most choices of error tolerance, as the matrix increases in size, the probability that the iteration count deviates from an explicit deterministic value tends to zero. In addition, for a fixed iteration count, we show that the norm of the error vector and the norm of the residual converge exponentially fast in probability, converge in mean and converge almost surely.
Submitted 2 October, 2019; v1 submitted 25 January, 2019;
originally announced January 2019.
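The concentration of the iteration count described in this abstract can be observed numerically. The Python sketch below runs a textbook conjugate gradient iteration on several independently sampled Wishart matrices and records how many iterations each run needs. Several choices here are assumptions of the example rather than details from the paper: real Gaussian entries, the normalization $W = XX^{\mathsf{T}}/m$, a Gaussian right-hand side, a relative residual stopping rule, and the hypothetical function names `wishart`, `cg_iterations`, and `halting_counts`.

```python
import numpy as np

def wishart(n, m, rng):
    """Sample W = X X^T / m with X an n x m real Gaussian matrix."""
    X = rng.standard_normal((n, m))
    return X @ X.T / m

def cg_iterations(W, b, tol=1e-8):
    """Conjugate gradient on W x = b, started from x = 0.

    Returns (iterations used, approximate solution). Stops when the
    recursively updated residual norm drops below tol * ||b||.
    """
    n = b.shape[0]
    x = np.zeros(n)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    b_norm = np.linalg.norm(b)
    for k in range(1, n + 1):
        Wp = W @ p
        step = rs / (p @ Wp)
        x += step * p
        r -= step * Wp
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * b_norm:
            return k, x
        p = r + (rs_new / rs) * p
        rs = rs_new
    return n, x

def halting_counts(n, m, trials, tol=1e-8, seed=0):
    """Iteration counts of CG over independent Wishart samples."""
    rng = np.random.default_rng(seed)
    counts = []
    for _ in range(trials):
        W = wishart(n, m, rng)
        b = rng.standard_normal(n)
        k, _ = cg_iterations(W, b, tol)
        counts.append(k)
    return counts

# Ratio n/m = 0.25; the printed counts should be nearly identical.
print(halting_counts(n=200, m=800, trials=5))
```

Increasing `n` with `n/m` held fixed is one way to explore how the spread of the counts behaves as the matrix grows.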
-
Universal halting times in optimization and machine learning
Authors:
Levent Sagun,
Thomas Trogdon,
Yann LeCun
Abstract:
The authors present empirical distributions for the halting time (measured by the number of iterations to reach a given accuracy) of optimization algorithms applied to two random systems: spin glasses and deep learning. Given an algorithm, which we take to be both the optimization routine and the form of the random landscape, the fluctuations of the halting time follow a distribution that, after centering and scaling, remains unchanged even when the distribution on the landscape is changed. We observe two qualitative classes: a Gumbel-like distribution that appears in Google searches, human decision times, the QR eigenvalue algorithm and spin glasses, and a Gaussian-like distribution that appears in the conjugate gradient method, a deep network with MNIST input data, and a deep network with random input data. This empirical evidence suggests the presence of a class of distributions for which the halting time is independent of the underlying distribution under some conditions.
Submitted 20 February, 2017; v1 submitted 19 November, 2015;
originally announced November 2015.