-
cuFINUFFT: a load-balanced GPU library for general-purpose nonuniform FFTs
Authors:
Yu-hsuan Shih,
Garrett Wright,
Joakim Andén,
Johannes Blaschke,
Alex H. Barnett
Abstract:
Nonuniform fast Fourier transforms dominate the computational cost in many applications including image reconstruction and signal processing. We thus present a general-purpose GPU-based CUDA library for type 1 (nonuniform to uniform) and type 2 (uniform to nonuniform) transforms in dimensions 2 and 3, in single or double precision. It achieves high performance for a given user-requested accuracy, regardless of the distribution of nonuniform points, via cache-aware point reordering, and load-balanced blocked spreading in shared memory. At low accuracies, this gives on-GPU throughputs around $10^9$ nonuniform points per second, and (even including host-device transfer) is typically 4-10$\times$ faster than the latest parallel CPU code FINUFFT (at 28 threads). It is competitive with two established GPU codes, being up to 90$\times$ faster at high accuracy and/or type 1 clustered point distributions. Finally we demonstrate a 5-12$\times$ speedup versus CPU in an X-ray diffraction 3D iterative reconstruction task at $10^{-12}$ accuracy, observing excellent multi-GPU weak scaling up to one rank per GPU.
Submitted 25 March, 2021; v1 submitted 16 February, 2021;
originally announced February 2021.
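For reference, the type 1 and type 2 transforms named in the abstract above can be written as direct sums. The following is a minimal NumPy sketch of those sums in 2D, not the library's fast algorithm or its API. The function names, the sign conventions (`isign`), and the centered mode ordering are assumptions made for illustration.

```python
import numpy as np

def _modes(n):
    # Integer frequencies -n/2 .. n/2-1 (assumed ordering for this sketch).
    return np.arange(-(n // 2), (n + 1) // 2)

def nudft2d_type1(x, y, c, n1, n2, isign=1):
    """Nonuniform to uniform: F[k1,k2] = sum_j c[j] exp(isign*1j*(k1*x[j] + k2*y[j]))."""
    e1 = np.exp(1j * isign * np.outer(_modes(n1), x))  # shape (n1, M)
    e2 = np.exp(1j * isign * np.outer(_modes(n2), y))  # shape (n2, M)
    return (e1 * c) @ e2.T                             # shape (n1, n2)

def nudft2d_type2(x, y, f, isign=-1):
    """Uniform to nonuniform: c[j] = sum_{k1,k2} f[k1,k2] exp(isign*1j*(k1*x[j] + k2*y[j]))."""
    n1, n2 = f.shape
    e1 = np.exp(1j * isign * np.outer(_modes(n1), x))
    e2 = np.exp(1j * isign * np.outer(_modes(n2), y))
    return np.einsum("km,kl,lm->m", e1, f, e2)

# Small random problem: M nonuniform points in [-pi, pi)^2, n1 x n2 modes.
rng = np.random.default_rng(0)
M, n1, n2 = 50, 8, 6
x, y = rng.uniform(-np.pi, np.pi, (2, M))
c = rng.standard_normal(M) + 1j * rng.standard_normal(M)
f = rng.standard_normal((n1, n2)) + 1j * rng.standard_normal((n1, n2))

# With opposite signs, type 2 is the adjoint of type 1: <f, T1 c> = <T2 f, c>.
lhs = np.vdot(f, nudft2d_type1(x, y, c, n1, n2))
rhs = np.vdot(nudft2d_type2(x, y, f), c)
```

These direct sums cost O(M n1 n2) and are only practical as a correctness check for a fast implementation on small problems.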
-
Factorization of the translation kernel for fast rigid image alignment
Authors:
Aaditya Rangan,
Marina Spivak,
Joakim Andén,
Alex Barnett
Abstract:
An important component of many image alignment methods is the calculation of inner products (correlations) between an image of $n\times n$ pixels and another image translated by some shift and rotated by some angle. For robust alignment of an image pair, the number of considered shifts and angles is typically high, thus the inner product calculation becomes a bottleneck. Existing methods, based on fast Fourier transforms (FFTs), compute all such inner products with computational complexity $\mathcal{O}(n^3 \log n)$ per image pair, which is reduced to $\mathcal{O}(N n^2)$ if only $N$ distinct shifts are needed. We propose to use a factorization of the translation kernel (FTK), an optimal interpolation method which represents images in a Fourier--Bessel basis and uses a rank-$H$ approximation of the translation kernel via an operator singular value decomposition (SVD). Its complexity is $\mathcal{O}(Hn(n + N))$ per image pair. We prove that $H = \mathcal{O}((W + \log(1/ε))^2)$, where $2W$ is the magnitude of the maximum desired shift in pixels and $ε$ is the desired accuracy. For fixed $W$ this leads to an acceleration when $N$ is large, such as when sub-pixel shift grids are considered. Finally, we present numerical results in an electron cryomicroscopy application showing speedup factors of $3$-$10$ with respect to the state of the art.
Submitted 4 October, 2019; v1 submitted 29 May, 2019;
originally announced May 2019.
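The low-rank structure that the abstract above exploits can be seen in a one-dimensional toy analog. The sketch below samples the kernel exp(-i k d) on a grid of frequencies k in [-pi, pi] and shifts d in [-W, W], then counts the singular values above a relative tolerance. It is a plain matrix SVD on an assumed discretization, not the Fourier-Bessel operator SVD of the paper, and the function name and grid sizes are hypothetical.

```python
import numpy as np

def translation_kernel_rank(W, eps=1e-6, n_freq=128, n_shift=128):
    """Numerical rank of a sampled 1D translation kernel and the truncation error."""
    k = np.linspace(-np.pi, np.pi, n_freq)   # frequencies (radians per pixel)
    d = np.linspace(-W, W, n_shift)          # shifts in pixels
    T = np.exp(-1j * np.outer(k, d))         # kernel matrix, shape (n_freq, n_shift)
    U, s, Vh = np.linalg.svd(T, full_matrices=False)
    H = int(np.sum(s > eps * s[0]))          # number of terms kept
    T_H = (U[:, :H] * s[:H]) @ Vh[:H]        # rank-H approximation
    err = np.linalg.norm(T - T_H, 2) / s[0]  # relative spectral-norm error
    return H, err

H_small, err_small = translation_kernel_rank(1.0)
H_large, err_large = translation_kernel_rank(4.0)
```

In this toy setting the number of retained terms grows with the maximum shift W while staying well below the grid size, which is the kind of behavior the bound on H in the abstract quantifies for the 2D case.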
-
Multitaper estimation on arbitrary domains
Authors:
Joakim Andén,
José Luis Romero
Abstract:
Multitaper estimators have enjoyed significant success in estimating spectral densities from finite samples using as tapers Slepian functions defined on the acquisition domain. Unfortunately, the numerical calculation of these Slepian tapers is only tractable for certain symmetric domains, such as rectangles or disks. In addition, no performance bounds are currently available for the mean squared error of the spectral density estimate. This situation is inadequate for applications such as cryo-electron microscopy, where noise models must be estimated from irregular domains with small sample sizes. We show that the multitaper estimator only depends on the linear space spanned by the tapers. As a result, Slepian tapers may be replaced by proxy tapers spanning the same subspace (validating the common practice of using partially converged solutions to the Slepian eigenproblem as tapers). These proxies may consequently be calculated using standard numerical algorithms for block diagonalization. We also prove a set of performance bounds for multitaper estimators on arbitrary domains. The method is demonstrated on synthetic and experimental datasets from cryo-electron microscopy, where it reduces mean squared error by a factor of two or more compared to traditional methods.
Submitted 18 June, 2020; v1 submitted 7 December, 2018;
originally announced December 2018.
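The claim in the abstract above, that the multitaper estimate depends only on the space spanned by the tapers, can be checked numerically in one dimension. The sketch below builds Slepian tapers on an interval from the sinc kernel matrix, forms proxy tapers by applying a random orthogonal mixing, and compares the two estimates. The helper names, the 1D interval domain, and the restriction to orthonormal proxies are assumptions of this illustration.

```python
import numpy as np

def slepian_tapers(n, half_bw, k):
    """Top-k eigenvectors of the sinc kernel matrix (discrete Slepian sequences)."""
    idx = np.arange(n)
    diff = idx[:, None] - idx[None, :]
    A = 2 * half_bw * np.sinc(2 * half_bw * diff)  # sin(2*pi*W*t) / (pi*t)
    _, vecs = np.linalg.eigh(A)                    # eigenvalues in ascending order
    return vecs[:, -k:]                            # orthonormal columns, shape (n, k)

def multitaper(x, tapers):
    """Average of the tapered periodograms over all tapers."""
    spec = np.abs(np.fft.fft(tapers * x[:, None], axis=0)) ** 2
    return spec.mean(axis=1)

rng = np.random.default_rng(0)
n, k = 128, 5
x = rng.standard_normal(n)
tapers = slepian_tapers(n, 3.0 / n, k)

# Proxy tapers: a different orthonormal basis of the same subspace.
Q, _ = np.linalg.qr(rng.standard_normal((k, k)))
proxies = tapers @ Q

est_slepian = multitaper(x, tapers)
est_proxy = multitaper(x, proxies)
```

The two estimates agree to rounding error because the sum of squared projections onto an orthonormal basis equals the squared norm of the projection onto the subspace, which does not depend on the basis chosen.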
-
Factor Analysis for Spectral Estimation
Authors:
Joakim Andén,
Amit Singer
Abstract:
Power spectrum estimation is an important tool in many applications, such as the whitening of noise. The popular multitaper method enjoys significant success, but fails for short signals with few samples. We propose a statistical model where a signal is given by a random linear combination of fixed, yet unknown, stochastic sources. Given multiple such signals, we estimate the subspace spanned by the power spectra of these fixed sources. Projecting individual power spectrum estimates onto this subspace increases estimation accuracy. We provide accuracy guarantees for this method and demonstrate it on simulated and experimental data from cryo-electron microscopy.
Submitted 8 May, 2017; v1 submitted 15 February, 2017;
originally announced February 2017.
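The projection step described in the abstract above can be illustrated on synthetic data. The sketch below uses a plain truncated SVD as a stand-in for the subspace estimate, which is not necessarily the estimator used in the paper. The two source spectra, the weights, and the additive Gaussian noise model are all made up for this example.

```python
import numpy as np

def project_onto_spectral_subspace(estimates, rank):
    """Project each row (one spectrum estimate) onto the top-`rank` right singular subspace."""
    _, _, Vh = np.linalg.svd(estimates, full_matrices=False)
    basis = Vh[:rank]                    # orthonormal rows, shape (rank, n_freq)
    return estimates @ basis.T @ basis

rng = np.random.default_rng(0)
freqs = np.linspace(0.0, 0.5, 64)

# Two hypothetical source spectra: one low-pass, one band-pass.
sources = np.stack([np.exp(-freqs / 0.1),
                    np.exp(-((freqs - 0.3) / 0.05) ** 2)])

# Each signal's true spectrum is a nonnegative combination of the sources.
weights = rng.uniform(0.5, 2.0, (200, 2))
truth = weights @ sources

# Noisy per-signal estimates (synthetic noise, an assumption of this sketch).
noisy = truth + 0.1 * rng.standard_normal(truth.shape)
denoised = project_onto_spectral_subspace(noisy, rank=2)

err_raw = np.mean((noisy - truth) ** 2)
err_proj = np.mean((denoised - truth) ** 2)
```

In this setup the projection discards the noise components outside the estimated two-dimensional subspace, so the projected estimates are closer to the true spectra than the raw ones.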