-
Bits missing: Finding exotic pulsars using bfloat16 on NVIDIA GPUs
Authors:
Jack White,
Karel Adamek,
Jayanta Roy,
Sofia Dimoudi,
Scott M. Ransom,
Wesley Armour
Abstract:
The Fourier Domain Acceleration Search (FDAS) is an effective technique for detecting faint binary pulsars in large radio astronomy datasets. This paper quantifies the sensitivity impact of reducing numerical precision in the GPU-accelerated FDAS pipeline of the AstroAccelerate software package. The prior implementation used IEEE-754 single precision throughout the binary pulsar detection pipeline and spent a large fraction of its runtime computing GPU-accelerated FFTs. AstroAccelerate has been modified to use bfloat16 (and IEEE-754 double precision, to provide a "gold standard" comparison) within the Fourier domain convolution section of the FDAS routine. Approximately 20,000 synthetic filterbank files representing binary pulsars were generated using SIGPROC with a range of physical parameters. Each file was processed using bfloat16, single-precision and double-precision convolutions. All bfloat16 peaks are within 3% of the predicted signal-to-noise ratio of their corresponding single-precision peaks. Of 14,971 "bright" single-precision fundamental peaks above a power of 44.982 (our experimentally measured highest noise value), 14,602 (97.53%) have a peak in the same acceleration and frequency bin in the bfloat16 output plane. For the remaining 369, the nearest peak is located in the adjacent acceleration bin. No bin drift is measured between the single-precision and double-precision results. The bfloat16 version of FDAS achieves a speedup of approximately 1.6x compared to single precision. A comparison between AstroAccelerate and the PRESTO software package is presented using GMRT observations of PSR J1544+4937, a 2.16 ms black widow pulsar in a compact 2.8-hour orbit.
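What "reducing numerical precision" to bfloat16 means at the bit level can be illustrated with a small Python sketch. This is not AstroAccelerate's code; on NVIDIA GPUs, CUDA's `cuda_bf16.h` header provides `__float2bfloat16` for this conversion. bfloat16 keeps float32's sign bit and 8-bit exponent but only 7 of its 23 stored mantissa bits. A float32 value is therefore converted by rounding away its lower 16 bits; round-to-nearest-even is assumed here. The helper names are illustrative.

```python
import struct

def f32_bits(x: float) -> int:
    """IEEE-754 single-precision bit pattern of x (x must fit in float32 range)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(b: int) -> float:
    """Interpret a 32-bit pattern as an IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]

def to_bfloat16_bits(x: float) -> int:
    """Round x (via float32) to the upper 16 bits, i.e. bfloat16, nearest-even."""
    b = f32_bits(x)
    if (b & 0x7F800000) == 0x7F800000 and (b & 0x007FFFFF):
        return (b >> 16) | 0x0040          # keep a NaN a (quiet) NaN
    lsb = (b >> 16) & 1                    # exact ties go to the even neighbour
    return ((b + 0x7FFF + lsb) >> 16) & 0xFFFF

def from_bfloat16_bits(h: int) -> float:
    """Widen bfloat16 back to float32 by appending 16 zero mantissa bits."""
    return bits_f32(h << 16)

def bf16(x: float) -> float:
    """Value of x after a round trip through bfloat16."""
    return from_bfloat16_bits(to_bfloat16_bits(x))

print(bf16(3.14159265))     # pi keeps only ~3 significant decimal digits
print(bf16(1 + 3 * 2**-8))  # a tie, rounded to the even neighbour
```

Because only 8 significant bits survive (7 stored plus the implicit leading 1), the relative rounding error for normal numbers is at most about 2^-8 (roughly 0.4%). The unchanged 8-bit exponent keeps float32's dynamic range, which is why bfloat16 is attractive for FFT-based convolutions whose values span many orders of magnitude.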
Submitted 24 June, 2022;
originally announced June 2022.
-
Searching for pulsars in extreme orbits -- GPU acceleration of the Fourier domain 'jerk' search
Authors:
Karel Adámek,
Jan Novotný,
Sofia Dimoudi,
Wesley Armour
Abstract:
Binary pulsars are an important target for radio surveys because they provide a natural laboratory for a wide range of astrophysics, for example testing general relativity, including the detection of gravitational waves. The orbital motion of a pulsar locked in a binary system causes a frequency shift (a Doppler shift) in its normally very periodic pulse emission. These shifts reduce the sensitivity of traditional periodicity searches. To correct for this smearing, Ransom [2001] and Ransom et al. [2002] developed the Fourier domain acceleration search (FDAS), which uses a matched filtering technique. This method is, however, limited to a constant pulsar acceleration. Andersen and Ransom [2018] therefore broadened the Fourier domain acceleration search to also account for a linear change in acceleration, implementing the Fourier domain "jerk" search in the PRESTO software package. This extension significantly increases the number of matched filters used. We have implemented the Fourier domain "jerk" search (JERK) on GPUs using CUDA and achieved a 90x performance increase compared to the parallel implementation of JERK in PRESTO. This work is part of the AstroAccelerate project (Armour et al. [2019]), a many-core accelerated time-domain signal processing library for radio astronomy.
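Why the jerk extension multiplies the number of matched filters can be made concrete with a short Python sketch. It assumes the usual Fourier-bin parameterisation, with signs ignored. A constant line-of-sight acceleration a moves harmonic h of a pulsar with spin frequency f by z = h·f·a·T²/c bins over an observation of length T. A jerk j adds w = h·f·j·T³/c. The instantaneous frequency at fractional time u = t/T is then r0 + z·u + w·u²/2 bins. The grid steps `dz = 2` and `dw = 20` are assumptions for illustration, not values taken from the paper. All function names are hypothetical.

```python
C = 299_792_458.0  # speed of light in m/s

def z_bins(f_hz, accel, t_obs, harmonic=1):
    """Fourier-bin drift from a constant acceleration (m/s^2): z = h f a T^2 / c."""
    return harmonic * f_hz * accel * t_obs**2 / C

def w_bins(f_hz, jerk, t_obs, harmonic=1):
    """Jerk term for a linearly changing acceleration (m/s^3): w = h f j T^3 / c."""
    return harmonic * f_hz * jerk * t_obs**3 / C

def freq_bin(r0, z, w, u):
    """Instantaneous Fourier frequency in bins at fractional time u in [0, 1]."""
    return r0 + z * u + 0.5 * w * u * u

def grid_size(zmax, wmax, dz=2.0, dw=20.0):
    """Number of (z, w) matched filters on a symmetric grid (assumed step sizes)."""
    nz = int(round(2 * zmax / dz)) + 1
    nw = int(round(2 * wmax / dw)) + 1
    return nz, nw, nz * nw

# A 2 ms (500 Hz) pulsar accelerating at 10 m/s^2 over 20 minutes drifts ~24 bins.
print(z_bins(500.0, 10.0, 1200.0))
print(grid_size(200, 0))    # acceleration-only search
print(grid_size(200, 600))  # adding the jerk dimension
```

With these assumed steps, an acceleration-only search over |z| ≤ 200 needs 201 filters. Adding |w| ≤ 600 multiplies that by 61, to 12,261. This is the kind of growth that makes a GPU implementation worthwhile.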
Submitted 4 November, 2019;
originally announced November 2019.
-
GPU Fast Convolution via the Overlap-and-Save Method in Shared Memory
Authors:
Karel Adámek,
Sofia Dimoudi,
Mike Giles,
Wesley Armour
Abstract:
We present a GPU-tailored implementation of the overlap-and-save method, a method for convolving very long signals with short response functions. We have implemented several FFT algorithms (using the CUDA programming language) that exploit GPU shared memory, allowing for GPU-accelerated convolution. We compare our implementation with an implementation of the overlap-and-save algorithm that uses the NVIDIA FFT library (cuFFT). We demonstrate that by using a shared-memory-based FFT we can achieve significant speed-ups for certain problem sizes and lower the memory requirements of the overlap-and-save method on GPUs.
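The overlap-and-save idea can be sketched on the CPU in plain Python. This sketch is not the authors' CUDA code, and it uses a textbook recursive radix-2 FFT in place of their shared-memory FFTs. The signal is cut into overlapping segments of length `n_fft`, and each segment is multiplied by the filter spectrum in the Fourier domain. The first M - 1 outputs of each inverse transform, where M is the filter length, are corrupted by circular wrap-around and are discarded. The remaining outputs are exactly samples of the linear convolution. The function names and the default `n_fft = 16` are illustrative choices.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(spectrum):
    n = len(spectrum)
    return [v / n for v in fft(spectrum, inverse=True)]

def overlap_save(signal, h, n_fft=16):
    """Full linear convolution of a long signal with a short filter h."""
    m = len(h)
    if n_fft & (n_fft - 1) or n_fft < m:
        raise ValueError("n_fft must be a power of two and at least len(h)")
    step = n_fft - (m - 1)                  # valid outputs per segment
    H = fft(list(h) + [0.0] * (n_fft - m))  # filter spectrum, computed once
    padded = [0.0] * (m - 1) + list(signal) + [0.0] * n_fft
    out = []
    for start in range(0, len(signal) + m - 1, step):
        seg = padded[start:start + n_fft]
        y = ifft([a * b for a, b in zip(fft(seg), H)])
        out.extend(v.real for v in y[m - 1:])  # drop the m - 1 circularly aliased samples
    return out[:len(signal) + m - 1]

def direct_conv(x, h):
    """Reference O(N*M) linear convolution for checking the result."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for k, hv in enumerate(h):
            y[i + k] += xv * hv
    return y

x = [float((i * 7) % 11) - 5.0 for i in range(50)]
h = [0.5, -1.0, 2.0, 0.25]
print(max(abs(a - b) for a, b in zip(overlap_save(x, h), direct_conv(x, h))))
```

The filter spectrum is computed once and reused for every segment, and each segment needs only a short FFT. Segments that short can fit entirely in fast on-chip memory, which is the property the shared-memory GPU FFTs exploit.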
Submitted 10 April, 2020; v1 submitted 4 October, 2019;
originally announced October 2019.
-
A GPU implementation of the Correlation Technique for Real-time Fourier Domain Pulsar Acceleration Searches
Authors:
Sofia Dimoudi,
Karel Adamek,
Prabu Thiagaraj,
Scott M. Ransom,
Aris Karastergiou,
Wesley Armour
Abstract:
The study of binary pulsars enables tests of general relativity. Orbital motion in binary systems causes the apparent pulsar spin frequency to drift, reducing the sensitivity of periodicity searches. Acceleration searches are methods that account for the effect of orbital acceleration. Existing methods are currently computationally expensive, and the vast amount of data that will be produced by next generation instruments such as the Square Kilometre Array (SKA) necessitates real-time acceleration searches, which in turn requires the use of High Performance Computing (HPC) platforms. We present our implementation of the Correlation Technique for the Fourier Domain Acceleration Search (FDAS) algorithm on Graphics Processor Units (GPUs). The correlation technique is applied as a convolution with multiple Finite Impulse Response filters in the Fourier domain. Two approaches are compared: the first uses the NVIDIA cuFFT library for applying Fast Fourier Transforms (FFTs) on the GPU, and the second contains a custom FFT implementation in GPU shared memory. We find that the FFT shared memory implementation performs between 1.5 and 3.2 times faster than our cuFFT-based application for smaller but sufficient filter sizes. It is also 4 to 6 times faster than the existing GPU and OpenMP implementations of FDAS. This work is part of the AstroAccelerate project, a many-core accelerated time-domain signal processing library for radio astronomy.
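The correlation technique can be illustrated with a small CPU-only Python sketch, not the authors' GPU code. It assumes an idealised, noise-free complex chirp whose frequency in Fourier bins is r(u) = r0 + z·u over fractional time u = t/T. One FIR template is built per trial z from the discrete Fourier response of a unit chirp. The spectrum is then correlated with each template directly, rather than through the FFT-based convolution the GPU implementations use. The names `chirp_template` and `fdot_plane`, the 256-sample length and the ±24-bin template half-width are illustrative assumptions.

```python
import cmath

def dft(x):
    """Plain O(n^2) discrete Fourier transform (adequate at this toy size)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def chirp_template(q, z, n):
    """Response at bin offset q of a unit-amplitude chirp drifting z bins over n samples."""
    return sum(cmath.exp(2j * cmath.pi * (q * m / n + 0.5 * z * (m / n) ** 2))
               for m in range(n)) / n

def fdot_plane(spec, r_values, z_values, half_width, n):
    """Matched-filter power for each trial (r, z); all indices r - q must be valid."""
    plane = {}
    for z in z_values:
        taps = {q: chirp_template(q, z, n) for q in range(-half_width, half_width + 1)}
        norm = sum(abs(t) ** 2 for t in taps.values())
        for r in r_values:
            acc = sum(spec[r - q] * taps[q].conjugate() for q in taps)
            plane[(r, z)] = abs(acc) ** 2 / norm
    return plane

# Noise-free test signal: starts at bin r0 = 100 and drifts z0 = 8 bins.
n, r0, z0 = 256, 100, 8
x = [cmath.exp(2j * cmath.pi * (r0 * m / n + 0.5 * z0 * (m / n) ** 2)) for m in range(n)]
spec = dft(x)
plane = fdot_plane(spec, range(90, 116), range(-16, 17, 2), 24, n)
best = max(plane, key=plane.get)
print("strongest (r, z):", best)
```

The z = 0 row behaves like an ordinary periodicity search. The drifting signal's power is spread over several bins there, whereas the matching z template gathers it back into a single peak of the frequency/frequency-derivative plane.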
Submitted 15 April, 2018;
originally announced April 2018.
-
Improved Acceleration of the GPU Fourier Domain Acceleration Search Algorithm
Authors:
Karel Adámek,
Sofia Dimoudi,
Mike Giles,
Wesley Armour
Abstract:
We present an improvement of our implementation of the Correlation Technique for the Fourier Domain Acceleration Search (FDAS) algorithm on Graphics Processor Units (GPUs) (Dimoudi & Armour 2015; Dimoudi et al. 2017). Our new convolution code, which uses our custom GPU FFT code, is between 2.5 and 3.9 times faster than our cuFFT-based implementation (on an NVIDIA P100) and allows for a wider range of filter sizes than our previous version. By using this new convolution code in FDAS, we have achieved a 44% performance increase over our previous best implementation. It is also approximately 8 times faster than the existing PRESTO GPU implementation of FDAS (Luo 2013). This work is part of the AstroAccelerate project (Armour et al. 2002), a many-core accelerated time-domain signal processing library for radio astronomy.
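Filter size matters for an FFT-segmented convolution because each segment loses filter_len - 1 outputs to circular wrap-around. This trade-off can be shown with a back-of-the-envelope Python helper. It assumes an overlap-and-save scheme that produces the full linear convolution. The 1024- and 2048-point segment lengths and the 201-tap filter are illustrative numbers, not values from the paper. `overlap_save_cost` is a hypothetical name.

```python
import math

def overlap_save_cost(signal_len, filter_len, fft_len):
    """Segment count and useful-output fraction for overlap-and-save with one filter.

    Each fft_len-point segment yields fft_len - (filter_len - 1) valid outputs;
    the other filter_len - 1 are discarded as circularly aliased.
    """
    if filter_len > fft_len:
        raise ValueError("filter longer than the FFT segment")
    step = fft_len - (filter_len - 1)
    segments = math.ceil((signal_len + filter_len - 1) / step)
    return segments, step / fft_len

print(overlap_save_cost(10_000, 201, 1024))  # ~80% of each transform is kept
print(overlap_save_cost(10_000, 201, 2048))  # longer segments waste less
```

Longer filters waste a larger share of every transform unless the segment length grows as well. Supporting a wider range of filter sizes is therefore tied to which FFT lengths the custom GPU FFT can handle efficiently.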
Submitted 29 November, 2017;
originally announced November 2017.
-
Pulsar Acceleration Searches on the GPU for the Square Kilometre Array
Authors:
Sofia Dimoudi,
Wesley Armour
Abstract:
Pulsar acceleration searches are methods for recovering signals in radio telescope data that may otherwise be lost due to the effect of orbital acceleration in binary systems. The vast amount of data that will be produced by next-generation instruments such as the Square Kilometre Array (SKA) necessitates real-time acceleration searches, which in turn require the use of HPC platforms. We present our implementation of the Fourier Domain Acceleration Search (FDAS) algorithm on Graphics Processor Units (GPUs) in the context of the SKA, as part of the AstroAccelerate real-time data processing library, currently under development at the Oxford e-Research Centre (OeRC), University of Oxford.
Submitted 24 November, 2015; v1 submitted 23 November, 2015;
originally announced November 2015.