-
Intensity-sensitive quality assessment of extended sources in astronomical images
Authors:
X. Li,
K. Adamek,
W. Armour
Abstract:
Radio astronomy studies the Universe by observing the radio emissions of celestial bodies. Different methods can be used to recover the sky brightness distribution (SBD), which describes the distribution of celestial sources from recorded data, with the output dependent on the method used. Image quality assessment (IQA) indexes can be used to compare the differences between restored SBDs produced by different image reconstruction techniques to evaluate the effectiveness of different techniques. However, reconstructed images (for the same SBD) can appear to be very similar, especially when observed by the human visual system (HVS). Hence current structural similarity methods, inspired by the HVS, are not effective. In the past, we have proposed two methods to assess point source images, where low amounts of concentrated information are present in larger regions of noise-like data. But for images that include extended source(s), the increase in complexity of the structure makes the IQA methods for point sources over-sensitive since the important objects cannot be described by isolated point sources. Therefore, in this article we propose augmented Low-Information Similarity Index (augLISI), an improved version of LISI, to assess images including extended source(s). Experiments have been carried out to illustrate how this new IQA method can help with the development and study of astronomical imaging techniques. Note that although we focus on radio astronomical images herein, these IQA methods are also applicable to other astronomical images, and imaging techniques.
Submitted 10 July, 2024;
originally announced July 2024.
-
Pulscan: Binary pulsar detection using unmatched filters on NVIDIA GPUs
Authors:
Jack White,
Karel Adámek,
Jayanta Roy,
Scott Ransom,
Wesley Armour
Abstract:
The Fourier Domain Acceleration Search (FDAS) and Fourier Domain Jerk Search (FDJS) are proven matched filtering techniques for detecting binary pulsar signatures in time-domain radio astronomy datasets. Next generation radio telescopes such as the SPOTLIGHT project at the GMRT produce data at rates that mandate real-time processing, as storage of the entire captured dataset for subsequent offline processing is infeasible. The computational demands of FDAS and FDJS make them challenging to implement in real-time detection pipelines, requiring costly high performance computing facilities. To address this we propose Pulscan, an unmatched filtering approach which achieves order-of-magnitude improvements in runtime performance compared to FDAS whilst being able to detect both accelerated and some jerked binary pulsars. We profile the sensitivity of Pulscan using a distribution (N = 10,955) of synthetic binary pulsars and compare its performance with FDAS and FDJS. Our implementation of Pulscan includes an OpenMP version for multicore CPU acceleration, a version for heterogeneous CPU/GPU environments such as NVIDIA Grace Hopper, and a fully optimized NVIDIA GPU implementation for integration into an AstroAccelerate pipeline, which will be deployed in the SPOTLIGHT project at the GMRT. Our results demonstrate that unmatched filtering in Pulscan can serve as an efficient data reduction step, prioritizing datasets for further analysis and focusing human and subsequent computational resources on likely binary pulsar signatures.
Submitted 21 June, 2024;
originally announced June 2024.
-
CLEAN algorithm implementation comparisons between popular software packages
Authors:
Daniel Wright,
Karel Adámek,
Wesley Armour
Abstract:
The CLEAN algorithm, first published by Högbom, and its later variants such as Multiscale CLEAN (msCLEAN) by Cornwell, have been the most popular tools for deconvolution in radio astronomy. Interferometric imaging used in aperture synthesis radio telescopes requires deconvolution to remove the telescope's point spread function from the observed images. We have compared source fluxes produced by different implementations of Högbom and msCLEAN (WSCLEAN, CASA) with a prototype implementation of Högbom and msCLEAN for the Square Kilometer Array (SKA) on two datasets. The first is a simulation of multiple point sources of known intensity using Högbom, where none of the software packages detected all the simulated point sources to within 1.0% of the simulated values. The second is an observation of supernova remnant G055.7+3.4 taken by the Karl G. Jansky Very Large Array (VLA) using msCLEAN, where each of the software packages produced different images for the same settings.
Submitted 5 March, 2024;
originally announced March 2024.
-
Toward using GANs in astrophysical Monte-Carlo simulations
Authors:
Ahab Isaac,
Wesley Armour,
Karel Adámek
Abstract:
Accurate modelling of spectra produced by X-ray sources requires the use of Monte-Carlo simulations. These simulations need to evaluate physical processes, such as those occurring in accretion processes around compact objects by sampling a number of different probability distributions. This is computationally time-consuming and could be sped up if replaced by neural networks. We demonstrate, on an example of the Maxwell-Jüttner distribution that describes the speed of relativistic electrons, that the generative adversarial network (GAN) is capable of statistically replicating the distribution. The average value of the Kolmogorov-Smirnov test is 0.5 for samples generated by the neural network, showing that the generated distribution cannot be distinguished from the true distribution.
Submitted 16 February, 2024;
originally announced February 2024.
-
Part-time Power Measurements: nvidia-smi's Lack of Attention
Authors:
Zeyu Yang,
Karel Adamek,
Wesley Armour
Abstract:
The GPU has emerged as the go-to accelerator for high throughput and parallel workloads, spanning scientific simulations to AI, thanks to its performance and power efficiency. Given that 6 out of the top 10 fastest supercomputers in the world use NVIDIA GPUs and many AI companies each employ tens of thousands of NVIDIA GPUs, an accurate understanding of GPU power consumption is essential for further improving its efficiency. Despite limited documentation and a lack of understanding of its mechanisms, the built-in power sensor of NVIDIA GPUs, which provides easily accessible power readings via the nvidia-smi interface, is widely used in energy efficient computing research on GPUs. Our study seeks to elucidate the internal mechanisms of the power readings provided by nvidia-smi and assess the accuracy of the power and energy consumption data. We have developed a suite of micro-benchmarks to profile the behaviour of nvidia-smi power readings and have evaluated them on over 70 different GPUs from all architectural generations since power measurement was first introduced in the 'Fermi' generation. We have identified several unforeseen problems with power/energy measurement using nvidia-smi. For example, on the A100 and H100 GPUs only 25% of the runtime is sampled for power consumption; during the other 75% of the time the GPU can be using drastically different power, and nvidia-smi and the results it presents are unaware of this. This, along with other findings, can lead to a drastic under/overestimation of energy consumed, especially when considering data centres housing tens of thousands of GPUs. We propose several good practices that help to mitigate these problems. By comparing our results to those measured from an external power-meter, we have reduced the error in the energy measurement by an average of 35% and in some cases by as much as 65% in the test cases we present.
Submitted 11 March, 2024; v1 submitted 5 December, 2023;
originally announced December 2023.
-
Accelerating Dedispersion using Many-Core Architectures
Authors:
Jan Novotný,
Karel Adámek,
M. A. Clark,
Mike Giles,
Wesley Armour
Abstract:
Astrophysical radio signals are excellent probes of the extreme physical processes that emit them. However, to reach Earth, electromagnetic radiation passes through the ionised interstellar medium (ISM), introducing a frequency-dependent time delay (dispersion) to the emitted signal. Removing dispersion enables searches for transient signals like Fast Radio Bursts (FRB) or repeating signals from isolated pulsars or those in orbit around other compact objects. The sheer volume and high resolution of data that next generation radio telescopes will produce require High-Performance Computing (HPC) solutions and algorithms to be used in time-domain data processing pipelines to extract scientifically valuable results in real-time. This paper presents a state-of-the-art implementation of brute force incoherent dedispersion on NVIDIA GPUs, and on Intel and AMD CPUs. We show that our implementation is 4x faster (8-bit 8192 channels input) than other available solutions and demonstrate, using 11 existing telescopes, that our implementation is at least 20x faster than real-time. This work is part of the AstroAccelerate package.
Submitted 9 November, 2023;
originally announced November 2023.
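Illustrative sketch (not taken from the paper or from AstroAccelerate): the brute force incoherent dedispersion described in the abstract above can be pictured as shifting each frequency channel by its dispersion delay and summing over channels. The Python below assumes the commonly used cold-plasma dispersion constant of about 4.148808e3 s MHz^2 pc^-1 cm^3; the function names, the data layout (one list per channel) and the rounding of delays to whole samples are hypothetical choices made for this example only.

```python
# Approximate dispersion constant in s * MHz^2 * cm^3 / pc (assumed value).
K_DM = 4.148808e3


def dispersion_delay(dm, freq_mhz, ref_freq_mhz):
    """Delay in seconds of freq_mhz relative to ref_freq_mhz for dispersion measure dm."""
    return K_DM * dm * (freq_mhz ** -2 - ref_freq_mhz ** -2)


def dedisperse(data, freqs_mhz, dm, tsamp):
    """Brute force incoherent dedispersion for one trial dispersion measure.

    data[c][t] is the detected power in channel c at time sample t,
    freqs_mhz[c] is the centre frequency of channel c, and tsamp is the
    sampling time in seconds. Returns the channel-summed time series.
    """
    ref = max(freqs_mhz)  # the highest frequency arrives first
    # Delay of each channel, rounded to a whole number of samples.
    shifts = [int(round(dispersion_delay(dm, f, ref) / tsamp)) for f in freqs_mhz]
    n_out = len(data[0]) - max(shifts)
    # Undo the delay by reading each channel 'shift' samples later.
    return [sum(chan[t + s] for chan, s in zip(data, shifts)) for t in range(n_out)]
```

A real search repeats this for many trial dispersion measures, which is where the computational cost comes from; a dispersed pulse adds up coherently in time only at the trial value close to its true dispersion measure.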
-
A Survey of Feature detection methods for localisation of plain sections of Axial Brain Magnetic Resonance Imaging
Authors:
Jiří Martinů,
Jan Novotný,
Karel Adámek,
Petr Čermák,
Jiří Kozel,
David Školoudík
Abstract:
Matching MRI brain images between patients, or mapping patients' MRI slices to the simulated atlas of a brain, is key to the automatic registration of brain MRI. The ability to match MRI images would also enable such applications as indexing and searching MRI images among multiple patients or selecting images from the region of interest. In this work, we have introduced robustness, accuracy and cumulative distance metrics and a methodology that allows us to compare different techniques and approaches in matching brain MRI of different patients or matching an MRI brain slice to a position in the brain atlas. To that end, we have used feature detection methods AGAST, AKAZE, BRISK, GFTT, HardNet, and ORB, which are established methods in image processing, and compared them on their resistance to image degradation and their ability to match the same brain MRI slice of different patients. We have demonstrated that some of these techniques can correctly match most of the brain MRI slices of different patients. When matching is performed with the atlas of the human brain, their performance is significantly lower. The best performing feature detection method was a combination of the SIFT detector and the HardNet descriptor, which achieved 93% accuracy in matching images with other patients but only 52% accuracy when matching images to the atlas.
Submitted 8 February, 2023;
originally announced February 2023.
-
Cutting the cost of pulsar astronomy: Saving time and energy when searching for binary pulsars using NVIDIA GPUs
Authors:
Jack White,
Karel Adamek,
Wes Armour
Abstract:
Using the Fourier Domain Acceleration Search (FDAS) method to search for binary pulsars is a computationally costly process. Next generation radio telescopes will have to perform FDAS in real time, as data volumes are too large to store. FDAS is a matched filtering approach for searching time-domain radio astronomy datasets for the signatures of binary pulsars with approximately linear acceleration. In this paper we explore how we have reduced the energy cost of an SKA-like implementation of FDAS in AstroAccelerate, utilising a combination of mixed-precision computing and dynamic frequency scaling on NVIDIA GPUs. Combining the two approaches, we have managed to save 58% of the overall energy cost of FDAS with a sacrifice of less than 3% in numerical sensitivity.
Submitted 24 November, 2022;
originally announced November 2022.
-
Averaged Recurrence Quantification Analysis -- Method omitting the recurrence threshold choice
Authors:
Radim Pánis,
Karel Adámek,
Norbert Marwan
Abstract:
Recurrence quantification analysis (RQA) is a well established method of nonlinear data analysis. In this work we present a new strategy for an almost parameter-free RQA. The approach avoids the choice of the threshold parameter by calculating the RQA measures for a range of thresholds (in fact, recurrence rates). Specifically, we test the ability of the RQA measure determinism to sort data with respect to their signal to noise ratios. We consider a periodic signal, a simple chaotic logistic equation, and the Lorenz system in the tested data set, with different and even very small signal to noise ratios, and with lengths $10^2, 10^3, 10^4,$ and $10^5$. To make the calculations possible, a new effective algorithm was developed to streamline the numerical operations on a Graphics Processing Unit (GPU).
Submitted 17 August, 2022;
originally announced August 2022.
-
Bits missing: Finding exotic pulsars using bfloat16 on NVIDIA GPUs
Authors:
Jack White,
Karel Adamek,
Jayanta Roy,
Sofia Dimoudi,
Scott M. Ransom,
Wesley Armour
Abstract:
The Fourier Domain Acceleration Search (FDAS) is an effective technique for detecting faint binary pulsars in large radio astronomy datasets. This paper quantifies the sensitivity impact of reducing numerical precision in the GPU accelerated FDAS pipeline of the AstroAccelerate software package. The prior implementation used IEEE-754 single-precision in the entire binary pulsar detection pipeline, spending a large fraction of the runtime computing GPU accelerated FFTs. AstroAccelerate has been modified to use bfloat16 (and IEEE-754 double-precision to provide a "gold standard" comparison) within the Fourier domain convolution section of the FDAS routine. Approximately 20,000 synthetic pulsar filterbank files representing binary pulsars were generated using SIGPROC with a range of physical parameters. They have been processed using bfloat16, single and double-precision convolutions. All bfloat16 peaks are within 3% of the predicted signal-to-noise ratio of their corresponding single-precision peaks. Of 14,971 "bright" single-precision fundamental peaks above a power of 44.982 (our experimentally measured highest noise value), 14,602 (97.53%) have a peak in the same acceleration and frequency bin in the bfloat16 output plane, whilst in the remaining 369 the nearest peak is located in the adjacent acceleration bin. There is no bin drift measured between the single and double-precision results. The bfloat16 version of FDAS achieves a speedup of approximately 1.6x compared to single-precision. A comparison between AstroAccelerate and the PRESTO software package is presented using observations collected with the GMRT of PSR J1544+4937, a 2.16 ms black widow pulsar in a 2.8 hour compact orbit.
Submitted 24 June, 2022;
originally announced June 2022.
-
A Novel Greedy Approach To Harmonic Summing Using GPUs
Authors:
Karel Adamek,
Jayanta Roy,
Wesley Armour
Abstract:
Incoherent harmonic summing is a technique which is used to improve the sensitivity of Fourier domain search methods. A one dimensional harmonic sum is used in time-domain radio astronomy as part of the Fourier domain periodicity search, a type of search used to detect isolated single pulsars. The main problem faced when implementing the harmonic sum on many-core architectures, like GPUs, is the very unfavourable memory access pattern of the harmonic sum algorithm. The memory access pattern gets worse as the dimensionality of the harmonic sum increases. Here we present a set of algorithms for calculating the harmonic sum that are suited to many-core architectures such as GPUs. We present an evaluation of the sensitivity of these different approaches, and their performance. This work forms part of the AstroAccelerate project which is a GPU accelerated software package for processing time-domain radio astronomy data.
Submitted 25 February, 2022;
originally announced February 2022.
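Illustrative sketch (not one of the paper's algorithms): a plain one dimensional incoherent harmonic sum adds, for every Fourier bin k, the power found at k, 2k, 3k and so on. The Python below is a straightforward serial reference written for this listing; the function name and the unnormalised output are assumptions. It also hints at why the memory access pattern mentioned in the abstract is awkward: for harmonic h, neighbouring output bins read input elements that are h apart rather than adjacent.

```python
def harmonic_sum(power, n_harmonics):
    """Return, for each bin k, power[k] + power[2k] + ... + power[n_harmonics * k].

    Harmonics that fall beyond the end of the spectrum are skipped.
    The result is not normalised by the number of harmonics summed.
    """
    n = len(power)
    out = []
    for k in range(n):
        total = 0.0
        for h in range(1, n_harmonics + 1):
            idx = h * k  # stride grows with the harmonic number
            if idx >= n:
                break
            total += power[idx]
        out.append(total)
    return out
```

For a signal whose power is spread over several harmonics, the summed value at the fundamental bin is larger than the power in any single harmonic, which is the sensitivity gain the technique is used for.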
-
Implementation of 3D degridding algorithm on the NVIDIA GPUs using CUDA
Authors:
Karel Adámek,
Peter Wortmann,
Bojan Nikolic,
Ben Mort,
Wesley Armour
Abstract:
Practical aperture synthesis imaging algorithms work by iterating between estimating the sky brightness distribution and a comparison of a prediction based on this estimate with the measured data ("visibilities"). Accuracy in the latter step is crucial but is made difficult by irregular and non-planar sampling of data by the telescope. In this work we present a GPU implementation of 3D degridding which accurately deals with these two difficulties and is designed for distributed operation. We address the load balancing issues caused by large variation in visibilities that need to be computed. Using CUDA and NVIDIA GPUs we measure performance up to 1.2 billion visibilities per second.
Submitted 25 February, 2021;
originally announced February 2021.
-
Implementing CUDA Streams into AstroAccelerate -- A Case Study
Authors:
Jan Novotný,
Karel Adámek,
Wes Armour
Abstract:
To be able to run tasks asynchronously on NVIDIA GPUs a programmer must explicitly implement asynchronous execution in their code using the syntax of CUDA streams. Streams allow a programmer to launch independent concurrent execution tasks, providing the ability to utilise different functional units on the GPU asynchronously. For example, it is possible to transfer the results from a previous computation performed on input data n-1, over the PCIe bus whilst computing the result for input data n, by placing different tasks in different CUDA streams. The benefit of such an approach is that the time taken for the data transfer between the host and device can be hidden with computation. This case study deals with the implementation of CUDA streams into AstroAccelerate. AstroAccelerate is a GPU accelerated real-time signal processing pipeline for time-domain radio astronomy.
Submitted 6 May, 2021; v1 submitted 4 January, 2021;
originally announced January 2021.
-
Efficiency Near the Edge: Increasing the Energy Efficiency of FFTs on GPUs for Real-time Edge Computing
Authors:
Karel Adámek,
Jan Novotný,
Jeyarajan Thiyagalingam,
Wesley Armour
Abstract:
The Square Kilometre Array (SKA) is an international initiative for developing the world's largest radio telescope with a total collecting area of over a million square meters. The scale of the operation, combined with the remote location of the telescope, requires the use of energy-efficient computational algorithms. This, along with the extreme data rates that will be produced by the SKA and the requirement for a real-time observing capability, necessitates in-situ data processing in an edge style computing solution. More generally, energy efficiency in the modern computing landscape is becoming of paramount concern, whether it be the power budget that can limit some of the world's largest supercomputers or the limited power available to the smallest Internet-of-Things devices. In this paper, we study the impact of hardware frequency scaling on the energy consumption and execution time of the Fast Fourier Transform (FFT) on NVIDIA GPUs using the cuFFT library. The FFT is used in many areas of science and it is one of the key algorithms used in radio astronomy data processing pipelines. Through the use of frequency scaling, we show that we can lower the power consumption of the NVIDIA V100 GPU when computing the FFT by up to 60% compared to the boost clock frequency, with less than a 10% increase in the execution time. Furthermore, using one common core clock frequency for all tested FFT lengths, we show on average a 50% reduction in power consumption compared to the boost core clock frequency with an increase in the execution time still below 10%. We demonstrate how these results can be used to lower the power consumption of existing data processing pipelines. These savings, when considered over years of operation, can yield significant financial savings, but can also lead to a significant reduction of greenhouse gas emissions.
Submitted 9 November, 2021; v1 submitted 13 September, 2020;
originally announced September 2020.
-
Development of production-ready GPU data processing pipeline software for AstroAccelerate
Authors:
Cees Carels,
Karel Adámek,
Jan Novotný,
Wesley Armour
Abstract:
Upcoming large scale telescope projects such as the Square Kilometre Array (SKA) will see high data rates and large data volumes, requiring tools that can analyse telescope event data quickly and accurately. In modern radio telescopes, analysis software forms a core part of the data read out, and long-term software stability and maintainability are essential. AstroAccelerate is a many core accelerated software package that uses NVIDIA(R) GPUs to perform realtime analysis of radio telescope data, and it has been shown to be substantially faster than realtime at processing simulated SKA-like data. AstroAccelerate contains optimised GPU implementations of signal processing tools used in radio astronomy including dedispersion, Fourier domain acceleration search, single pulse detection, and others. This article describes the transformation of AstroAccelerate from a C-like prototype code to a production-ready software library with a C++ API and a Python interface, while preserving compatibility with legacy software that is implemented in C. The design of the software library interfaces, refactoring aspects, and coding techniques are discussed.
Submitted 16 January, 2020; v1 submitted 16 December, 2019;
originally announced December 2019.
-
Searching for pulsars in extreme orbits -- GPU acceleration of the Fourier domain 'jerk' search
Authors:
Karel Adámek,
Jan Novotný,
Sofia Dimoudi,
Wesley Armour
Abstract:
Binary pulsars are an important target for radio surveys because they present a natural laboratory for a wide range of astrophysics, for example testing general relativity, including detection of gravitational waves. The orbital motion of a pulsar which is locked in a binary system causes a frequency shift (a Doppler shift) in its normally very periodic pulse emissions. These shifts cause a reduction in the sensitivity of traditional periodicity searches. To correct this smearing, Ransom [2001] and Ransom et al. [2002] developed the Fourier domain acceleration search (FDAS), which uses a matched filtering technique. This method is, however, limited to a constant pulsar acceleration. Therefore, Andersen and Ransom [2018] broadened the Fourier domain acceleration search to also account for a linear change in the acceleration by implementing the Fourier domain "jerk" search into the PRESTO software package. This extension significantly increases the number of matched filters used. We have implemented the Fourier domain "jerk" search (JERK) on GPUs using CUDA. We have achieved a 90x performance increase when compared to the parallel implementation of JERK in PRESTO. This work is part of the AstroAccelerate project (Armour et al. [2019]), a many-core accelerated time-domain signal processing library for radio astronomy.
Submitted 4 November, 2019;
originally announced November 2019.
-
Single Pulse Detection Algorithms for Real-time Fast Radio Burst Searches using GPUs
Authors:
Karel Adamek,
Wesley Armour
Abstract:
The detection of non-repeating or irregular events in time-domain radio astronomy has gained importance over the last decade due to the discovery of fast radio bursts. Existing or upcoming radio telescopes are gathering more and more data and consequently the software, which is an important part of these telescopes, must process large data volumes at high data rates. Data has to be searched through to detect new and interesting events, often in real-time. These requirements necessitate new and fast algorithms which must process data quickly and accurately. In this work we present new algorithms for single pulse detection using boxcar filters. We have quantified the signal loss introduced by single pulse detection algorithms which use boxcar filters and based on these results, we have designed two distinct "lossy" algorithms. Our lossy algorithms use an incomplete set of boxcar filters to accelerate detection at the expense of a small reduction in detected signal power. We present formulae for signal loss, descriptions of our algorithms and their parallel implementation on NVIDIA GPUs using CUDA. We also present tests of correctness, tests on artificial data and the performance achieved. Our implementation can process SKA-MID-like data 266x faster than real-time on an NVIDIA P100 GPU and 500x faster than real-time on an NVIDIA Titan V GPU with a mean signal power loss of 7%. We conclude with prospects for single pulse detection for beyond SKA era, nanosecond time resolution radio astronomy.
Submitted 27 April, 2020; v1 submitted 18 October, 2019;
originally announced October 2019.
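Illustrative sketch (not the paper's algorithms): single pulse detection with boxcar filters amounts to summing the time series over a sliding window of a given width and keeping the window that gives the highest signal-to-noise ratio. The Python below is a simple serial version using prefix sums. It assumes the input has already been normalised to zero mean and unit variance noise, so that dividing a window sum by the square root of its width gives a signal-to-noise estimate; the function name and the choice of widths are hypothetical.

```python
def boxcar_search(series, widths):
    """Return (best_snr, start, width) over the given boxcar widths.

    Assumes 'series' is normalised so that the noise has zero mean and
    unit variance; the sum of w samples then has standard deviation sqrt(w).
    """
    # prefix[i] holds the sum of series[0:i], so any window sum is one subtraction.
    prefix = [0.0]
    for x in series:
        prefix.append(prefix[-1] + x)

    best = (float("-inf"), 0, 0)
    for w in widths:
        norm = w ** 0.5
        for start in range(len(series) - w + 1):
            snr = (prefix[start + w] - prefix[start]) / norm
            if snr > best[0]:
                best = (snr, start, w)
    return best
```

Passing every width recovers the full signal of a pulse whose width is in the set, while passing an incomplete set such as [1, 2, 8] is cheaper but recovers less of the signal for a pulse of width 4. This is only meant to picture the trade-off that the abstract's "lossy" algorithms quantify; the sets of widths they actually use are not reproduced here.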
-
GPU Fast Convolution via the Overlap-and-Save Method in Shared Memory
Authors:
Karel Adámek,
Sofia Dimoudi,
Mike Giles,
Wesley Armour
Abstract:
We present an implementation of the overlap-and-save method, a method for the convolution of very long signals with short response functions, which is tailored to GPUs. We have implemented several FFT algorithms (using the CUDA programming language) which exploit GPU shared memory, allowing for GPU accelerated convolution. We compare our implementation with an implementation of the overlap-and-save algorithm utilizing the NVIDIA FFT library (cuFFT). We demonstrate that by using a shared memory based FFT we can achieve significant speed-ups for certain problem sizes and lower the memory requirements of the overlap-and-save method on GPUs.
Submitted 10 April, 2020; v1 submitted 4 October, 2019;
originally announced October 2019.
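The overlap-and-save method named in the abstract above can be illustrated with a minimal CPU sketch in pure Python. This is not the authors' CUDA shared-memory implementation: the function names `fft` and `overlap_save`, the recursive radix-2 FFT, and the block size of 8 are all assumptions chosen for readability. The signal is cut into overlapping blocks, each block is circularly convolved with the response via an FFT, and the first `len(response) - 1` aliased samples of every block are discarded.

```python
import cmath


def fft(a, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def overlap_save(signal, response, block=8):
    """Linear convolution of a long signal with a short response (hypothetical helper)."""
    m = len(response)
    step = block - (m - 1)  # number of valid output samples per block
    assert step > 0 and block & (block - 1) == 0, "block must be a power of two > m - 1"
    # Transform the zero-padded response once and reuse it for every block.
    h_f = fft(list(response) + [0.0] * (block - m))
    out_len = len(signal) + m - 1
    # m - 1 leading zeros act as history for the first block; trailing zeros keep blocks full.
    padded = [0.0] * (m - 1) + list(signal) + [0.0] * (block + m)
    out = []
    pos = 0
    while len(out) < out_len:
        seg_f = fft(padded[pos:pos + block])
        prod = [x * h for x, h in zip(seg_f, h_f)]
        circ = [v.real / block for v in fft(prod, inverse=True)]
        out.extend(circ[m - 1:])  # the first m - 1 samples are aliased, so drop them
        pos += step
    return out[:out_len]
```

Because consecutive blocks share `m - 1` input samples, each block can be processed independently, which is the property that makes the method attractive for parallel hardware.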
-
A GPU implementation of the harmonic sum algorithm
Authors:
Karel Adámek,
Wesley Armour
Abstract:
Time-domain radio astronomy utilizes a harmonic sum algorithm as part of the Fourier domain periodicity search; this type of search is used to discover single pulsars. The harmonic sum algorithm is also used as part of the Fourier domain acceleration search, which aims to discover pulsars that are locked in orbit around another pulsar or compact object. However, porting the harmonic sum to many-core architectures like GPUs is not a straightforward task. The main problem that must be overcome is the very unfavourable memory access pattern, which gets worse as the dimensionality of the harmonic sum increases. We present a set of algorithms for calculating the harmonic sum that are more suited to many-core architectures such as GPUs. We present an evaluation of the sensitivity of these different approaches, and their performance. This work forms part of the AstroAccelerate project, which is a GPU accelerated software package for processing time-domain radio astronomy data.
Submitted 6 December, 2018;
originally announced December 2018.
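To make the memory access problem mentioned in the abstract above concrete, here is a minimal sketch of a one-dimensional harmonic sum in pure Python. It is a textbook-style formulation, not one of the algorithms from the paper: the function name `harmonic_sum`, the use of exact integer multiples of the fundamental bin, and the 1/sqrt(H) normalisation are assumptions made for illustration. Note how the inner loop reads `power[h * f]`, so neighbouring fundamentals touch bins that drift further apart with every harmonic.

```python
import math


def harmonic_sum(power, max_harmonics):
    """Return (best_score, best_harmonic_count) per fundamental bin (hypothetical helper)."""
    n = len(power)
    best = [0.0] * n
    best_h = [0] * n
    for f in range(1, n):  # bin 0 (DC) is skipped
        running = 0.0
        for h in range(1, max_harmonics + 1):
            idx = h * f  # strided access: the stride grows with the fundamental bin
            if idx >= n:
                break
            running += power[idx]
            score = running / math.sqrt(h)  # assumed normalisation for comparing different H
            if best_h[f] == 0 or score > best[f]:
                best[f], best_h[f] = score, h
    return best, best_h


# Toy power spectrum: a fundamental at bin 5 with three further harmonics.
spectrum = [0.0] * 64
for harmonic_bin in (5, 10, 15, 20):
    spectrum[harmonic_bin] = 4.0
scores, counts = harmonic_sum(spectrum, max_harmonics=8)
```

In this toy spectrum the highest score is found at fundamental bin 5 when four harmonics are summed, which is the behaviour a periodicity search relies on.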
-
A GPU implementation of the Correlation Technique for Real-time Fourier Domain Pulsar Acceleration Searches
Authors:
Sofia Dimoudi,
Karel Adamek,
Prabu Thiagaraj,
Scott M. Ransom,
Aris Karastergiou,
Wesley Armour
Abstract:
The study of binary pulsars enables tests of general relativity. Orbital motion in binary systems causes the apparent pulsar spin frequency to drift, reducing the sensitivity of periodicity searches. Acceleration searches are methods that account for the effect of orbital acceleration. Existing methods are currently computationally expensive, and the vast amount of data that will be produced by next generation instruments such as the Square Kilometre Array (SKA) necessitates real-time acceleration searches, which in turn requires the use of High Performance Computing (HPC) platforms. We present our implementation of the Correlation Technique for the Fourier Domain Acceleration Search (FDAS) algorithm on Graphics Processor Units (GPUs). The correlation technique is applied as a convolution with multiple Finite Impulse Response filters in the Fourier domain. Two approaches are compared: the first uses the NVIDIA cuFFT library for applying Fast Fourier Transforms (FFTs) on the GPU, and the second contains a custom FFT implementation in GPU shared memory. We find that the FFT shared memory implementation performs between 1.5 and 3.2 times faster than our cuFFT-based application for smaller but sufficient filter sizes. It is also 4 to 6 times faster than the existing GPU and OpenMP implementations of FDAS. This work is part of the AstroAccelerate project, a many-core accelerated time-domain signal processing library for radio astronomy.
Submitted 15 April, 2018;
originally announced April 2018.
-
Improved Acceleration of the GPU Fourier Domain Acceleration Search Algorithm
Authors:
Karel Adámek,
Sofia Dimoudi,
Mike Giles,
Wesley Armour
Abstract:
We present an improvement of our implementation of the Correlation Technique for the Fourier Domain Acceleration Search (FDAS) algorithm on Graphics Processor Units (GPUs) (Dimoudi & Armour 2015; Dimoudi et al. 2017). Our new improved convolution code, which uses our custom GPU FFT code, is between 2.5 and 3.9 times faster than our cuFFT-based implementation (on an NVIDIA P100) and allows for a wider range of filter sizes than our previous version. By using this new version of our convolution code in FDAS we have achieved a 44% performance increase over our previous best implementation. It is also approximately 8 times faster than the existing PRESTO GPU implementation of FDAS (Luo 2013). This work is part of the AstroAccelerate project (Armour et al. 2002), a many-core accelerated time-domain signal processing library for radio astronomy.
Submitted 29 November, 2017;
originally announced November 2017.
-
A Real-time Single Pulse Detection Algorithm for GPUs
Authors:
Karel Adámek,
Wesley Armour
Abstract:
The detection of non-repeating events in the radio spectrum has become an important area of study in radio astronomy over the last decade due to the discovery of fast radio bursts (FRBs). We have implemented a single pulse detection algorithm for NVIDIA GPUs, which uses boxcar filters of varying widths. Our code performs the calculation of standard deviation, matched filtering by using boxcar filters and thresholding based on the signal-to-noise ratio. We present our parallel implementation of our single pulse detection algorithm. Our GPU algorithm is approximately 17x faster than our current CPU OpenMP code (NVIDIA Titan XP vs Intel E5-2650v3). This code is part of the AstroAccelerate project, which is a many-core accelerated time-domain signal processing code for radio astronomy. This work allows our AstroAccelerate code to perform a single pulse search on SKA-like data 4.3x faster than real-time.
Submitted 29 November, 2016;
originally announced November 2016.
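The three steps listed in the abstract above (standard deviation, boxcar matched filtering, signal-to-noise thresholding) can be sketched on the CPU in a few lines of pure Python. This is only an illustration, not the authors' GPU code: the function name `boxcar_search`, the use of a prefix sum to evaluate every boxcar, and the estimate of mean and standard deviation from the whole series (pulse included) are simplifying assumptions.

```python
import math
from statistics import mean, pstdev


def boxcar_search(series, widths, threshold):
    """Return (start, width, snr) for every boxcar whose SNR meets the threshold."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0.0:
        return []  # a constant series contains no detectable pulse
    # Prefix sums let each boxcar sum be evaluated with a single subtraction.
    prefix = [0.0]
    for value in series:
        prefix.append(prefix[-1] + value)
    hits = []
    for w in widths:
        for start in range(len(series) - w + 1):
            total = prefix[start + w] - prefix[start]
            # A sum of w independent samples has mean w*mu and deviation sigma*sqrt(w).
            snr = (total - w * mu) / (sigma * math.sqrt(w))
            if snr >= threshold:
                hits.append((start, w, snr))
    return hits


# Toy time series: alternating +1/-1 background with a 4-sample pulse at samples 20..23.
data = [1.0 if i % 2 == 0 else -1.0 for i in range(64)]
for i in range(20, 24):
    data[i] += 5.0
candidates = boxcar_search(data, widths=[1, 2, 4, 8], threshold=5.0)
```

With this toy input only the boxcar whose width and position match the injected pulse (start 20, width 4) passes the threshold, which shows why a bank of widths is searched: a boxcar that is too narrow or too wide recovers less of the pulse's signal-to-noise ratio.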
-
Constraining models of twin peak quasi-periodic oscillations with realistic neutron star equations of state
Authors:
Gabriel Török,
Kateřina Goluchová,
Martin Urbanec,
Eva Šrámková,
Karel Adámek,
Gabriela Urbancová,
Tomáš Pecháček,
Pavel Bakala,
Zdeněk Stuchlík,
Jiří Horák,
Jakub Juryšek
Abstract:
Twin-peak quasi-periodic oscillations (QPOs) are observed in the X-ray power-density spectra of several accreting low-mass neutron star (NS) binaries. In our previous work we have considered several QPO models. We have identified and explored mass-angular-momentum relations implied by individual QPO models for the atoll source 4U 1636-53. In this paper we extend our study and confront QPO models with various NS equations of state (EoS). We start with simplified calculations assuming Kerr background geometry and then present results of detailed calculations considering the influence of NS quadrupole moment (related to rotationally induced NS oblateness) assuming Hartle-Thorne spacetimes. We show that the application of concrete EoS together with a particular QPO model yields a specific mass-angular-momentum relation. However, we demonstrate that the degeneracy in mass and angular momentum can be removed when the NS spin frequency inferred from the X-ray burst observations is considered. We inspect a large set of EoS and discuss their compatibility with the considered QPO models. We conclude that when the NS spin frequency in 4U 1636-53 is close to 580 Hz we can exclude 51 of the 90 considered combinations of EoS and QPO models. We also discuss additional restrictions that may exclude even more combinations. Namely, there are 13 EoS compatible with the observed twin-peak QPOs and the relativistic precession model. However, when considering the low frequency QPOs and Lense-Thirring precession, only 5 EoS are compatible with the model.
Submitted 18 November, 2016;
originally announced November 2016.
-
A polyphase filter for many-core architectures
Authors:
Karel Adámek,
Jan Novotný,
Wes Armour
Abstract:
In this article we discuss our implementation of a polyphase filter for real-time data processing in radio astronomy. We describe in detail our implementation of the polyphase filter algorithm and its behaviour on three generations of NVIDIA GPU cards, on dual Intel Xeon CPUs and the Intel Xeon Phi (Knights Corner) platforms. All of our implementations aim to exploit the potential for data reuse that the algorithm offers. Our GPU implementations explore two different methods for achieving this: the first makes use of L1/Texture cache, the second uses shared memory. We discuss the usability of each of our implementations along with their behaviours. We measure performance in execution time, which is a critical factor for real-time systems; we also present results in terms of bandwidth (GB/s), compute (GFlop/s) and type conversions (GTc/s). We include a presentation of our results in terms of the sample rate which can be processed in real-time by a chosen platform, which more intuitively describes the expected performance in a signal processing setting. Our findings show that, for the GPUs considered, the performance of our polyphase filter when using lower precision input data is limited by type conversions rather than device bandwidth. We compare these results to an implementation on the Xeon Phi. We show that our Xeon Phi implementation has a performance that is 1.47x to 1.95x greater than our CPU implementation; however, it is insufficient to compete with the performance of GPUs. We conclude with a comparison of our best performing code to two other implementations of the polyphase filter, showing that our implementation is faster in nearly all cases. This work forms part of the Astro-Accelerate project, a many-core accelerated real-time data processing library for digital signal processing of time-domain radio astronomy data.
Submitted 21 April, 2016; v1 submitted 11 November, 2015;
originally announced November 2015.
-
The Implementation of a Real-Time Polyphase Filter
Authors:
Karel Adámek,
Jan Novotný,
Wes Armour
Abstract:
In this article we study the suitability of different computational accelerators for the task of real-time data processing. The algorithm used for comparison is the polyphase filter, a standard tool in signal processing and a well established algorithm. We measure performance in FLOPs and execution time, which is a critical factor for real-time systems. For our real-time studies we have chosen a data rate of 6.5 GB/s, which is the estimated data rate for a single channel on the SKA's Low Frequency Aperture Array. Our findings show that GPUs are the most likely candidate for real-time data processing. GPUs are better in both performance and power consumption.
Submitted 12 November, 2014;
originally announced November 2014.
-
Appearance of innermost stable circular orbits of accretion discs around rotating neutron stars
Authors:
G. Torok,
M. Urbanec,
K. Adamek,
G. Urbancova
Abstract:
The innermost stable circular orbit (ISCO) of an accretion disc orbiting a neutron star (NS) is often assumed to be a unique prediction of general relativity. However, it has been argued that ISCO also appears around highly elliptic bodies described by Newtonian theory. In this sense, the behaviour of an ISCO around a rotating oblate neutron star is formed by the interplay between relativistic and Newtonian effects. Here we briefly explore the consequences of this interplay using a straightforward analytic approach as well as numerical models that involve modern NS equations of state. We examine the ratio K between the ISCO radius and the radius of the neutron star. We find that, with growing NS spin, the ratio K first decreases, but then starts to increase. This non-monotonic behaviour of K can give rise to a neutron star spin interval in which ISCO appears for two very different ranges of NS mass. This may strongly affect the distribution of neutron stars that have an ISCO (ISCO-NS). When (all) neutron stars are distributed around a high mass M0, the ISCO-NS spin distribution is roughly the same as the spin distribution corresponding to all neutron stars. In contrast, if M0 is low, the ISCO-NS distribution can only have a peak around a high value of spin. Finally, an intermediate value of M0 can imply an ISCO-NS distribution divided into two distinct groups of slow and fast rotators. Our findings have immediate astrophysical applications. They can be used, for example, to distinguish between different models of high-frequency quasiperiodic oscillations observed in low-mass NS X-ray binaries.
Submitted 14 March, 2014;
originally announced March 2014.