-
NMPO: Near-Memory Computing Profiling and Offloading
Authors:
Stefano Corda,
Madhurya Kumaraswamy,
Ahsan Javed Awan,
Roel Jordans,
Akash Kumar,
Henk Corporaal
Abstract:
Real-world applications now process big-data sets and are often bottlenecked by data movement between the compute units and the main memory. Near-memory computing (NMC), a modern data-centric computational paradigm, can alleviate these bottlenecks and thereby improve application performance. Because NMC systems are not widely available, simulators are the primary tool for estimating performance. However, simulation is usually time-consuming, and methods that reduce this overhead would accelerate the early-stage design of NMC systems. This work proposes Near-Memory computing Profiling and Offloading (NMPO), a high-level framework that predicts NMC offloading suitability using an ensemble machine learning model. NMPO predicts NMC suitability with an accuracy of 85.6% and, by using hardware-dependent application features, can reduce prediction time by up to three orders of magnitude compared to prior works.
Submitted 29 June, 2021;
originally announced June 2021.
-
TDO-CIM: Transparent Detection and Offloading for Computation In-memory
Authors:
Kanishkan Vadivel,
Lorenzo Chelini,
Ali BanaGozar,
Gagandeep Singh,
Stefano Corda,
Roel Jordans,
Henk Corporaal
Abstract:
Computation in-memory is a promising non-von Neumann approach that aims to drastically reduce data transfer to and from the memory subsystem. Although many such architectures have been proposed, compiler support for them still lags behind. In this paper, we close this gap by proposing an end-to-end compilation flow for in-memory computing based on the LLVM compiler infrastructure. Starting from sequential code, our approach automatically detects, optimizes, and offloads kernels suitable for in-memory acceleration. We demonstrate our compiler tool-flow on the PolyBench/C benchmark suite and evaluate the benefits of our proposed in-memory architecture, simulated in gem5, by comparing it with a state-of-the-art von Neumann architecture.
Submitted 30 June, 2020;
originally announced July 2020.
-
Near Memory Acceleration on High Resolution Radio Astronomy Imaging
Authors:
Stefano Corda,
Bram Veenboer,
Ahsan Javed Awan,
Akash Kumar,
Roel Jordans,
Henk Corporaal
Abstract:
Modern radio telescopes like the Square Kilometre Array (SKA) will need to process exabytes of radio-astronomical signals in real time to construct a high-resolution map of the sky. Near-Memory Computing (NMC) could alleviate the performance bottlenecks caused by frequent memory accesses in a state-of-the-art radio-astronomy imaging algorithm. In this paper, we use CPI breakdown analysis on IBM Power9 to show that a sub-module performing a two-dimensional fast Fourier transform (2D FFT) is memory bound. We then present an NMC approach on FPGA for 2D FFT that outperforms a CPU by up to 120x and performs comparably to a high-end GPU while using less bandwidth and memory.
Submitted 4 May, 2020;
originally announced May 2020.
-
Near-Memory Computing: Past, Present, and Future
Authors:
Gagandeep Singh,
Lorenzo Chelini,
Stefano Corda,
Ahsan Javed Awan,
Sander Stuijk,
Roel Jordans,
Henk Corporaal,
Albert-Jan Boonstra
Abstract:
The conventional approach of moving data to the CPU for computation has become a significant performance bottleneck for emerging scale-out data-intensive applications due to their limited data reuse. At the same time, the advancement in 3D integration technologies has made the decade-old concept of coupling compute units close to the memory --- called near-memory computing (NMC) --- more viable. Processing right at the "home" of data can significantly diminish the data movement problem of data-intensive applications.
In this paper, we survey the prior art on NMC across various dimensions (architecture, applications, tools, etc.) and identify the key challenges and open issues, along with future research directions. We also provide a glimpse of our approach to near-memory computing, which includes i) NMC-specific, microarchitecture-independent application characterization, ii) a compiler framework to offload NMC kernels to our target NMC platform, and iii) an analytical model to evaluate the potential of NMC.
Submitted 7 August, 2019;
originally announced August 2019.
-
Platform Independent Software Analysis for Near Memory Computing
Authors:
Stefano Corda,
Gagandeep Singh,
Ahsan Javed Awan,
Roel Jordans,
Henk Corporaal
Abstract:
Near-memory Computing (NMC) promises improved performance for applications that can exploit the features of emerging memory technologies such as 3D-stacked memory. However, finding such applications is not trivial, and specialized tools are needed to identify them. In this paper, we present PISA-NMC, which extends a state-of-the-art hardware-agnostic profiling tool with memory and parallelism metrics relevant to NMC. The metrics include memory entropy, spatial locality, and data-level and basic-block-level parallelism. By profiling a set of representative applications and correlating the metrics with the applications' performance on a simulated NMC system, we verify the importance of those metrics. Finally, we demonstrate which metrics are useful for identifying applications suitable for NMC architectures.
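To make one of these metrics concrete, here is a minimal sketch of memory entropy, assuming the common definition as the Shannon entropy of the distribution of accessed addresses in a trace. The function name `memory_entropy` and the `drop_bits` parameter (ignoring low-order address bits to model a coarser access granularity) are hypothetical illustrations, not PISA-NMC's actual implementation or interface.

```python
import math
from collections import Counter


def memory_entropy(addresses, drop_bits=0):
    """Shannon entropy (in bits) of a memory-address trace.

    Hypothetical sketch: `drop_bits` discards that many least-significant
    address bits, so accesses within the same 2**drop_bits-byte block count
    as the same location. Higher entropy means more scattered accesses.
    """
    if not addresses:
        return 0.0
    # Count how often each (possibly coarsened) address is accessed.
    counts = Counter(addr >> drop_bits for addr in addresses)
    total = len(addresses)
    # H = -sum(p * log2(p)) over the empirical access distribution.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A trace that hits one address repeatedly has zero entropy, while four
# distinct addresses accessed once each give log2(4) = 2 bits.
print(memory_entropy([0x10] * 8))
print(memory_entropy([0, 8, 16, 24]))
print(memory_entropy([0, 8, 16, 24], drop_bits=4))
```

Coarsening the addresses with `drop_bits` shows how the same trace can look more regular at a larger block granularity, which is one way such a metric can hint at how well accesses map onto wide memory rows.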
Submitted 24 June, 2019;
originally announced June 2019.
-
Memory and Parallelism Analysis Using a Platform-Independent Approach
Authors:
Stefano Corda,
Gagandeep Singh,
Ahsan Javed Awan,
Roel Jordans,
Henk Corporaal
Abstract:
Emerging computing architectures such as near-memory computing (NMC) promise improved application performance by reducing data movement between the CPU and memory. However, detecting such applications is not a trivial task. In this ongoing work, we extend a state-of-the-art platform-independent software analysis tool with NMC-related metrics such as memory entropy, spatial locality, and data-level and basic-block-level parallelism. These metrics help identify the applications best suited to NMC architectures.
Submitted 18 April, 2019;
originally announced April 2019.