Showing 1–8 of 8 results for author: Stuijk, S

  1. arXiv:2405.14060  [pdf, ps, other]

    cs.LG physics.comp-ph

    Probabilistic Inference in the Era of Tensor Networks and Differential Programming

    Authors: Martin Roa-Villescas, Xuanzhao Gao, Sander Stuijk, Henk Corporaal, Jin-Guo Liu

    Abstract: Probabilistic inference is a fundamental task in modern machine learning. Recent advances in tensor network (TN) contraction algorithms have enabled the development of better exact inference methods. However, many common inference tasks in probabilistic graphical models (PGMs) still lack corresponding TN-based adaptations. In this work, we advance the connection between PGMs and TNs by formulating…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 12 pages, 4 figures

  2. arXiv:2212.00873  [pdf, other]

    cs.AR

    CONVOLVE: Smart and seamless design of smart edge processors

    Authors: M. Gomony, F. Putter, A. Gebregiorgis, G. Paulin, L. Mei, V. Jain, S. Hamdioui, V. Sanchez, T. Grosser, M. Geilen, M. Verhelst, F. Zenke, F. Gurkaynak, B. Bruin, S. Stuijk, S. Davidson, S. De, M. Ghogho, A. Jimborean, S. Eissa, L. Benini, D. Soudris, R. Bishnoi, S. Ainsworth, F. Corradi, et al. (3 additional authors not shown)

    Abstract: With the rise of Deep Learning (DL), our world braces for AI in every edge device, creating an urgent need for edge-AI SoCs. This SoC hardware needs to support high throughput, reliable and secure AI processing at Ultra Low Power (ULP), with a very short time to market. With its strong legacy in edge solutions and open processing platforms, the EU is well-positioned to become a leader in this SoC…

    Submitted 2 May, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

  3. arXiv:2208.10606  [pdf, other]

    cs.AR cs.AI cs.LG

    LEAPER: Fast and Accurate FPGA-based System Performance Prediction via Transfer Learning

    Authors: Gagandeep Singh, Dionysios Diamantopoulos, Juan Gómez-Luna, Sander Stuijk, Henk Corporaal, Onur Mutlu

    Abstract: Machine learning has recently gained traction as a way to overcome the slow accelerator generation and implementation process on an FPGA. It can be used to build performance and resource usage models that enable fast early-stage design space exploration. First, training requires large amounts of data (features extracted from design synthesis and implementation tools), which is cost-inefficient bec…

    Submitted 2 October, 2022; v1 submitted 22 August, 2022; originally announced August 2022.

  4. Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numeric Behaviors

    Authors: Wei Sun, Ang Li, Tong Geng, Sander Stuijk, Henk Corporaal

    Abstract: Tensor Cores have been an important unit to accelerate Fused Matrix Multiplication Accumulation (MMA) in all NVIDIA GPUs since Volta Architecture. To program Tensor Cores, users have to use either legacy wmma APIs or current mma APIs. Legacy wmma APIs are more easy-to-use but can only exploit limited features and power of Tensor Cores. Specifically, wmma APIs support fewer operand shapes and can n…

    Submitted 24 November, 2022; v1 submitted 6 June, 2022; originally announced June 2022.

  5. arXiv:2205.07394  [pdf, other]

    cs.AR cs.AI cs.DC cs.LG

    Sibyl: Adaptive and Extensible Data Placement in Hybrid Storage Systems Using Online Reinforcement Learning

    Authors: Gagandeep Singh, Rakesh Nadig, Jisung Park, Rahul Bera, Nastaran Hajinazar, David Novo, Juan Gómez-Luna, Sander Stuijk, Henk Corporaal, Onur Mutlu

    Abstract: Hybrid storage systems (HSS) use multiple different storage devices to provide high and scalable storage capacity at high performance. Recent research proposes various techniques that aim to accurately identify performance-critical data to place it in a "best-fit" storage device. Unfortunately, most of these techniques are rigid, which (1) limits their adaptivity to perform well for a wide range o…

    Submitted 16 November, 2023; v1 submitted 15 May, 2022; originally announced May 2022.

  6. arXiv:2107.08716  [pdf, other]

    cs.AR cs.DC

    Accelerating Weather Prediction using Near-Memory Reconfigurable Fabric

    Authors: Gagandeep Singh, Dionysios Diamantopoulos, Juan Gómez-Luna, Christoph Hagleitner, Sander Stuijk, Henk Corporaal, Onur Mutlu

    Abstract: Ongoing climate change calls for fast and accurate weather and climate modeling. However, when solving large-scale weather prediction simulations, state-of-the-art CPU and GPU implementations suffer from limited performance and high energy consumption. These implementations are dominated by complex irregular memory access patterns and low arithmetic intensity that pose fundamental challenges to ac…

    Submitted 21 December, 2021; v1 submitted 19 July, 2021; originally announced July 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2009.08241, arXiv:2106.06433

  7. arXiv:2009.08241  [pdf, other]

    cs.AR

    NERO: A Near High-Bandwidth Memory Stencil Accelerator for Weather Prediction Modeling

    Authors: Gagandeep Singh, Dionysios Diamantopoulos, Christoph Hagleitner, Juan Gomez-Luna, Sander Stuijk, Onur Mutlu, Henk Corporaal

    Abstract: Ongoing climate change calls for fast and accurate weather and climate modeling. However, when solving large-scale weather prediction simulations, state-of-the-art CPU and GPU implementations suffer from limited performance and high energy consumption. These implementations are dominated by complex irregular memory access patterns and low arithmetic intensity that pose fundamental challenges to ac…

    Submitted 17 September, 2020; originally announced September 2020.

    Comments: This paper appears in FPL 2020

  8. arXiv:1908.02640  [pdf, other]

    cs.AR cs.DC cs.PF

    Near-Memory Computing: Past, Present, and Future

    Authors: Gagandeep Singh, Lorenzo Chelini, Stefano Corda, Ahsan Javed Awan, Sander Stuijk, Roel Jordans, Henk Corporaal, Albert-Jan Boonstra

    Abstract: The conventional approach of moving data to the CPU for computation has become a significant performance bottleneck for emerging scale-out data-intensive applications due to their limited data reuse. At the same time, the advancement in 3D integration technologies has made the decade-old concept of coupling compute units close to the memory --- called near-memory computing (NMC) --- more viable. P…

    Submitted 7 August, 2019; originally announced August 2019.

    Comments: Preprint