
Showing 1–6 of 6 results for author: Denolf, K

Searching in archive cs.
  1. arXiv:2310.10537

    cs.LG cs.AI

    Microscaling Data Formats for Deep Learning

    Authors: Bita Darvish Rouhani, Ritchie Zhao, Ankit More, Mathew Hall, Alireza Khodamoradi, Summer Deng, Dhruv Choudhary, Marius Cornea, Eric Dellinger, Kristof Denolf, Stosic Dusan, Venmugil Elango, Maximilian Golub, Alexander Heinecke, Phil James-Roxby, Dharmesh Jani, Gaurav Kolhe, Martin Langhammer, Ada Li, Levi Melnick, Maral Mesmakhosroshahi, Andres Rodriguez, Michael Schulte, Rasoul Shafipour, Lei Shao, et al. (8 additional authors not shown)

    Abstract: Narrow bit-width data formats are key to reducing the computational and storage costs of modern deep learning applications. This paper evaluates Microscaling (MX) data formats that combine a per-block scaling factor with narrow floating-point and integer types for individual elements. MX formats balance the competing needs of hardware efficiency, model accuracy, and user friction. Empirical result…

    Submitted 19 October, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

  2. arXiv:2303.03509

    cs.AR

    SPARTA: Spatial Acceleration for Efficient and Scalable Horizontal Diffusion Weather Stencil Computation

    Authors: Gagandeep Singh, Alireza Khodamoradi, Kristof Denolf, Jack Lo, Juan Gómez-Luna, Joseph Melber, Andra Bisca, Henk Corporaal, Onur Mutlu

    Abstract: Fast and accurate climate simulations and weather predictions are critical for understanding and preparing for the impact of climate change. Real-world weather and climate modeling consist of complex compound stencil kernels that do not perform well on conventional architectures. Horizontal diffusion is one such important compound stencil found in many climate and weather prediction models. Recent…

    Submitted 9 May, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

  3. arXiv:2301.07247

    cs.CV cs.LG cs.NE

    Tailor: Altering Skip Connections for Resource-Efficient Inference

    Authors: Olivia Weng, Gabriel Marcano, Vladimir Loncar, Alireza Khodamoradi, Nojan Sheybani, Andres Meza, Farinaz Koushanfar, Kristof Denolf, Javier Mauricio Duarte, Ryan Kastner

    Abstract: Deep neural networks use skip connections to improve training convergence. However, these skip connections are costly in hardware, requiring extra buffers and increasing on- and off-chip memory utilization and bandwidth requirements. In this paper, we show that skip connections can be optimized for hardware when tackled with a hardware-software codesign approach. We argue that while a network's sk…

    Submitted 15 September, 2023; v1 submitted 17 January, 2023; originally announced January 2023.

  4. arXiv:2301.02359

    cs.AR

    CHARM: Composing Heterogeneous Accelerators for Matrix Multiply on Versal ACAP Architecture

    Authors: Jinming Zhuang, Jason Lau, Hanchen Ye, Zhuoping Yang, Yubo Du, Jack Lo, Kristof Denolf, Stephen Neuendorffer, Alex Jones, Jingtong Hu, Deming Chen, Jason Cong, Peipei Zhou

    Abstract: Dense matrix multiply (MM) serves as one of the most heavily used kernels in deep learning applications. To cope with the high computation demands of these applications, heterogeneous architectures featuring both FPGA and dedicated ASIC accelerators have emerged as promising platforms. For example, the AMD/Xilinx Versal ACAP architecture combines general-purpose CPU cores and programmable logic wi…

    Submitted 5 January, 2023; originally announced January 2023.

  5. arXiv:2211.03079

    cs.AR cs.DC q-bio.GN

    RUBICON: A Framework for Designing Efficient Deep Learning-Based Genomic Basecallers

    Authors: Gagandeep Singh, Mohammed Alser, Kristof Denolf, Can Firtina, Alireza Khodamoradi, Meryem Banu Cavlak, Henk Corporaal, Onur Mutlu

    Abstract: Nanopore sequencing generates noisy electrical signals that need to be converted into a standard string of DNA nucleotide bases using a computational step called basecalling. The accuracy and speed of basecalling have critical implications for all later steps in genome analysis. Many researchers adopt complex deep learning-based models to perform basecalling without considering the compute demands…

    Submitted 5 February, 2024; v1 submitted 6 November, 2022; originally announced November 2022.

  6. arXiv:1906.11879

    cs.CV eess.IV

    Comparing Energy Efficiency of CPU, GPU and FPGA Implementations for Vision Kernels

    Authors: Murad Qasaimeh, Kristof Denolf, Jack Lo, Kees Vissers, Joseph Zambreno, Phillip H. Jones

    Abstract: Developing high performance embedded vision applications requires balancing run-time performance with energy constraints. Given the mix of hardware accelerators that exist for embedded computer vision (e.g. multi-core CPUs, GPUs, and FPGAs), and their associated vendor optimized vision libraries, it becomes a challenge for developers to navigate this fragmented solution space. To aid with determin…

    Submitted 31 May, 2019; originally announced June 2019.

    Comments: 8 pages, Design Automation Conference (DAC), The 15th IEEE International Conference on Embedded Software and Systems, 2019