
Showing 1–10 of 10 results for author: Riera, M

Searching in archive cs.
  1. arXiv:2310.18181 [pdf, other]

    cs.AR

    An Energy-Efficient Near-Data Processing Accelerator for DNNs that Optimizes Data Accesses

    Authors: Bahareh Khabbazan, Marc Riera, Antonio González

    Abstract: The constant growth of DNNs makes them challenging to implement and run efficiently on traditional compute-centric architectures. Some accelerators have attempted to add more compute units and on-chip buffers to solve the memory wall problem without much success, and sometimes even worsening the issue since more compute units also require higher memory bandwidth. Prior works have proposed the desi…

    Submitted 27 October, 2023; originally announced October 2023.

  2. arXiv:2306.16430 [pdf, other]

    cs.LG cs.AR

    DNA-TEQ: An Adaptive Exponential Quantization of Tensors for DNN Inference

    Authors: Bahareh Khabbazan, Marc Riera, Antonio González

    Abstract: Quantization is commonly used in Deep Neural Networks (DNNs) to reduce the storage and computational complexity by decreasing the arithmetical precision of activations and weights, a.k.a. tensors. Efficient hardware architectures employ linear quantization to enable the deployment of recent DNNs onto embedded systems and mobile devices. However, linear uniform quantization cannot usually reduce th…

    Submitted 22 November, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

    Comments: 10 pages, 8 figures, 5 tables

  3. arXiv:2306.16298 [pdf, other]

    cs.AR

    ReDy: A Novel ReRAM-centric Dynamic Quantization Approach for Energy-efficient CNN Inference

    Authors: Mohammad Sabri, Marc Riera, Antonio González

    Abstract: The primary operation in DNNs is the dot product of quantized input activations and weights. Prior works have proposed the design of memory-centric architectures based on the Processing-In-Memory (PIM) paradigm. Resistive RAM (ReRAM) technology is especially appealing for PIM-based DNN accelerators due to its high density to store weights, low leakage energy, low read latency, and high performance…

    Submitted 28 June, 2023; originally announced June 2023.

    Comments: 13 pages, 16 figures, 4 tables

  4. arXiv:2112.12630 [pdf, other]

    cs.AR cs.LG

    A Survey of Near-Data Processing Architectures for Neural Networks

    Authors: Mehdi Hassanpour, Marc Riera, Antonio González

    Abstract: Data-intensive workloads and applications, such as machine learning (ML), are fundamentally limited by traditional computing systems based on the von-Neumann architecture. As data movement operations and energy consumption become key bottlenecks in the design of computing systems, the interest in unconventional approaches such as Near-Data Processing (NDP), machine learning, and especially neural…

    Submitted 23 December, 2021; originally announced December 2021.

  5. arXiv:2112.10037 [pdf, other]

    cs.PF cs.AR

    FSpGEMM: An OpenCL-based HPC Framework for Accelerating General Sparse Matrix-Matrix Multiplication on FPGAs

    Authors: Erfan Bank Tavakoli, Michael Riera, Masudul Hassan Quraishi, Fengbo Ren

    Abstract: General sparse matrix-matrix multiplication (SpGEMM) is an integral part of many scientific computing, high-performance computing (HPC), and graph analytic applications. This paper presents a new compressed sparse vector (CSV) format for representing sparse matrices and FSpGEMM, an OpenCL-based HPC framework for accelerating general sparse matrix-matrix multiplication on FPGAs. The proposed FSpGEM…

    Submitted 18 December, 2021; originally announced December 2021.

    Comments: 12 pages

  6. arXiv:2107.09408 [pdf, other]

    cs.AR cs.LG

    CREW: Computation Reuse and Efficient Weight Storage for Hardware-accelerated MLPs and RNNs

    Authors: Marc Riera, Jose-Maria Arnau, Antonio Gonzalez

    Abstract: Deep Neural Networks (DNNs) have achieved tremendous success for cognitive applications. The core operation in a DNN is the dot product between quantized inputs and weights. Prior works exploit the weight/input repetition that arises due to quantization to avoid redundant computations in Convolutional Neural Networks (CNNs). However, in this paper we show that their effectiveness is severely limit…

    Submitted 11 March, 2022; v1 submitted 20 July, 2021; originally announced July 2021.

  7. arXiv:2106.13645 [pdf, other]

    cs.DC

    FLASH 1.0: A Software Framework for Rapid Parallel Deployment and Enhancing Host Code Portability in Heterogeneous Computing

    Authors: Michael Riera, Masudul Hassan Quraishi, Erfan Bank Tavakoli, Fengbo Ren

    Abstract: This paper presents FLASH 1.0, a C++-based software framework for rapid parallel deployment and enhancing host code portability in heterogeneous computing. FLASH takes a novel approach in describing kernels and dynamically dispatching them in a hardware-agnostic manner. FLASH features truly hardware-agnostic frontend interfaces, which unify the compile-time control flow and enforce a portability-o…

    Submitted 5 July, 2023; v1 submitted 25 June, 2021; originally announced June 2021.

    Comments: 10 pages

  8. arXiv:2011.10896 [pdf, other]

    cs.DC cs.CL cs.PF

    HALO 1.0: A Hardware-agnostic Accelerator Orchestration Framework for Enabling Hardware-agnostic Programming with True Performance Portability for Heterogeneous HPC

    Authors: Michael Riera, Erfan Bank Tavakoli, Masudul Hassan Quraishi, Fengbo Ren

    Abstract: This paper presents HALO 1.0, an open-ended extensible multi-agent software framework that implements a set of proposed hardware-agnostic accelerator orchestration (HALO) principles. HALO implements a novel compute-centric message passing interface (C^2MPI) specification for enabling the performance portable execution of a hardware-agnostic host application across heterogeneous accelerators. The e…

    Submitted 6 July, 2022; v1 submitted 21 November, 2020; originally announced November 2020.

    Comments: 37 pages

  9. arXiv:2010.05629 [pdf]

    cs.NI

    A Survey on Future Railway Radio Communications Services: Challenges and Opportunities

    Authors: Juan Moreno Garcia-Loygorri, Jose Manuel Riera, Leandro de Haro, Carlos Rodriguez

    Abstract: Radio communications is one of the most disruptive technologies in railways, enabling a huge set of value-added services that greatly improve many aspects of railways, making them more efficient, safer, and profitable. Lately, some major technologies like ERTMS for high-speed railways and CBTC for subways have made possible a reduction of headway and increased safety never before seen in this fiel…

    Submitted 5 October, 2020; originally announced October 2020.

    Journal ref: IEEE Communications Magazine, 2015

  10. arXiv:1906.02535 [pdf, other]

    cs.LG stat.ML

    (Pen-) Ultimate DNN Pruning

    Authors: Marc Riera, Jose-Maria Arnau, Antonio Gonzalez

    Abstract: DNN pruning reduces memory footprint and computational work of DNN-based solutions to improve performance and energy-efficiency. An effective pruning scheme should be able to systematically remove connections and/or neurons that are unnecessary or redundant, reducing the DNN size without any loss in accuracy. In this paper we show that prior pruning schemes require an extremely time-consuming iter…

    Submitted 6 June, 2019; originally announced June 2019.