Showing 1–4 of 4 results for author: Colagrande, L

  1. arXiv:2406.15068

    cs.AR

    Occamy: A 432-Core 28.1 DP-GFLOP/s/W 83% FPU Utilization Dual-Chiplet, Dual-HBM2E RISC-V-based Accelerator for Stencil and Sparse Linear Algebra Computations with 8-to-64-bit Floating-Point Support in 12nm FinFET

    Authors: Gianna Paulin, Paul Scheffler, Thomas Benz, Matheus Cavalcante, Tim Fischer, Manuel Eggimann, Yichao Zhang, Nils Wistoff, Luca Bertaccini, Luca Colagrande, Gianmarco Ottavi, Frank K. Gürkaynak, Davide Rossi, Luca Benini

    Abstract: We present Occamy, a 432-core RISC-V dual-chiplet 2.5D system for efficient sparse linear algebra and stencil computations on FP64 and narrow (32-, 16-, 8-bit) SIMD FP data. Occamy features 48 clusters of RISC-V cores with custom extensions, two 64-bit host cores, and a latency-tolerant multi-chiplet interconnect and memory system with 32 GiB of HBM2E. It achieves leading-edge utilization on stenc…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 2 pages, 7 figures. Accepted at the 2024 IEEE Symposium on VLSI Technology & Circuits

  2. arXiv:2405.19284

    cs.DC cs.AI cs.AR

    Optimizing Foundation Model Inference on a Many-tiny-core Open-source RISC-V Platform

    Authors: Viviane Potocnik, Luca Colagrande, Tim Fischer, Luca Bertaccini, Daniele Jahier Pagliari, Alessio Burrello, Luca Benini

    Abstract: Transformer-based foundation models have become crucial for various domains, most notably natural language processing (NLP) or computer vision (CV). These models are predominantly deployed on high-performance GPUs or hardwired accelerators with highly customized, proprietary instruction sets. Until now, limited attention has been given to RISC-V-based general-purpose platforms. In our work, we pre…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 14 pages, 10 figures, 4 tables, IEEE Transactions on Circuits and Systems for Artificial Intelligence

    ACM Class: C.4; C.3; I.2

  3. arXiv:2404.05303

    cs.MS cs.AR

    SARIS: Accelerating Stencil Computations on Energy-Efficient RISC-V Compute Clusters with Indirect Stream Registers

    Authors: Paul Scheffler, Luca Colagrande, Luca Benini

    Abstract: Stencil codes are performance-critical in many compute-intensive applications, but suffer from significant address calculation and irregular memory access overheads. This work presents SARIS, a general and highly flexible methodology for stencil acceleration using register-mapped indirect streams. We demonstrate SARIS for various stencil codes on an eight-core RISC-V compute cluster with indirect…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: 6 pages, 5 figures, 2 tables. Accepted at DAC 2024

  4. arXiv:2404.01908

    cs.AR cs.DC

    Optimizing Offload Performance in Heterogeneous MPSoCs

    Authors: Luca Colagrande, Luca Benini

    Abstract: Heterogeneous multi-core architectures combine a few "host" cores, optimized for single-thread performance, with many small energy-efficient "accelerator" cores for data-parallel processing, on a single chip. Offloading a computation to the many-core acceleration fabric introduces a communication and synchronization cost which reduces the speedup attainable on the accelerator, particularly for sma…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 2 pages, 1 figure. Accepted for publication in the DATE24 conference proceedings