Showing 1–11 of 11 results for author: Kurth, A

  1. arXiv:2308.00154  [pdf, other]

    cs.AR

    PATRONoC: Parallel AXI Transport Reducing Overhead for Networks-on-Chip targeting Multi-Accelerator DNN Platforms at the Edge

    Authors: Vikram Jain, Matheus Cavalcante, Nazareno Bruschi, Michael Rogenmoser, Thomas Benz, Andreas Kurth, Davide Rossi, Luca Benini, Marian Verhelst

    Abstract: Emerging deep neural network (DNN) applications require high-performance multi-core hardware acceleration with large data bursts. Classical network-on-chips (NoCs) use serial packet-based protocols suffering from significant protocol translation overheads towards the endpoints. This paper proposes PATRONoC, an open-source fully AXI-compliant NoC fabric to better address the specific needs of multi…

    Submitted 31 July, 2023; originally announced August 2023.

    Comments: Accepted and presented at 60th DAC

  2. arXiv:2305.05240  [pdf, other]

    cs.AR

    A High-performance, Energy-efficient Modular DMA Engine Architecture

    Authors: Thomas Benz, Michael Rogenmoser, Paul Scheffler, Samuel Riedel, Alessandro Ottaviano, Andreas Kurth, Torsten Hoefler, Luca Benini

    Abstract: Data transfers are essential in today's computing systems as latency and complex memory access patterns are increasingly challenging to manage. Direct memory access engines (DMAEs) are critically needed to transfer data independently of the processing elements, hiding latency and achieving high throughput even for complex access patterns to high-latency memory. With the prevalence of heterogeneous…

    Submitted 14 November, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

    Comments: 14 pages, 14 figures, accepted by an IEEE journal for publication

  3. arXiv:2201.03861  [pdf, other]

    cs.DC cs.AR cs.PF

    HEROv2: Full-Stack Open-Source Research Platform for Heterogeneous Computing

    Authors: Andreas Kurth, Björn Forsberg, Luca Benini

    Abstract: Heterogeneous computers integrate general-purpose host processors with domain-specific accelerators to combine versatility with efficiency and high performance. To realize the full potential of heterogeneous computers, however, many hardware and software design challenges have to be overcome. While architectural and system simulators can be used to analyze heterogeneous computers, they are faced w…

    Submitted 11 January, 2022; originally announced January 2022.

    Comments: 14 pages, 9 figures, 3 tables

    ACM Class: C.0; C.1.2; C.1.3; C.1.4; C.5.4; D.1.3

  4. Sub-realtime simulation of a neuronal network of natural density

    Authors: Anno C. Kurth, Johanna Senk, Dennis Terhorst, Justin Finnerty, Markus Diesmann

    Abstract: Full scale simulations of neuronal network models of the brain are challenging due to the high density of connections between neurons. This contribution reports run times shorter than the simulated span of biological time for a full scale model of the local cortical microcircuit with explicit representation of synapses on a recent conventional compute node. Realtime performance is relevant for rob…

    Submitted 24 November, 2021; v1 submitted 8 November, 2021; originally announced November 2021.

    Journal ref: Neuromorph. Comput. Eng. 2 021001 (2022)

  5. arXiv:2104.08009  [pdf, other]

    cs.DC cs.AR cs.CV cs.LG

    Implementing CNN Layers on the Manticore Cluster-Based Many-Core Architecture

    Authors: Andreas Kurth, Fabian Schuiki, Luca Benini

    Abstract: This document presents implementations of fundamental convolutional neural network (CNN) layers on the Manticore cluster-based many-core architecture and discusses their characteristics and trade-offs.

    Submitted 16 April, 2021; originally announced April 2021.

    Comments: Technical report. 18 pages, 4 figures, 5 algorithms

    ACM Class: C.4; C.1.4; F.2.1; I.2

  6. arXiv:2010.03536  [pdf, other]

    cs.NI cs.DC

    PsPIN: A high-performance low-power architecture for flexible in-network compute

    Authors: Salvatore Di Girolamo, Andreas Kurth, Alexandru Calotoiu, Thomas Benz, Timo Schneider, Jakub Beránek, Luca Benini, Torsten Hoefler

    Abstract: The capacity of offloading data and control tasks to the network is becoming increasingly important, especially if we consider the faster growth of network speed when compared to CPU frequencies. In-network compute alleviates the host CPU load by running tasks directly in the network, enabling additional computation/communication overlap and potentially improving overall application performance. H…

    Submitted 1 June, 2021; v1 submitted 7 October, 2020; originally announced October 2020.

  7. An Open-Source Platform for High-Performance Non-Coherent On-Chip Communication

    Authors: Andreas Kurth, Wolfgang Rönninger, Thomas Benz, Matheus Cavalcante, Fabian Schuiki, Florian Zaruba, Luca Benini

    Abstract: On-chip communication infrastructure is a central component of modern systems-on-chip (SoCs), and it continues to gain importance as the number of cores, the heterogeneity of components, and the on-chip and off-chip bandwidth continue to grow. Decades of research on on-chip networks enabled cache-coherent shared-memory multiprocessors. However, communication fabrics that meet the needs of heteroge…

    Submitted 11 November, 2021; v1 submitted 11 September, 2020; originally announced September 2020.

    Comments: 14 pages, 24 figures, 4 tables

    ACM Class: B.4.3; C.1.2; C.5.4

  8. arXiv:2004.03494  [pdf, other]

    cs.PL

    LLHD: A Multi-level Intermediate Representation for Hardware Description Languages

    Authors: Fabian Schuiki, Andreas Kurth, Tobias Grosser, Luca Benini

    Abstract: Modern Hardware Description Languages (HDLs) such as SystemVerilog or VHDL are, due to their sheer complexity, insufficient to transport designs through modern circuit design flows. Instead, each design automation tool lowers HDLs to its own Intermediate Representation (IR). These tools are monolithic and mostly proprietary, disagree in their implementation of HDLs, and while many redundant IRs ex…

    Submitted 7 April, 2020; originally announced April 2020.

  9. Network-Accelerated Non-Contiguous Memory Transfers

    Authors: Salvatore Di Girolamo, Konstantin Taranov, Andreas Kurth, Michael Schaffner, Timo Schneider, Jakub Beránek, Maciej Besta, Luca Benini, Duncan Roweth, Torsten Hoefler

    Abstract: Applications often communicate data that is non-contiguous in the send- or the receive-buffer, e.g., when exchanging a column of a matrix stored in row-major order. While non-contiguous transfers are well supported in HPC (e.g., MPI derived datatypes), they can still be up to 5x slower than contiguous transfers of the same size. As we enter the era of network acceleration, we need to investigate w…

    Submitted 22 August, 2019; originally announced August 2019.

    Comments: In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC19), Nov. 2019

  10. arXiv:1808.09751  [pdf, other]

    cs.AR cs.DC

    Scalable and Efficient Virtual Memory Sharing in Heterogeneous SoCs with TLB Prefetching and MMU-Aware DMA Engine

    Authors: Andreas Kurth, Pirmin Vogel, Andrea Marongiu, Luca Benini

    Abstract: Shared virtual memory (SVM) is key in heterogeneous systems on chip (SoCs), which combine a general-purpose host processor with a many-core accelerator, both for programmability and to avoid data duplication. However, SVM can bring a significant run time overhead when translation lookaside buffer (TLB) entries are missing. Moreover, allowing DMA burst transfers to write SVM traditionally requires…

    Submitted 29 August, 2018; originally announced August 2018.

    Comments: 9 pages, 5 figures. Accepted for publication in Proceedings of the 36th IEEE International Conference on Computer Design (ICCD), October 7-10, 2018

  11. arXiv:1712.06497  [pdf, other]

    cs.AR cs.DC

    HERO: Heterogeneous Embedded Research Platform for Exploring RISC-V Manycore Accelerators on FPGA

    Authors: Andreas Kurth, Pirmin Vogel, Alessandro Capotondi, Andrea Marongiu, Luca Benini

    Abstract: Heterogeneous embedded systems on chip (HESoCs) co-integrate a standard host processor with programmable manycore accelerators (PMCAs) to combine general-purpose computing with domain-specific, efficient processing capabilities. While leading companies successfully advance their HESoC products, research lags behind due to the challenges of building a prototyping platform that unites an industry-st…

    Submitted 18 December, 2017; originally announced December 2017.