-
A Fully Automated Platform for Evaluating ReRAM Crossbars
Authors:
Rebecca Pelke,
Felix Staudigl,
Niklas Thomas,
Nils Bosbach,
Mohammed Hossein,
Jose Cubero-Cascante,
Leticia Bolzani Poehls,
Rainer Leupers,
Jan Moritz Joseph
Abstract:
Resistive Random Access Memory (ReRAM) is a promising candidate for implementing Computing-in-Memory (CIM) architectures and neuromorphic circuits. ReRAM cells exhibit significant variability across different memristive devices and cycles, necessitating further improvements in the areas of devices, algorithms, and applications. To achieve this, understanding the stochastic behavior of the different ReRAM technologies is essential. The NeuroBreakoutBoard (NBB) is a versatile instrumentation platform to characterize Non-Volatile Memories (NVMs). However, the NBB itself does not provide any functionality in the form of software or a controller. In this paper, we present a control board for the NBB that can perform reliability assessments of 1T1R ReRAM crossbars. Specifically, we implement an interface that allows a host PC to communicate with the NBB via the new control board. In a case study, we analyze the Cycle-to-Cycle (C2C) variation and read disturb of TiN/Ti/HfO2/TiN cells for different read voltages to gain an understanding of their operational behavior.
Submitted 20 March, 2024;
originally announced March 2024.
-
CLSA-CIM: A Cross-Layer Scheduling Approach for Computing-in-Memory Architectures
Authors:
Rebecca Pelke,
Jose Cubero-Cascante,
Nils Bosbach,
Felix Staudigl,
Rainer Leupers,
Jan Moritz Joseph
Abstract:
The demand for efficient machine learning (ML) accelerators is growing rapidly, driving the development of novel computing concepts such as resistive random access memory (RRAM)-based tiled computing-in-memory (CIM) architectures. CIM enables computation within the memory unit, resulting in faster data processing and reduced power consumption. Efficient compiler algorithms are essential to exploit the potential of tiled CIM architectures. While conventional ML compilers focus on code generation for CPUs, GPUs, and other von Neumann architectures, adaptations are needed to cover CIM architectures. Cross-layer scheduling is a promising approach, as it enhances the utilization of CIM cores, thereby accelerating computations. Although similar concepts are implicitly used in previous work, there is a lack of clear and quantifiable algorithmic definitions for cross-layer scheduling for tiled CIM architectures. To close this gap, we present CLSA-CIM, a cross-layer scheduling algorithm for tiled CIM architectures. We integrate CLSA-CIM with existing weight-mapping strategies and compare performance against state-of-the-art (SOTA) scheduling algorithms. CLSA-CIM improves the utilization by up to 17.9x, resulting in an overall speedup increase of up to 29.2x compared to SOTA.
Submitted 17 January, 2024; v1 submitted 15 January, 2024;
originally announced January 2024.
-
Mapping of CNNs on multi-core RRAM-based CIM architectures
Authors:
Rebecca Pelke,
Nils Bosbach,
Jose Cubero,
Felix Staudigl,
Rainer Leupers,
Jan Moritz Joseph
Abstract:
RRAM-based multi-core systems improve the energy efficiency and performance of CNNs. However, the distributed parallel execution of convolutional layers causes critical data dependencies that limit the potential speedup. This paper presents synchronization techniques for parallel inference of convolutional layers on RRAM-based CIM architectures. We propose an architecture optimization that enables efficient data exchange and discuss the impact of different architecture setups on the performance. The corresponding compiler algorithms are optimized for high speedup and low memory consumption during CNN inference. We achieve more than 99% of the theoretical acceleration limit with a marginal data transmission overhead of less than 4% for state-of-the-art CNN benchmarks.
Submitted 26 October, 2023; v1 submitted 7 September, 2023;
originally announced September 2023.
-
NISTT: A Non-Intrusive SystemC-TLM 2.0 Tracing Tool
Authors:
Nils Bosbach,
Lukas Jünger,
Jan Moritz Joseph,
Rainer Leupers
Abstract:
The increasing complexity of systems-on-a-chip requires the continuous development of electronic design automation tools. Nowadays, the simulation of systems-on-a-chip using virtual platforms is common. Virtual platforms enable hardware/software co-design to shorten the time to market, offer insights into the models, and allow debugging of the simulated hardware. Profiling tools are required to improve the usability of virtual platforms. During simulation, these tools capture data that are evaluated afterward. Those data can reveal information about the simulation itself and the software executed on the platform. This work presents the tracing tool NISTT that can profile SystemC-TLM-2.0-based virtual platforms. NISTT is implemented in a completely non-intrusive way. That means no changes in the simulation are needed, the source code of the simulation is not required, and the traced simulation does not need to contain debug symbols. The standardized SystemC application programming interface guarantees the compatibility of NISTT with other simulations. The strengths of NISTT are demonstrated in a case study. Here, NISTT is connected to a virtual platform and traces the boot process of Linux. After the simulation, the database created by NISTT is evaluated, and the results are visualized. Furthermore, the overhead of NISTT is quantified. It is shown that NISTT has only a minor influence on the overall simulation performance.
Submitted 22 July, 2022;
originally announced July 2022.