-
A Fully Automated Platform for Evaluating ReRAM Crossbars
Authors:
Rebecca Pelke,
Felix Staudigl,
Niklas Thomas,
Nils Bosbach,
Mohammed Hossein,
Jose Cubero-Cascante,
Leticia Bolzani Poehls,
Rainer Leupers,
Jan Moritz Joseph
Abstract:
Resistive Random Access Memory (ReRAM) is a promising candidate for implementing Computing-in-Memory (CIM) architectures and neuromorphic circuits. ReRAM cells exhibit significant variability across different memristive devices and cycles, necessitating further improvements in the areas of devices, algorithms, and applications. To achieve this, understanding the stochastic behavior of the differen…
▽ More
Resistive Random Access Memory (ReRAM) is a promising candidate for implementing Computing-in-Memory (CIM) architectures and neuromorphic circuits. ReRAM cells exhibit significant variability across different memristive devices and cycles, necessitating further improvements in the areas of devices, algorithms, and applications. To achieve this, understanding the stochastic behavior of the different ReRAM technologies is essential. The NeuroBreakoutBoard (NBB) is a versatile instrumentation platform to characterize Non-Volatile Memories (NVMs). However, the NBB itself does not provide any functionality in the form of software or a controller. In this paper, we present a control board for the NBB able to perform reliability assessments of 1T1R ReRAM crossbars. In more detail, an interface that allows a host PC to communicate with the NBB via the new control board is implemented. In a case study, we analyze the Cycle-to-Cycle (C2C) variation and read disturb TiN/Ti/HfO2/TiN cells for different read voltages to gain an understanding of their operational behavior.
△ Less
Submitted 20 March, 2024;
originally announced March 2024.
-
QTFlow: Quantitative Timing-Sensitive Information Flow for Security-Aware Hardware Design on RTL
Authors:
Lennart M. Reimann,
Anshul Prashar,
Chiara Ghinami,
Rebecca Pelke,
Dominik Sisejkovic,
Farhad Merchant,
Rainer Leupers
Abstract:
In contemporary Electronic Design Automation (EDA) tools, security often takes a backseat to the primary goals of power, performance, and area optimization. Commonly, the security analysis is conducted by hand, leading to vulnerabilities in the design remaining unnoticed. Security-aware EDA tools assist the designer in the identification and removal of security threats while kee** performance an…
▽ More
In contemporary Electronic Design Automation (EDA) tools, security often takes a backseat to the primary goals of power, performance, and area optimization. Commonly, the security analysis is conducted by hand, leading to vulnerabilities in the design remaining unnoticed. Security-aware EDA tools assist the designer in the identification and removal of security threats while kee** performance and area in mind. Cutting-edge methods employ information flow analysis to identify inadvertent information leaks in design structures. Current information leakage detection methods use quantitative information flow analysis to quantify the leaks. However, handling sequential circuits poses challenges for state-of-the-art techniques due to their time-agnostic nature, overlooking timing channels, and introducing false positives. To address this, we introduce QTFlow, a timing-sensitive framework for quantifying hardware information leakages during the design phase. Illustrating its effectiveness on open-source benchmarks, QTFlow autonomously identifies timing channels and diminishes all false positives arising from time-agnostic analysis when contrasted with current state-of-the-art techniques.
△ Less
Submitted 6 February, 2024; v1 submitted 31 January, 2024;
originally announced January 2024.
-
CLSA-CIM: A Cross-Layer Scheduling Approach for Computing-in-Memory Architectures
Authors:
Rebecca Pelke,
Jose Cubero-Cascante,
Nils Bosbach,
Felix Staudigl,
Rainer Leupers,
Jan Moritz Joseph
Abstract:
The demand for efficient machine learning (ML) accelerators is growing rapidly, driving the development of novel computing concepts such as resistive random access memory (RRAM)-based tiled computing-in-memory (CIM) architectures. CIM allows to compute within the memory unit, resulting in faster data processing and reduced power consumption. Efficient compiler algorithms are essential to exploit t…
▽ More
The demand for efficient machine learning (ML) accelerators is growing rapidly, driving the development of novel computing concepts such as resistive random access memory (RRAM)-based tiled computing-in-memory (CIM) architectures. CIM allows to compute within the memory unit, resulting in faster data processing and reduced power consumption. Efficient compiler algorithms are essential to exploit the potential of tiled CIM architectures. While conventional ML compilers focus on code generation for CPUs, GPUs, and other von Neumann architectures, adaptations are needed to cover CIM architectures. Cross-layer scheduling is a promising approach, as it enhances the utilization of CIM cores, thereby accelerating computations. Although similar concepts are implicitly used in previous work, there is a lack of clear and quantifiable algorithmic definitions for cross-layer scheduling for tiled CIM architectures. To close this gap, we present CLSA-CIM, a cross-layer scheduling algorithm for tiled CIM architectures. We integrate CLSA-CIM with existing weight-map** strategies and compare performance against state-of-the-art (SOTA) scheduling algorithms. CLSA-CIM improves the utilization by up to 17.9 x , resulting in an overall speedup increase of up to 29.2 x compared to SOTA.
△ Less
Submitted 17 January, 2024; v1 submitted 15 January, 2024;
originally announced January 2024.
-
Map** of CNNs on multi-core RRAM-based CIM architectures
Authors:
Rebecca Pelke,
Nils Bosbach,
Jose Cubero,
Felix Staudigl,
Rainer Leupers,
Jan Moritz Joseph
Abstract:
RRAM-based multi-core systems improve the energy efficiency and performance of CNNs. Thereby, the distributed parallel execution of convolutional layers causes critical data dependencies that limit the potential speedup. This paper presents synchronization techniques for parallel inference of convolutional layers on RRAM-based CIM architectures. We propose an architecture optimization that enables…
▽ More
RRAM-based multi-core systems improve the energy efficiency and performance of CNNs. Thereby, the distributed parallel execution of convolutional layers causes critical data dependencies that limit the potential speedup. This paper presents synchronization techniques for parallel inference of convolutional layers on RRAM-based CIM architectures. We propose an architecture optimization that enables efficient data exchange and discuss the impact of different architecture setups on the performance. The corresponding compiler algorithms are optimized for high speedup and low memory consumption during CNN inference. We achieve more than 99% of the theoretical acceleration limit with a marginal data transmission overhead of less than 4% for state-of-the-art CNN benchmarks.
△ Less
Submitted 26 October, 2023; v1 submitted 7 September, 2023;
originally announced September 2023.
-
parti-gem5: gem5's Timing Mode Parallelised
Authors:
José Cubero-Cascante,
Niko Zurstraßen,
Jörn Nöller,
Rainer Leupers,
Jan Moritz Joseph
Abstract:
Detailed timing models are indispensable tools for the design space exploration of Multiprocessor Systems on Chip (MPSoCs). As core counts continue to increase, the complexity in memory hierarchies and interconnect topologies is also growing, making accurate predictions of design decisions more challenging than ever. In this context, the open-source Full System Simulator (FSS) gem5 is a popular ch…
▽ More
Detailed timing models are indispensable tools for the design space exploration of Multiprocessor Systems on Chip (MPSoCs). As core counts continue to increase, the complexity in memory hierarchies and interconnect topologies is also growing, making accurate predictions of design decisions more challenging than ever. In this context, the open-source Full System Simulator (FSS) gem5 is a popular choice for MPSoC design space exploration, thanks to its flexibility and robust set of detailed timing models. However, its single-threaded simulation kernel severely hampers its throughput. To address this challenge, we introduce parti-gem5, an extension of gem5 that enables parallel timing simulations on modern multi-core simulation hosts. Unlike previous works, parti-gem5 supports gem5's timing mode, the O3CPU, and Ruby's custom cache and interconnect models. Compared to reference single-thread simulations, we achieved speedups of up to 42.7x when simulating a 120-core ARM MPSoC on a 64-core x86-64 host system. While our method introduces timing deviations, the error in total simulated time is below 15% in most cases.
△ Less
Submitted 13 May, 2024; v1 submitted 18 August, 2023;
originally announced August 2023.
-
SoftFlow: Automated HW-SW Confidentiality Verification for Embedded Processors
Authors:
Lennart M. Reimann,
Jonathan Wiesner,
Dominik Sisejkovic,
Farhad Merchant,
Rainer Leupers
Abstract:
Despite its ever-increasing impact, security is not considered as a design objective in commercial electronic design automation (EDA) tools. This results in vulnerabilities being overlooked during the software-hardware design process. Specifically, vulnerabilities that allow leakage of sensitive data might stay unnoticed by standard testing, as the leakage itself might not result in evident functi…
▽ More
Despite its ever-increasing impact, security is not considered as a design objective in commercial electronic design automation (EDA) tools. This results in vulnerabilities being overlooked during the software-hardware design process. Specifically, vulnerabilities that allow leakage of sensitive data might stay unnoticed by standard testing, as the leakage itself might not result in evident functional changes. Therefore, EDA tools are needed to elaborate the confidentiality of sensitive data during the design process. However, state-of-the-art implementations either solely consider the hardware or restrict the expressiveness of the security properties that must be proven. Consequently, more proficient tools are required to assist in the software and hardware design. To address this issue, we propose SoftFlow, an EDA tool that allows determining whether a given software exploits existing leakage paths in hardware. Based on our analysis, the leakage paths can be retained if proven not to be exploited by software. This is desirable if the removal significantly impacts the design's performance or functionality, or if the path cannot be removed as the chip is already manufactured. We demonstrate the feasibility of SoftFlow by identifying vulnerabilities in OpenSSL cryptographic C programs, and redesigning them to avoid leakage of cryptographic keys in a RISC-V architecture.
△ Less
Submitted 4 August, 2023;
originally announced August 2023.
-
Work-in-Progress: A Universal Instrumentation Platform for Non-Volatile Memories
Authors:
Felix Staudigl,
Mohammed Hossein,
Tobias Ziegler,
Hazem Al Indari,
Rebecca Pelke,
Sebastian Siegel,
Dirk J. Wouters,
Dominik Sisejkovic,
Jan Moritz Joseph,
Rainer Leupers
Abstract:
Emerging non-volatile memories (NVMs) represent a disruptive technology that allows a paradigm shift from the conventional von Neumann architecture towards more efficient computing-in-memory (CIM) architectures. Several instrumentation platforms have been proposed to interface NVMs allowing the characterization of single cells and crossbar structures. However, these platforms suffer from low flexi…
▽ More
Emerging non-volatile memories (NVMs) represent a disruptive technology that allows a paradigm shift from the conventional von Neumann architecture towards more efficient computing-in-memory (CIM) architectures. Several instrumentation platforms have been proposed to interface NVMs allowing the characterization of single cells and crossbar structures. However, these platforms suffer from low flexibility and are not capable of performing CIM operations on NVMs. Therefore, we recently designed and built the NeuroBreakoutBoard, a highly versatile instrumentation platform capable of executing CIM on NVMs. We present our preliminary results demonstrating a relative error < 5% in the range of 1 k$Ω$ to 1 M$Ω$ and showcase the switching behavior of a HfO$_2$/Ti-based memristive cell.
△ Less
Submitted 3 August, 2023;
originally announced August 2023.
-
Gate Camouflaging Using Reconfigurable ISFET-Based Threshold Voltage Defined Logic
Authors:
Elmira Moussavi,
Animesh Singh,
Dominik Sisejkovic,
Aravind Padma Kumar,
Daniyar Kizatov,
Sven Ingebrandt,
Rainer Leupers,
Vivek Pachauri,
Farhad Merchant
Abstract:
Most chip designers outsource the manufacturing of their integrated circuits (ICs) to external foundries due to the exorbitant cost and complexity of the process. This involvement of untrustworthy, external entities opens the door to major security threats, such as reverse engineering (RE). RE can reveal the physical structure and functionality of intellectual property (IP) and ICs, leading to IP…
▽ More
Most chip designers outsource the manufacturing of their integrated circuits (ICs) to external foundries due to the exorbitant cost and complexity of the process. This involvement of untrustworthy, external entities opens the door to major security threats, such as reverse engineering (RE). RE can reveal the physical structure and functionality of intellectual property (IP) and ICs, leading to IP theft, counterfeiting, and other misuses. The concept of the threshold voltage-defined (TVD) logic family is a potential mechanism to obfuscate and protect the design and prevent RE. However, it addresses post-fabrication RE issues, and it has been shown that dopant profiling techniques can be used to determine the threshold voltage of the transistor and break the obfuscation. In this work, we propose a novel TVD modulation with ion-sensitive field-effect transistors (ISFETs) to protect the IC from RE and IP piracy. Compared to the conventional TVD logic family, ISFET-TVD allows post-manufacture programming. The ISFET-TVD logic gate can be reconfigured after fabrication, maintaining an exact schematic architecture with an identical layout for all types of logic gates, and thus overcoming the shortcomings of the classic TVD. The threshold voltage of the ISFETs can be adjusted after fabrication by changing the ion concentration of the material in contact with the ion-sensitive gate of the transistor, depending on the Boolean functionality. The ISFET is CMOS compatible, and therefore implemented on 45 nm CMOS technology for demonstration.
△ Less
Submitted 12 April, 2023;
originally announced April 2023.
-
Automated Information Flow Analysis for Integrated Computing-in-Memory Modules
Authors:
Lennart M. Reimann,
Felix Staudigl,
Rainer Leupers
Abstract:
Novel non-volatile memory (NVM) technologies offer high-speed and high-density data storage. In addition, they overcome the von Neumann bottleneck by enabling computing-in-memory (CIM). Various computer architectures have been proposed to integrate CIM blocks in their design, forming a mixed-signal system to combine the computational benefits of CIM with the robustness of conventional CMOS. Novel…
▽ More
Novel non-volatile memory (NVM) technologies offer high-speed and high-density data storage. In addition, they overcome the von Neumann bottleneck by enabling computing-in-memory (CIM). Various computer architectures have been proposed to integrate CIM blocks in their design, forming a mixed-signal system to combine the computational benefits of CIM with the robustness of conventional CMOS. Novel electronic design automation (EDA) tools are necessary to design and manufacture these so-called neuromorphic systems. Furthermore, EDA tools must consider the impact of security vulnerabilities, as hardware security attacks have increased in recent years. Existing information flow analysis (IFA) frameworks offer an automated tool-suite to uphold the confidentiality property for sensitive data during the design of hardware. However, currently available mixed-signal EDA tools are not capable of analyzing the information flow of neuromorphic systems. To illustrate the shortcomings, we develop information flow protocols for NVMs that can be easily integrated in the already existing tool-suites. We show the limitation of the state-of-the-art by analyzing the flow from sensitive signals through multiple memristive crossbar structures to potential untrusted components and outputs. Finally, we provide a thorough discussion of the merits and flaws of the mixed-signal IFA frameworks on neuromorphic systems.
△ Less
Submitted 12 April, 2023;
originally announced April 2023.
-
Fault Injection in Native Logic-in-Memory Computation on Neuromorphic Hardware
Authors:
Felix Staudigl,
Thorben Fetz,
Rebecca Pelke,
Dominik Sisejkovic,
Jan Moritz Joseph,
Leticia Bolzani Pöhls,
Rainer Leupers
Abstract:
Logic-in-memory (LIM) describes the execution of logic gates within memristive crossbar structures, promising to improve performance and energy efficiency. Utilizing only binary values, LIM particularly excels in accelerating binary neural networks, shifting it in the focus of edge applications. Considering its potential, the impact of faults on BNNs accelerated with LIM still lacks investigation.…
▽ More
Logic-in-memory (LIM) describes the execution of logic gates within memristive crossbar structures, promising to improve performance and energy efficiency. Utilizing only binary values, LIM particularly excels in accelerating binary neural networks, shifting it in the focus of edge applications. Considering its potential, the impact of faults on BNNs accelerated with LIM still lacks investigation. In this paper, we propose faulty logic-in-memory (FLIM), a fault injection platform capable of executing full-fledged BNNs on LIM while injecting in-field faults. The results show that FLIM runs a single MNIST picture 66754x faster than the state of the art by offering a fine-grained fault injection methodology.
△ Less
Submitted 15 February, 2023;
originally announced February 2023.
-
Quantitative Information Flow for Hardware: Advancing the Attack Landscape
Authors:
Lennart M. Reimann,
Sarp Erdönmez,
Dominik Sisejkovic,
Rainer Leupers
Abstract:
Security still remains an afterthought in modern Electronic Design Automation (EDA) tools, which solely focus on enhancing performance and reducing the chip size. Typically, the security analysis is conducted by hand, leading to vulnerabilities in the design remaining unnoticed. Security-aware EDA tools assist the designer in the identification and removal of security threats while kee** perform…
▽ More
Security still remains an afterthought in modern Electronic Design Automation (EDA) tools, which solely focus on enhancing performance and reducing the chip size. Typically, the security analysis is conducted by hand, leading to vulnerabilities in the design remaining unnoticed. Security-aware EDA tools assist the designer in the identification and removal of security threats while kee** performance and area in mind. State-of-the-art approaches utilize information flow analysis to spot unintended information leakages in design structures. However, the classification of such threats is binary, resulting in negligible leakages being listed as well. A novel quantitative analysis allows the application of a metric to determine a numeric value for a leakage. Nonetheless, current approximations to quantify the leakage are still prone to overlooking leakages. The mathematical model 2D-QModel introduced in this work aims to overcome this shortcoming. Additionally, as previous work only includes a limited threat model, multiple threat models can be applied using the provided approach. Open-source benchmarks are used to show the capabilities of 2D-QModel to identify hardware Trojans in the design while ignoring insignificant leakages.
△ Less
Submitted 30 November, 2022;
originally announced November 2022.
-
A Temperature Independent Readout Circuit for ISFET-Based Sensor Applications
Authors:
Elmira Moussavi,
Dominik Sisejkovic,
Animesh Singh,
Daniyar Kizatov,
Rainer Leupers,
Sven Ingebrandt,
Vivek Pachauri,
Farhad Merchant
Abstract:
The ion-sensitive field-effect transistor (ISFET) is an emerging technology that has received much attention in numerous research areas, including biochemistry, medicine, and security applications. However, compared to other types of sensors, the complexity of ISFETs make it more challenging to achieve a sensitive, fast and repeatable response. Therefore, various readout circuits have been develop…
▽ More
The ion-sensitive field-effect transistor (ISFET) is an emerging technology that has received much attention in numerous research areas, including biochemistry, medicine, and security applications. However, compared to other types of sensors, the complexity of ISFETs make it more challenging to achieve a sensitive, fast and repeatable response. Therefore, various readout circuits have been developed to improve the performance of ISFETs, especially to eliminate the temperature effect. This paper presents a new approach for a temperature-independent readout circuit that uses the threshold voltage differences of an ISFET-MOSFET pair. The Linear Technology Simulation Program with Integrated Circuit Emphasis (LTspice) is used to analyze the ISFET performance based on the proposed readout circuit characteristics. A macro-model is used to model ISFET behavior, including the first-level Spice model for the MOSFET part and Verilog-A to model the surface potential, reference electrode, and electrolyte of the ISFET to determine the relationships between variables.In this way, the behavior of the ISFET is monitored by the output voltage of the readout circuit based on a change in the electrolyte's hydrogen potential (pH), determined by the simulation. The proposed readout circuit has a temperature coefficient of 11.9 $ppm/°C$ for a temperature range of 0-100 $°C$ and pH between 1 and 13. The proposed ISFET readout circuit outperforms other designs in terms of simplicity and not requiring an additional sensor.
△ Less
Submitted 9 August, 2022;
originally announced August 2022.
-
NISTT: A Non-Intrusive SystemC-TLM 2.0 Tracing Tool
Authors:
Nils Bosbach,
Lukas Jünger,
Jan Moritz Joseph,
Rainer Leupers
Abstract:
The increasing complexity of systems-on-a-chip requires the continuous development of electronic design automation tools. Nowadays, the simulation of systems-on-a-chip using virtual platforms is common. Virtual platforms enable hardware/software co-design to shorten the time to market, offer insights into the models, and allow debugging of the simulated hardware. Profiling tools are required to im…
▽ More
The increasing complexity of systems-on-a-chip requires the continuous development of electronic design automation tools. Nowadays, the simulation of systems-on-a-chip using virtual platforms is common. Virtual platforms enable hardware/software co-design to shorten the time to market, offer insights into the models, and allow debugging of the simulated hardware. Profiling tools are required to improve the usability of virtual platforms. During simulation, these tools capture data that are evaluated afterward. Those data can reveal information about the simulation itself and the software executed on the platform. This work presents the tracing tool NISTT that can profile SystemC-TLM-2.0-based virtual platforms. NISTT is implemented in a completely non-intrusive way. That means no changes in the simulation are needed, the source code of the simulation is not required, and the traced simulation does not need to contain debug symbols. The standardized SystemC application programming interface guarantees the compatibility of NISTT with other simulations. The strengths of NISTT are demonstrated in a case study. Here, NISTT is connected to a virtual platform and traces the boot process of Linux. After the simulation, the database created by NISTT is evaluated, and the results are visualized. Furthermore, the overhead of NISTT is quantified. It is shown that NISTT has only a minor influence on the overall simulation performance.
△ Less
Submitted 22 July, 2022;
originally announced July 2022.
-
PA-PUF: A Novel Priority Arbiter PUF
Authors:
Simranjeet Singh,
Srinivasu Bodapati,
Sachin Patkar,
Rainer Leupers,
Anupam Chattopadhyay,
Farhad Merchant
Abstract:
This paper proposes a 3-input arbiter-based novel physically unclonable function (PUF) design. Firstly, a 3-input priority arbiter is designed using a simple arbiter, two multiplexers (2:1), and an XOR logic gate. The priority arbiter has an equal probability of 0's and 1's at the output, which results in excellent uniformity (49.45%) while retrieving the PUF response. Secondly, a new PUF design b…
▽ More
This paper proposes a 3-input arbiter-based novel physically unclonable function (PUF) design. Firstly, a 3-input priority arbiter is designed using a simple arbiter, two multiplexers (2:1), and an XOR logic gate. The priority arbiter has an equal probability of 0's and 1's at the output, which results in excellent uniformity (49.45%) while retrieving the PUF response. Secondly, a new PUF design based on priority arbiter PUF (PA-PUF) is presented. The PA-PUF design is evaluated for uniqueness, non-linearity, and uniformity against the standard tests. The proposed PA-PUF design is configurable in challenge-response pairs through an arbitrary number of feed-forward priority arbiters introduced to the design. We demonstrate, through extensive experiments, reliability of 100% after performing the error correction techniques and uniqueness of 49.63%. Finally, the design is compared with the literature to evaluate its implementation efficiency, where it is clearly found to be superior compared to the state-of-the-art.
△ Less
Submitted 21 July, 2022;
originally announced July 2022.
-
EmuNoC: Hybrid Emulation for Fast and Flexible Network-on-Chip Prototy** on FPGAs
Authors:
Yee Yang Tan,
Felix Staudigl,
Lukas Jünger,
Anna Drewes,
Rainer Leupers,
Jan Moritz Joseph
Abstract:
Networks-on-Chips (NoCs) recently became widely used, from multi-core CPUs to edge-AI accelerators. Emulation on FPGAs promises to accelerate their RTL modeling compared to slow simulations. However, realistic test stimuli are challenging to generate in hardware for diverse applications. In other words, both a fast and flexible design framework is required. The most promising solution is hybrid em…
▽ More
Networks-on-Chips (NoCs) recently became widely used, from multi-core CPUs to edge-AI accelerators. Emulation on FPGAs promises to accelerate their RTL modeling compared to slow simulations. However, realistic test stimuli are challenging to generate in hardware for diverse applications. In other words, both a fast and flexible design framework is required. The most promising solution is hybrid emulation, in which parts of the design are simulated in software, and the other parts are emulated in hardware. This paper proposes a novel hybrid emulation framework called EmuNoC. We introduce a clock-synchronization method and software-only packet generation that improves the emulation speed by 36.3x to 79.3x over state-of-the-art frameworks while retaining the flexibility of a pure-software interface for stimuli simulation. We also increased the area efficiency to model up to an NoC with 169 routers on a single FPGA, while previous frameworks only achieved 64 routers.
△ Less
Submitted 23 June, 2022;
originally announced June 2022.
-
X-Fault: Impact of Faults on Binary Neural Networks in Memristor-Crossbar Arrays with Logic-in-Memory Computation
Authors:
Felix Staudigl,
Karl J. X. Sturm,
Maximilian Bartel,
Thorben Fetz,
Dominik Sisejkovic,
Jan Moritz Joseph,
Leticia Bolzani Pöhls,
Rainer Leupers
Abstract:
Memristor-based crossbar arrays represent a promising emerging memory technology to replace conventional memories by offering a high density and enabling computing-in-memory (CIM) paradigms. While analog computing provides the best performance, non-idealities and ADC/DAC conversion limit memristor-based CIM. Logic-in-Memory (LIM) presents another flavor of CIM, in which the memristors are used in…
▽ More
Memristor-based crossbar arrays represent a promising emerging memory technology to replace conventional memories by offering a high density and enabling computing-in-memory (CIM) paradigms. While analog computing provides the best performance, non-idealities and ADC/DAC conversion limit memristor-based CIM. Logic-in-Memory (LIM) presents another flavor of CIM, in which the memristors are used in a binary manner to implement logic gates. Since binary neural networks (BNNs) use binary logic gates as the dominant operation, they can benefit from the massively parallel execution of binary operations and better resilience to variations of the memristors. Although conventional neural networks have been thoroughly investigated, the impact of faults on memristor-based BNNs remains unclear. Therefore, we analyze the impact of faults on logic gates in memristor-based crossbar arrays for BNNs. We propose a simulation framework that simulates different traditional faults to examine the accuracy loss of BNNs on memristive crossbar arrays. In addition, we compare different logic families based on the robustness and feasibility to accelerate AI applications.
△ Less
Submitted 4 April, 2022;
originally announced April 2022.
-
Designing ML-Resilient Locking at Register-Transfer Level
Authors:
Dominik Sisejkovic,
Luca Collini,
Benjamin Tan,
Christian Pilato,
Ramesh Karri,
Rainer Leupers
Abstract:
Various logic-locking schemes have been proposed to protect hardware from intellectual property piracy and malicious design modifications. Since traditional locking techniques are applied on the gate-level netlist after logic synthesis, they have no semantic knowledge of the design function. Data-driven, machine-learning (ML) attacks can uncover the design flaws within gate-level locking. Recent p…
▽ More
Various logic-locking schemes have been proposed to protect hardware from intellectual property piracy and malicious design modifications. Since traditional locking techniques are applied on the gate-level netlist after logic synthesis, they have no semantic knowledge of the design function. Data-driven, machine-learning (ML) attacks can uncover the design flaws within gate-level locking. Recent proposals on register-transfer level (RTL) locking have access to semantic hardware information. We investigate the resilience of ASSURE, a state-of-the-art RTL locking method, against ML attacks. We used the lessons learned to derive two ML-resilient RTL locking schemes built to reinforce ASSURE locking. We developed ML-driven security metrics to evaluate the schemes against an RTL adaptation of the state-of-the-art, ML-based SnapShot attack.
△ Less
Submitted 6 April, 2022; v1 submitted 10 March, 2022;
originally announced March 2022.
-
pHGen: A pH-Based Key Generation Mechanism Using ISFETs
Authors:
Elmira Moussavi,
Dominik Sisejkovic,
Fabian Brings,
Daniyar Kizatov,
Animesh Singh,
Xuan Thang Vu,
Sven Ingebrandt,
Rainer Leupers,
Vivek Pachauri,
Farhad Merchant
Abstract:
Digital keys are a fundamental component of many hardware- and software-based security mechanisms. However, digital keys are limited to binary values and easily exploitable when stored in standard memories. In this paper, based on emerging technologies, we introduce pHGen, a potential-of-hydrogen (pH)-based key generation mechanism that leverages chemical reactions in the form of a potential chang…
▽ More
Digital keys are a fundamental component of many hardware- and software-based security mechanisms. However, digital keys are limited to binary values and easily exploitable when stored in standard memories. In this paper, based on emerging technologies, we introduce pHGen, a potential-of-hydrogen (pH)-based key generation mechanism that leverages chemical reactions in the form of a potential change in ion-sensitive field-effect transistors (ISFETs). The threshold voltage of ISFETs is manipulated corresponding to a known pH buffer solution (key) in which the transistors are immersed. To read the chemical information effectively via ISFETs, we designed a readout circuit for stable operation and detection of voltage thresholds. To demonstrate the applicability of the proposed key generation, we utilize pHGen for logic locking -- a hardware integrity protection scheme. The proposed key-generation method breaks the limits of binary values and provides the first steps toward the utilization of multi-valued voltage thresholds of ISFETs controlled by chemical information. The pHGen approach is expected to be a turning point for using more sophisticated bio-based analog keys for securing next-generation electronics.
△ Less
Submitted 24 February, 2022;
originally announced February 2022.
-
A Parallel SystemC Virtual Platform for Neuromorphic Architectures
Authors:
Melvin Galicia,
Farhad Merchant,
Rainer Leupers
Abstract:
With the increasing interest in neuromorphic computing, designers of embedded systems face the challenge of efficiently simulating such platforms to enable architecture design exploration early in the development cycle. Executing artificial neural network applications on neuromorphic systems which are being simulated on virtual platforms (VPs) is an extremely demanding computational task. Neverthe…
▽ More
With the increasing interest in neuromorphic computing, designers of embedded systems face the challenge of efficiently simulating such platforms to enable architecture design exploration early in the development cycle. Executing artificial neural network applications on neuromorphic systems which are being simulated on virtual platforms (VPs) is an extremely demanding computational task. Nevertheless, it is a vital benchmarking task for comparing different possible architectures. Therefore, exploiting the multicore capabilities of the VP's host system is essential to achieve faster simulations. Hence, this paper presents a parallel SystemC based VP for RISC-V multicore platforms integrating multiple computing-in-memory neuromorphic accelerators. In this paper, different VP segmentation architectures are explored for the integration of neuromorphic accelerators and are shown their corresponding speedup simulations compared to conventional sequential SystemC execution.
△ Less
Submitted 24 December, 2021;
originally announced December 2021.
-
NeuroHammer: Inducing Bit-Flips in Memristive Crossbar Memories
Authors:
Felix Staudigl,
Hazem Al Indari,
Daniel Schön,
Dominik Sisejkovic,
Farhad Merchant,
Jan Moritz Joseph,
Vikas Rana,
Stephan Menzel,
Rainer Leupers
Abstract:
Emerging non-volatile memory (NVM) technologies offer unique advantages in energy efficiency, latency, and features such as computing-in-memory. Consequently, emerging NVM technologies are considered an ideal substrate for computation and storage in future-generation neuromorphic platforms. These technologies need to be evaluated for fundamental reliability and security issues. In this paper, we p…
▽ More
Emerging non-volatile memory (NVM) technologies offer unique advantages in energy efficiency, latency, and features such as computing-in-memory. Consequently, emerging NVM technologies are considered an ideal substrate for computation and storage in future-generation neuromorphic platforms. These technologies need to be evaluated for fundamental reliability and security issues. In this paper, we present \emph{NeuroHammer}, a security threat in ReRAM crossbars caused by thermal crosstalk between memory cells. We demonstrate that bit-flips can be deliberately induced in ReRAM devices in a crossbar by systematically writing adjacent memory cells. A simulation flow is developed to evaluate NeuroHammer and the impact of physical parameters on the effectiveness of the attack. Finally, we discuss the security implications in the context of possible attack scenarios.
△ Less
Submitted 6 December, 2021; v1 submitted 2 December, 2021;
originally announced December 2021.
-
QFlow: Quantitative Information Flow for Security-Aware Hardware Design in Verilog
Authors:
Lennart M. Reimann,
Luca Hanel,
Dominik Sisejkovic,
Farhad Merchant,
Rainer Leupers
Abstract:
The enormous amount of code required to design modern hardware implementations often leads to critical vulnerabilities being overlooked. Especially vulnerabilities that compromise the confidentiality of sensitive data, such as cryptographic keys, have a major impact on the trustworthiness of an entire system. Information flow analysis can elaborate whether information from sensitive signals flows…
▽ More
The enormous amount of code required to design modern hardware implementations often leads to critical vulnerabilities being overlooked. Especially vulnerabilities that compromise the confidentiality of sensitive data, such as cryptographic keys, have a major impact on the trustworthiness of an entire system. Information flow analysis can elaborate whether information from sensitive signals flows towards outputs or untrusted components of the system. But most of these analytical strategies rely on the non-interference property, stating that the untrusted targets must not be influenced by the source's data, which is shown to be too inflexible for many applications. To address this issue, there are approaches to quantify the information flow between components such that insignificant leakage can be neglected. Due to the high computational complexity of this quantification, approximations are needed, which introduce mispredictions. To tackle those limitations, we reformulate the approximations. Further, we propose a tool QFlow with a higher detection rate than previous tools. It can be used by non-experienced users to identify data leakages in hardware designs, thus facilitating a security-aware design process.
△ Less
Submitted 22 December, 2021; v1 submitted 6 September, 2021;
originally announced September 2021.
-
Deceptive Logic Locking for Hardware Integrity Protection against Machine Learning Attacks
Authors:
Dominik Sisejkovic,
Farhad Merchant,
Lennart M. Reimann,
Rainer Leupers
Abstract:
Logic locking has emerged as a prominent key-driven technique to protect the integrity of integrated circuits. However, novel machine-learning-based attacks have recently been introduced to challenge the security foundations of locking schemes. These attacks are able to recover a significant percentage of the key without having access to an activated circuit. This paper address this issue through…
▽ More
Logic locking has emerged as a prominent key-driven technique to protect the integrity of integrated circuits. However, novel machine-learning-based attacks have recently been introduced to challenge the security foundations of locking schemes. These attacks are able to recover a significant percentage of the key without having access to an activated circuit. This paper address this issue through two focal points. First, we present a theoretical model to test locking schemes for key-related structural leakage that can be exploited by machine learning. Second, based on the theoretical model, we introduce D-MUX: a deceptive multiplexer-based logic-locking scheme that is resilient against structure-exploiting machine learning attacks. Through the design of D-MUX, we uncover a major fallacy in existing multiplexer-based locking schemes in the form of a structural-analysis attack. Finally, an extensive cost evaluation of D-MUX is presented. To the best of our knowledge, D-MUX is the first machine-learning-resilient locking scheme capable of protecting against all known learning-based attacks. Hereby, the presented work offers a starting point for the design and evaluation of future-generation logic locking in the era of machine learning.
△ Less
Submitted 19 July, 2021;
originally announced July 2021.
-
Logic Locking at the Frontiers of Machine Learning: A Survey on Developments and Opportunities
Authors:
Dominik Sisejkovic,
Lennart M. Reimann,
Elmira Moussavi,
Farhad Merchant,
Rainer Leupers
Abstract:
In the past decade, a lot of progress has been made in the design and evaluation of logic locking; a premier technique to safeguard the integrity of integrated circuits throughout the electronics supply chain. However, the widespread proliferation of machine learning has recently introduced a new pathway to evaluating logic locking schemes. This paper summarizes the recent developments in logic lo…
▽ More
In the past decade, a lot of progress has been made in the design and evaluation of logic locking; a premier technique to safeguard the integrity of integrated circuits throughout the electronics supply chain. However, the widespread proliferation of machine learning has recently introduced a new pathway to evaluating logic locking schemes. This paper summarizes the recent developments in logic locking attacks and countermeasures at the frontiers of contemporary machine learning models. Based on the presented work, the key takeaways, opportunities, and challenges are highlighted to offer recommendations for the design of next-generation logic locking.
△ Less
Submitted 23 November, 2021; v1 submitted 5 July, 2021;
originally announced July 2021.
-
Brightening the Optical Flow through Posit Arithmetic
Authors:
Vinay Saxena,
Ankitha Reddy,
Jonathan Neudorfer,
John Gustafson,
Sangeeth Nambiar,
Rainer Leupers,
Farhad Merchant
Abstract:
As new technologies are invented, their commercial viability needs to be carefully examined along with their technical merits and demerits. The posit data format, proposed as a drop-in replacement for IEEE 754 float format, is one such invention that requires extensive theoretical and experimental study to identify products that can benefit from the advantages of posits for specific market segment…
▽ More
As new technologies are invented, their commercial viability needs to be carefully examined along with their technical merits and demerits. The posit data format, proposed as a drop-in replacement for IEEE 754 float format, is one such invention that requires extensive theoretical and experimental study to identify products that can benefit from the advantages of posits for specific market segments. In this paper, we present an extensive empirical study of posit-based arithmetic vis-à-vis IEEE 754 compliant arithmetic for the optical flow estimation method called Lucas-Kanade (LuKa). First, we use SoftPosit and SoftFloat format emulators to perform an empirical error analysis of the LuKa method. Our study shows that the average error in LuKa with SoftPosit is an order of magnitude lower than LuKa with SoftFloat. We then present the integration of the hardware implementation of a posit adder and multiplier in a RISC-V open-source platform. We make several recommendations, along with the analysis of LuKa in the RISC-V context, for future generation platforms incorporating posit arithmetic units.
△ Less
Submitted 17 January, 2021;
originally announced January 2021.
-
ANDROMEDA: An FPGA Based RISC-V MPSoC Exploration Framework
Authors:
Farhad Merchant,
Dominik Sisejkovic,
Lennart M. Reimann,
Kirthihan Yasotharan,
Thomas Grass,
Rainer Leupers
Abstract:
With the growing demands of consumer electronic products, the computational requirements are increasing exponentially. Due to the applications' computational needs, the computer architects are trying to pack as many cores as possible on a single die for accelerated execution of the application program codes. In a multiprocessor system-on-chip (MPSoC), striking a balance among the number of cores,…
▽ More
With the growing demands of consumer electronic products, the computational requirements are increasing exponentially. Due to the applications' computational needs, the computer architects are trying to pack as many cores as possible on a single die for accelerated execution of the application program codes. In a multiprocessor system-on-chip (MPSoC), striking a balance among the number of cores, memory subsystems, and network-on-chip parameters is essential to attain the desired performance. In this paper, we present ANDROMEDA, a RISC-V based framework that allows us to explore the different configurations of an MPSoC and observe the performance penalties and gains. We emulate the various configurations of MPSoC on the Synopsys HAPS-80D Dual FPGA platform. Using STREAM, matrix multiply, and N-body simulations as benchmarks, we demonstrate our framework's efficacy in quickly identifying the right parameters for efficient execution of these benchmarks.
△ Less
Submitted 14 January, 2021;
originally announced January 2021.
-
An Investigation on Inherent Robustness of Posit Data Representation
Authors:
Ihsen Alouani,
Anouar Ben Khalifa,
Farhad Merchant,
Rainer Leupers
Abstract:
As the dimensions and operating voltages of computer electronics shrink to cope with consumers' demand for higher performance and lower power consumption, circuit sensitivity to soft errors increases dramatically. Recently, a new data-type is proposed in the literature called posit data type. Posit arithmetic has absolute advantages such as higher numerical accuracy, speed, and simpler hardware de…
▽ More
As the dimensions and operating voltages of computer electronics shrink to cope with consumers' demand for higher performance and lower power consumption, circuit sensitivity to soft errors increases dramatically. Recently, a new data-type is proposed in the literature called posit data type. Posit arithmetic has absolute advantages such as higher numerical accuracy, speed, and simpler hardware design than IEEE 754-2008 technical standard-compliant arithmetic. In this paper, we propose a comparative robustness study between 32-bit posit and 32-bit IEEE 754-2008 compliant representations. At first, we propose a theoretical analysis for IEEE 754 compliant numbers and posit numbers for single bit flip and double bit flips. Then, we conduct exhaustive fault injection experiments that show a considerable inherent resilience in posit format compared to classical IEEE 754 compliant representation. To show a relevant use-case of fault-tolerant applications, we perform experiments on a set of machine-learning applications. In more than 95% of the exhaustive fault injection exploration, posit representation is less impacted by faults than the IEEE 754 compliant floating-point representation. Moreover, in 100% of the tested machine-learning applications, the accuracy of posit-implemented systems is higher than the classical floating-point-based ones.
△ Less
Submitted 5 January, 2021;
originally announced January 2021.
-
Architecture, Dataflow and Physical Design Implications of 3D-ICs for DNN-Accelerators
Authors:
Jan Moritz Joseph,
Ananda Samajdar,
Lingjun Zhu,
Rainer Leupers,
Sung-Kyu Lim,
Thilo Pionteck,
Tushar Krishna
Abstract:
The everlasting demand for higher computing power for deep neural networks (DNNs) drives the development of parallel computing architectures. 3D integration, in which chips are integrated and connected vertically, can further increase performance because it introduces another level of spatial parallelism. Therefore, we analyze dataflows, performance, area, power and temperature of such 3D-DNN-acce…
▽ More
The everlasting demand for higher computing power for deep neural networks (DNNs) drives the development of parallel computing architectures. 3D integration, in which chips are integrated and connected vertically, can further increase performance because it introduces another level of spatial parallelism. Therefore, we analyze dataflows, performance, area, power and temperature of such 3D-DNN-accelerators. Monolithic and TSV-based stacked 3D-ICs are compared against 2D-ICs. We identify workload properties and architectural parameters for efficient 3D-ICs and achieve up to 9.14x speedup of 3D vs. 2D. We discuss area-performance trade-offs. We demonstrate applicability as the 3D-IC draws similar power as 2D-ICs and is not thermal limited.
△ Less
Submitted 18 February, 2021; v1 submitted 23 December, 2020;
originally announced December 2020.
-
Challenging the Security of Logic Locking Schemes in the Era of Deep Learning: A Neuroevolutionary Approach
Authors:
Dominik Sisejkovic,
Farhad Merchant,
Lennart M. Reimann,
Harshit Srivastava,
Ahmed Hallawa,
Rainer Leupers
Abstract:
Logic locking is a prominent technique to protect the integrity of hardware designs throughout the integrated circuit design and fabrication flow. However, in recent years, the security of locking schemes has been thoroughly challenged by the introduction of various deobfuscation attacks. As in most research branches, deep learning is being introduced in the domain of logic locking as well. Theref…
▽ More
Logic locking is a prominent technique to protect the integrity of hardware designs throughout the integrated circuit design and fabrication flow. However, in recent years, the security of locking schemes has been thoroughly challenged by the introduction of various deobfuscation attacks. As in most research branches, deep learning is being introduced in the domain of logic locking as well. Therefore, in this paper we present SnapShot: a novel attack on logic locking that is the first of its kind to utilize artificial neural networks to directly predict a key bit value from a locked synthesized gate-level netlist without using a golden reference. Hereby, the attack uses a simpler yet more flexible learning model compared to existing work. Two different approaches are evaluated. The first approach is based on a simple feedforward fully connected neural network. The second approach utilizes genetic algorithms to evolve more complex convolutional neural network architectures specialized for the given task. The attack flow offers a generic and customizable framework for attacking locking schemes using machine learning techniques. We perform an extensive evaluation of SnapShot for two realistic attack scenarios, comprising both reference benchmark circuits as well as silicon-proven RISC-V core modules. The evaluation results show that SnapShot achieves an average key prediction accuracy of 82.60% for the selected attack scenario, with a significant performance increase of 10.49 percentage points compared to the state of the art. Moreover, SnapShot outperforms the existing technique on all evaluated benchmarks. The results indicate that the security foundation of common logic locking schemes is build on questionable assumptions. The conclusions of the evaluation offer insights into the challenges of designing future logic locking schemes that are resilient to machine learning attacks.
△ Less
Submitted 30 November, 2020; v1 submitted 20 November, 2020;
originally announced November 2020.
-
Dataflow Aware Map** of Convolutional Neural Networks Onto Many-Core Platforms With Network-on-Chip Interconnect
Authors:
Andreas Bytyn,
René Ahlsdorf,
Rainer Leupers,
Gerd Ascheid
Abstract:
Machine intelligence, especially using convolutional neural networks (CNNs), has become a large area of research over the past years. Increasingly sophisticated hardware accelerators are proposed that exploit e.g. the sparsity in computations and make use of reduced precision arithmetic to scale down the energy consumption. However, future platforms require more than just energy efficiency: Scalab…
▽ More
Machine intelligence, especially using convolutional neural networks (CNNs), has become a large area of research over the past years. Increasingly sophisticated hardware accelerators are proposed that exploit e.g. the sparsity in computations and make use of reduced precision arithmetic to scale down the energy consumption. However, future platforms require more than just energy efficiency: Scalability is becoming an increasingly important factor. The required effort for physical implementation grows with the size of the accelerator making it more difficult to meet target constraints. Using many-core platforms consisting of several homogeneous cores can alleviate the aforementioned limitations with regard to physical implementation at the expense of an increased dataflow map** effort. While the dataflow in CNNs is deterministic and can therefore be optimized offline, the problem of finding a suitable scheme that minimizes both runtime and off-chip memory accesses is a challenging task which becomes even more complex if an interconnect system is involved. This work presents an automated map** strategy starting at the single-core level with different optimization targets for minimal runtime and minimal off-chip memory accesses. The strategy is then extended towards a suitable many-core map** scheme and evaluated using a scalable system-level simulation with a network-on-chip interconnect. Design space exploration is performed by map** the well-known CNNs AlexNet and VGG-16 to platforms of different core counts and computational power per core in order to investigate the trade-offs. Our map** strategy and system setup is scaled starting from the single core level up to 128 cores, thereby showing the limits of the selected approach.
△ Less
Submitted 18 June, 2020;
originally announced June 2020.
-
CLARINET: A RISC-V Based Framework for Posit Arithmetic Empiricism
Authors:
Niraj Sharma,
Riya Jain,
Madhumita Mohan,
Sachin Patkar,
Rainer Leupers,
Nikhil Rishiyur,
Farhad Merchant
Abstract:
Many engineering and scientific applications require high precision arithmetic. IEEE~754-2008 compliant (floating-point) arithmetic is the de facto standard for performing these computations. Recently, posit arithmetic has been proposed as a drop-in replacement for floating-point arithmetic. The posit\texttrademark data representation and arithmetic claim several absolute advantages over the float…
▽ More
Many engineering and scientific applications require high precision arithmetic. IEEE~754-2008 compliant (floating-point) arithmetic is the de facto standard for performing these computations. Recently, posit arithmetic has been proposed as a drop-in replacement for floating-point arithmetic. The posit\texttrademark data representation and arithmetic claim several absolute advantages over the floating-point format and arithmetic, including higher dynamic range, better accuracy, and superior performance-area trade-offs. However, there does not exist any accessible, holistic framework that facilitates the validation of these claims of posit arithmetic, especially when the claims involve long accumulations (quire).
In this paper, we present a consolidated general-purpose processor-based framework to support posit arithmetic empiricism. The end-users of the framework have the liberty to seamlessly experiment with their applications using posit and floating-point arithmetic since the framework is designed for the two number systems to coexist. Melodica is a posit arithmetic core that implements parametric fused operations that uniquely involve the quire data type. Clarinet is a Melodica-enabled processor based on the RISC-V ISA. To the best of our knowledge, this is the first-ever integration of quire with a RISC-V core. To show the effectiveness of the Clarinet platform, we perform an extensive application study and benchmark some of the common linear algebra and computer vision kernels. We emulate Clarinet on a Xilinx FPGA and present utilization and timing data. Clarinet and Melodica remain actively under development and is available in open-source for posit arithmetic empiricism.
△ Less
Submitted 27 October, 2021; v1 submitted 30 May, 2020;
originally announced June 2020.
-
An Application-Specific VLIW Processor with Vector Instruction Set for CNN Acceleration
Authors:
Andreas Bytyn,
Rainer Leupers,
Gerd Ascheid
Abstract:
In recent years, neural networks have surpassed classical algorithms in areas such as object recognition, e.g. in the well-known ImageNet challenge. As a result, great effort is being put into develo** fast and efficient accelerators, especially for Convolutional Neural Networks (CNNs). In this work we present ConvAix, a fully C-programmable processor, which -- contrary to many existing architec…
▽ More
In recent years, neural networks have surpassed classical algorithms in areas such as object recognition, e.g. in the well-known ImageNet challenge. As a result, great effort is being put into develo** fast and efficient accelerators, especially for Convolutional Neural Networks (CNNs). In this work we present ConvAix, a fully C-programmable processor, which -- contrary to many existing architectures -- does not rely on a hard-wired array of multiply-and-accumulate (MAC) units. Instead it maps computations onto independent vector lanes making use of a carefully designed vector instruction set. The presented processor is targeted towards latency-sensitive applications and is capable of executing up to 192 MAC operations per cycle. ConvAix operates at a target clock frequency of 400 MHz in 28nm CMOS, thereby offering state-of-the-art performance with proper flexibility within its target domain. Simulation results for several 2D convolutional layers from well known CNNs (AlexNet, VGG-16) show an average ALU utilization of 72.5% using vector instructions with 16 bit fixed-point arithmetic. Compared to other well-known designs which are less flexible, ConvAix offers competitive energy efficiency of up to 497 GOP/s/W while even surpassing them in terms of area efficiency and processing speed.
△ Less
Submitted 10 April, 2019;
originally announced April 2019.
-
Efficient Realization of Givens Rotation through Algorithm-Architecture Co-design for Acceleration of QR Factorization
Authors:
Farhad Merchant,
Tarun Vatwani,
Anupam Chattopadhyay,
Soumyendu Raha,
S K Nandy,
Ranjani Narayan,
Rainer Leupers
Abstract:
We present efficient realization of Generalized Givens Rotation (GGR) based QR factorization that achieves 3-100x better performance in terms of Gflops/watt over state-of-the-art realizations on multicore, and General Purpose Graphics Processing Units (GPGPUs). GGR is an improvement over classical Givens Rotation (GR) operation that can annihilate multiple elements of rows and columns of an input…
▽ More
We present efficient realization of Generalized Givens Rotation (GGR) based QR factorization that achieves 3-100x better performance in terms of Gflops/watt over state-of-the-art realizations on multicore, and General Purpose Graphics Processing Units (GPGPUs). GGR is an improvement over classical Givens Rotation (GR) operation that can annihilate multiple elements of rows and columns of an input matrix simultaneously. GGR takes 33% lesser multiplications compared to GR. For custom implementation of GGR, we identify macro operations in GGR and realize them on a Reconfigurable Data-path (RDP) tightly coupled to pipeline of a Processing Element (PE). In PE, GGR attains speed-up of 1.1x over Modified Householder Transform (MHT) presented in the literature. For parallel realization of GGR, we use REDEFINE, a scalable massively parallel Coarse-grained Reconfigurable Architecture, and show that the speed-up attained is commensurate with the hardware resources in REDEFINE. GGR also outperforms General Matrix Multiplication (gemm) by 10% in-terms of Gflops/watt which is counter-intuitive.
△ Less
Submitted 23 March, 2018; v1 submitted 14 March, 2018;
originally announced March 2018.
-
EURETILE 2010-2012 summary: first three years of activity of the European Reference Tiled Experiment
Authors:
Pier Stanislao Paolucci,
Iuliana Bacivarov,
Gert Goossens,
Rainer Leupers,
Frédéric Rousseau,
Christoph Schumacher,
Lothar Thiele,
Piero Vicini
Abstract:
This is the summary of first three years of activity of the EURETILE FP7 project 247846. EURETILE investigates and implements brain-inspired and fault-tolerant foundational innovations to the system architecture of massively parallel tiled computer architectures and the corresponding programming paradigm. The execution targets are a many-tile HW platform, and a many-tile simulator. A set of SW pro…
▽ More
This is the summary of first three years of activity of the EURETILE FP7 project 247846. EURETILE investigates and implements brain-inspired and fault-tolerant foundational innovations to the system architecture of massively parallel tiled computer architectures and the corresponding programming paradigm. The execution targets are a many-tile HW platform, and a many-tile simulator. A set of SW process - HW tile map** candidates is generated by the holistic SW tool-chain using a combination of analytic and bio-inspired methods. The Hardware dependent Software is then generated, providing OS services with maximum efficiency/minimal overhead. The many-tile simulator collects profiling data, closing the loop of the SW tool chain. Fine-grain parallelism inside processes is exploited by optimized intra-tile compilation techniques, but the project focus is above the level of the elementary tile. The elementary HW tile is a multi-processor, which includes a fault tolerant Distributed Network Processor (for inter-tile communication) and ASIP accelerators. Furthermore, EURETILE investigates and implements the innovations for equip** the elementary HW tile with high-bandwidth, low-latency brain-like inter-tile communication emulating 3 levels of connection hierarchy, namely neural columns, cortical areas and cortex, and develops a dedicated cortical simulation benchmark: DPSNN-STDP (Distributed Polychronous Spiking Neural Net with synaptic Spiking Time Dependent Plasticity). EURETILE leverages on the multi-tile HW paradigm and SW tool-chain developed by the FET-ACA SHAPES Integrated Project (2006-2009).
△ Less
Submitted 7 May, 2013;
originally announced May 2013.
-
A Scalable VLSI Architecture for Soft-Input Soft-Output Depth-First Sphere Decoding
Authors:
Ernst Martin Witte,
Filippo Borlenghi,
Gerd Ascheid,
Rainer Leupers,
Heinrich Meyr
Abstract:
Multiple-input multiple-output (MIMO) wireless transmission imposes huge challenges on the design of efficient hardware architectures for iterative receivers. A major challenge is soft-input soft-output (SISO) MIMO demap**, often approached by sphere decoding (SD). In this paper, we introduce the - to our best knowledge - first VLSI architecture for SISO SD applying a single tree-search approach…
▽ More
Multiple-input multiple-output (MIMO) wireless transmission imposes huge challenges on the design of efficient hardware architectures for iterative receivers. A major challenge is soft-input soft-output (SISO) MIMO demap**, often approached by sphere decoding (SD). In this paper, we introduce the - to our best knowledge - first VLSI architecture for SISO SD applying a single tree-search approach. Compared with a soft-output-only base architecture similar to the one proposed by Studer et al. in IEEE J-SAC 2008, the architectural modifications for soft input still allow a one-node-per-cycle execution. For a 4x4 16-QAM system, the area increases by 57% and the operating frequency degrades by 34% only.
△ Less
Submitted 7 June, 2010; v1 submitted 18 October, 2009;
originally announced October 2009.