-
Adaptive Robotic Arm Control with a Spiking Recurrent Neural Network on a Digital Accelerator
Authors:
Alejandro Linares-Barranco,
Luciano Prono,
Robert Lengenstein,
Giacomo Indiveri,
Charlotte Frenkel
Abstract:
With the rise of artificial intelligence, neural network simulations of biological neuron models are being explored to reduce the footprint of learning and inference in resource-constrained task scenarios. A mainstream type of such networks are spiking neural networks (SNNs) based on simplified Integrate and Fire models for which several hardware accelerators have emerged. Among them, the ReckOn chip was introduced as a recurrent SNN allowing for both online training and execution of tasks based on arbitrary sensory modalities, demonstrated for vision, audition, and navigation. As a fully digital and open-source chip, we adapted ReckOn to be implemented on a Xilinx Multiprocessor System on Chip system (MPSoC), facilitating its deployment in embedded systems and increasing the setup flexibility. We present an overview of the system, and a Python framework to use it on a Pynq ZU platform. We validate the architecture and implementation in the new scenario of robotic arm control, and show how the simulated accuracy is preserved with a peak performance of 3.8M events processed per second.
Submitted 2 June, 2024; v1 submitted 21 May, 2024;
originally announced May 2024.
-
Within-Camera Multilayer Perceptron DVS Denoising
Authors:
A. Rios-Navarro,
S. Guo,
G Abarajithan,
K. Vijayakumar,
A. Linares-Barranco,
T. Aarrestad,
R. Kastner,
T. Delbruck
Abstract:
In-camera event denoising reduces the data rate of event cameras by filtering out noise at the source. A lightweight multilayer perceptron denoising filter (MLPF) provides state-of-the-art low-cost denoising accuracy. It processes a small neighborhood of pixels from the timestamp image around each event to discriminate signal and noise events. This paper proposes two digital logic implementations of the MLPF denoiser and quantifies their resource cost, power, and latency. The hardware MLPF quantizes the weights and hidden unit activations to 4 bits and has about 1k weights with about 40% sparsity. The Area-Under-Curve Receiver Operating Characteristic accuracy is nearly indistinguishable from that of the floating point network. The FPGA MLPF processes each event in 10 clock cycles. In FPGA, it uses 3.5k flip flops and 11.5k LUTs. Our ASIC implementation in 65nm digital technology for a 346x260 pixel camera occupies an area of 4.3mm^2 and consumes 4nJ of energy per event at event rates up to 25MHz. The MLPF can be easily integrated into an event camera using an FPGA or as an ASIC directly on the camera chip or in the same package. This denoising could dramatically reduce the energy consumed by the communication and host processor and open new areas of always-on event camera application under scavenged and battery power. Code: https://github.com/SensorsINI/dnd_hls
Submitted 15 April, 2023;
originally announced April 2023.
-
LIPSFUS: A neuromorphic dataset for audio-visual sensory fusion of lip reading
Authors:
Antonio Rios-Navarro,
Enrique Piñero-Fuentes,
Salvador Canas-Moreno,
Aqib Javed,
** Harkin,
Alejandro Linares-Barranco
Abstract:
This paper presents a sensory fusion neuromorphic dataset collected with precise temporal synchronization using a set of Address-Event-Representation sensors and tools. The target application is the lip reading of several keywords for different machine learning applications, such as digits, robotic commands, and auxiliary rich phonetic short words. The dataset is enlarged with a spiking version of an audio-visual lip reading dataset collected with frame-based cameras. LIPSFUS is publicly available and it has been validated with a deep learning architecture for audio and visual classification. It is intended for sensory fusion architectures based on both artificial and spiking neural network algorithms.
Submitted 28 March, 2023;
originally announced April 2023.
-
An MPSoC-based on-line Edge Infrastructure for Embedded Neuromorphic Robotic Controllers
Authors:
Enrique Pinero-Fuentes,
Salvador Canas-Moreno,
Antonio Rios-Navarro,
Daniel Cascado-Caballero,
Angel Jimenez-Fernandez,
Alejandro Linares-Barranco
Abstract:
In this work, an all-in-one neuromorphic controller system with reduced latency and power consumption for a robotic arm is presented. Biological muscle movement consists of stretching and shrinking fibres via spike-commanded signals that come from motor neurons, which in turn are connected to a central pattern generator neural structure. In addition, biological systems are able to respond to diverse stimuli rather fast and efficiently, and this is based on the way information is coded within neural processes. As opposed to human-created encoding systems, neural ones use neurons and spikes to process the information and make weighted decisions based on a continuous learning process. The Event-Driven Scorbot platform (ED-Scorbot) consists of a 6 Degrees of Freedom (DoF) robotic arm whose controller implements a Spiking Proportional-Integrative-Derivative algorithm, mimicking in this way the previously described biological systems. In this paper, we present an infrastructure upgrade to the ED-Scorbot platform, replacing the controller hardware, which was comprised of two Spartan Field Programmable Gate Arrays (FPGAs) and a barebone computer, with an edge device, the Xilinx Zynq-7000 SoC (System on Chip), which reduces the response time, power consumption and overall complexity.
Submitted 31 March, 2022;
originally announced April 2022.
-
Towards hardware Implementation of WTA for CPG-based control of a Spiking Robotic Arm
Authors:
A. Linares-Barranco,
E. Pinero-Fuentes,
S. Canas-Moreno,
A. Rios-Navarro,
Maryada,
Chenxi Wu,
**gyue Zhao,
D. Zendrikov,
G. Indiveri
Abstract:
Biological nervous systems typically perform the control of numerous degrees of freedom, for example in animal limbs. Neuromorphic engineers study these systems by emulating them in hardware for a deeper understanding and its possible application to solve complex problems in engineering and robotics. Central-Pattern-Generators (CPGs) are part of neuro-controllers, typically used at their last steps to produce rhythmic patterns for limb movement. Different patterns and gaits typically compete through winner-take-all (WTA) circuits to produce the right movements. In this work we present a WTA circuit implemented in a Spiking-Neural-Network (SNN) processor to produce such patterns for controlling a robotic arm in real-time. The robot uses spike-based proportional-integrative-derivative (SPID) controllers to keep a commanded joint position from the winner population of neurons of the WTA circuit. Experiments demonstrate the feasibility of robotic control with spiking circuits following brain-inspiration.
Submitted 14 February, 2022;
originally announced February 2022.
-
Wide & Deep neural network model for patch aggregation in CNN-based prostate cancer detection systems
Authors:
Lourdes Duran-Lopez,
Juan P. Dominguez-Morales,
Daniel Gutierrez-Galan,
Antonio Rios-Navarro,
Angel Jimenez-Fernandez,
Saturnino Vicente-Diaz,
Alejandro Linares-Barranco
Abstract:
Prostate cancer (PCa) is one of the most commonly diagnosed cancers and one of the leading causes of death among men, with almost 1.41 million new cases and around 375,000 deaths in 2020. Artificial Intelligence algorithms have had a huge impact in medical image analysis, including digital histopathology, where Convolutional Neural Networks (CNNs) are used to provide a fast and accurate diagnosis, supporting experts in this task. To perform an automatic diagnosis, prostate tissue samples are first digitized into gigapixel-resolution whole-slide images. Due to the size of these images, neural networks cannot use them as input and, therefore, small subimages called patches are extracted and predicted, obtaining a patch-level classification. In this work, a novel patch aggregation method based on a custom Wide & Deep neural network model is presented, which performs a slide-level classification using the patch-level classes obtained from a CNN. The malignant tissue ratio, a 10-bin malignant probability histogram, the least squares regression line of the histogram, and the number of malignant connected components are used by the proposed model to perform the classification. An accuracy of 94.24% and a sensitivity of 98.87% were achieved, proving that the proposed system could aid pathologists by speeding up the screening process and, thus, contribute to the fight against PCa.
Submitted 20 May, 2021;
originally announced May 2021.
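The slide-level features named in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `slide_features`, the 0.5 decision threshold, and the use of `numpy.polyfit` for the regression line are assumptions made for illustration, and the count of malignant connected components is left out because it needs the spatial layout of the patches.

```python
import numpy as np

def slide_features(patch_probs, threshold=0.5, bins=10):
    """Aggregate patch-level malignant probabilities into slide-level features.

    `threshold` and the function name are hypothetical; only the feature
    types (ratio, 10-bin histogram, least squares line) come from the abstract.
    """
    probs = np.asarray(patch_probs, dtype=float)
    # Fraction of patches classified as malignant (assumed threshold).
    ratio = float(np.mean(probs >= threshold))
    # Histogram of malignant probabilities over [0, 1].
    hist, _ = np.histogram(probs, bins=bins, range=(0.0, 1.0))
    # Least squares line fitted to the histogram counts against bin index.
    slope, intercept = np.polyfit(np.arange(bins), hist, deg=1)
    return ratio, hist, float(slope), float(intercept)

ratio, hist, slope, intercept = slide_features([0.05, 0.15, 0.95, 0.95])
```

In a Wide & Deep setup, features like these would be concatenated into the input vector of the slide-level classifier; how the paper splits them between the wide and deep branches is not stated in the abstract.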
-
Dynamic Vision Sensor integration on FPGA-based CNN accelerators for high-speed visual classification
Authors:
Alejandro Linares-Barranco,
Antonio Rios-Navarro,
Ricardo Tapiador-Morales,
Tobi Delbruck
Abstract:
Deep learning is a cutting-edge technique that is being applied to many fields. For vision applications, Convolutional Neural Networks (CNNs) achieve significant accuracy in classification tasks. Numerous hardware accelerators have appeared in recent years to improve on CPU- or GPU-based solutions. This technology is commonly prototyped and tested on FPGAs before being considered for ASIC fabrication for mass production. The use of typical commercial cameras (30fps) limits the capabilities of these systems for high-speed applications. Dynamic vision sensors (DVS), which emulate the behavior of a biological retina, are becoming increasingly important for these applications: the information is represented by a continuous stream of spikes, and the frames to be processed by the CNN are constructed by collecting a fixed number of these spikes (called events). The faster an object moves, the more events the DVS produces, and so the higher the equivalent frame rate. Therefore, using a DVS makes it possible to compute frames at the maximum speed a CNN accelerator can offer. In this paper we present a VHDL/HLS description of a pipelined design for FPGA able to collect events from an Address-Event-Representation (AER) DVS retina to obtain a normalized histogram to be used by a particular CNN accelerator, called NullHop. VHDL is used to describe the circuit, and HLS for computation blocks, which are used to perform the normalization of a frame needed for the CNN. Results outperform previous implementations of frame collection and normalization using ARM processors running at 800MHz on a Zynq7100 in both latency and power consumption. A measured 67% speedup factor is presented for a Roshambo CNN real-time experiment running at 160fps peak rate.
Submitted 17 May, 2019;
originally announced May 2019.
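The frame construction described in this abstract (collect a fixed number of events, then normalize the histogram) can be sketched in software as follows. This is only an illustration under stated assumptions: the function name `events_to_frame` is hypothetical, event polarity is ignored, and the normalization shown (division by the peak count) is an assumption, since the abstract does not specify the scheme used in the VHDL/HLS design.

```python
import numpy as np

def events_to_frame(xs, ys, width, height, events_per_frame):
    """Accumulate a fixed number of DVS events into a normalized 2D histogram.

    Hypothetical helper: peak normalization to [0, 1] is an assumed scheme.
    """
    # Keep only the first `events_per_frame` event addresses.
    xs = np.asarray(xs[:events_per_frame], dtype=np.intp)
    ys = np.asarray(ys[:events_per_frame], dtype=np.intp)
    frame = np.zeros((height, width), dtype=np.float32)
    # np.add.at handles repeated pixel addresses correctly (unbuffered add).
    np.add.at(frame, (ys, xs), 1.0)
    peak = frame.max()
    return frame / peak if peak > 0 else frame

frame = events_to_frame([0, 0, 1], [0, 0, 1], width=2, height=2, events_per_frame=3)
```

Because a frame is emitted every `events_per_frame` events rather than at a fixed time interval, a faster-moving scene fills frames sooner, which is the source of the variable equivalent frame rate mentioned in the abstract.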
-
NeuroPod: a real-time neuromorphic spiking CPG applied to robotics
Authors:
Daniel Gutierrez-Galan,
Juan Pedro Dominguez-Morales,
Fernando Perez-Pena,
Alejandro Linares-Barranco
Abstract:
Initially, robots were developed with the aim of making our life easier, carrying out repetitive or dangerous tasks for humans. Although they were able to perform these tasks, the latest generation of robots are being designed to take a step further, by performing more complex tasks that have been carried out by smart animals or humans up to date. To this end, inspiration needs to be taken from biological examples. For instance, insects are able to optimally solve complex environment navigation problems, and many researchers have started to mimic how these insects behave. Recent interest in neuromorphic engineering has motivated us to present a real-time, neuromorphic, spike-based Central Pattern Generator of application in neurorobotics, using an arthropod-like robot. A Spiking Neural Network was designed and implemented on SpiNNaker. The network models a complex, online-change capable Central Pattern Generator which generates three gaits for a hexapod robot locomotion. Reconfigurable hardware was used to manage both the motors of the robot and the real-time communication interface with the Spiking Neural Networks. Real-time measurements confirm the simulation results, and locomotion tests show that NeuroPod can perform the gaits without any balance loss or added delay.
Submitted 25 April, 2019;
originally announced April 2019.
-
Performance evaluation over HW/SW co-design SoC memory transfers for a CNN accelerator
Authors:
A. Rios-Navarro,
R. Tapiador-Morales,
A. Jimenez-Fernandez,
M. Dominguez-Morales,
C. Amaya,
A. Linares-Barranco
Abstract:
Many FPGA vendors have recently included embedded processors in their devices, like Xilinx with ARM-Cortex A cores, together with programmable logic cells. These devices are known as Programmable System on Chip (PSoC). Their ARM cores (embedded in the processing system or PS) communicate with the programmable logic cells (PL) using ARM-standard AXI buses. In this paper we analyse the performance of exhaustive data transfers between PS and PL for a Xilinx Zynq FPGA in a real co-design scenario for a Convolutional Neural Network (CNN) accelerator, which processes, in dedicated hardware, a stream of visual information from a neuromorphic visual sensor for classification. On the PS side, a Linux operating system is running, which collects visual events from the neuromorphic sensor into a normalized frame, and then transfers these frames to the accelerator of multi-layered CNNs, and reads results, using an AXI-DMA bus in a per-layer way. As this kind of accelerator tries to process information as quickly as possible, data bandwidth becomes critical and maintaining a well-balanced data throughput rate requires some considerations. We present and evaluate several data partitioning techniques to improve the balance between RX and TX transfers and two different ways of managing transfers: through a polling routine at the user level of the OS, and through a dedicated interrupt-based kernel-level driver. We demonstrate that for long enough packets, the kernel-level driver solution gets better timing in computing a CNN classification example. The main advantage of using a kernel-level driver is to have safer solutions and to have task scheduling in the OS to manage other important processes for our application, like frame collection from sensors and their normalization.
Submitted 9 May, 2018;
originally announced June 2018.
-
NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps
Authors:
Alessandro Aimar,
Hesham Mostafa,
Enrico Calabrese,
Antonio Rios-Navarro,
Ricardo Tapiador-Morales,
Iulia-Alexandra Lungu,
Moritz B. Milde,
Federico Corradi,
Alejandro Linares-Barranco,
Shih-Chii Liu,
Tobi Delbruck
Abstract:
Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving many state-of-the-art (SOA) visual processing tasks. Even though Graphical Processing Units (GPUs) are most often used in training and deploying CNNs, their power efficiency is less than 10 GOp/s/W for single-frame runtime inference. We propose a flexible and efficient CNN accelerator architecture called NullHop that implements SOA CNNs useful for low-power and low-latency application scenarios. NullHop exploits the sparsity of neuron activations in CNNs to accelerate the computation and reduce memory requirements. The flexible architecture allows high utilization of available computing resources across kernel sizes ranging from 1x1 to 7x7. NullHop can process up to 128 input and 128 output feature maps per layer in a single pass. We implemented the proposed architecture on a Xilinx Zynq FPGA platform and present results showing how our implementation reduces external memory transfers and compute time in five different CNNs ranging from small ones up to the widely known large VGG16 and VGG19 CNNs. Post-synthesis simulations using Mentor Modelsim in a 28nm process with a clock frequency of 500 MHz show that the VGG19 network achieves over 450 GOp/s. By exploiting sparsity, NullHop achieves an efficiency of 368%, maintains over 98% utilization of the MAC units, and achieves a power efficiency of over 3TOp/s/W in a core area of 6.3mm$^2$. As further proof of NullHop's usability, we interfaced its FPGA implementation with a neuromorphic event camera for real time interactive demonstrations.
Submitted 6 March, 2018; v1 submitted 5 June, 2017;
originally announced June 2017.
-
Comprehensive Evaluation of OpenCL-based Convolutional Neural Network Accelerators in Xilinx and Altera FPGAs
Authors:
R. Tapiador,
A. Rios-Navarro,
A. Linares-Barranco,
Minkyu Kim,
Deepak Kadetotad,
Jae-sun Seo
Abstract:
Deep learning has significantly advanced the state of the art in artificial intelligence, gaining wide popularity from both industry and academia. Special interest is around Convolutional Neural Networks (CNN), which take inspiration from the hierarchical structure of the visual cortex, to form deep layers of convolutional operations, along with fully connected classifiers. Hardware implementations of these deep CNN architectures are challenged with memory bottlenecks that require many convolution and fully-connected layers demanding large amount of communication for parallel computation. Multi-core CPU based solutions have demonstrated their inadequacy for this problem due to the memory wall and low parallelism. Many-core GPU architectures show superior performance but they consume high power and also have memory constraints due to inconsistencies between cache and main memory. FPGA design solutions are also actively being explored, which allow implementing the memory hierarchy using embedded BlockRAM. This boosts the parallel use of shared memory elements between multiple processing units, avoiding data replicability and inconsistencies. This makes FPGAs potentially powerful solutions for real-time classification of CNNs. Both Altera and Xilinx have adopted OpenCL co-design framework from GPU for FPGA designs as a pseudo-automatic development solution. In this paper, a comprehensive evaluation and comparison of Altera and Xilinx OpenCL frameworks for a 5-layer deep CNN is presented. Hardware resources, temporal performance and the OpenCL architecture for CNNs are discussed. Xilinx demonstrates faster synthesis, better FPGA resource utilization and more compact boards. Altera provides multi-platforms tools, mature design community and better execution times.
Submitted 29 September, 2016;
originally announced September 2016.