arXiv:2406.17049 [pdf, other]

Fast Switching Serial and Parallel Paradigms of SNN Inference on Multi-core Heterogeneous Neuromorphic Platform SpiNNaker2

Authors: Jiaxin Huang, Bernhard Vogginger, Florian Kelber, Hector Gonzalez, Klaus Knobloch, Christian Georg Mayr

Abstract: With serial and parallel processors being introduced into Spiking Neural Network (SNN) execution, more and more researchers are dedicated to improving the performance of the computing paradigms by taking full advantage of the strengths of the available processors. In this paper, we compare and integrate serial and parallel paradigms into one SNN compiling system. For faster switching between them at layer granularity, we train a classifier to prejudge the better paradigm before compiling instead of making the decision afterwards, saving a great amount of compiling time and RAM space on the host PC. The Adaptive Boost classifier, which has the highest accuracy (91.69 percent) among 12 classifiers, is integrated into the switching system, which utilizes less memory and fewer processors on the multi-core neuromorphic hardware backend SpiNNaker2 than the two individual paradigms. To the best of our knowledge, it is the first fast switching compiling system for SNN simulation.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.05224 [pdf, other]

ON-OFF Neuromorphic ISING Machines using Fowler-Nordheim Annealers

Authors: Zihao Chen, Zhili Xiao, Mahmoud Akl, Johannes Leugring, Omowuyi Olajide, Adil Malik, Nik Dennler, Chad Harper, Subhankar Bose, Hector A. Gonzalez, Jason Eshraghian, Riccardo Pignari, Gianvito Urgese, Andreas G. Andreou, Sadasivan Shankar, Christian Mayr, Gert Cauwenberghs, Shantanu Chakrabartty

Abstract: We introduce NeuroSA, a neuromorphic architecture specifically designed to ensure asymptotic convergence to the ground state of an Ising problem using an annealing process that is governed by the physics of quantum mechanical tunneling using Fowler-Nordheim (FN). The core component of NeuroSA consists of a pair of asynchronous ON-OFF neurons, which effectively map classical simulated annealing (SA) dynamics onto a network of integrate-and-fire (IF) neurons. The threshold of each ON-OFF neuron pair is adaptively adjusted by an FN annealer which replicates the optimal escape mechanism and convergence of SA, particularly at low temperatures. To validate the effectiveness of our neuromorphic Ising machine, we systematically solved various benchmark MAX-CUT combinatorial optimization problems. Across multiple runs, NeuroSA consistently generates solutions that approach the state-of-the-art level with high accuracy (greater than 99%), and without any graph-specific hyperparameter tuning. For practical illustration, we present results from an implementation of NeuroSA on the SpiNNaker2 platform, highlighting the feasibility of mapping our proposed architecture onto a standard neuromorphic accelerator platform.

Submitted 7 June, 2024; originally announced June 2024.

Comments: 36 pages, 8 figures

arXiv:2405.00433 [pdf, other]

Weight Sparsity Complements Activity Sparsity in Neuromorphic Language Models

Authors: Rishav Mukherji, Mark Schöne, Khaleelulla Khan Nazeer, Christian Mayr, David Kappel, Anand Subramoney

Abstract: Activity and parameter sparsity are two standard methods of making neural networks computationally more efficient. Event-based architectures such as spiking neural networks (SNNs) naturally exhibit activity sparsity, and many methods exist to sparsify their connectivity by pruning weights. While the effect of weight pruning on feed-forward SNNs has been previously studied for computer vision tasks, the effects of pruning for complex sequence tasks like language modeling are less well studied since SNNs have traditionally struggled to achieve meaningful performance on these tasks. Using a recently published SNN-like architecture that works well on small-scale language modeling, we study the effects of weight pruning when combined with activity sparsity. Specifically, we study the trade-off between the multiplicative efficiency gains the combination affords and its effect on task performance for language modeling. To dissect the effects of the two sparsities, we conduct a comparative analysis between densely activated models and sparsely activated event-based models across varying degrees of connectivity sparsity. We demonstrate that sparse activity and sparse connectivity complement each other without a proportional drop in task performance for an event-based neural network trained on the Penn Treebank and WikiText-2 language modeling datasets. Our results suggest sparsely connected event-based neural networks are promising candidates for effective and efficient sequence modeling.

Submitted 1 May, 2024; originally announced May 2024.

Comments: arXiv admin note: text overlap with arXiv:2311.07625

arXiv:2404.18508 [pdf, other]

Scalable Event-by-event Processing of Neuromorphic Sensory Signals With Deep State-Space Models

Authors: Mark Schöne, Neeraj Mohan Sushma, Jingyue Zhuge, Christian Mayr, Anand Subramoney, David Kappel

Abstract: Event-based sensors are well suited for real-time processing due to their fast response times and encoding of the sensory data as successive temporal differences. These and other valuable properties, such as a high dynamic range, are suppressed when the data is converted to a frame-based format. However, most current methods either collapse events into frames or cannot scale up when processing the event data directly event-by-event. In this work, we address the key challenges of scaling up event-by-event modeling of the long event streams emitted by such sensors, which is a particularly relevant problem for neuromorphic computing. While prior methods can process up to a few thousand time steps, our model, based on modern recurrent deep state-space models, scales to event streams of millions of events for both training and inference. We leverage their stable parameterization for learning long-range dependencies, parallelizability along the sequence dimension, and their ability to integrate asynchronous events effectively to scale them up to long event streams. We further augment these with novel event-centric techniques enabling our model to match or beat the state-of-the-art performance on several event stream benchmarks. In the Spiking Speech Commands task, we improve state-of-the-art by a large margin of 6.6% to 87.1%. On the DVS128-Gestures dataset, we achieve competitive results without using frames or convolutional neural networks. Our work demonstrates, for the first time, that it is possible to use fully event-based processing with purely recurrent networks to achieve state-of-the-art task performance in several event-based benchmarks.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2402.02521 [pdf, other]

Neuromorphic hardware for sustainable AI data centers

Authors: Bernhard Vogginger, Amirhossein Rostami, Vaibhav Jain, Sirine Arfa, Andreas Hantsch, David Kappel, Michael Schäfer, Ulrike Faltings, Hector A. Gonzalez, Chen Liu, Christian Mayr, Wolfgang Maaß

Abstract: As humans advance toward a higher level of artificial intelligence, it is always at the cost of escalating computational resource consumption, which requires developing novel solutions to meet the exponential growth of AI computing demand. Neuromorphic hardware takes inspiration from how the brain processes information and promises energy-efficient computing of AI workloads. Despite its potential, neuromorphic hardware has not found its way into commercial AI data centers. In this article, we try to analyze the underlying reasons for this and derive requirements and guidelines to promote neuromorphic systems for efficient and sustainable cloud computing: We first review currently available neuromorphic hardware systems and collect examples where neuromorphic solutions excel conventional AI processing on CPUs and GPUs. Next, we identify applications, models and algorithms which are commonly deployed in AI data centers as further directions for neuromorphic algorithms research. Last, we derive requirements and best practices for the hardware and software integration of neuromorphic systems into data centers. With this article, we hope to increase awareness of the challenges of integrating neuromorphic hardware into data centers and to guide the community to enable sustainable and energy-efficient AI at scale.

Submitted 26 June, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

Comments: 11 pages, 2 figures, presented as poster at NICE 2024, 2nd version with updated author list and minor updates

arXiv:2401.04491 [pdf, other]

SpiNNaker2: A Large-Scale Neuromorphic System for Event-Based and Asynchronous Machine Learning

Authors: Hector A. Gonzalez, Jiaxin Huang, Florian Kelber, Khaleelulla Khan Nazeer, Tim Langer, Chen Liu, Matthias Lohrmann, Amirhossein Rostami, Mark Schöne, Bernhard Vogginger, Timo C. Wunderlich, Yexin Yan, Mahmoud Akl, Christian Mayr

Abstract: The joint progress of artificial neural networks (ANNs) and domain specific hardware accelerators such as GPUs and TPUs took over many domains of machine learning research. This development is accompanied by a rapid growth of the required computational demands for larger models and more data. Concurrently, emerging properties of foundation models such as in-context learning drive new opportunities for machine learning applications. However, the computational cost of such applications is a limiting factor of the technology in data centers, and more importantly in mobile devices and edge systems. To mediate the energy footprint and non-trivial latency of contemporary systems, neuromorphic computing systems deeply integrate computational principles of neurobiological systems by leveraging low-power analog and digital technologies. SpiNNaker2 is a digital neuromorphic chip developed for scalable machine learning. The event-based and asynchronous design of SpiNNaker2 allows the composition of large-scale systems involving thousands of chips. This work features the operating principles of SpiNNaker2 systems, outlining the prototype of novel machine learning applications. These applications range from ANNs over bio-inspired spiking neural networks to generalized event-based neural networks. With the successful development and deployment of SpiNNaker2, we aim to facilitate the advancement of event-based and asynchronous algorithms for future generations of machine learning systems.

Submitted 9 January, 2024; originally announced January 2024.

Comments: Submitted at the Workshop on Machine Learning with New Compute Paradigms at NeurIPS 2023 (MLNCP 2023)

arXiv:2312.09084 [pdf, other]

Language Modeling on a SpiNNaker 2 Neuromorphic Chip

Authors: Khaleelulla Khan Nazeer, Mark Schöne, Rishav Mukherji, Bernhard Vogginger, Christian Mayr, David Kappel, Anand Subramoney

Abstract: As large language models continue to scale in size rapidly, so too does the computational power required to run them. Event-based networks on neuromorphic devices offer a potential way to reduce energy consumption for inference significantly. However, to date, most event-based networks that can run on neuromorphic hardware, including spiking neural networks (SNNs), have not achieved task performance even on par with LSTM models for language modeling. As a result, language modeling on neuromorphic devices has seemed a distant prospect. In this work, we demonstrate the first-ever implementation of a language model on a neuromorphic device - specifically the SpiNNaker 2 chip - based on a recently published event-based architecture called the EGRU. SpiNNaker 2 is a many-core neuromorphic chip designed for large-scale asynchronous processing, while the EGRU is architected to leverage such hardware efficiently while maintaining competitive task performance. This implementation marks the first time a neuromorphic language model matches LSTMs, setting the stage for taking task performance to the level of large language models. We also demonstrate results on a gesture recognition task based on inputs from a DVS camera. Overall, our results showcase the feasibility of this neuro-inspired neural network in hardware, highlighting significant gains versus conventional hardware in energy efficiency for the common use case of single batch inference.

Submitted 24 January, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

arXiv:2311.07625 [pdf, ps, other]

Activity Sparsity Complements Weight Sparsity for Efficient RNN Inference

Authors: Rishav Mukherji, Mark Schöne, Khaleelulla Khan Nazeer, Christian Mayr, Anand Subramoney

Abstract: Artificial neural networks open up unprecedented machine learning capabilities at the cost of ever growing computational requirements. Sparsifying the parameters, often achieved through weight pruning, has been identified as a powerful technique to compress the number of model parameters and reduce the computational operations of neural networks. Yet, sparse activations, while omnipresent in both biological neural networks and deep learning systems, have not been fully utilized as a compression technique in deep learning. Moreover, the interaction between sparse activations and weight pruning is not fully understood. In this work, we demonstrate that activity sparsity can compose multiplicatively with parameter sparsity in a recurrent neural network model based on the GRU that is designed to be activity sparse. We achieve up to $20\times$ reduction of computation while maintaining perplexities below $60$ on the Penn Treebank language modeling task. This magnitude of reduction has not been achieved previously with solely sparsely connected LSTMs, and the language modeling performance of our model has not been achieved previously with any sparsely activated recurrent neural networks or spiking neural networks. Neuromorphic computing devices are especially good at taking advantage of the dynamic activity sparsity, and our results provide strong evidence that making deep learning models activity sparse and porting them to neuromorphic devices can be a viable strategy that does not compromise on task performance. Our results also drive further convergence of methods from deep learning and neuromorphic computing for efficient machine learning.

Submitted 7 December, 2023; v1 submitted 13 November, 2023; originally announced November 2023.

Comments: Accepted to the First MLNCP Workshop @ NeurIPS 2023

arXiv:2310.09094 [pdf]

doi 10.1109/ISOCC59558.2023.10396509

A RISC-V MCU with adaptive reverse body bias and ultra-low-power retention mode in 22 nm FD-SOI

Authors: Heiner Bauer, Marco Stolba, Stefan Scholze, Dennis Walter, Christian Mayr, Alexander Oefelein, Sebastian Höppner, André Scharfe, Flo Schraut, Holger Eisenreich

Abstract: We present a low-power, energy efficient 32-bit RISC-V microprocessor unit (MCU) in 22 nm FD-SOI. It achieves ultra-low leakage, even at high temperatures, by using an adaptive reverse body biasing aware sign-off approach, a low-power optimized physical implementation, and custom SRAM macros with retention mode. We demonstrate the robustness of the chip with measurements over the full industrial temperature range, from -40 °C to 125 °C. Our results match the state of the art (SOTA) with 4.8 uW / MHz at 50 MHz in active mode and surpass the SOTA in ultra-low-power retention mode.

Submitted 13 October, 2023; originally announced October 2023.

Comments: accepted at ISOCC 2023

arXiv:2305.14974 [pdf, other]

Block-local learning with probabilistic latent representations

Authors: David Kappel, Khaleelulla Khan Nazeer, Cabrel Teguemne Fokam, Christian Mayr, Anand Subramoney

Abstract: The ubiquitous backpropagation algorithm requires sequential updates through the network introducing a locking problem. In addition, back-propagation relies on the transpose of forward weight matrices to compute updates, introducing a weight transport problem across the network. Locking and weight transport are problems because they prevent efficient parallelization and horizontal scaling of the training process. We propose a new method to address both these problems and scale up the training of large models. Our method works by dividing a deep neural network into blocks and introduces a feedback network that propagates the information from the targets backwards to provide auxiliary local losses. Forward and backward propagation can operate in parallel and with different sets of weights, addressing the problems of locking and weight transport. Our approach derives from a statistical interpretation of training that treats output activations of network blocks as parameters of probability distributions. The resulting learning framework uses these parameters to evaluate the agreement between forward and backward information. Error backpropagation is then performed locally within each block, leading to "block-local" learning. Several previously proposed alternatives to error backpropagation emerge as special cases of our model. We present results on a variety of tasks and architectures, demonstrating state-of-the-art performance using block-local learning. These results provide a new principled framework for training networks in a distributed setting.

Submitted 27 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: 23 pages, 4 figures, preprint

arXiv:2304.04842 [pdf, other]

Deploying Machine Learning Models to Ahead-of-Time Runtime on Edge Using MicroTVM

Authors: Chen Liu, Matthias Jobst, Liyuan Guo, Xinyue Shi, Johannes Partzsch, Christian Mayr

Abstract: In the past few years, more and more AI applications have been applied to edge devices. However, models trained by data scientists with machine learning frameworks, such as PyTorch or TensorFlow, can not be seamlessly executed on edge. In this paper, we develop an end-to-end code generator parsing a pre-trained model to C source libraries for the backend using MicroTVM, a machine learning compiler framework extension addressing inference on bare metal devices. An analysis shows that specific compute-intensive operators can be easily offloaded to the dedicated accelerator with a Universal Modular Accelerator (UMA) interface, while others are processed in the CPU cores. By using the automatically generated ahead-of-time C runtime, we conduct a hand gesture recognition experiment on an ARM Cortex M4F core.

Submitted 14 April, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

Comments: CODAI 2022 Workshop - Embedded System Week (ESWeek)

arXiv:2304.04640 [pdf, other]

NeuroBench: A Framework for Benchmarking Neuromorphic Computing Algorithms and Systems

Authors: Jason Yik, Korneel Van den Berghe, Douwe den Blanken, Younes Bouhadjar, Maxime Fabre, Paul Hueber, Denis Kleyko, Noah Pacik-Nelson, Pao-Sheng Vincent Sun, Guangzhi Tang, Shenqi Wang, Biyan Zhou, Soikat Hasan Ahmed, George Vathakkattil Joseph, Benedetto Leto, Aurora Micheli, Anurag Kumar Mishra, Gregor Lenz, Tao Sun, Zergham Ahmed, Mahmoud Akl, Brian Anderson, Andreas G. Andreou, Chiara Bartolozzi, Arindam Basu, et al. (73 additional authors not shown)

Abstract: Neuromorphic computing shows promise for advancing computing efficiency and capabilities of AI applications using brain-inspired principles. However, the neuromorphic research field currently lacks standardized benchmarks, making it difficult to accurately measure technological advancements, compare performance with conventional methods, and identify promising future research directions. Prior neuromorphic computing benchmark efforts have not seen widespread adoption due to a lack of inclusive, actionable, and iterative benchmark design and guidelines. To address these shortcomings, we present NeuroBench: a benchmark framework for neuromorphic computing algorithms and systems. NeuroBench is a collaboratively-designed effort from an open community of nearly 100 co-authors across over 50 institutions in industry and academia, aiming to provide a representative structure for standardizing the evaluation of neuromorphic approaches. The NeuroBench framework introduces a common set of tools and systematic methodology for inclusive benchmark measurement, delivering an objective reference framework for quantifying neuromorphic approaches in both hardware-independent (algorithm track) and hardware-dependent (system track) settings. In this article, we present initial performance baselines across various model architectures on the algorithm track and outline the system track benchmark tasks and guidelines. NeuroBench is intended to continually expand its benchmarks and features to foster and track the progress made by the research community.

Submitted 17 January, 2024; v1 submitted 10 April, 2023; originally announced April 2023.

Comments: Updated from whitepaper to full perspective article preprint

arXiv:2209.15317 [pdf, other]

doi 10.1142/S0129065722500514

Convolutional Neural Networks Quantization with Attention

Authors: Binyi Wu, Bernd Waschneck, Christian Georg Mayr

Abstract: It has been proven that, compared to using 32-bit floating-point numbers in the training phase, Deep Convolutional Neural Networks (DCNNs) can operate with low precision during inference, thereby saving memory space and power consumption. However, quantizing networks is always accompanied by an accuracy decrease. Here, we propose a method, double-stage Squeeze-and-Threshold (double-stage ST). It uses the attention mechanism to quantize networks and achieve state-of-art results. Using our method, the 3-bit model can achieve accuracy that exceeds the accuracy of the full-precision baseline model. The proposed double-stage ST activation quantization is easy to apply: inserting it before the convolution.

Submitted 30 September, 2022; originally announced September 2022.

Comments: Preprint of an article published in International Journal of Neural Systems, [10.1142/S0129065722500514] © [copyright World Scientific Publishing Company] [https://www.worldscientific.com/doi/10.1142/S0129065722500514]

arXiv:2206.06178 [pdf, other]

Efficient recurrent architectures through activity sparsity and sparse back-propagation through time

Authors: Anand Subramoney, Khaleelulla Khan Nazeer, Mark Schöne, Christian Mayr, David Kappel

Abstract: Recurrent neural networks (RNNs) are well suited for solving sequence tasks in resource-constrained systems due to their expressivity and low computational requirements. However, there is still a need to bridge the gap between what RNNs are capable of in terms of efficiency and performance and real-world application requirements. The memory and computational requirements arising from propagating the activations of all the neurons at every time step to every connected neuron, together with the sequential dependence of activations, contribute to the inefficiency of training and using RNNs. We propose a solution inspired by biological neuron dynamics that makes the communication between RNN units sparse and discrete. This makes the backward pass with backpropagation through time (BPTT) computationally sparse and efficient as well. We base our model on the gated recurrent unit (GRU), extending it with units that emit discrete events for communication triggered by a threshold so that no information is communicated to other units in the absence of events. We show theoretically that the communication between units, and hence the computation required for both the forward and backward passes, scales with the number of events in the network. Our model achieves efficiency without compromising task performance, demonstrating competitive performance compared to state-of-the-art recurrent network models in real-world tasks, including language modeling. The dynamic activity sparsity mechanism also makes our model well suited for novel energy-efficient neuromorphic hardware. Code is available at https://github.com/KhaleelKhan/EvNN/.

Submitted 9 March, 2023; v1 submitted 13 June, 2022; originally announced June 2022.

Comments: Published as notable-top-25% paper in ICLR 2023

arXiv:2206.03943 [pdf, other]

Robust Environment Perception for Automated Driving: A Unified Learning Pipeline for Visual-Infrared Object Detection

Authors: Mohsen Vadidar, Ali Kariminezhad, Christian Mayr, Laurent Kloeker, Lutz Eckstein

Abstract: The RGB complementary metal-oxide-semiconductor (CMOS) sensor works within the visible light spectrum. Therefore it is very sensitive to environmental light conditions. On the contrary, a long-wave infrared (LWIR) sensor operating in the 8-14 micrometer spectral band functions independent of visible light. In this paper, we exploit both visual and thermal perception units for robust object detection purposes. After delicate synchronization and (cross-) labeling of the FLIR [1] dataset, this multi-modal perception data passes through a convolutional neural network (CNN) to detect three critical objects on the road, namely pedestrians, bicycles, and cars. After evaluation of RGB and infrared (thermal and infrared are often used interchangeably) sensors separately, various network structures are compared to fuse the data at the feature level effectively. Our RGB-thermal (RGBT) fusion network, which takes advantage of a novel entropy-block attention module (EBAM), outperforms the state-of-the-art network [2] by 10% with 82.9% mAP.

Submitted 8 June, 2022; originally announced June 2022.

arXiv:2205.08406 [pdf, other]

Raw Radar data based Object Detection and Heading estimation using Cross Attention

Authors: Ravi Kothari, Ali Kariminezhad, Christian Mayr, Haoming Zhang

Abstract: Radar is an inevitable part of the perception sensor set for autonomous driving functions. It plays a gap-filling role to complement the shortcomings of other sensors in diverse scenarios and weather conditions. In this paper, we propose a Deep Neural Network (DNN) based end-to-end object detection and heading estimation framework using raw radar data. To this end, we approach the problem in both a data-centric and model-centric manner. We refine the publicly available CARRADA dataset and introduce Bivariate norm annotations. Besides, the baseline model is improved by a transformer-inspired cross-attention fusion, and further center-offset maps are added to reduce localisation error. Our proposed model improves the detection mean Average Precision (mAP) by 5%, while reducing the model complexity by almost 23%. For comprehensive scene understanding purposes, we extend our model for heading estimation. The improved ground truth and proposed model are available on GitHub.

Submitted 17 February, 2023; v1 submitted 17 May, 2022; originally announced May 2022.

arXiv:2202.12650 [pdf, ps, other]

doi 10.1109/TC.2022.3162708

Time-coded Spiking Fourier Transform in Neuromorphic Hardware

Authors: Javier López-Randulfe, Nico Reeb, Negin Karimi, Chen Liu, Hector A. Gonzalez, Robin Dietrich, Bernhard Vogginger, Christian Mayr, Alois Knoll

Abstract: After several decades of continuously optimizing computing systems, Moore's law is reaching its end. However, there is an increasing demand for fast and efficient processing systems that can handle large streams of data while decreasing system footprints. Neuromorphic computing answers this need by creating decentralized architectures that communicate with binary events over time. Despite its rapid growth in the last few years, novel algorithms are needed that can leverage the potential of this emerging computing paradigm and can stimulate the design of advanced neuromorphic chips. In this work, we propose a time-based spiking neural network that is mathematically equivalent to the Fourier transform. We implemented the network in the neuromorphic chip Loihi and conducted experiments on five different real scenarios with an automotive frequency modulated continuous wave radar. Experimental results validate the algorithm, and we hope they prompt the design of ad hoc neuromorphic chips that can improve the efficiency of state-of-the-art digital signal processors and encourage research on neuromorphic computing for signal processing.

Submitted 31 March, 2022; v1 submitted 25 February, 2022; originally announced February 2022.

Comments: Accepted version on IEEE Transactions on Computers (early access). Added copyright notice

arXiv:2105.00789 [pdf, other]

doi 10.1109/TVLSI.2021.3117401

Hardware Implementation of an OPC UA Server for Industrial Field Devices

Authors: Heiner Bauer, Sebastian Höppner, Chris Iatrou, Zohra Charania, Stephan Hartmann, Saif-Ur Rehman, Andreas Dixius, Georg Ellguth, Dennis Walter, Johannes Uhlig, Felix Neumärker, Marc Berthel, Marco Stolba, Florian Kelber, Leon Urbas, Christian Mayr

Abstract: Industrial plants suffer from a high degree of complexity and incompatibility in their communication infrastructure, caused by a wild mix of proprietary technologies. This prevents transformation towards Industry 4.0 and the Industrial Internet of Things. Open Platform Communications Unified Architecture (OPC UA) is a standardized protocol that addresses these problems with uniform and semantic communication across all levels of the hierarchy. However, its adoption in embedded field devices, such as sensors and actors, is still lacking due to prohibitive memory and power requirements of software implementations. We have developed a dedicated hardware engine that offloads processing of the OPC UA protocol and enables realization of compact and low-power field devices with OPC UA support. As part of a proof-of-concept embedded system we have implemented this engine in a 22 nm FDSOI technology. We measured performance, power consumption, and memory footprint of our test chip and compared it with a software implementation based on open62541 and a Raspberry Pi 2B. Our OPC UA hardware engine is 50 times more energy efficient and only requires 36 KiB of memory. The complete chip consumes only 24 mW under full load, making it suitable for low-power embedded applications.

Submitted 3 May, 2021; originally announced May 2021.

Comments: 5 pages, 6 figures, 2 tables

Journal ref: IEEE Transactions on Very Large Scale Integration (VLSI) Systems 29 (2021)

arXiv:2103.08392 [pdf, ps, other]

The SpiNNaker 2 Processing Element Architecture for Hybrid Digital Neuromorphic Computing

Authors: Sebastian Höppner, Yexin Yan, Andreas Dixius, Stefan Scholze, Johannes Partzsch, Marco Stolba, Florian Kelber, Bernhard Vogginger, Felix Neumärker, Georg Ellguth, Stephan Hartmann, Stefan Schiefer, Thomas Hocker, Dennis Walter, Genting Liu, Jim Garside, Steve Furber, Christian Mayr

Abstract: This paper introduces the processing element architecture of the second generation SpiNNaker chip, implemented in 22nm FDSOI. On circuit level, the chip features adaptive body biasing for near-threshold operation, and dynamic voltage-and-frequency scaling driven by spiking activity. On system level, processing is centered around an ARM M4 core, similar to the processor-centric architecture of the first generation SpiNNaker. To speed operation of subtasks, we have added accelerators for numerical operations of both spiking (SNN) and rate based (deep) neural networks (DNN). PEs communicate via a dedicated, custom-designed network-on-chip. We present three benchmarks showing operation of the whole processor element on SNN, DNN and hybrid SNN/DNN networks.

Submitted 15 August, 2022; v1 submitted 15 March, 2021; originally announced March 2021.

arXiv:2009.08921 [pdf, other]

doi 10.1088/2634-4386/abf150

Low-Power Low-Latency Keyword Spotting and Adaptive Control with a SpiNNaker 2 Prototype and Comparison with Loihi

Authors: Yexin Yan, Terrence C. Stewart, Xuan Choo, Bernhard Vogginger, Johannes Partzsch, Sebastian Hoeppner, Florian Kelber, Chris Eliasmith, Steve Furber, Christian Mayr

Abstract: We implemented two neural network based benchmark tasks on a prototype chip of the second-generation SpiNNaker (SpiNNaker 2) neuromorphic system: keyword spotting and adaptive robotic control. Keyword spotting is commonly used in smart speakers to listen for wake words, and adaptive control is used in robotic applications to adapt to unknown dynamics in an online fashion. We highlight the benefit of a multiply accumulate (MAC) array in the SpiNNaker 2 prototype which is ordinarily used in rate-based machine learning networks when employed in a neuromorphic, spiking context. In addition, the same benchmark tasks have been implemented on the Loihi neuromorphic chip, giving a side-by-side comparison regarding power consumption and computation time. While Loihi shows better efficiency when less complicated vector-matrix multiplication is involved, with the MAC array, the SpiNNaker 2 prototype shows better efficiency when high dimensional vector-matrix multiplication is involved.

Submitted 18 September, 2020; originally announced September 2020.

arXiv:2007.01634 [pdf, other]

Social distancing with the Optimal Steps Model

Authors: Christina Maria Mayr, Gerta Köster

Abstract: With the Covid-19 pandemic an urgent need to simulate social distancing arises. The Optimal Steps Model (OSM) is a pedestrian locomotion model that operationalizes an individual's need for personal space. We present new parameter values for personal space in the Optimal Steps Model to simulate social distancing in the pedestrian dynamics simulator Vadere. Our approach is pragmatic. We consider two use cases: in the first we demand that a set social distance must never be violated. In the second the social distance must be kept only on average. For each use case we conduct simulation studies in a typical bottleneck scenario and measure contact times, that is, violations of the social distance rule. We derive rules of thumb for suitable parameter choices depending on the desired social distance. We test the rules of thumb for the social distances 1.5m and 2.0m and observe that the new parameter values indeed lead to the desired social distancing. Thus, the rules of thumb will quickly enable Vadere users to conduct their own studies without understanding the intricacies of the OSM implementation and without extensive parameter adjustment.

Submitted 1 March, 2022; v1 submitted 3 July, 2020; originally announced July 2020.

Comments: 9 pages, 8 figures, 4 tables

arXiv:2003.13749 [pdf, other]

The Operating System of the Neuromorphic BrainScaleS-1 System

Authors: Eric Müller, Sebastian Schmitt, Christian Mauch, Sebastian Billaudelle, Andreas Grübl, Maurice Güttler, Dan Husmann, Joscha Ilmberger, Sebastian Jeltsch, Jakob Kaiser, Johann Klähn, Mitja Kleider, Christoph Koke, José Montes, Paul Müller, Johannes Partzsch, Felix Passenberg, Hartmut Schmidt, Bernhard Vogginger, Jonas Weidner, Christian Mayr, Johannes Schemmel

Abstract: BrainScaleS-1 is a wafer-scale mixed-signal accelerated neuromorphic system targeted for research in the fields of computational neuroscience and beyond-von-Neumann computing. The BrainScaleS Operating System (BrainScaleS OS) is a software stack giving users the possibility to emulate networks described in the high-level network description language PyNN with minimal knowledge of the system. At the same time, expert usage is facilitated by allowing users to hook into the system at any depth of the stack. We present operation and development methodologies implemented for the BrainScaleS-1 neuromorphic architecture and walk through the individual components of BrainScaleS OS constituting the software stack for BrainScaleS-1 platform operation.

Submitted 2 February, 2022; v1 submitted 30 March, 2020; originally announced March 2020.

arXiv:1911.02385 [pdf, other]

SpiNNaker 2: A 10 Million Core Processor System for Brain Simulation and Machine Learning

Authors: Christian Mayr, Sebastian Hoeppner, Steve Furber

Abstract: SpiNNaker is an ARM-based processor platform optimized for the simulation of spiking neural networks. This brief describes the roadmap in going from the current SpiNNaker1 system, a 1 Million core machine in 130nm CMOS, to SpiNNaker2, a 10 Million core machine in 22nm FDSOI. Apart from pure scaling, we will take advantage of specific technology features, such as runtime adaptive body biasing, to deliver cutting-edge power consumption. Power management of the cores allows a wide range of workload adaptivity, i.e. processor power scales with the complexity and activity of the spiking network. Additional numerical accelerators will enhance the utility of SpiNNaker2 for simulation of spiking neural networks as well as for executing conventional deep neural networks. These measures should increase the simulation capacity of the machine by a factor >50. The interplay between the two domains, i.e. spiking and rate based, will provide an interesting field for algorithm exploration on SpiNNaker2. Apart from the platform's traditional usage as a neuroscience exploration tool, the extended functionality opens up new application areas such as automotive AI, tactile internet, industry 4.0 and biomedical processing.

Submitted 6 November, 2019; originally announced November 2019.

Journal ref: Communication Process Architectures 2018

arXiv:1904.10389 [pdf, ps, other]

Mean Field Approach for Configuring Population Dynamics on a Biohybrid Neuromorphic System

Authors: Johannes Partzsch, Christian Mayr, Massimiliano Giulioni, Marko Noack, Stefan Hänzsche, Stefan Scholze, Sebastian Höppner, Paolo Del Giudice, Rene Schüffny

Abstract: Real-time coupling of cell cultures to neuromorphic circuits necessitates a neuromorphic network that replicates biological behaviour both on a per-neuron and on a population basis, with a network size comparable to the culture. We present a large neuromorphic system composed of 9 chips, with overall 2880 neurons and 144M conductance-based synapses. As they are realized in a robust switched-capacitor fashion, individual neurons and synapses can be configured to replicate with high fidelity a wide range of biologically realistic behaviour. In contrast to other exploration/heuristics-based approaches, we employ a theory-guided mesoscopic approach to configure the overall network to a range of bursting behaviours, thus replicating the statistics of our targeted in-vitro network. The mesoscopic approach has implications beyond our proposed biohybrid, as it allows a targeted exploration of the behavioural space, which is a non-trivial task especially in large, recurrent networks.

Submitted 23 April, 2019; originally announced April 2019.

Comments: 21 pages, 13 figures

arXiv:1903.08941 [pdf, other]

Dynamic Power Management for Neuromorphic Many-Core Systems

Authors: Sebastian Hoeppner, Bernhard Vogginger, Yexin Yan, Andreas Dixius, Stefan Scholze, Johannes Partzsch, Felix Neumaerker, Stephan Hartmann, Stefan Schiefer, Georg Ellguth, Love Cederstroem, Luis Plana, Jim Garside, Steve Furber, Christian Mayr

Abstract: This work presents a dynamic power management architecture for neuromorphic many core systems such as SpiNNaker. A fast dynamic voltage and frequency scaling (DVFS) technique is presented which allows the processing elements (PE) to change their supply voltage and clock frequency individually and autonomously within less than 100 ns. This is employed by the neuromorphic simulation software flow, which defines the performance level (PL) of the PE based on the actual workload within each simulation cycle. A test chip in 28 nm SLP CMOS technology has been implemented. It includes 4 PEs which can be scaled from 0.7 V to 1.0 V with frequencies from 125 MHz to 500 MHz at three distinct PLs. By measurement of three neuromorphic benchmarks it is shown that the total PE power consumption can be reduced by 75%, with 80% baseline power reduction and a 50% reduction of energy per neuron and synapse computation, all while maintaining temporary peak system performance to achieve biological real-time operation of the system. A numerical model of this power management model is derived which allows DVFS architecture exploration for neuromorphics. The proposed technique is to be used for the second generation SpiNNaker neuromorphic many core system.

Submitted 21 March, 2019; originally announced March 2019.

arXiv:1903.08500 [pdf, other]

doi 10.1109/TBCAS.2019.2906401

Efficient Reward-Based Structural Plasticity on a SpiNNaker 2 Prototype

Authors: Yexin Yan, David Kappel, Felix Neumaerker, Johannes Partzsch, Bernhard Vogginger, Sebastian Hoeppner, Steve Furber, Wolfgang Maass, Robert Legenstein, Christian Mayr

Abstract: Advances in neuroscience uncover the mechanisms employed by the brain to efficiently solve complex learning tasks with very limited resources. However, the efficiency is often lost when one tries to port these findings to a silicon substrate, since brain-inspired algorithms often make extensive use of complex functions, such as random number generators, that are expensive to compute on standard general purpose hardware. The prototype chip of the 2nd generation SpiNNaker system is designed to overcome this problem. Low-power ARM processors equipped with a random number generator and an exponential function accelerator enable the efficient execution of brain-inspired algorithms. We implement the recently introduced reward-based synaptic sampling model that employs structural plasticity to learn a function or task. The numerical simulation of the model requires updating the synapse variables in each time step, including an explorative random term. To the best of our knowledge, this is the most complex synapse model implemented so far on the SpiNNaker system. By making efficient use of the hardware accelerators and numerical optimizations, the computation time of one plasticity update is reduced by a factor of 2. This, combined with fitting the model into the local SRAM, leads to 62% energy reduction compared to the case without accelerators and the use of external DRAM. The model implementation is integrated into the SpiNNaker software framework allowing for scalability onto larger systems. The hardware-software system presented in this work paves the way for power-efficient mobile and biomedical applications with biologically plausible brain-inspired algorithms.

Submitted 20 March, 2019; originally announced March 2019.

Comments: accepted by IEEE TBioCAS

arXiv:1709.04179 [pdf]

A geographically distributed bio-hybrid neural network with memristive plasticity

Authors: Alexantrou Serb, Andrea Corna, Richard George, Ali Khiat, Federico Rocchi, Marco Reato, Marta Maschietto, Christian Mayr, Giacomo Indiveri, Stefano Vassanelli, Themistoklis Prodromakis

Abstract: Throughout evolution the brain has mastered the art of processing real-world inputs through networks of interlinked spiking neurons. Synapses have emerged as key elements that, owing to their plasticity, are merging neuron-to-neuron signalling with memory storage and computation. Electronics has made important steps in emulating neurons through neuromorphic circuits and synapses with nanoscale memristors, yet novel applications that interlink them in heterogeneous bio-inspired and bio-hybrid architectures are just beginning to materialise. The use of memristive technologies in brain-inspired architectures for computing or for sensing spiking activity of biological neurons are only recent examples; however, interlinking brain and electronic neurons through plasticity-driven synaptic elements has remained so far in the realm of the imagination. Here, we demonstrate a bio-hybrid neural network (bNN) where memristors work as "synaptors" between rat neural circuits and VLSI neurons. The two fundamental synaptors, from artificial-to-biological (ABsyn) and from biological-to-artificial (BAsyn), are interconnected over the Internet. The bNN extends across Europe, collapsing spatial boundaries existing in natural brain networks and laying the foundations of a new geographically distributed and evolving architecture: the Internet of Neuro-electronics (IoN).

Submitted 13 September, 2017; originally announced September 2017.

Comments: 16 pages, 10 figures

arXiv:1703.06043 [pdf, other]

doi 10.1109/ISCAS.2017.8050530

Pattern representation and recognition with accelerated analog neuromorphic systems

Authors: Mihai A. Petrovici, Sebastian Schmitt, Johann Klähn, David Stöckel, Anna Schroeder, Guillaume Bellec, Johannes Bill, Oliver Breitwieser, Ilja Bytschok, Andreas Grübl, Maurice Güttler, Andreas Hartel, Stephan Hartmann, Dan Husmann, Kai Husmann, Sebastian Jeltsch, Vitali Karasenko, Mitja Kleider, Christoph Koke, Alexander Kononov, Christian Mauch, Eric Müller, Paul Müller, Johannes Partzsch, Thomas Pfeil, et al. (11 additional authors not shown)

Abstract: Despite being originally inspired by the central nervous system, artificial neural networks have diverged from their biological archetypes as they have been remodeled to fit particular tasks. In this paper, we review several possibilities to reverse map these architectures to biologically more realistic spiking networks with the aim of emulating them on fast, low-power neuromorphic hardware. Since many of these devices employ analog components, which cannot be perfectly controlled, finding ways to compensate for the resulting effects represents a key challenge. Here, we discuss three different strategies to address this problem: the addition of auxiliary network components for stabilizing activity, the utilization of inherently robust architectures and a training method for hardware-emulated networks that functions without perfect knowledge of the system's dynamics and parameters. For all three scenarios, we corroborate our theoretical considerations with experimental results on accelerated analog neuromorphic platforms.

Submitted 3 July, 2017; v1 submitted 17 March, 2017; originally announced March 2017.

Comments: accepted at ISCAS 2017

Journal ref: Circuits and Systems (ISCAS), 2017 IEEE International Symposium on

arXiv:1703.01909 [pdf, other]

doi 10.1109/IJCNN.2017.7966125

Neuromorphic Hardware In The Loop: Training a Deep Spiking Network on the BrainScaleS Wafer-Scale System

Authors: Sebastian Schmitt, Johann Klaehn, Guillaume Bellec, Andreas Gruebl, Maurice Guettler, Andreas Hartel, Stephan Hartmann, Dan Husmann, Kai Husmann, Vitali Karasenko, Mitja Kleider, Christoph Koke, Christian Mauch, Eric Mueller, Paul Mueller, Johannes Partzsch, Mihai A. Petrovici, Stefan Schiefer, Stefan Scholze, Bernhard Vogginger, Robert Legenstein, Wolfgang Maass, Christian Mayr, Johannes Schemmel, Karlheinz Meier

Abstract: Emulating spiking neural networks on analog neuromorphic hardware offers several advantages over simulating them on conventional computers, particularly in terms of speed and energy consumption. However, this usually comes at the cost of reduced control over the dynamics of the emulated networks. In this paper, we demonstrate how iterative training of a hardware-emulated network can compensate for anomalies induced by the analog substrate. We first convert a deep neural network trained in software to a spiking network on the BrainScaleS wafer-scale neuromorphic system, thereby enabling an acceleration factor of 10 000 compared to the biological time domain. This mapping is followed by the in-the-loop training, where in each training step, the network activity is first recorded in hardware and then used to compute the parameter updates in software via backpropagation. An essential finding is that the parameter updates do not have to be precise, but only need to approximately follow the correct gradient, which simplifies the computation of updates. Using this approach, after only several tens of iterations, the spiking network shows an accuracy close to the ideal software-emulated prototype. The presented techniques show that deep spiking networks emulated on analog neuromorphic devices can attain good computational performance despite the inherent variations of the analog substrate.

Submitted 6 March, 2017; originally announced March 2017.

Comments: 8 pages, 10 figures, submitted to IJCNN 2017

arXiv:1412.3243 [pdf, ps, other]

Switched-Capacitor Realization of Presynaptic Short-Term-Plasticity and Stop-Learning Synapses in 28 nm CMOS

Authors: Marko Noack, Johannes Partzsch, Christian Mayr, Stefan Hänzsche, Stefan Scholze, Sebastian Höppner, Georg Ellguth, Rene Schüffny

Abstract: Synaptic dynamics, such as long- and short-term plasticity, play an important role in the complexity and biological realism achievable when running neural networks on a neuromorphic IC. For example, they endow the IC with an ability to adapt and learn from its environment. In order to achieve the millisecond to second time constants required for these synaptic dynamics, analog subthreshold circuits are usually employed. However, due to process variation and leakage problems, it is almost impossible to port these types of circuits to modern sub-100nm technologies. In contrast, we present a neuromorphic system in a 28 nm CMOS process that employs switched capacitor (SC) circuits to implement 128 short term plasticity presynapses as well as 8192 stop-learning synapses. The neuromorphic system consumes an area of 0.36 mm² and runs at a power consumption of 1.9 mW. The circuit makes use of a technique for minimizing leakage effects allowing for real-time operation with time constants up to several seconds. Since we rely on SC techniques for all calculations, the system is composed of only generic mixed-signal building blocks. These generic building blocks make the system easy to port between technologies and the large digital circuit part inherent in an SC system benefits fully from technology scaling.

Submitted 10 December, 2014; originally announced December 2014.

arXiv:1412.3233 [pdf, ps, other]

A Biological-Realtime Neuromorphic System in 28 nm CMOS using Low-Leakage Switched Capacitor Circuits

Authors: Christian Mayr, Johannes Partzsch, Marko Noack, Stefan Hänzsche, Stefan Scholze, Sebastian Höppner, Georg Ellguth, Rene Schüffny

Abstract: A switched-capacitor (SC) neuromorphic system for closed-loop neural coupling in 28 nm CMOS is presented, occupying 600 μm by 600 μm. It offers 128 input channels (i.e. presynaptic terminals), 8192 synapses and 64 output channels (i.e. neurons). Biologically realistic neuron and synapse dynamics are achieved via a faithful translation of the behavioural equations to SC circuits. As leakage currents significantly affect circuit behaviour at this technology node, dedicated compensation techniques are employed to achieve biological-realtime operation, with faithful reproduction of time constants of several 100 ms at room temperature. Power draw of the overall system is 1.9 mW.

Submitted 10 December, 2014; originally announced December 2014.

arXiv:1409.0171 [pdf, ps, other]

OTA based 200 GΩ resistance on 700 μm² in 180 nm CMOS for neuromorphic applications

Authors: Christian Mayr, Michael Schultz, Marko Noack, Stephan Henker, Johannes Partzsch, Rene Schüffny

Abstract: Generating an exponential decay function with a time constant on the order of hundreds of milliseconds is a mainstay for neuromorphic circuits. Usually, either subthreshold circuits or RC-decays based on transconductance amplifiers are used. In the latter case, transconductances in the 10 pS range are needed. However, state-of-the-art low-transconductance amplifiers still require too much circuit area to be applicable in neuromorphic circuits where >100 of these time constant circuits may be required on a single chip. We present a silicon-verified operational transconductance amplifier that achieves a gm of 5 pS in only 700 μm², a factor of 10-100 less area than current examples. This allows a high-density integration of time constant circuits in target applications such as synaptic learning or as driving circuit for neuromorphic memristor arrays.

Submitted 30 August, 2014; originally announced September 2014.

Comments: 8 pages, 8 figures

arXiv:1408.1986 [pdf]

doi 10.1109/TNN.2007.891687

Gabor-like Image Filtering using a Neural Microcircuit

Authors: C. Mayr, A. Heittmann, R. Schüffny

Abstract: In this letter, we present an implementation of a neural microcircuit for image processing employing Hebbian-adaptive learning. The neuronal circuit utilizes only excitatory synapses to correlate action potentials, extracting the uncorrelated ones, which contain significant image information. This circuit is capable of approximating Gabor-like image filtering and other image processing functions.

Submitted 8 August, 2014; originally announced August 2014.

Journal ref: IEEE Transactions on Neural Networks, vol. 18, pages 955-959, 2007

arXiv:1408.1984 [pdf]

Neighborhood Rank Order Coding for Robust Texture Analysis and Feature Extraction

Authors: C. Mayr, R. Schüffny

Abstract: Research into the visual cortex and general neural information processing has led to various attempts to integrate pulse computation schemes in image analysis systems. Of interest is especially the robustness of representing an analogue signal in the phase or duration of a pulsed, quasi-digital signal, as well as the possibility of direct digital interaction, i.e. computation, among these signals. Such a computation can also achieve information compaction for subsequent processing stages. By using a pulse order encoding scheme motivated by dendritic pulse interaction, we will show that a powerful low-level feature and texture extraction operator, called Pulsed Local Orientation Coding (PLOC), can be implemented. Feature extraction results are being presented, and a possible VLSI implementation is detailed.

Submitted 8 August, 2014; originally announced August 2014.

Journal ref: Proc. 7th International Conference on Hybrid Intelligent Systems HIS 07, pages 290-295, 2007

arXiv:1408.1938 [pdf]

Applying Spiking Neural Nets to Noise Shaping

Authors: C. Mayr, R. Schüffny

Abstract: In recent years, there has been an increased focus on the mechanics of information transmission in spiking neural networks. Especially the Noise Shaping properties of these networks and their similarity to Delta-Sigma Modulators have received a lot of attention. However, very little of the research done in this area has focused on the effect the weights in these networks have on the Noise Shaping properties and on post-processing of the network output signal. This paper concerns itself with the various modes of network operation and beneficial as well as detrimental effects which the systematic generation of network weights can effect. Also, a method for post-processing of the spiking output signal is introduced, bringing the output signal more in line with conventional Delta-Sigma Modulators. Relevancy of this research to industrial application of neural nets as building blocks of oversampled A/D converters is shown. Also, further points of contention are listed, which must be thoroughly researched to add to the above-mentioned applicability of spiking neural nets.

Submitted 8 August, 2014; originally announced August 2014.

Journal ref: IEICE Transactions on Information and Systems, vol. E88-D, no. 8, pages 1885-1892, 2005

Showing 1–35 of 35 results for author: Mayr, C