
Deploying Machine Learning Models to Ahead-of-Time Runtime on Edge Using MicroTVM

Authors: Chen Liu, Matthias Jobst, Liyuan Guo, Xinyue Shi, Johannes Partzsch, Christian Mayr

Abstract: In the past few years, more and more AI applications have been applied to edge devices. However, models trained by data scientists with machine learning frameworks, such as PyTorch or TensorFlow, cannot be seamlessly executed on edge. In this paper, we develop an end-to-end code generator parsing a pre-trained model to C source libraries for the backend using MicroTVM, a machine learning compiler framework extension addressing inference on bare metal devices. An analysis shows that specific compute-intensive operators can be easily offloaded to the dedicated accelerator with a Universal Modular Accelerator (UMA) interface, while others are processed in the CPU cores. By using the automatically generated ahead-of-time C runtime, we conduct a hand gesture recognition experiment on an ARM Cortex M4F core.

Submitted 14 April, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

Comments: CODAI 2022 Workshop - Embedded System Week (ESWeek)

arXiv:2103.08392 [pdf, ps, other]

The SpiNNaker 2 Processing Element Architecture for Hybrid Digital Neuromorphic Computing

Authors: Sebastian Höppner, Yexin Yan, Andreas Dixius, Stefan Scholze, Johannes Partzsch, Marco Stolba, Florian Kelber, Bernhard Vogginger, Felix Neumärker, Georg Ellguth, Stephan Hartmann, Stefan Schiefer, Thomas Hocker, Dennis Walter, Genting Liu, Jim Garside, Steve Furber, Christian Mayr

Abstract: This paper introduces the processing element architecture of the second generation SpiNNaker chip, implemented in 22nm FDSOI. On circuit level, the chip features adaptive body biasing for near-threshold operation, and dynamic voltage-and-frequency scaling driven by spiking activity. On system level, processing is centered around an ARM M4 core, similar to the processor-centric architecture of the first generation SpiNNaker. To speed operation of subtasks, we have added accelerators for numerical operations of both spiking (SNN) and rate based (deep) neural networks (DNN). PEs communicate via a dedicated, custom-designed network-on-chip. We present three benchmarks showing operation of the whole processor element on SNN, DNN and hybrid SNN/DNN networks.

Submitted 15 August, 2022; v1 submitted 15 March, 2021; originally announced March 2021.

arXiv:2009.08921 [pdf, other]

doi 10.1088/2634-4386/abf150

Low-Power Low-Latency Keyword Spotting and Adaptive Control with a SpiNNaker 2 Prototype and Comparison with Loihi

Authors: Yexin Yan, Terrence C. Stewart, Xuan Choo, Bernhard Vogginger, Johannes Partzsch, Sebastian Hoeppner, Florian Kelber, Chris Eliasmith, Steve Furber, Christian Mayr

Abstract: We implemented two neural network based benchmark tasks on a prototype chip of the second-generation SpiNNaker (SpiNNaker 2) neuromorphic system: keyword spotting and adaptive robotic control. Keyword spotting is commonly used in smart speakers to listen for wake words, and adaptive control is used in robotic applications to adapt to unknown dynamics in an online fashion. We highlight the benefit of a multiply accumulate (MAC) array in the SpiNNaker 2 prototype which is ordinarily used in rate-based machine learning networks when employed in a neuromorphic, spiking context. In addition, the same benchmark tasks have been implemented on the Loihi neuromorphic chip, giving a side-by-side comparison regarding power consumption and computation time. While Loihi shows better efficiency when less complicated vector-matrix multiplication is involved, with the MAC array, the SpiNNaker 2 prototype shows better efficiency when high dimensional vector-matrix multiplication is involved.

Submitted 18 September, 2020; originally announced September 2020.

arXiv:2003.13749 [pdf, other]

The Operating System of the Neuromorphic BrainScaleS-1 System

Authors: Eric Müller, Sebastian Schmitt, Christian Mauch, Sebastian Billaudelle, Andreas Grübl, Maurice Güttler, Dan Husmann, Joscha Ilmberger, Sebastian Jeltsch, Jakob Kaiser, Johann Klähn, Mitja Kleider, Christoph Koke, José Montes, Paul Müller, Johannes Partzsch, Felix Passenberg, Hartmut Schmidt, Bernhard Vogginger, Jonas Weidner, Christian Mayr, Johannes Schemmel

Abstract: BrainScaleS-1 is a wafer-scale mixed-signal accelerated neuromorphic system targeted for research in the fields of computational neuroscience and beyond-von-Neumann computing. The BrainScaleS Operating System (BrainScaleS OS) is a software stack giving users the possibility to emulate networks described in the high-level network description language PyNN with minimal knowledge of the system. At the same time, expert usage is facilitated by allowing to hook into the system at any depth of the stack. We present operation and development methodologies implemented for the BrainScaleS-1 neuromorphic architecture and walk through the individual components of BrainScaleS OS constituting the software stack for BrainScaleS-1 platform operation.

Submitted 2 February, 2022; v1 submitted 30 March, 2020; originally announced March 2020.

arXiv:1904.10389 [pdf, ps, other]

Mean Field Approach for Configuring Population Dynamics on a Biohybrid Neuromorphic System

Authors: Johannes Partzsch, Christian Mayr, Massimiliano Giulioni, Marko Noack, Stefan Hänzsche, Stefan Scholze, Sebastian Höppner, Paolo Del Giudice, Rene Schüffny

Abstract: Real-time coupling of cell cultures to neuromorphic circuits necessitates a neuromorphic network that replicates biological behaviour both on a per-neuron and on a population basis, with a network size comparable to the culture. We present a large neuromorphic system composed of 9 chips, with overall 2880 neurons and 144M conductance-based synapses. As they are realized in a robust switched-capacitor fashion, individual neurons and synapses can be configured to replicate with high fidelity a wide range of biologically realistic behaviour. In contrast to other exploration/heuristics-based approaches, we employ a theory-guided mesoscopic approach to configure the overall network to a range of bursting behaviours, thus replicating the statistics of our targeted in-vitro network. The mesoscopic approach has implications beyond our proposed biohybrid, as it allows a targeted exploration of the behavioural space, which is a non-trivial task especially in large, recurrent networks.

Submitted 23 April, 2019; originally announced April 2019.

Comments: 21 pages, 13 figures

arXiv:1903.08941 [pdf, other]

Dynamic Power Management for Neuromorphic Many-Core Systems

Authors: Sebastian Hoeppner, Bernhard Vogginger, Yexin Yan, Andreas Dixius, Stefan Scholze, Johannes Partzsch, Felix Neumaerker, Stephan Hartmann, Stefan Schiefer, Georg Ellguth, Love Cederstroem, Luis Plana, Jim Garside, Steve Furber, Christian Mayr

Abstract: This work presents a dynamic power management architecture for neuromorphic many core systems such as SpiNNaker. A fast dynamic voltage and frequency scaling (DVFS) technique is presented which allows the processing elements (PE) to change their supply voltage and clock frequency individually and autonomously within less than 100 ns. This is employed by the neuromorphic simulation software flow, which defines the performance level (PL) of the PE based on the actual workload within each simulation cycle. A test chip in 28 nm SLP CMOS technology has been implemented. It includes 4 PEs which can be scaled from 0.7 V to 1.0 V with frequencies from 125 MHz to 500 MHz at three distinct PLs. By measurement of three neuromorphic benchmarks it is shown that the total PE power consumption can be reduced by 75%, with 80% baseline power reduction and a 50% reduction of energy per neuron and synapse computation, all while maintaining temporary peak system performance to achieve biological real-time operation of the system. A numerical model of this power management model is derived which allows DVFS architecture exploration for neuromorphics. The proposed technique is to be used for the second generation SpiNNaker neuromorphic many core system.

Submitted 21 March, 2019; originally announced March 2019.
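
The workload-driven choice of performance level described in the abstract above could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the voltage and frequency of the lowest and highest level come from the range quoted in the abstract, while the middle level, the names `PerformanceLevel`, `select_level` and `relative_dynamic_energy`, the 1 ms tick in the test values, and the textbook rule that dynamic energy per cycle scales with the square of the supply voltage are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerformanceLevel:
    name: str
    voltage_v: float
    freq_mhz: float


# PL1 and PL3 use the range quoted in the abstract (0.7 V / 125 MHz, 1.0 V / 500 MHz).
# PL2 is a hypothetical placeholder; the abstract does not give its values.
LEVELS = (
    PerformanceLevel("PL1", 0.7, 125.0),
    PerformanceLevel("PL2", 0.85, 250.0),  # assumed values
    PerformanceLevel("PL3", 1.0, 500.0),
)


def select_level(workload_cycles: int, tick_us: float) -> PerformanceLevel:
    """Return the slowest level that finishes the workload within one tick."""
    for level in LEVELS:
        # Cycles available in one tick: f [MHz] * t [us].
        if workload_cycles <= level.freq_mhz * tick_us:
            return level
    # Overload: no level meets the deadline, so run at peak performance.
    return LEVELS[-1]


def relative_dynamic_energy(level: PerformanceLevel) -> float:
    """Dynamic energy per cycle relative to the top level, assuming E ~ V^2."""
    return (level.voltage_v / LEVELS[-1].voltage_v) ** 2
```

With a hypothetical 1000 µs simulation tick, a workload of 100,000 cycles fits the lowest level, whereas 400,000 cycles requires the highest one. Picking the slowest sufficient level keeps the supply voltage low whenever the workload allows it.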

arXiv:1903.08500 [pdf, other]

doi 10.1109/TBCAS.2019.2906401

Efficient Reward-Based Structural Plasticity on a SpiNNaker 2 Prototype

Authors: Yexin Yan, David Kappel, Felix Neumaerker, Johannes Partzsch, Bernhard Vogginger, Sebastian Hoeppner, Steve Furber, Wolfgang Maass, Robert Legenstein, Christian Mayr

Abstract: Advances in neuroscience uncover the mechanisms employed by the brain to efficiently solve complex learning tasks with very limited resources. However, the efficiency is often lost when one tries to port these findings to a silicon substrate, since brain-inspired algorithms often make extensive use of complex functions such as random number generators, that are expensive to compute on standard general purpose hardware. The prototype chip of the 2nd generation SpiNNaker system is designed to overcome this problem. Low-power ARM processors equipped with a random number generator and an exponential function accelerator enable the efficient execution of brain-inspired algorithms. We implement the recently introduced reward-based synaptic sampling model that employs structural plasticity to learn a function or task. The numerical simulation of the model requires to update the synapse variables in each time step including an explorative random term. To the best of our knowledge, this is the most complex synapse model implemented so far on the SpiNNaker system. By making efficient use of the hardware accelerators and numerical optimizations the computation time of one plasticity update is reduced by a factor of 2. This, combined with fitting the model into the local SRAM, leads to 62% energy reduction compared to the case without accelerators and the use of external DRAM. The model implementation is integrated into the SpiNNaker software framework allowing for scalability onto larger systems. The hardware-software system presented in this work paves the way for power-efficient mobile and biomedical applications with biologically plausible brain-inspired algorithms.

Submitted 20 March, 2019; originally announced March 2019.

Comments: accepted by IEEE TBioCAS

arXiv:1703.06043 [pdf, other]

doi 10.1109/ISCAS.2017.8050530

Pattern representation and recognition with accelerated analog neuromorphic systems

Authors: Mihai A. Petrovici, Sebastian Schmitt, Johann Klähn, David Stöckel, Anna Schroeder, Guillaume Bellec, Johannes Bill, Oliver Breitwieser, Ilja Bytschok, Andreas Grübl, Maurice Güttler, Andreas Hartel, Stephan Hartmann, Dan Husmann, Kai Husmann, Sebastian Jeltsch, Vitali Karasenko, Mitja Kleider, Christoph Koke, Alexander Kononov, Christian Mauch, Eric Müller, Paul Müller, Johannes Partzsch, Thomas Pfeil, et al. (11 additional authors not shown)

Abstract: Despite being originally inspired by the central nervous system, artificial neural networks have diverged from their biological archetypes as they have been remodeled to fit particular tasks. In this paper, we review several possibilities to reverse map these architectures to biologically more realistic spiking networks with the aim of emulating them on fast, low-power neuromorphic hardware. Since many of these devices employ analog components, which cannot be perfectly controlled, finding ways to compensate for the resulting effects represents a key challenge. Here, we discuss three different strategies to address this problem: the addition of auxiliary network components for stabilizing activity, the utilization of inherently robust architectures and a training method for hardware-emulated networks that functions without perfect knowledge of the system's dynamics and parameters. For all three scenarios, we corroborate our theoretical considerations with experimental results on accelerated analog neuromorphic platforms.

Submitted 3 July, 2017; v1 submitted 17 March, 2017; originally announced March 2017.

Comments: accepted at ISCAS 2017

Journal ref: Circuits and Systems (ISCAS), 2017 IEEE International Symposium on

arXiv:1703.01909 [pdf, other]

doi 10.1109/IJCNN.2017.7966125

Neuromorphic Hardware In The Loop: Training a Deep Spiking Network on the BrainScaleS Wafer-Scale System

Authors: Sebastian Schmitt, Johann Klaehn, Guillaume Bellec, Andreas Gruebl, Maurice Guettler, Andreas Hartel, Stephan Hartmann, Dan Husmann, Kai Husmann, Vitali Karasenko, Mitja Kleider, Christoph Koke, Christian Mauch, Eric Mueller, Paul Mueller, Johannes Partzsch, Mihai A. Petrovici, Stefan Schiefer, Stefan Scholze, Bernhard Vogginger, Robert Legenstein, Wolfgang Maass, Christian Mayr, Johannes Schemmel, Karlheinz Meier

Abstract: Emulating spiking neural networks on analog neuromorphic hardware offers several advantages over simulating them on conventional computers, particularly in terms of speed and energy consumption. However, this usually comes at the cost of reduced control over the dynamics of the emulated networks. In this paper, we demonstrate how iterative training of a hardware-emulated network can compensate for anomalies induced by the analog substrate. We first convert a deep neural network trained in software to a spiking network on the BrainScaleS wafer-scale neuromorphic system, thereby enabling an acceleration factor of 10 000 compared to the biological time domain. This mapping is followed by the in-the-loop training, where in each training step, the network activity is first recorded in hardware and then used to compute the parameter updates in software via backpropagation. An essential finding is that the parameter updates do not have to be precise, but only need to approximately follow the correct gradient, which simplifies the computation of updates. Using this approach, after only several tens of iterations, the spiking network shows an accuracy close to the ideal software-emulated prototype. The presented techniques show that deep spiking networks emulated on analog neuromorphic devices can attain good computational performance despite the inherent variations of the analog substrate.

Submitted 6 March, 2017; originally announced March 2017.

Comments: 8 pages, 10 figures, submitted to IJCNN 2017
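
The in-the-loop idea from the abstract above (record activity on the hardware, compute the update in software, tolerate an imprecise gradient) can be illustrated with a deliberately tiny Python sketch. This is a toy under stated assumptions and not the paper's method: the "hardware" is a two-weight linear unit with hypothetical fixed gain errors (`HIDDEN_GAINS`), there are no spikes and no multi-layer backpropagation, and all names and numbers are invented for illustration. The software update uses the recorded hardware output but assumes an ideal model with unit gains, so it only approximately follows the true gradient.

```python
# Toy in-the-loop training: forward pass on a distorted "hardware" model,
# parameter update in software using an idealised (gain = 1) gradient.

SAMPLES = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, -1.0)]
IDEAL_WEIGHTS = (0.5, -0.3)   # weights of the software reference network
HIDDEN_GAINS = (0.8, 1.2)     # hypothetical analog distortions, unknown to the trainer


def ideal_forward(weights, x):
    """Software reference model without distortions."""
    return sum(w * xi for w, xi in zip(weights, x))


def hardware_forward(weights, x):
    """Stand-in for the analog substrate: each weight is scaled by a hidden gain."""
    return sum(g * w * xi for g, w, xi in zip(HIDDEN_GAINS, weights, x))


TARGETS = [ideal_forward(IDEAL_WEIGHTS, x) for x in SAMPLES]


def hardware_loss(weights):
    """Mean squared error of the hardware output against the reference targets."""
    errors = [hardware_forward(weights, x) - t for x, t in zip(SAMPLES, TARGETS)]
    return sum(e * e for e in errors) / len(errors)


def train_in_the_loop(weights, steps=30, lr=0.5):
    weights = list(weights)
    for _ in range(steps):
        # 1) Record the network activity on the (distorted) hardware.
        recorded = [hardware_forward(weights, x) for x in SAMPLES]
        # 2) Compute the update in software, ignoring the unknown gains.
        for i in range(len(weights)):
            grad = sum((y - t) * x[i]
                       for y, t, x in zip(recorded, TARGETS, SAMPLES)) / len(SAMPLES)
            weights[i] -= lr * grad
    return weights


# Start from the software-trained weights mapped directly onto the hardware.
trained = train_in_the_loop(IDEAL_WEIGHTS)
```

In this toy setting the approximate update still reduces the hardware loss, because the ignored gains are positive and therefore leave the sign of each gradient component intact; the weights drift toward values that compensate for the distortions.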

arXiv:1412.3243 [pdf, ps, other]

Switched-Capacitor Realization of Presynaptic Short-Term-Plasticity and Stop-Learning Synapses in 28 nm CMOS

Authors: Marko Noack, Johannes Partzsch, Christian Mayr, Stefan Hänzsche, Stefan Scholze, Sebastian Höppner, Georg Ellguth, Rene Schüffny

Abstract: Synaptic dynamics, such as long- and short-term plasticity, play an important role in the complexity and biological realism achievable when running neural networks on a neuromorphic IC. For example, they endow the IC with an ability to adapt and learn from its environment. In order to achieve the millisecond to second time constants required for these synaptic dynamics, analog subthreshold circuits are usually employed. However, due to process variation and leakage problems, it is almost impossible to port these types of circuits to modern sub-100nm technologies. In contrast, we present a neuromorphic system in a 28 nm CMOS process that employs switched capacitor (SC) circuits to implement 128 short term plasticity presynapses as well as 8192 stop-learning synapses. The neuromorphic system consumes an area of 0.36 mm² and runs at a power consumption of 1.9 mW. The circuit makes use of a technique for minimizing leakage effects allowing for real-time operation with time constants up to several seconds. Since we rely on SC techniques for all calculations, the system is composed of only generic mixed-signal building blocks. These generic building blocks make the system easy to port between technologies and the large digital circuit part inherent in an SC system benefits fully from technology scaling.

Submitted 10 December, 2014; originally announced December 2014.

arXiv:1412.3233 [pdf, ps, other]

A Biological-Realtime Neuromorphic System in 28 nm CMOS using Low-Leakage Switched Capacitor Circuits

Authors: Christian Mayr, Johannes Partzsch, Marko Noack, Stefan Hänzsche, Stefan Scholze, Sebastian Höppner, Georg Ellguth, Rene Schüffny

Abstract: A switched-capacitor (SC) neuromorphic system for closed-loop neural coupling in 28 nm CMOS is presented, occupying 600 um by 600 um. It offers 128 input channels (i.e. presynaptic terminals), 8192 synapses and 64 output channels (i.e. neurons). Biologically realistic neuron and synapse dynamics are achieved via a faithful translation of the behavioural equations to SC circuits. As leakage currents significantly affect circuit behaviour at this technology node, dedicated compensation techniques are employed to achieve biological-realtime operation, with faithful reproduction of time constants of several 100 ms at room temperature. Power draw of the overall system is 1.9 mW.

Submitted 10 December, 2014; originally announced December 2014.

arXiv:1409.0171 [pdf, ps, other]

OTA based 200 GΩ resistance on 700 μm² in 180 nm CMOS for neuromorphic applications

Authors: Christian Mayr, Michael Schultz, Marko Noack, Stephan Henker, Johannes Partzsch, Rene Schüffny

Abstract: Generating an exponential decay function with a time constant on the order of hundreds of milliseconds is a mainstay for neuromorphic circuits. Usually, either subthreshold circuits or RC-decays based on transconductance amplifiers are used. In the latter case, transconductances in the 10 pS range are needed. However, state-of-the-art low-transconductance amplifiers still require too much circuit area to be applicable in neuromorphic circuits where >100 of these time constant circuits may be required on a single chip. We present a silicon verified operational transconductance amplifier that achieves a gm of 5 pS in only 700 μm², a factor of 10-100 less area than current examples. This allows a high-density integration of time constant circuits in target applications such as synaptic learning or as driving circuit for neuromorphic memristor arrays.

Submitted 30 August, 2014; originally announced September 2014.

Comments: 8 pages, 8 figures
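
As a short worked check of the figures in the entry above: a transconductance of 5 pS corresponds to the 200 GΩ equivalent resistance named in the title. The second line is only an illustration; the 100 ms time constant is one value picked from the "hundreds of milliseconds" range, and the resulting capacitance is an assumption that does not appear in the abstract.

```latex
\begin{align}
R &= \frac{1}{g_m} = \frac{1}{5\,\mathrm{pS}}
   = 2 \times 10^{11}\,\Omega = 200\,\mathrm{G\Omega} \\
\tau &= R\,C = \frac{C}{g_m}
  \quad\Longrightarrow\quad
  C = \tau\, g_m = 0.1\,\mathrm{s} \times 5 \times 10^{-12}\,\mathrm{S} = 0.5\,\mathrm{pF}
\end{align}
```

Under this assumption, a sub-picofarad capacitor would already be enough to reach a 100 ms decay, which is why such a small transconductance is attractive for dense on-chip time constant circuits.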

Showing 1–12 of 12 results for author: Partzsch, J