-
The SpiNNaker 2 Processing Element Architecture for Hybrid Digital Neuromorphic Computing
Authors:
Sebastian Höppner,
Yexin Yan,
Andreas Dixius,
Stefan Scholze,
Johannes Partzsch,
Marco Stolba,
Florian Kelber,
Bernhard Vogginger,
Felix Neumärker,
Georg Ellguth,
Stephan Hartmann,
Stefan Schiefer,
Thomas Hocker,
Dennis Walter,
Genting Liu,
Jim Garside,
Steve Furber,
Christian Mayr
Abstract:
This paper introduces the processing element (PE) architecture of the second-generation SpiNNaker chip, implemented in 22 nm FDSOI. At the circuit level, the chip features adaptive body biasing for near-threshold operation and dynamic voltage and frequency scaling driven by spiking activity. At the system level, processing is centered around an ARM Cortex-M4 core, similar to the processor-centric architecture of the first-generation SpiNNaker. To speed up subtasks, we have added accelerators for numerical operations of both spiking (SNN) and rate-based (deep) neural networks (DNN). PEs communicate via a dedicated, custom-designed network-on-chip. We present three benchmarks showing operation of the whole processing element on SNN, DNN and hybrid SNN/DNN networks.
Submitted 15 August, 2022; v1 submitted 15 March, 2021;
originally announced March 2021.
-
Dynamic Power Management for Neuromorphic Many-Core Systems
Authors:
Sebastian Hoeppner,
Bernhard Vogginger,
Yexin Yan,
Andreas Dixius,
Stefan Scholze,
Johannes Partzsch,
Felix Neumaerker,
Stephan Hartmann,
Stefan Schiefer,
Georg Ellguth,
Love Cederstroem,
Luis Plana,
Jim Garside,
Steve Furber,
Christian Mayr
Abstract:
This work presents a dynamic power management architecture for neuromorphic many-core systems such as SpiNNaker. A fast dynamic voltage and frequency scaling (DVFS) technique is presented which allows the processing elements (PEs) to change their supply voltage and clock frequency individually and autonomously in less than 100 ns. This is employed by the neuromorphic simulation software flow, which sets the performance level (PL) of each PE based on its actual workload within each simulation cycle. A test chip in 28 nm SLP CMOS technology has been implemented. It includes 4 PEs which can be scaled from 0.7 V to 1.0 V with frequencies from 125 MHz to 500 MHz at three distinct PLs. Measurements of three neuromorphic benchmarks show that the total PE power consumption can be reduced by 75%, with an 80% baseline power reduction and a 50% reduction of energy per neuron and synapse computation, all while maintaining temporary peak system performance to achieve biological real-time operation of the system. A numerical model of this power management scheme is derived, which allows DVFS architecture exploration for neuromorphic systems. The proposed technique is to be used in the second-generation SpiNNaker neuromorphic many-core system.
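To illustrate the idea of choosing a PL per simulation cycle from the actual workload, here is a minimal sketch. It assumes the workload is known as a count of processor clock cycles per time step and that the PE simply picks the slowest PL that still finishes within the step. PL1 and PL3 use the endpoints quoted above (0.7 V / 125 MHz and 1.0 V / 500 MHz). The PL2 operating point, the 1 ms time step, and all names (`PerformanceLevel`, `select_performance_level`) are hypothetical and are not taken from the paper's software flow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerformanceLevel:
    name: str
    vdd_volts: float
    freq_mhz: float


# PL1 and PL3 use the endpoints quoted in the abstract.
# PL2 is a HYPOTHETICAL intermediate operating point for illustration only.
PERFORMANCE_LEVELS = (
    PerformanceLevel("PL1", 0.70, 125.0),
    PerformanceLevel("PL2", 0.85, 250.0),  # hypothetical
    PerformanceLevel("PL3", 1.00, 500.0),
)


def select_performance_level(workload_cycles: int,
                             time_step_us: float = 1000.0) -> PerformanceLevel:
    """Return the lowest PL that completes `workload_cycles` within one time step.

    workload_cycles: estimated clock cycles needed in this simulation cycle (assumed known).
    time_step_us: simulation time step in microseconds (1 ms assumed for real time).
    """
    for pl in PERFORMANCE_LEVELS:
        # At f MHz the PE executes f cycles per microsecond.
        if workload_cycles <= pl.freq_mhz * time_step_us:
            return pl
    # Workload exceeds even peak performance: run at the highest PL.
    return PERFORMANCE_LEVELS[-1]


if __name__ == "__main__":
    for cycles in (100_000, 200_000, 400_000):
        print(cycles, select_performance_level(cycles).name)
```

Picking the lowest sufficient PL keeps the PE at low voltage for quiet time steps and raises it only when activity demands it. This matches the abstract's goal of cutting baseline power while still reaching peak performance when needed.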
Submitted 21 March, 2019;
originally announced March 2019.
-
Project Beehive: A Hardware/Software Co-designed Stack for Runtime and Architectural Research
Authors:
Christos Kotselidis,
Andrey Rodchenko,
Colin Barrett,
Andy Nisbet,
John Mawer,
Will Toms,
James Clarkson,
Cosmin Gorgovan,
Amanieu d'Antras,
Yaman Cakmakci,
Thanos Stratikopoulos,
Sebastian Werner,
Jim Garside,
Javier Navaridas,
Antoniu Pop,
John Goodacre,
Mikel Lujan
Abstract:
The end of Dennard scaling, combined with stagnation in architectural and compiler optimizations, makes it challenging to achieve significant performance gains. Solutions based solely on hardware or software are no longer sufficient to maintain the pace of improvements seen during the past few decades. In hardware, the end of single-core scaling resulted in the proliferation of multi-core system architectures; however, this has forced complex parallel programming techniques into the mainstream. To further exploit physical resources, systems are becoming increasingly heterogeneous, with specialized computing elements and accelerators. Programming across a range of disparate architectures requires a new level of abstraction to which programming languages will have to adapt. In software, emerging complex applications, from domains such as Big Data and computer vision, run on multi-layered software stacks targeting hardware with a variety of constraints and resources. Hence, optimizing for the power-performance (and resiliency) space requires experimentation platforms that offer quick and easy prototyping of hardware/software co-designed techniques. To that end, we present Project Beehive: A Hardware/Software co-designed stack for runtime and architectural research. Project Beehive utilizes various state-of-the-art software and hardware components along with novel and extensible co-design techniques. The objective of Project Beehive is to provide a modern platform for experimentation on emerging applications, programming languages, compilers, runtimes, and low-power heterogeneous many-core architectures in a full-system co-designed manner.
Submitted 5 June, 2017; v1 submitted 14 September, 2015;
originally announced September 2015.
-
HAPPY: Hybrid Address-based Page Policy in DRAMs
Authors:
Mohsen Ghasempour,
Aamer Jaleel,
Jim Garside,
Mikel Luján
Abstract:
Memory controllers have used static page closure policies to decide whether a row should be left open (open-page policy) or closed immediately (close-page policy) after it has been accessed. The appropriate choice for a particular access can reduce the average memory latency. However, since application access patterns change at run time, static page policies cannot guarantee optimum execution time. Hybrid page policies have been investigated as a means of covering these dynamic scenarios and are now implemented in state-of-the-art processors. Hybrid page policies switch between open-page and close-page policies while the application is running, by monitoring the pattern of row hits and conflicts and predicting future behavior. Unfortunately, as the size of DRAM memory increases, fine-grained tracking and analysis of memory access patterns becomes impractical. We propose a compact, memory-address-based encoding technique which can improve or maintain the performance of DRAM page closure predictors while reducing the hardware overhead in comparison with state-of-the-art techniques. As a case study, we integrate our technique, HAPPY, with a state-of-the-art monitor, the Intel-adaptive open-page policy predictor employed by the Intel Xeon X5650, and with a traditional hybrid page policy. We evaluate them across 70 memory-intensive workload mixes consisting of single-thread and multi-thread applications. The experimental results show that applying the HAPPY encoding to the Intel-adaptive page closure policy can reduce the hardware overhead by 5X for the evaluated 64 GB memory (up to 40X for a 512 GB memory) while maintaining the prediction accuracy.
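The abstract describes a hybrid page policy that monitors row hits and conflicts to predict whether to keep a row open. The sketch below shows that general mechanism with a generic per-bank 2-bit saturating counter. It is a textbook-style illustration only, not the Intel-adaptive predictor and not the HAPPY encoding. The class name, the counter width, and the weakly-open initial state are all assumptions.

```python
class HybridPagePolicy:
    """Per-bank saturating counters that choose open-page vs close-page.

    Generic illustration (hypothetical): NOT the Intel-adaptive predictor or HAPPY.
    """

    def __init__(self, num_banks: int, bits: int = 2):
        self.max_value = (1 << bits) - 1
        self.threshold = 1 << (bits - 1)            # upper half of range => keep row open
        self.counters = [self.threshold] * num_banks  # start in the weakly "open" state
        self.last_row = [None] * num_banks          # row last accessed in each bank

    def access(self, bank: int, row: int) -> str:
        """Record an access and return the policy ('open' or 'close') applied afterwards."""
        last = self.last_row[bank]
        if last is not None:
            if last == row:
                # Same row again: leaving it open would have produced a row hit.
                self.counters[bank] = min(self.counters[bank] + 1, self.max_value)
            else:
                # Different row: leaving it open would have caused a row conflict.
                self.counters[bank] = max(self.counters[bank] - 1, 0)
        self.last_row[bank] = row
        return "open" if self.counters[bank] >= self.threshold else "close"


if __name__ == "__main__":
    policy = HybridPagePolicy(num_banks=1)
    print([policy.access(0, r) for r in [5, 5, 7, 9, 11, 11, 11]])
```

Keeping state per bank (rather than per row) is what keeps the table small. The abstract's point is that finer-grained tracking grows with DRAM size, which motivates a compact address-based encoding such as HAPPY.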
Submitted 12 September, 2015;
originally announced September 2015.
-
DReAM: Dynamic Re-arrangement of Address Mapping to Improve the Performance of DRAMs
Authors:
Mohsen Ghasempour,
Jim Garside,
Aamer Jaleel,
Mikel Luján
Abstract:
The initial location of data in DRAMs is determined and controlled by the address mapping, and even modern memory controllers use a fixed, run-time-agnostic address mapping. The memory access pattern seen at the memory interface, on the other hand, changes dynamically at run time. Because access patterns vary while DRAM controllers apply a fixed address-mapping scheme, DRAM performance cannot be exploited efficiently. DReAM is a novel hardware technique that detects a workload-specific address mapping at run time, based on the application's access pattern, and thereby improves DRAM performance. The experimental results show that DReAM outperforms the best evaluated address mapping on average by 9% for mapping-sensitive workloads and by 2% for mapping-insensitive workloads, and by up to 28% across all workloads. DReAM can be seen as an insurance policy capable of detecting scenarios that are not well served by the predefined address mapping.
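An address mapping assigns bit fields of the physical address to the DRAM column, bank and row. The sketch below shows why no single fixed mapping suits every access pattern. It decodes addresses under two field orders and counts how a sequential stream and a strided stream spread across banks. It does not implement DReAM's run-time detection. The geometry (1024 columns, 8 banks, 16384 rows), the treatment of addresses as column-access units, and the helper names are hypothetical.

```python
from collections import Counter


def decode(addr: int, layout: list) -> dict:
    """Split an address into DRAM fields.

    layout: list of (field_name, width_in_bits), least-significant field first.
    """
    fields = {}
    shift = 0
    for name, width in layout:
        fields[name] = (addr >> shift) & ((1 << width) - 1)
        shift += width
    return fields


# Hypothetical geometry: 1024 columns (10 bits), 8 banks (3 bits), 16384 rows (14 bits).
ROW_BANK_COL = [("column", 10), ("bank", 3), ("row", 14)]  # bank bits above column bits
ROW_COL_BANK = [("bank", 3), ("column", 10), ("row", 14)]  # bank bits in the lowest bits


def bank_histogram(addresses, layout) -> Counter:
    """Count how many accesses land in each bank under a given mapping."""
    return Counter(decode(a, layout)["bank"] for a in addresses)


if __name__ == "__main__":
    sequential = list(range(8))              # consecutive column accesses
    strided = [i * 1024 for i in range(8)]   # stride of one full row of columns
    for label, layout in (("ROW_BANK_COL", ROW_BANK_COL), ("ROW_COL_BANK", ROW_COL_BANK)):
        print(label, dict(bank_histogram(sequential, layout)),
              dict(bank_histogram(strided, layout)))
```

Each mapping spreads one pattern over all eight banks but concentrates the other in a single bank. A technique such as DReAM addresses this by choosing the mapping at run time from the observed access pattern rather than fixing it in advance.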
Submitted 12 September, 2015;
originally announced September 2015.
-
Transparent hardware synthesis of Java for predictable large-scale distributed systems
Authors:
Ian Gray,
Yu Chan,
Jamie Garside,
Neil Audsley,
Andy Wellings
Abstract:
The JUNIPER project is developing a framework for the construction of large-scale distributed systems in which execution-time bounds can be guaranteed. Part of this work involves the automatic implementation of input Java code on FPGAs, both for speed and for predictability. An important focus of this work is to make the use of FPGAs transparent through runtime co-design and partial reconfiguration. Initial results show that the use of Java does not hamper hardware generation and provides tight execution-time estimates. This paper gives an overview of the approach taken and presents some preliminary results that demonstrate the promise of the technique.
Submitted 28 August, 2015;
originally announced August 2015.