-
The Impact of Activation Sparsity on Overfitting in Convolutional Neural Networks
Authors:
Karim Huesmann,
Luis Garcia Rodriguez,
Lars Linsen,
Benjamin Risse
Abstract:
Overfitting is one of the fundamental challenges when training convolutional neural networks and is usually identified by a diverging training and test loss. The underlying dynamics of how the flow of activations induces overfitting is, however, poorly understood. In this study we introduce a perplexity-based sparsity definition to derive and visualise layer-wise activation measures. These novel explainable AI strategies reveal a surprising relationship between activation sparsity and overfitting, namely an increase in sparsity in the feature extraction layers shortly before the test loss starts rising. This tendency is preserved across network architectures and regularisation strategies so that our measures can be used as a reliable indicator for overfitting while decoupling the network's generalisation capabilities from its loss-based definition. Moreover, our differentiable sparsity formulation can be used to explicitly penalise the emergence of sparsity during training so that the impact of reduced sparsity on overfitting can be studied in real-time. Applying this penalty and analysing activation sparsity for well-known regularisers and in common network architectures supports the hypothesis that reduced activation sparsity can effectively improve the generalisation and classification performance. In line with other recent work on this topic, our methods reveal novel insights into the contradicting concepts of activation sparsity and network capacity by demonstrating that dense activations can enable discriminative feature learning while efficiently exploiting the capacity of deep models without suffering from overfitting, even when trained excessively.
Submitted 13 April, 2021;
originally announced April 2021.
-
Exploiting the Full Capacity of Deep Neural Networks while Avoiding Overfitting by Targeted Sparsity Regularization
Authors:
Karim Huesmann,
Soeren Klemm,
Lars Linsen,
Benjamin Risse
Abstract:
Overfitting is one of the most common problems when training deep neural networks on comparatively small datasets. Here, we demonstrate that neural network activation sparsity is a reliable indicator for overfitting which we utilize to propose novel targeted sparsity visualization and regularization strategies. Based on these strategies we are able to understand and counteract overfitting caused by activation sparsity and filter correlation in a targeted layer-by-layer manner. Our results demonstrate that targeted sparsity regularization can efficiently be used to regularize well-known datasets and architectures with a significant increase in image classification performance while outperforming both dropout and batch normalization. Ultimately, our study reveals novel insights into the contradicting concepts of activation sparsity and network capacity by demonstrating that targeted sparsity regularization enables salient and discriminative feature learning while exploiting the full capacity of deep models without suffering from overfitting, even when trained excessively.
Submitted 21 February, 2020;
originally announced February 2020.
-
Accelerated physical emulation of Bayesian inference in spiking neural networks
Authors:
Akos F. Kungl,
Sebastian Schmitt,
Johann Klähn,
Paul Müller,
Andreas Baumbach,
Dominik Dold,
Alexander Kugele,
Nico Gürtler,
Luziwei Leng,
Eric Müller,
Christoph Koke,
Mitja Kleider,
Christian Mauch,
Oliver Breitwieser,
Maurice Güttler,
Dan Husmann,
Kai Husmann,
Joscha Ilmberger,
Andreas Hartel,
Vitali Karasenko,
Andreas Grübl,
Johannes Schemmel,
Karlheinz Meier,
Mihai A. Petrovici
Abstract:
The massively parallel nature of biological information processing plays an important role for its superiority to human-engineered computing devices. In particular, it may hold the key to overcoming the von Neumann bottleneck that limits contemporary computer architectures. Physical-model neuromorphic devices seek to replicate not only this inherent parallelism, but also aspects of its microscopic dynamics in analog circuits emulating neurons and synapses. However, these machines require network models that are not only adept at solving particular tasks, but that can also cope with the inherent imperfections of analog substrates. We present a spiking network model that performs Bayesian inference through sampling on the BrainScaleS neuromorphic platform, where we use it for generative and discriminative computations on visual data. By illustrating its functionality on this platform, we implicitly demonstrate its robustness to various substrate-specific distortive effects, as well as its accelerated capability for computation. These results showcase the advantages of brain-inspired physical computation and provide important building blocks for large-scale neuromorphic applications.
Submitted 1 April, 2020; v1 submitted 6 July, 2018;
originally announced July 2018.
-
Pattern representation and recognition with accelerated analog neuromorphic systems
Authors:
Mihai A. Petrovici,
Sebastian Schmitt,
Johann Klähn,
David Stöckel,
Anna Schroeder,
Guillaume Bellec,
Johannes Bill,
Oliver Breitwieser,
Ilja Bytschok,
Andreas Grübl,
Maurice Güttler,
Andreas Hartel,
Stephan Hartmann,
Dan Husmann,
Kai Husmann,
Sebastian Jeltsch,
Vitali Karasenko,
Mitja Kleider,
Christoph Koke,
Alexander Kononov,
Christian Mauch,
Eric Müller,
Paul Müller,
Johannes Partzsch,
Thomas Pfeil
, et al. (11 additional authors not shown)
Abstract:
Despite being originally inspired by the central nervous system, artificial neural networks have diverged from their biological archetypes as they have been remodeled to fit particular tasks. In this paper, we review several possibilities to reverse map these architectures to biologically more realistic spiking networks with the aim of emulating them on fast, low-power neuromorphic hardware. Since many of these devices employ analog components, which cannot be perfectly controlled, finding ways to compensate for the resulting effects represents a key challenge. Here, we discuss three different strategies to address this problem: the addition of auxiliary network components for stabilizing activity, the utilization of inherently robust architectures and a training method for hardware-emulated networks that functions without perfect knowledge of the system's dynamics and parameters. For all three scenarios, we corroborate our theoretical considerations with experimental results on accelerated analog neuromorphic platforms.
Submitted 3 July, 2017; v1 submitted 17 March, 2017;
originally announced March 2017.
-
Neuromorphic Hardware In The Loop: Training a Deep Spiking Network on the BrainScaleS Wafer-Scale System
Authors:
Sebastian Schmitt,
Johann Klaehn,
Guillaume Bellec,
Andreas Gruebl,
Maurice Guettler,
Andreas Hartel,
Stephan Hartmann,
Dan Husmann,
Kai Husmann,
Vitali Karasenko,
Mitja Kleider,
Christoph Koke,
Christian Mauch,
Eric Mueller,
Paul Mueller,
Johannes Partzsch,
Mihai A. Petrovici,
Stefan Schiefer,
Stefan Scholze,
Bernhard Vogginger,
Robert Legenstein,
Wolfgang Maass,
Christian Mayr,
Johannes Schemmel,
Karlheinz Meier
Abstract:
Emulating spiking neural networks on analog neuromorphic hardware offers several advantages over simulating them on conventional computers, particularly in terms of speed and energy consumption. However, this usually comes at the cost of reduced control over the dynamics of the emulated networks. In this paper, we demonstrate how iterative training of a hardware-emulated network can compensate for anomalies induced by the analog substrate. We first convert a deep neural network trained in software to a spiking network on the BrainScaleS wafer-scale neuromorphic system, thereby enabling an acceleration factor of 10 000 compared to the biological time domain. This mapping is followed by the in-the-loop training, where in each training step, the network activity is first recorded in hardware and then used to compute the parameter updates in software via backpropagation. An essential finding is that the parameter updates do not have to be precise, but only need to approximately follow the correct gradient, which simplifies the computation of updates. Using this approach, after only several tens of iterations, the spiking network shows an accuracy close to the ideal software-emulated prototype. The presented techniques show that deep spiking networks emulated on analog neuromorphic devices can attain good computational performance despite the inherent variations of the analog substrate.
Submitted 6 March, 2017;
originally announced March 2017.