The Impact of Activation Sparsity on Overfitting in Convolutional Neural Networks

Authors: Karim Huesmann, Luis Garcia Rodriguez, Lars Linsen, Benjamin Risse

Abstract: Overfitting is one of the fundamental challenges when training convolutional neural networks and is usually identified by a diverging training and test loss. The underlying dynamics of how the flow of activations induces overfitting is, however, poorly understood. In this study we introduce a perplexity-based sparsity definition to derive and visualise layer-wise activation measures. These novel explainable AI strategies reveal a surprising relationship between activation sparsity and overfitting, namely an increase in sparsity in the feature extraction layers shortly before the test loss starts rising. This tendency is preserved across network architectures and regularisation strategies so that our measures can be used as a reliable indicator for overfitting while decoupling the network's generalisation capabilities from its loss-based definition. Moreover, our differentiable sparsity formulation can be used to explicitly penalise the emergence of sparsity during training so that the impact of reduced sparsity on overfitting can be studied in real-time. Applying this penalty and analysing activation sparsity for well-known regularisers and in common network architectures supports the hypothesis that reduced activation sparsity can effectively improve the generalisation and classification performance.
In line with other recent work on this topic, our methods reveal novel insights into the contradicting concepts of activation sparsity and network capacity by demonstrating that dense activations can enable discriminative feature learning while efficiently exploiting the capacity of deep models without suffering from overfitting, even when trained excessively.

Submitted 13 April, 2021; originally announced April 2021.

Journal ref: Pattern Recognition. ICPR International Workshops and Challenges (2021) 130-145

arXiv:2002.09237 [pdf, other]

Exploiting the Full Capacity of Deep Neural Networks while Avoiding Overfitting by Targeted Sparsity Regularization

Authors: Karim Huesmann, Soeren Klemm, Lars Linsen, Benjamin Risse

Abstract: Overfitting is one of the most common problems when training deep neural networks on comparatively small datasets. Here, we demonstrate that neural network activation sparsity is a reliable indicator for overfitting which we utilize to propose novel targeted sparsity visualization and regularization strategies. Based on these strategies we are able to understand and counteract overfitting caused by activation sparsity and filter correlation in a targeted layer-by-layer manner. Our results demonstrate that targeted sparsity regularization can efficiently be used to regularize well-known datasets and architectures with a significant increase in image classification performance while outperforming both dropout and batch normalization. Ultimately, our study reveals novel insights into the contradicting concepts of activation sparsity and network capacity by demonstrating that targeted sparsity regularization enables salient and discriminative feature learning while exploiting the full capacity of deep models without suffering from overfitting, even when trained excessively.

Submitted 21 February, 2020; originally announced February 2020.

Comments: 10 pages, 9 figures

ACM Class: I.2.6; I.5.1

arXiv:1807.02389 [pdf, other]

doi 10.3389/fnins.2019.01201

Accelerated physical emulation of Bayesian inference in spiking neural networks

Authors: Akos F. Kungl, Sebastian Schmitt, Johann Klähn, Paul Müller, Andreas Baumbach, Dominik Dold, Alexander Kugele, Nico Gürtler, Luziwei Leng, Eric Müller, Christoph Koke, Mitja Kleider, Christian Mauch, Oliver Breitwieser, Maurice Güttler, Dan Husmann, Kai Husmann, Joscha Ilmberger, Andreas Hartel, Vitali Karasenko, Andreas Grübl, Johannes Schemmel, Karlheinz Meier, Mihai A. Petrovici

Abstract: The massively parallel nature of biological information processing plays an important role for its superiority to human-engineered computing devices. In particular, it may hold the key to overcoming the von Neumann bottleneck that limits contemporary computer architectures. Physical-model neuromorphic devices seek to replicate not only this inherent parallelism, but also aspects of its microscopic dynamics in analog circuits emulating neurons and synapses. However, these machines require network models that are not only adept at solving particular tasks, but that can also cope with the inherent imperfections of analog substrates. We present a spiking network model that performs Bayesian inference through sampling on the BrainScaleS neuromorphic platform, where we use it for generative and discriminative computations on visual data. By illustrating its functionality on this platform, we implicitly demonstrate its robustness to various substrate-specific distortive effects, as well as its accelerated capability for computation. These results showcase the advantages of brain-inspired physical computation and provide important building blocks for large-scale neuromorphic applications.

Submitted 1 April, 2020; v1 submitted 6 July, 2018; originally announced July 2018.

Comments: This preprint has been published 2019 November 14. Please cite as: Kungl A. F. et al. (2019) Accelerated Physical Emulation of Bayesian Inference in Spiking Neural Networks. Front. Neurosci. 13:1201. doi: 10.3389/fnins.2019.01201

Journal ref: Frontiers in Neuroscience - Neuromorphic Engineering, 14 November 2019

arXiv:1703.06043 [pdf, other]

doi 10.1109/ISCAS.2017.8050530

Pattern representation and recognition with accelerated analog neuromorphic systems

Authors: Mihai A. Petrovici, Sebastian Schmitt, Johann Klähn, David Stöckel, Anna Schroeder, Guillaume Bellec, Johannes Bill, Oliver Breitwieser, Ilja Bytschok, Andreas Grübl, Maurice Güttler, Andreas Hartel, Stephan Hartmann, Dan Husmann, Kai Husmann, Sebastian Jeltsch, Vitali Karasenko, Mitja Kleider, Christoph Koke, Alexander Kononov, Christian Mauch, Eric Müller, Paul Müller, Johannes Partzsch, Thomas Pfeil, et al. (11 additional authors not shown)

Abstract: Despite being originally inspired by the central nervous system, artificial neural networks have diverged from their biological archetypes as they have been remodeled to fit particular tasks. In this paper, we review several possibilities to reverse map these architectures to biologically more realistic spiking networks with the aim of emulating them on fast, low-power neuromorphic hardware. Since many of these devices employ analog components, which cannot be perfectly controlled, finding ways to compensate for the resulting effects represents a key challenge. Here, we discuss three different strategies to address this problem: the addition of auxiliary network components for stabilizing activity, the utilization of inherently robust architectures and a training method for hardware-emulated networks that functions without perfect knowledge of the system's dynamics and parameters. For all three scenarios, we corroborate our theoretical considerations with experimental results on accelerated analog neuromorphic platforms.

Submitted 3 July, 2017; v1 submitted 17 March, 2017; originally announced March 2017.

Comments: accepted at ISCAS 2017

Journal ref: Circuits and Systems (ISCAS), 2017 IEEE International Symposium on

arXiv:1703.01909 [pdf, other]

doi 10.1109/IJCNN.2017.7966125

Neuromorphic Hardware In The Loop: Training a Deep Spiking Network on the BrainScaleS Wafer-Scale System

Authors: Sebastian Schmitt, Johann Klaehn, Guillaume Bellec, Andreas Gruebl, Maurice Guettler, Andreas Hartel, Stephan Hartmann, Dan Husmann, Kai Husmann, Vitali Karasenko, Mitja Kleider, Christoph Koke, Christian Mauch, Eric Mueller, Paul Mueller, Johannes Partzsch, Mihai A. Petrovici, Stefan Schiefer, Stefan Scholze, Bernhard Vogginger, Robert Legenstein, Wolfgang Maass, Christian Mayr, Johannes Schemmel, Karlheinz Meier

Abstract: Emulating spiking neural networks on analog neuromorphic hardware offers several advantages over simulating them on conventional computers, particularly in terms of speed and energy consumption. However, this usually comes at the cost of reduced control over the dynamics of the emulated networks. In this paper, we demonstrate how iterative training of a hardware-emulated network can compensate for anomalies induced by the analog substrate. We first convert a deep neural network trained in software to a spiking network on the BrainScaleS wafer-scale neuromorphic system, thereby enabling an acceleration factor of 10 000 compared to the biological time domain. This mapping is followed by the in-the-loop training, where in each training step, the network activity is first recorded in hardware and then used to compute the parameter updates in software via backpropagation. An essential finding is that the parameter updates do not have to be precise, but only need to approximately follow the correct gradient, which simplifies the computation of updates. Using this approach, after only several tens of iterations, the spiking network shows an accuracy close to the ideal software-emulated prototype. The presented techniques show that deep spiking networks emulated on analog neuromorphic devices can attain good computational performance despite the inherent variations of the analog substrate.

Submitted 6 March, 2017; originally announced March 2017.

Comments: 8 pages, 10 figures, submitted to IJCNN 2017

Showing 1–5 of 5 results for author: Husmann, K