-
Learning Mid-Level Auditory Codes from Natural Sound Statistics
Authors:
Wiktor Młynarski,
Josh H. McDermott
Abstract:
Interaction with the world requires an organism to transform sensory signals into representations in which behaviorally meaningful properties of the environment are made explicit. These representations are derived through cascades of neuronal processing stages in which neurons at each stage recode the output of preceding stages. Explanations of sensory coding may thus involve understanding how low…
▽ More
Interaction with the world requires an organism to transform sensory signals into representations in which behaviorally meaningful properties of the environment are made explicit. These representations are derived through cascades of neuronal processing stages in which neurons at each stage recode the output of preceding stages. Explanations of sensory coding may thus involve understanding how low-level patterns are combined into more complex structures. Although models exist in the visual domain to explain how mid-level features such as junctions and curves might be derived from oriented filters in early visual cortex, little is known about analogous grou** principles for mid-level auditory representations. We propose a hierarchical generative model of natural sounds that learns combinations of spectrotemporal features from natural stimulus statistics. In the first layer the model forms a sparse convolutional code of spectrograms using a dictionary of learned spectrotemporal kernels. To generalize from specific kernel activation patterns, the second layer encodes patterns of time-varying magnitude of multiple first layer coefficients. Because second-layer features are sensitive to combinations of spectrotemporal features, the representation they support encodes more complex acoustic patterns than the first layer. When trained on corpora of speech and environmental sounds, some second-layer units learned to group spectrotemporal features that occur together in natural sounds. Others instantiate opponency between dissimilar sets of spectrotemporal features. Such grou**s might be instantiated by neurons in the auditory cortex, providing a hypothesis for mid-level neuronal computation.
△ Less
Submitted 14 October, 2017; v1 submitted 24 January, 2017;
originally announced January 2017.
-
Natural statistics of binaural sounds
Authors:
Wiktor Młynarski,
Jürgen Jost
Abstract:
Binaural sound localization is usually considered a discrimination task, where interaural time (ITD) and level (ILD) disparities at pure frequency channels are utilized to identify a position of a sound source. In natural conditions binaural circuits are exposed to a stimulation by sound waves originating from multiple, often moving and overlap** sources. Therefore statistics of binaural cues de…
▽ More
Binaural sound localization is usually considered a discrimination task, where interaural time (ITD) and level (ILD) disparities at pure frequency channels are utilized to identify a position of a sound source. In natural conditions binaural circuits are exposed to a stimulation by sound waves originating from multiple, often moving and overlap** sources. Therefore statistics of binaural cues depend on acoustic properties and the spatial configuration of the environment. In order to process binaural sounds efficiently, the auditory system should be adapted to naturally encountered cue distributions. Statistics of cues encountered naturally and their dependence on the physical properties of an auditory scene have not been studied before. Here, we performed binaural recordings of three auditory scenes with varying spatial properties. We have analyzed empirical cue distributions from each scene by fitting them with parametric probability density functions which allowed for an easy comparison of different scenes. Higher order statistics of binaural waveforms were analyzed by performing Independent Component Analysis (ICA) and studying properties of learned basis functions. Obtained results can be related to known neuronal mechanisms and suggest how binaural hearing can be understood in terms of adaptation to the natural signal statistics.
△ Less
Submitted 28 February, 2014; v1 submitted 19 February, 2014;
originally announced February 2014.
-
Sparse, complex-valued representations of natural sounds learned with phase and amplitude continuity priors
Authors:
Wiktor Mlynarski
Abstract:
Complex-valued sparse coding is a data representation which employs a dictionary of two-dimensional subspaces, while imposing a sparse, factorial prior on complex amplitudes. When trained on a dataset of natural image patches, it learns phase invariant features which closely resemble receptive fields of complex cells in the visual cortex. Features trained on natural sounds however, rarely reveal p…
▽ More
Complex-valued sparse coding is a data representation which employs a dictionary of two-dimensional subspaces, while imposing a sparse, factorial prior on complex amplitudes. When trained on a dataset of natural image patches, it learns phase invariant features which closely resemble receptive fields of complex cells in the visual cortex. Features trained on natural sounds however, rarely reveal phase invariance and capture other aspects of the data. This observation is a starting point of the present work. As its first contribution, it provides an analysis of natural sound statistics by means of learning sparse, complex representations of short speech intervals. Secondly, it proposes priors over the basis function set, which bias them towards phase-invariant solutions. In this way, a dictionary of complex basis functions can be learned from the data statistics, while preserving the phase invariance property. Finally, representations trained on speech sounds with and without priors are compared. Prior-based basis functions reveal performance comparable to unconstrained sparse coding, while explicitely representing phase as a temporal shift. Such representations can find applications in many perceptual and machine learning tasks.
△ Less
Submitted 18 February, 2014; v1 submitted 17 December, 2013;
originally announced December 2013.
-
Efficient coding of spectrotemporal binaural sounds leads to emergence of the auditory space representation
Authors:
Wiktor Mlynarski
Abstract:
To date a number of studies have shown that receptive field shapes of early sensory neurons can be reproduced by optimizing coding efficiency of natural stimulus ensembles. A still unresolved question is whether the efficient coding hypothesis explains formation of neurons which explicitly represent environmental features of different functional importance. This paper proposes that the spatial sel…
▽ More
To date a number of studies have shown that receptive field shapes of early sensory neurons can be reproduced by optimizing coding efficiency of natural stimulus ensembles. A still unresolved question is whether the efficient coding hypothesis explains formation of neurons which explicitly represent environmental features of different functional importance. This paper proposes that the spatial selectivity of higher auditory neurons emerges as a direct consequence of learning efficient codes for natural binaural sounds. Firstly, it is demonstrated that a linear efficient coding transform - Independent Component Analysis (ICA) trained on spectrograms of naturalistic simulated binaural sounds extracts spatial information present in the signal. A simple hierarchical ICA extension allowing for decoding of sound position is proposed. Furthermore, it is shown that units revealing spatial selectivity can be learned from a binaural recording of a natural auditory scene. In both cases a relatively small subpopulation of learned spectrogram features suffices to perform accurate sound localization. Representation of the auditory space is therefore learned in a purely unsupervised way by maximizing the coding efficiency and without any task-specific constraints. This results imply that efficient coding is a useful strategy for learning structures which allow for making behaviorally vital inferences about the environment.
△ Less
Submitted 19 February, 2014; v1 submitted 4 November, 2013;
originally announced November 2013.