Search | arXiv e-print repository

arXiv:1903.11990 [pdf, other]

On the Stability and Generalization of Learning with Kernel Activation Functions

Authors: Michele Cirillo, Simone Scardapane, Steven Van Vaerenbergh, Aurelio Uncini

Abstract: In this brief we investigate the generalization properties of a recently-proposed class of non-parametric activation functions, the kernel activation functions (KAFs). KAFs introduce additional parameters in the learning process in order to adapt nonlinearities individually on a per-neuron basis, exploiting a cheap kernel expansion of every activation value. While this increase in flexibility has… ▽ More In this brief we investigate the generalization properties of a recently-proposed class of non-parametric activation functions, the kernel activation functions (KAFs). KAFs introduce additional parameters in the learning process in order to adapt nonlinearities individually on a per-neuron basis, exploiting a cheap kernel expansion of every activation value. While this increase in flexibility has been shown to provide significant improvements in practice, a theoretical proof for its generalization capability has not been addressed yet in the literature. Here, we leverage recent literature on the stability properties of non-convex models trained via stochastic gradient descent (SGD). By indirectly proving two key smoothness properties of the models under consideration, we prove that neural networks endowed with KAFs generalize well when trained with SGD for a finite number of steps. Interestingly, our analysis provides a guideline for selecting one of the hyper-parameters of the model, the bandwidth of the scalar Gaussian kernel. A short experimental evaluation validates the proof. △ Less

Submitted 28 March, 2019; originally announced March 2019.

Comments: Submitted as a brief paper to IEEE TNNLS

arXiv:1807.04065 [pdf, other]

Recurrent Neural Networks with Flexible Gates using Kernel Activation Functions

Authors: Simone Scardapane, Steven Van Vaerenbergh, Danilo Comminiello, Simone Totaro, Aurelio Uncini

Abstract: Gated recurrent neural networks have achieved remarkable results in the analysis of sequential data. Inside these networks, gates are used to control the flow of information, allowing to model even very long-term dependencies in the data. In this paper, we investigate whether the original gate equation (a linear projection followed by an element-wise sigmoid) can be improved. In particular, we des… ▽ More Gated recurrent neural networks have achieved remarkable results in the analysis of sequential data. Inside these networks, gates are used to control the flow of information, allowing to model even very long-term dependencies in the data. In this paper, we investigate whether the original gate equation (a linear projection followed by an element-wise sigmoid) can be improved. In particular, we design a more flexible architecture, with a small number of adaptable parameters, which is able to model a wider range of gating functions than the classical one. To this end, we replace the sigmoid function in the standard gate with a non-parametric formulation extending the recently proposed kernel activation function (KAF), with the addition of a residual skip-connection. A set of experiments on sequential variants of the MNIST dataset shows that the adoption of this novel gate allows to improve accuracy with a negligible cost in terms of computational power and with a large speed-up in the number of training iterations. △ Less

Submitted 11 July, 2018; originally announced July 2018.

Comments: Accepted for presentation at 2018 IEEE International Workshop on Machine Learning for Signal Processing (MLSP)

arXiv:1802.09405 [pdf, other]

Improving Graph Convolutional Networks with Non-Parametric Activation Functions

Authors: Simone Scardapane, Steven Van Vaerenbergh, Danilo Comminiello, Aurelio Uncini

Abstract: Graph neural networks (GNNs) are a class of neural networks that allow to efficiently perform inference on data that is associated to a graph structure, such as, e.g., citation networks or knowledge graphs. While several variants of GNNs have been proposed, they only consider simple nonlinear activation functions in their layers, such as rectifiers or squashing functions. In this paper, we investi… ▽ More Graph neural networks (GNNs) are a class of neural networks that allow to efficiently perform inference on data that is associated to a graph structure, such as, e.g., citation networks or knowledge graphs. While several variants of GNNs have been proposed, they only consider simple nonlinear activation functions in their layers, such as rectifiers or squashing functions. In this paper, we investigate the use of graph convolutional networks (GCNs) when combined with more complex activation functions, able to adapt from the training data. More specifically, we extend the recently proposed kernel activation function, a non-parametric model which can be implemented easily, can be regularized with standard $\ell_p$-norms techniques, and is smooth over its entire domain. Our experimental evaluation shows that the proposed architecture can significantly improve over its baseline, while similar improvements cannot be obtained by simply increasing the depth or size of the original GCN. △ Less

Submitted 26 February, 2018; originally announced February 2018.

Comments: Submitted to EUSIPCO 2018

arXiv:1802.05910 [pdf, other]

Pattern Localization in Time Series through Signal-To-Model Alignment in Latent Space

Authors: Steven Van Vaerenbergh, Ignacio Santamaria, Victor Elvira, Matteo Salvatori

Abstract: In this paper, we study the problem of locating a predefined sequence of patterns in a time series. In particular, the studied scenario assumes a theoretical model is available that contains the expected locations of the patterns. This problem is found in several contexts, and it is commonly solved by first synthesizing a time series from the model, and then aligning it to the true time series thr… ▽ More In this paper, we study the problem of locating a predefined sequence of patterns in a time series. In particular, the studied scenario assumes a theoretical model is available that contains the expected locations of the patterns. This problem is found in several contexts, and it is commonly solved by first synthesizing a time series from the model, and then aligning it to the true time series through dynamic time war**. We propose a technique that increases the similarity of both time series before aligning them, by map** them into a latent correlation space. The map** is learned from the data through a machine-learning setup. Experiments on data from non-destructive testing demonstrate that the proposed approach shows significant improvements over the state of the art. △ Less

Submitted 19 February, 2018; v1 submitted 16 February, 2018; originally announced February 2018.

Comments: IEEE ICASSP 2018

arXiv:1707.04035 [pdf, other]

Kafnets: kernel-based non-parametric activation functions for neural networks

Authors: Simone Scardapane, Steven Van Vaerenbergh, Simone Totaro, Aurelio Uncini

Abstract: Neural networks are generally built by interleaving (adaptable) linear layers with (fixed) nonlinear activation functions. To increase their flexibility, several authors have proposed methods for adapting the activation functions themselves, endowing them with varying degrees of flexibility. None of these approaches, however, have gained wide acceptance in practice, and research in this topic rema… ▽ More Neural networks are generally built by interleaving (adaptable) linear layers with (fixed) nonlinear activation functions. To increase their flexibility, several authors have proposed methods for adapting the activation functions themselves, endowing them with varying degrees of flexibility. None of these approaches, however, have gained wide acceptance in practice, and research in this topic remains open. In this paper, we introduce a novel family of flexible activation functions that are based on an inexpensive kernel expansion at every neuron. Leveraging over several properties of kernel-based models, we propose multiple variations for designing and initializing these kernel activation functions (KAFs), including a multidimensional scheme allowing to nonlinearly combine information from different paths in the network. The resulting KAFs can approximate any map** defined over a subset of the real line, either convex or nonconvex. Furthermore, they are smooth over their entire domain, linear in their parameters, and they can be regularized using any known scheme, including the use of $\ell_1$ penalties to enforce sparseness. To the best of our knowledge, no other known model satisfies all these properties simultaneously. In addition, we provide a relatively complete overview on alternative techniques for adapting the activation functions, which is currently lacking in the literature. A large set of experiments validates our proposal. △ Less

Submitted 23 November, 2017; v1 submitted 13 July, 2017; originally announced July 2017.

Comments: Preprint submitted to Neural Networks (Elsevier)

arXiv:1706.03533 [pdf, other]

Recursive Multikernel Filters Exploiting Nonlinear Temporal Structure

Authors: Steven Van Vaerenbergh, Simone Scardapane, Ignacio Santamaria

Abstract: In kernel methods, temporal information on the data is commonly included by using time-delayed embeddings as inputs. Recently, an alternative formulation was proposed by defining a gamma-filter explicitly in a reproducing kernel Hilbert space, giving rise to a complex model where multiple kernels operate on different temporal combinations of the input signal. In the original formulation, the kerne… ▽ More In kernel methods, temporal information on the data is commonly included by using time-delayed embeddings as inputs. Recently, an alternative formulation was proposed by defining a gamma-filter explicitly in a reproducing kernel Hilbert space, giving rise to a complex model where multiple kernels operate on different temporal combinations of the input signal. In the original formulation, the kernels are then simply combined to obtain a single kernel matrix (for instance by averaging), which provides computational benefits but discards important information on the temporal structure of the signal. Inspired by works on multiple kernel learning, we overcome this drawback by considering the different kernels separately. We propose an efficient strategy to adaptively combine and select these kernels during the training phase. The resulting batch and online algorithms automatically learn to process highly nonlinear temporal information extracted from the input signal, which is implicitly encoded in the kernel values. We evaluate our proposal on several artificial and real tasks, showing that it can outperform classical approaches both in batch and online settings. △ Less

Submitted 12 June, 2017; originally announced June 2017.

Comments: Eusipco 2017

arXiv:1609.03164 [pdf, ps, other]

On the Relationship between Online Gaussian Process Regression and Kernel Least Mean Squares Algorithms

Authors: Steven Van Vaerenbergh, Jesus Fernandez-Bes, Víctor Elvira

Abstract: We study the relationship between online Gaussian process (GP) regression and kernel least mean squares (KLMS) algorithms. While the latter have no capacity of storing the entire posterior distribution during online learning, we discover that their operation corresponds to the assumption of a fixed posterior covariance that follows a simple parametric model. Interestingly, several well-known KLMS… ▽ More We study the relationship between online Gaussian process (GP) regression and kernel least mean squares (KLMS) algorithms. While the latter have no capacity of storing the entire posterior distribution during online learning, we discover that their operation corresponds to the assumption of a fixed posterior covariance that follows a simple parametric model. Interestingly, several well-known KLMS algorithms correspond to specific cases of this model. The probabilistic perspective allows us to understand how each of them handles uncertainty, which could explain some of their performance differences. △ Less

Submitted 11 September, 2016; originally announced September 2016.

Comments: Accepted for publication in 2016 IEEE International Workshop on Machine Learning for Signal Processing

arXiv:1501.06929 [pdf, ps, other]

doi 10.1109/ICASSP.2015.7178361

A Probabilistic Least-Mean-Squares Filter

Authors: Jesus Fernandez-Bes, Víctor Elvira, Steven Van Vaerenbergh

Abstract: We introduce a probabilistic approach to the LMS filter. By means of an efficient approximation, this approach provides an adaptable step-size LMS algorithm together with a measure of uncertainty about the estimation. In addition, the proposed approximation preserves the linear complexity of the standard LMS. Numerical results show the improved performance of the algorithm with respect to standard… ▽ More We introduce a probabilistic approach to the LMS filter. By means of an efficient approximation, this approach provides an adaptable step-size LMS algorithm together with a measure of uncertainty about the estimation. In addition, the proposed approximation preserves the linear complexity of the standard LMS. Numerical results show the improved performance of the algorithm with respect to standard LMS and state-of-the-art algorithms with similar complexity. The goal of this work, therefore, is to open the door to bring some more Bayesian machine learning techniques to adaptive filtering. △ Less

Submitted 27 January, 2015; originally announced January 2015.

arXiv:1310.5347 [pdf, other]

Bayesian Extensions of Kernel Least Mean Squares

Authors: Il Memming Park, Sohan Seth, Steven Van Vaerenbergh

Abstract: The kernel least mean squares (KLMS) algorithm is a computationally efficient nonlinear adaptive filtering method that "kernelizes" the celebrated (linear) least mean squares algorithm. We demonstrate that the least mean squares algorithm is closely related to the Kalman filtering, and thus, the KLMS can be interpreted as an approximate Bayesian filtering method. This allows us to systematically d… ▽ More The kernel least mean squares (KLMS) algorithm is a computationally efficient nonlinear adaptive filtering method that "kernelizes" the celebrated (linear) least mean squares algorithm. We demonstrate that the least mean squares algorithm is closely related to the Kalman filtering, and thus, the KLMS can be interpreted as an approximate Bayesian filtering method. This allows us to systematically develop extensions of the KLMS by modifying the underlying state-space and observation models. The resulting extensions introduce many desirable properties such as "forgetting", and the ability to learn from discrete data, while retaining the computational simplicity and time complexity of the original algorithm. △ Less

Submitted 20 October, 2013; originally announced October 2013.

Comments: 7 pages, 4 fiures

arXiv:1303.2823 [pdf, other]

doi 10.1109/MSP.2013.2250352

Gaussian Processes for Nonlinear Signal Processing

Authors: Fernando Pérez-Cruz, Steven Van Vaerenbergh, Juan José Murillo-Fuentes, Miguel Lázaro-Gredilla, Ignacio Santamaria

Abstract: Gaussian processes (GPs) are versatile tools that have been successfully employed to solve nonlinear estimation problems in machine learning, but that are rarely used in signal processing. In this tutorial, we present GPs for regression as a natural nonlinear extension to optimal Wiener filtering. After establishing their basic formulation, we discuss several important aspects and extensions, incl… ▽ More Gaussian processes (GPs) are versatile tools that have been successfully employed to solve nonlinear estimation problems in machine learning, but that are rarely used in signal processing. In this tutorial, we present GPs for regression as a natural nonlinear extension to optimal Wiener filtering. After establishing their basic formulation, we discuss several important aspects and extensions, including recursive and adaptive algorithms for dealing with non-stationarity, low-complexity solutions, non-Gaussian noise models and classification scenarios. Furthermore, we provide a selection of relevant applications to wireless digital communications. △ Less

Submitted 27 September, 2013; v1 submitted 12 March, 2013; originally announced March 2013.

Journal ref: IEEE Signal Processing Magazine, vol.30, no.4, pp.40-50, July 2013

arXiv:1108.3372 [pdf, ps, other]

Overlap** Mixtures of Gaussian Processes for the Data Association Problem

Authors: Miguel Lázaro-Gredilla, Steven Van Vaerenbergh, Neil Lawrence

Abstract: In this work we introduce a mixture of GPs to address the data association problem, i.e. to label a group of observations according to the sources that generated them. Unlike several previously proposed GP mixtures, the novel mixture has the distinct characteristic of using no gating function to determine the association of samples and mixture components. Instead, all the GPs in the mixture are gl… ▽ More In this work we introduce a mixture of GPs to address the data association problem, i.e. to label a group of observations according to the sources that generated them. Unlike several previously proposed GP mixtures, the novel mixture has the distinct characteristic of using no gating function to determine the association of samples and mixture components. Instead, all the GPs in the mixture are global and samples are clustered following "trajectories" across input space. We use a non-standard variational Bayesian algorithm to efficiently recover sample labels and learn the hyperparameters. We show how multi-object tracking problems can be disambiguated and also explore the characteristics of the model in traditional regression settings. △ Less

Submitted 16 August, 2011; originally announced August 2011.

Showing 1–11 of 11 results for author: Van Vaerenbergh, S