
Asynchronous Graph Generator

Authors: Christopher P. Ley, Felipe Tobar

Abstract: We introduce the asynchronous graph generator (AGG), a novel graph attention network for imputation and prediction of multi-channel time series. Free from recurrent components or assumptions about temporal/spatial regularity, AGG encodes measurements, timestamps and channel-specific features directly in the nodes via learnable embeddings. Through an attention mechanism, these embeddings allow for discovering expressive relationships among the variables of interest in the form of a homogeneous graph. Once trained, AGG performs imputation by \emph{conditional attention generation}, i.e., by creating a new node conditioned on given timestamps and channel specification. The proposed AGG is compared to related methods in the literature and its performance is analysed from a data augmentation perspective. Our experiments reveal that AGG achieved state-of-the-art results in time series imputation, classification and prediction for the benchmark datasets \emph{Beijing Air Quality}, \emph{PhysioNet ICU 2012} and \emph{UCI localisation}, outperforming other recent attention-based networks.

Submitted 22 May, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

Comments: Submitted to NeurIPS 2024

arXiv:2308.07012

Greedy online change point detection

Authors: Jou-Hui Ho, Felipe Tobar

Abstract: Standard online change point detection (CPD) methods tend to have large false discovery rates as their detections are sensitive to outliers. To overcome this drawback, we propose Greedy Online Change Point Detection (GOCPD), a computationally appealing method which finds change points by maximizing the probability of the data coming from the (temporal) concatenation of two independent models. We show that, for time series with a single change point, this objective is unimodal and thus CPD can be accelerated via ternary search with logarithmic complexity. We demonstrate the effectiveness of GOCPD on synthetic data and validate our findings on real-world univariate and multivariate settings.

Submitted 14 August, 2023; originally announced August 2023.

Comments: Accepted at IEEE MLSP 2023
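
Illustrative note on the abstract above: the claim that a unimodal objective can be maximised by ternary search in logarithmically many evaluations can be sketched as follows. This is a minimal sketch, not the authors' GOCPD implementation. The helper names (`ternary_search_max`, `two_segment_score`) and the toy objective (negative within-segment squared error, i.e. a fixed-variance Gaussian log-likelihood up to constants) are assumptions made for illustration, and the search assumes the objective is strictly unimodal over the candidate splits.

```python
def ternary_search_max(objective, lo, hi):
    """Return an integer in [lo, hi] that maximises a strictly unimodal objective."""
    while hi - lo > 2:
        third = (hi - lo) // 3
        m1, m2 = lo + third, hi - third
        if objective(m1) < objective(m2):
            lo = m1 + 1   # the peak lies strictly to the right of m1
        else:
            hi = m2 - 1   # the peak lies strictly to the left of m2
    # At most three candidates remain; compare them directly.
    return max(range(lo, hi + 1), key=objective)


def two_segment_score(series, split):
    """Toy objective (an assumption): negative squared error of a
    two-segment constant-mean fit with the break placed at `split`."""
    def sse(segment):
        mean = sum(segment) / len(segment)
        return sum((value - mean) ** 2 for value in segment)
    return -(sse(series[:split]) + sse(series[split:]))


# Noise-free step signal with a single change at index 30.
series = [0.0] * 30 + [5.0] * 20
change_point = ternary_search_max(
    lambda split: two_segment_score(series, split), 1, len(series) - 1
)
print(change_point)  # 30
```

Each iteration discards roughly a third of the candidate splits, so the number of objective evaluations grows logarithmically with the series length. On noisy data the toy objective need not be unimodal, in which case this search is only a heuristic.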

arXiv:2305.04871

doi 10.1098/rspa.2022.0648

Gaussian process deconvolution

Authors: Felipe Tobar, Arnaud Robert, Jorge F. Silva

Abstract: Let us consider the deconvolution problem, that is, to recover a latent source $x(\cdot)$ from the observations $\mathbf{y} = [y_1,\ldots,y_N]$ of a convolution process $y = x\star h + η$, where $η$ is an additive noise, the observations in $\mathbf{y}$ might have missing parts with respect to $y$, and the filter $h$ could be unknown. We propose a novel strategy to address this task when $x$ is a continuous-time signal: we adopt a Gaussian process (GP) prior on the source $x$, which allows for closed-form Bayesian nonparametric deconvolution. We first analyse the direct model to establish the conditions under which the model is well defined. Then, we turn to the inverse problem, where we study i) some necessary conditions under which Bayesian deconvolution is feasible, and ii) to what extent the filter $h$ can be learnt from data or approximated for the blind deconvolution case. The proposed approach, termed Gaussian process deconvolution (GPDC), is compared to other deconvolution methods conceptually, via illustrative examples, and using real-world datasets.

Submitted 8 May, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

Comments: Accepted at Proceedings of the Royal Society A

arXiv:2210.05394

Computationally-efficient initialisation of GPs: The generalised variogram method

Authors: Felipe Tobar, Elsa Cazelles, Taco de Wolff

Abstract: We present a computationally-efficient strategy to initialise the hyperparameters of a Gaussian process (GP) avoiding the computation of the likelihood function. Our strategy can be used as a pretraining stage to find initial conditions for maximum-likelihood (ML) training, or as a standalone method to compute hyperparameter values to be plugged directly into the GP model. Motivated by the fact that training a GP via ML is equivalent (on average) to minimising the KL-divergence between the true and learnt model, we set out to explore different metrics/divergences among GPs that are computationally inexpensive and provide hyperparameter values that are close to those found via ML. In practice, we identify the GP hyperparameters by projecting the empirical covariance or (Fourier) power spectrum onto a parametric family, thus proposing and studying various measures of discrepancy operating on the temporal and frequency domains. Our contribution extends the variogram method developed by the geostatistics literature and, accordingly, it is referred to as the generalised variogram method (GVM). In addition to the theoretical presentation of GVM, we provide experimental validation in terms of accuracy, consistency with ML and computational complexity for different kernels using synthetic and real-world data.

Submitted 26 April, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

Journal ref: https://openreview.net/forum?id=slsAQHpS7n (2023)

arXiv:2202.09233

Nonstationary multi-output Gaussian processes via harmonizable spectral mixtures

Authors: Matías Altamirano, Felipe Tobar

Abstract: Kernel design for Multi-output Gaussian Processes (MOGP) has received increased attention recently. In particular, the Multi-Output Spectral Mixture kernel (MOSM) arXiv:1709.01298 approach has been praised as a general model in the sense that it extends other approaches such as Linear Model of Coregionalization, Intrinsic Coregionalization Model and Cross-Spectral Mixture. MOSM relies on Cramér's theorem to parametrise the power spectral densities (PSD) as a Gaussian mixture, and thus has a structural restriction: by assuming the existence of a PSD, the method is only suited for multi-output stationary applications. We develop a nonstationary extension of MOSM by proposing the family of harmonizable kernels for MOGPs, a class of kernels that contains both stationary and a vast majority of non-stationary processes. A main contribution of the proposed harmonizable kernels is that they automatically identify a possible nonstationary behaviour, meaning that practitioners do not need to choose between stationary or non-stationary kernels. The proposed method is first validated on synthetic data with the purpose of illustrating the key properties of our approach, and then compared to existing MOGP methods on two real-world settings from finance and electroencephalography.

Submitted 18 February, 2022; originally announced February 2022.

Comments: Accepted at AISTATS 2022

arXiv:2112.15238

Studying the Interplay between Information Loss and Operation Loss in Representations for Classification

Authors: Jorge F. Silva, Felipe Tobar, Mario Vicuña, Felipe Cordova

Abstract: Information-theoretic measures have been widely adopted in the design of features for learning and decision problems. Inspired by this, we look at the relationship between i) a weak form of information loss in the Shannon sense and ii) the operation loss in the minimum probability of error (MPE) sense when considering a family of lossy continuous representations (features) of a continuous observation. We present several results that shed light on this interplay. Our first result offers a lower bound on a weak form of information loss as a function of its respective operation loss when adopting a discrete lossy representation (quantization) instead of the original raw observation. From this, our main result shows that a specific form of vanishing information loss (a weak notion of asymptotic informational sufficiency) implies a vanishing MPE loss (or asymptotic operational sufficiency) when considering a general family of lossy continuous representations. Our theoretical findings support the observation that the selection of feature representations that attempt to capture informational sufficiency is appropriate for learning, but this selection is a rather conservative design principle if the intended goal is achieving MPE in classification. Supporting this last point, and under some structural conditions, we show that it is possible to adopt an alternative notion of informational sufficiency (strictly weaker than pure sufficiency in the mutual information sense) to achieve operational sufficiency in learning.

Submitted 30 December, 2021; originally announced December 2021.

Comments: 64 pages, 9 figures

arXiv:2110.02151

Detection of blue whale vocalisations using a temporal-domain convolutional neural network

Authors: Bryan Sagredo, Sonia Español-Jiménez, Felipe Tobar

Abstract: We present a framework for detecting blue whale vocalisations from acoustic submarine recordings. The proposed methodology comprises three stages: i) a preprocessing step where the audio recordings are conditioned through normalisation, filtering, and denoising; ii) a label-propagation mechanism to ensure the consistency of the annotations of the whale vocalisations, and iii) a convolutional neural network that receives audio samples. Based on 34 real-world submarine recordings (28 for training and 6 for testing) we obtained promising performance indicators including an Accuracy of 85.4\% and a Recall of 93.5\%. Furthermore, even for the cases where our detector did not match the ground-truth labels, a visual inspection validates the ability of our approach to detect possible parts of whale calls unlabelled as such due to not being complete calls.

Submitted 5 October, 2021; originally announced October 2021.

arXiv:2110.02144

Late reverberation suppression using U-nets

Authors: Diego León, Felipe Tobar

Abstract: In real-world settings, speech signals are almost always affected by reverberation produced by the working environment; these corrupted signals need to be \emph{dereverberated} prior to performing, e.g., speech recognition, speech-to-text conversion, compression, or general audio enhancement. In this paper, we propose a supervised dereverberation technique using \emph{U-nets with skip connections}, which are fully-convolutional encoder-decoder networks with layers arranged in the form of a "U" and connections that "skip" some layers. Building on this architecture, we address speech dereverberation through the lens of Late Reverberation Suppression (LS). Via experiments on synthetic and real-world data with different noise levels and reverberation settings, we show that our proposed method, termed "LS U-net", improves quality, intelligibility and other performance metrics compared to the original U-net method and is on par with the state-of-the-art GAN-based approaches.

Submitted 5 October, 2021; originally announced October 2021.

arXiv:2102.13380

A novel notion of barycenter for probability distributions based on optimal weak mass transport

Authors: Elsa Cazelles, Felipe Tobar, Joaquín Fontbona

Abstract: We introduce weak barycenters of a family of probability distributions, based on the recently developed notion of optimal weak transport of mass by Gozlan et al. (2017) and Backhoff-Veraguas et al. (2020). We provide a theoretical analysis of this object and discuss its interpretation in the light of convex ordering between probability measures. In particular, we show that, rather than averaging the input distributions in a geometric way (as the Wasserstein barycenter based on classic optimal transport does), weak barycenters extract common geometric information shared by all the input distributions, encoded as a latent random variable that underlies all of them. We also provide an iterative algorithm to compute a weak barycenter for a finite family of input distributions, and a stochastic algorithm that computes them for arbitrary populations of laws. The latter approach is particularly well suited for the streaming setting, i.e., when distributions are observed sequentially. The notion of weak barycenter and our approaches to compute it are illustrated on synthetic examples, validated on 2D real-world data and compared to standard Wasserstein barycenters.

Submitted 10 March, 2023; v1 submitted 26 February, 2021; originally announced February 2021.

arXiv:2101.06119

doi 10.1109/MSP.2021.3053551

Data Science for Engineers: A Teaching Ecosystem

Authors: Felipe Tobar, Felipe Bravo-Marquez, Jocelyn Dunstan, Joaquin Fontbona, Alejandro Maass, Daniel Remenik, Jorge F. Silva

Abstract: We describe an ecosystem for teaching data science (DS) to engineers which blends theory, methods, and applications, developed at the Faculty of Physical and Mathematical Sciences, Universidad de Chile, over the last three years. This initiative has been motivated by the increasing demand for DS qualifications both from academic and professional environments. The ecosystem is distributed in a collaborative fashion across three departments in the above Faculty and includes postgraduate programmes, courses, professional diplomas, data repositories, laboratories, trainee programmes, and internships. By sharing our teaching principles and the innovative components of our approach to teaching DS, we hope our experience can be useful to those developing their own DS programmes and ecosystems. The open challenges and future plans for our ecosystem are also discussed at the end of the article.

Submitted 14 January, 2021; originally announced January 2021.

Comments: Accepted at IEEE Signal Processing Magazine (Special Issue on Innovation Starts with Education)

arXiv:2002.05789

Gaussian process imputation of multiple financial series

Authors: Taco de Wolff, Alejandro Cuevas, Felipe Tobar

Abstract: In Financial Signal Processing, multiple time series such as financial indicators, stock prices and exchange rates are strongly coupled due to their dependence on the latent state of the market, and therefore they need to be analysed jointly. We focus on learning the relationships among financial time series by modelling them through a multi-output Gaussian process (MOGP) with expressive covariance functions. Learning these market dependencies among financial series is crucial for the imputation and prediction of financial observations. The proposed model is validated experimentally on two real-world financial datasets whose correlations across channels are analysed. We compare our model against other MOGPs and the independent Gaussian process on real financial data.

Submitted 11 February, 2020; originally announced February 2020.

Comments: Accepted at IEEE ICASSP 2020

arXiv:2002.03471

MOGPTK: The Multi-Output Gaussian Process Toolkit

Authors: Taco de Wolff, Alejandro Cuevas, Felipe Tobar

Abstract: We present MOGPTK, a Python package for multi-channel data modelling using Gaussian processes (GP). The aim of this toolkit is to make multi-output GP (MOGP) models accessible to researchers, data scientists, and practitioners alike. MOGPTK uses a Python front-end, relies on the GPflow suite and is built on a TensorFlow back-end, thus enabling GPU-accelerated training. The toolkit facilitates implementing the entire pipeline of GP modelling, including data loading, parameter initialization, model learning, parameter interpretation, up to data imputation and extrapolation. MOGPTK implements the main multi-output covariance kernels from the literature, as well as spectral-based parameter initialization strategies. The source code, tutorials and examples in the form of Jupyter notebooks, together with the API documentation, can be found at http://github.com/GAMES-UChile/mogptk

Submitted 9 February, 2020; originally announced February 2020.

arXiv:1912.05509

The Wasserstein-Fourier Distance for Stationary Time Series

Authors: Elsa Cazelles, Arnaud Robert, Felipe Tobar

Abstract: We propose the Wasserstein-Fourier (WF) distance to measure the (dis)similarity between time series by quantifying the displacement of their energy across frequencies. The WF distance operates by calculating the Wasserstein distance between the (normalised) power spectral densities (NPSD) of time series. Although this rationale has been considered in the past, we fill a gap in the open literature by providing a formal introduction of this distance, together with its main properties from the joint perspective of Fourier analysis and optimal transport. As the main aim of this work is to validate WF as a general-purpose metric for time series, we illustrate its applicability in three broad contexts. First, we rely on WF to implement a PCA-like dimensionality reduction for NPSDs which allows for meaningful visualisation and pattern recognition applications. Second, we show that the geometry induced by WF on the space of NPSDs admits a geodesic interpolant between time series, thus enabling data augmentation on the spectral domain, by averaging the dynamic content of two signals. Third, we implement WF for time series classification using parametric/non-parametric classifiers and compare it to other classical metrics. Supported by theoretical results, as well as synthetic illustrations and experiments on real-world data, this work establishes WF as a meaningful and capable resource pertinent to general distance-based applications of time series.

Submitted 11 December, 2020; v1 submitted 11 December, 2019; originally announced December 2019.
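
Illustrative note on the abstract above: the recipe "normalise the power spectral densities, then take a Wasserstein distance between them" can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the paper's implementation: it uses a raw periodogram from a naive DFT as the PSD estimate, and the 1-Wasserstein distance (area between the two spectral CDFs) as a simple stand-in for whichever Wasserstein order the paper adopts. The function names are hypothetical.

```python
import cmath
import math


def normalised_psd(x):
    """One-sided periodogram of x, normalised to sum to one (an NPSD estimate)."""
    n = len(x)
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(coeff) ** 2)
    total = sum(power)
    return [p / total for p in power]


def wasserstein_fourier_w1(x, y):
    """1-Wasserstein distance between the NPSDs of two equal-length series.

    Frequencies are in cycles per sample, so adjacent bins are 1/n apart.
    """
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    n = len(x)
    cdf_gap, distance = 0.0, 0.0
    for px, py in zip(normalised_psd(x), normalised_psd(y)):
        cdf_gap += px - py             # difference of the two spectral CDFs
        distance += abs(cdf_gap) / n   # area between the CDFs over one bin
    return distance


n = 64
slow = [math.cos(2 * math.pi * 4 * t / n) for t in range(n)]
fast = [math.cos(2 * math.pi * 10 * t / n) for t in range(n)]
print(wasserstein_fourier_w1(slow, fast))  # approximately 6/64 = 0.09375
```

The two test signals concentrate their energy at bins 4 and 10, so the distance equals the frequency displacement, 6/64 cycles per sample, which is the "displacement of energy across frequencies" reading given in the abstract.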

arXiv:1909.07279

Band-Limited Gaussian Processes: The Sinc Kernel

Authors: Felipe Tobar

Abstract: We propose a novel class of Gaussian processes (GPs) whose spectra have compact support, meaning that their sample trajectories are almost-surely band limited. As a complement to the growing literature on spectral design of covariance kernels, the core of our proposal is to model power spectral densities through a rectangular function, which results in a kernel based on the sinc function with straightforward extensions to non-centred (around zero frequency) and frequency-varying cases. In addition to its use in regression, the relationship between the sinc kernel and the classic theory is illuminated; in particular, the Shannon-Nyquist theorem is interpreted as posterior reconstruction under the proposed kernel. Additionally, we show that the sinc kernel is instrumental in two fundamental signal processing applications: first, in stereo amplitude modulation, where the non-centred sinc kernel arises naturally; second, in band-pass filtering, where the proposed kernel allows for a Bayesian treatment that is robust to observation noise and missing data. The developed theory is complemented with illustrative graphic examples and validated experimentally using real-world data.

Submitted 16 September, 2019; originally announced September 2019.

Comments: To appear at NeurIPS 2019
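
Illustrative note on the abstract above: a power spectral density made of two symmetric rectangles of width Δ centred at ±ξ0 has, as its inverse Fourier transform, a sinc function modulated by a cosine. The sketch below evaluates that covariance. The parameter names (`variance`, `bandwidth`, `centre_freq`) and the normalisation are assumptions chosen for illustration; the paper's exact parametrisation may differ.

```python
import math


def sinc(t):
    """Normalised sinc, sin(pi t) / (pi t), with sinc(0) = 1."""
    if t == 0.0:
        return 1.0
    return math.sin(math.pi * t) / (math.pi * t)


def sinc_kernel(tau, variance=1.0, bandwidth=1.0, centre_freq=0.0):
    """Covariance at lag tau for a PSD that is flat on two bands of width
    `bandwidth` centred at +/- `centre_freq` and zero elsewhere."""
    return variance * sinc(bandwidth * tau) * math.cos(2.0 * math.pi * centre_freq * tau)


# Centred case: the covariance vanishes at lags that are multiples of 1 / bandwidth.
print(sinc_kernel(0.0, variance=2.0))   # 2.0
print(sinc_kernel(2.0, bandwidth=0.5))  # numerically zero
```

Setting `centre_freq=0.0` gives the centred (low-pass) case, while a non-zero value shifts the band away from zero frequency, which corresponds to the non-centred extension mentioned in the abstract.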

arXiv:1906.09665

doi 10.1016/j.neunet.2019.06.012

Compositionally-Warped Gaussian Processes

Authors: Gonzalo Rios, Felipe Tobar

Abstract: The Gaussian process (GP) is a nonparametric prior distribution over functions indexed by time, space, or other high-dimensional index set. The GP is a flexible model yet its limitation is given by its very nature: it can only model Gaussian marginal distributions. To model non-Gaussian data, a GP can be warped by a nonlinear transformation (or warping) as performed by warped GPs (WGPs) and more computationally-demanding alternatives such as Bayesian WGPs and deep GPs. However, the WGP requires a numerical approximation of the inverse warping for prediction, which increases the computational complexity in practice. To sidestep this issue, we construct a novel class of warpings consisting of compositions of multiple elementary functions, for which the inverse is known explicitly. We then propose the compositionally-warped GP (CWGP), a non-Gaussian generative model whose expressiveness follows from its deep compositional architecture, and its computational efficiency is guaranteed by the analytical inverse warping. Experimental validation using synthetic and real-world datasets confirms that the proposed CWGP is robust to the choice of warpings and provides more accurate point predictions, better trained models and shorter computation times than WGP.

Submitted 12 July, 2019; v1 submitted 23 June, 2019; originally announced June 2019.

Comments: Accepted at Elsevier Neural Networks, DOI added and author order corrected

arXiv:1902.03427

Low-pass filtering as Bayesian inference

Authors: Cristobal Valenzuela, Felipe Tobar

Abstract: We propose a Bayesian nonparametric method for low-pass filtering that can naturally handle unevenly-sampled and noise-corrupted observations. The proposed model is constructed as a latent-factor model for time series, where the latent factors are Gaussian processes with non-overlapping spectra. With this construction, the low-pass version of the time series can be identified as the low-frequency latent component, and therefore it can be found by means of Bayesian inference. We show that the model admits exact training and can be implemented with minimal numerical approximations. Finally, the proposed model is validated against standard linear filters on synthetic and real-world time series.

Submitted 9 February, 2019; originally announced February 2019.

Comments: Accepted at ICASSP 2019

arXiv:1809.02196

Bayesian Nonparametric Spectral Estimation

Authors: Felipe Tobar

Abstract: Spectral estimation (SE) aims to identify how the energy of a signal (e.g., a time series) is distributed across different frequencies. This can become particularly challenging when only partial and noisy observations of the signal are available, where current methods fail to handle uncertainty appropriately. In this context, we propose a joint probabilistic model for signals, observations and spectra, where SE is addressed as an exact inference problem. Assuming a Gaussian process prior over the signal, we apply Bayes' rule to find the analytic posterior distribution of the spectrum given a set of observations. Besides its expressiveness and natural account of spectral uncertainty, the proposed model also provides a functional-form representation of the power spectral density, which can be optimised efficiently. Comparison with previous approaches, in particular against Lomb-Scargle, is addressed theoretically and also experimentally in three different scenarios. Code and demo available at https://github.com/GAMES-UChile/BayesianSpectralEstimation.

Submitted 12 January, 2019; v1 submitted 6 September, 2018; originally announced September 2018.

Comments: 11 pages. In Advances in Neural Information Processing Systems, 2018

arXiv:1805.10833

Bayesian Learning with Wasserstein Barycenters

Authors: Julio Backhoff-Veraguas, Joaquin Fontbona, Gonzalo Rios, Felipe Tobar

Abstract: We introduce and study a novel model-selection strategy for Bayesian learning, based on optimal transport, along with its associated predictive posterior law: the Wasserstein population barycenter of the posterior law over models. We first show how this estimator, termed Bayesian Wasserstein barycenter (BWB), arises naturally in a general, parameter-free Bayesian model-selection framework, when the considered Bayesian risk is the Wasserstein distance. Examples are given, illustrating how the BWB extends some classic parametric and non-parametric selection strategies. Furthermore, we also provide explicit conditions granting the existence and statistical consistency of the BWB, and discuss some of its general and specific properties, providing insights into its advantages compared to usual choices, such as the model average estimator. Finally, we illustrate how this estimator can be computed using the stochastic gradient descent (SGD) algorithm in Wasserstein space introduced in a companion paper arXiv:2201.04232v2 [math.OC], and provide a numerical example for experimental validation of the proposed method.

Submitted 9 November, 2022; v1 submitted 28 May, 2018; originally announced May 2018.

Comments: This is the final version, accepted for publication in ESAIM-P&S. Results on Bayesian consistency in Wasserstein topology and on convergence of the BWB estimator have been extended and improved. A series of new examples have been added

MSC Class: 62F15; 62C10; 68Q32; 68T05 ACM Class: G.3; I.2.6

arXiv:1803.07102 [pdf, other]

Learning non-Gaussian Time Series using the Box-Cox Gaussian Process

Authors: Gonzalo Rios, Felipe Tobar

Abstract: Gaussian processes (GPs) are Bayesian nonparametric generative models that provide interpretability of hyperparameters, admit closed-form expressions for training and inference, and are able to accurately represent uncertainty. To model general non-Gaussian data with complex correlation structure, GPs can be paired with an expressive covariance kernel and then fed into a nonlinear transformation (or warping). However, overparametrising the kernel and the warping is known to, respectively, hinder gradient-based training and make the predictions computationally expensive. We remedy this issue by (i) training the model using derivative-free global-optimisation techniques so as to find meaningful maxima of the model likelihood, and (ii) proposing a warping function based on the celebrated Box-Cox transformation that requires minimal numerical approximations---unlike existing warped GP models. We validate the proposed approach by first showing that predictions can be computed analytically, and then on a learning, reconstruction and forecasting experiment using real-world datasets.

Submitted 19 March, 2018; originally announced March 2018.

Comments: Accepted at IEEE IJCNN

arXiv:1709.01298 [pdf, other]

Spectral Mixture Kernels for Multi-Output Gaussian Processes

Authors: Gabriel Parra, Felipe Tobar

Abstract: Early approaches to multiple-output Gaussian processes (MOGPs) relied on linear combinations of independent, latent, single-output Gaussian processes (GPs). This resulted in cross-covariance functions with limited parametric interpretation, thus conflicting with the ability of single-output GPs to understand lengthscales, frequencies and magnitudes to name a few. On the contrary, current approaches to MOGP are able to better interpret the relationship between different channels by directly modelling the cross-covariances as a spectral mixture kernel with a phase shift. We extend this rationale and propose a parametric family of complex-valued cross-spectral densities and then build on Cramér's Theorem (the multivariate version of Bochner's Theorem) to provide a principled approach to design multivariate covariance functions. The so-constructed kernels are able to model delays among channels in addition to phase differences and are thus more expressive than previous methods, while also providing full parametric interpretation of the relationship across channels. The proposed method is first validated on synthetic data and then compared to existing MOGP methods on two real-world examples.

Submitted 3 November, 2017; v1 submitted 5 September, 2017; originally announced September 2017.

Comments: To appear in Advances in Neural Information Processing Systems 31 (NIPS 2017)

arXiv:1707.05909 [pdf, other]

doi 10.1109/LSP.2016.2637312

Recovering Latent Signals from a Mixture of Measurements using a Gaussian Process Prior

Authors: Felipe Tobar, Gonzalo Rios, Tomás Valdivia, Pablo Guerrero

Abstract: In sensing applications, sensors cannot always measure the latent quantity of interest at the required resolution; sometimes they can only acquire a blurred version of it due to the sensor's transfer function. To recover latent signals when only noisy mixed measurements of the signal are available, we propose the Gaussian process mixture of measurements (GPMM), which models the latent signal as a Gaussian process (GP) and allows us to perform Bayesian inference on such signal conditional to a set of noisy mixture of measurements. We describe how to train GPMM, that is, to find the hyperparameters of the GP and the mixing weights, and how to perform inference on the latent signal under GPMM; additionally, we identify the solution to the underdetermined linear system resulting from a sensing application as a particular case of GPMM. The proposed model is validated in the recovery of three signals: a smooth synthetic signal, a real-world heart-rate time series and a step function, where GPMM outperformed the standard GP in terms of estimation error, uncertainty representation and recovery of the spectral content of the latent signal.

Submitted 18 July, 2017; originally announced July 2017.

Comments: Published in IEEE Signal Processing Letters in Dec. 2016

Journal ref: IEEE Signal Processing Letters, vol. 24, no. 2, pp. 231-235, Feb. 2017

arXiv:1707.04236 [pdf, other]

Improving Sparsity in Kernel Adaptive Filters Using a Unit-Norm Dictionary

Authors: Felipe Tobar

Abstract: Kernel adaptive filters, a class of adaptive nonlinear time-series models, are known for their ability to learn expressive autoregressive patterns from sequential data. However, for trivial monotonic signals, they struggle to perform accurate predictions and at the same time keep computational complexity within desired boundaries. This is because new observations are incorporated into the dictionary when they are far from what the algorithm has seen in the past. We propose a novel approach to kernel adaptive filtering that compares new observations against dictionary samples in terms of their unit-norm (normalised) versions, meaning that new observations that look like previous samples but have a different magnitude are not added to the dictionary. We achieve this by proposing the unit-norm Gaussian kernel and defining a sparsification criterion for this novel kernel. This new methodology is validated on two real-world datasets against standard KAF in terms of the normalised mean square error and the dictionary size.

Submitted 13 July, 2017; originally announced July 2017.

Comments: Accepted at the IEEE Digital Signal Processing conference 2017

arXiv:1707.03450 [pdf, other]

Initialising Kernel Adaptive Filters via Probabilistic Inference

Authors: Iván Castro, Cristóbal Silva, Felipe Tobar

Abstract: We present a probabilistic framework for both (i) determining the initial settings of kernel adaptive filters (KAFs) and (ii) constructing fully-adaptive KAFs whereby in addition to weights and dictionaries, kernel parameters are learnt sequentially. This is achieved by formulating the estimator as a probabilistic model and defining dedicated prior distributions over the kernel parameters, weights and dictionary, enforcing desired properties such as sparsity. The model can then be trained using a subset of data to initialise standard KAFs or updated sequentially each time a new observation becomes available. Due to the nonlinear/non-Gaussian properties of the model, learning and inference are achieved using gradient-based maximum-a-posteriori optimisation and Markov chain Monte Carlo methods, and can be confidently used to compute predictions. The proposed framework was validated on nonlinear time series of both synthetic and real-world nature, where it outperformed standard KAFs in terms of mean square error and the sparsity of the learnt dictionaries.

Submitted 11 July, 2017; originally announced July 2017.

Showing 1–23 of 23 results for author: Tobar, F