Reparameterization invariance in approximate Bayesian inference

Authors: Hrittik Roy, Marco Miani, Carl Henrik Ek, Philipp Hennig, Marvin Pförtner, Lukas Tatzel, Søren Hauberg

Abstract: Current approximate posteriors in Bayesian neural networks (BNNs) exhibit a crucial limitation: they fail to maintain invariance under reparameterization, i.e. BNNs assign different posterior densities to different parametrizations of identical functions. This creates a fundamental flaw in the application of Bayesian principles as it breaks the correspondence between uncertainty over the parameters with uncertainty over the parametrized function. In this paper, we investigate this issue in the context of the increasingly popular linearized Laplace approximation. Specifically, it has been observed that linearized predictives alleviate the common underfitting problems of the Laplace approximation. We develop a new geometric view of reparametrizations from which we explain the success of linearization. Moreover, we demonstrate that these reparameterization invariance properties can be extended to the original neural network predictive using a Riemannian diffusion process giving a straightforward algorithm for approximate posterior sampling, which empirically improves posterior fit.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2212.10010 [pdf, other]

Identifying latent distances with Finslerian geometry

Authors: Alison Pouplin, David Eklund, Carl Henrik Ek, Søren Hauberg

Abstract: Riemannian geometry provides us with powerful tools to explore the latent space of generative models while preserving the underlying structure of the data. The latent space can be equipped with a Riemannian metric, pulled back from the data manifold. With this metric, we can systematically navigate the space relying on geodesics defined as the shortest curves between two points. Generative models are often stochastic, causing the data space, the Riemannian metric, and the geodesics to be stochastic as well. Stochastic objects are at best impractical, and at worst impossible, to manipulate. A common solution is to approximate the stochastic pullback metric by its expectation. But the geodesics derived from this expected Riemannian metric do not correspond to the expected length-minimising curves. In this work, we propose another metric whose geodesics explicitly minimise the expected length of the pullback metric. We show this metric defines a Finsler metric, and we compare it with the expected Riemannian metric. In high dimensions, we prove that both metrics converge to each other at a rate of $O\left(\frac{1}{D}\right)$. This convergence implies that the established expected Riemannian metric is an accurate approximation of the theoretically more grounded Finsler metric. This provides justification for using the expected Riemannian metric for practical implementations.

Submitted 11 October, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

Comments: 36 pages, 12 figures, accepted at TMLR (October 2023)

arXiv:2211.16367 [pdf, other]

A locally time-invariant metric for climate model ensemble predictions of extreme risk

Authors: Mala Virdee, Markus Kaiser, Emily Shuckburgh, Carl Henrik Ek, Ieva Kazlauskaite

Abstract: Adaptation-relevant predictions of climate change are often derived by combining climate model simulations in a multi-model ensemble. Model evaluation methods used in performance-based ensemble weighting schemes have limitations in the context of high-impact extreme events. We introduce a locally time-invariant method for evaluating climate model simulations with a focus on assessing the simulation of extremes. We explore the behaviour of the proposed method in predicting extreme heat days in Nairobi and provide comparative results for eight additional cities.

Submitted 18 April, 2023; v1 submitted 26 November, 2022; originally announced November 2022.

arXiv:2110.15761 [pdf, other]

Aligned Multi-Task Gaussian Process

Authors: Olga Mikheeva, Ieva Kazlauskaite, Adam Hartshorne, Hedvig Kjellström, Carl Henrik Ek, Neill D. F. Campbell

Abstract: Multi-task learning requires accurate identification of the correlations between tasks. In real-world time-series, tasks are rarely perfectly temporally aligned; traditional multi-task models do not account for this and subsequent errors in correlation estimation will result in poor predictive performance and uncertainty quantification. We introduce a method that automatically accounts for temporal misalignment in a unified generative model that improves predictive performance. Our method uses Gaussian processes (GPs) to model the correlations both within and between the tasks. Building on the previous work by Kazlauskaite et al. [2019], we include a separate monotonic warp of the input data to model temporal misalignment. In contrast to previous work, we formulate a lower bound that accounts for uncertainty in both the estimates of the warping process and the underlying functions. Also, our new take on a monotonic stochastic process, with efficient path-wise sampling for the warp functions, allows us to perform full Bayesian inference in the model rather than MAP estimates. Missing data experiments, on synthetic and real time-series, demonstrate the advantages of accounting for misalignments (vs the standard unaligned method) as well as modelling the uncertainty in the warping process (vs the baseline MAP alignment approach).

Submitted 29 October, 2021; originally announced October 2021.

arXiv:2105.04504 [pdf, other]

Deep Neural Networks as Point Estimates for Deep Gaussian Processes

Authors: Vincent Dutordoir, James Hensman, Mark van der Wilk, Carl Henrik Ek, Zoubin Ghahramani, Nicolas Durrande

Abstract: Neural networks and Gaussian processes are complementary in their strengths and weaknesses. Having a better understanding of their relationship comes with the promise to make each method benefit from the strengths of the other. In this work, we establish an equivalence between the forward passes of neural networks and (deep) sparse Gaussian process models. The theory we develop is based on interpreting activation functions as interdomain inducing features through a rigorous analysis of the interplay between activation functions and kernels. This results in models that can either be seen as neural networks with improved uncertainty prediction or deep Gaussian processes with increased prediction accuracy. These claims are supported by experimental results on regression and classification datasets.

Submitted 9 December, 2021; v1 submitted 10 May, 2021; originally announced May 2021.

Comments: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

arXiv:2010.13632 [pdf, other]

Black-box density function estimation using recursive partitioning

Authors: Erik Bodin, Zhenwen Dai, Neill D. F. Campbell, Carl Henrik Ek

Abstract: We present a novel approach to Bayesian inference and general Bayesian computation that is defined through a sequential decision loop. Our method defines a recursive partitioning of the sample space. It neither relies on gradients nor requires any problem-specific tuning, and is asymptotically exact for any density function with a bounded domain. The output is an approximation to the whole density function including the normalisation constant, via partitions organised in efficient data structures. Such approximations may be used for evidence estimation or fast posterior sampling, but also as building blocks to treat a larger class of estimation problems. The algorithm shows competitive performance to recent state-of-the-art methods on synthetic and real-world problems including parameter inference for gravitational-wave physics.

Submitted 8 June, 2021; v1 submitted 26 October, 2020; originally announced October 2020.

Comments: International Conference on Machine Learning (ICML) 2021

arXiv:2001.09886 [pdf, other]

Bayesian nonparametric shared multi-sequence time series segmentation

Authors: Olga Mikheeva, Ieva Kazlauskaite, Hedvig Kjellström, Carl Henrik Ek

Abstract: In this paper, we introduce a method for segmenting time series data using tools from Bayesian nonparametrics. We consider the task of temporal segmentation of a set of time series data into representative stationary segments. We use Gaussian process (GP) priors to impose our knowledge about the characteristics of the underlying stationary segments, and use a nonparametric distribution to partition the sequences into such segments, formulated in terms of a prior distribution on segment length. Given the segmentation, the model can be viewed as a variant of a Gaussian mixture model where the mixture components are described using the covariance function of a GP. We demonstrate the effectiveness of our model on synthetic data as well as on real time-series data of heartbeats where the task is to segment the indicative types of beats and to classify the heartbeat recordings into classes that correspond to healthy and abnormal heart sounds.

Submitted 27 January, 2020; originally announced January 2020.

arXiv:1909.07698 [pdf, other]

Compositional uncertainty in deep Gaussian processes

Authors: Ivan Ustyuzhaninov, Ieva Kazlauskaite, Markus Kaiser, Erik Bodin, Neill D. F. Campbell, Carl Henrik Ek

Abstract: Gaussian processes (GPs) are nonparametric priors over functions. Fitting a GP implies computing a posterior distribution of functions consistent with the observed data. Similarly, deep Gaussian processes (DGPs) should allow us to compute a posterior distribution of compositions of multiple functions giving rise to the observations. However, exact Bayesian inference is intractable for DGPs, motivating the use of various approximations. We show that the application of simplifying mean-field assumptions across the hierarchy leads to the layers of a DGP collapsing to near-deterministic transformations. We argue that such an inference scheme is suboptimal, not taking advantage of the potential of the model to discover the compositional structure in the data. To address this issue, we examine alternative variational inference schemes allowing for dependencies across different layers and discuss their advantages and limitations.

Submitted 25 February, 2020; v1 submitted 17 September, 2019; originally announced September 2019.

Comments: 17 pages

arXiv:1907.04902 [pdf, other]

Interpretable Dynamics Models for Data-Efficient Reinforcement Learning

Authors: Markus Kaiser, Clemens Otte, Thomas Runkler, Carl Henrik Ek

Abstract: In this paper, we present a Bayesian view on model-based reinforcement learning. We use expert knowledge to impose structure on the transition model and present an efficient learning scheme based on variational inference. This scheme is applied to a heteroskedastic and bimodal benchmark problem on which we compare our results to NFQ and show how our approach yields human-interpretable insight about the underlying dynamics while also increasing data-efficiency.

Submitted 10 July, 2019; originally announced July 2019.

Comments: ESANN 2019 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Bruges (Belgium), 24-26 April 2019, i6doc.com publ., ISBN 978-287-587-065-0

arXiv:1906.11152 [pdf, other]

Modulating Surrogates for Bayesian Optimization

Authors: Erik Bodin, Markus Kaiser, Ieva Kazlauskaite, Zhenwen Dai, Neill D. F. Campbell, Carl Henrik Ek

Abstract: Bayesian optimization (BO) methods often rely on the assumption that the objective function is well-behaved, but in practice, this is seldom true for real-world objectives even if noise-free observations can be collected. Common approaches, which try to model the objective as precisely as possible, often fail to make progress by spending too many evaluations modeling irrelevant details. We address this issue by proposing surrogate models that focus on the well-behaved structure in the objective function, which is informative for search, while ignoring detrimental structure that is challenging to model from few observations. First, we demonstrate that surrogate models with appropriate noise distributions can absorb challenging structures in the objective function by treating them as irreducible uncertainty. Secondly, we show that a latent Gaussian process is an excellent surrogate for this purpose, comparing with Gaussian processes with standard noise distributions. We perform numerous experiments on a range of BO benchmarks and find that our approach improves reliability and performance when faced with challenging objective functions.

Submitted 8 September, 2020; v1 submitted 26 June, 2019; originally announced June 2019.

Journal ref: 37th International Conference On Machine Learning (ICML 2020)

arXiv:1905.12930 [pdf, other]

Monotonic Gaussian Process Flow

Authors: Ivan Ustyuzhaninov, Ieva Kazlauskaite, Carl Henrik Ek, Neill D. F. Campbell

Abstract: We propose a new framework for imposing monotonicity constraints in a Bayesian nonparametric setting based on numerical solutions of stochastic differential equations. We derive a nonparametric model of monotonic functions that allows for interpretable priors and principled quantification of hierarchical uncertainty. We demonstrate the efficacy of the proposed model by providing competitive results to other probabilistic monotonic models on a number of benchmark functions. In addition, we consider the utility of a monotonic random process as a part of a hierarchical probabilistic model; we examine the task of temporal alignment of time-series data where it is beneficial to use a monotonic random process in order to preserve the uncertainty in the temporal warpings.

Submitted 25 February, 2020; v1 submitted 30 May, 2019; originally announced May 2019.

Comments: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS) 2020 (14 pages)

arXiv:1901.10673 [pdf, other]

Invariant Feature Mappings for Generalizing Affordance Understanding Using Regularized Metric Learning

Authors: Martin Hjelm, Carl Henrik Ek, Renaud Detry, Danica Kragic

Abstract: This paper presents an approach for learning invariant features for object affordance understanding. One of the major problems for a robotic agent acquiring a deeper understanding of affordances is finding sensory-grounded semantics. Being able to understand what in the representation of an object makes the object afford an action opens up for more efficient manipulation, interchange of objects that visually might not be similar, transfer learning, and robot to human communication. Our approach uses a metric learning algorithm that learns a feature transform that encourages objects that afford the same action to be close in the feature space. We regularize the learning, such that we penalize irrelevant features, allowing the agent to link what in the sensory input caused the object to afford the action. From this, we show how the agent can abstract the affordance and reason about the similarity between different affordances.

Submitted 29 January, 2019; originally announced January 2019.

arXiv:1812.05477 [pdf, other]

Gaussian Process Deep Belief Networks: A Smooth Generative Model of Shape with Uncertainty Propagation

Authors: Alessandro Di Martino, Erik Bodin, Carl Henrik Ek, Neill D. F. Campbell

Abstract: The shape of an object is an important characteristic for many vision problems such as segmentation, detection and tracking. Being independent of appearance, it is possible to generalize to a large range of objects from only small amounts of data. However, shapes represented as silhouette images are challenging to model due to complicated likelihood functions leading to intractable posteriors. In this paper we present a generative model of shapes which provides a low dimensional latent encoding which importantly resides on a smooth manifold with respect to the silhouette images. The proposed model propagates uncertainty in a principled manner allowing it to learn from small amounts of data and providing predictions with associated uncertainty. We provide experiments that show how our proposed model provides favorable quantitative results compared with the state-of-the-art while simultaneously providing a representation that resides on a low-dimensional interpretable manifold.

Submitted 13 December, 2018; originally announced December 2018.

arXiv:1811.10689 [pdf, other]

Sequence Alignment with Dirichlet Process Mixtures

Authors: Ieva Kazlauskaite, Ivan Ustyuzhaninov, Carl Henrik Ek, Neill D. F. Campbell

Abstract: We present a probabilistic model for unsupervised alignment of high-dimensional time-warped sequences based on the Dirichlet Process Mixture Model (DPMM). We follow the approach introduced in (Kazlauskaite, 2018) of simultaneously representing each data sequence as a composition of a true underlying function and a time-warping, both of which are modelled using Gaussian processes (GPs) (Rasmussen, 2005), and aligning the underlying functions using an unsupervised alignment method. In (Kazlauskaite, 2018) the alignment is performed using the GP latent variable model (GP-LVM) (Lawrence, 2005) as a model of sequences, while our main contribution is extending this approach to using DPMM, which allows us to align the sequences temporally and cluster them at the same time. We show that the DPMM achieves competitive results in comparison to the GP-LVM on synthetic and real-world data sets, and discuss the different properties of the estimated underlying functions and the time-warps favoured by these models.

Submitted 26 November, 2018; originally announced November 2018.

Comments: 6 pages, 3 figures, "All Of Bayesian Nonparametrics" Workshop at the 32nd Annual Conference on Neural Information Processing Systems (BNP@NeurIPS2018)

arXiv:1810.07158 [pdf, other]

Data Association with Gaussian Processes

Authors: Markus Kaiser, Clemens Otte, Thomas Runkler, Carl Henrik Ek

Abstract: The data association problem is concerned with separating data coming from different generating processes, for example when data come from different data sources, contain significant noise, or exhibit multimodality. We present a fully Bayesian approach to this problem. Our model is capable of simultaneously solving the data association problem and the induced supervised learning problems. Underpinning our approach is the use of Gaussian process priors to encode the structure of both the data and the data associations. We present an efficient learning scheme based on doubly stochastic variational inference and discuss how it can be applied to deep Gaussian process priors.

Submitted 5 May, 2019; v1 submitted 16 October, 2018; originally announced October 2018.

arXiv:1807.04833 [pdf, ps, other]

DP-GP-LVM: A Bayesian Non-Parametric Model for Learning Multivariate Dependency Structures

Authors: Andrew R. Lawrence, Carl Henrik Ek, Neill D. F. Campbell

Abstract: We present a non-parametric Bayesian latent variable model capable of learning dependency structures across dimensions in a multivariate setting. Our approach is based on flexible Gaussian process priors for the generative mappings and interchangeable Dirichlet process priors to learn the structure. The introduction of the Dirichlet process as a specific structural prior allows our model to circumvent issues associated with previous Gaussian process latent variable models. Inference is performed by deriving an efficient variational bound on the marginal log-likelihood of the model.

Submitted 12 July, 2018; originally announced July 2018.

arXiv:1803.02603 [pdf, other]

Gaussian Process Latent Variable Alignment Learning

Authors: Ieva Kazlauskaite, Carl Henrik Ek, Neill D. F. Campbell

Abstract: We present a model that can automatically learn alignments between high-dimensional data in an unsupervised manner. Our proposed method casts alignment learning in a framework where both alignment and data are modelled simultaneously. Further, we automatically infer groupings of different types of sequences within the same dataset. We derive a probabilistic model built on non-parametric priors that allows for flexible warps while at the same time providing means to specify interpretable constraints. We demonstrate the efficacy of our approach with superior quantitative performance to the state-of-the-art approaches and provide examples to illustrate the versatility of our model in automatic inference of sequence groupings, absent from previous approaches, as well as easy specification of high level priors for different modalities of data.

Submitted 1 March, 2019; v1 submitted 7 March, 2018; originally announced March 2018.

Comments: Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS) 2019 (13 pages, 11 figures)

arXiv:1802.04642 [pdf, other]

doi 10.1109/IROS.2016.7759112

Active Exploration Using Gaussian Random Fields and Gaussian Process Implicit Surfaces

Authors: Sergio Caccamo, Yasemin Bekiroglu, Carl Henrik Ek, Danica Kragic

Abstract: In this work we study the problem of exploring surfaces and building compact 3D representations of the environment surrounding a robot through active perception. We propose an online probabilistic framework that merges visual and tactile measurements using Gaussian Random Field and Gaussian Process Implicit Surfaces. The system investigates incomplete point clouds in order to find a small set of regions of interest which are then physically explored with a robotic arm equipped with tactile sensors. We show experimental results obtained using a PrimeSense camera, a Kinova Jaco2 robotic arm and Optoforce sensors on different scenarios. We then demonstrate how to use the online framework for object detection and terrain classification.

Submitted 13 February, 2018; originally announced February 2018.

Comments: 8 pages, 6 figures, external contents (https://youtu.be/0-UlFRQT0JI)

Journal ref: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:1712.06536 [pdf, other]

Nonparametric Inference for Auto-Encoding Variational Bayes

Authors: Erik Bodin, Iman Malik, Carl Henrik Ek, Neill D. F. Campbell

Abstract: We would like to learn latent representations that are low-dimensional and highly interpretable. A model that has these characteristics is the Gaussian Process Latent Variable Model. The benefits and drawbacks of the GP-LVM are complementary to those of the Variational Autoencoder: the former provides interpretable low-dimensional latent representations while the latter is able to handle large amounts of data and can use non-Gaussian likelihoods. Our inspiration for this paper is to marry these two approaches and reap the benefits of both. In order to do so we will introduce a novel approximate inference scheme inspired by the GP-LVM and the VAE. We show experimentally that the approximation allows the capacity of the generative bottle-neck (Z) of the VAE to be arbitrarily large without losing a highly interpretable representation, allowing reconstruction quality to be unlimited by Z at the same time as a low-dimensional space can be used to perform ancestral sampling from as well as a means to reason about the embedded data.

Submitted 18 December, 2017; originally announced December 2017.

Comments: Presented at NIPS 2017 Workshop on Advances in Approximate Bayesian Inference

arXiv:1710.02766 [pdf, other]

Bayesian Alignments of Warped Multi-Output Gaussian Processes

Authors: Markus Kaiser, Clemens Otte, Thomas Runkler, Carl Henrik Ek

Abstract: We propose a novel Bayesian approach to modelling nonlinear alignments of time series based on latent shared information. We apply the method to the real-world problem of finding common structure in the sensor data of wind turbines introduced by the underlying latent and turbulent wind field. The proposed model allows for both arbitrary alignments of the inputs and non-parametric output warpings to transform the observations. This gives rise to multiple deep Gaussian process models connected via latent generating processes. We present an efficient variational approximation based on nested variational compression and show how the model can be used to extract shared information between dependent time series, recovering an interpretable functional decomposition of the learning problem. We show results for an artificial data set and real-world data of two wind turbines.

Submitted 23 May, 2018; v1 submitted 7 October, 2017; originally announced October 2017.

arXiv:1708.03535 [pdf, other]

Neural Translation of Musical Style

Authors: Iman Malik, Carl Henrik Ek

Abstract: Music is an expressive form of communication often used to convey emotion in scenarios where "words are not enough". Part of this information lies in the musical composition where well-defined language exists. However, a significant amount of information is added during a performance as the musician interprets the composition. The performer injects expressiveness into the written score through variations of different musical properties such as dynamics and tempo. In this paper, we describe a model that can learn to perform sheet music. Our research concludes that the generated performances are indistinguishable from a human performance, thereby passing a test in the spirit of a "musical Turing test".

Submitted 11 August, 2017; originally announced August 2017.

arXiv:1707.05534 [pdf, other]

Latent Gaussian Process Regression

Authors: Erik Bodin, Neill D. F. Campbell, Carl Henrik Ek

Abstract: We introduce Latent Gaussian Process Regression, which is a latent variable extension allowing modelling of non-stationary multi-modal processes using GPs. The approach is built on extending the input space of a regression problem with a latent variable that is used to modulate the covariance function over the training data. We show how our approach can be used to model multi-modal and non-stationary processes. We exemplify the approach on a set of synthetic data and provide results on real data from motion capture and geostatistics.

Submitted 16 September, 2017; v1 submitted 18 July, 2017; originally announced July 2017.

arXiv:1701.03449 [pdf, other]

Manifold Alignment Determination: finding correspondences across different data views

Authors: Andreas Damianou, Neil D. Lawrence, Carl Henrik Ek

Abstract: We present Manifold Alignment Determination (MAD), an algorithm for learning alignments between data points from multiple views or modalities. The approach is capable of learning correspondences between views as well as correspondences between individual data points. The proposed method requires only a few aligned examples, from which it is capable of recovering a global alignment through a probabilistic model. The strong, yet flexible regularization provided by the generative model is sufficient to align the views. We provide experiments on both synthetic and real data to highlight the benefit of the proposed approach.

Submitted 12 January, 2017; originally announced January 2017.

Comments: NIPS workshop on Multi-Modal Machine Learning, 2015

MSC Class: 60G15 ACM Class: G.3; G.1.2; I.2.6; I.5.4

arXiv:1607.08206 [pdf, other]

Diagnostic Prediction Using Discomfort Drawings with IBTM

Authors: Cheng Zhang, Hedvig Kjellstrom, Carl Henrik Ek, Bo C. Bertilson

Abstract: In this paper, we explore the possibility of applying machine learning to make diagnostic predictions using discomfort drawings. A discomfort drawing is an intuitive way for patients to express discomfort and pain-related symptoms. These drawings have proven to be an effective method to collect patient data and make diagnostic decisions in real-life practice. A dataset from real-world patient cases is collected for which medical experts provide diagnostic labels. Next, we use a factorized multimodal topic model, Inter-Battery Topic Model (IBTM), to train a system that can make diagnostic predictions given an unseen discomfort drawing. The number of output diagnostic labels is determined by using mean-shift clustering on the discomfort drawing. Experimental results show reasonable predictions of diagnostic labels given an unseen discomfort drawing. Additionally, we generate synthetic discomfort drawings with IBTM given a diagnostic label, which results in typical cases of symptoms. The positive result indicates a significant potential of machine learning to be used for parts of the pain diagnostic process and to be a decision support system for physicians and other health care personnel.

Submitted 13 September, 2016; v1 submitted 27 July, 2016; originally announced July 2016.

Comments: Presented at 2016 Machine Learning and Healthcare Conference (MLHC 2016), Los Angeles, CA

arXiv:1607.00067 [pdf, other]

Unsupervised Learning with Imbalanced Data via Structure Consolidation Latent Variable Model

Authors: Fariba Yousefi, Zhenwen Dai, Carl Henrik Ek, Neil Lawrence

Abstract: Unsupervised learning on imbalanced data is challenging because, when given imbalanced data, current models are often dominated by the major category and ignore the categories with small amounts of data. We develop a latent variable model that can cope with imbalanced data by dividing the latent space into a shared space and a private space. Based on Gaussian Process Latent Variable Models, we propose a new kernel formulation that enables the separation of the latent space and derive an efficient variational inference method. The performance of our model is demonstrated with an imbalanced medical image dataset.

Submitted 30 June, 2016; originally announced July 2016.

Comments: ICLR 2016 Workshop

arXiv:1605.06155 [pdf, other]

Inter-Battery Topic Representation Learning

Authors: Cheng Zhang, Hedvig Kjellstrom, Carl Henrik Ek

Abstract: In this paper, we present the Inter-Battery Topic Model (IBTM). Our approach extends traditional topic models by learning a factorized latent variable representation. The structured representation leads to a model that marries benefits traditionally associated with a discriminative approach, such as feature selection, with those of a generative model, such as principled regularization and the ability to handle missing data. The factorization is provided by representing data in terms of aligned pairs of observations as different views. This provides means for selecting a representation that separately models topics that exist in both views from the topics that are unique to a single view. This structured consolidation allows for efficient and robust inference and provides a compact and efficient representation. Learning is performed in a Bayesian fashion by maximizing a rigorous bound on the log-likelihood. Firstly, we illustrate the benefits of the model on a synthetic dataset. The model is then evaluated in both uni- and multi-modality settings on two different classification tasks with off-the-shelf convolutional neural network (CNN) features which generate state-of-the-art results with extremely compact representations.

Submitted 28 July, 2016; v1 submitted 19 May, 2016; originally announced May 2016.

Comments: ECCV 2016

arXiv:1604.04939 [pdf, other]

Multi-view Learning as a Nonparametric Nonlinear Inter-Battery Factor Analysis

Authors: Andreas Damianou, Neil D. Lawrence, Carl Henrik Ek

Abstract: Factor analysis aims to determine latent factors, or traits, which summarize a given data set. Inter-battery factor analysis extends this notion to multiple views of the data. In this paper we show how a nonlinear, nonparametric version of these models can be recovered through the Gaussian process latent variable model. This gives us a flexible formalism for multi-view learning where the latent variables can be used both for exploratory purposes and for learning representations that enable efficient inference for ambiguous estimation tasks. Learning is performed in a Bayesian manner through the formulation of a variational compression scheme which gives a rigorous lower bound on the log likelihood. Our Bayesian framework provides strong regularization during training, allowing the structure of the latent space to be determined efficiently and automatically. We demonstrate this by producing the first (to our knowledge) published results of learning from dozens of views, even when data is scarce. We further show experimental results on several different types of multi-view data sets and for different kinds of tasks, including exploratory data analysis, generation, ambiguity modelling through latent priors and classification.

Submitted 17 April, 2016; originally announced April 2016.

Comments: 49 pages including appendix

MSC Class: 60G15 (Primary) 58E30; 62-09 ACM Class: G.3; G.1.2; I.2.6; I.5.4

arXiv:1501.06284 [pdf, other]

On a Family of Decomposable Kernels on Sequences

Authors: Andrea Baisero, Florian T. Pokorny, Carl Henrik Ek

Abstract: In many applications data is naturally presented in terms of orderings of some basic elements or symbols. Reasoning about such data requires a notion of similarity capable of handling sequences of different lengths. In this paper we describe a family of Mercer kernel functions for such sequentially structured data. The family is characterized by a decomposable structure in terms of symbol-level and structure-level similarities, representing a specific combination of kernels which allows for efficient computation. We provide an experimental evaluation on sequential classification tasks comparing kernels from our family of kernels to a state-of-the-art sequence kernel called the Global Alignment kernel, which has been shown to outperform Dynamic Time Warping.

Submitted 26 January, 2015; originally announced January 2015.

arXiv:1411.6509 [pdf, other]

Persistent Evidence of Local Image Properties in Generic ConvNets

Authors: Ali Sharif Razavian, Hossein Azizpour, Atsuto Maki, Josephine Sullivan, Carl Henrik Ek, Stefan Carlsson

Abstract: Supervised training of a convolutional network for object classification should make explicit any information related to the class of objects and disregard any auxiliary information associated with the capture of the image or the variation within the object class. Does this happen in practice? Although this seems to pertain to the very final layers in the network, if we look at earlier layers we find that this is not the case. Surprisingly, strong spatial information is implicit. This paper addresses this, in particular by exploiting the image representation at the first fully connected layer, i.e. the global image descriptor which has recently been shown to be most effective in a range of visual recognition tasks. We empirically demonstrate evidence for the finding in the contexts of four different tasks: 2d landmark detection, 2d object keypoints prediction, estimation of the RGB values of the input image, and recovery of the semantic label of each pixel. We base our investigation on a simple framework with ridge regression, used in common across these tasks, and show results which all support our insight. Such spatial information can be used for computing correspondence of landmarks to a good accuracy, but should potentially be useful for improving the training of the convolutional nets for classification purposes.

Submitted 24 November, 2014; originally announced November 2014.

arXiv:1301.3461 [pdf, ps, other]

Factorized Topic Models

Authors: Cheng Zhang, Carl Henrik Ek, Andreas Damianou, Hedvig Kjellstrom

Abstract: In this paper we present a modification to a latent topic model, which makes the model exploit supervision to produce a factorized representation of the observed data. The structured parameterization separately encodes variance that is shared between classes from variance that is private to each class by the introduction of a new prior over the topic space. The approach allows for more efficient inference and provides an intuitive interpretation of the data in terms of an informative signal together with structured noise. The factorized representation is shown to enhance inference performance for image, text, and video classification.

Submitted 23 April, 2013; v1 submitted 15 January, 2013; originally announced January 2013.

Comments: ICLR 2013

Showing 1–30 of 30 results for author: Ek, C H