Automatic variational inference with cascading flows

Authors: Luca Ambrogioni, Gianluigi Silvestri, Marcel van Gerven

Abstract: The automation of probabilistic reasoning is one of the primary aims of machine learning. Recently, the confluence of variational inference and deep learning has led to powerful and flexible automatic inference methods that can be trained by stochastic gradient descent. In particular, normalizing flows are highly parameterized deep models that can fit arbitrarily complex posterior densities. However, normalizing flows struggle in highly structured probabilistic programs as they need to relearn the forward-pass of the program. Automatic structured variational inference (ASVI) remedies this problem by constructing variational programs that embed the forward-pass. Here, we combine the flexibility of normalizing flows and the prior-embedding property of ASVI in a new family of variational programs, which we named cascading flows. A cascading flows program interposes a newly designed highway flow architecture in between the conditional distributions of the prior program such as to steer it toward the observed data. These programs can be constructed automatically from an input probabilistic program and can also be amortized automatically. We evaluate the performance of the new variational programs in a series of structured inference problems. We find that cascading flows have much higher performance than both normalizing flows and ASVI in a large set of structured inference problems.

Submitted 9 February, 2021; originally announced February 2021.

arXiv:2006.06438 [pdf, other]

GAIT-prop: A biologically plausible learning rule derived from backpropagation of error

Authors: Nasir Ahmad, Marcel A. J. van Gerven, Luca Ambrogioni

Abstract: Traditional backpropagation of error, though a highly successful algorithm for learning in artificial neural network models, includes features which are biologically implausible for learning in real neural circuits. An alternative called target propagation proposes to solve this implausibility by using a top-down model of neural activity to convert an error at the output of a neural network into layer-wise and plausible 'targets' for every unit. These targets can then be used to produce weight updates for network training. However, thus far, target propagation has been heuristically proposed without demonstrable equivalence to backpropagation. Here, we derive an exact correspondence between backpropagation and a modified form of target propagation (GAIT-prop) where the target is a small perturbation of the forward pass. Specifically, backpropagation and GAIT-prop give identical updates when synaptic weight matrices are orthogonal. In a series of simple computer vision experiments, we show near-identical performance between backpropagation and GAIT-prop with a soft orthogonality-inducing regularizer.

Submitted 5 November, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

Comments: 13 pages, 4 figures

arXiv:2004.14545 [pdf, other]

Explainable Deep Learning: A Field Guide for the Uninitiated

Authors: Gabrielle Ras, Ning Xie, Marcel van Gerven, Derek Doran

Abstract: Deep neural networks (DNNs) have become a proven and indispensable machine learning tool. Because a DNN is a black-box model, it remains difficult to diagnose what aspects of the model's input drive its decisions. In countless real-world domains, from legislation and law enforcement to healthcare, such diagnosis is essential to ensure that DNN decisions are driven by aspects appropriate in the context of its use. The development of methods and studies enabling the explanation of a DNN's decisions has thus blossomed into an active, broad area of research. A practitioner wanting to study explainable deep learning may be intimidated by the plethora of orthogonal directions the field has taken. This complexity is further exacerbated by competing definitions of what it means "to explain" the actions of a DNN and to evaluate an approach's "ability to explain". This article offers a field guide to explore the space of explainable deep learning aimed at those uninitiated in the field. The field guide: i) introduces three simple dimensions defining the space of foundational methods that contribute to explainable deep learning, ii) discusses the evaluations for model explanations, iii) places explainability in the context of other related deep learning research areas, and iv) finally elaborates on user-oriented explanation designing and potential future directions on explainable deep learning. We hope the guide is used as an easy-to-digest starting point for those just embarking on research in this field.

Submitted 13 September, 2021; v1 submitted 29 April, 2020; originally announced April 2020.

Comments: Survey paper on Explainable Deep Learning, 70 pages including references, 13 figures, 5 tables

arXiv:2002.00643 [pdf, other]

Automatic structured variational inference

Authors: Luca Ambrogioni, Kate Lin, Emily Fertig, Sharad Vikram, Max Hinne, Dave Moore, Marcel van Gerven

Abstract: Stochastic variational inference offers an attractive option as a default method for differentiable probabilistic programming. However, the performance of the variational approach depends on the choice of an appropriate variational family. Here, we introduce automatic structured variational inference (ASVI), a fully automated method for constructing structured variational families, inspired by the closed-form update in conjugate Bayesian models. These convex-update families incorporate the forward pass of the input probabilistic program and can therefore capture complex statistical dependencies. Convex-update families have the same space and time complexity as the input probabilistic program and are therefore tractable for a very large family of models including both continuous and discrete variables. We validate our automatic variational method on a wide range of low- and high-dimensional inference problems. We find that ASVI provides a clear improvement in performance when compared with other popular approaches such as the mean-field approach and inverse autoregressive flows. We provide an open source implementation of ASVI in TensorFlow Probability.

Submitted 10 February, 2021; v1 submitted 3 February, 2020; originally announced February 2020.

arXiv:2001.10657 [pdf, other]

The Indian Chefs Process

Authors: Patrick Dallaire, Luca Ambrogioni, Ludovic Trottier, Umut Güçlü, Max Hinne, Philippe Giguère, Brahim Chaib-Draa, Marcel van Gerven, Francois Laviolette

Abstract: This paper introduces the Indian Chefs Process (ICP), a Bayesian nonparametric prior on the joint space of infinite directed acyclic graphs (DAGs) and orders that generalizes Indian Buffet Processes. As our construction shows, the proposed distribution relies on a latent Beta Process controlling both the orders and outgoing connection probabilities of the nodes, and yields a probability distribution on sparse infinite graphs. The main advantage of the ICP over previously proposed Bayesian nonparametric priors for DAG structures is its greater flexibility. To the best of our knowledge, the ICP is the first Bayesian nonparametric model supporting every possible DAG. We demonstrate the usefulness of the ICP on learning the structure of deep generative sigmoid networks as well as convolutional neural networks.

Submitted 28 January, 2020; originally announced January 2020.

arXiv:1912.09831 [pdf, other]

Background Hardly Matters: Understanding Personality Attribution in Deep Residual Networks

Authors: Gabriëlle Ras, Ron Dotsch, Luca Ambrogioni, Umut Güçlü, Marcel A. J. van Gerven

Abstract: Perceived personality traits attributed to an individual do not have to correspond to their actual personality traits and may be determined in part by the context in which one encounters a person. These apparent traits determine, to a large extent, how other people will behave towards them. Deep neural networks are increasingly being used to perform automated personality attribution (e.g., job interviews). It is important that we understand the driving factors behind the predictions, in humans and in deep neural networks. This paper explicitly studies the effect of the image background on apparent personality prediction while addressing two important confounds present in existing literature: overlapping data splits and including facial information in the background. Surprisingly, we found no evidence that background information improves model predictions for apparent personality traits. In fact, when background is explicitly added to the input, a decrease in performance was measured across all models.

Submitted 20 December, 2019; originally announced December 2019.

Comments: 10 pages, 4 figures, 2 tables

arXiv:1912.04075 [pdf, other]

Temporal Factorization of 3D Convolutional Kernels

Authors: Gabriëlle Ras, Luca Ambrogioni, Umut Güçlü, Marcel A. J. van Gerven

Abstract: 3D convolutional neural networks are difficult to train because they are parameter-expensive and data-hungry. To solve these problems we propose a simple technique for learning 3D convolutional kernels efficiently requiring less training data. We achieve this by factorizing the 3D kernel along the temporal dimension, reducing the number of parameters and making training from data more efficient. Additionally we introduce a novel dataset called Video-MNIST to demonstrate the performance of our method. Our method significantly outperforms the conventional 3D convolution in the low data regime (1 to 5 videos per class). Finally, our model achieves competitive results in the high data regime (>10 videos per class) using up to 45% fewer parameters.

Submitted 9 December, 2019; originally announced December 2019.

Comments: 8 pages, 3 figures, Proceedings of BNAIC/BENELEARN 2019 conference

Journal ref: Proceedings of the 31st Benelux Conference on Artificial Intelligence (BNAIC 2019) and the 28th Belgian Dutch Conference on Machine Learning (Benelearn 2019), Brussels, Belgium, November 6-8, 2019

arXiv:1911.06722 [pdf, other]

Bayesian nonparametric discontinuity design

Authors: Max Hinne, David Leeftink, Marcel A. J. van Gerven, Luca Ambrogioni

Abstract: Quasi-experimental research designs, such as regression discontinuity and interrupted time series, allow for causal inference in the absence of a randomized controlled trial, at the cost of additional assumptions. In this paper, we provide a framework for discontinuity-based designs using Bayesian model comparison and Gaussian process regression, which we refer to as 'Bayesian nonparametric discontinuity design', or BNDD for short. BNDD addresses the two major shortcomings in most implementations of such designs: overconfidence due to implicit conditioning on the alleged effect, and model misspecification due to reliance on overly simplistic regression models. With the appropriate Gaussian process covariance function, our approach can detect discontinuities of any order, and in spectral features. We demonstrate the usage of BNDD in simulations, and apply the framework to determine the effect of running for political positions on longevity, of an alleged historical phantom border in the Netherlands on Dutch voting behaviour, and of Kundalini Yoga meditation on heart rate.

Submitted 14 December, 2021; v1 submitted 15 November, 2019; originally announced November 2019.

Comments: 15 pages, 6 figures. Parts of this work are published in 'Spectral discontinuity design: Interrupted time series with spectral mixture kernels' in the Machine Learning for Health workshop at NeurIPS 2020

arXiv:1907.04050 [pdf, other]

k-GANs: Ensemble of Generative Models with Semi-Discrete Optimal Transport

Authors: Luca Ambrogioni, Umut Güçlü, Marcel van Gerven

Abstract: Generative adversarial networks (GANs) are the state of the art in generative modeling. Unfortunately, most GAN methods are susceptible to mode collapse, meaning that they tend to capture only a subset of the modes of the true distribution. A possible way of dealing with this problem is to use an ensemble of GANs, where (ideally) each network models a single mode. In this paper, we introduce a principled method for training an ensemble of GANs using semi-discrete optimal transport theory. In our approach, each generative network models the transportation map between a point mass (Dirac measure) and the restriction of the data distribution on a tile of a Voronoi tessellation that is defined by the location of the point masses. We iteratively train the generative networks and the point masses until convergence. The resulting k-GANs algorithm has a strong theoretical connection with the k-medoids algorithm. In our experiments, we show that our ensemble method consistently outperforms baseline GANs.

Submitted 9 July, 2019; originally announced July 2019.

arXiv:1904.00469

Perturbative estimation of stochastic gradients

Authors: Luca Ambrogioni, Marcel A. J. van Gerven

Abstract: In this paper we introduce a family of stochastic gradient estimation techniques based on the perturbative expansion around the mean of the sampling distribution. We characterize the bias and variance of the resulting Taylor-corrected estimators using the Lagrange error formula. Furthermore, we introduce a family of variance reduction techniques that can be applied to other gradient estimators. Finally, we show that these new perturbative methods can be extended to discrete functions using analytic continuation. Using this technique, we derive a new gradient descent method for training stochastic networks with binary weights. In our experiments, we show that the perturbative correction improves the convergence of stochastic variational inference both in the continuous and in the discrete case.

Submitted 15 November, 2019; v1 submitted 31 March, 2019; originally announced April 2019.

Comments: Needs improvements, the experiments are too limited

arXiv:1811.02827 [pdf, other]

Wasserstein variational gradient descent: From semi-discrete optimal transport to ensemble variational inference

Authors: Luca Ambrogioni, Umut Guclu, Marcel van Gerven

Abstract: Particle-based variational inference offers a flexible way of approximating complex posterior distributions with a set of particles. In this paper we introduce a new particle-based variational inference method based on the theory of semi-discrete optimal transport. Instead of minimizing the KL divergence between the posterior and the variational approximation, we minimize a semi-discrete optimal transport divergence. The solution of the resulting optimal transport problem provides both a particle approximation and a set of optimal transportation densities that map each particle to a segment of the posterior distribution. We approximate these transportation densities by minimizing the KL divergence between a truncated distribution and the optimal transport solution. The resulting algorithm can be interpreted as a form of ensemble variational inference where each particle is associated with a local variational approximation.

Submitted 15 May, 2019; v1 submitted 7 November, 2018; originally announced November 2018.

arXiv:1805.11542 [pdf, other]

Forward Amortized Inference for Likelihood-Free Variational Marginalization

Authors: Luca Ambrogioni, Umut Güçlü, Julia Berezutskaya, Eva W. P. van den Borne, Yağmur Güçlütürk, Max Hinne, Eric Maris, Marcel A. J. van Gerven

Abstract: In this paper, we introduce a new form of amortized variational inference by using the forward KL divergence in a joint-contrastive variational loss. The resulting forward amortized variational inference is a likelihood-free method as its gradient can be sampled without bias and without requiring any evaluation of either the model joint distribution or its derivatives. We prove that our new variational loss is optimized by the exact posterior marginals in the fully factorized mean-field approximation, a property that is not shared with the more conventional reverse KL inference. Furthermore, we show that forward amortized inference can be easily marginalized over large families of latent variables in order to obtain a marginalized variational posterior. We consider two examples of variational marginalization. In our first example we train a Bayesian forecaster for predicting a simplified chaotic model of atmospheric convection. In the second example we train an amortized variational approximation of a Bayesian optimal classifier by marginalizing over the model space. The result is a powerful meta-classification network that can solve arbitrary classification problems without further training.

Submitted 29 May, 2018; originally announced May 2018.

Comments: 9 pages, 3 figures

arXiv:1805.11284 [pdf, other]

Wasserstein Variational Inference

Authors: Luca Ambrogioni, Umut Güçlü, Yağmur Güçlütürk, Max Hinne, Eric Maris, Marcel A. J. van Gerven

Abstract: This paper introduces Wasserstein variational inference, a new form of approximate Bayesian inference based on optimal transport theory. Wasserstein variational inference uses a new family of divergences that includes both f-divergences and the Wasserstein distance as special cases. The gradients of the Wasserstein variational loss are obtained by backpropagating through the Sinkhorn iterations. This technique results in a very stable likelihood-free training method that can be used with implicit distributions and probabilistic programs. Using the Wasserstein variational inference framework, we introduce several new forms of autoencoders and test their robustness and performance against existing variational autoencoding techniques.

Submitted 4 June, 2018; v1 submitted 29 May, 2018; originally announced May 2018.

Comments: 8 pages, 1 figure

arXiv:1803.07517 [pdf, other]

Explanation Methods in Deep Learning: Users, Values, Concerns and Challenges

Authors: Gabrielle Ras, Marcel van Gerven, Pim Haselager

Abstract: Issues regarding explainable AI involve four components: users, laws & regulations, explanations and algorithms. Together these components provide a context in which explanation methods can be evaluated regarding their adequacy. The goal of this chapter is to bridge the gap between expert users and lay users. Different kinds of users are identified and their concerns revealed, relevant statements from the General Data Protection Regulation are analyzed in the context of Deep Neural Networks (DNNs), a taxonomy for the classification of existing explanation methods is introduced, and finally, the various classes of explanation methods are analyzed to verify if user concerns are justified. Overall, it is clear that (visual) explanations can be given about various aspects of the influence of the input on the output. However, it is noted that explanation methods or interfaces for lay users are missing and we speculate which criteria these methods / interfaces should satisfy. Finally it is noted that two important concerns are difficult to address with explanation methods: the concern about bias in datasets that leads to biased DNNs, as well as the suspicion about unfair outcomes.

Submitted 29 March, 2018; v1 submitted 20 March, 2018; originally announced March 2018.

Comments: 14 pages, 1 figure, This article will appear as a chapter in Explainable and Interpretable Models in Computer Vision and Machine Learning Springer series on Challenges in Machine Learning

MSC Class: 68-02

arXiv:1802.03488 [pdf, other]

Generalization of an Upper Bound on the Number of Nodes Needed to Achieve Linear Separability

Authors: Marjolein Troost, Katja Seeliger, Marcel van Gerven

Abstract: An important issue in neural network research is how to choose the number of nodes and layers such as to solve a classification problem. We provide new intuitions based on earlier results by An et al. (2015) by deriving an upper bound on the number of nodes in networks with two hidden layers such that linear separability can be achieved. Concretely, we show that if the data can be described in terms of N finite sets and the used activation function f is non-constant, increasing and has a left asymptote, we can derive how many nodes are needed to linearly separate these sets. This will be an upper bound that depends on the structure of the data. This structure can be analyzed using an algorithm. For the leaky rectified linear activation function, we prove separately that under some conditions on the slope, the same number of layers and nodes as for the aforementioned activation functions is sufficient. We empirically validate our claims.

Submitted 9 February, 2018; originally announced February 2018.

Comments: Presented at the 29th Benelux Conference on Artificial Intelligence (BNAIC 2017)

arXiv:1705.07111 [pdf, other]

The Kernel Mixture Network: A Nonparametric Method for Conditional Density Estimation of Continuous Random Variables

Authors: Luca Ambrogioni, Umut Güçlü, Marcel A. J. van Gerven, Eric Maris

Abstract: This paper introduces the kernel mixture network, a new method for nonparametric estimation of conditional probability densities using neural networks. We model arbitrarily complex conditional densities as linear combinations of a family of kernel functions centered at a subset of training points. The weights are determined by the outer layer of a deep neural network, trained by minimizing the negative log likelihood. This generalizes the popular quantized softmax approach, which can be seen as a kernel mixture network with square and non-overlapping kernels. We test the performance of our method on two important applications, namely Bayesian filtering and generative modeling. In the Bayesian filtering example, we show that the method can be used to filter complex nonlinear and non-Gaussian signals defined on manifolds. The resulting kernel mixture network filter outperforms both the quantized softmax filter and the extended Kalman filter in terms of model likelihood. Finally, our experiments on generative models show that, given the same architecture, the kernel mixture network leads to higher test set likelihood, less overfitting and more diversified and realistic generated samples than the quantized softmax approach.

Submitted 19 May, 2017; originally announced May 2017.

arXiv:1705.07109 [pdf, other]

Deep adversarial neural decoding

Authors: Yağmur Güçlütürk, Umut Güçlü, Katja Seeliger, Sander Bosch, Rob van Lier, Marcel van Gerven

Abstract: Here, we present a novel approach to solve the problem of reconstructing perceived stimuli from brain responses by combining probabilistic inference with deep learning. Our approach first inverts the linear transformation from latent features to brain responses with maximum a posteriori estimation and then inverts the nonlinear transformation from perceived stimuli to latent features with adversarial training of convolutional neural networks. We test our approach with a functional magnetic resonance imaging experiment and show that it can generate state-of-the-art reconstructions of perceived faces from brain activations.

Submitted 15 June, 2017; v1 submitted 19 May, 2017; originally announced May 2017.

Comments: Added appendix and updated figures

arXiv:1705.05603 [pdf, other]

GP CaKe: Effective brain connectivity with causal kernels

Authors: Luca Ambrogioni, Max Hinne, Marcel van Gerven, Eric Maris

Abstract: A fundamental goal in network neuroscience is to understand how activity in one region drives activity elsewhere, a process referred to as effective connectivity. Here we propose to model this causal interaction using integro-differential equations and causal kernels that allow for a rich analysis of effective connectivity. The approach combines the tractability and flexibility of autoregressive modeling with the biophysical interpretability of dynamic causal modeling. The causal kernels are learned nonparametrically using Gaussian process regression, yielding an efficient framework for causal inference. We construct a novel class of causal covariance functions that enforce the desired properties of the causal kernels, an approach which we call GP CaKe. By construction, the model and its hyperparameters have biophysical meaning and are therefore easily interpretable. We demonstrate the efficacy of GP CaKe on a number of simulations and give an example of a realistic application on magnetoencephalography (MEG) data.

Submitted 16 May, 2017; originally announced May 2017.

arXiv:1702.05243 [pdf, other]

Estimating Nonlinear Dynamics with the ConvNet Smoother

Authors: Luca Ambrogioni, Umut Güçlü, Eric Maris, Marcel van Gerven

Abstract: Estimating the state of a dynamical system from a series of noise-corrupted observations is fundamental in many areas of science and engineering. The most well-known method, the Kalman smoother (and the related Kalman filter), relies on assumptions of linearity and Gaussianity that are rarely met in practice. In this paper, we introduced a new dynamical smoothing method that exploits the remarkable capabilities of convolutional neural networks to approximate complex non-linear functions. The main idea is to generate a training set composed of both latent states and observations from an ensemble of simulators and to train the deep network to recover the former from the latter. Importantly, this method only requires the availability of the simulators and can therefore be applied in situations in which either the latent dynamical model or the observation model cannot be easily expressed in closed form. In our simulation studies, we show that the resulting ConvNet smoother has almost optimal performance in the Gaussian case even when the parameters are unknown. Furthermore, the method can be successfully applied to extremely non-linear and non-Gaussian systems. Finally, we empirically validate our approach via the analysis of measured brain signals.

Submitted 21 April, 2017; v1 submitted 17 February, 2017; originally announced February 2017.

arXiv:1701.01437

NIPS 2016 Workshop on Representation Learning in Artificial and Biological Neural Networks (MLINI 2016)

Authors: Leila Wehbe, Anwar Nunez-Elizalde, Marcel van Gerven, Irina Rish, Brian Murphy, Moritz Grosse-Wentrup, Georg Langs, Guillermo Cecchi

Abstract: This workshop explores the interface between cognitive neuroscience and recent advances in AI fields that aim to reproduce human performance such as natural language processing and computer vision, and specifically deep learning approaches to such problems. When studying the cognitive capabilities of the brain, scientists follow a system identification approach in which they present different stimuli to the subjects and try to model the response that different brain areas have to that stimulus. The goal is to understand the brain by trying to find the function that expresses the activity of brain areas in terms of different properties of the stimulus. Experimental stimuli are becoming increasingly complex with more and more people being interested in studying real life phenomena such as the perception of natural images or natural sentences. There is therefore a need for a rich and adequate vector representation of the properties of the stimulus, that we can obtain using advances in machine learning. In parallel, new ML approaches, many of them in deep learning, are inspired to a certain extent by human behavior or biological principles. Neural networks for example were originally inspired by biological neurons. More recently, processes such as attention are being used which are inspired by human behavior.
However, the large bulk of these methods are independent of findings about brain function, and it is unclear whether it is at all beneficial for machine learning to try to emulate brain function in order to achieve the same tasks that the brain achieves.

Submitted 10 April, 2017; v1 submitted 6 January, 2017; originally announced January 2017.

arXiv:1605.02609 [pdf, other]

doi 10.1371/journal.pcbi.1005540

Dynamic Decomposition of Spatiotemporal Neural Signals

Authors: Luca Ambrogioni, Marcel A. J. van Gerven, Eric Maris

Abstract: Neural signals are characterized by rich temporal and spatiotemporal dynamics that reflect the organization of cortical networks. Theoretical research has shown how neural networks can operate at different dynamic ranges that correspond to specific types of information processing. Here we present a data analysis framework that uses a linearized model of these dynamic states in order to decompose the measured neural signal into a series of components that capture both rhythmic and non-rhythmic neural activity. The method is based on stochastic differential equations and Gaussian process regression. Through computer simulations and analysis of magnetoencephalographic data, we demonstrate the efficacy of the method in identifying meaningful modulations of oscillatory signals corrupted by structured temporal and spatiotemporal noise. These results suggest that the method is particularly suitable for the analysis and interpretation of complex temporal and spatiotemporal neural signals.

Submitted 9 May, 2016; originally announced May 2016.

arXiv:1604.04931 [pdf, other]

Regularizing Solutions to the MEG Inverse Problem Using Space-Time Separable Covariance Functions

Authors: Arno Solin, Pasi Jylänki, Jaakko Kauramäki, Tom Heskes, Marcel A. J. van Gerven, Simo Särkkä

Abstract: In magnetoencephalography (MEG) the conventional approach to source reconstruction is to solve the underdetermined inverse problem independently over time and space. Here we present how the conventional approach can be extended by regularizing the solution in space and time by a Gaussian process (Gaussian random field) model. Assuming a separable covariance function in space and time, the computational complexity of the proposed model becomes (without any further assumptions or restrictions) $\mathcal{O}(t^3 + n^3 + m^2n)$, where $t$ is the number of time steps, $m$ is the number of sources, and $n$ is the number of sensors. We apply the method to both simulated and empirical data, and demonstrate the efficiency and generality of our Bayesian source reconstruction approach which subsumes various classical approaches in the literature.

Submitted 17 April, 2016; originally announced April 2016.

Comments: 25 pages, 7 figures

arXiv:1409.2676 [pdf, other]

Efficient sampling of Gaussian graphical models using conditional Bayes factors

Authors: Max Hinne, Alex Lenkoski, Tom Heskes, Marcel van Gerven

Abstract: Bayesian estimation of Gaussian graphical models has proven to be challenging because the conjugate prior distribution on the Gaussian precision matrix, the G-Wishart distribution, has a doubly intractable partition function. Recent developments provide a direct way to sample from the G-Wishart distribution, which allows for more efficient algorithms for model selection than previously possible. Still, estimating Gaussian graphical models with more than a handful of variables remains a nearly infeasible task. Here, we propose two novel algorithms that use the direct sampler to more efficiently approximate the posterior distribution of the Gaussian graphical model. The first algorithm uses conditional Bayes factors to compare models in a Metropolis-Hastings framework. The second algorithm is based on a continuous time Markov process. We show that both algorithms are substantially faster than state-of-the-art alternatives. Finally, we show how the algorithms may be used to simultaneously estimate both structural and functional connectivity between subcortical brain regions using resting-state fMRI.

Submitted 9 September, 2014; originally announced September 2014.

Comments: 9 pages, 1 figure

arXiv:1202.1696 [pdf, ps, other]

Bayesian Inference of Whole-Brain Networks

Authors: M. Hinne, T. Heskes, M. A. J. van Gerven

Abstract: In structural brain networks the connections of interest consist of white-matter fibre bundles between spatially segregated brain regions. The presence, location and orientation of these white matter tracts can be derived using diffusion MRI in combination with probabilistic tractography. Unfortunately, as of yet no approaches have been suggested that provide an undisputed way of inferring brain networks from tractography. In this paper, we provide a computational framework which we refer to as Bayesian connectomics. Rather than applying an arbitrary threshold to obtain a single network, we consider the posterior distribution of networks that are supported by the data, combined with an exponential random graph (ERGM) prior that captures a priori knowledge concerning the graph-theoretical properties of whole-brain networks. We show that, on simulated probabilistic tractography data, our approach is able to reconstruct whole-brain networks. In addition, our approach directly supports multi-model data fusion and group-level network inference.

Submitted 8 February, 2012; originally announced February 2012.

Comments: 10 pages, 2 figures

Showing 1–24 of 24 results for author: van Gerven, M