Search | arXiv e-print repository

Bridging discrete and continuous state spaces: Exploring the Ehrenfest process in time-continuous diffusion models

Authors: Ludwig Winkler, Lorenz Richter, Manfred Opper

Abstract: Generative modeling via stochastic processes has led to remarkable empirical results as well as to recent advances in their theoretical understanding. In principle, both space and time of the processes can be discrete or continuous. In this work, we study time-continuous Markov jump processes on discrete state spaces and investigate their correspondence to state-continuous diffusion processes given by SDEs. In particular, we revisit the Ehrenfest process, which converges to an Ornstein-Uhlenbeck process in the infinite state space limit. Likewise, we can show that the time-reversal of the Ehrenfest process converges to the time-reversed Ornstein-Uhlenbeck process. This observation bridges discrete and continuous state spaces and allows methods to be carried over from one setting to the other. Additionally, we suggest an algorithm for training the time-reversal of Markov jump processes which relies on conditional expectations and can thus be directly related to denoising score matching. We demonstrate our methods in multiple convincing numerical experiments.

Submitted 6 May, 2024; originally announced May 2024.
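The following is a hedged illustration of the forward process only. The sketch simulates a standard Ehrenfest urn as a birth-death jump process on {0, ..., S}, using exact (Gillespie) sampling. The per-ball switching rate r = 1/2, the rescaling y = (2x - S)/sqrt(S), and all function names are our assumptions, not necessarily the paper's parameterisation. Under these choices, the drift of y is -y and its jump variance per unit time is 2. This matches the Ornstein-Uhlenbeck process dY = -Y dt + sqrt(2) dW, whose stationary law N(0, 1) corresponds to the Binomial(S, 1/2) stationary law of x. The time reversal and its training are not shown.

```python
import math
import random


def ehrenfest_rates(x, S, r=0.5):
    """Jump rates out of state x in {0, ..., S}; r is an assumed per-ball switching rate."""
    up = r * (S - x)  # one of the S - x outside balls jumps in: x -> x + 1
    down = r * x      # one of the x inside balls jumps out: x -> x - 1
    return up, down


def simulate_ehrenfest(x0, S, t_end, r=0.5, seed=0):
    """Exact (Gillespie) simulation of the Ehrenfest jump process; returns the state at t_end."""
    rng = random.Random(seed)
    t, x = 0.0, x0
    while True:
        up, down = ehrenfest_rates(x, S, r)
        total = up + down  # always r * S > 0, so waiting times are well defined
        t += rng.expovariate(total)
        if t > t_end:
            return x
        x += 1 if rng.random() * total < up else -1


def rescale(x, S):
    """Centre and rescale; for r = 1/2 this approaches dY = -Y dt + sqrt(2) dW as S grows."""
    return (2 * x - S) / math.sqrt(S)
```

One consistency check is detailed balance. For consecutive states, Binomial(S, 1/2) weights times the up rate equal the next state's weight times its down rate, so the binomial law is stationary.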

arXiv:2404.01860 [pdf, other]

Self-StrAE at SemEval-2024 Task 1: Making Self-Structuring AutoEncoders Learn More With Less

Authors: Mattia Opper, N. Siddharth

Abstract: This paper presents two simple improvements to the Self-Structuring AutoEncoder (Self-StrAE). Firstly, we show that including reconstruction to the vocabulary as an auxiliary objective improves representation quality. Secondly, we demonstrate that increasing the number of independent channels leads to significant improvements in embedding quality, while simultaneously reducing the number of parameters. Surprisingly, we demonstrate that this trend can be followed to the extreme, even to the point of reducing the total number of non-embedding parameters to seven. Our system can be pre-trained from scratch with as little as 10M tokens of input data, and proves effective across English, Spanish and Afrikaans.

Submitted 2 April, 2024; originally announced April 2024.

Comments: SemEval 2024

arXiv:2402.08676 [pdf, other]

A Convergence Analysis of Approximate Message Passing with Non-Separable Functions and Applications to Multi-Class Classification

Authors: Burak Çakmak, Yue M. Lu, Manfred Opper

Abstract: Motivated by the recent application of approximate message passing (AMP) to the analysis of convex optimization in multi-class classification [Loureiro et al., 2021], we present a convergence analysis of AMP dynamics with non-separable multivariate nonlinearities. As an application, we present a complete (and independent) analysis of the motivating convex optimization problem.

Submitted 13 February, 2024; originally announced February 2024.

arXiv:2311.00128 [pdf, other]

On the effect of curriculum learning with developmental data for grammar acquisition

Authors: Mattia Opper, J. Morrison, N. Siddharth

Abstract: This work explores the degree to which grammar acquisition is driven by language 'simplicity' and the source modality (speech vs. text) of data. Using BabyBERTa as a probe, we find that grammar acquisition is largely driven by exposure to speech data, and in particular through exposure to two of the BabyLM training corpora: AO-Childes and Open Subtitles. We arrive at this finding by examining various ways of presenting input data to our model. First, we assess the impact of various curricula based on sequence-level complexity. We then examine the impact of learning over 'blocks', i.e. spans of text that are balanced for the number of tokens in each of the source corpora (rather than the number of lines). Finally, we explore curricula that vary the degree to which the model is exposed to different corpora. In all cases, we find that over-exposure to AO-Childes and Open Subtitles significantly drives performance. We verify these findings through a comparable control dataset in which exposure to these corpora, and speech more generally, is limited by design. Our findings indicate that it is not the proportion of tokens occupied by high-utility data that aids acquisition, but rather the proportion of training steps assigned to such data. We hope this encourages future research into the use of more developmentally plausible linguistic data (which tends to be more scarce) to augment general-purpose pre-training regimes.

Submitted 3 November, 2023; v1 submitted 31 October, 2023; originally announced November 2023.

Comments: CoNLL-CMCL Shared Task BabyLM Challenge 2023

arXiv:2310.17638 [pdf, other]

Generative Fractional Diffusion Models

Authors: Gabriel Nobis, Maximilian Springenberg, Marco Aversa, Michael Detzel, Rembert Daems, Roderick Murray-Smith, Shinichi Nakajima, Sebastian Lapuschkin, Stefano Ermon, Tolga Birdal, Manfred Opper, Christoph Knochenhauer, Luis Oala, Wojciech Samek

Abstract: We introduce the first continuous-time score-based generative model that leverages fractional diffusion processes for its underlying dynamics. Although diffusion models have excelled at capturing data distributions, they still suffer from various limitations such as slow convergence, mode collapse on imbalanced data, and lack of diversity. These issues are partially linked to the use of light-tailed Brownian motion (BM) with independent increments. In this paper, we replace BM with an approximation of its non-Markovian counterpart, fractional Brownian motion (fBM), characterized by correlated increments and Hurst index $H \in (0,1)$, where $H=1/2$ recovers the classical BM. To ensure tractable inference and learning, we employ a recently popularized Markov approximation of fBM (MA-fBM) and derive its reverse-time model, resulting in generative fractional diffusion models (GFDMs). We characterize the forward dynamics using a continuous reparameterization trick and propose an augmented score matching loss to efficiently learn the score function, which is partly known in closed form, at minimal added cost. The ability to drive our diffusion model via fBM provides flexibility and control. $H \leq 1/2$ enters the regime of rough paths, whereas $H>1/2$ regularizes diffusion paths and invokes long-term memory as well as a heavy-tailed behaviour (super-diffusion). The Markov approximation allows added control by varying the number of Markov processes linearly combined to approximate fBM.
Our evaluations on real image datasets demonstrate that GFDM achieves greater pixel-wise diversity and enhanced image quality, as indicated by a lower FID, offering a promising alternative to traditional diffusion models.

Submitted 24 June, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

ACM Class: I.2.4; F.4.1; G.3
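The Markov approximation of fBM (MA-fBM) mentioned above replaces fBM with a weighted sum of Ornstein-Uhlenbeck processes driven by one shared Brownian motion. The sketch below simulates such a sum with Euler-Maruyama for user-supplied speeds gamma_k and weights omega_k. These inputs and the function name are hypothetical. Neither the paper's choice of weights nor its reverse-time model is reproduced, and the zero initial condition is our assumption.

```python
import math
import random


def ma_fbm_path(gammas, weights, n_steps, t_end, seed=0):
    """Euler-Maruyama sketch of a Markov approximation of fBM.

    Each Y_k solves dY_k = -gamma_k * Y_k dt + dW (the same W for all k), Y_k(0) = 0,
    and the approximation is sum_k weights[k] * Y_k(t). gammas and weights are
    hypothetical inputs. Returns (times, brownian_path, approx_path).
    """
    rng = random.Random(seed)
    dt = t_end / n_steps
    ys = [0.0] * len(gammas)
    w = 0.0
    times, bm, approx = [0.0], [0.0], [0.0]
    for i in range(1, n_steps + 1):
        dw = rng.gauss(0.0, math.sqrt(dt))  # shared Brownian increment
        w += dw
        # one Euler-Maruyama step for every OU component
        ys = [y - g * y * dt + dw for y, g in zip(ys, gammas)]
        times.append(i * dt)
        bm.append(w)
        approx.append(sum(c * y for c, y in zip(weights, ys)))
    return times, bm, approx
```

As a sanity check, a single component with gamma = 0 and weight 1 reproduces the driving Brownian motion. Adding components with larger gamma_k and suitable weights is what introduces correlated increments.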

arXiv:2310.12975 [pdf, other]

Variational Inference for SDEs Driven by Fractional Noise

Authors: Rembert Daems, Manfred Opper, Guillaume Crevecoeur, Tolga Birdal

Abstract: We present a novel variational framework for performing inference in (neural) stochastic differential equations (SDEs) driven by Markov-approximate fractional Brownian motion (fBM). SDEs offer a versatile tool for modeling real-world continuous-time dynamic systems with inherent noise and randomness. Combining SDEs with the powerful inference capabilities of variational methods enables the learning of representative function distributions through stochastic gradient descent. However, conventional SDEs typically assume the underlying noise to follow a Brownian motion (BM), which hinders their ability to capture long-term dependencies. In contrast, fractional Brownian motion (fBM) extends BM to encompass non-Markovian dynamics, but existing methods for inferring fBM parameters are either computationally demanding or statistically inefficient. In this paper, building upon the Markov approximation of fBM, we derive the evidence lower bound essential for efficient variational inference of posterior path measures, drawing from the well-established field of stochastic analysis. Additionally, we provide a closed-form expression to determine optimal approximation coefficients. Furthermore, we propose the use of neural networks to learn the drift, diffusion and control terms within our variational posterior, leading to the variational training of neural-SDEs. In this framework, we also optimize the Hurst index, governing the nature of our fractional noise.
Beyond validation on synthetic data, we contribute a novel architecture for variational latent video prediction, an approach that, to the best of our knowledge, enables the first variational neural-SDE application to video perception.

Submitted 19 October, 2023; originally announced October 2023.

Comments: 24 pages, under review

arXiv:2305.05588 [pdf, other]

StrAE: Autoencoding for Pre-Trained Embeddings using Explicit Structure

Authors: Mattia Opper, Victor Prokhorov, N. Siddharth

Abstract: This work presents StrAE: a Structured Autoencoder framework that, through strict adherence to explicit structure and use of a novel contrastive objective over tree-structured representations, enables effective learning of multi-level representations. Through comparison over different forms of structure, we verify that our results are directly attributable to the informativeness of the structure provided as input, and show that this is not the case for existing tree models. We then further extend StrAE to allow the model to define its own compositions using a simple localised-merge algorithm. This variant, called Self-StrAE, outperforms baselines that don't involve explicit hierarchical compositions, and is comparable to models given informative structure (e.g. constituency parses). Our experiments are conducted in a data-constrained (circa 10M tokens) setting to help tease apart the contribution of the inductive bias to effective learning. However, we find that this framework can be robust to scale, and when extended to a much larger dataset (circa 100M tokens), our 430-parameter model performs comparably to a 6-layer RoBERTa many orders of magnitude larger in size. Our findings support the utility of incorporating explicit composition as an inductive bias for effective representation learning.

Submitted 25 October, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

Comments: EMNLP 2023 Main

arXiv:2304.12290 [pdf, other]

Joint Message Detection and Channel Estimation for Unsourced Random Access in Cell-Free User-Centric Wireless Networks

Authors: Burak Çakmak, Eleni Gkiouzepi, Manfred Opper, Giuseppe Caire

Abstract: We consider unsourced random access (uRA) in a cell-free (CF) user-centric wireless network, where a large number of potential users compete for a random access slot, while only a finite subset is active. The random access users transmit codewords of length $L$ symbols from a shared codebook, which are received by $B$ geographically distributed radio units (RUs) equipped with $M$ antennas each. Our goal is to devise and analyze a centralized decoder to detect the transmitted messages (without prior knowledge of the active users) and estimate the corresponding channel state information. A specific challenge lies in the fact that, due to the geographically distributed nature of the CF network, there is no fixed correspondence between codewords and large-scale fading coefficients (LSFCs). To overcome this problem, we propose a scheme where the access codebook is partitioned into "location-based" subcodes, such that users in a particular location make use of the corresponding subcode. The joint message detection and channel estimation is obtained via a novel Approximate Message Passing (AMP) algorithm to estimate the linear superposition of matrix-valued "sources" corrupted by Gaussian noise. The matrices to be estimated exhibit zero rows for inactive messages and Gaussian-distributed rows corresponding to the active messages. The asymmetry in the LSFCs and message activity probabilities leads to different statistics for the matrix sources, which distinguishes the AMP formulation from previous cases.
In the regime where the codebook size scales linearly with $L$, while $B$ and $M$ are fixed, we present a rigorous high-dimensional analysis of the proposed AMP algorithm. Then, exploiting the fundamental decoupling principle of AMP, we provide a comprehensive analysis of Neyman-Pearson message detection, along with the subsequent channel estimation.

Submitted 5 February, 2024; v1 submitted 24 April, 2023; originally announced April 2023.

Comments: 45 pages, 9 figures, submitted to the IEEE Transactions on Information Theory

arXiv:2202.08198 [pdf, ps, other]

doi 10.1088/1742-5468/ac764a

Analysis of Random Sequential Message Passing Algorithms for Approximate Inference

Authors: Burak Çakmak, Yue M. Lu, Manfred Opper

Abstract: We analyze the dynamics of a random sequential message passing algorithm for approximate inference with large Gaussian latent variable models in a student-teacher scenario. To model nontrivial dependencies between the latent variables, we assume random covariance matrices drawn from rotation invariant ensembles. Moreover, we consider a model-mismatch setting, where the teacher model and the one used by the student may be different. By means of the dynamical functional approach, we obtain exact dynamical mean-field equations characterizing the dynamics of the inference algorithm. We also derive a range of model parameters for which the sequential algorithm does not converge. The boundary of this parameter range coincides with the de Almeida-Thouless (AT) stability condition of the replica symmetric ansatz for the static probabilistic model.

Submitted 16 February, 2022; originally announced February 2022.

arXiv:2107.10066 [pdf, other]

Adaptive Inducing Points Selection For Gaussian Processes

Authors: Théo Galy-Fajou, Manfred Opper

Abstract: Gaussian Processes (GPs) are flexible non-parametric models with a strong probabilistic interpretation. While GPs are a standard choice for performing inference on time series, few techniques exist for using them in a streaming setting. Bui et al. (2017) developed an efficient variational approach to train online GPs by using sparsity techniques: the whole set of observations is approximated by a smaller set of inducing points (IPs), which are moved around with new data. Both the number and the locations of the IPs greatly affect the performance of the algorithm. In addition to optimizing their locations, we propose to adaptively add new points, based on the properties of the GP and the structure of the data.

Submitted 21 July, 2021; originally announced July 2021.

Comments: Accepted at Continual Learning Workshop - ICML 2020: https://sites.google.com/view/cl-icml/home
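The abstract does not state the selection rule for new inducing points. As a hedged illustration only, the sketch below uses a simple kernel-similarity threshold. A streamed input becomes a new inducing point when its largest kernel value against the current inducing points falls below a threshold rho. The RBF kernel, rho, and the function names are our assumptions, and the location optimization the abstract also mentions is not shown.

```python
import math


def rbf(x, z, lengthscale=1.0):
    """Squared-exponential kernel with unit variance (assumed kernel choice)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-0.5 * d2 / lengthscale ** 2)


def select_inducing_points(stream, rho=0.5, lengthscale=1.0, inducing=None):
    """Greedy online rule: add x if it is poorly covered, i.e. max_j k(x, z_j) < rho.

    rho is a hypothetical threshold; points are tuples of coordinates.
    """
    inducing = list(inducing or [])
    for x in stream:
        if not inducing or max(rbf(x, z, lengthscale) for z in inducing) < rho:
            inducing.append(x)
    return inducing
```

A larger rho adds points more eagerly, trading computation for coverage. The rule only looks at input locations, whereas the abstract also mentions using properties of the GP itself.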

arXiv:2105.09618 [pdf, other]

Nonlinear Hawkes Process with Gaussian Process Self Effects

Authors: Noa Malem-Shinitski, Cesar Ojeda, Manfred Opper

Abstract: Traditionally, Hawkes processes are used to model time-continuous point processes with history dependence. Here we propose an extended model where the self-effects are of both excitatory and inhibitory type and follow a Gaussian Process. Whereas previous work either relies on a less flexible parameterization of the model, or requires a large amount of data, our formulation allows for both a flexible model and learning when data are scarce. We continue the line of work of Bayesian inference for Hawkes processes, and our approach dispenses with the necessity of estimating a branching structure for the posterior, as we perform inference on an aggregated sum of Gaussian Processes. Efficient approximate Bayesian inference is achieved via data augmentation, and we describe a mean-field variational inference approach to learn the model parameters. To demonstrate the flexibility of the model, we apply our methodology to data from three different domains and compare it to previously reported results.

Submitted 20 May, 2021; originally announced May 2021.
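To make "excitatory and inhibitory self-effects" concrete, the sketch below simulates a nonlinear Hawkes process. The link function is an assumption: the abstract does not spell it out, so we use a sigmoid scaled by an upper bound lam_max, which keeps the intensity positive even when past events inhibit. The self-effect phi is a fixed hypothetical function standing in for the Gaussian Process of the paper. Because the intensity never exceeds lam_max, simulation by thinning a constant-rate Poisson process is exact.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def intensity(t, events, mu, lam_max, phi):
    """lam(t) = lam_max * sigmoid(mu + sum over past events t_i < t of phi(t - t_i)).

    phi(s) < 0 is inhibitory and phi(s) > 0 excitatory (phi is hypothetical here).
    """
    h = mu + sum(phi(t - ti) for ti in events if ti < t)
    return lam_max * sigmoid(h)


def simulate_hawkes(t_end, mu, lam_max, phi, seed=0):
    """Simulate on [0, t_end] by thinning a rate-lam_max Poisson process.

    Valid because the sigmoid link keeps lam(t) <= lam_max at all times.
    """
    rng = random.Random(seed)
    t, events = 0.0, []
    while True:
        t += rng.expovariate(lam_max)  # next candidate time
        if t > t_end:
            return events
        # accept the candidate with probability lam(t) / lam_max
        if rng.random() * lam_max < intensity(t, events, mu, lam_max, phi):
            events.append(t)
```

For example, phi(s) = -2 exp(-s) makes each event temporarily suppress further events, which a classical linear Hawkes process with non-negative kernels cannot express.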

arXiv:2101.01571 [pdf, ps, other]

doi 10.1103/PhysRevE.103.L030101

Exact solution to the random sequential dynamics of a message passing algorithm

Authors: Burak Çakmak, Manfred Opper

Abstract: We analyze the random sequential dynamics of a message passing algorithm for Ising models with random interactions in the large system limit. We derive exact results for the two-time correlation functions and the speed of convergence. The de Almeida-Thouless stability criterion of the static problem is found to be necessary and sufficient for the global convergence of the random sequential dynamics.

Submitted 2 March, 2021; v1 submitted 5 January, 2021; originally announced January 2021.

Comments: Accepted for publication in Physical Review E Letter

Journal ref: Phys. Rev. E 103, 030101 (2021)
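For orientation, the de Almeida-Thouless criterion has a compact textbook form in one special case. This is the Sherrington-Kirkpatrick (SK) model, with couplings J_ij of variance 1/N at inverse temperature beta. The condition is beta^2 * (1/N) * sum_i (1 - m_i^2)^2 < 1, evaluated at the TAP magnetizations m_i. The sketch below only evaluates this expression. Restricting to SK couplings is our assumption, and the paper's setting with general random interactions is not covered.

```python
def at_condition_sk(m, beta):
    """Textbook AT stability check for the SK model (couplings with variance 1/N).

    m is a list of TAP magnetizations; returns True if
    beta^2 * mean_i (1 - m_i^2)^2 < 1 (assumed SK special case).
    """
    n = len(m)
    lhs = beta ** 2 * sum((1.0 - mi * mi) ** 2 for mi in m) / n
    return lhs < 1.0
```

In the paramagnetic state (all m_i = 0), the condition reduces to beta < 1, the familiar SK stability boundary at zero field.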

arXiv:2005.01560 [pdf, ps, other]

doi 10.1088/1742-5468/abb8c9

A Dynamical Mean-Field Theory for Learning in Restricted Boltzmann Machines

Authors: Burak Çakmak, Manfred Opper

Abstract: We define a message-passing algorithm for computing magnetizations in Restricted Boltzmann machines, which are Ising models on bipartite graphs introduced as neural network models for probability distributions over spin configurations. To model nontrivial statistical dependencies between the spins' couplings, we assume that the rectangular coupling matrix is drawn from an arbitrary bi-rotation invariant random matrix ensemble. Using the dynamical functional method of statistical mechanics we exactly analyze the dynamics of the algorithm in the large system limit. We prove the global convergence of the algorithm under a stability criterion and compute asymptotic convergence rates showing excellent agreement with numerical simulations.

Submitted 4 May, 2020; originally announced May 2020.

Comments: 29 pages, 2 figures
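For orientation only, the simplest fixed-point scheme for RBM magnetizations is naive mean field. For +/-1 spins with energy -v^T W h - b^T v - c^T h, it alternates m_v = tanh(b + W m_h) and m_h = tanh(c + W^T m_v). This is not the paper's message-passing algorithm, which is designed for bi-rotation invariant coupling ensembles. The damping factor, iteration count, and names are our assumptions.

```python
import math


def rbm_naive_mean_field(W, b, c, n_iter=200, damping=0.5):
    """Naive mean-field magnetizations for a +/-1 RBM (a simple baseline,
    not the paper's algorithm). W is an n_v x n_h list of lists."""
    n_v, n_h = len(b), len(c)
    mv, mh = [0.0] * n_v, [0.0] * n_h
    for _ in range(n_iter):
        # visible update given the current hidden magnetizations
        new_v = [math.tanh(b[i] + sum(W[i][j] * mh[j] for j in range(n_h)))
                 for i in range(n_v)]
        mv = [damping * old + (1 - damping) * new for old, new in zip(mv, new_v)]
        # hidden update uses the freshly updated visible layer (bipartite structure)
        new_h = [math.tanh(c[j] + sum(W[i][j] * mv[i] for i in range(n_v)))
                 for j in range(n_h)]
        mh = [damping * old + (1 - damping) * new for old, new in zip(mh, new_h)]
    return mv, mh
```

With W = 0 the layers decouple and the iteration converges to tanh(b) and tanh(c). Naive mean field omits the reaction (Onsager-type) corrections that TAP-style message passing adds.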

arXiv:2002.11451 [pdf, other]

Automated Augmented Conjugate Inference for Non-conjugate Gaussian Process Models

Authors: Théo Galy-Fajou, Florian Wenzel, Manfred Opper

Abstract: We propose automated augmented conjugate inference, a new inference method for non-conjugate Gaussian process (GP) models. Our method automatically constructs an auxiliary variable augmentation that renders the GP model conditionally conjugate. Building on the conjugate structure of the augmented model, we develop two inference methods. First, a fast and scalable stochastic variational inference method that uses efficient block coordinate ascent updates, which are computed in closed form. Second, an asymptotically correct Gibbs sampler that is useful for small datasets. Our experiments show that our method is up to two orders of magnitude faster and more robust than existing state-of-the-art black-box methods.

Submitted 26 February, 2020; originally announced February 2020.

Comments: Accepted at AISTATS 2020

arXiv:2002.02533 [pdf, ps, other]

doi 10.5506/APhysPolB.51.1673

Understanding the dynamics of message passing algorithms: a free probability heuristics

Authors: Manfred Opper, Burak Çakmak

Abstract: We use freeness assumptions of random matrix theory to analyze the dynamical behavior of inference algorithms for probabilistic models with dense coupling matrices in the limit of large systems. For a toy Ising model, we are able to recover previous results such as the property of vanishing effective memories and the analytical convergence rate of the algorithm.

Submitted 3 February, 2020; originally announced February 2020.

Comments: 11 pages, 2 figures. Presented at the conference "Random Matrix Theory: Applications in the Information Era'' 2019 Kraków

arXiv:2001.04918 [pdf, ps, other]

doi 10.1088/1751-8121/ab8ff4

Analysis of Bayesian Inference Algorithms by the Dynamical Functional Approach

Authors: Burak Çakmak, Manfred Opper

Abstract: We analyze the dynamics of an algorithm for approximate inference with large Gaussian latent variable models in a student-teacher scenario. To model nontrivial dependencies between the latent variables, we assume random covariance matrices drawn from rotation invariant ensembles. For the case of perfect data-model matching, the knowledge of static order parameters derived from the replica method allows us to obtain efficient algorithmic updates in terms of matrix-vector multiplications with a fixed matrix. Using the dynamical functional approach, we obtain an exact effective stochastic process in the thermodynamic limit for a single node. From this, we obtain closed-form expressions for the rate of convergence. Analytical results are in excellent agreement with simulations of single instances of large models.

Submitted 14 January, 2020; originally announced January 2020.

Comments: 25 pages, 2 figures

arXiv:1910.00069 [pdf, ps, other]

doi 10.1088/1742-5468/ab43d3

Tightening Bounds for Variational Inference by Revisiting Perturbation Theory

Authors: Robert Bamler, Cheng Zhang, Manfred Opper, Stephan Mandt

Abstract: Variational inference has become one of the most widely used methods in latent variable modeling. In its basic form, variational inference employs a fully factorized variational distribution and minimizes its KL divergence to the posterior. As the minimization can only be carried out approximately, this approximation induces a bias. In this paper, we revisit perturbation theory as a powerful way of improving the variational approximation. Perturbation theory relies on a form of Taylor expansion of the log marginal likelihood, vaguely in terms of the log ratio of the true posterior and its variational approximation. While first order terms give the classical variational bound, higher-order terms yield corrections that tighten it. However, traditional perturbation theory does not provide a lower bound, making it inapt for stochastic optimization. In this paper, we present a similar yet alternative way of deriving corrections to the ELBO that resemble perturbation theory, but that result in a valid bound. We show in experiments on Gaussian Processes and Variational Autoencoders that the new bounds are more mass covering, and that the resulting posterior covariances are closer to the true posterior and lead to higher likelihoods on held-out data.

Submitted 30 September, 2019; originally announced October 2019.

Comments: To appear in Journal of Statistical Mechanics: Theory and Experiment (JSTAT), 2019
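One way such a valid bound can be built starts from a Taylor polynomial. Write y = log p(x, z) - log q(z), so that p(x) = E_q[e^y]. For odd K and any reference point V0, e^y >= e^{V0} * sum_{k=0}^{K} (y - V0)^k / k! for every real y. This holds because the Taylor remainder e^xi (y - V0)^{K+1} / (K+1)! is non-negative when K + 1 is even. Taking expectations under q gives a lower bound on p(x). With K = 1 and the optimal V0 = E_q[y], it reduces to e^{ELBO}, the classical bound. This is our hedged reading, and the paper's exact estimator and choice of V0 may differ. The callables passed in are hypothetical user inputs.

```python
import math
import random


def odd_taylor_exp(y, K=3):
    """sum_{k=0}^K y^k / k!; for odd K this lower-bounds exp(y) for every real y."""
    assert K % 2 == 1, "the bound needs an odd order K"
    return sum(y ** k / math.factorial(k) for k in range(K + 1))


def perturbative_bound(log_joint, sample_q, log_q, v0, K=3, n=10_000, seed=0):
    """Monte Carlo estimate of e^{v0} * E_q[ odd_taylor_exp(log p(x,z) - log q(z) - v0) ].

    Each summand is <= the importance weight p(x,z)/q(z), so the expectation
    lower-bounds p(x) for any v0. log_joint, sample_q, log_q are user-supplied.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = sample_q(rng)
        y = log_joint(z) - log_q(z)
        total += odd_taylor_exp(y - v0, K)
    return math.exp(v0) * total / n
```

In a toy conjugate model where q is the exact posterior, y equals log p(x) for every sample, so the estimate with v0 = log p(x) recovers the evidence exactly.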

arXiv:1905.09670 [pdf, other]

Multi-Class Gaussian Process Classification Made Conjugate: Efficient Inference via Data Augmentation

Authors: Théo Galy-Fajou, Florian Wenzel, Christian Donner, Manfred Opper

Abstract: We propose a new scalable multi-class Gaussian process classification approach building on a novel modified softmax likelihood function. The new likelihood has two benefits: it leads to well-calibrated uncertainty estimates and allows for an efficient latent variable augmentation. The augmented model has the advantage that it is conditionally conjugate, leading to a fast variational inference method via block coordinate ascent updates. Previous approaches suffered from a trade-off between uncertainty calibration and speed. Our experiments show that our method leads to well-calibrated uncertainty estimates and competitive predictive performance while being up to two orders of magnitude faster than the state of the art.

Submitted 23 May, 2019; originally announced May 2019.

Comments: Accepted at UAI 2019
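The abstract does not write out the modified softmax. One candidate, assumed here for illustration, is the logistic-softmax. It replaces each exponential of the standard softmax with a logistic sigmoid: p(y = k | f) = sigmoid(f_k) / sum_j sigmoid(f_j). The function name is ours.

```python
import math


def logistic_softmax(f):
    """p(y = k | f) = sigmoid(f_k) / sum_j sigmoid(f_j) (assumed form of the modified softmax)."""
    # numerically stable sigmoid for each latent value
    s = [1.0 / (1.0 + math.exp(-v)) if v >= 0 else math.exp(v) / (1.0 + math.exp(v))
         for v in f]
    total = sum(s)
    return [v / total for v in s]
```

Like the softmax, this preserves the ordering of the latent values and yields a normalised distribution. The abstract credits the likelihood with enabling an efficient augmentation, but that augmentation is not sketched here.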

arXiv:1901.08583 [pdf, ps, other]

doi 10.1103/PhysRevE.99.062140

Memory-free dynamics for the TAP equations of Ising models with arbitrary rotation invariant ensembles of random coupling matrices

Authors: Burak Çakmak, Manfred Opper

Abstract: We propose an iterative algorithm for solving the Thouless-Anderson-Palmer (TAP) equations of Ising models with arbitrary rotation invariant (random) coupling matrices. In the thermodynamic limit, we prove by means of the dynamical functional method that the proposed algorithm converges when the so-called de Almeida-Thouless (AT) criterion is fulfilled. Moreover, we give exact analytical expressions for the rate of convergence.

Submitted 7 March, 2019; v1 submitted 24 January, 2019; originally announced January 2019.

Comments: 14 pages, 6 figures. Extended version of the previous preprint arXiv:1901.08583v1. Both authors are co-first authors.

Journal ref: Phys. Rev. E 99, 062140 (2019)

arXiv:1808.00831

Efficient Bayesian Inference of Sigmoidal Gaussian Cox Processes

Authors: Christian Donner, Manfred Opper

Abstract: We present an approximate Bayesian inference approach for estimating the intensity of an inhomogeneous Poisson process, where the intensity function is modelled using a Gaussian process (GP) prior via a sigmoid link function. Augmenting the model using a latent marked Poisson process and Pólya-Gamma random variables, we obtain a representation of the likelihood which is conjugate to the GP prior. We estimate the posterior using a variational free-form mean-field optimisation together with the framework of sparse GPs. Furthermore, as an alternative approximation we suggest a sparse Laplace's method for the posterior, for which an efficient expectation-maximisation algorithm is derived to find the posterior's mode. Both algorithms compare well against exact inference obtained by a Markov Chain Monte Carlo sampler and a standard variational Gauss approach solving the same model, while being one order of magnitude faster. Furthermore, the performance and speed of our method are competitive with those of another recently proposed Poisson process model based on a quadratic link function, while not being limited to GPs with squared exponential kernels and rectangular domains.

Submitted 3 May, 2019; v1 submitted 2 August, 2018; originally announced August 2018.

Comments: 34 pages; 6 figures

MSC Class: 60G55

Journal ref: Journal of Machine Learning Research 19(67):1-34, 2018
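The generative model in the abstract above can be simulated by thinning. Draw candidate points from a homogeneous Poisson process with rate λ*, then keep each candidate x with probability σ(g(x)), where g is a GP draw. The sketch below makes several illustrative assumptions: a one-dimensional domain [0, T], a squared-exponential kernel, and hypothetical defaults for `lam_max` and `lengthscale`. It samples from the prior only and does not implement the paper's variational or Laplace inference.

```python
import numpy as np

def se_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def sample_sgcp(T=10.0, lam_max=5.0, lengthscale=1.0, rng=None):
    """One draw from a sigmoidal Gaussian Cox process on [0, T] by thinning.

    Intensity: lam_max * sigmoid(g(x)) with g ~ GP(0, k_SE).
    Returns (accepted events, rejected candidates), both sorted.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = rng.poisson(lam_max * T)                    # candidates from a rate-lam_max process
    x = np.sort(rng.uniform(0.0, T, size=n))
    if n == 0:
        return x, x
    # The GP is only needed at the candidate locations; jitter keeps K positive definite.
    K = se_kernel(x, x, lengthscale) + 1e-6 * np.eye(n)
    g = np.linalg.cholesky(K) @ rng.standard_normal(n)
    keep = rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-g))   # accept w.p. sigmoid(g(x))
    return x[keep], x[~keep]
```

Conditioned on g, the rejected candidates form a Poisson process with intensity λ*σ(−g(x)). A latent process of this kind is closely related to the latent marked Poisson process used for augmentation in the abstract.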

arXiv:1805.11494

Efficient Bayesian Inference for a Gaussian Process Density Model

Authors: Christian Donner, Manfred Opper

Abstract: We reconsider a nonparametric density model based on Gaussian processes. By augmenting the model with latent Pólya-Gamma random variables and a latent marked Poisson process, we obtain a new likelihood which is conjugate to the model's Gaussian process prior. The augmented posterior allows for efficient inference by Gibbs sampling and an approximate variational mean-field approach. For the latter, we utilise sparse GP approximations to tackle the infinite dimensionality of the problem. The performance of both algorithms and comparisons with other density estimators are demonstrated on artificial and real datasets with up to several thousand data points.

Submitted 29 May, 2018; originally announced May 2018.

Comments: 11 pages, 5 figures

MSC Class: 62G07; 60G15

arXiv:1803.04497

Automated software vulnerability detection with machine learning

Authors: Jacob A. Harer, Louis Y. Kim, Rebecca L. Russell, Onur Ozdemir, Leonard R. Kosta, Akshay Rangamani, Lei H. Hamilton, Gabriel I. Centeno, Jonathan R. Key, Paul M. Ellingwood, Erik Antelman, Alan Mackay, Marc W. McConley, Jeffrey M. Opper, Peter Chin, Tomo Lazovich

Abstract: Thousands of security vulnerabilities are discovered in production software each year, either reported publicly to the Common Vulnerabilities and Exposures database or discovered internally in proprietary code. Vulnerabilities often manifest themselves in subtle ways that are not obvious to code reviewers or the developers themselves. With the wealth of open source code available for analysis, there is an opportunity to learn the patterns of bugs that can lead to security vulnerabilities directly from data. In this paper, we present a data-driven approach to vulnerability detection using machine learning, specifically applied to C and C++ programs. We first compile a large dataset of hundreds of thousands of open-source functions labeled with the outputs of a static analyzer. We then compare methods applied directly to source code with methods applied to artifacts extracted from the build process, finding that source-based models perform better. We also compare the application of deep neural network models with more traditional models such as random forests and find that the best performance comes from combining features learned by deep models with tree-based models. Ultimately, our highest performing model achieves an area under the precision-recall curve of 0.49 and an area under the ROC curve of 0.87.

Submitted 2 August, 2018; v1 submitted 14 February, 2018; originally announced March 2018.

arXiv:1802.06383

Efficient Gaussian Process Classification Using Polya-Gamma Data Augmentation

Authors: Florian Wenzel, Theo Galy-Fajou, Christian Donner, Marius Kloft, Manfred Opper

Abstract: We propose a scalable stochastic variational approach to GP classification building on Polya-Gamma data augmentation and inducing points. Unlike previous approaches, we obtain closed-form updates based on natural gradients that lead to efficient optimization. We evaluate the algorithm on real-world datasets containing up to 11 million data points and demonstrate that it is up to two orders of magnitude faster than the state of the art while being competitive in terms of prediction performance.

Submitted 27 November, 2018; v1 submitted 18 February, 2018; originally announced February 2018.
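Closed-form updates in Pólya-Gamma augmented logistic models rely on moments such as E[ω] for ω ~ PG(b, c). The known closed form is E[ω] = (b / (2c)) tanh(c/2). As a small self-contained check, the sketch below compares this with the mean implied by the sum-of-gammas representation ω = (1 / (2π²)) Σ_k g_k / ((k − 1/2)² + c²/(4π²)), with g_k ~ Gamma(b, 1), truncated at a finite number of terms. The function names and truncation length are illustrative choices. The sketch does not reproduce the paper's inducing-point algorithm.

```python
import math

def pg_mean_closed_form(b, c):
    """E[omega] for omega ~ PG(b, c); the c -> 0 limit is b / 4."""
    if abs(c) < 1e-8:
        return b / 4.0
    return b / (2.0 * c) * math.tanh(c / 2.0)

def pg_mean_series(b, c, n_terms=100_000):
    """Mean of the truncated sum-of-gammas representation of PG(b, c).

    omega = 1/(2 pi^2) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)) with
    g_k ~ Gamma(b, 1), so E[g_k] = b and the mean is a deterministic sum.
    """
    shift = c * c / (4.0 * math.pi ** 2)
    total = sum(b / ((k - 0.5) ** 2 + shift) for k in range(1, n_terms + 1))
    return total / (2.0 * math.pi ** 2)
```

The omitted tail of the series scales like 1/n_terms, so the two values agree to roughly 1e-6 at the default truncation. This slow convergence is why the closed form is the convenient choice inside iterative updates.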

arXiv:1801.05411

Expectation Propagation for Approximate Inference: Free Probability Framework

Authors: Burak Çakmak, Manfred Opper

Abstract: We study asymptotic properties of expectation propagation (EP), a method for approximate inference originally developed in the field of machine learning. Applied to generalized linear models, EP iteratively computes a multivariate Gaussian approximation to the exact posterior distribution. The computational complexity of the repeated update of covariance matrices severely limits the application of EP to large problem sizes. In this study, we present a rigorous analysis by means of free probability theory that allows us to overcome this computational bottleneck if specific data matrices in the problem fulfill certain properties of asymptotic freeness. We demonstrate the relevance of our approach on the gene selection problem of a microarray dataset.

Submitted 9 May, 2018; v1 submitted 16 January, 2018; originally announced January 2018.

Comments: Both authors are co-first authors. The main body of this paper has been accepted for publication in the proceedings of the 2018 IEEE International Symposium on Information Theory (ISIT)

arXiv:1709.07433

Perturbative Black Box Variational Inference

Authors: Robert Bamler, Cheng Zhang, Manfred Opper, Stephan Mandt

Abstract: Black box variational inference (BBVI) with reparameterization gradients triggered the exploration of divergence measures other than the Kullback-Leibler (KL) divergence, such as alpha divergences. In this paper, we view BBVI with generalized divergences as a form of estimating the marginal likelihood via biased importance sampling. The choice of divergence determines a bias-variance trade-off between the tightness of a bound on the marginal likelihood (low bias) and the variance of its gradient estimators. Drawing on variational perturbation theory from statistical physics, we use these insights to construct a family of new variational bounds. Indexed by an odd integer order $K$, this family captures the standard KL bound for $K=1$ and converges to the exact marginal likelihood as $K\to\infty$. Compared to alpha-divergences, our reparameterization gradients have a lower variance. We show in experiments on Gaussian processes and variational autoencoders that the new bounds are more mass-covering, and that the resulting posterior covariances are closer to the true posterior and lead to higher likelihoods on held-out data.

Submitted 6 January, 2018; v1 submitted 21 September, 2017; originally announced September 2017.

Comments: In the proceedings of Advances in Neural Information Processing Systems (NIPS 2017)
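A bound family indexed by odd $K$ can rest on an elementary inequality. For odd K, the truncated Taylor series Σ_{k=0}^{K} x^k / k! lies below e^x for every real x, because the Lagrange remainder e^ξ x^{K+1}/(K+1)! is non-negative when K + 1 is even. Write p(x) = E_q[e^{V(z)}] with V(z) = log p(x, z) − log q(z), and pick a reference point V₀. Applying the inequality inside the expectation gives p(x) ≥ e^{V₀} E_q[Σ_{k=0}^{K} (V − V₀)^k / k!].

The sketch below estimates this quantity by Monte Carlo. It uses a toy conjugate model chosen for illustration: z ~ N(0, 1), x | z ~ N(z, 1), q(z) = N(μ, s²). The model, parameter values and choice of V₀ are assumptions, and no gradient-based optimisation is shown.

```python
import math
import numpy as np

def log_normal_pdf(x, mean, std):
    """Log density of N(mean, std^2); works elementwise on arrays."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))

def perturbative_bound(x_obs, mu, s, K, V0, n_samples=10_000, seed=0):
    """Monte Carlo estimate of e^{V0} E_q[sum_{k<=K} (V - V0)^k / k!].

    Toy model (assumed): z ~ N(0, 1), x | z ~ N(z, 1), q(z) = N(mu, s^2).
    Returns (bound estimate, plain importance-sampling estimate of p(x_obs)),
    both computed from the same samples.
    """
    if K % 2 == 0:
        raise ValueError("K must be an odd integer")
    rng = np.random.default_rng(seed)
    z = mu + s * rng.standard_normal(n_samples)
    V = (log_normal_pdf(z, 0.0, 1.0) + log_normal_pdf(x_obs, z, 1.0)
         - log_normal_pdf(z, mu, s))                  # log importance weights
    d = V - V0
    taylor = sum(d ** k / math.factorial(k) for k in range(K + 1))
    bound = math.exp(V0) * taylor.mean()
    is_estimate = math.exp(V0) * np.exp(d).mean()
    return bound, is_estimate
```

For K = 1 the estimate equals e^{V₀}(1 + Ê[V] − V₀). This is largest at V₀ = Ê[V], where it becomes the exponential of the usual ELBO estimate, in line with the abstract's statement that K = 1 recovers the standard KL bound.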

arXiv:1705.04284

Dynamical Functional Theory for Compressed Sensing

Authors: Burak Çakmak, Manfred Opper, Ole Winther, Bernard H. Fleury

Abstract: We introduce a theoretical approach for designing generalizations of the approximate message passing (AMP) algorithm for compressed sensing which are valid for large observation matrices drawn from an invariant random matrix ensemble. By design, the fixed points of the algorithm obey the Thouless-Anderson-Palmer (TAP) equations corresponding to the ensemble. Using a dynamical functional approach, we derive an effective stochastic process for the marginal statistics of a single component of the dynamics. This allows us to design memory terms in the algorithm such that the resulting fields become Gaussian random variables, allowing for an explicit analysis. The asymptotic statistics of these fields are consistent with the replica ansatz of the compressed sensing problem.

Submitted 11 May, 2017; originally announced May 2017.

Comments: 5 pages, accepted for ISIT 2017
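For reference, the algorithm that the abstract above generalizes is, in its standard form for i.i.d. Gaussian sensing matrices, a soft-thresholding iteration with an Onsager correction term (the AMP of Donoho, Maleki and Montanari). The sketch below implements only that baseline. It is not the paper's construction for invariant ensembles, whose memory terms depend on the ensemble. The threshold rule θ_t = α‖z_t‖/√m, the value of α and the function names are illustrative assumptions.

```python
import numpy as np

def soft_threshold(u, theta):
    """Componentwise soft-thresholding eta(u; theta)."""
    return np.sign(u) * np.maximum(np.abs(u) - theta, 0.0)

def amp(A, y, n_iter=50, alpha=1.5):
    """Standard AMP for y = A x with sparse x and i.i.d. Gaussian A.

    Threshold rule theta_t = alpha * ||z_t|| / sqrt(m) (an assumed, common choice).
    """
    m, n = A.shape
    delta = m / n                                   # undersampling ratio
    x = np.zeros(n)
    z = y.copy()
    for _ in range(n_iter):
        theta = alpha * np.linalg.norm(z) / np.sqrt(m)
        r = x + A.T @ z                             # effective noisy observation of x
        x_new = soft_threshold(r, theta)
        # Onsager term: previous residual times the average derivative of eta at r.
        onsager = z * np.mean(np.abs(r) > theta) / delta
        z = y - A @ x_new + onsager
        x = x_new
    return x
```

Without the `onsager` line this reduces to plain iterative soft thresholding. The correction is what makes the effective observation `r` behave like the signal plus Gaussian noise for i.i.d. Gaussian matrices. The abstract describes the analogous design of memory terms for invariant ensembles.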

arXiv:1608.06602

Self-Averaging Expectation Propagation

Authors: Burak Çakmak, Manfred Opper, Bernard H. Fleury, Ole Winther

Abstract: We investigate the problem of approximate Bayesian inference for a general class of observation models by means of the expectation propagation (EP) framework for large systems under some statistical assumptions. Our approach tries to overcome the numerical bottleneck of EP caused by the inversion of large matrices. Assuming that the measurement matrices are realizations of specific types of ensembles, we use the concept of freeness from random matrix theory to show that the EP cavity variances exhibit an asymptotic self-averaging property. They can be pre-computed using specific generating functions, i.e., the R- and/or S-transforms in free probability, which do not require matrix inversions. Our approach extends the framework of (generalized) approximate message passing, which assumes zero-mean i.i.d. entries of the measurement matrix, to a general class of random matrix ensembles. The generalization is via a simple formulation of the R- and/or S-transforms of the limiting eigenvalue distribution of the Gramian of the measurement matrix. We demonstrate the performance of our approach on a signal recovery problem of nonlinear compressed sensing and compare it with that of EP.

Submitted 23 August, 2016; originally announced August 2016.

Comments: 12 pages

arXiv:1509.01229

doi 10.1088/1751-8113/49/11/114002

A Theory of Solving TAP Equations for Ising Models with General Invariant Random Matrices

Authors: Manfred Opper, Burak Çakmak, Ole Winther

Abstract: We consider the problem of solving TAP mean field equations by iteration for Ising models with coupling matrices that are drawn at random from general invariant ensembles. We develop an analysis of iterative algorithms using a dynamical functional approach that, in the thermodynamic limit, yields an effective dynamics of a single variable trajectory. Our main novel contribution is the expression for the implicit memory term of the dynamics for general invariant ensembles. Subtracting these terms, which depend on magnetizations at previous time steps, cancels the implicit memory terms, so that the iteration depends only on a Gaussian-distributed field. The TAP magnetizations are stable fixed points if an AT stability criterion is fulfilled. We illustrate our method explicitly for coupling matrices drawn from the random orthogonal ensemble.

Submitted 28 March, 2016; v1 submitted 3 September, 2015; originally announced September 2015.

Comments: 27 pages, 6 figures. Published in Journal of Physics A: Mathematical and Theoretical, Volume 49, Number 11, 2016
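The abstract above illustrates the method on the random orthogonal ensemble. A simpler special case is symmetric Gaussian (SK-type) couplings with variance 1/N, where the memory correction reduces to a single Onsager term. This gives an iteration of the form m^{t+1} = tanh(h + βJ m^t − β²(1 − q^t) m^{t−1}), with q^t the mean squared magnetization. The sketch below implements only that Gaussian special case, not the general-ensemble memory terms derived in the paper. The values of β, h and N, the initialisation and the iteration count are illustrative assumptions chosen in the high-temperature regime.

```python
import numpy as np

def sk_couplings(N, rng):
    """Symmetric Gaussian couplings J_ij ~ N(0, 1/N) with zero diagonal."""
    G = rng.standard_normal((N, N)) / np.sqrt(N)
    J = (G + G.T) / np.sqrt(2.0)
    np.fill_diagonal(J, 0.0)
    return J

def tap_amp(J, beta, h, n_iter=200):
    """Iterate m^{t+1} = tanh(h + beta J m^t - beta^2 (1 - q^t) m^{t-1})."""
    N = J.shape[0]
    m_prev = np.zeros(N)
    m = np.full(N, np.tanh(h))                        # assumed initialisation
    for _ in range(n_iter):
        onsager = beta ** 2 * (1.0 - np.mean(m ** 2))   # Onsager (memory) coefficient
        m_next = np.tanh(h + beta * J @ m - onsager * m_prev)
        m_prev, m = m, m_next
    return m

def tap_residual(J, beta, h, m):
    """Max violation of the fixed-point equation m = tanh(h + beta J m - beta^2 (1 - q) m)."""
    q = np.mean(m ** 2)
    return np.max(np.abs(m - np.tanh(h + beta * J @ m - beta ** 2 * (1.0 - q) * m)))
```

At a fixed point the iteration satisfies the SK form of the TAP equations, with Σ_j J_ij²(1 − m_j²) replaced by its large-N value 1 − q. `tap_residual` measures the violation of exactly that equation.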

arXiv:1406.7179

Optimal Population Codes for Control and Estimation

Authors: Alex Susemihl, Ron Meir, Manfred Opper

Abstract: Agents acting in the natural world aim at selecting appropriate actions based on noisy and partial sensory observations. Many behaviors leading to decision making and action selection in a closed-loop setting are naturally phrased within a control-theoretic framework. Within the framework of optimal control theory, one is usually given a cost function which is minimized by selecting a control law based on the observations. While in standard control settings the sensors are assumed fixed, biological systems often gain from the extra flexibility of optimizing the sensors themselves. However, this sensory adaptation is geared towards control rather than, as is often assumed, towards perception. In this work we show that sensory adaptation for control differs from sensory adaptation for perception, even for simple control setups. This implies, consistently with recent experimental results, that when studying sensory adaptation, it is essential to account for the task being performed.

Submitted 27 June, 2014; originally announced June 2014.

Comments: 9 pages, 4 figures

arXiv:1309.3103

Temporal Autoencoding Improves Generative Models of Time Series

Authors: Chris Häusler, Alex Susemihl, Martin P Nawrot, Manfred Opper

Abstract: Restricted Boltzmann Machines (RBMs) are generative models which can learn useful representations from samples of a dataset in an unsupervised fashion. They have been widely employed as an unsupervised pre-training method in machine learning. RBMs have been modified to model time series in two main ways: the Temporal RBM (TRBM) stacks a number of RBMs laterally and introduces temporal dependencies between the hidden layer units; the Conditional RBM (CRBM), on the other hand, considers past samples of the dataset as a conditional bias and learns a representation which takes these into account. Here we propose a new training method for both the TRBM and the CRBM, which enforces the dynamic structure of temporal datasets. We do so by treating the temporal models as denoising autoencoders, considering past frames of the dataset as corrupted versions of the present frame and minimizing the reconstruction error of the present data by the model. We call this approach Temporal Autoencoding (TA). This leads to a significant improvement in the performance of both models in a filling-in-frames task across a number of datasets. The error reduction for motion capture data is 56% for the CRBM and 80% for the TRBM. Taking the posterior mean prediction instead of single samples further improves the model's estimates, decreasing the error by as much as 91% for the CRBM on motion capture data. We also trained the model to perform forecasting on a large number of datasets and found that TA pretraining consistently improves the performance of the forecasts.
Furthermore, by looking at the prediction error across time, we can see that this improvement reflects a better representation of the dynamics of the data as opposed to a bias towards reconstructing the observed data on a short time scale.

Submitted 12 September, 2013; originally announced September 2013.

Showing 1–30 of 30 results for author: Opper, M