Interventionally Consistent Surrogates for Agent-based Simulators

Authors: Joel Dyer, Nicholas Bishop, Yorgos Felekis, Fabio Massimo Zennaro, Anisoara Calinescu, Theodoros Damoulas, Michael Wooldridge

Abstract: Agent-based simulators provide granular representations of complex intelligent systems by directly modelling the interactions of the system's constituent agents. Their high-fidelity nature enables hyper-local policy evaluation and testing of what-if scenarios, but is associated with large computational costs that inhibit their widespread use. Surrogate models can address these computational limitations, but they must behave consistently with the agent-based model under policy interventions of interest. In this paper, we capitalise on recent developments in causal abstraction to develop a framework for learning interventionally consistent surrogate models for agent-based simulators. Our proposed approach facilitates rapid experimentation with policy interventions in complex systems, while inducing surrogates to behave consistently with high probability with respect to the agent-based simulator across interventions of interest. We demonstrate with empirical studies that observationally trained surrogates can misjudge the effect of interventions and misguide policymakers towards suboptimal policies, while surrogates trained for interventional consistency with our proposed method closely mimic the behaviour of an agent-based model under interventions of interest.

Submitted 18 December, 2023; originally announced December 2023.
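The claim that observationally trained surrogates can misjudge intervention effects can be illustrated with a toy structural causal model (a hypothetical example, not the paper's agent-based model or its training procedure). Assume a hidden confounder U drives both X and Y, with X = U + noise and Y = 2X + 3U + noise; the function name `simulate` and all coefficients are invented for illustration. A surrogate fitted to observational data learns the regression slope (about 3.5 here), whereas the effect of do(X = x) is 2.

```python
import numpy as np

def simulate(n, rng, x_do=None):
    """Sample from a hypothetical toy SCM; if x_do is given, apply do(X = x_do)."""
    u = rng.normal(size=n)                      # hidden confounder
    if x_do is None:
        x = u + rng.normal(size=n)              # observational mechanism for X
    else:
        x = np.full(n, float(x_do))             # the intervention cuts the U -> X edge
    y = 2.0 * x + 3.0 * u + rng.normal(size=n)  # true causal effect of X on Y is 2
    return x, y

rng = np.random.default_rng(0)
x_obs, y_obs = simulate(100_000, rng)

# Slope a purely observational surrogate would learn: Cov(X, Y) / Var(X), about 3.5 here
obs_slope = np.cov(x_obs, y_obs)[0, 1] / np.var(x_obs, ddof=1)

# Interventional effect: difference in mean Y between do(X=1) and do(X=0), about 2
_, y1 = simulate(100_000, rng, x_do=1.0)
_, y0 = simulate(100_000, rng, x_do=0.0)
do_effect = y1.mean() - y0.mean()

print(f"observational slope {obs_slope:.2f}, interventional effect {do_effect:.2f}")
```

The gap between the two numbers comes entirely from the confounder, which is the kind of discrepancy an interventionally consistent surrogate is meant to avoid.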

arXiv:2312.08107

Causal Optimal Transport of Abstractions

Authors: Yorgos Felekis, Fabio Massimo Zennaro, Nicola Branchini, Theodoros Damoulas

Abstract: Causal abstraction (CA) theory establishes formal criteria for relating multiple structural causal models (SCMs) at different levels of granularity by defining maps between them. These maps have significant relevance for real-world challenges such as synthesizing causal evidence from multiple experimental environments, learning causally consistent representations at different resolutions, and linking interventions across multiple SCMs. In this work, we propose COTA, the first method to learn abstraction maps from observational and interventional data without assuming complete knowledge of the underlying SCMs. In particular, we introduce a multi-marginal Optimal Transport (OT) formulation that enforces do-calculus causal constraints, together with a cost function that relies on interventional information. We extensively evaluate COTA on synthetic and real world problems, and showcase its advantages over non-causal, independent and aggregated COTA formulations. Finally, we demonstrate the efficiency of our method as a data augmentation tool by comparing it against the state-of-the-art CA learning framework, which assumes fully specified SCMs, on a real-world downstream task.

Submitted 13 December, 2023; originally announced December 2023.
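COTA builds on optimal transport. As a minimal sketch of the underlying building block only, the code below solves a two-marginal entropic OT problem with Sinkhorn scaling; COTA's multi-marginal formulation, do-calculus constraints and interventional cost are not reproduced, and the function name `sinkhorn`, the regularisation `eps` and the toy cost matrix are assumptions for illustration.

```python
import numpy as np

def sinkhorn(a, b, cost, eps=0.1, n_iter=500):
    """Entropic-regularised OT plan between histograms a and b (two marginals only)."""
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)              # rescale so rows sum to a
        v = b / (K.T @ u)            # rescale so columns sum to b
    return u[:, None] * K * v[None, :]

# Toy example: a 4-state "low-level" distribution mapped onto a 2-state "high-level" one
a = np.array([0.1, 0.4, 0.3, 0.2])
b = np.array([0.5, 0.5])
cost = np.array([[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]])
plan = sinkhorn(a, b, cost)
print(plan.round(3))
```

The resulting plan sends the first two low-level states to the first high-level state and the last two to the second, which is the kind of coupling an abstraction map encodes.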

arXiv:2307.02184

doi 10.1002/sta4.656

Table inference for combinatorial origin-destination choices in agent-based population synthesis

Authors: Ioannis Zachos, Theodoros Damoulas, Mark Girolami

Abstract: A key challenge in agent-based mobility simulations is the synthesis of individual agent socioeconomic profiles. Such profiles include locations of agent activities, which dictate the quality of the simulated travel patterns. These locations are typically represented in origin-destination matrices that are sampled using coarse travel surveys. This is because fine-grained trip profiles are scarce and fragmented due to privacy and cost reasons. The discrepancy between data and sampling resolutions renders agent traits non-identifiable due to the combinatorial space of data-consistent individual attributes. This problem is pertinent to any agent-based inference setting where the latent state is discrete. Existing approaches have used continuous relaxations of the underlying location assignments and subsequent ad-hoc discretisation thereof. We propose a framework to efficiently navigate this space, offering improved reconstruction and coverage as well as linear-time sampling of the ground truth origin-destination table. This allows us to avoid factorially growing rejection rates and poor summary statistic consistency inherent in discrete choice modelling. We achieve this by introducing joint sampling schemes for the continuous intensity and discrete table of agent trips, as well as Markov bases that can efficiently traverse this combinatorial space subject to summary statistic constraints. Our framework's benefits are demonstrated in multiple controlled experiments and a large-scale application to agent work trip reconstruction in Cambridge, UK.

Submitted 6 July, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

Comments: 17 pages, 8 figures, 2 tables
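To make the idea of a Markov basis concrete: for a two-way table with fixed row and column sums, the classical basic moves add +1 at cells (i, k) and (j, l) and -1 at (i, l) and (j, k), leaving every margin unchanged. The sketch below is a plain random walk over such tables under that assumption; it is not the paper's sampler, which also samples a continuous intensity and targets a posterior, and the name `basic_move` and the toy table are hypothetical.

```python
import random

def basic_move(table, rng):
    """Apply one random 2x2 basic move; row and column sums are unchanged."""
    n_rows, n_cols = len(table), len(table[0])
    i, j = rng.sample(range(n_rows), 2)
    k, l = rng.sample(range(n_cols), 2)
    sign = rng.choice((1, -1))
    # +sign at (i,k),(j,l) and -sign at (i,l),(j,k) keeps every margin fixed
    proposal = [row[:] for row in table]
    proposal[i][k] += sign
    proposal[j][l] += sign
    proposal[i][l] -= sign
    proposal[j][k] -= sign
    if min(min(row) for row in proposal) < 0:
        return table          # reject moves that would create negative trip counts
    return proposal

# Toy origin-destination table: rows = origins, columns = destinations
rng = random.Random(1)
table = [[3, 1, 0], [2, 2, 4], [0, 5, 1]]
for _ in range(1000):
    table = basic_move(table, rng)
print(table)
```

Because the proposal is symmetric and invalid tables are simply rejected, the walk stays on the set of non-negative tables that share the observed margins, which is the constrained space the abstract refers to.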

arXiv:2306.01468

Robust Bayesian Inference for Berkson and Classical Measurement Error Models

Authors: Charita Dellaporta, Theodoros Damoulas

Abstract: Measurement error occurs when a covariate influencing a response variable is corrupted by noise. This can lead to misleading inference outcomes, particularly in problems where accurately estimating the relationship between covariates and response variables is crucial, such as causal effect estimation. Existing methods for dealing with measurement error often rely on strong assumptions such as knowledge of the error distribution or its variance and availability of replicated measurements of the covariates. We propose a Bayesian Nonparametric Learning framework that is robust to mismeasured covariates, does not require the preceding assumptions, and can incorporate prior beliefs about the error distribution. This approach gives rise to a general framework that is suitable for both Classical and Berkson error models via the appropriate specification of the prior centering measure of a Dirichlet Process (DP). Moreover, it offers flexibility in the choice of loss function depending on the type of regression model. We provide bounds on the generalization error based on the Maximum Mean Discrepancy (MMD) loss, which allows for generalization to non-Gaussian distributed errors and nonlinear covariate-response relationships. We showcase the effectiveness of the proposed framework versus prior art in real-world problems containing either Berkson or Classical measurement errors.

Submitted 29 April, 2024; v1 submitted 2 June, 2023; originally announced June 2023.

Comments: 60 pages, 12 figures. v2: Updated version of paper
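The difference between the two error models can be seen in a small simulation (an illustration of the problem setting, not the proposed Bayesian nonparametric method). Under classical error the recorded covariate is W = X + U; under Berkson error the true covariate scatters around the recorded one, X = W + U. Assuming a linear model with unit-variance X and U, naive least squares on W attenuates the slope by Var(X) / (Var(X) + Var(U)) = 0.5 under classical error but stays close to the true value under Berkson error. The helper `ols_slope` and all constants are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 200_000, 1.0

def ols_slope(w, y):
    """Slope of the least-squares fit of y on w (with intercept)."""
    return np.cov(w, y)[0, 1] / np.var(w, ddof=1)

# Classical error: the true covariate X is recorded with additive noise, W = X + U
x = rng.normal(0.0, 1.0, n)
w_classical = x + rng.normal(0.0, 1.0, n)
y = beta * x + rng.normal(0.0, 0.5, n)
slope_classical = ols_slope(w_classical, y)      # attenuated towards 1 / (1 + 1) = 0.5

# Berkson error: the true covariate scatters around the recorded value, X = W + U
w_berkson = rng.normal(0.0, 1.0, n)
x_berkson = w_berkson + rng.normal(0.0, 1.0, n)
y_berkson = beta * x_berkson + rng.normal(0.0, 0.5, n)
slope_berkson = ols_slope(w_berkson, y_berkson)  # approximately unbiased, about 1.0

print(round(slope_classical, 2), round(slope_berkson, 2))
```

The Berkson result relies on the linear model; for nonlinear covariate-response relationships Berkson error also biases naive fits.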

arXiv:2208.10981

Causal Entropy Optimization

Authors: Nicola Branchini, Virginia Aglietti, Neil Dhir, Theodoros Damoulas

Abstract: We study the problem of globally optimizing the causal effect on a target variable of an unknown causal graph in which interventions can be performed. This problem arises in many areas of science including biology, operations research and healthcare. We propose Causal Entropy Optimization (CEO), a framework that generalizes Causal Bayesian Optimization (CBO) to account for all sources of uncertainty, including the one arising from the causal graph structure. CEO incorporates the causal structure uncertainty both in the surrogate models for the causal effects and in the mechanism used to select interventions via an information-theoretic acquisition function. The resulting algorithm automatically trades off structure learning and causal effect optimization, while naturally accounting for observation noise. For various synthetic and real-world structural causal models, CEO achieves faster convergence to the global optimum compared with CBO while also learning the graph. Furthermore, our joint approach to structure learning and causal optimization improves upon sequential, structure-learning-first approaches.

Submitted 23 August, 2022; originally announced August 2022.

arXiv:2202.04744

Robust Bayesian Inference for Simulator-based Models via the MMD Posterior Bootstrap

Authors: Charita Dellaporta, Jeremias Knoblauch, Theodoros Damoulas, François-Xavier Briol

Abstract: Simulator-based models are models for which the likelihood is intractable but simulation of synthetic data is possible. They are often used to describe complex real-world phenomena, and as such can often be misspecified in practice. Unfortunately, existing Bayesian approaches for simulators are known to perform poorly in those cases. In this paper, we propose a novel algorithm based on the posterior bootstrap and maximum mean discrepancy estimators. This leads to a highly-parallelisable Bayesian inference algorithm with strong robustness properties. This is demonstrated through an in-depth theoretical study which includes generalisation bounds and proofs of frequentist consistency and robustness of our posterior. The approach is then assessed on a range of examples including a g-and-k distribution and a toggle-switch model.

Submitted 19 December, 2022; v1 submitted 9 February, 2022; originally announced February 2022.

Comments: Accepted for publication (with an oral presentation) at AISTATS 2022. A preliminary version of this paper was accepted in the NeurIPS 2021 workshop "Your Model is Wrong: Robustness and misspecification in probabilistic modeling". v2: added some references. v3: corrected small error in theorem 3
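The algorithm relies on maximum mean discrepancy (MMD) estimators between observed data and simulator output. The sketch below shows only a standard unbiased estimate of the squared MMD with a Gaussian kernel; the posterior bootstrap loop, the kernel choice used in the paper and the names `gaussian_kernel` and `mmd2_unbiased` are assumptions here, and the toy "simulators" are plain normal draws.

```python
import numpy as np

def gaussian_kernel(x, y, lengthscale=1.0):
    """Gaussian (RBF) kernel matrix between the rows of x and the rows of y."""
    sq_dists = np.sum(x**2, 1)[:, None] + np.sum(y**2, 1)[None, :] - 2.0 * x @ y.T
    return np.exp(-sq_dists / (2.0 * lengthscale**2))

def mmd2_unbiased(x, y, lengthscale=1.0):
    """Unbiased U-statistic estimate of the squared MMD between samples x and y."""
    m, n = len(x), len(y)
    kxx = gaussian_kernel(x, x, lengthscale)
    kyy = gaussian_kernel(y, y, lengthscale)
    kxy = gaussian_kernel(x, y, lengthscale)
    # Drop the diagonal terms k(x_i, x_i) to obtain the unbiased estimator
    term_xx = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * kxy.mean()

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=(500, 1))
same = rng.normal(0.0, 1.0, size=(500, 1))      # simulator at the "right" parameter
shifted = rng.normal(1.0, 1.0, size=(500, 1))   # simulator at a wrong parameter
mmd_same, mmd_shifted = mmd2_unbiased(data, same), mmd2_unbiased(data, shifted)
print(mmd_same, mmd_shifted)
```

Because the kernel is bounded, a few contaminated data points can only move this estimate by a bounded amount, which is one intuition behind the robustness of MMD-based inference.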

arXiv:2111.01732

Spatio-Temporal Variational Gaussian Processes

Authors: Oliver Hamelijnck, William J. Wilkinson, Niki A. Loppi, Arno Solin, Theodoros Damoulas

Abstract: We introduce a scalable approach to Gaussian process inference that combines spatio-temporal filtering with natural gradient variational inference, resulting in a non-conjugate GP method for multivariate data that scales linearly with respect to time. Our natural gradient approach enables application of parallel filtering and smoothing, further reducing the temporal span complexity to be logarithmic in the number of time steps. We derive a sparse approximation that constructs a state-space model over a reduced set of spatial inducing points, and show that for separable Markov kernels the full and sparse cases exactly recover the standard variational GP, whilst exhibiting favourable computational properties. To further improve the spatial scaling we propose a mean-field assumption of independence between spatial locations which, when coupled with sparsity and parallelisation, leads to an efficient and accurate method for large spatio-temporal problems.

Submitted 2 November, 2021; originally announced November 2021.
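Linear-in-time scaling comes from writing a GP with a Markov kernel as a state-space model and running a Kalman filter. A minimal sketch of the simplest case, assuming a 1-D exponential (Matérn-1/2) kernel and a Gaussian likelihood: the scalar filter below costs O(T) and reproduces the exact GP posterior at the final time, which the dense O(T³) solve confirms. This is not the paper's spatio-temporal, non-conjugate, natural-gradient method; the function names and hyperparameters are hypothetical.

```python
import numpy as np

def kalman_ou(t, y, variance=1.0, lengthscale=1.0, noise=0.1):
    """O(T) filtering for a GP with kernel variance * exp(-|t - t'| / lengthscale)."""
    mean, var = 0.0, variance              # stationary prior at the first time point
    for k in range(len(t)):
        if k > 0:
            a = np.exp(-(t[k] - t[k - 1]) / lengthscale)   # discrete-time transition
            mean, var = a * mean, a**2 * var + variance * (1.0 - a**2)
        gain = var / (var + noise)         # Kalman gain for y_k = f(t_k) + noise
        mean, var = mean + gain * (y[k] - mean), (1.0 - gain) * var
    return mean, var                       # posterior of f(t_T) given y_1..y_T

def exact_gp_last(t, y, variance=1.0, lengthscale=1.0, noise=0.1):
    """Same quantity via the cubic-cost dense GP formulae, for comparison."""
    K = variance * np.exp(-np.abs(t[:, None] - t[None, :]) / lengthscale)
    Ky = K + noise * np.eye(len(t))
    k_last = K[-1]
    mean = k_last @ np.linalg.solve(Ky, y)
    var = variance - k_last @ np.linalg.solve(Ky, k_last)
    return mean, var

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 10.0, 200))
y = np.sin(t) + 0.3 * rng.normal(size=t.size)
kf, ex = kalman_ou(t, y), exact_gp_last(t, y)
print(kf, ex)
```

Only the final-time marginal is compared because filtering and smoothing coincide there; recovering all time points would add a backward smoothing pass, still at O(T) cost.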

arXiv:2110.13891

Dynamic Causal Bayesian Optimization

Authors: Virginia Aglietti, Neil Dhir, Javier González, Theodoros Damoulas

Abstract: This paper studies the problem of performing a sequence of optimal interventions in a causal dynamical system where both the target variable of interest and the inputs evolve over time. This problem arises in a variety of domains, e.g. systems biology and operational research. Dynamic Causal Bayesian Optimization (DCBO) brings together ideas from sequential decision making, causal inference and Gaussian process (GP) emulation. DCBO is useful in scenarios where all causal effects in a graph are changing over time. At every time step DCBO identifies a locally optimal intervention by integrating both observational and past interventional data collected from the system. We give theoretical results detailing how one can transfer interventional information across time steps and define a dynamic causal GP model which can be used to quantify uncertainty and find optimal interventions in practice. We demonstrate how DCBO identifies optimal interventions faster than competing approaches in multiple settings and applications.

Submitted 26 October, 2021; originally announced October 2021.

arXiv:2109.03582

Higher Order Kernel Mean Embeddings to Capture Filtrations of Stochastic Processes

Authors: Cristopher Salvi, Maud Lemercier, Chong Liu, Blanka Hovarth, Theodoros Damoulas, Terry Lyons

Abstract: Stochastic processes are random variables with values in some space of paths. However, reducing a stochastic process to a path-valued random variable ignores its filtration, i.e. the flow of information carried by the process through time. By conditioning the process on its filtration, we introduce a family of higher order kernel mean embeddings (KMEs) that generalizes the notion of KME and captures additional information related to the filtration. We derive empirical estimators for the associated higher order maximum mean discrepancies (MMDs) and prove consistency. We then construct a filtration-sensitive kernel two-sample test able to pick up information that gets missed by the standard MMD test. In addition, leveraging our higher order MMDs we construct a family of universal kernels on stochastic processes that allows one to solve real-world calibration and optimal stopping problems in quantitative finance (such as the pricing of American options) via classical kernel-based regression methods. Finally, adapting existing tests for conditional independence to the case of stochastic processes, we design a causal-discovery algorithm to recover the causal graph of structural dependencies among interacting bodies solely from observations of their multidimensional trajectories.

Submitted 3 November, 2021; v1 submitted 8 September, 2021; originally announced September 2021.

Comments: Published at NeurIPS 2021

MSC Class: 60L10; 60L20

arXiv:2108.02594

A variational Bayesian spatial interaction model for estimating revenue and demand at business facilities

Authors: Shanaka Perera, Virginia Aglietti, Theodoros Damoulas

Abstract: We study the problem of estimating potential revenue or demand at business facilities and understanding its generating mechanism. This problem arises in different fields such as operations research or urban science, and more generally, it is crucial for businesses' planning and decision making. We develop a Bayesian spatial interaction model, henceforth BSIM, which provides probabilistic predictions about revenues generated by a particular business location given its features and the potential customers' characteristics in a given region. BSIM explicitly accounts for the competition among facilities through a probability value determined by evaluating a store-specific Gaussian distribution at a given customer location. We propose a scalable variational inference framework that, while being significantly faster than competing Markov Chain Monte Carlo inference schemes, exhibits comparable performance in terms of parameter identification and uncertainty quantification. We demonstrate the benefits of BSIM in various synthetic settings characterised by an increasing number of stores and customers. Finally, we construct a real-world, large spatial dataset for pub activities in London, UK, which includes over 1,500 pubs and 150,000 customer regions. We demonstrate how BSIM outperforms competing approaches on this large dataset in terms of predictive performance while providing results that are both interpretable and consistent with related indicators observed for the London region.

Submitted 5 August, 2021; originally announced August 2021.
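One ingredient of BSIM is described as a probability obtained by evaluating a store-specific Gaussian at a customer location. The sketch below is a hypothetical illustration of that idea only: it assumes isotropic 2-D Gaussians, normalises the densities across stores to get competition probabilities, and splits each region's spend accordingly. The normalisation, the spend allocation and the names `isotropic_gaussian_pdf` and `expected_revenue` are assumptions, not the paper's likelihood or its variational inference scheme.

```python
import numpy as np

def isotropic_gaussian_pdf(points, centre, scale):
    """Density of N(centre, scale^2 I) in 2-D, evaluated at each row of points."""
    sq = np.sum((points - centre) ** 2, axis=1)
    return np.exp(-sq / (2.0 * scale**2)) / (2.0 * np.pi * scale**2)

def expected_revenue(customers, spend, store_xy, store_scale):
    """Hypothetical allocation: each region's spend is split in proportion to the
    store-specific Gaussian densities at that region (normalised across stores)."""
    dens = np.stack([isotropic_gaussian_pdf(customers, c, s)
                     for c, s in zip(store_xy, store_scale)], axis=1)
    probs = dens / dens.sum(axis=1, keepdims=True)   # competition probabilities
    return probs, probs.T @ spend                    # per-store expected revenue

customers = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
spend = np.array([100.0, 50.0, 80.0])
store_xy = np.array([[0.0, 0.0], [5.0, 5.0]])
store_scale = np.array([1.0, 2.0])
probs, revenue = expected_revenue(customers, spend, store_xy, store_scale)
print(probs.round(3), revenue.round(1))
```

In a Bayesian treatment the store centres and scales would be uncertain parameters inferred from observed revenues rather than fixed inputs as here.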

arXiv:2105.04211

SigGPDE: Scaling Sparse Gaussian Processes on Sequential Data

Authors: Maud Lemercier, Cristopher Salvi, Thomas Cass, Edwin V. Bonilla, Theodoros Damoulas, Terry Lyons

Abstract: Making predictions and quantifying their uncertainty when the input data is sequential is a fundamental learning challenge, recently attracting increasing attention. We develop SigGPDE, a new scalable sparse variational inference framework for Gaussian Processes (GPs) on sequential data. Our contribution is twofold. First, we construct inducing variables underpinning the sparse approximation so that the resulting evidence lower bound (ELBO) does not require any matrix inversion. Second, we show that the gradients of the GP signature kernel are solutions of a hyperbolic partial differential equation (PDE). This theoretical insight allows us to build an efficient back-propagation algorithm to optimize the ELBO. We showcase the significant computational gains of SigGPDE compared to existing methods, while achieving state-of-the-art performance for classification tasks on large datasets of up to 1 million multivariate time series.

Submitted 12 October, 2021; v1 submitted 10 May, 2021; originally announced May 2021.

Comments: Published at ICML 2021

MSC Class: 60L10; 60L20
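The PDE result for the gradients builds on the fact that the signature kernel itself solves a Goursat-type hyperbolic PDE, ∂²k/∂s∂t = ⟨ẋ_s, ẏ_t⟩ k with k = 1 on both boundaries. The sketch below is a simple first-order finite-difference solver for piecewise-linear paths, checked against the closed form for two straight lines in 1-D with increments a and b, where the kernel equals Σₙ (ab)ⁿ / (n!)². It illustrates the PDE only; the discretisation, the name `signature_kernel_pde` and the test paths are assumptions, and the paper's back-propagation algorithm is not reproduced.

```python
import math
import numpy as np

def signature_kernel_pde(x, y):
    """First-order finite-difference solution of d^2 k / ds dt = <dx/ds, dy/dt> k,
    with k = 1 on both boundaries, for piecewise-linear paths x and y (shape [T, d])."""
    dx, dy = np.diff(x, axis=0), np.diff(y, axis=0)
    inc = dx @ dy.T                        # <dx_i, dy_j> for every grid cell
    m, n = inc.shape
    k = np.ones((m + 1, n + 1))            # boundary conditions k(0, .) = k(., 0) = 1
    for i in range(m):
        for j in range(n):
            k[i + 1, j + 1] = k[i + 1, j] + k[i, j + 1] - k[i, j] + inc[i, j] * k[i, j]
    return k[-1, -1]

# Check on straight lines in 1-D with increments a and b, each split into 100 pieces
a, b, steps = 1.0, 1.0, 100
x = np.linspace(0.0, a, steps + 1)[:, None]
y = np.linspace(0.0, b, steps + 1)[:, None]
approx = signature_kernel_pde(x, y)
exact = sum((a * b) ** n / math.factorial(n) ** 2 for n in range(20))
print(approx, exact)
```

The first-order scheme converges slowly as the grid is refined; higher-order schemes and vectorised solvers are used in practice.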

arXiv:2012.07574

An Expectation-Based Network Scan Statistic for a COVID-19 Early Warning System

Authors: Chance Haycock, Edward Thorpe-Woods, James Walsh, Patrick O'Hara, Oscar Giles, Neil Dhir, Theodoros Damoulas

Abstract: One of the Greater London Authority's (GLA) responses to the COVID-19 pandemic brings together multiple large-scale and heterogeneous datasets capturing mobility, transportation and traffic activity over the city of London to better understand 'busyness' and enable targeted interventions and effective policy-making. As part of Project Odysseus, we describe an early-warning system and introduce an expectation-based scan statistic for networks to help the GLA and Transport for London understand the extent to which populations are following government COVID-19 guidelines. We explicitly treat the case of geographically fixed time-series data located on a (road) network and primarily focus on monitoring the dynamics across large regions of the capital. We also focus on the detection and reporting of significant spatio-temporal regions. Our approach extends the Network Based Scan Statistic (NBSS) by making it expectation-based (EBP) and by using stochastic processes for time-series forecasting, which enables us to quantify metric uncertainty in both the EBP and NBSS frameworks. We introduce a variant of the metric used in the EBP model which focuses on identifying space-time regions in which activity is quieter than expected.

Submitted 8 December, 2020; originally announced December 2020.
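For reference, the standard expectation-based Poisson (EBP) score for a candidate region with observed count C and expected (baseline) count B is the log-likelihood ratio C log(C/B) + B − C, computed only when activity is higher than expected. The sketch below scans contiguous time windows of a single series and adds an assumed "quieter than expected" mode that applies the same formula when C < B; the paper's network-based search, its forecast baselines and its exact variant metric are not reproduced, and the function names and counts are hypothetical.

```python
import math

def ebp_score(count, baseline, quieter=False):
    """Poisson log-likelihood ratio for a rate change; 0 if the change is in the
    direction that is not being monitored."""
    busier = count > baseline
    if count == baseline or busier == quieter:
        return 0.0
    term = count * math.log(count / baseline) if count > 0 else 0.0
    return term + baseline - count

def best_window(counts, baselines, quieter=False):
    """Exhaustively scan all contiguous windows [start, end) of one series."""
    best = (0.0, None)
    for start in range(len(counts)):
        for end in range(start + 1, len(counts) + 1):
            score = ebp_score(sum(counts[start:end]), sum(baselines[start:end]), quieter)
            if score > best[0]:
                best = (score, (start, end))
    return best

counts = [10, 9, 11, 30, 28, 10, 10]
baselines = [10.0] * 7          # in the paper, baselines come from time-series forecasts
print(best_window(counts, baselines))                # busiest window
print(best_window(counts, baselines, quieter=True))  # quietest window
```

On a road network the same score would be maximised over connected sets of sensors and time windows rather than over intervals of one series.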

arXiv:2009.12821

Multi-task Causal Learning with Gaussian Processes

Authors: Virginia Aglietti, Theodoros Damoulas, Mauricio Álvarez, Javier González

Abstract: This paper studies the problem of learning the correlation structure of a set of intervention functions defined on the directed acyclic graph (DAG) of a causal model. This is useful when we are interested in jointly learning the causal effects of interventions on different subsets of variables in a DAG, which is common in fields such as healthcare or operations research. We propose the first multi-task causal Gaussian process (GP) model, which we call DAG-GP, that allows for information sharing across continuous interventions and across experiments on different variables. DAG-GP accommodates different assumptions in terms of data availability and captures the correlation between functions lying in input spaces of different dimensionality via a well-defined integral operator. We give theoretical results detailing when and how the DAG-GP model can be formulated depending on the DAG. We test both the quality of its predictions and its calibrated uncertainties. Compared to single-task models, DAG-GP achieves the best fitting performance in a variety of real and synthetic settings. In addition, it helps to select optimal interventions faster than competing approaches when used within sequential decision making frameworks, like active learning or Bayesian optimization.

Submitted 27 September, 2020; originally announced September 2020.

arXiv:2006.15641

Variational Autoencoding of PDE Inverse Problems

Authors: Daniel J. Tait, Theodoros Damoulas

Abstract: Specifying a governing physical model in the presence of missing physics and recovering its parameters are two intertwined and fundamental problems in science. Modern machine learning allows one to circumvent these via emulators and surrogates, but in doing so disregards prior knowledge and physical laws that are especially important for small data regimes, interpretability, and decision making. In this work we fold the mechanistic model into a flexible data-driven surrogate to arrive at a physically structured decoder network. This provides accelerated inference for the Bayesian inverse problem, and can act as a drop-in regulariser that encodes a-priori physical information. We employ the variational form of the PDE problem and introduce stochastic local approximations as a form of model-based data augmentation. We demonstrate both the accuracy and increased computational efficiency of the framework on real-world settings and structured spatial processes.

Submitted 28 June, 2020; originally announced June 2020.

arXiv:2006.05805

Distribution Regression for Sequential Data

Authors: Maud Lemercier, Cristopher Salvi, Theodoros Damoulas, Edwin V. Bonilla, Terry Lyons

Abstract: Distribution regression refers to the supervised learning problem where labels are only available for groups of inputs instead of individual inputs. In this paper, we develop a rigorous mathematical framework for distribution regression where inputs are complex data streams. Leveraging properties of the expected signature and a recent signature kernel trick for sequential data from stochastic analysis, we introduce two new learning techniques, one feature-based and the other kernel-based. Each is suited to a different data regime in terms of the number of data streams and the dimensionality of the individual streams. We provide theoretical results on the universality of both approaches and demonstrate empirically their robustness to irregularly sampled multivariate time-series, achieving state-of-the-art performance on both synthetic and real-world examples from thermodynamics, mathematical finance and agricultural science.

Submitted 29 September, 2021; v1 submitted 10 June, 2020; originally announced June 2020.

Comments: Published at AISTATS 2021

MSC Class: 60L10; 60L20
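The feature-based approach rests on the expected signature of a group of streams. A minimal sketch, assuming truncation at level 2: the signature of a piecewise-linear path is built segment by segment with Chen's identity (a single segment with increment Δ has signature (Δ, Δ⊗Δ/2)), and the features of a group are the average over its paths. Streams of different lengths map to features of the same size. This is an illustration, not the paper's pipeline; the function names and the random-walk data are hypothetical.

```python
import numpy as np

def signature_level2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path (shape [T, d]),
    built segment by segment with Chen's identity."""
    d = path.shape[1]
    s1, s2 = np.zeros(d), np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # Concatenate one linear segment with signature (delta, delta ⊗ delta / 2)
        s2 = s2 + np.outer(s1, delta) + np.outer(delta, delta) / 2.0
        s1 = s1 + delta
    return s1, s2

def expected_signature_level2(paths):
    """Empirical expected signature (levels 1 and 2, flattened) of a group of paths."""
    feats = [np.concatenate([s1, s2.ravel()]) for s1, s2 in map(signature_level2, paths)]
    return np.mean(feats, axis=0)

rng = np.random.default_rng(0)
group = [np.cumsum(rng.normal(size=(length, 2)), axis=0) for length in (20, 35, 50)]
features = expected_signature_level2(group)    # length 2 + 2 * 2 = 6
s1, s2 = signature_level2(group[0])
print(features.shape, np.allclose(s2 + s2.T, np.outer(s1, s1)))
```

The final check is the shuffle identity S^{ij} + S^{ji} = S^i S^j, a quick sanity test that the level-2 terms are assembled correctly.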

arXiv:2002.09998

Generalized Bayesian Filtering via Sequential Monte Carlo

Authors: Ayman Boustati, Ömer Deniz Akyildiz, Theodoros Damoulas, Adam M. Johansen

Abstract: We introduce a framework for inference in general state-space hidden Markov models (HMMs) under likelihood misspecification. In particular, we leverage the loss-theoretic perspective of Generalized Bayesian Inference (GBI) to define generalised filtering recursions in HMMs that can tackle the problem of inference under model misspecification. In doing so, we arrive at principled procedures for robust inference against observation contamination by utilising the β-divergence. Operationalising the proposed framework is made possible via sequential Monte Carlo methods (SMC), where most standard particle methods, and their associated convergence results, are readily adapted to the new setting. We apply our approach to object tracking and Gaussian process regression problems, and observe improved performance over both standard filtering algorithms and other robust filters.

Submitted 21 October, 2020; v1 submitted 23 February, 2020; originally announced February 2020.
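In generalized Bayesian filtering the likelihood in the particle weights is replaced by exp(−loss). For the β-divergence and a Gaussian observation density p(y | x) = N(y; x, σ²), the loss is −(1/β) p(y | x)^β + (1/(1+β)) ∫ p(z | x)^(1+β) dz, and with σ fixed the integral term does not depend on x, so it cancels when the weights are normalised. The sketch below compares these weights with the usual bootstrap-filter weights on a grossly contaminated observation; the Gaussian observation model, β = 0.5 and the function names are assumptions, and the full SMC recursion is omitted.

```python
import numpy as np

def standard_weights(y, particles, sigma):
    """Usual bootstrap-filter weights, proportional to N(y; x, sigma^2)."""
    logw = -0.5 * ((y - particles) / sigma) ** 2
    w = np.exp(logw - logw.max())          # stabilise before normalising
    return w / w.sum()

def beta_weights(y, particles, sigma, beta=0.5):
    """Generalised weights exp(-loss) under the beta-divergence loss; the integral
    term is constant in x for fixed sigma and cancels on normalising."""
    dens = np.exp(-0.5 * ((y - particles) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    logw = dens**beta / beta
    w = np.exp(logw - logw.max())
    return w / w.sum()

particles = np.array([-1.0, 0.0, 1.0])
y_outlier = 100.0                                   # grossly contaminated observation
print(standard_weights(y_outlier, particles, 1.0))  # almost all mass on x = 1
print(beta_weights(y_outlier, particles, 1.0))      # close to uniform: the outlier is ignored
```

For ordinary observations the β-weights still favour particles close to y, so the filter keeps tracking the state while discounting outliers.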

arXiv:1910.03906

Probabilistic sequential matrix factorization

Authors: Ömer Deniz Akyildiz, Gerrit J. J. van den Burg, Theodoros Damoulas, Mark F. J. Steel

Abstract: We introduce the probabilistic sequential matrix factorization (PSMF) method for factorizing time-varying and non-stationary datasets consisting of high-dimensional time-series. In particular, we consider nonlinear Gaussian state-space models where sequential approximate inference results in the factorization of a data matrix into a dictionary and time-varying coefficients with potentially nonlinear Markovian dependencies. The assumed Markovian structure on the coefficients enables us to encode temporal dependencies into a low-dimensional feature space. The proposed inference method is solely based on an approximate extended Kalman filtering scheme, which makes the resulting method particularly efficient. PSMF can account for temporal nonlinearities and, more importantly, can be used to calibrate and estimate generic differentiable nonlinear subspace models. We also introduce a robust version of PSMF, called rPSMF, which uses Student-t filters to handle model misspecification. We show that PSMF can be used in multiple contexts: modeling time series with a periodic subspace, robustifying changepoint detection methods, and imputing missing data in several high-dimensional time-series, such as measurements of pollutants across London.

Submitted 18 March, 2021; v1 submitted 9 October, 2019; originally announced October 2019.

Comments: Accepted for publication at AISTATS 2021

arXiv:1910.02008 [pdf, ps, other]

Nonasymptotic estimates for Stochastic Gradient Langevin Dynamics under local conditions in nonconvex optimization

Authors: Ying Zhang, Ömer Deniz Akyildiz, Theodoros Damoulas, Sotirios Sabanis

Abstract: In this paper, we are concerned with a non-asymptotic analysis of sampling algorithms used in nonconvex optimization. In particular, we obtain non-asymptotic estimates in Wasserstein-1 and Wasserstein-2 distances for a popular class of algorithms called Stochastic Gradient Langevin Dynamics (SGLD). In addition, the aforementioned Wasserstein-2 convergence result can be applied to establish a non-asymptotic error bound for the expected excess risk. Crucially, these results are obtained under a local Lipschitz condition and a local dissipativity condition where we remove the uniform dependence in the data stream. We illustrate the importance of this relaxation by presenting examples from variational inference and from index tracking optimization.

Submitted 14 October, 2022; v1 submitted 4 October, 2019; originally announced October 2019.

Comments: 38 pages

MSC Class: 60J20; 60J22; 65C05; 65C40; 62D05
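For readers unfamiliar with the algorithm class analysed above, the sketch below implements the standard SGLD recursion: a stochastic-gradient step on a potential U plus Gaussian noise scaled by sqrt(2 * step / beta), where beta is the inverse temperature. The quadratic potential, the minibatch gradient and all names are hypothetical choices for illustration; they are not the examples studied in the paper.

```python
import numpy as np


def sgld(grad_estimate, theta0, step, beta, n_steps, rng):
    """Run SGLD: theta <- theta - step * g(theta) + sqrt(2 * step / beta) * xi.

    grad_estimate(theta, rng) must return an unbiased stochastic estimate of
    the gradient of the potential U; beta is the inverse temperature.
    """
    theta = np.asarray(theta0, dtype=float)
    samples = np.empty((n_steps,) + theta.shape)
    for k in range(n_steps):
        xi = rng.standard_normal(theta.shape)    # fresh Gaussian noise
        theta = theta - step * grad_estimate(theta, rng) + np.sqrt(2.0 * step / beta) * xi
        samples[k] = theta
    return samples


# Usage (hypothetical example): U(theta) = sum_i (theta - x_i)^2 / 2, whose
# Gibbs measure exp(-U) at beta = 1 is N(mean(x), 1 / n).
rng = np.random.default_rng(1)
data = rng.normal(loc=2.0, scale=1.0, size=100)


def minibatch_grad(theta, rng, batch=10):
    idx = rng.choice(data.size, size=batch, replace=False)
    # Rescale the minibatch sum by n / batch so the estimate is unbiased.
    return (data.size / batch) * np.sum(theta - data[idx])


chain = sgld(minibatch_grad, theta0=5.0, step=1e-3, beta=1.0, n_steps=5000, rng=rng)
```

After discarding a burn-in period, the chain should fluctuate around the sample mean of `data`; the minibatch noise slightly inflates the spread relative to exact Langevin dynamics.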

arXiv:1906.08344 [pdf, other]

Multi-resolution Multi-task Gaussian Processes

Authors: Oliver Hamelijnck, Theodoros Damoulas, Kangrui Wang, Mark Girolami

Abstract: We consider evidence integration from potentially dependent observation processes under varying spatio-temporal sampling resolutions and noise levels. We develop a multi-resolution multi-task (MRGP) framework that allows for both inter-task and intra-task multi-resolution and multi-fidelity. We develop shallow Gaussian Process (GP) mixtures that approximate the difficult-to-estimate joint likelihood with a composite one, and deep GP constructions that naturally handle biases in the mean. By doing so, we generalize and outperform state-of-the-art GP compositions and offer information-theoretic corrections and efficient variational approximations. We demonstrate the competitiveness of MRGPs on synthetic settings and on the challenging problem of hyper-local estimation of air pollution levels across London from multiple sensing modalities operating at disparate spatio-temporal resolutions.

Submitted 5 November, 2019; v1 submitted 19 June, 2019; originally announced June 2019.

arXiv:1906.03161 [pdf, other]

Structured Variational Inference in Continuous Cox Process Models

Authors: Virginia Aglietti, Edwin V. Bonilla, Theodoros Damoulas, Sally Cripps

Abstract: We propose a scalable framework for inference in an inhomogeneous Poisson process modeled by a continuous sigmoidal Cox process that assumes the corresponding intensity function is given by a Gaussian process (GP) prior transformed with a scaled logistic sigmoid function. We present a tractable representation of the likelihood through augmentation with a superposition of Poisson processes. This view enables a structured variational approximation capturing dependencies across variables in the model. Our framework avoids discretization of the domain, does not require accurate numerical integration over the input space and is not limited to GPs with squared exponential kernels. We evaluate our approach on synthetic and real-world data, showing that its benefits are particularly pronounced in multivariate input settings, where it overcomes the limitations of mean-field methods and sampling schemes. Our approach provides state-of-the-art trade-offs between speed, accuracy and uncertainty quantification.

Submitted 7 June, 2019; originally announced June 2019.
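The model in the abstract above has intensity lam_max * sigmoid(f(x)) with f drawn from a GP. Its generative side can be sketched by thinning: draw candidates from a homogeneous process with rate lam_max, then keep each candidate with probability sigmoid(f(x)). The rejected candidates form the second process in the superposition, which is what makes augmentation of this kind tractable. This is only a sampler under assumed choices (one-dimensional domain [0, T], squared-exponential kernel, hypothetical function names), not the paper's structured variational inference scheme.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    # Squared-exponential covariance between two 1-D arrays of inputs.
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)


def sample_sigmoidal_cox(T, lam_max, rng, lengthscale=1.0):
    """Draw one realisation on [0, T] with intensity lam_max * sigmoid(f(x)).

    f is a zero-mean GP evaluated only at the candidate points. Returns the
    accepted events and the thinned (rejected) candidates.
    """
    n = rng.poisson(lam_max * T)                 # candidates from a rate-lam_max process
    candidates = rng.uniform(0.0, T, size=n)
    if n == 0:
        return candidates, candidates
    K = rbf_kernel(candidates, candidates, lengthscale) + 1e-6 * np.eye(n)
    f = np.linalg.cholesky(K) @ rng.standard_normal(n)   # joint GP draw
    accept_prob = 1.0 / (1.0 + np.exp(-f))               # logistic sigmoid
    keep = rng.uniform(size=n) < accept_prob
    return candidates[keep], candidates[~keep]


events, thinned = sample_sigmoidal_cox(T=10.0, lam_max=5.0, rng=np.random.default_rng(0))
```

The small diagonal jitter keeps the Cholesky factorisation stable when candidate points happen to lie very close together.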

arXiv:1905.12407 [pdf, other]

Non-linear Multitask Learning with Deep Gaussian Processes

Authors: Ayman Boustati, Theodoros Damoulas, Richard S. Savage

Abstract: We present a multi-task learning formulation for Deep Gaussian processes (DGPs), through non-linear mixtures of latent processes. The latent space is composed of private processes that capture within-task information and shared processes that capture across-task dependencies. We propose two different methods for segmenting the latent space: through hard coding shared and task-specific processes or through soft sharing with Automatic Relevance Determination kernels. We show that our formulation is able to improve the learning performance and transfer information between the tasks, outperforming other probabilistic multi-task learning models across real-world and benchmarking settings.

Submitted 23 February, 2020; v1 submitted 29 May, 2019; originally announced May 2019.

arXiv:1904.02063 [pdf, other]

Generalized Variational Inference: Three arguments for deriving new Posteriors

Authors: Jeremias Knoblauch, Jack Jewson, Theodoros Damoulas

Abstract: We advocate an optimization-centric view of Bayesian inference and introduce a novel generalization of it. Our inspiration is the representation of Bayes' rule as an infinite-dimensional optimization problem (Csiszár, 1975; Donsker and Varadhan, 1975; Zellner, 1988). First, we use it to prove an optimality result for standard Variational Inference (VI): under the proposed view, the standard Evidence Lower Bound (ELBO) maximizing VI posterior is preferable to alternative approximations of the Bayesian posterior. Next, we argue for generalizing standard Bayesian inference. The need for this arises in situations of severe misalignment between reality and three assumptions underlying standard Bayesian inference: (1) well-specified priors, (2) well-specified likelihoods and (3) the availability of infinite computing power. Our generalization addresses these shortcomings with three arguments and is called the Rule of Three (RoT). We derive it axiomatically and recover existing posteriors as special cases, including the Bayesian posterior and its approximation by standard VI. In contrast, approximations based on alternative ELBO-like objectives violate the axioms. Finally, we study a special case of the RoT that we call Generalized Variational Inference (GVI). GVI posteriors are a large and tractable family of belief distributions specified by three arguments: a loss, a divergence and a variational family. GVI posteriors have appealing properties, including consistency and an interpretation as an approximate ELBO.
The last part of the paper explores some attractive applications of GVI in popular machine learning models, including robustness and more appropriate marginals. After deriving black-box inference schemes for GVI posteriors, we investigate their predictive performance on Bayesian Neural Networks and Deep Gaussian Processes, where GVI can comprehensively improve upon existing methods.

Submitted 12 December, 2019; v1 submitted 3 April, 2019; originally announced April 2019.

Comments: 103 pages, 23 figures (comprehensive revision of previous version)
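The three GVI arguments named above (a loss, a divergence and a variational family) can be made concrete with a toy objective of the form expected loss under q plus a divergence from q to the prior. The sketch assumes a Gaussian family q = N(m, s^2), a N(0, 1) prior, a unit-variance Gaussian negative log-likelihood as the loss and the KL divergence, optionally scaled by a hypothetical `weight`. With `weight=1` the objective is the negative ELBO, so for this conjugate model its minimiser is the exact Bayesian posterior, matching the abstract's claim that standard VI is recovered as a special case.

```python
import numpy as np


def expected_nll(m, s, data):
    # E_{theta ~ N(m, s^2)} [ -log N(x_i | theta, 1) ], summed over the data.
    return np.sum(0.5 * np.log(2 * np.pi) + 0.5 * ((data - m) ** 2 + s ** 2))


def kl_to_standard_normal(m, s):
    # KL( N(m, s^2) || N(0, 1) ), available in closed form.
    return 0.5 * (s ** 2 + m ** 2 - 1.0) - np.log(s)


def gvi_objective(m, s, data, weight=1.0):
    # Expected loss + divergence; weight = 1 gives the negative ELBO of standard VI.
    return expected_nll(m, s, data) + weight * kl_to_standard_normal(m, s)


data = np.array([0.5, 1.5, 2.0, 1.0])
n = data.size
# For this conjugate model the exact posterior is N(sum(x) / (n + 1), 1 / (n + 1)).
m_post, s_post = data.sum() / (n + 1), np.sqrt(1.0 / (n + 1))
best = gvi_objective(m_post, s_post, data)
```

Changing the divergence changes the belief distribution: with `weight=w`, setting the derivatives to zero moves the minimiser to mean sum(x) / (n + w) and variance w / (n + w), so larger weights keep q closer to the prior.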

arXiv:1806.02261 [pdf, other]

Doubly Robust Bayesian Inference for Non-Stationary Streaming Data with $β$-Divergences

Authors: Jeremias Knoblauch, Jack Jewson, Theodoros Damoulas

Abstract: We present the very first robust Bayesian Online Changepoint Detection algorithm through General Bayesian Inference (GBI) with $β$-divergences. The resulting inference procedure is doubly robust for both the parameter and the changepoint (CP) posterior, with linear time and constant space complexity. We provide a construction for exponential models and demonstrate it on the Bayesian Linear Regression model. In so doing, we make two additional contributions: firstly, we make GBI scalable using Structural Variational approximations that are exact as $β\to 0$; secondly, we give a principled way of choosing the divergence parameter $β$ by minimizing expected predictive loss on-line. By reducing False Discovery Rates of CPs from more than 90% to 0% on real-world data, our approach offers the state of the art.

Submitted 27 November, 2018; v1 submitted 6 June, 2018; originally announced June 2018.

Comments: 39 pages, 11 figures, published at Neural Information Processing Systems (NeurIPS) 2018

Journal ref: Neural Information Processing Systems (NeurIPS) 2018
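The robustness described above comes from replacing the log-likelihood with a loss derived from the $β$-divergence. The sketch below shows the standard density-power ($β$-divergence) loss for a univariate Gaussian model, using the closed-form integral of the Gaussian density raised to the power 1 + β. It illustrates two properties: the loss stays bounded for outliers, whereas the negative log-likelihood grows quadratically, and as β → 0 it recovers the negative log-likelihood up to additive constants. It is an assumed illustrative case, not the paper's changepoint algorithm or its Bayesian Linear Regression construction.

```python
import numpy as np


def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / np.sqrt(2 * np.pi * sigma ** 2)


def beta_loss(x, mu, sigma, beta):
    """Density-power (beta-divergence) loss of a N(mu, sigma^2) model at x."""
    # Closed form of the integral of N(z | mu, sigma^2)^(1 + beta) over z.
    integral = (2 * np.pi * sigma ** 2) ** (-beta / 2) / np.sqrt(1 + beta)
    return -gaussian_pdf(x, mu, sigma) ** beta / beta + integral / (1 + beta)


def neg_log_lik(x, mu, sigma):
    return 0.5 * np.log(2 * np.pi * sigma ** 2) + 0.5 * ((x - mu) / sigma) ** 2


# An outlier far from the model barely moves the beta-loss but dominates the log loss.
outlier_beta = beta_loss(100.0, 0.0, 1.0, beta=0.5)
outlier_nll = neg_log_lik(100.0, 0.0, 1.0)
```

For an outlier the first term vanishes, so the β-loss is capped by the integral term; this bounded influence is what limits the effect of anomalous observations on the posterior.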

arXiv:1805.09781 [pdf, other]

Efficient Inference in Multi-task Cox Process Models

Authors: Virginia Aglietti, Theodoros Damoulas, Edwin Bonilla

Abstract: We generalize the log Gaussian Cox process (LGCP) framework to model multiple correlated point data jointly. The observations are treated as realizations of multiple LGCPs, whose log intensities are given by linear combinations of latent functions drawn from Gaussian process priors. The combination coefficients are also drawn from Gaussian processes and can incorporate additional dependencies. We derive closed-form expressions for the moments of the intensity functions and develop an efficient variational inference algorithm that is orders of magnitude faster than competing deterministic and stochastic approximations of multivariate LGCP, coregionalization models, and multi-task permanental processes. Our approach outperforms these benchmarks in multiple problems, offering the current state of the art in modeling multivariate point processes.

Submitted 15 March, 2019; v1 submitted 24 May, 2018; originally announced May 2018.
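Why closed-form intensity moments are available can be seen in a simplified case. If the log intensity at one input is a linear combination of independent Gaussian latent values with fixed weights, it is itself Gaussian, and the intensity is lognormal with mean exp(mean + variance / 2). The sketch below assumes fixed weights (in the paper the combination coefficients are also GP-distributed, which changes the expressions) and checks the formula by Monte Carlo; all names and numbers are hypothetical.

```python
import numpy as np


def lognormal_mean_intensity(weights, means, variances):
    """E[exp(sum_q w_q f_q)] for independent f_q ~ N(means[q], variances[q]).

    Assumes fixed mixing weights w_q; the log intensity is then Gaussian and
    its exponential has the lognormal mean exp(mu + var / 2).
    """
    w = np.asarray(weights, dtype=float)
    eta_mean = w @ np.asarray(means, dtype=float)
    eta_var = (w ** 2) @ np.asarray(variances, dtype=float)
    return np.exp(eta_mean + 0.5 * eta_var)


w = np.array([0.5, -1.0])
mu = np.array([0.2, 0.1])
var = np.array([0.3, 0.2])
analytic = lognormal_mean_intensity(w, mu, var)

# Monte Carlo check of the closed form at a single input location.
rng = np.random.default_rng(0)
f = rng.normal(mu, np.sqrt(var), size=(200_000, 2))
monte_carlo = np.exp(f @ w).mean()
```

Here the log-intensity mean is 0 and its variance is 0.275, so the analytic mean intensity is exp(0.1375).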

arXiv:1805.05383 [pdf, other]

Spatio-temporal Bayesian On-line Changepoint Detection with Model Selection

Authors: Jeremias Knoblauch, Theodoros Damoulas

Abstract: Bayesian On-line Changepoint Detection is extended to on-line model selection and non-stationary spatio-temporal processes. We propose spatially structured Vector Autoregressions (VARs) for modelling the process between changepoints (CPs) and give an upper bound on the approximation error of such models. The resulting algorithm performs prediction, model selection and CP detection on-line. Its time complexity is linear and its space complexity constant, and thus it is two orders of magnitude faster than its closest competitor. In addition, it outperforms the state of the art for multivariate data.

Submitted 6 June, 2018; v1 submitted 14 May, 2018; originally announced May 2018.

Comments: 10 pages, 7 figures, to appear in Proceedings of the 35th International Conference on Machine Learning 2018
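The on-line recursion that the spatio-temporal extension above builds on can be sketched in its simplest form: the run-length recursion of Adams and MacKay for a univariate series with Gaussian observations of known variance, a conjugate Normal prior on the segment mean and a constant hazard rate. Each step either grows every current run length or resets it to zero, weighted by the predictive density of the new observation. The sketch omits the paper's model selection and spatially structured VARs; the function name and default parameters are hypothetical.

```python
import numpy as np


def bocpd_gaussian(data, hazard=0.01, mu0=0.0, var0=1.0, obs_var=1.0):
    """Run-length posterior for univariate Bayesian on-line changepoint detection.

    Observations are N(mu, obs_var) with unknown segment mean mu ~ N(mu0, var0)
    and a constant hazard. R[t, r] = p(run length r | first t observations).
    """
    T = len(data)
    R = np.zeros((T + 1, T + 1))
    R[0, 0] = 1.0
    means = np.array([mu0])      # posterior mean of mu for each run length
    variances = np.array([var0])  # posterior variance of mu for each run length
    for t, x in enumerate(data):
        # Predictive density of x under every current run length.
        pred_var = variances + obs_var
        pred = np.exp(-0.5 * (x - means) ** 2 / pred_var) / np.sqrt(2 * np.pi * pred_var)
        R[t + 1, 1 : t + 2] = R[t, : t + 1] * pred * (1 - hazard)  # runs grow by one
        R[t + 1, 0] = np.sum(R[t, : t + 1] * pred * hazard)        # changepoint
        R[t + 1] /= R[t + 1].sum()
        # Conjugate update of each run's mean, then prepend the prior for r = 0.
        new_var = 1.0 / (1.0 / variances + 1.0 / obs_var)
        new_mean = new_var * (means / variances + x / obs_var)
        means = np.concatenate(([mu0], new_mean))
        variances = np.concatenate(([var0], new_var))
    return R


# Usage: a mean shift from 0 to 5 after 50 observations.
rng = np.random.default_rng(3)
series = np.concatenate([rng.normal(0.0, 1.0, 50), rng.normal(5.0, 1.0, 50)])
R = bocpd_gaussian(series, hazard=0.01, mu0=0.0, var0=10.0)
```

This version keeps every run length, so each step costs time proportional to t; constant space and linear time, as claimed in the abstract, require pruning run lengths with negligible posterior mass.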

arXiv:1804.01431 [pdf, other]

Posterior Inference for Sparse Hierarchical Non-stationary Models

Authors: Karla Monterrubio-Gómez, Lassi Roininen, Sara Wade, Theo Damoulas, Mark Girolami

Abstract: Gaussian processes are valuable tools for non-parametric modelling, where typically an assumption of stationarity is employed. While removing this assumption can improve prediction, fitting such models is challenging. In this work, hierarchical models are constructed based on Gaussian Markov random fields with stochastic spatially varying parameters. Importantly, this allows for non-stationarity while also addressing the computational burden through a sparse banded representation of the precision matrix. In this setting, efficient Markov chain Monte Carlo (MCMC) sampling is challenging due to the strong coupling a posteriori of the parameters and hyperparameters. We develop and compare three adaptive MCMC schemes and make use of banded matrix operations for faster inference. Furthermore, a novel extension to multi-dimensional settings is proposed through an additive structure that retains the flexibility and scalability of the model, while also inheriting interpretability from the additive approach. A thorough assessment of the efficiency and accuracy of the methods in nonstationary settings is presented for both simulated experiments and a computer emulation problem.

Submitted 1 May, 2019; v1 submitted 4 April, 2018; originally announced April 2018.
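The computational benefit of a banded precision matrix, mentioned in the abstract above, can be illustrated in one dimension. The sketch assumes a first-order random-walk GMRF prior whose increment precisions `kappa` vary along the line (a hypothetical stand-in for the paper's stochastic spatially varying parameters) and Gaussian observation noise. The posterior precision is then tridiagonal, and the posterior mean solves a tridiagonal system, which the Thomas algorithm handles in O(n) time instead of the O(n^3) cost of a dense solve. This is not the paper's MCMC scheme.

```python
import numpy as np


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm: solve A x = rhs for tridiagonal A in O(n).

    lower[i] = A[i + 1, i], diag[i] = A[i, i], upper[i] = A[i, i + 1].
    Stable here because A is symmetric positive definite and diagonally dominant.
    """
    n = len(diag)
    c = np.zeros(n - 1)   # modified super-diagonal
    d = np.zeros(n)       # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):                      # forward elimination
        denom = diag[i] - lower[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / denom
    x = np.zeros(n)
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def rw1_posterior_mean(y, kappa, noise_var):
    """Posterior mean of x given y = x + noise, under a first-order random-walk
    GMRF prior with increment precisions kappa[i] for x[i + 1] - x[i]."""
    kappa = np.asarray(kappa, dtype=float)
    diag = np.full(len(y), 1.0 / noise_var)   # likelihood contribution
    diag[:-1] += kappa                         # prior: D^T diag(kappa) D
    diag[1:] += kappa
    off = -kappa
    return solve_tridiagonal(off, diag, off, np.asarray(y, dtype=float) / noise_var)


rng = np.random.default_rng(0)
y = rng.normal(size=50)
kappa = rng.uniform(0.5, 5.0, size=49)   # larger kappa means a smoother stretch
mu = rw1_posterior_mean(y, kappa, noise_var=0.5)
```

Only the three diagonals are ever stored, so memory also grows linearly with the number of grid points.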

Showing 1–26 of 26 results for author: Damoulas, T