Search | arXiv e-print repository

Outlier-robust Kalman Filtering through Generalised Bayes

Authors: Gerardo Duran-Martin, Matias Altamirano, Alexander Y. Shestopaloff, Leandro Sánchez-Betancourt, Jeremias Knoblauch, Matt Jones, François-Xavier Briol, Kevin Murphy

Abstract: We derive a novel, provably robust, and closed-form Bayesian update rule for online filtering in state-space models in the presence of outliers and misspecified measurement models. Our method combines generalised Bayesian inference with filtering methods such as the extended and ensemble Kalman filter. We use the former to show robustness and the latter to ensure computational efficiency in the case of nonlinear models. Our method matches or outperforms other robust filtering methods (such as those based on variational Bayes) at a much lower computational cost. We show this empirically on a range of filtering problems with outlier measurements, such as object tracking, state estimation in high-dimensional chaotic systems, and online learning of neural networks.

Submitted 28 May, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

Comments: 41st International Conference on Machine Learning (ICML 2024)

arXiv:2311.00463 [pdf, other]

Robust and Conjugate Gaussian Process Regression

Authors: Matias Altamirano, François-Xavier Briol, Jeremias Knoblauch

Abstract: To enable closed form conditioning, a common assumption in Gaussian process (GP) regression is independent and identically distributed Gaussian observation noise. This strong and simplistic assumption is often violated in practice, which leads to unreliable inferences and uncertainty quantification. Unfortunately, existing methods for robustifying GPs break closed-form conditioning, which makes them less attractive to practitioners and significantly more computationally expensive. In this paper, we demonstrate how to perform provably robust and conjugate Gaussian process (RCGP) regression at virtually no additional cost using generalised Bayesian inference. RCGP is particularly versatile as it enables exact conjugate closed form updates in all settings where standard GPs admit them. To demonstrate its strong empirical performance, we deploy RCGP for problems ranging from Bayesian optimisation to sparse variational Gaussian processes.

Submitted 3 June, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

arXiv:2305.15027 [pdf, other]

A Rigorous Link between Deep Ensembles and (Variational) Bayesian Methods

Authors: Veit David Wild, Sahra Ghalebikesabi, Dino Sejdinovic, Jeremias Knoblauch

Abstract: We establish the first mathematically rigorous link between Bayesian, variational Bayesian, and ensemble methods. A key step towards this is to reformulate the non-convex optimisation problem typically encountered in deep learning as a convex optimisation in the space of probability measures. On a technical level, our contribution amounts to studying generalised variational inference through the lens of Wasserstein gradient flows. The result is a unified theory of various seemingly disconnected approaches that are commonly used for uncertainty quantification in deep learning -- including deep ensembles and (variational) Bayesian methods. This offers a fresh perspective on the reasons behind the success of deep ensembles over procedures based on parameterised variational inference, and allows the derivation of new ensembling schemes with convergence guarantees. We showcase this by proposing a family of interacting deep ensembles with direct parallels to the interactions of particle systems in thermodynamics, and use our theory to prove the convergence of these algorithms to a well-defined global minimiser on the space of probability measures.

Submitted 22 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

arXiv:2302.04759 [pdf, other]

Robust and Scalable Bayesian Online Changepoint Detection

Authors: Matias Altamirano, François-Xavier Briol, Jeremias Knoblauch

Abstract: This paper proposes an online, provably robust, and scalable Bayesian approach for changepoint detection. The resulting algorithm has key advantages over previous work: it provides provable robustness by leveraging the generalised Bayesian perspective, and also addresses the scalability issues of previous attempts. Specifically, the proposed generalised Bayesian formalism leads to conjugate posteriors whose parameters are available in closed form by leveraging diffusion score matching. The resulting algorithm is exact, can be updated through simple algebra, and is more than 10 times faster than its closest competitor.

Submitted 12 May, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

arXiv:2210.11218 [pdf, other]

Explainable Multi-Agent Recommendation System for Energy-Efficient Decision Support in Smart Homes

Authors: Alona Zharova, Annika Boer, Julia Knoblauch, Kai Ingo Schewina, Jana Vihs

Abstract: Understandable and persuasive recommendations support electricity consumers' behavioral change to tackle the energy efficiency problem. Making load shifting recommendations for household appliances explainable increases the transparency and trustworthiness of the system. This paper proposes an explainable multi-agent recommendation system for load shifting for household appliances. First, we provide agents with enhanced predictive capacity by including weather data, applying state-of-the-art models, and tuning the hyperparameters. Second, we suggest an Explainability Agent providing transparent recommendations. We also provide an overview of the predictive and explainability performance. Third, we discuss the impact and scaling potential of the suggested approach.

Submitted 4 January, 2023; v1 submitted 20 October, 2022; originally announced October 2022.

arXiv:2202.04744 [pdf, other]

Robust Bayesian Inference for Simulator-based Models via the MMD Posterior Bootstrap

Authors: Charita Dellaporta, Jeremias Knoblauch, Theodoros Damoulas, François-Xavier Briol

Abstract: Simulator-based models are models for which the likelihood is intractable but simulation of synthetic data is possible. They are often used to describe complex real-world phenomena, and as such can often be misspecified in practice. Unfortunately, existing Bayesian approaches for simulators are known to perform poorly in those cases. In this paper, we propose a novel algorithm based on the posterior bootstrap and maximum mean discrepancy estimators. This leads to a highly-parallelisable Bayesian inference algorithm with strong robustness properties. This is demonstrated through an in-depth theoretical study which includes generalisation bounds and proofs of frequentist consistency and robustness of our posterior. The approach is then assessed on a range of examples including a g-and-k distribution and a toggle-switch model.

Submitted 19 December, 2022; v1 submitted 9 February, 2022; originally announced February 2022.

Comments: Accepted for publication (with an oral presentation) at AISTATS 2022. A preliminary version of this paper was accepted in the NeurIPS 2021 workshop "Your Model is Wrong: Robustness and misspecification in probabilistic modeling". v2: added some references. v3: corrected small error in theorem 3

arXiv:2201.09042 [pdf, other]

Uncertainty-aware deep learning methods for robust diabetic retinopathy classification

Authors: Joel Jaskari, Jaakko Sahlsten, Theodoros Damoulas, Jeremias Knoblauch, Simo Särkkä, Leo Kärkkäinen, Kustaa Hietala, Kimmo Kaski

Abstract: Automatic classification of diabetic retinopathy from retinal images has been widely studied using deep neural networks with impressive results. However, there is a clinical need for estimation of the uncertainty in the classifications, a shortcoming of modern neural networks. Recently, approximate Bayesian deep learning methods have been proposed for the task but the studies have only considered the binary referable/non-referable diabetic retinopathy classification applied to benchmark datasets. We present novel results by systematically investigating a clinical dataset and a clinically relevant 5-class classification scheme, in addition to benchmark datasets and the binary classification scheme. Moreover, we derive a connection between uncertainty measures and classifier risk, from which we develop a new uncertainty measure. We observe that the previously proposed entropy-based uncertainty measure generalizes to the clinical dataset on the binary classification scheme but not on the 5-class scheme, whereas our new uncertainty measure generalizes to the latter case.

Submitted 2 February, 2022; v1 submitted 22 January, 2022; originally announced January 2022.

arXiv:2011.01596 [pdf, other]

Transforming Gaussian Processes With Normalizing Flows

Authors: Juan Maroñas, Oliver Hamelijnck, Jeremias Knoblauch, Theodoros Damoulas

Abstract: Gaussian Processes (GPs) can be used as flexible, non-parametric function priors. Inspired by the growing body of work on Normalizing Flows, we enlarge this class of priors through a parametric invertible transformation that can be made input-dependent. Doing so also allows us to encode interpretable prior knowledge (e.g., boundedness constraints). We derive a variational approximation to the resulting Bayesian inference problem, which is as fast as stochastic variational GP regression (Hensman et al., 2013; Dezfouli and Bonilla, 2015). This makes the model a computationally efficient alternative to other hierarchical extensions of GP priors (Lazaro-Gredilla, 2012; Damianou and Lawrence, 2013). The resulting algorithm's computational and inferential performance is excellent, and we demonstrate this on a range of data sets. For example, even with only 5 inducing points and an input-dependent flow, our method is consistently competitive with a standard sparse GP fitted using 100 inducing points.

Submitted 25 February, 2021; v1 submitted 3 November, 2020; originally announced November 2020.

Comments: AISTATS 2021, camera ready

arXiv:2010.13456 [pdf, other]

Robust Bayesian Inference for Discrete Outcomes with the Total Variation Distance

Authors: Jeremias Knoblauch, Lara Vomfell

Abstract: Models of discrete-valued outcomes are easily misspecified if the data exhibit zero-inflation, overdispersion or contamination. Without additional knowledge about the existence and nature of this misspecification, model inference and prediction are adversely affected. Here, we introduce a robust discrepancy-based Bayesian approach using the Total Variation Distance (TVD). In the process, we address and resolve two challenges: First, we study convergence and robustness properties of a computationally efficient estimator for the TVD between a parametric model and the data-generating mechanism. Second, we provide an efficient inference method adapted from Lyddon et al. (2019) which corresponds to formulating an uninformative nonparametric prior directly over the data-generating mechanism. Lastly, we empirically demonstrate that our approach is robust and significantly improves predictive performance on a range of simulated and real world data.

Submitted 26 October, 2020; originally announced October 2020.

Comments: 16p., 7 figs.; authors contributed equally & author order determined by coin flip

arXiv:2006.05188 [pdf, other]

Optimal Continual Learning has Perfect Memory and is NP-hard

Authors: Jeremias Knoblauch, Hisham Husain, Tom Diethe

Abstract: Continual Learning (CL) algorithms incrementally learn a predictor or representation across multiple sequentially observed tasks. Designing CL algorithms that perform reliably and avoid so-called catastrophic forgetting has proven a persistent challenge. The current paper develops a theoretical approach that explains why. In particular, we derive the computational properties which CL algorithms would have to possess in order to avoid catastrophic forgetting. Our main finding is that such optimal CL algorithms generally solve an NP-hard problem and will require perfect memory to do so. The findings are of theoretical interest, but also explain the excellent performance of CL algorithms using experience replay, episodic memory and core sets relative to regularization-based approaches.

Submitted 9 June, 2020; originally announced June 2020.

Comments: Accepted for publication at ICML (International Conference on Machine Learning) 2020; 13 pages, 8 Figures

arXiv:1904.02303 [pdf, other]

Robust Deep Gaussian Processes

Authors: Jeremias Knoblauch

Abstract: This report provides an in-depth overview of the implications and novelty that Generalized Variational Inference (GVI) (Knoblauch et al., 2019) brings to Deep Gaussian Processes (DGPs) (Damianou & Lawrence, 2013). Specifically, robustness to model misspecification as well as principled alternatives for uncertainty quantification are motivated with an information-geometric view. These modifications have clear interpretations and can be implemented in less than 100 lines of Python code. Most importantly, the corresponding empirical results show that DGPs can greatly benefit from the presented enhancements.

Submitted 20 May, 2019; v1 submitted 3 April, 2019; originally announced April 2019.

Comments: 11 pages, 4 figures

arXiv:1904.02063 [pdf, other]

Generalized Variational Inference: Three arguments for deriving new Posteriors

Authors: Jeremias Knoblauch, Jack Jewson, Theodoros Damoulas

Abstract: We advocate an optimization-centric view on and introduce a novel generalization of Bayesian inference. Our inspiration is the representation of Bayes' rule as an infinite-dimensional optimization problem (Csiszar, 1975; Donsker and Varadhan, 1975; Zellner, 1988). First, we use it to prove an optimality result of standard Variational Inference (VI): Under the proposed view, the standard Evidence Lower Bound (ELBO) maximizing VI posterior is preferable to alternative approximations of the Bayesian posterior. Next, we argue for generalizing standard Bayesian inference. The need for this arises in situations of severe misalignment between reality and three assumptions underlying standard Bayesian inference: (1) Well-specified priors, (2) well-specified likelihoods, (3) the availability of infinite computing power. Our generalization addresses these shortcomings with three arguments and is called the Rule of Three (RoT). We derive it axiomatically and recover existing posteriors as special cases, including the Bayesian posterior and its approximation by standard VI. In contrast, approximations based on alternative ELBO-like objectives violate the axioms. Finally, we study a special case of the RoT that we call Generalized Variational Inference (GVI). GVI posteriors are a large and tractable family of belief distributions specified by three arguments: A loss, a divergence and a variational family. GVI posteriors have appealing properties, including consistency and an interpretation as approximate ELBO. The last part of the paper explores some attractive applications of GVI in popular machine learning models, including robustness and more appropriate marginals. After deriving black box inference schemes for GVI posteriors, their predictive performance is investigated on Bayesian Neural Networks and Deep Gaussian Processes, where GVI can comprehensively improve upon existing methods.

Submitted 12 December, 2019; v1 submitted 3 April, 2019; originally announced April 2019.

Comments: 103 pages, 23 figures (comprehensive revision of previous version)

arXiv:1806.02261 [pdf, other]

Doubly Robust Bayesian Inference for Non-Stationary Streaming Data with $β$-Divergences

Authors: Jeremias Knoblauch, Jack Jewson, Theodoros Damoulas

Abstract: We present the very first robust Bayesian Online Changepoint Detection algorithm through General Bayesian Inference (GBI) with $β$-divergences. The resulting inference procedure is doubly robust for both the parameter and the changepoint (CP) posterior, with linear time and constant space complexity. We provide a construction for exponential models and demonstrate it on the Bayesian Linear Regression model. In so doing, we make two additional contributions: Firstly, we make GBI scalable using Structural Variational approximations that are exact as $β\to 0$. Secondly, we give a principled way of choosing the divergence parameter $β$ by minimizing expected predictive loss on-line. Reducing False Discovery Rates of CPs from more than 90% to 0% on real world data, this offers the state of the art.

Submitted 27 November, 2018; v1 submitted 6 June, 2018; originally announced June 2018.

Comments: 39 pages, 11 figures, published at Neural Information Processing Systems (NeurIPS) 2018

Journal ref: Neural Information Processing Systems (NeurIPS) 2018

arXiv:1805.05383 [pdf, other]

Spatio-temporal Bayesian On-line Changepoint Detection with Model Selection

Authors: Jeremias Knoblauch, Theodoros Damoulas

Abstract: Bayesian On-line Changepoint Detection is extended to on-line model selection and non-stationary spatio-temporal processes. We propose spatially structured Vector Autoregressions (VARs) for modelling the process between changepoints (CPs) and give an upper bound on the approximation error of such models. The resulting algorithm performs prediction, model selection and CP detection on-line. Its time complexity is linear and its space complexity constant, and thus it is two orders of magnitude faster than its closest competitor. In addition, it outperforms the state of the art for multivariate data.

Submitted 6 June, 2018; v1 submitted 14 May, 2018; originally announced May 2018.

Comments: 10 pages, 7 figures, to appear in Proceedings of the 35th International Conference on Machine Learning 2018

Showing 1–14 of 14 results for author: Knoblauch, J