Search | arXiv e-print repository

Root Cause Analysis of Outliers with Missing Structural Knowledge

Authors: Nastaran Okati, Sergio Hernan Garrido Mejia, William Roy Orchard, Patrick Blöbaum, Dominik Janzing

Abstract: Recent work conceptualized root cause analysis (RCA) of anomalies via quantitative contribution analysis using causal counterfactuals in structural causal models (SCMs). The framework comes with three practical challenges: (1) it requires the causal directed acyclic graph (DAG), together with an SCM, (2) it is statistically ill-posed since it probes regression models in regions of low probability… ▽ More Recent work conceptualized root cause analysis (RCA) of anomalies via quantitative contribution analysis using causal counterfactuals in structural causal models (SCMs). The framework comes with three practical challenges: (1) it requires the causal directed acyclic graph (DAG), together with an SCM, (2) it is statistically ill-posed since it probes regression models in regions of low probability density, (3) it relies on Shapley values which are computationally expensive to find. In this paper, we propose simplified, efficient methods of root cause analysis when the task is to identify a unique root cause instead of quantitative contribution analysis. Our proposed methods run in linear order of SCM nodes and they require only the causal DAG without counterfactuals. Furthermore, for those use cases where the causal DAG is unknown, we justify the heuristic of identifying root causes as the variables with the highest anomaly score. △ Less

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2305.09565 [pdf, other]

Toward Falsifying Causal Graphs Using a Permutation-Based Test

Authors: Elias Eulig, Atalanti A. Mastakouri, Patrick Blöbaum, Michaela Hardt, Dominik Janzing

Abstract: Understanding the causal relationships among the variables of a system is paramount to explain and control its behaviour. Inferring the causal graph from observational data without interventions, however, requires a lot of strong assumptions that are not always realistic. Even for domain experts it can be challenging to express the causal graph. Therefore, metrics that quantitatively assess the go… ▽ More Understanding the causal relationships among the variables of a system is paramount to explain and control its behaviour. Inferring the causal graph from observational data without interventions, however, requires a lot of strong assumptions that are not always realistic. Even for domain experts it can be challenging to express the causal graph. Therefore, metrics that quantitatively assess the goodness of a causal graph provide helpful checks before using it in downstream tasks. Existing metrics provide an absolute number of inconsistencies between the graph and the observed data, and without a baseline, practitioners are left to answer the hard question of how many such inconsistencies are acceptable or expected. Here, we propose a novel consistency metric by constructing a surrogate baseline through node permutations. By comparing the number of inconsistencies with those on the surrogate baseline, we derive an interpretable metric that captures whether the DAG fits significantly better than random. Evaluating on both simulated and real data sets from various domains, including biology and cloud monitoring, we demonstrate that the true DAG is not falsified by our metric, whereas the wrong graphs given by a hypothetical user are likely to be falsified. △ Less

Submitted 16 May, 2023; originally announced May 2023.

Comments: 23 pages, 9 figures

arXiv:2302.00860 [pdf, other]

Interventional and Counterfactual Inference with Diffusion Models

Authors: Patrick Chao, Patrick Blöbaum, Shiva Prasad Kasiviswanathan

Abstract: We consider the problem of answering observational, interventional, and counterfactual queries in a causally sufficient setting where only observational data and the causal graph are available. Utilizing the recent developments in diffusion models, we introduce diffusion-based causal models (DCM) to learn causal mechanisms, that generate unique latent encodings. These encodings enable us to direct… ▽ More We consider the problem of answering observational, interventional, and counterfactual queries in a causally sufficient setting where only observational data and the causal graph are available. Utilizing the recent developments in diffusion models, we introduce diffusion-based causal models (DCM) to learn causal mechanisms, that generate unique latent encodings. These encodings enable us to directly sample under interventions and perform abduction for counterfactuals. Diffusion models are a natural fit here, since they can encode each node to a latent representation that acts as a proxy for exogenous noise. Our empirical evaluations demonstrate significant improvements over existing state-of-the-art methods for answering causal queries. Furthermore, we provide theoretical results that offer a methodology for analyzing counterfactual estimation in general encoder-decoder models, which could be useful in settings beyond our proposed approach. △ Less

Submitted 6 June, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

arXiv:2301.05182 [pdf, other]

Thompson Sampling with Diffusion Generative Prior

Authors: Yu-Guan Hsieh, Shiva Prasad Kasiviswanathan, Branislav Kveton, Patrick Blöbaum

Abstract: In this work, we initiate the idea of using denoising diffusion models to learn priors for online decision making problems. Our special focus is on the meta-learning for bandit framework, with the goal of learning a strategy that performs well across bandit tasks of a same class. To this end, we train a diffusion model that learns the underlying task distribution and combine Thompson sampling with… ▽ More In this work, we initiate the idea of using denoising diffusion models to learn priors for online decision making problems. Our special focus is on the meta-learning for bandit framework, with the goal of learning a strategy that performs well across bandit tasks of a same class. To this end, we train a diffusion model that learns the underlying task distribution and combine Thompson sampling with the learned prior to deal with new tasks at test time. Our posterior sampling algorithm is designed to carefully balance between the learned prior and the noisy observations that come from the learner's interaction with the environment. To capture realistic bandit scenarios, we also propose a novel diffusion model training procedure that trains even from incomplete and/or noisy data, which could be of independent interest. Finally, our extensive experimental evaluations clearly demonstrate the potential of the proposed approach. △ Less

Submitted 30 January, 2023; v1 submitted 12 January, 2023; originally announced January 2023.

arXiv:2301.04041 [pdf, other]

Manifold Restricted Interventional Shapley Values

Authors: Muhammad Faaiz Taufiq, Patrick Blöbaum, Lenon Minorics

Abstract: Shapley values are model-agnostic methods for explaining model predictions. Many commonly used methods of computing Shapley values, known as off-manifold methods, rely on model evaluations on out-of-distribution input samples. Consequently, explanations obtained are sensitive to model behaviour outside the data distribution, which may be irrelevant for all practical purposes. While on-manifold met… ▽ More Shapley values are model-agnostic methods for explaining model predictions. Many commonly used methods of computing Shapley values, known as off-manifold methods, rely on model evaluations on out-of-distribution input samples. Consequently, explanations obtained are sensitive to model behaviour outside the data distribution, which may be irrelevant for all practical purposes. While on-manifold methods have been proposed which do not suffer from this problem, we show that such methods are overly dependent on the input data distribution, and therefore result in unintuitive and misleading explanations. To circumvent these problems, we propose ManifoldShap, which respects the model's domain of validity by restricting model evaluations to the data manifold. We show, theoretically and empirically, that ManifoldShap is robust to off-manifold perturbations of the model and leads to more accurate and intuitive explanations than existing state-of-the-art Shapley methods. △ Less

Submitted 25 February, 2023; v1 submitted 10 January, 2023; originally announced January 2023.

Comments: Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS) 2023

arXiv:2212.07383 [pdf, other]

Sequential Kernelized Independence Testing

Authors: Aleksandr Podkopaev, Patrick Blöbaum, Shiva Prasad Kasiviswanathan, Aaditya Ramdas

Abstract: Independence testing is a classical statistical problem that has been extensively studied in the batch setting when one fixes the sample size before collecting data. However, practitioners often prefer procedures that adapt to the complexity of a problem at hand instead of setting sample size in advance. Ideally, such procedures should (a) stop earlier on easy tasks (and later on harder tasks), he… ▽ More Independence testing is a classical statistical problem that has been extensively studied in the batch setting when one fixes the sample size before collecting data. However, practitioners often prefer procedures that adapt to the complexity of a problem at hand instead of setting sample size in advance. Ideally, such procedures should (a) stop earlier on easy tasks (and later on harder tasks), hence making better use of available resources, and (b) continuously monitor the data and efficiently incorporate statistical evidence after collecting new data, while controlling the false alarm rate. Classical batch tests are not tailored for streaming data: valid inference after data peeking requires correcting for multiple testing which results in low power. Following the principle of testing by betting, we design sequential kernelized independence tests that overcome such shortcomings. We exemplify our broad framework using bets inspired by kernelized dependence measures, e.g., the Hilbert-Schmidt independence criterion. Our test is also valid under non-i.i.d., time-varying settings. We demonstrate the power of our approaches on both simulated and real data. △ Less

Submitted 19 July, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

Comments: To appear at ICML 2023

arXiv:2206.06821 [pdf, other]

DoWhy-GCM: An extension of DoWhy for causal inference in graphical causal models

Authors: Patrick Blöbaum, Peter Götz, Kailash Budhathoki, Atalanti A. Mastakouri, Dominik Janzing

Abstract: We present DoWhy-GCM, an extension of the DoWhy Python library, which leverages graphical causal models. Unlike existing causality libraries, which mainly focus on effect estimation, DoWhy-GCM addresses diverse causal queries, such as identifying the root causes of outliers and distributional changes, attributing causal influences to the data generating process of each node, or diagnosis of causal… ▽ More We present DoWhy-GCM, an extension of the DoWhy Python library, which leverages graphical causal models. Unlike existing causality libraries, which mainly focus on effect estimation, DoWhy-GCM addresses diverse causal queries, such as identifying the root causes of outliers and distributional changes, attributing causal influences to the data generating process of each node, or diagnosis of causal structures. With DoWhy-GCM, users typically specify cause-effect relations via a causal graph, fit causal mechanisms, and pose causal queries -- all with just a few lines of code. The general documentation is available at https://www.pywhy.org/dowhy and the DoWhy-GCM specific code at https://github.com/py-why/dowhy/tree/main/dowhy/gcm. △ Less

Submitted 6 June, 2024; v1 submitted 14 June, 2022; originally announced June 2022.

Journal ref: Journal of Machine Learning Research 25(147), 2024

arXiv:2202.11612 [pdf, other]

Testing Granger Non-Causality in Panels with Cross-Sectional Dependencies

Authors: Lenon Minorics, Caner Turkmen, David Kernert, Patrick Bloebaum, Laurent Callot, Dominik Janzing

Abstract: This paper proposes a new approach for testing Granger non-causality on panel data. Instead of aggregating panel member statistics, we aggregate their corresponding p-values and show that the resulting p-value approximately bounds the type I error by the chosen significance level even if the panel members are dependent. We compare our approach against the most widely used Granger causality algorit… ▽ More This paper proposes a new approach for testing Granger non-causality on panel data. Instead of aggregating panel member statistics, we aggregate their corresponding p-values and show that the resulting p-value approximately bounds the type I error by the chosen significance level even if the panel members are dependent. We compare our approach against the most widely used Granger causality algorithm on panel data and show that our approach yields lower FDR at the same power for large sample sizes and panels with cross-sectional dependencies. Finally, we examine COVID-19 data about confirmed cases and deaths measured in countries/regions worldwide and show that our approach is able to discover the true causal relation between confirmed cases and deaths while state-of-the-art approaches fail. △ Less

Submitted 23 February, 2022; originally announced February 2022.

arXiv:2102.13384 [pdf, other]

Why did the distribution change?

Authors: Kailash Budhathoki, Dominik Janzing, Patrick Bloebaum, Hoiyi Ng

Abstract: We describe a formal approach based on graphical causal models to identify the "root causes" of the change in the probability distribution of variables. After factorizing the joint distribution into conditional distributions of each variable, given its parents (the "causal mechanisms"), we attribute the change to changes of these causal mechanisms. This attribution analysis accounts for the fact t… ▽ More We describe a formal approach based on graphical causal models to identify the "root causes" of the change in the probability distribution of variables. After factorizing the joint distribution into conditional distributions of each variable, given its parents (the "causal mechanisms"), we attribute the change to changes of these causal mechanisms. This attribution analysis accounts for the fact that mechanisms often change independently and sometimes only some of them change. Through simulations, we study the performance of our distribution change attribution method. We then present a real-world case study identifying the drivers of the difference in the income distribution between men and women. △ Less

Submitted 23 May, 2021; v1 submitted 26 February, 2021; originally announced February 2021.

Comments: Proceedings of the Twenty Fourth International Conference on Artificial Intelligence and Statistics (AISTATS), 2021

arXiv:2007.00714 [pdf, other]

Quantifying intrinsic causal contributions via structure preserving interventions

Authors: Dominik Janzing, Patrick Blöbaum, Atalanti A. Mastakouri, Philipp M. Faller, Lenon Minorics, Kailash Budhathoki

Abstract: We propose a notion of causal influence that describes the `intrinsic' part of the contribution of a node on a target node in a DAG. By recursively writing each node as a function of the upstream noise terms, we separate the intrinsic information added by each node from the one obtained from its ancestors. To interpret the intrinsic information as a {\it causal} contribution, we consider `structur… ▽ More We propose a notion of causal influence that describes the `intrinsic' part of the contribution of a node on a target node in a DAG. By recursively writing each node as a function of the upstream noise terms, we separate the intrinsic information added by each node from the one obtained from its ancestors. To interpret the intrinsic information as a {\it causal} contribution, we consider `structure-preserving interventions' that randomize each node in a way that mimics the usual dependence on the parents and does not perturb the observed joint distribution. To get a measure that is invariant with respect to relabelling nodes we use Shapley based symmetrization and show that it reduces in the linear case to simple ANOVA after resolving the target node into noise variables. We describe our contribution analysis for variance and entropy, but contributions for other target metrics can be defined analogously. The code is available in the package gcm of the open source library DoWhy. △ Less

Submitted 8 March, 2024; v1 submitted 1 July, 2020; originally announced July 2020.

Comments: to appear at AISTATS 2024

Journal ref: AISTATS 2024, https://proceedings.mlr.press/v238/janzing24a.html

arXiv:1912.02724 [pdf, other]

Causal structure based root cause analysis of outliers

Authors: Dominik Janzing, Kailash Budhathoki, Lenon Minorics, Patrick Blöbaum

Abstract: We describe a formal approach to identify 'root causes' of outliers observed in $n$ variables $X_1,\dots,X_n$ in a scenario where the causal relation between the variables is a known directed acyclic graph (DAG). To this end, we first introduce a systematic way to define outlier scores. Further, we introduce the concept of 'conditional outlier score' which measures whether a value of some variable… ▽ More We describe a formal approach to identify 'root causes' of outliers observed in $n$ variables $X_1,\dots,X_n$ in a scenario where the causal relation between the variables is a known directed acyclic graph (DAG). To this end, we first introduce a systematic way to define outlier scores. Further, we introduce the concept of 'conditional outlier score' which measures whether a value of some variable is unexpected *given the value of its parents* in the DAG, if one were to assume that the causal structure and the corresponding conditional distributions are also valid for the anomaly. Finally, we quantify to what extent the high outlier score of some target variable can be attributed to outliers of its ancestors. This quantification is defined via Shapley values from cooperative game theory. △ Less

Submitted 5 December, 2019; originally announced December 2019.

Comments: 11 pages, 9 Figures

arXiv:1910.13413 [pdf, other]

Feature relevance quantification in explainable AI: A causal problem

Authors: Dominik Janzing, Lenon Minorics, Patrick Blöbaum

Abstract: We discuss promising recent contributions on quantifying feature relevance using Shapley values, where we observed some confusion on which probability distribution is the right one for dropped features. We argue that the confusion is based on not carefully distinguishing between observational and interventional conditional probabilities and try a clarification based on Pearl's seminal work on caus… ▽ More We discuss promising recent contributions on quantifying feature relevance using Shapley values, where we observed some confusion on which probability distribution is the right one for dropped features. We argue that the confusion is based on not carefully distinguishing between observational and interventional conditional probabilities and try a clarification based on Pearl's seminal work on causality. We conclude that unconditional rather than conditional expectations provide the right notion of drop** features in contradiction to the theoretical justification of the software package SHAP. Parts of SHAP are unaffected because unconditional expectations (which we argue to be conceptually right) are used as approximation for the conditional ones, which encouraged others to `improve' SHAP in a way that we believe to be flawed. △ Less

Submitted 27 November, 2019; v1 submitted 29 October, 2019; originally announced October 2019.

Comments: 11 pages, 5 figures

arXiv:1709.00776 [pdf, other]

Estimation of interventional effects of features on prediction

Authors: Patrick Blöbaum, Shohei Shimizu

Abstract: The interpretability of prediction mechanisms with respect to the underlying prediction problem is often unclear. While several studies have focused on develo** prediction models with meaningful parameters, the causal relationships between the predictors and the actual prediction have not been considered. Here, we connect the underlying causal structure of a data generation process and the causa… ▽ More The interpretability of prediction mechanisms with respect to the underlying prediction problem is often unclear. While several studies have focused on develo** prediction models with meaningful parameters, the causal relationships between the predictors and the actual prediction have not been considered. Here, we connect the underlying causal structure of a data generation process and the causal structure of a prediction mechanism. To achieve this, we propose a framework that identifies the feature with the greatest causal influence on the prediction and estimates the necessary causal intervention of a feature such that a desired prediction is obtained. The general concept of the framework has no restrictions regarding data linearity; however, we focus on an implementation for linear data here. The framework applicability is evaluated using artificial data and demonstrated using real-world data. △ Less

Submitted 3 September, 2017; originally announced September 2017.

Comments: To appear in Proc. IEEE International Workshop on Machine Learning for Signal Processing (MLSP2017)

arXiv:1610.03263 [pdf, other]

doi 10.1007/s41237-017-0022-z

Error Asymmetry in Causal and Anticausal Regression

Authors: Patrick Blöbaum, Takashi Washio, Shohei Shimizu

Abstract: It is generally difficult to make any statements about the expected prediction error in an univariate setting without further knowledge about how the data were generated. Recent work showed that knowledge about the real underlying causal structure of a data generation process has implications for various machine learning settings. Assuming an additive noise and an independence between data generat… ▽ More It is generally difficult to make any statements about the expected prediction error in an univariate setting without further knowledge about how the data were generated. Recent work showed that knowledge about the real underlying causal structure of a data generation process has implications for various machine learning settings. Assuming an additive noise and an independence between data generating mechanism and its input, we draw a novel connection between the intrinsic causal relationship of two variables and the expected prediction error. We formulate the theorem that the expected error of the true data generating function as prediction model is generally smaller when the effect is predicted from its cause and, on the contrary, greater when the cause is predicted from its effect. The theorem implies an asymmetry in the error depending on the prediction direction. This is further corroborated with empirical evaluations in artificial and real-world data sets. △ Less

Submitted 17 April, 2017; v1 submitted 11 October, 2016; originally announced October 2016.

Journal ref: Behaviormetrika, 2017, 10.1007/s41237-017-0022-z

Showing 1–14 of 14 results for author: Blöbaum, P