-
Causal structure based root cause analysis of outliers
Authors:
Dominik Janzing,
Kailash Budhathoki,
Lenon Minorics,
Patrick Blöbaum
Abstract:
We describe a formal approach to identify 'root causes' of outliers observed in $n$ variables $X_1,\dots,X_n$ in a scenario where the causal relation between the variables is a known directed acyclic graph (DAG). To this end, we first introduce a systematic way to define outlier scores. Further, we introduce the concept of 'conditional outlier score' which measures whether a value of some variable is unexpected *given the value of its parents* in the DAG, if one were to assume that the causal structure and the corresponding conditional distributions are also valid for the anomaly. Finally, we quantify to what extent the high outlier score of some target variable can be attributed to outliers of its ancestors. This quantification is defined via Shapley values from cooperative game theory.
Submitted 5 December, 2019;
originally announced December 2019.
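The abstract above attributes a target's outlier score to its ancestors via Shapley values. As a rough illustration of that game-theoretic ingredient only, the following sketch computes exact Shapley values for a small set function. The value function `toy_value` and the variable names `X1`, `X2`, `X3` are hypothetical stand-ins and are not the value function defined in the paper.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values by averaging marginal contributions over all orderings."""
    contrib = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when it joins the current coalition.
            contrib[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: c / len(orderings) for p, c in contrib.items()}


def toy_value(coalition):
    """Hypothetical 'score explained by a set of ancestors' (assumption, for illustration)."""
    score = 0.0
    if "X1" in coalition:
        score += 2.0
    if "X2" in coalition:
        score += 1.0
    if "X1" in coalition and "X2" in coalition:
        score += 1.0  # interaction term, split equally by the Shapley rule
    return score


phi = shapley_values(["X1", "X2", "X3"], toy_value)
print(phi)
```

By construction the attributions sum to the value of the full set minus the value of the empty set, which is the efficiency property that makes Shapley values attractive for attribution. The enumeration over all orderings is only feasible for a handful of variables.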
-
Structural causal models for macro-variables in time-series
Authors:
Dominik Janzing,
Paul Rubenstein,
Bernhard Schölkopf
Abstract:
We consider a bivariate time series $(X_t,Y_t)$ that is given by a simple linear autoregressive model. Assuming that the equations describing each variable as a linear combination of past values are considered structural equations, there is a clear meaning of how intervening on one particular $X_t$ influences $Y_{t'}$ at later times $t'>t$. In the present work, we describe conditions under which one can define a causal model between variables that are coarse-grained in time, thus admitting statements like 'setting $X$ to $x$ changes $Y$ in a certain way' without referring to specific time instances. We show that particularly simple statements follow in the frequency domain, thus providing meaning to interventions on frequencies.
Submitted 11 April, 2018;
originally announced April 2018.
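To make the fine-grained notion of intervention in the abstract above concrete, here is a minimal sketch of a bivariate linear autoregressive model read as structural equations, with a point intervention on a single $X_t$. The lag-one form, the coefficients `a`, `b`, `c`, and the function name `simulate` are assumptions chosen for illustration and are not taken from the paper.

```python
import random


def simulate(a, b, c, noise_x, noise_y, intervene_at=None, set_value=None):
    """Simulate X_t = a*X_{t-1} + E^X_t and Y_t = b*Y_{t-1} + c*X_{t-1} + E^Y_t.

    If intervene_at is given, X at that time step is overwritten with set_value,
    i.e. the structural equation for that single X_t is replaced by a constant.
    """
    x_prev, y_prev = 0.0, 0.0
    xs, ys = [], []
    for t, (ex, ey) in enumerate(zip(noise_x, noise_y)):
        x = a * x_prev + ex
        y = b * y_prev + c * x_prev + ey
        if t == intervene_at:
            x = set_value  # point intervention on X_t
        xs.append(x)
        ys.append(y)
        x_prev, y_prev = x, y
    return xs, ys


rng = random.Random(0)
T = 20
noise_x = [rng.gauss(0.0, 1.0) for _ in range(T)]
noise_y = [rng.gauss(0.0, 1.0) for _ in range(T)]
a, b, c = 0.5, 0.3, 0.8  # hypothetical coefficients

# Same noise realisation with and without the intervention at t0.
t0 = 5
xs, ys = simulate(a, b, c, noise_x, noise_y)
xs_do, ys_do = simulate(a, b, c, noise_x, noise_y, intervene_at=t0, set_value=3.0)

delta = xs_do[t0] - xs[t0]
print("shift in Y one step later:", ys_do[t0 + 1] - ys[t0 + 1], "expected c*delta =", c * delta)
```

Because the model is linear and both runs share the same noise, the shift in later values of $Y$ is a deterministic linear function of the shift imposed on $X_{t_0}$. The coarse-grained and frequency-domain statements of the paper are not reproduced here.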
-
Merging joint distributions via causal model classes with low VC dimension
Authors:
Dominik Janzing
Abstract:
If $X,Y,Z$ denote sets of random variables, two different data sources may contain samples from $P_{X,Y}$ and $P_{Y,Z}$, respectively. We argue that causal inference can help infer properties of the 'unobserved joint distributions' $P_{X,Y,Z}$ or $P_{X,Z}$. The properties may be conditional independences or quantitative statements about dependences.
More generally, we define a learning scenario where the input is a subset of variables and the label is some statistical property of that subset. Sets of jointly observed variables define the training points, while unobserved sets are possible test points. To solve this learning task, we infer, as an intermediate step, a causal model from the observations that then entails properties of unobserved sets. Accordingly, we can define the VC dimension of a class of causal models and derive generalization bounds for the predictions.
Here, causal inference becomes more modest and better accessible to empirical tests than usual: rather than trying to find a causal hypothesis that is 'true' (which is a problematic term when it is unclear how to define interventions) a causal hypothesis is useful whenever it correctly predicts statistical properties of unobserved joint distributions.
Within such a 'pragmatic' application of causal inference, some popular heuristic approaches become justified in retrospect. It is, for instance, allowed to infer DAGs from partial correlations instead of conditional independences if the DAGs are only used to predict partial correlations.
I hypothesize that our pragmatic view on causality may even cover the usual meaning in terms of interventions and sketch why predicting the impact of interventions can sometimes also be phrased as a task of the above type.
Submitted 17 May, 2018; v1 submitted 9 April, 2018;
originally announced April 2018.
-
A central limit like theorem for Fourier sums
Authors:
Dominik Janzing,
Naji Shajarisales,
Michel Besserve
Abstract:
We consider the probability distributions of values in the complex plane attained by Fourier sums of the form $\sum_{j=1}^n a_j \exp(-2\pi i j \nu)/\sqrt{n}$ when the frequency $\nu$ is drawn uniformly at random from an interval of length 1. If the coefficients $a_j$ are i.i.d. drawn with finite third moment, the distance of these distributions to an isotropic two-dimensional Gaussian on $\mathbb{C}$ converges in probability to zero for any pseudometric on the set of distributions for which the distance between empirical distributions and the underlying distribution converges to zero in probability.
Submitted 21 July, 2017;
originally announced July 2017.
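The Fourier sums in the abstract above can be explored numerically. The sketch below fixes one draw of i.i.d. coefficients, samples the frequency uniformly from $[0,1)$, and compares the second moments of the real and imaginary parts, which should be roughly equal if the distribution is close to an isotropic Gaussian. The choice of $\pm 1$ coefficients, the sizes, and the function names are assumptions for illustration; this is a sanity check, not the paper's proof technique.

```python
import cmath
import math
import random


def fourier_sum(coeffs, nu):
    """Evaluate sum_{j=1}^n a_j * exp(-2*pi*i*j*nu) / sqrt(n) for one frequency nu."""
    n = len(coeffs)
    total = sum(a * cmath.exp(-2j * math.pi * j * nu)
                for j, a in enumerate(coeffs, start=1))
    return total / math.sqrt(n)


def sample_fourier_sums(coeffs, num_samples, rng):
    """Draw frequencies uniformly from [0, 1) and return the resulting complex values."""
    return [fourier_sum(coeffs, rng.random()) for _ in range(num_samples)]


rng = random.Random(0)
# Hypothetical choice: i.i.d. +/-1 coefficients (unit variance, finite third moment).
coeffs = [rng.choice((-1.0, 1.0)) for _ in range(100)]
samples = sample_fourier_sums(coeffs, 1000, rng)

mean_re_sq = sum(z.real ** 2 for z in samples) / len(samples)
mean_im_sq = sum(z.imag ** 2 for z in samples) / len(samples)
# For unit-variance coefficients both values should be near 1/2 (isotropy).
print("E[Re^2] ~", mean_re_sq, " E[Im^2] ~", mean_im_sq)
```

Matching second moments are only a necessary condition for closeness to an isotropic Gaussian; a fuller check would compare the empirical distribution itself, for example through histograms of the real and imaginary parts.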
-
Group invariance principles for causal generative models
Authors:
Michel Besserve,
Naji Shajarisales,
Bernhard Schölkopf,
Dominik Janzing
Abstract:
The postulate of independence of cause and mechanism (ICM) has recently led to several new causal discovery algorithms. The interpretation of independence and the way it is utilized, however, varies across these methods. Our aim in this paper is to propose a group theoretic framework for ICM to unify and generalize these approaches. In our setting, the cause-mechanism relationship is assessed by comparing it against a null hypothesis through the application of random generic group transformations. We show that the group theoretic view provides a very general tool to study the structure of data generating mechanisms with direct applications to machine learning.
Submitted 5 May, 2017;
originally announced May 2017.
-
Algorithmic independence of initial condition and dynamical law in thermodynamics and causal inference
Authors:
Dominik Janzing,
Rafael Chaves,
Bernhard Schoelkopf
Abstract:
We postulate a principle stating that the initial condition of a physical system is typically algorithmically independent of the dynamical law. We argue that this links thermodynamics and causal inference. On the one hand, it entails behaviour that is similar to the usual arrow of time. On the other hand, it motivates a statistical asymmetry between cause and effect that has recently been postulated in the field of causal inference, namely, that the probability distribution P(cause) contains no information about the conditional distribution P(effect|cause) and vice versa, while P(effect) may contain information about P(cause|effect).
Submitted 7 December, 2015;
originally announced December 2015.
-
Quantifying causal influences
Authors:
Dominik Janzing,
David Balduzzi,
Moritz Grosse-Wentrup,
Bernhard Schölkopf
Abstract:
Many methods for causal inference generate directed acyclic graphs (DAGs) that formalize causal relations between $n$ variables. Given the joint distribution on all these variables, the DAG contains all information about how intervening on one variable changes the distribution of the other $n-1$ variables. However, quantifying the causal influence of one variable on another one remains a nontrivial question. Here we propose a set of natural, intuitive postulates that a measure of causal strength should satisfy. We then introduce a communication scenario, where edges in a DAG play the role of channels that can be locally corrupted by interventions. Causal strength is then the relative entropy distance between the old and the new distribution. Many other measures of causal strength have been proposed, including average causal effect, transfer entropy, directed information, and information flow. We explain how they fail to satisfy the postulates on simple DAGs of $\leq3$ nodes. Finally, we investigate the behavior of our measure on time-series, supporting our claims with experiments on simulated data.
Submitted 28 January, 2014; v1 submitted 29 March, 2012;
originally announced March 2012.
-
Causal inference using the algorithmic Markov condition
Authors:
Dominik Janzing,
Bernhard Schoelkopf
Abstract:
Inferring the causal structure that links n observables is usually based upon detecting statistical dependences and choosing simple graphs that make the joint measure Markovian. Here we argue why causal inference is also possible when only single observations are present.
We develop a theory of how to generate causal graphs explaining similarities between single objects. To this end, we replace the notion of conditional stochastic independence in the causal Markov condition with the vanishing of conditional algorithmic mutual information and describe the corresponding causal inference rules.
We explain why a consistent reformulation of causal inference in terms of algorithmic complexity implies a new inference principle that takes into account also the complexity of conditional probability densities, making it possible to select among Markov equivalent causal graphs. This insight provides a theoretical foundation of a heuristic principle proposed in earlier work.
We also discuss how to replace Kolmogorov complexity with decidable complexity criteria. This can be seen as an algorithmic analog of replacing the empirically undecidable question of statistical independence with practical independence tests that are based on implicit or explicit assumptions on the underlying distribution.
Submitted 23 April, 2008;
originally announced April 2008.
-
Selection Criterion for Log-Linear Models Using Statistical Learning Theory
Authors:
Daniel Herrmann,
Dominik Janzing
Abstract:
Log-linear models are a well-established method for describing statistical dependencies among a set of n random variables. The observed frequencies of the n-tuples are explained by a joint probability such that its logarithm is a sum of functions, where each function depends on as few variables as possible. We obtain for this class a new model selection criterion using nonasymptotic concepts of statistical learning theory. We calculate the VC dimension for the class of k-factor log-linear models. In this way we are not only able to select the model with the appropriate complexity, but obtain also statements on the reliability of the estimated probability distribution. Furthermore we show that the selection of the best model among a set of models with the same complexity can be written as a convex optimization problem.
Submitted 7 February, 2003;
originally announced February 2003.
-
Required sample size for learning sparse Bayesian networks with many variables
Authors:
Pawel Wocjan,
Dominik Janzing,
Thomas Beth
Abstract:
Learning joint probability distributions on n random variables requires exponential sample size in the generic case. Here we consider the case that a temporal (or causal) order of the variables is known and that the (unknown) graph of causal dependencies has bounded in-degree Delta. Then the joint measure is uniquely determined by the probabilities of all (2 Delta+1)-tuples. Upper bounds on the sample size required for estimating their probabilities can be given in terms of the VC-dimension of the set of corresponding cylinder sets. The sample size grows less than linearly with n.
Submitted 26 April, 2002;
originally announced April 2002.