Search | arXiv e-print repository

arXiv:1912.02724 [pdf, other]

Causal structure based root cause analysis of outliers

Authors: Dominik Janzing, Kailash Budhathoki, Lenon Minorics, Patrick Blöbaum

Abstract: We describe a formal approach to identify 'root causes' of outliers observed in $n$ variables $X_1,\dots,X_n$ in a scenario where the causal relation between the variables is a known directed acyclic graph (DAG). To this end, we first introduce a systematic way to define outlier scores. Further, we introduce the concept of 'conditional outlier score' which measures whether a value of some variable is unexpected *given the value of its parents* in the DAG, if one were to assume that the causal structure and the corresponding conditional distributions are also valid for the anomaly. Finally, we quantify to what extent the high outlier score of some target variable can be attributed to outliers of its ancestors. This quantification is defined via Shapley values from cooperative game theory.

Submitted 5 December, 2019; originally announced December 2019.

Comments: 11 pages, 9 Figures

arXiv:1804.03911 [pdf, ps, other]

Structural causal models for macro-variables in time-series

Authors: Dominik Janzing, Paul Rubenstein, Bernhard Schölkopf

Abstract: We consider a bivariate time series $(X_t,Y_t)$ that is given by a simple linear autoregressive model. Assuming that the equations describing each variable as a linear combination of past values are considered structural equations, there is a clear meaning of how intervening on one particular $X_t$ influences $Y_{t'}$ at later times $t'>t$. In the present work, we describe conditions under which one can define a causal model between variables that are coarse-grained in time, thus admitting statements like 'setting $X$ to $x$ changes $Y$ in a certain way' without referring to specific time instances. We show that particularly simple statements follow in the frequency domain, thus providing meaning to interventions on frequencies.

Submitted 11 April, 2018; originally announced April 2018.

Comments: 8 pages

arXiv:1804.03206 [pdf, other]

Merging joint distributions via causal model classes with low VC dimension

Authors: Dominik Janzing

Abstract: If $X,Y,Z$ denote sets of random variables, two different data sources may contain samples from $P_{X,Y}$ and $P_{Y,Z}$, respectively. We argue that causal inference can help infer properties of the 'unobserved joint distributions' $P_{X,Y,Z}$ or $P_{X,Z}$. The properties may be conditional independences or quantitative statements about dependences. More generally, we define a learning scenario where the input is a subset of variables and the label is some statistical property of that subset. Sets of jointly observed variables define the training points, while unobserved sets are possible test points. To solve this learning task, we infer, as an intermediate step, a causal model from the observations that then entails properties of unobserved sets. Accordingly, we can define the VC dimension of a class of causal models and derive generalization bounds for the predictions. Here, causal inference becomes more modest and more accessible to empirical tests than usual: rather than trying to find a causal hypothesis that is 'true' (which is a problematic term when it is unclear how to define interventions), a causal hypothesis is useful whenever it correctly predicts statistical properties of unobserved joint distributions. Within such a 'pragmatic' application of causal inference, some popular heuristic approaches become justified in retrospect. It is, for instance, allowed to infer DAGs from partial correlations instead of conditional independences if the DAGs are only used to predict partial correlations.
I hypothesize that our pragmatic view on causality may even cover the usual meaning in terms of interventions and sketch why predicting the impact of interventions can sometimes also be phrased as a task of the above type.

Submitted 17 May, 2018; v1 submitted 9 April, 2018; originally announced April 2018.

Comments: 21 pages, two errors in V1 corrected

arXiv:1707.06819 [pdf, ps, other]

A central limit like theorem for Fourier sums

Authors: Dominik Janzing, Naji Shajarisales, Michel Besserve

Abstract: We consider the probability distributions of values in the complex plane attained by Fourier sums of the form $\sum_{j=1}^n a_j \exp(-2\pi i j \nu)/\sqrt{n}$ when the frequency $\nu$ is drawn uniformly at random from an interval of length 1. If the coefficients $a_j$ are drawn i.i.d. with finite third moment, the distance of these distributions to an isotropic two-dimensional Gaussian on $\mathbb{C}$ converges in probability to zero for any pseudometric on the set of distributions for which the distance between empirical distributions and the underlying distribution converges to zero in probability.

Submitted 21 July, 2017; originally announced July 2017.

Comments: 7 pages

MSC Class: 60Fxx

arXiv:1705.02212 [pdf, other]

Group invariance principles for causal generative models

Authors: Michel Besserve, Naji Shajarisales, Bernhard Schölkopf, Dominik Janzing

Abstract: The postulate of independence of cause and mechanism (ICM) has recently led to several new causal discovery algorithms. The interpretation of independence and the way it is utilized, however, varies across these methods. Our aim in this paper is to propose a group theoretic framework for ICM to unify and generalize these approaches. In our setting, the cause-mechanism relationship is assessed by comparing it against a null hypothesis through the application of random generic group transformations. We show that the group theoretic view provides a very general tool to study the structure of data generating mechanisms with direct applications to machine learning.

Submitted 5 May, 2017; originally announced May 2017.

Comments: 16 pages, 6 figures

ACM Class: I.2.6; I.2.10; G.3; I.5.3

arXiv:1512.02057 [pdf, other]

doi 10.1088/1367-2630/18/9/093052

Algorithmic independence of initial condition and dynamical law in thermodynamics and causal inference

Authors: Dominik Janzing, Rafael Chaves, Bernhard Schoelkopf

Abstract: We postulate a principle stating that the initial condition of a physical system is typically algorithmically independent of the dynamical law. We argue that this links thermodynamics and causal inference. On the one hand, it entails behaviour that is similar to the usual arrow of time. On the other hand, it motivates a statistical asymmetry between cause and effect that has recently been postulated in the field of causal inference, namely, that the probability distribution P(cause) contains no information about the conditional distribution P(effect|cause) and vice versa, while P(effect) may contain information about P(cause|effect).

Submitted 7 December, 2015; originally announced December 2015.

Comments: 7 pages, latex, 2 figures

Journal ref: New J. Phys. 18, 093052 (2016)

arXiv:1203.6502 [pdf, ps, other]

doi 10.1214/13-AOS1145

Quantifying causal influences

Authors: Dominik Janzing, David Balduzzi, Moritz Grosse-Wentrup, Bernhard Schölkopf

Abstract: Many methods for causal inference generate directed acyclic graphs (DAGs) that formalize causal relations between $n$ variables. Given the joint distribution on all these variables, the DAG contains all information about how intervening on one variable changes the distribution of the other $n-1$ variables. However, quantifying the causal influence of one variable on another one remains a nontrivial question. Here we propose a set of natural, intuitive postulates that a measure of causal strength should satisfy. We then introduce a communication scenario, where edges in a DAG play the role of channels that can be locally corrupted by interventions. Causal strength is then the relative entropy distance between the old and the new distribution. Many other measures of causal strength have been proposed, including average causal effect, transfer entropy, directed information, and information flow. We explain how they fail to satisfy the postulates on simple DAGs of $\leq3$ nodes. Finally, we investigate the behavior of our measure on time-series, supporting our claims with experiments on simulated data.

Submitted 28 January, 2014; v1 submitted 29 March, 2012; originally announced March 2012.

Comments: Published at http://dx.doi.org/10.1214/13-AOS1145 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1145

Journal ref: Annals of Statistics 2013, Vol. 41, No. 5, 2324-2358

arXiv:0804.3678 [pdf, ps, other]

Causal inference using the algorithmic Markov condition

Authors: Dominik Janzing, Bernhard Schoelkopf

Abstract: Inferring the causal structure that links n observables is usually based upon detecting statistical dependences and choosing simple graphs that make the joint measure Markovian. Here we argue why causal inference is also possible when only single observations are present. We develop a theory of how to generate causal graphs explaining similarities between single objects. To this end, we replace the notion of conditional stochastic independence in the causal Markov condition with the vanishing of conditional algorithmic mutual information and describe the corresponding causal inference rules. We explain why a consistent reformulation of causal inference in terms of algorithmic complexity implies a new inference principle that also takes into account the complexity of conditional probability densities, making it possible to select among Markov equivalent causal graphs. This insight provides a theoretical foundation of a heuristic principle proposed in earlier work. We also discuss how to replace Kolmogorov complexity with decidable complexity criteria. This can be seen as an algorithmic analog of replacing the empirically undecidable question of statistical independence with practical independence tests that are based on implicit or explicit assumptions on the underlying distribution.

Submitted 23 April, 2008; originally announced April 2008.

Comments: 16 figures

MSC Class: 62A01

arXiv:math/0302079 [pdf, ps, other]

Selection Criterion for Log-Linear Models Using Statistical Learning Theory

Authors: Daniel Herrmann, Dominik Janzing

Abstract: Log-linear models are a well-established method for describing statistical dependencies among a set of n random variables. The observed frequencies of the n-tuples are explained by a joint probability such that its logarithm is a sum of functions, where each function depends on as few variables as possible. We obtain for this class a new model selection criterion using nonasymptotic concepts of statistical learning theory. We calculate the VC dimension for the class of k-factor log-linear models. In this way we are not only able to select the model with the appropriate complexity, but obtain also statements on the reliability of the estimated probability distribution. Furthermore we show that the selection of the best model among a set of models with the same complexity can be written as a convex optimization problem.

Submitted 7 February, 2003; originally announced February 2003.

Comments: 19 pages, no figure, latex

MSC Class: 62H17; 62H15; 62H12 (Primary) 62G10; 62G07; 62G15 (Secondary)

arXiv:cs/0204052 [pdf, ps, other]

Required sample size for learning sparse Bayesian networks with many variables

Authors: Pawel Wocjan, Dominik Janzing, Thomas Beth

Abstract: Learning joint probability distributions on n random variables requires exponential sample size in the generic case. Here we consider the case that a temporal (or causal) order of the variables is known and that the (unknown) graph of causal dependencies has bounded in-degree Delta. Then the joint measure is uniquely determined by the probabilities of all (2 Delta+1)-tuples. Upper bounds on the sample size required for estimating their probabilities can be given in terms of the VC-dimension of the set of corresponding cylinder sets. The sample size grows less than linearly with n.

Submitted 26 April, 2002; originally announced April 2002.

Comments: 9 pages

ACM Class: I.2.6

Showing 1–10 of 10 results for author: Janzing, D