Search | arXiv e-print repository

Do Not Marginalize Mechanisms, Rather Consolidate!

Authors: Moritz Willig, Matej Zečević, Devendra Singh Dhami, Kristian Kersting

Abstract: Structural causal models (SCMs) are a powerful tool for understanding the complex causal relationships that underlie many real-world systems. As these systems grow in size, the number of variables and complexity of interactions between them does, too. Thus, becoming convoluted and difficult to analyze. This is particularly true in the context of machine learning and artificial intelligence, where… ▽ More Structural causal models (SCMs) are a powerful tool for understanding the complex causal relationships that underlie many real-world systems. As these systems grow in size, the number of variables and complexity of interactions between them does, too. Thus, becoming convoluted and difficult to analyze. This is particularly true in the context of machine learning and artificial intelligence, where an ever increasing amount of data demands for new methods to simplify and compress large scale SCM. While methods for marginalizing and abstracting SCM already exist today, they may destroy the causality of the marginalized model. To alleviate this, we introduce the concept of consolidating causal mechanisms to transform large-scale SCM while preserving consistent interventional behaviour. We show consolidation is a powerful method for simplifying SCM, discuss reduction of computational complexity and give a perspective on generalizing abilities of consolidated SCM. △ Less

Submitted 12 October, 2023; originally announced October 2023.

Comments: 19 pages, 8 figures

arXiv:2308.13067 [pdf, other]

Causal Parrots: Large Language Models May Talk Causality But Are Not Causal

Authors: Matej Zečević, Moritz Willig, Devendra Singh Dhami, Kristian Kersting

Abstract: Some argue scale is all what is needed to achieve AI, covering even causal models. We make it clear that large language models (LLMs) cannot be causal and give reason onto why sometimes we might feel otherwise. To this end, we define and exemplify a new subgroup of Structural Causal Model (SCM) that we call meta SCM which encode causal facts about other SCM within their variables. We conjecture th… ▽ More Some argue scale is all what is needed to achieve AI, covering even causal models. We make it clear that large language models (LLMs) cannot be causal and give reason onto why sometimes we might feel otherwise. To this end, we define and exemplify a new subgroup of Structural Causal Model (SCM) that we call meta SCM which encode causal facts about other SCM within their variables. We conjecture that in the cases where LLM succeed in doing causal inference, underlying was a respective meta SCM that exposed correlations between causal facts in natural language on whose data the LLM was ultimately trained. If our hypothesis holds true, then this would imply that LLMs are like parrots in that they simply recite the causal knowledge embedded in the data. Our empirical analysis provides favoring evidence that current LLMs are even weak `causal parrots.' △ Less

Submitted 24 August, 2023; originally announced August 2023.

Comments: Published in Transactions in Machine Learning Research (TMLR) (08/2023). Main paper: 17 pages, References: 3 pages, Appendix: 7 pages. Figures: 5 main, 3 appendix. Tables: 3 main

Journal ref: Transactions in Machine Learning Research (08/2023)

arXiv:2212.12575 [pdf, other]

Continual Causal Abstractions

Authors: Matej Zečević, Moritz Willig, Jonas Seng, Florian Peter Busch

Abstract: This short paper discusses continually updated causal abstractions as a potential direction of future research. The key idea is to revise the existing level of causal abstraction to a different level of detail that is both consistent with the history of observed data and more effective in solving a given task. This short paper discusses continually updated causal abstractions as a potential direction of future research. The key idea is to revise the existing level of causal abstraction to a different level of detail that is both consistent with the history of observed data and more effective in solving a given task. △ Less

Submitted 6 January, 2023; v1 submitted 23 December, 2022; originally announced December 2022.

Comments: Main paper: 3 pages, 1 figure. References: 1 page

arXiv:2212.12570 [pdf, other]

Pearl Causal Hierarchy on Image Data: Intricacies & Challenges

Authors: Matej Zečević, Moritz Willig, Devendra Singh Dhami, Kristian Kersting

Abstract: Many researchers have voiced their support towards Pearl's counterfactual theory of causation as a step** stone for AI/ML research's ultimate goal of intelligent systems. As in any other growing subfield, patience seems to be a virtue since significant progress on integrating notions from both fields takes time, yet, major challenges such as the lack of ground truth benchmarks or a unified persp… ▽ More Many researchers have voiced their support towards Pearl's counterfactual theory of causation as a step** stone for AI/ML research's ultimate goal of intelligent systems. As in any other growing subfield, patience seems to be a virtue since significant progress on integrating notions from both fields takes time, yet, major challenges such as the lack of ground truth benchmarks or a unified perspective on classical problems such as computer vision seem to hinder the momentum of the research movement. This present work exemplifies how the Pearl Causal Hierarchy (PCH) can be understood on image data by providing insights on several intricacies but also challenges that naturally arise when applying key concepts from Pearlian causality to the study of image data. △ Less

Submitted 23 December, 2022; originally announced December 2022.

Comments: Main paper: 9 pages, References: 2 pages. Main paper: 7 figures

arXiv:2212.12560 [pdf, other]

On How AI Needs to Change to Advance the Science of Drug Discovery

Authors: Kieran Didi, Matej Zečević

Abstract: Research around AI for Science has seen significant success since the rise of deep learning models over the past decade, even with longstanding challenges such as protein structure prediction. However, this fast development inevitably made their flaws apparent -- especially in domains of reasoning where understanding the cause-effect relationship is important. One such domain is drug discovery, in… ▽ More Research around AI for Science has seen significant success since the rise of deep learning models over the past decade, even with longstanding challenges such as protein structure prediction. However, this fast development inevitably made their flaws apparent -- especially in domains of reasoning where understanding the cause-effect relationship is important. One such domain is drug discovery, in which such understanding is required to make sense of data otherwise plagued by spurious correlations. Said spuriousness only becomes worse with the ongoing trend of ever-increasing amounts of data in the life sciences and thereby restricts researchers in their ability to understand disease biology and create better therapeutics. Therefore, to advance the science of drug discovery with AI it is becoming necessary to formulate the key problems in the language of causality, which allows the explication of modelling assumptions needed for identifying true cause-effect relationships. In this attention paper, we present causal drug discovery as the craft of creating models that ground the process of drug discovery in causal reasoning. △ Less

Submitted 23 December, 2022; originally announced December 2022.

Comments: Main paper: 6 pages, References: 1.5 pages. Main paper: 3 figures

arXiv:2206.10591 [pdf, other]

Can Foundation Models Talk Causality?

Authors: Moritz Willig, Matej Zečević, Devendra Singh Dhami, Kristian Kersting

Abstract: Foundation models are subject to an ongoing heated debate, leaving open the question of progress towards AGI and dividing the community into two camps: the ones who see the arguably impressive results as evidence to the scaling hypothesis, and the others who are worried about the lack of interpretability and reasoning capabilities. By investigating to which extent causal representations might be c… ▽ More Foundation models are subject to an ongoing heated debate, leaving open the question of progress towards AGI and dividing the community into two camps: the ones who see the arguably impressive results as evidence to the scaling hypothesis, and the others who are worried about the lack of interpretability and reasoning capabilities. By investigating to which extent causal representations might be captured by these large scale language models, we make a humble efforts towards resolving the ongoing philosophical conflicts. △ Less

Submitted 23 December, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

Comments: Main paper: 6 pages, References: 1.5 pages, Supplement: 11.5 pages. Main paper: 4 figures, Supplement: 3 figures, 8 tables

arXiv:2206.07203 [pdf, other]

Attributions Beyond Neural Networks: The Linear Program Case

Authors: Florian Peter Busch, Matej Zečević, Kristian Kersting, Devendra Singh Dhami

Abstract: Linear Programs (LPs) have been one of the building blocks in machine learning and have championed recent strides in differentiable optimizers for learning systems. While there exist solvers for even high-dimensional LPs, understanding said high-dimensional solutions poses an orthogonal and unresolved problem. We introduce an approach where we consider neural encodings for LPs that justify the app… ▽ More Linear Programs (LPs) have been one of the building blocks in machine learning and have championed recent strides in differentiable optimizers for learning systems. While there exist solvers for even high-dimensional LPs, understanding said high-dimensional solutions poses an orthogonal and unresolved problem. We introduce an approach where we consider neural encodings for LPs that justify the application of attribution methods from explainable artificial intelligence (XAI) designed for neural learning systems. The several encoding functions we propose take into account aspects such as feasibility of the decision space, the cost attached to each input, or the distance to special points of interest. We investigate the mathematical consequences of several XAI methods on said neural LP encodings. We empirically show that the attribution methods Saliency and LIME reveal indistinguishable results up to perturbation levels, and we propose the property of Directedness as the main discriminative criterion between Saliency and LIME on one hand, and a perturbation-based Feature Permutation approach on the other hand. Directedness indicates whether an attribution method gives feature attributions with respect to an increase of that feature. We further notice the baseline selection problem beyond the classical computer vision setting for Integrated Gradients. △ Less

Submitted 14 June, 2022; originally announced June 2022.

Comments: Main paper: 9.5 pages, References: 2 pages, Supplement: 2.5 pages. Main paper: 5 figures, 2 tables, Supplement: 1 figure

arXiv:2206.07196 [pdf, other]

Towards a Solution to Bongard Problems: A Causal Approach

Authors: Salahedine Youssef, Matej Zečević, Devendra Singh Dhami, Kristian Kersting

Abstract: Even though AI has advanced rapidly in recent years displaying success in solving highly complex problems, the class of Bongard Problems (BPs) yet remain largely unsolved by modern ML techniques. In this paper, we propose a new approach in an attempt to not only solve BPs but also extract meaning out of learned representations. This includes the reformulation of the classical BP into a reinforceme… ▽ More Even though AI has advanced rapidly in recent years displaying success in solving highly complex problems, the class of Bongard Problems (BPs) yet remain largely unsolved by modern ML techniques. In this paper, we propose a new approach in an attempt to not only solve BPs but also extract meaning out of learned representations. This includes the reformulation of the classical BP into a reinforcement learning (RL) setting which will allow the model to gain access to counterfactuals to guide its decisions but also explain its decisions. Since learning meaningful representations in BPs is an essential sub-problem, we further make use of contrastive learning for the extraction of low level features from pixel data. Several experiments have been conducted for analyzing the general BP-RL setup, feature extraction methods and using the best combination for the feature space analysis and its interpretation. △ Less

Submitted 23 December, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

Comments: Main paper: 12 pages, References: 2 pages, Supplement: 3 pages. Main paper: 9 figures, Supplement: 3 figures

arXiv:2206.07195 [pdf, other]

Tearing Apart NOTEARS: Controlling the Graph Prediction via Variance Manipulation

Authors: Jonas Seng, Matej Zečević, Devendra Singh Dhami, Kristian Kersting

Abstract: Simulations are ubiquitous in machine learning. Especially in graph learning, simulations of Directed Acyclic Graphs (DAG) are being deployed for evaluating new algorithms. In the literature, it was recently argued that continuous-optimization approaches to structure discovery such as NOTEARS might be exploiting the sortability of the variable's variances in the available data due to their use of… ▽ More Simulations are ubiquitous in machine learning. Especially in graph learning, simulations of Directed Acyclic Graphs (DAG) are being deployed for evaluating new algorithms. In the literature, it was recently argued that continuous-optimization approaches to structure discovery such as NOTEARS might be exploiting the sortability of the variable's variances in the available data due to their use of least square losses. Specifically, since structure discovery is a key problem in science and beyond, we want to be invariant to the scale being used for measuring our data (e.g. meter versus centimeter should not affect the causal direction inferred by the algorithm). In this work, we further strengthen this initial, negative empirical suggestion by both proving key results in the multivariate case and corroborating with further empirical evidence. In particular, we show that we can control the resulting graph with our targeted variance attacks, even in the case where we can only partially manipulate the variances of the data. △ Less

Submitted 14 June, 2022; originally announced June 2022.

Comments: Main paper: 5.5 pages, References: 1 page, Supplement: 2 pages. Main paper: 3 figures, Supplement: 1 figure, 1 table

arXiv:2206.07194 [pdf, other]

Machines Explaining Linear Programs

Authors: David Steinmann, Matej Zečević, Devendra Singh Dhami, Kristian Kersting

Abstract: There has been a recent push in making machine learning models more interpretable so that their performance can be trusted. Although successful, these methods have mostly focused on the deep learning methods while the fundamental optimization methods in machine learning such as linear programs (LP) have been left out. Even if LPs can be considered as whitebox or clearbox models, they are not easy… ▽ More There has been a recent push in making machine learning models more interpretable so that their performance can be trusted. Although successful, these methods have mostly focused on the deep learning methods while the fundamental optimization methods in machine learning such as linear programs (LP) have been left out. Even if LPs can be considered as whitebox or clearbox models, they are not easy to understand in terms of relationships between inputs and outputs. As a linear program only provides the optimal solution to an optimization problem, further explanations are often helpful. In this work, we extend the attribution methods for explaining neural networks to linear programs. These methods explain the model by providing relevance scores for the model inputs, to show the influence of each input on the output. Alongside using classical gradient-based attribution methods we also propose a way to adapt perturbation-based attribution methods to LPs. Our evaluations of several different linear and integer problems showed that attribution methods can generate useful explanations for linear programs. However, we also demonstrate that using a neural attribution method directly might come with some drawbacks, as the properties of these methods on neural networks do not necessarily transfer to linear programs. The methods can also struggle if a linear program has more than one optimal solution, as a solver just returns one possible solution. Our results can hopefully be used as a good starting point for further research in this direction. △ Less

Submitted 14 June, 2022; originally announced June 2022.

Comments: Main paper: 9.5 pages, References: 2.5 pages, Supplement: 6 pages. Main paper: 5 figures, 4 tables, Supplement: 3 figures, 6 tables

arXiv:2203.15274 [pdf, other]

Finding Structure and Causality in Linear Programs

Authors: Matej Zečević, Florian Peter Busch, Devendra Singh Dhami, Kristian Kersting

Abstract: Linear Programs (LP) are celebrated widely, particularly so in machine learning where they have allowed for effectively solving probabilistic inference tasks or imposing structure on end-to-end learning systems. Their potential might seem depleted but we propose a foundational, causal perspective that reveals intriguing intra- and inter-structure relations for LP components. We conduct a systemati… ▽ More Linear Programs (LP) are celebrated widely, particularly so in machine learning where they have allowed for effectively solving probabilistic inference tasks or imposing structure on end-to-end learning systems. Their potential might seem depleted but we propose a foundational, causal perspective that reveals intriguing intra- and inter-structure relations for LP components. We conduct a systematic, empirical investigation on general-, shortest path- and energy system LPs. △ Less

Submitted 29 March, 2022; originally announced March 2022.

Comments: Main paper: 5 pages, References: 2 pages, Appendix: 1 page. Figures: 8 main, 1 appendix. Tables: 1 appendix

arXiv:2110.12066 [pdf, other]

The Causal Loss: Driving Correlation to Imply Causation

Authors: Moritz Willig, Matej Zečević, Devendra Singh Dhami, Kristian Kersting

Abstract: Most algorithms in classical and contemporary machine learning focus on correlation-based dependence between features to drive performance. Although success has been observed in many relevant problems, these algorithms fail when the underlying causality is inconsistent with the assumed relations. We propose a novel model-agnostic loss function called Causal Loss that improves the interventional qu… ▽ More Most algorithms in classical and contemporary machine learning focus on correlation-based dependence between features to drive performance. Although success has been observed in many relevant problems, these algorithms fail when the underlying causality is inconsistent with the assumed relations. We propose a novel model-agnostic loss function called Causal Loss that improves the interventional quality of the prediction using an intervened neural-causal regularizer. In support of our theoretical results, our experimental illustration shows how causal loss bestows a non-causal associative model (like a standard neural net or decision tree) with interventional capabilities. △ Less

Submitted 22 October, 2021; originally announced October 2021.

Comments: Main paper: 8 pages, References: 2 pages, Appendix: 3 pages. Figures: 4 main, 4 appendix. Tables: 2 main

arXiv:2110.12052 [pdf, other]

A Taxonomy for Inference in Causal Model Families

Authors: Matej Zečević, Devendra Singh Dhami, Kristian Kersting

Abstract: Neurally-parameterized Structural Causal Models in the Pearlian notion to causality, referred to as NCM, were recently introduced as a step towards next-generation learning systems. However, said NCM are only concerned with the learning aspect of causal inference but totally miss out on the architecture aspect. That is, actual causal inference within NCM is intractable in that the NCM won't return… ▽ More Neurally-parameterized Structural Causal Models in the Pearlian notion to causality, referred to as NCM, were recently introduced as a step towards next-generation learning systems. However, said NCM are only concerned with the learning aspect of causal inference but totally miss out on the architecture aspect. That is, actual causal inference within NCM is intractable in that the NCM won't return an answer to a query in polynomial time. This insight follows as corollary to the more general statement on the intractability of arbitrary SCM parameterizations, which we prove in this work through classical 3-SAT reduction. Since future learning algorithms will be required to deal with both high dimensional data and highly complex mechanisms governing the data, we ultimately believe work on tractable inference for causality to be decisive. We also show that not all ``causal'' models are created equal. More specifically, there are models capable of answering causal queries that are not SCM, which we refer to as \emph{partially causal models} (PCM). We provide a tabular taxonomy in terms of tractability properties for all of the different model families, namely correlation-based, PCM and SCM. To conclude our work, we also provide some initial ideas on how to overcome parts of the intractability of causal inference with SCM by showing an example of how parameterizing an SCM with SPN modules can at least allow for tractable mechanisms. We hope that our impossibility result alongside the taxonomy for tractability in causal models can raise awareness for this novel research direction since achieving success with causality in real world downstream tasks will not only depend on learning correct models as we also require having the practical ability to gain access to model inferences. △ Less

Submitted 23 December, 2022; v1 submitted 22 October, 2021; originally announced October 2021.

Comments: Main paper: 12 pages, References: 3 pages, Appendix: 4 pages. Figures: 3 main, 2 appendix

arXiv:2110.02395 [pdf, other]

Causal Explanations of Structural Causal Models

Authors: Matej Zečević, Devendra Singh Dhami, Constantin A. Rothkopf, Kristian Kersting

Abstract: In explanatory interactive learning (XIL) the user queries the learner, then the learner explains its answer to the user and finally the loop repeats. XIL is attractive for two reasons, (1) the learner becomes better and (2) the user's trust increases. For both reasons to hold, the learner's explanations must be useful to the user and the user must be allowed to ask useful questions. Ideally, both… ▽ More In explanatory interactive learning (XIL) the user queries the learner, then the learner explains its answer to the user and finally the loop repeats. XIL is attractive for two reasons, (1) the learner becomes better and (2) the user's trust increases. For both reasons to hold, the learner's explanations must be useful to the user and the user must be allowed to ask useful questions. Ideally, both questions and explanations should be grounded in a causal model since they avoid spurious fallacies. Ultimately, we seem to seek a causal variant of XIL. The question part on the user's end we believe to be solved since the user's mental model can provide the causal model. But how would the learner provide causal explanations? In this work we show that existing explanation methods are not guaranteed to be causal even when provided with a Structural Causal Model (SCM). Specifically, we use the popular, proclaimed causal explanation method CXPlain to illustrate how the generated explanations leave open the question of truly causal explanations. Thus as a step towards causal XIL, we propose a solution to the lack of causal explanations. We solve this problem by deriving from first principles an explanation method that makes full use of a given SCM, which we refer to as SC$\textbf{E}$ ($\textbf{E}$ standing for explanation). Since SCEs make use of structural information, any causal graph learner can now provide human-readable explanations. We conduct several experiments including a user study with 22 participants to investigate the virtue of SCE as causal explanations of SCMs. △ Less

Submitted 23 December, 2022; v1 submitted 5 October, 2021; originally announced October 2021.

Comments: Main paper: 9 pages, References: 2.5 pages, Supplement: 12 pages. Main paper: 4 figures, Supplement: 6 figures, 2 tables

arXiv:2109.04173 [pdf, other]

Relating Graph Neural Networks to Structural Causal Models

Authors: Matej Zečević, Devendra Singh Dhami, Petar Veličković, Kristian Kersting

Abstract: Causality can be described in terms of a structural causal model (SCM) that carries information on the variables of interest and their mechanistic relations. For most processes of interest the underlying SCM will only be partially observable, thus causal inference tries leveraging the exposed. Graph neural networks (GNN) as universal approximators on structured input pose a viable candidate for ca… ▽ More Causality can be described in terms of a structural causal model (SCM) that carries information on the variables of interest and their mechanistic relations. For most processes of interest the underlying SCM will only be partially observable, thus causal inference tries leveraging the exposed. Graph neural networks (GNN) as universal approximators on structured input pose a viable candidate for causal learning, suggesting a tighter integration with SCM. To this effect we present a theoretical analysis from first principles that establishes a more general view on neural-causal models, revealing several novel connections between GNN and SCM. We establish a new model class for GNN-based causal inference that is necessary and sufficient for causal effect identification. Our empirical illustration on simulations and standard benchmarks validate our theoretical proofs. △ Less

Submitted 22 October, 2021; v1 submitted 9 September, 2021; originally announced September 2021.

Comments: Main paper: 12 pages, References: 2 pages, Appendix: 13 pages; Main paper: 4 figures, Appendix: 2 figures

arXiv:2105.12697 [pdf, other]

Structural Causal Models Reveal Confounder Bias in Linear Program Modelling

Authors: Matej Zečević, Devendra Singh Dhami, Kristian Kersting

Abstract: The recent years have been marked by extended research on adversarial attacks, especially on deep neural networks. With this work we intend on posing and investigating the question of whether the phenomenon might be more general in nature, that is, adversarial-style attacks outside classical classification tasks. Specifically, we investigate optimization problems as they constitute a fundamental p… ▽ More The recent years have been marked by extended research on adversarial attacks, especially on deep neural networks. With this work we intend on posing and investigating the question of whether the phenomenon might be more general in nature, that is, adversarial-style attacks outside classical classification tasks. Specifically, we investigate optimization problems as they constitute a fundamental part of modern AI research. To this end, we consider the base class of optimizers namely Linear Programs (LPs). On our initial attempt of a naïve map** between the formalism of adversarial examples and LPs, we quickly identify the key ingredients missing for making sense of a reasonable notion of adversarial examples for LPs. Intriguingly, the formalism of Pearl's notion to causality allows for the right description of adversarial like examples for LPs. Characteristically, we show the direct influence of the Structural Causal Model (SCM) onto the subsequent LP optimization, which ultimately exposes a notion of confounding in LPs (inherited by said SCM) that allows for adversarial-style attacks. We provide both the general proof formally alongside existential proofs of such intriguing LP-parameterizations based on SCM for three combinatorial problems, namely Linear Assignment, Shortest Path and a real world problem of energy systems. △ Less

Submitted 7 November, 2023; v1 submitted 26 May, 2021; originally announced May 2021.

Comments: Published at the 15th Asian Conference on Machine Learning (ACML 2023) Journal Track. Main paper: 19 pages, References: 2 pages, Supplement: .5 page. Main paper: 3 figures, 3 tables, Supplement: 1 table

arXiv:2102.10440 [pdf, other]

Interventional Sum-Product Networks: Causal Inference with Tractable Probabilistic Models

Authors: Matej Zečević, Devendra Singh Dhami, Athresh Karanam, Sriraam Natarajan, Kristian Kersting

Abstract: While probabilistic models are an important tool for studying causality, doing so suffers from the intractability of inference. As a step towards tractable causal models, we consider the problem of learning interventional distributions using sum-product networks (SPNs) that are over-parameterized by gate functions, e.g., neural networks. Providing an arbitrarily intervened causal graph as input, e… ▽ More While probabilistic models are an important tool for studying causality, doing so suffers from the intractability of inference. As a step towards tractable causal models, we consider the problem of learning interventional distributions using sum-product networks (SPNs) that are over-parameterized by gate functions, e.g., neural networks. Providing an arbitrarily intervened causal graph as input, effectively subsuming Pearl's do-operator, the gate function predicts the parameters of the SPN. The resulting interventional SPNs are motivated and illustrated by a structural causal model themed around personal health. Our empirical evaluation on three benchmark data sets as well as a synthetic health data set clearly demonstrates that interventional SPNs indeed are both expressive in modelling and flexible in adapting to the interventions. △ Less

Submitted 25 October, 2021; v1 submitted 20 February, 2021; originally announced February 2021.

Comments: Main paper: 10 pages, References: 3 pages, Appendix: 8 pages. Main paper: 6 figures, Appendix: 5 figures

Showing 1–17 of 17 results for author: Zečević, M