Do Not Marginalize Mechanisms, Rather Consolidate!

Authors: Moritz Willig, Matej Zečević, Devendra Singh Dhami, Kristian Kersting

Abstract: Structural causal models (SCMs) are a powerful tool for understanding the complex causal relationships that underlie many real-world systems. As these systems grow in size, the number of variables and complexity of interactions between them does, too. Thus, becoming convoluted and difficult to analyze. This is particularly true in the context of machine learning and artificial intelligence, where an ever increasing amount of data demands for new methods to simplify and compress large scale SCM. While methods for marginalizing and abstracting SCM already exist today, they may destroy the causality of the marginalized model. To alleviate this, we introduce the concept of consolidating causal mechanisms to transform large-scale SCM while preserving consistent interventional behaviour. We show consolidation is a powerful method for simplifying SCM, discuss reduction of computational complexity and give a perspective on generalizing abilities of consolidated SCM.

Submitted 12 October, 2023; originally announced October 2023.

Comments: 19 pages, 8 figures

arXiv:2308.13067

Causal Parrots: Large Language Models May Talk Causality But Are Not Causal

Authors: Matej Zečević, Moritz Willig, Devendra Singh Dhami, Kristian Kersting

Abstract: Some argue scale is all what is needed to achieve AI, covering even causal models. We make it clear that large language models (LLMs) cannot be causal and give reason onto why sometimes we might feel otherwise. To this end, we define and exemplify a new subgroup of Structural Causal Model (SCM) that we call meta SCM which encode causal facts about other SCM within their variables. We conjecture that in the cases where LLM succeed in doing causal inference, underlying was a respective meta SCM that exposed correlations between causal facts in natural language on whose data the LLM was ultimately trained. If our hypothesis holds true, then this would imply that LLMs are like parrots in that they simply recite the causal knowledge embedded in the data. Our empirical analysis provides favoring evidence that current LLMs are even weak 'causal parrots.'

Submitted 24 August, 2023; originally announced August 2023.

Comments: Published in Transactions in Machine Learning Research (TMLR) (08/2023). Main paper: 17 pages, References: 3 pages, Appendix: 7 pages. Figures: 5 main, 3 appendix. Tables: 3 main

Journal ref: Transactions in Machine Learning Research (08/2023)

arXiv:2212.12575

Continual Causal Abstractions

Authors: Matej Zečević, Moritz Willig, Jonas Seng, Florian Peter Busch

Abstract: This short paper discusses continually updated causal abstractions as a potential direction of future research. The key idea is to revise the existing level of causal abstraction to a different level of detail that is both consistent with the history of observed data and more effective in solving a given task.

Submitted 6 January, 2023; v1 submitted 23 December, 2022; originally announced December 2022.

Comments: Main paper: 3 pages, 1 figure. References: 1 page

arXiv:2212.12570

Pearl Causal Hierarchy on Image Data: Intricacies & Challenges

Authors: Matej Zečević, Moritz Willig, Devendra Singh Dhami, Kristian Kersting

Abstract: Many researchers have voiced their support towards Pearl's counterfactual theory of causation as a stepping stone for AI/ML research's ultimate goal of intelligent systems. As in any other growing subfield, patience seems to be a virtue since significant progress on integrating notions from both fields takes time, yet, major challenges such as the lack of ground truth benchmarks or a unified perspective on classical problems such as computer vision seem to hinder the momentum of the research movement. This present work exemplifies how the Pearl Causal Hierarchy (PCH) can be understood on image data by providing insights on several intricacies but also challenges that naturally arise when applying key concepts from Pearlian causality to the study of image data.

Submitted 23 December, 2022; originally announced December 2022.

Comments: Main paper: 9 pages, References: 2 pages. Main paper: 7 figures

arXiv:2206.10591

Can Foundation Models Talk Causality?

Authors: Moritz Willig, Matej Zečević, Devendra Singh Dhami, Kristian Kersting

Abstract: Foundation models are subject to an ongoing heated debate, leaving open the question of progress towards AGI and dividing the community into two camps: the ones who see the arguably impressive results as evidence to the scaling hypothesis, and the others who are worried about the lack of interpretability and reasoning capabilities. By investigating to which extent causal representations might be captured by these large scale language models, we make a humble efforts towards resolving the ongoing philosophical conflicts.

Submitted 23 December, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

Comments: Main paper: 6 pages, References: 1.5 pages, Supplement: 11.5 pages. Main paper: 4 figures, Supplement: 3 figures, 8 tables

arXiv:2110.12066

The Causal Loss: Driving Correlation to Imply Causation

Authors: Moritz Willig, Matej Zečević, Devendra Singh Dhami, Kristian Kersting

Abstract: Most algorithms in classical and contemporary machine learning focus on correlation-based dependence between features to drive performance. Although success has been observed in many relevant problems, these algorithms fail when the underlying causality is inconsistent with the assumed relations. We propose a novel model-agnostic loss function called Causal Loss that improves the interventional quality of the prediction using an intervened neural-causal regularizer. In support of our theoretical results, our experimental illustration shows how causal loss bestows a non-causal associative model (like a standard neural net or decision tree) with interventional capabilities.

Submitted 22 October, 2021; originally announced October 2021.

Comments: Main paper: 8 pages, References: 2 pages, Appendix: 3 pages. Figures: 4 main, 4 appendix. Tables: 2 main

arXiv:1908.06660

doi: 10.3389/frai.2020.00024

Learning to play the Chess Variant Crazyhouse above World Champion Level with Deep Neural Networks and Human Data

Authors: Johannes Czech, Moritz Willig, Alena Beyer, Kristian Kersting, Johannes Fürnkranz

Abstract: Deep neural networks have been successfully applied in learning the board games Go, chess and shogi without prior knowledge by making use of reinforcement learning. Although starting from zero knowledge has been shown to yield impressive results, it is associated with high computationally costs especially for complex games. With this paper, we present CrazyAra which is a neural network based engine solely trained in supervised manner for the chess variant crazyhouse. Crazyhouse is a game with a higher branching factor than chess and there is only limited data of lower quality available compared to AlphaGo. Therefore, we focus on improving efficiency in multiple aspects while relying on low computational resources. These improvements include modifications in the neural network design and training configuration, the introduction of a data normalization step and a more sample efficient Monte-Carlo tree search which has a lower chance to blunder. After training on 569,537 human games for 1.5 days we achieve a move prediction accuracy of 60.4%. During development, versions of CrazyAra played professional human players. Most notably, CrazyAra achieved a four to one win over 2017 crazyhouse world champion Justin Tan (aka LM Jann Lee) who is more than 400 Elo higher rated compared to the average player in our training set. Furthermore, we test the playing strength of CrazyAra on CPU against all participants of the second Crazyhouse Computer Championships 2017, winning against twelve of the thirteen participants. Finally, for CrazyAraFish we continue training our model on generated engine games. In ten long-time control matches playing Stockfish 10, CrazyAraFish wins three games and draws one out of ten matches.

Submitted 22 August, 2019; v1 submitted 19 August, 2019; originally announced August 2019.

Comments: 35 pages, 19 figures, 14 tables

Journal ref: Frontiers in Artificial Intelligence, Machine Learning and Artificial Intelligence, Volume 3 (2020)

Showing 1–7 of 7 results for author: Willig, M