-
Efficient SAGE Estimation via Causal Structure Learning
Authors:
Christoph Luther,
Gunnar König,
Moritz Grosse-Wentrup
Abstract:
The Shapley Additive Global Importance (SAGE) value is a theoretically appealing interpretability method that fairly attributes global importance to a model's features. However, its exact calculation requires the computation of the feature's surplus performance contributions over an exponential number of feature sets. This is computationally expensive, particularly because estimating the surplus c…
▽ More
The Shapley Additive Global Importance (SAGE) value is a theoretically appealing interpretability method that fairly attributes global importance to a model's features. However, its exact calculation requires the computation of the feature's surplus performance contributions over an exponential number of feature sets. This is computationally expensive, particularly because estimating the surplus contributions requires sampling from conditional distributions. Thus, SAGE approximation algorithms only take a fraction of the feature sets into account. We propose $d$-SAGE, a method that accelerates SAGE approximation. $d$-SAGE is motivated by the observation that conditional independencies (CIs) between a feature and the model target imply zero surplus contributions, such that their computation can be skipped. To identify CIs, we leverage causal structure learning (CSL) to infer a graph that encodes (conditional) independencies in the data as $d$-separations. This is computationally more efficient because the expense of the one-time graph inference and the $d$-separation queries is negligible compared to the expense of surplus contribution evaluations. Empirically we demonstrate that $d$-SAGE enables the efficient and accurate estimation of SAGE values.
△ Less
Submitted 6 April, 2023;
originally announced April 2023.
-
Improvement-Focused Causal Recourse (ICR)
Authors:
Gunnar König,
Timo Freiesleben,
Moritz Grosse-Wentrup
Abstract:
Algorithmic recourse recommendations, such as Karimi et al.'s (2021) causal recourse (CR), inform stakeholders of how to act to revert unfavourable decisions. However, some actions lead to acceptance (i.e., revert the model's decision) but do not lead to improvement (i.e., may not revert the underlying real-world state). To recommend such actions is to recommend fooling the predictor. We introduce…
▽ More
Algorithmic recourse recommendations, such as Karimi et al.'s (2021) causal recourse (CR), inform stakeholders of how to act to revert unfavourable decisions. However, some actions lead to acceptance (i.e., revert the model's decision) but do not lead to improvement (i.e., may not revert the underlying real-world state). To recommend such actions is to recommend fooling the predictor. We introduce a novel method, Improvement-Focused Causal Recourse (ICR), which involves a conceptual shift: Firstly, we require ICR recommendations to guide towards improvement. Secondly, we do not tailor the recommendations to be accepted by a specific predictor. Instead, we leverage causal knowledge to design decision systems that predict accurately pre- and post-recourse. As a result, improvement guarantees translate into acceptance guarantees. We demonstrate that given correct causal knowledge, ICR, in contrast to existing approaches, guides towards both acceptance and improvement.
△ Less
Submitted 27 October, 2022;
originally announced October 2022.
-
A Causal Perspective on Meaningful and Robust Algorithmic Recourse
Authors:
Gunnar König,
Timo Freiesleben,
Moritz Grosse-Wentrup
Abstract:
Algorithmic recourse explanations inform stakeholders on how to act to revert unfavorable predictions. However, in general ML models do not predict well in interventional distributions. Thus, an action that changes the prediction in the desired way may not lead to an improvement of the underlying target. Such recourse is neither meaningful nor robust to model refits. Extending the work of Karimi e…
▽ More
Algorithmic recourse explanations inform stakeholders on how to act to revert unfavorable predictions. However, in general ML models do not predict well in interventional distributions. Thus, an action that changes the prediction in the desired way may not lead to an improvement of the underlying target. Such recourse is neither meaningful nor robust to model refits. Extending the work of Karimi et al. (2021), we propose meaningful algorithmic recourse (MAR) that only recommends actions that improve both prediction and target. We justify this selection constraint by highlighting the differences between model audit and meaningful, actionable recourse explanations. Additionally, we introduce a relaxation of MAR called effective algorithmic recourse (EAR), which, under certain assumptions, yields meaningful recourse by only allowing interventions on causes of the target.
△ Less
Submitted 16 July, 2021;
originally announced July 2021.
-
Decomposition of Global Feature Importance into Direct and Associative Components (DEDACT)
Authors:
Gunnar König,
Timo Freiesleben,
Bernd Bischl,
Giuseppe Casalicchio,
Moritz Grosse-Wentrup
Abstract:
Global model-agnostic feature importance measures either quantify whether features are directly used for a model's predictions (direct importance) or whether they contain prediction-relevant information (associative importance). Direct importance provides causal insight into the model's mechanism, yet it fails to expose the leakage of information from associated but not directly used variables. In…
▽ More
Global model-agnostic feature importance measures either quantify whether features are directly used for a model's predictions (direct importance) or whether they contain prediction-relevant information (associative importance). Direct importance provides causal insight into the model's mechanism, yet it fails to expose the leakage of information from associated but not directly used variables. In contrast, associative importance exposes information leakage but does not provide causal insight into the model's mechanism. We introduce DEDACT - a framework to decompose well-established direct and associative importance measures into their respective associative and direct components. DEDACT provides insight into both the sources of prediction-relevant information in the data and the direct and indirect feature pathways by which the information enters the model. We demonstrate the method's usefulness on simulated examples.
△ Less
Submitted 15 June, 2021;
originally announced June 2021.
-
A Distance Covariance-based Kernel for Nonlinear Causal Clustering in Heterogeneous Populations
Authors:
Alex Markham,
Richeek Das,
Moritz Grosse-Wentrup
Abstract:
We consider the problem of causal structure learning in the setting of heterogeneous populations, i.e., populations in which a single causal structure does not adequately represent all population members, as is common in biological and social sciences. To this end, we introduce a distance covariance-based kernel designed specifically to measure the similarity between the underlying nonlinear causa…
▽ More
We consider the problem of causal structure learning in the setting of heterogeneous populations, i.e., populations in which a single causal structure does not adequately represent all population members, as is common in biological and social sciences. To this end, we introduce a distance covariance-based kernel designed specifically to measure the similarity between the underlying nonlinear causal structures of different samples. Indeed, we prove that the corresponding feature map is a statistically consistent estimator of nonlinear independence structure, rendering the kernel itself a statistical test for the hypothesis that sets of samples come from different generating causal structures. Even stronger, we prove that the kernel space is isometric to the space of causal ancestral graphs, so that distance between samples in the kernel space is guaranteed to correspond to distance between their generating causal structures. This kernel thus enables us to perform clustering to identify the homogeneous subpopulations, for which we can then learn causal structures using existing methods. Though we focus on the theoretical aspects of the kernel, we also evaluate its performance on synthetic data and demonstrate its use on a real gene expression data set.
△ Less
Submitted 18 February, 2022; v1 submitted 7 June, 2021;
originally announced June 2021.
-
Relative Feature Importance
Authors:
Gunnar König,
Christoph Molnar,
Bernd Bischl,
Moritz Grosse-Wentrup
Abstract:
Interpretable Machine Learning (IML) methods are used to gain insight into the relevance of a feature of interest for the performance of a model. Commonly used IML methods differ in whether they consider features of interest in isolation, e.g., Permutation Feature Importance (PFI), or in relation to all remaining feature variables, e.g., Conditional Feature Importance (CFI). As such, the perturbat…
▽ More
Interpretable Machine Learning (IML) methods are used to gain insight into the relevance of a feature of interest for the performance of a model. Commonly used IML methods differ in whether they consider features of interest in isolation, e.g., Permutation Feature Importance (PFI), or in relation to all remaining feature variables, e.g., Conditional Feature Importance (CFI). As such, the perturbation mechanisms inherent to PFI and CFI represent extreme reference points. We introduce Relative Feature Importance (RFI), a generalization of PFI and CFI that allows for a more nuanced feature importance computation beyond the PFI versus CFI dichotomy. With RFI, the importance of a feature relative to any other subset of features can be assessed, including variables that were not available at training time. We derive general interpretation rules for RFI based on a detailed theoretical analysis of the implications of relative feature relevance, and demonstrate the method's usefulness on simulated examples.
△ Less
Submitted 16 July, 2020;
originally announced July 2020.
-
General Pitfalls of Model-Agnostic Interpretation Methods for Machine Learning Models
Authors:
Christoph Molnar,
Gunnar König,
Julia Herbinger,
Timo Freiesleben,
Susanne Dandl,
Christian A. Scholbeck,
Giuseppe Casalicchio,
Moritz Grosse-Wentrup,
Bernd Bischl
Abstract:
An increasing number of model-agnostic interpretation techniques for machine learning (ML) models such as partial dependence plots (PDP), permutation feature importance (PFI) and Shapley values provide insightful model interpretations, but can lead to wrong conclusions if applied incorrectly. We highlight many general pitfalls of ML model interpretation, such as using interpretation techniques in…
▽ More
An increasing number of model-agnostic interpretation techniques for machine learning (ML) models such as partial dependence plots (PDP), permutation feature importance (PFI) and Shapley values provide insightful model interpretations, but can lead to wrong conclusions if applied incorrectly. We highlight many general pitfalls of ML model interpretation, such as using interpretation techniques in the wrong context, interpreting models that do not generalize well, ignoring feature dependencies, interactions, uncertainty estimates and issues in high-dimensional settings, or making unjustified causal interpretations, and illustrate them with examples. We focus on pitfalls for global methods that describe the average model behavior, but many pitfalls also apply to local methods that explain individual predictions. Our paper addresses ML practitioners by raising awareness of pitfalls and identifying solutions for correct model interpretation, but also addresses ML researchers by discussing open issues for further research.
△ Less
Submitted 17 August, 2021; v1 submitted 8 July, 2020;
originally announced July 2020.
-
Measurement Dependence Inducing Latent Causal Models
Authors:
Alex Markham,
Moritz Grosse-Wentrup
Abstract:
We consider the task of causal structure learning over measurement dependence inducing latent (MeDIL) causal models. We show that this task can be framed in terms of the graph theoretic problem of finding edge clique covers,resulting in an algorithm for returning minimal MeDIL causal models (minMCMs). This algorithm is non-parametric, requiring no assumptions about linearity or Gaussianity. Furthe…
▽ More
We consider the task of causal structure learning over measurement dependence inducing latent (MeDIL) causal models. We show that this task can be framed in terms of the graph theoretic problem of finding edge clique covers,resulting in an algorithm for returning minimal MeDIL causal models (minMCMs). This algorithm is non-parametric, requiring no assumptions about linearity or Gaussianity. Furthermore, despite rather weak assumptions aboutthe class of MeDIL causal models, we show that minimality in minMCMs implies some rather specific and interesting properties. By establishing MeDIL causal models as a semantics for edge clique covers, we also provide a starting point for future work further connecting causal structure learning to developments in graph theory and network science.
△ Less
Submitted 21 September, 2020; v1 submitted 19 October, 2019;
originally announced October 2019.
-
Causal Consistency of Structural Equation Models
Authors:
Paul K. Rubenstein,
Sebastian Weichwald,
Stephan Bongers,
Joris M. Mooij,
Dominik Janzing,
Moritz Grosse-Wentrup,
Bernhard Schölkopf
Abstract:
Complex systems can be modelled at various levels of detail. Ideally, causal models of the same system should be consistent with one another in the sense that they agree in their predictions of the effects of interventions. We formalise this notion of consistency in the case of Structural Equation Models (SEMs) by introducing exact transformations between SEMs. This provides a general language to…
▽ More
Complex systems can be modelled at various levels of detail. Ideally, causal models of the same system should be consistent with one another in the sense that they agree in their predictions of the effects of interventions. We formalise this notion of consistency in the case of Structural Equation Models (SEMs) by introducing exact transformations between SEMs. This provides a general language to consider, for instance, the different levels of description in the following three scenarios: (a) models with large numbers of variables versus models in which the `irrelevant' or unobservable variables have been marginalised out; (b) micro-level models versus macro-level models in which the macro-variables are aggregate features of the micro-variables; (c) dynamical time series models versus models of their stationary behaviour. Our analysis stresses the importance of well specified interventions in the causal modelling process and sheds light on the interpretation of cyclic SEMs.
△ Less
Submitted 4 July, 2017;
originally announced July 2017.
-
The right tool for the right question --- beyond the encoding versus decoding dichotomy
Authors:
Sebastian Weichwald,
Moritz Grosse-Wentrup
Abstract:
There are two major questions that neuroimaging studies attempt to answer: First, how are sensory stimuli represented in the brain (which we term the stimulus-based setting)? And, second, how does the brain generate cognition (termed the response-based setting)? There has been a lively debate in the neuroimaging community whether encoding and decoding models can provide insights into these questio…
▽ More
There are two major questions that neuroimaging studies attempt to answer: First, how are sensory stimuli represented in the brain (which we term the stimulus-based setting)? And, second, how does the brain generate cognition (termed the response-based setting)? There has been a lively debate in the neuroimaging community whether encoding and decoding models can provide insights into these questions. In this commentary, we construct two simple and analytically tractable examples to demonstrate that while an encoding model analysis helps with the former, neither model is appropriate to satisfactorily answer the latter question. Consequently, we argue that if we want to understand how the brain generates cognition, we need to move beyond the encoding versus decoding dichotomy and instead discuss and develop tools that are specifically tailored to our endeavour.
△ Less
Submitted 28 April, 2017;
originally announced April 2017.
-
NIPS 2016 Workshop on Representation Learning in Artificial and Biological Neural Networks (MLINI 2016)
Authors:
Leila Wehbe,
Anwar Nunez-Elizalde,
Marcel van Gerven,
Irina Rish,
Brian Murphy,
Moritz Grosse-Wentrup,
Georg Langs,
Guillermo Cecchi
Abstract:
This workshop explores the interface between cognitive neuroscience and recent advances in AI fields that aim to reproduce human performance such as natural language processing and computer vision, and specifically deep learning approaches to such problems.
When studying the cognitive capabilities of the brain, scientists follow a system identification approach in which they present different st…
▽ More
This workshop explores the interface between cognitive neuroscience and recent advances in AI fields that aim to reproduce human performance such as natural language processing and computer vision, and specifically deep learning approaches to such problems.
When studying the cognitive capabilities of the brain, scientists follow a system identification approach in which they present different stimuli to the subjects and try to model the response that different brain areas have of that stimulus. The goal is to understand the brain by trying to find the function that expresses the activity of brain areas in terms of different properties of the stimulus. Experimental stimuli are becoming increasingly complex with more and more people being interested in studying real life phenomena such as the perception of natural images or natural sentences. There is therefore a need for a rich and adequate vector representation of the properties of the stimulus, that we can obtain using advances in machine learning.
In parallel, new ML approaches, many of which in deep learning, are inspired to a certain extent by human behavior or biological principles. Neural networks for example were originally inspired by biological neurons. More recently, processes such as attention are being used which have are inspired by human behavior. However, the large bulk of these methods are independent of findings about brain function, and it is unclear whether it is at all beneficial for machine learning to try to emulate brain function in order to achieve the same tasks that the brain achieves.
△ Less
Submitted 10 April, 2017; v1 submitted 6 January, 2017;
originally announced January 2017.
-
A note on the expected minimum error probability in equientropic channels
Authors:
Sebastian Weichwald,
Tatiana Fomina,
Bernhard Schölkopf,
Moritz Grosse-Wentrup
Abstract:
While the channel capacity reflects a theoretical upper bound on the achievable information transmission rate in the limit of infinitely many bits, it does not characterise the information transfer of a given encoding routine with finitely many bits. In this note, we characterise the quality of a code (i. e. a given encoding routine) by an upper bound on the expected minimum error probability that…
▽ More
While the channel capacity reflects a theoretical upper bound on the achievable information transmission rate in the limit of infinitely many bits, it does not characterise the information transfer of a given encoding routine with finitely many bits. In this note, we characterise the quality of a code (i. e. a given encoding routine) by an upper bound on the expected minimum error probability that can be achieved when using this code. We show that for equientropic channels this upper bound is minimal for codes with maximal marginal entropy. As an instructive example we show for the additive white Gaussian noise (AWGN) channel that random coding---also a capacity achieving code---indeed maximises the marginal entropy in the limit of infinite messages.
△ Less
Submitted 4 April, 2017; v1 submitted 23 May, 2016;
originally announced May 2016.
-
Proceedings of the 5th Workshop on Machine Learning and Interpretation in Neuroimaging (MLINI) at NIPS 2015
Authors:
I. Rish,
L. Wehbe,
G. Langs,
M. Grosse-Wentrup,
B. Murphy,
G. Cecchi
Abstract:
This volume is a collection of contributions from the 5th Workshop on Machine Learning and Interpretation in Neuroimaging (MLINI) at the Neural Information Processing Systems (NIPS 2015) conference. Modern multivariate statistical methods developed in the rapidly growing field of machine learning are being increasingly applied to various problems in neuroimaging, from cognitive state detection to…
▽ More
This volume is a collection of contributions from the 5th Workshop on Machine Learning and Interpretation in Neuroimaging (MLINI) at the Neural Information Processing Systems (NIPS 2015) conference. Modern multivariate statistical methods developed in the rapidly growing field of machine learning are being increasingly applied to various problems in neuroimaging, from cognitive state detection to clinical diagnosis and prognosis. Multivariate pattern analysis methods are designed to examine complex relationships between high-dimensional signals, such as brain images, and outcomes of interest, such as the category of a stimulus, a type of a mental state of a subject, or a specific mental disorder. Such techniques are in contrast with the traditional mass-univariate approaches that dominated neuroimaging in the past and treated each individual imaging measurement in isolation.
We believe that machine learning has a prominent role in sha** how questions in neuroscience are framed, and that the machine-learning mind set is now entering modern psychology and behavioral studies. It is also equally important that practical applications in these fields motivate a rapidly evolving line or research in the machine learning community. In parallel, there is an intense interest in learning more about brain function in the context of rich naturalistic environments and scenes. Efforts to go beyond highly specific paradigms that pinpoint a single function, towards schemes for measuring the interaction with natural and more varied scene are made. The goal of the workshop is to pinpoint the most pressing issues and common challenges across the neuroscience, neuroimaging, psychology and machine learning fields, and to sketch future directions and open questions in the light of novel methodology.
△ Less
Submitted 14 May, 2016;
originally announced May 2016.
-
Recovery of non-linear cause-effect relationships from linearly mixed neuroimaging data
Authors:
Sebastian Weichwald,
Arthur Gretton,
Bernhard Schölkopf,
Moritz Grosse-Wentrup
Abstract:
Causal inference concerns the identification of cause-effect relationships between variables. However, often only linear combinations of variables constitute meaningful causal variables. For example, recovering the signal of a cortical source from electroencephalography requires a well-tuned combination of signals recorded at multiple electrodes. We recently introduced the MERLiN (Mixture Effect R…
▽ More
Causal inference concerns the identification of cause-effect relationships between variables. However, often only linear combinations of variables constitute meaningful causal variables. For example, recovering the signal of a cortical source from electroencephalography requires a well-tuned combination of signals recorded at multiple electrodes. We recently introduced the MERLiN (Mixture Effect Recovery in Linear Networks) algorithm that is able to recover, from an observed linear mixture, a causal variable that is a linear effect of another given variable. Here we relax the assumption of this cause-effect relationship being linear and present an extended algorithm that can pick up non-linear cause-effect relationships. Thus, the main contribution is an algorithm (and ready to use code) that has broader applicability and allows for a richer model class. Furthermore, a comparative analysis indicates that the assumption of linear cause-effect relationships is not restrictive in analysing electroencephalographic data.
△ Less
Submitted 30 September, 2016; v1 submitted 2 May, 2016;
originally announced May 2016.
-
Causal and anti-causal learning in pattern recognition for neuroimaging
Authors:
Sebastian Weichwald,
Bernhard Schölkopf,
Tonio Ball,
Moritz Grosse-Wentrup
Abstract:
Pattern recognition in neuroimaging distinguishes between two types of models: encoding- and decoding models. This distinction is based on the insight that brain state features, that are found to be relevant in an experimental paradigm, carry a different meaning in encoding- than in decoding models. In this paper, we argue that this distinction is not sufficient: Relevant features in encoding- and…
▽ More
Pattern recognition in neuroimaging distinguishes between two types of models: encoding- and decoding models. This distinction is based on the insight that brain state features, that are found to be relevant in an experimental paradigm, carry a different meaning in encoding- than in decoding models. In this paper, we argue that this distinction is not sufficient: Relevant features in encoding- and decoding models carry a different meaning depending on whether they represent causal- or anti-causal relations. We provide a theoretical justification for this argument and conclude that causal inference is essential for interpretation in neuroimaging.
△ Less
Submitted 15 December, 2015;
originally announced December 2015.
-
Decoding index finger position from EEG using random forests
Authors:
Sebastian Weichwald,
Timm Meyer,
Bernhard Schölkopf,
Tonio Ball,
Moritz Grosse-Wentrup
Abstract:
While invasively recorded brain activity is known to provide detailed information on motor commands, it is an open question at what level of detail information about positions of body parts can be decoded from non-invasively acquired signals. In this work it is shown that index finger positions can be differentiated from non-invasive electroencephalographic (EEG) recordings in healthy human subjec…
▽ More
While invasively recorded brain activity is known to provide detailed information on motor commands, it is an open question at what level of detail information about positions of body parts can be decoded from non-invasively acquired signals. In this work it is shown that index finger positions can be differentiated from non-invasive electroencephalographic (EEG) recordings in healthy human subjects. Using a leave-one-subject-out cross-validation procedure, a random forest distinguished different index finger positions on a numerical keyboard above chance-level accuracy. Among the different spectral features investigated, high $β$-power (20-30 Hz) over contralateral sensorimotor cortex carried most information about finger position. Thus, these findings indicate that finger position is in principle decodable from non-invasive features of brain activity that generalize across individuals.
△ Less
Submitted 14 December, 2015;
originally announced December 2015.
-
MERLiN: Mixture Effect Recovery in Linear Networks
Authors:
Sebastian Weichwald,
Moritz Grosse-Wentrup,
Arthur Gretton
Abstract:
Causal inference concerns the identification of cause-effect relationships between variables, e.g. establishing whether a stimulus affects activity in a certain brain region. The observed variables themselves often do not constitute meaningful causal variables, however, and linear combinations need to be considered. In electroencephalographic studies, for example, one is not interested in establis…
▽ More
Causal inference concerns the identification of cause-effect relationships between variables, e.g. establishing whether a stimulus affects activity in a certain brain region. The observed variables themselves often do not constitute meaningful causal variables, however, and linear combinations need to be considered. In electroencephalographic studies, for example, one is not interested in establishing cause-effect relationships between electrode signals (the observed variables), but rather between cortical signals (the causal variables) which can be recovered as linear combinations of electrode signals.
We introduce MERLiN (Mixture Effect Recovery in Linear Networks), a family of causal inference algorithms that implement a novel means of constructing causal variables from non-causal variables. We demonstrate through application to EEG data how the basic MERLiN algorithm can be extended for application to different (neuroimaging) data modalities. Given an observed linear mixture, the algorithms can recover a causal variable that is a linear effect of another given variable. That is, MERLiN allows us to recover a cortical signal that is affected by activity in a certain brain region, while not being a direct effect of the stimulus. The Python/Matlab implementation for all presented algorithms is available on https://github.com/sweichwald/MERLiN
△ Less
Submitted 27 September, 2016; v1 submitted 3 December, 2015;
originally announced December 2015.
-
Causal interpretation rules for encoding and decoding models in neuroimaging
Authors:
Sebastian Weichwald,
Timm Meyer,
Ozan Özdenizci,
Bernhard Schölkopf,
Tonio Ball,
Moritz Grosse-Wentrup
Abstract:
Causal terminology is often introduced in the interpretation of encoding and decoding models trained on neuroimaging data. In this article, we investigate which causal statements are warranted and which ones are not supported by empirical evidence. We argue that the distinction between encoding and decoding models is not sufficient for this purpose: relevant features in encoding and decoding model…
▽ More
Causal terminology is often introduced in the interpretation of encoding and decoding models trained on neuroimaging data. In this article, we investigate which causal statements are warranted and which ones are not supported by empirical evidence. We argue that the distinction between encoding and decoding models is not sufficient for this purpose: relevant features in encoding and decoding models carry a different meaning in stimulus- and in response-based experimental paradigms. We show that only encoding models in the stimulus-based setting support unambiguous causal interpretations. By combining encoding and decoding models trained on the same data, however, we obtain insights into causal relations beyond those that are implied by each individual model type. We illustrate the empirical relevance of our theoretical findings on EEG data recorded during a visuo-motor learning task.
△ Less
Submitted 15 November, 2015;
originally announced November 2015.