Search | arXiv e-print repository

Identifying Representations for Intervention Extrapolation

Authors: Sorawit Saengkyongam, Elan Rosenfeld, Pradeep Ravikumar, Niklas Pfister, Jonas Peters

Abstract: The premise of identifiable and causal representation learning is to improve the current representation learning paradigm in terms of generalizability or robustness. Despite recent progress in questions of identifiability, more theoretical results demonstrating concrete advantages of these methods for downstream tasks are needed. In this paper, we consider the task of intervention extrapolation: p… ▽ More The premise of identifiable and causal representation learning is to improve the current representation learning paradigm in terms of generalizability or robustness. Despite recent progress in questions of identifiability, more theoretical results demonstrating concrete advantages of these methods for downstream tasks are needed. In this paper, we consider the task of intervention extrapolation: predicting how interventions affect an outcome, even when those interventions are not observed at training time, and show that identifiable representations can provide an effective solution to this task even if the interventions affect the outcome non-linearly. Our setup includes an outcome Y, observed features X, which are generated as a non-linear transformation of latent features Z, and exogenous action variables A, which influence Z. The objective of intervention extrapolation is to predict how interventions on A that lie outside the training support of A affect Y. Here, extrapolation becomes possible if the effect of A on Z is linear and the residual when regressing Z on A has full support. As Z is latent, we combine the task of intervention extrapolation with identifiable representation learning, which we call Rep4Ex: we aim to map the observed features X into a subspace that allows for non-linear extrapolation in A. We show that the hidden representation is identifiable up to an affine transformation in Z-space, which is sufficient for intervention extrapolation. The identifiability is characterized by a novel constraint describing the linearity assumption of A on Z. Based on this insight, we propose a method that enforces the linear invariance constraint and can be combined with any type of autoencoder. We validate our theoretical findings through synthetic experiments and show that our approach succeeds in predicting the effects of unseen interventions. △ Less

Submitted 5 March, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

Comments: Accepted at the International Conference on Learning Representations (ICLR) 2024

arXiv:2309.12833 [pdf, other]

Model-based causal feature selection for general response types

Authors: Lucas Kook, Sorawit Saengkyongam, Anton Rask Lundborg, Torsten Hothorn, Jonas Peters

Abstract: Discovering causal relationships from observational data is a fundamental yet challenging task. Invariant causal prediction (ICP, Peters et al., 2016) is a method for causal feature selection which requires data from heterogeneous settings and exploits that causal models are invariant. ICP has been extended to general additive noise models and to nonparametric settings using conditional independen… ▽ More Discovering causal relationships from observational data is a fundamental yet challenging task. Invariant causal prediction (ICP, Peters et al., 2016) is a method for causal feature selection which requires data from heterogeneous settings and exploits that causal models are invariant. ICP has been extended to general additive noise models and to nonparametric settings using conditional independence tests. However, the latter often suffer from low power (or poor type I error control) and additive noise models are not suitable for applications in which the response is not measured on a continuous scale, but reflects categories or counts. Here, we develop transformation-model (TRAM) based ICP, allowing for continuous, categorical, count-type, and uninformatively censored responses (these model classes, generally, do not allow for identifiability when there is no exogenous heterogeneity). As an invariance test, we propose TRAM-GCM based on the expected conditional covariance between environments and score residuals with uniform asymptotic level guarantees. For the special case of linear shift TRAMs, we also consider TRAM-Wald, which tests invariance based on the Wald statistic. We provide an open-source R package 'tramicp' and evaluate our approach on simulated data and in a case study investigating causal features of survival in critically ill patients. △ Less

Submitted 8 July, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

Comments: Code available at https://github.com/LucasKook/tramicp.git

arXiv:2306.10983 [pdf, other]

Effect-Invariant Mechanisms for Policy Generalization

Authors: Sorawit Saengkyongam, Niklas Pfister, Predrag Klasnja, Susan Murphy, Jonas Peters

Abstract: Policy learning is an important component of many real-world learning systems. A major challenge in policy learning is how to adapt efficiently to unseen environments or tasks. Recently, it has been suggested to exploit invariant conditional distributions to learn models that generalize better to unseen environments. However, assuming invariance of entire conditional distributions (which we call f… ▽ More Policy learning is an important component of many real-world learning systems. A major challenge in policy learning is how to adapt efficiently to unseen environments or tasks. Recently, it has been suggested to exploit invariant conditional distributions to learn models that generalize better to unseen environments. However, assuming invariance of entire conditional distributions (which we call full invariance) may be too strong of an assumption in practice. In this paper, we introduce a relaxation of full invariance called effect-invariance (e-invariance for short) and prove that it is sufficient, under suitable assumptions, for zero-shot policy generalization. We also discuss an extension that exploits e-invariance when we have a small sample from the test environment, enabling few-shot policy generalization. Our work does not assume an underlying causal graph or that the data are generated by a structural causal model; instead, we develop testing procedures to test e-invariance directly from data. We present empirical results using simulated data and a mobile health intervention dataset to demonstrate the effectiveness of our approach. △ Less

Submitted 27 June, 2023; v1 submitted 19 June, 2023; originally announced June 2023.

arXiv:2202.01864 [pdf, other]

Exploiting Independent Instruments: Identification and Distribution Generalization

Authors: Sorawit Saengkyongam, Leonard Henckel, Niklas Pfister, Jonas Peters

Abstract: Instrumental variable models allow us to identify a causal function between covariates $X$ and a response $Y$, even in the presence of unobserved confounding. Most of the existing estimators assume that the error term in the response $Y$ and the hidden confounders are uncorrelated with the instruments $Z$. This is often motivated by a graphical separation, an argument that also justifies independe… ▽ More Instrumental variable models allow us to identify a causal function between covariates $X$ and a response $Y$, even in the presence of unobserved confounding. Most of the existing estimators assume that the error term in the response $Y$ and the hidden confounders are uncorrelated with the instruments $Z$. This is often motivated by a graphical separation, an argument that also justifies independence. Positing an independence restriction, however, leads to strictly stronger identifiability results. We connect to the existing literature in econometrics and provide a practical method called HSIC-X for exploiting independence that can be combined with any gradient-based learning procedure. We see that even in identifiable settings, taking into account higher moments may yield better finite sample results. Furthermore, we exploit the independence for distribution generalization. We prove that the proposed estimator is invariant to distributional shifts on the instruments and worst-case optimal whenever these shifts are sufficiently strong. These results hold even in the under-identified case where the instruments are not sufficiently rich to identify the causal function. △ Less

Submitted 22 September, 2022; v1 submitted 3 February, 2022; originally announced February 2022.

Comments: Accepted at ICML 2022

arXiv:2106.00808 [pdf, other]

Invariant Policy Learning: A Causal Perspective

Authors: Sorawit Saengkyongam, Nikolaj Thams, Jonas Peters, Niklas Pfister

Abstract: Contextual bandit and reinforcement learning algorithms have been successfully used in various interactive learning systems such as online advertising, recommender systems, and dynamic pricing. However, they have yet to be widely adopted in high-stakes application domains, such as healthcare. One reason may be that existing approaches assume that the underlying mechanisms are static in the sense t… ▽ More Contextual bandit and reinforcement learning algorithms have been successfully used in various interactive learning systems such as online advertising, recommender systems, and dynamic pricing. However, they have yet to be widely adopted in high-stakes application domains, such as healthcare. One reason may be that existing approaches assume that the underlying mechanisms are static in the sense that they do not change over different environments. In many real-world systems, however, the mechanisms are subject to shifts across environments which may invalidate the static environment assumption. In this paper, we take a step toward tackling the problem of environmental shifts considering the framework of offline contextual bandits. We view the environmental shift problem through the lens of causality and propose multi-environment contextual bandits that allow for changes in the underlying mechanisms. We adopt the concept of invariance from the causality literature and introduce the notion of policy invariance. We argue that policy invariance is only relevant if unobserved variables are present and show that, in that case, an optimal invariant policy is guaranteed to generalize across environments under suitable assumptions. Our results establish concrete connections among causality, invariance, and contextual bandits. △ Less

Submitted 22 September, 2022; v1 submitted 1 June, 2021; originally announced June 2021.

arXiv:2105.10821 [pdf, other]

Statistical Testing under Distributional Shifts

Authors: Nikolaj Thams, Sorawit Saengkyongam, Niklas Pfister, Jonas Peters

Abstract: In this work, we introduce statistical testing under distributional shifts. We are interested in the hypothesis $P^* \in H_0$ for a target distribution $P^*$, but observe data from a different distribution $Q^*$. We assume that $P^*$ is related to $Q^*$ through a known shift $τ$ and formally introduce hypothesis testing in this setting. We propose a general testing procedure that first resamples f… ▽ More In this work, we introduce statistical testing under distributional shifts. We are interested in the hypothesis $P^* \in H_0$ for a target distribution $P^*$, but observe data from a different distribution $Q^*$. We assume that $P^*$ is related to $Q^*$ through a known shift $τ$ and formally introduce hypothesis testing in this setting. We propose a general testing procedure that first resamples from the observed data to construct an auxiliary data set and then applies an existing test in the target domain. We prove that if the size of the resample is at most $o(\sqrt{n})$ and the resampling weights are well-behaved, this procedure inherits the pointwise asymptotic level and power from the target test. If the map $τ$ is estimated from data, we can maintain the above guarantees under mild conditions if the estimation works sufficiently well. We further extend our results to finite sample level, uniform asymptotic level and a different resampling scheme. Testing under distributional shifts allows us to tackle a diverse set of problems. We argue that it may prove useful in reinforcement learning and covariate shift, we show how it reduces conditional to unconditional independence testing and we provide example applications in causal inference. △ Less

Submitted 29 April, 2022; v1 submitted 22 May, 2021; originally announced May 2021.

MSC Class: 62G10 (Primary); 68Txx (Secondary)

arXiv:2005.11528 [pdf, other]

Learning Joint Nonlinear Effects from Single-variable Interventions in the Presence of Hidden Confounders

Authors: Sorawit Saengkyongam, Ricardo Silva

Abstract: We propose an approach to estimate the effect of multiple simultaneous interventions in the presence of hidden confounders. To overcome the problem of hidden confounding, we consider the setting where we have access to not only the observational data but also sets of single-variable interventions in which each of the treatment variables is intervened on separately. We prove identifiability under t… ▽ More We propose an approach to estimate the effect of multiple simultaneous interventions in the presence of hidden confounders. To overcome the problem of hidden confounding, we consider the setting where we have access to not only the observational data but also sets of single-variable interventions in which each of the treatment variables is intervened on separately. We prove identifiability under the assumption that the data is generated from a nonlinear continuous structural causal model with additive Gaussian noise. In addition, we propose a simple parameter estimation method by pooling all the data from different regimes and jointly maximizing the combined likelihood. We also conduct comprehensive experiments to verify the identifiability result as well as to compare the performance of our approach against a baseline on both synthetic and real-world data. △ Less

Submitted 16 June, 2020; v1 submitted 23 May, 2020; originally announced May 2020.

Comments: Accepted to The Conference on Uncertainty in Artificial Intelligence (UAI) 2020

arXiv:1805.08845 [pdf, other]

Counterfactual Mean Embeddings

Authors: Krikamol Muandet, Motonobu Kanagawa, Sorawit Saengkyongam, Sanparith Marukatat

Abstract: Counterfactual inference has become a ubiquitous tool in online advertisement, recommendation systems, medical diagnosis, and econometrics. Accurate modeling of outcome distributions associated with different interventions -- known as counterfactual distributions -- is crucial for the success of these applications. In this work, we propose to model counterfactual distributions using a novel Hilber… ▽ More Counterfactual inference has become a ubiquitous tool in online advertisement, recommendation systems, medical diagnosis, and econometrics. Accurate modeling of outcome distributions associated with different interventions -- known as counterfactual distributions -- is crucial for the success of these applications. In this work, we propose to model counterfactual distributions using a novel Hilbert space representation called counterfactual mean embedding (CME). The CME embeds the associated counterfactual distribution into a reproducing kernel Hilbert space (RKHS) endowed with a positive definite kernel, which allows us to perform causal inference over the entire landscape of the counterfactual distribution. Based on this representation, we propose a distributional treatment effect (DTE) that can quantify the causal effect over entire outcome distributions. Our approach is nonparametric as the CME can be estimated under the unconfoundedness assumption from observational data without requiring any parametric assumption about the underlying distributions. We also establish a rate of convergence of the proposed estimator which depends on the smoothness of the conditional mean and the Radon-Nikodym derivative of the underlying marginal distributions. Furthermore, our framework allows for more complex outcomes such as images, sequences, and graphs. Our experimental results on synthetic data and off-policy evaluation tasks demonstrate the advantages of the proposed estimator. △ Less

Submitted 10 July, 2021; v1 submitted 22 May, 2018; originally announced May 2018.

Comments: 71 pages

arXiv:1712.04712 [pdf, ps, other]

Efficient Computation of the Stochastic Behavior of Partial Sum Processes

Authors: Sorawit Saengkyongam, Anthony Hayter, Seksan Kiatsupaibul, Wei Liu

Abstract: In this paper the computational aspects of probability calculations for dynamical partial sum expressions are discussed. Such dynamical partial sum expressions have many important applications, and examples are provided in the fields of reliability, product quality assessment, and stochastic control. While these probability calculations are ostensibly of a high dimension, and consequently intracta… ▽ More In this paper the computational aspects of probability calculations for dynamical partial sum expressions are discussed. Such dynamical partial sum expressions have many important applications, and examples are provided in the fields of reliability, product quality assessment, and stochastic control. While these probability calculations are ostensibly of a high dimension, and consequently intractable in general, it is shown how a recursive integration methodology can be implemented to obtain exact calculations as a series of two-dimensional calculations. The computational aspects of the implementaion of this methodology, with the adoption of Fast Fourier Transforms, are discussed. △ Less

Submitted 13 December, 2017; originally announced December 2017.

Showing 1–9 of 9 results for author: Saengkyongam, S