Search | arXiv e-print repository

arXiv:2406.02049 [pdf, other]

Causal Effect Identification in LiNGAM Models with Latent Confounders

Authors: Daniele Tramontano, Yaroslav Kivva, Saber Salehkaleybar, Mathias Drton, Negar Kiyavash

Abstract: We study the generic identifiability of causal effects in linear non-Gaussian acyclic models (LiNGAM) with latent variables. We consider the problem in two main settings: When the causal graph is known a priori, and when it is unknown. In both settings, we provide a complete graphical characterization of the identifiable direct or total causal effects among observed variables. Moreover, we propose… ▽ More We study the generic identifiability of causal effects in linear non-Gaussian acyclic models (LiNGAM) with latent variables. We consider the problem in two main settings: When the causal graph is known a priori, and when it is unknown. In both settings, we provide a complete graphical characterization of the identifiable direct or total causal effects among observed variables. Moreover, we propose efficient algorithms to certify the graphical conditions. Finally, we propose an adaptation of the reconstruction independent component analysis (RICA) algorithm that estimates the causal effects from the observational data given the causal graph. Experimental results show the effectiveness of the proposed method in estimating the causal effects. △ Less

Submitted 4 June, 2024; originally announced June 2024.

Comments: Accepted at International Conference on Machine Learning (ICML) 2024

arXiv:2405.14547 [pdf, other]

Causal Effect Identification in a Sub-Population with Latent Variables

Authors: Amir Mohammad Abouei, Ehsan Mokhtarian, Negar Kiyavash, Matthias Grossglauser

Abstract: The s-ID problem seeks to compute a causal effect in a specific sub-population from the observational data pertaining to the same sub population (Abouei et al., 2023). This problem has been addressed when all the variables in the system are observable. In this paper, we consider an extension of the s-ID problem that allows for the presence of latent variables. To tackle the challenges induced by t… ▽ More The s-ID problem seeks to compute a causal effect in a specific sub-population from the observational data pertaining to the same sub population (Abouei et al., 2023). This problem has been addressed when all the variables in the system are observable. In this paper, we consider an extension of the s-ID problem that allows for the presence of latent variables. To tackle the challenges induced by the presence of latent variables in a sub-population, we first extend the classical relevant graphical definitions, such as c-components and Hedges, initially defined for the so-called ID problem (Pearl, 1995; Tian & Pearl, 2002), to their new counterparts. Subsequently, we propose a sound algorithm for the s-ID problem with latent variables. △ Less

Submitted 23 May, 2024; originally announced May 2024.

Comments: 19 pages, 5 figures

arXiv:2403.09300 [pdf, ps, other]

Recursive Causal Discovery

Authors: Ehsan Mokhtarian, Sepehr Elahi, Sina Akbari, Negar Kiyavash

Abstract: Causal discovery, i.e., learning the causal graph from data, is often the first step toward the identification and estimation of causal effects, a key requirement in numerous scientific domains. Causal discovery is hampered by two main challenges: limited data results in errors in statistical testing and the computational complexity of the learning task is daunting. This paper builds upon and exte… ▽ More Causal discovery, i.e., learning the causal graph from data, is often the first step toward the identification and estimation of causal effects, a key requirement in numerous scientific domains. Causal discovery is hampered by two main challenges: limited data results in errors in statistical testing and the computational complexity of the learning task is daunting. This paper builds upon and extends four of our prior publications (Mokhtarian et al., 2021; Akbari et al., 2021; Mokhtarian et al., 2022, 2023a). These works introduced the concept of removable variables, which are the only variables that can be removed recursively for the purpose of causal discovery. Presence and identification of removable variables allow recursive approaches for causal discovery, a promising solution that helps to address the aforementioned challenges by reducing the problem size successively. This reduction not only minimizes conditioning sets in each conditional independence (CI) test, leading to fewer errors but also significantly decreases the number of required CI tests. The worst-case performances of these methods nearly match the lower bound. In this paper, we present a unified framework for the proposed algorithms, refined with additional details and enhancements for a coherent presentation. A comprehensive literature review is also included, comparing the computational complexity of our methods with existing approaches, showcasing their state-of-the-art efficiency. Another contribution of this paper is the release of RCD, a Python package that efficiently implements these algorithms. This package is designed for practitioners and researchers interested in applying these methods in practical scenarios. The package is available at github.com/ban-epfl/rcd, with comprehensive documentation provided at rcdpackage.com. △ Less

Submitted 14 March, 2024; originally announced March 2024.

Comments: 50 pages, 5 tables, 11 algorithms, 5 figures

arXiv:2401.07578 [pdf, other]

Confounded Budgeted Causal Bandits

Authors: Fateme Jamshidi, Jalal Etesami, Negar Kiyavash

Abstract: We study the problem of learning 'good' interventions in a stochastic environment modeled by its underlying causal graph. Good interventions refer to interventions that maximize rewards. Specifically, we consider the setting of a pre-specified budget constraint, where interventions can have non-uniform costs. We show that this problem can be formulated as maximizing the expected reward for a stoch… ▽ More We study the problem of learning 'good' interventions in a stochastic environment modeled by its underlying causal graph. Good interventions refer to interventions that maximize rewards. Specifically, we consider the setting of a pre-specified budget constraint, where interventions can have non-uniform costs. We show that this problem can be formulated as maximizing the expected reward for a stochastic multi-armed bandit with side information. We propose an algorithm to minimize the cumulative regret in general causal graphs. This algorithm trades off observations and interventions based on their costs to achieve the optimal reward. This algorithm generalizes the state-of-the-art methods by allowing non-uniform costs and hidden confounders in the causal graph. Furthermore, we develop an algorithm to minimize the simple regret in the budgeted setting with non-uniform costs and also general causal graphs. We provide theoretical guarantees, including both upper and lower bounds, as well as empirical evaluations of our algorithms. Our empirical results showcase that our algorithms outperform the state of the art. △ Less

Submitted 15 January, 2024; originally announced January 2024.

arXiv:2312.16707 [pdf, other]

Modeling Systemic Risk: A Time-Varying Nonparametric Causal Inference Framework

Authors: Jalal Etesami, Ali Habibnia, Negar Kiyavash

Abstract: We propose a nonparametric and time-varying directed information graph (TV-DIG) framework to estimate the evolving causal structure in time series networks, thereby addressing the limitations of traditional econometric models in capturing high-dimensional, nonlinear, and time-varying interconnections among series. This framework employs an information-theoretic measure rooted in a generalized vers… ▽ More We propose a nonparametric and time-varying directed information graph (TV-DIG) framework to estimate the evolving causal structure in time series networks, thereby addressing the limitations of traditional econometric models in capturing high-dimensional, nonlinear, and time-varying interconnections among series. This framework employs an information-theoretic measure rooted in a generalized version of Granger-causality, which is applicable to both linear and nonlinear dynamics. Our framework offers advancements in measuring systemic risk and establishes meaningful connections with established econometric models, including vector autoregression and switching models. We evaluate the efficacy of our proposed model through simulation experiments and empirical analysis, reporting promising results in recovering simulated time-varying networks with nonlinear and multivariate structures. We apply this framework to identify and monitor the evolution of interconnectedness and systemic risk among major assets and industrial sectors within the financial network. We focus on cryptocurrencies' potential systemic risks to financial stability, including spillover effects on other sectors during crises like the COVID-19 pandemic and the Federal Reserve's 2020 emergency response. Our findings reveals significant, previously underrecognized pre-2020 influences of cryptocurrencies on certain financial sectors, highlighting their potential systemic risks and offering a systematic approach in tracking evolving cross-sector interactions within financial networks. △ Less

Submitted 27 December, 2023; originally announced December 2023.

arXiv:2312.06091 [pdf, other]

Learning Unknown Intervention Targets in Structural Causal Models from Heterogeneous Data

Authors: Yuqin Yang, Saber Salehkaleybar, Negar Kiyavash

Abstract: We study the problem of identifying the unknown intervention targets in structural causal models where we have access to heterogeneous data collected from multiple environments. The unknown intervention targets are the set of endogenous variables whose corresponding exogenous noises change across the environments. We propose a two-phase approach which in the first phase recovers the exogenous nois… ▽ More We study the problem of identifying the unknown intervention targets in structural causal models where we have access to heterogeneous data collected from multiple environments. The unknown intervention targets are the set of endogenous variables whose corresponding exogenous noises change across the environments. We propose a two-phase approach which in the first phase recovers the exogenous noises corresponding to unknown intervention targets whose distributions have changed across environments. In the second phase, the recovered noises are matched with the corresponding endogenous variables. For the recovery phase, we provide sufficient conditions for learning these exogenous noises up to some component-wise invertible transformation. For the matching phase, under the causal sufficiency assumption, we show that the proposed method uniquely identifies the intervention targets. In the presence of latent confounders, the intervention targets among the observed variables cannot be determined uniquely. We provide a candidate intervention target set which is a superset of the true intervention targets. Our approach improves upon the state of the art as the returned candidate set is always a subset of the target set returned by previous work. Moreover, we do not require restrictive assumptions such as linearity of the causal model or performing invariance tests to learn whether a distribution is changing across environments which could be highly sample inefficient. Our experimental results show the effectiveness of our proposed algorithm in practice. △ Less

Submitted 9 March, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

Comments: Accepted at 27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024)

arXiv:2311.08914 [pdf, other]

Efficiently Esca** Saddle Points for Non-Convex Policy Optimization

Authors: Sadegh Khorasani, Saber Salehkaleybar, Negar Kiyavash, Niao He, Matthias Grossglauser

Abstract: Policy gradient (PG) is widely used in reinforcement learning due to its scalability and good performance. In recent years, several variance-reduced PG methods have been proposed with a theoretical guarantee of converging to an approximate first-order stationary point (FOSP) with the sample complexity of $O(ε^{-3})$. However, FOSPs could be bad local optima or saddle points. Moreover, these algori… ▽ More Policy gradient (PG) is widely used in reinforcement learning due to its scalability and good performance. In recent years, several variance-reduced PG methods have been proposed with a theoretical guarantee of converging to an approximate first-order stationary point (FOSP) with the sample complexity of $O(ε^{-3})$. However, FOSPs could be bad local optima or saddle points. Moreover, these algorithms often use importance sampling (IS) weights which could impair the statistical effectiveness of variance reduction. In this paper, we propose a variance-reduced second-order method that uses second-order information in the form of Hessian vector products (HVP) and converges to an approximate second-order stationary point (SOSP) with sample complexity of $\tilde{O}(ε^{-3})$. This rate improves the best-known sample complexity for achieving approximate SOSPs by a factor of $O(ε^{-0.5})$. Moreover, the proposed variance reduction technique bypasses IS weights by using HVP terms. Our experimental results show that the proposed algorithm outperforms the state of the art and is more robust to changes in random seeds. △ Less

Submitted 15 November, 2023; originally announced November 2023.

Comments: arXiv admin note: text overlap with arXiv:2205.08253

MSC Class: ACM-class:I.2.6

arXiv:2310.13553 [pdf, other]

On sample complexity of conditional independence testing with Von Mises estimator with application to causal discovery

Authors: Fateme Jamshidi, Luca Ganassali, Negar Kiyavash

Abstract: Motivated by conditional independence testing, an essential step in constraint-based causal discovery algorithms, we study the nonparametric Von Mises estimator for the entropy of multivariate distributions built on a kernel density estimator. We establish an exponential concentration inequality for this estimator. We design a test for conditional independence (CI) based on our estimator, called V… ▽ More Motivated by conditional independence testing, an essential step in constraint-based causal discovery algorithms, we study the nonparametric Von Mises estimator for the entropy of multivariate distributions built on a kernel density estimator. We establish an exponential concentration inequality for this estimator. We design a test for conditional independence (CI) based on our estimator, called VM-CI, which achieves optimal parametric rates under smoothness assumptions. Leveraging the exponential concentration, we prove a tight upper bound for the overall error of VM-CI. This, in turn, allows us to characterize the sample complexity of any constraint-based causal discovery algorithm that uses VM-CI for CI tests. To the best of our knowledge, this is the first sample complexity guarantee for causal discovery for continuous variables. Furthermore, we empirically show that VM-CI outperforms other popular CI tests in terms of either time or sample complexity (or both), which translates to a better performance in structure learning as well. △ Less

Submitted 20 October, 2023; originally announced October 2023.

arXiv:2309.02281 [pdf, ps, other]

s-ID: Causal Effect Identification in a Sub-Population

Authors: Amir Mohammad Abouei, Ehsan Mokhtarian, Negar Kiyavash

Abstract: Causal inference in a sub-population involves identifying the causal effect of an intervention on a specific subgroup, which is distinguished from the whole population through the influence of systematic biases in the sampling process. However, ignoring the subtleties introduced by sub-populations can either lead to erroneous inference or limit the applicability of existing methods. We introduce a… ▽ More Causal inference in a sub-population involves identifying the causal effect of an intervention on a specific subgroup, which is distinguished from the whole population through the influence of systematic biases in the sampling process. However, ignoring the subtleties introduced by sub-populations can either lead to erroneous inference or limit the applicability of existing methods. We introduce and advocate for a causal inference problem in sub-populations (henceforth called s-ID), in which we merely have access to observational data of the targeted sub-population (as opposed to the entire population). Existing inference problems in sub-populations operate on the premise that the given data distributions originate from the entire population, thus, cannot tackle the s-ID problem. To address this gap, we provide necessary and sufficient conditions that must hold in the causal graph for a causal effect in a sub-population to be identifiable from the observational distribution of that sub-population. Given these conditions, we present a sound and complete algorithm for the s-ID problem. △ Less

Submitted 8 January, 2024; v1 submitted 5 September, 2023; originally announced September 2023.

Comments: 24 pages, 15 figures, 1 table, To appear in AAAI 2024 conference

arXiv:2307.02459 [pdf, ps, other]

Gaussian Database Alignment and Gaussian Planted Matching

Authors: Osman Emre Dai, Daniel Cullina, Negar Kiyavash

Abstract: Database alignment is a variant of the graph alignment problem: Given a pair of anonymized databases containing separate yet correlated features for a set of users, the problem is to identify the correspondence between the features and align the anonymized user sets based on correlation alone. This closely relates to planted matching, where given a bigraph with random weights, the goal is to ident… ▽ More Database alignment is a variant of the graph alignment problem: Given a pair of anonymized databases containing separate yet correlated features for a set of users, the problem is to identify the correspondence between the features and align the anonymized user sets based on correlation alone. This closely relates to planted matching, where given a bigraph with random weights, the goal is to identify the underlying matching that generated the given weights. We study an instance of the database alignment problem with multivariate Gaussian features and derive results that apply both for database alignment and for planted matching, demonstrating the connection between them. The performance thresholds for database alignment converge to that for planted matching when the dimensionality of the database features is $ω(\log n)$, where $n$ is the size of the alignment, and no individual feature is too strong. The maximum likelihood algorithms for both planted matching and database alignment take the form of a linear program and we study relaxations to better understand the significance of various constraints under various conditions and present achievability and converse bounds. Our results show that the almost-exact alignment threshold for the relaxed algorithms coincide with that of maximum likelihood, while there is a gap between the exact alignment thresholds. Our analysis and results extend to the unbalanced case where one user set is not fully covered by the alignment. △ Less

Submitted 5 July, 2023; originally announced July 2023.

Comments: Under review

arXiv:2306.11755 [pdf, other]

On Identifiability of Conditional Causal Effects

Authors: Yaroslav Kivva, Jalal Etesami, Negar Kiyavash

Abstract: We address the problem of identifiability of an arbitrary conditional causal effect given both the causal graph and a set of any observational and/or interventional distributions of the form $Q[S]:=P(S|do(V\setminus S))$, where $V$ denotes the set of all observed variables and $S\subseteq V$. We call this problem conditional generalized identifiability (c-gID in short) and prove the completeness o… ▽ More We address the problem of identifiability of an arbitrary conditional causal effect given both the causal graph and a set of any observational and/or interventional distributions of the form $Q[S]:=P(S|do(V\setminus S))$, where $V$ denotes the set of all observed variables and $S\subseteq V$. We call this problem conditional generalized identifiability (c-gID in short) and prove the completeness of Pearl's $do$-calculus for the c-gID problem by providing sound and complete algorithm for the c-gID problem. This work revisited the c-gID problem in Lee et al. [2020], Correa et al. [2021] by adding explicitly the positivity assumption which is crucial for identifiability. It extends the results of [Lee et al., 2019, Kivva et al., 2022] on general identifiability (gID) which studied the problem for unconditional causal effects and Shpitser and Pearl [2006b] on identifiability of conditional causal effects given merely the observational distribution $P(\mathbf{V})$ as our algorithm generalizes the algorithms proposed in [Kivva et al., 2022] and [Shpitser and Pearl, 2006b]. △ Less

Submitted 19 June, 2023; originally announced June 2023.

arXiv:2306.06263 [pdf, other]

A Cross-Moment Approach for Causal Effect Estimation

Authors: Yaroslav Kivva, Saber Salehkaleybar, Negar Kiyavash

Abstract: We consider the problem of estimating the causal effect of a treatment on an outcome in linear structural causal models (SCM) with latent confounders when we have access to a single proxy variable. Several methods (such as difference-in-difference (DiD) estimator or negative outcome control) have been proposed in this setting in the literature. However, these approaches require either restrictive… ▽ More We consider the problem of estimating the causal effect of a treatment on an outcome in linear structural causal models (SCM) with latent confounders when we have access to a single proxy variable. Several methods (such as difference-in-difference (DiD) estimator or negative outcome control) have been proposed in this setting in the literature. However, these approaches require either restrictive assumptions on the data generating model or having access to at least two proxy variables. We propose a method to estimate the causal effect using cross moments between the treatment, the outcome, and the proxy variable. In particular, we show that the causal effect can be identified with simple arithmetic operations on the cross moments if the latent confounder in linear SCM is non-Gaussian. In this setting, DiD estimator provides an unbiased estimate only in the special case where the latent confounder has exactly the same direct causal effects on the outcomes in the pre-treatment and post-treatment phases. This translates to the common trend assumption in DiD, which we effectively relax. Additionally, we provide an impossibility result that shows the causal effect cannot be identified if the observational distribution over the treatment, the outcome, and the proxy is jointly Gaussian. Our experiments on both synthetic and real-world datasets showcase the effectiveness of the proposed approach in estimating the causal effect. △ Less

Submitted 9 June, 2023; originally announced June 2023.

arXiv:2306.00585 [pdf, other]

Causal Imitability Under Context-Specific Independence Relations

Authors: Fateme Jamshidi, Sina Akbari, Negar Kiyavash

Abstract: Drawbacks of ignoring the causal mechanisms when performing imitation learning have recently been acknowledged. Several approaches both to assess the feasibility of imitation and to circumvent causal confounding and causal misspecifications have been proposed in the literature. However, the potential benefits of the incorporation of additional information about the underlying causal structure are… ▽ More Drawbacks of ignoring the causal mechanisms when performing imitation learning have recently been acknowledged. Several approaches both to assess the feasibility of imitation and to circumvent causal confounding and causal misspecifications have been proposed in the literature. However, the potential benefits of the incorporation of additional information about the underlying causal structure are left unexplored. An example of such overlooked information is context-specific independence (CSI), i.e., independence that holds only in certain contexts. We consider the problem of causal imitation learning when CSI relations are known. We prove that the decision problem pertaining to the feasibility of imitation in this setting is NP-hard. Further, we provide a necessary graphical criterion for imitation learning under CSI and show that under a structural assumption, this criterion is also sufficient. Finally, we propose a sound algorithmic approach for causal imitation learning which takes both CSI relations and data into account. △ Less

Submitted 11 June, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: 19 pages, 4 figures, under review

arXiv:2305.18210 [pdf, other]

Learning Causal Graphs via Monotone Triangular Transport Maps

Authors: Sina Akbari, Luca Ganassali, Negar Kiyavash

Abstract: We study the problem of causal structure learning from data using optimal transport (OT). Specifically, we first provide a constraint-based method which builds upon lower-triangular monotone parametric transport maps to design conditional independence tests which are agnostic to the noise distribution. We provide an algorithm for causal discovery up to Markov Equivalence with no assumptions on the… ▽ More We study the problem of causal structure learning from data using optimal transport (OT). Specifically, we first provide a constraint-based method which builds upon lower-triangular monotone parametric transport maps to design conditional independence tests which are agnostic to the noise distribution. We provide an algorithm for causal discovery up to Markov Equivalence with no assumptions on the structural equations/noise distributions, which allows for settings with latent variables. Our approach also extends to score-based causal discovery by providing a novel means for defining scores. This allows us to uniquely recover the causal graph under additional identifiability and structural assumptions, such as additive noise or post-nonlinear models. We provide experimental results to compare the proposed approach with the state of the art on both synthetic and real-world datasets. △ Less

Submitted 26 May, 2023; originally announced May 2023.

Comments: 20 pages, 6 figures, under review

arXiv:2301.11401 [pdf, other]

Causal Bandits without Graph Learning

Authors: Mikhail Konobeev, Jalal Etesami, Negar Kiyavash

Abstract: We study the causal bandit problem when the causal graph is unknown and develop an efficient algorithm for finding the parent node of the reward node using atomic interventions. We derive the exact equation for the expected number of interventions performed by the algorithm and show that under certain graphical conditions it could perform either logarithmically fast or, under more general assumpti… ▽ More We study the causal bandit problem when the causal graph is unknown and develop an efficient algorithm for finding the parent node of the reward node using atomic interventions. We derive the exact equation for the expected number of interventions performed by the algorithm and show that under certain graphical conditions it could perform either logarithmically fast or, under more general assumptions, slower but still sublinearly in the number of variables. We formally show that our algorithm is optimal as it meets the universal lower bound we establish for any algorithm that performs atomic interventions. Finally, we extend our algorithm to the case when the reward node has multiple parents. Using this algorithm together with a standard algorithm from bandit literature leads to improved regret bounds. △ Less

Submitted 8 June, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

arXiv:2211.03984 [pdf, other]

Causal Discovery in Linear Latent Variable Models Subject to Measurement Error

Authors: Yuqin Yang, AmirEmad Ghassami, Mohamed Nafea, Negar Kiyavash, Kun Zhang, Ilya Shpitser

Abstract: We focus on causal discovery in the presence of measurement error in linear systems where the mixing matrix, i.e., the matrix indicating the independent exogenous noise terms pertaining to the observed variables, is identified up to permutation and scaling of the columns. We demonstrate a somewhat surprising connection between this problem and causal discovery in the presence of unobserved parentl… ▽ More We focus on causal discovery in the presence of measurement error in linear systems where the mixing matrix, i.e., the matrix indicating the independent exogenous noise terms pertaining to the observed variables, is identified up to permutation and scaling of the columns. We demonstrate a somewhat surprising connection between this problem and causal discovery in the presence of unobserved parentless causes, in the sense that there is a map**, given by the mixing matrix, between the underlying models to be inferred in these problems. Consequently, any identifiability result based on the mixing matrix for one model translates to an identifiability result for the other model. We characterize to what extent the causal models can be identified under a two-part faithfulness assumption. Under only the first part of the assumption (corresponding to the conventional definition of faithfulness), the structure can be learned up to the causal ordering among an ordered grou** of the variables but not all the edges across the groups can be identified. We further show that if both parts of the faithfulness assumption are imposed, the structure can be learned up to a more refined ordered grou**. As a result of this refinement, for the latent variable model with unobserved parentless causes, the structure can be identified. Based on our theoretical results, we propose causal structure learning methods for both models, and evaluate their performance on synthetic data. △ Less

Submitted 7 November, 2022; originally announced November 2022.

Comments: Accepted at 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

arXiv:2208.06935 [pdf, other]

Novel Ordering-based Approaches for Causal Structure Learning in the Presence of Unobserved Variables

Authors: Ehsan Mokhtarian, Mohammadsadegh Khorasani, Jalal Etesami, Negar Kiyavash

Abstract: We propose ordering-based approaches for learning the maximal ancestral graph (MAG) of a structural equation model (SEM) up to its Markov equivalence class (MEC) in the presence of unobserved variables. Existing ordering-based methods in the literature recover a graph through learning a causal order (c-order). We advocate for a novel order called removable order (r-order) as they are advantageous… ▽ More We propose ordering-based approaches for learning the maximal ancestral graph (MAG) of a structural equation model (SEM) up to its Markov equivalence class (MEC) in the presence of unobserved variables. Existing ordering-based methods in the literature recover a graph through learning a causal order (c-order). We advocate for a novel order called removable order (r-order) as they are advantageous over c-orders for structure learning. This is because r-orders are the minimizers of an appropriately defined optimization problem that could be either solved exactly (using a reinforcement learning approach) or approximately (using a hill-climbing search). Moreover, the r-orders (unlike c-orders) are invariant among all the graphs in a MEC and include c-orders as a subset. Given that set of r-orders is often significantly larger than the set of c-orders, it is easier for the optimization problem to find an r-order instead of a c-order. We evaluate the performance and the scalability of our proposed approaches on both real-world and randomly generated networks. △ Less

Submitted 14 August, 2022; originally announced August 2022.

Comments: 14 pages, 6 figures, 3 tables

arXiv:2208.04627 [pdf, other]

Causal Effect Identification in Uncertain Causal Networks

Authors: Sina Akbari, Fateme Jamshidi, Ehsan Mokhtarian, Matthew J. Vowels, Jalal Etesami, Negar Kiyavash

Abstract: Causal identification is at the core of the causal inference literature, where complete algorithms have been proposed to identify causal queries of interest. The validity of these algorithms hinges on the restrictive assumption of having access to a correctly specified causal structure. In this work, we study the setting where a probabilistic model of the causal structure is available. Specificall… ▽ More Causal identification is at the core of the causal inference literature, where complete algorithms have been proposed to identify causal queries of interest. The validity of these algorithms hinges on the restrictive assumption of having access to a correctly specified causal structure. In this work, we study the setting where a probabilistic model of the causal structure is available. Specifically, the edges in a causal graph exist with uncertainties which may, for example, represent degree of belief from domain experts. Alternatively, the uncertainty about an edge may reflect the confidence of a particular statistical test. The question that naturally arises in this setting is: Given such a probabilistic graph and a specific causal effect of interest, what is the subgraph which has the highest plausibility and for which the causal effect is identifiable? We show that answering this question reduces to solving an NP-complete combinatorial optimization problem which we call the edge ID problem. We propose efficient algorithms to approximate this problem and evaluate them against both real-world networks and randomly generated graphs. △ Less

Submitted 27 October, 2023; v1 submitted 9 August, 2022; originally announced August 2022.

Comments: 27 pages, 9 figures, NeurIPS 2023 conference, causal identification, causal discovery, probabilistic models

arXiv:2206.01081 [pdf, ps, other]

Revisiting the General Identifiability Problem

Authors: Yaroslav Kivva, Ehsan Mokhtarian, Jalal Etesami, Negar Kiyavash

Abstract: We revisit the problem of general identifiability originally introduced in [Lee et al., 2019] for causal inference and note that it is necessary to add positivity assumption of observational distribution to the original definition of the problem. We show that without such an assumption the rules of do-calculus and consequently the proposed algorithm in [Lee et al., 2019] are not sound. Moreover, a… ▽ More We revisit the problem of general identifiability originally introduced in [Lee et al., 2019] for causal inference and note that it is necessary to add positivity assumption of observational distribution to the original definition of the problem. We show that without such an assumption the rules of do-calculus and consequently the proposed algorithm in [Lee et al., 2019] are not sound. Moreover, adding the assumption will cause the completeness proof in [Lee et al., 2019] to fail. Under positivity assumption, we present a new algorithm that is provably both sound and complete. A nice property of this new algorithm is that it establishes a connection between general identifiability and classical identifiability by Pearl [1995] through decomposing the general identifiability problem into a series of classical identifiability sub-problems. △ Less

Submitted 2 June, 2022; originally announced June 2022.

Comments: 22 pages, 4 figures, 2 Algorithms, UAI 2022

arXiv:2205.12856 [pdf, other]

Stochastic Second-Order Methods Improve Best-Known Sample Complexity of SGD for Gradient-Dominated Function

Authors: Saeed Masiha, Saber Salehkaleybar, Niao He, Negar Kiyavash, Patrick Thiran

Abstract: We study the performance of Stochastic Cubic Regularized Newton (SCRN) on a class of functions satisfying gradient dominance property with $1\leα\le2$ which holds in a wide range of applications in machine learning and signal processing. This condition ensures that any first-order stationary point is a global optimum. We prove that the total sample complexity of SCRN in achieving $ε$-global optimu… ▽ More We study the performance of Stochastic Cubic Regularized Newton (SCRN) on a class of functions satisfying gradient dominance property with $1\leα\le2$ which holds in a wide range of applications in machine learning and signal processing. This condition ensures that any first-order stationary point is a global optimum. We prove that the total sample complexity of SCRN in achieving $ε$-global optimum is $\mathcal{O}(ε^{-7/(2α)+1})$ for $1\leα< 3/2$ and $\mathcal{\tilde{O}}(ε^{-2/(α)})$ for $3/2\leα\le 2$. SCRN improves the best-known sample complexity of stochastic gradient descent. Even under a weak version of gradient dominance property, which is applicable to policy-based reinforcement learning (RL), SCRN achieves the same improvement over stochastic policy gradient methods. Additionally, we show that the average sample complexity of SCRN can be reduced to ${\mathcal{O}}(ε^{-2})$ for $α=1$ using a variance reduction method with time-varying batch sizes. Experimental results in various RL settings showcase the remarkable performance of SCRN compared to first-order methods. △ Less

Submitted 20 January, 2023; v1 submitted 25 May, 2022; originally announced May 2022.

arXiv:2205.10083 [pdf, other]

A Unified Experiment Design Approach for Cyclic and Acyclic Causal Models

Authors: Ehsan Mokhtarian, Saber Salehkaleybar, AmirEmad Ghassami, Negar Kiyavash

Abstract: We study experiment design for unique identification of the causal graph of a simple SCM, where the graph may contain cycles. The presence of cycles in the structure introduces major challenges for experiment design as, unlike acyclic graphs, learning the skeleton of causal graphs with cycles may not be possible from merely the observational distribution. Furthermore, intervening on a variable in… ▽ More We study experiment design for unique identification of the causal graph of a simple SCM, where the graph may contain cycles. The presence of cycles in the structure introduces major challenges for experiment design as, unlike acyclic graphs, learning the skeleton of causal graphs with cycles may not be possible from merely the observational distribution. Furthermore, intervening on a variable in such graphs does not necessarily lead to orienting all the edges incident to it. In this paper, we propose an experiment design approach that can learn both cyclic and acyclic graphs and hence, unifies the task of experiment design for both types of graphs. We provide a lower bound on the number of experiments required to guarantee the unique identification of the causal graph in the worst case, showing that the proposed approach is order-optimal in terms of the number of experiments up to an additive logarithmic term. Moreover, we extend our result to the setting where the size of each experiment is bounded by a constant. For this case, we show that our approach is optimal in terms of the size of the largest experiment required for uniquely identifying the causal graph in the worst case. △ Less

Submitted 13 December, 2023; v1 submitted 20 May, 2022; originally announced May 2022.

Comments: 31 pages, 6 figures, 1 table, accepted in JMLR

arXiv:2205.08253 [pdf, other]

Momentum-Based Policy Gradient with Second-Order Information

Authors: Saber Salehkaleybar, Sadegh Khorasani, Negar Kiyavash, Niao He, Patrick Thiran

Abstract: Variance-reduced gradient estimators for policy gradient methods have been one of the main focus of research in the reinforcement learning in recent years as they allow acceleration of the estimation process. We propose a variance-reduced policy-gradient method, called SHARP, which incorporates second-order information into stochastic gradient descent (SGD) using momentum with a time-varying learn… ▽ More Variance-reduced gradient estimators for policy gradient methods have been one of the main focus of research in the reinforcement learning in recent years as they allow acceleration of the estimation process. We propose a variance-reduced policy-gradient method, called SHARP, which incorporates second-order information into stochastic gradient descent (SGD) using momentum with a time-varying learning rate. SHARP algorithm is parameter-free, achieving $ε$-approximate first-order stationary point with $O(ε^{-3})$ number of trajectories, while using a batch size of $O(1)$ at each iteration. Unlike most previous work, our proposed algorithm does not require importance sampling which can compromise the advantage of variance reduction process. Moreover, the variance of estimation error decays with the fast rate of $O(1/t^{2/3})$ where $t$ is the number of iterations. Our extensive experimental evaluations show the effectiveness of the proposed algorithm on various control tasks and its advantage over the state of the art in practice. △ Less

Submitted 26 November, 2023; v1 submitted 17 May, 2022; originally announced May 2022.

arXiv:2205.02232 [pdf, other]

Experimental Design for Causal Effect Identification

Authors: Sina Akbari, Jalal Etesami, Negar Kiyavash

Abstract: Pearl's do calculus is a complete axiomatic approach to learn the identifiable causal effects from observational data. When such an effect is not identifiable, it is necessary to perform a collection of often costly interventions in the system to learn the causal effect. In this work, we consider the problem of designing the collection of interventions with the minimum cost to identify the desired… ▽ More Pearl's do calculus is a complete axiomatic approach to learn the identifiable causal effects from observational data. When such an effect is not identifiable, it is necessary to perform a collection of often costly interventions in the system to learn the causal effect. In this work, we consider the problem of designing the collection of interventions with the minimum cost to identify the desired effect. First, we prove that this problem is NP-hard, and subsequently propose an algorithm that can either find the optimal solution or a logarithmic-factor approximation of it. This is done by establishing a connection between our problem and the minimum hitting set problem. Additionally, we propose several polynomial-time heuristic algorithms to tackle the computational complexity of the problem. Although these algorithms could potentially stumble on sub-optimal solutions, our simulations show that they achieve small regrets on random graphs. △ Less

Submitted 17 August, 2023; v1 submitted 4 May, 2022; originally announced May 2022.

Comments: 53 pages, 13 figures, extending the findings of our ICML2022 paper

arXiv:2112.10884 [pdf, other]

Learning Bayesian Networks in the Presence of Structural Side Information

Authors: Ehsan Mokhtarian, Sina Akbari, Fateme Jamshidi, Jalal Etesami, Negar Kiyavash

Abstract: We study the problem of learning a Bayesian network (BN) of a set of variables when structural side information about the system is available. It is well known that learning the structure of a general BN is both computationally and statistically challenging. However, often in many applications, side information about the underlying structure can potentially reduce the learning complexity. In this… ▽ More We study the problem of learning a Bayesian network (BN) of a set of variables when structural side information about the system is available. It is well known that learning the structure of a general BN is both computationally and statistically challenging. However, often in many applications, side information about the underlying structure can potentially reduce the learning complexity. In this paper, we develop a recursive constraint-based algorithm that efficiently incorporates such knowledge (i.e., side information) into the learning process. In particular, we study two types of structural side information about the underlying BN: (I) an upper bound on its clique number is known, or (II) it is diamond-free. We provide theoretical guarantees for the learning algorithms, including the worst-case number of tests required in each scenario. As a consequence of our work, we show that bounded treewidth BNs can be learned with polynomial complexity. Furthermore, we evaluate the performance and the scalability of our algorithms in both synthetic and real-world structures and show that they outperform the state-of-the-art structure learning algorithms. △ Less

Submitted 20 December, 2021; originally announced December 2021.

Comments: 20 pages, 7 figures, 5 tables, AAAI 2022 conference

arXiv:2111.00341 [pdf, other]

Causal Discovery in Linear Structural Causal Models with Deterministic Relations

Authors: Yuqin Yang, Mohamed Nafea, AmirEmad Ghassami, Negar Kiyavash

Abstract: Linear structural causal models (SCMs) -- in which each observed variable is generated by a subset of the other observed variables as well as a subset of the exogenous sources -- are pervasive in causal inference and casual discovery. However, for the task of causal discovery, existing work almost exclusively focus on the submodel where each observed variable is associated with a distinct source w… ▽ More Linear structural causal models (SCMs) -- in which each observed variable is generated by a subset of the other observed variables as well as a subset of the exogenous sources -- are pervasive in causal inference and casual discovery. However, for the task of causal discovery, existing work almost exclusively focus on the submodel where each observed variable is associated with a distinct source with non-zero variance. This results in the restriction that no observed variable can deterministically depend on other observed variables or latent confounders. In this paper, we extend the results on structure learning by focusing on a subclass of linear SCMs which do not have this property, i.e., models in which observed variables can be causally affected by any subset of the sources, and are allowed to be a deterministic function of other observed variables or latent confounders. This allows for a more realistic modeling of influence or information propagation in systems. We focus on the task of causal discovery form observational data generated from a member of this subclass. We derive a set of necessary and sufficient conditions for unique identifiability of the causal structure. To the best of our knowledge, this is the first work that gives identifiability results for causal discovery under both latent confounding and deterministic relationships. Further, we propose an algorithm for recovering the underlying causal structure when the aforementioned conditions are satisfied. We validate our theoretical results both on synthetic and real datasets. △ Less

Submitted 7 November, 2022; v1 submitted 30 October, 2021; originally announced November 2021.

Comments: Accepted at 1st Conference on Causal Learning and Reasoning (CLeaR 2022)

arXiv:2110.12064 [pdf, other]

Causal Effect Identification with Context-specific Independence Relations of Control Variables

Authors: Ehsan Mokhtarian, Fateme Jamshidi, Jalal Etesami, Negar Kiyavash

Abstract: We study the problem of causal effect identification from observational distribution given the causal graph and some context-specific independence (CSI) relations. It was recently shown that this problem is NP-hard, and while a sound algorithm to learn the causal effects is proposed in Tikka et al. (2019), no complete algorithm for the task exists. In this work, we propose a sound and complete alg… ▽ More We study the problem of causal effect identification from observational distribution given the causal graph and some context-specific independence (CSI) relations. It was recently shown that this problem is NP-hard, and while a sound algorithm to learn the causal effects is proposed in Tikka et al. (2019), no complete algorithm for the task exists. In this work, we propose a sound and complete algorithm for the setting when the CSI relations are limited to observed nodes with no parents in the causal graph. One limitation of the state of the art in terms of its applicability is that the CSI relations among all variables, even unobserved ones, must be given (as opposed to learned). Instead, We introduce a set of graphical constraints under which the CSI relations can be learned from mere observational distribution. This expands the set of identifiable causal effects beyond the state of the art. △ Less

Submitted 17 February, 2022; v1 submitted 22 October, 2021; originally announced October 2021.

Comments: 10 pages, 5 figures, 2 algorithms, 1 table

arXiv:2110.12036 [pdf, other]

Recursive Causal Structure Learning in the Presence of Latent Variables and Selection Bias

Authors: Sina Akbari, Ehsan Mokhtarian, AmirEmad Ghassami, Negar Kiyavash

Abstract: We consider the problem of learning the causal MAG of a system from observational data in the presence of latent variables and selection bias. Constraint-based methods are one of the main approaches for solving this problem, but the existing methods are either computationally impractical when dealing with large graphs or lacking completeness guarantees. We propose a novel computationally efficient… ▽ More We consider the problem of learning the causal MAG of a system from observational data in the presence of latent variables and selection bias. Constraint-based methods are one of the main approaches for solving this problem, but the existing methods are either computationally impractical when dealing with large graphs or lacking completeness guarantees. We propose a novel computationally efficient recursive constraint-based method that is sound and complete. The key idea of our approach is that at each iteration a specific type of variable is identified and removed. This allows us to learn the structure efficiently and recursively, as this technique reduces both the number of required conditional independence (CI) tests and the size of the conditioning sets. The former substantially reduces the computational complexity, while the latter results in more reliable CI tests. We provide an upper bound on the number of required CI tests in the worst case. To the best of our knowledge, this is the tightest bound in the literature. We further provide a lower bound on the number of CI tests required by any constraint-based method. The upper bound of our proposed approach and the lower bound at most differ by a factor equal to the number of variables in the worst case. We provide experimental results to compare the proposed approach with the state of the art on both synthetic and real-world structures. △ Less

Submitted 22 October, 2021; originally announced October 2021.

Comments: 23 pages, 5 figures, NeurIPS 2021

arXiv:2106.00772 [pdf, other]

Information Theoretic Measures for Fairness-aware Feature Selection

Authors: Sajad Khodadadian, Mohamed Nafea, AmirEmad Ghassami, Negar Kiyavash

Abstract: Machine learning algorithms are increasingly used for consequential decision making regarding individuals based on their relevant features. Features that are relevant for accurate decisions may however lead to either explicit or implicit forms of discrimination against unprivileged groups, such as those of certain race or gender. This happens due to existing biases in the training data, which are… ▽ More Machine learning algorithms are increasingly used for consequential decision making regarding individuals based on their relevant features. Features that are relevant for accurate decisions may however lead to either explicit or implicit forms of discrimination against unprivileged groups, such as those of certain race or gender. This happens due to existing biases in the training data, which are often replicated or even exacerbated by the learning algorithm. Identifying and measuring these biases at the data level is a challenging problem due to the interdependence among the features, and the decision outcome. In this work, we develop a framework for fairness-aware feature selection which takes into account the correlation among the features and the decision outcome, and is based on information theoretic measures for the accuracy and discriminatory impacts of features. In particular, we first propose information theoretic measures which quantify the impact of different subsets of features on the accuracy and discrimination of the decision outcomes. We then deduce the marginal impact of each feature using Shapley value function; a solution concept in cooperative game theory used to estimate marginal contributions of players in a coalitional game. Finally, we design a fairness utility score for each feature (for feature selection) which quantifies how this feature influences accurate as well as nondiscriminatory decisions. Our framework depends on the joint statistics of the data rather than a particular classifier design. We examine our proposed framework on real and synthetic data to evaluate its performance. △ Less

Submitted 8 June, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

Comments: 15 pages, 6 figures

arXiv:2103.15888 [pdf, ps, other]

The Complexity of Nonconvex-Strongly-Concave Minimax Optimization

Authors: Siqi Zhang, Junchi Yang, Cristóbal Guzmán, Negar Kiyavash, Niao He

Abstract: This paper studies the complexity for finding approximate stationary points of nonconvex-strongly-concave (NC-SC) smooth minimax problems, in both general and averaged smooth finite-sum settings. We establish nontrivial lower complexity bounds of $Ω(\sqrtκΔLε^{-2})$ and $Ω(n+\sqrt{nκ}ΔLε^{-2})$ for the two settings, respectively, where $κ$ is the condition number, $L$ is the smoothness constant, a… ▽ More This paper studies the complexity for finding approximate stationary points of nonconvex-strongly-concave (NC-SC) smooth minimax problems, in both general and averaged smooth finite-sum settings. We establish nontrivial lower complexity bounds of $Ω(\sqrtκΔLε^{-2})$ and $Ω(n+\sqrt{nκ}ΔLε^{-2})$ for the two settings, respectively, where $κ$ is the condition number, $L$ is the smoothness constant, and $Δ$ is the initial gap. Our result reveals substantial gaps between these limits and best-known upper bounds in the literature. To close these gaps, we introduce a generic acceleration scheme that deploys existing gradient-based methods to solve a sequence of crafted strongly-convex-strongly-concave subproblems. In the general setting, the complexity of our proposed algorithm nearly matches the lower bound; in particular, it removes an additional poly-logarithmic dependence on accuracy present in previous works. In the averaged smooth finite-sum setting, our proposed algorithm improves over previous algorithms by providing a nearly-tight dependence on the condition number. △ Less

Submitted 29 March, 2021; originally announced March 2021.

arXiv:2102.01867 [pdf, other]

Impact of Data Processing on Fairness in Supervised Learning

Authors: Sajad Khodadadian, AmirEmad Ghassami, Negar Kiyavash

Abstract: We study the impact of pre and post processing for reducing discrimination in data-driven decision makers. We first analyze the fundamental trade-off between fairness and accuracy in a pre-processing approach, and propose a design for a pre-processing module based on a convex optimization program, which can be added before the original classifier. This leads to a fundamental lower bound on attaina… ▽ More We study the impact of pre and post processing for reducing discrimination in data-driven decision makers. We first analyze the fundamental trade-off between fairness and accuracy in a pre-processing approach, and propose a design for a pre-processing module based on a convex optimization program, which can be added before the original classifier. This leads to a fundamental lower bound on attainable discrimination, given any acceptable distortion in the outcome. Furthermore, we reformulate an existing post-processing method in terms of our accuracy and fairness measures, which allows comparing post-processing and pre-processing approaches. We show that under some mild conditions, pre-processing outperforms post-processing. Finally, we show that by appropriate choice of the discrimination measure, the optimization problem for both pre and post processing approaches will reduce to a linear program and hence can be solved efficiently. △ Less

Submitted 2 February, 2021; originally announced February 2021.

Comments: 18 pages, 4 figures

arXiv:2010.04992 [pdf, other]

A Recursive Markov Boundary-Based Approach to Causal Structure Learning

Authors: Ehsan Mokhtarian, Sina Akbari, AmirEmad Ghassami, Negar Kiyavash

Abstract: Constraint-based methods are one of the main approaches for causal structure learning that are particularly valued as they are asymptotically guaranteed to find a structure that is Markov equivalent to the causal graph of the system. On the other hand, they may require an exponentially large number of conditional independence (CI) tests in the number of variables of the system. In this paper, we p… ▽ More Constraint-based methods are one of the main approaches for causal structure learning that are particularly valued as they are asymptotically guaranteed to find a structure that is Markov equivalent to the causal graph of the system. On the other hand, they may require an exponentially large number of conditional independence (CI) tests in the number of variables of the system. In this paper, we propose a novel recursive constraint-based method for causal structure learning that significantly reduces the required number of CI tests compared to the existing literature. The idea of the proposed approach is to use Markov boundary information to identify a specific variable that can be removed from the set of variables without affecting the statistical dependencies among the other variables. Having identified such a variable, we discover its neighborhood, remove that variable from the set of variables, and recursively learn the causal structure over the remaining variables. We further provide a lower bound on the number of CI tests required by any constraint-based method. Comparing this lower bound to our achievable bound demonstrates the efficiency of the proposed approach. Our experimental results show that the proposed algorithm outperforms state-of-the-art both on synthetic and real-world structures. △ Less

Submitted 20 May, 2021; v1 submitted 10 October, 2020; originally announced October 2020.

Comments: 29 pages, 6 figures, 3 tables

arXiv:2006.09670 [pdf, ps, other]

LazyIter: A Fast Algorithm for Counting Markov Equivalent DAGs and Designing Experiments

Authors: Ali AhmadiTeshnizi, Saber Salehkaleybar, Negar Kiyavash

Abstract: The causal relationships among a set of random variables are commonly represented by a Directed Acyclic Graph (DAG), where there is a directed edge from variable $X$ to variable $Y$ if $X$ is a direct cause of $Y$. From the purely observational data, the true causal graph can be identified up to a Markov Equivalence Class (MEC), which is a set of DAGs with the same conditional independencies betwe… ▽ More The causal relationships among a set of random variables are commonly represented by a Directed Acyclic Graph (DAG), where there is a directed edge from variable $X$ to variable $Y$ if $X$ is a direct cause of $Y$. From the purely observational data, the true causal graph can be identified up to a Markov Equivalence Class (MEC), which is a set of DAGs with the same conditional independencies between the variables. The size of an MEC is a measure of complexity for recovering the true causal graph by performing interventions. We propose a method for efficient iteration over possible MECs given intervention results. We utilize the proposed method for computing MEC sizes and experiment design in active and passive learning settings. Compared to previous work for computing the size of MEC, our proposed algorithm reduces the time complexity by a factor of $O(n)$ for sparse graphs where $n$ is the number of variables in the system. Additionally, integrating our approach with dynamic programming, we design an optimal algorithm for passive experiment design. Experimental results show that our proposed algorithms for both computing the size of MEC and experiment design outperform the state of the art. △ Less

Submitted 17 June, 2020; originally announced June 2020.

Comments: 11 pages, 2 figures, ICML

arXiv:2002.09621 [pdf, other]

Global Convergence and Variance-Reduced Optimization for a Class of Nonconvex-Nonconcave Minimax Problems

Authors: Junchi Yang, Negar Kiyavash, Niao He

Abstract: Nonconvex minimax problems appear frequently in emerging machine learning applications, such as generative adversarial networks and adversarial learning. Simple algorithms such as the gradient descent ascent (GDA) are the common practice for solving these nonconvex games and receive lots of empirical success. Yet, it is known that these vanilla GDA algorithms with constant step size can potentiall… ▽ More Nonconvex minimax problems appear frequently in emerging machine learning applications, such as generative adversarial networks and adversarial learning. Simple algorithms such as the gradient descent ascent (GDA) are the common practice for solving these nonconvex games and receive lots of empirical success. Yet, it is known that these vanilla GDA algorithms with constant step size can potentially diverge even in the convex setting. In this work, we show that for a subclass of nonconvex-nonconcave objectives satisfying a so-called two-sided Polyak-Łojasiewicz inequality, the alternating gradient descent ascent (AGDA) algorithm converges globally at a linear rate and the stochastic AGDA achieves a sublinear rate. We further develop a variance reduced algorithm that attains a provably faster rate than AGDA when the problem has the finite-sum structure. △ Less

Submitted 21 February, 2020; originally announced February 2020.

arXiv:2001.00543 [pdf, other]

Toward Optimal Adversarial Policies in the Multiplicative Learning System with a Malicious Expert

Authors: S. Rasoul Etesami, Negar Kiyavash, Vincent Leon, H. Vincent Poor

Abstract: We consider a learning system based on the conventional multiplicative weight (MW) rule that combines experts' advice to predict a sequence of true outcomes. It is assumed that one of the experts is malicious and aims to impose the maximum loss on the system. The loss of the system is naturally defined to be the aggregate absolute difference between the sequence of predicted outcomes and the true… ▽ More We consider a learning system based on the conventional multiplicative weight (MW) rule that combines experts' advice to predict a sequence of true outcomes. It is assumed that one of the experts is malicious and aims to impose the maximum loss on the system. The loss of the system is naturally defined to be the aggregate absolute difference between the sequence of predicted outcomes and the true outcomes. We consider this problem under both offline and online settings. In the offline setting where the malicious expert must choose its entire sequence of decisions a priori, we show somewhat surprisingly that a simple greedy policy of always reporting false prediction is asymptotically optimal with an approximation ratio of $1+O(\sqrt{\frac{\ln N}{N}})$, where $N$ is the total number of prediction stages. In particular, we describe a policy that closely resembles the structure of the optimal offline policy. For the online setting where the malicious expert can adaptively make its decisions, we show that the optimal online policy can be efficiently computed by solving a dynamic program in $O(N^3)$. Our results provide a new direction for vulnerability assessment of commonly used learning algorithms to adversarial attacks where the threat is an integral part of the system. △ Less

Submitted 17 September, 2020; v1 submitted 2 January, 2020; originally announced January 2020.

arXiv:1911.04628 [pdf, other]

Model-Augmented Estimation of Conditional Mutual Information for Feature Selection

Authors: Alan Yang, AmirEmad Ghassami, Maxim Raginsky, Negar Kiyavash, Elyse Rosenbaum

Abstract: Markov blanket feature selection, while theoretically optimal, is generally challenging to implement. This is due to the shortcomings of existing approaches to conditional independence (CI) testing, which tend to struggle either with the curse of dimensionality or computational complexity. We propose a novel two-step approach which facilitates Markov blanket feature selection in high dimensions. F… ▽ More Markov blanket feature selection, while theoretically optimal, is generally challenging to implement. This is due to the shortcomings of existing approaches to conditional independence (CI) testing, which tend to struggle either with the curse of dimensionality or computational complexity. We propose a novel two-step approach which facilitates Markov blanket feature selection in high dimensions. First, neural networks are used to map features to low-dimensional representations. In the second step, CI testing is performed by applying the $k$-NN conditional mutual information estimator to the learned feature maps. The map**s are designed to ensure that mapped samples both preserve information and share similar information about the target variable if and only if they are close in Euclidean distance. We show that these properties boost the performance of the $k$-NN estimator in the second step. The performance of the proposed method is evaluated on both synthetic and real data. △ Less

Submitted 19 June, 2020; v1 submitted 11 November, 2019; originally announced November 2019.

Comments: Accepted to UAI 2020

arXiv:1910.12993 [pdf, other]

Characterizing Distribution Equivalence and Structure Learning for Cyclic and Acyclic Directed Graphs

Authors: AmirEmad Ghassami, Alan Yang, Negar Kiyavash, Kun Zhang

Abstract: The main approach to defining equivalence among acyclic directed causal graphical models is based on the conditional independence relationships in the distributions that the causal models can generate, in terms of the Markov equivalence. However, it is known that when cycles are allowed in the causal structure, conditional independence may not be a suitable notion for equivalence of two structures… ▽ More The main approach to defining equivalence among acyclic directed causal graphical models is based on the conditional independence relationships in the distributions that the causal models can generate, in terms of the Markov equivalence. However, it is known that when cycles are allowed in the causal structure, conditional independence may not be a suitable notion for equivalence of two structures, as it does not reflect all the information in the distribution that is useful for identification of the underlying structure. In this paper, we present a general, unified notion of equivalence for linear Gaussian causal directed graphical models, whether they are cyclic or acyclic. In our proposed definition of equivalence, two structures are equivalent if they can generate the same set of data distributions. We also propose a weaker notion of equivalence called quasi-equivalence, which we show is the extent of identifiability from observational data. We propose analytic as well as graphical methods for characterizing the equivalence of two structures. Additionally, we propose a score-based method for learning the structure from observational data, which successfully deals with both acyclic and cyclic structures. △ Less

Submitted 2 July, 2020; v1 submitted 28 October, 2019; originally announced October 2019.

arXiv:1910.05651 [pdf, other]

Interventional Experiment Design for Causal Structure Learning

Authors: AmirEmad Ghassami, Saber Salehkaleybar, Negar Kiyavash

Abstract: It is known that from purely observational data, a causal DAG is identifiable only up to its Markov equivalence class, and for many ground truth DAGs, the direction of a large portion of the edges will be remained unidentified. The golden standard for learning the causal DAG beyond Markov equivalence is to perform a sequence of interventions in the system and use the data gathered from the interve… ▽ More It is known that from purely observational data, a causal DAG is identifiable only up to its Markov equivalence class, and for many ground truth DAGs, the direction of a large portion of the edges will be remained unidentified. The golden standard for learning the causal DAG beyond Markov equivalence is to perform a sequence of interventions in the system and use the data gathered from the interventional distributions. We consider a setup in which given a budget $k$, we design $k$ interventions non-adaptively. We cast the problem of finding the best intervention target set as an optimization problem which aims to maximize the number of edges whose directions are identified due to the performed interventions. First, we consider the case that the underlying causal structure is a tree. For this case, we propose an efficient exact algorithm for the worst-case gain setup, as well as an approximate algorithm for the average gain setup. We then show that the proposed approach for the average gain setup can be extended to the case of general causal structures. In this case, besides the design of interventions, calculating the objective function is also challenging. We propose an efficient exact calculator as well as two estimators for this task. We evaluate the proposed methods using synthetic as well as real data. △ Less

Submitted 12 October, 2019; originally announced October 2019.

arXiv:1908.03932 [pdf, other]

Learning Linear Non-Gaussian Causal Models in the Presence of Latent Variables

Authors: Saber Salehkaleybar, AmirEmad Ghassami, Negar Kiyavash, Kun Zhang

Abstract: We consider the problem of learning causal models from observational data generated by linear non-Gaussian acyclic causal models with latent variables. Without considering the effect of latent variables, one usually infers wrong causal relationships among the observed variables. Under faithfulness assumption, we propose a method to check whether there exists a causal path between any two observed… ▽ More We consider the problem of learning causal models from observational data generated by linear non-Gaussian acyclic causal models with latent variables. Without considering the effect of latent variables, one usually infers wrong causal relationships among the observed variables. Under faithfulness assumption, we propose a method to check whether there exists a causal path between any two observed variables. From this information, we can obtain the causal order among them. The next question is then whether or not the causal effects can be uniquely identified as well. It can be shown that causal effects among observed variables cannot be identified uniquely even under the assumptions of faithfulness and non-Gaussianity of exogenous noises. However, we will propose an efficient method to identify the set of all possible causal effects that are compatible with the observational data. Furthermore, we present some structural conditions on the causal graph under which we can learn causal effects among observed variables uniquely. We also provide necessary and sufficient graphical conditions for unique identification of the number of variables in the system. Experiments on synthetic data and real-world data show the effectiveness of our proposed algorithm on learning causal models. △ Less

Submitted 11 August, 2019; originally announced August 2019.

arXiv:1903.01422 [pdf, ps, other]

Database Alignment with Gaussian Features

Authors: Osman Emre Dai, Daniel Cullina, Negar Kiyavash

Abstract: We consider the problem of aligning a pair of databases with jointly Gaussian features. We consider two algorithms, complete database alignment via MAP estimation among all possible database alignments, and partial alignment via a thresholding approach of log likelihood ratios. We derive conditions on mutual information between feature pairs, identifying the regimes where the algorithms are guaran… ▽ More We consider the problem of aligning a pair of databases with jointly Gaussian features. We consider two algorithms, complete database alignment via MAP estimation among all possible database alignments, and partial alignment via a thresholding approach of log likelihood ratios. We derive conditions on mutual information between feature pairs, identifying the regimes where the algorithms are guaranteed to perform reliably and those where they cannot be expected to succeed. △ Less

Submitted 1 September, 2019; v1 submitted 4 March, 2019; originally announced March 2019.

arXiv:1809.03553 [pdf, ps, other]

Partial Recovery of Erdős-Rényi Graph Alignment via $k$-Core Alignment

Authors: Daniel Cullina, Negar Kiyavash, Prateek Mittal, H. Vincent Poor

Abstract: We determine information theoretic conditions under which it is possible to partially recover the alignment used to generate a pair of sparse, correlated Erdős-Rényi graphs. To prove our achievability result, we introduce the $k$-core alignment estimator. This estimator searches for an alignment in which the intersection of the correlated graphs using this alignment has a minimum degree of $k$. We… ▽ More We determine information theoretic conditions under which it is possible to partially recover the alignment used to generate a pair of sparse, correlated Erdős-Rényi graphs. To prove our achievability result, we introduce the $k$-core alignment estimator. This estimator searches for an alignment in which the intersection of the correlated graphs using this alignment has a minimum degree of $k$. We prove a matching converse bound. As the number of vertices grows, recovery of the alignment for a fraction of the vertices tending to one is possible when the average degree of the intersection of the graph pair tends to infinity. It was previously known that exact alignment is possible when this average degree grows faster than the logarithm of the number of vertices. △ Less

Submitted 3 November, 2018; v1 submitted 10 September, 2018; originally announced September 2018.

arXiv:1806.01814 [pdf, other]

doi 10.1109/RTAS.2019.00016

A Novel Side-Channel in Real-Time Schedulers

Authors: Chien-Ying Chen, Sibin Mohan, Rodolfo Pellizzoni, Rakesh B. Bobba, Negar Kiyavash

Abstract: We demonstrate the presence of a novel scheduler side-channel in preemptive, fixed-priority real-time systems (RTS); examples of such systems can be found in automotive systems, avionic systems, power plants and industrial control systems among others. This side-channel can leak important timing information such as the future arrival times of real-time tasks.This information can then be used to la… ▽ More We demonstrate the presence of a novel scheduler side-channel in preemptive, fixed-priority real-time systems (RTS); examples of such systems can be found in automotive systems, avionic systems, power plants and industrial control systems among others. This side-channel can leak important timing information such as the future arrival times of real-time tasks.This information can then be used to launch devastating attacks, two of which are demonstrated here (on real hardware platforms). Note that it is not easy to capture this timing information due to runtime variations in the schedules, the presence of multiple other tasks in the system and the typical constraints (e.g., deadlines) in the design of RTS. Our ScheduLeak algorithms demonstrate how to effectively exploit this side-channel. A complete implementation is presented on real operating systems (in Real-time Linux and FreeRTOS). Timing information leaked by ScheduLeak can significantly aid other, more advanced, attacks in better accomplishing their goals. △ Less

Submitted 9 May, 2019; v1 submitted 5 June, 2018; originally announced June 2018.

Journal ref: 2019 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), Montreal, 2019, pp. 90-102

arXiv:1806.01393 [pdf, other]

REORDER: Securing Dynamic-Priority Real-Time Systems Using Schedule Obfuscation

Authors: Chien-Ying Chen, Monowar Hasan, AmirEmad Ghassami, Sibin Mohan, Negar Kiyavash

Abstract: The deterministic (timing) behavior of real-time systems (RTS) can be used by adversaries - say, to launch side channel attacks or even destabilize the system by denying access to critical resources. We propose a protocol (named REORDER) to obfuscate this predictable timing behavior of RTS, especially ones designed using dynamic-priority scheduling algorithms (e.g., EDF). We also present a metric… ▽ More The deterministic (timing) behavior of real-time systems (RTS) can be used by adversaries - say, to launch side channel attacks or even destabilize the system by denying access to critical resources. We propose a protocol (named REORDER) to obfuscate this predictable timing behavior of RTS, especially ones designed using dynamic-priority scheduling algorithms (e.g., EDF). We also present a metric (named "schedule entropy") that measures the levels of obfuscation introduced into a given real-time system. The REORDER protocol was integrated into the standard Linux real-time scheduler and evaluated on a realistic embedded platform (Raspberry Pi) running the MiBench automotive benchmark workloads. We also demonstrate how designers of RTS can increase the security of their systems and also quantitatively measure the impact (both in terms of security and performance) of using this protocol. △ Less

Submitted 8 April, 2019; v1 submitted 4 June, 2018; originally announced June 2018.

arXiv:1805.03829 [pdf, ps, other]

Fundamental Limits of Database Alignment

Authors: Daniel Cullina, Prateek Mittal, Negar Kiyavash

Abstract: We consider the problem of aligning a pair of databases with correlated entries. We introduce a new measure of correlation in a joint distribution that we call cycle mutual information. This measure has operational significance: it determines whether exact recovery of the correspondence between database entries is possible for any algorithm. Additionally, there is an efficient algorithm for databa… ▽ More We consider the problem of aligning a pair of databases with correlated entries. We introduce a new measure of correlation in a joint distribution that we call cycle mutual information. This measure has operational significance: it determines whether exact recovery of the correspondence between database entries is possible for any algorithm. Additionally, there is an efficient algorithm for database alignment that achieves this information theoretic threshold. △ Less

Submitted 10 May, 2018; originally announced May 2018.

Comments: 5 pages, ISIT 2018

arXiv:1804.09758 [pdf, other]

Analysis of a Canonical Labeling Algorithm for the Alignment of Correlated Erdős-Rényi Graphs

Authors: Osman Emre Dai, Daniel Cullina, Negar Kiyavash, Matthias Grossglauser

Abstract: Graph alignment in two correlated random graphs refers to the task of identifying the correspondence between vertex sets of the graphs. Recent results have characterized the exact information-theoretic threshold for graph alignment in correlated Erdős-Rényi graphs. However, very little is known about the existence of efficient algorithms to achieve graph alignment without seeds. In this work we… ▽ More Graph alignment in two correlated random graphs refers to the task of identifying the correspondence between vertex sets of the graphs. Recent results have characterized the exact information-theoretic threshold for graph alignment in correlated Erdős-Rényi graphs. However, very little is known about the existence of efficient algorithms to achieve graph alignment without seeds. In this work we identify a region in which a straightforward $O(n^{11/5} \log n )$-time canonical labeling algorithm, initially introduced in the context of graph isomorphism, succeeds in aligning correlated Erdős-Rényi graphs. The algorithm has two steps. In the first step, all vertices are labeled by their degrees and a trivial minimum distance alignment (i.e., sorting vertices according to their degrees) matches a fixed number of highest degree vertices in the two graphs. Having identified this subset of vertices, the remaining vertices are matched using a alignment algorithm for bipartite graphs. △ Less

Submitted 1 September, 2019; v1 submitted 25 April, 2018; originally announced April 2018.

arXiv:1802.01239 [pdf, other]

Counting and Sampling from Markov Equivalent DAGs Using Clique Trees

Authors: AmirEmad Ghassami, Saber Salehkaleybar, Negar Kiyavash, Kun Zhang

Abstract: A directed acyclic graph (DAG) is the most common graphical model for representing causal relationships among a set of variables. When restricted to using only observational data, the structure of the ground truth DAG is identifiable only up to Markov equivalence, based on conditional independence relations among the variables. Therefore, the number of DAGs equivalent to the ground truth DAG is an… ▽ More A directed acyclic graph (DAG) is the most common graphical model for representing causal relationships among a set of variables. When restricted to using only observational data, the structure of the ground truth DAG is identifiable only up to Markov equivalence, based on conditional independence relations among the variables. Therefore, the number of DAGs equivalent to the ground truth DAG is an indicator of the causal complexity of the underlying structure--roughly speaking, it shows how many interventions or how much additional information is further needed to recover the underlying DAG. In this paper, we propose a new technique for counting the number of DAGs in a Markov equivalence class. Our approach is based on the clique tree representation of chordal graphs. We show that in the case of bounded degree graphs, the proposed algorithm is polynomial time. We further demonstrate that this technique can be utilized for uniform sampling from a Markov equivalence class, which provides a stochastic way to enumerate DAGs in the equivalence class and may be needed for finding the best DAG or for causal inference given the equivalence class as input. We also extend our counting and sampling method to the case where prior knowledge about the underlying DAG is available, and present applications of this extension in causal experiment design and estimating the causal effect of joint interventions. △ Less

Submitted 10 September, 2018; v1 submitted 4 February, 2018; originally announced February 2018.

arXiv:1801.04378 [pdf, other]

Fairness in Supervised Learning: An Information Theoretic Approach

Authors: AmirEmad Ghassami, Sajad Khodadadian, Negar Kiyavash

Abstract: Automated decision making systems are increasingly being used in real-world applications. In these systems for the most part, the decision rules are derived by minimizing the training error on the available historical data. Therefore, if there is a bias related to a sensitive attribute such as gender, race, religion, etc. in the data, say, due to cultural/historical discriminatory practices agains… ▽ More Automated decision making systems are increasingly being used in real-world applications. In these systems for the most part, the decision rules are derived by minimizing the training error on the available historical data. Therefore, if there is a bias related to a sensitive attribute such as gender, race, religion, etc. in the data, say, due to cultural/historical discriminatory practices against a certain demographic, the system could continue discrimination in decisions by including the said bias in its decision rule. We present an information theoretic framework for designing fair predictors from data, which aim to prevent discrimination against a specified sensitive attribute in a supervised learning setting. We use equalized odds as the criterion for discrimination, which demands that the prediction should be independent of the protected attribute conditioned on the actual label. To ensure fairness and generalization simultaneously, we compress the data to an auxiliary variable, which is used for the prediction task. This auxiliary variable is chosen such that it is decontaminated from the discriminatory attribute in the sense of equalized odds. The final predictor is obtained by applying a Bayesian decision rule to the auxiliary variable. △ Less

Submitted 29 July, 2018; v1 submitted 12 January, 2018; originally announced January 2018.

arXiv:1711.06783 [pdf, ps, other]

Exact alignment recovery for correlated Erdős-Rényi graphs

Authors: Daniel Cullina, Negar Kiyavash

Abstract: We consider the problem of perfectly recovering the vertex correspondence between two correlated Erdős-Rényi (ER) graphs on the same vertex set. The correspondence between the vertices can be obscured by randomly permuting the vertex labels of one of the graphs. We determine the information-theoretic threshold for exact recovery, i.e. the conditions under which the entire vertex correspondence can… ▽ More We consider the problem of perfectly recovering the vertex correspondence between two correlated Erdős-Rényi (ER) graphs on the same vertex set. The correspondence between the vertices can be obscured by randomly permuting the vertex labels of one of the graphs. We determine the information-theoretic threshold for exact recovery, i.e. the conditions under which the entire vertex correspondence can be correctly recovered given unbounded computational resources. △ Less

Submitted 14 May, 2018; v1 submitted 17 November, 2017; originally announced November 2017.

Comments: 12 pages. arXiv admin note: text overlap with arXiv:1602.01042

arXiv:1709.03625 [pdf, other]

Budgeted Experiment Design for Causal Structure Learning

Authors: AmirEmad Ghassami, Saber Salehkaleybar, Negar Kiyavash, Elias Bareinboim

Abstract: We study the problem of causal structure learning when the experimenter is limited to perform at most $k$ non-adaptive experiments of size $1$. We formulate the problem of finding the best intervention target set as an optimization problem, which aims to maximize the average number of edges whose directions are resolved. We prove that the corresponding objective function is submodular and a greedy… ▽ More We study the problem of causal structure learning when the experimenter is limited to perform at most $k$ non-adaptive experiments of size $1$. We formulate the problem of finding the best intervention target set as an optimization problem, which aims to maximize the average number of edges whose directions are resolved. We prove that the corresponding objective function is submodular and a greedy algorithm suffices to achieve $(1-\frac{1}{e})$-approximation of the optimal value. We further present an accelerated variant of the greedy algorithm, which can lead to orders of magnitude performance speedup. We validate our proposed approach on synthetic and real graphs. The results show that compared to the purely observational setting, our algorithm orients the majority of the edges through a considerably small number of interventions. △ Less

Submitted 29 July, 2018; v1 submitted 11 September, 2017; originally announced September 2017.

Journal ref: 35th International Conference on Machine Learning (ICML), PMLR 80:1719-1728, 2018

arXiv:1707.07234 [pdf, other]

doi 10.1109/TIFS.2018.2797953

A Covert Queueing Channel in FCFS Schedulers

Authors: AmirEmad Ghassami, Negar Kiyavash

Abstract: We study covert queueing channels (CQCs), which are a kind of covert timing channel that may be exploited in shared queues across supposedly isolated users. In our system model, a user sends messages to another user via his pattern of access to the shared resource, which serves the users according to a first come first served (FCFS) policy. One example of such a channel is the cross-virtual networ… ▽ More We study covert queueing channels (CQCs), which are a kind of covert timing channel that may be exploited in shared queues across supposedly isolated users. In our system model, a user sends messages to another user via his pattern of access to the shared resource, which serves the users according to a first come first served (FCFS) policy. One example of such a channel is the cross-virtual network covert channel in data center networks, resulting from the queueing effects of the shared resource. First, we study a system comprising a transmitter and a receiver that share a deterministic and work-conserving FCFS scheduler, and we compute the capacity of this channel. We also consider the effect of the presence of other users on the information transmission rate of this channel. The achievable information transmission rates obtained in this study demonstrate the possibility of significant information leakage and great privacy threats brought by CQCs in FCFS schedulers. △ Less

Submitted 29 July, 2018; v1 submitted 22 July, 2017; originally announced July 2017.

Journal ref: IEEE Transactions on Information Forensics and Security, 13, no. 6 (2018): pp. 1551-1563

arXiv:1706.06936 [pdf]

Significance of Side Information in the Graph Matching Problem

Authors: Kushagra Singhal, Daniel Cullina, Negar Kiyavash

Abstract: Percolation based graph matching algorithms rely on the availability of seed vertex pairs as side information to efficiently match users across networks. Although such algorithms work well in practice, there are other types of side information available which are potentially useful to an attacker. In this paper, we consider the problem of matching two correlated graphs when an attacker has access… ▽ More Percolation based graph matching algorithms rely on the availability of seed vertex pairs as side information to efficiently match users across networks. Although such algorithms work well in practice, there are other types of side information available which are potentially useful to an attacker. In this paper, we consider the problem of matching two correlated graphs when an attacker has access to side information, either in the form of community labels or an imperfect initial matching. In the former case, we propose a naive graph matching algorithm by introducing the community degree vectors which harness the information from community labels in an efficient manner. Furthermore, we analyze a variant of the basic percolation algorithm proposed in literature for graphs with community structure. In the latter case, we propose a novel percolation algorithm with two thresholds which uses an imperfect matching as input to match correlated graphs. We evaluate the proposed algorithms on synthetic as well as real world datasets using various experiments. The experimental results demonstrate the importance of communities as side information especially when the number of seeds is small and the networks are weakly correlated. △ Less

Submitted 21 June, 2017; originally announced June 2017.

Showing 1–50 of 79 results for author: Kiyavash, N