Standardizing Structural Causal Models

Weronika Ormaniec
ETH Zürich
Zürich, Switzerland

Scott Sussex
ETH Zürich
Zürich, Switzerland

Lars Lorch
ETH Zürich
Zürich, Switzerland

Bernhard Schölkopf
MPI for Intelligent Systems
Tübingen, Germany

Andreas Krause
ETH Zürich
Zürich, Switzerland
Equal contribution.
Abstract

Synthetic datasets generated by structural causal models (SCMs) are commonly used for benchmarking causal structure learning algorithms. However, the variances and pairwise correlations in SCM data tend to increase along the causal ordering. Several popular algorithms exploit these artifacts, possibly leading to conclusions that do not generalize to real-world settings. Existing metrics like $\operatorname{Var}$-sortability and $\operatorname{R^{2}}$-sortability quantify these patterns, but they do not provide tools to remedy them. To address this, we propose internally-standardized structural causal models (iSCMs), a modification of SCMs that introduces a standardization operation at each variable during the generative process. By construction, iSCMs are not $\operatorname{Var}$-sortable, and as we show experimentally, not $\operatorname{R^{2}}$-sortable either for commonly-used graph families. Moreover, contrary to the post-hoc standardization of data generated by standard SCMs, we prove that linear iSCMs are less identifiable from prior knowledge on the weights and do not collapse to deterministic relationships in large systems, which may make iSCMs a useful model in causal inference beyond the benchmarking problem studied here.

1 Introduction

Predicting the effects of interventions and policy decisions requires reasoning about causality. Consequently, scientific fields ranging from biology and earth sciences to economics and statistics are interested in modeling causal structure (Pearl, 2009; Maathuis et al., 2010; Imbens and Rubin, 2015; Runge et al., 2019). A wide array of causal discovery algorithms has been proposed with the goal of inferring causal structure from data (e.g., Squires and Uhler, 2022; Vowels et al., 2022). However, benchmarking these algorithms is challenging, since real-world datasets with an agreed-upon, ground-truth causal structure are rare (e.g., Sachs et al., 2005; see Mooij et al., 2020). The community predominantly relies on synthetic data for evaluating structure learning algorithms, where observations are generated according to a predetermined causal structure and system mechanisms. The inferred causal structures can then be directly compared to the ground truth. To generate synthetic data, it is common practice to sample from structural causal models with additive noise (SCMs) (Reisach et al., 2021). Unless stated otherwise, this work considers SCMs in which the variance scale of the additive noise is the same for all variables, a typical simplification made in benchmarking.

Under common benchmarking practices, synthetic datasets generated by SCMs contain patterns that are directly exploitable to make structure discovery easier. We will refer to such patterns as artifacts. In SCMs, the pairwise correlations between variables tend to increase along the causal ordering, since variance builds up downstream and, as a result, the proportion of the variance driven by the additive noise vanishes (Figure 1(a)). Reisach et al. (2024) characterize this phenomenon through an increase of the coefficients of determination ($\operatorname{R^{2}}$) of the variables regressed on all others. Crucially, this artifact occurs both in the raw data and when shifting and scaling (standardizing) the variables to have zero mean and unit variance. One of the implications is that downstream causal dependencies in SCMs become effectively deterministic, especially in large-scale systems. As Reisach et al. (2024) demonstrate, simple causal discovery baselines can perform competitively on benchmarks of this kind by directly exploiting this phenomenon. This makes SCMs in their general definition possibly unsuitable for the purpose of benchmarking and, as we will argue, to some degree suboptimal for inferring causality more broadly. Ultimately, benchmarking on synthetic data with these patterns could lead to conclusions that do not generalize to real-world scenarios.

In this work, we propose a simple modification of SCMs that stabilizes the data-generating process and thereby removes exploitable covariance artifacts. Our models, denoted internally-standardized SCMs (iSCMs), introduce a standardization operation at each variable during the generative process (Figure 1(b)). In Section 4, we provide a theoretical motivation for this idea by studying linear iSCMs. We prove that, contrary to SCMs, the causal dependencies of iSCMs under mild assumptions never collapse to deterministic mechanisms as the graph size becomes large. Moreover, we formalize the correlation artifact commonly observed in benchmarks by proving that linear SCM structures in a Markov equivalence class (MEC) are partially identifiable for certain graph classes, given weak prior knowledge on the weight distribution of the ground-truth SCM. Most importantly, we show that this is not the case for the corresponding iSCMs. In Section 5, we empirically demonstrate that the baselines proposed in Reisach et al. (2021, 2024) are unable to exploit covariance artifacts in iSCMs, while practical classes of causal discovery algorithms are still able to learn causal structures in both linear and nonlinear systems. Our findings also reveal that SCM artifacts affect structure learning both positively and negatively, suggesting that generating (standardized) data from standard SCMs may be particularly ill-suited for benchmarking common approaches in use today.

[Figure 1, panel (a) Standardized SCM: chain over $x_1, x_2, \dots, x_9, x_{10}$ with post-hoc standardized variables $x_1^s, x_2^s, \dots, x_9^s, x_{10}^s$; $|\rho|$ values shown along the chain, in order: 0.75, 0.86, 0.98, 0.98.]
[Figure 1, panel (b) iSCM: chain over $x_1, x_2, \dots, x_9, x_{10}$ with internally standardized variables $\widetilde{x}_1, \widetilde{x}_2, \dots, \widetilde{x}_9, \widetilde{x}_{10}$; $|\rho|$ values shown along the chain, in order: 0.75, 0.75, 0.75, 0.75.]
Figure 1: Standardizing SCMs two ways. Generative process for a chain graph of (a) standard SCMs, with data $\mathbf{x}$ standardized post-hoc, and (b) SCMs with standardization performed during the generative process (iSCMs). Dashed arrows indicate z-standardization. Solid arrows indicate linear functions with weights from $\operatorname{Unif}_{\pm}[0.5, 2.0]$ and additive noise from $\mathcal{N}(0,1)$. We report absolute correlations $|\rho|$ of two consecutive observed variables, (a) $x_j^s$ and $x_{j+1}^s$, or (b) $\widetilde{x}_j$ and $\widetilde{x}_{j+1}$, averaged over 100,000 models. In standard SCMs (a), correlations tend to increase along the causal ordering.

2 Background and Related Work

We begin by introducing structural causal models and the problem of causal structure learning, before discussing how synthetic data is often generated for evaluating structure learning algorithms. We then review existing works that study identifiability and patterns frequently present in synthetic data.

Structural causal models

A structural causal model (SCM) (Peters et al., 2017) of $d$ variables $\mathbf{x} = \{x_1, \dots, x_d\}$ consists of a collection of structural assignments, each given by

$$x_i := f_i(\mathbf{x}_{\mathrm{pa}(i)}, \varepsilon_i)\,, \tag{SCM}$$

where $\mathbf{x}_{\mathrm{pa}(i)} \subseteq \mathbf{x} \setminus \{x_i\}$ are called the parents of $x_i$. Here, $f_i$ are arbitrary functions, and $\varepsilon_i$ are independent random variables that model exogenous noise (or unexplained variation). Together, they entail a joint probability distribution $p(\mathbf{x})$ over the variables $\mathbf{x}$. It is common to consider SCMs with additive noise, e.g., with linear functions $f_i$, as given by

$$f_i(\mathbf{x}_{\mathrm{pa}(i)}, \varepsilon_i) = \mathbf{w}_i^{\top} \mathbf{x}_{\mathrm{pa}(i)} + \varepsilon_i\,. \tag{1}$$

The structural assignments in (SCM) induce a causal graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ over the variables $x_i$, which is assumed to be acyclic. Specifically, the directed acyclic graph (DAG) $\mathcal{G}$ has vertices $v_i \in \mathcal{V}$ for every $x_i \in \mathbf{x}$ and a directed edge $(i, j) \in \mathcal{E}$ if $x_i \in \mathbf{x}_{\mathrm{pa}(j)}$. We will explicitly distinguish this DAG $\mathcal{G}$ and its vertices $\mathcal{V}$ from the variables $\mathbf{x}$. The skeleton of $\mathcal{G}$ denotes $\mathcal{G}$ with all edges undirected. If the skeleton of $\mathcal{G}$ is acyclic, we call $\mathcal{G}$ a forest.
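As an illustration of Equation (1), the following is a minimal sketch of ancestral sampling from a linear SCM with additive Gaussian noise. The function name `sample_linear_scm`, the weight-matrix convention `W[j, i]` for the edge $j \to i$, and the assumption that variables are already indexed in a topological order are choices made for this example only, not part of the paper.

```python
import numpy as np

def sample_linear_scm(W, n, rng, noise_std=1.0):
    """Draw n samples from a linear additive-noise SCM.

    Assumptions of this sketch: W[j, i] is the weight of edge j -> i
    (0 if the edge is absent), and variables are indexed in a topological
    order, so W is strictly upper triangular.
    """
    d = W.shape[0]
    X = np.zeros((n, d))
    for i in range(d):
        eps = rng.normal(0.0, noise_std, size=n)
        # Parents of i have smaller indices and are already filled in.
        X[:, i] = X @ W[:, i] + eps
    return X

rng = np.random.default_rng(0)
W = np.zeros((3, 3))
W[0, 1] = 1.5    # edge x_1 -> x_2
W[1, 2] = -0.8   # edge x_2 -> x_3
X = sample_linear_scm(W, n=10_000, rng=rng)
```

In this chain, the variance of the second variable is $1.5^2 \cdot 1 + 1 = 3.25$ in the population, which already hints at how variance can build up along the causal ordering.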

Structure learning and benchmarking

Given a set of i.i.d. observations from the probability distribution $p(\mathbf{x})$ induced by an unknown SCM, causal structure learning aims to infer the causal graph $\mathcal{G}$ underlying the SCM. In this work, we focus on structure learning from observational, not interventional, data and only consider SCMs with no latent confounders.

Because it is difficult to obtain the ground-truth $\mathcal{G}$ for many real-world datasets, it is common to evaluate structure learning algorithms on synthetic data where $\mathcal{G}$ is known. A ubiquitous approach is to sample a DAG $\mathcal{G}$, then SCM functions defined over $\mathcal{G}$, and finally a dataset from this SCM, with the goal of later recovering $\mathcal{G}$ from the data. It is common to consider $\varepsilon_i$ with mean $0$ and fixed variance (often $1$), and for linear systems, to sample each $w_{i,j}$ uniformly and i.i.d. with support bounded away from $0$ (Shimizu et al., 2011; Peters and Bühlmann, 2014; Zheng et al., 2018; Yu et al., 2019; Lachapelle et al., 2020; Zheng et al., 2020; Ng et al., 2020; Reisach et al., 2021; Lorch et al., 2022; Reisach et al., 2024). There exist alternative benchmarking strategies that involve sampling data from domain-specific simulators (Schaffter et al., 2011; Dibaeinia and Sinha, 2020).
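The first two steps of this recipe can be sketched as follows. This is an illustrative example only: the Erdős–Rényi-style edge sampling, the function name `sample_dag_weights`, and the default magnitude range $[0.5, 2.0]$ with random signs (as in the caption of Figure 1) are assumptions made here, not a description of any specific benchmark.

```python
import numpy as np

def sample_dag_weights(d, edge_prob, rng, low=0.5, high=2.0):
    """Sample a weighted DAG adjacency matrix W, with W[j, i] for edge j -> i.

    Assumption of this sketch: edges are drawn independently with probability
    edge_prob among pairs j < i, so the identity ordering is topological.
    A benchmark would typically also permute the node labels afterwards.
    """
    # Strictly upper-triangular mask guarantees acyclicity.
    mask = np.triu(rng.random((d, d)) < edge_prob, k=1)
    # Magnitudes bounded away from 0, with a random sign per edge.
    magnitude = rng.uniform(low, high, size=(d, d))
    sign = rng.choice([-1.0, 1.0], size=(d, d))
    return mask * sign * magnitude

rng = np.random.default_rng(0)
W = sample_dag_weights(d=5, edge_prob=0.5, rng=rng)
```

The resulting matrix can then be used to draw a dataset by ancestral sampling over the variables in topological order.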

Data standardization and artifacts of SCMs

Previous work shows that generating data as described above can lead to strong artifacts. Reisach et al. (2021) observe that the variance of variables tends to increase along the topological ordering of $\mathcal{G}$. This leads to the $\operatorname{Var}$-SortnRegress baseline, which sorts variables based on their empirical variance and then performs sparse regression to infer $\mathcal{G}$. Seng et al. (2024) show that structure learning algorithms minimizing an MSE-based loss (e.g., Zheng et al., 2018) can identify $\mathcal{G}$ under similar conditions. Therefore, Reisach et al. (2021) propose using standardization (Figure 1(a)) to remove this variance artifact from benchmarks. Specifically, they first sample all $x_i$ according to a standard SCM and then post-hoc transform the variables as

$$x_i^s := \frac{x_i - \mathbb{E}[x_i]}{\sqrt{\operatorname{Var}[x_i]}}\,, \tag{Standardized SCM}$$

such that our observations correspond to samples from $p(\mathbf{x}^s)$. Standardization, however, only removes the variance artifact. Even in standardized SCMs, the fraction of a variable's variance that is explained by all others, measured by the coefficient of determination $\operatorname{R^{2}}$, tends to increase along the topological ordering (Reisach et al., 2024). $\operatorname{R^{2}}$-SortnRegress exploits this correlation artifact analogously to $\operatorname{Var}$-SortnRegress. Existing heuristics aiming to avoid the increasing correlations adjust the sampling process of $f_i$, but they ultimately limit the causal dependencies that can be modeled, e.g., to certain levels of correlations among the observed $\mathbf{x}$ (Mooij et al., 2020) or a constant proportion of variance explained by the parents $\mathbf{x}_{\mathrm{pa}(i)}$ (Squires et al., 2022) (Appendix D.1). To our knowledge, there are currently no general methods for generating SCM data without strong correlation artifacts or significant limitations on the mechanisms $f_i$ and noise $\varepsilon_i$.
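To make the two artifacts concrete, consider a short worked example for a linear chain $x_1 \to x_2 \to \dots$ with unit-variance noise. The chain setting and the per-edge weight symbol $w_j$ are assumptions chosen here for illustration. Because correlation is invariant to shifting and positive rescaling, the correlations of the standardized variables $x_j^s$ equal those of the raw $x_j$.

```latex
\begin{align*}
x_{j+1} &= w_j x_j + \varepsilon_{j+1}, \qquad \operatorname{Var}[\varepsilon_{j+1}] = 1, \\
\operatorname{Var}[x_{j+1}] &= w_j^2 \operatorname{Var}[x_j] + 1, \\
\rho(x_j, x_{j+1}) &= \frac{\operatorname{Cov}[x_j, x_{j+1}]}{\sqrt{\operatorname{Var}[x_j]\operatorname{Var}[x_{j+1}]}}
  = \frac{w_j \sqrt{\operatorname{Var}[x_j]}}{\sqrt{\operatorname{Var}[x_{j+1}]}}, \\
\rho(x_j^s, x_{j+1}^s)^2 &= \rho(x_j, x_{j+1})^2 = 1 - \frac{1}{\operatorname{Var}[x_{j+1}]}.
\end{align*}
```

For instance, with $|w_j| = 1$ for all $j$, we get $\operatorname{Var}[x_j] = j$ and $|\rho(x_j^s, x_{j+1}^s)| = \sqrt{j/(j+1)}$, which approaches $1$ as $j$ grows. Standardizing post-hoc removes the growing variance but leaves this growth in correlation untouched.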

Identifiability

Given a class of SCMs, there may be several SCMs with different causal graphs $\mathcal{G}$ that entail the same distribution $p(\mathbf{x})$ (Peters et al., 2017). Thus, even with infinite observations from $p(\mathbf{x})$, we may be unable to identify the causal graph $\mathcal{G}$ that generated the observations. However, some identifiability results are known depending on the class of functions and noise distributions of the SCMs considered. For example, among all linear SCMs (1) with Gaussian noise $\varepsilon_i \sim \mathcal{N}(0, \sigma_i^2)$, the graph $\mathcal{G}$ can only be uniquely identified up to its MEC (Verma and Pearl, 2013). However, if the noise scales $\sigma_i = \sigma$ are equal (Peters and Bühlmann, 2014) or the noise is non-Gaussian (Shimizu et al., 2006), $\mathcal{G}$ can be uniquely identified given $p(\mathbf{x})$.

It is fundamental to recognize that existing identifiability results only concern the unstandardized distributions $p(\mathbf{x})$ of SCMs. When we standardize the data and observe $p(\mathbf{x}^s)$ instead, existing results no longer apply, because the implied SCM after standardization may violate the properties of the original SCM (e.g., its noise variances). In this work, we present, to our knowledge, the first (partial) identifiability result for standardized SCMs. Our result concerns a setting with prior knowledge on the magnitudes of $\mathbf{w}$ in Equation (1), an assumption underlying common benchmarking practices.

3 SCMs with Internal Standardization

3.1 Definition

We propose internally-standardized SCMs (iSCMs) as a modification to the standard data-generating process of SCMs. An iSCM $(\mathbf{S}, \mathcal{P}_{\bm{\varepsilon}})$ consists of $d$ pairs of assignments, where for each $i \in \{1, \dots, d\}$,

$$x_i := f_i(\widetilde{\mathbf{x}}_{\mathrm{pa}(i)}, \varepsilon_i) \quad \text{and} \quad \widetilde{x}_i := \frac{x_i - \mathbb{E}[x_i]}{\sqrt{\operatorname{Var}[x_i]}} \tag{iSCM}$$

with parents $\widetilde{\mathbf{x}}_{\mathrm{pa}(i)} \subseteq \widetilde{\mathbf{x}} \setminus \{\widetilde{x}_i\}$ of $\widetilde{x}_i$ in the underlying DAG and jointly-independent exogenous noise variables $\bm{\varepsilon} = [\varepsilon_1, \dots, \varepsilon_d] \sim \mathcal{P}_{\bm{\varepsilon}}$. The variables $x_i$ are latent, and the variables $\widetilde{x}_i$ are observed. Figure 2 illustrates the generative process. Algorithm 1 summarizes how to sample from (iSCM). If computing the population expectations and variances of $x_i$ is intractable, the empirical statistics obtained from $n$ samples can be used for standardization at each loop iteration of Algorithm 1.

Motivation

By construction, iSCMs model observed variables with zero mean and unit marginal variance. Through the standardization operation, iSCMs avoid the accumulation of variance downstream in the causal ordering that can occur in standard SCMs (see Figure 1). Because each variable $x_i$ only depends on the standardized variables $\widetilde{\mathbf{x}}_{\mathrm{pa}(i)}$, the relative scales of the noise distribution $\mathcal{P}_{\varepsilon_i}$ and the causal mechanisms $f_i$ are the same everywhere in the system and do not change, for example, downstream in the causal ordering. The causal mechanisms of iSCMs are thus scale-free, in that the local interaction of mechanism $f_i$ and noise $\varepsilon_i$ occurs at a scale independent of the position of $x_i$ in the global ordering. This property makes iSCMs particularly useful for benchmarking, where random ground-truth models are commonly generated from a fixed distribution over functions $f_i$ and noise $\varepsilon_i$. Contrary to existing heuristics (Section 2), iSCMs model arbitrarily strong or weak causal dependencies and levels of cause-explained variance.

[Figure 2: diagram with nodes $x_i$, $\widetilde{x}_i$, $\widetilde{x}_j$, $\widetilde{x}_k$ and the mechanism $f_i$.]
Figure 2: Causal mechanisms in iSCMs. The function $f_i$ modeling $x_i$ depends on the standardized $\widetilde{\mathbf{x}}_{\mathrm{pa}(i)}$. Dashing indicates z-standardization.
Algorithm 1 Sampling from an iSCM
Input: DAG $\mathcal{G}$, noise distribution $\mathcal{P}_{\bm{\varepsilon}}$,
Input: functions $\{f_1, \dots, f_d\}$
$\pi \leftarrow$ topological ordering of $\mathcal{G}$
for $i = 1$ to $d$ do
     $\varepsilon_{\pi_i} \sim \mathcal{P}_{\varepsilon_{\pi_i}}$
     $x_{\pi_i} \leftarrow f_{\pi_i}(\widetilde{\mathbf{x}}_{\mathrm{pa}(\pi_i)}, \varepsilon_{\pi_i})$
     $\widetilde{x}_{\pi_i} \leftarrow \dfrac{x_{\pi_i} - \mathbb{E}[x_{\pi_i}]}{\sqrt{\operatorname{Var}[x_{\pi_i}]}}$
return $[\widetilde{x}_1, \dots, \widetilde{x}_d]$     $\triangleright$ $\in \mathbb{R}^d$
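Algorithm 1 can be sketched in NumPy for the linear case, using the empirical mean and standard deviation of $n$ samples in place of the population statistics. This is a sketch under stated assumptions, not the authors' implementation: the function name `sample_linear_iscm`, the convention `W[j, i]` for edge $j \to i$, the assumption that variables are indexed in a topological order, and the unit-weight chain used for the demonstration are all choices made here.

```python
import numpy as np

def sample_linear_iscm(W, n, rng, noise_std=1.0):
    """Sample n observations from a linear iSCM with empirical standardization.

    Assumptions of this sketch: W[j, i] is the weight of edge j -> i, and
    variables are indexed in a topological order (W strictly upper triangular).
    """
    d = W.shape[0]
    X_tilde = np.zeros((n, d))            # observed, standardized variables
    for i in range(d):                    # loop follows the topological order
        eps = rng.normal(0.0, noise_std, size=n)
        x_i = X_tilde @ W[:, i] + eps     # latent x_i from standardized parents
        X_tilde[:, i] = (x_i - x_i.mean()) / x_i.std()
    return X_tilde

rng = np.random.default_rng(0)
d = 10
W = np.zeros((d, d))
W[np.arange(d - 1), np.arange(1, d)] = 1.0   # illustrative chain, unit weights
X_tilde = sample_linear_iscm(W, n=50_000, rng=rng)

# Absolute correlations between consecutive variables along the chain.
corr = np.corrcoef(X_tilde, rowvar=False)
consecutive = np.abs(corr[np.arange(d - 1), np.arange(1, d)])
```

For this unit-weight chain, each latent $x_i$ has population variance $1^2 + 1 = 2$, so every consecutive correlation is $1/\sqrt{2} \approx 0.71$ in the population, regardless of the position in the chain. The empirical values in `consecutive` should be close to this, up to sampling error.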
Interventions

Analogous to standard SCMs, interventions in iSCMs can be defined as modifications of the structural assignments $f_i$ in (iSCM) (Figure 2), while keeping the standardization operation based on the observational distribution. When the population statistics for standardization are intractable, we first sample observational data to obtain empirical statistics. Since we do not study interventions in this work, we defer a further discussion of interventions in iSCMs to Appendix B.

Units

When modeling a physical system, the functional mechanisms in standard SCMs have to account for the difference in units between the variables for the model to be unit-covariant (see Villar et al., 2023). A side-effect of internal standardization is that variables of iSCMs become unit-less, so iSCMs obey the passive symmetry of unit covariance by construction. Therefore, iSCMs naturally model both unit-less quantities and variables measured in different units, which can make them useful beyond benchmarking. Learned iSCMs would be invariant to the units chosen by the experimenter, similar to the physical world being independent of the mathematical models chosen to describe it.

3.2 Implied SCMs

It is natural to investigate whether SCMs can generate the same observations as standardized SCMs or iSCMs, given the same causal graph $\mathcal{G}$ and exogenous variables $\bm{\varepsilon}$. In other words, can standardized SCMs and iSCMs be written as SCMs? For both models, the answer is yes. Specifically, we can express the generative process of $\mathbf{x}^s$ in (Standardized SCM) and $\widetilde{\mathbf{x}}$ in (iSCM) as

$$x_i^s = g_i^s(\mathbf{x}_{\mathrm{pa}(i)}^s) + \theta_i^s \varepsilon_i \quad \text{and} \quad \widetilde{x}_i = \widetilde{g}_i(\widetilde{\mathbf{x}}_{\mathrm{pa}(i)}) + \widetilde{\theta}_i \varepsilon_i\,, \tag{2}$$

respectively, by moving the standardization operations into the causal mechanisms of the observables but leaving the DAG $\mathcal{G}$ and the variables $\bm{\varepsilon}$ unchanged. Appendix A describes how to construct these implied causal mechanisms $g_i^s$ and $\widetilde{g}_i$ and implied noise scales $\theta_i^s$ and $\widetilde{\theta}_i$. We refer to the above SCM form of a standardized SCM or an iSCM with additive noise as their implied (SCM) model. Correspondingly, the implied SCMs have zero mean and unit variance. The notion of implied SCMs is powerful, because it enables us to analyze standardized SCMs and iSCMs as SCMs, and it sheds light on the performance of structure learning algorithms that assume unstandardized SCMs to underlie the generative process of the data (e.g., Shimizu et al., 2011; Zheng et al., 2018; Yu et al., 2019; Lachapelle et al., 2020; Zheng et al., 2020).
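As a sketch of what Equation (2) looks like in a concrete case, consider linear mechanisms as in Equation (1) with zero-mean noise. The derivation below is an illustration under these assumptions and does not reproduce the general construction of Appendix A. The symbols $\mu_i$, $s_i$ (mean and standard deviation of $x_i$ in the standard SCM) and $\widetilde{s}_i$ (standard deviation of the latent $x_i$ in the iSCM) are introduced here for the example, and $w_{j,i}$ denotes the weight of the edge $j \to i$.

```latex
\begin{align*}
\text{Standardized SCM:}\quad
x_i^s &= \frac{x_i - \mu_i}{s_i}
       = \frac{1}{s_i}\Big(\sum_{j \in \mathrm{pa}(i)} w_{j,i}\,(x_j - \mu_j) + \varepsilon_i\Big)
       = \underbrace{\sum_{j \in \mathrm{pa}(i)} \frac{w_{j,i}\, s_j}{s_i}\, x_j^s}_{g_i^s(\mathbf{x}^s_{\mathrm{pa}(i)})}
         + \underbrace{\frac{1}{s_i}}_{\theta_i^s}\,\varepsilon_i, \\[4pt]
\text{iSCM:}\quad
\widetilde{x}_i &= \frac{x_i - \mathbb{E}[x_i]}{\widetilde{s}_i}
       = \underbrace{\frac{1}{\widetilde{s}_i}\,\mathbf{w}_i^{\top}\widetilde{\mathbf{x}}_{\mathrm{pa}(i)}}_{\widetilde{g}_i(\widetilde{\mathbf{x}}_{\mathrm{pa}(i)})}
         + \underbrace{\frac{1}{\widetilde{s}_i}}_{\widetilde{\theta}_i}\,\varepsilon_i,
\qquad \text{since } \mathbb{E}[x_i] = \mathbf{w}_i^{\top}\,\mathbb{E}[\widetilde{\mathbf{x}}_{\mathrm{pa}(i)}] + \mathbb{E}[\varepsilon_i] = 0.
\end{align*}
```

In the first line, the means cancel because $\mu_i = \sum_{j \in \mathrm{pa}(i)} w_{j,i}\,\mu_j$ for zero-mean noise. In both cases the implied model is again linear with additive noise, but with rescaled weights and a noise scale that depends on the marginal standard deviation of the variable.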

To provide a first characterization of standardized SCMs and iSCMs, our theoretical analyses focus on systems where the $f_i$ are linear functions with additive, zero-mean noise as given by Equation (1). As a stepping stone for this analysis, we derive an analytical expression for the covariance of linear SCMs whose variables have unit variance by construction, without any form of standardization:

Lemma 1 (Covariance in linear SCMs with unit marginal variances).

Let $\mathbf{x}$ be modeled by a linear SCM defined by (1) with DAG $\mathcal{G}$ that satisfies $\operatorname{Var}[x_i] = 1$. Then, the covariance $\operatorname{Cov}[x_i, x_j]$ is the sum of products of the weights along all unblocked paths between the nodes of $x_i$ and $x_j$ in $\mathcal{G}$. Specifically, for any $i, j \in \{1, \dots, d\}$ such that $i \neq j$, it holds that

$$\operatorname{Cov}[x_i, x_j] = \sum_{p_{j\leftrightarrow i} \in P_{j\leftrightarrow i}} \; \prod_{(l,m) \in p_{j\leftrightarrow i}} w_{l,m}\,, \tag{3}$$

where $P_{j\leftrightarrow i}$ are all unblocked paths from $x_j$ to $x_i$ in $\mathcal{G}$, and $(l,m) \in p_{j\leftrightarrow i}$ indicates that the directed edge $(l,m)$ is part of the path $p_{j\leftrightarrow i}$.

We give a proof in Appendix C.2. Since the implied SCMs of linear standardized SCMs and iSCMs are linear SCMs, the setting of Lemma 1 applies precisely to the SCM forms of both models. Thus, Lemma 1 enables us to study the covariances in standardized SCMs and iSCMs, and as we show next, derive conditions for the (non)identifiability of their DAGs $\mathcal{G}$ from the observational distribution.
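As a small numerical illustration of Lemma 1, the following sketch computes the population covariance of three 3-node linear SCMs (a chain, a fork, and a collider) and compares the entry for $x_1$ and $x_3$ with the path-product formula. The helper `linear_scm_cov`, the weights `a` and `b`, and the choice of noise variances that make every marginal variance equal to 1 are assumptions made for this example only; they are not taken from the paper's code.

```python
import numpy as np

def linear_scm_cov(W, noise_var):
    """Population covariance of x_m = sum_l W[l, m] * x_l + e_m with independent noise."""
    d = W.shape[0]
    A = np.linalg.inv(np.eye(d) - W.T)  # x = A e
    return A @ np.diag(noise_var) @ A.T

a, b = 0.6, -0.5  # illustrative edge weights (hypothetical values)

# Chain x1 -> x2 -> x3; noise variances chosen so that Var[x_i] = 1 for all i.
W_chain = np.zeros((3, 3))
W_chain[0, 1], W_chain[1, 2] = a, b
cov_chain = linear_scm_cov(W_chain, [1.0, 1 - a**2, 1 - b**2])

# Fork x1 <- x2 -> x3: the path between x1 and x3 through x2 is unblocked.
W_fork = np.zeros((3, 3))
W_fork[1, 0], W_fork[1, 2] = a, b
cov_fork = linear_scm_cov(W_fork, [1 - a**2, 1.0, 1 - b**2])

# Collider x1 -> x2 <- x3: the only path between x1 and x3 is blocked.
W_collider = np.zeros((3, 3))
W_collider[0, 1], W_collider[2, 1] = a, b
cov_collider = linear_scm_cov(W_collider, [1.0, 1 - a**2 - b**2, 1.0])

print(cov_chain[0, 2], cov_fork[0, 2], cov_collider[0, 2])
```

In the chain and the fork, the single unblocked path contributes the product of its two weights, whereas the collider contributes nothing, in line with Equation (3).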

4 Analysis

In this section, we give two theoretical results that support the suitability of iSCMs over standard SCMs for causal discovery benchmarking. First, we prove the general case of Figure 1: contrary to standardized SCMs, iSCMs do not degenerate towards deterministic implied SCM mechanisms in deep graphs. Second, we prove that the DAGs of linear iSCMs cannot be identified beyond their MEC, assuming the DAG is a forest, even if the support of $\mathbf{w}$ is known. Crucially, we also show that this is not generally true for standardized SCMs. This suggests that algorithms can less easily game benchmarks based on linear iSCMs when knowing the data-generating process. For all results, we consider linear SCMs (1) with zero-mean additive noise and equal noise variances. All results are at the population level, that is, we assume that $p(\mathbf{x}^{s})$ or $p(\widetilde{\mathbf{x}})$ is known. Proofs are given in Appendix C.

4.1 Behavior with Increasing Graph Depth

Standardized SCMs tend towards increasing correlations between adjacent nodes down the topological ordering. This correlation artifact makes standardized SCMs problematic for benchmarking, because it may not be a property we expect to underlie real data. Reisach et al. (2024) show, under some assumptions on $\mathbf{w}$, that the dependencies in standardized SCMs become deterministic with increasing graph depth. This implies that any exogenous variation $\varepsilon_i$ vanishes lower down in the system. Unless prior domain knowledge leads us to assume this holds in applications of interest, it may not be desirable to implicitly bias structure learning benchmarks towards such systems. For example, if the causal ordering represents time (Pamfil et al., 2020), the mechanisms of standardized SCMs are unable to model or characterize time-invariant or stable processes. Moreover, if we expect causal mechanisms to be independent (Schölkopf, 2022), the qualitative behavior of a causal mechanism should not provide information about its position in the topological ordering relative to other mechanisms, as it would in SCMs. Reisach et al. (2024) show that baselines like $R^2$-SortnRegress can perform competitively on benchmarks by exploiting this artifact (Section 2).

iSCMs do not tend towards determinism with increasing graph depth (Figure 1(b)). In standardized SCMs, the correlations increase downstream, because the marginal variances of the underlying SCM increase with node depth, while the scale of the noise variance is fixed (Reisach et al., 2021). Thus, for large $i$, the variance scale of $x_{i-1}$ becomes large relative to the scale of $\varepsilon_i$, and the correlation of $x_i$ and $x_{i-1}$ tends towards $1$. Since $x^{s}_{i}$ and $x^{s}_{i-1}$ are just standardized versions of these variables, they maintain the same correlation. iSCMs avoid this by standardizing internally, which scales the variance of any parents in a mechanism $f_i$ to $1$, modulating the relative variance of $\varepsilon_i$ and $\mathbf{x}_{\mathrm{pa}(i)}$. In the following, we formalize this result for general graphs by bounding the fraction of cause explained variance (CEV). The fraction of CEV for $x_i$ is the proportion of $\operatorname{Var}[x_i]$ explained by its causal parents and given by

$$\operatorname{CEV_f}[x_i] = 1 - \frac{\operatorname{Var}\big[x_i - \mathbb{E}[x_i \mid \mathbf{x}_{\mathrm{pa}(i)}]\big]}{\operatorname{Var}[x_i]}\,. \tag{4}$$

The following result shows that we can bound the fraction of CEV for any variable in a linear iSCM:

Theorem 2 (Bound on $\operatorname{CEV_f}$ in linear iSCMs).

Let $\mathbf{x}$ be modeled by a linear iSCM (1) with DAG $\mathcal{G}$ and additive noise of equal variances $\operatorname{Var}[\varepsilon_i] = \sigma^2$. Suppose any node in $\mathcal{G}$ has at most $m$ parents and $w = \max_{i,j \in \{1, \dots, d\}} \lvert w_{i,j} \rvert$. Then, for any $i \in \{1, \dots, d\}$, the fraction of CEV for $\widetilde{x}_i$ is bounded as

$$\operatorname{CEV_f}[\widetilde{x}_i] \leq 1 - \frac{\sigma^2}{m^2 w^2 + \sigma^2}\,.$$

Since the fraction of CEV is bounded, iSCMs are guaranteed not to collapse to determinism in large systems, alleviating several of the concerns with (standardized) SCMs discussed above.
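The contrast between the two models can be traced on a chain graph with a shared weight. The sketch below is a minimal illustration under stated assumptions: every edge has weight `w`, the noise variance is `sigma2`, and the iSCM is taken to rescale each parent to unit variance before it enters the child's mechanism, as described above. The function name `cev_chain` is hypothetical. Because the fraction of CEV is invariant to rescaling a variable, the values for the standardized SCM equal those of the underlying unstandardized SCM.

```python
def cev_chain(w, sigma2, depth, internal):
    """Fraction of cause-explained variance for the non-root nodes of a chain.

    internal=False: standard SCM (equivalently, its post-hoc standardized version).
    internal=True:  iSCM, where each parent has unit variance when it is used.
    """
    var_parent = 1.0 if internal else sigma2  # variance of the root as seen by its child
    cevs = []
    for _ in range(depth - 1):
        explained = w**2 * var_parent      # variance contributed by the parent
        var_child = explained + sigma2     # total variance before any standardization
        cevs.append(explained / var_child)
        var_parent = 1.0 if internal else var_child
    return cevs

w, sigma2, depth = 1.5, 1.0, 30            # illustrative values (assumptions)
cev_scm = cev_chain(w, sigma2, depth, internal=False)
cev_iscm = cev_chain(w, sigma2, depth, internal=True)
bound = 1 - sigma2 / (1**2 * w**2 + sigma2)  # Theorem 2 with m = 1 parent

print(cev_scm[0], cev_scm[-1])   # grows towards 1 with depth
print(cev_iscm[0], cev_iscm[-1], bound)
```

In this chain, the iSCM values stay constant and coincide with the bound for $m = 1$, while the values of the (standardized) SCM approach 1 as depth increases.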

4.2 Identifiability

Figure 3: iSCMs with the same covariance matrix. (a) DAGs (i), (ii), and (iii) over $x_1$, $x_2$, $x_3$ in an MEC with the same edge weights $\alpha$ and $\beta$. (b) Covariance matrix for all linear iSCMs in (a) when $\alpha = 1$, $\beta = 2$.

Figure 1(a) illustrates that the pairwise correlations in SCMs over chain graphs depend on the position in the topological ordering. This can allow algorithms like $R^2$-SortnRegress to infer the graph. By contrast, Figure 1(b) shows that iSCMs do not exhibit this pattern, with correlations between variables not increasing the identifiability of any part of the system.

In the following, we formalize this phenomenon for forests, that is, all DAGs with acyclic skeletons (Section 2). Specifically, we prove two results concerning the identifiability of the DAG $\mathcal{G}$ from the observational distribution, for standardized SCMs and iSCMs. This makes our finding the first identifiability result for standardized SCMs. While not every DAG is a forest, DAGs have forests as subgraphs and resemble forests as sparsity increases, thus providing us with intuition for generally sparse systems (e.g., Alon and Spencer, 2016, Chapter 11).

Our first result leverages the observation that, for standardized SCMs, many DAGs in an MEC are infeasible given $p(\mathbf{x}^{s})$ when their edge directions are not consistent with the direction of increasing absolute covariance. To illustrate this idea, suppose our goal is to distinguish between the DAGs in the MEC $\tilde{\mathcal{G}} = \{\mathrm{(i)}, \mathrm{(ii)}, \mathrm{(iii)}\}$ in Figure 3(a). We overload notation and denote the weights of the edges $\alpha$ and $\beta$ regardless of orientation. For standardized SCMs, we can apply Lemma 1 to the implied SCM of graph $\mathrm{(i)}$ to obtain the covariances

$$\operatorname{Cov}[x_1^{s}, x_2^{s}] = \frac{\alpha}{\sqrt{\alpha^2 + 1}} \quad\text{and}\quad \operatorname{Cov}[x_2^{s}, x_3^{s}] = \beta \sqrt{\frac{\alpha^2 + 1}{\beta^2(\alpha^2 + 1) + 1}}\,.$$

See Appendix C.4.1. Together, both expressions imply that standardized SCMs with DAG $\mathrm{(i)}$ satisfy

$$\lvert \operatorname{Cov}[x_1^{s}, x_2^{s}] \rvert < \lvert \operatorname{Cov}[x_2^{s}, x_3^{s}] \rvert \quad\Longleftrightarrow\quad \frac{\alpha^2}{\alpha^2 + 1} < \beta^2\,. \tag{5}$$

If $\lvert \beta \rvert \geq 1$, then the right-hand side of Equation (5) is always true, because $\alpha^2 / (\alpha^2 + 1) < 1 \leq \beta^2$. In this case, the absolute covariance increases from $x_1$ to $x_3$ in all standardized SCMs with DAG $\mathrm{(i)}$. By symmetry, the covariance in SCMs with DAG $\mathrm{(iii)}$ increases from $x_3$ to $x_1$ when $\lvert \alpha \rvert \geq 1$. Therefore, if both weights have magnitude at least $1$, the absolute covariance increases downstream in all SCMs of $\mathrm{(i)}$ and $\mathrm{(iii)}$. This implies that, among $\mathrm{(i)}$ and $\mathrm{(iii)}$, only the DAG whose edges align with the covariance ordering in $p(\mathbf{x}^{s})$ can induce $p(\mathbf{x}^{s})$. In either case, the DAG $\mathrm{(ii)}$ remains plausible. We can extend the intuition of this 3-variable example to identify almost all edges in any forest MEC:

Theorem 3 (Partial identifiability of standardized linear SCMs with forest DAGs).

Let $\mathbf{x}^{s}$ be modeled by a standardized linear SCM (1) with forest DAG $\mathcal{G}$, additive noise of equal variances $\operatorname{Var}[\varepsilon_i] = \sigma^2$, and $\lvert w_{i,j} \rvert > 1$ for all $i \in \mathrm{pa}(j)$. Then, given $p(\mathbf{x}^{s})$ and the partially directed graph $\tilde{\mathcal{G}}$ representing the MEC of $\mathcal{G}$, we can identify all but at most one edge of the true DAG $\mathcal{G}$ in each undirected connected component of the MEC $\tilde{\mathcal{G}}$.
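The 3-variable example that motivates Theorem 3 can be checked numerically. The following sketch assumes unit noise variances and the weights $\alpha = 1$, $\beta = 2$ used in Figure 3(b). It computes the correlation matrices of the standardized SCMs for the two chains $\mathrm{(i)}$ and $\mathrm{(iii)}$ and compares them with the closed-form covariances given before Equation (5). The helper `standardized_scm_corr` is a hypothetical name introduced for this illustration.

```python
import numpy as np

def standardized_scm_corr(W):
    """Correlation matrix of a linear SCM with unit noise variances.

    W[l, m] is the weight of edge l -> m. Post-hoc standardization turns the
    covariance of the SCM into this correlation matrix.
    """
    d = W.shape[0]
    A = np.linalg.inv(np.eye(d) - W.T)  # x = A e
    cov = A @ A.T
    s = np.sqrt(np.diag(cov))
    return cov / np.outer(s, s)

alpha, beta = 1.0, 2.0  # weights on the edges x1 - x2 and x2 - x3

W_i = np.zeros((3, 3))      # (i): x1 -> x2 -> x3
W_i[0, 1], W_i[1, 2] = alpha, beta
W_iii = np.zeros((3, 3))    # (iii): x3 -> x2 -> x1
W_iii[2, 1], W_iii[1, 0] = beta, alpha

corr_i = standardized_scm_corr(W_i)
corr_iii = standardized_scm_corr(W_iii)

# Closed forms for DAG (i) from the text.
c12 = alpha / np.sqrt(alpha**2 + 1)
c23 = beta * np.sqrt((alpha**2 + 1) / (beta**2 * (alpha**2 + 1) + 1))

print(corr_i[0, 1], c12)
print(corr_i[1, 2], c23)
print(corr_iii[0, 1], corr_iii[1, 2])
```

For these weights, the absolute covariance grows from $x_1$ to $x_3$ under $\mathrm{(i)}$ and from $x_3$ to $x_1$ under $\mathrm{(iii)}$, so the two orientations induce different distributions.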

Our proof of Theorem 3 considers each undirected component separately from the rest of the MEC $\tilde{\mathcal{G}}$. Hence, the identifiability result extends to undirected tree components of arbitrary, non-forest MECs as well. Theorem 3 shows that, when using standardized SCM data for benchmarking, algorithms can use pairwise correlations to orient additional edges correctly. The weights assumption of Theorem 3 is relevant to causal discovery benchmarking, because weights are often sampled i.i.d. from intervals bounded away from $0$ (Section 2). Hence, empirical evaluations may render standardized linear SCMs identifiable solely through the design of their weight distribution. In the following, we show that, under similar conditions, iSCMs are more difficult to identify from their MEC. In the 3-variable example above, we can show that the observational distribution of iSCMs is the same for all DAGs $\mathrm{(i)}$, $\mathrm{(ii)}$, and $\mathrm{(iii)}$ when the weights $\alpha$ and $\beta$ are shared over the corresponding edges in the MEC (Figure 3(b); see Appendix C.4). This result generalizes to forests:

Theorem 4 (Nonidentifiability of linear Gaussian iSCMs with forest DAGs).

Let $\widetilde{\mathbf{x}}$ be modeled by a linear iSCM (1) with forest DAG $\mathcal{G}$ and additive Gaussian noise of equal variances $\operatorname{Var}[\varepsilon_i]$. Then, for every DAG $\mathcal{G}'$ in the MEC of $\mathcal{G}$, there exists a linear iSCM with DAG $\mathcal{G}'$ that has the same observational distribution as $\widetilde{\mathbf{x}}$, the same noise variances, and the same weights on the corresponding edges in the MEC.
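For the three DAGs of Figure 3(a), the statement of Theorem 4 can be illustrated with a short population-level computation. The sketch below assumes that a linear iSCM processes nodes in topological order, forms $x_i$ from the already standardized parents plus noise, and then rescales $x_i$ to unit variance. The function `iscm_cov` and its arguments are hypothetical names chosen for this example, not the authors' implementation.

```python
import numpy as np

def iscm_cov(W, order, sigma2=1.0):
    """Population covariance of the standardized variables of a linear iSCM.

    W[l, m] is the weight of edge l -> m; `order` is a topological ordering.
    """
    d = W.shape[0]
    C = np.eye(d)                       # unit variances after standardization
    done = []
    for i in order:
        w = W[:, i]                     # incoming weights, zero for non-parents
        var_i = w @ C @ w + sigma2      # variance of x_i before standardization
        for k in done:
            # Cov[x_i, x~_k] = sum_l w_l Cov[x~_l, x~_k]; the noise is independent of x~_k.
            C[i, k] = C[k, i] = (w @ C[:, k]) / np.sqrt(var_i)
        done.append(i)
    return C

alpha, beta = 1.0, 2.0                  # values used in Figure 3(b)

W_i = np.zeros((3, 3)); W_i[0, 1], W_i[1, 2] = alpha, beta        # x1 -> x2 -> x3
W_ii = np.zeros((3, 3)); W_ii[1, 0], W_ii[1, 2] = alpha, beta     # x1 <- x2 -> x3
W_iii = np.zeros((3, 3)); W_iii[1, 0], W_iii[2, 1] = alpha, beta  # x1 <- x2 <- x3

C_i = iscm_cov(W_i, order=[0, 1, 2])
C_ii = iscm_cov(W_ii, order=[1, 0, 2])
C_iii = iscm_cov(W_iii, order=[2, 1, 0])

print(np.round(C_i, 3))
print(np.allclose(C_i, C_ii), np.allclose(C_i, C_iii))
```

All three orientations yield the same covariance matrix, so for Gaussian noise the observational distributions coincide and the edge directions cannot be recovered from them.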

Our proof consists of showing that the covariance matrices of these systems are equal. For linear Gaussian iSCMs, this then implies that their observational distributions are identical. Theorem 4 thus shows that additional knowledge of the weight distribution in a benchmark does not allow identifying any additional edges beyond the MEC. By contrast, Theorem 3 shows that, for standardized SCMs, lower-bounding the weight magnitudes is sufficient for identifying most of the graph from its MEC. Without standardization, $\mathcal{G}$ is fully identified from its observational distribution under even weaker assumptions (Peters and Bühlmann, 2014). Importantly, Theorem 4 does not generalize to arbitrary graphs beyond forests. Appendix C.4 provides a counterexample involving a 3-node skeleton with a cycle. As we study in the next section, this implies that nontrivial causal structure can still be learned from iSCM data. However, DAGs in benchmarks are often sparse, so we expect the implications of our identifiability results to capture relevant parts of empirical phenomena in benchmarking settings.

5 Experimental Results

Our previous analyses suggest that iSCMs address shortcomings of naive standardization, in particular, when sampling each $f_i$ and $\varepsilon_i$ from the same distribution, as commonly done in benchmarking. In this section, we provide evidence that iSCMs do not contain the covariance artifacts of SCMs. Moreover, we benchmark the SortnRegress baselines (Section 2) and two representative structure learning algorithms to gain insights into how their performance varies when benchmarked on standardized SCMs and iSCMs. Appendix E provides all details of the experimental setup.

5.1 $R^2$-Sortability

Reisach et al. (2024) introduce the $R^2$-sortability metric to evaluate the correlation artifact underlying a dataset. $R^2$-sortability measures the correlation between the variables' causal ordering and the $R^2$ coefficients obtained from regressing each variable onto all others (Appendix D.2). The metric gives rise to the $R^2$-SortnRegress baseline described in Section 2. Reisach et al. (2024) show that $R^2$-sortability in SCMs is driven by an interplay of graph connectivity and the weight distribution of $f_i$.
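To give a feel for the $R^2$ coefficients on which the metric builds, the sketch below computes them at the population level for a 5-node chain. It does not implement the $R^2$-sortability metric of Appendix D.2. It relies on the standard identity that the $R^2$ of regressing variable $i$ on all others equals $1 - 1/(\Sigma_{ii}\Theta_{ii})$, where $\Theta = \Sigma^{-1}$. The chain length, the shared weight `w`, unit noise variances, and the iSCM correlation structure $\rho^{|i-j|}$ with $\rho = w/\sqrt{w^2+1}$ are assumptions made for this illustration.

```python
import numpy as np

def r2_all(corr):
    """R^2 of regressing each variable on all others: 1 - 1 / (Sigma_ii * Theta_ii)."""
    precision = np.linalg.inv(corr)
    return 1.0 - 1.0 / (np.diag(corr) * np.diag(precision))

d, w = 5, 2.0  # chain x1 -> ... -> x5 with shared weight w (illustrative)

# Standardized SCM: correlation matrix of the underlying linear SCM.
W = np.diag(np.full(d - 1, w), k=1)     # W[i, i+1] = w
A = np.linalg.inv(np.eye(d) - W.T)
cov = A @ A.T
s = np.sqrt(np.diag(cov))
corr_scm = cov / np.outer(s, s)

# iSCM chain (assumed form): adjacent correlation rho, multiplying along the chain.
rho = w / np.sqrt(w**2 + 1)
idx = np.arange(d)
corr_iscm = rho ** np.abs(idx[:, None] - idx[None, :])

r2_scm = r2_all(corr_scm)
r2_iscm = r2_all(corr_iscm)
print(np.round(r2_scm, 4))   # asymmetric: carries information about the ordering
print(np.round(r2_iscm, 4))  # symmetric under reversing the chain
```

In this example, the $R^2$ profile of the iSCM chain is identical when the chain is read in either direction, whereas the profile of the standardized SCM is not.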

Figure 4: $R^2$-sortability for different graph sizes. Panels: (a) $\operatorname{ER}(d, 2)$, (b) $\operatorname{SF}(d, 2)$. Linear standardized SCMs and iSCMs with $\varepsilon_i \sim \mathcal{N}(0, 1)$ and weights drawn from uniform distributions with supports given above each plot. For every model, we evaluate $100$ systems and $n = 1000$ samples each. Lines and shaded regions denote mean and standard deviation. Datasets that satisfy $R^2$-sortability $= 0.5$ (dashed) are not $R^2$-sortable.

Figure 4 summarizes the $R^2$-sortability statistics for linear SCM and iSCM data. We write $\operatorname{ER}(d, k)$ and $\operatorname{SF}(d, k)$ to denote Erdős-Rényi and scale-free graphs of size $d$ and (expected) degree $k$, respectively. We find that iSCMs generate datasets that are not $R^2$-sortable ($R^2$-sortability $\approx 0.5$) and thus artifact-free while sampling over common graph structures (e.g., Zheng et al., 2018; Yu et al., 2019; Reisach et al., 2021). Conversely, standardized SCMs generate datasets that are strongly $R^2$-sortable ($\lvert R^2\text{-sortability} - 0.5 \rvert \gg 0$). Since $R^2$-sortability can be exploited for causal discovery, iSCM data serves as a test for evaluating whether algorithms utilize any data properties beyond the association between $R^2$ and the causal ordering in SCMs. Our results do not exclude the possibility of iSCM configurations that still produce $R^2$-sortable datasets. However, we show that, for commonly-used $\mathcal{G}$, $\mathcal{P}_{\bm{\varepsilon}}$, and $\mathbf{w}$, iSCM datasets are not $R^2$-sortable with high probability. Appendix F provides results for denser graphs.

5.2 Structure Learning

Under the same weight and noise distributions, standardized SCMs and iSCMs have different implied SCMs and generate qualitatively different datasets. Here, we study how this affects causal structure learning in practice. We evaluate $\operatorname{Var}$- and $R^2$-SortnRegress (SR) and a baseline using random orderings (Reisach et al., 2021, 2024). In addition, we evaluate representative algorithms from two orthogonal approaches to learning structure from (co)variance information. Notears by Zheng et al. (2018) leverages continuous optimization to minimize an MSE loss, which is affected by noise scaling (Loh and Bühlmann, 2014; Seng et al., 2024). Avici by Lorch et al. (2022) predicts graphs using a model pretrained on simulated data and is thus optimized to exploit any artifacts that improve predictive accuracy. To investigate its susceptibility to artifacts, we evaluate the public model checkpoints trained on standardized SCMs.

Figure 5(a) summarizes the results for linear and nonlinear systems. Here, the nonlinear mechanisms $f_i$ are samples from a Gaussian process with squared exponential kernel. As expected, $\operatorname{Var}$-SortnRegress performs best when SCMs are not standardized. Likewise, $R^2$-SortnRegress performs better on SCMs and standardized SCMs, as iSCMs have $R^2$-sortability close to $0.5$ (Section 5.1). Avici shows the same trend, suggesting it may indeed be exploiting the correlation artifacts present in its training distribution. Like Reisach et al. (2021), we find that Notears performs best on unstandardized data. However, and more interestingly, Notears also performs better on iSCMs than on standardized SCMs, especially in linear and larger systems. As we investigate next, this gap may be explained by the fact that the implied models of standardized SCMs violate the assumptions of Notears more strongly than iSCMs. Overall, performance differences are more pronounced for linear systems, where the downstream variance accumulation in SCMs is unbounded. Appendix F reports the results in terms of structural Hamming distance (SHD) and different weight ranges.

Figure 5: Structure learning performance on SCM and iSCM data. (a) Benchmarking results: F1 scores for recovering the edges of the true graph on $\operatorname{ER}(20, 2)$ and $\operatorname{ER}(100, 2)$ graphs. Box plots show median and interquartile range (IQR). Whiskers extend to the largest value inside $1.5 \times$ IQR from the boxes. Left (right) column shows results for linear (nonlinear) causal mechanisms with additive noise $\varepsilon_i \sim \mathcal{N}(0, 1)$ (Appendix E). For every model, we evaluate $20$ systems and $n = 1000$ data points each. (b) Implied model analysis: Bottom panel shows distribution over inverse implied noise scales in the implied SCMs for $\operatorname{ER}(100, 2)$ graphs estimated with kernel density estimation. Lines and shading denote mean and standard deviation. Top panel shows performance of Notears on systems with these noise scale statistics but the same $\operatorname{Var}$-sortability of SCMs (Appendix E.2 and E.5).
Properties of the implied SCM

When standardizing SCM data, the implied SCM corresponds to the SCM that could have generated the observations. Therefore, algorithms assuming that unstandardized SCMs generated the data will be susceptible to any assumption violations of the implied SCM, such as assumptions about the exogenous noise. Figure 5(b) (bottom) shows the distribution of inverse implied noise scales $1/\theta_i^2$ for the variables of the implied models (see Equation 2). Since $\operatorname{Var}[\varepsilon_i] = 1$ in our experiments, these inverse squared noise scales are equal to the inverse variances of the full additive noise terms. We find that standardized SCMs induce inverse noise scales that are orders of magnitude greater than those of iSCMs. This distribution is essentially the footprint of the determinism in the depth limit discussed in Section 4.1. The modes at $1/\theta_i^2 = 1$ and at $1/\theta_i^2 > 1$ in the iSCM plot correspond to root and non-root nodes, respectively.
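The qualitative shape of these distributions can be reproduced on a chain. The sketch below assumes, as a simplification that follows from dividing an additive-noise equation $x_i = f_i(\cdot) + \varepsilon_i$ by the standard deviation of $x_i$, that the implied noise scale is $\theta_i = 1/\sqrt{\operatorname{Var}[x_i]}$ when $\operatorname{Var}[\varepsilon_i] = 1$. The general construction is the one in Appendix A. The function name and the weight `w = 2` are hypothetical choices for this example.

```python
def inverse_implied_noise_scales(w, depth, internal):
    """Values of 1 / theta_i^2 along a chain with unit noise variance.

    Assumes theta_i = 1 / sqrt(Var[x_i]), with Var[x_i] taken before x_i is
    standardized, so that 1 / theta_i^2 = Var[x_i].
    """
    out, var_parent = [], None
    for i in range(depth):
        var_i = 1.0 if i == 0 else w**2 * var_parent + 1.0
        out.append(var_i)
        # iSCM: the parent enters the next mechanism with unit variance.
        var_parent = 1.0 if internal else var_i
    return out

inv_scm = inverse_implied_noise_scales(w=2.0, depth=6, internal=False)
inv_iscm = inverse_implied_noise_scales(w=2.0, depth=6, internal=True)
print(inv_scm)   # grows geometrically with depth
print(inv_iscm)  # 1 at the root, a constant greater than 1 elsewhere
```

In this toy chain, the iSCM values take only two levels, one for the root and one for non-root nodes, while the values for the standardized SCM grow without bound along the ordering.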

Figure 5(b) (top) shows the performance of NOTEARS when isolating the noise properties of the implied models from the fact that standardized SCMs and iSCMs are not $\operatorname{Var}$-sortable. For this, we construct SCMs that have the marginal variances (and $\operatorname{Var}$-sortability, here $0.99$ on average) of unstandardized SCMs but the noise variances of the implied models by correcting their weights (see Appendix E.5). NOTEARS performs better in such systems, suggesting that (i) the noise statistics may indeed explain the performance difference on iSCM data, and (ii) $\operatorname{Var}$-sortability may not be the only reason why NOTEARS performs significantly worse on standardized data (Reisach et al., 2021). This sheds light on existing benchmarking results, where MSE-based algorithms perform below expectations despite perhaps not intending to evaluate the algorithms under model mismatch (e.g., Reisach et al., 2021; Kaiser and Sipos, 2021). For the MSE loss, Loh and Bühlmann (2014) and Seng et al. (2024) show that smaller ratios of noise variances increase the magnitude of weights required for the true DAG to be the unique minimizer. The MSE loss ultimately does not account for the inverse variance factor in the Gaussian noise likelihood. Overall, the statistics of the implied models of standardized SCMs are empirically further from SCMs with equal noise variances than their iSCM counterparts.
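As a worked illustration of the inverse variance factor mentioned above, consider a linear implied model $x_i = \mathbf{w}_i^T \mathbf{x}_{\mathrm{pa}(i)} + \theta_i \varepsilon_i$ with $\varepsilon_i \sim \mathcal{N}(0,1)$ and $n$ observations. The superscript $(m)$ for the sample index and the symbol $\mathcal{L}_{\mathrm{MSE}}$ are notation assumed here for illustration; the identity itself is the standard Gaussian negative log-likelihood and is not taken from the cited works.

```latex
\begin{align*}
-\log p(\mathbf{X})
  &= \sum_{i} \bigg[ \frac{1}{2\theta_i^{2}} \sum_{m=1}^{n}
     \Big( x_i^{(m)} - \mathbf{w}_i^{T} \mathbf{x}_{\mathrm{pa}(i)}^{(m)} \Big)^{2}
     + n \log \theta_i \bigg] + \text{const} \,, \\
\mathcal{L}_{\mathrm{MSE}}
  &= \frac{1}{n} \sum_{i} \sum_{m=1}^{n}
     \Big( x_i^{(m)} - \mathbf{w}_i^{T} \mathbf{x}_{\mathrm{pa}(i)}^{(m)} \Big)^{2} \,.
\end{align*}
```

The likelihood weights the residuals of variable $i$ by $1/(2\theta_i^2)$, whereas the MSE weights all variables equally. The two objectives therefore agree up to constants only when all $\theta_i$ are equal, which is the equal-noise-variance setting that the implied models of standardized SCMs depart from.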

6 Conclusions

We describe the iSCM, a one-line modification of the SCM that modulates the scale of interaction between the causal mechanism $f_i$ and noise $\varepsilon_i$ at each variable $x_i$. Through several theoretical and experimental results, we study its properties in relation to standard SCMs as well as the models they imply after standardization. To conclude, we highlight the following key takeaways:

Standardizing during the generative process removes sortability artifacts.

When the functions $f_i$ and the noise $\varepsilon_i$ are, for example, sampled i.i.d. for each variable $x_i$, SCMs exhibit artifacts that are not removed when shifting and scaling the generated data. Our results in Section 5 show that iSCMs are effective at removing $\operatorname{Var}$- and $\operatorname{R^{2}}$-sortability. This makes iSCMs a useful complement to structure learning benchmarks with SCMs, enabling a specific evaluation of the ability of algorithms to transfer to real-world settings that do not exhibit $\operatorname{R^{2}}$ artifacts. Despite the removed sortability artifacts, causal discovery algorithms are able to infer nontrivial structure from iSCM data (Figure 5).

Standardizing post-hoc can lead to partial identifiability and degenerate implied SCMs.

Scaling the units of SCM data is not innocuous. Theorem 3 shows that mild knowledge on the distribution of $f_i$ can identify edges in standardized SCMs that are typically not identifiable from observational data. To our knowledge, our result is the first concerning the identifiability of $\mathcal{G}$ from the standardized observational distribution of SCMs. This may make benchmarks, where similar assumptions on $f_i$ often hold, trivial under standardized SCMs. Moreover, Figure 5(b) shows that standard SCMs can collapse to modeling near-zero exogenous noise. Theorems 2 and 4 demonstrate that neither property appears in the analogous iSCMs. Ultimately, (non)identifiability may be either a feature or bug, depending on whether assumptions are verifiable in practice or a priori known during evaluation.

iSCMs are stable and scale-free, making them useful beyond benchmarking.

Beyond data generation, the stable generative process of iSCMs can make them useful for modeling, e.g., large or temporal systems (e.g., Kilian, 2013; Pamfil et al., 2020). In iSCMs, the scale of a causal mechanism $f_i$ and its unexplained variation $\varepsilon_i$ is both unit-less and independent from its position in the causal ordering (Section 3). Since each iSCM implies a standard SCM, iSCMs can be viewed as a reparameterization of SCMs that enables modeling and learning the functions $f_i$ on the same scale, e.g., under a shared prior or level of regularization. Conceptually, iSCMs are related to batch normalization (Ioffe and Szegedy, 2015), a technique used to stabilize the optimization of neural networks, which compose sequences of functions like SCMs, by adding internal standardization. Overall, these properties may make the iSCM a useful structural equation model beyond the benchmarking problem studied here.

Acknowledgments and Disclosure of Funding

This research was supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program grant agreement no. 815943 and the Swiss National Science Foundation under NCCR Automation, grant agreement 51NF40 180545. This work was also supported by the German Federal Ministry of Education and Research (BMBF): Tübingen AI Center, FKZ: 01IS18039B, and by the Machine Learning Cluster of Excellence, EXC number 2064/1, project number 390727645.

References

  • Alon and Spencer, (2016) Alon, N. and Spencer, J. H. (2016). The probabilistic method. John Wiley & Sons.
  • Andersson et al., (1997) Andersson, S. A., Madigan, D., and Perlman, M. D. (1997). A characterization of Markov equivalence classes for acyclic digraphs. The Annals of Statistics, 25(2):505–541.
  • Barabási and Albert, (1999) Barabási, A.-L. and Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439):509–512.
  • Dibaeinia and Sinha, (2020) Dibaeinia, P. and Sinha, S. (2020). SERGIO: a single-cell expression simulator guided by gene regulatory networks. Cell systems, 11(3):252–271.
  • Erdős and Rényi, (1959) Erdős, P. and Rényi, A. (1959). On random graphs. Publicationes Mathematicae, 6:290–297.
  • Imbens and Rubin, (2015) Imbens, G. W. and Rubin, D. B. (2015). Causal inference in statistics, social, and biomedical sciences. Cambridge university press.
  • Ioffe and Szegedy, (2015) Ioffe, S. and Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning, pages 448–456. PMLR.
  • Kaiser and Sipos, (2021) Kaiser, M. and Sipos, M. (2021). Unsuitability of NOTEARS for causal graph discovery. arXiv preprint arXiv:2104.05441.
  • Kilian, (2013) Kilian, L. (2013). Structural vector autoregressions. In Handbook of research methods and applications in empirical macroeconomics, pages 515–554. Edward Elgar Publishing.
  • Lachapelle et al., (2020) Lachapelle, S., Brouillard, P., Deleu, T., and Lacoste-Julien, S. (2020). Gradient-based neural DAG learning. In International Conference on Learning Representations.
  • Loh and Bühlmann, (2014) Loh, P.-L. and Bühlmann, P. (2014). High-dimensional learning of linear causal networks via inverse covariance estimation. The Journal of Machine Learning Research, 15(1):3065–3105.
  • Lorch et al., (2022) Lorch, L., Sussex, S., Rothfuss, J., Krause, A., and Schölkopf, B. (2022). Amortized inference for causal structure learning. Advances in Neural Information Processing Systems, 35:13104–13118.
  • Maathuis et al., (2010) Maathuis, M. H., Colombo, D., Kalisch, M., and Bühlmann, P. (2010). Predicting causal effects in large-scale systems from observational data. Nature methods, 7(4):247–248.
  • Meek, (1995) Meek, C. (1995). Causal inference and causal explanation with background knowledge. In Proceedings of the Eleventh conference on Uncertainty in artificial intelligence, pages 403–410.
  • Mooij et al., (2020) Mooij, J. M., Magliacane, S., and Claassen, T. (2020). Joint causal inference from multiple contexts. The Journal of Machine Learning Research, 21(1):3919–4026.
  • Ng et al., (2020) Ng, I., Ghassami, A., and Zhang, K. (2020). On the role of sparsity and DAG constraints for learning linear DAGs. Advances in Neural Information Processing Systems, 33:17943–17954.
  • Pamfil et al., (2020) Pamfil, R., Sriwattanaworachai, N., Desai, S., Pilgerstorfer, P., Georgatzis, K., Beaumont, P., and Aragam, B. (2020). DYNOTEARS: Structure learning from time-series data. In International Conference on Artificial Intelligence and Statistics, pages 1595–1605. PMLR.
  • Pearl, (2009) Pearl, J. (2009). Causality. Cambridge university press.
  • Peters and Bühlmann, (2014) Peters, J. and Bühlmann, P. (2014). Identifiability of Gaussian structural equation models with equal error variances. Biometrika, 101(1):219–228.
  • Peters et al., (2017) Peters, J., Janzing, D., and Schölkopf, B. (2017). Elements of causal inference: foundations and learning algorithms. The MIT Press.
  • Rahimi and Recht, (2007) Rahimi, A. and Recht, B. (2007). Random features for large-scale kernel machines. Advances in neural information processing systems, 20.
  • Reisach et al., (2021) Reisach, A., Seiler, C., and Weichwald, S. (2021). Beware of the simulated DAG! Causal discovery benchmarks may be easy to game. Advances in Neural Information Processing Systems, 34:27772–27784.
  • Reisach et al., (2024) Reisach, A., Tami, M., Seiler, C., Chambaz, A., and Weichwald, S. (2024). A scale-invariant sorting criterion to find a causal order in additive noise models. Advances in Neural Information Processing Systems, 36.
  • Runge et al., (2019) Runge, J., Bathiany, S., Bollt, E., Camps-Valls, G., Coumou, D., Deyle, E., Glymour, C., Kretschmer, M., Mahecha, M. D., Muñoz-Marí, J., et al. (2019). Inferring causation from time series in earth system sciences. Nature communications, 10(1):2553.
  • Sachs et al., (2005) Sachs, K., Perez, O., Pe’er, D., Lauffenburger, D. A., and Nolan, G. P. (2005). Causal protein-signaling networks derived from multiparameter single-cell data. Science, 308(5721):523–529.
  • Schaffter et al., (2011) Schaffter, T., Marbach, D., and Floreano, D. (2011). GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods. Bioinformatics, 27(16):2263–2270.
  • Schölkopf, (2022) Schölkopf, B. (2022). Causality for machine learning. In Probabilistic and causal inference: The works of Judea Pearl, pages 765–804.
  • Seng et al., (2024) Seng, J., Zečević, M., Dhami, D. S., and Kersting, K. (2024). Learning large DAGs is harder than you think: Many losses are minimal for the wrong DAG. In The Twelfth International Conference on Learning Representations.
  • Shimizu et al., (2006) Shimizu, S., Hoyer, P. O., Hyvärinen, A., Kerminen, A., and Jordan, M. (2006). A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7(10).
  • Shimizu et al., (2011) Shimizu, S., Inazumi, T., Sogawa, Y., Hyvarinen, A., Kawahara, Y., Washio, T., Hoyer, P. O., Bollen, K., and Hoyer, P. (2011). DirectLiNGAM: A direct method for learning a linear non-Gaussian structural equation model. Journal of Machine Learning Research-JMLR, 12(Apr):1225–1248.
  • Squires and Uhler, (2022) Squires, C. and Uhler, C. (2022). Causal structure learning: A combinatorial perspective. Foundations of Computational Mathematics, 23(5):1781–1815.
  • Squires et al., (2022) Squires, C., Yun, A., Nichani, E., Agrawal, R., and Uhler, C. (2022). Causal structure discovery between clusters of nodes induced by latent factors. In Conference on Causal Learning and Reasoning, pages 669–687. PMLR.
  • Verma and Pearl, (2013) Verma, T. S. and Pearl, J. (2013). On the equivalence of causal models.
  • Villar et al., (2023) Villar, S., Hogg, D. W., Yao, W., Kevrekidis, G. A., and Schölkopf, B. (2023). Towards fully covariant machine learning. arXiv preprint arXiv:2301.13724.
  • Vowels et al., (2022) Vowels, M. J., Camgoz, N. C., and Bowden, R. (2022). D’ya like DAGs? A survey on structure learning and causal discovery. ACM Computing Surveys, 55(4):1–36.
  • Wienöbst et al., (2023) Wienöbst, M., Luttermann, M., Bannach, M., and Liskiewicz, M. (2023). Efficient enumeration of Markov equivalent DAGs. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 12313–12320.
  • Yu et al., (2019) Yu, Y., Chen, J., Gao, T., and Yu, M. (2019). DAG-GNN: DAG structure learning with graph neural networks. In International Conference on Machine Learning, pages 7154–7163. PMLR.
  • Zheng et al., (2018) Zheng, X., Aragam, B., Ravikumar, P. K., and Xing, E. P. (2018). DAGs with NO TEARS: Continuous optimization for structure learning. Advances in neural information processing systems, 31.
  • Zheng et al., (2020) Zheng, X., Dan, C., Aragam, B., Ravikumar, P., and Xing, E. (2020). Learning sparse nonparametric DAGs. In International Conference on Artificial Intelligence and Statistics, pages 3414–3425. PMLR.

Appendix A Implied Models

In this section, we describe how to express the assignments of the observed variables of standardized SCMs and iSCMs with a general additive noise mechanism

\[
f_i(\mathbf{x}, \varepsilon_i) = f_i(\mathbf{x}) + \varepsilon_i \,, \tag{6}
\]

in the form of (SCM), while sharing the same causal graph $\mathcal{G}$ and exogenous noise variables $\bm{\varepsilon}$. We obtain the SCM form by moving the standardization steps into the causal mechanisms by linearly rescaling $f_i$ and $\varepsilon_i$, such that each observed variable is only a function of observed variables and the noise $\varepsilon_i$. Throughout this work, the implied (SCM) model denotes the specific construction given in the following two subsections. For this, we assume that we can express the first two moments of the system in closed form. Similar to the main text, we overload notation for both standardized SCMs and iSCMs and write

\[
\mu_i := \mathds{E}[x_i] \quad\quad\text{and}\quad\quad s_i := \sqrt{\operatorname{Var}[x_i]} \,.
\]

We also derive analytic expressions for the weights of the implied models of linear iSCMs defined by Equation (1), which we later use in our proofs.

A.1 Implied Model of a Standardized SCM

Let $\mathbf{x}^s$ be modeled by (Standardized SCM) with causal mechanisms defined by Equation (6). We recall that $\mathbf{x}^s$ are the observations obtained after standardizing $\mathbf{x}$. Thus, we can rearrange $x^s_i$ as

\[
x_i = s_i x^s_i + \mu_i
\]

and substitute every unstandardized variable $x_i$ by a function of its standardized parents $\mathbf{x}_{\mathrm{pa}(i)}^{s}$ as

\[
x^s_i = \frac{x_i - \mu_i}{s_i}
= \frac{f_i(\mathbf{x}_{\mathrm{pa}(i)}) + \varepsilon_i - \mu_i}{s_i}
= \frac{f_i(\mathbf{x}_{\mathrm{pa}(i)}^{s} \odot \bm{s}_{\mathrm{pa}(i)} + \bm{\mu}_{\mathrm{pa}(i)}) - \mu_i}{s_i} + \frac{1}{s_i}\varepsilon_i \,,
\]

where $\odot$ denotes elementwise multiplication, and $\bm{\mu}_{\mathrm{pa}(i)}$ and $\bm{s}_{\mathrm{pa}(i)}$ are the vectors of the parent means and standard deviations before standardization. Thus, the assignments of $\mathbf{x}^s$ in a standardized SCM can be written as the SCM given by

\[
x^s_i = g^s_i(\mathbf{x}_{\mathrm{pa}(i)}^{s}) + \theta^s_i \varepsilon_i \,,
\]

with implied noise scales $\theta^s_i := 1/s_i$ and implied causal mechanisms

\[
g^s_i(\mathbf{x}_{\mathrm{pa}(i)}^{s}) :=
\begin{cases}
\displaystyle\frac{f_i(\mathbf{x}_{\mathrm{pa}(i)}^{s} \odot \bm{s}_{\mathrm{pa}(i)} + \bm{\mu}_{\mathrm{pa}(i)}) - \mu_i}{s_i} & \text{if $i$ is a non-root variable, and} \\[2ex]
\displaystyle\frac{f_i - \mu_i}{s_i} & \text{if $i$ is a root variable.}
\end{cases}
\]
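The construction above can be sketched numerically for the linear case. The following is a minimal illustration, not the authors' implementation: it assumes linear mechanisms with zero-mean noise (so all $\mu_i$ vanish), and the function name `implied_standardized_linear_scm`, the weight-matrix convention `W[j, i]` for edge $j \to i$, and the three-node chain are hypothetical choices made here. Under these assumptions, the implied mechanism reduces to implied weights $w_{j,i}\, s_j / s_i$.

```python
import numpy as np

def implied_standardized_linear_scm(W, noise_var):
    """Implied model of a standardized linear SCM x_i = sum_j W[j, i] x_j + eps_i.

    W[j, i] is the weight of edge j -> i and noise_var[i] = Var[eps_i].
    Zero-mean noise is assumed, so all means mu_i are zero.
    """
    d = W.shape[0]
    # Solve x = W^T x + eps for x, i.e. x = (I - W^T)^{-1} eps.
    A = np.linalg.inv(np.eye(d) - W.T)
    cov = A @ np.diag(noise_var) @ A.T        # population covariance of x
    s = np.sqrt(np.diag(cov))                 # s_i = sqrt(Var[x_i])
    theta = 1.0 / s                           # implied noise scales theta_i^s
    W_implied = W * s[:, None] / s[None, :]   # implied weights w_{j,i} s_j / s_i
    return s, theta, W_implied

# Hypothetical chain x0 -> x1 -> x2 with both weights equal to 2 and unit noise.
W = np.zeros((3, 3))
W[0, 1] = W[1, 2] = 2.0
s, theta, W_implied = implied_standardized_linear_scm(W, np.ones(3))
print(1.0 / theta**2)  # inverse squared implied noise scales along the chain
```

In this example the inverse squared implied noise scales are 1, 5, and 21, so the implied noise shrinks with depth, in line with the pattern shown for standardized SCMs in Figure 5(b).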

A.2 Implied Model of an iSCM

Let $\widetilde{\mathbf{x}}$ be modeled by (iSCM) with causal mechanisms defined by Equation (6). In an iSCM, $\widetilde{\mathbf{x}}$ are the observed variables and $\mathbf{x}$ are the latent variables. We can express every observation $\widetilde{x}_i$ in terms of its observed parents $\mathbf{\widetilde{x}}_{\mathrm{pa}(i)}$ as

\[
\widetilde{x}_i = \frac{x_i - \mu_i}{s_i}
= \frac{f_i(\mathbf{\widetilde{x}}_{\mathrm{pa}(i)}) + \varepsilon_i - \mu_i}{s_i}
= \frac{f_i(\mathbf{\widetilde{x}}_{\mathrm{pa}(i)}) - \mu_i}{s_i} + \frac{1}{s_i}\varepsilon_i \,.
\]

Thus, the assignments of $\widetilde{\mathbf{x}}$ in an iSCM can be written as the SCM given by

\[
\widetilde{x}_i = \widetilde{g}_i(\mathbf{\widetilde{x}}_{\mathrm{pa}(i)}) + \widetilde{\theta}_i \varepsilon_i \,,
\]

with implied noise scales $\widetilde{\theta}_i := 1/s_i$ and implied causal mechanisms

\[
\widetilde{g}_i(\mathbf{\widetilde{x}}_{\mathrm{pa}(i)}) :=
\begin{cases}
\displaystyle\frac{f_i(\mathbf{\widetilde{x}}_{\mathrm{pa}(i)}) - \mu_i}{s_i} & \text{if $i$ is a non-root variable, and} \\[2ex]
\displaystyle\frac{f_i - \mu_i}{s_i} & \text{if $i$ is a root variable.}
\end{cases}
\]
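The generative process behind this implied model can be sketched as follows. This is a hedged illustration rather than the authors' code: it assumes linear mechanisms with $\mathcal{N}(0,1)$ noise, uses empirical means and standard deviations as stand-ins for $\mu_i$ and $s_i$, and the function name `sample_linear_iscm` and the chain graph are hypothetical.

```python
import numpy as np

def sample_linear_iscm(W, n, rng):
    """Sample n observations from a linear iSCM with N(0, 1) noise.

    W[j, i] is the weight of edge j -> i. Indices 0..d-1 are assumed to be
    a topological order, so W is strictly upper triangular.
    """
    d = W.shape[0]
    x_tilde = np.zeros((n, d))
    s = np.zeros(d)
    for i in range(d):
        eps = rng.standard_normal(n)
        # The latent x_i is computed from the already standardized parents.
        x = x_tilde @ W[:, i] + eps
        # Empirical moments stand in for mu_i and s_i (an assumption here).
        mu_i, s[i] = x.mean(), x.std()
        x_tilde[:, i] = (x - mu_i) / s[i]
    return x_tilde, 1.0 / s  # observations and implied noise scales

# Hypothetical chain x0 -> x1 -> x2 with both weights equal to 2.
W = np.zeros((3, 3))
W[0, 1] = W[1, 2] = 2.0
x_tilde, theta = sample_linear_iscm(W, n=10_000, rng=np.random.default_rng(0))
print(x_tilde.std(axis=0), theta)
```

Because every parent enters with unit variance, the two non-root nodes of this chain both have latent variance close to $2^2 + 1 = 5$, so their implied noise scales stay near $1/\sqrt{5}$ instead of shrinking with depth.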

A.3 Weights of the Implied Model of a Linear iSCM

Here, we derive the analytical form for the mechanisms of the implied model of a linear iSCM with zero-centered, additive noise $\varepsilon_i$. This iSCM is given by

\[
x_i := \mathbf{w}_i^{T} \mathbf{\widetilde{x}}_{\mathrm{pa}(i)} + \varepsilon_i \quad\quad\text{and}\quad\quad \widetilde{x}_i := \frac{x_i}{\sqrt{\operatorname{Var}[x_i]}} \,,
\]

where $\varepsilon_i$ satisfies $\mathds{E}[\varepsilon_i] = 0$ and $\operatorname{Var}[\varepsilon_i] = \sigma_i^2$. We can write the above as

\[
\widetilde{x}_i
= \frac{\mathbf{w}_i^{T} \mathbf{\widetilde{x}}_{\mathrm{pa}(i)} + \varepsilon_i}{\sqrt{\operatorname{Var}[x_i]}}
= \frac{\sum_{j \in \mathrm{pa}(i)} w_{j,i} \widetilde{x}_j + \varepsilon_i}{\sqrt{\operatorname{Var}[x_i]}}
= \sum_{j \in \mathrm{pa}(i)} \frac{w_{j,i}}{\sqrt{\operatorname{Var}[x_i]}} \, \widetilde{x}_j + \frac{1}{\sqrt{\operatorname{Var}[x_i]}} \varepsilon_i \,.
\]

It follows that the implied SCM of a linear iSCM is also linear, with weights and noise variances given by

\[
\widetilde{w}_{j,i} = \frac{w_{j,i}}{\sqrt{\operatorname{Var}[x_i]}} \quad\quad\text{and}\quad\quad \widetilde{\sigma}^2_i = \frac{\sigma^2_i}{\operatorname{Var}[x_i]} \,. \tag{7}
\]
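Equation (7) can be evaluated in closed form by processing the nodes in topological order. The sketch below is illustrative only: it relies on the standard identity $\operatorname{Var}[\mathbf{w}^T \widetilde{\mathbf{x}}] = \mathbf{w}^T \operatorname{Cov}[\widetilde{\mathbf{x}}]\, \mathbf{w}$, assumes that the node indices already form a topological order, and the function name `implied_linear_iscm` and the example chain are hypothetical.

```python
import numpy as np

def implied_linear_iscm(W, noise_var):
    """Implied weights and noise variances of a linear iSCM, as in Equation (7).

    W[j, i] is the weight of edge j -> i and noise_var[i] = sigma_i^2.
    Indices 0..d-1 are assumed to be a topological order.
    """
    d = W.shape[0]
    cov = np.zeros((d, d))                 # population covariance of x_tilde
    var_x = np.zeros(d)                    # Var[x_i] of the latent variables
    W_tilde = np.zeros_like(W, dtype=float)
    for i in range(d):
        w = W[:i, i]
        # Var[x_i] = Var[w^T x_tilde_pa] + sigma_i^2 (noise is independent).
        var_x[i] = w @ cov[:i, :i] @ w + noise_var[i]
        W_tilde[:i, i] = w / np.sqrt(var_x[i])
        # Cov[x_tilde_k, x_tilde_i] = sum_j w_tilde_{j,i} Cov[x_tilde_k, x_tilde_j]
        cov[:i, i] = cov[:i, :i] @ W_tilde[:i, i]
        cov[i, :i] = cov[:i, i]
        cov[i, i] = 1.0                    # x_tilde_i has unit variance
    return W_tilde, noise_var / var_x, cov

# Hypothetical chain x0 -> x1 -> x2 with both weights equal to 2 and unit noise.
W = np.zeros((3, 3))
W[0, 1] = W[1, 2] = 2.0
W_tilde, sigma2_tilde, cov = implied_linear_iscm(W, np.ones(3))
print(W_tilde[0, 1], sigma2_tilde)
```

For this chain, both non-root nodes have $\operatorname{Var}[x_i] = 5$, giving implied weights $2/\sqrt{5}$ and implied noise variances $1/5$.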

In the above, we can write the variance of $x_i$ explicitly as

$$
\begin{aligned}
\operatorname{Var}[x_i] &= \operatorname{Var}\bigg[\sum_{j \in \mathrm{pa}(i)} w_{j,i} \widetilde{x}_j + \varepsilon_i\bigg] = \operatorname{Var}\bigg[\sum_{j \in \mathrm{pa}(i)} w_{j,i} \widetilde{x}_j\bigg] + \sigma_i^2 \\
&\overset{(1)}{=} \sum_{k \in \mathrm{pa}(i)} \sum_{j \in \mathrm{pa}(i)} \operatorname{Cov}[w_{k,i} \widetilde{x}_k, w_{j,i} \widetilde{x}_j] + \sigma_i^2 \\
&\overset{(2)}{=} \sum_{k \in \mathrm{pa}(i)} \sum_{j \in \mathrm{pa}(i)} w_{k,i} w_{j,i} \operatorname{Cov}[\widetilde{x}_k, \widetilde{x}_j] + \sigma_i^2 \,,
\end{aligned} \tag{8}
$$

where (1) follows from Bienaymé’s identity and (2) from covariance being bilinear. Substituting the variance into the expressions for the weights and noise variances, we obtain

\begin{align}
\widetilde{w}_{j,i} &= \frac{w_{j,i}}{\sqrt{\sum_{k \in \mathrm{pa}(i)} \sum_{j \in \mathrm{pa}(i)} w_{k,i} w_{j,i} \operatorname{Cov}[\widetilde{x}_k, \widetilde{x}_j] + \sigma_i^2}} \,, \tag{9} \\
\widetilde{\sigma}^2_i &= \frac{\sigma^2_i}{\sum_{k \in \mathrm{pa}(i)} \sum_{j \in \mathrm{pa}(i)} w_{k,i} w_{j,i} \operatorname{Cov}[\widetilde{x}_k, \widetilde{x}_j] + \sigma_i^2} \,. \tag{10}
\end{align}

Finally, by construction, the variables $\widetilde{\mathbf{x}}$ of an iSCM have unit marginal variances. Thus, when the parents of $\widetilde{x}_i$ are pairwise independent, Equation (9) simplifies to

$$
\widetilde{w}_{j,i} = \frac{w_{j,i}}{\sqrt{\sum_{j \in \mathrm{pa}(i)} w_{j,i}^2 + \sigma_i^2}} \,. \tag{11}
$$

This independence condition always holds when the DAG $\mathcal{G}$ is a forest.
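
As a quick sanity check of Equation (11), consider a hypothetical node $x_i$ with two independent parents. The weights $w_{1,i} = 2$, $w_{2,i} = 1$ and the noise variance $\sigma_i^2 = 4$ are illustrative values chosen for this example, not values used in the experiments:

```latex
\begin{align*}
\operatorname{Var}[x_i] &= w_{1,i}^2 + w_{2,i}^2 + \sigma_i^2 = 4 + 1 + 4 = 9 \,, \\
\widetilde{w}_{1,i} &= \tfrac{2}{\sqrt{9}} = \tfrac{2}{3} \,, \qquad
\widetilde{w}_{2,i} = \tfrac{1}{\sqrt{9}} = \tfrac{1}{3} \,, \qquad
\widetilde{\sigma}_i^2 = \tfrac{4}{9} \,, \\
\operatorname{Var}[\widetilde{x}_i] &= \widetilde{w}_{1,i}^2 + \widetilde{w}_{2,i}^2 + \widetilde{\sigma}_i^2
= \tfrac{4}{9} + \tfrac{1}{9} + \tfrac{4}{9} = 1 \,.
\end{align*}
```

The implied parameters thus reproduce the unit marginal variance that the standardization enforces.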

Efficient computation

We can efficiently compute the implied model weights using a bottom-up dynamic programming approach. This allows sampling data directly from the exact implied model of an iSCM without resorting to empirical standardization statistics. Algorithm 2 describes the procedure. We iteratively compute the weights and noise variances of the implied model following Equations (9) and (10). At each iteration, we update the covariance matrix according to Lemma 1. The algorithm processes the nodes in topological order, mirroring the proof by induction of Lemma 1.

Algorithm 2 Computing the Implied Model Parameters of Linear iSCMs
Input: DAG $\mathcal{G}$, weight matrix $[W]_{i,j} := w_{i,j}$, noise variances $\bm{\sigma}^2 \in \mathbb{R}_+^d$
$\widetilde{W} \leftarrow 0_{d \times d}$
$\Sigma \leftarrow I_d$
$\pi \leftarrow$ topological ordering of $\mathcal{G}$
for $i = 1$ to $d$ do
     $\mathbf{w} \leftarrow W_{:,\pi_i}$ ▷ Edge weights ingoing to $\pi_i$
     $\operatorname{Var}[x_{\pi_i}] \leftarrow \mathbf{w}^\top \Sigma \mathbf{w} + \sigma^2_{\pi_i}$ ▷ Equation (8)
     $\widetilde{W}_{:,\pi_i} \leftarrow \mathbf{w} / \sqrt{\operatorname{Var}[x_{\pi_i}]}$ ▷ Equation (9)
     $\widetilde{\sigma}^2_{\pi_i} \leftarrow \sigma^2_{\pi_i} / \operatorname{Var}[x_{\pi_i}]$ ▷ Equation (10)
     for $j = 1$ to $i - 1$ do ▷ The diagonal entry $\Sigma_{\pi_i,\pi_i}$ must remain $1$
          $\Sigma_{\pi_j,\pi_i} \leftarrow (\Sigma_{\pi_j,:})^\top \widetilde{W}_{:,\pi_i}$
          $\Sigma_{\pi_i,\pi_j} \leftarrow \Sigma_{\pi_j,\pi_i}$
return implied weights $\widetilde{W}$, implied noise variances $\widetilde{\bm{\sigma}}^2$
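
The procedure can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `implied_parameters`, the nested-list representation of $W$ (with `W[j][i]` holding $w_{j,i}$ and a zero entry meaning "no edge"), and the use of the standard-library `graphlib` module to obtain a topological ordering are choices made for this example only. The inner loop visits only nodes strictly before the current node in the ordering, so the diagonal of $\Sigma$ stays at one.

```python
import math
from graphlib import TopologicalSorter


def implied_parameters(W, noise_var):
    """Sketch of the implied-model computation for a linear iSCM.

    W[j][i] is the weight of edge j -> i (0 if the edge is absent) and
    noise_var[i] is the noise variance of node i. Returns the implied
    weights, the implied noise variances, and the covariance matrix of
    the standardized variables.
    """
    d = len(W)
    # Assumption of this sketch: parents are read off the nonzero weights.
    parents = {i: [j for j in range(d) if W[j][i] != 0] for i in range(d)}
    # graphlib expects a mapping node -> predecessors; parents come first.
    order = list(TopologicalSorter(parents).static_order())

    W_tilde = [[0.0] * d for _ in range(d)]
    noise_var_tilde = [0.0] * d
    # Sigma[a][b] holds Cov[x~_a, x~_b]; the diagonal is 1 by construction.
    Sigma = [[1.0 if a == b else 0.0 for b in range(d)] for a in range(d)]

    for pos, i in enumerate(order):
        pa = parents[i]
        # Equation (8): Var[x_i] = w^T Sigma w + sigma_i^2.
        var_i = noise_var[i] + sum(
            W[k][i] * W[l][i] * Sigma[k][l] for k in pa for l in pa
        )
        # Equations (9) and (10): rescale weights and noise variance.
        for k in pa:
            W_tilde[k][i] = W[k][i] / math.sqrt(var_i)
        noise_var_tilde[i] = noise_var[i] / var_i
        # Covariances with all nodes strictly earlier in the ordering.
        for j in order[:pos]:
            cov = sum(Sigma[j][k] * W_tilde[k][i] for k in pa)
            Sigma[j][i] = Sigma[i][j] = cov

    return W_tilde, noise_var_tilde, Sigma


# Two-node chain 0 -> 1 with w = 3, noise variances 4 and 16 (illustrative).
W_tilde, noise_var_tilde, Sigma = implied_parameters(
    [[0.0, 3.0], [0.0, 0.0]], [4.0, 16.0]
)
print(W_tilde[0][1], noise_var_tilde, Sigma[0][1])
```

In the two-node example, $\operatorname{Var}[x_1] = 3^2 + 16 = 25$, so the implied weight is $3/5$ and the implied noise variance of the child is $16/25$. Given these parameters, data can be sampled from the implied SCM by ordinary ancestral sampling.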

Appendix B Interventions in iSCMs

For an iSCM $(\mathbf{S}, \mathcal{P}_{\bm{\varepsilon}})$, we can formalize interventions as changes to its causal mechanisms $f_i$, analogous to the common definition for SCMs (Peters et al., 2017). Specifically, let $\mu_i := \mathbb{E}[x_i]$ and $s_i := \sqrt{\operatorname{Var}[x_i]}$ be the mean and standard deviation of the latent variable $x_i$. We define an intervention as replacing one (or several) of the assignments to the latent variables with

$$
x_i := h_i(\widetilde{\mathbf{x}}_{\mathrm{pa}(i)}, \varepsilon_i),
$$

for some function $h_i$. Importantly, the statistics $\mu_i$ and $s_i$ used for the standardization operation

$$
\widetilde{x}_i := \frac{x_i - \mu_i}{s_i}
$$

remain unchanged. Thus, if we intervene on mechanisms of iSCMs, the variables $\widetilde{\mathbf{x}}$ may no longer have zero mean and unit variance, and the perturbations of $x_i$ propagate downstream through the causal mechanisms. We note that, under the above definition, intervening on an iSCM through a new mechanism $h_i$ is equivalent to intervening on the implied SCM of an iSCM with the mechanism

$$
\widetilde{h}_i(\mathbf{x}, \varepsilon) = \frac{h_i(\mathbf{x}, \varepsilon) - \mu_i}{s_i} \,.
$$

Appendix A.2 provides details on the implied models of iSCMs.
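
To make the definition concrete, the following sketch applies an intervention to a hypothetical two-variable linear iSCM $x_1 \rightarrow x_2$. The weight, the noise variances, and the names `W_12`, `NOISE_VAR`, `MU`, `S`, and `sample` are assumptions made for this illustration. The point it shows is that the statistics $\mu_i$ and $s_i$ are fixed from the observational model and are not recomputed when the mechanism of $x_1$ is replaced.

```python
import math

# Hypothetical linear iSCM x1 -> x2 (all numbers are illustrative).
W_12 = 1.5              # weight on the edge x1 -> x2
NOISE_VAR = (4.0, 1.0)  # variances of the zero-mean noises eps_1, eps_2

# Observational standardization statistics, computed once and then frozen.
MU = (0.0, 0.0)  # linear mechanisms with zero-mean noise have zero mean
S = (
    math.sqrt(NOISE_VAR[0]),              # s_1 = sqrt(Var[eps_1])
    math.sqrt(W_12 ** 2 + NOISE_VAR[1]),  # s_2 = sqrt(w^2 * 1 + Var[eps_2])
)


def sample(eps, h1=None):
    """Map one noise draw eps = (eps_1, eps_2) to (x~_1, x~_2).

    If h1 is given, it replaces the mechanism of the root x_1. MU and S
    are deliberately left unchanged under the intervention.
    """
    x1 = eps[0] if h1 is None else h1(eps[0])
    x1_std = (x1 - MU[0]) / S[0]
    x2 = W_12 * x1_std + eps[1]
    x2_std = (x2 - MU[1]) / S[1]
    return x1_std, x2_std


# Observational draw versus the intervention x_1 := 3 for the same noise.
print(sample((2.0, 0.0)))
print(sample((2.0, 0.0), h1=lambda eps_1: 3.0))
```

Under the intervention, $\widetilde{x}_1 = (3 - \mu_1)/s_1 = 1.5$ for every noise draw, so $\widetilde{x}_1$ no longer has zero mean or unit variance, and the shift propagates to $\widetilde{x}_2$ through the unchanged mechanism.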

Appendix C Proofs

C.1 Definitions

We define the key concepts used throughout our analysis. A path $p_{j \leftrightarrow i}$ between $v_i$ and $v_j$ is a set of directed edges that allows reaching $v_i$ from $v_j$ (and vice versa), not taking into account edge directionality, and that joins unique vertices. We call a node a collider in a path if the node has two ingoing directed edges in the path. We say that a path between $v_i$ and $v_j$ is unblocked if and only if there is no node $v_k$ that is a collider in the path (see Figure 9(a)). Finally, we use the term undirected connected component to refer to any maximal subgraph of $\tilde{\mathcal{G}}$ in which any two nodes are connected by a path containing only undirected edges (Wienöbst et al., 2023).
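
The definitions of paths, colliders, and unblocked paths can be illustrated with a short Python sketch. The function names `unblocked_paths` and `path_weight_sum` and the encoding of a DAG as a dictionary from directed edges `(a, b)` to weights are assumptions of this example. The second function computes the sum over unblocked paths of the product of edge weights, which is the quantity that the proof of Lemma 1 relates to the covariance.

```python
import math


def unblocked_paths(edges, j, i):
    """Return all unblocked paths between j and i as lists of directed edges.

    edges is a collection of pairs (a, b), each meaning a -> b in the DAG.
    """
    edges = set(edges)
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    def is_collider(prev, node, nxt):
        # A node is a collider if both adjacent path edges point into it.
        return (prev, node) in edges and (nxt, node) in edges

    paths = []

    def extend(nodes):
        last = nodes[-1]
        if last == i:
            # Recover the original edge directions along the node sequence.
            paths.append([(a, b) if (a, b) in edges else (b, a)
                          for a, b in zip(nodes, nodes[1:])])
            return
        for nxt in sorted(neighbours.get(last, ())):
            if nxt in nodes:
                continue  # a path joins unique vertices
            if len(nodes) >= 2 and is_collider(nodes[-2], last, nxt):
                continue  # a collider blocks the path
            extend(nodes + [nxt])

    extend([j])
    return paths


def path_weight_sum(weights, j, i):
    """Sum, over all unblocked paths between j and i, of the weight products."""
    return sum(math.prod(weights[e] for e in path)
               for path in unblocked_paths(weights.keys(), j, i))


# Triangle 1 -> 2, 1 -> 3, 2 -> 3 with illustrative weights.
weights = {(1, 2): 0.5, (1, 3): 0.3, (2, 3): 0.4}
print(unblocked_paths(weights.keys(), 2, 3))  # the paths 2 <- 1 -> 3 and 2 -> 3
print(path_weight_sum(weights, 2, 3))
print(unblocked_paths({(1, 3), (2, 3)}, 1, 2))  # [] since 3 is a collider
```

In the triangle, the two unblocked paths between nodes 2 and 3 contribute $0.5 \cdot 0.3$ and $0.4$, respectively. The sketch enumerates paths exhaustively and is only meant for small graphs.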

C.2 Explicit Covariance in Linear SCMs with Unit Marginal Variances

See Lemma 1.

Proof.

We will give a proof by induction on the number of vertices $d = |\mathcal{V}|$ in the DAG $\mathcal{G}$. Without loss of generality, we assume that the indices of the nodes are ordered according to some fixed topological ordering $\pi$, so $\pi(j) < \pi(i)$ if $j < i$. By the unit marginal variance assumption,

$$
\operatorname{Cov}[x_i, x_i] = \operatorname{Var}[x_i] = 1 \,. \tag{12}
$$

From now on, we consider two arbitrary indices $j < i$. This is without loss of generality, since the covariance between $x_i$ and $x_j$ is symmetric.

Base case ($d = 2$)

If $v_j$ is not an ancestor of $v_i$ in graph $\mathcal{G}$, they both must be root nodes, because the edge $v_i \leftarrow v_j$ is the only possible edge when $\pi(j) < \pi(i)$. Since $x_i$ and $x_j$ are root nodes, they are independent and $\operatorname{Cov}[x_i, x_j] = 0$. Since the graph then contains no edge, there are no paths, and hence no unblocked paths, between $v_i$ and $v_j$, so the right-hand side of Equation (3) is also $0$.

Conversely, if $v_j$ is an ancestor of $v_i$ in graph $\mathcal{G}$, $v_j$ is the only parent and ancestor of $v_i$. This implies that

$$
\begin{aligned}
\operatorname{Cov}[x_i, x_j] &= \operatorname{Cov}[w_{j,i} x_j + \varepsilon_i, x_j] \\
&= w_{j,i} \operatorname{Cov}[x_j, x_j] \\
&= w_{j,i} \,,
\end{aligned}
$$

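where the second equality uses that $\varepsilon_i$ is independent of $x_j$ and the last equality follows from Equation (12). This is exactly Equation (3) for a two-node graph: the edge $v_j \rightarrow v_i$ is the only path between the two nodes, and a path of one edge cannot contain a collider, so it is unblocked.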

Figure 6: Lemma 1 inductive step. If $v_j$ is before $v_i$ in the topological ordering, then all unblocked paths from $v_j$ to $v_i$ must contain a parent of $v_i$ as the second to last node. To see this, suppose an unblocked path from $v_j$ to $v_i$ instead contained a child of $v_i$ as the second to last node. Then, there either exists a collider on the path to $v_j$, contradicting that the path is unblocked, or all edges in the path point away from $v_i$, implying that $v_j$ is a descendant of $v_i$ and contradicting the topological ordering. Dotted lines represent unblocked paths (which may have common nodes). Solid lines represent edges. $v_j$ may or may not be a parent of $v_i$, which we illustrate with a blue arrow.

Induction step ($d > 2$)

Let us assume that Equation (3) holds for all graphs of size $d - 1$, and let $\mathcal{G}$ have $d$ nodes. We will apply the inductive hypothesis to the subgraph of the first $d - 1$ nodes in $\mathcal{G}$ and show that the full DAG $\mathcal{G}$ including the $d$-th vertex still satisfies Equation (3). First, we note that, since the $d$-th vertex is last in the topological ordering, it has no outgoing edges. Consequently, it is not visited on any unblocked path between $v_j$ and $v_i$ for $i, j < d$, as $v_d$ would be a collider in any such path passing through it. Second, adding the node $v_d$ to a subsystem containing $x_1, \dots, x_{d-1}$ results in no change to the joint distribution of $x_i, x_j$. Therefore, it has no effect on the covariance between $x_i$ and $x_j$. Hence, both sides of Equation (3) are unchanged by the presence of a node $v_d$ for all $i, j < d$, and the equation still holds for all $i, j < d$.

We want to show that Equation (3) also holds for $i = d$ and any $j < i$. For this, we first construct all unblocked paths from $v_j$ to $v_i$. We note that any unblocked path must go through one of the parents $k \in \mathrm{pa}(i)$, because $j < i$ in the topological ordering (see Figure 6). Moreover, for any $k \in \mathrm{pa}(i)$, appending $k \rightarrow i$ to an unblocked path $p_{j \leftrightarrow k}$ between $v_j$ and $v_k$ creates a new unblocked path between $v_j$ and $v_i$. Hence, for $i = d$ and any $j < i$, it holds that

$$
\begin{aligned}
\operatorname{Cov}[x_i, x_j] &= \operatorname{Cov}\Big[\sum_{k \in \mathrm{pa}(i)} w_{k,i} x_k + \varepsilon_i, x_j\Big] \\
&= \sum_{k \in \mathrm{pa}(i)} w_{k,i} \operatorname{Cov}[x_k, x_j] \\
&\overset{(1)}{=} w_{j,i} \operatorname{Cov}[x_j, x_j] + \sum_{k \in \mathrm{pa}(i) \setminus \{j\}} w_{k,i} \operatorname{Cov}[x_k, x_j] \\
&\overset{(2)}{=} w_{j,i} + \sum_{k \in \mathrm{pa}(i) \setminus \{j\}} w_{k,i} \sum_{p_{j \leftrightarrow k} \in P_{j \leftrightarrow k}} \prod_{(l,m) \in p_{j \leftrightarrow k}} w_{l,m} \\
&= w_{j,i} + \sum_{k \in \mathrm{pa}(i) \setminus \{j\}} \left( \sum_{p_{j \leftrightarrow k} \in P_{j \leftrightarrow k}} w_{k,i} \prod_{(l,m) \in p_{j \leftrightarrow k}} w_{l,m} \right) \\
&\overset{(3)}{=} \sum_{k \in \mathrm{pa}(i)} \left( \mathbb{1}[k = j] \, w_{j,i} + \mathbb{1}[k \neq j] \left( \sum_{p_{j \leftrightarrow k} \in P_{j \leftrightarrow k}} w_{k,i} \prod_{(l,m) \in p_{j \leftrightarrow k}} w_{l,m} \right) \right) \\
&\overset{(4)}{=} \sum_{p_{j \leftrightarrow i} \in P_{j \leftrightarrow i}} \prod_{(l,m) \in p_{j \leftrightarrow i}} w_{l,m} \,.
\end{aligned}
$$
}\pgfsys@endscope}}} \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope}}} } \pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope{{{}}}{}{}\hss}% \pgfsys@discardpath\pgfsys@invoke{\lxSVG@closescope }\pgfsys@endscope\hss}}% \lxSVG@closescope\endpgfpicture}}}{=}\sum_{p_{j\leftrightarrow i}\in P_{j% \leftrightarrow i}}\prod_{(l,m)\in p_{j\leftrightarrow i}}w_{l,m}\,.\end{split}start_ROW start_CELL roman_Cov [ italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ] end_CELL start_CELL = roman_Cov [ ∑ start_POSTSUBSCRIPT italic_k ∈ roman_pa ( italic_i ) end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_k , italic_i end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_ε start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ] end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL = ∑ start_POSTSUBSCRIPT italic_k ∈ roman_pa ( italic_i ) end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_k , italic_i end_POSTSUBSCRIPT roman_Cov [ italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ] end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL over1 start_ARG = end_ARG italic_w start_POSTSUBSCRIPT italic_j , italic_i end_POSTSUBSCRIPT roman_Cov [ italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ] + ∑ start_POSTSUBSCRIPT italic_k ∈ roman_pa ( italic_i ) ∖ italic_j end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_k , italic_i end_POSTSUBSCRIPT roman_Cov [ italic_x start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ] end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL over2 start_ARG = end_ARG italic_w start_POSTSUBSCRIPT italic_j , italic_i end_POSTSUBSCRIPT + ∑ start_POSTSUBSCRIPT italic_k ∈ roman_pa ( italic_i ) ∖ italic_j end_POSTSUBSCRIPT 
italic_w start_POSTSUBSCRIPT italic_k , italic_i end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_j ↔ italic_k end_POSTSUBSCRIPT ∈ italic_P start_POSTSUBSCRIPT italic_j ↔ italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∏ start_POSTSUBSCRIPT ( italic_l , italic_m ) ∈ italic_p start_POSTSUBSCRIPT italic_j ↔ italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_l , italic_m end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL = italic_w start_POSTSUBSCRIPT italic_j , italic_i end_POSTSUBSCRIPT + ∑ start_POSTSUBSCRIPT italic_k ∈ roman_pa ( italic_i ) ∖ italic_j end_POSTSUBSCRIPT ( ∑ start_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_j ↔ italic_k end_POSTSUBSCRIPT ∈ italic_P start_POSTSUBSCRIPT italic_j ↔ italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_k , italic_i end_POSTSUBSCRIPT ∏ start_POSTSUBSCRIPT ( italic_l , italic_m ) ∈ italic_p start_POSTSUBSCRIPT italic_j ↔ italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_l , italic_m end_POSTSUBSCRIPT ) end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL over3 start_ARG = end_ARG ∑ start_POSTSUBSCRIPT italic_k ∈ roman_pa ( italic_i ) end_POSTSUBSCRIPT ( blackboard_1 [ italic_k = italic_j ] italic_w start_POSTSUBSCRIPT italic_j , italic_i end_POSTSUBSCRIPT + blackboard_1 [ italic_k ≠ italic_j ] ( ∑ start_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_j ↔ italic_k end_POSTSUBSCRIPT ∈ italic_P start_POSTSUBSCRIPT italic_j ↔ italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_k , italic_i end_POSTSUBSCRIPT ∏ start_POSTSUBSCRIPT ( italic_l , italic_m ) ∈ italic_p start_POSTSUBSCRIPT italic_j ↔ italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_l , italic_m end_POSTSUBSCRIPT ) ) end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL over4 start_ARG = end_ARG ∑ start_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_j ↔ 
italic_i end_POSTSUBSCRIPT ∈ italic_P start_POSTSUBSCRIPT italic_j ↔ italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∏ start_POSTSUBSCRIPT ( italic_l , italic_m ) ∈ italic_p start_POSTSUBSCRIPT italic_j ↔ italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_w start_POSTSUBSCRIPT italic_l , italic_m end_POSTSUBSCRIPT . end_CELL end_ROW

For step 1, consider two cases. If $j \notin \mathrm{pa}(i)$, then $w_{j,i} = 0$ and the equality trivially holds. If $j \in \mathrm{pa}(i)$, then it holds by pulling the term for $j$ out of the sum in the previous line. In step 2, we use the unit marginal variance $\operatorname{Cov}[x_j, x_j] = 1$ and apply the inductive hypothesis to express the remaining covariances as sums of products of weights. In step 3, we rearrange terms to pull the $w_{j,i}$ term into the sum over parents. In step 4, we use the fact that the set of unblocked paths from $v_j$ to $v_i$ consists of all unblocked paths from $v_j$ to any parent $v_k$ of $v_i$, each with the extra edge $k \rightarrow i$ appended, together with the single-edge path directly connecting $v_j$ with $v_i$ (if $j \in \mathrm{pa}(i)$).

This completes the induction step and the proof. ∎
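As a numerical sanity check of the path-sum formula, the following minimal sketch builds a small linear SCM with unit marginal variances and compares its covariances with sums over unblocked paths enumerated by hand. The diamond graph, the weight values, and the helper name `unit_variance_covariance` are illustrative assumptions and do not come from the paper.

```python
import numpy as np

# Hypothetical diamond DAG (0-indexed): v0 -> v1, v0 -> v2, v1 -> v3, v2 -> v3.
# W[k, i] is the weight w_{k,i} of the edge v_k -> v_i; values are illustrative.
W = np.zeros((4, 4))
W[0, 1], W[0, 2], W[1, 3], W[2, 3] = 0.5, -0.4, 0.6, 0.3


def unit_variance_covariance(W):
    """Covariance of the linear SCM x = W^T x + eps, with independent noise
    variances chosen so that every x_i has unit marginal variance."""
    d = W.shape[0]
    A = np.linalg.inv(np.eye(d) - W.T)              # x = A @ eps
    noise_var = np.linalg.solve(A**2, np.ones(d))   # enforces diag(A D A^T) = 1
    assert np.all(noise_var > 0), "weights too large for unit variances"
    return A @ np.diag(noise_var) @ A.T


cov = unit_variance_covariance(W)

# Sums over unblocked paths, enumerated by hand for this graph.
path_sums = {
    (0, 1): W[0, 1],
    (0, 2): W[0, 2],
    (1, 2): W[0, 1] * W[0, 2],                      # v1 <- v0 -> v2
    (0, 3): W[0, 1] * W[1, 3] + W[0, 2] * W[2, 3],  # two directed paths
    (1, 3): W[1, 3] + W[0, 1] * W[0, 2] * W[2, 3],  # edge, plus v1 <- v0 -> v2 -> v3
    (2, 3): W[2, 3] + W[0, 2] * W[0, 1] * W[1, 3],  # edge, plus v2 <- v0 -> v1 -> v3
}
max_abs_error = max(abs(cov[j, i] - s) for (j, i), s in path_sums.items())
print(f"max |Cov - path sum| = {max_abs_error:.2e}")
```

Note that the collider path v1 -> v3 <- v2 is blocked and therefore does not contribute to the covariance between v1 and v2 in this sketch.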

C.3 Bound on the Fraction of CEV

See 2

Proof.

We begin by bounding the variance of the latent variables $x_i$ in iSCMs. Starting from Equation (8), we can bound the covariances with a product of unit variances as

\begin{align*}
\operatorname{Var}[x_i]
&= \sum_{k \in \mathrm{pa}(i)} \sum_{j \in \mathrm{pa}(i)} w_{k,i} w_{j,i} \operatorname{Cov}[\widetilde{x}_j, \widetilde{x}_k] + \sigma^2 \\
&\overset{(1)}{\leq} \sum_{k \in \mathrm{pa}(i)} \sum_{j \in \mathrm{pa}(i)} w_{k,i} w_{j,i} + \sigma^2 \\
&= \Big( \sum_{j \in \mathrm{pa}(i)} w_{j,i} \Big)^2 + \sigma^2 \\
&\overset{(2)}{\leq} m^2 w^2 + \sigma^2 \,,
\end{align*}

where step 1 uses $\operatorname{Cov}[\widetilde{x}_j, \widetilde{x}_k] \leq 1$ since $\operatorname{Var}[\widetilde{x}_j] = 1$ and $\operatorname{Var}[\widetilde{x}_k] = 1$, and step 2 applies the Cauchy-Schwarz inequality. Since we obtain $\widetilde{x}_i$ from $x_i$ just by shifting and scaling the latter, we observe that $\operatorname{CEV_f}[\widetilde{x}_i] = \operatorname{CEV_f}[x_i]$. Using the upper bound on the variance of $x_i$ and the definition of the fraction of cause-explained variance in Equation (4), we get

\begin{align*}
\operatorname{CEV_f}[\widetilde{x}_i]
&= \operatorname{CEV_f}[x_i]
= 1 - \frac{\operatorname{Var}\big[x_i - \mathbb{E}[x_i \mid \mathbf{x}_{\mathrm{pa}(i)}]\big]}{\operatorname{Var}[x_i]}
= 1 - \frac{\operatorname{Var}[x_i - \mathbf{w}_i^{\top} \mathbf{x}_{\mathrm{pa}(i)}]}{\operatorname{Var}[x_i]} \\
&= 1 - \frac{\operatorname{Var}[\varepsilon_i]}{\operatorname{Var}[x_i]}
= 1 - \frac{\sigma^2}{\operatorname{Var}[x_i]}
\leq 1 - \frac{\sigma^2}{m^2 w^2 + \sigma^2} \,.
\end{align*}
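The bound can be checked numerically on a small example. The sketch below computes the population value of $\operatorname{CEV_f}$ for every node of a linear iSCM by propagating covariances of the standardized variables in topological order, and compares it with $1 - \sigma^2 / (m^2 w^2 + \sigma^2)$, where $m$ is taken to be the in-degree of each node and $w$ the largest absolute weight. The function name `iscm_cev_fractions`, the fully connected example graph, and the weight value are assumptions made for illustration only.

```python
import numpy as np


def iscm_cev_fractions(W, sigma2=1.0):
    """Population CEV_f of each node of a linear iSCM with noise variance sigma2.

    W[k, i] is the weight applied to the standardized parent x~_k in the
    assignment of x_i; nodes are assumed to be in a topological order.
    """
    d = W.shape[0]
    corr = np.eye(d)                 # covariances of the standardized variables
    cev = np.zeros(d)
    for i in range(d):
        w = W[:i, i]
        var_i = w @ corr[:i, :i] @ w + sigma2   # Var[x_i] before standardization
        cev[i] = 1.0 - sigma2 / var_i
        # Cov[x~_i, x~_j] = sum_k w_{k,i} Cov[x~_k, x~_j] / sd(x_i) for j < i
        corr[i, :i] = corr[:i, :i] @ w / np.sqrt(var_i)
        corr[:i, i] = corr[i, :i]
    return cev


# Illustrative example: fully connected DAG on 5 nodes, all weights equal to w_max.
w_max, sigma2 = 0.8, 1.0
W = np.triu(np.full((5, 5), w_max), k=1)
cev = iscm_cev_fractions(W, sigma2)
m = np.count_nonzero(W, axis=0)                      # number of parents per node
bound = 1.0 - sigma2 / (m**2 * w_max**2 + sigma2)
for i in range(5):
    print(f"node {i}: CEV_f = {cev[i]:.3f}, bound = {bound[i]:.3f}")
```

In this example the bound is tight for the node with a single parent and becomes looser as more, imperfectly correlated, parents are added.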

C.4 Identifiability

In this section, we prove Theorems 3 and 4. We begin by deriving the covariances for the 3-node example in Section 4.2 and then give the general proofs for forests. The proofs of both theorems share the same underlying argument. We first derive the SCM forms of the original models, i.e., standardized SCMs in Theorem 3 and iSCMs in Theorem 4. By showing that the standardized SCMs and iSCMs are SCMs with the same causal graphs $\mathcal{G}$ and observational distributions $p(\mathbf{x})$, we can leverage Lemma 1 to obtain the covariances between the observed variables in both model classes. Ultimately, these covariances allow us to derive (non)identifiability conditions for the DAGs $\mathcal{G}$ in an MEC underlying the original models.

Theorems 3 and 4 assume that the exogenous noise is sampled from a zero-centered distribution with equal variance across variables. Since the results are based on the analysis of covariances, they also hold when $\mathbb{E}[\varepsilon_i] \neq 0$, but the zero-mean assumption simplifies notation. To derive the results for iSCMs, we additionally assume that the noise is Gaussian (see Theorem 4). When referring to an undirected edge between nodes $v_i, v_j$, for example, in an MEC, we still denote the edge with $(v_i, v_j)$, but the ordering of the nodes is arbitrary.

C.4.1 3-Node Case

We begin by studying the 3-node example of Figure 3 in Section 4.2. Let $\alpha_i, \beta_i, \gamma_i, \lambda_i \in \mathbb{R}$ be linear function weights, and consider the following three causal graphs $\mathcal{G}$ belonging to the same MEC, along with their corresponding SCMs and iSCMs.

\begin{equation*}
\begin{array}{lll}
\mathcal{G} & \text{SCM} & \text{iSCM} \\[6pt]
v_1 \rightarrow v_2 \rightarrow v_3
& \begin{aligned}
x_1 &:= \varepsilon_1 \\
x_2 &:= \alpha_1 x_1 + \varepsilon_2 \\
x_3 &:= \beta_1 x_2 + \varepsilon_3
\end{aligned} \quad (13)
& \begin{aligned}
x_1 &:= \varepsilon_1 \\
x_2 &:= \gamma_1 \widetilde{x}_1 + \varepsilon_2 \\
x_3 &:= \lambda_1 \widetilde{x}_2 + \varepsilon_3
\end{aligned} \quad (14) \\[18pt]
v_1 \leftarrow v_2 \rightarrow v_3
& \begin{aligned}
x_1 &:= \alpha_2 x_2 + \varepsilon_1 \\
x_2 &:= \varepsilon_2 \\
x_3 &:= \beta_2 x_2 + \varepsilon_3
\end{aligned} \quad (15)
& \begin{aligned}
x_1 &:= \gamma_2 \widetilde{x}_2 + \varepsilon_1 \\
x_2 &:= \varepsilon_2 \\
x_3 &:= \lambda_2 \widetilde{x}_2 + \varepsilon_3
\end{aligned} \quad (16) \\[18pt]
v_1 \leftarrow v_2 \leftarrow v_3
& \begin{aligned}
x_1 &:= \alpha_3 x_2 + \varepsilon_1 \\
x_2 &:= \beta_3 x_3 + \varepsilon_2 \\
x_3 &:= \varepsilon_3
\end{aligned} \quad (17)
& \begin{aligned}
x_1 &:= \gamma_3 \widetilde{x}_2 + \varepsilon_1 \\
x_2 &:= \lambda_3 \widetilde{x}_3 + \varepsilon_2 \\
x_3 &:= \varepsilon_3
\end{aligned} \quad (18)
\end{array}
\end{equation*}
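To make the difference between the two model classes concrete, here is a minimal sampling sketch for the chain $v_1 \rightarrow v_2 \rightarrow v_3$ written as an iSCM. It assumes Gaussian noise and standardizes each variable with its sample mean and standard deviation, which is a choice made for this sketch; the function names `standardize` and `sample_chain_iscm` and the weight values are hypothetical.

```python
import numpy as np


def standardize(x):
    """Shift and scale a sample to zero mean and unit variance."""
    return (x - x.mean()) / x.std()


def sample_chain_iscm(gamma, lam, n, rng, sigma=1.0):
    """Sample the chain v1 -> v2 -> v3 as an iSCM (sample-based standardization).

    Each variable is standardized right after it is generated, and children
    receive the standardized parent as input.
    """
    eps = rng.normal(0.0, sigma, size=(3, n))
    x1_t = standardize(eps[0])
    x2_t = standardize(gamma * x1_t + eps[1])
    x3_t = standardize(lam * x2_t + eps[2])
    return np.stack([x1_t, x2_t, x3_t])


rng = np.random.default_rng(0)
x_tilde = sample_chain_iscm(gamma=2.0, lam=-1.5, n=100_000, rng=rng)
variances = x_tilde.var(axis=1)          # unit variances by construction
corr_12 = np.corrcoef(x_tilde)[0, 1]     # approx. gamma / sqrt(gamma^2 + sigma^2)
print(variances, corr_12)
```

Because every variable is rescaled before it is passed on, the marginal variances do not grow along the causal ordering, regardless of the magnitude of the weights.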

In the following subsections, we derive the covariance matrices of each of the three systems. This leads us to the equivalence presented in Equation (5) for standardized SCMs. Moreover, we show that, for iSCMs, all three systems induce exactly the same observational distribution if and only if $\lambda_1 = \lambda_2 = \lambda_3$ and $\gamma_1 = \gamma_2 = \gamma_3$. These are the 3-node special cases of Theorems 3 and 4.

Standardized SCM

To obtain the covariances between the observed variables in the standardized SCMs of Equations (13), (15), and (17), we first show that the assignments to the observed variables in standardized SCMs can be written in the form of linear SCMs over the same causal graph, which allows us to use Lemma 1. In all three systems, every vertex has at most one parent. When the node $v_j$ is the only parent of $v_i$, under our assumptions on the noise, we have $x_j = \sqrt{\operatorname{Var}[x_j]}\, x_j^s$, so the assignment of $x_i^s$ can be written in the form of an SCM over $\mathbf{x}^s$ as

\begin{equation*}
x_i^s := \frac{x_i}{\sqrt{\operatorname{Var}[x_i]}}
= \frac{w_{j,i} x_j + \varepsilon_i}{\sqrt{\operatorname{Var}[x_i]}}
= \frac{w_{j,i} \sqrt{\operatorname{Var}[x_j]}\, x_j^s + \varepsilon_i}{\sqrt{\operatorname{Var}[x_i]}}
= w_{j,i} \sqrt{\frac{\operatorname{Var}[x_j]}{\operatorname{Var}[x_i]}}\, x_j^s + \frac{\varepsilon_i}{\sqrt{\operatorname{Var}[x_i]}} \,.
\tag{19}
\end{equation*}

To use Equation (19), we first need to compute the marginal variances of the unstandardized observations $x_i$. For the standardized SCMs, these marginal variances are, respectively:

\begin{equation*}
\begin{array}{lll}
\text{for Equation (13):} & \text{for Equation (15):} & \text{for Equation (17):} \\[4pt]
\operatorname{Var}[x_1] = \sigma^2
& \operatorname{Var}[x_1] = (\alpha_2^2 + 1)\sigma^2
& \operatorname{Var}[x_1] = (\alpha_3^2(\beta_3^2 + 1) + 1)\sigma^2 \\
\operatorname{Var}[x_2] = (\alpha_1^2 + 1)\sigma^2
& \operatorname{Var}[x_2] = \sigma^2
& \operatorname{Var}[x_2] = (\beta_3^2 + 1)\sigma^2 \\
\operatorname{Var}[x_3] = (\beta_1^2(\alpha_1^2 + 1) + 1)\sigma^2
& \operatorname{Var}[x_3] = (\beta_2^2 + 1)\sigma^2
& \operatorname{Var}[x_3] = \sigma^2
\end{array}
\end{equation*}

Given Equation (19) and the marginal variances, we know the weights of all three implied SCMs explicitly. Since all implied SCMs are linear, have unit marginal variances, and share the same causal graph, we can apply Lemma 1 and obtain the covariances of the observational distributions in the original models:

$$
\begin{array}{lll}
\text{for Equation (13):} & \text{for Equation (15):} & \text{for Equation (LABEL:eq:s3):} \\[2pt]
\operatorname{Cov}[x_1^s, x_2^s] = \tfrac{\alpha_1}{\sqrt{\alpha_1^2 + 1}} & \operatorname{Cov}[x_1^s, x_2^s] = \tfrac{\alpha_2}{\sqrt{\alpha_2^2 + 1}} & \operatorname{Cov}[x_1^s, x_2^s] = \alpha_3 \sqrt{\tfrac{\beta_3^2 + 1}{\alpha_3^2(\beta_3^2 + 1) + 1}} \\
\operatorname{Cov}[x_1^s, x_3^s] = \tfrac{\alpha_1 \beta_1}{\sqrt{\beta_1^2(\alpha_1^2 + 1) + 1}} & \operatorname{Cov}[x_1^s, x_3^s] = \tfrac{\alpha_2 \beta_2}{\sqrt{(\alpha_2^2 + 1)(\beta_2^2 + 1)}} & \operatorname{Cov}[x_1^s, x_3^s] = \tfrac{\alpha_3 \beta_3}{\sqrt{\alpha_3^2(\beta_3^2 + 1) + 1}} \\
\operatorname{Cov}[x_2^s, x_3^s] = \beta_1 \sqrt{\tfrac{\alpha_1^2 + 1}{\beta_1^2(\alpha_1^2 + 1) + 1}} & \operatorname{Cov}[x_2^s, x_3^s] = \tfrac{\beta_2}{\sqrt{\beta_2^2 + 1}} & \operatorname{Cov}[x_2^s, x_3^s] = \tfrac{\beta_3}{\sqrt{\beta_3^2 + 1}}
\end{array}
$$
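As a sanity check, the closed-form covariances for the chain of Equation (13) can be compared against the population covariance of a linear SCM computed by matrix inversion. The following is a minimal sketch, not part of the original derivation: the helper name `standardized_cov`, the convention that `W[j, i]` stores $w_{j,i}$, and the numerical values of $\alpha_1$, $\beta_1$, and $\sigma$ are all illustrative assumptions.

```python
import numpy as np


def standardized_cov(W, sigma=1.0):
    """Covariance of a post-hoc standardized linear SCM x = W^T x + eps.

    Assumed convention: W[j, i] is the weight w_{j,i} of the edge v_j -> v_i,
    and every noise variable has variance sigma^2.
    """
    d = W.shape[0]
    A = np.linalg.inv(np.eye(d) - W.T)   # x = A @ eps
    cov = sigma**2 * A @ A.T             # covariance before standardization
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)      # covariance of x^s (a correlation matrix)


# Chain v1 -> v2 -> v3 with illustrative weights (hypothetical values).
alpha1, beta1 = 1.5, -2.0
W = np.zeros((3, 3))
W[0, 1] = alpha1
W[1, 2] = beta1
C = standardized_cov(W, sigma=0.7)

# Closed-form entries from the first column of the table above.
a2, b2 = alpha1**2, beta1**2
expected_12 = alpha1 / np.sqrt(a2 + 1)
expected_13 = alpha1 * beta1 / np.sqrt(b2 * (a2 + 1) + 1)
expected_23 = beta1 * np.sqrt((a2 + 1) / (b2 * (a2 + 1) + 1))

print(np.allclose([C[0, 1], C[0, 2], C[1, 2]],
                  [expected_12, expected_13, expected_23]))
```

The noise scale $\sigma$ cancels after standardization, which is why the closed-form expressions do not depend on it.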

In the standardized SCM (13), the causal graph is $v_1 \rightarrow v_2 \rightarrow v_3$. Hence, the edge directions of the DAG $\mathcal{G}$ are consistent with the direction of increasing absolute covariance if and only if

\begin{equation}
\begin{split}
\lvert \operatorname{Cov}[x_1^s, x_2^s] \rvert < \lvert \operatorname{Cov}[x_2^s, x_3^s] \rvert
&\quad\Longleftrightarrow\quad \left\lvert \tfrac{\alpha_1}{\sqrt{\alpha_1^2 + 1}} \right\rvert < \left\lvert \beta_1 \sqrt{\tfrac{\alpha_1^2 + 1}{\beta_1^2(\alpha_1^2 + 1) + 1}} \right\rvert \\
&\quad\Longleftrightarrow\quad \tfrac{\alpha_1^2}{\alpha_1^2 + 1} < \beta_1^2 \tfrac{\alpha_1^2 + 1}{\beta_1^2(\alpha_1^2 + 1) + 1} \\
&\quad\Longleftrightarrow\quad \alpha_1^2 \left( \beta_1^2(\alpha_1^2 + 1) + 1 \right) < \beta_1^2 (\alpha_1^2 + 1)^2 \\
&\quad\Longleftrightarrow\quad \cancel{\beta_1^2 \alpha_1^4} + \cancel{\beta_1^2 \alpha_1^2} + \alpha_1^2 < \cancel{\beta_1^2 \alpha_1^4} + \cancel{2} \beta_1^2 \alpha_1^2 + \beta_1^2 \\
&\quad\Longleftrightarrow\quad \alpha_1^2 < \beta_1^2 (\alpha_1^2 + 1) \\
&\quad\Longleftrightarrow\quad \tfrac{\alpha_1^2}{\alpha_1^2 + 1} < \beta_1^2 \,.
\end{split}
\tag{20}
\end{equation}

In the above equivalences, we always multiply or divide by quantities greater than $0$, so the direction of the inequality does not change, and the transformations are equivalent. For the standardized SCM (LABEL:eq:s3) with causal graph $v_1 \leftarrow v_2 \leftarrow v_3$, we get an analogous condition for the edges to be aligned with the order of increasing absolute covariance when following the same algebraic manipulations:

$$
\lvert \operatorname{Cov}[x_3^s, x_2^s] \rvert < \lvert \operatorname{Cov}[x_2^s, x_1^s] \rvert \quad\Longleftrightarrow\quad \tfrac{\beta_3^2}{\beta_3^2 + 1} < \alpha_3^2 \,.
$$

We make use of both of these conditions in Section 4. Since $z/(z+1) < 1$ for any $z > 0$, both conditions hold if all weights have absolute value greater than $1$. In this case, the absolute covariance increases downstream in both SCMs of Equations (13) and (LABEL:eq:s3). Hence, among these two systems, only the DAG $\mathcal{G}$ whose edges align with the covariance ordering in the observed $p(\mathbf{x}^s)$ can induce $p(\mathbf{x}^s)$, and we can conclude that the other DAG is not the true causal graph.
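The orientation argument above can be illustrated numerically. The sketch below is an illustration under stated assumptions rather than a procedure from the paper: the helpers `chain_corr` and `orient_chain` are hypothetical names, the noise variance is fixed to one, and the rule only distinguishes the two chain orientations $v_1 \rightarrow v_2 \rightarrow v_3$ and $v_1 \leftarrow v_2 \leftarrow v_3$ when all weights have absolute value greater than $1$.

```python
import numpy as np


def chain_corr(w12, w23, forward=True):
    """Covariance of a standardized 3-node linear chain with unit noise variance.

    w12 and w23 are the weights on the edges between (v1, v2) and (v2, v3).
    forward=True gives v1 -> v2 -> v3, forward=False gives v1 <- v2 <- v3.
    """
    W = np.zeros((3, 3))                 # W[j, i]: weight of edge v_j -> v_i
    if forward:
        W[0, 1], W[1, 2] = w12, w23
    else:
        W[1, 0], W[2, 1] = w12, w23
    A = np.linalg.inv(np.eye(3) - W.T)   # x = A @ eps
    cov = A @ A.T
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)


def orient_chain(C):
    """Orient the chain in the direction of increasing absolute covariance."""
    return "v1->v2->v3" if abs(C[0, 1]) < abs(C[1, 2]) else "v1<-v2<-v3"


rng = np.random.default_rng(0)
results = []
for _ in range(1000):
    # Random weights with |w| > 1 and random signs (illustrative range).
    w12, w23 = rng.uniform(1.01, 3.0, size=2) * rng.choice([-1.0, 1.0], size=2)
    results.append(orient_chain(chain_corr(w12, w23, forward=True)) == "v1->v2->v3")
    results.append(orient_chain(chain_corr(w12, w23, forward=False)) == "v1<-v2<-v3")

all_correct = all(results)
print(all_correct)
```

The fork $v_1 \leftarrow v_2 \rightarrow v_3$ of Equation (15) is deliberately excluded here, because the conditions derived above only compare the two chains.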

iSCM

To derive the observational distributions of the iSCMs in Equations (14), (16), and (LABEL:eq:s3_ours), we proceed in the same way as we did for standardized SCMs. We first show that the iSCM is an SCM with a specific set of mechanisms and then apply Lemma 1 to obtain the covariances between the observed variables. To see this, we write the assignment of $\widetilde{x_i}$ as

\begin{equation}
\widetilde{x_i} := \frac{x_i}{\sqrt{\operatorname{Var}[x_i]}} = \frac{w_{j,i} \widetilde{x_j} + \varepsilon_i}{\sqrt{\operatorname{Var}[x_i]}} = \frac{w_{j,i}}{\sqrt{\operatorname{Var}[x_i]}} \widetilde{x_j} + \frac{\varepsilon_i}{\sqrt{\operatorname{Var}[x_i]}}\,. \tag{21}
\end{equation}

As before, using Equation (21) requires first computing the marginal variances of the latent variables $x_i$. For the iSCMs defined by Equations (14), (16), and (LABEL:eq:s3_ours), they are given by

$$
\begin{array}{lll}
\text{for Equation (14):} & \text{for Equation (16):} & \text{for Equation (LABEL:eq:s3_ours):} \\[2pt]
\operatorname{Var}[x_1] = \sigma^2 & \operatorname{Var}[x_1] = \gamma_2^2 + \sigma^2 & \operatorname{Var}[x_1] = \gamma_3^2 + \sigma^2 \\
\operatorname{Var}[x_2] = \gamma_1^2 + \sigma^2 & \operatorname{Var}[x_2] = \sigma^2 & \operatorname{Var}[x_2] = \lambda_3^2 + \sigma^2 \\
\operatorname{Var}[x_3] = \lambda_1^2 + \sigma^2 & \operatorname{Var}[x_3] = \lambda_2^2 + \sigma^2 & \operatorname{Var}[x_3] = \sigma^2
\end{array}
$$

Given Equation (21) and the marginal variances, we obtain an explicit form for the weights of all three implied SCMs. Since the implied SCMs are linear, have unit marginal variances, and share the same causal graph, we can apply Lemma 1 and obtain the covariances of the observational distributions in the original models. It turns out that the observational distribution of each of the three ground-truth systems $(\widetilde{x}_1, \widetilde{x}_2, \widetilde{x}_3)$ in Equations (14), (16), and (LABEL:eq:s3_ours) is a multivariate Gaussian whose covariance matrix has the same form, with the diagonal elements equal to $1$ and the off-diagonal elements given, for system $i \in \{1, 2, 3\}$, by

\begin{equation}
\begin{split}
\operatorname{Cov}[\widetilde{x}_1, \widetilde{x}_2] &= \frac{\gamma_i}{\sqrt{\gamma_i^2 + \sigma^2}} \\
\operatorname{Cov}[\widetilde{x}_1, \widetilde{x}_3] &= \frac{\gamma_i \lambda_i}{\sqrt{(\lambda_i^2 + \sigma^2)(\gamma_i^2 + \sigma^2)}} \\
\operatorname{Cov}[\widetilde{x}_2, \widetilde{x}_3] &= \frac{\lambda_i}{\sqrt{\lambda_i^2 + \sigma^2}}
\end{split}
\tag{22}
\end{equation}

Since the observational distribution of all three SCMs is a zero-centered multivariate Gaussian, the distributions are equal if and only if their covariance matrices are identical. The covariances are equal if and only if $\lambda_1 = \lambda_2 = \lambda_3$ and $\gamma_1 = \gamma_2 = \gamma_3$, because the function $f(z) = z / \sqrt{z^2 + \sigma^2}$ appearing in $\operatorname{Cov}[\widetilde{x}_1, \widetilde{x}_2]$ and $\operatorname{Cov}[\widetilde{x}_2, \widetilde{x}_3]$ of Equation (22) is injective for any $\sigma > 0$: it is strictly increasing, since $f'(z) = \sigma^2 / (z^2 + \sigma^2)^{3/2} > 0$, which means that distinct weights $z$ are mapped to distinct covariances. Therefore, the three-node linear iSCMs in the above MEC share the same observational distribution if and only if they also share the same weights for each edge, regardless of edge orientation.

This implies that the three DAGs $\mathcal{G}$ in the MEC of Equations (14), (16), and (LABEL:eq:s3_ours) are not identifiable from $p(\widetilde{\mathbf{x}})$: given $p(\widetilde{\mathbf{x}})$ induced by an iSCM with DAG in this 3-node MEC, the two other DAGs with the same linear function weights induce the same distribution $p(\widetilde{\mathbf{x}})$.
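The nonidentifiability statement can be checked numerically by computing the population covariance of a linear iSCM for each of the three DAGs with the same edge weights. This is a minimal sketch under our own assumptions: the helper `iscm_cov`, the convention that `W[j, i]` stores the weight of the edge $v_j \rightarrow v_i$, and the values chosen for $\gamma$, $\lambda$, and $\sigma$ are illustrative and not taken from the paper.

```python
import numpy as np


def iscm_cov(W, order, sigma=1.0):
    """Population covariance of the standardized variables of a linear iSCM.

    Assumed convention: W[j, i] is the weight of edge v_j -> v_i, `order` is a
    topological ordering of the nodes, and all noise variances equal sigma^2.
    """
    d = W.shape[0]
    B = np.zeros((d, d))                  # x_tilde = B @ eps
    for i in order:
        row = W[:, i] @ B                 # weighted sum of standardized parents
        row[i] += 1.0                     # plus the additive noise eps_i
        # Var[x_i] = sigma^2 * ||row||^2, so dividing by its square root
        # standardizes x_i during the generative process.
        B[i] = row / (sigma * np.linalg.norm(row))
    return sigma**2 * B @ B.T


gamma, lam, sigma = 1.7, -0.6, 0.8        # illustrative values

W_chain = np.zeros((3, 3)); W_chain[0, 1], W_chain[1, 2] = gamma, lam   # v1 -> v2 -> v3
W_fork = np.zeros((3, 3));  W_fork[1, 0],  W_fork[1, 2] = gamma, lam    # v1 <- v2 -> v3
W_rev = np.zeros((3, 3));   W_rev[1, 0],   W_rev[2, 1] = gamma, lam     # v1 <- v2 <- v3

C_chain = iscm_cov(W_chain, order=[0, 1, 2], sigma=sigma)
C_fork = iscm_cov(W_fork, order=[1, 0, 2], sigma=sigma)
C_rev = iscm_cov(W_rev, order=[2, 1, 0], sigma=sigma)

print(np.allclose(C_chain, C_fork) and np.allclose(C_chain, C_rev))
```

With $\gamma$ on the edge between $v_1$ and $v_2$ and $\lambda$ on the edge between $v_2$ and $v_3$, all three covariance matrices agree with Equation (22), independent of the edge orientation.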

C.4.2 Forests

In this section, we generalize the above partial identifiability result for standardized SCMs to arbitrary forest DAGs (Theorem 3). After that, we similarly generalize the nonidentifiability of iSCMs to forests (Theorem 4). Our results concern the identification of edge directions in an MEC represented by its partially directed graph $\tilde{\mathcal{G}} = (\mathcal{V}, \tilde{\mathcal{E}})$, where $\tilde{\mathcal{E}}$ contains both directed and undirected edges.

Standardized SCM

Before proving the main theorem, we extend the 3-node example to chains of arbitrary length. We show that all but at most one edge in the MEC can be correctly oriented from observational data using the assumption on the support of the weights. Analogous to the 3-node case, we then use this to prove a similar result for forest graphs.

Lemma 5 (Orientation of edges in undirected chains of standardized SCMs).

Let $\mathbf{x}^s$ be modeled by a standardized linear SCM (1) with chain DAG $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\operatorname{Var}[\varepsilon_i] = \sigma^2$ for non-root nodes and $\lvert w_{i,j} \rvert > 1$ for all $i \in \text{pa}(j)$. Additionally, suppose $\mathcal{G}$ contains no colliders. Then, given $p(\mathbf{x}^s)$ and the partially directed graph $\tilde{\mathcal{G}}$ representing the MEC of $\mathcal{G}$, we can identify all but at most one edge $(v_i, v_j)$ of the true DAG $\mathcal{G}$ in each undirected connected component of the MEC $\tilde{\mathcal{G}}$.
The edge that possibly remains undirected has the smallest absolute covariance among all pairs of variables connected by edges in the MEC, satisfying $\lvert \operatorname{Cov}[x^s_i, x^s_j] \rvert < \lvert \operatorname{Cov}[x^s_k, x^s_l] \rvert$ for all $(k, l) \in \tilde{\mathcal{E}} \setminus \{(i, j)\}$.
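To build intuition for the lemma, the sketch below computes the absolute covariances of adjacent variables in a collider-free chain and checks that they increase away from the root, so that the smallest one belongs to an edge adjacent to the root. It is an illustration under our own assumptions, not the proof: the helper `chain_adjacent_abs_cov` is a hypothetical name, the root noise variance is also set to $\sigma^2$, and the chain length, root position, and weight range are arbitrary choices.

```python
import numpy as np


def chain_adjacent_abs_cov(weights, root, sigma=1.0):
    """|Cov[x^s_k, x^s_{k+1}]| for every edge of a collider-free chain.

    weights[k] is the weight on the edge between v_k and v_{k+1} (0-indexed).
    All edges point away from `root`, so the chain has no collider.
    """
    d = len(weights) + 1
    W = np.zeros((d, d))                  # W[j, i]: weight of edge v_j -> v_i
    for k, w in enumerate(weights):
        if k >= root:
            W[k, k + 1] = w               # right of the root: v_k -> v_{k+1}
        else:
            W[k + 1, k] = w               # left of the root: v_{k+1} -> v_k
    A = np.linalg.inv(np.eye(d) - W.T)    # x = A @ eps
    cov = sigma**2 * A @ A.T
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)       # covariance after standardization
    return np.abs(np.array([corr[k, k + 1] for k in range(d - 1)]))


rng = np.random.default_rng(1)
root = 3
# Seven edges with |w| > 1 and random signs (illustrative range).
weights = rng.uniform(1.1, 2.0, size=7) * rng.choice([-1.0, 1.0], size=7)
c = chain_adjacent_abs_cov(weights, root)

increasing_right = bool(np.all(np.diff(c[root:]) > 0))   # downstream, right branch
increasing_left = bool(np.all(np.diff(c[:root]) < 0))    # downstream, left branch
print(increasing_right, increasing_left, int(np.argmin(c)) in (root - 1, root))
```

In this setting, every edge other than the one with the smallest absolute covariance can be oriented toward its neighbor with larger absolute covariance, which matches the statement that at most one edge per undirected component remains unoriented.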

Proof.
[Figure 7: Proof subcases of Lemma 5. Three possible subgraphs in a chain without a collider, each over the nodes $v_i$, $v_{i+1}$, $v_{i+2}$: (a) Subsystem 1, (b) Subsystem 2, (c) Subsystem 3.]

Throughout the proof, we label the nodes $v_i\in\mathcal{V}$ such that $v_{i-1}$ and $v_{i+1}$ are the neighbors of $v_i$ for $i\in\{2,\dots,d-1\}$. We start with the analysis of three arbitrary, consecutive vertices in a chain graph. The three possible subgraphs are depicted in Figure 7. We can always find $p\in\mathbb{R}$ such that the variance of the latent root of this directed subgraph is $p^2\sigma^2$. This relaxed assumption on the root node specifically allows the root of the subgraph to have parents outside the subgraph, or to be the root of the whole chain, when we later use this lemma to prove the main theorem.

We will follow similar derivations as in Section C.4.1. Specifically, we first write the observed variables of the standardized SCM in SCM form, and then invoke Lemma 1 to obtain the covariances of the observed variables. To use Equation 19, we again need to compute the marginal variances of the variables before standardization. For the subsystems in Figures 7(a) and 7(b), these are, respectively:

$$
\begin{array}{ll}
\text{for Figure 7(a):} & \text{for Figure 7(b):} \\[2pt]
\operatorname{Var}[x_i]=p^2\sigma^2 & \operatorname{Var}[x_i]=(w_{i+1,i}^2p^2+1)\sigma^2 \\
\operatorname{Var}[x_{i+1}]=(w_{i,i+1}^2p^2+1)\sigma^2 & \operatorname{Var}[x_{i+1}]=p^2\sigma^2 \\
\operatorname{Var}[x_{i+2}]=(w_{i+1,i+2}^2(w_{i,i+1}^2p^2+1)+1)\sigma^2 & \operatorname{Var}[x_{i+2}]=(w_{i+1,i+2}^2p^2+1)\sigma^2
\end{array}
$$

By substituting the expressions for the marginal variances into Equation 19, we obtain the weights of the implied models of the standardized SCM. Using Lemma 1, we obtain the covariances between the observed variables $x_i^s, x_{i+1}^s, x_{i+2}^s$. By construction, the marginal variances of the observed variables are equal to $1$. We treat each subsystem separately:

Subsystem 1 (Figure 7(a))

Given the marginal variances and Lemma 1, the covariances are

$$
\begin{aligned}
\operatorname{Cov}[x^s_i,x^s_{i+1}] &= \frac{w_{i,i+1}\,p}{\sqrt{w_{i,i+1}^2p^2+1}} \\
\operatorname{Cov}[x^s_{i+1},x^s_{i+2}] &= w_{i+1,i+2}\sqrt{\frac{w_{i,i+1}^2p^2+1}{w_{i+1,i+2}^2(w_{i,i+1}^2p^2+1)+1}}
\end{aligned}
$$

Following the same algebraic manipulations as in Equation 20, substituting $\alpha_1:=w_{i,i+1}p$ and $\beta_1:=w_{i+1,i+2}$ in the derivation, we obtain

$$
\lvert\operatorname{Cov}[x^s_i,x^s_{i+1}]\rvert<\lvert\operatorname{Cov}[x^s_{i+1},x^s_{i+2}]\rvert\quad\Longleftrightarrow\quad\frac{w_{i,i+1}^2p^2}{w_{i,i+1}^2p^2+1}<w_{i+1,i+2}^2\,. \tag{23}
$$
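The algebra behind Equation 23 can be spelled out in a few lines. The following is a reading aid that uses only the two covariance expressions of subsystem 1 together with the substitutions $\alpha_1$ and $\beta_1$; it is a sketch of one possible route and does not claim to reproduce Equation 20 step by step.

```latex
\begin{align*}
\lvert\operatorname{Cov}[x^s_i,x^s_{i+1}]\rvert < \lvert\operatorname{Cov}[x^s_{i+1},x^s_{i+2}]\rvert
&\iff \frac{\alpha_1^2}{\alpha_1^2+1} < \frac{\beta_1^2(\alpha_1^2+1)}{\beta_1^2(\alpha_1^2+1)+1}
   && \text{(square both nonnegative sides)} \\
&\iff \alpha_1^2\bigl(\beta_1^2(\alpha_1^2+1)+1\bigr) < \beta_1^2(\alpha_1^2+1)^2
   && \text{(multiply by the positive denominators)} \\
&\iff \alpha_1^2 < \beta_1^2(\alpha_1^2+1)\bigl((\alpha_1^2+1)-\alpha_1^2\bigr) = \beta_1^2(\alpha_1^2+1) \\
&\iff \frac{\alpha_1^2}{\alpha_1^2+1} < \beta_1^2 .
\end{align*}
```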

The left-hand side of the right-hand inequality in Equation 23 is strictly smaller than $1$, similar to the 3-node case. Therefore, if we assume that $\lvert w_{i+1,i+2}\rvert\geq 1$, it must hold that $\lvert\operatorname{Cov}[x^s_i,x^s_{i+1}]\rvert<\lvert\operatorname{Cov}[x^s_{i+1},x^s_{i+2}]\rvert$ for any choice of $p$.
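The claim for subsystem 1 can be checked numerically. The sketch below is illustrative only: the function name `subsystem1_abs_covariances`, the grid of values, and the default `sigma2=1.0` are assumptions made here, not part of the paper. It computes the marginal variances of the chain $v_i\rightarrow v_{i+1}\rightarrow v_{i+2}$ and uses the fact that, for a node with a single parent, the correlation with that parent equals the weight times the ratio of their standard deviations.

```python
import math

def subsystem1_abs_covariances(w1, w2, p, sigma2=1.0):
    """Absolute covariances of the standardized chain v_i -> v_{i+1} -> v_{i+2}.

    w1 plays the role of w_{i,i+1}, w2 of w_{i+1,i+2}; the root variance is p^2 * sigma2.
    """
    var_i = p ** 2 * sigma2
    var_i1 = w1 ** 2 * var_i + sigma2
    var_i2 = w2 ** 2 * var_i1 + sigma2
    # Single-parent node: correlation = weight * sd(parent) / sd(child).
    c1 = abs(w1) * math.sqrt(var_i / var_i1)
    c2 = abs(w2) * math.sqrt(var_i1 / var_i2)
    return c1, c2

# Hypothetical grid: whenever |w2| >= 1, the downstream covariance should be larger.
grid = [0.1, 0.5, 1.0, 2.0, 10.0]
holds = all(
    c1 < c2
    for w1 in grid
    for p in grid
    for w2 in (1.0, -1.5, 3.0)
    for c1, c2 in [subsystem1_abs_covariances(w1, w2, p)]
)

# With |w2| < 1 the ordering can flip, so the weight assumption matters.
flipped_c1, flipped_c2 = subsystem1_abs_covariances(2.0, 0.5, 1.0)
```

Here `holds` collects the comparison over the grid, and the last line gives one setting with $\lvert w_{i+1,i+2}\rvert<1$ in which the upstream covariance is the larger one.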

Subsystem 2 (Figure 7(b))

Given the marginal variances and Lemma 1, the covariances are

$$
\begin{aligned}
\operatorname{Cov}[x^s_i,x^s_{i+1}] &= \frac{w_{i+1,i}\,p}{\sqrt{w_{i+1,i}^2p^2+1}} \\
\operatorname{Cov}[x^s_{i+1},x^s_{i+2}] &= \frac{w_{i+1,i+2}\,p}{\sqrt{w_{i+1,i+2}^2p^2+1}}\,.
\end{aligned}
$$

The ordering of the covariances in this case depends on the specific choice of the weights.
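A small worked example makes this dependence concrete. The numbers below ($p=1$, weights $2$ and $3$) are illustrative choices, not values from the paper. Since $a\mapsto a/\sqrt{a^2+1}$ is increasing for $a\geq 0$, the edge with the larger absolute weight has the larger absolute covariance, and equal absolute weights give equal absolute covariances.

```latex
\begin{align*}
p = 1,\; w_{i+1,i} = 2,\; w_{i+1,i+2} = 3:\qquad
\lvert\operatorname{Cov}[x^s_i,x^s_{i+1}]\rvert &= \tfrac{2}{\sqrt{5}} \approx 0.894,
&\lvert\operatorname{Cov}[x^s_{i+1},x^s_{i+2}]\rvert &= \tfrac{3}{\sqrt{10}} \approx 0.949, \\
p = 1,\; w_{i+1,i} = 3,\; w_{i+1,i+2} = 2:\qquad
\lvert\operatorname{Cov}[x^s_i,x^s_{i+1}]\rvert &= \tfrac{3}{\sqrt{10}} \approx 0.949,
&\lvert\operatorname{Cov}[x^s_{i+1},x^s_{i+2}]\rvert &= \tfrac{2}{\sqrt{5}} \approx 0.894 .
\end{align*}
```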

Subsystem 3 (Figure 7(c))

Following steps analogous to the symmetric subsystem 1, we conclude that, if $\lvert w_{i+1,i}\rvert\geq 1$, it must hold that $\lvert\operatorname{Cov}[x^s_i,x^s_{i+1}]\rvert>\lvert\operatorname{Cov}[x^s_{i+1},x^s_{i+2}]\rvert$ for any $p$.


Given the above, we can now study the relationship between the underlying DAG $\mathcal{G}$ and the absolute covariance magnitudes under the assumption that $\lvert w_{i,i+1}\rvert>1$. We will use the fact that, if the chain does not contain a collider, then there can be at most one node whose two incident edges point in opposite directions, namely both away from it.

First, we treat the case where there exists a vertex $v_i$ such that $\lvert\operatorname{Cov}[x^s_{i-1},x^s_i]\rvert=\lvert\operatorname{Cov}[x^s_i,x^s_{i+1}]\rvert$, that is, where some neighboring covariances are equal. If this occurs in a 3-node subsystem, only subsystem 2 can describe the true graph, since subsystems 1 and 3 imply strict inequalities. To be consistent with the assumption that there are no colliders in the graph (see Lemma 5), all other edges must be oriented in a direction away from $v_i$, which completely identifies the graph $\mathcal{G}$ in the MEC.

In the second case, $\lvert\operatorname{Cov}[x^s_{j-1},x^s_j]\rvert\neq\lvert\operatorname{Cov}[x^s_j,x^s_{j+1}]\rvert$ holds for all nodes $v_j$ that have two neighbors in the path. Let $x^s_i,x^s_{i+1}$ be a pair of consecutive variables in the chain that minimizes $\lvert\operatorname{Cov}[x^s_i,x^s_{i+1}]\rvert$. We show by contradiction that this minimizer is unique.
Suppose there exist two pairs $x^s_i,x^s_{i+1}$ and $x^s_j,x^s_{j+1}$ such that $\lvert\operatorname{Cov}[x^s_i,x^s_{i+1}]\rvert=\lvert\operatorname{Cov}[x^s_j,x^s_{j+1}]\rvert$ is the minimum absolute covariance. Without loss of generality, let $j+1<i$.
Then, the triple $x^s_{i-1},x^s_i,x^s_{i+1}$ is consistent with only subsystems 2 or 3 based on their relative covariances, which implies that we must have $v_{i-1}\leftarrow v_i$. Using the fact that we have no colliders, we can then orient all edges $v_{k-1}\leftarrow v_k$ for $1<k<i$.
Thus, we can find a subsystem containing $v_j,v_{j+1},v_{j+2}$, which has already been oriented as subsystem 3, meaning $\lvert\operatorname{Cov}[x^s_j,x^s_{j+1}]\rvert>\lvert\operatorname{Cov}[x^s_{j+1},x^s_{j+2}]\rvert$. This contradicts the minimality of $\lvert\operatorname{Cov}[x^s_j,x^s_{j+1}]\rvert$.

Given that $x^s_i,x^s_{i+1}$ is the unique pair of consecutive variables that minimizes $\lvert\operatorname{Cov}[x^s_i,x^s_{i+1}]\rvert$, we now show that we can orient all edges except $(v_i,v_{i+1})$. We will do this in two parts. First, we show that one can orient all edges $(v_j,v_{j+1})$ with $j<i$, and then we show that we can do the same for all edges $(v_j,v_{j+1})$ with $j>i$. If $i>1$, consider the subsystem $v_{i-1},v_i,v_{i+1}$.
Since $\lvert\operatorname{Cov}[x^s_{i-1},x^s_i]\rvert>\lvert\operatorname{Cov}[x^s_i,x^s_{i+1}]\rvert$, only subsystems 2 and 3 are possible for this subgraph. We can therefore orient $v_{i-1}\leftarrow v_i$. Similarly, if $i<d-1$, by a symmetric argument on $v_i,v_{i+1},v_{i+2}$, we can orient $v_{i+1}\rightarrow v_{i+2}$. Since the graph cannot contain colliders, all other edges must be oriented as $v_j\leftarrow v_{j+1}$ for $j<i$, and $v_j\rightarrow v_{j+1}$ for $j>i$.
In other words, all edges except $(v_i,v_{i+1})$ point away from the two vertices $v_i,v_{i+1}$, and one of the two variables must be the root of the chain. Therefore, if $\lvert\operatorname{Cov}[x^s_{j-1},x^s_j]\rvert\neq\lvert\operatorname{Cov}[x^s_j,x^s_{j+1}]\rvert$ holds for all vertices $v_j$ that have two neighbors, then there exists a unique covariance-minimizing pair $x^s_i,x^s_{i+1}$, and all edges except $(v_i,v_{i+1})$ are oriented.

The two cases above are exhaustive, and in the worst case at most one edge $(v_j,v_{j+1})$ is left unoriented in the chain. This edge always corresponds to the minimizer of $\lvert\operatorname{Cov}[x^s_j,x^s_{j+1}]\rvert$. This completes the proof. ∎
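The two cases of the proof translate into a short orientation procedure for a chain. The sketch below is a hypothetical illustration under the assumptions of Lemma 5 (no colliders, $\lvert w\rvert>1$ on every edge, equal noise variance); the names `chain_abs_covariances` and `orient_chain`, the zero-based node indices, the tolerance `tol`, and the root variance `sigma2` (that is, $p=1$) are choices made here. With finite samples, exact equality of covariances would need a statistical test rather than a fixed tolerance.

```python
import math

def chain_abs_covariances(root, weights, sigma2=1.0):
    """|Cov| of neighbouring standardized variables in a collider-free chain.

    weights[j] is the weight on the edge between v_j and v_{j+1} (0-based);
    all edges point away from `root`, whose variance is sigma2 (p = 1).
    """
    d = len(weights) + 1
    var = [0.0] * d
    var[root] = sigma2
    covs = [0.0] * (d - 1)
    for j in range(root, d - 1):          # edges v_j -> v_{j+1}, right of the root
        var[j + 1] = weights[j] ** 2 * var[j] + sigma2
        covs[j] = abs(weights[j]) * math.sqrt(var[j] / var[j + 1])
    for j in range(root - 1, -1, -1):     # edges v_j <- v_{j+1}, left of the root
        var[j] = weights[j] ** 2 * var[j + 1] + sigma2
        covs[j] = abs(weights[j]) * math.sqrt(var[j + 1] / var[j])
    return covs

def orient_chain(abs_covs, tol=1e-9):
    """Orient chain edges from neighbouring |Cov| values.

    Entry j of the result is '<-' for v_j <- v_{j+1}, '->' for v_j -> v_{j+1},
    or None if edge j cannot be oriented.
    """
    n = len(abs_covs)
    # Case 1: two neighbouring edges have equal |Cov|; their shared node is the root.
    for j in range(1, n):
        if math.isclose(abs_covs[j - 1], abs_covs[j], rel_tol=0.0, abs_tol=tol):
            return ["<-"] * j + ["->"] * (n - j)
    # Case 2: the unique minimiser stays undirected; all other edges point away from it.
    i = min(range(n), key=abs_covs.__getitem__)
    return ["<-"] * i + [None] + ["->"] * (n - i - 1)

# Five-node chain with root v_2 and unequal weights next to the root (case 2).
case2 = orient_chain(chain_abs_covariances(2, [1.5, 2.0, 3.0, 1.2]))
# Same chain with equal absolute weights next to the root (case 1).
case1 = orient_chain(chain_abs_covariances(2, [1.5, 2.0, -2.0, 1.2]))
```

In the first example the edge between $v_1$ and $v_2$ attains the minimum and stays undirected, and the true root $v_2$ is one of its endpoints, in line with the proof.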

Remark

From the proof of Lemma 5, it follows that if we are able to orient all the edges in the chain, then the root of the chain is the node joining the two edges with minimum absolute covariance. When we orient all but one edge $(v_i,v_{i+1})$, the root node of the chain is either $v_i$ or $v_{i+1}$.

We can extend Lemma 5 to forest graphs. For this, we will make use of the first Meek rule (Meek, 1995). The first Meek rule concerns an MEC $\tilde{\mathcal{G}}$ containing the undirected edges $(v_i,v_j),(v_j,v_k)$ but not the edge $(v_i,v_k)$. It states that, if one can orient $v_i\rightarrow v_j$, we must have $v_j\rightarrow v_k$.
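As an illustration of how the first Meek rule propagates orientations, the following minimal sketch applies it repeatedly to a partially directed graph. The function name `apply_meek_rule_1` and the edge representation (ordered pairs for directed edges, unordered pairs for undirected edges) are assumptions made for this example; only the first Meek rule is implemented, not the full set of rules.

```python
def apply_meek_rule_1(directed, undirected):
    """Repeatedly apply the first Meek rule.

    directed: iterable of (a, b) pairs meaning a -> b.
    undirected: iterable of (a, b) pairs meaning a - b.
    Returns the updated (directed, undirected) edge sets.
    """
    directed = set(directed)
    undirected = {frozenset(e) for e in undirected}

    def adjacent(x, y):
        return (x, y) in directed or (y, x) in directed or frozenset((x, y)) in undirected

    changed = True
    while changed:
        changed = False
        for a, b in list(directed):
            for e in list(undirected):
                if b not in e:
                    continue
                (c,) = e - {b}
                # a -> b, b - c, and a, c non-adjacent: orient b -> c.
                if c != a and not adjacent(a, c):
                    undirected.remove(e)
                    directed.add((b, c))
                    changed = True
    return directed, undirected

# Orienting a -> b forces b -> c and then c -> d along a path.
path_directed, path_undirected = apply_meek_rule_1({("a", "b")}, {("b", "c"), ("c", "d")})
# If a and c are adjacent, the rule does not apply to the edge b - c.
tri_directed, tri_undirected = apply_meek_rule_1({("a", "b")}, {("b", "c"), ("a", "c")})
```

On a tree, every pair of nodes at distance two is non-adjacent, so a single oriented edge propagates to all edges downstream of it.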


See Theorem 3.

Proof.

The undirected parts of an MEC $\tilde{\mathcal{G}}$ are disjoint undirected connected components. Orienting the edges in all these undirected connected components without introducing a v-structure produces a valid DAG $\mathcal{G}$ in $\tilde{\mathcal{G}}$ (Andersson et al., 1997). Each undirected connected component represents a Markov equivalence class of its own (Andersson et al., 1997). Thus, to prove the theorem, we consider these undirected connected components independently of the rest of the graph and show how to orient the edges in each undirected connected component. [Footnote 1: Orienting edges of an undirected connected component that touch a directed edge in $\tilde{\mathcal{G}}$ never introduces an additional v-structure. If a directed edge pointed into the undirected connected component, the undirected edge downstream would have had to already be directed in $\tilde{\mathcal{G}}$ by the first Meek rule. Hence, all directed edges bordering the undirected connected component must be oriented away from it, and none of the possible undirected edge orientations creates a new collider at the border node. This implies that all undirected connected components in $\tilde{\mathcal{G}}$ are upstream of the colliders and directed subgraphs of $\tilde{\mathcal{G}}$.] In the following argument, we therefore consider $\tilde{\mathcal{G}}$ to be a single undirected connected component, with no directed edges by definition, and show that we can orient all but one edge in $\tilde{\mathcal{G}}$.
This argument then extends to all undirected connected components of the original MEC $\tilde{\mathcal{G}}$, implying the statement made in Theorem 3.

If $\tilde{\mathcal{G}}$ is an undirected connected component with no directed edges, we only have to consider SCMs whose ground-truth DAG $\mathcal{G}$ is a member of this MEC $\tilde{\mathcal{G}}$ to distinguish among possible edge orientations in $\tilde{\mathcal{G}}$. In the case of undirected trees, the ground-truth DAG $\mathcal{G}$ must be a tree with no colliders and the same skeleton as $\tilde{\mathcal{G}}$, since any other DAG would belong to a different MEC.

Figure 8: Inductive step of the proof of Theorem 3. Ground-truth DAG $\mathcal{G}$ underlying an undirected connected component $\tilde{\mathcal{G}}$ in some given MEC, drawn with the nodes $v_1,\dots,v_{i-1},v_i,v_{i+1},v_{i+2},\dots,v_k$ and $u$. The nodes $\mathcal{V}_C=\{v_1,\dots,v_k\}$ form a longest chain in $\mathcal{G}$. Using Lemma 5, we can orient all edges in $\tilde{\mathcal{G}}_C$ except possibly $(v_i,v_{i+1})$ (blue). Edges like $(v_{i-1},u)$ are oriented by the first Meek rule. After applying Lemma 5, we are left with either the single undirected tree of $v_i$ (left shaded tree) or the single undirected tree consisting of $(v_i,v_{i+1})$ (blue) and both undirected trees of $v_i$ and $v_{i+1}$ (both shaded trees). Either $v_i$ or $v_{i+1}$ must be the root of $\mathcal{G}_C$.
In this specific example, $v_i$ is the root of $\mathcal{G}_C$ and is therefore the only node that can have a parent outside $\mathcal{G}_C$. Any node in $\mathcal{G}$ may have directed, outgoing edges to children in a larger MEC of which the undirected connected component $\tilde{\mathcal{G}}$ is a subgraph.

We give a proof by strong induction on the number of vertices $\lvert\mathcal{V}\rvert$ in the MEC $\tilde{\mathcal{G}}$. The base case of the induction argument is an MEC with $\lvert\mathcal{V}\rvert = 2$ nodes. This case holds trivially, since this MEC can contain at most one undirected edge. For the inductive step, we consider an undirected tree MEC $\tilde{\mathcal{G}}$ with $\lvert\mathcal{V}\rvert = d$ and assume that we can orient all but one edge of undirected tree MECs with $\lvert\mathcal{V}\rvert < d$.

Our argument proceeds by considering the longest chain of the undirected tree $\tilde{\mathcal{G}}$. We use Lemma 5 to orient all but at most one edge in this chain and then apply the first Meek rule to possibly orient additional edges in $\tilde{\mathcal{G}}$ outside the chain. After orienting these edges, we show that we have reduced the original problem of orienting all but one edge in $\tilde{\mathcal{G}}$ with $\lvert\mathcal{V}\rvert = d$ to orienting all but one edge in a single undirected connected component that has strictly fewer than $d$ nodes. This allows us to apply the inductive hypothesis and complete the proof (see Figure 8).

Consider a longest undirected chain $\tilde{\mathcal{G}}_C = (\mathcal{V}_C, \tilde{\mathcal{E}}_C)$ that is a subgraph of the undirected tree $\tilde{\mathcal{G}}$. Let $\mathcal{G}_C$ refer to the directed subgraph of the DAG $\mathcal{G}$ induced by considering only the vertices $\mathcal{V}_C$. We label the $k$ vertices in $\mathcal{V}_C$ as $v_1, \dots, v_k$, with undirected edges $(v_i, v_{i+1}) \in \tilde{\mathcal{E}}$ for all $i \in \{1, \dots, k-1\}$. The nodes $v_1$ and $v_k$ can have no undirected neighbours in $\tilde{\mathcal{G}}$ outside the chain, because otherwise we could construct a longer chain in $\tilde{\mathcal{G}}$.

The only vertex in $\mathcal{V}_C$ that can have a parent in the DAG $\mathcal{G}$ outside the chain $\mathcal{G}_C$, that is, in $\mathcal{V} \setminus \mathcal{V}_C$, is the unique root of $\mathcal{G}_C$. To see this, we first note that all nodes $v_i$ have at most one parent in $\mathcal{G}$, because any $v_i$ with $\lvert \mathrm{pa}(v_i) \rvert > 1$ in $\mathcal{G}$ would be a collider, but $\mathcal{G}$ contains no colliders. Since non-root nodes in $\mathcal{G}_C$ have an in-chain parent, they cannot have a parent outside of $\mathcal{V}_C$. Therefore, apart from the root node of $\mathcal{G}_C$ and its potential outside parent, $\mathcal{G}_C$ is completely disconnected from the rest of $\mathcal{G}$.
This implies that we may treat $\mathcal{G}_C$ as a separate standardized SCM with an undirected chain MEC, in which the potential parent of the root of $\mathcal{G}_C$ is modeled as part of the exogenous noise of the root. This allows us to apply Lemma 5 to the variables of the subgraph $\mathcal{G}_C$.

By applying Lemma 5 to $\mathcal{G}_C$, we can orient all but at most one undirected edge in $\tilde{\mathcal{G}}_C$. We split the resulting analysis into the two cases of Lemma 5, which leaves either $0$ or $1$ undirected edges. In the first case, we can orient all edges in $\tilde{\mathcal{G}}_C$ with Lemma 5. In this case, we know that the root of $\mathcal{G}_C$ is the node $v_i$ (see Remark of Lemma 5). By the first Meek rule, we can recursively orient all additional edges in $\tilde{\mathcal{G}}$ outside of $\tilde{\mathcal{G}}_C$ away from $v_i$, except for the subtrees of $\tilde{\mathcal{G}}$ connected to $v_i$ itself (Figure 8). This leaves at most a single connected undirected subtree containing $v_i$ and strictly fewer than $d$ vertices.

In the second case, we orient all but one edge $(v_i, v_{i+1})$ in $\tilde{\mathcal{G}}_C$ by applying Lemma 5. In this case, we know that the root of $\mathcal{G}_C$ is either the node $v_i$ or $v_{i+1}$ (see Remark of Lemma 5). Similar to the first case, we can recursively use the first Meek rule to orient all additional edges in $\tilde{\mathcal{G}}$ pointing away from $v_i$ and $v_{i+1}$, except for the subtrees of $\tilde{\mathcal{G}}$ connected to $v_i$ and $v_{i+1}$ themselves. Since $v_i$ and $v_{i+1}$ are connected by an undirected edge, we are left with a single connected subtree containing the undirected edge $(v_i, v_{i+1})$ that is strictly smaller than before.

In both cases, we orient at least one undirected edge of $\tilde{\mathcal{G}}$, because the longest undirected chain in $\tilde{\mathcal{G}}$ with $\lvert\mathcal{V}\rvert > 2$ has length at least $2$. We always obtain at most a single undirected connected tree component with strictly fewer than $d$ vertices, allowing us to apply the inductive hypothesis and complete the proof.
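The recursive use of the first Meek rule in the argument above can be illustrated with a minimal sketch. The function name `meek_rule_1`, the edge encoding, and the five-node example tree are assumptions made for illustration only; they are not taken from the paper.

```python
def meek_rule_1(directed, undirected):
    """Repeatedly apply the first Meek rule until nothing changes.

    directed:   set of (a, b) tuples, each meaning a -> b
    undirected: set of frozenset({b, c}), each meaning b - c
    Rule 1: if a -> b, b - c, and a and c are non-adjacent, orient b -> c.
    """
    directed = set(directed)
    undirected = set(undirected)

    def adjacent(u, v):
        return ((u, v) in directed or (v, u) in directed
                or frozenset((u, v)) in undirected)

    changed = True
    while changed:
        changed = False
        for (a, b) in list(directed):
            for edge in list(undirected):
                if b not in edge:
                    continue
                (c,) = edge - {b}  # the other endpoint of b - c
                if c != a and not adjacent(a, c):
                    undirected.remove(edge)
                    directed.add((b, c))
                    changed = True
    return directed, undirected


# Hypothetical tree 1 - 2 - 3 - 4 with an extra branch 2 - 5.
# Suppose the edge 1 -> 2 has already been oriented.
directed, undirected = meek_rule_1(
    {(1, 2)},
    {frozenset((2, 3)), frozenset((3, 4)), frozenset((2, 5))},
)
```

In this toy tree, a single known orientation propagates away from node 1 and orients every remaining edge, which mirrors how edges outside the chain are oriented away from the root in the proof.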

iSCM

See Theorem 4.

Proof.

Because we consider linear iSCMs with Gaussian noise, the implied model is a linear SCM with additive Gaussian noise (see Section A.2). Hence, the observational distribution is a multivariate Gaussian with mean zero. In iSCMs, the marginal variance of an observed variable is always $1$. Hence, we prove the statement if we show that for all $\widetilde{x}_i, \widetilde{x}_j$ in the iSCM with graph $\mathcal{G}$, and the corresponding $\widetilde{x}_i', \widetilde{x}_j'$ in the iSCM with graph $\mathcal{G}' = (\mathcal{V}, \mathcal{E}')$, we have $\operatorname{Cov}[\widetilde{x}_i, \widetilde{x}_j] = \operatorname{Cov}[\widetilde{x}_i', \widetilde{x}_j']$.

Let $\widetilde{x}_i'$ and $\widetilde{x}_j'$ be the random variables associated with the nodes $v_i$ and $v_j$ from $\mathcal{G}'$, respectively. We consider two cases. First, if there is no path between $v_i$ and $v_j$ in the skeleton of $\mathcal{G}'$, then there is no path between $v_i$ and $v_j$ in the skeleton of $\mathcal{G}$, and hence $\operatorname{Cov}[\widetilde{x}_i, \widetilde{x}_j] = \operatorname{Cov}[\widetilde{x}_i', \widetilde{x}_j'] = 0$.
In the second case, there is a path between $v_i$ and $v_j$ in the skeleton of $\mathcal{G}'$, so there also exists a path in the skeleton of $\mathcal{G}$, as both graphs have the same skeleton. Due to the acyclicity of the skeleton in forests, this path is the only one connecting $v_i$ and $v_j$ in both $\mathcal{G}$ and $\mathcal{G}'$.

Figure 9: Proof subcases of Theorem 4. Panels: (a) first subcase; (b) second subcase (more than one parent in $\mathcal{G}'$); (c) second subcase (a single parent in $\mathcal{G}'$). (a) Path with a collider, in other words, a path blocked by the empty set. In the case of forests, this configuration implies that $v_i$ and $v_j$ are $d$-separated. (b) Unblocked path connecting $v_i$ and $v_j$ with one of the path nodes having a parent both in the path and outside the path. The weight $w_{p,k}$ influences the weight $\widetilde{w}_{l,k}$ in the implied model of the iSCM. If this structure is present in a forest, it has to be present in other graphs in the same MEC. (c) Unblocked path connecting $v_i$ and $v_j$ with the only parent of $v_k$ being part of the considered path. The weight $\widetilde{w}_{l,k}$ depends only on $w_{l,k}$, irrespective of the edge direction.

We further break this second case into two subcases. In the first subcase, this path contains a collider in $\mathcal{G}$, as shown in Figure 9(a). Because the skeleton cannot have undirected cycles under the forest assumption, this collider forms a $v$-structure. $\mathcal{G}' \in \tilde{\mathcal{G}}$ implies that the same $v$-structure must be present in $\mathcal{G}'$. Hence, $v_i$ and $v_j$ are $d$-separated in both $\mathcal{G}$ and $\mathcal{G}'$. By the global Markov condition, this implies that $\widetilde{x}_i'$ and $\widetilde{x}_j'$ are independent, and that $\widetilde{x}_i$ and $\widetilde{x}_j$ are independent.
It follows that $\operatorname{Cov}[\widetilde{x}_i', \widetilde{x}_j'] = \operatorname{Cov}[\widetilde{x}_i, \widetilde{x}_j] = 0$.

In the second subcase, there exists an unblocked path between $v_i$ and $v_j$ in both $\mathcal{G}$ and $\mathcal{G}'$. Here, we denote the weight matrix associated with both iSCMs by $W := [w_{i,j}]$, with $W$ being symmetric, so that $w_{i,j} = w_{j,i}$ is the linear weight of the edge $(i,j)$ regardless of its orientation in the graph.

We now derive the analogous weights $\widetilde{W}, \widetilde{W}'$ in the implied SCMs for $\mathcal{G}, \mathcal{G}'$, respectively. Ultimately, we will demonstrate that the implied SCMs have the same weights. Specifically, we will show that $\widetilde{w}_{k,l} = \widetilde{w}_{k,l}'$. Given this, Lemma 1 implies that both iSCMs have the same covariance matrix over the observed variables.

Without loss of generality, since the node labelling is arbitrary, let $v_k$ have at least as many incoming edges as $v_l$ in $\mathcal{G}'$. We divide the analysis into two cases: $v_k$ having only $1$ parent in $\mathcal{G}'$, and $v_k$ having more than $1$ parent. The node $v_k$ must have at least one parent, since at least one of $v_k, v_l$ has an incoming edge in $\mathcal{G}'$, and we chose $v_k$ to have at least as many incoming edges as $v_l$.

More than one parent in $\mathcal{G}'$

We know that any collider in $\mathcal{G}'$ will appear as part of a $v$-structure in $\tilde{\mathcal{G}}$ due to the forest assumption, and therefore will also be a collider in $\mathcal{G}$. Therefore, if $v_k$ has more than one parent in $\mathcal{G}'$ (see Figure 9(b)), all pairs of edges incoming to $v_k$ will form $v$-structures, so $v_k$ must have exactly the same set of parents in $\mathcal{G}$.

Moreover, any two parents of $v_k$ are $d$-separated in $\mathcal{G}$ and $\mathcal{G}'$ by the forest assumption, since the blocked path going through $v_k$ is the only path connecting them. By the global Markov condition, the parents are pairwise independent. Hence, we can use Equation (11) to compute $\widetilde{w}_{k,l}, \widetilde{w}_{k,l}'$. Since the parent sets are the same between the two graphs, and $W$ is shared between the two iSCMs, the weight associated with the edge $(l,k)$ in both graphs in the implied models is given by

$$\widetilde{w}_{l,k} = \widetilde{w}_{l,k}' = \frac{w_{l,k}}{\sqrt{\sum_{u \in \mathrm{pa}(k)} w_{u,k}^{2} + \sigma^{2}}}\,. \tag{24}$$

A single parent in $\mathcal{G}'$

Let $(l,k)$ be the only incoming edge to $v_k$ in $\mathcal{G}'$, as depicted in Figure 9(c). Then, the edge connecting $v_l$ and $v_k$ in $\mathcal{G}$ is either the only incoming edge to $v_k$ or the only incoming edge to $v_l$. To see this, suppose that it was not the only incoming edge to $v_k$ or $v_l$ in $\mathcal{G}$. This would make $v_k$ or $v_l$ a collider that would be common to both graphs, implying that $v_k$ or $v_l$ would have at least two parents in $\mathcal{G}'$. We operate under the assumption that $v_k$ has at least as many parents as $v_l$, so it would imply that $v_k$ has more than one parent, contradicting the assumption we made for the case considered in this paragraph.
Irrespective of the direction, the weight associated with the edge $(l,k)$ in the skeleton of both graphs in the implied model is, similar to Equation (21), given by

$$\widetilde{w}_{l,k} = \widetilde{w}_{l,k}' = \frac{w_{l,k}}{\sqrt{w_{l,k}^{2} + \sigma^{2}}}\,. \tag{25}$$

Equations (24) and (25) show that, for the SCM form of each iSCM, the edges connecting the same nodes, irrespective of their direction in $\mathcal{G}'$ and $\mathcal{G}$, have the same weights. By Lemma 1, the covariance between any $\widetilde{x}_i$ and $\widetilde{x}_j$ can be expressed as a product of the weights in the implied SCM corresponding to the edges on the path between $v_i$ and $v_j$. Hence, $\operatorname{Cov}[\widetilde{x}_i, \widetilde{x}_j] = \operatorname{Cov}[\widetilde{x}_i', \widetilde{x}_j']$. ∎

Figure 10 shows an example for Theorem 4 for two trees from the same MEC.

Figure 10: Illustrating Theorem 4 for trees in the same MEC. Covariance matrix of observed iSCM variables for two example forests belonging to the same MEC with the same weights assigned to the edges of the skeleton.
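The statement of Theorem 4 can also be checked numerically. The following is a minimal sketch, assuming a linear iSCM in which each variable is a weighted sum of its standardized parents plus independent zero-mean noise of variance $\sigma^2$ and is then divided by its population standard deviation. The helper `iscm_covariance`, the four-node tree, and its weights are hypothetical and are not the graphs of Figure 10.

```python
import numpy as np


def iscm_covariance(order, weights, sigma2=1.0):
    """Population covariance of the standardized variables of a linear iSCM.

    order:   nodes 0..d-1 listed in a topological order of the DAG
    weights: dict mapping a directed edge (parent, child) to its weight
    """
    d = len(order)
    cov = np.eye(d)  # standardized variables have unit variance
    for pos, j in enumerate(order):
        parents = [p for (p, c) in weights if c == j]
        if not parents:
            continue  # a root is just standardized noise
        w = np.array([weights[(p, j)] for p in parents])
        # Standard deviation of x_j before standardization.
        std = np.sqrt(w @ cov[np.ix_(parents, parents)] @ w + sigma2)
        # Covariance with every variable earlier in the order.
        for m in order[:pos]:
            cov[j, m] = cov[m, j] = w @ cov[parents, m] / std
    return cov


# Two orientations of the skeleton 0 - 1, 1 - 2, 1 - 3 without v-structures,
# hence in the same MEC, sharing the same skeleton weights.
g = {(0, 1): 1.0, (1, 2): 2.0, (1, 3): 0.5}
g_prime = {(1, 0): 1.0, (2, 1): 2.0, (1, 3): 0.5}

cov_g = iscm_covariance([0, 1, 2, 3], g)
cov_g_prime = iscm_covariance([2, 1, 0, 3], g_prime)
print(np.allclose(cov_g, cov_g_prime))
```

Under these assumptions the two covariance matrices agree entry by entry, in line with Equations (24) and (25): each implied edge weight depends only on the skeleton weight and the noise variance.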
Remark

In Figure 11, we empirically demonstrate that Theorem 4 no longer holds if we drop the forest assumption. For data generated from an iSCM and two graphs from the same MEC $\tilde{\mathcal{G}}$ with the same weights assigned to the skeleton edges, we observe that the estimated covariances differ. The two systems entail different observational distributions.

Figure 11: Non-forest counterexample for Theorem 4. Covariance matrix of observed iSCM variables for two non-forests belonging to the same MEC with the same weights assigned to the edges of the skeleton.

Appendix D Background on Related Work

D.1 Heuristics for Mitigating Variance Accumulation and $\operatorname{Var}$-sortability in SCMs

Here, we review existing heuristics for avoiding exploding variances when benchmarking structure learning with linear SCMs as defined in Equation (1). We also describe how these heuristics limit the causal dependencies that can be modeled, in terms of either the correlations among the SCM variables or their cause-explained variance. Neither limitation occurs in linear iSCMs.

Scaling weights by the inverse weight norm

Mooij et al., (2020, Section 5.2) sample the edge weights in linear SCMs as $w_{i,j} \sim \operatorname{Unif}_{\pm}[0.5, 1.5]$. To achieve a comparable variance of each variable $x_j$ in the SCM, they propose re-scaling the sampled weights prior to the data-generating process as

$$w_{i,j} \leftarrow \frac{w_{i,j}}{\sqrt{1 + \sum_{i \in \mathrm{pa}(j)} w_{i,j}^{2}}}\,.$$

If all parents of $x_j$ are i.i.d. Gaussian with variance $1$, this adjustment ensures that the variance of $x_j$ is similar for all $x_j$. However, this approximation does not take into account the covariances of the parents. Moreover, since $\operatorname{Var}[\varepsilon_j]$ is unchanged, the scaling limits the strength of the causal effect that parents can have on $x_j$. For example, when $x_1 = \varepsilon_1$ and $x_2 = w x_1 + \varepsilon_2$ with $\operatorname{Var}[\varepsilon_j] = 1$ as for Mooij et al., (2020), the adjusted weight is $w' = w / \sqrt{1 + w^{2}}$, so that $\lvert w' \rvert < 1$. Thus, for any $w \neq 0$, we have

$$\lvert \mathrm{Corr}[x_1, x_2] \rvert = \frac{\lvert \operatorname{Cov}[\varepsilon_1, w' \varepsilon_1 + \varepsilon_2] \rvert}{\sqrt{\operatorname{Var}[\varepsilon_1] \operatorname{Var}[w' \varepsilon_1 + \varepsilon_2]}} = \frac{\lvert w' \rvert}{\sqrt{w'^{2} + 1}} < \frac{1}{\sqrt{2}} \approx 0.707\,.$$

This is an upper bound on the correlation between neighbouring variables that any SCM can model under the proposed re-scaling when $\operatorname{Var}[\varepsilon_j] = 1$, since additional parents decrease the parent-child correlations. By contrast, iSCMs can model any level of correlation by sampling arbitrary values of $w_{i,j}$, while guaranteeing unit-variance observations $x_j$. Intuitively, iSCMs achieve this by standardizing $x_j$ after the exogenous noise $\varepsilon_j$ is added to the endogenous contributions of the parents $\mathbf{x}_{\mathrm{pa}(j)}$, while weight scaling is done before $\varepsilon_j$ is added to $x_j$.
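The correlation bound for the two-variable example can be checked numerically. The following is a minimal sketch, assuming NumPy and unit-variance Gaussian noise; the function names `rescale_weights` and `pair_correlation` are illustrative and do not come from Mooij et al. or from the code of this paper.

```python
import numpy as np

def rescale_weights(w_parents):
    """Divide the incoming weights of one node by sqrt(1 + sum of squared weights)."""
    w_parents = np.asarray(w_parents, dtype=float)
    return w_parents / np.sqrt(1.0 + np.sum(w_parents ** 2))

def pair_correlation(w, n=100_000, seed=0):
    """Empirical |Corr[x1, x2]| for x1 = eps1 and x2 = w' * x1 + eps2,
    where w' is the re-scaled weight and both noises have unit variance."""
    rng = np.random.default_rng(seed)
    w_adj = rescale_weights([w])[0]
    x1 = rng.standard_normal(n)
    x2 = w_adj * x1 + rng.standard_normal(n)
    return abs(np.corrcoef(x1, x2)[0, 1])

if __name__ == "__main__":
    # Even very large raw weights give correlations near, but not above, 1/sqrt(2).
    for w in (0.5, 1.5, 10.0, 1000.0):
        print(w, round(pair_correlation(w), 3))
```

Up to sampling error, the printed correlations should approach $1/\sqrt{2}$ from below as $\lvert w \rvert$ grows.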

Scaling weights by the incoming variance

Squires et al., (2022, Section 5.1) sample the weights of linear SCMs as $w_{i,j} \sim \operatorname{Unif}_{\pm}[0.25, 1.0]$. Given the initial edge weights, they propose adjusting the weights during the generative process by first estimating the variance $\hat{\sigma}_j^{2}$ of $x_j$ from samples drawn under an initial level of additive noise with $\operatorname{Var}[\varepsilon_j] = 1$ and then re-scaling the weights as

$$w_{i,j} \leftarrow \frac{w_{i,j}}{\sqrt{2 \hat{\sigma}_j^{2}}}\,.$$

When using additive noise with $\operatorname{Var}[\varepsilon_j] = 0.5$ to generate the actual samples, this scaling results in $\operatorname{Var}[x_j] = 1$ with a constant fraction of cause-explained variance $\operatorname{CEV_f}[x_i] = 0.5$. In benchmarks, however, we may be interested in evaluating SCMs with arbitrary levels of cause-explained variance. iSCMs allow this by construction. Contrary to Squires et al., (2022), iSCMs scale the variables $x_j$ rather than the weights $w_{i,j}$ while leaving the exogenous noise $\varepsilon_j$ unchanged, which enables modeling arbitrarily small or large levels of unexplained variation.

D.2 Sortability Metrics

In this section, we describe the definition of a sortability metric as introduced by Reisach et al., (2024), which we use in Section 5. For a function $\tau$, $\tau$-sortability assigns a scalar in $[0, 1]$ to the variables $\mathbf{x}$ and graph $\mathcal{G}$ (with weight matrix $W_{\mathcal{G}}$) as

$$\frac{\sum_{i=1}^{d} \sum_{p_{s \rightarrow t} \in W^{i}_{\mathcal{G}}} \operatorname{incr}(\tau(\mathbf{x}, s), \tau(\mathbf{x}, t))}{\sum_{i=1}^{d} \sum_{p_{s \rightarrow t} \in W^{i}_{\mathcal{G}}} 1} \quad \text{where} \quad \operatorname{incr}(a, b) = \begin{cases} 1 & \text{if } a < b \\ \frac{1}{2} & \text{if } a = b \\ 0 & \text{if } a > b \end{cases}$$

and $W^{i}_{\mathcal{G}}$ is the $i$-th power of the adjacency matrix $W_{\mathcal{G}}$ and $p_{s \rightarrow t} \in W^{i}_{\mathcal{G}}$ if and only if at least one directed path from $v_s$ to $v_t$ of length $i$ exists in $\mathcal{G}$. If $\tau(\mathbf{x}, t) = \operatorname{Var}[x_t]$, we obtain $\operatorname{Var}$-sortability from Reisach et al., (2021). If

$$\tau(\mathbf{x}, t) = R^{2}[x_t] = 1 - \frac{\operatorname{Var}\left[x_t - \mathbb{E}[x_t \mid \mathbf{x}_{\{1, \dots, d\} \backslash \{t\}}]\right]}{\operatorname{Var}[x_t]}\,,$$

we obtain $\operatorname{R^{2}}$-sortability. Estimating $R^{2}[x_t]$ requires performing regression of $x_t$ onto $\mathbf{x}_{\{1, \dots, d\} \backslash \{t\}}$.
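The definition can be turned into a few lines of NumPy. The sketch below is an illustrative re-implementation and not the CausalDisco code used in our experiments. It assumes a binary reading of the adjacency matrix (entry `[s, t]` nonzero for an edge from `v_s` to `v_t`), compares scores with exact equality rather than a tolerance, and estimates $R^{2}$ with ordinary least squares, which is one possible choice of regression.

```python
import numpy as np

def tau_sortability(adjacency, scores):
    """tau-sortability for per-variable scores tau(x, t) and a DAG adjacency matrix."""
    A = (np.asarray(adjacency) != 0).astype(float)
    scores = np.asarray(scores, dtype=float)
    d = A.shape[0]
    reach = np.eye(d)
    numerator, denominator = 0.0, 0.0
    for _ in range(d):
        # reach[s, t] = 1 iff a directed path of the current length leads from s to t.
        reach = (reach @ A > 0).astype(float)
        s_idx, t_idx = np.nonzero(reach)
        diff = scores[t_idx] - scores[s_idx]
        numerator += np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
        denominator += len(s_idx)
    return numerator / denominator if denominator > 0 else float("nan")

def var_sortability(X, adjacency):
    """Var-sortability: scores are the empirical marginal variances."""
    return tau_sortability(adjacency, np.var(X, axis=0))

def r2_scores(X):
    """R^2 of a least-squares regression of each variable onto all others."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    out = np.empty(d)
    for t in range(d):
        design = np.column_stack([np.ones(n), np.delete(X, t, axis=1)])
        coef, *_ = np.linalg.lstsq(design, X[:, t], rcond=None)
        residual = X[:, t] - design @ coef
        out[t] = 1.0 - residual.var() / X[:, t].var()
    return out

def r2_sortability(X, adjacency):
    """R^2-sortability: scores are the estimated coefficients of determination."""
    return tau_sortability(adjacency, r2_scores(X))
```

A value near $1$ means the score tends to increase along directed paths, a value near $0$ means it tends to decrease, and $0.5$ indicates no systematic ordering.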

D.3 Structure Learning Algorithms

To complement the interpretation of the results in Section 5, we provide some background on the structure learning methods we evaluate.

Notears (Zheng et al.,, 2018)

Notears uses continuous optimization to minimize the regularized mean-squared error (MSE) between the variables modeled by a linear SCM and the observations, while enforcing a differentiable acyclicity constraint. The objective function of Notears is given by $F(\mathbf{W}) = \lVert \mathbf{X} - \mathbf{X}\mathbf{W} \rVert_F^{2} / (2n) + \lambda \lVert \mathbf{W} \rVert_1$, where $\lVert \cdot \rVert_F$ and $\lVert \cdot \rVert_1$ are the Frobenius and $\ell_1$ norms, respectively. After the objective is minimized, weights whose magnitude is below a fixed threshold are set to zero.
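For concreteness, the objective and the final thresholding step can be sketched as follows. This is only an illustration of the two formulas in NumPy, not the original Notears implementation: it omits the acyclicity constraint and the optimizer entirely, and the function names are hypothetical.

```python
import numpy as np

def notears_objective(X, W, lam):
    """Regularized least-squares loss ||X - XW||_F^2 / (2n) + lam * ||W||_1."""
    n = X.shape[0]
    residual = X - X @ W
    return 0.5 / n * np.sum(residual ** 2) + lam * np.abs(W).sum()

def threshold_weights(W, eta):
    """Return a copy of W with entries of magnitude below eta set to zero."""
    W = np.array(W, dtype=float, copy=True)
    W[np.abs(W) < eta] = 0.0
    return W

if __name__ == "__main__":
    X = np.eye(2)
    W = np.array([[0.0, 1.0], [0.0, 0.0]])
    print(notears_objective(X, W, lam=0.1))
    print(threshold_weights([[0.0, 0.05], [-0.4, 0.0]], eta=0.1))
```

The nonzero pattern of the thresholded matrix is what gets compared against the ground-truth graph.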

Avici (Lorch et al.,, 2022)

Avici is an amortized variational inference method that approximates the posterior distribution over causal structures given a dataset through a pretrained inference model. The variational approximation of Avici is a fully factorized product of Bernoulli distributions, one for every possible graph edge. The inference model is a neural network that predicts the variational parameters of the Bernoulli distributions and is trained by minimizing the expected forward KL divergence between the true posterior and the approximation. The inference model can be trained on any distribution of (synthetic) dataset-graph pairs. Lorch et al., (2022) publish the pretrained parameters of inference models trained on standardized SCMs with linear and nonlinear mechanisms, which we evaluate in this work.

SortnRegress methods (Reisach et al.,, 2021, 2024)

The SortnRegress methods order the vertices by a chosen statistic and sparsely regress every node on all of its predecessors in the obtained order. They use Lasso regression with the Bayesian Information Criterion to learn the regression function for a given variable. $\operatorname{Var}$-SortnRegress uses the estimated marginal variances as the sorting criterion. $\operatorname{R^{2}}$-SortnRegress uses the $\operatorname{R^{2}}$ coefficient of determination, estimated by regressing every variable onto all remaining variables. Rand-SortnRegress orders the vertices randomly.

Appendix E Experimental Setup

E.1 Data

Causal mechanisms

We consider systems with additive noise, where

$$f_i(\mathbf{x}, \varepsilon_i) = h_i(\mathbf{x}) + \varepsilon_i\,,$$

for a chosen function $h_i$. The Linear systems used in our experiments have causal mechanisms as defined in Equation (1). To model nonlinear systems, we use smooth nonlinear functional mechanisms as used by Lorch et al., (2022). Specifically, the function $h_i$ that models the relationship between $x_i$ and its parents is sampled from a Gaussian process

$$h_i \sim \mathcal{GP}(0, k_i)\,,$$

where $k_i$ is a squared exponential kernel $k_i(\mathbf{x}, \mathbf{x}') = c_i^{2} \exp\left(-\lVert \mathbf{x} - \mathbf{x}' \rVert_2^{2} / (2 l_i^{2})\right)$ with output scale $c_i$ and length scale $l_i$. We can approximately express the function sample $h_i$ analytically using random Fourier features (Rahimi and Recht,, 2007) by sampling

$$h_i(\mathbf{x}) = c_i \sqrt{\tfrac{2}{M}} \sum_{j=1}^{M} \alpha_j^{(i)} \cos\left(\tfrac{\omega_j^{(i)} \cdot \mathbf{x}}{l_i} + \delta_j^{(i)}\right)$$

where $\alpha_j^{(i)} \sim \mathcal{N}(0, 1)$, $\omega_j^{(i)} \sim \mathcal{N}(0, \mathbf{I})$, and $\delta_j^{(i)} \sim \operatorname{Unif}[0, 2\pi]$ are drawn independently for each of the $M$ features. In this work, we use $M = 100$.
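The random Fourier feature construction can be sketched in NumPy as follows. This is an illustrative sketch and not the code of Lorch et al.: the function name `sample_rff_function` is hypothetical, and we assume that one set of feature parameters is drawn per mechanism and then kept fixed.

```python
import numpy as np

def sample_rff_function(dim, c, l, M=100, rng=None):
    """Draw one approximate GP sample h: R^dim -> R with output scale c and length scale l."""
    rng = np.random.default_rng() if rng is None else rng
    alpha = rng.standard_normal(M)                  # feature weights, N(0, 1)
    omega = rng.standard_normal((M, dim))           # frequencies, N(0, I)
    delta = rng.uniform(0.0, 2.0 * np.pi, size=M)   # phases, Unif[0, 2*pi]

    def h(x):
        x = np.atleast_2d(np.asarray(x, dtype=float))   # shape (n, dim)
        phase = x @ omega.T / l + delta                 # shape (n, M)
        return c * np.sqrt(2.0 / M) * (np.cos(phase) @ alpha)

    return h

if __name__ == "__main__":
    h = sample_rff_function(dim=3, c=10.0, l=7.0, rng=np.random.default_rng(0))
    parents = np.random.default_rng(1).standard_normal((5, 3))
    print(h(parents))   # one function value per row of parent observations
```

Because the parameters are sampled once and captured by the closure, repeated calls to `h` evaluate the same function, which is what a fixed causal mechanism requires.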

Generating a random model

Following prior work (Section 2), we sample random systems in any simulation performed in this work by first drawing a graph $\mathcal{G}$ from the specified random graph distribution. Given the graph $\mathcal{G}$, we sample the function parameters of the structural mechanisms over $\mathcal{G}$. For linear systems, we sample $w_{i,j} \sim \operatorname{Unif}_{\pm}[a, b]$ i.i.d. for every graph edge, where $a, b$ are fixed. Similarly, for nonlinear systems, we draw for every graph vertex the length scale $l_i \sim \operatorname{Unif}[a_1, b_1]$ and the output scale $c_i \sim \operatorname{Unif}[a_2, b_2]$ with predefined $a_1, b_1, a_2, b_2$.
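As an illustration, the two sampling steps for a linear system might look as follows. The sketch makes two assumptions that are not spelled out in this appendix: $\operatorname{Unif}_{\pm}[a, b]$ is taken to be the uniform distribution over $[-b, -a] \cup [a, b]$, and an Erdős-Rényi DAG with a given expected number of edges per vertex is obtained by sampling independent edges in a random vertex order. All function names are hypothetical.

```python
import numpy as np

def sample_er_dag(d, edges_per_node, rng):
    """Random DAG on d vertices with about d * edges_per_node edges in expectation."""
    p = min(1.0, 2.0 * edges_per_node / (d - 1))
    upper = np.triu(rng.random((d, d)) < p, k=1)    # edges i -> j only for i < j
    perm = rng.permutation(d)                       # hide the ordering in the indices
    return upper[np.ix_(perm, perm)].astype(int)

def sample_signed_uniform(shape, a, b, rng):
    """Draw magnitudes from Unif[a, b] and attach a random sign."""
    magnitude = rng.uniform(a, b, size=shape)
    sign = rng.choice([-1.0, 1.0], size=shape)
    return sign * magnitude

def sample_linear_weights(adjacency, a, b, rng):
    """Weight matrix that is nonzero exactly on the edges of the graph."""
    return adjacency * sample_signed_uniform(adjacency.shape, a, b, rng)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G = sample_er_dag(20, 2, rng)
    W = sample_linear_weights(G, 0.5, 2.0, rng)
    print(G.sum(), np.abs(W[G == 1]).min(), np.abs(W[G == 1]).max())
```

Permuting rows and columns with the same permutation preserves acyclicity while ensuring that the vertex indices carry no information about the causal order.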

Sampling data from a model

Given a graph $\mathcal{G}$, noise distribution $\mathcal{P}_{\boldsymbol{\varepsilon}}$, and a set of functions $\{f_1, \dots, f_d\}$, we sample $n$ datapoints from an SCM by traversing $\mathcal{G}$ in a topological ordering. For every vertex $v_i$, we draw a noise sample $\varepsilon_i \sim \mathcal{P}_{\varepsilon_i}^{n}$. The sample for $x_i$ is then deterministically computed by $f_i$ from the exogenous $\varepsilon_i$ and the parents of $x_i$. To sample from a Standardized SCM, we draw a dataset from an SCM and standardize it. To sample from an iSCM, we use Algorithm 1.
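The three sampling procedures can be contrasted in a short sketch for linear mechanisms with standard Gaussian noise. Algorithm 1 is not reproduced in this appendix, so the internal standardization below is an assumption: each variable is centered and scaled with its empirical mean and standard deviation immediately after it is computed, before any child uses it. The function names are illustrative.

```python
import numpy as np

def topological_order(adjacency):
    """Kahn's algorithm; adjacency[i, j] != 0 encodes the edge i -> j."""
    A = np.asarray(adjacency) != 0
    in_degree = A.sum(axis=0).astype(int)
    ready = [v for v in range(A.shape[0]) if in_degree[v] == 0]
    order = []
    while ready:
        v = ready.pop()
        order.append(v)
        for child in np.nonzero(A[v])[0]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                ready.append(int(child))
    if len(order) != A.shape[0]:
        raise ValueError("graph contains a cycle")
    return order

def sample_linear(W, n, rng, internal_standardization=False):
    """Sample n datapoints from a linear SCM, or from the iSCM sketch if requested."""
    d = W.shape[0]
    X = np.zeros((n, d))
    for j in topological_order(W):
        # Parents of j are already filled in; all other columns of W[:, j] are zero.
        X[:, j] = X @ W[:, j] + rng.standard_normal(n)
        if internal_standardization:
            X[:, j] = (X[:, j] - X[:, j].mean()) / X[:, j].std()
    return X

def standardize(X):
    """Post-hoc standardization of a dataset sampled from an SCM."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```

For a chain with weights of magnitude larger than one, `sample_linear(W, n, rng)` yields variances that grow along the causal order, whereas both `standardize(...)` and `internal_standardization=True` yield unit-variance columns. The latter two still differ in the implied dependence between variables.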

E.2 Experiment Configurations

Sortability

For Figures 4(a) and 14(a), we generate Erdős-Rényi graphs (Erdős and Rényi,, 1959) with expected number of edges per vertex equal to $2$ and $4$, respectively. For Figures 4(b) and 14(b), we generate undirected scale-free graphs (Barabási and Albert,, 1999) with $2$ and $4$ edges per node, respectively. Then, we order the graphs according to random topological orderings. We do not sample ordered scale-free graphs directly to avoid high sortability by in-degree, which may confound the results. For all four figures, we generate Linear systems with weights sampled from three possible distributions, $w_{i,j} \sim \operatorname{Unif}_{\pm}[0.3, 1.8]$, $w_{i,j} \sim \operatorname{Unif}_{\pm}[0.5, 2.0]$, or $w_{i,j} \sim \operatorname{Unif}_{\pm}[1.3, 3.0]$, and noise sampled from $\varepsilon_i \sim \mathcal{N}(0, 1)$. For every model configuration, we sample $100$ systems and $n = 1000$ data points each. We generate graphs of sizes $\{20, 60, 100, 140, 180, 220\}$.

Structure Learning (Section 5.2)

For Figures 5(a) and 12, we sample Linear systems with weights $w_{i,j} \sim \operatorname{Unif}_{\pm}[0.5, 2.0]$. Following Lorch et al., (2022), Nonlinear mechanisms have length scales $l_i \sim \operatorname{Unif}[7.0, 10.0]$ and output scales $c_i \sim \operatorname{Unif}[10.0, 20.0]$. Both mechanisms are defined in Appendix E.1. For Figures 13(a) and 13(b), we generate Linear systems with weights $w_{i,j} \sim \operatorname{Unif}_{\pm}[0.3, 0.8]$ and $w_{i,j} \sim \operatorname{Unif}_{\pm}[1.3, 3.0]$. For all four figures, we sample random $\operatorname{ER}(20, 2)$ and $\operatorname{ER}(100, 2)$ graphs with noise $\varepsilon_i \sim \mathcal{N}(0, 1)$. For every model configuration, we sample $20$ systems and $n = 1000$ data points each.

Noise Transfer

For Figure 5(b) (top), we sample SCMs, standardized SCMs, and iSCMs with exactly the same underlying graph and weights sampled from $w_{i,j} \sim \operatorname{Unif}_{\pm}[0.5, 2.0]$. The noise variables are drawn from $\varepsilon_i \sim \mathcal{N}(0, 1)$. Then, for every triple of SCM, standardized SCM, and iSCM that shares a graph and weights, we create two more SCMs with the same marginal variances as the SCM, but with the noise variances of the implied models of the standardized SCM and iSCM, respectively. Appendix E.5 provides a motivation and detailed explanation of this procedure. Figure 5(b) (top) shows the performance of Notears on the original SCMs and the two SCMs with transferred noise.

For Figure 5(b) (bottom), we sample multiple instances of standardized SCMs and iSCMs with weights drawn from $w_{i,j} \sim \operatorname{Unif}_{\pm}[0.5, 2.0]$ and noise from $\varepsilon_i \sim \mathcal{N}(0, 1)$. For every model instance, we approximate the density of the inverse of its implied noise variances using kernel density estimation. The figure shows the mean and standard deviation of the p.d.f. values over $100$ systems. For both figures, we use $\operatorname{ER}(100, 2)$ graphs.

E.3 Methods

Notears (Zheng et al.,, 2018)

To run Notears, we use the original implementation provided by the authors of Zheng et al., (2018) (Apache-2.0 license). Before benchmarking Notears, we run a hyperparameter search to calibrate the weight penalty ($\lambda$) and threshold on held-out instances of each data generation method. The hyperparameters can be found in Appendix E.4.

Avici (Lorch et al.,, 2022)

To evaluate Avici, we use the code and model checkpoints provided by the authors of the method (MIT license). Specifically, we use the model trained on linear data to benchmark the method on Linear systems and the model trained on nonlinear data to benchmark on Nonlinear systems. We score an edge as predicted if the probability predicted by Avici is greater than $0.5$. Since the parameters are pretrained, the method otherwise has no tunable hyperparameters.

Sortabilities and SortnRegress methods (Reisach et al.,, 2021, 2024)

To compute the sortability metrics and run the SortnRegress baselines, we use the CausalDisco library (BSD-3-Clause license) created by the authors of the method. The algorithms require no tuneable hyperparameters.

E.4 Hyperparameter Selection

To run Notears, we need to specify the regularisation strength $\lambda$ and a weight threshold $\eta$ for thresholding the final weights for graph structure prediction. To select these hyperparameters, we run a parameter search with $\lambda \in \{0.0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3\}$ and three possible values of the weight threshold, $\eta \in \{0.1, 0.2, 0.3\}$. We perform the search on separate, held-out systems that follow the same configurations as the ones we present in our final experimental results. We run Notears $20$ times per configuration and choose the median $\operatorname{F1}$ score as the criterion for selecting the best hyperparameters. Table 1 presents all final hyperparameter configurations. For some hyperparameter configurations, $1$ in $20$ runs experienced numerical issues caused by the acyclicity constraint. However, this never occurred for the selected, optimal hyperparameters, neither during the hyperparameter search nor when running the reported experiments.
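The selection procedure amounts to a small grid search scored by the median F1. The sketch below illustrates the logic under stated assumptions: `run_method` is a hypothetical callable that wraps a structure learner such as Notears and returns a predicted adjacency matrix, and the F1 score is computed over directed edges, which may differ in detail from the evaluation code used for the reported numbers.

```python
import itertools
import numpy as np

def f1_score_edges(true_adj, pred_adj):
    """F1 score of predicted directed edges against the ground-truth graph."""
    true_adj = np.asarray(true_adj) != 0
    pred_adj = np.asarray(pred_adj) != 0
    tp = np.sum(true_adj & pred_adj)
    fp = np.sum(~true_adj & pred_adj)
    fn = np.sum(true_adj & ~pred_adj)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)

def select_hyperparameters(run_method, systems, lambdas, thresholds):
    """Return (lambda, eta, median F1) maximizing the median F1 over held-out systems.

    systems is a list of (X, true_adjacency) pairs; run_method(X, lam, eta) is a
    user-supplied (hypothetical) wrapper returning a predicted adjacency matrix.
    """
    best = None
    for lam, eta in itertools.product(lambdas, thresholds):
        scores = [f1_score_edges(adj, run_method(X, lam, eta)) for X, adj in systems]
        median_f1 = float(np.median(scores))
        if best is None or median_f1 > best[2]:
            best = (lam, eta, median_f1)
    return best
```

With the grids stated above, a call would pass `lambdas=[0.0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3]` and `thresholds=[0.1, 0.2, 0.3]` together with the $20$ held-out systems of one configuration. Ties are resolved in favor of the first configuration encountered, which is an arbitrary choice of this sketch.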

Table 1: Notears hyperparameters for all experiments. Final settings for the regularization strength λ𝜆\lambdaitalic_λ and the weight threshold η𝜂\etaitalic_η after hyperparameter tuning on the respective models and data-generating processes together with the F1 (median) validation scores achieved by Notears.
(a) $\operatorname{ER}(20,2)$ DAGs, Linear mechanisms
Weight Distribution | Model | $\lambda$ | $\eta$ | F1 (median)
$\operatorname{Unif}_{\pm}[0.3,0.8]$ | SCM | 0.05 | 0.20 | 0.97
$\operatorname{Unif}_{\pm}[0.3,0.8]$ | Standardized SCM | 0.15 | 0.10 | 0.59
$\operatorname{Unif}_{\pm}[0.3,0.8]$ | iSCM | 0.15 | 0.10 | 0.57
$\operatorname{Unif}_{\pm}[0.5,2.0]$ | SCM | 0.00 | 0.30 | 0.98
$\operatorname{Unif}_{\pm}[0.5,2.0]$ | Standardized SCM | 0.15 | 0.20 | 0.30
$\operatorname{Unif}_{\pm}[0.5,2.0]$ | iSCM | 0.15 | 0.10 | 0.50
$\operatorname{Unif}_{\pm}[1.3,3.0]$ | SCM | 0.05 | 0.30 | 0.98
$\operatorname{Unif}_{\pm}[1.3,3.0]$ | Standardized SCM | 0.25 | 0.10 | 0.24
$\operatorname{Unif}_{\pm}[1.3,3.0]$ | iSCM | 0.20 | 0.10 | 0.40

(b) $\operatorname{ER}(100,2)$ DAGs, Linear mechanisms
Weight Distribution | Model | $\lambda$ | $\eta$ | F1 (median)
$\operatorname{Unif}_{\pm}[0.3,0.8]$ | SCM | 0.10 | 0.10 | 0.99
$\operatorname{Unif}_{\pm}[0.3,0.8]$ | Standardized SCM | 0.10 | 0.10 | 0.83
$\operatorname{Unif}_{\pm}[0.3,0.8]$ | iSCM | 0.10 | 0.10 | 0.84
$\operatorname{Unif}_{\pm}[0.5,2.0]$ | SCM | 0.05 | 0.30 | 0.94
$\operatorname{Unif}_{\pm}[0.5,2.0]$ | Standardized SCM | 0.15 | 0.10 | 0.47
$\operatorname{Unif}_{\pm}[0.5,2.0]$ | iSCM | 0.15 | 0.10 | 0.76
$\operatorname{Unif}_{\pm}[1.3,3.0]$ | SCM | 0.10 | 0.30 | 0.82
$\operatorname{Unif}_{\pm}[1.3,3.0]$ | Standardized SCM | 0.20 | 0.10 | 0.30
$\operatorname{Unif}_{\pm}[1.3,3.0]$ | iSCM | 0.15 | 0.10 | 0.70

(c) $\operatorname{ER}(20,2)$ DAGs, Nonlinear mechanisms
Model | $\lambda$ | $\eta$ | F1 (median)
SCM | 0.15 | 0.30 | 0.58
Standardized SCM | 0.15 | 0.10 | 0.33
iSCM | 0.15 | 0.20 | 0.42

(d) $\operatorname{ER}(100,2)$ DAGs, Nonlinear mechanisms
Model | $\lambda$ | $\eta$ | F1 (median)
SCM | 0.30 | 0.30 | 0.50
Standardized SCM | 0.15 | 0.10 | 0.43
iSCM | 0.15 | 0.10 | 0.61

(e) Noise transfer experiment: $\operatorname{ER}(100,2)$ DAGs, Linear mechanisms with $w_{ij} \sim \operatorname{Unif}_{\pm}[0.5,2.0]$
Model | $\lambda$ | $\eta$ | F1 (median)
Original | 0.05 | 0.30 | 0.96
Noise from standardized SCM | 0.10 | 0.30 | 0.72
Noise from iSCM | 0.05 | 0.30 | 0.82

E.5 Transferring Noise Variances While Keeping $\operatorname{Var}$-Sortability Unchanged

Reisach et al. (2021) show that post-hoc standardization of SCM data strongly impairs the performance of Notears. When comparing the performance of Notears between data sampled from iSCMs and standardized SCMs, there are at least two factors that can affect the performance of Notears: low $\operatorname{Var}$-sortability and the violation of the equal noise variance assumption. Our experiments in Figure 5(b) of Section 5 aim at isolating the effect of the latter. Specifically, we investigate whether Notears performs better on $\operatorname{Var}$-sortable datasets that have the noise scale patterns implied when assuming SCMs generated the data, when in fact the data was sampled from iSCMs or standardized SCMs. To achieve this, we ensure that the $\operatorname{Var}$-sortability of the data sampled from the models is the same, here close to $1$.

Given two linear SCMs $S^a$ and $S^b$ with the same underlying graph $\mathcal{G}$, our goal is to construct a system $S^t$ with the same marginal variances as $S^a$ (condition 1) and the same noise variances as $S^b$ (condition 2). For this task to be well-defined, we assume that the noise variances of the root variables in $S^a$ and $S^b$ are the same. The first step in constructing $S^t$ is to copy the noise variances from $S^b$, so that for every $i \in \{1, \dots, d\}$,

$${\sigma_i^2}^t := {\sigma_i^2}^b \,.$$

This satisfies condition 2. Given this, we define $x_i^t$ as

$$x_i^t := \sqrt{\frac{\operatorname{Var}[x_i^a] - {\sigma_i^2}^b}{\operatorname{Var}[{\mathbf{w}^a_i}^T \mathbf{x}_{\mathrm{pa}(i)}^t]}} \, {\mathbf{w}^a_i}^T \mathbf{x}_{\mathrm{pa}(i)}^t + \varepsilon_i^t \,,$$

where $\varepsilon_i^t$ has variance ${\sigma_i^2}^t$. By construction, the condition of $S^t$ sharing the noise variances with $S^b$ and the marginal variances with $S^a$ is fulfilled for the root variables. For all the remaining variables, it holds that

$$\begin{aligned}
\operatorname{Var}[x_i^t] &= \operatorname{Var}\left[\sqrt{\frac{\operatorname{Var}[x_i^a] - {\sigma_i^2}^b}{\operatorname{Var}[{\mathbf{w}^a_i}^T \mathbf{x}_{\mathrm{pa}(i)}^t]}} \, {\mathbf{w}^a_i}^T \mathbf{x}_{\mathrm{pa}(i)}^t + \varepsilon_i^t\right] \\
&= \frac{\operatorname{Var}[x_i^a] - {\sigma_i^2}^b}{\operatorname{Var}[{\mathbf{w}^a_i}^T \mathbf{x}_{\mathrm{pa}(i)}^t]} \operatorname{Var}[{\mathbf{w}^a_i}^T \mathbf{x}_{\mathrm{pa}(i)}^t] + {\sigma_i^2}^b \\
&= \operatorname{Var}[x_i^a] \,,
\end{aligned}$$

which satisfies condition 1. Since the systems $S^t$ and $S^a$ have the same marginal variances, they have the same $\operatorname{Var}$-sortability. In the noise transfer experiment of Figure 5(b), we transfer the noise variances from the implied models of iSCMs and standardized SCMs. To obtain the noise variances in the implied models, we divide the original noise variances (equal to $1$) by the marginal variances of the corresponding variables before standardization, which we estimate from $n=1000$ datapoints. For iSCMs, this corresponds to an empirical estimate of Equation (7).
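The construction above can be checked numerically at the population level. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names `covariance` and `transfer_noise` are hypothetical, the nodes are assumed to be topologically ordered (strictly upper-triangular weight matrix with `W[j, i]` the weight of edge $j \to i$), variances are computed in closed form instead of being estimated from $n=1000$ datapoints, and $\operatorname{Var}[x_i^a] > {\sigma_i^2}^b$ is assumed for every non-root variable.

```python
import numpy as np


def covariance(W, noise_var):
    """Population covariance of the linear SCM x = W^T x + eps.

    W[j, i] is the weight of edge j -> i; noise_var[i] is Var[eps_i].
    """
    d = W.shape[0]
    A = np.linalg.inv(np.eye(d) - W.T)
    return A @ np.diag(noise_var) @ A.T


def transfer_noise(W_a, noise_var_a, noise_var_b):
    """Weights of S^t: marginal variances of S^a, noise variances of S^b.

    Assumes W_a is strictly upper triangular (topological node order).
    """
    d = W_a.shape[0]
    target_var = np.diag(covariance(W_a, noise_var_a))
    W_t = np.zeros_like(W_a, dtype=float)
    for i in range(d):
        w = W_a[:, i]
        if not np.any(w):  # root variable: nothing to rescale
            continue
        # Columns >= i of W_t are still zero, so the covariance block of the
        # already-constructed ancestors of i is final at this point.
        cov_t = covariance(W_t, noise_var_b)
        parent_term_var = w @ cov_t @ w
        scale = np.sqrt((target_var[i] - noise_var_b[i]) / parent_term_var)
        W_t[:, i] = scale * w
    return W_t


# Toy example with three variables; the root shares its noise variance.
W_a = np.array([[0.0, 1.5, -0.8],
                [0.0, 0.0, 2.0],
                [0.0, 0.0, 0.0]])
noise_a = np.ones(3)
noise_b = np.array([1.0, 0.3, 0.1])

W_t = transfer_noise(W_a, noise_a, noise_b)
var_a = np.diag(covariance(W_a, noise_a))
var_t = np.diag(covariance(W_t, noise_b))
print(var_a, var_t)
```

In this toy example, the marginal variances of $S^t$ coincide with those of $S^a$ even though the noise variances are taken from $S^b$, which is what conditions 1 and 2 require.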

E.6 Compute Resources

Our experiments were run on an internal cluster. All experiments in this work were computed using CPUs with $3$ GB of memory per CPU, with the exception of the Avici runs on graphs with $100$ vertices, which used $12$ GB per CPU. The data generation takes less than a few minutes on a single CPU, with the exception of the sortability results (Section 5.1). For the sortability results, it takes around $30$ minutes to generate the datasets for a single graph specification across all weight supports and graph sizes. This is due to a larger number of configurations and repetitions than in the other experiments. For a single graph specification and across all weight supports and graph sizes, it takes around $6$ hours to compute the sortability statistics on a single CPU. Running one execution of Notears (Avici) takes approximately $2$ min ($1$ min) for $d=20$ and $30$ min ($2$ min) for $d=100$. The SortnRegress baselines run in less than $1$ min.

Appendix F Additional Experimental Results

F.1 Structure Learning

Figure 12 summarizes the structural Hamming distance (SHD) between the predicted and true graphs for the same datasets and algorithms as in Figure 5(a).
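For reference, both evaluation metrics can be computed from binary adjacency matrices as in the sketch below. This is an illustration under assumptions: the function names are hypothetical, a reversed edge is counted as a single SHD error (conventions differ, and the one used for the reported results is not stated in this section), and `A[i, j] = 1` denotes an edge $i \to j$.

```python
import numpy as np


def f1_score(pred, true):
    """F1 score over directed edges of two binary adjacency matrices."""
    tp = np.sum((pred == 1) & (true == 1))
    fp = np.sum((pred == 1) & (true == 0))
    fn = np.sum((pred == 0) & (true == 1))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 1.0


def shd(pred, true):
    """Structural Hamming distance; a reversed edge counts once (assumption)."""
    diff = np.abs(pred.astype(int) - true.astype(int))
    # A reversal appears as mismatches at (i, j) and (j, i); merge them.
    diff = diff + diff.T
    diff[diff > 1] = 1
    return int(np.sum(np.triu(diff)))


# True graph: 0 -> 1 -> 2. Prediction: 1 -> 0 (reversed), 1 -> 2, 0 -> 2 (extra).
true_graph = np.array([[0, 1, 0],
                       [0, 0, 1],
                       [0, 0, 0]])
pred_graph = np.array([[0, 0, 1],
                       [1, 0, 1],
                       [0, 0, 0]])
print(shd(pred_graph, true_graph), f1_score(pred_graph, true_graph))
```

Here the prediction contains one reversal and one extra edge, so the SHD is $2$ under the stated convention, while only one of the three predicted edges is a true positive.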

In Figures 13(a) and 13(b), we present the F1 scores and SHD attained by the structure learning algorithms on data of Linear iSCMs, SCMs, and standardized SCMs, across different weight distribution supports and graph sizes. We find that the difference in performance of Notears on data sampled from iSCM and standardized SCMs is larger for larger weight magnitudes and for bigger graphs. For smaller weights, the difference in the mean F1 score of Notears between the two standardization approaches is smaller, which is in line with our proposed explanation about the shifts of the implied noise variance distribution in Section 5.2.

In Figure 13(a), we also find that when weight magnitudes are below $1$, $\operatorname{R^2}$-SortnRegress performs similarly for both standardized SCMs and iSCMs. We also observe this for Avici. Meanwhile, for larger weights with support extending above $1$, these algorithms achieve significantly higher F1 scores on standardized SCMs. This suggests that our condition of $\lvert w_{i,j} \rvert > 1$ for all edges $(v_i, v_j)$ in the statement of Theorem 3, concerning the identifiability of linear standardized SCMs, may have a more fundamental practical significance, rather than being merely an artifact of the analysis.

[Figure 12 image; panel rows: $\operatorname{ER}(20,2)$ and $\operatorname{ER}(100,2)$]
Figure 12: SHD to the true causal graph for Linear and Nonlinear mechanisms. Box plots show median and interquartile range (IQR). Whiskers extend to the largest value inside $1.5\times$IQR from the boxes. Left (right) column shows results for linear (nonlinear) causal mechanisms with additive noise $\varepsilon_i \sim \mathcal{N}(0,1)$. Linear mechanisms have weights $w_{i,j} \sim \operatorname{Unif}_{\pm}[0.5,2.0]$.
[Figure 13 image; subfigures (a) F1 scores and (b) SHD to the true causal graph, each with panel rows $\operatorname{ER}(20,2)$ and $\operatorname{ER}(100,2)$]
Figure 13: Structure learning results for different Linear weight ranges. Results for Linear causal mechanisms with additive noise $\varepsilon_i \sim \mathcal{N}(0,1)$ and weights sampled uniformly from the support indicated above each column. Box plots show median and interquartile range (IQR). Whiskers extend to the largest value inside $1.5\times$IQR from the boxes. For every model, we sample $20$ systems and $n=1000$ data points each.

F.2 $\operatorname{R^2}$-Sortability

Figure 14 reports the $\operatorname{R^2}$-sortability statistics across varying graph sizes and weight distributions, but this time for the denser graphs $\operatorname{ER}(d, 4)$ and $\operatorname{SF}(d, 4)$. We again observe $\operatorname{R^2}$-sortability very close to $0.5$ for datasets sampled from iSCMs and high degrees of $\operatorname{R^2}$-sortability for data drawn from standardized SCMs. We omit standard SCMs from the plots, as the datasets coming from SCMs and their standardized versions have the same $\operatorname{R^2}$-sortability due to the scale-invariance of the $\operatorname{R^2}$ coefficient.
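The scale-invariance argument can be verified with a small simulation. The sketch below is illustrative only: `r2_per_variable` is a hypothetical helper that computes the $\operatorname{R^2}$ of regressing each variable on all others, which is the quantity that $\operatorname{R^2}$-sortability is built on; it does not reproduce the sortability statistic of the CausalDisco library, and the chain weights of $1.5$ are an arbitrary choice.

```python
import numpy as np


def r2_per_variable(X):
    """R^2 of regressing each column of X on all other columns."""
    Xc = X - X.mean(axis=0)  # centering replaces the intercept term
    d = Xc.shape[1]
    r2 = np.empty(d)
    for i in range(d):
        y = Xc[:, i]
        Z = np.delete(Xc, i, axis=1)
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ coef
        r2[i] = 1.0 - (resid @ resid) / (y @ y)
    return r2


# Data from a linear chain SCM with unit-variance Gaussian noise.
rng = np.random.default_rng(0)
n, d = 1000, 5
X = np.zeros((n, d))
X[:, 0] = rng.normal(size=n)
for i in range(1, d):
    X[:, i] = 1.5 * X[:, i - 1] + rng.normal(size=n)

# Post-hoc standardization rescales every column but leaves R^2 unchanged.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print(np.max(np.abs(r2_per_variable(X) - r2_per_variable(X_std))))
```

The printed difference is zero up to floating-point error, because rescaling a column rescales the fitted coefficients accordingly without changing the fraction of explained variance.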

[Figure 14 image; panels (a) $\operatorname{ER}(d,4)$ and (b) $\operatorname{SF}(d,4)$]
Figure 14: $\operatorname{R^2}$-sortability for different graph sizes. Linear standardized SCMs and iSCMs with $\varepsilon_i \sim \mathcal{N}(0,1)$ and weights drawn from uniform distributions with supports given above each plot. For every model, we sample $100$ systems and $n=1000$ data points each. Lines and shaded regions denote mean and standard deviation of $\operatorname{R^2}$-sortability across runs. Datasets that satisfy $\operatorname{R^2}$-sortability $= 0.5$ (dashed) are not $\operatorname{R^2}$-sortable.

F.3 Covariance Matrices for Figure 1

Figure 15 visualizes the full mean absolute covariance (correlation) matrices of the systems presented in Figure 1. The matrices show that the pattern of increasing mean absolute covariance in standardized SCMs is not only a feature of neighboring nodes, but also occurs for vertex pairs further apart, though less strongly. This is not the case for iSCMs, where any two pairs of equally spaced vertices have equal covariances in expectation over the weight sampling distribution.

Figure 15: Mean absolute covariance matrices for models in Figure 1. Linear standardized SCMs (left) and iSCMs (right) with $10$-variable chain DAGs from $x_1$ to $x_{10}$ and weights $w_{i,j} \sim \operatorname{Unif}_{\pm}[0.5,2.0]$ and additive noise from $\mathcal{N}(0,1)$. Mean covariances are estimated from $n=100000$ datapoints and averaged over $100000$ models. Since both models have unit marginal variances, covariance equals correlation.
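The qualitative pattern for neighboring vertices can be reproduced with population-level formulas instead of sampled data. The sketch below is a simplified illustration, not the procedure used for Figure 15: it assumes that an iSCM standardizes each variable with its population variance before passing it to its child, uses only $5000$ sampled chains, and the helper name `chain_adjacent_abs_corr` is hypothetical.

```python
import numpy as np


def chain_adjacent_abs_corr(weights, internal):
    """|corr(x_i, x_{i+1})| along a linear chain with unit-variance noise.

    internal=False: SCM (post-hoc standardization leaves correlations as is).
    internal=True:  iSCM, where each parent enters its child with unit variance.
    """
    var_prev = 1.0  # the root variable is pure unit-variance noise
    corrs = []
    for w in weights:
        var_next = w**2 * var_prev + 1.0
        corrs.append(abs(w) * np.sqrt(var_prev / var_next))
        # In an iSCM the child is standardized before it acts as a parent.
        var_prev = 1.0 if internal else var_next
    return np.array(corrs)


rng = np.random.default_rng(0)
num_models, d = 5000, 10
magnitudes = rng.uniform(0.5, 2.0, size=(num_models, d - 1))
signs = rng.choice([-1.0, 1.0], size=(num_models, d - 1))
W = magnitudes * signs  # weights from Unif_pm[0.5, 2.0]

mean_std = np.mean([chain_adjacent_abs_corr(w, internal=False) for w in W], axis=0)
mean_iscm = np.mean([chain_adjacent_abs_corr(w, internal=True) for w in W], axis=0)
print(np.round(mean_std, 3))
print(np.round(mean_iscm, 3))
```

Under these assumptions, the mean absolute correlation of adjacent pairs grows along the chain for standardized SCMs, while for iSCMs it stays constant up to sampling error in the weights.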