-
Confounder selection via iterative graph expansion
Authors:
F. Richard Guo,
Qingyuan Zhao
Abstract:
Confounder selection, namely choosing a set of covariates to control for confounding between a treatment and an outcome, is arguably the most important step in the design of observational studies. Previous methods, such as Pearl's celebrated back-door criterion, typically require pre-specifying a causal graph, which can often be difficult in practice. We propose an interactive procedure for confounder selection that does not require pre-specifying the graph or the set of observed variables. This procedure iteratively expands the causal graph by finding what we call "primary adjustment sets" for a pair of possibly confounded variables. This can be viewed as inverting a sequence of latent projections of the underlying causal graph. Structural information in the form of primary adjustment sets is elicited from the user, bit by bit, until either a set of covariates is found to control for confounding or it can be determined that no such set exists. Other information, such as the causal relations between confounders, is not required by the procedure. We show that if the user correctly specifies the primary adjustment sets in every step, our procedure is both sound and complete.
Submitted 24 October, 2023; v1 submitted 12 September, 2023;
originally announced September 2023.
-
Rank-transformed subsampling: inference for multiple data splitting and exchangeable p-values
Authors:
F. Richard Guo,
Rajen D. Shah
Abstract:
Many testing problems are readily amenable to randomised tests such as those employing data splitting. However, despite their usefulness in principle, randomised tests have obvious drawbacks. Firstly, two analyses of the same dataset may lead to different results. Secondly, the test typically loses power because it does not fully utilise the entire sample. As a remedy to these drawbacks, we study how to combine the test statistics or p-values resulting from multiple random realisations such as through random data splits. We develop rank-transformed subsampling as a general method for delivering large sample inference about the combined statistic or p-value under mild assumptions. We apply our methodology to a wide range of problems, including testing unimodality in high-dimensional data, testing goodness-of-fit of parametric quantile regression models, testing no direct effect in a sequentially randomised trial and calibrating cross-fit double machine learning confidence intervals. In contrast to existing p-value aggregation schemes that can be highly conservative, our method enjoys type-I error control that asymptotically approaches the nominal level. Moreover, compared to using the ordinary subsampling, we show that our rank transform can remove the first-order bias in approximating the null under alternatives and greatly improve power.
Submitted 22 January, 2024; v1 submitted 6 January, 2023;
originally announced January 2023.
-
Confounder Selection: Objectives and Approaches
Authors:
F. Richard Guo,
Anton Rask Lundborg,
Qingyuan Zhao
Abstract:
Confounder selection is perhaps the most important step in the design of observational studies. A number of criteria, often with different objectives and approaches, have been proposed, and their validity and practical value have been debated in the literature. Here, we provide a unified review of these criteria and the assumptions behind them. We list several objectives that confounder selection methods aim to achieve and discuss the amount of structural knowledge required by different approaches. Finally, we discuss limitations of the existing approaches and implications for practitioners.
Submitted 24 September, 2023; v1 submitted 29 August, 2022;
originally announced August 2022.
-
Variable elimination, graph reduction and efficient g-formula
Authors:
F. Richard Guo,
Emilija Perković,
Andrea Rotnitzky
Abstract:
We study efficient estimation of an interventional mean associated with a point exposure treatment under a causal graphical model represented by a directed acyclic graph without hidden variables. Under such a model, it may happen that a subset of the variables are uninformative in that failure to measure them neither precludes identification of the interventional mean nor changes the semiparametric variance bound for regular estimators of it. We develop a set of graphical criteria that are sound and complete for eliminating all the uninformative variables so that the cost of measuring them can be saved without sacrificing estimation efficiency, which could be useful when designing a planned observational or randomized study. Further, we construct a reduced directed acyclic graph on the set of informative variables only. We show that the interventional mean is identified from the marginal law by the g-formula (Robins, 1986) associated with the reduced graph, and the semiparametric variance bounds for estimating the interventional mean under the original and the reduced graphical model agree. This g-formula is an irreducible, efficient identifying formula in the sense that the nonparametric estimator of the formula, under regularity conditions, is asymptotically efficient under the original causal graphical model, and no formula with such property exists that only depends on a strict subset of the variables.
Submitted 2 December, 2022; v1 submitted 24 February, 2022;
originally announced February 2022.
-
Discussion of 'Estimating time-varying causal excursion effect in mobile health with binary outcomes' by T. Qian et al
Authors:
F. Richard Guo,
Thomas S. Richardson,
James M. Robins
Abstract:
We discuss the recent paper on "excursion effect" by T. Qian et al. (2020). We show that the methods presented have close relationships to others in the literature, in particular to a series of papers by Robins, Hernán and collaborators on analyzing observational studies as a series of randomized trials. There is also a close relationship to the history-restricted and the history-adjusted marginal structural models (MSM). Important differences and their methodological implications are clarified. We also demonstrate that the excursion effect can depend on the design and discuss its suitability for modifying the treatment protocol.
Submitted 3 March, 2021;
originally announced March 2021.
-
Minimal enumeration of all possible total effects in a Markov equivalence class
Authors:
F. Richard Guo,
Emilija Perković
Abstract:
In observational studies, when a total causal effect of interest is not identified, the set of all possible effects can be reported instead. This typically occurs when the underlying causal DAG is only known up to a Markov equivalence class, or a refinement thereof due to background knowledge. As such, the class of possible causal DAGs is represented by a maximally oriented partially directed acyclic graph (MPDAG), which contains both directed and undirected edges. We characterize the minimal additional edge orientations required to identify a given total effect. A recursive algorithm is then developed to enumerate subclasses of DAGs, such that the total effect in each subclass is identified as a distinct functional of the observed distribution. This resolves an issue with existing methods, which often report possible total effects with duplicates, namely those that are numerically distinct due to sampling variability but are in fact causally identical.
Submitted 2 March, 2021; v1 submitted 16 October, 2020;
originally announced October 2020.
-
Efficient least squares for estimating total effects under linearity and causal sufficiency
Authors:
F. Richard Guo,
Emilija Perković
Abstract:
Recursive linear structural equation models are widely used to postulate causal mechanisms underlying observational data. In these models, each variable equals a linear combination of a subset of the remaining variables plus an error term. When there is no unobserved confounding or selection bias, the error terms are assumed to be independent. We consider estimating a total causal effect in this setting. The causal structure is assumed to be known only up to a maximally oriented partially directed acyclic graph (MPDAG), a general class of graphs that can represent a Markov equivalence class of directed acyclic graphs (DAGs) with added background knowledge. We propose a simple estimator based on recursive least squares, which can consistently estimate any identified total causal effect, under point or joint intervention. We show that this estimator is the most efficient among all regular estimators that are based on the sample covariance, which includes covariate adjustment and the estimators employed by the joint-IDA algorithm. Notably, our result holds without assuming Gaussian errors.
Submitted 17 March, 2022; v1 submitted 8 August, 2020;
originally announced August 2020.
-
Empirical Bayes for Large-scale Randomized Experiments: a Spectral Approach
Authors:
F. Richard Guo,
James McQueen,
Thomas S. Richardson
Abstract:
Large-scale randomized experiments, sometimes called A/B tests, are increasingly prevalent in many industries. Though such experiments are often analyzed via frequentist $t$-tests, arguably such analyses are deficient: $p$-values are hard to interpret and not easily incorporated into decision-making. As an alternative, we propose an empirical Bayes approach, which assumes that the treatment effects are realized from a "true prior". This requires inferring the prior from previous experiments. Following Robbins, we estimate a family of marginal densities of empirical effects, indexed by the noise scale. We show that this family is characterized by the heat equation. We develop a spectral maximum likelihood estimate based on a Fourier series representation, which can be efficiently computed via convex optimization. In order to select hyperparameters and compare models, we describe two model selection criteria. We demonstrate our method on simulated and real data, and compare posterior inference to that under a Gaussian mixture model of the prior.
Submitted 25 March, 2020; v1 submitted 6 February, 2020;
originally announced February 2020.
-
On Testing Marginal versus Conditional Independence
Authors:
F. Richard Guo,
Thomas S. Richardson
Abstract:
We consider testing marginal independence versus conditional independence in a trivariate Gaussian setting. The two models are non-nested and their intersection is a union of two marginal independences. We consider two sequences of such models, one from each type of independence, that are closest to each other in the Kullback-Leibler sense as they approach the intersection. They become indistinguishable if the signal strength, as measured by the product of two correlation parameters, decreases faster than the standard parametric rate. Under local alternatives at such rate, we show that the asymptotic distribution of the likelihood ratio depends on where and how the local alternatives approach the intersection. To deal with this non-uniformity, we study a class of "envelope" distributions by taking pointwise suprema over asymptotic cumulative distribution functions. We show that these envelope distributions are well-behaved and lead to model selection procedures with rate-free uniform error guarantees and near-optimal power. To control the error even when the two models are indistinguishable, rather than insist on a dichotomous choice, the proposed procedure will choose either or both models.
Submitted 10 January, 2020; v1 submitted 5 June, 2019;
originally announced June 2019.