
Confounder selection via iterative graph expansion

Abstract: Confounder selection, namely choosing a set of covariates to control for confounding between a treatment and an outcome, is arguably the most important step in the design of observational studies. Previous methods, such as Pearl's celebrated back-door criterion, typically require pre-specifying a causal graph, which can often be difficult in practice. We propose an interactive procedure for confounder selection that does not require pre-specifying the graph or the set of observed variables. This procedure iteratively expands the causal graph by finding what we call "primary adjustment sets" for a pair of possibly confounded variables. This can be viewed as inverting a sequence of latent projections of the underlying causal graph. Structural information in the form of primary adjustment sets is elicited from the user, bit by bit, until either a set of covariates is found to control for confounding or it can be determined that no such set exists. Other information, such as the causal relations between confounders, is not required by the procedure. We show that if the user correctly specifies the primary adjustment sets in every step, our procedure is both sound and complete.

Submitted 24 October, 2023; v1 submitted 12 September, 2023; originally announced September 2023.

Comments: 29 pages; added link to Shiny web app
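For reference, the back-door criterion that the abstract above contrasts with can be checked mechanically once a DAG is fully specified. This is a minimal sketch of the textbook criterion (no adjustment variable is a descendant of the treatment, and the adjustment set d-separates treatment and outcome after the treatment's outgoing edges are deleted), with d-separation tested on the moralized ancestral graph. It is not the authors' iterative procedure; the parent-dictionary encoding and all function names are illustrative assumptions.

```python
from collections import deque


def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors."""
    seen = set(nodes)
    stack = list(nodes)
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def descendants(parents, x):
    """Return all proper descendants of x."""
    children = {}
    for v, ps in parents.items():
        for p in ps:
            children.setdefault(p, set()).add(v)
    seen, stack = set(), [x]
    while stack:
        for c in children.get(stack.pop(), ()):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


def d_separated(parents, x, y, z):
    """True if x and y are d-separated given z (moralized ancestral graph test)."""
    keep = ancestors(parents, {x, y} | set(z))
    adj = {v: set() for v in keep}
    for v in keep:
        ps = list(parents.get(v, ()))  # an ancestral set contains all parents of its members
        for i, p in enumerate(ps):
            adj[v].add(p)
            adj[p].add(v)
            for q in ps[i + 1:]:  # "marry" parents that share the child v
                adj[p].add(q)
                adj[q].add(p)
    # Delete the conditioning set, then search for any remaining path from x to y.
    seen, queue = {x}, deque([x])
    while queue:
        v = queue.popleft()
        if v == y:
            return False
        for w in adj[v] - set(z) - seen:
            seen.add(w)
            queue.append(w)
    return True


def satisfies_backdoor(parents, x, y, z):
    """Back-door criterion for the effect of x on y with adjustment set z."""
    z = set(z)
    if z & descendants(parents, x):
        return False
    # Remove every edge out of x, keeping all other edges.
    pruned = {v: [p for p in ps if p != x] for v, ps in parents.items()}
    return d_separated(pruned, x, y, z)


parents = {"L": [], "A": ["L"], "Y": ["A", "L"]}  # L -> A -> Y and L -> Y
print(satisfies_backdoor(parents, "A", "Y", {"L"}))  # True
print(satisfies_backdoor(parents, "A", "Y", set()))  # False
```

The point of contrast is the input: every call needs the whole `parents` dictionary, which is exactly the pre-specified graph the interactive procedure is designed to avoid.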

arXiv:2301.02739

Rank-transformed subsampling: inference for multiple data splitting and exchangeable p-values

Authors: F. Richard Guo, Rajen D. Shah

Abstract: Many testing problems are readily amenable to randomised tests such as those employing data splitting. However, despite their usefulness in principle, randomised tests have obvious drawbacks. Firstly, two analyses of the same dataset may lead to different results. Secondly, the test typically loses power because it does not fully utilise the entire sample. As a remedy to these drawbacks, we study how to combine the test statistics or p-values resulting from multiple random realisations, such as random data splits. We develop rank-transformed subsampling as a general method for delivering large-sample inference about the combined statistic or p-value under mild assumptions. We apply our methodology to a wide range of problems, including testing unimodality in high-dimensional data, testing goodness-of-fit of parametric quantile regression models, testing no direct effect in a sequentially randomised trial and calibrating cross-fit double machine learning confidence intervals. In contrast to existing p-value aggregation schemes that can be highly conservative, our method enjoys type-I error control that asymptotically approaches the nominal level. Moreover, compared with ordinary subsampling, we show that our rank transform can remove the first-order bias in approximating the null under alternatives and greatly improve power.

Submitted 22 January, 2024; v1 submitted 6 January, 2023; originally announced January 2023.

Comments: 83 pages; new power theory in Sec 3.2 and Appendix D, new DML example in Appendix E and various other edits
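The first drawback described above, that two analyses of the same data can disagree, is easy to reproduce. This minimal sketch assumes a hypothetical split-based test (a two-sided z-test of mean zero with known unit variance, run on a random half of the data), repeats it over many random splits of one fixed dataset, and then applies a simple conservative aggregation, min(1, 2 * median p-value), of the kind the abstract contrasts with. It does not implement rank-transformed subsampling.

```python
import math
import random
import statistics


def split_pvalue(data, rng):
    """One randomised test: two-sided z-test of mean 0 on a random half of the data.

    Assumes the variance is known to be 1 (illustrative only).
    """
    half = rng.sample(data, len(data) // 2)
    z = statistics.fmean(half) * math.sqrt(len(half))
    # Two-sided normal p-value 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def twice_median(pvalues):
    """Conservative aggregation of p-values from many splits: min(1, 2 * median)."""
    return min(1.0, 2 * statistics.median(pvalues))


rng = random.Random(0)
data = [rng.gauss(0.2, 1.0) for _ in range(200)]  # one fixed dataset
pvals = [split_pvalue(data, rng) for _ in range(50)]  # 50 random splits of it
print(f"p-values across splits range from {min(pvals):.3g} to {max(pvals):.3g}")
print(f"twice-median aggregate: {twice_median(pvals):.3g}")
```

The discarded half of each split stands in for the part of the sample that a real data-splitting test reserves for another purpose, which is the source of the power loss the abstract mentions.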

arXiv:2208.13871

Confounder Selection: Objectives and Approaches

Authors: F. Richard Guo, Anton Rask Lundborg, Qingyuan Zhao

Abstract: Confounder selection is perhaps the most important step in the design of observational studies. A number of criteria, often with different objectives and approaches, have been proposed, and their validity and practical value have been debated in the literature. Here, we provide a unified review of these criteria and the assumptions behind them. We list several objectives that confounder selection methods aim to achieve and discuss the amount of structural knowledge required by different approaches. Finally, we discuss limitations of the existing approaches and implications for practitioners.

Submitted 24 September, 2023; v1 submitted 29 August, 2022; originally announced August 2022.

Comments: 15 pages

arXiv:2202.11994

doi 10.1093/biomet/asac062

Variable elimination, graph reduction and efficient g-formula

Authors: F. Richard Guo, Emilija Perković, Andrea Rotnitzky

Abstract: We study efficient estimation of an interventional mean associated with a point exposure treatment under a causal graphical model represented by a directed acyclic graph without hidden variables. Under such a model, it may happen that a subset of the variables are uninformative in that failure to measure them neither precludes identification of the interventional mean nor changes the semiparametric variance bound for regular estimators of it. We develop a set of graphical criteria that are sound and complete for eliminating all the uninformative variables so that the cost of measuring them can be saved without sacrificing estimation efficiency, which could be useful when designing a planned observational or randomized study. Further, we construct a reduced directed acyclic graph on the set of informative variables only. We show that the interventional mean is identified from the marginal law by the g-formula (Robins, 1986) associated with the reduced graph, and the semiparametric variance bounds for estimating the interventional mean under the original and the reduced graphical model agree. This g-formula is an irreducible, efficient identifying formula in the sense that the nonparametric estimator of the formula, under regularity conditions, is asymptotically efficient under the original causal graphical model, and no formula with this property depends only on a strict subset of the variables.

Submitted 2 December, 2022; v1 submitted 24 February, 2022; originally announced February 2022.

Comments: 67 pages; to appear in Biometrika
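In the simplest point-exposure case, the g-formula mentioned in the abstract above reduces to the familiar adjustment formula E[Y(a)] = sum over l of E[Y | A = a, L = l] P(L = l). This minimal sketch computes its nonparametric plug-in estimate, assuming a single discrete variable L that is a valid adjustment set (for example, the parents of A in the DAG). It does not perform the paper's variable elimination or graph reduction, and the function name and row layout are hypothetical.

```python
from collections import Counter, defaultdict


def g_formula_mean(rows, a):
    """Plug-in estimate of E[Y(a)] = sum_l E[Y | A=a, L=l] * P(L=l).

    rows: (l, a_obs, y) tuples with a discrete adjustment variable l.
    Assumes L is a valid adjustment set (e.g. the parents of A in the DAG).
    """
    rows = list(rows)
    n = len(rows)
    stratum_size = Counter(l for l, _, _ in rows)  # n * P_hat(L = l)
    y_sum = defaultdict(float)  # sum of y over units with A = a, L = l
    count_a = Counter()  # number of units with A = a, L = l
    for l, a_obs, y in rows:
        if a_obs == a:
            y_sum[l] += y
            count_a[l] += 1
    estimate = 0.0
    for l, size in stratum_size.items():
        if count_a[l] == 0:
            raise ValueError(f"no units with A={a!r} in stratum L={l!r}; positivity fails")
        estimate += (y_sum[l] / count_a[l]) * (size / n)
    return estimate


rows = [(0, 1, 1.0), (0, 1, 3.0), (0, 0, 0.0), (1, 1, 5.0), (1, 0, 2.0), (1, 0, 4.0)]
print(g_formula_mean(rows, 1), g_formula_mean(rows, 0))  # 3.5 1.5
```

Each stratum mean is weighted by the marginal frequency of L rather than by its frequency among units with A = a, which is what distinguishes the g-formula from a naive comparison of group means.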

arXiv:2103.02323

doi 10.1093/biomet/asab029

Discussion of 'Estimating time-varying causal excursion effect in mobile health with binary outcomes' by T. Qian et al

Authors: F. Richard Guo, Thomas S. Richardson, James M. Robins

Abstract: We discuss the recent paper on "excursion effect" by T. Qian et al. (2020). We show that the methods presented have close relationships to others in the literature, in particular to a series of papers by Robins, Hernán and collaborators on analyzing observational studies as a series of randomized trials. There is also a close relationship to the history-restricted and the history-adjusted marginal structural models (MSM). Important differences and their methodological implications are clarified. We also demonstrate that the excursion effect can depend on the design and discuss its suitability for modifying the treatment protocol.

Submitted 3 March, 2021; originally announced March 2021.

Comments: Submitted to Biometrika as an invited discussion

arXiv:2010.08611

Minimal enumeration of all possible total effects in a Markov equivalence class

Authors: F. Richard Guo, Emilija Perković

Abstract: In observational studies, when a total causal effect of interest is not identified, the set of all possible effects can be reported instead. This typically occurs when the underlying causal DAG is only known up to a Markov equivalence class, or a refinement thereof due to background knowledge. As such, the class of possible causal DAGs is represented by a maximally oriented partially directed acyclic graph (MPDAG), which contains both directed and undirected edges. We characterize the minimal additional edge orientations required to identify a given total effect. A recursive algorithm is then developed to enumerate subclasses of DAGs, such that the total effect in each subclass is identified as a distinct functional of the observed distribution. This resolves an issue with existing methods, which often report possible total effects with duplicates, namely those that are numerically distinct due to sampling variability but are in fact causally identical.

Submitted 2 March, 2021; v1 submitted 16 October, 2020; originally announced October 2020.

Comments: Corrected Figure 7

arXiv:2008.03481

Efficient least squares for estimating total effects under linearity and causal sufficiency

Authors: F. Richard Guo, Emilija Perković

Abstract: Recursive linear structural equation models are widely used to postulate causal mechanisms underlying observational data. In these models, each variable equals a linear combination of a subset of the remaining variables plus an error term. When there is no unobserved confounding or selection bias, the error terms are assumed to be independent. We consider estimating a total causal effect in this setting. The causal structure is assumed to be known only up to a maximally oriented partially directed acyclic graph (MPDAG), a general class of graphs that can represent a Markov equivalence class of directed acyclic graphs (DAGs) with added background knowledge. We propose a simple estimator based on recursive least squares, which can consistently estimate any identified total causal effect, under point or joint intervention. We show that this estimator is the most efficient among all regular estimators that are based on the sample covariance, which includes covariate adjustment and the estimators employed by the joint-IDA algorithm. Notably, our result holds without assuming Gaussian errors.

Submitted 17 March, 2022; v1 submitted 8 August, 2020; originally announced August 2020.

Comments: Minor edits
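As a toy illustration of the linear setting in the abstract above, consider a fully known DAG (no undirected edges, so none of the MPDAG machinery is needed). In a recursive linear SEM with independent errors, the total effect of X on Y is the sum over directed paths of the products of edge coefficients, and each coefficient can be estimated by regressing a node on its parents. This is the classical path-tracing idea under stated assumptions, not the authors' recursive least-squares estimator; the graph X -> M -> Y plus X -> Y, the coefficients and the helper names are all hypothetical.

```python
import random


def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ols1(y, x):
    """Slope of y on x, with intercept."""
    xc, yc = center(x), center(y)
    return dot(xc, yc) / dot(xc, xc)


def ols2(y, x1, x2):
    """Slopes of y on (x1, x2), with intercept, via the 2x2 normal equations."""
    y, x1, x2 = center(y), center(x1), center(x2)
    s11, s12, s22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    r1, r2 = dot(x1, y), dot(x2, y)
    det = s11 * s22 - s12 * s12
    return (s22 * r1 - s12 * r2) / det, (s11 * r2 - s12 * r1) / det


rng = random.Random(1)
n = 5000
a_true, b_true, c_true = 0.8, 0.5, 0.7  # edge coefficients X->M, X->Y, M->Y
X = [rng.gauss(0, 1) for _ in range(n)]
M = [a_true * x + rng.gauss(0, 1) for x in X]
Y = [b_true * x + c_true * m + rng.gauss(0, 1) for x, m in zip(X, M)]

a_hat = ols1(M, X)  # regress M on its parent X
b_hat, c_hat = ols2(Y, X, M)  # regress Y on its parents X and M
total_hat = b_hat + a_hat * c_hat  # direct path X->Y plus path X->M->Y
print(total_hat)  # should be near the true total effect 0.5 + 0.8 * 0.7 = 1.06
```

Because X has no parents here, the simple regression slope of Y on X equals `b_hat + a_hat * c_hat` exactly in-sample (the omitted-variable identity); with confounders of X, or with undirected edges, that shortcut no longer applies.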

arXiv:2002.02564

Empirical Bayes for Large-scale Randomized Experiments: a Spectral Approach

Authors: F. Richard Guo, James McQueen, Thomas S. Richardson

Abstract: Large-scale randomized experiments, sometimes called A/B tests, are increasingly prevalent in many industries. Though such experiments are often analyzed via frequentist $t$-tests, arguably such analyses are deficient: $p$-values are hard to interpret and not easily incorporated into decision-making. As an alternative, we propose an empirical Bayes approach, which assumes that the treatment effects are realized from a "true prior". This requires inferring the prior from previous experiments. Following Robbins, we estimate a family of marginal densities of empirical effects, indexed by the noise scale. We show that this family is characterized by the heat equation. We develop a spectral maximum likelihood estimate based on a Fourier series representation, which can be efficiently computed via convex optimization. In order to select hyperparameters and compare models, we describe two model selection criteria. We demonstrate our method on simulated and real data, and compare posterior inference to that under a Gaussian mixture model of the prior.

Submitted 25 March, 2020; v1 submitted 6 February, 2020; originally announced February 2020.

Comments: Corrections and notational changes to Sec 4.4; added acknowledgments; some contents of Sec 2.3 are moved to the Appendix
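The link between the noise-indexed family of marginal densities and the heat equation, stated in the abstract above, can be seen in a short derivation. Assume here, as our own simplification, that an observed effect is $X = \theta + \sigma Z$ with $\theta \sim G$ (the prior) and $Z \sim N(0,1)$ independent of $\theta$, and write $s = \sigma^2$ for the noise variance; the paper's exact parameterization may differ.

```latex
\begin{aligned}
\varphi_s(x) &= (2\pi s)^{-1/2}\exp\!\left(-\frac{x^2}{2s}\right),
\qquad f_s(x) = \int \varphi_s(x-\theta)\,\mathrm{d}G(\theta), \\
\frac{\partial \varphi_s}{\partial s} &= \left(\frac{x^2}{2s^2} - \frac{1}{2s}\right)\varphi_s(x),
\qquad
\frac{\partial^2 \varphi_s}{\partial x^2} = \left(\frac{x^2}{s^2} - \frac{1}{s}\right)\varphi_s(x), \\
\Longrightarrow\quad \frac{\partial f_s}{\partial s} &= \frac{1}{2}\,\frac{\partial^2 f_s}{\partial x^2}
\quad\text{(differentiating under the integral).}
\end{aligned}
```

With $t = s/2$ this is the standard heat equation $\partial f/\partial t = \partial^2 f/\partial x^2$: the prior $G$ plays the role of the initial condition at $s = 0$, and each noise level corresponds to a later "time".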

arXiv:1906.01850

doi 10.1093/biomet/asaa040

On Testing Marginal versus Conditional Independence

Authors: F. Richard Guo, Thomas S. Richardson

Abstract: We consider testing marginal independence versus conditional independence in a trivariate Gaussian setting. The two models are non-nested and their intersection is a union of two marginal independences. We consider two sequences of such models, one from each type of independence, that are closest to each other in the Kullback-Leibler sense as they approach the intersection. They become indistinguishable if the signal strength, as measured by the product of two correlation parameters, decreases faster than the standard parametric rate. Under local alternatives at such a rate, we show that the asymptotic distribution of the likelihood ratio depends on where and how the local alternatives approach the intersection. To deal with this non-uniformity, we study a class of "envelope" distributions by taking pointwise suprema over asymptotic cumulative distribution functions. We show that these envelope distributions are well-behaved and lead to model selection procedures with rate-free uniform error guarantees and near-optimal power. To control the error even when the two models are indistinguishable, rather than insist on a dichotomous choice, the proposed procedure will choose either or both models.

Submitted 10 January, 2020; v1 submitted 5 June, 2019; originally announced June 2019.

Comments: Revisions and updated references
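The two competing models in the abstract above can be made concrete with sample correlations. In a trivariate Gaussian, marginal independence of X and Y is the DAG model X -> Z <- Y and conditional independence given Z is the DAG model X <- Z -> Y, so both likelihood-ratio statistics against the saturated model have standard closed forms. This minimal sketch computes them; it is not the paper's envelope-based selection procedure, and the function names and example correlations are illustrative assumptions.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """Partial correlation of X and Y given Z, from the three pairwise correlations."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


def lr_statistics(n, r_xy, r_xz, r_yz):
    """Likelihood-ratio statistics against the saturated trivariate Gaussian model.

    Marginal independence X _||_ Y is the DAG X -> Z <- Y; conditional
    independence X _||_ Y | Z is the DAG X <- Z -> Y. Both have closed-form MLEs.
    """
    marginal = -n * math.log1p(-(r_xy**2))  # log1p stays accurate for small r
    conditional = -n * math.log1p(-(partial_corr(r_xy, r_xz, r_yz) ** 2))
    return marginal, conditional


# With r_xy fixed, shrink the "signal strength" r_xz * r_yz: the two statistics
# move together, so the data increasingly fail to separate the two models.
for r_xz, r_yz in [(0.3, 0.2), (0.03, 0.02)]:
    m, c = lr_statistics(500, 0.1, r_xz, r_yz)
    print(f"signal {r_xz * r_yz:.4f}: marginal LR {m:.2f}, conditional LR {c:.2f}")
```

A statistic near zero favours the corresponding model; when both are small or nearly equal, choosing either or both models, as the abstract describes, is the honest outcome.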

Showing 1–9 of 9 results for author: Guo, F R