Sensitivity analysis with multiple treatments and multiple outcomes with applications to air pollution mixtures

Authors: Suyeon Kang, Alexander Franks, Joseph Antonelli

Abstract: Understanding the health impacts of air pollution is vital in public health research. Numerous studies have estimated negative health effects of a variety of pollutants, but accurately gauging these impacts remains challenging due to the potential for unmeasured confounding bias that is ubiquitous in observational studies. In this study, we develop a framework for sensitivity analysis in settings with both multiple treatments and multiple outcomes simultaneously. This setting is of particular interest because one can identify the strength of association between the unmeasured confounders and both the treatment and outcome, under a factor confounding assumption. This provides informative bounds on the causal effect leading to partial identification regions for the effects of multivariate treatments that account for the maximum possible bias from unmeasured confounding. We also show that when negative controls are available, we are able to refine the partial identification regions substantially, and in certain cases, even identify the causal effect in the presence of unmeasured confounding. We derive partial identification regions for general estimands in this setting, and develop a novel computational approach to finding these regions.

Submitted 20 November, 2023; originally announced November 2023.

Comments: 29 pages, 6 figures

arXiv:2208.06552 [pdf, other]

Sensitivity to Unobserved Confounding in Studies with Factor-structured Outcomes

Authors: Jiajing Zheng, Jiaxi Wu, Alexander D'Amour, Alexander Franks

Abstract: In this work, we propose an approach for assessing sensitivity to unobserved confounding in studies with multiple outcomes. We demonstrate how prior knowledge unique to the multi-outcome setting can be leveraged to strengthen causal conclusions beyond what can be achieved from analyzing individual outcomes in isolation. We argue that it is often reasonable to make a shared confounding assumption, under which residual dependence amongst outcomes can be used to simplify and sharpen sensitivity analyses. We focus on a class of factor models for which we can bound the causal effects for all outcomes conditional on a single sensitivity parameter that represents the fraction of treatment variance explained by unobserved confounders. We characterize how causal ignorance regions shrink under additional prior assumptions about the presence of null control outcomes, and provide new approaches for quantifying the robustness of causal effect estimates. Finally, we illustrate our sensitivity analysis workflow in practice, in an analysis of both simulated data and a case study with data from the National Health and Nutrition Examination Survey (NHANES).

Submitted 24 January, 2023; v1 submitted 12 August, 2022; originally announced August 2022.

arXiv:2111.07973 [pdf, other]

Bayesian Inference and Partial Identification in Multi-Treatment Causal Inference with Unobserved Confounding

Authors: Jiajing Zheng, Alexander D'Amour, Alexander Franks

Abstract: In causal estimation problems, the parameter of interest is often only partially identified, implying that the parameter cannot be recovered exactly, even with infinite data. Here, we study Bayesian inference for partially identified treatment effects in multi-treatment causal inference problems with unobserved confounding. In principle, inferring the partially identified treatment effects is natural under the Bayesian paradigm, but the results can be highly sensitive to parameterization and prior specification, often in surprising ways. It is thus essential to understand which aspects of the conclusions about treatment effects are driven entirely by the prior specification. We use a so-called transparent parameterization to contextualize the effects of more interpretable scientifically motivated prior specifications on the multiple effects. We demonstrate our analysis in an example quantifying the effects of gene expression levels on mouse obesity.

Submitted 23 April, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

arXiv:2110.07006 [pdf, other]

Estimating the effects of a California gun control program with Multitask Gaussian Processes

Authors: Eli Ben-Michael, David Arbour, Avi Feller, Alex Franks, Steven Raphael

Abstract: Gun violence is a critical public safety concern in the United States. In 2006 California implemented a unique firearm monitoring program, the Armed and Prohibited Persons System (APPS), to address gun violence in the state. The APPS program first identifies those firearm owners who become prohibited from owning one due to federal or state law, then confiscates their firearms. Our goal is to assess the effect of APPS on California murder rates using annual, state-level crime data across the US for the years before and after the introduction of the program. To do so, we adapt a non-parametric Bayesian approach, multitask Gaussian Processes (MTGPs), to the panel data setting. MTGPs allow for flexible and parsimonious panel data models that nest many existing approaches and allow for direct control over both dependence across time and dependence across units, as well as natural uncertainty quantification. We extend this approach to incorporate non-Normal outcomes, auxiliary covariates, and multiple outcome series, which are all important in our application. We also show that this approach has attractive Frequentist properties, including a representation as a weighting estimator with separate weights over units and time periods. Applying this approach, we find that the increased monitoring and enforcement from the APPS program substantially decreased homicides in California. We also find that the effect on murder is driven entirely by declines in gun-related murder with no measurable effect on non-gun murder. Estimated costs per murder avoided are substantially lower than conventional estimates of the value of a statistical life, suggesting a very high benefit-cost ratio for this enforcement effort.

Submitted 8 June, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

arXiv:2105.06600 [pdf, other]

Learning Gaussian Graphical Models with Latent Confounders

Authors: Ke Wang, Alexander Franks, Sang-Yun Oh

Abstract: Gaussian Graphical models (GGM) are widely used to estimate the network structures in many applications ranging from biology to finance. In practice, data is often corrupted by latent confounders which biases inference of the underlying true graphical structure. In this paper, we compare and contrast two strategies for inference in graphical models with latent confounders: Gaussian graphical models with latent variables (LVGGM) and PCA-based removal of confounding (PCA+GGM). While these two approaches have similar goals, they are motivated by different assumptions about confounding. In this paper, we explore the connection between these two approaches and propose a new method, which combines the strengths of these two approaches. We prove the consistency and convergence rate for the PCA-based method and use these results to provide guidance about when to use each method. We demonstrate the effectiveness of our methodology using both simulations and in two real-world applications.

Submitted 23 July, 2023; v1 submitted 13 May, 2021; originally announced May 2021.

arXiv:2104.05762 [pdf, other]

Deconfounding Scores: Feature Representations for Causal Effect Estimation with Weak Overlap

Authors: Alexander D'Amour, Alexander Franks

Abstract: A key condition for obtaining reliable estimates of the causal effect of a treatment is overlap (a.k.a. positivity): the distributions of the features used to perform causal adjustment cannot be too different in the treated and control groups. In cases where overlap is poor, causal effect estimators can become brittle, especially when they incorporate weighting. To address this problem, a number of proposals (including confounder selection or dimension reduction methods) incorporate feature representations to induce better overlap between the treated and control groups. A key concern in these proposals is that the representation may introduce confounding bias into the effect estimator. In this paper, we introduce deconfounding scores, which are feature representations that induce better overlap without biasing the target of estimation. We show that deconfounding scores satisfy a zero-covariance condition that is identifiable in observed data. As a proof of concept, we characterize a family of deconfounding scores in a simplified setting with Gaussian covariates, and show that in some simple simulations, these scores can be used to construct estimators with good finite-sample properties. In particular, we show that this technique could be an attractive alternative to standard regularizations that are often applied to IPW and balancing weights.

Submitted 12 April, 2021; originally announced April 2021.

Comments: A previous version of this paper was presented at the NeurIPS 2019 Causal ML workshop (https://tripods.cis.cornell.edu/neurips19_causalml/)

arXiv:2102.09412 [pdf, other]

Copula-based Sensitivity Analysis for Multi-Treatment Causal Inference with Unobserved Confounding

Authors: Jiajing Zheng, Alexander D'Amour, Alexander Franks

Abstract: Recent work has focused on the potential and pitfalls of causal identification in observational studies with multiple simultaneous treatments. Building on previous work, we show that even if the conditional distribution of unmeasured confounders given treatments were known exactly, the causal effects would not in general be identifiable, although they may be partially identified. Given these results, we propose a sensitivity analysis method for characterizing the effects of potential unmeasured confounding, tailored to the multiple treatment setting, that can be used to characterize a range of causal effects that are compatible with the observed data. Our method is based on a copula factorization of the joint distribution of outcomes, treatments, and confounders, and can be layered on top of arbitrary observed data models. We propose a practical implementation of this approach making use of the Gaussian copula, and establish conditions under which causal effects can be bounded. We also describe approaches for reasoning about effects, including calibrating sensitivity parameters, quantifying robustness of effect estimates, and selecting models that are most consistent with prior hypotheses.

Submitted 11 May, 2023; v1 submitted 18 February, 2021; originally announced February 2021.

arXiv:2010.00503 [pdf, other]

Reducing Subspace Models for Large-Scale Covariance Regression

Authors: Alexander Franks

Abstract: We develop an envelope model for joint mean and covariance regression in the large $p$, small $n$ setting. In contrast to existing envelope methods, which improve mean estimates by incorporating estimates of the covariance structure, we focus on identifying covariance heterogeneity by incorporating information about mean-level differences. We use a Monte Carlo EM algorithm to identify a low-dimensional subspace which explains differences in both means and covariances as a function of covariates, and then use MCMC to estimate the posterior uncertainty conditional on the inferred low-dimensional subspace. We demonstrate the utility of our model on a motivating application on the metabolomics of aging. We also provide R code which can be used to develop and test other generalizations of the response envelope model.

Submitted 1 October, 2020; originally announced October 2020.

arXiv:2007.10550 [pdf, other]

Modeling Player and Team Performance in Basketball

Authors: Zachary Terner, Alexander Franks

Abstract: In recent years, analytics has started to revolutionize the game of basketball: quantitative analyses of the game inform team strategy, management of player health and fitness, and how teams draft, sign, and trade players. In this review, we focus on methods for quantifying and characterizing basketball gameplay. At the team level, we discuss methods for characterizing team strategy and performance, while at the player level, we take a deep look into a myriad of tools for player evaluation. This includes metrics for overall player value, defensive ability, and shot modeling, and methods for understanding performance over multiple seasons via player production curves. We conclude with a discussion on the future of basketball analytics, and in particular highlight the need for causal inference in sports.

Submitted 20 July, 2020; originally announced July 2020.

Comments: 25 pages, 3 figures, supplement included before bibliography

arXiv:1809.00399 [pdf, other]

Flexible sensitivity analysis for observational studies without observable implications

Authors: Alexander Franks, Alexander D'Amour, Avi Feller

Abstract: A fundamental challenge in observational causal inference is that assumptions about unconfoundedness are not testable from data. Assessing sensitivity to such assumptions is therefore important in practice. Unfortunately, some existing sensitivity analysis approaches inadvertently impose restrictions that are at odds with modern causal inference methods, which emphasize flexible models for observed data. To address this issue, we propose a framework that allows (1) flexible models for the observed data and (2) clean separation of the identified and unidentified parts of the sensitivity model. Our framework extends an approach from the missing data literature, known as Tukey's factorization, to the causal inference setting. Under this factorization, we can represent the distributions of unobserved potential outcomes in terms of unidentified selection functions that posit an unidentified relationship between the treatment assignment indicator and the observed potential outcomes. The sensitivity parameters in this framework are easily interpreted, and we provide heuristics for calibrating these parameters against observable quantities. We demonstrate the flexibility of this approach in two examples, where we estimate both average treatment effects and quantile treatment effects using Bayesian nonparametric models for the observed data.

Submitted 13 January, 2019; v1 submitted 2 September, 2018; originally announced September 2018.

arXiv:1609.09830 [pdf, other]

Meta-Analytics: Tools for Understanding the Statistical Properties of Sports Metrics

Authors: Alexander Franks, Alexander D'Amour, Daniel Cervone, Luke Bornn

Abstract: In sports, there is a constant effort to improve metrics which assess player ability, but there has been almost no effort to quantify and compare existing metrics. Any individual making a management, coaching, or gambling decision is quickly overwhelmed with hundreds of statistics. We address this problem by proposing a set of "meta-metrics" which can be used to identify the metrics that provide the most unique, reliable, and useful information for decision-makers. Specifically, we develop methods to evaluate metrics based on three criteria: 1) stability: does the metric measure the same thing over time; 2) discrimination: does the metric differentiate between players; and 3) independence: does the metric provide new information? Our methods are easy to implement and widely applicable so they should be of interest to the broader sports community. We demonstrate our methods in analyses of both NBA and NHL metrics. Our results indicate the most reliable metrics and highlight how they should be used by sports analysts. The meta-metrics also provide useful insights about how to best construct new metrics which provide independent and reliable information about athletes.

Submitted 30 September, 2016; originally announced September 2016.

arXiv:1607.03045 [pdf, other]

Shared Subspace Models for Multi-Group Covariance Estimation

Authors: Alexander Franks, Peter Hoff

Abstract: We develop a model-based method for evaluating heterogeneity among several p x p covariance matrices in the large p, small n setting. This is done by assuming a spiked covariance model for each group and sharing information about the space spanned by the group-level eigenvectors. We use an empirical Bayes method to identify a low-dimensional subspace which explains variation across all groups and use an MCMC algorithm to estimate the posterior uncertainty of eigenvectors and eigenvalues on this subspace. The implementation and utility of our model is illustrated with analyses of high-dimensional multivariate gene expression.

Submitted 21 October, 2019; v1 submitted 11 July, 2016; originally announced July 2016.

arXiv:1603.06045 [pdf, other]

Non-standard conditionally specified models for non-ignorable missing data

Authors: Alexander M Franks, Edoardo M Airoldi, Donald B Rubin

Abstract: Data analyses typically rely upon assumptions about missingness mechanisms that lead to observed versus missing data. When the data are missing not at random, direct assumptions about the missingness mechanism, and indirect assumptions about the distributions of observed and missing data, are typically untestable. We explore an approach, where the joint distribution of observed data and missing data is specified through non-standard conditional distributions. In this formulation, which traces back to a factorization of the joint distribution, apparently proposed by J.W. Tukey, the modeling assumptions about the conditional factors are either testable or are designed to allow the incorporation of substantive knowledge about the problem at hand, thereby offering a possibly realistic portrayal of the data, both missing and observed. We apply Tukey's conditional representation to exponential family models, and we propose a computationally tractable inferential strategy for this class of models. We illustrate the utility of this approach using high-throughput biological data with missing data that are not missing at random.

Submitted 19 March, 2016; originally announced March 2016.

Comments: 37 pages, 9 figures, 1 table

arXiv:1506.00219 [pdf, other]

doi 10.1371/journal.pcbi.1005535

Post-transcriptional regulation across human tissues

Authors: Alexander Franks, Edoardo Airoldi, Nikolai Slavov

Abstract: Transcriptional and post-transcriptional regulation shape tissue-type-specific proteomes, but their relative contributions remain contested. Estimates of the factors determining protein levels in human tissues do not distinguish between (i) the factors determining the variability between the abundances of different proteins, i.e., mean-level-variability and, (ii) the factors determining the physiological variability of the same protein across different tissue types, i.e., across-tissues variability. We sought to estimate the contribution of transcript levels to these two orthogonal sources of variability, and found that scaled mRNA levels can account for most of the mean-level-variability but not necessarily for across-tissues variability. The reliable quantification of the latter estimate is limited by substantial measurement noise. However, protein-to-mRNA ratios exhibit substantial across-tissues variability that is functionally concerted and reproducible across different datasets, suggesting extensive post-transcriptional regulation. These results caution against estimating protein fold-changes from mRNA fold-changes between different cell-types, and highlight the contribution of post-transcriptional regulation to shaping tissue-type-specific proteomes.

Submitted 2 May, 2017; v1 submitted 31 May, 2015; originally announced June 2015.

Comments: 30 pages, 4 figures

Journal ref: PLoS Comput Biol 13(5): e1005535 (2017)

arXiv:1406.5799 [pdf, other]

Estimating cellular pathways from an ensemble of heterogeneous data sources

Authors: Alexander Franks, Florian Markowetz, Edoardo Airoldi

Abstract: Building better models of cellular pathways is one of the major challenges of systems biology and functional genomics. There is a need for methods to build on established expert knowledge and reconcile it with results of high-throughput studies. Moreover, the available data sources are heterogeneous and need to be combined in a way specific for the part of the pathway in which they are most informative. Here, we present a compartment specific strategy to integrate edge, node and path data for the refinement of a network hypothesis. Specifically, we use a local-move Gibbs sampler for refining pathway hypotheses from a compendium of heterogeneous data sources, including novel methodology for integrating protein attributes. We demonstrate the utility of this approach in a case study of the pheromone response MAPK pathway in the yeast S. cerevisiae.

Submitted 22 June, 2014; originally announced June 2014.

arXiv:1405.0231 [pdf, ps, other]

doi 10.1214/14-AOAS799

Characterizing the spatial structure of defensive skill in professional basketball

Authors: Alexander Franks, Andrew Miller, Luke Bornn, Kirk Goldsberry

Abstract: Although basketball is a dualistic sport, with all players competing on both offense and defense, almost all of the sport's conventional metrics are designed to summarize offensive play. As a result, player valuations are largely based on offensive performances and to a much lesser degree on defensive ones. Steals, blocks and defensive rebounds provide only a limited summary of defensive effectiveness, yet they persist because they summarize salient events that are easy to observe. Due to the inefficacy of traditional defensive statistics, the state of the art in defensive analytics remains qualitative, based on expert intuition and analysis that can be prone to human biases and imprecision. Fortunately, emerging optical player tracking systems have the potential to enable a richer quantitative characterization of basketball performance, particularly defensive performance. Unfortunately, due to computational and methodological complexities, that potential remains unmet. This paper attempts to fill this void, combining spatial and spatio-temporal processes, matrix factorization techniques and hierarchical regression models with player tracking data to advance the state of defensive analytics in the NBA. Our approach detects, characterizes and quantifies multiple aspects of defensive play in basketball, supporting some common understandings of defensive effectiveness, challenging others and opening up many new insights into the defensive elements of basketball.

Submitted 28 May, 2015; v1 submitted 1 May, 2014; originally announced May 2014.

Comments: Published at http://dx.doi.org/10.1214/14-AOAS799 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS799

Journal ref: Annals of Applied Statistics 2015, Vol. 9, No. 1, 94-121

Showing 1–16 of 16 results for author: Franks, A