-
Sensitivity analysis with multiple treatments and multiple outcomes with applications to air pollution mixtures
Authors:
Suyeon Kang,
Alexander Franks,
Joseph Antonelli
Abstract:
Understanding the health impacts of air pollution is vital in public health research. Numerous studies have estimated negative health effects of a variety of pollutants, but accurately gauging these impacts remains challenging due to the potential for unmeasured confounding bias that is ubiquitous in observational studies. In this study, we develop a framework for sensitivity analysis in settings with both multiple treatments and multiple outcomes simultaneously. This setting is of particular interest because one can identify the strength of association between the unmeasured confounders and both the treatment and outcome, under a factor confounding assumption. This provides informative bounds on the causal effect leading to partial identification regions for the effects of multivariate treatments that account for the maximum possible bias from unmeasured confounding. We also show that when negative controls are available, we are able to refine the partial identification regions substantially, and in certain cases, even identify the causal effect in the presence of unmeasured confounding. We derive partial identification regions for general estimands in this setting, and develop a novel computational approach to finding these regions.
Submitted 20 November, 2023;
originally announced November 2023.
-
Sensitivity to Unobserved Confounding in Studies with Factor-structured Outcomes
Authors:
Jiajing Zheng,
Jiaxi Wu,
Alexander D'Amour,
Alexander Franks
Abstract:
In this work, we propose an approach for assessing sensitivity to unobserved confounding in studies with multiple outcomes. We demonstrate how prior knowledge unique to the multi-outcome setting can be leveraged to strengthen causal conclusions beyond what can be achieved from analyzing individual outcomes in isolation. We argue that it is often reasonable to make a shared confounding assumption, under which residual dependence amongst outcomes can be used to simplify and sharpen sensitivity analyses. We focus on a class of factor models for which we can bound the causal effects for all outcomes conditional on a single sensitivity parameter that represents the fraction of treatment variance explained by unobserved confounders. We characterize how causal ignorance regions shrink under additional prior assumptions about the presence of null control outcomes, and provide new approaches for quantifying the robustness of causal effect estimates. Finally, we illustrate our sensitivity analysis workflow in practice, in an analysis of both simulated data and a case study with data from the National Health and Nutrition Examination Survey (NHANES).
Submitted 24 January, 2023; v1 submitted 12 August, 2022;
originally announced August 2022.
-
Bayesian Inference and Partial Identification in Multi-Treatment Causal Inference with Unobserved Confounding
Authors:
Jiajing Zheng,
Alexander D'Amour,
Alexander Franks
Abstract:
In causal estimation problems, the parameter of interest is often only partially identified, implying that the parameter cannot be recovered exactly, even with infinite data. Here, we study Bayesian inference for partially identified treatment effects in multi-treatment causal inference problems with unobserved confounding. In principle, inferring the partially identified treatment effects is natural under the Bayesian paradigm, but the results can be highly sensitive to parameterization and prior specification, often in surprising ways. It is thus essential to understand which aspects of the conclusions about treatment effects are driven entirely by the prior specification. We use a so-called transparent parameterization to contextualize the effects of more interpretable scientifically motivated prior specifications on the multiple effects. We demonstrate our analysis in an example quantifying the effects of gene expression levels on mouse obesity.
Submitted 23 April, 2022; v1 submitted 15 November, 2021;
originally announced November 2021.
-
Estimating the effects of a California gun control program with Multitask Gaussian Processes
Authors:
Eli Ben-Michael,
David Arbour,
Avi Feller,
Alex Franks,
Steven Raphael
Abstract:
Gun violence is a critical public safety concern in the United States. In 2006, California implemented a unique firearm monitoring program, the Armed and Prohibited Persons System (APPS), to address gun violence in the state. The APPS program first identifies those firearm owners who become prohibited from owning one due to federal or state law, then confiscates their firearms. Our goal is to assess the effect of APPS on California murder rates using annual, state-level crime data across the US for the years before and after the introduction of the program. To do so, we adapt a non-parametric Bayesian approach, multitask Gaussian Processes (MTGPs), to the panel data setting. MTGPs allow for flexible and parsimonious panel data models that nest many existing approaches and allow for direct control over both dependence across time and dependence across units, as well as natural uncertainty quantification. We extend this approach to incorporate non-Normal outcomes, auxiliary covariates, and multiple outcome series, which are all important in our application. We also show that this approach has attractive frequentist properties, including a representation as a weighting estimator with separate weights over units and time periods. Applying this approach, we find that the increased monitoring and enforcement from the APPS program substantially decreased homicides in California. We also find that the effect on murder is driven entirely by declines in gun-related murder with no measurable effect on non-gun murder. Estimated costs per murder avoided are substantially lower than conventional estimates of the value of a statistical life, suggesting a very high benefit-cost ratio for this enforcement effort.
Submitted 8 June, 2022; v1 submitted 13 October, 2021;
originally announced October 2021.
-
Learning Gaussian Graphical Models with Latent Confounders
Authors:
Ke Wang,
Alexander Franks,
Sang-Yun Oh
Abstract:
Gaussian graphical models (GGMs) are widely used to estimate network structures in many applications ranging from biology to finance. In practice, data are often corrupted by latent confounders, which bias inference of the underlying true graphical structure. In this paper, we compare and contrast two strategies for inference in graphical models with latent confounders: Gaussian graphical models with latent variables (LVGGM) and PCA-based removal of confounding (PCA+GGM). While these two approaches have similar goals, they are motivated by different assumptions about confounding. We explore the connection between the two approaches and propose a new method that combines their strengths. We prove the consistency and convergence rate for the PCA-based method and use these results to provide guidance about when to use each method. We demonstrate the effectiveness of our methodology in both simulations and two real-world applications.
Submitted 23 July, 2023; v1 submitted 13 May, 2021;
originally announced May 2021.
-
Deconfounding Scores: Feature Representations for Causal Effect Estimation with Weak Overlap
Authors:
Alexander D'Amour,
Alexander Franks
Abstract:
A key condition for obtaining reliable estimates of the causal effect of a treatment is overlap (a.k.a. positivity): the distributions of the features used to perform causal adjustment cannot be too different in the treated and control groups. In cases where overlap is poor, causal effect estimators can become brittle, especially when they incorporate weighting. To address this problem, a number of proposals (including confounder selection or dimension reduction methods) incorporate feature representations to induce better overlap between the treated and control groups. A key concern in these proposals is that the representation may introduce confounding bias into the effect estimator. In this paper, we introduce deconfounding scores, which are feature representations that induce better overlap without biasing the target of estimation. We show that deconfounding scores satisfy a zero-covariance condition that is identifiable in observed data. As a proof of concept, we characterize a family of deconfounding scores in a simplified setting with Gaussian covariates, and show that in some simple simulations, these scores can be used to construct estimators with good finite-sample properties. In particular, we show that this technique could be an attractive alternative to standard regularizations that are often applied to IPW and balancing weights.
Submitted 12 April, 2021;
originally announced April 2021.
-
Copula-based Sensitivity Analysis for Multi-Treatment Causal Inference with Unobserved Confounding
Authors:
Jiajing Zheng,
Alexander D'Amour,
Alexander Franks
Abstract:
Recent work has focused on the potential and pitfalls of causal identification in observational studies with multiple simultaneous treatments. Building on previous work, we show that even if the conditional distribution of unmeasured confounders given treatments were known exactly, the causal effects would not in general be identifiable, although they may be partially identified. Given these results, we propose a sensitivity analysis method for characterizing the effects of potential unmeasured confounding, tailored to the multiple treatment setting, that can be used to characterize a range of causal effects that are compatible with the observed data. Our method is based on a copula factorization of the joint distribution of outcomes, treatments, and confounders, and can be layered on top of arbitrary observed data models. We propose a practical implementation of this approach making use of the Gaussian copula, and establish conditions under which causal effects can be bounded. We also describe approaches for reasoning about effects, including calibrating sensitivity parameters, quantifying robustness of effect estimates, and selecting models that are most consistent with prior hypotheses.
Submitted 11 May, 2023; v1 submitted 18 February, 2021;
originally announced February 2021.
-
Reducing Subspace Models for Large-Scale Covariance Regression
Authors:
Alexander Franks
Abstract:
We develop an envelope model for joint mean and covariance regression in the large $p$, small $n$ setting. In contrast to existing envelope methods, which improve mean estimates by incorporating estimates of the covariance structure, we focus on identifying covariance heterogeneity by incorporating information about mean-level differences. We use a Monte Carlo EM algorithm to identify a low-dimensional subspace which explains differences in both means and covariances as a function of covariates, and then use MCMC to estimate the posterior uncertainty conditional on the inferred low-dimensional subspace. We demonstrate the utility of our model on a motivating application on the metabolomics of aging. We also provide R code which can be used to develop and test other generalizations of the response envelope model.
Submitted 1 October, 2020;
originally announced October 2020.
-
Modeling Player and Team Performance in Basketball
Authors:
Zachary Terner,
Alexander Franks
Abstract:
In recent years, analytics has started to revolutionize the game of basketball: quantitative analyses of the game inform team strategy, management of player health and fitness, and how teams draft, sign, and trade players. In this review, we focus on methods for quantifying and characterizing basketball gameplay. At the team level, we discuss methods for characterizing team strategy and performance, while at the player level, we take a deep look into a myriad of tools for player evaluation. This includes metrics for overall player value, defensive ability, and shot modeling, and methods for understanding performance over multiple seasons via player production curves. We conclude with a discussion on the future of basketball analytics, and in particular highlight the need for causal inference in sports.
Submitted 20 July, 2020;
originally announced July 2020.
-
Flexible sensitivity analysis for observational studies without observable implications
Authors:
Alexander Franks,
Alexander D'Amour,
Avi Feller
Abstract:
A fundamental challenge in observational causal inference is that assumptions about unconfoundedness are not testable from data. Assessing sensitivity to such assumptions is therefore important in practice. Unfortunately, some existing sensitivity analysis approaches inadvertently impose restrictions that are at odds with modern causal inference methods, which emphasize flexible models for observed data. To address this issue, we propose a framework that allows (1) flexible models for the observed data and (2) clean separation of the identified and unidentified parts of the sensitivity model. Our framework extends an approach from the missing data literature, known as Tukey's factorization, to the causal inference setting. Under this factorization, we can represent the distributions of unobserved potential outcomes in terms of unidentified selection functions that posit an unidentified relationship between the treatment assignment indicator and the observed potential outcomes. The sensitivity parameters in this framework are easily interpreted, and we provide heuristics for calibrating these parameters against observable quantities. We demonstrate the flexibility of this approach in two examples, where we estimate both average treatment effects and quantile treatment effects using Bayesian nonparametric models for the observed data.
Submitted 13 January, 2019; v1 submitted 2 September, 2018;
originally announced September 2018.
-
Meta-Analytics: Tools for Understanding the Statistical Properties of Sports Metrics
Authors:
Alexander Franks,
Alexander D'Amour,
Daniel Cervone,
Luke Bornn
Abstract:
In sports, there is a constant effort to improve metrics which assess player ability, but there has been almost no effort to quantify and compare existing metrics. Any individual making a management, coaching, or gambling decision is quickly overwhelmed with hundreds of statistics. We address this problem by proposing a set of "meta-metrics" which can be used to identify the metrics that provide the most unique, reliable, and useful information for decision-makers. Specifically, we develop methods to evaluate metrics based on three criteria: (1) stability: does the metric measure the same thing over time? (2) discrimination: does the metric differentiate between players? and (3) independence: does the metric provide new information? Our methods are easy to implement and widely applicable, so they should be of interest to the broader sports community. We demonstrate our methods in analyses of both NBA and NHL metrics. Our results indicate the most reliable metrics and highlight how they should be used by sports analysts. The meta-metrics also provide useful insights about how to best construct new metrics which provide independent and reliable information about athletes.
Submitted 30 September, 2016;
originally announced September 2016.
-
Shared Subspace Models for Multi-Group Covariance Estimation
Authors:
Alexander Franks,
Peter Hoff
Abstract:
We develop a model-based method for evaluating heterogeneity among several p x p covariance matrices in the large p, small n setting. This is done by assuming a spiked covariance model for each group and sharing information about the space spanned by the group-level eigenvectors. We use an empirical Bayes method to identify a low-dimensional subspace which explains variation across all groups and use an MCMC algorithm to estimate the posterior uncertainty of eigenvectors and eigenvalues on this subspace. The implementation and utility of our model are illustrated with analyses of high-dimensional multivariate gene expression.
Submitted 21 October, 2019; v1 submitted 11 July, 2016;
originally announced July 2016.
-
Non-standard conditionally specified models for non-ignorable missing data
Authors:
Alexander M Franks,
Edoardo M Airoldi,
Donald B Rubin
Abstract:
Data analyses typically rely upon assumptions about missingness mechanisms that lead to observed versus missing data. When the data are missing not at random, direct assumptions about the missingness mechanism, and indirect assumptions about the distributions of observed and missing data, are typically untestable. We explore an approach in which the joint distribution of observed data and missing data is specified through non-standard conditional distributions. In this formulation, which traces back to a factorization of the joint distribution apparently proposed by J.W. Tukey, the modeling assumptions about the conditional factors are either testable or are designed to allow the incorporation of substantive knowledge about the problem at hand, thereby offering a possibly realistic portrayal of the data, both missing and observed. We apply Tukey's conditional representation to exponential family models, and we propose a computationally tractable inferential strategy for this class of models. We illustrate the utility of this approach using high-throughput biological data with missing data that are not missing at random.
Submitted 19 March, 2016;
originally announced March 2016.
-
Post-transcriptional regulation across human tissues
Authors:
Alexander Franks,
Edoardo Airoldi,
Nikolai Slavov
Abstract:
Transcriptional and post-transcriptional regulation shape tissue-type-specific proteomes, but their relative contributions remain contested. Estimates of the factors determining protein levels in human tissues do not distinguish between (i) the factors determining the variability between the abundances of different proteins, i.e., mean-level-variability, and (ii) the factors determining the physiological variability of the same protein across different tissue types, i.e., across-tissues variability. We sought to estimate the contribution of transcript levels to these two orthogonal sources of variability, and found that scaled mRNA levels can account for most of the mean-level-variability but not necessarily for across-tissues variability. The reliable quantification of the latter estimate is limited by substantial measurement noise. However, protein-to-mRNA ratios exhibit substantial across-tissues variability that is functionally concerted and reproducible across different datasets, suggesting extensive post-transcriptional regulation. These results caution against estimating protein fold-changes from mRNA fold-changes between different cell-types, and highlight the contribution of post-transcriptional regulation to shaping tissue-type-specific proteomes.
Submitted 2 May, 2017; v1 submitted 31 May, 2015;
originally announced June 2015.
-
Estimating cellular pathways from an ensemble of heterogeneous data sources
Authors:
Alexander Franks,
Florian Markowetz,
Edoardo Airoldi
Abstract:
Building better models of cellular pathways is one of the major challenges of systems biology and functional genomics. There is a need for methods to build on established expert knowledge and reconcile it with results of high-throughput studies. Moreover, the available data sources are heterogeneous and need to be combined in a way specific to the part of the pathway in which they are most informative. Here, we present a compartment-specific strategy to integrate edge, node, and path data for the refinement of a network hypothesis. Specifically, we use a local-move Gibbs sampler for refining pathway hypotheses from a compendium of heterogeneous data sources, including novel methodology for integrating protein attributes. We demonstrate the utility of this approach in a case study of the pheromone response MAPK pathway in the yeast S. cerevisiae.
Submitted 22 June, 2014;
originally announced June 2014.
-
Characterizing the spatial structure of defensive skill in professional basketball
Authors:
Alexander Franks,
Andrew Miller,
Luke Bornn,
Kirk Goldsberry
Abstract:
Although basketball is a dualistic sport, with all players competing on both offense and defense, almost all of the sport's conventional metrics are designed to summarize offensive play. As a result, player valuations are largely based on offensive performances and to a much lesser degree on defensive ones. Steals, blocks and defensive rebounds provide only a limited summary of defensive effectiveness, yet they persist because they summarize salient events that are easy to observe. Due to the inefficacy of traditional defensive statistics, the state of the art in defensive analytics remains qualitative, based on expert intuition and analysis that can be prone to human biases and imprecision. Fortunately, emerging optical player tracking systems have the potential to enable a richer quantitative characterization of basketball performance, particularly defensive performance. Unfortunately, due to computational and methodological complexities, that potential remains unmet. This paper attempts to fill this void, combining spatial and spatio-temporal processes, matrix factorization techniques and hierarchical regression models with player tracking data to advance the state of defensive analytics in the NBA. Our approach detects, characterizes and quantifies multiple aspects of defensive play in basketball, supporting some common understandings of defensive effectiveness, challenging others and opening up many new insights into the defensive elements of basketball.
Submitted 28 May, 2015; v1 submitted 1 May, 2014;
originally announced May 2014.