
Inference with generalizable classifier predictions

Authors: Ciaran Evans, Zara Y. Weinberg, Manojkumar A. Puthenveedu, Max G'Sell

Abstract: This paper addresses the problem of making statistical inference about a population that can only be identified through classifier predictions. The problem is motivated by scientific studies in which human labels of a population are replaced by a classifier. For downstream analysis of the population based on classifier predictions to be sound, the predictions must generalize equally across experimental conditions. In this paper, we formalize the task of statistical inference using classifier predictions, and propose bootstrap procedures to allow inference with a generalizable classifier. We demonstrate the performance of our methods through extensive simulations and a case study with live cell imaging data.

Submitted 14 June, 2021; originally announced June 2021.

Comments: 26 pages, 9 figures

arXiv:2009.08592

Sequential changepoint detection in classification data under label shift

Authors: Ciaran Evans, Max G'Sell

Abstract: Classifier predictions often rely on the assumption that new observations come from the same distribution as training data. When the underlying distribution changes, so does the optimal classification rule, and performance may degrade. We consider the problem of detecting such a change in distribution in sequentially observed, unlabeled classification data. We focus on label shift changes to the distribution, where the class priors shift but the class-conditional distributions remain unchanged. We reduce this problem to the problem of detecting a change in the one-dimensional classifier scores, leading to simple nonparametric sequential changepoint detection procedures. Our procedures leverage classifier training data to estimate the detection statistic, and converge to their parametric counterparts as the size of the training data grows. In simulations, we show that our method outperforms other detection procedures in this label shift setting.

Submitted 31 August, 2021; v1 submitted 17 September, 2020; originally announced September 2020.

Comments: 25 pages, 3 figures, 4 tables
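To make the reduction to one-dimensional scores concrete, here is a minimal Python sketch of a parametric CUSUM detector on classifier scores, i.e. the kind of parametric counterpart the abstract says the procedures converge to. It assumes, purely for illustration, that the class-conditional score densities are known Gaussians and that the pre- and post-change class-1 priors are known; the paper's nonparametric procedures instead estimate the detection statistic from classifier training data. The function names, densities, and threshold are hypothetical.

```python
import math
from statistics import NormalDist


def score_density(s, prior1, f0, f1):
    # Under label shift the class-conditional score densities f0 and f1 stay
    # fixed; only the class-1 prior changes, so the marginal score density
    # is a two-component mixture.
    return prior1 * f1.pdf(s) + (1.0 - prior1) * f0.pdf(s)


def cusum_label_shift(scores, pi_pre, pi_post, f0, f1, threshold):
    """Return the index of the first alarm, or None if no alarm is raised.

    Hypothetical simplification: f0, f1, pi_pre and pi_post are assumed known.
    """
    stat = 0.0
    for t, s in enumerate(scores):
        # log-likelihood ratio of the post-change vs. pre-change score density
        llr = math.log(score_density(s, pi_post, f0, f1)
                       / score_density(s, pi_pre, f0, f1))
        stat = max(0.0, stat + llr)  # classical CUSUM recursion
        if stat > threshold:
            return t
    return None


# Illustrative (assumed) score distributions for class 0 and class 1.
f0, f1 = NormalDist(0.0, 1.0), NormalDist(2.0, 1.0)
print(cusum_label_shift([2.0] * 10, 0.1, 0.6, f0, f1, threshold=5.0))  # prints 4
```

A stream of scores typical of class 1 pushes the statistic up by about 1.08 per observation here, so the alarm fires on the fifth score; scores typical of class 0 keep it pinned at zero.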

arXiv:2003.13808

Fairness Evaluation in Presence of Biased Noisy Labels

Authors: Riccardo Fogliato, Max G'Sell, Alexandra Chouldechova

Abstract: Risk assessment tools are widely used around the country to inform decision making within the criminal justice system. Recently, considerable attention has been devoted to the question of whether such tools may suffer from racial bias. In this type of assessment, a fundamental issue is that the training and evaluation of the model are based on a variable (arrest) that may represent a noisy version of an unobserved outcome of more central interest (offense). We propose a sensitivity analysis framework for assessing how assumptions on the noise across groups affect the predictive bias properties of the risk assessment model as a predictor of reoffense. Our experimental results on two real-world criminal justice data sets demonstrate how even small biases in the observed labels may call into question the conclusions of an analysis based on the noisy outcome.

Submitted 30 March, 2020; originally announced March 2020.

Comments: Accepted at International Conference on Artificial Intelligence and Statistics (AISTATS), 2020

arXiv:1812.03644

Post-Selection Inference for Changepoint Detection Algorithms with Application to Copy Number Variation Data

Authors: Sangwon Hyun, Kevin Lin, Max G'Sell, Ryan J. Tibshirani

Abstract: Changepoint detection methods are used in many areas of science and engineering, e.g., in the analysis of copy number variation data, to detect abnormalities in copy numbers along the genome. Despite the broad array of available tools, methodology for quantifying our uncertainty in the strength (or presence) of given changepoints, post-detection, is lacking. Post-selection inference offers a framework to fill this gap, but the most straightforward application of these methods results in low-powered tests and leaves open several important questions about practical usability. In this work, we carefully tailor post-selection inference methods towards changepoint detection, focusing as our main scientific application on copy number variation data. As changepoint algorithms, we study binary segmentation, two of its most popular variants (wild and circular binary segmentation), and the fused lasso. We implement some of the latest developments in post-selection inference theory: we use auxiliary randomization to improve power, which requires implementations of MCMC algorithms (importance sampling and hit-and-run sampling) to carry out our tests. We also provide recommendations for improving practical usability, detailed simulations, and an example analysis on array comparative genomic hybridization (CGH) data.

Submitted 10 December, 2018; originally announced December 2018.
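For readers unfamiliar with binary segmentation, the detection step (not the post-selection inference developed in the paper) can be sketched as follows. This minimal Python version assumes a piecewise-constant mean, uses the standard CUSUM contrast to pick each split, and stops recursing when the statistic does not exceed a user-chosen threshold; the function names and the threshold value are illustrative assumptions, and the wild and circular variants are not shown.

```python
import math


def cusum_split(x, lo, hi):
    """Best single split of x[lo:hi] by the standardized CUSUM statistic."""
    n = hi - lo
    total = sum(x[lo:hi])
    best_stat, best_k = 0.0, None
    left = 0.0
    for k in range(1, n):  # candidate split: x[lo:lo+k] | x[lo+k:hi]
        left += x[lo + k - 1]
        right = total - left
        mean_diff = left / k - right / (n - k)
        stat = math.sqrt(k * (n - k) / n) * abs(mean_diff)
        if stat > best_stat:
            best_stat, best_k = stat, lo + k
    return best_stat, best_k


def binary_segmentation(x, threshold, lo=0, hi=None):
    """Recursively split while the best CUSUM statistic exceeds threshold."""
    if hi is None:
        hi = len(x)
    if hi - lo < 2:
        return []
    stat, k = cusum_split(x, lo, hi)
    if k is None or stat <= threshold:
        return []
    return sorted(binary_segmentation(x, threshold, lo, k) + [k]
                  + binary_segmentation(x, threshold, k, hi))


signal = [0.0] * 20 + [5.0] * 20 + [0.0] * 20
print(binary_segmentation(signal, threshold=3.0))  # prints [20, 40]
```

The returned indices are the starts of new segments. The paper's question is what can be said about such detected changepoints after the fact, which this detection-only sketch does not address.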

arXiv:1801.03635

Sharp instruments for classifying compliers and generalizing causal effects

Authors: Edward H. Kennedy, Sivaraman Balakrishnan, Max G'Sell

Abstract: It is well-known that, without restricting treatment effect heterogeneity, instrumental variable (IV) methods only identify "local" effects among compliers, i.e., those subjects who take treatment only when encouraged by the IV. Local effects are controversial since they seem to only apply to an unidentified subgroup; this has led many to denounce these effects as having little policy relevance. However, we show that such pessimism is not always warranted: it is possible in some cases to accurately predict who compliers are, and obtain tight bounds on more generalizable effects in identifiable subgroups. We propose methods for doing so and study their estimation error and asymptotic properties, showing that these tasks can in theory be accomplished even with very weak IVs. We go on to introduce a new measure of IV quality called "sharpness", which reflects the variation in compliance explained by covariates, and captures how well one can identify compliers and obtain tight bounds on identifiable subgroup effects. We develop an estimator of sharpness, and show that it is asymptotically efficient under weak conditions. Finally, we explore finite-sample properties via simulation, and apply the methods to study canvassing effects on voter turnout. We propose that sharpness should be presented alongside strength to assess IV quality.

Submitted 30 May, 2019; v1 submitted 11 January, 2018; originally announced January 2018.

arXiv:1707.00046

Fairer and more accurate, but for whom?

Authors: Alexandra Chouldechova, Max G'Sell

Abstract: Complex statistical machine learning models are increasingly being used or considered for use in high-stakes decision-making pipelines in domains such as financial services, health care, criminal justice and human services. These models are often investigated as possible improvements over more classical tools such as regression models or human judgement. While the modeling approach may be new, the practice of using some form of risk assessment to inform decisions is not. When determining whether a new model should be adopted, it is therefore essential to be able to compare the proposed model to the existing approach across a range of task-relevant accuracy and fairness metrics. Looking at overall performance metrics, however, may be misleading. Even when two models have comparable overall performance, they may nevertheless disagree in their classifications on a considerable fraction of cases. In this paper we introduce a model comparison framework for automatically identifying subgroups in which the differences between models are most pronounced. Our primary focus is on identifying subgroups where the models differ in terms of fairness-related quantities such as racial or gender disparities. We present experimental results from a recidivism prediction task and a hypothetical lending example.

Submitted 30 June, 2017; originally announced July 2017.

Comments: Presented as a poster at the 2017 Workshop on Fairness, Accountability, and Transparency in Machine Learning (FAT/ML 2017)

arXiv:1606.03552

Exact Post-Selection Inference for Changepoint Detection and Other Generalized Lasso Problems

Authors: Sangwon Hyun, Max G'Sell, Ryan J. Tibshirani

Abstract: We study tools for inference conditioned on model selection events that are defined by the generalized lasso regularization path. The generalized lasso estimate is given by the solution of a penalized least squares regression problem, where the penalty is the ℓ1 norm of a matrix D times the coefficient vector. The generalized lasso path collects these estimates for a range of penalty parameter (λ) values. Leveraging a sequential characterization of this path from Tibshirani & Taylor (2011), and recent advances in post-selection inference from Lee et al. (2016) and Tibshirani et al. (2016), we develop exact hypothesis tests and confidence intervals for linear contrasts of the underlying mean vector, conditioned on any model selection event along the generalized lasso path (assuming Gaussian errors in the observations). By inspecting specific choices of D, we obtain post-selection tests and confidence intervals for specific cases of generalized lasso estimates, such as the fused lasso, trend filtering, and the graph fused lasso. In the fused lasso case, the underlying coordinates of the mean are assigned a linear ordering, and our framework allows us to test selectively chosen breakpoints or changepoints in these mean coordinates. This is an interesting and well-studied problem with broad applications; our framework applied to trend filtering and the graph fused lasso serves several applications as well.
Aside from the development of selective inference tools, we describe several practical aspects of our methods, such as valid post-processing of generalized lasso estimates before performing inference in order to improve power, and problem-specific visualization aids that may be given to the data analyst to help choose the linear contrasts to be tested. Many examples, from both simulated and real data sources, are presented to examine the empirical properties of our inference methods.

Submitted 11 June, 2016; originally announced June 2016.
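For reference, the generalized lasso problem described above can be written as follows, together with the penalty matrix D for the three special cases the abstract names. The notation (y, X, β, λ, n, p) is our assumption; in the mean-vector setting of the abstract, X is the identity and β is the mean vector.

```latex
\hat{\beta}(\lambda) \in \mathop{\mathrm{arg\,min}}_{\beta \in \mathbb{R}^{p}}
  \; \tfrac{1}{2}\,\lVert y - X\beta \rVert_2^2 + \lambda \lVert D\beta \rVert_1

% 1d fused lasso (X = I, p = n): first differences, so that
% \lVert D\beta \rVert_1 = \sum_{i=1}^{n-1} \lvert \beta_{i+1} - \beta_i \rvert
D_{\mathrm{fused}} =
\begin{pmatrix}
-1 & 1 & 0 & \cdots & 0 \\
0 & -1 & 1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & -1 & 1
\end{pmatrix} \in \mathbb{R}^{(n-1)\times n}

% linear trend filtering: second differences, rows of the form (\dots, 0, 1, -2, 1, 0, \dots)
% graph fused lasso: one row per edge (i, j), with -1 in column i and +1 in column j
```

With the fused lasso choice, a nonzero entry of D times the estimate marks a jump between adjacent mean coordinates, which is why selection events along this path correspond to detected changepoints.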

arXiv:1604.04173

Distribution-Free Predictive Inference For Regression

Authors: Jing Lei, Max G'Sell, Alessandro Rinaldo, Ryan J. Tibshirani, Larry Wasserman

Abstract: We develop a general framework for distribution-free predictive inference in regression, using conformal inference. The proposed methodology allows for the construction of a prediction band for the response variable using any estimator of the regression function. The resulting prediction band preserves the consistency properties of the original estimator under standard assumptions, while guaranteeing finite-sample marginal coverage even when these assumptions do not hold. We analyze and compare, both empirically and theoretically, the two major variants of our conformal framework: full conformal inference and split conformal inference, along with a related jackknife method. These methods offer different tradeoffs between statistical accuracy (length of resulting prediction intervals) and computational efficiency. As extensions, we develop a method for constructing valid in-sample prediction intervals called rank-one-out conformal inference, which has essentially the same computational efficiency as split conformal inference. We also describe an extension of our procedures for producing prediction bands with locally varying length, in order to adapt to heteroskedasticity in the data. Finally, we propose a model-free notion of variable importance, called leave-one-covariate-out or LOCO inference. Accompanying this paper is an R package, conformalInference, that implements all of the proposals we have introduced. In the spirit of reproducibility, all of our empirical results can also be easily (re)generated using this package.

Submitted 8 March, 2017; v1 submitted 14 April, 2016; originally announced April 2016.

Comments: 50 pages, 7 figures, 3 tables
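The split conformal variant mentioned above is short enough to sketch directly. The Python below is a minimal illustration, not the conformalInference package: it fits any user-supplied regression function on one part of the data, computes absolute residuals on a held-out calibration part of size n, and widens the point prediction by the ceil((n + 1)(1 - alpha))-th smallest residual, returning an infinite interval when that rank exceeds n. The helper fit_line and the toy data are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Hypothetical helper: ordinary least-squares line, returned as a predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def split_conformal_interval(fit, x_train, y_train, x_cal, y_cal, x_new, alpha=0.1):
    """Split conformal prediction interval at x_new with nominal coverage 1 - alpha."""
    model = fit(x_train, y_train)                 # fit on the first part only
    residuals = sorted(abs(y - model(x)) for x, y in zip(x_cal, y_cal))
    n = len(residuals)
    k = math.ceil((n + 1) * (1 - alpha))          # rank of the residual quantile
    if k > n:                                      # too few calibration points
        return (-math.inf, math.inf)
    q = residuals[k - 1]
    pred = model(x_new)
    return (pred - q, pred + q)


x_train, y_train = [0, 1, 2, 3, 4], [1, 3, 5, 7, 9]   # exactly y = 2x + 1
noise = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8, 0.9, -1.0]
x_cal = list(range(10))
y_cal = [2 * x + 1 + e for x, e in zip(x_cal, noise)]
print(split_conformal_interval(fit_line, x_train, y_train, x_cal, y_cal, 5, alpha=0.2))
# roughly (10.1, 11.9): prediction 11 widened by the 9th smallest residual, 0.9
```

Because the residual quantile does not depend on how the model was fit, any estimator can replace fit_line; the cost of splitting is that only part of the data is used for fitting, one side of the accuracy/computation tradeoff described in the abstract.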

arXiv:1309.5352

Sequential Selection Procedures and False Discovery Rate Control

Authors: Max Grazier G'Sell, Stefan Wager, Alexandra Chouldechova, Robert Tibshirani

Abstract: We consider a multiple hypothesis testing setting where the hypotheses are ordered and one is only permitted to reject an initial contiguous block, H_1, ..., H_k, of hypotheses. A rejection rule in this setting amounts to a procedure for choosing the stopping point k. This setting is inspired by the sequential nature of many model selection problems, where choosing a stopping point or a model is equivalent to rejecting all hypotheses up to that point and none thereafter. We propose two new testing procedures, and prove that they control the false discovery rate in the ordered testing setting. We also show how the methods can be applied to model selection using recent results on p-values in sequential model selection settings.

Submitted 23 March, 2015; v1 submitted 20 September, 2013; originally announced September 2013.

Comments: 31 pages, 14 figures. Accepted to the Journal of the Royal Statistical Society: Series B
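As we understand it, one of the two proposed procedures (ForwardStop) transforms each ordered p-value to Y_i = -log(1 - p_i) and chooses the largest k whose running average of Y_1, ..., Y_k is at most the target level alpha, then rejects H_1, ..., H_k. The Python below is a minimal sketch of that stopping rule only, assuming the p-values are already in the prescribed order; the function name is ours, and the conditions under which FDR control holds are those stated in the paper, not checked here.

```python
import math


def forward_stop(p_values, alpha):
    """ForwardStop-style rule: number of leading hypotheses to reject.

    Returns the largest k with (1/k) * sum_{i<=k} -log(1 - p_i) <= alpha,
    or 0 if no such k exists. p_values must be in the hypothesis order.
    """
    k_hat, running = 0, 0.0
    for k, p in enumerate(p_values, start=1):
        # -log(1 - p) is Exp(1)-distributed when p is uniform (a null p-value)
        running += math.inf if p >= 1.0 else -math.log1p(-p)
        if running / k <= alpha:
            k_hat = k
    return k_hat


print(forward_stop([0.001, 0.002, 0.01, 0.5, 0.9, 0.8], alpha=0.1))  # prints 3
```

Note that the rule takes the largest qualifying k rather than the first k that fails, so a few moderately large p-values early in the sequence do not by themselves force an early stop.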

arXiv:1308.2329

Sensitivity Analysis for Inference with Partially Identifiable Covariance Matrices

Authors: Max Grazier G'Sell, Shai S. Shen-Orr, Robert Tibshirani

Abstract: In some multivariate problems with missing data, pairs of variables exist that are never observed together. For example, some modern biological tools can produce data of this form. As a result of this structure, the covariance matrix is only partially identifiable, and point estimation requires that identifying assumptions be made. These assumptions can introduce an unknown and potentially large bias into the inference. This paper presents a method based on semidefinite programming for automatically quantifying this potential bias by computing the range of possible equal-likelihood inferred values for convex functions of the covariance matrix. We focus on the bias of missing value imputation via conditional expectation and show that our method can give an accurate assessment of the true error in cases where estimates based on sampling uncertainty alone are overly optimistic.

Submitted 10 August, 2013; originally announced August 2013.

Comments: 19 pages, 8 figures. Submitted to Computational Statistics
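A three-variable toy case shows why the covariance matrix is only partially identifiable. Suppose, as an illustrative assumption, that X1 and X3 are never observed together while the pairs (X1, X2) and (X2, X3) are, so only the correlations r12 and r23 can be estimated. Under a Gaussian model every r13 that keeps the correlation matrix positive semidefinite fits the observed data equally well, and requiring a nonnegative determinant gives the closed-form interval r12*r23 ± sqrt((1 - r12^2)(1 - r23^2)). The paper's general method uses semidefinite programming rather than this closed form; the helper below is hypothetical and covers only the 3×3 case.

```python
import math


def feasible_r13(r12, r23):
    """Range of r13 that keeps a 3x3 correlation matrix positive semidefinite.

    Toy illustration: X1 and X3 are assumed never observed together, so r13
    is not identified; r12 and r23 are treated as known.
    """
    if not (-1.0 <= r12 <= 1.0 and -1.0 <= r23 <= 1.0):
        raise ValueError("correlations must lie in [-1, 1]")
    center = r12 * r23
    # det >= 0 is a quadratic inequality in r13 whose roots are these endpoints
    half_width = math.sqrt((1.0 - r12 ** 2) * (1.0 - r23 ** 2))
    return center - half_width, center + half_width


print(feasible_r13(0.5, 0.5))  # prints (-0.5, 1.0)
```

Even with both observed correlations equal to 0.5, r13 can lie anywhere in [-0.5, 1.0], so any conditional-expectation imputation that depends on r13 inherits a correspondingly wide range of possible bias.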

arXiv:1307.4765

Adaptive testing for the graphical lasso

Authors: Max Grazier G'Sell, Jonathan Taylor, Robert Tibshirani

Abstract: We consider tests of significance in the setting of the graphical lasso for inverse covariance matrix estimation. We propose a simple test statistic based on a subsequence of the knots in the graphical lasso path. We show that this statistic has an exponential asymptotic null distribution, under the null hypothesis that the model contains the true connected components. Though the null distribution is asymptotic, we show through simulation that it provides a close approximation to the true distribution at reasonable sample sizes. Thus the statistic provides a simple, tractable test for the significance of new edges as they are introduced into the model. Finally, we show connections between our results and other results for regularized regression, as well as extensions of our results to other correlation-matrix-based methods like single-linkage clustering.

Submitted 22 July, 2013; v1 submitted 17 July, 2013; originally announced July 2013.

Comments: 33 pages, 8 figures. Submitted to Annals of Statistics

MSC Class: 62F12; 62H15
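Part of the graphical lasso path's knot structure can be computed directly from the sample covariance S. A known result from the broader graphical lasso literature (not stated in this abstract) is that the connected components of the estimate at penalty lam coincide with those of the graph that has an edge wherever |S_ij| > lam. The Python below assumes that result and lists the lam values at which components merge as lam decreases; these are single-linkage merge heights on |S_ij|, which hints at the single-linkage connection mentioned in the abstract. It does not reproduce the paper's test statistic, and the function name is ours.

```python
def component_merge_points(S):
    """Penalty values at which connected components of {|S_ij| > lam} merge.

    Assumes the exact-thresholding result for the graphical lasso; the values
    equal single-linkage merge heights on |S_ij| (Kruskal on decreasing |S_ij|).
    """
    p = len(S)
    parent = list(range(p))

    def find(i):
        # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # off-diagonal entries sorted by decreasing absolute value
    edges = sorted(((abs(S[i][j]), i, j) for i in range(p) for j in range(i + 1, p)),
                   reverse=True)
    merges = []
    for w, i, j in edges:
        if w <= 0.0:
            break  # a zero entry never connects anything for lam >= 0
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            merges.append(w)
    return merges


S = [[1.0, 0.9, 0.1],
     [0.9, 1.0, 0.3],
     [0.1, 0.3, 1.0]]
print(component_merge_points(S))  # prints [0.9, 0.3]
```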

arXiv:1302.2303

False Variable Selection Rates in Regression

Authors: Max Grazier G'Sell, Trevor Hastie, Robert Tibshirani

Abstract: There has been recent interest in extending the ideas of False Discovery Rates (FDR) to variable selection in regression settings. Traditionally the FDR in these settings has been defined in terms of the coefficients of the full regression model. Recent papers have struggled with controlling this quantity when the predictors are correlated. This paper shows that this full-model definition of FDR suffers from unintuitive and potentially undesirable behavior in the presence of correlated predictors. We propose a new false selection error criterion, the False Variable Rate (FVR), that avoids these problems and behaves in a more intuitive manner. We discuss the behavior of this criterion and how it compares with the traditional FDR, and present guidelines for determining which is appropriate in a particular setting. Finally, we present a simple estimation procedure for FVR in stepwise variable selection. We analyze the performance of this estimator and draw connections to recent estimators in the literature.

Submitted 10 February, 2013; originally announced February 2013.

Comments: 14 figures, 21 pages. Submitted to Annals of Applied Statistics

Showing 1–12 of 12 results for author: G'Sell, M