Search | arXiv e-print repository

Large language model validity via enhanced conformal prediction methods

Authors: John J. Cherian, Isaac Gibbs, Emmanuel J. Candès

Abstract: We develop new conformal inference methods for obtaining validity guarantees on the output of large language models (LLMs). Prior work in conformal language modeling identifies a subset of the text that satisfies a high-probability guarantee of correctness. These methods work by filtering claims from the LLM's original response if a scoring function evaluated on the claim fails to exceed a thresho… ▽ More We develop new conformal inference methods for obtaining validity guarantees on the output of large language models (LLMs). Prior work in conformal language modeling identifies a subset of the text that satisfies a high-probability guarantee of correctness. These methods work by filtering claims from the LLM's original response if a scoring function evaluated on the claim fails to exceed a threshold calibrated via split conformal prediction. Existing methods in this area suffer from two deficiencies. First, the guarantee stated is not conditionally valid. The trustworthiness of the filtering step may vary based on the topic of the response. Second, because the scoring function is imperfect, the filtering step can remove many valuable and accurate claims. We address both of these challenges via two new conformal methods. First, we generalize the conditional conformal procedure of Gibbs et al. (2023) in order to adaptively issue weaker guarantees when they are required to preserve the utility of the output. Second, we show how to systematically improve the quality of the scoring function via a novel algorithm for differentiating through the conditional conformal procedure. We demonstrate the efficacy of our approach on both synthetic and real-world datasets. △ Less

Submitted 14 June, 2024; originally announced June 2024.

Comments: 20 pages, 8 figures

arXiv:2406.07449 [pdf, other]

Boosted Conformal Prediction Intervals

Authors: Ran Xie, Rina Foygel Barber, Emmanuel J. Candès

Abstract: This paper introduces a boosted conformal procedure designed to tailor conformalized prediction intervals toward specific desired properties, such as enhanced conditional coverage or reduced interval length. We employ machine learning techniques, notably gradient boosting, to systematically improve upon a predefined conformity score function. This process is guided by carefully constructed loss fu… ▽ More This paper introduces a boosted conformal procedure designed to tailor conformalized prediction intervals toward specific desired properties, such as enhanced conditional coverage or reduced interval length. We employ machine learning techniques, notably gradient boosting, to systematically improve upon a predefined conformity score function. This process is guided by carefully constructed loss functions that measure the deviation of prediction intervals from the targeted properties. The procedure operates post-training, relying solely on model predictions and without modifying the trained model (e.g., the deep network). Systematic experiments demonstrate that starting from conventional conformal methods, our boosted procedure achieves substantial improvements in reducing interval length and decreasing deviation from target conditional coverage. △ Less

Submitted 11 June, 2024; originally announced June 2024.

Comments: 22 pages, 9 figures

arXiv:2403.03208 [pdf, other]

Active Statistical Inference

Authors: Tijana Zrnic, Emmanuel J. Candès

Abstract: Inspired by the concept of active learning, we propose active inference$\unicode{x2013}$a methodology for statistical inference with machine-learning-assisted data collection. Assuming a budget on the number of labels that can be collected, the methodology uses a machine learning model to identify which data points would be most beneficial to label, thus effectively utilizing the budget. It operat… ▽ More Inspired by the concept of active learning, we propose active inference$\unicode{x2013}$a methodology for statistical inference with machine-learning-assisted data collection. Assuming a budget on the number of labels that can be collected, the methodology uses a machine learning model to identify which data points would be most beneficial to label, thus effectively utilizing the budget. It operates on a simple yet powerful intuition: prioritize the collection of labels for data points where the model exhibits uncertainty, and rely on the model's predictions where it is confident. Active inference constructs provably valid confidence intervals and hypothesis tests while leveraging any black-box machine learning model and handling any data distribution. The key point is that it achieves the same level of accuracy with far fewer samples than existing baselines relying on non-adaptively-collected data. This means that for the same number of collected samples, active inference enables smaller confidence intervals and more powerful p-values. We evaluate active inference on datasets from public opinion research, census analysis, and proteomics. △ Less

Submitted 29 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

arXiv:2309.16598 [pdf, other]

Cross-Prediction-Powered Inference

Authors: Tijana Zrnic, Emmanuel J. Candès

Abstract: While reliable data-driven decision-making hinges on high-quality labeled data, the acquisition of quality labels often involves laborious human annotations or slow and expensive scientific measurements. Machine learning is becoming an appealing alternative as sophisticated predictive techniques are being used to quickly and cheaply produce large amounts of predicted labels; e.g., predicted protei… ▽ More While reliable data-driven decision-making hinges on high-quality labeled data, the acquisition of quality labels often involves laborious human annotations or slow and expensive scientific measurements. Machine learning is becoming an appealing alternative as sophisticated predictive techniques are being used to quickly and cheaply produce large amounts of predicted labels; e.g., predicted protein structures are used to supplement experimentally derived structures, predictions of socioeconomic indicators from satellite imagery are used to supplement accurate survey data, and so on. Since predictions are imperfect and potentially biased, this practice brings into question the validity of downstream inferences. We introduce cross-prediction: a method for valid inference powered by machine learning. With a small labeled dataset and a large unlabeled dataset, cross-prediction imputes the missing labels via machine learning and applies a form of debiasing to remedy the prediction inaccuracies. The resulting inferences achieve the desired error probability and are more powerful than those that only leverage the labeled data. Closely related is the recent proposal of prediction-powered inference, which assumes that a good pre-trained model is already available. We show that cross-prediction is consistently more powerful than an adaptation of prediction-powered inference in which a fraction of the labeled data is split off and used to train the model. Finally, we observe that cross-prediction gives more stable conclusions than its competitors; its confidence intervals typically have significantly lower variability. △ Less

Submitted 28 February, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

arXiv:2307.16895 [pdf, other]

Conformal PID Control for Time Series Prediction

Authors: Anastasios N. Angelopoulos, Emmanuel J. Candes, Ryan J. Tibshirani

Abstract: We study the problem of uncertainty quantification for time series prediction, with the goal of providing easy-to-use algorithms with formal guarantees. The algorithms we present build upon ideas from conformal prediction and control theory, are able to prospectively model conformal scores in an online setting, and adapt to the presence of systematic errors due to seasonality, trends, and general… ▽ More We study the problem of uncertainty quantification for time series prediction, with the goal of providing easy-to-use algorithms with formal guarantees. The algorithms we present build upon ideas from conformal prediction and control theory, are able to prospectively model conformal scores in an online setting, and adapt to the presence of systematic errors due to seasonality, trends, and general distribution shifts. Our theory both simplifies and strengthens existing analyses in online conformal prediction. Experiments on 4-week-ahead forecasting of statewide COVID-19 death counts in the U.S. show an improvement in coverage over the ensemble forecaster used in official CDC communications. We also run experiments on predicting electricity demand, market returns, and temperature using autoregressive, Theta, Prophet, and Transformer models. We provide an extendable codebase for testing our methods and for the integration of new algorithms, data sets, and forecasting rules. △ Less

Submitted 31 July, 2023; originally announced July 2023.

Comments: Code available at https://github.com/aangelopoulos/conformal-time-series

arXiv:2307.09291 [pdf, other]

Model-free selective inference under covariate shift via weighted conformal p-values

Authors: Ying **, Emmanuel J. Candès

Abstract: This paper introduces novel weighted conformal p-values and methods for model-free selective inference. The problem is as follows: given test units with covariates $X$ and missing responses $Y$, how do we select units for which the responses $Y$ are larger than user-specified values while controlling the proportion of false positives? Can we achieve this without any modeling assumptions on the dat… ▽ More This paper introduces novel weighted conformal p-values and methods for model-free selective inference. The problem is as follows: given test units with covariates $X$ and missing responses $Y$, how do we select units for which the responses $Y$ are larger than user-specified values while controlling the proportion of false positives? Can we achieve this without any modeling assumptions on the data and without any restriction on the model for predicting the responses? Last, methods should be applicable when there is a covariate shift between training and test data, which commonly occurs in practice. We answer these questions by first leveraging any prediction model to produce a class of well-calibrated weighted conformal p-values, which control the type-I error in detecting a large response. These p-values cannot be passed on to classical multiple testing procedures since they may not obey a well-known positive dependence property. Hence, we introduce weighted conformalized selection (WCS), a new procedure which controls false discovery rate (FDR) in finite samples. Besides prediction-assisted candidate selection, WCS (1) allows to infer multiple individual treatment effects, and (2) extends to outlier detection with inlier distributions shifts. We demonstrate performance via simulations and applications to causal inference, drug discovery, and outlier detection datasets. △ Less

Submitted 26 September, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

arXiv:2305.12616 [pdf, other]

Conformal Prediction With Conditional Guarantees

Authors: Isaac Gibbs, John J. Cherian, Emmanuel J. Candès

Abstract: We consider the problem of constructing distribution-free prediction sets with finite-sample conditional guarantees. Prior work has shown that it is impossible to provide exact conditional coverage universally in finite samples. Thus, most popular methods only provide marginal coverage over the covariates. This paper bridges this gap by defining a spectrum of problems that interpolate between marg… ▽ More We consider the problem of constructing distribution-free prediction sets with finite-sample conditional guarantees. Prior work has shown that it is impossible to provide exact conditional coverage universally in finite samples. Thus, most popular methods only provide marginal coverage over the covariates. This paper bridges this gap by defining a spectrum of problems that interpolate between marginal and conditional validity. We motivate these problems by reformulating conditional coverage as coverage over a class of covariate shifts. When the target class of shifts is finite dimensional, we show how to simultaneously obtain exact finite sample coverage over all possible shifts. For example, given a collection of protected subgroups, our algorithm outputs intervals with exact coverage over each group. For more flexible, infinite dimensional classes where exact coverage is impossible, we provide a simple procedure for quantifying the gap between the coverage of our algorithm and the target level. Moreover, by tuning a single hyperparameter, we allow the practitioner to control the size of this gap across shifts of interest. Our methods can be easily incorporated into existing split conformal inference pipelines, and thus can be used to quantify the uncertainty of modern black-box algorithms without distributional assumptions. △ Less

Submitted 20 December, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

Comments: 46 pages, 11 figures

arXiv:2305.03712 [pdf, other]

Statistical Inference for Fairness Auditing

Authors: John J. Cherian, Emmanuel J. Candès

Abstract: Before deploying a black-box model in high-stakes problems, it is important to evaluate the model's performance on sensitive subpopulations. For example, in a recidivism prediction task, we may wish to identify demographic groups for which our prediction model has unacceptably high false positive rates or certify that no such groups exist. In this paper, we frame this task, often referred to as "f… ▽ More Before deploying a black-box model in high-stakes problems, it is important to evaluate the model's performance on sensitive subpopulations. For example, in a recidivism prediction task, we may wish to identify demographic groups for which our prediction model has unacceptably high false positive rates or certify that no such groups exist. In this paper, we frame this task, often referred to as "fairness auditing," in terms of multiple hypothesis testing. We show how the bootstrap can be used to simultaneously bound performance disparities over a collection of groups with statistical guarantees. Our methods can be used to flag subpopulations affected by model underperformance, and certify subpopulations for which the model performs adequately. Crucially, our audit is model-agnostic and applicable to nearly any performance metric or group fairness criterion. Our methods also accommodate extremely rich -- even infinite -- collections of subpopulations. Further, we generalize beyond subpopulations by showing how to assess performance over certain distribution shifts. We test the proposed methods on benchmark datasets in predictive inference and algorithmic fairness and find that our audits can provide interpretable and trustworthy guarantees. △ Less

Submitted 8 June, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

Comments: 44 pages, 8 figures

arXiv:2210.01408 [pdf, other]

Selection by Prediction with Conformal p-values

Authors: Ying **, Emmanuel J. Candès

Abstract: Decision making or scientific discovery pipelines such as job hiring and drug discovery often involve multiple stages: before any resource-intensive step, there is often an initial screening that uses predictions from a machine learning model to shortlist a few candidates from a large pool. We study screening procedures that aim to select candidates whose unobserved outcomes exceed user-specified… ▽ More Decision making or scientific discovery pipelines such as job hiring and drug discovery often involve multiple stages: before any resource-intensive step, there is often an initial screening that uses predictions from a machine learning model to shortlist a few candidates from a large pool. We study screening procedures that aim to select candidates whose unobserved outcomes exceed user-specified values. We develop a method that wraps around any prediction model to produce a subset of candidates while controlling the proportion of falsely selected units. Building upon the conformal inference framework, our method first constructs p-values that quantify the statistical evidence for large outcomes; it then determines the shortlist by comparing the p-values to a threshold introduced in the multiple testing literature. In many cases, the procedure selects candidates whose predictions are above a data-dependent threshold. Our theoretical guarantee holds under mild exchangeability conditions on the samples, generalizing existing results on multiple conformal p-values. We demonstrate the empirical performance of our method via simulations, and apply it to job hiring and drug discovery datasets. △ Less

Submitted 26 May, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

Comments: Journal of Machine Learning Research

arXiv:2208.08944 [pdf, other]

An Adaptively Resized Parametric Bootstrap for Inference in High-dimensional Generalized Linear Models

Authors: Qian Zhao, Emmanuel J. Candes

Abstract: Accurate statistical inference in logistic regression models remains a critical challenge when the ratio between the number of parameters and sample size is not negligible. This is because approximations based on either classical asymptotic theory or bootstrap calculations are grossly off the mark. This paper introduces a resized bootstrap method to infer model parameters in arbitrary dimensions.… ▽ More Accurate statistical inference in logistic regression models remains a critical challenge when the ratio between the number of parameters and sample size is not negligible. This is because approximations based on either classical asymptotic theory or bootstrap calculations are grossly off the mark. This paper introduces a resized bootstrap method to infer model parameters in arbitrary dimensions. As in the parametric bootstrap, we resample observations from a distribution, which depends on an estimated regression coefficient sequence. The novelty is that this estimate is actually far from the maximum likelihood estimate (MLE). This estimate is informed by recent theory studying properties of the MLE in high dimensions, and is obtained by appropriately shrinking the MLE towards the origin. We demonstrate that the resized bootstrap method yields valid confidence intervals in both simulated and real data examples. Our methods extend to other high-dimensional generalized linear models. △ Less

Submitted 18 August, 2022; originally announced August 2022.

arXiv:2204.13581 [pdf, ps, other]

Permutation tests using arbitrary permutation distributions

Authors: Aaditya Ramdas, Rina Foygel Barber, Emmanuel J. Candes, Ryan J. Tibshirani

Abstract: Permutation tests date back nearly a century to Fisher's randomized experiments, and remain an immensely popular statistical tool, used for testing hypotheses of independence between variables and other common inferential questions. Much of the existing literature has emphasized that, for the permutation p-value to be valid, one must first pick a subgroup $G$ of permutations (which could equal the… ▽ More Permutation tests date back nearly a century to Fisher's randomized experiments, and remain an immensely popular statistical tool, used for testing hypotheses of independence between variables and other common inferential questions. Much of the existing literature has emphasized that, for the permutation p-value to be valid, one must first pick a subgroup $G$ of permutations (which could equal the full group) and then recalculate the test statistic on permuted data using either an exhaustive enumeration of $G$, or a sample from $G$ drawn uniformly at random. In this work, we demonstrate that the focus on subgroups and uniform sampling are both unnecessary for validity -- in fact, a simple random modification of the permutation p-value remains valid even when using an arbitrary distribution (not necessarily uniform) over any subset of permutations (not necessarily a subgroup). We provide a unified theoretical treatment of such generalized permutation tests, recovering all known results from the literature as special cases. Thus, this work expands the flexibility of the permutation test toolkit available to the practitioner. △ Less

Submitted 2 December, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

arXiv:2202.13415 [pdf, other]

Conformal prediction beyond exchangeability

Authors: Rina Foygel Barber, Emmanuel J. Candes, Aaditya Ramdas, Ryan J. Tibshirani

Abstract: Conformal prediction is a popular, modern technique for providing valid predictive inference for arbitrary machine learning models. Its validity relies on the assumptions of exchangeability of the data, and symmetry of the given model fitting algorithm as a function of the data. However, exchangeability is often violated when predictive models are deployed in practice. For example, if the data dis… ▽ More Conformal prediction is a popular, modern technique for providing valid predictive inference for arbitrary machine learning models. Its validity relies on the assumptions of exchangeability of the data, and symmetry of the given model fitting algorithm as a function of the data. However, exchangeability is often violated when predictive models are deployed in practice. For example, if the data distribution drifts over time, then the data points are no longer exchangeable; moreover, in such settings, we might want to use a nonsymmetric algorithm that treats recent observations as more relevant. This paper generalizes conformal prediction to deal with both aspects: we employ weighted quantiles to introduce robustness against distribution drift, and design a new randomization technique to allow for algorithms that do not treat data points symmetrically. Our new methods are provably robust, with substantially less loss of coverage when exchangeability is violated due to distribution drift or other challenging features of real data, while also achieving the same coverage guarantees as existing conformal prediction methods if the data points are in fact exchangeable. We demonstrate the practical utility of these new tools with simulations and real-data experiments on electricity and election forecasting. △ Less

Submitted 16 March, 2023; v1 submitted 27 February, 2022; originally announced February 2022.

arXiv:2111.12161 [pdf, other]

Sensitivity Analysis of Individual Treatment Effects: A Robust Conformal Inference Approach

Authors: Ying **, Zhimei Ren, Emmanuel J. Candès

Abstract: We propose a model-free framework for sensitivity analysis of individual treatment effects (ITEs), building upon ideas from conformal inference. For any unit, our procedure reports the $Γ$-value, a number which quantifies the minimum strength of confounding needed to explain away the evidence for ITE. Our approach rests on the reliable predictive inference of counterfactuals and ITEs in situations… ▽ More We propose a model-free framework for sensitivity analysis of individual treatment effects (ITEs), building upon ideas from conformal inference. For any unit, our procedure reports the $Γ$-value, a number which quantifies the minimum strength of confounding needed to explain away the evidence for ITE. Our approach rests on the reliable predictive inference of counterfactuals and ITEs in situations where the training data is confounded. Under the marginal sensitivity model of Tan (2006), we characterize the shift between the distribution of the observations and that of the counterfactuals. We first develop a general method for predictive inference of test samples from a shifted distribution; we then leverage this to construct covariate-dependent prediction sets for counterfactuals. No matter the value of the shift, these prediction sets (resp. approximately) achieve marginal coverage if the propensity score is known exactly (resp. estimated). We describe a distinct procedure also attaining coverage, however, conditional on the training data. In the latter case, we prove a sharpness result showing that for certain classes of prediction problems, the prediction intervals cannot possibly be tightened. We verify the validity and performance of the new methods via simulation studies and apply them to analyze real datasets. △ Less

Submitted 24 April, 2022; v1 submitted 23 November, 2021; originally announced November 2021.

arXiv:2110.02422 [pdf, other]

Deploying the Conditional Randomization Test in High Multiplicity Problems

Authors: Shuangning Li, Emmanuel J. Candès

Abstract: This paper introduces the sequential CRT, which is a variable selection procedure that combines the conditional randomization test (CRT) and Selective SeqStep+. Valid p-values are constructed via the flexible CRT, which are then ordered and passed through the selective SeqStep+ filter to produce a list of discoveries. We develop theory guaranteeing control on the false discovery rate (FDR) even th… ▽ More This paper introduces the sequential CRT, which is a variable selection procedure that combines the conditional randomization test (CRT) and Selective SeqStep+. Valid p-values are constructed via the flexible CRT, which are then ordered and passed through the selective SeqStep+ filter to produce a list of discoveries. We develop theory guaranteeing control on the false discovery rate (FDR) even though the p-values are not independent. We show in simulations that our novel procedure indeed controls the FDR and are competitive with -- and sometimes outperform -- state-of-the-art alternatives in terms of power. Finally, we apply our methodology to a breast cancer dataset with the goal of identifying biomarkers associated with cancer stage. △ Less

Submitted 7 April, 2022; v1 submitted 5 October, 2021; originally announced October 2021.

arXiv:2110.01052 [pdf, other]

Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control

Authors: Anastasios N. Angelopoulos, Stephen Bates, Emmanuel J. Candès, Michael I. Jordan, Lihua Lei

Abstract: We introduce a framework for calibrating machine learning models so that their predictions satisfy explicit, finite-sample statistical guarantees. Our calibration algorithms work with any underlying model and (unknown) data-generating distribution and do not require model refitting. The framework addresses, among other examples, false discovery rate control in multi-label classification, intersect… ▽ More We introduce a framework for calibrating machine learning models so that their predictions satisfy explicit, finite-sample statistical guarantees. Our calibration algorithms work with any underlying model and (unknown) data-generating distribution and do not require model refitting. The framework addresses, among other examples, false discovery rate control in multi-label classification, intersection-over-union control in instance segmentation, and the simultaneous control of the type-1 error of outlier detection and confidence set coverage in classification or regression. Our main insight is to reframe the risk-control problem as multiple hypothesis testing, enabling techniques and mathematical arguments different from those in the previous literature. We use the framework to provide new calibration methods for several core machine learning tasks, with detailed worked examples in computer vision and tabular medical data. △ Less

Submitted 29 September, 2022; v1 submitted 3 October, 2021; originally announced October 2021.

Comments: Code available at https://github.com/aangelopoulos/ltt

arXiv:2103.09763 [pdf, other]

Conformalized Survival Analysis

Authors: Emmanuel J. Candès, Lihua Lei, Zhimei Ren

Abstract: Existing survival analysis techniques heavily rely on strong modelling assumptions and are, therefore, prone to model misspecification errors. In this paper, we develop an inferential method based on ideas from conformal prediction, which can wrap around any survival prediction algorithm to produce calibrated, covariate-dependent lower predictive bounds on survival times. In the Type I right-censo… ▽ More Existing survival analysis techniques heavily rely on strong modelling assumptions and are, therefore, prone to model misspecification errors. In this paper, we develop an inferential method based on ideas from conformal prediction, which can wrap around any survival prediction algorithm to produce calibrated, covariate-dependent lower predictive bounds on survival times. In the Type I right-censoring setting, when the censoring times are completely exogenous, the lower predictive bounds have guaranteed coverage in finite samples without any assumptions other than that of operating on independent and identically distributed data points. Under a more general conditionally independent censoring assumption, the bounds satisfy a doubly robust property which states the following: marginal coverage is approximately guaranteed if either the censoring mechanism or the conditional survival function is estimated well. Further, we demonstrate that the lower predictive bounds remain valid and informative for other types of censoring. The validity and efficiency of our procedure are demonstrated on synthetic data and real COVID-19 data from the UK Biobank. △ Less

Submitted 23 April, 2023; v1 submitted 17 March, 2021; originally announced March 2021.

arXiv:2102.07967 [pdf, other]

Distribution-Free Conditional Median Inference

Authors: Dhruv Medarametla, Emmanuel J. Candès

Abstract: We consider the problem of constructing confidence intervals for the median of a response $Y \in \mathbb{R}$ conditional on features $X \in \mathbb{R}^d$ in a situation where we are not willing to make any assumption whatsoever on the underlying distribution of the data $(X,Y)$. We propose a method based upon ideas from conformal prediction and establish a theoretical guarantee of coverage while a… ▽ More We consider the problem of constructing confidence intervals for the median of a response $Y \in \mathbb{R}$ conditional on features $X \in \mathbb{R}^d$ in a situation where we are not willing to make any assumption whatsoever on the underlying distribution of the data $(X,Y)$. We propose a method based upon ideas from conformal prediction and establish a theoretical guarantee of coverage while also going over particular distributions where its performance is sharp. Additionally, we prove an equivalence between confidence intervals for the conditional median and confidence intervals for the response variable, resulting in a lower bound on the length of any possible conditional median confidence interval. This lower bound is independent of sample size and holds for all distributions with no point masses. △ Less

Submitted 3 September, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

Comments: 27 pages, 4 figures

arXiv:2006.06138 [pdf, other]

Conformal Inference of Counterfactuals and Individual Treatment Effects

Authors: Lihua Lei, Emmanuel J. Candès

Abstract: Evaluating treatment effect heterogeneity widely informs treatment decision making. At the moment, much emphasis is placed on the estimation of the conditional average treatment effect via flexible machine learning algorithms. While these methods enjoy some theoretical appeal in terms of consistency and convergence rates, they generally perform poorly in terms of uncertainty quantification. This i… ▽ More Evaluating treatment effect heterogeneity widely informs treatment decision making. At the moment, much emphasis is placed on the estimation of the conditional average treatment effect via flexible machine learning algorithms. While these methods enjoy some theoretical appeal in terms of consistency and convergence rates, they generally perform poorly in terms of uncertainty quantification. This is troubling since assessing risk is crucial for reliable decision-making in sensitive and uncertain environments. In this work, we propose a conformal inference-based approach that can produce reliable interval estimates for counterfactuals and individual treatment effects under the potential outcome framework. For completely randomized or stratified randomized experiments with perfect compliance, the intervals have guaranteed average coverage in finite samples regardless of the unknown data generating mechanism. For randomized experiments with ignorable compliance and general observational studies obeying the strong ignorability assumption, the intervals satisfy a doubly robust property which states the following: the average coverage is approximately controlled if either the propensity score or the conditional quantiles of potential outcomes can be estimated accurately. Numerical studies on both synthetic and real datasets empirically demonstrate that existing methods suffer from a significant coverage deficit even in simple models. In contrast, our methods achieve the desired coverage with reasonably short intervals. △ Less

Submitted 5 May, 2021; v1 submitted 10 June, 2020; originally announced June 2020.

Comments: Accepted by Journal of the Royal Statistical Society: Series B (JRSSB); 38 pages

arXiv:2006.04937 [pdf, other]

doi 10.1109/JBHI.2021.3094873

Interpretable Classification of Bacterial Raman Spectra with Knockoff Wavelets

Authors: Charmaine Chia, Matteo Sesia, Chi-Sing Ho, Stefanie S. Jeffrey, Jennifer Dionne, Emmanuel J. Candès, Roger T. Howe

Abstract: Deep neural networks and other sophisticated machine learning models are widely applied to biomedical signal data because they can detect complex patterns and compute accurate predictions. However, the difficulty of interpreting such models is a limitation, especially for applications involving high-stakes decision, including the identification of bacterial infections. In this paper, we consider f… ▽ More Deep neural networks and other sophisticated machine learning models are widely applied to biomedical signal data because they can detect complex patterns and compute accurate predictions. However, the difficulty of interpreting such models is a limitation, especially for applications involving high-stakes decision, including the identification of bacterial infections. In this paper, we consider fast Raman spectroscopy data and demonstrate that a logistic regression model with carefully selected features achieves accuracy comparable to that of neural networks, while being much simpler and more transparent. Our analysis leverages wavelet features with intuitive chemical interpretations, and performs controlled variable selection with knockoffs to ensure the predictors are relevant and non-redundant. Although we focus on a particular data set, the proposed approach is broadly applicable to other types of signal data for which interpretability may be important. △ Less

Submitted 1 May, 2021; v1 submitted 8 June, 2020; originally announced June 2020.

Comments: 9 pages, 6 figures, 4 tables

arXiv:2006.04292 [pdf, other]

Achieving Equalized Odds by Resampling Sensitive Attributes

Authors: Yaniv Romano, Stephen Bates, Emmanuel J. Candès

Abstract: We present a flexible framework for learning predictive models that approximately satisfy the equalized odds notion of fairness. This is achieved by introducing a general discrepancy functional that rigorously quantifies violations of this criterion. This differentiable functional is used as a penalty driving the model parameters towards equalized odds. To rigorously evaluate fitted models, we dev… ▽ More We present a flexible framework for learning predictive models that approximately satisfy the equalized odds notion of fairness. This is achieved by introducing a general discrepancy functional that rigorously quantifies violations of this criterion. This differentiable functional is used as a penalty driving the model parameters towards equalized odds. To rigorously evaluate fitted models, we develop a formal hypothesis test to detect whether a prediction rule violates this property, the first such test in the literature. Both the model fitting and hypothesis testing leverage a resampled version of the sensitive attribute obeying equalized odds, by construction. We demonstrate the applicability and validity of the proposed framework both in regression and multi-class classification problems, reporting improved performance over state-of-the-art methods. Lastly, we show how to incorporate techniques for equitable uncertainty quantification---unbiased for each group under study---to communicate the results of the data analysis in exact terms. △ Less

Submitted 7 June, 2020; originally announced June 2020.

Comments: 14 pages, 4 figures

arXiv:2006.02544 [pdf, other]

Classification with Valid and Adaptive Coverage

Authors: Yaniv Romano, Matteo Sesia, Emmanuel J. Candès

Abstract: Conformal inference, cross-validation+, and the jackknife+ are hold-out methods that can be combined with virtually any machine learning algorithm to construct prediction sets with guaranteed marginal coverage. In this paper, we develop specialized versions of these techniques for categorical and unordered response labels that, in addition to providing marginal coverage, are also fully adaptive to… ▽ More Conformal inference, cross-validation+, and the jackknife+ are hold-out methods that can be combined with virtually any machine learning algorithm to construct prediction sets with guaranteed marginal coverage. In this paper, we develop specialized versions of these techniques for categorical and unordered response labels that, in addition to providing marginal coverage, are also fully adaptive to complex data distributions, in the sense that they perform favorably in terms of approximate conditional coverage compared to alternative methods. The heart of our contribution is a novel conformity score, which we explicitly demonstrate to be powerful and intuitive for classification problems, but whose underlying principle is potentially far more general. Experiments on synthetic and real data demonstrate the practical value of our theoretical guarantees, as well as the statistical advantages of the proposed methods over the existing alternatives. △ Less

Submitted 3 June, 2020; originally announced June 2020.

Comments: 10 pages, 3 figures; 13 supplementary pages, 4 supplementary figures, 4 supplementary tables

Journal ref: Advances in Neural Information Processing Systems 33 (NeurIPS 2020)

arXiv:2001.09351 [pdf, other]

doi 10.3150/21-BEJ1401

The Asymptotic Distribution of the MLE in High-dimensional Logistic Models: Arbitrary Covariance

Authors: Qian Zhao, Pragya Sur, Emmanuel J. Candès

Abstract: We study the distribution of the maximum likelihood estimate (MLE) in high-dimensional logistic models, extending the recent results from Sur (2019) to the case where the Gaussian covariates may have an arbitrary covariance structure. We prove that in the limit of large problems holding the ratio between the number $p$ of covariates and the sample size $n$ constant, every finite list of MLE coordi… ▽ More We study the distribution of the maximum likelihood estimate (MLE) in high-dimensional logistic models, extending the recent results from Sur (2019) to the case where the Gaussian covariates may have an arbitrary covariance structure. We prove that in the limit of large problems holding the ratio between the number $p$ of covariates and the sample size $n$ constant, every finite list of MLE coordinates follows a multivariate normal distribution. Concretely, the $j$th coordinate $\hat β_j$ of the MLE is asymptotically normally distributed with mean $α_\star β_j$ and standard deviation $σ_\star/τ_j$; here, $β_j$ is the value of the true regression coefficient, and $τ_j$ the standard deviation of the $j$th predictor conditional on all the others. The numerical parameters $α_\star > 1$ and $σ_\star$ only depend upon the problem dimensionality $p/n$ and the overall signal strength, and can be accurately estimated. Our results imply that the MLE's magnitude is biased upwards and that the MLE's standard deviation is greater than that predicted by classical theory. We present a series of experiments on simulated and real data showing excellent agreement with the theory. △ Less

Submitted 4 January, 2023; v1 submitted 25 January, 2020; originally announced January 2020.

Journal ref: Bernoulli 28 (3) 1835-1861, August 2022

arXiv:1909.05433 [pdf, other]

doi 10.1002/sta4.261

A comparison of some conformal quantile regression methods

Authors: Matteo Sesia, Emmanuel J. Candès

Abstract: We compare two recently proposed methods that combine ideas from conformal inference and quantile regression to produce locally adaptive and marginally valid prediction intervals under sample exchangeability (Romano et al., 2019; Kivaranovic et al., 2019). First, we prove that these two approaches are asymptotically efficient in large samples, under some additional assumptions. Then we compare the… ▽ More We compare two recently proposed methods that combine ideas from conformal inference and quantile regression to produce locally adaptive and marginally valid prediction intervals under sample exchangeability (Romano et al., 2019; Kivaranovic et al., 2019). First, we prove that these two approaches are asymptotically efficient in large samples, under some additional assumptions. Then we compare them empirically on simulated and real data. Our results demonstrate that the method in Romano et al. (2019) typically yields tighter prediction intervals in finite samples. Finally, we discuss how to tune these procedures by fixing the relative proportions of observations used for training and conformalization. △ Less

Submitted 11 September, 2019; originally announced September 2019.

Comments: 20 pages, 9 figures, 3 tables

Journal ref: Stat. 2020; 9:e261

arXiv:1908.05428 [pdf, other]

With Malice Towards None: Assessing Uncertainty via Equalized Coverage

Authors: Yaniv Romano, Rina Foygel Barber, Chiara Sabatti, Emmanuel J. Candès

Abstract: An important factor to guarantee a fair use of data-driven recommendation systems is that we should be able to communicate their uncertainty to decision makers. This can be accomplished by constructing prediction intervals, which provide an intuitive measure of the limits of predictive performance. To support equitable treatment, we force the construction of such intervals to be unbiased in the se… ▽ More An important factor to guarantee a fair use of data-driven recommendation systems is that we should be able to communicate their uncertainty to decision makers. This can be accomplished by constructing prediction intervals, which provide an intuitive measure of the limits of predictive performance. To support equitable treatment, we force the construction of such intervals to be unbiased in the sense that their coverage must be equal across all protected groups of interest. We present an operational methodology that achieves this goal by offering rigorous distribution-free coverage guarantees holding in finite samples. Our methodology, equalized coverage, is flexible as it can be viewed as a wrapper around any predictive algorithm. We test the applicability of the proposed framework on real data, demonstrating that equalized coverage constructs unbiased prediction intervals, unlike competitive methods. △ Less

Submitted 15 August, 2019; originally announced August 2019.

Comments: 14 pages, 1 figure, 1 table

arXiv:1905.03222 [pdf, other]

Conformalized Quantile Regression

Authors: Yaniv Romano, Evan Patterson, Emmanuel J. Candès

Abstract: Conformal prediction is a technique for constructing prediction intervals that attain valid coverage in finite samples, without making distributional assumptions. Despite this appeal, existing conformal methods can be unnecessarily conservative because they form intervals of constant or weakly varying length across the input space. In this paper we propose a new method that is fully adaptive to he… ▽ More Conformal prediction is a technique for constructing prediction intervals that attain valid coverage in finite samples, without making distributional assumptions. Despite this appeal, existing conformal methods can be unnecessarily conservative because they form intervals of constant or weakly varying length across the input space. In this paper we propose a new method that is fully adaptive to heteroscedasticity. It combines conformal prediction with classical quantile regression, inheriting the advantages of both. We establish a theoretical guarantee of valid coverage, supplemented by extensive experiments on popular regression datasets. We compare the efficiency of conformalized quantile regression to other conformal methods, showing that our method tends to produce shorter intervals. △ Less

Submitted 8 May, 2019; originally announced May 2019.

Comments: 19 pages, 8 figures, 1 table

arXiv:1905.02928 [pdf, other]

Predictive inference with the jackknife+

Authors: Rina Foygel Barber, Emmanuel J. Candes, Aaditya Ramdas, Ryan J. Tibshirani

Abstract: This paper introduces the jackknife+, which is a novel method for constructing predictive confidence intervals. Whereas the jackknife outputs an interval centered at the predicted response of a test point, with the width of the interval determined by the quantiles of leave-one-out residuals, the jackknife+ also uses the leave-one-out predictions at the test point to account for the variability in… ▽ More This paper introduces the jackknife+, which is a novel method for constructing predictive confidence intervals. Whereas the jackknife outputs an interval centered at the predicted response of a test point, with the width of the interval determined by the quantiles of leave-one-out residuals, the jackknife+ also uses the leave-one-out predictions at the test point to account for the variability in the fitted regression function. Assuming exchangeable training samples, we prove that this crucial modification permits rigorous coverage guarantees regardless of the distribution of the data points, for any algorithm that treats the training points symmetrically. Such guarantees are not possible for the original jackknife and we demonstrate examples where the coverage rate may actually vanish. Our theoretical and empirical analysis reveals that the jackknife and the jackknife+ intervals achieve nearly exact coverage and have similar lengths whenever the fitting algorithm obeys some form of stability. Further, we extend the jackknife+ to K-fold cross validation and similarly establish rigorous coverage properties. Our methods are related to cross-conformal prediction proposed by Vovk [2015] and we discuss connections. △ Less

Submitted 29 May, 2020; v1 submitted 8 May, 2019; originally announced May 2019.

arXiv:1904.06019 [pdf, other]

Conformal Prediction Under Covariate Shift

Authors: Ryan J. Tibshirani, Rina Foygel Barber, Emmanuel J. Candes, Aaditya Ramdas

Abstract: We extend conformal prediction methodology beyond the case of exchangeable data. In particular, we show that a weighted version of conformal prediction can be used to compute distribution-free prediction intervals for problems in which the test and training covariate distributions differ, but the likelihood ratio between these two distributions is known---or, in practice, can be estimated accurate… ▽ More We extend conformal prediction methodology beyond the case of exchangeable data. In particular, we show that a weighted version of conformal prediction can be used to compute distribution-free prediction intervals for problems in which the test and training covariate distributions differ, but the likelihood ratio between these two distributions is known---or, in practice, can be estimated accurately with access to a large set of unlabeled data (test covariate points). Our weighted extension of conformal prediction also applies more generally, to settings in which the data satisfies a certain weighted notion of exchangeability. We discuss other potential applications of our new conformal methodology, including latent variable and missing data problems. △ Less

Submitted 6 July, 2020; v1 submitted 11 April, 2019; originally announced April 2019.

Comments: 17 pages, 4 figures

arXiv:1903.05701 [pdf, other]

doi 10.1093/biomet/asy075

Rejoinder: "Gene Hunting with Hidden Markov Model Knockoffs"

Authors: Matteo Sesia, Chiara Sabatti, Emmanuel J. Candès

Abstract: In this paper we deepen and enlarge the reflection on the possible advantages of a knockoff approach to genome wide association studies (Sesia et al., 2018), starting from the discussions in Bottolo & Richardson (2019); Jewell & Witten (2019); Rosenblatt et al. (2019) and Marchini (2019). The discussants bring up a number of important points, either related to the knockoffs methodology in general,… ▽ More In this paper we deepen and enlarge the reflection on the possible advantages of a knockoff approach to genome wide association studies (Sesia et al., 2018), starting from the discussions in Bottolo & Richardson (2019); Jewell & Witten (2019); Rosenblatt et al. (2019) and Marchini (2019). The discussants bring up a number of important points, either related to the knockoffs methodology in general, or to its specific application to genetic studies. In the following we offer some clarifications, mention relevant recent developments and highlight some of the still open problems. △ Less

Submitted 13 March, 2019; originally announced March 2019.

Comments: 12 pages, 4 figures

Journal ref: Biometrika, Volume 106, Issue 1, 1 March 2019, Pages 35-45

arXiv:1811.06687 [pdf, other]

doi 10.1080/01621459.2019.1660174

Deep Knockoffs

Authors: Yaniv Romano, Matteo Sesia, Emmanuel J. Candès

Abstract: This paper introduces a machine for sampling approximate model-X knockoffs for arbitrary and unspecified data distributions using deep generative models. The main idea is to iteratively refine a knockoff sampling mechanism until a criterion measuring the validity of the produced knockoffs is optimized; this criterion is inspired by the popular maximum mean discrepancy in machine learning and can b… ▽ More This paper introduces a machine for sampling approximate model-X knockoffs for arbitrary and unspecified data distributions using deep generative models. The main idea is to iteratively refine a knockoff sampling mechanism until a criterion measuring the validity of the produced knockoffs is optimized; this criterion is inspired by the popular maximum mean discrepancy in machine learning and can be thought of as measuring the distance to pairwise exchangeability between original and knockoff features. By building upon the existing model-X framework, we thus obtain a flexible and model-free statistical tool to perform controlled variable selection. Extensive numerical experiments and quantitative tests confirm the generality, effectiveness, and power of our deep knockoff machines. Finally, we apply this new method to a real study of mutations linked to changes in drug resistance in the human immunodeficiency virus. △ Less

Submitted 16 November, 2018; originally announced November 2018.

Comments: 37 pages, 23 figures, 1 table

Journal ref: J. Am. Stat. Assoc., Volume 0, Issue 0, 17 Oct 2019, Pages 1-12

arXiv:1804.09753 [pdf, other]

The phase transition for the existence of the maximum likelihood estimate in high-dimensional logistic regression

Authors: Emmanuel J. Candes, Pragya Sur

Abstract: This paper rigorously establishes that the existence of the maximum likelihood estimate (MLE) in high-dimensional logistic regression models with Gaussian covariates undergoes a sharp `phase transition'. We introduce an explicit boundary curve $h_{\text{MLE}}$, parameterized by two scalars measuring the overall magnitude of the unknown sequence of regression coefficients, with the following proper… ▽ More This paper rigorously establishes that the existence of the maximum likelihood estimate (MLE) in high-dimensional logistic regression models with Gaussian covariates undergoes a sharp `phase transition'. We introduce an explicit boundary curve $h_{\text{MLE}}$, parameterized by two scalars measuring the overall magnitude of the unknown sequence of regression coefficients, with the following property: in the limit of large sample sizes $n$ and number of features $p$ proportioned in such a way that $p/n \rightarrow κ$, we show that if the problem is sufficiently high dimensional in the sense that $κ> h_{\text{MLE}}$, then the MLE does not exist with probability one. Conversely, if $κ< h_{\text{MLE}}$, the MLE asymptotically exists with probability one. △ Less

Submitted 25 April, 2018; originally announced April 2018.

Comments: 15 pages, 2 figures

arXiv:1803.06964 [pdf, other]

doi 10.1073/pnas.1810420116

A modern maximum-likelihood theory for high-dimensional logistic regression

Authors: Pragya Sur, Emmanuel J. Candes

Abstract: Every student in statistics or data science learns early on that when the sample size largely exceeds the number of variables, fitting a logistic model produces estimates that are approximately unbiased. Every student also learns that there are formulas to predict the variability of these estimates which are used for the purpose of statistical inference; for instance, to produce p-values for testi… ▽ More Every student in statistics or data science learns early on that when the sample size largely exceeds the number of variables, fitting a logistic model produces estimates that are approximately unbiased. Every student also learns that there are formulas to predict the variability of these estimates which are used for the purpose of statistical inference; for instance, to produce p-values for testing the significance of regression coefficients. Although these formulas come from large sample asymptotics, we are often told that we are on reasonably safe grounds when $n$ is large in such a way that $n \ge 5p$ or $n \ge 10p$. This paper shows that this is far from the case, and consequently, inferences routinely produced by common software packages are often unreliable. Consider a logistic model with independent features in which $n$ and $p$ become increasingly large in a fixed ratio. Then we show that (1) the MLE is biased, (2) the variability of the MLE is far greater than classically predicted, and (3) the commonly used likelihood-ratio test (LRT) is not distributed as a chi-square. The bias of the MLE is extremely problematic as it yields completely wrong predictions for the probability of a case based on observed values of the covariates. We develop a new theory, which asymptotically predicts (1) the bias of the MLE, (2) the variability of the MLE, and (3) the distribution of the LRT. We empirically also demonstrate that these predictions are extremely accurate in finite samples. Further, an appealing feature is that these novel predictions depend on the unknown sequence of regression coefficients only through a single scalar, the overall strength of the signal. This suggests very concrete procedures to adjust inference; we describe one such procedure learning a single parameter from data and producing accurate inference △ Less

Submitted 16 June, 2018; v1 submitted 19 March, 2018; originally announced March 2018.

Comments: 29 pages, 14 figures, 4 tables

arXiv:1801.03896 [pdf, ps, other]

Robust inference with knockoffs

Authors: Rina Foygel Barber, Emmanuel J. Candès, Richard J. Samworth

Abstract: We consider the variable selection problem, which seeks to identify important variables influencing a response $Y$ out of many candidate features $X_1, \ldots, X_p$. We wish to do so while offering finite-sample guarantees about the fraction of false positives - selected variables $X_j$ that in fact have no effect on $Y$ after the other features are known. When the number of features $p$ is large… ▽ More We consider the variable selection problem, which seeks to identify important variables influencing a response $Y$ out of many candidate features $X_1, \ldots, X_p$. We wish to do so while offering finite-sample guarantees about the fraction of false positives - selected variables $X_j$ that in fact have no effect on $Y$ after the other features are known. When the number of features $p$ is large (perhaps even larger than the sample size $n$), and we have no prior knowledge regarding the type of dependence between $Y$ and $X$, the model-X knockoffs framework nonetheless allows us to select a model with a guaranteed bound on the false discovery rate, as long as the distribution of the feature vector $X=(X_1,\dots,X_p)$ is exactly known. This model selection procedure operates by constructing "knockoff copies'" of each of the $p$ features, which are then used as a control group to ensure that the model selection algorithm is not choosing too many irrelevant features. In this work, we study the practical setting where the distribution of $X$ could only be estimated, rather than known exactly, and the knockoff copies of the $X_j$'s are therefore constructed somewhat incorrectly. Our results, which are free of any modeling assumption whatsoever, show that the resulting model selection procedure incurs an inflation of the false discovery rate that is proportional to our errors in estimating the distribution of each feature $X_j$ conditional on the remaining features $\{X_k:k\neq j\}$. The model-X knockoff framework is therefore robust to errors in the underlying assumptions on the distribution of $X$, making it an effective method for many practical applications, such as genome-wide association studies, where the underlying distribution on the features $X_1,\dots,X_p$ is estimated accurately but not known exactly. △ Less

Submitted 11 February, 2019; v1 submitted 11 January, 2018; originally announced January 2018.

arXiv:1706.04677 [pdf, other]

doi 10.1093/biomet/asy033

Gene Hunting with Knockoffs for Hidden Markov Models

Authors: Matteo Sesia, Chiara Sabatti, Emmanuel J. Candès

Abstract: Modern scientific studies often require the identification of a subset of relevant explanatory variables, in the attempt to understand an interesting phenomenon. Several statistical methods have been developed to automate this task, but only recently has the framework of model-free knockoffs proposed a general solution that can perform variable selection under rigorous type-I error control, withou… ▽ More Modern scientific studies often require the identification of a subset of relevant explanatory variables, in the attempt to understand an interesting phenomenon. Several statistical methods have been developed to automate this task, but only recently has the framework of model-free knockoffs proposed a general solution that can perform variable selection under rigorous type-I error control, without relying on strong modeling assumptions. In this paper, we extend the methodology of model-free knockoffs to a rich family of problems where the distribution of the covariates can be described by a hidden Markov model (HMM). We develop an exact and efficient algorithm to sample knockoff copies of an HMM. We then argue that combined with the knockoffs selective framework, they provide a natural and powerful tool for performing principled inference in genome-wide association studies with guaranteed FDR control. Finally, we apply our methodology to several datasets aimed at studying the Crohn's disease and several continuous phenotypes, e.g. levels of cholesterol. △ Less

Submitted 14 June, 2017; originally announced June 2017.

Comments: 35 pages, 13 figues, 9 tables

Journal ref: Biometrika, Volume 106, Issue 1, 1 March 2019, Pages 1-18

arXiv:1706.01191 [pdf, other]

The Likelihood Ratio Test in High-Dimensional Logistic Regression Is Asymptotically a Rescaled Chi-Square

Authors: Pragya Sur, Yuxin Chen, Emmanuel J. Candès

Abstract: Logistic regression is used thousands of times a day to fit data, predict future outcomes, and assess the statistical significance of explanatory variables. When used for the purpose of statistical inference, logistic models produce p-values for the regression coefficients by using an approximation to the distribution of the likelihood-ratio test. Indeed, Wilks' theorem asserts that whenever we ha… ▽ More Logistic regression is used thousands of times a day to fit data, predict future outcomes, and assess the statistical significance of explanatory variables. When used for the purpose of statistical inference, logistic models produce p-values for the regression coefficients by using an approximation to the distribution of the likelihood-ratio test. Indeed, Wilks' theorem asserts that whenever we have a fixed number $p$ of variables, twice the log-likelihood ratio (LLR) $2Λ$ is distributed as a $χ^2_k$ variable in the limit of large sample sizes $n$; here, $k$ is the number of variables being tested. In this paper, we prove that when $p$ is not negligible compared to $n$, Wilks' theorem does not hold and that the chi-square approximation is grossly incorrect; in fact, this approximation produces p-values that are far too small (under the null hypothesis). Assume that $n$ and $p$ grow large in such a way that $p/n\rightarrowκ$ for some constant $κ< 1/2$. We prove that for a class of logistic models, the LLR converges to a rescaled chi-square, namely, $2Λ~\stackrel{\mathrm{d}}{\rightarrow}~α(κ)χ_k^2$, where the scaling factor $α(κ)$ is greater than one as soon as the dimensionality ratio $κ$ is positive. Hence, the LLR is larger than classically assumed. For instance, when $κ=0.3$, $α(κ)\approx1.5$. In general, we show how to compute the scaling factor by solving a nonlinear system of two equations with two unknowns. Our mathematical arguments are involved and use techniques from approximate message passing theory, non-asymptotic random matrix theory and convex geometry. We also complement our mathematical study by showing that the new limiting distribution is accurate for finite sample sizes. Finally, all the results from this paper extend to some other regression models such as the probit regression model. △ Less

Submitted 5 June, 2017; originally announced June 2017.

Comments: 58 pages, 7 figures

arXiv:1602.03574 [pdf, other]

A knockoff filter for high-dimensional selective inference

Authors: Rina Foygel Barber, Emmanuel J. Candes

Abstract: This paper develops a framework for testing for associations in a possibly high-dimensional linear model where the number of features/variables may far exceed the number of observational units. In this framework, the observations are split into two groups, where the first group is used to screen for a set of potentially relevant variables, whereas the second is used for inference over this reduced… ▽ More This paper develops a framework for testing for associations in a possibly high-dimensional linear model where the number of features/variables may far exceed the number of observational units. In this framework, the observations are split into two groups, where the first group is used to screen for a set of potentially relevant variables, whereas the second is used for inference over this reduced set of variables; we also develop strategies for leveraging information from the first part of the data at the inference step for greater power. In our work, the inferential step is carried out by applying the recently introduced knockoff filter, which creates a knockoff copy-a fake variable serving as a control-for each screened variable. We prove that this procedure controls the directional false discovery rate (FDR) in the reduced model controlling for all screened variables; this says that our high-dimensional knockoff procedure 'discovers' important variables as well as the directions (signs) of their effects, in such a way that the expected proportion of wrongly chosen signs is below the user-specified level (thereby controlling a notion of Type S error averaged over the selected set). This result is non-asymptotic, and holds for any distribution of the original features and any values of the unknown regression coefficients, so that inference is not calibrated under hypothesized values of the effect sizes. We demonstrate the performance of our general and flexible approach through numerical studies, showing more power than existing alternatives. Finally, we apply our method to a genome-wide association study to find locations on the genome that are possibly associated with a continuous phenotype. △ Less

Submitted 3 May, 2018; v1 submitted 10 February, 2016; originally announced February 2016.

arXiv:1505.05114 [pdf, other]

Solving Random Quadratic Systems of Equations Is Nearly as Easy as Solving Linear Systems

Authors: Yuxin Chen, Emmanuel J. Candes

Abstract: We consider the fundamental problem of solving quadratic systems of equations in $n$ variables, where $y_i = |\langle \boldsymbol{a}_i, \boldsymbol{x} \rangle|^2$, $i = 1, \ldots, m$ and $\boldsymbol{x} \in \mathbb{R}^n$ is unknown. We propose a novel method, which starting with an initial guess computed by means of a spectral method, proceeds by minimizing a nonconvex functional as in the Wirting… ▽ More We consider the fundamental problem of solving quadratic systems of equations in $n$ variables, where $y_i = |\langle \boldsymbol{a}_i, \boldsymbol{x} \rangle|^2$, $i = 1, \ldots, m$ and $\boldsymbol{x} \in \mathbb{R}^n$ is unknown. We propose a novel method, which starting with an initial guess computed by means of a spectral method, proceeds by minimizing a nonconvex functional as in the Wirtinger flow approach. There are several key distinguishing features, most notably, a distinct objective functional and novel update rules, which operate in an adaptive fashion and drop terms bearing too much influence on the search direction. These careful selection rules provide a tighter initial guess, better descent directions, and thus enhanced practical performance. On the theoretical side, we prove that for certain unstructured models of quadratic systems, our algorithms return the correct solution in linear time, i.e. in time proportional to reading the data $\{\boldsymbol{a}_i\}$ and $\{y_i\}$ as soon as the ratio $m/n$ between the number of equations and unknowns exceeds a fixed numerical constant. We extend the theory to deal with noisy systems in which we only have $y_i \approx |\langle \boldsymbol{a}_i, \boldsymbol{x} \rangle|^2$ and prove that our algorithms achieve a statistical accuracy, which is nearly un-improvable. We complement our theoretical study with numerical examples showing that solving random quadratic systems is both computationally and statistically not much harder than solving linear systems of the same size---hence the title of this paper. For instance, we demonstrate empirically that the computational cost of our algorithm is about four times that of solving a least-squares problem of the same size. △ Less

Submitted 22 March, 2016; v1 submitted 19 May, 2015; originally announced May 2015.

Comments: accepted to Communications on Pure and Applied Mathematics (CPAM)

arXiv:1503.01243 [pdf, ps, other]

A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights

Authors: Weijie Su, Stephen Boyd, Emmanuel J. Candes

Abstract: We derive a second-order ordinary differential equation (ODE) which is the limit of Nesterov's accelerated gradient method. This ODE exhibits approximate equivalence to Nesterov's scheme and thus can serve as a tool for analysis. We show that the continuous time ODE allows for a better understanding of Nesterov's scheme. As a byproduct, we obtain a family of schemes with similar convergence rates.… ▽ More We derive a second-order ordinary differential equation (ODE) which is the limit of Nesterov's accelerated gradient method. This ODE exhibits approximate equivalence to Nesterov's scheme and thus can serve as a tool for analysis. We show that the continuous time ODE allows for a better understanding of Nesterov's scheme. As a byproduct, we obtain a family of schemes with similar convergence rates. The ODE interpretation also suggests restarting Nesterov's scheme leading to an algorithm, which can be rigorously proven to converge at a linear rate whenever the objective is strongly convex. △ Less

Submitted 27 October, 2015; v1 submitted 4 March, 2015; originally announced March 2015.

Comments: To appear in Journal of Machine Learning Research. Added more simulation studies. Preliminary version appeared in NIPS 2014

arXiv:1407.3824 [pdf, ps, other]

doi 10.1214/15-AOAS842

SLOPE - Adaptive variable selection via convex optimization

Authors: Małgorzata Bogdan, Ewout van den Berg, Chiara Sabatti, Weijie Su, Emmanuel J. Candès

Abstract: We introduce a new estimator for the vector of coefficients $β$ in the linear model $y=Xβ+z$, where $X$ has dimensions $n\times p$ with $p$ possibly larger than $n$. SLOPE, short for Sorted L-One Penalized Estimation, is the solution to \[\min_{b\in\mathbb{R}^p}\frac{1}{2}\Vert y-Xb\Vert _{\ell_2}^2+λ_1\vert b\vert _{(1)}+λ_2\vert b\vert_{(2)}+\cdots+λ_p\vert b\vert_{(p)},\] where… ▽ More We introduce a new estimator for the vector of coefficients $β$ in the linear model $y=Xβ+z$, where $X$ has dimensions $n\times p$ with $p$ possibly larger than $n$. SLOPE, short for Sorted L-One Penalized Estimation, is the solution to \[\min_{b\in\mathbb{R}^p}\frac{1}{2}\Vert y-Xb\Vert _{\ell_2}^2+λ_1\vert b\vert _{(1)}+λ_2\vert b\vert_{(2)}+\cdots+λ_p\vert b\vert_{(p)},\] where $λ_1\geλ_2\ge\cdots\geλ_p\ge0$ and $\vert b\vert_{(1)}\ge\vert b\vert_{(2)}\ge\cdots\ge\vert b\vert_{(p)}$ are the decreasing absolute values of the entries of $b$. This is a convex program and we demonstrate a solution algorithm whose computational complexity is roughly comparable to that of classical $\ell_1$ procedures such as the Lasso. Here, the regularizer is a sorted $\ell_1$ norm, which penalizes the regression coefficients according to their rank: the higher the rank - that is, stronger the signal - the larger the penalty. This is similar to the Benjamini and Hochberg [J. Roy. Statist. Soc. Ser. B 57 (1995) 289-300] procedure (BH) which compares more significant $p$-values with more stringent thresholds. One notable choice of the sequence $\{λ_i\}$ is given by the BH critical values $λ_{\mathrm {BH}}(i)=z(1-i\cdot q/2p)$, where $q\in(0,1)$ and $z(α)$ is the quantile of a standard normal distribution. SLOPE aims to provide finite sample guarantees on the selected model; of special interest is the false discovery rate (FDR), defined as the expected proportion of irrelevant regressors among all selected predictors. Under orthogonal designs, SLOPE with $λ_{\mathrm{BH}}$ provably controls FDR at level $q$. Moreover, it also appears to have appreciable inferential properties under more general designs $X$ while having substantial power, as demonstrated in a series of experiments running on both simulated and real data. △ Less

Submitted 4 November, 2015; v1 submitted 14 July, 2014; originally announced July 2014.

Comments: Published at http://dx.doi.org/10.1214/15-AOAS842 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS842

Journal ref: Annals of Applied Statistics 2015, Vol. 9, No. 3, 1103-1140

arXiv:1404.5609 [pdf, ps, other]

doi 10.1214/15-AOS1337

Controlling the false discovery rate via knockoffs

Authors: Rina Foygel Barber, Emmanuel J. Candès

Abstract: In many fields of science, we observe a response variable together with a large number of potential explanatory variables, and would like to be able to discover which variables are truly associated with the response. At the same time, we need to know that the false discovery rate (FDR) - the expected fraction of false discoveries among all discoveries - is not too high, in order to assure the scie… ▽ More In many fields of science, we observe a response variable together with a large number of potential explanatory variables, and would like to be able to discover which variables are truly associated with the response. At the same time, we need to know that the false discovery rate (FDR) - the expected fraction of false discoveries among all discoveries - is not too high, in order to assure the scientist that most of the discoveries are indeed true and replicable. This paper introduces the knockoff filter, a new variable selection procedure controlling the FDR in the statistical linear model whenever there are at least as many observations as variables. This method achieves exact FDR control in finite sample settings no matter the design or covariates, the number of variables in the model, or the amplitudes of the unknown regression coefficients, and does not require any knowledge of the noise level. As the name suggests, the method operates by manufacturing knockoff variables that are cheap - their construction does not require any new data - and are designed to mimic the correlation structure found within the existing variables, in a way that allows for accurate FDR control, beyond what is possible with permutation-based methods. The method of knockoffs is very general and flexible, and can work with a broad class of test statistics. We test the method in combination with statistics from the Lasso for sparse regression, and obtain empirical results showing that the resulting method has far more power than existing selection rules when the proportion of null variables is high. △ Less

Submitted 14 October, 2015; v1 submitted 22 April, 2014; originally announced April 2014.

Comments: Published at http://dx.doi.org/10.1214/15-AOS1337 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1337

Journal ref: Annals of Statistics 2015, Vol. 43, No. 5, 2055-2085

arXiv:1301.2603 [pdf, ps, other]

doi 10.1214/13-AOS1199

Robust subspace clustering

Authors: Mahdi Soltanolkotabi, Ehsan Elhamifar, Emmanuel J. Candès

Abstract: Subspace clustering refers to the task of finding a multi-subspace representation that best fits a collection of points taken from a high-dimensional space. This paper introduces an algorithm inspired by sparse subspace clustering (SSC) [In IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2009) 2790-2797] to cluster noisy data, and develops some novel theory demonstrating its corr… ▽ More Subspace clustering refers to the task of finding a multi-subspace representation that best fits a collection of points taken from a high-dimensional space. This paper introduces an algorithm inspired by sparse subspace clustering (SSC) [In IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2009) 2790-2797] to cluster noisy data, and develops some novel theory demonstrating its correctness. In particular, the theory uses ideas from geometric functional analysis to show that the algorithm can accurately recover the underlying subspaces under minimal requirements on their orientation, and on the number of samples per subspace. Synthetic as well as real data experiments complement our theoretical study, illustrating our approach and demonstrating its effectiveness. △ Less

Submitted 23 May, 2014; v1 submitted 11 January, 2013; originally announced January 2013.

Comments: Published in at http://dx.doi.org/10.1214/13-AOS1199 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1199

Journal ref: Annals of Statistics 2014, Vol. 42, No. 2, 669-699

arXiv:1211.0817 [pdf, ps, other]

doi 10.1214/12-AOS1001

Discussion: Latent variable graphical model selection via convex optimization

Authors: Emmanuel J. Candés, Mahdi Soltanolkotabi

Abstract: Discussion of "Latent variable graphical model selection via convex optimization" by Venkat Chandrasekaran, Pablo A. Parrilo and Alan S. Willsky [arXiv:1008.1290]. Discussion of "Latent variable graphical model selection via convex optimization" by Venkat Chandrasekaran, Pablo A. Parrilo and Alan S. Willsky [arXiv:1008.1290]. △ Less

Submitted 5 November, 2012; originally announced November 2012.

Comments: Published in at http://dx.doi.org/10.1214/12-AOS1001 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1001

Journal ref: Annals of Statistics 2012, Vol. 40, No. 4, 1997-2004

arXiv:1210.4139 [pdf, ps, other]

doi 10.1109/TSP.2013.2270464

Unbiased Risk Estimates for Singular Value Thresholding and Spectral Estimators

Authors: Emmanuel J. Candes, Carlos A. Sing-Long, Joshua D. Trzasko

Abstract: In an increasing number of applications, it is of interest to recover an approximately low-rank data matrix from noisy observations. This paper develops an unbiased risk estimate---holding in a Gaussian model---for any spectral estimator obeying some mild regularity assumptions. In particular, we give an unbiased risk estimate formula for singular value thresholding (SVT), a popular estimation str… ▽ More In an increasing number of applications, it is of interest to recover an approximately low-rank data matrix from noisy observations. This paper develops an unbiased risk estimate---holding in a Gaussian model---for any spectral estimator obeying some mild regularity assumptions. In particular, we give an unbiased risk estimate formula for singular value thresholding (SVT), a popular estimation strategy which applies a soft-thresholding rule to the singular values of the noisy observations. Among other things, our formulas offer a principled and automated way of selecting regularization parameters in a variety of problems. In particular, we demonstrate the utility of the unbiased risk estimation for SVT-based denoising of real clinical cardiac MRI series data. We also give new results concerning the differentiability of certain matrix-valued functions. △ Less

Submitted 15 October, 2012; originally announced October 2012.

Comments: 29 pages, 8 figures

arXiv:1112.4258 [pdf, ps, other]

doi 10.1214/12-AOS1034

A geometric analysis of subspace clustering with outliers

Authors: Mahdi Soltanolkotabi, Emmanuel J. Candés

Abstract: This paper considers the problem of clustering a collection of unlabeled data points assumed to lie near a union of lower-dimensional planes. As is common in computer vision or unsupervised learning applications, we do not know in advance how many subspaces there are nor do we have any information about their dimensions. We develop a novel geometric analysis of an algorithm named sparse subspace c… ▽ More This paper considers the problem of clustering a collection of unlabeled data points assumed to lie near a union of lower-dimensional planes. As is common in computer vision or unsupervised learning applications, we do not know in advance how many subspaces there are nor do we have any information about their dimensions. We develop a novel geometric analysis of an algorithm named sparse subspace clustering (SSC) [In IEEE Conference on Computer Vision and Pattern Recognition, 2009. CVPR 2009 (2009) 2790-2797. IEEE], which significantly broadens the range of problems where it is provably effective. For instance, we show that SSC can recover multiple subspaces, each of dimension comparable to the ambient dimension. We also prove that SSC can correctly cluster data points even when the subspaces of interest intersect. Further, we develop an extension of SSC that succeeds when the data set is corrupted with possibly overwhelmingly many outliers. Underlying our analysis are clear geometric insights, which may bear on other sparse recovery problems. A numerical study complements our theoretical analysis and demonstrates the effectiveness of these methods. △ Less

Submitted 30 January, 2013; v1 submitted 19 December, 2011; originally announced December 2011.

Comments: Published in at http://dx.doi.org/10.1214/12-AOS1034 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1034

Journal ref: Annals of Statistics 2012, Vol. 40, No. 4, 2195-2238

arXiv:1007.1434 [pdf, ps, other]

doi 10.1214/11-AOS910

Global testing under sparse alternatives: ANOVA, multiple comparisons and the higher criticism

Authors: Ery Arias-Castro, Emmanuel J. Candès, Yaniv Plan

Abstract: Testing for the significance of a subset of regression coefficients in a linear model, a staple of statistical analysis, goes back at least to the work of Fisher who introduced the analysis of variance (ANOVA). We study this problem under the assumption that the coefficient vector is sparse, a common situation in modern high-dimensional settings. Suppose we have $p$ covariates and that under the a… ▽ More Testing for the significance of a subset of regression coefficients in a linear model, a staple of statistical analysis, goes back at least to the work of Fisher who introduced the analysis of variance (ANOVA). We study this problem under the assumption that the coefficient vector is sparse, a common situation in modern high-dimensional settings. Suppose we have $p$ covariates and that under the alternative, the response only depends upon the order of $p^{1-α}$ of those, $0\leα\le1$. Under moderate sparsity levels, that is, $0\leα\le1/2$, we show that ANOVA is essentially optimal under some conditions on the design. This is no longer the case under strong sparsity constraints, that is, $α>1/2$. In such settings, a multiple comparison procedure is often preferred and we establish its optimality when $α\geq3/4$. However, these two very popular methods are suboptimal, and sometimes powerless, under moderately strong sparsity where $1/2<α<3/4$. We suggest a method based on the higher criticism that is powerful in the whole range $α>1/2$. This optimality property is true for a variety of designs, including the classical (balanced) multi-way designs and more modern "$p>n$" designs arising in genetics and signal processing. In addition to the standard fixed effects model, we establish similar results for a random effects model where the nonzero coefficients of the regression vector are normally distributed. △ Less

Submitted 23 February, 2012; v1 submitted 8 July, 2010; originally announced July 2010.

Comments: Published in at http://dx.doi.org/10.1214/11-AOS910 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS910

Journal ref: Annals of Statistics 2011, Vol. 39, No. 5, 2533-2556

arXiv:0711.1612 [pdf, ps, other]

Enhancing Sparsity by Reweighted L1 Minimization

Authors: Emmanuel J. Candes, Michael B. Wakin, Stephen P. Boyd

Abstract: It is now well understood that (1) it is possible to reconstruct sparse signals exactly from what appear to be highly incomplete sets of linear measurements and (2) that this can be done by constrained L1 minimization. In this paper, we study a novel method for sparse signal recovery that in many situations outperforms L1 minimization in the sense that substantially fewer measurements are needed… ▽ More It is now well understood that (1) it is possible to reconstruct sparse signals exactly from what appear to be highly incomplete sets of linear measurements and (2) that this can be done by constrained L1 minimization. In this paper, we study a novel method for sparse signal recovery that in many situations outperforms L1 minimization in the sense that substantially fewer measurements are needed for exact recovery. The algorithm consists of solving a sequence of weighted L1-minimization problems where the weights used for the next iteration are computed from the value of the current solution. We present a series of experiments demonstrating the remarkable performance and broad applicability of this algorithm in the areas of sparse signal recovery, statistical estimation, error correction and image processing. Interestingly, superior gains are also achieved when our method is applied to recover signals with assumed near-sparsity in overcomplete representations--not by reweighting the L1 norm of the coefficient sequence as is common, but by reweighting the L1 norm of the transformed object. An immediate consequence is the possibility of highly efficient data acquisition protocols by improving on a technique known as compressed sensing. △ Less

Submitted 10 November, 2007; originally announced November 2007.

MSC Class: 49N30; 49N45; 94A12

Showing 1–45 of 45 results for author: Candes, E J