Showing 1–45 of 45 results for author: Candes, E J

  1. arXiv:2406.09714  [pdf, other]

    stat.ML cs.LG stat.ME

    Large language model validity via enhanced conformal prediction methods

    Authors: John J. Cherian, Isaac Gibbs, Emmanuel J. Candès

    Abstract: We develop new conformal inference methods for obtaining validity guarantees on the output of large language models (LLMs). Prior work in conformal language modeling identifies a subset of the text that satisfies a high-probability guarantee of correctness. These methods work by filtering claims from the LLM's original response if a scoring function evaluated on the claim fails to exceed a thresho…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 20 pages, 8 figures

  2. arXiv:2406.07449  [pdf, other]

    stat.ME stat.ML

    Boosted Conformal Prediction Intervals

    Authors: Ran Xie, Rina Foygel Barber, Emmanuel J. Candès

    Abstract: This paper introduces a boosted conformal procedure designed to tailor conformalized prediction intervals toward specific desired properties, such as enhanced conditional coverage or reduced interval length. We employ machine learning techniques, notably gradient boosting, to systematically improve upon a predefined conformity score function. This process is guided by carefully constructed loss fu…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 22 pages, 9 figures

  3. arXiv:2403.03208  [pdf, other]

    stat.ML cs.LG stat.ME

    Active Statistical Inference

    Authors: Tijana Zrnic, Emmanuel J. Candès

    Abstract: Inspired by the concept of active learning, we propose active inference, a methodology for statistical inference with machine-learning-assisted data collection. Assuming a budget on the number of labels that can be collected, the methodology uses a machine learning model to identify which data points would be most beneficial to label, thus effectively utilizing the budget. It operat…

    Submitted 29 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

  4. arXiv:2309.16598  [pdf, other]

    stat.ML cs.LG stat.ME

    Cross-Prediction-Powered Inference

    Authors: Tijana Zrnic, Emmanuel J. Candès

    Abstract: While reliable data-driven decision-making hinges on high-quality labeled data, the acquisition of quality labels often involves laborious human annotations or slow and expensive scientific measurements. Machine learning is becoming an appealing alternative as sophisticated predictive techniques are being used to quickly and cheaply produce large amounts of predicted labels; e.g., predicted protei…

    Submitted 28 February, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

  5. arXiv:2307.16895  [pdf, other]

    cs.LG eess.SY stat.ME stat.ML

    Conformal PID Control for Time Series Prediction

    Authors: Anastasios N. Angelopoulos, Emmanuel J. Candes, Ryan J. Tibshirani

    Abstract: We study the problem of uncertainty quantification for time series prediction, with the goal of providing easy-to-use algorithms with formal guarantees. The algorithms we present build upon ideas from conformal prediction and control theory, are able to prospectively model conformal scores in an online setting, and adapt to the presence of systematic errors due to seasonality, trends, and general…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: Code available at https://github.com/aangelopoulos/conformal-time-series

  6. arXiv:2307.09291  [pdf, other]

    stat.ME math.ST stat.AP

    Model-free selective inference under covariate shift via weighted conformal p-values

    Authors: Ying Jin, Emmanuel J. Candès

    Abstract: This paper introduces novel weighted conformal p-values and methods for model-free selective inference. The problem is as follows: given test units with covariates $X$ and missing responses $Y$, how do we select units for which the responses $Y$ are larger than user-specified values while controlling the proportion of false positives? Can we achieve this without any modeling assumptions on the dat…

    Submitted 26 September, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

  7. arXiv:2305.12616  [pdf, other]

    stat.ME

    Conformal Prediction With Conditional Guarantees

    Authors: Isaac Gibbs, John J. Cherian, Emmanuel J. Candès

    Abstract: We consider the problem of constructing distribution-free prediction sets with finite-sample conditional guarantees. Prior work has shown that it is impossible to provide exact conditional coverage universally in finite samples. Thus, most popular methods only provide marginal coverage over the covariates. This paper bridges this gap by defining a spectrum of problems that interpolate between marg…

    Submitted 20 December, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: 46 pages, 11 figures

  8. arXiv:2305.03712  [pdf, other]

    stat.ME cs.CY cs.LG

    Statistical Inference for Fairness Auditing

    Authors: John J. Cherian, Emmanuel J. Candès

    Abstract: Before deploying a black-box model in high-stakes problems, it is important to evaluate the model's performance on sensitive subpopulations. For example, in a recidivism prediction task, we may wish to identify demographic groups for which our prediction model has unacceptably high false positive rates or certify that no such groups exist. In this paper, we frame this task, often referred to as "f…

    Submitted 8 June, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

    Comments: 44 pages, 8 figures

  9. arXiv:2210.01408  [pdf, other]

    stat.ME stat.ML

    Selection by Prediction with Conformal p-values

    Authors: Ying Jin, Emmanuel J. Candès

    Abstract: Decision making or scientific discovery pipelines such as job hiring and drug discovery often involve multiple stages: before any resource-intensive step, there is often an initial screening that uses predictions from a machine learning model to shortlist a few candidates from a large pool. We study screening procedures that aim to select candidates whose unobserved outcomes exceed user-specified…

    Submitted 26 May, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

    Comments: Journal of Machine Learning Research

  10. arXiv:2208.08944  [pdf, other]

    stat.ME

    An Adaptively Resized Parametric Bootstrap for Inference in High-dimensional Generalized Linear Models

    Authors: Qian Zhao, Emmanuel J. Candes

    Abstract: Accurate statistical inference in logistic regression models remains a critical challenge when the ratio between the number of parameters and sample size is not negligible. This is because approximations based on either classical asymptotic theory or bootstrap calculations are grossly off the mark. This paper introduces a resized bootstrap method to infer model parameters in arbitrary dimensions.…

    Submitted 18 August, 2022; originally announced August 2022.

  11. arXiv:2204.13581  [pdf, ps, other]

    stat.ME

    Permutation tests using arbitrary permutation distributions

    Authors: Aaditya Ramdas, Rina Foygel Barber, Emmanuel J. Candes, Ryan J. Tibshirani

    Abstract: Permutation tests date back nearly a century to Fisher's randomized experiments, and remain an immensely popular statistical tool, used for testing hypotheses of independence between variables and other common inferential questions. Much of the existing literature has emphasized that, for the permutation p-value to be valid, one must first pick a subgroup $G$ of permutations (which could equal the…

    Submitted 2 December, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

  12. arXiv:2202.13415  [pdf, other]

    stat.ME

    Conformal prediction beyond exchangeability

    Authors: Rina Foygel Barber, Emmanuel J. Candes, Aaditya Ramdas, Ryan J. Tibshirani

    Abstract: Conformal prediction is a popular, modern technique for providing valid predictive inference for arbitrary machine learning models. Its validity relies on the assumptions of exchangeability of the data, and symmetry of the given model fitting algorithm as a function of the data. However, exchangeability is often violated when predictive models are deployed in practice. For example, if the data dis…

    Submitted 16 March, 2023; v1 submitted 27 February, 2022; originally announced February 2022.

  13. arXiv:2111.12161  [pdf, other]

    stat.ME

    Sensitivity Analysis of Individual Treatment Effects: A Robust Conformal Inference Approach

    Authors: Ying Jin, Zhimei Ren, Emmanuel J. Candès

    Abstract: We propose a model-free framework for sensitivity analysis of individual treatment effects (ITEs), building upon ideas from conformal inference. For any unit, our procedure reports the $Γ$-value, a number which quantifies the minimum strength of confounding needed to explain away the evidence for ITE. Our approach rests on the reliable predictive inference of counterfactuals and ITEs in situations…

    Submitted 24 April, 2022; v1 submitted 23 November, 2021; originally announced November 2021.

  14. arXiv:2110.02422  [pdf, other]

    stat.ME

    Deploying the Conditional Randomization Test in High Multiplicity Problems

    Authors: Shuangning Li, Emmanuel J. Candès

    Abstract: This paper introduces the sequential CRT, which is a variable selection procedure that combines the conditional randomization test (CRT) and Selective SeqStep+. Valid p-values are constructed via the flexible CRT, which are then ordered and passed through the selective SeqStep+ filter to produce a list of discoveries. We develop theory guaranteeing control on the false discovery rate (FDR) even th…

    Submitted 7 April, 2022; v1 submitted 5 October, 2021; originally announced October 2021.

  15. arXiv:2110.01052  [pdf, other]

    cs.LG cs.AI cs.CV stat.ME stat.ML

    Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control

    Authors: Anastasios N. Angelopoulos, Stephen Bates, Emmanuel J. Candès, Michael I. Jordan, Lihua Lei

    Abstract: We introduce a framework for calibrating machine learning models so that their predictions satisfy explicit, finite-sample statistical guarantees. Our calibration algorithms work with any underlying model and (unknown) data-generating distribution and do not require model refitting. The framework addresses, among other examples, false discovery rate control in multi-label classification, intersect…

    Submitted 29 September, 2022; v1 submitted 3 October, 2021; originally announced October 2021.

    Comments: Code available at https://github.com/aangelopoulos/ltt

  16. arXiv:2103.09763  [pdf, other]

    stat.ME stat.ML

    Conformalized Survival Analysis

    Authors: Emmanuel J. Candès, Lihua Lei, Zhimei Ren

    Abstract: Existing survival analysis techniques heavily rely on strong modelling assumptions and are, therefore, prone to model misspecification errors. In this paper, we develop an inferential method based on ideas from conformal prediction, which can wrap around any survival prediction algorithm to produce calibrated, covariate-dependent lower predictive bounds on survival times. In the Type I right-censo…

    Submitted 23 April, 2023; v1 submitted 17 March, 2021; originally announced March 2021.

  17. arXiv:2102.07967  [pdf, other]

    math.ST stat.ME

    Distribution-Free Conditional Median Inference

    Authors: Dhruv Medarametla, Emmanuel J. Candès

    Abstract: We consider the problem of constructing confidence intervals for the median of a response $Y \in \mathbb{R}$ conditional on features $X \in \mathbb{R}^d$ in a situation where we are not willing to make any assumption whatsoever on the underlying distribution of the data $(X,Y)$. We propose a method based upon ideas from conformal prediction and establish a theoretical guarantee of coverage while a…

    Submitted 3 September, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

    Comments: 27 pages, 4 figures

  18. arXiv:2006.06138  [pdf, other]

    stat.ME math.ST stat.ML

    Conformal Inference of Counterfactuals and Individual Treatment Effects

    Authors: Lihua Lei, Emmanuel J. Candès

    Abstract: Evaluating treatment effect heterogeneity widely informs treatment decision making. At the moment, much emphasis is placed on the estimation of the conditional average treatment effect via flexible machine learning algorithms. While these methods enjoy some theoretical appeal in terms of consistency and convergence rates, they generally perform poorly in terms of uncertainty quantification. This i…

    Submitted 5 May, 2021; v1 submitted 10 June, 2020; originally announced June 2020.

    Comments: Accepted by Journal of the Royal Statistical Society: Series B (JRSSB); 38 pages

  19. arXiv:2006.04937  [pdf, other]

    eess.SP stat.AP stat.ML

    Interpretable Classification of Bacterial Raman Spectra with Knockoff Wavelets

    Authors: Charmaine Chia, Matteo Sesia, Chi-Sing Ho, Stefanie S. Jeffrey, Jennifer Dionne, Emmanuel J. Candès, Roger T. Howe

    Abstract: Deep neural networks and other sophisticated machine learning models are widely applied to biomedical signal data because they can detect complex patterns and compute accurate predictions. However, the difficulty of interpreting such models is a limitation, especially for applications involving high-stakes decision, including the identification of bacterial infections. In this paper, we consider f…

    Submitted 1 May, 2021; v1 submitted 8 June, 2020; originally announced June 2020.

    Comments: 9 pages, 6 figures, 4 tables

  20. arXiv:2006.04292  [pdf, other]

    stat.ML cs.LG stat.ME

    Achieving Equalized Odds by Resampling Sensitive Attributes

    Authors: Yaniv Romano, Stephen Bates, Emmanuel J. Candès

    Abstract: We present a flexible framework for learning predictive models that approximately satisfy the equalized odds notion of fairness. This is achieved by introducing a general discrepancy functional that rigorously quantifies violations of this criterion. This differentiable functional is used as a penalty driving the model parameters towards equalized odds. To rigorously evaluate fitted models, we dev…

    Submitted 7 June, 2020; originally announced June 2020.

    Comments: 14 pages, 4 figures

  21. arXiv:2006.02544  [pdf, other]

    stat.ME stat.ML

    Classification with Valid and Adaptive Coverage

    Authors: Yaniv Romano, Matteo Sesia, Emmanuel J. Candès

    Abstract: Conformal inference, cross-validation+, and the jackknife+ are hold-out methods that can be combined with virtually any machine learning algorithm to construct prediction sets with guaranteed marginal coverage. In this paper, we develop specialized versions of these techniques for categorical and unordered response labels that, in addition to providing marginal coverage, are also fully adaptive to…

    Submitted 3 June, 2020; originally announced June 2020.

    Comments: 10 pages, 3 figures; 13 supplementary pages, 4 supplementary figures, 4 supplementary tables

    Journal ref: Advances in Neural Information Processing Systems 33 (NeurIPS 2020)

  22. arXiv:2001.09351  [pdf, other]

    math.ST stat.ME

    The Asymptotic Distribution of the MLE in High-dimensional Logistic Models: Arbitrary Covariance

    Authors: Qian Zhao, Pragya Sur, Emmanuel J. Candès

    Abstract: We study the distribution of the maximum likelihood estimate (MLE) in high-dimensional logistic models, extending the recent results from Sur (2019) to the case where the Gaussian covariates may have an arbitrary covariance structure. We prove that in the limit of large problems holding the ratio between the number $p$ of covariates and the sample size $n$ constant, every finite list of MLE coordi…

    Submitted 4 January, 2023; v1 submitted 25 January, 2020; originally announced January 2020.

    Journal ref: Bernoulli 28 (3) 1835-1861, August 2022

  23. arXiv:1909.05433  [pdf, other]

    stat.ME math.ST stat.ML

    A comparison of some conformal quantile regression methods

    Authors: Matteo Sesia, Emmanuel J. Candès

    Abstract: We compare two recently proposed methods that combine ideas from conformal inference and quantile regression to produce locally adaptive and marginally valid prediction intervals under sample exchangeability (Romano et al., 2019; Kivaranovic et al., 2019). First, we prove that these two approaches are asymptotically efficient in large samples, under some additional assumptions. Then we compare the…

    Submitted 11 September, 2019; originally announced September 2019.

    Comments: 20 pages, 9 figures, 3 tables

    Journal ref: Stat. 2020; 9:e261

  24. arXiv:1908.05428  [pdf, other]

    stat.ME cs.CY stat.AP stat.ML

    With Malice Towards None: Assessing Uncertainty via Equalized Coverage

    Authors: Yaniv Romano, Rina Foygel Barber, Chiara Sabatti, Emmanuel J. Candès

    Abstract: An important factor to guarantee a fair use of data-driven recommendation systems is that we should be able to communicate their uncertainty to decision makers. This can be accomplished by constructing prediction intervals, which provide an intuitive measure of the limits of predictive performance. To support equitable treatment, we force the construction of such intervals to be unbiased in the se…

    Submitted 15 August, 2019; originally announced August 2019.

    Comments: 14 pages, 1 figure, 1 table

  25. arXiv:1905.03222  [pdf, other]

    stat.ME stat.ML

    Conformalized Quantile Regression

    Authors: Yaniv Romano, Evan Patterson, Emmanuel J. Candès

    Abstract: Conformal prediction is a technique for constructing prediction intervals that attain valid coverage in finite samples, without making distributional assumptions. Despite this appeal, existing conformal methods can be unnecessarily conservative because they form intervals of constant or weakly varying length across the input space. In this paper we propose a new method that is fully adaptive to he…

    Submitted 8 May, 2019; originally announced May 2019.

    Comments: 19 pages, 8 figures, 1 table

  26. arXiv:1905.02928  [pdf, other]

    stat.ME

    Predictive inference with the jackknife+

    Authors: Rina Foygel Barber, Emmanuel J. Candes, Aaditya Ramdas, Ryan J. Tibshirani

    Abstract: This paper introduces the jackknife+, which is a novel method for constructing predictive confidence intervals. Whereas the jackknife outputs an interval centered at the predicted response of a test point, with the width of the interval determined by the quantiles of leave-one-out residuals, the jackknife+ also uses the leave-one-out predictions at the test point to account for the variability in…

    Submitted 29 May, 2020; v1 submitted 8 May, 2019; originally announced May 2019.

  27. arXiv:1904.06019  [pdf, other]

    stat.ME

    Conformal Prediction Under Covariate Shift

    Authors: Ryan J. Tibshirani, Rina Foygel Barber, Emmanuel J. Candes, Aaditya Ramdas

    Abstract: We extend conformal prediction methodology beyond the case of exchangeable data. In particular, we show that a weighted version of conformal prediction can be used to compute distribution-free prediction intervals for problems in which the test and training covariate distributions differ, but the likelihood ratio between these two distributions is known---or, in practice, can be estimated accurate…

    Submitted 6 July, 2020; v1 submitted 11 April, 2019; originally announced April 2019.

    Comments: 17 pages, 4 figures

  28. arXiv:1903.05701  [pdf, other]

    stat.ME math.ST stat.AP

    Rejoinder: "Gene Hunting with Hidden Markov Model Knockoffs"

    Authors: Matteo Sesia, Chiara Sabatti, Emmanuel J. Candès

    Abstract: In this paper we deepen and enlarge the reflection on the possible advantages of a knockoff approach to genome wide association studies (Sesia et al., 2018), starting from the discussions in Bottolo & Richardson (2019); Jewell & Witten (2019); Rosenblatt et al. (2019) and Marchini (2019). The discussants bring up a number of important points, either related to the knockoffs methodology in general,…

    Submitted 13 March, 2019; originally announced March 2019.

    Comments: 12 pages, 4 figures

    Journal ref: Biometrika, Volume 106, Issue 1, 1 March 2019, Pages 35-45

  29. arXiv:1811.06687  [pdf, other]

    stat.ME math.ST stat.AP stat.ML

    Deep Knockoffs

    Authors: Yaniv Romano, Matteo Sesia, Emmanuel J. Candès

    Abstract: This paper introduces a machine for sampling approximate model-X knockoffs for arbitrary and unspecified data distributions using deep generative models. The main idea is to iteratively refine a knockoff sampling mechanism until a criterion measuring the validity of the produced knockoffs is optimized; this criterion is inspired by the popular maximum mean discrepancy in machine learning and can b…

    Submitted 16 November, 2018; originally announced November 2018.

    Comments: 37 pages, 23 figures, 1 table

    Journal ref: J. Am. Stat. Assoc., Volume 0, Issue 0, 17 Oct 2019, Pages 1-12

  30. arXiv:1804.09753  [pdf, other]

    stat.ME stat.ML

    The phase transition for the existence of the maximum likelihood estimate in high-dimensional logistic regression

    Authors: Emmanuel J. Candes, Pragya Sur

    Abstract: This paper rigorously establishes that the existence of the maximum likelihood estimate (MLE) in high-dimensional logistic regression models with Gaussian covariates undergoes a sharp 'phase transition'. We introduce an explicit boundary curve $h_{\text{MLE}}$, parameterized by two scalars measuring the overall magnitude of the unknown sequence of regression coefficients, with the following proper…

    Submitted 25 April, 2018; originally announced April 2018.

    Comments: 15 pages, 2 figures

  31. A modern maximum-likelihood theory for high-dimensional logistic regression

    Authors: Pragya Sur, Emmanuel J. Candes

    Abstract: Every student in statistics or data science learns early on that when the sample size largely exceeds the number of variables, fitting a logistic model produces estimates that are approximately unbiased. Every student also learns that there are formulas to predict the variability of these estimates which are used for the purpose of statistical inference; for instance, to produce p-values for testi…

    Submitted 16 June, 2018; v1 submitted 19 March, 2018; originally announced March 2018.

    Comments: 29 pages, 14 figures, 4 tables

  32. arXiv:1801.03896  [pdf, ps, other]

    stat.ME

    Robust inference with knockoffs

    Authors: Rina Foygel Barber, Emmanuel J. Candès, Richard J. Samworth

    Abstract: We consider the variable selection problem, which seeks to identify important variables influencing a response $Y$ out of many candidate features $X_1, \ldots, X_p$. We wish to do so while offering finite-sample guarantees about the fraction of false positives - selected variables $X_j$ that in fact have no effect on $Y$ after the other features are known. When the number of features $p$ is large…

    Submitted 11 February, 2019; v1 submitted 11 January, 2018; originally announced January 2018.

  33. arXiv:1706.04677  [pdf, other]

    stat.ME math.ST stat.AP

    Gene Hunting with Knockoffs for Hidden Markov Models

    Authors: Matteo Sesia, Chiara Sabatti, Emmanuel J. Candès

    Abstract: Modern scientific studies often require the identification of a subset of relevant explanatory variables, in the attempt to understand an interesting phenomenon. Several statistical methods have been developed to automate this task, but only recently has the framework of model-free knockoffs proposed a general solution that can perform variable selection under rigorous type-I error control, withou…

    Submitted 14 June, 2017; originally announced June 2017.

    Comments: 35 pages, 13 figures, 9 tables

    Journal ref: Biometrika, Volume 106, Issue 1, 1 March 2019, Pages 1-18

  34. arXiv:1706.01191  [pdf, other]

    math.ST cs.IT math.PR stat.ML

    The Likelihood Ratio Test in High-Dimensional Logistic Regression Is Asymptotically a Rescaled Chi-Square

    Authors: Pragya Sur, Yuxin Chen, Emmanuel J. Candès

    Abstract: Logistic regression is used thousands of times a day to fit data, predict future outcomes, and assess the statistical significance of explanatory variables. When used for the purpose of statistical inference, logistic models produce p-values for the regression coefficients by using an approximation to the distribution of the likelihood-ratio test. Indeed, Wilks' theorem asserts that whenever we ha…

    Submitted 5 June, 2017; originally announced June 2017.

    Comments: 58 pages, 7 figures

  35. arXiv:1602.03574  [pdf, other]

    stat.ME math.ST

    A knockoff filter for high-dimensional selective inference

    Authors: Rina Foygel Barber, Emmanuel J. Candes

    Abstract: This paper develops a framework for testing for associations in a possibly high-dimensional linear model where the number of features/variables may far exceed the number of observational units. In this framework, the observations are split into two groups, where the first group is used to screen for a set of potentially relevant variables, whereas the second is used for inference over this reduced…

    Submitted 3 May, 2018; v1 submitted 10 February, 2016; originally announced February 2016.

  36. arXiv:1505.05114  [pdf, other]

    cs.IT cs.LG math.NA math.ST stat.ML

    Solving Random Quadratic Systems of Equations Is Nearly as Easy as Solving Linear Systems

    Authors: Yuxin Chen, Emmanuel J. Candes

    Abstract: We consider the fundamental problem of solving quadratic systems of equations in $n$ variables, where $y_i = |\langle \boldsymbol{a}_i, \boldsymbol{x} \rangle|^2$, $i = 1, \ldots, m$ and $\boldsymbol{x} \in \mathbb{R}^n$ is unknown. We propose a novel method, which starting with an initial guess computed by means of a spectral method, proceeds by minimizing a nonconvex functional as in the Wirting…

    Submitted 22 March, 2016; v1 submitted 19 May, 2015; originally announced May 2015.

    Comments: accepted to Communications on Pure and Applied Mathematics (CPAM)

  37. arXiv:1503.01243  [pdf, ps, other]

    stat.ML math.CA math.OC

    A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights

    Authors: Weijie Su, Stephen Boyd, Emmanuel J. Candes

    Abstract: We derive a second-order ordinary differential equation (ODE) which is the limit of Nesterov's accelerated gradient method. This ODE exhibits approximate equivalence to Nesterov's scheme and thus can serve as a tool for analysis. We show that the continuous time ODE allows for a better understanding of Nesterov's scheme. As a byproduct, we obtain a family of schemes with similar convergence rates.…

    Submitted 27 October, 2015; v1 submitted 4 March, 2015; originally announced March 2015.

    Comments: To appear in Journal of Machine Learning Research. Added more simulation studies. Preliminary version appeared in NIPS 2014

  38. SLOPE - Adaptive variable selection via convex optimization

    Authors: Małgorzata Bogdan, Ewout van den Berg, Chiara Sabatti, Weijie Su, Emmanuel J. Candès

    Abstract: We introduce a new estimator for the vector of coefficients $β$ in the linear model $y=Xβ+z$, where $X$ has dimensions $n\times p$ with $p$ possibly larger than $n$. SLOPE, short for Sorted L-One Penalized Estimation, is the solution to \[\min_{b\in\mathbb{R}^p}\frac{1}{2}\Vert y-Xb\Vert _{\ell_2}^2+λ_1\vert b\vert _{(1)}+λ_2\vert b\vert_{(2)}+\cdots+λ_p\vert b\vert_{(p)},\] where…

    Submitted 4 November, 2015; v1 submitted 14 July, 2014; originally announced July 2014.

    Comments: Published at http://dx.doi.org/10.1214/15-AOAS842 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS842

    Journal ref: Annals of Applied Statistics 2015, Vol. 9, No. 3, 1103-1140

  39. arXiv:1404.5609  [pdf, ps, other]

    stat.ME math.ST

    Controlling the false discovery rate via knockoffs

    Authors: Rina Foygel Barber, Emmanuel J. Candès

    Abstract: In many fields of science, we observe a response variable together with a large number of potential explanatory variables, and would like to be able to discover which variables are truly associated with the response. At the same time, we need to know that the false discovery rate (FDR) - the expected fraction of false discoveries among all discoveries - is not too high, in order to assure the scie…

    Submitted 14 October, 2015; v1 submitted 22 April, 2014; originally announced April 2014.

    Comments: Published at http://dx.doi.org/10.1214/15-AOS1337 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOS-AOS1337

    Journal ref: Annals of Statistics 2015, Vol. 43, No. 5, 2055-2085

  40. arXiv:1301.2603  [pdf, ps, other]

    cs.LG cs.IT math.OC math.ST stat.ML

    Robust subspace clustering

    Authors: Mahdi Soltanolkotabi, Ehsan Elhamifar, Emmanuel J. Candès

    Abstract: Subspace clustering refers to the task of finding a multi-subspace representation that best fits a collection of points taken from a high-dimensional space. This paper introduces an algorithm inspired by sparse subspace clustering (SSC) [In IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2009) 2790-2797] to cluster noisy data, and develops some novel theory demonstrating its corr…

    Submitted 23 May, 2014; v1 submitted 11 January, 2013; originally announced January 2013.

    Comments: Published at http://dx.doi.org/10.1214/13-AOS1199 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOS-AOS1199

    Journal ref: Annals of Statistics 2014, Vol. 42, No. 2, 669-699

  41. arXiv:1211.0817  [pdf, ps, other]

    math.ST cs.LG stat.ML

    Discussion: Latent variable graphical model selection via convex optimization

    Authors: Emmanuel J. Candès, Mahdi Soltanolkotabi

    Abstract: Discussion of "Latent variable graphical model selection via convex optimization" by Venkat Chandrasekaran, Pablo A. Parrilo and Alan S. Willsky [arXiv:1008.1290].

    Submitted 5 November, 2012; originally announced November 2012.

    Comments: Published at http://dx.doi.org/10.1214/12-AOS1001 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOS-AOS1001

    Journal ref: Annals of Statistics 2012, Vol. 40, No. 4, 1997-2004

  42. Unbiased Risk Estimates for Singular Value Thresholding and Spectral Estimators

    Authors: Emmanuel J. Candes, Carlos A. Sing-Long, Joshua D. Trzasko

    Abstract: In an increasing number of applications, it is of interest to recover an approximately low-rank data matrix from noisy observations. This paper develops an unbiased risk estimate---holding in a Gaussian model---for any spectral estimator obeying some mild regularity assumptions. In particular, we give an unbiased risk estimate formula for singular value thresholding (SVT), a popular estimation str…

    Submitted 15 October, 2012; originally announced October 2012.

    Comments: 29 pages, 8 figures

  43. arXiv:1112.4258  [pdf, ps, other]

    cs.IT cs.LG math.ST stat.ML

    A geometric analysis of subspace clustering with outliers

    Authors: Mahdi Soltanolkotabi, Emmanuel J. Candès

    Abstract: This paper considers the problem of clustering a collection of unlabeled data points assumed to lie near a union of lower-dimensional planes. As is common in computer vision or unsupervised learning applications, we do not know in advance how many subspaces there are nor do we have any information about their dimensions. We develop a novel geometric analysis of an algorithm named sparse subspace c…

    Submitted 30 January, 2013; v1 submitted 19 December, 2011; originally announced December 2011.

    Comments: Published at http://dx.doi.org/10.1214/12-AOS1034 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOS-AOS1034

    Journal ref: Annals of Statistics 2012, Vol. 40, No. 4, 2195-2238

  44. arXiv:1007.1434  [pdf, ps, other]

    math.ST stat.ME

    Global testing under sparse alternatives: ANOVA, multiple comparisons and the higher criticism

    Authors: Ery Arias-Castro, Emmanuel J. Candès, Yaniv Plan

    Abstract: Testing for the significance of a subset of regression coefficients in a linear model, a staple of statistical analysis, goes back at least to the work of Fisher who introduced the analysis of variance (ANOVA). We study this problem under the assumption that the coefficient vector is sparse, a common situation in modern high-dimensional settings. Suppose we have $p$ covariates and that under the a…

    Submitted 23 February, 2012; v1 submitted 8 July, 2010; originally announced July 2010.

    Comments: Published at http://dx.doi.org/10.1214/11-AOS910 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOS-AOS910

    Journal ref: Annals of Statistics 2011, Vol. 39, No. 5, 2533-2556

  45. arXiv:0711.1612  [pdf, ps, other]

    stat.ME math.ST

    Enhancing Sparsity by Reweighted L1 Minimization

    Authors: Emmanuel J. Candes, Michael B. Wakin, Stephen P. Boyd

    Abstract: It is now well understood that (1) it is possible to reconstruct sparse signals exactly from what appear to be highly incomplete sets of linear measurements and (2) that this can be done by constrained L1 minimization. In this paper, we study a novel method for sparse signal recovery that in many situations outperforms L1 minimization in the sense that substantially fewer measurements are needed…

    Submitted 10 November, 2007; originally announced November 2007.

    MSC Class: 49N30; 49N45; 94A12