Showing 1–27 of 27 results for author: Sesia, M

  1. arXiv:2405.15106

    stat.ML cs.LG

    Conformal Classification with Equalized Coverage for Adaptively Selected Groups

    Authors: Yanfei Zhou, Matteo Sesia

    Abstract: This paper introduces a conformal inference method to evaluate uncertainty in classification by generating prediction sets with valid coverage conditional on adaptively chosen features. These features are carefully selected to reflect potential model limitations or biases. This can be useful to find a practical compromise between efficiency -- by providing informative predictions -- and algorithmi…

    Submitted 23 May, 2024; originally announced May 2024.

  2. arXiv:2404.17561

    stat.ME stat.ML

    Structured Conformal Inference for Matrix Completion with Applications to Group Recommender Systems

    Authors: Ziyi Liang, Tianmin Xie, Xin Tong, Matteo Sesia

    Abstract: We develop a conformal inference method to construct joint confidence regions for structured groups of missing entries within a sparsely observed matrix. This method is useful to provide reliable uncertainty estimation for group-level collaborative filtering; for example, it can be applied to help suggest a movie for a group of friends to watch together. Unlike standard conformal techniques, which…

    Submitted 26 April, 2024; originally announced April 2024.

  3. arXiv:2404.03163

    cs.CL cs.AI cs.LG stat.ML

    Uncertainty in Language Models: Assessment through Rank-Calibration

    Authors: Xinmeng Huang, Shuo Li, Mengxin Yu, Matteo Sesia, Hamed Hassani, Insup Lee, Osbert Bastani, Edgar Dobriban

    Abstract: Language Models (LMs) have shown promising performance in natural language generation. However, as LMs often generate incorrect or hallucinated responses, it is crucial to correctly quantify their uncertainty in responding to given inputs. In addition to verbalized confidence elicited via prompting, many uncertainty measures (e.g., semantic entropy and affinity-graph-based measures) have been pr…

    Submitted 3 April, 2024; originally announced April 2024.

  4. arXiv:2402.09623

    stat.ML cs.LG

    Conformalized Adaptive Forecasting of Heterogeneous Trajectories

    Authors: Yanfei Zhou, Lars Lindemann, Matteo Sesia

    Abstract: This paper presents a new conformal method for generating simultaneous forecasting bands guaranteed to cover the entire path of a new random trajectory with sufficiently high probability. Prompted by the need for dependable uncertainty estimates in motion planning applications where the behavior of diverse objects may be more or less unpredictable, we blend different techniques from online conform…

    Submitted 15 May, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

  5. arXiv:2309.15408

    stat.ME cs.DS cs.IR math.ST

    A smoothed-Bayesian approach to frequency recovery from sketched data

    Authors: Mario Beraha, Stefano Favaro, Matteo Sesia

    Abstract: We provide a novel statistical perspective on a classical problem at the intersection of computer science and information theory: recovering the empirical frequency of a symbol in a large discrete dataset using only a compressed representation, or sketch, obtained via random hashing. Departing from traditional algorithmic approaches, recent works have proposed Bayesian nonparametric (BNP) methods…

    Submitted 12 June, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

  6. arXiv:2309.05092

    stat.ME cs.LG math.ST

    Adaptive conformal classification with noisy labels

    Authors: Matteo Sesia, Y. X. Rachel Wang, Xin Tong

    Abstract: This paper develops novel conformal prediction methods for classification tasks that can automatically adapt to random label contamination in the calibration sample, leading to more informative prediction sets with stronger coverage guarantees compared to state-of-the-art approaches. This is made possible by a precise characterization of the effective coverage inflation (or deflation) suffered by…

    Submitted 21 February, 2024; v1 submitted 10 September, 2023; originally announced September 2023.

    Comments: 28 pages (127 pages including references and appendices)

  7. arXiv:2303.15029

    math.ST stat.ME

    Random measure priors in Bayesian recovery from sketches

    Authors: Mario Beraha, Stefano Favaro, Matteo Sesia

    Abstract: This paper introduces a Bayesian nonparametric approach to frequency recovery from lossy-compressed discrete data, leveraging all information contained in a sketch obtained through random hashing. By modeling the data points as random samples from an unknown discrete distribution endowed with a Poisson-Kingman prior, we derive the posterior distribution of a symbol's empirical frequency given the…

    Submitted 4 June, 2024; v1 submitted 27 March, 2023; originally announced March 2023.

  8. arXiv:2302.07294

    cs.LG stat.ME

    Derandomized Novelty Detection with FDR Control via Conformal E-values

    Authors: Meshi Bashari, Amir Epstein, Yaniv Romano, Matteo Sesia

    Abstract: Conformal inference provides a general distribution-free method to rigorously calibrate the output of any machine learning algorithm for novelty detection. While this approach has many strengths, it has the limitation of being randomized, in the sense that it may lead to different results when analyzing the same data twice, and this can hinder the interpretation of any findings. We propose to make…

    Submitted 23 October, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

    Comments: 35 pages, 24 figures

  9. arXiv:2301.11556

    stat.ML cs.LG math.ST

    Conformal inference is (almost) free for neural networks trained with early stopping

    Authors: Ziyi Liang, Yanfei Zhou, Matteo Sesia

    Abstract: Early stopping based on hold-out data is a popular regularization technique designed to mitigate overfitting and increase the predictive accuracy of neural networks. Models trained with early stopping often provide relatively accurate predictions, but they generally still lack precise statistical guarantees unless they are further calibrated using independent hold-out data. This paper addresses th…

    Submitted 26 June, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

    Comments: Updates: extension to quantile regression, some further details about methodology, more numerical experiments

  10. arXiv:2211.04612

    stat.ME math.ST stat.ML

    Conformal Frequency Estimation using Discrete Sketched Data with Coverage for Distinct Queries

    Authors: Matteo Sesia, Stefano Favaro, Edgar Dobriban

    Abstract: This paper develops conformal inference methods to construct a confidence interval for the frequency of a queried object in a very large discrete data set, based on a sketch with a lower memory footprint. This approach requires no knowledge of the data distribution and can be combined with any sketching algorithm, including but not limited to the renowned count-min sketch, the count-sketch, and va…

    Submitted 15 August, 2023; v1 submitted 8 November, 2022; originally announced November 2022.

    Comments: 79 pages, 47 figures, 2 tables. Extended version of arXiv:2204.04270

  11. arXiv:2209.02135

    stat.ME stat.ML

    Bayesian nonparametric estimation of coverage probabilities and distinct counts from sketched data

    Authors: Stefano Favaro, Matteo Sesia

    Abstract: The estimation of coverage probabilities, and in particular of the missing mass, is a classical statistical problem with applications in numerous scientific fields. In this paper, we study this problem in relation to randomized data compression, or sketching. This is a novel but practically relevant perspective, and it refers to situations in which coverage probabilities must be estimated based on…

    Submitted 5 September, 2022; originally announced September 2022.

    Comments: 35 pages

  12. arXiv:2208.11111

    stat.ME cs.LG math.ST stat.ML

    Integrative conformal p-values for powerful out-of-distribution testing with labeled outliers

    Authors: Ziyi Liang, Matteo Sesia, Wenguang Sun

    Abstract: This paper develops novel conformal methods to test whether a new observation was sampled from the same distribution as a reference set. Blending inductive and transductive conformal inference in an innovative way, the described methods can re-weight standard conformal p-values based on dependent side information from known out-of-distribution data in a principled way, and can automatically take a…

    Submitted 23 August, 2022; originally announced August 2022.

  13. arXiv:2206.00885

    stat.ML cs.LG stat.ME

    Coordinated Double Machine Learning

    Authors: Nitai Fingerhut, Matteo Sesia, Yaniv Romano

    Abstract: Double machine learning is a statistical method for leveraging complex black-box models to construct approximately unbiased treatment effect estimates given observational data with high-dimensional covariates, under the assumption of a partially linear model. The idea is to first fit on a subset of the samples two non-linear predictive models, one for the continuous outcome of interest and one for…

    Submitted 2 June, 2022; originally announced June 2022.

    Comments: 9 pages, 6 figures

  14. arXiv:2205.08653

    stat.ME math.ST

    Searching for subgroup-specific associations while controlling the false discovery rate

    Authors: Matteo Sesia, Tianshu Sun

    Abstract: This paper introduces an innovative method for conducting conditional independence testing in high-dimensional data, facilitating the automated discovery of significant associations within distinct subgroups of a population, all while controlling the false discovery rate. This is achieved by expanding upon the model-X knockoff filter to provide more informative inferences. Our enhanced inferences…

    Submitted 15 September, 2023; v1 submitted 17 May, 2022; originally announced May 2022.

    Comments: 13 pages (25 pages including references and appendices)

  15. arXiv:2205.05878

    stat.ML cs.LG

    Training Uncertainty-Aware Classifiers with Conformalized Deep Learning

    Authors: Bat-Sheva Einbinder, Yaniv Romano, Matteo Sesia, Yanfei Zhou

    Abstract: Deep neural networks are powerful tools to detect hidden patterns in data and leverage them to make predictions, but they are not designed to understand uncertainty and estimate reliable probabilities. In particular, they tend to be overconfident. We begin to address this problem in the context of multi-class classification by developing a novel training algorithm producing models with more depend…

    Submitted 8 November, 2022; v1 submitted 12 May, 2022; originally announced May 2022.

    Comments: 46 pages, 48 figures, 5 tables

  16. arXiv:2204.04270

    stat.ME stat.ML

    Conformal Frequency Estimation with Sketched Data

    Authors: Matteo Sesia, Stefano Favaro

    Abstract: A flexible conformal inference method is developed to construct confidence intervals for the frequencies of queried objects in very large data sets, based on a much smaller sketch of those data. The approach is data-adaptive and requires no knowledge of the data distribution or of the details of the sketching algorithm; instead, it constructs provably valid frequentist confidence intervals under t…

    Submitted 8 November, 2022; v1 submitted 8 April, 2022; originally announced April 2022.

    Comments: 28 pages, 24 figures, 2 tables

  17. arXiv:2108.08813

    stat.AP

    Transfer learning in genome-wide association studies with knockoffs

    Authors: Shuangning Li, Zhimei Ren, Chiara Sabatti, Matteo Sesia

    Abstract: This paper presents and compares alternative transfer learning methods that can increase the power of conditional testing via knockoffs by leveraging prior information in external data sets collected from different populations or measuring related outcomes. The relevance of this methodology is explored in particular within the context of genome-wide association studies, where it can be helpful to…

    Submitted 19 August, 2021; originally announced August 2021.

  18. arXiv:2106.04118

    stat.ME math.ST stat.AP

    Searching for consistent associations with a multi-environment knockoff filter

    Authors: Shuangning Li, Matteo Sesia, Yaniv Romano, Emmanuel Candès, Chiara Sabatti

    Abstract: This paper develops a method based on model-X knockoffs to find conditional associations that are consistent across diverse environments, controlling the false discovery rate. The motivation for this problem is that large data sets may contain numerous associations that are statistically significant and yet misleading, as they are induced by confounders or sampling imperfections. However, associat…

    Submitted 8 June, 2021; originally announced June 2021.

    Comments: 41 pages, 21 figures, 8 tables

  19. arXiv:2105.08747

    stat.ME stat.ML

    Conformal Prediction using Conditional Histograms

    Authors: Matteo Sesia, Yaniv Romano

    Abstract: This paper develops a conformal method to compute prediction intervals for non-parametric regression that can automatically adapt to skewed data. Leveraging black-box machine learning algorithms to estimate the conditional distribution of the outcome using histograms, it translates their output into the shortest prediction intervals with approximate conditional coverage. The resulting prediction i…

    Submitted 23 October, 2021; v1 submitted 18 May, 2021; originally announced May 2021.

    Comments: 12 pages, 4 figures. Supplement: 15 pages, 3 figures, 1 table

  20. arXiv:2104.08279

    stat.ME math.ST stat.ML

    Testing for Outliers with Conformal p-values

    Authors: Stephen Bates, Emmanuel Candès, Lihua Lei, Yaniv Romano, Matteo Sesia

    Abstract: This paper studies the construction of p-values for nonparametric outlier detection, taking a multiple-testing perspective. The goal is to test whether new independent samples belong to the same distribution as a reference data set or are outliers. We propose a solution based on conformal inference, a broadly applicable framework which yields p-values that are marginally valid but mutually depende…

    Submitted 24 May, 2022; v1 submitted 16 April, 2021; originally announced April 2021.

    Comments: Revision May 24, 2022: added "asymptotic" and "Monte Carlo" conditional calibration methods; added power analyses; updated numerical experiments to include new methods

    Journal ref: Ann. Statist. 51(1): 149-178 (February 2023)

  21. arXiv:2006.04937

    eess.SP stat.AP stat.ML

    Interpretable Classification of Bacterial Raman Spectra with Knockoff Wavelets

    Authors: Charmaine Chia, Matteo Sesia, Chi-Sing Ho, Stefanie S. Jeffrey, Jennifer Dionne, Emmanuel J. Candès, Roger T. Howe

    Abstract: Deep neural networks and other sophisticated machine learning models are widely applied to biomedical signal data because they can detect complex patterns and compute accurate predictions. However, the difficulty of interpreting such models is a limitation, especially for applications involving high-stakes decisions, including the identification of bacterial infections. In this paper, we consider f…

    Submitted 1 May, 2021; v1 submitted 8 June, 2020; originally announced June 2020.

    Comments: 9 pages, 6 figures, 4 tables

  22. arXiv:2006.02544

    stat.ME stat.ML

    Classification with Valid and Adaptive Coverage

    Authors: Yaniv Romano, Matteo Sesia, Emmanuel J. Candès

    Abstract: Conformal inference, cross-validation+, and the jackknife+ are hold-out methods that can be combined with virtually any machine learning algorithm to construct prediction sets with guaranteed marginal coverage. In this paper, we develop specialized versions of these techniques for categorical and unordered response labels that, in addition to providing marginal coverage, are also fully adaptive to…

    Submitted 3 June, 2020; originally announced June 2020.

    Comments: 10 pages, 3 figures; 13 supplementary pages, 4 supplementary figures, 4 supplementary tables

    Journal ref: Advances in Neural Information Processing Systems 33 (NeurIPS 2020)

  23. Causal Inference in Genetic Trio Studies

    Authors: Stephen Bates, Matteo Sesia, Chiara Sabatti, Emmanuel Candes

    Abstract: We introduce a method to rigorously draw causal inferences (inferences immune to all possible confounding) from genetic data that include parents and offspring. Causal conclusions are possible with these data because the natural randomness in meiosis can be viewed as a high-dimensional randomized experiment. We make this observation actionable by developing a novel conditional independence test…

    Submitted 22 February, 2020; originally announced February 2020.

    Journal ref: Proc. Natl. Acad. Sci. U.S.A. 117 (2020) 24117-24126

  24. arXiv:1909.05433

    stat.ME math.ST stat.ML

    A comparison of some conformal quantile regression methods

    Authors: Matteo Sesia, Emmanuel J. Candès

    Abstract: We compare two recently proposed methods that combine ideas from conformal inference and quantile regression to produce locally adaptive and marginally valid prediction intervals under sample exchangeability (Romano et al., 2019; Kivaranovic et al., 2019). First, we prove that these two approaches are asymptotically efficient in large samples, under some additional assumptions. Then we compare the…

    Submitted 11 September, 2019; originally announced September 2019.

    Comments: 20 pages, 9 figures, 3 tables

    Journal ref: Stat. 2020; 9:e261

  25. arXiv:1903.05701

    stat.ME math.ST stat.AP

    Rejoinder: "Gene Hunting with Hidden Markov Model Knockoffs"

    Authors: Matteo Sesia, Chiara Sabatti, Emmanuel J. Candès

    Abstract: In this paper we deepen and enlarge the reflection on the possible advantages of a knockoff approach to genome wide association studies (Sesia et al., 2018), starting from the discussions in Bottolo & Richardson (2019); Jewell & Witten (2019); Rosenblatt et al. (2019) and Marchini (2019). The discussants bring up a number of important points, either related to the knockoffs methodology in general,…

    Submitted 13 March, 2019; originally announced March 2019.

    Comments: 12 pages, 4 figures

    Journal ref: Biometrika, Volume 106, Issue 1, 1 March 2019, Pages 35-45

  26. arXiv:1811.06687

    stat.ME math.ST stat.AP stat.ML

    Deep Knockoffs

    Authors: Yaniv Romano, Matteo Sesia, Emmanuel J. Candès

    Abstract: This paper introduces a machine for sampling approximate model-X knockoffs for arbitrary and unspecified data distributions using deep generative models. The main idea is to iteratively refine a knockoff sampling mechanism until a criterion measuring the validity of the produced knockoffs is optimized; this criterion is inspired by the popular maximum mean discrepancy in machine learning and can b…

    Submitted 16 November, 2018; originally announced November 2018.

    Comments: 37 pages, 23 figures, 1 table

    Journal ref: J. Am. Stat. Assoc., Volume 0, Issue 0, 17 Oct 2019, Pages 1-12

  27. arXiv:1706.04677

    stat.ME math.ST stat.AP

    Gene Hunting with Knockoffs for Hidden Markov Models

    Authors: Matteo Sesia, Chiara Sabatti, Emmanuel J. Candès

    Abstract: Modern scientific studies often require the identification of a subset of relevant explanatory variables, in the attempt to understand an interesting phenomenon. Several statistical methods have been developed to automate this task, but only recently has the framework of model-free knockoffs proposed a general solution that can perform variable selection under rigorous type-I error control, withou…

    Submitted 14 June, 2017; originally announced June 2017.

    Comments: 35 pages, 13 figures, 9 tables

    Journal ref: Biometrika, Volume 106, Issue 1, 1 March 2019, Pages 1-18