Showing 1–21 of 21 results for author: Sabatti, C

  1. arXiv:2402.12724  [pdf, other]

    stat.ME q-bio.GN stat.AP

    Controlled Variable Selection from Summary Statistics Only? A Solution via GhostKnockoffs and Penalized Regression

    Authors: Zhaomeng Chen, Zihuai He, Benjamin B. Chu, Jiaqi Gu, Tim Morrison, Chiara Sabatti, Emmanuel Candès

    Abstract: Identifying which variables do influence a response while controlling false positives pervades statistics and data science. In this paper, we consider a scenario in which we only have access to summary statistics, such as the values of marginal empirical correlations between each dependent variable of potential interest and the response. This situation may arise due to privacy concerns, e.g., to a…

    Submitted 20 February, 2024; originally announced February 2024.

  2. arXiv:2310.15069  [pdf, other]

    stat.ME q-bio.GN stat.AP

    Second-order group knockoffs with applications to GWAS

    Authors: Benjamin B Chu, Jiaqi Gu, Zhaomeng Chen, Tim Morrison, Emmanuel Candes, Zihuai He, Chiara Sabatti

    Abstract: Conditional testing via the knockoff framework allows one to identify -- among a large number of possible explanatory variables -- those that carry unique information about an outcome of interest, and also provides a false discovery rate guarantee on the selection. This approach is particularly well suited to the analysis of genome wide association studies (GWAS), which have the goal of identifying…

    Submitted 3 March, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: 46 pages, 10 figures, 2 tables, 3 algorithms

  3. arXiv:2306.09976  [pdf, other]

    stat.ME

    Catch me if you can: Signal localization with knockoff e-values

    Authors: Paula Gablenz, Chiara Sabatti

    Abstract: We consider problems where many, somewhat redundant, hypotheses are tested and we are interested in reporting the most precise rejections, with false discovery rate (FDR) control. This is the case, for example, when researchers are interested both in individual hypotheses as well as group hypotheses corresponding to intersections of sets of the original hypotheses, at several resolution levels. A…

    Submitted 19 April, 2024; v1 submitted 16 June, 2023; originally announced June 2023.

    Comments: 48 pages, 12 figures; text edits (incl. abstract, appendix, additional remarks), added references

  4. arXiv:2211.08637  [pdf, other]

    stat.OT

    Near-peer mentoring in data science: Two experiences at Stanford University

    Authors: Chiara Sabatti, Qian Zhao

    Abstract: Universities have been expanding the data science programs for undergraduate students, with the simultaneous goal of reaching and retaining students from underrepresented groups in the data science workforce. The set of new programs also offer opportunities to involve graduate students, fostering their growth as future leaders in data science education. We describe two programs that use the near p…

    Submitted 8 June, 2024; v1 submitted 15 November, 2022; originally announced November 2022.

  5. arXiv:2108.08813  [pdf, other]

    stat.AP

    Transfer learning in genome-wide association studies with knockoffs

    Authors: Shuangning Li, Zhimei Ren, Chiara Sabatti, Matteo Sesia

    Abstract: This paper presents and compares alternative transfer learning methods that can increase the power of conditional testing via knockoffs by leveraging prior information in external data sets collected from different populations or measuring related outcomes. The relevance of this methodology is explored in particular within the context of genome-wide association studies, where it can be helpful to…

    Submitted 19 August, 2021; originally announced August 2021.

  6. arXiv:2106.04118  [pdf, other]

    stat.ME math.ST stat.AP

    Searching for consistent associations with a multi-environment knockoff filter

    Authors: Shuangning Li, Matteo Sesia, Yaniv Romano, Emmanuel Candès, Chiara Sabatti

    Abstract: This paper develops a method based on model-X knockoffs to find conditional associations that are consistent across diverse environments, controlling the false discovery rate. The motivation for this problem is that large data sets may contain numerous associations that are statistically significant and yet misleading, as they are induced by confounders or sampling imperfections. However, associat…

    Submitted 8 June, 2021; originally announced June 2021.

    Comments: 41 pages, 21 figures, 8 tables

  7. Causal Inference in Genetic Trio Studies

    Authors: Stephen Bates, Matteo Sesia, Chiara Sabatti, Emmanuel Candes

    Abstract: We introduce a method to rigorously draw causal inferences---inferences immune to all possible confounding---from genetic data that include parents and offspring. Causal conclusions are possible with these data because the natural randomness in meiosis can be viewed as a high-dimensional randomized experiment. We make this observation actionable by developing a novel conditional independence test…

    Submitted 22 February, 2020; originally announced February 2020.

    Journal ref: Proc. Natl. Acad. Sci. U.S.A. 117 (2020) 24117-24126

  8. arXiv:1908.05428  [pdf, other]

    stat.ME cs.CY stat.AP stat.ML

    With Malice Towards None: Assessing Uncertainty via Equalized Coverage

    Authors: Yaniv Romano, Rina Foygel Barber, Chiara Sabatti, Emmanuel J. Candès

    Abstract: An important factor to guarantee a fair use of data-driven recommendation systems is that we should be able to communicate their uncertainty to decision makers. This can be accomplished by constructing prediction intervals, which provide an intuitive measure of the limits of predictive performance. To support equitable treatment, we force the construction of such intervals to be unbiased in the se…

    Submitted 15 August, 2019; originally announced August 2019.

    Comments: 14 pages, 1 figure, 1 table

  9. arXiv:1903.05701  [pdf, other]

    stat.ME math.ST stat.AP

    Rejoinder: "Gene Hunting with Hidden Markov Model Knockoffs"

    Authors: Matteo Sesia, Chiara Sabatti, Emmanuel J. Candès

    Abstract: In this paper we deepen and enlarge the reflection on the possible advantages of a knockoff approach to genome wide association studies (Sesia et al., 2018), starting from the discussions in Bottolo & Richardson (2019); Jewell & Witten (2019); Rosenblatt et al. (2019) and Marchini (2019). The discussants bring up a number of important points, either related to the knockoffs methodology in general,…

    Submitted 13 March, 2019; originally announced March 2019.

    Comments: 12 pages, 4 figures

    Journal ref: Biometrika, Volume 106, Issue 1, 1 March 2019, Pages 35-45

  10. arXiv:1809.01792  [pdf, other]

    stat.ME

    Filtering the rejection set while preserving false discovery rate control

    Authors: Eugene Katsevich, Chiara Sabatti, Marina Bogomolov

    Abstract: Scientific hypotheses in a variety of applications have domain-specific structures, such as the tree structure of the International Classification of Diseases (ICD), the directed acyclic graph structure of the Gene Ontology (GO), or the spatial structure in genome-wide association studies. In the context of multiple testing, the resulting relationships among hypotheses can create redundancies amon…

    Submitted 10 April, 2020; v1 submitted 5 September, 2018; originally announced September 2018.

  11. arXiv:1801.08686  [pdf, other]

    stat.AP

    Selection-adjusted inference: an application to confidence intervals for cis-eQTL effect sizes

    Authors: Snigdha Panigrahi, Junjie Zhu, Chiara Sabatti

    Abstract: The goal of eQTL studies is to identify the genetic variants that influence the expression levels of the genes in an organism. High throughput technology has made such studies possible: in a given tissue sample, it enables us to quantify the expression levels of approximately 20,000 genes and to record the alleles present at millions of genetic polymorphisms. While obtaining this data is relativel…

    Submitted 6 June, 2018; v1 submitted 26 January, 2018; originally announced January 2018.

  12. arXiv:1706.09375  [pdf, other]

    stat.ME

    Multilayer Knockoff Filter: Controlled variable selection at multiple resolutions

    Authors: Eugene Katsevich, Chiara Sabatti

    Abstract: We tackle the problem of selecting from among a large number of variables those that are 'important' for an outcome. We consider situations where groups of variables are also of interest in their own right. For example, each variable might be a genetic polymorphism and we might want to study how a trait depends on variability in genes, segments of DNA that typically contain multiple such polymorph…

    Submitted 9 August, 2018; v1 submitted 28 June, 2017; originally announced June 2017.

  13. arXiv:1706.04677  [pdf, other]

    stat.ME math.ST stat.AP

    Gene Hunting with Knockoffs for Hidden Markov Models

    Authors: Matteo Sesia, Chiara Sabatti, Emmanuel J. Candès

    Abstract: Modern scientific studies often require the identification of a subset of relevant explanatory variables, in the attempt to understand an interesting phenomenon. Several statistical methods have been developed to automate this task, but only recently has the framework of model-free knockoffs proposed a general solution that can perform variable selection under rigorous type-I error control, withou…

    Submitted 14 June, 2017; originally announced June 2017.

    Comments: 35 pages, 13 figures, 9 tables

    Journal ref: Biometrika, Volume 106, Issue 1, 1 March 2019, Pages 1-18

  14. arXiv:1705.07529  [pdf, other]

    stat.ME

    Testing hypotheses on a tree: new error rates and controlling strategies

    Authors: Marina Bogomolov, Christine B. Peterson, Yoav Benjamini, Chiara Sabatti

    Abstract: We introduce a multiple testing procedure (TreeBH) which addresses the challenge of controlling error rates at multiple levels of resolution. Conceptually, we frame this problem as the selection of hypotheses which are organized hierarchically in a tree structure. We describe a fast algorithm for the proposed sequential procedure, and prove that it controls relevant error rates given certain assum…

    Submitted 23 October, 2018; v1 submitted 21 May, 2017; originally announced May 2017.

  15. arXiv:1610.03330  [pdf, other]

    stat.ME

    Detecting Multiple Replicating Signals using Adaptive Filtering Procedures

    Authors: Jingshu Wang, Lin Gui, Weijie J. Su, Chiara Sabatti, Art B. Owen

    Abstract: Replicability is a fundamental quality of scientific discoveries: we are interested in those signals that are detectable in different laboratories, study populations, across time etc. Unlike meta-analysis which accounts for experimental variability but does not guarantee replicability, testing a partial conjunction (PC) null aims specifically to identify the signals that are discovered in multiple…

    Submitted 18 November, 2021; v1 submitted 11 October, 2016; originally announced October 2016.

  16. Genetic variant selection: learning across traits and sites

    Authors: Laurel Stell, Chiara Sabatti

    Abstract: We consider resequencing studies of associated loci and the problem of prioritizing sequence variants for functional follow-up. Working within the multivariate linear regression framework helps us to account for correlation across variants, and adopting a Bayesian approach naturally leads to posterior probabilities that incorporate all information about the variants' function. We describe two nove…

    Submitted 4 April, 2016; v1 submitted 3 April, 2015; originally announced April 2015.

    Comments: Published at http://www.genetics.org/content/202/2/439 in GENETICS (http://www.genetics.org)

    Journal ref: Genetics 2016, vol. 202, no. 2, 439-455

  17. arXiv:1504.00701  [pdf, other]

    stat.AP

    Many Phenotypes without Many False Discoveries: Error Controlling Strategies for Multi-Traits Association Studies

    Authors: Christine Peterson, Marina Bogomolov, Yoav Benjamini, Chiara Sabatti

    Abstract: The genetic basis of multiple phenotypes such as gene expression, metabolite levels, or imaging features is often investigated by testing a large collection of hypotheses, probing the existence of association between each of the traits and hundreds of thousands of genotyped variants. Appropriate multiplicity adjustment is crucial to guarantee replicability of findings, and False Discovery Rate (FD…

    Submitted 2 April, 2015; originally announced April 2015.

  18. SLOPE - Adaptive variable selection via convex optimization

    Authors: Małgorzata Bogdan, Ewout van den Berg, Chiara Sabatti, Weijie Su, Emmanuel J. Candès

    Abstract: We introduce a new estimator for the vector of coefficients $β$ in the linear model $y=Xβ+z$, where $X$ has dimensions $n\times p$ with $p$ possibly larger than $n$. SLOPE, short for Sorted L-One Penalized Estimation, is the solution to \[\min_{b\in\mathbb{R}^p}\frac{1}{2}\Vert y-Xb\Vert _{\ell_2}^2+λ_1\vert b\vert _{(1)}+λ_2\vert b\vert_{(2)}+\cdots+λ_p\vert b\vert_{(p)},\] where…

    Submitted 4 November, 2015; v1 submitted 14 July, 2014; originally announced July 2014.

    Comments: Published at http://dx.doi.org/10.1214/15-AOAS842 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS842

    Journal ref: Annals of Applied Statistics 2015, Vol. 9, No. 3, 1103-1140

  19. arXiv:1202.5064  [pdf, other]

    stat.ME stat.AP

    Reconstructing DNA copy number by joint segmentation of multiple sequences

    Authors: Zhongyang Zhang, Kenneth Lange, Chiara Sabatti

    Abstract: The variation in DNA copy number carries information on the modalities of genome evolution and misregulation of DNA replication in cancer cells; its study can be helpful to localize tumor suppressor genes, distinguish different populations of cancerous cells, as well as identify genomic variations responsible for disease phenotypes. A number of different high throughput technologies can be used to ide…

    Submitted 14 March, 2012; v1 submitted 22 February, 2012; originally announced February 2012.

    Comments: 54 pages, 5 figures

  20. Sparse regulatory networks

    Authors: Gareth M. James, Chiara Sabatti, Nengfeng Zhou, Ji Zhu

    Abstract: In many organisms the expression levels of each gene are controlled by the activation levels of known "Transcription Factors" (TF). A problem of considerable interest is that of estimating the "Transcription Regulation Networks" (TRN) relating the TFs and genes. While the expression levels of genes can be observed, the activation levels of the corresponding TFs are usually unknown, greatly increas…

    Submitted 8 November, 2010; originally announced November 2010.

    Comments: Published at http://dx.doi.org/10.1214/10-AOAS350 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS350

    Journal ref: Annals of Applied Statistics 2010, Vol. 4, No. 2, 663-686

  21. arXiv:0906.2234  [pdf, ps, other]

    stat.ME q-bio.GN stat.AP

    Reconstructing DNA copy number by penalized estimation and imputation

    Authors: Zhongyang Zhang, Kenneth Lange, Roel Ophoff, Chiara Sabatti

    Abstract: Recent advances in genomics have underscored the surprising ubiquity of DNA copy number variation (CNV). Fortunately, modern genotyping platforms also detect CNVs with fairly high reliability. Hidden Markov models and algorithms have played a dominant role in the interpretation of CNV data. Here we explore CNV reconstruction via estimation with a fused-lasso penalty as suggested by Tibshirani and…

    Submitted 10 January, 2011; v1 submitted 11 June, 2009; originally announced June 2009.

    Comments: Published at http://dx.doi.org/10.1214/10-AOAS357 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS357

    Journal ref: Annals of Applied Statistics 2010, Vol. 4, No. 4, 1749-1773