
Showing 1–31 of 31 results for author: Benjamini, Y

  1. arXiv:2404.00319  [pdf, other]

    stat.ME stat.AP

    Direction Preferring Confidence Intervals

    Authors: Tzviel Frostig, Yoav Benjamini, Ruth Heller

    Abstract: Confidence intervals (CIs) are instrumental in statistical analysis, providing a range estimate of the parameters. In modern statistics, selective inference is common, where only certain parameters are highlighted. However, this selective approach can bias the inference, leading some to advocate for the use of CIs over p-values. To increase the flexibility of confidence intervals, we introduce dir…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 11 figures, 45 pages

    MSC Class: 62P10

  2. arXiv:2311.18575  [pdf, other]

    cs.LG

    Class Distribution Shifts in Zero-Shot Learning: Learning Robust Representations

    Authors: Yuli Slavutsky, Yuval Benjamini

    Abstract: Zero-shot learning methods typically assume that the new, unseen classes that are encountered at deployment come from the same distribution as training classes. However, real-world scenarios often involve class distribution shifts (e.g., in age or gender for person identification), posing challenges for zero-shot classifiers that rely on learned representations from training classes. In this work…

    Submitted 27 May, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

  3. arXiv:2307.15361  [pdf, other]

    stat.ML cs.AI cs.LG

    Confident Feature Ranking

    Authors: Bitya Neuhof, Yuval Benjamini

    Abstract: Machine learning models are widely applied in various fields. Stakeholders often use post-hoc feature importance methods to better understand the input features' contribution to the models' predictions. The interpretation of the importance values provided by these methods is frequently based on the relative order of the features (their ranking) rather than the importance values themselves. Since t…

    Submitted 18 April, 2024; v1 submitted 28 July, 2023; originally announced July 2023.

  4. arXiv:2303.13330  [pdf, other]

    stat.ME stat.ML

    Logistic Regression Equivalence: A Framework for Comparing Logistic Regression Models Across Populations

    Authors: Guy Ashiri-Prossner, Yuval Benjamini

    Abstract: In this paper we discuss how to evaluate the differences between fitted logistic regression models across sub-populations. Our motivating example is in studying computerized diagnosis for learning disabilities, where sub-populations based on gender may or may not require separate models. In this context, significance tests for hypotheses of no difference between populations may provide perverse in…

    Submitted 23 March, 2023; originally announced March 2023.

  5. arXiv:2111.07444  [pdf, other]

    stat.AP stat.ME

    Detecting Differences Between Correlation-Matrix Populations due to Single-variable Perturbations, with Application to Resting State fMRI

    Authors: Itamar Faran, Michael Peer, Shahar Arzy, Yuval Benjamini

    Abstract: Correlation matrices provide a useful way to characterize variable dependencies in many real-world problems. Often, a perturbation in few variables can lead to small differences in multiple correlation coefficients related to these variables. In this paper we propose a low-dimensional representation of these differences as a product of single-variable perturbations that can efficiently characteriz…

    Submitted 14 November, 2021; originally announced November 2021.

  6. arXiv:2010.15011  [pdf, other]

    cs.LG cs.AI stat.ML

    Predicting Classification Accuracy When Adding New Unobserved Classes

    Authors: Yuli Slavutsky, Yuval Benjamini

    Abstract: Multiclass classifiers are often designed and evaluated only on a sample from the classes on which they will eventually be applied. Hence, their final accuracy remains unknown. In this work we study how a classifier's performance over the initial class sample can be used to extrapolate its expected accuracy on a larger, unobserved set of classes. For this, we define a measure of separation between…

    Submitted 9 March, 2021; v1 submitted 28 October, 2020; originally announced October 2020.

    Journal ref: International Conference on Learning Representations (ICLR), 2021

  7. arXiv:2006.11585  [pdf]

    stat.ME

    Ignored evident multiplicity harms replicability -- adjusting for it offers a remedy

    Authors: Yoav Zeevi, Sofi Astashenko, Yoav Benjamini

    Abstract: It is a central dogma in science that a result of a study should be replicable. Only 90 of the 190 replication attempts were successful. We attribute a substantial part of the problem to selective inference evident in the paper, which is the practice of selecting some of the results from the many. 100 papers in the Reproducibility Project in Psychology were analyzed. It was evident that the repor…

    Submitted 19 May, 2021; v1 submitted 20 June, 2020; originally announced June 2020.

    Comments: 28 pages, 2 figures, 1 table

  8. arXiv:1912.10472  [pdf, other]

    stat.ME

    Testing the equality of multivariate means when $p>n$ by combining the Hoteling and Simes tests

    Authors: Tzviel Frostig, Yoav Benjamini

    Abstract: We propose a method of testing the shift between mean vectors of two multivariate Gaussian random variables in a high-dimensional setting incorporating the possible dependency and allowing $p > n$. This method is a combination of two well-known tests: the Hotelling test and the Simes test. The tests are integrated by sampling several dimensions at each iteration, testing each using the Hotelling t…

    Submitted 22 December, 2019; originally announced December 2019.

  9. arXiv:1907.06856  [pdf, other]

    stat.ME

    Quantifying replicability and consistency in systematic reviews

    Authors: Iman Jaljuli, Yoav Benjamini, Liat Shenhav, Orestis Panagiotou, Ruth Heller

    Abstract: Systematic reviews of interventions are important tools for synthesizing evidence from multiple studies. They serve to increase power and improve precision, in the same way that larger studies can do, but also to establish the consistency of effects and replicability of results across studies which are not identical. In this work we suggest incorporating replicability analysis tools to quantify t…

    Submitted 18 April, 2021; v1 submitted 16 July, 2019; originally announced July 2019.

  10. arXiv:1906.00505  [pdf, other]

    stat.ME math.ST stat.ML

    Confidence Intervals for Selected Parameters

    Authors: Yoav Benjamini, Yotam Hechtlinger, Philip B. Stark

    Abstract: Practical or scientific considerations often lead to selecting a subset of parameters as "important." Inferences about those parameters often are based on the same data used to select them in the first place. That can make the reported uncertainties deceptively optimistic: confidence intervals that ignore selection generally have less than their nominal coverage probability. Controlling the prob…

    Submitted 2 June, 2019; originally announced June 2019.

    Comments: 36 pages, 11 figures

  11. arXiv:1712.09713  [pdf, other]

    stat.ML cs.CV cs.LG

    Extrapolating Expected Accuracies for Large Multi-Class Problems

    Authors: Charles Zheng, Rakesh Achanta, Yuval Benjamini

    Abstract: The difficulty of multi-class classification generally increases with the number of classes. Using data from a subset of the classes, can we predict how well a classifier will scale with an increased number of classes? Under the assumptions that the classes are sampled identically and independently from a population, and that the classifier is based on independently learned scoring functions, we s…

    Submitted 27 December, 2017; originally announced December 2017.

    Comments: Submitted to JMLR

  12. arXiv:1705.07529  [pdf, other]

    stat.ME

    Testing hypotheses on a tree: new error rates and controlling strategies

    Authors: Marina Bogomolov, Christine B. Peterson, Yoav Benjamini, Chiara Sabatti

    Abstract: We introduce a multiple testing procedure (TreeBH) which addresses the challenge of controlling error rates at multiple levels of resolution. Conceptually, we frame this problem as the selection of hypotheses which are organized hierarchically in a tree structure. We describe a fast algorithm for the proposed sequential procedure, and prove that it controls relevant error rates given certain assum…

    Submitted 23 October, 2018; v1 submitted 21 May, 2017; originally announced May 2017.

  13. Better-Than-Chance Classification for Signal Detection

    Authors: Jonathan D. Rosenblatt, Yuval Benjamini, Roee Gilron, Roy Mukamel, Jelle J. Goeman

    Abstract: The estimated accuracy of a classifier is a random quantity with variability. A common practice in supervised machine learning is thus to test if the estimated accuracy is significantly better than chance level. This method of signal detection is particularly popular in neuroimaging and genetics. We provide evidence that using a classifier's accuracy as a test statistic can be an underpowered str…

    Submitted 14 December, 2017; v1 submitted 31 August, 2016; originally announced August 2016.

  14. arXiv:1606.05229  [pdf, other]

    stat.ML cs.IT

    Estimating mutual information in high dimensions via classification error

    Authors: Charles Y. Zheng, Yuval Benjamini

    Abstract: Multivariate pattern analysis approaches in neuroimaging are fundamentally concerned with investigating the quantity and type of information processed by various regions of the human brain; typically, estimates of classification accuracy are used to quantify information. While an extensive and powerful library of methods can be applied to train and assess classifiers, it is not always clear how to…

    Submitted 10 October, 2016; v1 submitted 16 June, 2016; originally announced June 2016.

  15. arXiv:1606.05228  [pdf, other]

    stat.ML cs.CV cs.IT cs.LG

    How many faces can be recognized? Performance extrapolation for multi-class classification

    Authors: Charles Y. Zheng, Rakesh Achanta, Yuval Benjamini

    Abstract: The difficulty of multi-class classification generally increases with the number of classes. Using data from a subset of the classes, can we predict how well a classifier will scale with an increased number of classes? Under the assumption that the classes are sampled exchangeably, and under the assumption that the classifier is generative (e.g. QDA or Naive Bayes), we show that the expected accur…

    Submitted 16 June, 2016; originally announced June 2016.

    Comments: Submitted to NIPS 2016

  16. arXiv:1507.07270  [pdf]

    q-bio.NC

    Searching for behavioral homologies: Shared generative rules for expansion and narrowing down of the locomotor repertoire in Arthropods and Vertebrates

    Authors: A. Gomez-Marin, E. Oron, A. Gakamsky, D. Valente, Y. Benjamini, I. Golani

    Abstract: We use immobility as an origin and reference for the measurement of locomotor behavior; speed, the direction of walking and the direction of facing as the three degrees of freedom shaping fly locomotor behavior, and cocaine as the parameter inducing a progressive transition in and out of immobility. In this way we expose and quantify the generative rules that shape fruit fly locomotor behavior, wh…

    Submitted 26 July, 2015; originally announced July 2015.

  17. Coping with Space Neophobia in Drosophila melanogaster: The Asymmetric Dynamics of Crossing a Doorway to the Untrodden

    Authors: Shay Cohen, Yoav Benjamini, Ilan Golani

    Abstract: Insects exhibit remarkable cognitive skills in the field and several cognitive abilities have been demonstrated in Drosophila in the laboratory. By devising an ethologically relevant experimental setup that also allows comparison of behavior across remote taxonomic groups we sought to reduce the gap between the field and the laboratory, and reveal as yet undiscovered ethological phenomena within a…

    Submitted 28 June, 2015; originally announced June 2015.

  18. arXiv:1504.00701  [pdf, other]

    stat.AP

    Many Phenotypes without Many False Discoveries: Error Controlling Strategies for Multi-Traits Association Studies

    Authors: Christine Peterson, Marina Bogomolov, Yoav Benjamini, Chiara Sabatti

    Abstract: The genetic basis of multiple phenotypes such as gene expression, metabolite levels, or imaging features is often investigated by testing a large collection of hypotheses, probing the existence of association between each of the traits and hundreds of thousands of genotyped variants. Appropriate multiplicity adjustment is crucial to guarantee replicability of findings, and False Discovery Rate (FD…

    Submitted 2 April, 2015; originally announced April 2015.

  19. arXiv:1503.02278  [pdf, ps, other]

    stat.ME

    Testing for replicability in a follow-up study when the primary study hypotheses are two-sided

    Authors: Ruth Heller, Marina Bogomolov, Yoav Benjamini, Tamar Sofer

    Abstract: When testing for replication of results from a primary study with two-sided hypotheses in a follow-up study, we are usually interested in discovering the features with discoveries in the same direction in the two studies. The direction of testing in the follow-up study for each feature can therefore be decided by the primary study. We prove that in this case the methods suggested in Heller, Bogomo…

    Submitted 8 March, 2015; originally announced March 2015.

    Comments: arXiv admin note: text overlap with arXiv:1310.0606

  20. arXiv:1502.00088  [pdf, other]

    stat.AP

    Quantifying replicability in systematic reviews: the r-value

    Authors: Liat Shenhav, Ruth Heller, Yoav Benjamini

    Abstract: In order to assess the effect of a health care intervention, it is useful to look at an ensemble of relevant studies. The Cochrane Collaboration's admirable goal is to provide systematic reviews of all relevant clinical studies, in order to establish whether or not there is conclusive evidence about a specific intervention. This is done mainly by conducting a meta-analysis: a statistical synthes…

    Submitted 10 May, 2015; v1 submitted 31 January, 2015; originally announced February 2015.

  21. arXiv:1412.3242  [pdf, other]

    stat.ME

    Selective Correlations - the conditional estimators

    Authors: Yoav Benjamini, Amit Meir

    Abstract: The problem of Voodoo correlations is recognized in neuroimaging as the problem of estimating quantities of interest from the same data that was used to select them as interesting. In statistical terminology, the problem of inference following selection from the same data is that of selective inference. Motivated by the unwelcome side-effects of the recommended remedy- splitting the data. A method…

    Submitted 10 December, 2014; originally announced December 2014.

    Comments: 18 pages, 10 figures

  22. The shuffle estimator for explainable variance in fMRI experiments

    Authors: Yuval Benjamini, Bin Yu

    Abstract: In computational neuroscience, it is important to estimate well the proportion of signal variance in the total variance of neural activity measurements. This explainable variance measure helps neuroscientists assess the adequacy of predictive models that describe how images are encoded in the brain. Complicating the estimation problem are strong noise correlations, which may confound the neural re…

    Submitted 13 January, 2014; originally announced January 2014.

    Comments: Published at http://dx.doi.org/10.1214/13-AOAS681 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS681

    Journal ref: Annals of Applied Statistics 2013, Vol. 7, No. 4, 2007-2033

  23. Another Argument in Favour of Wilcoxon's Signed Rank Test

    Authors: Jonathan Rosenblatt, Yoav Benjamini

    Abstract: The Wilcoxon Signed Rank test is typically called upon when testing whether a symmetric distribution has a specified centre and the Gaussianity is in question. As with all insurance policies it comes with a cost, even if small, in terms of power versus a t-test, when the distribution is indeed Gaussian. In this note we further show that even when the distribution tested is Gaussian there need not…

    Submitted 21 November, 2013; originally announced November 2013.

  24. arXiv:1310.0606  [pdf, ps, other]

    stat.AP stat.ME

    Deciding whether follow-up studies have replicated findings in a preliminary large-scale "omics' study"

    Authors: Ruth Heller, Marina Bogomolov, Yoav Benjamini

    Abstract: We propose a formal method to declare that findings from a primary study have been replicated in a follow-up study. Our proposal is appropriate for primary studies that involve large-scale searches for rare true positives (i.e. needles in a haystack). Our proposal assigns an $r$-value to each finding; this is the lowest false discovery rate at which the finding can be called replicated. Examples a…

    Submitted 10 June, 2014; v1 submitted 2 October, 2013; originally announced October 2013.

    Journal ref: Proceedings of the National Academy of Sciences of the United States of America (PNAS), 2014 vol. 111 no. 46, 16262-16267

  25. Revisiting Multi-Subject Random Effects in fMRI: Advocating Prevalence Estimation

    Authors: Jonathan D. Rosenblatt, Matthijs Vink, Yoav Benjamini

    Abstract: Random Effects analysis has been introduced into fMRI research in order to generalize findings from the study group to the whole population. Generalizing findings is obviously harder than detecting activation in the study group since in order to be significant, an activation has to be larger than the inter-subject variability. Indeed, detected regions are smaller when using random effect analysis…

    Submitted 31 March, 2013; v1 submitted 14 December, 2012; originally announced December 2012.

  26. arXiv:1106.3670  [pdf, ps, other]

    math.ST

    Adjusting for selection bias in testing multiple families of hypotheses

    Authors: Yoav Benjamini, Marina Bogomolov

    Abstract: In many large multiple testing problems the hypotheses are divided into families. Given the data, families with evidence for true discoveries are selected, and hypotheses within them are tested. Neither controlling the error-rate in each family separately nor controlling the error-rate over all hypotheses together can assure that an error-rate is controlled in the selected families. We formulate t…

    Submitted 18 June, 2011; originally announced June 2011.

  27. High-throughput data analysis in behavior genetics

    Authors: Anat Sakov, Ilan Golani, Dina Lipkind, Yoav Benjamini

    Abstract: In recent years, a growing need has arisen in different fields for the development of computational systems for automated analysis of large amounts of data (high-throughput). Dealing with nonstandard noise structure and outliers, that could have been detected and corrected in manual analysis, must now be built into the system with the aid of robust methods. We discuss such problems and present ins…

    Submitted 9 November, 2010; originally announced November 2010.

    Comments: Published at http://dx.doi.org/10.1214/09-AOAS304 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS304

    Journal ref: Annals of Applied Statistics 2010, Vol. 4, No. 2, 743-763

  28. A simple forward selection procedure based on false discovery rate control

    Authors: Yoav Benjamini, Yulia Gavrilov

    Abstract: We propose the use of a new false discovery rate (FDR) controlling procedure as a model selection penalized method, and compare its performance to that of other penalized methods over a wide range of realistic settings: nonorthogonal design matrices, moderate and large pool of explanatory variables, and both sparse and nonsparse models, in the sense that they may include a small and large fracti…

    Submitted 18 May, 2009; originally announced May 2009.

    Comments: Published at http://dx.doi.org/10.1214/08-AOAS194 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS194

    Journal ref: Annals of Applied Statistics 2009, Vol. 3, No. 1, 179-198

  29. An adaptive step-down procedure with proven FDR control under independence

    Authors: Yulia Gavrilov, Yoav Benjamini, Sanat K. Sarkar

    Abstract: In this work we study an adaptive step-down procedure for testing $m$ hypotheses. It stems from the repeated use of the false discovery rate controlling the linear step-up procedure (sometimes called BH), and makes use of the critical constants $iq/[m+1-i(1-q)]$, $i=1,...,m$. Motivated by its success as a model selection procedure, as well as by its asymptotic optimality, we are interested in i…

    Submitted 31 March, 2009; originally announced March 2009.

    Comments: Published at http://dx.doi.org/10.1214/07-AOS586 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOS-AOS586

    MSC Class: 62J15 (Primary)

    Journal ref: Annals of Statistics 2009, Vol. 37, No. 2, 619-629

  30. Comment: Microarrays, Empirical Bayes and the Two-Groups Model

    Authors: Yoav Benjamini

    Abstract: Comment on "Microarrays, Empirical Bayes and the Two-Groups Model" [arXiv:0808.0572]

    Submitted 5 August, 2008; originally announced August 2008.

    Comments: Published at http://dx.doi.org/10.1214/07-STS236B in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-STS-STS236B

    Journal ref: Statistical Science 2008, Vol. 23, No. 1, 23-28

  31. arXiv:math/0505374  [pdf, ps, other]

    math.ST

    Adapting to Unknown Sparsity by controlling the False Discovery Rate

    Authors: Felix Abramovich, Yoav Benjamini, David L. Donoho, Iain M. Johnstone

    Abstract: We attempt to recover an $n$-dimensional vector observed in white noise, where $n$ is large and the vector is known to be sparse, but the degree of sparsity is unknown. We consider three different ways of defining sparsity of a vector: using the fraction of nonzero terms; imposing power-law decay bounds on the ordered entries; and controlling the $\ell_p$ norm for $p$ small. We obtain a procedur…

    Submitted 18 May, 2005; originally announced May 2005.

    Comments: This is a complete version of a paper to appear in Annals of Statistics. The paper in AoS has certain proofs abbreviated that are given here in detail

    MSC Class: 62F10; 62G12