Search | arXiv e-print repository

False Discovery Rate Control for Lesion-Symptom Mapping with Heterogeneous data via Weighted P-values

Authors: Siyu Zheng, Alexander C. McLain, Joshua Habiger, Christopher Rorden, Julius Fridriksson

Abstract: Lesion-symptom mapping studies provide insight into what areas of the brain are involved in different aspects of cognition. This is commonly done via behavioral testing in patients with a naturally occurring brain injury or lesions (e.g., strokes or brain tumors). This results in high-dimensional observational data where lesion status (present/absent) is non-uniformly distributed with some voxels having lesions in very few (or no) subjects. In this situation, mass univariate hypothesis tests have severe power heterogeneity where many tests are known a priori to have little to no power. Recent advancements in multiple testing methodologies allow researchers to weigh hypotheses according to side-information (e.g., information on power heterogeneity). In this paper, we propose the use of p-value weighting for voxel-based lesion-symptom mapping (VLSM) studies. The weights are created using the distribution of lesion status and spatial information to estimate different non-null prior probabilities for each hypothesis test through some common approaches. We provide a monotone minimum weight criterion which requires minimum a priori power information. Our methods are demonstrated on dependent simulated data and an aphasia study investigating which regions of the brain are associated with the severity of language impairment among stroke survivors. The results demonstrate that the proposed methods have robust error control and can increase power. Further, we showcase how weights can be used to identify regions that are inconclusive due to lack of power.

Submitted 16 August, 2023; originally announced August 2023.

MSC Class: 62J15
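
As a rough illustration of the p-value weighting idea mentioned in the abstract above, the following is a minimal sketch of a generic weighted Benjamini-Hochberg step: each p-value is divided by a non-negative weight (rescaled to average 1) before the usual step-up rule is applied. This is not the authors' procedure; the function name `weighted_bh` and the example weights are hypothetical, and the paper's weight construction (lesion distribution, spatial information, monotone minimum weight criterion) is not reproduced here.

```python
def weighted_bh(pvalues, weights, alpha=0.05):
    """Generic weighted BH sketch: returns True where a hypothesis is rejected.

    `weights` are assumed to be supplied by the analyst (hypothetical input);
    they are rescaled here so that they average to 1.
    """
    m = len(pvalues)
    if m == 0 or len(weights) != m:
        raise ValueError("pvalues and weights must be non-empty and equal length")
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")

    # Rescale so the weights have mean 1.
    scaled = [w * m / total for w in weights]

    # Weighted p-values; a zero weight means the test can never be rejected.
    q = [p / w if w > 0 else float("inf") for p, w in zip(pvalues, scaled)]

    # BH step-up: find the largest rank k with q_(k) <= k * alpha / m.
    order = sorted(range(m), key=lambda i: q[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if q[i] <= rank * alpha / m:
            k_max = rank

    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


if __name__ == "__main__":
    p = [0.001, 0.02, 0.03, 0.8]
    print(weighted_bh(p, [1, 1, 1, 1]))  # equal weights reduce to plain BH
    print(weighted_bh(p, [2, 1, 1, 0]))  # last test is given no weight
```

Giving a test zero weight removes any chance of rejecting it, which is one way a test known a priori to have essentially no power could be handled in a sketch like this.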

arXiv:1709.05269 [pdf, ps, other]

doi 10.1080/24754269.2017.1387445

The Influence of Misspecified Covariance on False Discovery Control when Using Posterior Probabilities

Authors: Ye Liang, Joshua D. Habiger, Xiaoyi Min

Abstract: This paper focuses on the influence of a misspecified covariance structure on false discovery rate for the large scale multiple testing problem. Specifically, we evaluate the influence on the marginal distribution of local fdr statistics, which are used in many multiple testing procedures and related to Bayesian posterior probabilities. Explicit forms of the marginal distributions under both correctly specified and incorrectly specified models are derived. The Kullback-Leibler divergence is used to quantify the influence caused by a misspecification. Several numerical examples are provided to illustrate the influence. A real spatio-temporal data set on soil humidity is discussed.

Submitted 15 September, 2017; originally announced September 2017.

Comments: 22 pages, 5 figures

Journal ref: Statistical Theory and Related Fields, Vol 1 (2017) 205-215

arXiv:1511.01400 [pdf, ps, other]

Multiple Testing with Heterogeneous Multinomial Distributions

Authors: Joshua Habiger, David Watts, Michael Anderson

Abstract: False discovery rate (FDR) procedures provide misleading inference when testing multiple null hypotheses with heterogeneous multinomial data. For example, in the motivating study the goal is to identify species of bacteria near the roots of wheat plants (rhizobacteria) that are associated with productivity, but standard procedures discover the most abundant species even when the association is weak or negligible, and fail to discover strong associations when species are not abundant. Consequently, a list of abundant species is produced by the multiple testing procedure even though the goal was to provide a list of productivity-associated species. This paper provides an FDR method based on a mixture of multinomial distributions and shows that it tends to discover more non-negligible effects and fewer negligible effects when the data are heterogeneous across tests. The proposed method and competing methods are applied to the motivating data. The new method identifies more species that are strongly associated with productivity and identifies fewer species that are weakly associated with productivity.

Submitted 4 November, 2015; originally announced November 2015.

arXiv:1412.0645 [pdf, ps, other]

Adaptive False Discovery Rate Control for Heterogeneous Data

Authors: Joshua D. Habiger

Abstract: Efforts to develop more efficient multiple hypothesis testing procedures for false discovery rate (FDR) control have focused on incorporating an estimate of the proportion of true null hypotheses (such procedures are called adaptive) or exploiting heterogeneity across tests via some optimal weighting scheme. This paper combines these approaches using a weighted adaptive multiple decision function (WAMDF) framework. Optimal weights for a flexible random effects model are derived and a WAMDF that controls the FDR for arbitrary weighting schemes when test statistics are independent under the null hypotheses is given. Asymptotic and numerical assessment reveals that, under weak dependence, the proposed WAMDFs provide more efficient FDR control even if optimal weights are misspecified. The robustness and flexibility of the proposed methodology facilitate the development of more efficient, yet practical, FDR procedures for heterogeneous data. To illustrate, two different weighted adaptive FDR methods for heterogeneous sample sizes are developed and applied to data.

Submitted 10 February, 2017; v1 submitted 1 December, 2014; originally announced December 2014.

MSC Class: 62F03

arXiv:1412.0561 [pdf, ps, other]

Multiple Test Functions and Adjusted p-Values for Test Statistics with Discrete Distributions

Authors: Joshua D Habiger

Abstract: The randomized $p$-value, (nonrandomized) mid-$p$-value and abstract randomized $p$-value have all been recommended for testing a null hypothesis whenever the test statistic has a discrete distribution. This paper provides a unifying framework for these approaches and extends it to the multiple testing setting. In particular, multiplicity adjusted versions of the aforementioned $p$-values and multiple test functions are developed. It is demonstrated that, whenever the usual nonrandomized and randomized decisions to reject or retain the null hypothesis may differ, the (adjusted) abstract randomized $p$-value and test function should be reported, especially when the number of tests is large. It is shown that the proposed approach dominates the traditional randomized and nonrandomized approaches in terms of bias and variability. Tools for plotting adjusted abstract randomized $p$-values and for computing multiple test functions are developed. Examples are used to illustrate the method and to motivate a new type of multiplicity adjusted mid-$p$-value.

Submitted 1 December, 2014; originally announced December 2014.
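
To make the distinction between the $p$-value variants named in the abstract above concrete, here is a small sketch for a single upper-tailed test with a binomial test statistic. It computes the usual (nonrandomized) $p$-value, the mid-$p$-value, and one draw of a randomized $p$-value. The function names and the binomial setting are assumptions chosen for illustration; the abstract randomized $p$-value and the multiplicity adjusted versions developed in the paper are not reproduced here.

```python
import math
import random


def binom_pmf(k, n, prob):
    """P(T = k) for T ~ Binomial(n, prob)."""
    return math.comb(n, k) * prob**k * (1 - prob) ** (n - k)


def discrete_p_values(t_obs, n, prob0, rng=None):
    """Upper-tailed p-values when T ~ Binomial(n, prob0) under the null.

    Returns the usual, mid, and (one draw of the) randomized p-value.
    """
    rng = rng or random.Random()
    p_equal = binom_pmf(t_obs, n, prob0)
    p_greater = sum(binom_pmf(k, n, prob0) for k in range(t_obs + 1, n + 1))
    u = rng.random()  # auxiliary Uniform(0, 1) randomizer
    return {
        "natural": p_greater + p_equal,        # P(T >= t_obs)
        "mid": p_greater + 0.5 * p_equal,      # P(T > t_obs) + P(T = t_obs) / 2
        "randomized": p_greater + u * p_equal, # P(T > t_obs) + U * P(T = t_obs)
    }


if __name__ == "__main__":
    # 8 successes in 10 trials, null success probability 0.5
    print(discrete_p_values(8, 10, 0.5, rng=random.Random(0)))
```

In this example the usual $p$-value exceeds 0.05 while the mid-$p$-value falls below it, which is the kind of disagreement between nonrandomized and randomized-type decisions the abstract refers to.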

arXiv:1108.4848 [pdf, ps, other]

Compound p-Value Statistics for Multiple Testing Procedures

Authors: Joshua D. Habiger, Edsel A. Pena

Abstract: Many multiple testing procedures make use of the p-values from the individual pairs of hypothesis tests, and are valid if the p-value statistics are independent and uniformly distributed under the null hypotheses. However, it has recently been shown that these types of multiple testing procedures are inefficient since such p-values do not depend upon all of the available data. This paper provides tools for constructing compound p-value statistics, which are those that depend upon all of the available data, but still satisfy the conditions of independence and uniformity under the null hypotheses. As an example, a class of compound p-value statistics for testing for location shifts is developed. It is demonstrated, both analytically and through simulations, that multiple testing procedures tend to reject more false null hypotheses when applied to these compound p-values rather than the usual p-values, and at the same time still guarantee the desired type I error rate control. The compound p-values, in conjunction with two different multiple testing methods, are used to analyze a real microarray data set. Applying either multiple testing method to the compound p-values, instead of the usual p-values, enhances their powers.

Submitted 24 August, 2011; originally announced August 2011.

MSC Class: 62H15

arXiv:1007.2612 [pdf, ps, other]

doi 10.1007/s00184-014-0516-6

Classes of Multiple Decision Functions Strongly Controlling FWER and FDR

Authors: Edsel A. Pena, Joshua D. Habiger, Wensong Wu

Abstract: This paper provides two general classes of multiple decision functions where each member of the first class strongly controls the family-wise error rate (FWER), while each member of the second class strongly controls the false discovery rate (FDR). These classes offer the possibility that an optimal multiple decision function with respect to a pre-specified criterion, such as the missed discovery rate (MDR), could be found within these classes. Such multiple decision functions can be utilized in multiple testing, specifically, but not limited to, the analysis of high-dimensional microarray data sets.

Submitted 15 July, 2010; originally announced July 2010.

Comments: 19 pages

Journal ref: Metrika, 2015, 78:563-595

arXiv:0908.1767 [pdf, ps, other]

doi 10.1214/10-AOS844

Power-enhanced multiple decision functions controlling family-wise error and false discovery rates

Authors: Edsel A. Peña, Joshua D. Habiger, Wensong Wu

Abstract: Improved procedures, in terms of smaller missed discovery rates (MDR), for performing multiple hypotheses testing with weak and strong control of the family-wise error rate (FWER) or the false discovery rate (FDR) are developed and studied. The improvement over existing procedures such as the Šidák procedure for FWER control and the Benjamini--Hochberg (BH) procedure for FDR control is achieved by exploiting possible differences in the powers of the individual tests. Results signal the need to take into account the powers of the individual tests and to have multiple hypotheses decision functions which are not limited to simply using the individual $p$-values, as is the case, for example, with the Šidák, Bonferroni, or BH procedures. They also enhance understanding of the role of the powers of individual tests, or more precisely the receiver operating characteristic (ROC) functions of decision processes, in the search for better multiple hypotheses testing procedures. A decision-theoretic framework is utilized, and through auxiliary randomizers the procedures could be used with discrete or mixed-type data or with rank-based nonparametric tests. This is in contrast to existing $p$-value based procedures whose theoretical validity is contingent on each of these $p$-value statistics being stochastically equal to or greater than a standard uniform variable under the null hypothesis. Proposed procedures are relevant in the analysis of high-dimensional "large $M$, small $n$" data sets arising in the natural, physical, medical, economic and social sciences, whose generation and creation is accelerated by advances in high-throughput technology, notably, but not limited to, microarray technology.

Submitted 9 March, 2011; v1 submitted 12 August, 2009; originally announced August 2009.

Comments: Published at http://dx.doi.org/10.1214/10-AOS844 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS844

Journal ref: Annals of Statistics 2011, Vol. 39, No. 1, 556-583
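
For reference, the three existing procedures named in the abstract above (Šidák, Bonferroni, BH) all compare individual $p$-values against thresholds that depend only on the number of tests $m$ and the level $\alpha$, not on the power of each test. The sketch below computes those standard thresholds; the function name `per_test_thresholds` is hypothetical, and the power-enhanced procedures proposed in the paper are not implemented here.

```python
def per_test_thresholds(m, alpha=0.05):
    """Standard p-value thresholds for m tests at level alpha.

    Returns (bonferroni, sidak, bh), where bh is the list of BH step-up
    thresholds k * alpha / m for ranks k = 1..m.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    bonferroni = alpha / m                        # FWER, any dependence
    sidak = 1.0 - (1.0 - alpha) ** (1.0 / m)      # FWER, independent tests
    bh = [k * alpha / m for k in range(1, m + 1)] # FDR step-up thresholds
    return bonferroni, sidak, bh


if __name__ == "__main__":
    bonf, sidak, bh = per_test_thresholds(10)
    print(bonf, sidak, bh[0], bh[-1])
```

Every test receives the same Bonferroni or Šidák cutoff regardless of its power, which is the limitation the abstract points to.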

Showing 1–8 of 8 results for author: Habiger, J