Permutation in genetic association studies with covariates: controlling the familywise error rate with score tests in generalized linear models
Authors:
Kari Krizak Halle,
Mette Langaas
Abstract:
In genome-wide association (GWA) studies the goal is to detect associations between genetic markers and a given phenotype. The number of genetic markers can be large, and effective methods for controlling the overall error rate are a central topic when analyzing GWA data. The Bonferroni method is known to be conservative when the tests are dependent. Permutation methods give exact control of the overall error rate when the assumption of exchangeability is satisfied, but are computationally intensive for large datasets. For regression models the exchangeability assumption is in general not satisfied, and there is no standard solution for permutation testing, only some approximate methods. In this paper we discuss permutation methods for control of the familywise error rate in genetic association studies and present an approximate solution. These methods are compared using simulated data.
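As an illustration of the permutation idea mentioned in the abstract, the following is a minimal sketch of a generic single-step max-statistic permutation threshold for the simplest setting: no covariates, so the phenotype values are exchangeable under the complete null hypothesis. It is not the approximate method proposed in the paper. The function names, the use of absolute correlation as the test statistic, and the simulated data are all assumptions made for this example.

```python
import numpy as np

def max_abs_corr(y, G):
    """Largest absolute marker-phenotype correlation over all columns of G.

    Assumes every marker (column) is polymorphic, i.e. has nonzero variance.
    """
    yc = y - y.mean()
    Gc = G - G.mean(axis=0)
    num = Gc.T @ yc
    den = np.sqrt((Gc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.max(np.abs(num / den))

def maxt_threshold(y, G, alpha=0.05, n_perm=500, seed=0):
    """Estimate the (1 - alpha) quantile of the permutation null
    distribution of the maximum statistic.

    Rejecting markers whose statistic exceeds this value controls the
    familywise error rate at level alpha when y is exchangeable.
    """
    rng = np.random.default_rng(seed)
    null_max = np.array(
        [max_abs_corr(rng.permutation(y), G) for _ in range(n_perm)]
    )
    return np.quantile(null_max, 1 - alpha)

# Hypothetical simulated data: 200 individuals, 50 independent markers
# coded as minor-allele counts 0/1/2, and a phenotype unrelated to them.
rng = np.random.default_rng(1)
G = rng.binomial(2, 0.3, size=(200, 50))
y = rng.normal(size=200)

thr = maxt_threshold(y, G)
observed = max_abs_corr(y, G)
print(f"threshold={thr:.3f}, observed max={observed:.3f}")
```

Because the markers are permuted jointly with a single shuffled phenotype, the dependence between the test statistics is preserved in the null distribution, which is why this approach can be less conservative than Bonferroni. With covariates in a regression model, simply permuting the phenotype no longer respects exchangeability, which is the problem the abstract describes.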
Submitted 8 May, 2017; v1 submitted 21 December, 2016;
originally announced December 2016.
Is the familywise error rate in genomics controlled by methods based on the effective number of independent tests?
Authors:
Kari Krizak Halle,
Srdjan Djurovic,
Ole Andreas Andreassen,
Mette Langaas
Abstract:
In genome-wide association (GWA) studies the goal is to detect association between one or more genetic markers and a given phenotype. The number of genetic markers in a GWA study can be on the order of hundreds of thousands, and therefore multiple testing methods are needed. This paper presents a set of popular methods used to correct for multiple testing in GWA studies. All are based on the concept of estimating an effective number of independent tests. We compare these methods using simulated data and data from the TOP study, and show that the effective number of independent tests is not additive over blocks of independent genetic markers unless we assume a common value for the local significance level. We also show that the reviewed methods based on estimating the effective number of independent tests in general do not control the familywise error rate.
Submitted 21 December, 2016; v1 submitted 14 December, 2016;
originally announced December 2016.
Efficient and powerful familywise error control in genome-wide association studies using generalized linear models
Authors:
K. K. Halle,
Ø. Bakke,
S. Djurovic,
A. Bye,
E. Ryeng,
U. Wisløff,
O. A. Andreassen,
M. Langaas
Abstract:
In genetic association studies, detecting phenotype-genotype association is a primary goal. We assume that the relationship between the data (phenotype, genetic markers and environmental covariates) can be modelled by a generalized linear model (GLM). The inclusion of environmental covariates makes it possible to account for important confounding factors, such as sex and population substructure. A multivariate score statistic, which under the complete null hypothesis of no phenotype-genotype association asymptotically has a multivariate normal distribution with a covariance matrix that can be estimated from the data, is used to test a large number of genetic markers for association with the phenotype. We stress the importance of controlling the familywise error rate (FWER), and use the asymptotic distribution of the multivariate score test statistic to find a local significance level for the individual tests. Using real data (from one study on schizophrenia and bipolar disorder and one on maximal oxygen uptake) and constructed correlated structures, we show that our method is a powerful alternative to the popular Bonferroni and Šidák methods. For GLMs without environmental covariates, we show that our method is an efficient alternative to permutation methods for multiple testing. Further, we show that if environmental covariates and genetic markers are uncorrelated, the estimated covariance matrix of the score test statistic can be approximated by the estimated correlation matrix for just the genetic markers. As byproducts of our method, an effective number of independent tests can be defined, and FWER-adjusted $p$-values can be calculated as an alternative to using a local significance level.
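To make the notion of a local significance level concrete, here is a rough sketch comparing the Bonferroni and Šidák levels with a level derived from a multivariate normal null distribution. The abstract does not say how the authors evaluate the multivariate normal distribution, so this example substitutes plain Monte Carlo simulation. The function names, the AR(1)-style correlation matrix and the simulation size are assumptions chosen for illustration only.

```python
import math
import numpy as np

def bonferroni_level(alpha, m):
    """Local level alpha / m."""
    return alpha / m

def sidak_level(alpha, m):
    """Local level 1 - (1 - alpha)^(1/m), exact for independent tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)

def mvn_local_level(R, alpha=0.05, n_sim=20000, seed=0):
    """Monte Carlo estimate of the two-sided local level such that
    P(max_j |Z_j| > c) = alpha when Z ~ N(0, R).

    R is assumed to be a valid correlation matrix of the test statistics.
    """
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(R.shape[0]), R, size=n_sim)
    c = np.quantile(np.abs(z).max(axis=1), 1 - alpha)
    phi = 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))  # standard normal CDF
    return 2.0 * (1.0 - phi)

# Hypothetical correlation structure: 20 markers, corr(i, j) = 0.9^|i-j|.
m = 20
idx = np.arange(m)
R = 0.9 ** np.abs(idx[:, None] - idx[None, :])

a_bonf = bonferroni_level(0.05, m)
a_sidak = sidak_level(0.05, m)
a_mvn = mvn_local_level(R)
print(a_bonf, a_sidak, a_mvn)
```

When the statistics are strongly correlated, the simulated level is larger than both the Bonferroni and Šidák levels, so each individual test is held to a less strict cutoff while the overall error rate stays at the nominal level. Dividing the overall level by such a local level gives one informal way to think about an effective number of independent tests.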
Submitted 22 December, 2016; v1 submitted 18 March, 2016;
originally announced March 2016.