Powerful extreme phenotype sampling designs and score tests for genetic association studies
Authors:
Thea Bjørnland,
Anja Bye,
Einar Ryeng,
Ulrik Wisløff,
Mette Langaas
Abstract:
We consider cross-sectional genetic association studies (common and rare variants) where non-genetic information is available, or feasible to obtain, for $N$ individuals, but where it is infeasible to genotype all $N$ individuals. We consider continuously measurable Gaussian traits (phenotypes). Genotyping $n<N$ extreme phenotype individuals can yield better power to detect phenotype-genotype associations than randomly selecting $n$ individuals. We define a person as having an extreme phenotype if the observed phenotype is above a specified upper threshold or below a specified lower threshold. We consider a model where these thresholds can be tailored to each individual. The classical extreme sampling design is to set equal thresholds for all individuals. We introduce a design ($z$-extreme sampling) where personalized thresholds are defined based on the residuals of a regression model including only non-genetic (fully available) information. We derive score tests for the situation where only $n$ extremes are analyzed (complete case analysis), and for the situation where the non-genetic information on $N-n$ non-extremes is included in the analysis (all case analysis). For the classical design, all case analysis is generally more powerful than complete case analysis. For the $z$-extreme sample, we show that all case and complete case tests are equally powerful. Simulations and data analysis also show that $z$-extreme sampling is at least as powerful as the classical extreme sampling design, and the classical design is shown to be at times less powerful than random sampling. The method of dichotomizing extreme phenotypes is also discussed.
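The difference between the two sampling designs described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `classical_extremes` and `z_extremes` are hypothetical, the regression is an ordinary least-squares fit, and the thresholds are assumed to be chosen so that exactly $n/2$ individuals fall in each tail.

```python
import numpy as np


def _two_tails(values, n):
    """Indices of the n//2 smallest and the n - n//2 largest entries of values."""
    order = np.argsort(values)
    half = n // 2
    return np.concatenate([order[:half], order[len(values) - (n - half):]])


def classical_extremes(y, n):
    """Classical design (sketch): equal thresholds for everyone, applied to the raw phenotype y."""
    return _two_tails(np.asarray(y, dtype=float), n)


def z_extremes(y, X, n):
    """z-extreme design (sketch): thresholds applied to residuals of y regressed on
    the non-genetic covariates X, which amounts to a personalized threshold on y.
    Assumption: ordinary least squares with an intercept."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    return _two_tails(residuals, n)
```

With a strong non-genetic covariate, the two designs can select quite different individuals: the classical design favors those with extreme covariate values, while the residual-based design favors those whose phenotype is unusual given their covariates.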
Submitted 5 February, 2020; v1 submitted 5 January, 2017;
originally announced January 2017.
Efficient and powerful familywise error control in genome-wide association studies using generalized linear models
Authors:
K. K. Halle,
Ø. Bakke,
S. Djurovic,
A. Bye,
E. Ryeng,
U. Wisløff,
O. A. Andreassen,
M. Langaas
Abstract:
In genetic association studies, detecting phenotype-genotype association is a primary goal. We assume that the relationship between the data (phenotype, genetic markers and environmental covariates) can be modelled by a generalized linear model (GLM). The inclusion of environmental covariates makes it possible to account for important confounding factors, such as sex and population substructure. A multivariate score statistic, which under the complete null hypothesis of no phenotype-genotype association asymptotically has a multivariate normal distribution with a covariance matrix that can be estimated from the data, is used to test a large number of genetic markers for association with the phenotype. We stress the importance of controlling the familywise error rate (FWER), and use the asymptotic distribution of the multivariate score test statistic to find a local significance level for the individual tests. Using real data (from one study on schizophrenia and bipolar disorder and one on maximal oxygen uptake) and constructed correlated structures, we show that our method is a powerful alternative to the popular Bonferroni and Sidak methods. For GLMs without environmental covariates, we show that our method is an efficient alternative to permutation methods for multiple testing. Further, we show that if environmental covariates and genetic markers are uncorrelated, the estimated covariance matrix of the score test statistic can be approximated by the estimated correlation matrix for just the genetic markers. As byproducts of our method, an effective number of independent tests can be defined, and FWER-adjusted $p$-values can be calculated as an alternative to using a local significance level.
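As a rough illustration of what a local significance level means, the sketch below compares the Bonferroni and Sidak levels with a level derived from a multivariate normal null distribution. It is not the authors' procedure: the abstract does not say how the level is computed, so Monte Carlo simulation is swapped in here, and the function names, the correlation matrix `R`, and the number of draws are all assumptions made for the example.

```python
import math

import numpy as np


def bonferroni_level(alpha, m):
    """Local level alpha/m: controls the FWER for any dependence between tests."""
    return alpha / m


def sidak_level(alpha, m):
    """Local level 1 - (1 - alpha)^(1/m): exact for m independent tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def mvn_local_level(alpha, R, n_draws=200_000, seed=0):
    """Monte Carlo sketch of a local level for test statistics Z ~ N(0, R).

    Finds the cutoff c with P(max_i |Z_i| > c) approximately alpha under the
    complete null, then converts c into a two-sided per-test level.
    """
    rng = np.random.default_rng(seed)
    R = np.asarray(R, dtype=float)
    z = rng.multivariate_normal(np.zeros(R.shape[0]), R, size=n_draws)
    cutoff = np.quantile(np.abs(z).max(axis=1), 1.0 - alpha)
    # Two-sided standard normal tail: 2 * (1 - Phi(c)) = erfc(c / sqrt(2)).
    return math.erfc(cutoff / math.sqrt(2.0))
```

For uncorrelated statistics the simulated level should be close to the Sidak level. For positively correlated statistics it is typically larger, which is the source of the power gain over Bonferroni and Sidak that the abstract describes.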
Submitted 22 December, 2016; v1 submitted 18 March, 2016;
originally announced March 2016.