
Distinguishing correlation from causation using genome-wide association studies

Authors: Luke J. O'Connor, Alkes L. Price

Abstract: Genome-wide association studies (GWAS) have emerged as a rich source of genetic clues into disease biology, and they have revealed strong genetic correlations among many diseases and traits. Some of these genetic correlations may reflect causal relationships. We developed a method to quantify causal relationships between genetically correlated traits using GWAS summary association statistics. In particular, our method quantifies what part of the genetic component of trait 1 is also causal for trait 2 using mixed fourth moments $E(α_1^2α_1α_2)$ and $E(α_2^2α_1α_2)$ of the bivariate effect size distribution. If trait 1 is causal for trait 2, then SNPs affecting trait 1 (large $α_1^2$) will have correlated effects on trait 2 (large $α_1α_2$), but not vice versa. We validated this approach in extensive simulations. Across 52 traits (average $N=331$k), we identified 30 putative genetically causal relationships, many novel, including an effect of LDL cholesterol on decreased bone mineral density. More broadly, we demonstrate that it is possible to distinguish between genetic correlation and causation using genetic association data.
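The asymmetry between the two mixed fourth moments can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' method: it works on simulated true effect sizes (no GWAS sampling noise, no LD), and the function names, the sparse point-normal effect model, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def mixed_fourth_moments(alpha1, alpha2):
    """Return (E[a1^2 * a1*a2], E[a2^2 * a1*a2]) after scaling each trait to unit variance."""
    a1 = alpha1 / np.sqrt(np.mean(alpha1 ** 2))
    a2 = alpha2 / np.sqrt(np.mean(alpha2 ** 2))
    product = a1 * a2
    return np.mean(a1 ** 2 * product), np.mean(a2 ** 2 * product)


def simulate_causal_effects(num_snps=200_000, sparsity=0.05, causal_effect=0.5, seed=0):
    """Simulate per-SNP effects where trait 1 is causal for trait 2 (assumed toy model)."""
    rng = np.random.default_rng(seed)
    # Sparse effects on trait 1: most SNPs have exactly zero effect.
    alpha1 = rng.normal(size=num_snps) * (rng.random(num_snps) < sparsity)
    # Trait 2 inherits a scaled copy of every trait-1 effect, plus its own
    # independent sparse effects that do not act on trait 1.
    own_effects = rng.normal(size=num_snps) * (rng.random(num_snps) < sparsity)
    alpha2 = causal_effect * alpha1 + own_effects
    return alpha1, alpha2


alpha1, alpha2 = simulate_causal_effects()
moment_trait1, moment_trait2 = mixed_fourth_moments(alpha1, alpha2)
print(f"E(a1^2 a1 a2) = {moment_trait1:.2f}, E(a2^2 a1 a2) = {moment_trait2:.2f}")
```

In this toy model the first moment comes out larger than the second, because every SNP with a large effect on trait 1 also affects trait 2, while many SNPs with a large effect on trait 2 have no effect on trait 1. The gap depends on the effect distribution being sparse (heavier-tailed than a Gaussian); with purely Gaussian effects the two moments would coincide in expectation.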

Submitted 21 November, 2018; originally announced November 2018.

Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

Report number: ML4H/2018/4

Journal ref: O'Connor, Luke J. and Alkes L. Price. "Distinguishing genetic correlation from causation across 52 diseases and complex traits." Nature Genetics (2018)

arXiv:1312.2675 [pdf]

Genome-wide scan of 29,141 African Americans finds no evidence of selection since admixture

Authors: Gaurav Bhatia, Arti Tandon, Melinda C. Aldrich, Christine B. Ambrosone, Christopher Amos, Elisa V. Bandera, Sonja I. Berndt, Leslie Bernstein, William J. Blot, Cathryn H. Bock, Neil Caporaso, Graham Casey, Sandra L. Deming, W. Ryan Diver, Susan M. Gapstur, Elizabeth M. Gillanders, Curtis C. Harris, Brian E. Henderson, Sue A. Ingles, William Isaacs, Esther M. John, Rick A. Kittles, Emma Larkin, Lorna H. McNeill, Robert C. Millikan, et al. (22 additional authors not shown)

Abstract: We scanned through the genomes of 29,141 African Americans, searching for loci where the average proportion of African ancestry deviates significantly from the genome-wide average. We failed to find any genome-wide significant deviations, and conclude that any selection in African Americans since admixture is sufficiently weak that it falls below the threshold of our power to detect it using a large sample size. These results stand in contrast to the findings of a recent study of selection in African Americans. That study, which had 15 times fewer samples, reported six loci with significant deviations. We show that the discrepancy is likely due to insufficient correction for multiple hypothesis testing in the previous study. The same study reported 14 loci that showed greater population differentiation between African Americans and Nigerian Yoruba than would be expected in the absence of natural selection. Four such loci were previously shown to be genome-wide significant and likely to be affected by selection, but we show that most of the 10 additional loci are likely to be false positives. Additionally, the most parsimonious explanation for the loci that have significant evidence of unusual differentiation in frequency between Nigerians and African Americans is selection in Africa prior to their forced migration to the Americas.

Submitted 10 December, 2013; originally announced December 2013.

arXiv:1309.3258 [pdf, other]

doi 10.1093/bioinformatics/btu416

Fast and accurate imputation of summary statistics enhances evidence of functional enrichment

Authors: Bogdan Pasaniuc, Noah Zaitlen, Huwenbo Shi, Gaurav Bhatia, Alexander Gusev, Joseph Pickrell, Joel Hirschhorn, David P Strachan, Nick Patterson, Alkes L. Price

Abstract: Imputation using external reference panels is a widely used approach for increasing power in GWAS and meta-analysis. Existing HMM-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available. In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1-5%) variants (increasing to 87% (60%) when summary LD information is available from target samples) versus 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and is computationally very fast. As an empirical demonstration, we apply our method to 7 case-control phenotypes from the WTCCC data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of $χ^2$ association statistics) compared to HMM-based imputation from individual-level genotypes at the 227 (176) published SNPs in the WTCCC (1958BC height) data.
In addition, for publicly available summary statistics from large meta-analyses of 4 lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic vs. non-genic loci for these traits, as compared to an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses.
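The abstract does not state the imputation formula, so the following is only a sketch of one common way to impute association z-scores under a Gaussian model: take the conditional mean of the untyped SNPs given the typed SNPs, using LD (correlation) matrices estimated from a reference panel. The ridge term added to the typed-SNP LD matrix is an assumed stand-in for the adjustment for limited reference panel size; the function name, argument names, and the value 0.1 are hypothetical.

```python
import numpy as np


def impute_z_scores(z_typed, ld_typed, ld_untyped_typed, ridge=0.1):
    """Impute z-scores at untyped SNPs from z-scores at typed SNPs.

    z_typed:          (k,) z-scores observed at the k typed SNPs
    ld_typed:         (k, k) LD correlation matrix among typed SNPs
    ld_untyped_typed: (u, k) LD correlations between untyped and typed SNPs
    ridge:            assumed regularization for noisy reference-panel LD
    """
    num_typed = ld_typed.shape[0]
    regularized = ld_typed + ridge * np.eye(num_typed)
    # weights = ld_untyped_typed @ inverse(regularized), computed via a linear
    # solve; this relies on the regularized LD matrix being symmetric.
    weights = np.linalg.solve(regularized, ld_untyped_typed.T).T
    return weights @ z_typed


# Toy example: three SNPs in a row with LD 0.8 between neighbours; the middle
# SNP is untyped and is imputed from its two typed neighbours.
ld_typed = np.array([[1.0, 0.64],
                     [0.64, 1.0]])
ld_untyped_typed = np.array([[0.8, 0.8]])
z_typed = np.array([4.0, 3.0])

z_imputed = impute_z_scores(z_typed, ld_typed, ld_untyped_typed)
print(z_imputed)
```

Setting `ridge=0` gives the plain multivariate-normal conditional mean; a positive value shrinks the imputed z-scores toward zero, which is the conservative direction when the reference LD estimates are noisy.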

Submitted 12 September, 2013; originally announced September 2013.

Comments: 32 pages, 4 figures

Showing 1–3 of 3 results for author: Price, A L