-
Double trouble: Predicting new variant counts across two heterogeneous populations
Authors:
Yunyi Shen,
Lorenzo Masoero,
Joshua G. Schraiber,
Tamara Broderick
Abstract:
Collecting genomics data across multiple heterogeneous populations (e.g., across different cancer types) has the potential to improve our understanding of disease. Despite sequencing advances, though, resources often remain a constraint when gathering data. So it would be useful for experimental design if experimenters with access to a pilot study could predict the number of new variants they might expect to find in a follow-up study: both the number of new variants shared between the populations and the total across the populations. While many authors have developed prediction methods for the single-population case, we show that these predictions can fare poorly across multiple populations that are heterogeneous. We prove that, surprisingly, a natural extension of a state-of-the-art single-population predictor to multiple populations fails for fundamental reasons. We provide the first predictor for the number of new shared variants and new total variants that can handle heterogeneity in multiple populations. We show that our proposed method works well empirically using real cancer and population genetics data.
Submitted 4 March, 2024;
originally announced March 2024.
-
Bayesian inference of natural selection from allele frequency time series
Authors:
Joshua G. Schraiber,
Steven N. Evans,
Montgomery Slatkin
Abstract:
The advent of accessible ancient DNA technology now allows the direct ascertainment of allele frequencies in ancestral populations, thereby enabling the use of allele frequency time series to detect and estimate natural selection. Such direct observations of allele frequency dynamics are expected to be more powerful than inferences made using patterns of linked neutral variation obtained from modern individuals. We develop a Bayesian method to make use of allele frequency time series data and infer the parameters of general diploid selection, along with allele age, in non-equilibrium populations. We introduce a novel path augmentation approach, in which we use Markov chain Monte Carlo to integrate over the space of allele frequency trajectories consistent with the observed data. Using simulations, we show that this approach has good power to estimate selection coefficients and allele age. Moreover, when applying our approach to data on horse coat color, we find that ignoring a relevant demographic history can significantly bias the results of inference. Our approach is made available in a C++ software package.
Submitted 20 January, 2016;
originally announced January 2016.
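The abstract above describes inference from allele frequency time series under general diploid selection. As a hedged illustration of the kind of data such a method consumes (not the authors' diffusion-based path augmentation sampler), the Python sketch below forward-simulates a discrete Wright-Fisher population with diploid selection and then binomially samples "ancient" chromosomes at a few generations. The fitness parameterization (AA = 1 + s, Aa = 1 + hs, aa = 1) and all function names are assumptions made for this example.

```python
import random


def binomial_draw(n, p, rng):
    """Draw from Binomial(n, p) by summing n Bernoulli trials (fine for small n)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_trajectory(p0, N, s, h, generations, rng):
    """Forward-simulate the frequency of allele A in a diploid Wright-Fisher
    population of N individuals (2N gene copies).

    Assumed genotype fitnesses: AA = 1 + s, Aa = 1 + h*s, aa = 1.
    Each generation applies selection deterministically, then drift by
    binomial resampling of 2N gene copies.
    """
    freqs = [p0]
    p = p0
    for _ in range(generations):
        q = 1.0 - p
        w_hom_a, w_het = 1.0 + s, 1.0 + h * s
        mean_fitness = p * p * w_hom_a + 2 * p * q * w_het + q * q
        # Frequency of A among gametes after selection, before drift.
        p_sel = (p * p * w_hom_a + p * q * w_het) / mean_fitness
        p = binomial_draw(2 * N, p_sel, rng) / (2 * N)
        freqs.append(p)
    return freqs


def sample_time_series(trajectory, sample_times, sample_sizes, rng):
    """Return (generation, chromosomes sampled, copies of A observed) tuples,
    mimicking ancient DNA samples taken at a few generations."""
    return [
        (t, n, binomial_draw(n, trajectory[t], rng))
        for t, n in zip(sample_times, sample_sizes)
    ]


if __name__ == "__main__":
    rng = random.Random(1)
    traj = simulate_trajectory(p0=0.05, N=500, s=0.02, h=0.5, generations=200, rng=rng)
    data = sample_time_series(traj, [0, 50, 100, 150, 200], [20] * 5, rng)
    print(data)
```

Inference in the paper runs in the opposite direction: given only sampled counts like `data`, it integrates over the unobserved trajectories with MCMC to estimate the selection parameters and allele age.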
-
Ancient human genomes suggest three ancestral populations for present-day Europeans
Authors:
Iosif Lazaridis,
Nick Patterson,
Alissa Mittnik,
Gabriel Renaud,
Swapan Mallick,
Karola Kirsanow,
Peter H. Sudmant,
Joshua G. Schraiber,
Sergi Castellano,
Mark Lipson,
Bonnie Berger,
Christos Economou,
Ruth Bollongino,
Qiaomei Fu,
Kirsten I. Bos,
Susanne Nordenfelt,
Heng Li,
Cesare de Filippo,
Kay Prüfer,
Susanna Sawyer,
Cosimo Posth,
Wolfgang Haak,
Fredrik Hallgren,
Elin Fornander,
Nadin Rohland,
et al. (95 additional authors not shown)
Abstract:
We sequenced genomes from a $\sim$7,000 year old early farmer from Stuttgart in Germany, an $\sim$8,000 year old hunter-gatherer from Luxembourg, and seven $\sim$8,000 year old hunter-gatherers from southern Sweden. We analyzed these data together with other ancient genomes and 2,345 contemporary humans to show that the great majority of present-day Europeans derive from at least three highly differentiated populations: West European Hunter-Gatherers (WHG), who contributed ancestry to all Europeans but not to Near Easterners; Ancient North Eurasians (ANE), who were most closely related to Upper Paleolithic Siberians and contributed to both Europeans and Near Easterners; and Early European Farmers (EEF), who were mainly of Near Eastern origin but also harbored WHG-related ancestry. We model these populations' deep relationships and show that EEF had $\sim$44% ancestry from a "Basal Eurasian" lineage that split prior to the diversification of all other non-African lineages.
Submitted 1 April, 2014; v1 submitted 23 December, 2013;
originally announced December 2013.
-
A path integral formulation of the Wright-Fisher process with genic selection
Authors:
Joshua G. Schraiber
Abstract:
The Wright-Fisher process with selection is an important tool in population genetics theory. Traditional analysis of this process relies on the diffusion approximation. The diffusion approximation is usually studied in a partial differential equations framework. In this paper, I introduce a path integral formalism to study the Wright-Fisher process with selection and use that formalism to obtain a simple perturbation series to approximate the transition density. The perturbation series can be understood in terms of Feynman diagrams, which have a simple probabilistic interpretation in terms of selective events. The perturbation series proves to be an accurate approximation of the transition density for weak selection and is shown to be arbitrarily accurate for any selection coefficient.
Submitted 29 July, 2013;
originally announced July 2013.
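For readers coming from the partial differential equations side mentioned in the abstract, the transition density of the Wright-Fisher diffusion with genic selection can be written as the solution of a forward Kolmogorov equation. The form below assumes one common scaling (time in units of 2N generations, scaled selection coefficient $\alpha = 2Ns$); conventions differ between texts, and this is stated as an assumption rather than taken from the paper. Here $p(t, y)$ is the density of the allele frequency at time $t$ given a start at frequency $x$, and $\alpha = 0$ recovers the neutral case.

```latex
\frac{\partial p(t, y)}{\partial t}
  = \frac{1}{2}\,\frac{\partial^{2}}{\partial y^{2}}\Big[y(1-y)\,p(t, y)\Big]
  - \alpha\,\frac{\partial}{\partial y}\Big[y(1-y)\,p(t, y)\Big],
\qquad 0 < y < 1,\quad p(0, y) = \delta(y - x).
```

Equivalently, the diffusion has generator $\tfrac{1}{2}y(1-y)\,f''(y) + \alpha\,y(1-y)\,f'(y)$; the path integral formalism is an alternative route to approximating the same transition density.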
-
Analysis and rejection sampling of Wright-Fisher diffusion bridges
Authors:
Joshua G. Schraiber,
Robert C. Griffiths,
Steven N. Evans
Abstract:
We investigate the properties of a Wright-Fisher diffusion process started from frequency x at time 0 and conditioned to be at frequency y at time T. Such a process is called a bridge. Bridges arise naturally in the analysis of selection acting on standing variation and in the inference of selection from allele frequency time series. We establish a number of results about the distribution of neutral Wright-Fisher bridges and develop a novel rejection sampling scheme for bridges under selection that we use to study their behavior.
Submitted 14 June, 2013;
originally announced June 2013.
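A bridge can be sampled exactly, if slowly, by proposing unconditioned paths and keeping only those that land on the target endpoint. The Python sketch below does this for a discrete Wright-Fisher chain on allele counts 0..2N with genic selection (allele weight 1 + s, an assumed parameterization). It is a naive baseline that illustrates what "conditioned to be at frequency y at time T" means, not the rejection scheme developed in the paper; function names are hypothetical.

```python
import random


def wf_step(count, two_n, s, rng):
    """One generation of a Wright-Fisher chain on counts 0..two_n.
    Assumed genic selection: the focal allele has relative weight 1 + s."""
    p = count / two_n
    p_sel = p * (1 + s) / (p * (1 + s) + (1 - p))
    # Binomial(two_n, p_sel) drawn as a sum of Bernoulli trials.
    return sum(1 for _ in range(two_n) if rng.random() < p_sel)


def sample_bridge(i, j, T, two_n, s, rng, max_tries=100_000):
    """Naive rejection sampler for the bridge from count i to count j in T
    generations: simulate unconditioned paths from i and return the first
    one whose count at generation T equals j."""
    for _ in range(max_tries):
        path = [i]
        for _ in range(T):
            path.append(wf_step(path[-1], two_n, s, rng))
        if path[-1] == j:
            return path
    raise RuntimeError("no proposed path hit the endpoint; it may be too unlikely")


if __name__ == "__main__":
    rng = random.Random(3)
    print(sample_bridge(i=10, j=12, T=5, two_n=40, s=0.05, rng=rng))
```

The expected number of proposals is 1 / P(X_T = j | X_0 = i), so this approach becomes impractical when the endpoint is unlikely, for example when j is far from i or T is long.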
-
Inferring evolutionary histories of pathway regulation from transcriptional profiling data
Authors:
Joshua G. Schraiber,
Yulia Mostovoy,
Tiffany Y. Hsu,
Rachel B. Brem
Abstract:
One of the outstanding challenges in comparative genomics is to interpret the evolutionary importance of regulatory variation between species. Rigorous molecular evolution-based methods to infer evidence for natural selection from expression data are at a premium in the field, and to date, phylogenetic approaches have not been well-suited to address the question in the small sets of taxa profiled in standard surveys of gene expression. We have developed a strategy to infer evolutionary histories from expression profiles by analyzing suites of genes of common function. In a manner conceptually similar to molecular evolution models in which the evolutionary rates of DNA sequence at multiple loci follow a gamma distribution, we modeled expression of the genes of an \emph{a priori}-defined pathway with rates drawn from an inverse gamma distribution. We then developed a fitting strategy to infer the parameters of this distribution from expression measurements, and to identify gene groups whose expression patterns were consistent with evolutionary constraint or rapid evolution in particular species. Simulations confirmed the power and accuracy of our inference method. As an experimental testbed for our approach, we generated and analyzed transcriptional profiles of four \emph{Saccharomyces} yeasts. The results revealed pathways with signatures of constrained and accelerated regulatory evolution in individual yeasts and across the phylogeny, highlighting the prevalence of pathway-level expression change during the divergence of yeast species. We anticipate that our pathway-based phylogenetic approach will be of broad utility in the search to understand the evolutionary relevance of regulatory change.
Submitted 25 July, 2013; v1 submitted 19 April, 2013;
originally announced April 2013.
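The key modeling ingredient described above is that genes in a pathway share a distribution of evolutionary rates. The Python sketch below assumes, for illustration only, that each gene's expression changes along a branch by Brownian motion with a gene-specific rate drawn from an inverse gamma distribution. The Brownian-motion assumption, parameter values, and function names are not taken from the paper.

```python
import random


def draw_inverse_gamma(shape, scale, rng):
    """Draw from InvGamma(shape, scale), density proportional to
    x**(-shape - 1) * exp(-scale / x).

    If G ~ Gamma(shape, rate=scale) then 1 / G ~ InvGamma(shape, scale).
    random.gammavariate(alpha, beta) takes beta as a *scale*, so a Gamma
    rate of `scale` corresponds to beta = 1 / scale.
    """
    return 1.0 / rng.gammavariate(shape, 1.0 / scale)


def simulate_pathway_changes(n_genes, shape, scale, branch_length, rng):
    """Assumed model: each gene in the pathway gets its own rate from a shared
    inverse gamma distribution, and its expression change along one branch
    is Normal(0, rate * branch_length), i.e. Brownian motion."""
    rates = [draw_inverse_gamma(shape, scale, rng) for _ in range(n_genes)]
    changes = [rng.gauss(0.0, (r * branch_length) ** 0.5) for r in rates]
    return rates, changes


if __name__ == "__main__":
    rng = random.Random(7)
    rates, changes = simulate_pathway_changes(50, shape=5.0, scale=4.0,
                                              branch_length=1.0, rng=rng)
    # Sample mean of the rates; the inverse gamma mean is scale / (shape - 1).
    print(sum(rates) / len(rates))
```

Under these assumptions, integrating out a gene's rate gives a scaled Student-t distribution with 2 × shape degrees of freedom for its change, which is one reason inverse gamma rates are a convenient choice in variance models.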
-
Genomic tests of variation in inbreeding among individuals and among chromosomes
Authors:
Joshua G. Schraiber,
Stephannie Shih,
Montgomery Slatkin
Abstract:
We examine the distribution of heterozygous sites in nine European and nine Yoruban individuals whose genomic sequences were made publicly available by Complete Genomics. We show that it is possible to obtain detailed information about inbreeding when a relatively small set of whole-genome sequences is available. Rather than focus on testing for deviations from Hardy-Weinberg genotype frequencies at each site, we analyze the entire distribution of heterozygotes conditioned on the number of copies of the derived (non-chimpanzee) allele. Using Levene's exact test, we reject Hardy-Weinberg in both populations. We generalized Levene's distribution to obtain the exact distribution of the number of heterozygous individuals given that every individual has the same inbreeding coefficient, F. We estimated F to be 0.0026 in Europeans and 0.0005 in Yorubans, but we could also reject the hypothesis that F was the same in each individual. We used a composite likelihood method to estimate F in each individual and within each chromosome. Variation in F across chromosomes within individuals was too large to be consistent with sampling effects alone. Furthermore, estimates of F for each chromosome in different populations were not correlated. Our results show how detailed comparisons of population genomic data can be made to theoretical predictions. The application of our methods to the Complete Genomics data set shows that the extent of apparent inbreeding varies across chromosomes and across individuals, and estimates of inbreeding coefficients are subject to unexpected levels of variation which might be partly accounted for by selection.
Submitted 26 September, 2012;
originally announced September 2012.
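Levene's distribution, mentioned above, is the exact probability of observing a given number of heterozygotes among n diploid individuals, conditioned on the allele count, when the 2n alleles are paired at random (Hardy-Weinberg, F = 0). The Python sketch below computes it with exact fractions and derives a per-site exact test p-value by summing the probabilities of all outcomes no more likely than the observed one. Function names are hypothetical, and this is only a per-site illustration: the paper's analysis pools the distribution across sites and generalizes it to a shared inbreeding coefficient F, neither of which is reproduced here.

```python
from fractions import Fraction
from math import factorial


def levene_pmf(n, n_a):
    """Exact distribution of the heterozygote count among n diploid individuals
    carrying n_a copies of allele A (and n_b = 2n - n_a copies of B), when the
    2n alleles are paired at random:

        P(n_ab) = n! * 2**n_ab * n_a! * n_b! / (n_aa! * n_ab! * n_bb! * (2n)!)

    n_ab must have the same parity as n_a, with n_aa = (n_a - n_ab) / 2
    and n_bb = (n_b - n_ab) / 2.
    """
    if not 0 <= n_a <= 2 * n:
        raise ValueError("n_a must lie between 0 and 2n")
    n_b = 2 * n - n_a
    pmf = {}
    for n_ab in range(n_a % 2, min(n_a, n_b) + 1, 2):
        n_aa = (n_a - n_ab) // 2
        n_bb = (n_b - n_ab) // 2
        numerator = factorial(n) * 2**n_ab * factorial(n_a) * factorial(n_b)
        denominator = (factorial(n_aa) * factorial(n_ab) * factorial(n_bb)
                       * factorial(2 * n))
        pmf[n_ab] = Fraction(numerator, denominator)
    return pmf


def levene_exact_test(n, n_a, observed_n_ab):
    """Per-site exact test p-value: total probability of heterozygote counts
    that are no more likely than the observed count."""
    pmf = levene_pmf(n, n_a)
    if observed_n_ab not in pmf:
        raise ValueError("impossible heterozygote count for this allele count")
    p_obs = pmf[observed_n_ab]
    return sum(p for p in pmf.values() if p <= p_obs)
```

For example, with n = 2 individuals and n_a = 2 copies of A, the three equally likely pairings of the alleles give `levene_pmf(2, 2) == {0: Fraction(1, 3), 2: Fraction(2, 3)}`. Exact fractions avoid underflow and rounding in the factorial ratios for the small sample sizes considered here.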