Search | arXiv e-print repository

Double trouble: Predicting new variant counts across two heterogeneous populations

Authors: Yunyi Shen, Lorenzo Masoero, Joshua G. Schraiber, Tamara Broderick

Abstract: Collecting genomics data across multiple heterogeneous populations (e.g., across different cancer types) has the potential to improve our understanding of disease. Despite sequencing advances, though, resources often remain a constraint when gathering data. So it would be useful for experimental design if experimenters with access to a pilot study could predict the number of new variants they migh… ▽ More Collecting genomics data across multiple heterogeneous populations (e.g., across different cancer types) has the potential to improve our understanding of disease. Despite sequencing advances, though, resources often remain a constraint when gathering data. So it would be useful for experimental design if experimenters with access to a pilot study could predict the number of new variants they might expect to find in a follow-up study: both the number of new variants shared between the populations and the total across the populations. While many authors have developed prediction methods for the single-population case, we show that these predictions can fare poorly across multiple populations that are heterogeneous. We prove that, surprisingly, a natural extension of a state-of-the-art single-population predictor to multiple populations fails for fundamental reasons. We provide the first predictor for the number of new shared variants and new total variants that can handle heterogeneity in multiple populations. We show that our proposed method works well empirically using real cancer and population genetics data. △ Less

Submitted 4 March, 2024; originally announced March 2024.

arXiv:1601.05388 [pdf, other]

Bayesian inference of natural selection from allele frequency time series

Authors: Joshua G. Schraiber, Steven N. Evans, Montgomery Slatkin

Abstract: The advent of accessible ancient DNA technology now allows the direct ascertainment of allele frequencies in ancestral populations, thereby enabling the use of allele frequency time series to detect and estimate natural selection. Such direct observations of allele frequency dynamics are expected to be more powerful than inferences made using patterns of linked neutral variation obtained from mode… ▽ More The advent of accessible ancient DNA technology now allows the direct ascertainment of allele frequencies in ancestral populations, thereby enabling the use of allele frequency time series to detect and estimate natural selection. Such direct observations of allele frequency dynamics are expected to be more powerful than inferences made using patterns of linked neutral variation obtained from modern individuals. We develop a Bayesian method to make use of allele frequency time series data and infer the parameters of general diploid selection, along with allele age, in non-equilibrium populations. We introduce a novel path augmentation approach, in which we use Markov chain Monte Carlo to integrate over the space of allele frequency trajectories consistent with the observed data. Using simulations, we show that this approach has good power to estimate selection coefficients and allele age. Moreover, when applying our approach to data on horse coat color, we find that ignoring a relevant demographic history can significantly bias the results of inference. Our approach is made available in a C++ software package. △ Less

Submitted 20 January, 2016; originally announced January 2016.

Comments: 27 pages

arXiv:1312.6639 [pdf]

doi 10.1038/nature13673

Ancient human genomes suggest three ancestral populations for present-day Europeans

Authors: Iosif Lazaridis, Nick Patterson, Alissa Mittnik, Gabriel Renaud, Swapan Mallick, Karola Kirsanow, Peter H. Sudmant, Joshua G. Schraiber, Sergi Castellano, Mark Lipson, Bonnie Berger, Christos Economou, Ruth Bollongino, Qiaomei Fu, Kirsten I. Bos, Susanne Nordenfelt, Heng Li, Cesare de Filippo, Kay Prüfer, Susanna Sawyer, Cosimo Posth, Wolfgang Haak, Fredrik Hallgren, Elin Fornander, Nadin Rohland , et al. (95 additional authors not shown)

Abstract: We sequenced genomes from a $\sim$7,000 year old early farmer from Stuttgart in Germany, an $\sim$8,000 year old hunter-gatherer from Luxembourg, and seven $\sim$8,000 year old hunter-gatherers from southern Sweden. We analyzed these data together with other ancient genomes and 2,345 contemporary humans to show that the great majority of present-day Europeans derive from at least three highly diff… ▽ More We sequenced genomes from a $\sim$7,000 year old early farmer from Stuttgart in Germany, an $\sim$8,000 year old hunter-gatherer from Luxembourg, and seven $\sim$8,000 year old hunter-gatherers from southern Sweden. We analyzed these data together with other ancient genomes and 2,345 contemporary humans to show that the great majority of present-day Europeans derive from at least three highly differentiated populations: West European Hunter-Gatherers (WHG), who contributed ancestry to all Europeans but not to Near Easterners; Ancient North Eurasians (ANE), who were most closely related to Upper Paleolithic Siberians and contributed to both Europeans and Near Easterners; and Early European Farmers (EEF), who were mainly of Near Eastern origin but also harbored WHG-related ancestry. We model these populations' deep relationships and show that EEF had $\sim$44% ancestry from a "Basal Eurasian" lineage that split prior to the diversification of all other non-African lineages. △ Less

Submitted 1 April, 2014; v1 submitted 23 December, 2013; originally announced December 2013.

arXiv:1307.7756 [pdf, other]

doi 10.1016/j.tpb.2013.11.002

A path integral formulation of the Wright-Fisher process with genic selection

Authors: Joshua G. Schraiber

Abstract: The Wright-Fisher process with selection is an important tool in population genetics theory. Traditional analysis of this process relies on the diffusion approximation. The diffusion approximation is usually studied in a partial differential equations framework. In this paper, I introduce a path integral formalism to study the Wright-Fisher process with selection and use that formalism to obtain a… ▽ More The Wright-Fisher process with selection is an important tool in population genetics theory. Traditional analysis of this process relies on the diffusion approximation. The diffusion approximation is usually studied in a partial differential equations framework. In this paper, I introduce a path integral formalism to study the Wright-Fisher process with selection and use that formalism to obtain a simple perturbation series to approximate the transition density. The perturbation series can be understood in terms of Feynman diagrams, which have a simple probabilistic interpretation in terms of selective events. The perturbation series proves to be an accurate approximation of the transition density for weak selection and is shown to be arbitrarily accurate for any selection coefficient. △ Less

Submitted 29 July, 2013; originally announced July 2013.

Comments: 22 pages, 3 figures

Journal ref: Theoretical Population Biology 92, 2013, pp 30-35

arXiv:1306.3522 [pdf, other]

doi 10.1016/j.tpb.2013.08.005

Analysis and rejection sampling of Wright-Fisher diffusion bridges

Authors: Joshua G. Schraiber, Robert C. Griffiths, Steven N. Evans

Abstract: We investigate the properties of a Wright-Fisher diffusion process started from frequency x at time 0 and conditioned to be at frequency y at time T. Such a process is called a bridge. Bridges arise naturally in the analysis of selection acting on standing variation and in the inference of selection from allele frequency time series. We establish a number of results about the distribution of neutr… ▽ More We investigate the properties of a Wright-Fisher diffusion process started from frequency x at time 0 and conditioned to be at frequency y at time T. Such a process is called a bridge. Bridges arise naturally in the analysis of selection acting on standing variation and in the inference of selection from allele frequency time series. We establish a number of results about the distribution of neutral Wright-Fisher bridges and develop a novel rejection sampling scheme for bridges under selection that we use to study their behavior. △ Less

Submitted 14 June, 2013; originally announced June 2013.

Comments: 25 pages, 3 figures, 1 table

Journal ref: Theoretical Population Biology 89, 2013, pp 64-74

arXiv:1304.5486 [pdf, ps, other]

doi 10.1371/journal.pcbi.1003255

Inferring evolutionary histories of pathway regulation from transcriptional profiling data

Authors: Joshua G. Schraiber, Yulia Mostovoy, Tiffany Y. Hsu, Rachel B. Brem

Abstract: One of the outstanding challenges in comparative genomics is to interpret the evolutionary importance of regulatory variation between species. Rigorous molecular evolution-based methods to infer evidence for natural selection from expression data are at a premium in the field, and to date, phylogenetic approaches have not been well-suited to address the question in the small sets of taxa profiled… ▽ More One of the outstanding challenges in comparative genomics is to interpret the evolutionary importance of regulatory variation between species. Rigorous molecular evolution-based methods to infer evidence for natural selection from expression data are at a premium in the field, and to date, phylogenetic approaches have not been well-suited to address the question in the small sets of taxa profiled in standard surveys of gene expression. We have developed a strategy to infer evolutionary histories from expression profiles by analyzing suites of genes of common function. In a manner conceptually similar to molecular evolution models in which the evolutionary rates of DNA sequence at multiple loci follow a gamma distribution, we modeled expression of the genes of an \emph{a priori}-defined pathway with rates drawn from an inverse gamma distribution. We then developed a fitting strategy to infer the parameters of this distribution from expression measurements, and to identify gene groups whose expression patterns were consistent with evolutionary constraint or rapid evolution in particular species. Simulations confirmed the power and accuracy of our inference method. As an experimental testbed for our approach, we generated and analyzed transcriptional profiles of four \emph{Saccharomyces} yeasts. The results revealed pathways with signatures of constrained and accelerated regulatory evolution in individual yeasts and across the phylogeny, highlighting the prevalence of pathway-level expression change during the divergence of yeast species. We anticipate that our pathway-based phylogenetic approach will be of broad utility in the search to understand the evolutionary relevance of regulatory change. △ Less

Submitted 25 July, 2013; v1 submitted 19 April, 2013; originally announced April 2013.

Comments: 30 pages, 12 figures, 2 tables, contact authors for supplementary tables

Journal ref: PLoS Computational Biology 9, 2013, e1003255

arXiv:1209.6046 [pdf]

doi 10.1534/genetics.112.145367

Genomic tests of variation in inbreeding among individuals and among chromosomes

Authors: Joshua G. Schraiber, Stephannie Shih, Montgomery Slatkin

Abstract: We examine the distribution of heterozygous sites in nine European and nine Yoruban individuals whose genomic sequences were made publicly available by Complete Genomics. We show that it is possible to obtain detailed information about inbreeding when a relatively small set of whole-genome sequences is available. Rather than focus on testing for deviations from Hardy-Weinberg genotype frequencies… ▽ More We examine the distribution of heterozygous sites in nine European and nine Yoruban individuals whose genomic sequences were made publicly available by Complete Genomics. We show that it is possible to obtain detailed information about inbreeding when a relatively small set of whole-genome sequences is available. Rather than focus on testing for deviations from Hardy-Weinberg genotype frequencies at each site, we analyze the entire distribution of heterozygotes conditioned on the number of copies of the derived (non-chimpanzee) allele. Using Levene's exact test, we reject Hardy-Weinberg in both populations. We generalized Levene's distribution to obtain the exact distribution of the number of heterozygous individuals given that every individual has the same inbreeding coefficient, F. We estimated F to be 0.0026 in Europeans and 0.0005 in Yorubans, but we could also reject the hypothesis that F was the same in each individual. We used a composite likelihood method to estimate F in each individual and within each chromosome. Variation in F across chromosomes within individuals was too large to be consistent with sampling effects alone. Furthermore, estimates of F for each chromosome in different populations were not correlated. Our results show how detailed comparisons of population genomic data can be made to theoretical predictions. The application of methods to the Complete Genomics data set shows that the extent of apparent inbreeding varies across chromosomes and across individuals, and estimates of inbreeding coefficients are subject to unexpected levels of variation which might be partly accounted for by selection. △ Less

Submitted 26 September, 2012; originally announced September 2012.

Comments: 18 pages, 2 figures

Journal ref: Genetics December 1, 2012 vol. 192 no. 4 1477-1482

Showing 1–7 of 7 results for author: Schraiber, J G