-
Excess of genomic defects in a woolly mammoth on Wrangel island
Authors:
Rebekah L. Rogers,
Montgomery Slatkin
Abstract:
Woolly mammoths (Mammuthus primigenius) populated Siberia, Beringia, and North America during the Pleistocene and early Holocene. Recent breakthroughs in ancient DNA sequencing have allowed for complete genome sequencing for two specimens of woolly mammoths (Palkopoulou et al. 2015). One mammoth specimen is from a mainland population ~45,000 years ago when mammoths were plentiful. The second, a 43…
▽ More
Woolly mammoths (Mammuthus primigenius) populated Siberia, Beringia, and North America during the Pleistocene and early Holocene. Recent breakthroughs in ancient DNA sequencing have allowed for complete genome sequencing for two specimens of woolly mammoths (Palkopoulou et al. 2015). One mammoth specimen is from a mainland population ~45,000 years ago when mammoths were plentiful. The second, a 4300 yr old specimen, is derived from an isolated population on Wrangel island where mammoths subsisted with small effective population size more than 43-fold lower than previous populations. These extreme differences in effective population size offer a rare opportunity to test nearly neutral models of genome architecture evolution within a single species. Using these previously published mammoth sequences, we identify deletions, retrogenes, and non-functionalizing point mutations. In the Wrangel island mammoth, we identify a greater number of deletions, a larger proportion of deletions affecting gene sequences, a greater number of candidate retrogenes, and an increased number of premature stop codons. This accumulation of detrimental mutations is consistent with genomic meltdown in response to low effective population sizes in the dwindling mammoth population on Wrangel island. In addition, we observe high rates of loss of olfactory receptors and urinary proteins, either because these loci are non-essential or because they were favored by divergent selective pressures in island environments. Finally, at the locus of FOXQ1 we observe two independent loss-of-function mutations, which would confer a satin coat phenotype in this island woolly mammoth.
△ Less
Submitted 19 January, 2017; v1 submitted 20 June, 2016;
originally announced June 2016.
-
Bayesian inference of natural selection from allele frequency time series
Authors:
Joshua G. Schraiber,
Steven N. Evans,
Montgomery Slatkin
Abstract:
The advent of accessible ancient DNA technology now allows the direct ascertainment of allele frequencies in ancestral populations, thereby enabling the use of allele frequency time series to detect and estimate natural selection. Such direct observations of allele frequency dynamics are expected to be more powerful than inferences made using patterns of linked neutral variation obtained from mode…
▽ More
The advent of accessible ancient DNA technology now allows the direct ascertainment of allele frequencies in ancestral populations, thereby enabling the use of allele frequency time series to detect and estimate natural selection. Such direct observations of allele frequency dynamics are expected to be more powerful than inferences made using patterns of linked neutral variation obtained from modern individuals. We develop a Bayesian method to make use of allele frequency time series data and infer the parameters of general diploid selection, along with allele age, in non-equilibrium populations. We introduce a novel path augmentation approach, in which we use Markov chain Monte Carlo to integrate over the space of allele frequency trajectories consistent with the observed data. Using simulations, we show that this approach has good power to estimate selection coefficients and allele age. Moreover, when applying our approach to data on horse coat color, we find that ignoring a relevant demographic history can significantly bias the results of inference. Our approach is made available in a C++ software package.
△ Less
Submitted 20 January, 2016;
originally announced January 2016.
-
Ancient human genomes suggest three ancestral populations for present-day Europeans
Authors:
Iosif Lazaridis,
Nick Patterson,
Alissa Mittnik,
Gabriel Renaud,
Swapan Mallick,
Karola Kirsanow,
Peter H. Sudmant,
Joshua G. Schraiber,
Sergi Castellano,
Mark Lipson,
Bonnie Berger,
Christos Economou,
Ruth Bollongino,
Qiaomei Fu,
Kirsten I. Bos,
Susanne Nordenfelt,
Heng Li,
Cesare de Filippo,
Kay Prüfer,
Susanna Sawyer,
Cosimo Posth,
Wolfgang Haak,
Fredrik Hallgren,
Elin Fornander,
Nadin Rohland
, et al. (95 additional authors not shown)
Abstract:
We sequenced genomes from a $\sim$7,000 year old early farmer from Stuttgart in Germany, an $\sim$8,000 year old hunter-gatherer from Luxembourg, and seven $\sim$8,000 year old hunter-gatherers from southern Sweden. We analyzed these data together with other ancient genomes and 2,345 contemporary humans to show that the great majority of present-day Europeans derive from at least three highly diff…
▽ More
We sequenced genomes from a $\sim$7,000 year old early farmer from Stuttgart in Germany, an $\sim$8,000 year old hunter-gatherer from Luxembourg, and seven $\sim$8,000 year old hunter-gatherers from southern Sweden. We analyzed these data together with other ancient genomes and 2,345 contemporary humans to show that the great majority of present-day Europeans derive from at least three highly differentiated populations: West European Hunter-Gatherers (WHG), who contributed ancestry to all Europeans but not to Near Easterners; Ancient North Eurasians (ANE), who were most closely related to Upper Paleolithic Siberians and contributed to both Europeans and Near Easterners; and Early European Farmers (EEF), who were mainly of Near Eastern origin but also harbored WHG-related ancestry. We model these populations' deep relationships and show that EEF had $\sim$44% ancestry from a "Basal Eurasian" lineage that split prior to the diversification of all other non-African lineages.
△ Less
Submitted 1 April, 2014; v1 submitted 23 December, 2013;
originally announced December 2013.
-
The influence of relatives on the efficiency and error rate of familial searching
Authors:
Rori V. Rohlfs,
Erin Murphy,
Yun S. Song,
Montgomery Slatkin
Abstract:
We investigate the consequences of adopting the criteria used by the state of California, as described by Myers et al. (2011), for conducting familial searches. We carried out a simulation study of randomly generated profiles of related and unrelated individuals with 13-locus CODIS genotypes and YFiler Y-chromosome haplotypes, on which the Myers protocol for relative identification was carried out…
▽ More
We investigate the consequences of adopting the criteria used by the state of California, as described by Myers et al. (2011), for conducting familial searches. We carried out a simulation study of randomly generated profiles of related and unrelated individuals with 13-locus CODIS genotypes and YFiler Y-chromosome haplotypes, on which the Myers protocol for relative identification was carried out. For Y-chromosome sharing first degree relatives, the Myers protocol has a high probability (80 - 99%) of identifying their relationship. For unrelated individuals, there is a low probability that an unrelated person in the database will be identified as a first-degree relative. For more distant Y-haplotype sharing relatives (half-siblings, first cousins, half-first cousins or second cousins) there is a substantial probability that the more distant relative will be incorrectly identified as a first-degree relative. For example, there is a 3 - 18% probability that a first cousin will be identified as a full sibling, with the probability depending on the population background. Although the California familial search policy is likely to identify a first degree relative if his profile is in the database, and it poses little risk of falsely identifying an unrelated individual in a database as a first-degree relative, there is a substantial risk of falsely identifying a more distant Y-haplotype sharing relative in the database as a first-degree relative, with the consequence that their immediate family may become the target for further investigation. This risk falls disproportionately on those ethnic groups that are currently overrepresented in state and federal databases.
△ Less
Submitted 14 August, 2013; v1 submitted 10 April, 2013;
originally announced April 2013.
-
Detecting range expansions from genetic data
Authors:
Benjamin M Peter,
Montgomery Slatkin
Abstract:
We propose a method that uses genetic data to test for the occurrence of a recent range expansion and to infer the location of the origin of the expansion. We introduce a statistic for pairs of populations $ψ$ (the directionality index) that detects asymmetries in the two-dimensional allele frequency spectrum caused by the series of founder events that happen during an expansion. Such asymmetry ar…
▽ More
We propose a method that uses genetic data to test for the occurrence of a recent range expansion and to infer the location of the origin of the expansion. We introduce a statistic for pairs of populations $ψ$ (the directionality index) that detects asymmetries in the two-dimensional allele frequency spectrum caused by the series of founder events that happen during an expansion. Such asymmetry arises because low frequency alleles tend to be lost during founder events, thus creating clines in the frequencies of surviving low-frequency alleles. Using simulations, we further show that $ψ$ is more powerful for detecting range expansions than both $F_{ST}$ and clines in heterozygosity. We illustrate the utility of $ψ$ by applying it to a data set from modern humans and show how we can include more complicated scenarios such as multiple expansion origins or barriers to migration in the model.
△ Less
Submitted 29 March, 2013;
originally announced March 2013.
-
Genomic tests of variation in inbreeding among individuals and among chromosomes
Authors:
Joshua G. Schraiber,
Stephannie Shih,
Montgomery Slatkin
Abstract:
We examine the distribution of heterozygous sites in nine European and nine Yoruban individuals whose genomic sequences were made publicly available by Complete Genomics. We show that it is possible to obtain detailed information about inbreeding when a relatively small set of whole-genome sequences is available. Rather than focus on testing for deviations from Hardy-Weinberg genotype frequencies…
▽ More
We examine the distribution of heterozygous sites in nine European and nine Yoruban individuals whose genomic sequences were made publicly available by Complete Genomics. We show that it is possible to obtain detailed information about inbreeding when a relatively small set of whole-genome sequences is available. Rather than focus on testing for deviations from Hardy-Weinberg genotype frequencies at each site, we analyze the entire distribution of heterozygotes conditioned on the number of copies of the derived (non-chimpanzee) allele. Using Levene's exact test, we reject Hardy-Weinberg in both populations. We generalized Levene's distribution to obtain the exact distribution of the number of heterozygous individuals given that every individual has the same inbreeding coefficient, F. We estimated F to be 0.0026 in Europeans and 0.0005 in Yorubans, but we could also reject the hypothesis that F was the same in each individual. We used a composite likelihood method to estimate F in each individual and within each chromosome. Variation in F across chromosomes within individuals was too large to be consistent with sampling effects alone. Furthermore, estimates of F for each chromosome in different populations were not correlated. Our results show how detailed comparisons of population genomic data can be made to theoretical predictions. The application of methods to the Complete Genomics data set shows that the extent of apparent inbreeding varies across chromosomes and across individuals, and estimates of inbreeding coefficients are subject to unexpected levels of variation which might be partly accounted for by selection.
△ Less
Submitted 26 September, 2012;
originally announced September 2012.
-
Non-equilibrium theory of the allele frequency spectrum
Authors:
Steven N. Evans,
Yelena Shvets,
Montgomery Slatkin
Abstract:
A forward diffusion equation describing the evolution of the allele frequency spectrum is presented. The influx of mutations is accounted for by imposing a suitable boundary condition. For a Wright-Fisher diffusion with or without selection and varying population size, the boundary condition is $\lim_{x \downarrow 0} x f(x,t)=θρ(t)$, where $f(\cdot,t)$ is the frequency spectrum of derived allele…
▽ More
A forward diffusion equation describing the evolution of the allele frequency spectrum is presented. The influx of mutations is accounted for by imposing a suitable boundary condition. For a Wright-Fisher diffusion with or without selection and varying population size, the boundary condition is $\lim_{x \downarrow 0} x f(x,t)=θρ(t)$, where $f(\cdot,t)$ is the frequency spectrum of derived alleles at independent loci at time $t$ and $ρ(t)$ is the relative population size at time $t$. When population size and selection intensity are independent of time, the forward equation is equivalent to the backwards diffusion usually used to derive the frequency spectrum, but the forward equation allows computation of the time dependence of the spectrum both before an equilibrium is attained and when population size and selection intensity vary with time. From the diffusion equation, we derive a set of ordinary differential equations for the moments of $f(\cdot,t)$ and express the expected spectrum of a finite sample in terms of those moments. We illustrate the use of the forward equation by considering neutral and selected alleles in a highly simplified model of human history. For example, we show that approximately 30% of the expected heterozygosity of neutral loci is attributable to mutations that arose since the onset of population growth in roughly the last $150,000$ years.
△ Less
Submitted 5 June, 2006; v1 submitted 8 April, 2006;
originally announced April 2006.