-
H-Packer: Holographic Rotationally Equivariant Convolutional Neural Network for Protein Side-Chain Packing
Authors:
Gian Marco Visani,
William Galvin,
Michael Neal Pun,
Armita Nourmohammad
Abstract:
Accurately modeling protein 3D structure is essential for the design of functional proteins. An important sub-task of structure modeling is protein side-chain packing: predicting the conformation of side-chains (rotamers) given the protein's backbone structure and amino-acid sequence. Conventional approaches for this task rely on expensive sampling procedures over hand-crafted energy functions and rotamer libraries. Recently, several deep learning methods have been developed to tackle the problem in a data-driven way, albeit with vastly different formulations (from image-to-image translation to directly predicting atomic coordinates). Here, we frame the problem as a joint regression over the side-chains' true degrees of freedom: the dihedral $χ$ angles. We carefully study possible objective functions for this task, while accounting for the underlying symmetries of the task. We propose Holographic Packer (H-Packer), a novel two-stage algorithm for side-chain packing built on top of two light-weight rotationally equivariant neural networks. We evaluate our method on CASP13 and CASP14 targets. H-Packer is computationally efficient and shows favorable performance against conventional physics-based algorithms and is competitive against alternative deep learning solutions.
Submitted 28 November, 2023; v1 submitted 15 November, 2023;
originally announced November 2023.
-
Variability in the local and global composition of human T-cell receptor repertoires during thymic development across cell types and individuals
Authors:
Giulio Isacchini,
Valentin Quiniou,
Hélène Vantomme,
Paul Stys,
Encarnita Mariotti-Ferandiz,
David Klatzmann,
Aleksandra M. Walczak,
Thierry Mora,
Armita Nourmohammad
Abstract:
The adaptive immune response relies on T cells that combine phenotypic specialization with diversity of T cell receptors (TCRs) to recognize a wide range of pathogens. TCRs are acquired and selected during T cell maturation in the thymus. Characterizing TCR repertoires across individuals and T cell maturation stages is important for better understanding adaptive immune responses and for developing new diagnostics and therapies. Analyzing a dataset of human TCR repertoires from thymocyte subsets, we find that the variability between individuals generated during the TCR V(D)J recombination is maintained through all stages of T cell maturation and differentiation. The inter-individual variability of repertoires of the same cell type is of comparable magnitude to the variability across cell types within the same individual. To zoom in on smaller scales than whole repertoires, we defined a distance measuring the relative overlap of locally similar sequences in repertoires. We find that the whole repertoire models correctly predict local similarity networks, suggesting a lack of forbidden T cell receptor sequences. The local measure correlates well with distances calculated using whole repertoire traits and carries information about cell types.
Submitted 25 July, 2023;
originally announced July 2023.
-
Learning the shape of protein micro-environments with a holographic convolutional neural network
Authors:
Michael N. Pun,
Andrew Ivanov,
Quinn Bellamy,
Zachary Montague,
Colin LaMont,
Philip Bradley,
Jakub Otwinowski,
Armita Nourmohammad
Abstract:
Proteins play a central role in biology from immune recognition to brain activity. While major advances in machine learning have improved our ability to predict protein structure from sequence, determining protein function from structure remains a major challenge. Here, we introduce Holographic Convolutional Neural Network (H-CNN) for proteins, which is a physically motivated machine learning approach to model amino acid preferences in protein structures. H-CNN reflects physical interactions in a protein structure and recapitulates the functional information stored in evolutionary data. H-CNN accurately predicts the impact of mutations on protein function, including stability and binding of protein complexes. Our interpretable computational model for protein structure-function maps could guide design of novel proteins with desired function.
Submitted 5 November, 2022;
originally announced November 2022.
-
Design of an optimal combination therapy with broadly neutralizing antibodies to suppress HIV-1
Authors:
Colin LaMont,
Jakub Otwinowski,
Kanika Vanshylla,
Henning Gruell,
Florian Klein,
Armita Nourmohammad
Abstract:
Broadly neutralizing antibodies (bNAbs) are promising targets for vaccination and therapy against HIV. Passive infusions of bNAbs have shown promise in clinical trials as a potential alternative for anti-retroviral therapy. A key challenge for the potential clinical application of bNAbs is the suppression of viral escape, which is more effectively achieved with a combination of bNAbs. However, identifying an optimal bNAb cocktail is combinatorially complex. Here, we propose a computational approach to predict the efficacy of a bNAb therapy trial based on the population genetics of HIV escape, which we parametrize using high-throughput HIV sequence data from a cohort of untreated bNAb-naive patients. By quantifying the mutational target size and the fitness cost of HIV-1 escape from bNAbs, we reliably predict the distribution of rebound times in three clinical trials. Importantly, we show that early rebounds are dominated by the pre-treatment standing variation of HIV-1 populations, rather than spontaneous mutations during treatment. Lastly, we show that a cocktail of three bNAbs is necessary to suppress the chances of viral escape below 1%, and we predict the optimal composition of such a bNAb cocktail. Our results offer a rational design for bNAb therapy against HIV-1, and more generally show how genetic data could be used to predict treatment outcomes and design new approaches to pathogenic control.
Submitted 30 November, 2021;
originally announced December 2021.
-
Risk-utility tradeoff shapes memory strategies for evolving patterns
Authors:
Oskar H Schnaack,
Luca Peliti,
Armita Nourmohammad
Abstract:
Keeping a memory of evolving stimuli is ubiquitous in biology, an example of which is immune memory for evolving pathogens. However, learning and memory storage for dynamic patterns still pose challenges in machine learning. Here, we introduce an analytical energy-based framework to address this problem. By accounting for the tradeoff between utility in keeping a high-affinity memory and the risk in forgetting some of the diverse stimuli, we show that a moderate tolerance for risk enables a repertoire to robustly classify evolving patterns, without much fine-tuning. Our approach offers a general guideline for learning and memory storage in systems interacting with diverse and evolving stimuli.
Submitted 28 October, 2021;
originally announced October 2021.
-
Deep generative selection models of T and B cell receptor repertoires with soNNia
Authors:
Giulio Isacchini,
Aleksandra M Walczak,
Thierry Mora,
Armita Nourmohammad
Abstract:
Subclasses of lymphocytes carry different functional roles to work together to produce an immune response and lasting immunity. Additionally to these functional roles, T and B-cell lymphocytes rely on the diversity of their receptor chains to recognize different pathogens. The lymphocyte subclasses emerge from common ancestors generated with the same diversity of receptors during selection processes. Here we leverage biophysical models of receptor generation with machine learning models of selection to identify specific sequence features characteristic of functional lymphocyte repertoires and subrepertoires. Specifically using only repertoire level sequence information, we classify CD4$^+$ and CD8$^+$ T-cells, find correlations between receptor chains arising during selection and identify T-cells subsets that are targets of pathogenic epitopes. We also show examples of when simple linear classifiers do as well as more complex machine learning methods.
Submitted 26 March, 2021; v1 submitted 5 November, 2020;
originally announced November 2020.
-
Dynamics of B-cell repertoires and emergence of cross-reactive responses in COVID-19 patients with different disease severity
Authors:
Zachary Montague,
Huibin Lv,
Jakub Otwinowski,
William S. DeWitt,
Giulio Isacchini,
Garrick K. Yip,
Wilson W. Ng,
Owen Tak-Yin Tsang,
Meng Yuan,
Hejun Liu,
Ian A. Wilson,
J. S. Malik Peiris,
Nicholas C. Wu,
Armita Nourmohammad,
Chris Ka Pun Mok
Abstract:
COVID-19 patients show varying severity of the disease ranging from asymptomatic to requiring intensive care. Although a number of SARS-CoV-2 specific monoclonal antibodies have been identified, we still lack an understanding of the overall landscape of B-cell receptor (BCR) repertoires in COVID-19 patients. Here, we used high-throughput sequencing of bulk and plasma B-cells collected over multiple time points during infection to characterize signatures of B-cell response to SARS-CoV-2 in 19 patients. Using principled statistical approaches, we determined differential features of BCRs associated with different disease severity. We identified 38 significantly expanded clonal lineages shared among patients as candidates for specific responses to SARS-CoV-2. Using single-cell sequencing, we verified reactivity of BCRs shared among individuals to SARS-CoV-2 epitopes. Moreover, we identified natural emergence of a BCR with cross-reactivity to SARS-CoV-1 and SARS-CoV-2 in a number of patients. Our results provide important insights for development of rational therapies and vaccines against COVID-19.
Submitted 5 April, 2021; v1 submitted 13 July, 2020;
originally announced July 2020.
-
Optimal evolutionary decision-making to store immune memory
Authors:
Oskar H Schnaack,
Armita Nourmohammad
Abstract:
The adaptive immune system provides a diverse set of molecules that can mount specific responses against a multitude of pathogens. Memory is a key feature of adaptive immunity, which allows organisms to respond more readily upon re-infections. However, differentiation of memory cells is still one of the least understood cell fate decisions. Here, we introduce a mathematical framework to characterize optimal strategies to store memory to maximize the utility of immune response over an organism's lifetime. We show that memory production should be actively regulated to balance between affinity and cross-reactivity of immune receptors for an effective protection against evolving pathogens. Moreover, we predict that specificity of memory should depend on the organism's lifespan, and shorter-lived organisms with fewer pathogenic encounters should store more cross-reactive memory. Our framework provides a baseline to gauge the efficacy of immune memory in light of an organism's coevolutionary history with pathogens.
Submitted 12 April, 2021; v1 submitted 2 July, 2020;
originally announced July 2020.
-
SOS: Online probability estimation and generation of T and B cell receptors
Authors:
Giulio Isacchini,
Carlos Olivares,
Armita Nourmohammad,
Aleksandra M. Walczak,
Thierry Mora
Abstract:
Recent advances in modelling VDJ recombination and subsequent selection of T and B cell receptors provide useful tools to analyze and compare immune repertoires across time, individuals, and tissues. A suite of tools--IGoR [1], OLGA [2] and SONIA [3]--have been publicly released to the community that allow for the inference of generative and selection models from high-throughput sequencing data. However using these tools requires some scripting or command-line skills and familiarity with complex datasets. As a result the application of the above models has not been available to a broad audience. In this application note we fill this gap by presenting Simple OLGA & SONIA (SOS), a web-based interface where users with no coding skills can compute the generation and post-selection probabilities of their sequences, as well as generate batches of synthetic sequences. The application also functions on mobile phones.
Submitted 29 March, 2020;
originally announced March 2020.
-
Optimal evolutionary control for artificial selection on molecular phenotypes
Authors:
Armita Nourmohammad,
Ceyhun Eksin
Abstract:
Controlling an evolving population is an important task in modern molecular genetics, including directed evolution for improving the activity of molecules and enzymes, in breeding experiments in animals and in plants, and in devising public health strategies to suppress evolving pathogens. An optimal intervention to direct evolution should be designed by considering its impact over an entire stochastic evolutionary trajectory that follows. As a result, a seemingly suboptimal intervention at a given time can be globally optimal as it can open opportunities for desirable actions in the future. Here, we propose a feedback control formalism to devise globally optimal artificial selection protocol to direct the evolution of molecular phenotypes. We show that artificial selection should be designed to counter evolutionary tradeoffs among multi-variate phenotypes to avoid undesirable outcomes in one phenotype by imposing selection on another. Control by artificial selection is challenged by our ability to predict molecular evolution. We develop an information theoretical framework and show that molecular time-scales for evolution under natural selection can inform how to monitor a population in order to acquire sufficient predictive information for an effective intervention with artificial selection. Our formalism opens a new avenue for devising artificial selection methods for directed evolution of molecular functions.
Submitted 12 January, 2021; v1 submitted 31 December, 2019;
originally announced December 2019.
-
On generative models of T-cell receptor sequences
Authors:
Giulio Isacchini,
Zachary Sethna,
Yuval Elhanati,
Armita Nourmohammad,
Aleksandra M. Walczak,
Thierry Mora
Abstract:
T-cell receptors (TCR) are key proteins of the adaptive immune system, generated randomly in each individual, whose diversity underlies our ability to recognize infections and malignancies. Modeling the distribution of TCR sequences is of key importance for immunology and medical applications. Here, we compare two inference methods trained on high-throughput sequencing data: a knowledge-guided approach, which accounts for the details of sequence generation, supplemented by a physics-inspired model of selection; and a knowledge-free Variational Auto-Encoder based on deep artificial neural networks. We show that the knowledge-guided model outperforms the deep network approach at predicting TCR probabilities, while being more interpretable, at a lower computational cost.
Submitted 13 March, 2020; v1 submitted 27 November, 2019;
originally announced November 2019.
-
The size of the immune repertoire of bacteria
Authors:
Serena Bradde,
Armita Nourmohammad,
Sidhartha Goyal,
Vijay Balasubramanian
Abstract:
Some bacteria and archaea possess an immune system, based on the CRISPR-Cas mechanism, that confers adaptive immunity against phage. In such species, individual bacteria maintain a "cassette" of viral DNA elements called spacers as a memory of past infections. The typical cassette contains a few dozen spacers. Given that bacteria can have very large genomes, and since having more spacers should confer a better memory, it is puzzling that so little genetic space would be devoted by bacteria to their adaptive immune system. Here, we identify a fundamental trade-off between the size of the bacterial immune repertoire and effectiveness of response to a given threat, and show how this tradeoff imposes a limit on the optimal size of the CRISPR cassette.
Submitted 1 March, 2019;
originally announced March 2019.
-
Fierce selection and interference in B-cell repertoire response to chronic HIV-1
Authors:
Armita Nourmohammad,
Jakub Otwinowski,
Marta Łuksza,
Thierry Mora,
Aleksandra M Walczak
Abstract:
During chronic infection, HIV-1 engages in a rapid coevolutionary arms race with the host's adaptive immune system. While it is clear that HIV exerts strong selection on the adaptive immune system, the characteristics of the somatic evolution that shape the immune response are still unknown. Traditional population genetics methods fail to distinguish chronic immune response from healthy repertoire evolution. Here, we infer the evolutionary modes of B-cell repertoires and identify complex dynamics with a constant production of better B-cell receptor mutants that compete, maintaining large clonal diversity and potentially slowing down adaptation. A substantial fraction of mutations that rise to high frequencies in pathogen engaging CDRs of B-cell receptors (BCRs) are beneficial, in contrast to many such changes in structurally relevant frameworks that are deleterious and circulate by hitchhiking. We identify a pattern where BCRs in patients who experience larger viral expansions undergo stronger selection with a rapid turnover of beneficial mutations due to clonal interference in their CDR3 regions. Using population genetics modeling, we show that the extinction of these beneficial mutations can be attributed to the rise of competing beneficial alleles and clonal interference. The picture is of a dynamic repertoire, where better clones may be outcompeted by new mutants before they fix.
Submitted 25 July, 2020; v1 submitted 24 February, 2018;
originally announced February 2018.
-
Host-pathogen coevolution and the emergence of broadly neutralizing antibodies in chronic infections
Authors:
Armita Nourmohammad,
Jakub Otwinowski,
Joshua B. Plotkin
Abstract:
The vertebrate adaptive immune system provides a flexible and diverse set of molecules to neutralize pathogens. Yet, viruses such as HIV can cause chronic infections by evolving as quickly as the adaptive immune system, forming an evolutionary arms race. Here we introduce a mathematical framework to study the coevolutionary dynamics of antibodies with antigens within a host. We focus on changes in the binding interactions between the antibody and antigen populations, which result from the underlying stochastic evolution of genotype frequencies driven by mutation, selection, and drift. We identify the critical viral and immune parameters that determine the distribution of antibody-antigen binding affinities. We also identify definitive signatures of coevolution that measure the reciprocal response between antibodies and viruses, and we introduce experimentally measurable quantities that quantify the extent of adaptation during continual coevolution of the two opposing populations. Using this analytical framework, we infer rates of viral and immune adaptation based on time-shifted neutralization assays in two HIV-infected patients. Finally, we analyze competition between clonal lineages of antibodies and characterize the fate of a given lineage in terms of the state of the antibody and viral populations. In particular, we derive the conditions that favor the emergence of broadly neutralizing antibodies, which may be useful in designing a vaccine against HIV.
Submitted 20 June, 2016; v1 submitted 19 December, 2015;
originally announced December 2015.
-
Pervasive adaptation of gene expression in Drosophila
Authors:
Armita Nourmohammad,
Joachim Rambeau,
Torsten Held,
Johannes Berg,
Michael Lassig
Abstract:
Gene expression levels are important molecular quantitative traits that link genotypes to molecular functions and fitness. In Drosophila, population-genetic studies in recent years have revealed substantial adaptive evolution at the genomic level. However, the evolutionary modes of gene expression have remained controversial. Here we present evidence that adaptation dominates the evolution of gene expression levels in flies. We show that 63% of the observed expression divergence across seven Drosophila species are adaptive changes driven by directional selection. Our results are derived from the variation of expression within species and the time-resolved divergence across a family of related species, using a new inference method for selection. We identify functional classes of adaptively regulated genes, as well as sex-specific adaptation occurring predominantly in males. Our analysis opens a new avenue to map system-wide selection on molecular quantitative traits independently of their genetic basis.
Submitted 2 April, 2015; v1 submitted 23 February, 2015;
originally announced February 2015.
-
Adaptive evolution of molecular phenotypes
Authors:
Torsten Held,
Armita Nourmohammad,
Michael Lässig
Abstract:
Molecular phenotypes link genomic information with organismic functions, fitness, and evolution. Quantitative traits are complex phenotypes that depend on multiple genomic loci. In this paper, we study the adaptive evolution of a quantitative trait under time-dependent selection, which arises from environmental changes or through fitness interactions with other co-evolving phenotypes. We analyze a model of trait evolution under mutations and genetic drift in a single-peak fitness seascape. The fitness peak performs a constrained random walk in the trait amplitude, which determines the time-dependent trait optimum in a given population. We derive analytical expressions for the distribution of the time-dependent trait divergence between populations and of the trait diversity within populations. Based on this solution, we develop a method to infer adaptive evolution of quantitative traits. Specifically, we show that the ratio of the average trait divergence and the diversity is a universal function of evolutionary time, which predicts the stabilizing strength and the driving rate of the fitness seascape. From an information-theoretic point of view, this function measures the macro-evolutionary entropy in a population ensemble, which determines the predictability of the evolutionary process. Our solution also quantifies two key characteristics of adapting populations: the cumulative fitness flux, which measures the total amount of adaptation, and the adaptive load, which is the fitness cost due to a population's lag behind the fitness peak.
Submitted 7 March, 2014;
originally announced March 2014.
-
Universality and predictability in molecular quantitative genetics
Authors:
Armita Nourmohammad,
Torsten Held,
Michael Lässig
Abstract:
Molecular traits, such as gene expression levels or protein binding affinities, are increasingly accessible to quantitative measurement by modern high-throughput techniques. Such traits measure molecular functions and, from an evolutionary point of view, are important as targets of natural selection. We review recent developments in evolutionary theory and experiments that are expected to become building blocks of a quantitative genetics of molecular traits. We focus on universal evolutionary characteristics: these are largely independent of a trait's genetic basis, which is often at least partially unknown. We show that universal measurements can be used to infer selection on a quantitative trait, which determines its evolutionary mode of conservation or adaptation. Furthermore, universality is closely linked to predictability of trait evolution across lineages. We argue that universal trait statistics extends over a range of cellular scales and opens new avenues of quantitative evolutionary systems biology.
Submitted 14 November, 2013; v1 submitted 12 September, 2013;
originally announced September 2013.
-
Evolution of molecular phenotypes under stabilizing selection
Authors:
Armita Nourmohammad,
Stephan Schiffels,
Michael Laessig
Abstract:
Molecular phenotypes are important links between genomic information and organismic functions, fitness, and evolution. Complex phenotypes, which are also called quantitative traits, often depend on multiple genomic loci. Their evolution builds on genome evolution in a complicated way, which involves selection, genetic drift, mutations and recombination. Here we develop a coarse-grained evolutionary statistics for phenotypes, which decouples from details of the underlying genotypes. We derive approximate evolution equations for the distribution of phenotype values within and across populations. This dynamics covers evolutionary processes at high and low recombination rates, that is, it applies to sexual and asexual populations. In a fitness landscape with a single optimal phenotype value, the phenotypic diversity within populations and the divergence between populations reach evolutionary equilibria, which describe stabilizing selection. We compute the equilibrium distributions of both quantities analytically and we show that the ratio of mean divergence and diversity depends on the strength of selection in a universal way: it is largely independent of the phenotype's genomic encoding and of the recombination rate. This establishes a new method for the inference of selection on molecular phenotypes beyond the genome level. We discuss the implications of our findings for the predictability of evolutionary processes.
Submitted 16 January, 2013;
originally announced January 2013.
-
Formation of regulatory modules by local sequence duplication
Authors:
Armita Nourmohammad,
Michael Laessig
Abstract:
Turnover of regulatory sequence and function is an important part of molecular evolution. But what are the modes of sequence evolution leading to rapid formation and loss of regulatory sites? Here, we show that a large fraction of neighboring transcription factor binding sites in the fly genome have formed from a common sequence origin by local duplications. This mode of evolution is found to produce regulatory information: duplications can seed new sites in the neighborhood of existing sites. Duplicate seeds evolve subsequently by point mutations, often towards binding a different factor than their ancestral neighbor sites. These results are based on a statistical analysis of 346 cis-regulatory modules in the Drosophila melanogaster genome, and a comparison set of intergenic regulatory sequence in Saccharomyces cerevisiae. In fly regulatory modules, pairs of binding sites show significantly enhanced sequence similarity up to distances of about 50 bp. We analyze these data in terms of an evolutionary model with two distinct modes of site formation: (i) evolution from independent sequence origin and (ii) divergent evolution following duplication of a common ancestor sequence. Our results suggest that pervasive formation of binding sites by local sequence duplications distinguishes the complex regulatory architecture of higher eukaryotes from the simpler architecture of unicellular organisms.
Submitted 24 May, 2011;
originally announced May 2011.