-
Balancing reaction-diffusion network for cell polarization pattern with stability and asymmetry
Authors:
Yixuan Chen,
Guoye Guan,
Lei-Han Tang,
Chao Tang
Abstract:
Cell polarization is a critical process that separates molecules into two distinct regions in prokaryotic and eukaryotic cells, guiding biological processes such as cell division and cell differentiation. Although several underlying antagonistic reaction-diffusion networks capable of setting up cell polarization have been identified experimentally and theoretically, our understanding of how to manipulate pattern stability and asymmetry remains incomplete, especially when only a subset of network components is known. Here we present numerical results showing that the polarized pattern of an antagonistic 2-node network collapses into a homogeneous state when subjected to single-sided self-regulation, single-sided additional regulation, or unequal system parameters. However, polarity can be restored through a combination of two modifications that have opposing effects. Additionally, spatially inhomogeneous parameters favoring the respective domains stabilize their interface at designated locations. To connect our findings to cell polarity studies of the nematode Caenorhabditis elegans zygote, we reconstituted a 5-node network in which a 4-node circuit with full mutual inhibition between anterior and posterior is modified by a mutual activation in the anterior and an additional mutual inhibition between the anterior and the posterior. Once again, a generic set of kinetic parameters moves the interface towards either the anterior or the posterior end, yet a polarized pattern can be stabilized through spatial tuning of one or more parameters coupled to intracellular or extracellular cues. A user-friendly software package, PolarSim, is introduced to facilitate the exploration of networks with alternative node numbers, parameter values, and regulatory pathways.
Submitted 14 January, 2024;
originally announced January 2024.
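The antagonistic 2-node network described above can be illustrated with a minimal 1D sketch. Everything in it is an assumption for illustration and not the PolarSim model: two species u and v that repress each other through Hill functions, identical parameters for both nodes, no-flux boundaries, and explicit Euler time stepping.

```python
# Minimal 1D sketch of an antagonistic 2-node reaction-diffusion network.
# Assumed model (illustrative only, not the PolarSim equations):
#   du/dt = D * d2u/dx2 + a / (1 + v**n) - u
#   dv/dt = D * d2v/dx2 + a / (1 + u**n) - v


def laplacian(f, dx):
    """Second-difference Laplacian with no-flux (reflecting) boundaries."""
    n = len(f)
    out = []
    for i in range(n):
        left = f[i - 1] if i > 0 else f[i + 1]        # mirrored ghost cell at x = 0
        right = f[i + 1] if i < n - 1 else f[i - 1]   # mirrored ghost cell at x = 1
        out.append((left - 2.0 * f[i] + right) / dx ** 2)
    return out


def step(u, v, dx, dt, D=1e-3, a=4.0, n=2):
    """One explicit Euler step of the mutual-inhibition system."""
    lu, lv = laplacian(u, dx), laplacian(v, dx)
    u_new = [ui + dt * (D * li + a / (1.0 + vi ** n) - ui) for ui, vi, li in zip(u, v, lu)]
    v_new = [vi + dt * (D * li + a / (1.0 + ui ** n) - vi) for ui, vi, li in zip(u, v, lv)]
    return u_new, v_new


def simulate(N=50, steps=2000, dt=0.01):
    """Start from a sharp left/right split and integrate to t = steps * dt."""
    dx = 1.0 / N
    u = [4.0 if i < N // 2 else 0.0 for i in range(N)]  # u dominates the left half
    v = [0.0 if i < N // 2 else 4.0 for i in range(N)]  # v dominates the right half
    for _ in range(steps):
        u, v = step(u, v, dx, dt)
    return u, v


u, v = simulate()
print(f"left end:  u={u[0]:.2f}, v={v[0]:.2f}")
print(f"right end: u={u[-1]:.2f}, v={v[-1]:.2f}")
```

Because both nodes share the same a, n, and D here, the setup is mirror-symmetric and the interface stays at mid-domain; the abstract reports that unequal parameters instead drive the pattern toward a homogeneous state.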
-
SegmentAnything helps microscopy images based automatic and quantitative organoid detection and analysis
Authors:
Xiaodan Xing,
Chunling Tang,
Yunzhe Guo,
Nicholas Kurniawan,
Guang Yang
Abstract:
Organoids are self-organized 3D cell clusters that closely mimic the architecture and function of in vivo tissues and organs. Quantification of organoid morphology helps in studying organ development, drug discovery, and toxicity assessment. Recent microscopy techniques provide a potent tool to acquire organoid morphology features, but manual image analysis remains a labor- and time-intensive process. Thus, this paper proposes a comprehensive pipeline for microscopy analysis that leverages SegmentAnything to precisely demarcate individual organoids. Additionally, we introduce a set of morphological properties, including perimeter, area, radius, non-smoothness, and non-circularity, allowing researchers to analyze organoid structures quantitatively and automatically. To validate the effectiveness of our approach, we conducted tests on bright-field images of human induced pluripotent stem cell (iPSC)-derived neural-epithelial (NE) organoids. The results obtained from our automatic pipeline closely align with manual organoid detection and measurement, showcasing the capability of our proposed method to accelerate organoid morphology analysis.
Submitted 8 April, 2024; v1 submitted 8 September, 2023;
originally announced September 2023.
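Several of the morphological properties listed above can be computed directly from an organoid contour. The sketch below assumes the contour has already been extracted from a segmentation mask as a closed polygon of (x, y) points. The radius (equivalent-circle radius) and non-circularity (1 - 4πA/P²) definitions are assumptions, not necessarily the paper's formulas, and non-smoothness is omitted because its definition is not given here.

```python
import math


def organoid_metrics(contour):
    """Shape descriptors for a closed contour given as [(x, y), ...] in pixels.

    Assumed definitions: radius = sqrt(area / pi) (equivalent circle) and
    non_circularity = 1 - 4*pi*area / perimeter**2 (0 for a perfect circle).
    """
    n = len(contour)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]     # wrap around to close the polygon
        twice_area += x0 * y1 - x1 * y0   # shoelace formula term
        perimeter += math.hypot(x1 - x0, y1 - y0)
    area = abs(twice_area) / 2.0
    return {
        "area": area,
        "perimeter": perimeter,
        "radius": math.sqrt(area / math.pi),
        "non_circularity": 1.0 - 4.0 * math.pi * area / perimeter ** 2,
    }


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(organoid_metrics(square))
```

A square scores about 0.21 on this non-circularity measure, while a finely sampled circle approaches 0.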
-
MorphoSim: An efficient and scalable phase-field framework for accurately simulating multicellular morphologies
Authors:
Xiangyu Kuang,
Guoye Guan,
Chao Tang,
Lei Zhang
Abstract:
The phase field model can accurately simulate the evolution of microstructures with complex morphologies, and it has been widely used for cell modeling in the last two decades. However, compared to other cellular models such as the coarse-grained model and the vertex model, its high computational cost, caused by three-dimensional spatial discretization, has hampered its application and scalability, especially for multicellular organisms. Recently, we built a phase field model coupled with in vivo imaging data to accurately reconstruct the embryonic morphogenesis of Caenorhabditis elegans from the 1- to 8-cell stages [Kuang et al., PLoS Comput. Biol., 2022]. In this work, we propose an improved phase field model using a stabilized numerical scheme and a modified volume constriction. We then present a scalable phase-field framework, MorphoSim, which is 100 times more efficient than the previous one and can simulate over 100 mechanically interacting cells. Finally, we demonstrate how MorphoSim can be applied to reproduce the assembly, self-repair, and dissociation of a synthetic multicellular system: the synNotch system.
Submitted 10 June, 2022;
originally announced June 2022.
-
Spontaneous mechanical and energetic state transitions during Caenorhabditis elegans gastrulation
Authors:
Jiao Miao,
Guoye Guan,
Chao Tang
Abstract:
Gastrulation, namely cell internalization, is a significant milestone in the development of metazoans from worms to humans, generating multiple embryonic layers with distinct cell fates and spatial organizations. Although many molecular activities are known to facilitate this process, in this paper we focus on gastrulation of the nematode Caenorhabditis elegans and theoretically demonstrate that even a group of cells with only isotropic repulsive and attractive interactions can exhibit such internalization behavior when dividing within a confined space. As the cell number increases and cell size decreases, the cells in contact with the eggshell are pushed closer to each other under stronger lateral compression, and a cell that internalizes can effectively increase the distance between neighboring cells and lower the potential energy of the system. The multicellular structure transitions spontaneously from a single to a double layer, with bistable states existing from the 15- to 44-cell stages, near the gastrulation timing in vivo. Specifically, cells that are larger or located near a boundary of smaller curvature internalize more easily. Actively regulating the internalization of a few cells can make morphogenesis noise-resistant. Our work recapitulates the key characteristics of C. elegans gastrulation and provides a rational interpretation of how this phenomenon emerges and is optimally programmed.
Submitted 14 January, 2022; v1 submitted 12 May, 2021;
originally announced May 2021.
-
Speed and fate diversity tradeoff in nematode's early embryogenesis
Authors:
Guoye Guan,
Ming-Kin Wong,
Zhongying Zhao,
Lei-Han Tang,
Chao Tang
Abstract:
Nematode species are well known for their invariant cell lineage pattern during development. Combining knowledge about fate specification induced by asymmetric division and the anti-correlation between cell cycle length and cell volume in Caenorhabditis elegans, we propose a model that simulates lineage initiation by altering the cell volume segregation ratio in each division, and we quantify the derived pattern's performance in proliferation speed, fate diversity and space robustness. The stereotypic pattern of the C. elegans embryo is found to be among the optimal solutions, taking minimal time to reach the cell number before gastrulation by programming asymmetric division as a strategy.
Submitted 23 March, 2021; v1 submitted 11 July, 2020;
originally announced July 2020.
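The link between volume segregation and proliferation speed can be made concrete with a toy event-driven lineage. The rules below are hypothetical and much simpler than the paper's model: every division splits volume V into ratio·V and (1 - ratio)·V, and each cell's cycle length is c / V, a simple stand-in for the anti-correlation between cycle length and volume.

```python
import heapq


def time_to_reach(target, ratio, c=1.0, v0=1.0):
    """Time at which a lineage first contains `target` cells (target >= 2).

    Hypothetical rules: each division splits volume V into ratio*V and
    (1 - ratio)*V, and a cell of volume V divides c / V time units after birth.
    """
    heap = [(c / v0, v0)]  # pending divisions as (division time, cell volume)
    count = 1
    while True:
        t, v = heapq.heappop(heap)  # earliest pending division
        count += 1                  # one cell becomes two
        if count >= target:
            return t
        for daughter in (ratio * v, (1.0 - ratio) * v):
            heapq.heappush(heap, (t + c / daughter, daughter))


print(time_to_reach(4, 0.5))  # equal daughters
print(time_to_reach(4, 0.6))  # unequal daughters
```

The sketch only computes division timing; it does not model fate diversity or space robustness, which the abstract weighs together with speed.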
-
Critical slowing down and attractive manifold: a mechanism for dynamic robustness in yeast cell-cycle process
Authors:
Yao Zhao,
Dedi Wang,
Zhiwen Zhang,
Ying Lu,
Xiaojing Yang,
Qi Ouyang,
Chao Tang,
Fangting Li
Abstract:
Biological processes that execute multiple complex functions, such as the cell cycle, must ensure the order of sequential events and maintain dynamic robustness against various fluctuations. Here, we examine the dynamic mechanism and the fundamental structure that achieve these properties in the cell-cycle process of the budding yeast Saccharomyces cerevisiae. We show that the budding yeast cell-cycle process behaves like an excitable system containing three well-coupled saddle-node bifurcations that execute the DNA replication and mitosis events. The yeast cell-cycle regulatory network can be separated into a G1/S phase module, an early M module, and a late M phase module, where the positive feedbacks within each module and the interactions among the modules play important roles. If the cell-cycle process operates near the critical points of the saddle-node bifurcations, there is a critical slowing down, or ghost effect. This can provide the cell-cycle process with sufficient duration for each event and an attractive manifold for checking the completion of DNA replication and mitosis; moreover, fluctuations in an earlier module/event are prevented from propagating to later modules/events. Our results suggest both a fundamental structure of the cell-cycle regulatory network and a hint for the evolution of eukaryotic cell-cycle processes, from the dynamic checking mechanism to the molecular checkpoint pathway.
Submitted 23 November, 2018;
originally announced November 2018.
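The critical slowing down ("ghost") effect mentioned above can be seen in the textbook normal form of a saddle-node bifurcation, dx/dt = r + x². This is a generic illustration, not the paper's yeast network model. Just past the bifurcation (small r > 0) there is no fixed point, but trajectories linger near x = 0. Integrating dt = dx / (r + x²) from -L to L gives a passage time of (2/√r)·atan(L/√r), which tends to π/√r and grows without bound as r approaches 0.

```python
import math


def passage_time(r, L=10.0, dt=1e-3):
    """Time for dx/dt = r + x**2 (r > 0) to travel from x = -L to x = +L, via RK4."""
    def f(x):
        return r + x * x

    x, t = -L, 0.0
    while x < L:
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        x += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        t += dt
    return t


def passage_time_exact(r, L=10.0):
    """Closed form (2 / sqrt(r)) * atan(L / sqrt(r)); tends to pi / sqrt(r) for large L."""
    s = math.sqrt(r)
    return 2.0 / s * math.atan(L / s)


for r in (0.04, 0.01):
    print(r, round(passage_time(r), 2), round(passage_time_exact(r), 2))
```

Quartering r roughly doubles the time spent near the ghost, which is the kind of extended event duration the abstract attributes to operating near the bifurcation.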
-
Deciphering gene regulation from gene expression dynamics using deep neural network
Authors:
Jingxiang Shen,
Mariela D. Petkova,
Yuhai Tu,
Feng Liu,
Chao Tang
Abstract:
Complex biological functions are carried out by the interaction of genes and proteins. Uncovering the gene regulation network behind a function is one of the central themes in biology. Typically, it involves extensive experiments in genetics, biochemistry and molecular biology. In this paper, we show that much of the inference task can be accomplished by a deep neural network (DNN), a form of machine learning or artificial intelligence. Specifically, the DNN learns from the dynamics of gene expression. The learnt DNN behaves like an accurate simulator of the system, on which one can perform in-silico experiments to reveal the underlying gene network. We demonstrate the method with two examples: biochemical adaptation and gap-gene patterning in fruit fly embryogenesis. In the first example, the DNN successfully finds the two basic network motifs for adaptation: negative feedback and incoherent feed-forward. In the second, much more complex example, the DNN accurately predicts the behaviors of essentially all the mutants. Furthermore, the regulation network it uncovers is strikingly similar to the one inferred from experiments. In doing so, we develop methods for deciphering the gene regulation network hidden in the DNN "black box". Our interpretable DNN approach should have broad applications in genotype-phenotype mapping.
Submitted 22 February, 2020; v1 submitted 22 July, 2018;
originally announced July 2018.
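One of the two adaptation motifs named above, the incoherent feed-forward loop, can be sketched with a toy ODE model. The equations are a textbook-style assumption, not the network the DNN recovered: input I activates the output y directly and also activates x, which represses y. Because x settles to I, the steady-state output is y = I/x = 1 at any input level, so y responds transiently to an input step and then returns to baseline (perfect adaptation).

```python
def iffl_response(t_end=60.0, dt=0.01, tau_x=5.0):
    """Output y of a toy incoherent feed-forward loop after an input step 1 -> 2.

    Assumed model (illustrative only):
        dx/dt = (I - x) / tau_x    slow repressor that tracks the input
        dy/dt = I / x - y          output activated by I and repressed by x
    """
    I, x, y = 2.0, 1.0, 1.0  # steady state for I = 1 was x = 1, y = 1
    trace = [y]
    for _ in range(int(round(t_end / dt))):
        dx = (I - x) / tau_x
        dy = I / x - y
        x += dt * dx  # explicit Euler step
        y += dt * dy
        trace.append(y)
    return trace


trace = iffl_response()
print(f"peak y = {max(trace):.3f}, final y = {trace[-1]:.4f}")
```

Making tau_x larger lengthens and raises the transient pulse without changing the adapted level.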
-
Growth strategy of microbes on mixed carbon sources
Authors:
Xin Wang,
Kang Xia,
Xiaojing Yang,
Chao Tang
Abstract:
A classic problem in microbiology is that bacteria display two types of growth behavior when cultured on a mixture of two carbon sources: the two sources are either consumed sequentially, one after another (diauxie), or consumed simultaneously (co-utilization). The search for the molecular mechanism of diauxie led to the discovery of the lac operon. However, questions remain as to why microbes would bother to have different strategies for taking up nutrients. Here we show that diauxie versus co-utilization can be understood from the topological features of the metabolic network. A model of optimal allocation of protein resources quantitatively explains why and how the cell makes the choice. In the case of co-utilization, the model predicts the percentage of each carbon source in supplying the amino acid pools, which is quantitatively verified by experiments. Our work solves a long-standing puzzle and provides a quantitative framework for the carbon source utilization of microbes.
Submitted 1 March, 2019; v1 submitted 26 March, 2017;
originally announced March 2017.
-
Generic Properties of Random Gene Regulatory Networks
Authors:
Zhiyuan Li,
Simone Bianco,
Zhaoyang Zhang,
Chao Tang
Abstract:
Modeling gene regulatory networks (GRNs) is an important topic in systems biology. Although much work has focused on various specific systems, the generic behavior of GRNs with continuous variables is still elusive. In particular, it is not clear how attractors typically partition among the three types of orbits (steady state, periodic and chaotic), or how the dynamical properties change with the network's topological characteristics. In this work, we first investigated these questions in random GRNs with different network sizes, connectivity, fractions of inhibitory links and transcription regulation rules. We then searched for the core motifs that govern the dynamic behavior of large GRNs. We show that the stability of a random GRN is typically governed by a few small embedded motifs, and can therefore in general be understood in terms of these short motifs. Our results provide insights for the study and design of genetic networks.
Submitted 21 April, 2014;
originally announced April 2014.
-
Millisecond-scale motor encoding in a cortical vocal area
Authors:
Claire Tang,
Diala Chehayeb,
Kyle Srivastava,
Ilya Nemenman,
Samuel Sober
Abstract:
Studies of motor control have almost universally examined firing rates to investigate how the brain shapes behavior. In principle, however, neurons could encode information through the precise temporal patterning of their spike trains as well as (or instead of) through their firing rates. Although the importance of spike timing has been demonstrated in sensory systems, it is largely unknown whether timing differences in motor areas could affect behavior. We tested the hypothesis that significant information about trial-by-trial variations in behavior is represented by spike timing in the songbird vocal motor system. We found that premotor neurons convey information via spike timing far more often than via spike rate and that the amount of information conveyed at the millisecond timescale greatly exceeds the information available from spike counts. These results demonstrate that information can be represented by spike timing in motor circuits and suggest that timing variations evoke differences in behavior.
Submitted 14 April, 2014; v1 submitted 2 April, 2014;
originally announced April 2014.
-
Community detection for networks with unipartite and bipartite structure
Authors:
Chang Chang,
Chao Tang
Abstract:
Finding community structures in networks is important in network science, technology, and applications. To date, most algorithms that aim to find community structures focus only on either unipartite or bipartite networks. A unipartite network consists of one set of nodes, and a bipartite network consists of two nonoverlapping sets of nodes with only links joining nodes in different sets. However, a third type of network exists, defined here as the mixture network. Just like a bipartite network, a mixture network also consists of two sets of nodes, but some nodes may simultaneously belong to both sets, which breaks the nonoverlapping restriction of a bipartite network. The mixture network can be considered as a general case, with unipartite and bipartite networks viewed as its limiting cases. A mixture network can represent not only all unipartite and bipartite networks, but also a wide range of real-world networks in fields such as biology and social science that cannot be properly represented as either unipartite or bipartite networks. Based on this observation, we first propose a probabilistic model that can find modules in unipartite, bipartite, and mixture networks in a unified framework, based on the link community model for a unipartite undirected network [B Ball et al (2011 Phys. Rev. E 84 036103)]. We test our algorithm on synthetic networks (with both overlapping and nonoverlapping communities) and apply it to two real-world networks: a southern women bipartite network and a human transcriptional regulatory mixture network. The results suggest that our model performs well for all three types of networks, is competitive with other algorithms for unipartite or bipartite networks, and is applicable to real-world networks.
Submitted 12 September, 2014; v1 submitted 29 July, 2013;
originally announced July 2013.
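The unified model above builds on the link community model of Ball et al. (2011) for unipartite undirected networks. The sketch below is our reading of the expectation-maximization iteration for that base unipartite model only, not the paper's mixture-network extension. Edge (i, j) is split among communities in proportion to θ_iz·θ_jz (E-step), and then θ_iz = Σ_j A_ij q_ij(z) / sqrt(Σ_ij A_ij q_ij(z)) (M-step). The function name, dense-matrix input, and random initialization are illustrative choices.

```python
import random


def link_communities(adj, K, iters=200, seed=0):
    """EM sketch of the unipartite link community model (after Ball et al., 2011).

    adj: symmetric 0/1 adjacency matrix (list of lists) without self-loops.
    Returns theta, where theta[i][z] is node i's propensity for community z.
    """
    rng = random.Random(seed)
    n = len(adj)
    theta = [[rng.random() for _ in range(K)] for _ in range(n)]
    for _ in range(iters):
        num = [[0.0] * K for _ in range(n)]  # sum_j A_ij * q_ij(z)
        kappa = [0.0] * K                     # sum_ij A_ij * q_ij(z)
        for i in range(n):
            for j in range(n):
                if not adj[i][j]:
                    continue
                w = [theta[i][z] * theta[j][z] for z in range(K)]
                total = sum(w)
                if total == 0.0:  # guard against underflow; edge left unassigned
                    continue
                for z in range(K):
                    q = w[z] / total  # E-step: share of edge (i, j) assigned to z
                    num[i][z] += q
                    kappa[z] += q
        # M-step: theta_iz = sum_j A_ij q_ij(z) / sqrt(sum_ij A_ij q_ij(z))
        theta = [[num[i][z] / kappa[z] ** 0.5 if kappa[z] > 0 else 0.0
                  for z in range(K)] for i in range(n)]
    return theta


# Two disjoint triangles (hypothetical toy graph).
tri = [[0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0]]
for row in link_communities(tri, K=2, iters=50, seed=1):
    print([round(x, 3) for x in row])
```

Each M-step makes (Σ_i θ_iz)² equal to the number of edge ends assigned to community z, so summing over z recovers 2m, which is a useful sanity check on an implementation.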
-
The σ law of evolutionary dynamics in community-structured populations
Authors:
Changbing Tang,
Xiang Li,
Lang Cao,
Jingyuan Zhan
Abstract:
Evolutionary game dynamics in finite populations provides a new framework to understand the selection of traits with frequency-dependent fitness. Recently, a simple but fundamental law of evolutionary dynamics, which we call the σ law, has been shown to determine the selection between two competing strategies: in most evolutionary processes with two strategies, A and B, strategy A is favored over B under weak selection if and only if σR + S > T + σP. This relationship holds for a wide variety of structured populations with mutation and weak selection under certain assumptions. In this paper, we propose a model of games based on a community-structured population and revisit this law under the Moran process. By calculating the average payoffs of A and B individuals with the method of effective sojourn time, we find that σ reflects not only the characteristics of the population structure but also the reaction rate between individuals. That is, interactions between individuals are not uniform, and σ can be taken as the reaction rate between any two individuals with the same strategy. We verify this viewpoint using a modified replicator equation with non-uniform interaction rates in a simplified version of the prisoner's dilemma game (PDG).
Submitted 17 April, 2012;
originally announced April 2012.
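The σ condition quoted above can be checked directly. Rearranging σR + S > T + σP gives σ(R - P) > T - S, so when R > P strategy A is favored exactly when σ exceeds (T - S)/(R - P). The payoff convention (A against A earns R, A against B earns S, B against A earns T, B against B earns P) is the standard one for this rule; the prisoner's dilemma numbers are illustrative only.

```python
def a_favored(R, S, T, P, sigma):
    """Sigma rule: under weak selection, A is favored over B iff sigma*R + S > T + sigma*P.

    Payoffs: A vs A -> R, A vs B -> S, B vs A -> T, B vs B -> P.
    """
    return sigma * R + S > T + sigma * P


def critical_sigma(R, S, T, P):
    """Threshold sigma above which A is favored (requires R > P)."""
    if R <= P:
        raise ValueError("threshold form assumes R > P")
    return (T - S) / (R - P)


# Illustrative prisoner's dilemma: A = cooperate, B = defect.
R, S, T, P = 3, 0, 5, 1
print(critical_sigma(R, S, T, P))        # 2.5
print(a_favored(R, S, T, P, sigma=2.0))  # False
print(a_favored(R, S, T, P, sigma=3.0))  # True
```

Setting σ = 1 reduces the condition to R + S > T + P, the risk-dominance comparison.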
-
RNASEQR - A streamlined and accurate RNA-seq sequence analysis program
Authors:
Abner C. -Y. Huang,
Leslie Y Chen,
Kuo-Chen Wei,
Kai Wang,
Chiung-Yin Huang,
Danielle Yi,
Chuan Yi Tang,
David J. Galas,
Leroy E. Hood
Abstract:
The paper has been withdrawn by the authors.
Submitted 29 December, 2011; v1 submitted 15 December, 2011;
originally announced December 2011.
-
Robustness and modular design of the Drosophila segment polarity network
Authors:
Wenzhe Ma,
Luhua Lai,
Qi Ouyang,
Chao Tang
Abstract:
Biomolecular networks have to perform their functions robustly. A robust function may have preferences in the topological structures of the underlying network. We carried out an exhaustive computational analysis of network topologies in relation to a patterning function in Drosophila embryogenesis. We found that while the vast majority of topologies either cannot perform the required function or can do so only very fragilely, a small fraction of topologies emerges as particularly robust for the function. The topology adopted by Drosophila, that of the segment polarity network, is a top-ranking one among all topologies with no direct autoregulation. Furthermore, we found that all robust topologies are modular, each being a combination of three kinds of modules. These modules can be traced back to three sub-functions of the patterning function, and their combinations provide a combinatorial variability for the robust topologies. Our results suggest that the requirement of functional robustness drastically reduces the choice of viable topologies to a limited set of modular combinations, among which nature optimizes its choice under evolutionary and other biological constraints.
Submitted 30 October, 2006; v1 submitted 16 October, 2006;
originally announced October 2006.
-
Function Constrains Network Architecture and Dynamics: A Case Study on the Yeast Cell Cycle Boolean Network
Authors:
Kai-Yeung Lau,
Surya Ganguli,
Chao Tang
Abstract:
We develop a general method to explore how the function performed by a biological network can constrain both its structural and dynamical network properties. This approach is orthogonal to prior studies, which examine the functional consequences of a given structural feature, for example a scale-free architecture. A key step is to construct an algorithm that allows us to efficiently sample from a maximum entropy distribution on the space of Boolean dynamical networks constrained to perform a specific function, or cascade of gene expression. Such a distribution can act as a "functional null model" to test the significance of any given network feature, and can aid in revealing underlying evolutionary selection pressures on various network properties. Although our methods are general, we illustrate them in an analysis of the yeast cell cycle cascade. This analysis uncovers strong constraints on the architecture of the cell cycle regulatory network as well as significant selection pressures on this network to maintain ordered and convergent dynamics, possibly at the expense of robustness to structural perturbations.
Submitted 24 April, 2007; v1 submitted 13 October, 2006;
originally announced October 2006.
-
Dynamic Studies of Scaffold-dependent Mating Pathway in Yeast
Authors:
Danying Shao,
Wen Zheng,
Wenjun Qiu,
Qi Ouyang,
Chao Tang
Abstract:
The mating pathway in Saccharomyces cerevisiae is one of the best understood signal transduction pathways in eukaryotes. It transmits the mating signal from the plasma membrane into the nucleus through the G-protein coupled receptor and the mitogen-activated protein kinase (MAPK) cascade. Based on the current understanding of the mating pathway, we construct a system of ordinary differential equations to describe the process. Our model is consistent with a wide range of experiments, indicating that it captures some main characteristics of signal transduction along the pathway. Investigation with the model reveals that the shuttling of the scaffold protein and the dephosphorylation of kinases involved in the MAPK cascade cooperate to regulate the response upon pheromone induction and help preserve the fidelity of mating signaling. We explored factors affecting the dose-response curves of this pathway and found that both negative feedback and the concentrations of the proteins involved in the MAPK cascade play crucial roles. Contrary to some other MAPK systems, where signaling sensitivity is amplified successively along the cascade, here the mating signal is transmitted through the cascade in an almost linear fashion.
Submitted 11 September, 2006;
originally announced September 2006.
-
Stochastic Model of Yeast Cell Cycle Network
Authors:
Yuping Zhang,
Minping Qian,
Qi Ouyang,
Minghua Deng,
Fangting Li,
Chao Tang
Abstract:
Biological functions in living cells are controlled by protein interaction and genetic networks. These molecular networks should be dynamically stable against various fluctuations, which are inevitable in the living world. In this paper, we propose and study a stochastic model for the network regulating the cell cycle of the budding yeast. The stochasticity in the model is controlled by a temperature-like parameter β. Our simulation results show that both the biological stationary state and the biological pathway are stable for a wide range of "temperature". There is, however, a sharp transition-like behavior at β_c, below which the dynamics is dominated by noise. We also define a pseudo energy landscape for the system in which the biological pathway can be seen as a deep valley.
Submitted 6 May, 2006;
originally announced May 2006.
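The abstract does not spell out the update rule, so the sketch below assumes a common form for noisy threshold Boolean networks. Node i sums its signed inputs x_i = Σ_j a_ij S_j and switches on with probability e^{βx}/(e^{βx} + e^{-βx}), and a node with zero net input keeps its state. As β grows, the rule approaches the deterministic threshold update; as β approaches 0, every update becomes a coin flip. The 3-node wiring in the test is hypothetical.

```python
import math
import random


def p_on(x, beta):
    """P(S_i = 1) = e^{beta x} / (e^{beta x} + e^{-beta x}), computed without overflow."""
    z = 2.0 * beta * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def update(state, a, beta, rng):
    """One synchronous noisy update; a[i][j] is +1 (j activates i), -1 (inhibits) or 0.

    Nodes with zero net input keep their current state (an assumption of this sketch).
    """
    new = []
    for i, row in enumerate(a):
        x = sum(aij * sj for aij, sj in zip(row, state))
        if x == 0:
            new.append(state[i])
        else:
            new.append(1 if rng.random() < p_on(x, beta) else 0)
    return new


def deterministic_update(state, a):
    """beta -> infinity limit: on if net input > 0, off if < 0, unchanged if 0."""
    out = []
    for i, row in enumerate(a):
        x = sum(aij * sj for aij, sj in zip(row, state))
        out.append(state[i] if x == 0 else int(x > 0))
    return out
```

Running update at decreasing β and recording how often the trajectory still follows deterministic_update gives a simple way to look for the kind of noise-dominated regime the abstract describes below β_c.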
-
Specificity of Trypsin and Chymotrypsin: Loop Motion Controlled Dynamic Correlation as a Determinant
Authors:
Wenzhe Ma,
Chao Tang,
Luhua Lai
Abstract:
Trypsin and chymotrypsin are both serine proteases with high sequence and structural similarity, but with different substrate specificities. Previous experiments have demonstrated the critical role of two loops outside the binding pocket in controlling the specificity of the two enzymes. To understand the mechanism by which these distant loops control specificity, we used the Gaussian Network Model to study the dynamic properties of trypsin and chymotrypsin and the roles played by the two loops. A clustering method was introduced to analyze the correlated motions of residues. We found that trypsin and chymotrypsin have distinct dynamic signatures in the two loop regions, which are in turn highly correlated with the motions of certain residues in the binding pockets. Interestingly, replacing the two loops of trypsin with those of chymotrypsin changes the motion style of trypsin to chymotrypsin-like, mirroring the experimental finding that the same replacement is necessary to give trypsin chymotrypsin's enzyme specificity and activity. These results suggest that the cooperative motions of the two loops and the substrate-binding sites contribute to the activity and substrate specificity of trypsin and chymotrypsin.
Submitted 19 May, 2005;
originally announced May 2005.
-
Correlation between sequence hydrophobicity and surface-exposure pattern of database proteins
Authors:
Susanne Moelbert,
Eldon Emberly,
Chao Tang
Abstract:
Hydrophobicity is thought to be one of the primary forces driving the folding of proteins. On average, hydrophobic residues occur preferentially in the core, whereas polar residues tend to occur at the surface of a folded protein. By analyzing known protein structures, we quantify the degree to which the hydrophobicity sequence of a protein correlates with its pattern of surface exposure. We have assessed the statistical significance of this correlation for several hydrophobicity scales in the literature, and find that the computed correlations are significant but far from optimal. We show that this less-than-optimal correlation arises primarily from the large number of mutations that naturally occurring proteins can tolerate. Lesser effects are due in part to forces other than hydrophobicity, which we quantify by analyzing the surface-exposure distributions of all amino acids. Lastly, we show that our database findings are consistent with those from an off-lattice hydrophobic-polar model of protein folding.
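The correlation in question can be illustrated with a Pearson coefficient between per-residue hydrophobicity and surface exposure, with significance judged by shuffling the exposures. This is a sketch under assumptions: the abstract does not specify the correlation measure or the significance test, the helper names are hypothetical, and the six residues are a made-up example (Kyte-Doolittle values for A, D, I, G, F, K paired with invented exposures).

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def permutation_pvalue(xs, ys, n_shuffles=2000, seed=0):
    """Two-sided p-value: how often shuffled exposures correlate at least as strongly."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)


# Made-up example: hydrophobicities and relative exposures (0 = buried, 1 = exposed).
hydrophobicity = [1.8, -3.5, 4.5, -0.4, 2.8, -3.9]
exposure = [0.1, 0.9, 0.0, 0.5, 0.2, 1.0]
r = pearson(hydrophobicity, exposure)  # strongly negative: hydrophobic residues are buried
p = permutation_pvalue(hydrophobicity, exposure)
```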
Submitted 8 December, 2003;
originally announced December 2003.
-
Finding regulatory modules through large-scale gene-expression data analysis
Authors:
Morten Kloster,
Chao Tang,
Ned Wingreen
Abstract:
The use of gene microchips has enabled a rapid accumulation of gene-expression data. One of the major challenges of analyzing this data is the diversity, in both size and signal strength, of the various modules in the gene regulatory networks of organisms. Based on the Iterative Signature Algorithm [Bergmann, S., Ihmels, J. and Barkai, N. (2002) Phys. Rev. E 67, 031902], we present an algorithm - the Progressive Iterative Signature Algorithm (PISA) - that, by sequentially eliminating modules, allows unsupervised identification of both large and small regulatory modules. We applied PISA to a large set of yeast gene-expression data, and, using the Gene Ontology annotation database as a reference, found that our algorithm is much better able to identify regulatory modules than methods based on high-throughput transcription-factor binding experiments or on comparative genomics.
Submitted 19 January, 2004; v1 submitted 12 November, 2003;
originally announced November 2003.
-
The Yeast Cell-Cycle Network Is Robustly Designed
Authors:
Fangting Li,
Tao Long,
Ying Lu,
Qi Ouyang,
Chao Tang
Abstract:
The interactions between proteins, DNA, and RNA in living cells constitute molecular networks that govern various cellular functions. To investigate the global dynamical properties and stabilities of such networks, we studied the cell-cycle regulatory network of the budding yeast. With the use of a simple dynamical model, it was demonstrated that the cell-cycle network is extremely stable and robust for its function. The biological stationary state--the G1 state--is a global attractor of the dynamics. The biological pathway--the cell-cycle sequence of protein states--is a globally attracting trajectory of the dynamics. These properties are largely preserved with respect to small perturbations to the network. These results suggest that cellular regulatory networks are robustly designed for their functions.
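The "simple dynamical model" can be sketched as a synchronous threshold Boolean network. The update rule below (a node switches on when its summed input is positive, off when negative, and keeps its state when the input is zero) is an assumption about the model class, and the three-node wiring is purely hypothetical, not the yeast network. Following every initial state to its attractor gives the basin sizes used to judge global stability.

```python
from itertools import product


def step(state, weights):
    """One synchronous update; weights[i][j] = +1 (j activates i), -1 (j represses i) or 0."""
    nxt = []
    for i, row in enumerate(weights):
        total = sum(w * s for w, s in zip(row, state))
        if total > 0:
            nxt.append(1)
        elif total < 0:
            nxt.append(0)
        else:
            nxt.append(state[i])  # zero input: node keeps its current state
    return tuple(nxt)


def basins(weights):
    """Map each attractor (frozenset of the states on its cycle) to its basin size."""
    sizes = {}
    for start in product((0, 1), repeat=len(weights)):
        seen = []
        state = start
        while state not in seen:  # iterate until a state repeats
            seen.append(state)
            state = step(state, weights)
        cycle = frozenset(seen[seen.index(state):])
        sizes[cycle] = sizes.get(cycle, 0) + 1
    return sizes


# Hypothetical toy wiring: A and B repress each other, A activates C.
TOY_WEIGHTS = [
    [0, -1, 0],  # inputs to A
    [-1, 0, 0],  # inputs to B
    [1, 0, 0],   # inputs to C
]
sizes = basins(TOY_WEIGHTS)
```

For this toy wiring the fixed point (0, 0, 1) attracts 3 of the 8 initial states and (1, 0, 1) attracts 2; in the yeast study the corresponding question is how large the basin of the G1 state is.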
Submitted 4 February, 2004; v1 submitted 9 October, 2003;
originally announced October 2003.
-
Flexibility of beta-sheets: Principal-component analysis of database protein structures
Authors:
Eldon G. Emberly,
Ranjan Mukhopadhyay,
Chao Tang,
Ned S. Wingreen
Abstract:
Protein folds are built primarily from the packing together of two types of structures: alpha-helices and beta-sheets. Neither structure is rigid, and the flexibility of helices and sheets is often important in determining the final fold (e.g., coiled coils and beta-barrels). Recent work has quantified the flexibility of alpha-helices using a principal-component analysis (PCA) of database helical structures (Emberly, 2003). Here, we extend the analysis to beta-sheet flexibility using PCA on a database of beta-sheet structures. For sheets of varying dimension and geometry, we find two dominant modes of flexibility: twist and bend. The distributions of amplitudes for these modes are found to be Gaussian and independent, suggesting that the PCA twist and bend modes can be identified as the soft elastic normal modes of sheets. We consider the scaling of mode eigenvalues with sheet size and find that parallel beta-sheets are more rigid than anti-parallel sheets over the entire range studied. Lastly, we discuss the application of our PCA results to modeling and design of beta-sheet proteins.
Submitted 4 September, 2003;
originally announced September 2003.
-
Designability and Thermal Stability of Protein Structures
Authors:
Ned Wingreen,
Hao Li,
Chao Tang
Abstract:
Only about 1,000 qualitatively different protein folds are believed to exist in nature. Here, we review theoretical studies which suggest that some folds are intrinsically more designable than others, i.e., are the lowest-energy states of an unusually large number of sequences. The sequences associated with these folds are also found to be unusually thermally stable. The connection between highly designable structures and highly stable sequences is generally known as the "designability principle". The designability principle may help explain the small number of natural folds, and may also guide the design of new folds.
Submitted 27 March, 2003;
originally announced March 2003.
-
Simulation and analysis of in vitro DNA evolution
Authors:
Morten Kloster,
Chao Tang
Abstract:
We study theoretically the in vitro evolution of a DNA sequence under selection for binding to a transcription factor. Using a simple model of protein-DNA binding and available binding constants for the Mnt protein, we perform large-scale, realistic simulations of evolution starting from a single DNA sequence. We identify different parameter regimes characterized by distinct evolutionary behaviors. For each regime we find analytical estimates that agree well with simulation results. For small population sizes, the evolutionary path of the DNA is a random walk on a smooth landscape, whereas for large population sizes the evolutionary dynamics can be well described by a mean-field theory. We also study how the details of the DNA-protein interaction affect the evolution.
Submitted 21 January, 2003;
originally announced January 2003.
-
Origin of Scaling Behavior of Protein Packing Density: A Sequential Monte Carlo Study of Compact Long Chain Polymers
Authors:
**feng Zhang,
Rong Chen,
Chao Tang,
Jie Liang
Abstract:
Single-domain proteins are thought to be tightly packed. The introduction of voids by mutations is often regarded as destabilizing. In this study we show that packing density for single-domain proteins decreases with chain length. We find that the radius of gyration provides a poor description of protein packing, but the alpha contact number we introduce here characterizes proteins well. We further demonstrate that a protein-like scaling relationship between packing density and chain length is observed in off-lattice self-avoiding walks. A key problem in studying compact chain polymers is the attrition problem: it is difficult to generate independent samples of compact long self-avoiding walks. We develop an algorithm based on the framework of sequential Monte Carlo and succeed in generating populations of compact long-chain off-lattice polymers up to length $N=2,000$. Analysis of these chain polymers suggests that maintaining high packing density is characteristic only of short-chain proteins. We find that the scaling behavior of packing density with chain length in proteins is a generic feature of random polymers satisfying loose constraints on compactness. We conclude that proteins are not optimized by evolution to eliminate packing voids.
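The sequential Monte Carlo idea can be illustrated on a much simpler problem than the paper's. This sketch assumes self-avoiding walks on a 2D square lattice rather than off-lattice chains, and applies no bias toward compactness. Chains grow one monomer at a time, each step's weight is the number of free neighbors, trapped chains drop out (the attrition problem named above), and the population is resampled by weight. `smc_saw` is a hypothetical name; the product of mean incremental weights is the standard SMC estimate of the number of walks.

```python
import random

# Nearest-neighbor steps on the 2D square lattice.
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def smc_saw(n_steps, pop_size=200, seed=0):
    """Grow a population of self-avoiding walks by sequential Monte Carlo.

    Returns the final chains (lists of lattice sites) and an estimate of the
    number of n_steps-step self-avoiding walks starting at the origin.
    """
    rng = random.Random(seed)
    chains = [[(0, 0)] for _ in range(pop_size)]
    count_estimate = 1.0
    for _ in range(n_steps):
        grown, weights = [], []
        for chain in chains:
            occupied = set(chain)
            x, y = chain[-1]
            free = [(x + dx, y + dy) for dx, dy in STEPS
                    if (x + dx, y + dy) not in occupied]
            if free:  # a trapped chain (no free neighbor) gets weight 0 and is dropped
                grown.append(chain + [rng.choice(free)])
                weights.append(len(free))  # incremental Rosenbluth weight
        if not grown:
            raise RuntimeError("every chain was trapped")
        # Mean incremental weight over the whole population (trapped chains count as 0).
        count_estimate *= sum(weights) / pop_size
        # Resample in proportion to weight so that pop_size chains carry on.
        chains = rng.choices(grown, weights=weights, k=pop_size)
    return chains, count_estimate


chains, c3 = smc_saw(3, pop_size=50, seed=1)
```

For 3 steps every partial walk has exactly 4, 3 and 3 free choices, so the estimate is exactly 36, the number of 3-step walks on the square lattice; for longer walks trapping sets in and the estimate becomes noisy.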
Submitted 7 January, 2003;
originally announced January 2003.
-
Statistical mechanics of RNA folding: importance of alphabet size
Authors:
Ranjan Mukhopadhyay,
Eldon Emberly,
Chao Tang,
Ned S. Wingreen
Abstract:
We construct a minimalist model of RNA secondary-structure formation and use it to study the mapping from sequence to structure. There are strong, qualitative differences between two-letter and four- or six-letter alphabets. With only two kinds of bases, there are many alternate folding configurations, yielding thermodynamically stable ground states only for a small set of structures of high designability, i.e., total number of associated sequences. In contrast, sequences made from four bases, as found in nature, or six bases have far fewer competing folding configurations, resulting in a much greater average stability of the ground state.
Submitted 4 August, 2003; v1 submitted 26 September, 2002;
originally announced September 2002.
-
Flexibility of $α$-helices: Results of a statistical analysis of database protein structures
Authors:
Eldon G. Emberly,
Ranjan Mukhopadhyay,
Ned S. Wingreen,
Chao Tang
Abstract:
$α$-helices stand out as common and relatively invariant secondary structural elements of proteins. However, $α$-helices are not rigid bodies and their deformations can be significant in protein function (e.g., coiled coils). To quantify the flexibility of $α$-helices we have performed a structural principal-component analysis of helices of different lengths from a representative set of protein folds in the Protein Data Bank. We find three dominant modes of flexibility: two degenerate bend modes and one twist mode. The data are consistent with independent Gaussian distributions for each mode. The mode eigenvalues, which measure flexibility, follow simple scaling forms as a function of helix length. The dominant bend and twist modes and their harmonics are reproduced by a simple spring model, which incorporates hydrogen bonding and excluded volume. As an application, we examine the amount of bend and twist in helices making up several coiled-coil proteins. Incorporation of $α$-helix flexibility into structure refinement and design is discussed.
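A structural PCA of this kind can be sketched as follows, assuming the conformations have already been superimposed onto a common frame (the alignment step is omitted) and using the population covariance. Eigenvalues play the role of mode flexibilities and eigenvectors the mode shapes (bend, twist, and so on); `structural_pca` is a hypothetical helper, not the authors' code, and the ensemble below is synthetic.

```python
import numpy as np


def structural_pca(conformations):
    """Principal-component analysis of pre-aligned structures.

    conformations: array of shape (M, N, 3), M structures of N atoms each,
    already superimposed onto a common frame (alignment is not done here).
    Returns eigenvalues in descending order and the matching eigenvectors as
    columns; each eigenvalue is the variance along its mode.
    """
    x = np.asarray(conformations, dtype=float)
    x = x.reshape(len(x), -1)          # one 3N-dimensional vector per structure
    x = x - x.mean(axis=0)             # deviations from the mean structure
    cov = x.T @ x / len(x)             # population covariance, shape (3N, 3N)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]  # largest-variance (softest) modes first
    return eigvals[order], eigvecs[:, order]


# Synthetic ensemble: 4 atoms displaced along one fixed unit direction v.
rng = np.random.default_rng(0)
base = rng.normal(size=(4, 3))
v = rng.normal(size=12)
v /= np.linalg.norm(v)
amplitudes = np.array([-1.0, 0.0, 1.0, 2.0, -2.0])
ensemble = np.array([base + a * v.reshape(4, 3) for a in amplitudes])
eigvals, eigvecs = structural_pca(ensemble)
```

Because the synthetic ensemble varies along a single direction, only one eigenvalue is non-zero, and it equals the variance of the amplitudes (2.0); real helices or sheets would instead show a few dominant modes above a tail of small ones.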
Submitted 25 September, 2002;
originally announced September 2002.
-
Structure Space of Model Proteins -- A Principal Component Analysis
Authors:
Mehdi Yahyanejad,
Mehran Kardar,
Chao Tang
Abstract:
We study the space of all compact structures on a two-dimensional square lattice of size $N=6\times6$. Each structure is mapped onto a vector in $N$ dimensions according to a hydrophobic model. Previous work has shown that the designabilities of structures are closely related to the distribution of the structure vectors in the $N$-dimensional space, with highly designable structures predominantly found in low-density regions. We use principal component analysis to probe and characterize the distribution of structure vectors, and find a non-uniform density with a single peak. Interestingly, the principal axes of this peak are almost aligned with Fourier eigenvectors, and the corresponding Fourier eigenvalues go to zero continuously at the wave-number for alternating patterns ($q=π$). These observations provide a stepping stone for an analytic description of the distribution of structural points, and open the possibility of estimating designabilities of realistic structures by simply Fourier transforming the hydrophobicities of the corresponding sequences.
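The Fourier step can be sketched for a one-dimensional string of values, assuming a plain discrete Fourier transform of the mean-subtracted string; the paper's exact normalization and its mapping from Fourier modes to designability are not reproduced. An alternating pattern puts all of its power at the wave-number $q=π$. `power_spectrum` is a hypothetical name.

```python
import cmath
import math


def power_spectrum(values):
    """Power |sum_n (v_n - mean) exp(-i q n)|^2 / N at wave-numbers q = 2*pi*k/N."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    spectrum = []
    for k in range(n):
        q = 2 * math.pi * k / n
        amplitude = sum(c * cmath.exp(-1j * q * idx) for idx, c in enumerate(centered))
        spectrum.append((q, abs(amplitude) ** 2 / n))
    return spectrum


# An alternating 1/0 pattern concentrates all of its power at q = pi.
spectrum = power_spectrum([1, 0, 1, 0, 1, 0])
q_peak, power_peak = max(spectrum, key=lambda item: item[1])
```

The same function applies unchanged to a 36-site structure vector or to a 36-residue hydrophobicity string.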
Submitted 24 September, 2002; v1 submitted 10 July, 2002;
originally announced July 2002.
-
Designability of alpha-helical Proteins
Authors:
Eldon Emberly,
Ned Wingreen,
Chao Tang
Abstract:
A typical protein structure is a compact packing of connected alpha-helices and/or beta-strands. We have developed a method for generating the ensemble of compact structures a given set of helices and strands can form. The method is tested on structures composed of four alpha-helices connected by short turns. All natural four-helix bundles connected by short turns are reproduced within the ensemble to closer than 3.6 Angstroms per residue. Since structures with no natural counterpart may be targets for ab initio structure design, the designability of each structure in the ensemble -- defined as the number of sequences with that structure as their lowest-energy state -- is evaluated using a hydrophobic energy. For the case of four alpha-helices, a small set of highly designable structures emerges, most of which have an analog among the known four-helix fold families; however, several novel packings and topologies are also identified.
Submitted 20 June, 2002;
originally announced June 2002.
-
The Designability of Protein Structures: A Lattice-Model Study using the Miyazawa-Jernigan Matrix
Authors:
Hao Li,
Chao Tang,
Ned Wingreen
Abstract:
We study the designability of all compact 3x3x3 and 6x6 lattice-protein structures using the Miyazawa-Jernigan (MJ) matrix. The designability of a structure is the number of sequences that design the structure, i.e. sequences that have that structure as their unique lowest-energy state. Previous studies of hydrophobic-polar (HP) models showed a wide distribution of structure designabilities. Recently, questions were raised concerning the use of a 2-letter (HP) code in such studies. Here we calculate designabilities using all 20 amino acids, with empirically determined interaction potentials (MJ matrix), and compare with HP model results. We find good qualitative agreement between the two models. In particular, highly designable structures in the HP model are also highly designable in the MJ model--and vice versa--with the associated sequences having enhanced thermodynamic stability.
Submitted 22 March, 2002;
originally announced March 2002.
-
Identifying Proteins of High Designability via Surface-Exposure Patterns
Authors:
Eldon G. Emberly,
Jonathan Miller,
Chen Zeng,
Ned S. Wingreen,
Chao Tang
Abstract:
Using an off-lattice model, we fully enumerate folded conformations of polypeptide chains of up to N = 19 monomers. Structures are found to differ markedly in designability, defined as the number of sequences with that structure as a unique lowest-energy conformation. We find that designability is closely correlated with the pattern of surface exposure of the folded structure. For longer chains, complete enumeration of structures is impractical. Instead, structures can be randomly sampled, and relative designability estimated either from designability within the random sample, or directly from surface-exposure pattern. We compare the surface-exposure patterns of those structures identified as highly designable to the patterns of naturally occurring proteins.
Submitted 10 October, 2001;
originally announced October 2001.
-
Emergence of highly-designable protein-backbone conformations in an off-lattice model
Authors:
J. Miller,
C. Zeng,
N. S. Wingreen,
C. Tang
Abstract:
Despite the variety of protein sizes, shapes, and backbone configurations found in nature, the design of novel protein folds remains an open problem. Within simple lattice models it has been shown that not all structures are equally suitable for design. Rather, certain structures are distinguished by unusually high designability: the number of amino-acid sequences for which they represent the unique ground state. Sequences associated with such structures possess both robustness to mutation and thermodynamic stability. Here we report that highly designable backbone conformations also emerge in a realistic off-lattice model. The highly designable conformations of a chain of 23 amino acids are identified and found to be remarkably insensitive to model parameters. While some of these conformations correspond closely to known natural protein folds, such as the zinc finger and the helix-turn-helix motifs, others do not resemble known folds and may be candidates for novel fold design.
Submitted 17 September, 2001;
originally announced September 2001.
-
Fast Tree Search for Enumeration of a Lattice Model of Protein Folding
Authors:
Henry Cejtin,
Jan Edler,
Allan Gottlieb,
Robert Helling,
Hao Li,
James Philbin,
Chao Tang,
Ned Wingreen
Abstract:
Using a fast tree-searching algorithm and a Pentium cluster, we enumerated all the sequences and compact conformations (structures) for a protein folding model on a cubic lattice of size $4\times3\times3$. We used two types of amino acids -- hydrophobic (H) and polar (P) -- to make up the sequences, so there were $2^{36} \approx 6.87 \times 10^{10}$ different sequences. The total number of distinct structures was 84,731,192. We made use of a simple solvation model in which the energy of a sequence folded into a structure is minus the number of hydrophobic amino acids in the ``core'' of the structure. For every sequence, we found its ground state or ground states, i.e., the structure or structures for which its energy is lowest. About 0.3% of the sequences have a unique ground state. The number of structures that are unique ground states of at least one sequence is 2,662,050, about 3% of the total number of structures. However, these ``designable'' structures differ drastically in their designability, defined as the number of sequences whose unique ground state is that structure. To understand this variation in designability, we studied the distribution of structures in a high dimensional space in which each structure is represented by a string of 1's and 0's, denoting core and surface sites, respectively.
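The enumeration logic can be shown on a toy scale, assuming the solvation energy described above (minus the number of H residues on core sites) and three made-up 6-site structure strings standing in for real $4\times3\times3$ lattice structures. Every HP sequence is scored against every structure, and a structure's designability is the number of sequences for which it is the unique ground state. `designabilities` is a hypothetical helper, not the authors' tree-search code.

```python
from itertools import product


def designabilities(structures):
    """Number of HP sequences whose unique ground state is each structure.

    structures: equal-length strings of '1' (core site) and '0' (surface site).
    A sequence is a tuple of 1 (H) and 0 (P); its energy in a structure is
    minus the number of H residues sitting on core sites.
    """
    length = len(structures[0])
    counts = {s: 0 for s in structures}
    for seq in product((0, 1), repeat=length):
        energies = [-sum(h for h, site in zip(seq, s) if site == "1") for s in structures]
        lowest = min(energies)
        if energies.count(lowest) == 1:  # degenerate ground states do not count
            counts[structures[energies.index(lowest)]] += 1
    return counts


# Three made-up 6-site structures, each with two core sites.
toy = designabilities(["110000", "011000", "000011"])
```

With these hypothetical strings the counts are 8, 8 and 14: the two structures whose cores overlap compete for the same sequences, while the one whose core shares no site with the others is the most designable in this toy set.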
Submitted 26 July, 2001;
originally announced July 2001.
-
Symmetry and designability for lattice protein models
Authors:
Tairan Wang,
Jonathan Miller,
Ned S. Wingreen,
Chao Tang,
Ken A. Dill
Abstract:
Native protein folds often have a high degree of symmetry. We study the relationship between the symmetries of native proteins and their designabilities -- how many different sequences encode a given native structure. Using a two-dimensional lattice protein model based on hydrophobicity, we find that the native structures encoded by the largest number of different sequences have high symmetry. However, only certain symmetries are enhanced, e.g., x/y-mirror symmetry and $180^\circ$ rotation, while others are suppressed. If it takes a large number of mutations to destabilize the native state of a protein, then, by definition, the state is highly designable. Hence, our findings imply that insensitivity to mutation implies high symmetry. The relationship between designability and symmetry appears to arise because protein substructures are also designable. Native protein folds may therefore be symmetric because they are composed of repeated designable substructures.
Submitted 23 June, 2000;
originally announced June 2000.
-
Simple Models of the Protein Folding Problem
Authors:
Chao Tang
Abstract:
The protein folding problem has attracted increasing attention from physicists. The problem has a flavor of statistical mechanics, but shares the most common feature of biological problems -- the profound effects of evolution. I will give an introduction to the problem, and then focus on some recent work concerning the so-called ``designability principle''. The designability of a structure is measured by the number of sequences that have that structure as their unique ground state. Structures differ drastically in terms of their designability; highly designable structures emerge with a number of associated sequences much larger than the average. These highly designable structures 1) possess ``protein-like'' secondary structures and motifs, 2) are thermodynamically more stable, and 3) fold faster than other structures. These results suggest that protein structures are selected in nature because they are readily designed and stable against mutations, and that such selection simultaneously leads to thermodynamic stability and foldability. According to this picture, a key to the protein folding problem is to understand the emergence and the properties of the highly designable structures.
Submitted 26 December, 1999;
originally announced December 1999.
-
Designability, thermodynamic stability, and dynamics in protein folding: a lattice model study
Authors:
Régis Mélin,
Hao Li,
Ned S. Wingreen,
Chao Tang
Abstract:
In the framework of a lattice-model study of protein folding, we investigate the interplay between designability, thermodynamic stability, and kinetics. To be ``protein-like'', heteropolymers must be thermodynamically stable, stable against mutations of the amino-acid sequence, and fast folding. We find two criteria which, together, guarantee that a sequence will be ``protein-like'': i) the ground state is a highly designable structure, i.e., the native structure is the ground state of a large number of sequences, and ii) the sequence has a large $Δ/Γ$ ratio, $Δ$ being the average energy separation between the ground state and the excited compact conformations, and $Γ$ the dispersion in energy of excited compact conformations. These two criteria are not incompatible since, on average, sequences whose ground states are highly designable structures have large $Δ/Γ$ values. Both criteria require knowledge only of the compact-state spectrum. These claims are substantiated by the study of 45 sequences, with various values of $Δ/Γ$ and various degrees of designability, by means of a Bortz-Kalos-Lebowitz algorithm and the Ferrenberg-Swendsen histogram optimization method. Finally, we report on the reasons for slow folding. A comparison between a very slow folding sequence, an average folding one and a fast folding one suggests that slow folding originates from a proliferation of nearly compact low-energy conformations, not present for fast folders.
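Criterion ii) can be computed directly from a sequence's compact-state spectrum. A minimal sketch, assuming $Γ$ is the population standard deviation of the excited compact energies and that the ground state is non-degenerate; `delta_over_gamma` is a hypothetical helper and the energies are made up.

```python
from statistics import mean, pstdev


def delta_over_gamma(compact_energies):
    """Delta/Gamma for one sequence from its energies in all compact conformations.

    Delta: mean excited-state energy minus the ground-state energy.
    Gamma: population standard deviation of the excited-state energies (assumed).
    """
    energies = sorted(compact_energies)
    ground, excited = energies[0], energies[1:]
    if len(excited) < 2 or excited[0] == ground:
        raise ValueError("need a non-degenerate ground state and >= 2 excited states")
    delta = mean(excited) - ground
    gamma = pstdev(excited)
    return delta / gamma


# Made-up compact-state spectrum: Delta = 3, Gamma = sqrt(0.5), ratio about 4.24.
ratio = delta_over_gamma([-5, -2, -2, -3, -1])
```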
Submitted 16 June, 1998;
originally announced June 1998.
-
Are Protein Folds Atypical?
Authors:
Hao Li,
Chao Tang,
Ned S. Wingreen
Abstract:
Protein structures are a very special class among all possible structures. It has been suggested that a ``designability principle'' plays a crucial role in nature's selection of protein sequences and structures. Here we provide a theoretical basis for such a selection principle, using a novel formulation of the protein folding problem based on hydrophobic interactions. A structure is reduced to a string of 0's and 1's, which represent the surface and core sites, respectively, as the backbone is traced. Each structure is therefore associated with one point in a high-dimensional space. Sequences are represented by strings of their hydrophobicities and thus can be mapped into the same space. A sequence that lies closer to a particular structure in this space than to any other structure will have that structure as its ground state. Atypical structures, namely those far away from other structures in the high-dimensional space, have more sequences that fold into them, and are thermodynamically more stable. We argue that the most common folds of proteins are the most atypical in the space of possible structures.
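Why "closest structure = ground state" holds can be seen from a one-line expansion, sketched under two assumptions not spelled out in the abstract: the energy of sequence $\mathbf{h}$ in structure $\mathbf{s}$ is $E(\mathbf{h},\mathbf{s}) = -\sum_i h_i s_i$, with $s_i \in \{0,1\}$ marking core sites (so $s_i^2 = s_i$), and every compact structure has the same number of core sites $n_c$:

$$
\lVert \mathbf{h}-\mathbf{s} \rVert^2 = \sum_i h_i^2 - 2\sum_i h_i s_i + \sum_i s_i^2 = \sum_i h_i^2 + n_c + 2E(\mathbf{h},\mathbf{s}).
$$

Because $\sum_i h_i^2 + n_c$ is the same for every structure, ranking structures by their distance from $\mathbf{h}$ is the same as ranking them by energy, so the nearest structure is the ground state.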
Submitted 5 September, 1997;
originally announced September 1997.
-
Why Do Proteins Look Like Proteins?
Authors:
Hao Li,
Robert Helling,
Chao Tang,
Ned Wingreen
Abstract:
Protein structures in nature often exhibit a high degree of regularity (secondary structures, tertiary symmetries, etc.) absent in random compact conformations. We demonstrate in a simple lattice model of protein folding that structural regularities are related to high designability and evolutionary stability. We measure the designability of each compact structure by the number of sequences that can design the structure, i.e., that possess the structure as their nondegenerate ground state. We find that compact structures differ drastically in their designability; highly designable structures emerge with a number of associated sequences much larger than the average. These structures are found to have ``protein-like'' secondary structure and even tertiary symmetries. In addition, they are thermodynamically more stable than ordinary structures. These results suggest that protein structures are selected because they are easy to design and stable against mutations, and that such a selection simultaneously leads to thermodynamic stability.
Submitted 5 March, 1996; v1 submitted 2 March, 1996;
originally announced March 1996.
-
Nature of Driving Force for Protein Folding -- A Result From Analyzing the Statistical Potential
Authors:
Hao Li,
Chao Tang,
Ned Wingreen
Abstract:
In a statistical approach to protein structure analysis, Miyazawa and Jernigan (MJ) derived a $20\times 20$ matrix of inter-residue contact energies between different types of amino acids. Using the method of eigenvalue decomposition, we find that the MJ matrix can be accurately reconstructed from its first two principal component vectors as $M_{ij}=C_0+C_1(q_i+q_j)+C_2 q_i q_j$, with constant $C$'s, and 20 $q$ values associated with the 20 amino acids. This regularity is due to hydrophobic interactions and a force of demixing, the latter obeying Hildebrand's solubility theory of simple liquids.
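Why two principal components suffice for a matrix of this form follows from a standard linear-algebra identity (not a result quoted from the paper). With $\mathbf{1}$ the all-ones vector and $\mathbf{q} = (q_1,\dots,q_{20})^T$:

$$
M = U\,C\,U^{T},\qquad U = \begin{pmatrix}\mathbf{1} & \mathbf{q}\end{pmatrix} \in \mathbb{R}^{20\times 2},\qquad C = \begin{pmatrix} C_0 & C_1 \\ C_1 & C_2 \end{pmatrix},
$$

$$
M_{ij} = \begin{pmatrix}1 & q_i\end{pmatrix} C \begin{pmatrix}1 \\ q_j\end{pmatrix} = C_0 + C_1(q_i+q_j) + C_2\,q_i q_j .
$$

Every column of $M$ therefore lies in the span of $\mathbf{1}$ and $\mathbf{q}$, so a matrix of exactly this form has rank at most 2 and is captured completely by two eigenvectors; the abstract's finding is that the empirical MJ matrix is accurately approximated by this form.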
Submitted 4 January, 1997; v1 submitted 13 December, 1995;
originally announced December 1995.