
ALPHAGMUT: A Rationale-Guided Alpha Shape Graph Neural Network to Evaluate Mutation Effects

Authors: Boshen Wang, Bowei Ye, Lin Xu, Jie Liang

Abstract: In silico methods evaluating the mutation effects of missense mutations are providing an important approach for understanding mutations in personal genomes and identifying disease-relevant biomarkers. However, existing methods, including deep learning methods, heavily rely on sequence-aware information, and do not fully leverage the potential of available 3D structural information. In addition, these methods may exhibit an inability to predict mutations in domains difficult to formulate sequence-based embeddings. In this study, we introduce a novel rationale-guided graph neural network AlphaGMut to evaluate mutation effects and to distinguish pathogenic mutations from neutral mutations. We compute the alpha shapes of protein structures to obtain atomic-resolution edge connectivities and map them to an accurate residue-level graph representation. We then compute structural-, topological-, biophysical-, and sequence properties of the mutation sites, which are assigned as node attributes in the graph. These node attributes could effectively guide the graph neural network to learn the difference between pathogenic and neutral mutations using k-hop message passing with a short training period. We demonstrate that AlphaGMut outperforms state-of-the-art methods, including DeepMind's AlphaMissense, in many performance metrics. In addition, AlphaGMut has the advantage of performing well in alignment-free settings, which provides broader prediction coverage and better generalization compared to current methods requiring deep sequence-aware information.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 2 figures, 2 tables

arXiv:2403.12284 [pdf, other]

The Wreaths of KHAN: Uniform Graph Feature Selection with False Discovery Rate Control

Authors: Jiajun Liang, Yue Liu, Doudou Zhou, Sinian Zhang, Junwei Lu

Abstract: Graphical models find numerous applications in biology, chemistry, sociology, neuroscience, etc. While substantial progress has been made in graph estimation, it remains largely unexplored how to select significant graph signals with uncertainty assessment, especially those graph features related to topological structures including cycles (i.e., wreaths), cliques, hubs, etc. These features play a vital role in protein substructure analysis, drug molecular design, and brain network connectivity analysis. To fill the gap, we propose a novel inferential framework for general high dimensional graphical models to select graph features with false discovery rate controlled. Our method is based on the maximum of $p$-values from single edges that comprise the topological feature of interest, thus is able to detect weak signals. Moreover, we introduce the $K$-dimensional persistent Homology Adaptive selectioN (KHAN) algorithm to select all the homological features within $K$ dimensions with the uniform control of the false discovery rate over continuous filtration levels. The KHAN method applies a novel discrete Gram-Schmidt algorithm to select statistically significant generators from the homology group. We apply the structural screening method to identify the important residues of the SARS-CoV-2 spike protein during the binding process to the ACE2 receptors. We score the residues for all domains in the spike protein by the $p$-value weighted filtration level in the network persistent homology for the closed, partially open, and open states and identify the residues crucial for protein conformational changes and thus being potential targets for inhibition.
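Illustrative note (not part of the listing): the abstract above says a feature's significance is derived from the maximum of the $p$-values of the edges that make it up. The sketch below shows only that one idea, and pairs it with the standard Benjamini-Hochberg step-up procedure as a stand-in for false discovery rate control. It is not the KHAN algorithm; the cycle names, edge $p$-values, and the level `q` are all hypothetical.

```python
def feature_p_value(edge_p_values):
    """Feature-level p-value: the largest p-value among the feature's edges."""
    return max(edge_p_values)


def benjamini_hochberg(p_values, q=0.1):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at level q.

    This is a generic FDR procedure used here as a stand-in; it is NOT KHAN.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value falls under its BH threshold.
        if p_values[idx] <= q * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])


# Hypothetical edge-level p-values for three candidate cycles.
cycles = {
    "cycle_a": [0.001, 0.004, 0.002],
    "cycle_b": [0.010, 0.300, 0.020],
    "cycle_c": [0.003, 0.020, 0.015],
}
names = list(cycles)
p_feat = [feature_p_value(cycles[name]) for name in names]
selected = [names[i] for i in benjamini_hochberg(p_feat, q=0.1)]
print(selected)
```

With these made-up numbers, `cycle_b` is dropped because a single weak edge (0.300) dominates its feature-level $p$-value.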

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2311.13372 [pdf]

MRGazer: Decoding Eye Gaze Points from Functional Magnetic Resonance Imaging in Individual Space

Authors: Xiuwen Wu, Rongjie Hu, Jie Liang, Yanming Wang, Bensheng Qiu, Xiaoxiao Wang

Abstract: Eye-tracking research has proven valuable in understanding numerous cognitive functions. Recently, Frey et al. provided an exciting deep learning method for learning eye movements from fMRI data. However, it needed to co-register fMRI into standard space to obtain eyeball masks, and thus required additional templates and was time consuming. To resolve this issue, in this paper, we propose a framework named MRGazer for predicting eye gaze points from fMRI in individual space. MRGazer consists of an eyeball extraction module and residual network-based eye gaze prediction. Compared to the previous method, the proposed framework skips the fMRI co-registration step, simplifies the processing protocol and achieves end-to-end eye gaze regression. The proposed method outperformed the co-registration-based method in a variety of eye movement tasks, and delivered objective results within a shorter time (~0.02 seconds per volume) than the prior method (~0.3 seconds per volume).

Submitted 27 November, 2023; v1 submitted 22 November, 2023; originally announced November 2023.

arXiv:2204.03742 [pdf, other]

doi 10.1016/j.media.2022.102699

Mitosis domain generalization in histopathology images -- The MIDOG challenge

Authors: Marc Aubreville, Nikolas Stathonikos, Christof A. Bertram, Robert Klopleisch, Natalie ter Hoeve, Francesco Ciompi, Frauke Wilm, Christian Marzahl, Taryn A. Donovan, Andreas Maier, Jack Breen, Nishant Ravikumar, Youjin Chung, Jinah Park, Ramin Nateghi, Fattaneh Pourakpour, Rutger H. J. Fick, Saima Ben Hadj, Mostafa Jahanifar, Nasir Rajpoot, Jakob Dexl, Thomas Wittenberg, Satoshi Kondo, Maxime W. Lafarge, Viktor H. Koelzer, et al. (10 additional authors not shown)

Abstract: The density of mitotic figures within tumor tissue is known to be highly correlated with tumor proliferation and thus is an important marker in tumor grading. Recognition of mitotic figures by pathologists is known to be subject to a strong inter-rater bias, which limits the prognostic value. State-of-the-art deep learning methods can support the expert in this assessment but are known to strongly deteriorate when applied in a different clinical environment than was used for training. One decisive component in the underlying domain shift has been identified as the variability caused by using different whole slide scanners. The goal of the MICCAI MIDOG 2021 challenge has been to propose and evaluate methods that counter this domain shift and derive scanner-agnostic mitosis detection algorithms. The challenge used a training set of 200 cases, split across four scanning systems. As a test set, an additional 100 cases split across four scanning systems, including two previously unseen scanners, were given. The best approaches performed on an expert level, with the winning algorithm yielding an F_1 score of 0.748 (CI95: 0.704-0.781). In this paper, we evaluate and compare the approaches that were submitted to the challenge and identify methodological factors contributing to better performance.
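Illustrative note (not part of the listing): the F_1 score quoted in the abstract above is the harmonic mean of precision and recall over detections. A minimal sketch of the standard formula follows; the detection counts are hypothetical and are not taken from the challenge.

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, from detection counts."""
    if tp == 0:
        # No true positives: define F1 as 0 and avoid dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 80 correct detections, 20 false alarms, 30 misses.
score = f1_score(tp=80, fp=20, fn=30)
print(round(score, 3))
```

Because F_1 ignores true negatives, it suits detection tasks such as this one, where the overwhelming majority of candidate locations contain no mitotic figure.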

Submitted 6 April, 2022; originally announced April 2022.

Comments: 19 pages, 9 figures, summary paper of the 2021 MICCAI MIDOG challenge

Journal ref: Medical Image Analysis 84 (2023) 102699

arXiv:2202.11551 [pdf, other]

SeqMapPDB: A Standalone Pipeline to Identify Representative Structures of Protein Sequences and Mapping Residue Indices in Real-Time at Proteome Scale

Authors: Boshen Wang, Xue Lei, Wei Tian, Alan Perez-Rathke, Yan-Yuan Tseng, Jie Liang

Abstract: Motivation: 3D structures of proteins provide rich information for understanding their biochemical roles. Identifying the representative protein structures for protein sequences is essential for analysis of proteins at proteome scale. However, there are technical difficulties in identifying the representative structure of a given protein sequence and providing accurate mapping of residue indices. Existing databases of mapping between structures and sequences are usually static and are not suitable for studying proteomes with frequent gene model revisions. They often do not provide reliable and consistent representative structures that maximize sequence coverage. Furthermore, protein isomers are usually not properly resolved. Results: To overcome these difficulties, we have developed a computational pipeline called SeqMapPDB to provide high-quality representative PDB structures of given sequences. It provides mapping to structures that fully cover the sequences when available, or to the set of partial non-overlapping structural domains that maximally cover the query sequence. The residue indices are accurately mapped and isomeric proteins are resolved. SeqMapPDB is efficient and can rapidly carry out proteome-wide mapping to the selected version of reference genomes in real-time. Furthermore, SeqMapPDB provides the flexibility of a stand-alone pipeline for large scale mapping of in-house sequence and structure data. Availability: Our method is available at https://bitbucket.org/lianglabuic/seqmappdb with GNU GPL license.

Submitted 27 February, 2023; v1 submitted 23 February, 2022; originally announced February 2022.

Comments: 3 pages

arXiv:2104.02672 [pdf, other]

Exact Probability Landscapes of Stochastic Phenotype Switching in Feed-Forward Loops: Phase Diagrams of Multimodality

Authors: Anna Terebus, Farid Manuchehrfar, Youfang Cao, Jie Liang

Abstract: Feed-forward loops (FFLs) are among the most ubiquitously found motifs of reaction networks in nature. However, little is known about their stochastic behavior and the variety of network phenotypes they can exhibit. In this study, we provide full characterizations of the properties of stochastic multimodality of FFLs, and how switching between different network phenotypes is controlled. We have computed the exact steady state probability landscapes of all eight types of coherent and incoherent FFLs using the finite-buffer ACME algorithm, and quantified the exact topological features of their high-dimensional probability landscapes using persistent homology. Through analysis of the degree of multimodality for each of a set of 10,812 probability landscapes, where each landscape resides over 10^5-10^6 microstates, we have constructed comprehensive phase diagrams of all relevant behavior of FFL multimodality over broad ranges of input and regulation intensities, as well as different regimes of promoter binding dynamics. Our results show that with slow binding and unbinding dynamics of transcription factor to promoter, FFLs exhibit strong stochastic behavior that is very different from what would be inferred from deterministic models. In addition, input intensity plays a major role in the phenotypes of FFLs: at weak input intensity, FFLs exhibit monomodality, but strong input intensity may result in up to 6 stable phenotypes. Furthermore, we found that gene duplication can enlarge stable regions of specific multimodalities and enrich the phenotypic diversity of FFL networks, providing means for cells towards better adaptation to changing environment. Our results are directly applicable to analysis of behavior of FFLs in biological processes such as stem cell differentiation and for design of synthetic networks when certain phenotypic behavior is desired.

Submitted 7 April, 2021; v1 submitted 6 April, 2021; originally announced April 2021.

Comments: 25 pages, 9 figures

arXiv:2007.02511 [pdf]

doi 10.1093/nsr/nwab102

Less is More: Wiring-Economical Modular Networks Support Self-Sustained Firing-Economical Neural Avalanches for Efficient Processing

Authors: Junhao Liang, Sheng-Jun Wang, Changsong Zhou

Abstract: Brain network is remarkably cost-efficient while the fundamental physical and dynamical mechanisms underlying its economical optimization in network structure and activity are not clear. Here we study intricate cost-efficient interplay between structure and dynamics in biologically plausible spatial modular neuronal network models. We find that critical avalanche states from excitation-inhibition balance, under modular network topology with less wiring cost, can also achieve less costs in firing, but with strongly enhanced response sensitivity to stimuli. We derived mean-field equations that govern the macroscopic network dynamics through a novel approximate theory. The mechanism of low firing cost and stronger response in the form of critical avalanche is explained as a proximity to a Hopf bifurcation of the modules when increasing their connection density. Our work reveals the generic mechanism underlying the cost-efficient modular organization and critical dynamics widely observed in neural systems, providing insights to brain-inspired efficient computational designs.

Submitted 28 January, 2021; v1 submitted 5 July, 2020; originally announced July 2020.

Comments: 19 pages, 6 figures

Journal ref: National Science Review, nwab102, Published: 10 June 2021

arXiv:2006.02396 [pdf, other]

How initial distribution affects symmetry breaking induced by panic in ants: experiment and flee-pheromone model

Authors: Geng Li, Weijia Wang, Jiahui Lin, Zhiyang Huang, Jianqiang Liang, Huabo Wu, Jian** Wen, Zengru Di, Bertrand Roehner, Zhangang Han

Abstract: Collective escaping is a ubiquitous phenomenon in animal groups. Symmetry breaking caused by panic escape exhibits a shared feature across species: one exit is used more than the other when agents escape from a closed space with two symmetrically located exits. Intuitively, one exit will be used more when more individuals are close to it, namely when there is an asymmetric distribution initially. We used ant groups to investigate how the initial distribution of colonies would influence symmetry breaking in collective escaping. Surprisingly, there was no positive correlation between symmetry breaking and the asymmetric initial distribution, which was quite counter-intuitive. In the experiments, a flee stage was observed and accordingly a flee-pheromone model was introduced to depict this special behavior in the early stage of escaping. Simulation results fitted well with the experiment. Furthermore, the flee stage duration was calibrated quantitatively and the model reproduced the observation demonstrated by our previous work. This paper explicitly distinguished two stages in ant panic escaping for the first time, thus enhancing the understanding of the escaping behavior of ant colonies.

Submitted 3 June, 2020; originally announced June 2020.

arXiv:2001.05626 [pdf]

doi 10.3389/fnsys.2020.580011

Hopf Bifurcation in Mean Field Explains Critical Avalanches in Excitation-Inhibition Balanced Neuronal Networks: A Mechanism for Multiscale Variability

Authors: Junhao Liang, Tianshou Zhou, Changsong Zhou

Abstract: Cortical neural circuits display highly irregular spiking in individual neurons but variably sized collective firing, oscillations and critical avalanches at the population level, all of which have functional importance for information processing. Theoretically, the balance of excitation and inhibition inputs is thought to account for spiking irregularity and critical avalanches may originate from an underlying phase transition. However, the theoretical reconciliation of these multilevel dynamic aspects in neural circuits remains an open question. Herein, we study excitation-inhibition (E-I) balanced neuronal network with biologically realistic synaptic kinetics. It can maintain irregular spiking dynamics with different levels of synchrony and critical avalanches emerge near the synchronous transition point. We propose a novel semi-analytical mean-field theory to derive the field equations governing the network macroscopic dynamics. It reveals that the E-I balanced state of the network manifesting irregular individual spiking is characterized by a macroscopic stable state, which can be either a fixed point or a periodic motion and the transition is predicted by a Hopf bifurcation in the macroscopic field. Furthermore, by analyzing public data, we find the coexistence of irregular spiking and critical avalanches in the spontaneous spiking activities of mouse cortical slice in vitro, indicating the universality of the observed phenomena. Our theory unveils the mechanism that permits complex neural activities in different spatiotemporal scales to coexist and elucidates a possible origin of the criticality of neural systems. It also provides a novel tool for analyzing the macroscopic dynamics of E-I balanced networks and its relationship to the microscopic counterparts, which can be useful for large-scale modeling and computation of cortical dynamics.

Submitted 3 July, 2020; v1 submitted 15 January, 2020; originally announced January 2020.

Journal ref: Front. Syst. Neurosci. 14:580011 (2020)

arXiv:1810.07263 [pdf, ps, other]

doi 10.1063/1.5050808

Discrete Flux and Velocity Fields of Probability and Their Global Maps in Reaction Systems

Authors: Anna Terebus, Chun Liu, Jie Liang

Abstract: Stochasticity plays important roles in reaction systems. Vector fields of probability flux and velocity characterize time-varying and steady-state properties of these systems, including high probability paths, barriers, checkpoints among different stable regions, as well as mechanisms of dynamic switching among them. However, conventional fluxes on continuous space are ill-defined and are problematic when at boundaries of the state space or when copy numbers are small. By re-defining the derivative and divergence operators based on the discrete nature of reactions, we introduce new formulations of discrete fluxes. Our flux model fully accounts for the discreteness of both the state space and the jump processes of reactions. The reactional discrete flux satisfies the continuity equation and describes the behavior of the system evolving along directions of reactions. The species discrete flux directly describes the dynamic behavior in the state space of the reactants such as the transfer of probability mass. With the relationship between these two fluxes specified, we show how to construct time-evolving and steady-state global flow-maps of probability flux and velocity in the directions of every species at every microstate, and how they are related to the outflow and inflow of probability fluxes when tracing out reaction trajectories. We also describe how to impose proper conditions enabling exact quantification of flux and velocity in the boundary regions, without the difficulty of enforcing artificial reflecting conditions. We illustrate the computation of probability flux and velocity using three model systems, namely, the birth-death process, the bistable Schlögl model, and the oscillating Schnakenberg model.

Submitted 16 October, 2018; originally announced October 2018.

Comments: 21 pages, 5 figures

arXiv:1802.02540 [pdf]

Molecular Regulation of Histamine Synthesis

Authors: Hua Huang, Yapeng Li, **yi Liang, Fred D. Finkelman

Abstract: Histamine is a critical mediator of IgE/ cell-mediated anaphylaxis, a neurotransmitter and a regulator of gastric acid secretion. Histamine is a monoamine synthesized from the amino acid histidine through a reaction catalyzed by the enzyme histidine decarboxylase (HDC), which removes carboxyl group from histidine. Despite the importance of histamine, transcriptional regulation of HDC gene expression in mammals is still poorly understood. In this Review, we focus on discussing advances in the understanding of molecular regulation of mammalian histamine synthesis.

Submitted 31 May, 2018; v1 submitted 7 February, 2018; originally announced February 2018.

Comments: 1.added references for introduction section; 2.added references and typos added for histamine-producing cells in mammals and stimuli that trigger histamine release; 3.typos added for section of histidine decarboxylase and histamine synthesis in mammals; 4.added references and typos added for section of hdc gene expression and histamine synthesis in basophils and mast cells. 5. added 2 figures

arXiv:1707.08236 [pdf, ps, other]

doi 10.1007/s11538-016-0149-1

State space truncation with quantified errors for accurate solutions to discrete Chemical Master Equation

Authors: Youfang Cao, Anna Terebus, Jie Liang

Abstract: The discrete chemical master equation (dCME) provides a general framework for studying stochasticity in mesoscopic reaction networks. Since its direct solution rapidly becomes intractable due to the increasing size of the state space, truncation of the state space is necessary for solving most dCMEs. It is therefore important to assess the consequences of state space truncations so errors can be quantified and minimized. Here we describe a novel method for state space truncation. By partitioning a reaction network into multiple molecular equivalence groups (MEG), we truncate the state space by limiting the total molecular copy numbers in each MEG. We further describe a theoretical framework for analysis of the truncation error in the steady state probability landscape using reflecting boundaries. By aggregating the state space based on the usage of a MEG and constructing an aggregated Markov process, we show that the truncation error of a MEG can be asymptotically bounded by the probability of states on the reflecting boundary of the MEG. Furthermore, truncating states of an arbitrary MEG will not undermine the estimated error of truncating any other MEGs. We then provide an error estimate for networks with multiple MEGs. To rapidly determine the appropriate size of an arbitrary MEG, we introduce an a priori method to estimate the upper bound of its truncation error, which can be rapidly computed from reaction rates, without costly trial solutions of the dCME. We show results of applying our methods to four stochastic networks. We demonstrate how truncation errors and steady state probability landscapes can be computed using different sizes of the MEG(s) and how the results validate our theories. Overall, the novel state space truncation and error analysis methods developed here can be used to ensure accurate direct solutions to the dCME for a large class of stochastic networks.
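Illustrative note (not part of the listing): the abstract above relates the truncation error to the probability mass sitting on the reflecting boundary of the truncated state space. The sketch below shows that idea on the simplest possible case, a single-species birth-death process, where the truncated steady state follows directly from detailed balance. It is a toy illustration under assumed rates, not the MEG-based method of the paper; the function name and the rate values are hypothetical.

```python
def truncated_birth_death(k_syn, k_deg, max_copies):
    """Steady state of 0 -> X (rate k_syn) and X -> 0 (rate k_deg * n),
    restricted to copy numbers 0..max_copies with a reflecting boundary."""
    weights = [1.0]
    for n in range(1, max_copies + 1):
        # Detailed balance between states n-1 and n:
        #   k_syn * p[n-1] = k_deg * n * p[n]
        weights.append(weights[-1] * k_syn / (k_deg * n))
    total = sum(weights)
    return [w / total for w in weights]


# Assumed rates giving a mean copy number of about 10.
boundary_mass = {}
for cap in (10, 20, 40):
    p = truncated_birth_death(k_syn=10.0, k_deg=1.0, max_copies=cap)
    boundary_mass[cap] = p[-1]  # probability on the reflecting boundary
    print(cap, boundary_mass[cap])
```

As the cap grows well past the mean copy number, the boundary probability falls off rapidly, which is the signal one would watch when deciding whether a truncation is large enough.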

Submitted 25 July, 2017; originally announced July 2017.

Comments: 41 pages, 6 figures

Journal ref: Bulletin of Mathematical Biology. 78 (2016) 617-661

arXiv:1707.08233 [pdf, ps, other]

doi 10.1137/15M1034180

Accurate Chemical Master Equation Solution Using Multi-Finite Buffers

Authors: Youfang Cao, Anna Terebus, Jie Liang

Abstract: The discrete chemical master equation (dCME) provides a fundamental framework for studying stochasticity in mesoscopic networks. Because of the multi-scale nature of many networks where reaction rates have large disparity, directly solving dCMEs is intractable due to the exploding size of the state space. It is important to truncate the state space effectively with quantified errors, so accurate solutions can be computed. It is also important to know if all major probabilistic peaks have been computed. Here we introduce the Accurate CME (ACME) algorithm for obtaining direct solutions to dCMEs. With multi-finite buffers for reducing the state space by O(n!), exact steady-state and time-evolving network probability landscapes can be computed. We further describe a theoretical framework of aggregating microstates into a smaller number of macrostates by decomposing a network into independent aggregated birth and death processes, and give an a priori method for rapidly determining steady-state truncation errors. The maximal sizes of the finite buffers for a given error tolerance can also be pre-computed without costly trial solutions of dCMEs. We show exactly computed probability landscapes of three multi-scale networks, namely, a 6-node toggle switch, 11-node phage-lambda epigenetic circuit, and 16-node MAPK cascade network, the latter two with no known solutions. We also show how probabilities of rare events can be computed from first-passage times, another class of unsolved problems challenging for simulation-based techniques due to large separations in time scales. Overall, the ACME method enables accurate and efficient solutions of the dCME for a large class of networks.

Submitted 25 July, 2017; originally announced July 2017.

Comments: 44 pages, 13 figures

Journal ref: Multiscale Modeling and Simulation. 14 (2016) 923-963

arXiv:1611.00212 [pdf, other]

doi 10.1103/PhysRevE.100.032310

Non-trivial Resource Amount Requirement in the Early Stage for Containing Fatal Diseases

Authors: Xiaolong Chen, Tianshou Zhou, Ling Feng, Junhao Liang, Fredrik Liljeros, Shlomo Havlin, Yanqing Hu

Abstract: During an epidemic control, the containment of the disease is usually achieved through increasing devoted resource to shorten the duration of infectiousness. However, the impact of this resource expenditure has not been studied quantitatively. Using the well-documented cholera data, we observe empirically that the recovery rate which is related to the duration of infectiousness has a strong positive correlation with the average resource devoted to the infected individuals. By incorporating this relation we build a novel model and find that insufficient resource leads to an abrupt increase in the infected population size, which is in marked contrast with the continuous phase transitions believed previously. Counterintuitively, this abrupt phase transition is more pronounced in the less contagious diseases, which usually correspond to the most fatal ones. Furthermore, we find that even for a single infection source, public resource needs to meet a significant amount, which is proportional to the whole population size to ensure epidemic containment. Our findings provide a theoretical foundation for efficient epidemic containment strategies in the early stage.

Submitted 27 January, 2018; v1 submitted 1 November, 2016; originally announced November 2016.

Comments: 12 pages, 5 figures

Journal ref: Phys. Rev. E 100, 032310 (2019)

arXiv:1303.3057 [pdf, ps, other]

doi 10.1186/1752-0509-2-30

Optimal enumeration of state space of finitely buffered stochastic molecular networks and exact computation of steady state landscape probability

Authors: Youfang Cao, Jie Liang

Abstract: Stochasticity plays important roles in molecular networks when molecular concentrations are in the range of $0.1 μ$M to $10 n$M (about 100 to 10 copies in a cell). The chemical master equation provides a fundamental framework for studying these networks, and the time-varying landscape probability distribution over the full microstates provides a full characterization of the network dynamics. A complete characterization of the space of the microstates is a prerequisite for obtaining the full landscape probability distribution of a network. However, there are neither closed-form solutions nor algorithms fully describing all microstates for a given molecular network. We have developed an algorithm that can exhaustively enumerate the microstates of a molecular network of small copy numbers under the finite buffer condition that the net gain in newly synthesized molecules is smaller than a predefined limit. We also describe a simple method for computing the exact mean or steady state landscape probability distribution over microstates. We show how the full landscape probability for the gene networks of the self-regulating gene and the toggle-switch in the steady state can be fully characterized. We also give an example using the MAPK cascade network. Our algorithm works for networks of small copy numbers buffered with a finite copy number of net molecules that can be synthesized, regardless of the reaction stoichiometry, and is optimal in both storage and time complexity. The buffer size is limited by the available memory or disk storage.
Our algorithm is applicable to a class of biological networks when the copy numbers of molecules are small and the network is closed, or the network is open but the net gain in newly synthesized molecules does not exceed a predefined buffer capacity.
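
The finite-buffer idea in this abstract can be illustrated with a minimal sketch. This is not the authors' optimal enumeration algorithm; it is a plain breadth-first search, and the function name `enumerate_microstates`, the reaction encoding, and the example networks are assumptions made for illustration only. A reaction is allowed to fire only if its net gain in molecules fits in the remaining buffer.

```python
from collections import deque


def enumerate_microstates(initial, reactions, buffer_capacity):
    """Breadth-first enumeration of microstates reachable from `initial`.

    initial: tuple of copy numbers, one entry per species.
    reactions: list of (reactants, products) pairs; each is a tuple of
        stoichiometric coefficients with the same length as `initial`.
    buffer_capacity: maximum net gain in molecule count relative to the
        initial state (the hypothetical finite buffer of this sketch).
    """
    initial = tuple(initial)
    base_total = sum(initial)
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        # Buffer tokens left = capacity minus molecules gained so far.
        tokens = buffer_capacity - (sum(state) - base_total)
        for reactants, products in reactions:
            # The reaction can fire only if all reactants are present.
            if any(x < r for x, r in zip(state, reactants)):
                continue
            net_gain = sum(products) - sum(reactants)
            if net_gain > tokens:
                continue  # firing would overflow the buffer
            nxt = tuple(x - r + p for x, r, p in zip(state, reactants, products))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen)


# Birth-death network: 0 -> X and X -> 0, with a buffer of 5 molecules.
birth_death = [((0,), (1,)), ((1,), (0,))]
states = enumerate_microstates((0,), birth_death, buffer_capacity=5)
```

With a buffer of 5, the birth-death example visits the six states with 0 to 5 copies of X. Storing visited states in a set keeps each microstate exactly once, which is the property any enumeration of this kind needs before a steady-state distribution can be computed over it.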

Submitted 12 March, 2013; originally announced March 2013.

Comments: 23 pages, 7 figures

Journal ref: BMC Systems Biology, 2008, 2:30:1-13

arXiv:1209.2911 [pdf]

doi 10.1007/s11427-014-4704-4

Methods for scoring the collective effect of SNPs: Minor alleles of common SNPs quantitatively affect traits/diseases and are under both positive and negative selection

Authors: Dejian Yuan, Zuobin Zhu, Xiaohua Tan, Jie Liang, Ceng Zeng, Jiegen Zhang, Jun Chen, Long Ma, Ayca Dogan, Gudrun Brockmann, Oliver Goldmann, Eva Medina, Amanda D. Rice, Richard W. Moyer, Xian Man, Ke Yi, Yanke Li, Qing Lu, Yimin Huang, Dapeng Wang, Jun Yu, Hui Guo, Kun Xia, Shi Huang

Abstract: Most common SNPs are popularly assumed to be neutral. We here developed novel methods to examine, in animal models and humans, whether an extreme amount of minor alleles (MAs) carried by an individual may represent extreme trait values and common diseases. We analyzed panels of genetic reference populations and identified the MAs in each panel and the MA content (MAC) that each strain carried. We also analyzed 21 published GWAS datasets of human diseases and identified the MAC of each case or control. MAC was nearly linearly linked to quantitative variations in numerous traits in model organisms, including life span, tumor susceptibility, learning and memory, sensitivity to alcohol and anti-psychotic drugs, and two correlated traits, poor reproductive fitness and strong immunity. Similarly, in Europeans or European Americans, enrichment of MAs of fast but not slow evolutionary rate was linked to autoimmune and numerous other diseases, including type 2 diabetes, Parkinson's disease, psychiatric disorders, alcohol and cocaine addictions, cancer, and shorter life span. Therefore, both high and low MAC correlated with extreme values in many traits, indicating stabilizing selection on most MAs. The methods here are broadly applicable and may help solve the missing heritability problem in complex traits and diseases.

Submitted 15 July, 2013; v1 submitted 12 September, 2012; originally announced September 2012.

Journal ref: Sci China Life Sci. 57:876-888. (2014)

arXiv:q-bio/0601029 [pdf, ps, other]

doi 10.1002/prot.20809

Empirical Potential Function for Simplified Protein Models: Combining Contact and Local Sequence-Structure Descriptors

Authors: Jinfeng Zhang, Rong Chen, Jie Liang

Abstract: An effective potential function is critical for protein structure prediction and folding simulation. Simplified protein models such as those requiring only $C_α$ or backbone atoms are attractive because they enable efficient search of the conformational space. We show that residue-specific reduced discrete state models can represent the backbone conformations of proteins with small RMSD values. However, no potential functions exist that are designed for such simplified protein models. In this study, we develop optimal potential functions by combining contact interaction descriptors and local sequence-structure descriptors. The form of the potential function is a weighted linear sum of all descriptors, and the optimal weight coefficients are obtained through optimization using both native and decoy structures. The performance of the potential function in tests of discriminating native protein structures from decoys is evaluated using several benchmark decoy sets. Our potential function, requiring only backbone atoms or $C_α$ atoms, has comparable or better performance than several residue-based potential functions that require additional coordinates of side chain centers or coordinates of all side chain atoms. By reducing the residue alphabet down to size 5 for the local structure-sequence relationship, the performance of the potential function can be further improved. Our results also suggest that local sequence-structure correlation may play an important role in reducing the entropic cost of protein folding.

Submitted 19 January, 2006; originally announced January 2006.

Comments: 20 pages, 5 figures, 4 tables. In press, Proteins

arXiv:q-bio/0601027 [pdf, ps, other]

doi 10.1016/j.jmb.2005.09.094

Interstrand pairing patterns in $β$-barrel membrane proteins: the positive-outside rule, aromatic rescue, and strand registration prediction

Authors: Ronald Jackups, Jr., Jie Liang

Abstract: $β$-barrel membrane proteins are found in the outer membrane of gram-negative bacteria, mitochondria, and chloroplasts. We have developed probabilistic models to quantify propensities of residues for different spatial locations and for interstrand pairwise contact interactions involving strong H-bonds, side-chain interactions, and weak H-bonds. The propensity values and p-values measuring statistical significance are calculated exactly by analytical formulae we have developed. Contrary to the "positive-inside" rule for helical membrane proteins, $β$-barrel membrane proteins follow a significant albeit weaker "positive-outside" rule, in that the basic residues Arg and Lys are disproportionately favored in the extracellular cap region and disfavored in the periplasmic cap region. Different residue pairs prefer strong backbone H-bonded interstrand pairings (e.g. Gly-Aromatic) or non-H-bonded pairings (e.g. Aromatic-Aromatic). In addition, Tyr and Phe participate in aromatic rescue by shielding Gly from polar environments. These propensities can be used to predict the registration of strand pairs, an important task for the structure prediction of $β$-barrel membrane proteins. Our accuracy of 44% is considerably better than random (7%) and other studies. Our results imply several experiments that can help to elucidate the mechanisms of in vitro and in vivo folding of $β$-barrel membrane proteins. See supplementary material after the bibliography for detailed techniques.

Submitted 19 January, 2006; originally announced January 2006.

Comments: 26 pages, 4 figures, and 4 tables

Journal ref: J. Mol. Biol. (2005) 354:979-993

arXiv:q-bio/0601026 [pdf, ps, other]

doi 10.1007/978-0-387-68372-0_3

Knowledge-based energy functions for computational studies of proteins

Authors: Xiang Li, Jie Liang

Abstract: This chapter discusses the theoretical framework and methods for developing knowledge-based potential functions essential for protein structure prediction, protein-protein interaction, and protein sequence design. We discuss in some detail the Miyazawa-Jernigan contact statistical potential, distance-dependent statistical potentials, as well as geometric statistical potentials. We also describe a geometric model for developing both linear and non-linear potential functions by optimization. Applications of knowledge-based potential functions in protein-decoy discrimination, in protein-protein interactions, and in protein design are then described. Several issues of knowledge-based potential functions are finally discussed.

Submitted 19 January, 2006; originally announced January 2006.

Comments: 57 pages, 6 figures. To be published in a book by Springer

arXiv:q-bio/0601020 [pdf, ps, other]

doi 10.1007/978-0-387-68372-0_6

Computation of protein geometry and its applications: Packing and function prediction

Authors: Jie Liang

Abstract: This chapter discusses geometric models of biomolecules and geometric constructs, including the union of balls model, the weighted Voronoi diagram, the weighted Delaunay triangulation, and the alpha shapes. These geometric constructs enable fast and analytical computation of shapes of biomolecules (including features such as voids and pockets) and metric properties (such as area and volume). The algorithms of Delaunay triangulation, computation of voids and pockets, as well as volume/area computation are also described. In addition, applications in packing analysis of protein structures and protein function prediction are discussed.

Submitted 14 January, 2006; originally announced January 2006.

Comments: 32 pages, 9 figures

arXiv:q-bio/0601019 [pdf, ps, other]

doi 10.1093/molbev/msj048

Estimation of Amino Acid Residue Substitution Rates at Local Spatial Regions and Application in Protein Function Inference: A Bayesian Monte Carlo Approach

Authors: Yan Y. Tseng, Jie Liang

Abstract: The amino acid sequences of proteins provide rich information for inferring distant phylogenetic relationships and for predicting protein functions. Estimating the rate matrix of residue substitutions from amino acid sequences is also important because the rate matrix can be used to develop scoring matrices for sequence alignment. Here we use a continuous time Markov process to model the substitution rates of residues and develop a Bayesian Markov chain Monte Carlo method for rate estimation. We validate our method using simulated artificial protein sequences. Because different local regions such as binding surfaces and the protein interior core experience different selection pressures due to functional or stability constraints, we use our method to estimate the substitution rates of local regions. Our results show that the substitution rates are very different for residues in the buried core and residues on the solvent exposed surfaces. In addition, residues on the binding surfaces also have very different substitution rates from the rest of the protein. Based on these findings, we further develop a method for protein function prediction by surface matching using scoring matrices derived from estimated substitution rates for residues located on the binding surfaces. We show with examples that our method is effective in identifying functionally related proteins that have overall low sequence identity, a task known to be very challenging.

Submitted 13 January, 2006; originally announced January 2006.

Comments: 27 pages, 7 figures

Journal ref: Mol Biol Evol. 2006 Feb;23(2):421-36. Epub 2005 Oct 26

arXiv:q-bio/0601018 [pdf, ps, other]

doi 10.1103/PhysRevLett.96.058106

Protein folding dynamics via quantification of kinematic energy landscape

Authors: Sëma Kachalo, Hsiao-Mei Lu, Jie Liang

Abstract: We study folding dynamics of protein-like sequences on a square lattice using a physical move set that exhausts all possible conformational changes. By analytically solving the master equation, we follow the time-dependent probabilities of occupancy of all 802,075 conformations of 16-mers over a time span of 7 orders of magnitude. We find that (i) folding rates of these protein-like sequences of the same length can differ by 4 orders of magnitude, (ii) folding rates of sequences of the same conformation can differ by a factor of 190, and (iii) parameters of the native structures, designability, and thermodynamic properties are weak predictors of the folding rates; rather, basin analysis of the kinematic energy landscape defined by the moves can provide an excellent account of the observed folding rates.
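
The exhaustive conformational space mentioned in this abstract can be illustrated with a small sketch that enumerates self-avoiding walks on the square lattice. The function names below are illustrative assumptions, not the authors' code, and the sketch only counts conformations; it does not build the move set or solve the master equation.

```python
def count_self_avoiding_walks(n_steps):
    """Count self-avoiding walks of n_steps starting at the origin."""
    moves = ((1, 0), (-1, 0), (0, 1), (0, -1))

    def extend(pos, visited, remaining):
        if remaining == 0:
            return 1
        total = 0
        for dx, dy in moves:
            nxt = (pos[0] + dx, pos[1] + dy)
            if nxt not in visited:
                visited.add(nxt)
                total += extend(nxt, visited, remaining - 1)
                visited.remove(nxt)  # backtrack before trying the next move
        return total

    return extend((0, 0), {(0, 0)}, n_steps)


def count_up_to_symmetry(n_steps):
    """Count conformations after identifying rotations and reflections.

    The straight walk has only 4 distinct images under the 8 lattice
    symmetries; every other walk has 8, hence the formula below.
    """
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    return (count_self_avoiding_walks(n_steps) - 4) // 8 + 1


small_counts = [count_self_avoiding_walks(n) for n in range(1, 6)]
```

A chain of 16 beads corresponds to 15 steps. The figure of 802,075 quoted in the abstract appears consistent with this symmetry reduction applied to 15-step walks, although checking that in pure Python with this naive recursion takes noticeably longer than the short cases shown here.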

Submitted 12 January, 2006; originally announced January 2006.

Comments: 4 pages, 4 figures

arXiv:q-bio/0407040 [pdf, ps, other]

Developing optimal nonlinear scoring function for protein design

Authors: Changyu Hu, Xiang Li, Jie Liang

Abstract: Motivation. Protein design aims to identify sequences compatible with a given protein fold but incompatible with any alternative folds. To select the correct sequences and to guide the search process, a design scoring function is critically important. Such a scoring function should be able to characterize the global fitness landscape of many proteins simultaneously. Results. To find optimal design scoring functions, we introduce two geometric views and propose a formulation using a mixture of nonlinear Gaussian kernel functions. We aim to solve a simplified protein sequence design problem. Our goal is to distinguish each native sequence for a major portion of representative protein structures from a large number of alternative decoy sequences, each a fragment from proteins of a different fold. Our scoring function discriminates perfectly a set of 440 native proteins from 14 million sequence decoys. We show that no linear scoring function can succeed in this task. In a blind test of unrelated proteins, our scoring function misclassifies only 13 native proteins out of 194. This compares favorably with about 3-4 times more misclassifications when optimal linear functions reported in the literature are used. We also discuss how to develop a protein folding scoring function.

Submitted 29 July, 2004; originally announced July 2004.

Comments: 25 pages, 6 figures, 7 tables. Accepted by Bioinformatics

arXiv:q-bio/0407039 [pdf, ps, other]

doi 10.1109/IEMBS.2004.1403795

Order independent structural alignment of circularly permuted proteins

Authors: T. Andrew Binkowski, Bhaskar DasGupta, Jie Liang

Abstract: Circular permutation connects the N and C termini of a protein and concurrently cleaves elsewhere in the chain, providing an important mechanism for generating novel protein folds and functions. However, their prevalence in genomes is unknown because current detection methods can miss many occurrences, mistaking random repeats for circular permutations. Here we develop a method for detecting circularly permuted proteins from structural comparison. Sequence order independent alignment of protein structures can be regarded as a special case of the maximum-weight independent set problem, which is known to be computationally hard. We develop an efficient approximation algorithm by repeatedly solving relaxations of an appropriate intermediate integer programming formulation. We show that the approximation ratio is much better than the theoretical worst case ratio of $r = 1/4$. Circularly permuted proteins reported in the literature can be identified rapidly with our method, while they escape detection by publicly available servers for structural alignment.

Submitted 29 July, 2004; originally announced July 2004.

Comments: 5 pages, 3 figures, Accepted by IEEE-EMBS 2004 Conference Proceedings

arXiv:q-bio/0407030 [pdf, ps, other]

doi 10.1109/IEMBS.2004.1403844

Potential function of simplified protein models for discriminating native proteins from decoys: Combining contact interaction and local sequence-dependent geometry

Authors: Jinfeng Zhang, Rong Chen, Jie Liang

Abstract: An effective potential function is critical for protein structure prediction and folding simulation. For simplified models of proteins where coordinates of only $C_α$ atoms need to be specified, an accurate potential function is important. Such a simplified model is essential for efficient search of conformational space. In this work, we present a formulation of a potential function for simplified representations of protein structures. It is based on the combination of descriptors derived from residue-residue contact and sequence-dependent local geometry. The optimal weight coefficients for contact and local geometry are obtained through optimization by maximizing margins among native and decoy structures. The latter are generated by chain growth and by gapless threading. The performance of the potential function in a blind test of discriminating native protein structures from decoys is evaluated using several benchmark decoy sets. This potential function has comparable or better performance than several residue-based potential functions that additionally require coordinates of side chain centers or coordinates of all side chain atoms.

Submitted 22 July, 2004; originally announced July 2004.

Comments: 4 pages, 2 figures, Accepted by 26th IEEE-EMBS Conference, San Francisco

arXiv:q-bio/0404013 [pdf, ps, other]

doi 10.1063/1.1756573

Importance of chirality and reduced flexibility of protein side chains: A study with square and tetrahedral lattice models

Authors: Jinfeng Zhang, Yu Chen, Rong Chen, Jie Liang

Abstract: In simple models side chains are often represented implicitly (e.g., by spin-states) or simplified as one atom. We study side chain effects using square lattice and tetrahedral lattice models, with explicit side chains of two atoms. We distinguish effects due to chirality and effects due to side chain flexibilities, since residues in proteins are L-residues, and their side chains adopt different rotameric states. Short chains are enumerated exhaustively. For long chains, we effectively sample rare events (e.g., compact conformations) and obtain complete pictures of ensemble properties of these models across all compactness regions. We find that both chirality and reduced side chain flexibility lower the folding entropy significantly for globally compact conformations, suggesting that they are important properties of residues to ensure fast folding and stable native structure. This corresponds well with our finding that natural amino acid residues have reduced effective flexibility, as evidenced by analysis of rotamer libraries and side chain rotatable bonds. We further develop a method for calculating the exact side-chain entropy for a given backbone structure. We show that simple rotamer counting often underestimates side chain entropy significantly, and side chain entropy does not always correlate well with main chain packing. Among compact backbones with maximum side chain entropy, helical structures emerge as the dominating configurations.
Our results suggest that side chain entropy may be an important factor contributing to the formation of alpha helices for compact conformations.

Submitted 10 April, 2004; originally announced April 2004.

Comments: 16 pages, 15 figures, 2 tables. Accepted by J. Chem. Phys

arXiv:q-bio/0311011 [pdf, ps, other]

Are residues in a protein folding nucleus evolutionarily conserved?

Authors: Yan Yuan Tseng, Jie Liang

Abstract: It is important to understand how protein folding and evolution influence each other. Several studies based on entropy calculation correlating experimental measurement of residue participation in the folding nucleus and sequence conservation have reached different conclusions. Here we report an analysis of conservation of the folding nucleus using an evolutionary model alternative to entropy-based approaches. We employ a continuous time Markov model of codon substitution to distinguish mutations fixed by evolution from mutations fixed by chance. This model takes into account bias in codon frequency, bias favoring transition over transversion, as well as explicit phylogenetic information. We measure selection pressure using the ratio $ω$ of non-synonymous to synonymous substitution rates at individual residue sites. The $ω$-values are estimated using the PAML method, a maximum-likelihood estimator. Our results show that there is little correlation between the extent of kinetic participation in the protein folding nucleus as measured by experimental $φ$-value and selection pressure as measured by $ω$-value. In addition, two randomization tests failed to show that folding nucleus residues are significantly more conserved than the whole protein. These results suggest that at the level of codon substitution, there is no indication that folding nucleus residues are significantly more conserved than other residues. We further reconstruct candidate ancestral residues of the folding nucleus and suggest possible test tube mutation studies of the ancient folding nucleus.

Submitted 10 November, 2003; originally announced November 2003.

Comments: 15 pages, 4 figures, and 1 table. Accepted by J. Mol. Biol

arXiv:physics/0302082 [pdf, ps, other]

Simplicial edge representation of protein structures and alpha contact potential with confidence measure

Authors: Xiang Li, Changyu Hu, Jie Liang

Abstract: Protein representation and potential function are essential ingredients for studying protein folding and protein prediction. We introduce a novel geometric representation of contact interactions using the edge simplices from the alpha shape of a protein structure. This representation can eliminate implausible neighbors not in physical contact, and can avoid spurious contact between two residues when a third residue is between them. We develop a statistical alpha contact potential. A studentized bootstrap method is then introduced for assessing the 95% confidence intervals for each of the 210 parameters. We found with confidence that there is significant long range propensity (>30 residues apart) for hydrophobic interactions. We test the alpha contact potential for native structure discrimination using several decoy sets, and found it often has comparable performance with atom-based potentials requiring more parameters. We also show that the alpha contact potential has better performance than a potential defined by cut-off distance between geometric centers of side chains. Clustering of alpha contact potentials reveals a natural grouping of residues. To explore the relationship between shape representation and physicochemical representation, we test the minimum alphabet size for structure discrimination. We found that there is no significant difference in discrimination when alphabet size varies from 7 to 20, if geometry is represented accurately by alpha simplicial edges.
This result suggests that the geometry of packing plays an important role, but the specific residue types are often interchangeable.
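
The count of 210 parameters in this abstract matches the number of unordered residue-type pairs over a 20-letter alphabet, self-pairs included. The sketch below, assuming that interpretation, lists those pair types; the function name and the one-letter alphabet string are illustrative and not taken from the paper.

```python
from itertools import combinations_with_replacement

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def contact_pair_types(alphabet=AMINO_ACIDS):
    """Return all unordered residue-type pairs, including self-pairs.

    An alphabet of k letters yields k * (k + 1) / 2 pair types.
    """
    return list(combinations_with_replacement(sorted(set(alphabet)), 2))


pair_types = contact_pair_types()
n_parameters = len(pair_types)
```

Under the same assumption, a reduced alphabet of 7 residue classes, the smallest size tested in the abstract, would need only 28 pair parameters.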

Submitted 23 February, 2003; originally announced February 2003.

Comments: 18 pages, 7 figures, and 6 tables. Accepted by Proteins

arXiv:cond-mat/0302002 [pdf, ps, other]

On Design of Optimal Nonlinear Kernel Potential Function for Protein Folding and Protein Design

Authors: Changyu Hu, Xiang Li, Jie Liang

Abstract: Potential functions are critical for computational studies of protein structure prediction, folding, and sequence design. A class of widely used potentials for coarse-grained models of proteins are contact potentials in the form of a weighted linear sum of pairwise contacts. However, these potentials have been shown to be unsuitable choices because they cannot stabilize native proteins against a large number of decoys generated by gapless threading. We develop an alternative framework for designing protein potentials. We describe how finding an optimal protein potential can be understood from two geometric viewpoints, and we derive nonlinear potentials using a mixture of Gaussian kernel functions for folding and design. The optimization criterion for obtaining parameters of the potential is to minimize bounds on the generalization error of discriminating protein structures and decoys not used in training. In our experiment we use a training set of 440 protein structures representing a major portion of all known protein structures, and about 14 million structure decoys and sequence decoys obtained by gapless threading. We succeeded in obtaining a nonlinear potential with perfect discrimination of the 440 native structures and native sequences. For the more challenging task of sequence design when decoys are obtained by gapless threading, we show that there is no linear potential with perfect discrimination of all 440 native sequences. Results on an independent test set of 194 proteins also showed that the nonlinear kernel potential performs well.

Submitted 31 January, 2003; originally announced February 2003.

Comments: 22 pages, 7 figures, and 5 tables

arXiv:cond-mat/0301085 [pdf, ps, other]

doi 10.1063/1.1554395

Origin of Scaling Behavior of Protein Packing Density: A Sequential Monte Carlo Study of Compact Long Chain Polymers

Authors: Jinfeng Zhang, Rong Chen, Chao Tang, Jie Liang

Abstract: Single domain proteins are thought to be tightly packed. The introduction of voids by mutations is often regarded as destabilizing. In this study we show that packing density for single domain proteins decreases with chain length. We find that the radius of gyration provides a poor description of protein packing but the alpha contact number we introduce here characterizes proteins well. We further demonstrate that a protein-like scaling relationship between packing density and chain length is observed in off-lattice self-avoiding walks. A key problem in studying compact chain polymers is the attrition problem: it is difficult to generate independent samples of compact long self-avoiding walks. We develop an algorithm based on the framework of sequential Monte Carlo and succeed in generating populations of compact long chain off-lattice polymers up to length $N=2,000$. Results based on analysis of these chain polymers suggest that maintaining high packing density is only characteristic of short chain proteins. We found that the scaling behavior of packing density with chain length of proteins is a generic feature of random polymers satisfying a loose constraint in compactness. We conclude that proteins are not optimized by evolution to eliminate packing voids.

Submitted 7 January, 2003; originally announced January 2003.

Comments: 9 pages, 10 figures. Accepted by J. Chem. Phys

arXiv:physics/0203015 [pdf, ps, other]

doi 10.1063/1.1493772

Statistical Geometry of Packing Defects of Lattice Chain Polymer from Enumeration and Sequential Monte Carlo Method

Authors: Jie Liang, Jinfeng Zhang, Rong Chen

Abstract: Voids exist in proteins as packing defects and are often associated with protein functions. We study the statistical geometry of voids in two-dimensional lattice chain polymers. We define voids as topological features and develop a simple algorithm for their detection. For short chains, void geometry is examined by enumerating all conformations. For long chains, the space of void geometry is explored using sequential Monte Carlo importance sampling and resampling techniques. We characterize the relationship of geometric properties of voids with chain length, including probability of void formation, expected number of voids, void size, and wall size of voids. We formalize the concept of packing density for lattice polymers, and further study the relationship between packing density and compactness, two parameters frequently used to describe protein packing. We find that both fully extended and maximally compact polymers have the highest packing density, but polymers with intermediate compactness have low packing density. To study the conformational entropic effects of void formation, we characterize the conformation reduction factor of void formation and find that there is a strong end effect: voids are more likely to form at the chain end. The critical exponent of the end effect is twice as large as that of self-contacting loop formation when the existence of voids is not required. We also briefly discuss the sequential Monte Carlo sampling and resampling techniques used in this study.

Submitted 6 March, 2002; originally announced March 2002.

Comments: 29 pages, including 12 figures

Showing 1–31 of 31 results for author: Liang, J