Modular decomposition of protein structure using community detection
Authors:
William P. Grant,
Sebastian E. Ahnert
Abstract:
As the number of solved protein structures increases, the opportunities for meta-analysis of this dataset increase too. Protein structures are known to be formed of domains: structural and functional subunits that are often repeated across sets of proteins. These domains generally form compact, globular regions, and are therefore often easily identifiable by inspection, yet the problem of automatically fragmenting the protein into these compact substructures remains computationally challenging. Existing domain classification methods focus on finding subregions of protein structure that are conserved, rather than finding a decomposition which spans the full protein structure. However, such a decomposition would find ready application in coarse-graining molecular dynamics, in analysing the protein's topology, in de novo protein design and in fitting electron microscopy maps. Here, we present a tool for performing this modular decomposition using the Infomap community detection algorithm. The protein structure is abstracted into a network in which its amino acids are the nodes, and where the edges are generated using a simple proximity test. Infomap can then be used to identify highly intra-connected regions of the protein. We perform this decomposition systematically across 4000 distinct protein structures, taken from the Protein Data Bank. The decomposition obtained correlates well with existing PFAM sequence classifications, but has the advantage of spanning the full protein, with the potential for novel domains. The coarse-grained network formed by the communities can also be used as a proxy for protein topology at the single-chain level; we demonstrate that grouping these proteins by their coarse-grained network results in a functionally significant classification.
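The network abstraction described in the abstract can be illustrated with a minimal sketch. It treats each residue as a single point and joins two residues when they lie within a distance cutoff. The function name `contact_network`, the one-point-per-residue representation, and the 8 Å default cutoff are all assumptions for illustration; the abstract says only that edges come from "a simple proximity test". The community detection step (Infomap in the paper) is not shown here and would take the resulting edge set as its input.

```python
import math
from itertools import combinations


def contact_network(coords, cutoff=8.0):
    """Return edges (i, j) with i < j for residues within `cutoff` of each other.

    `coords` holds one (x, y, z) point per residue. Both the point
    representation and the default cutoff are illustrative assumptions,
    not values taken from the paper.
    """
    edges = set()
    for (i, a), (j, b) in combinations(enumerate(coords), 2):
        if math.dist(a, b) <= cutoff:
            edges.add((i, j))
    return edges


# Toy coordinates (hypothetical): two tight clusters of three residues, far apart.
coords = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (0.0, 3.8, 0.0),
    (30.0, 0.0, 0.0), (33.8, 0.0, 0.0), (30.0, 3.8, 0.0),
]
edges = contact_network(coords)
```

In this toy case every edge falls inside one of the two clusters, which is the kind of highly intra-connected region a community detection algorithm would be expected to separate.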
Submitted 18 September, 2018;
originally announced September 2018.
Revealing and exploiting hierarchical material structure through complex atomic networks
Authors:
Sebastian E. Ahnert,
William P. Grant,
Chris J. Pickard
Abstract:
One of the great challenges of modern science is to faithfully model, and understand, matter at a wide range of scales. Starting with atoms, the vastness of the space of possible configurations poses a formidable challenge to any simulation of complex atomic and molecular systems. We introduce a computational method to reduce the complexity of atomic configuration space by systematically recognising hierarchical levels of atomic structure, and identifying the individual components. Given a list of atomic coordinates, a network is generated based on the distances between the atoms. Using the technique of modularity optimisation, the network is decomposed into modules. This procedure can be performed at different resolution levels, leading to a decomposition of the system at different scales, from which hierarchical structure can be identified. By considering the amount of information required to represent a given modular decomposition we can furthermore find the most succinct descriptions of a given atomic ensemble. Our straightforward, automatic and general approach is applied to complex crystal structures. We show that modular decomposition of these structures considerably simplifies configuration space, which in turn can be used in discovery of novel crystal structures, and opens up a pathway towards accelerated molecular dynamics of complex atomic ensembles. The power of this approach is demonstrated by the identification of a possible allotrope of boron containing 56 atoms in the primitive unit cell, which we uncover using an accelerated structure search, based on a modular decomposition of a known dense phase of boron, $γ$-B$_{28}$.
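As a rough illustration of scoring a modular decomposition, the sketch below computes the standard Newman-Girvan modularity of a given partition of an unweighted network, with a resolution parameter scaling the null-model term. This is an assumption about the general form of the objective: the abstract does not state the exact quality function or how the resolution level is varied, and the names `modularity`, `membership` and `resolution` are hypothetical.

```python
from collections import defaultdict


def modularity(edges, membership, resolution=1.0):
    """Modularity Q = sum_c [ L_c / m - resolution * (d_c / (2 m))^2 ].

    L_c is the number of edges inside module c, d_c the summed degree of its
    nodes, and m the total number of edges. `membership` maps node -> module.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # edges with both endpoints in the module
    degree_sum = defaultdict(int)  # total degree of the module's nodes
    for i, j in edges:
        degree_sum[membership[i]] += 1
        degree_sum[membership[j]] += 1
        if membership[i] == membership[j]:
            internal[membership[i]] += 1
    return sum(
        internal[c] / m - resolution * (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy network (hypothetical): two triangles of atoms joined by one bridging edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {node: "A" for node in range(6)}

q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
```

Here the two-module partition scores higher than the single-module one at the default resolution. Lowering `resolution` weakens the penalty on large modules, which is one common way of probing structure at coarser scales.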
Submitted 25 August, 2017;
originally announced August 2017.