-
AI-predicted protein deformation encodes energy landscape perturbation
Authors:
John M McBride,
Tsvi Tlusty
Abstract:
AI algorithms proved excellent predictors of protein structure, but whether their exceptional accuracy is merely due to megascale regression or these algorithms learn the underlying physics remains an open question. Here, we perform a stringent test for the existence of such learning in the AlphaFold2 (AF) algorithm: We use AF to predict the subtle structural deformation induced by single mutations, quantified by strain, and compare with experimental datasets of corresponding perturbations in folding free energy $ΔΔG$. Unexpectedly, we find that physical strain alone -- without any additional data or computation -- correlates almost as well with $ΔΔG$ as state-of-the-art energy-based and machine-learning predictors. This indicates that the AF-predicted structures alone encode fine details about the energy landscape. In particular, the structures encode significant information on stability, enough to estimate (de-)stabilizing effects of mutations, thus paving the way for the development of novel, structure-based stability predictors for protein design and evolution.
Submitted 29 November, 2023;
originally announced November 2023.
-
The Physical Logic of Protein Machines
Authors:
John M. McBride,
Tsvi Tlusty
Abstract:
Proteins are intricate molecular machines whose complexity arises from the heterogeneity of the amino acid building blocks and their dynamic network of many-body interactions. These nanomachines gain function when put in the context of a whole organism through interaction with other inhabitants of the biological realm. And this functionality shapes their evolutionary histories through intertwined paths of selection and adaptation. Recent advances in machine learning have solved the decades-old problem of how a protein's sequence determines its structure. However, the ultimate question regarding the basic logic of protein machines remains open: How does the collective physics of proteins lead to their functionality? And how does a sequence encode the full range of dynamics and chemical interactions that facilitate function? Here, we explore these questions within a physical approach that treats proteins as mechano-chemical machines, which are adapted to function via concerted evolution of structure, motion, and chemical interactions.
Submitted 13 December, 2023; v1 submitted 16 November, 2023;
originally announced November 2023.
-
AlphaFold2 can predict single-mutation effects
Authors:
John M. McBride,
Konstantin Polev,
Amirbek Abdirasulov,
Vladimir Reinharz,
Bartosz A. Grzybowski,
Tsvi Tlusty
Abstract:
AlphaFold2 (AF) is a promising tool, but is it accurate enough to predict single mutation effects? Here, we report that the localized structural deformation between protein pairs differing by only 1-3 mutations -- as measured by the effective strain -- is correlated across 3,901 experimental and AF-predicted structures. Furthermore, analysis of $\sim$11,000 proteins shows that the local structural change correlates with various phenotypic changes. These findings suggest that AF can predict the range and magnitude of single-mutation effects on average, and we propose a method to improve precision of AF predictions and to indicate when predictions are unreliable.
Submitted 21 October, 2023; v1 submitted 14 April, 2022;
originally announced April 2022.
-
General theory of specific binding: insights from a genetic-mechano-chemical protein model
Authors:
John M McBride,
Jean-Pierre Eckmann,
Tsvi Tlusty
Abstract:
Proteins need to selectively interact with specific targets among a multitude of similar molecules in the cell. But despite a firm physical understanding of binding interactions, we lack a general theory of how proteins evolve high specificity. Here, we present such a model that combines chemistry, mechanics and genetics, and explains how their interplay governs the evolution of specific protein-ligand interactions. The model shows that there are many routes to achieving molecular discrimination - by varying degrees of flexibility and shape/chemistry complementarity - but the key ingredient is precision. Harder discrimination tasks require more collective and precise coaction of structure, forces and movements. Proteins can achieve this through correlated mutations extending far from a binding site, which fine-tune the localized interaction with the ligand. Thus, the solution of more complicated tasks is enabled by increasing the protein size, and proteins become more evolvable and robust when they are larger than the bare minimum required for discrimination. The model makes testable, specific predictions about the role of flexibility and shape mismatch in discrimination, and how evolution can independently tune affinity and specificity. Thus, the proposed theory of specific binding addresses the natural question of "why are proteins so big?". A possible answer is that molecular discrimination is often a hard task best performed by adding more layers to the protein.
Submitted 26 September, 2022; v1 submitted 22 February, 2022;
originally announced February 2022.
-
Convergent evolution in a large cross-cultural database of musical scales
Authors:
John M McBride,
Sam Passmore,
Tsvi Tlusty
Abstract:
Scales, sets of discrete pitches that form the basis of melodies, are thought to be one of the most universal hallmarks of music. But we know relatively little about cross-cultural diversity of scales or how they evolved. To remedy this, we assemble a cross-cultural database (Database of Musical Scales: DaMuSc) of scale data, collected over the past century by various ethnomusicologists. Statistical analyses of the data highlight that certain intervals (e.g., the octave, fifth, second) are used frequently across cultures. Despite some diversity among scales, it is the similarities across societies which are most striking - most scales are found close to equidistant 5- and 7-note scales. We discuss the mechanisms of variation and selection in the evolution of scales, and how the assembled data may be used to examine the root causes of convergent evolution.
Submitted 2 May, 2023; v1 submitted 22 July, 2021;
originally announced August 2021.
-
Structural asymmetry along protein sequences and co-translational folding
Authors:
John M McBride,
Tsvi Tlusty
Abstract:
Proteins are translated from the N- to the C-terminus, raising the basic question of how this innate directionality affects their evolution. To explore this question, we analyze 16,200 structures from the Protein Data Bank (PDB). We find remarkable enrichment of $α$-helices at the C terminus and $β$-strands at the N terminus. Furthermore, this $α$-$β$ asymmetry correlates with sequence length and contact order, both determinants of folding rate, hinting at possible links to co-translational folding (CTF). Hence, we propose the 'slowest-first' scheme, whereby protein sequences evolved structural asymmetry to accelerate CTF: the slowest of the cooperatively-folding segments are positioned near the N terminus so they have more time to fold during translation. A phenomenological model predicts that CTF can be accelerated by asymmetry, up to double the rate, when folding time is commensurate with translation time; analysis of the PDB reveals that structural asymmetry is indeed maximal in this regime. This correspondence is greater in prokaryotes, which generally require faster protein production. Altogether, this indicates that accelerating CTF is a substantial evolutionary force whose interplay with stability and functionality is encoded in sequence asymmetry.
Submitted 16 March, 2021; v1 submitted 29 September, 2020;
originally announced September 2020.
-
Cross-cultural data shows musical scales evolved to maximise imperfect fifths
Authors:
John M. McBride,
Tsvi Tlusty
Abstract:
Musical scales are used throughout the world, but the question of how they evolved remains open. Some suggest that scales based on the harmonic series are inherently pleasant, while others propose that scales are chosen that are easy to communicate. However, testing these theories has been hindered by the sparseness of empirical evidence. Here, we assimilate data from diverse ethnomusicological sources into a cross-cultural database of scales. We generate populations of scales based on multiple theories and assess their similarity to empirical distributions from the database. Most scales tend to include intervals which are close in size to perfect fifths ("imperfect fifths"), and packing arguments explain the salient features of the distributions. Scales are also preferred if their intervals are compressible, which may facilitate efficient communication and memory of melodies. While scales appear to evolve according to various selection pressures, the simplest imperfect-fifths packing model best fits the empirical data.
Submitted 1 June, 2020; v1 submitted 13 June, 2019;
originally announced June 2019.