-
Selecting High-Dimensional Representations of Physical Systems by Reweighted Diffusion Maps
Authors:
Jakub Rydzewski
Abstract:
Constructing reduced representations of high-dimensional systems is a fundamental problem in physical chemistry. Many unsupervised machine learning methods can automatically find such low-dimensional representations. However, an often overlooked problem is what high-dimensional representation should be used to describe systems before dimensionality reduction. Here, we address this issue using a recently developed method called reweighted diffusion map [J. Chem. Theory Comput. 2022, 18, 7179-7192]. We show how high-dimensional representations can be quantitatively selected by exploring the spectral decomposition of Markov transition matrices built from data obtained from standard or enhanced sampling atomistic simulations. We demonstrate the performance of the method in several high-dimensional examples.
Submitted 3 April, 2024;
originally announced April 2024.
-
Spectral Map: Embedding Slow Kinetics in Collective Variables
Authors:
Jakub Rydzewski
Abstract:
The dynamics of physical systems that require high-dimensional representation can often be captured in a few meaningful degrees of freedom called collective variables (CVs). However, identifying CVs is challenging and constitutes a fundamental problem in physical chemistry. This problem is even more pronounced when CVs must encode information about slow kinetics related to rare transitions between long-lived metastable states. To address this issue, we propose an unsupervised deep-learning method called spectral map. Our method constructs slow CVs by maximizing the spectral gap between slow and fast eigenvalues of a transition matrix estimated by an anisotropic diffusion kernel. We demonstrate our method in several high-dimensional reversible folding processes.
Submitted 2 April, 2024;
originally announced April 2024.
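The spectral gap mentioned in the abstract above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `jacobi_eigenvalues` and `spectral_gap`, the toy four-state matrix, and the convention that the gap is the difference between the k-th and (k+1)-th largest eigenvalues are all assumptions made for illustration. The matrix is assumed symmetric so that a plain Jacobi rotation scheme applies; a reversible but non-symmetric transition matrix would first have to be symmetrized.

```python
import math


def jacobi_eigenvalues(matrix, sweeps=50, tol=1e-20):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations (descending)."""
    n = len(matrix)
    a = [row[:] for row in matrix]  # work on a copy
    for _ in range(sweeps):
        # Stop once the squared off-diagonal mass is negligible.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle in [-pi/4, pi/4] that zeroes a[p][q].
                diff = a[q][q] - a[p][p]
                sign = math.copysign(1.0, diff)
                theta = 0.5 * math.atan2(2.0 * a[p][q] * sign, abs(diff))
                c, s = math.cos(theta), math.sin(theta)
                # Apply the rotation to columns p and q.
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                # Apply the transposed rotation to rows p and q.
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def spectral_gap(eigenvalues, k):
    """Gap between the k-th and (k+1)-th largest eigenvalues (assumed convention)."""
    return eigenvalues[k - 1] - eigenvalues[k]


# Hypothetical symmetric transition matrix with two metastable blocks:
# fast exchange inside each block (0.10), rare hops between blocks (0.01).
T = [
    [0.89, 0.10, 0.01, 0.00],
    [0.10, 0.89, 0.00, 0.01],
    [0.01, 0.00, 0.89, 0.10],
    [0.00, 0.01, 0.10, 0.89],
]

eigs = jacobi_eigenvalues(T)
gap = spectral_gap(eigs, k=2)  # two slow eigenvalues, two fast ones
```

In this toy case the two largest eigenvalues correspond to the stationary state and the slow hop between blocks, and the gap separates them from the fast intra-block relaxation.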
-
Learning Markovian Dynamics with Spectral Maps
Authors:
Jakub Rydzewski,
Tuğçe Gökdemir
Abstract:
The long-time behavior of many complex molecular systems is often governed by slow relaxation dynamics that can be described by a few reaction coordinates referred to as collective variables (CVs). However, identifying CVs hidden in a high-dimensional configuration space poses a fundamental challenge in chemical physics. To address this problem, we expand on a recently introduced deep-learning technique called spectral map [Rydzewski, J. Phys. Chem. Lett. 2023, 14, 22, 5216-5220]. Spectral map learns CVs by maximizing a spectral gap between slow and fast eigenvalues of a Markov transition matrix describing anisotropic diffusion. An introduced modification in the learning algorithm allows spectral map to represent multiscale free-energy landscapes. Through a Markov state model analysis, we validate that spectral map learns slow CVs related to the dominant relaxation timescales and discerns between long-lived metastable states.
Submitted 2 April, 2024; v1 submitted 27 November, 2023;
originally announced November 2023.
-
Spectral Maps for Learning Reduced Representations of Molecular Systems
Authors:
Tuğçe Gökdemir,
Jakub Rydzewski
Abstract:
Investigating processes in complex molecular systems, which are characterized by many variables, is a crucial problem in computational physics. These systems can be reduced to a few meaningful degrees of freedom known as collective variables (CVs). However, identifying these CVs is a significant challenge, especially for systems with long-lived metastable states. This is because the information about the slow kinetics of rare transitions needs to be encoded in CVs. In this talk, we review recent advances in learning slow CVs and focus mainly on our spectral map technique, a promising deep-learning method that learns CVs based on the slowest timescales. By maximizing the spectral gap between slow and fast eigenvalues of a Markov transition matrix constructed from simulation data, our method effectively captures a simplified representation of alanine dipeptide in solvent. This practical application of our method demonstrates its ability to extract slow CVs, making it a valuable tool for analyzing complex systems.
Submitted 24 May, 2024; v1 submitted 6 November, 2023;
originally announced November 2023.
-
Manifold Learning in Atomistic Simulations: A Conceptual Review
Authors:
Jakub Rydzewski,
Ming Chen,
Omar Valsson
Abstract:
Analyzing large volumes of high-dimensional data requires dimensionality reduction: finding meaningful low-dimensional structures hidden in their high-dimensional observations. Such practice is needed in atomistic simulations of complex systems where even thousands of degrees of freedom are sampled. An abundance of such data makes gaining insight into a specific physical problem strenuous. Our primary aim in this review is to focus on unsupervised machine learning methods that can be used on simulation data to find a low-dimensional manifold providing a collective and informative characterization of the studied process. Such manifolds can be used for sampling long-timescale processes and free-energy estimation. We describe methods that can work on datasets from standard and enhanced sampling atomistic simulations. Unlike recent reviews on manifold learning for atomistic simulations, we consider only methods that construct low-dimensional manifolds based on Markov transition probabilities between high-dimensional samples. We discuss these techniques from a conceptual point of view, including their underlying theoretical frameworks and possible limitations.
Submitted 27 May, 2023; v1 submitted 15 March, 2023;
originally announced March 2023.
-
Reweighted Manifold Learning of Collective Variables from Enhanced Sampling Simulations
Authors:
Jakub Rydzewski,
Ming Chen,
Tushar K. Ghosh,
Omar Valsson
Abstract:
Enhanced sampling methods are indispensable in computational physics and chemistry, where atomistic simulations cannot exhaustively sample the high-dimensional configuration space of dynamical systems due to the sampling problem. A class of such enhanced sampling methods works by identifying a few slow degrees of freedom, termed collective variables (CVs), and enhancing the sampling along these CVs. Selecting CVs to analyze and drive the sampling is not trivial and often relies on physical and chemical intuition. Despite routinely circumventing this issue using manifold learning to estimate CVs directly from standard simulations, such methods cannot provide mappings to a low-dimensional manifold from enhanced sampling simulations as the geometry and density of the learned manifold are biased. Here, we address this crucial issue and provide a general reweighting framework based on anisotropic diffusion maps for manifold learning that takes into account that the learning data set is sampled from a biased probability distribution. We consider manifold learning methods based on constructing a Markov chain describing transition probabilities between high-dimensional samples. We show that our framework reverts the biasing effect yielding CVs that correctly describe the equilibrium density. This advancement enables the construction of low-dimensional CVs using manifold learning directly from data generated by enhanced sampling simulations. We call our framework reweighted manifold learning. We show that it can be used in many manifold learning techniques on data from both standard and enhanced sampling simulations.
Submitted 3 April, 2024; v1 submitted 29 July, 2022;
originally announced July 2022.
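As a rough illustration of the idea in the abstract above, the sketch below builds a Markov transition matrix between samples from a Gaussian kernel with the standard anisotropic (density) normalization of diffusion maps. The way the statistical weights enter here, by scaling each kernel entry with the geometric mean of the two sample weights, is an assumption chosen for illustration and may differ from the paper's exact formulation. The function name `reweighted_transition_matrix`, the parameters `epsilon` and `alpha`, and the toy data are likewise hypothetical.

```python
import math


def reweighted_transition_matrix(samples, weights=None, epsilon=1.0, alpha=0.5):
    """Row-stochastic Markov matrix from a (weighted) Gaussian kernel."""
    n = len(samples)
    if weights is None:
        weights = [1.0] * n  # unbiased data: every sample counts equally

    def sq_dist(x, y):
        return sum((a - b) ** 2 for a, b in zip(x, y))

    # Gaussian kernel scaled by sqrt(w_i * w_j); this reweighting form is an
    # illustrative assumption, not necessarily the published one.
    kernel = [
        [
            math.sqrt(weights[i] * weights[j])
            * math.exp(-sq_dist(samples[i], samples[j]) / epsilon)
            for j in range(n)
        ]
        for i in range(n)
    ]

    # Anisotropic normalization: divide by the kernel density estimate
    # raised to the power alpha on both sides.
    density = [sum(row) for row in kernel]
    kernel = [
        [kernel[i][j] / (density[i] ** alpha * density[j] ** alpha) for j in range(n)]
        for i in range(n)
    ]

    # Row normalization turns the kernel into transition probabilities.
    row_sums = [sum(row) for row in kernel]
    return [[kernel[i][j] / row_sums[i] for j in range(n)] for i in range(n)]


# Toy one-dimensional data: two clusters, the second one given larger
# statistical weights as if it had been under-sampled by a bias.
samples = [(0.0,), (0.1,), (2.0,), (2.1,)]
weights = [1.0, 1.0, 5.0, 5.0]
M = reweighted_transition_matrix(samples, weights, epsilon=0.5)
```

With uniform weights the constant factor cancels in the row normalization, so the sketch reduces to an ordinary diffusion-map transition matrix.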
-
Multiscale reweighted stochastic embedding (MRSE): Deep learning of collective variables for enhanced sampling
Authors:
Jakub Rydzewski,
Omar Valsson
Abstract:
Machine learning methods provide a general framework for automatically finding and representing the essential characteristics of simulation data. This task is particularly crucial in enhanced sampling simulations. There we seek a few generalized degrees of freedom, referred to as collective variables (CVs), to represent and drive the sampling of the free energy landscape. In theory, these CVs should separate different metastable states and correspond to the slow degrees of freedom of the studied physical process. To this aim, we propose a new method that we call multiscale reweighted stochastic embedding (MRSE). Our work builds upon a parametric version of stochastic neighbor embedding. The technique automatically learns CVs that map a high-dimensional feature space to a low-dimensional latent space via a deep neural network. We introduce several new advancements to stochastic neighbor embedding methods that make MRSE especially suitable for enhanced sampling simulations: (1) weight-tempered random sampling as a landmark selection scheme to obtain training data sets that strike a balance between equilibrium representation and capturing important metastable states lying higher in free energy; (2) a multiscale representation of the high-dimensional feature space via a Gaussian mixture probability model; and (3) a reweighting procedure to account for training data from a biased probability distribution. We show that MRSE constructs low-dimensional CVs that can correctly characterize the different metastable states in three model systems: the Müller-Brown potential, alanine dipeptide, and alanine tetrapeptide.
Submitted 17 June, 2021; v1 submitted 13 July, 2020;
originally announced July 2020.
-
maze: Heterogeneous Ligand Unbinding along Transient Protein Tunnels
Authors:
Jakub Rydzewski
Abstract:
Recent developments in enhanced sampling methods showed that it is possible to reconstruct ligand unbinding pathways with spatial and temporal resolution inaccessible to experiments. Ideally, such techniques should provide an atomistic definition of possibly many reaction pathways, because crude estimates may lead either to overestimating energy barriers, or inability to sample hidden energy barriers that are not captured by reaction pathway estimates. Here we provide an implementation of a new method [J. Rydzewski & O. Valsson, J. Chem. Phys. 150, 221101 (2019)] dedicated entirely to sampling the reaction pathways of the ligand-protein dissociation process. The program, called maze, is implemented as an official module for PLUMED 2, an open source library for enhanced sampling in molecular systems, and comprises algorithms to find multiple heterogeneous reaction pathways of ligand unbinding from proteins during atomistic simulations. The maze module requires only a crystallographic structure to start a simulation, and does not depend on many ad hoc parameters. The program is based on enhanced sampling and non-convex optimization methods. To present its applicability and flexibility, we provide several examples of ligand unbinding pathways along transient protein tunnels reconstructed by maze in a model ligand-protein system, and discuss the details of the implementation.
Submitted 16 December, 2019; v1 submitted 8 April, 2019;
originally announced April 2019.
-
Finding Multiple Reaction Pathways of Ligand Unbinding
Authors:
J. Rydzewski,
O. Valsson
Abstract:
Searching for reaction pathways describing rare events in large systems presents a long-standing challenge in chemistry and physics. Incorrectly computed reaction pathways result in the degeneracy of microscopic configurations and inability to sample hidden energy barriers. To this aim, we present a general enhanced sampling method to find multiple diverse reaction pathways of ligand unbinding through non-convex optimization of a loss function describing ligand-protein interactions. The method successfully overcomes large energy barriers using an adaptive bias potential, and constructs possible reaction pathways along transient tunnels without the initial guesses of intermediate or final states, requiring crystallographic information only. We examine the method on the T4 lysozyme L99A mutant which is often used as a model system to study ligand binding to proteins, provide a previously unknown reaction pathway, and show that using the bias potential and the tunnel widths it is possible to capture heterogeneity of the unbinding mechanisms between the found transient protein tunnels.
Submitted 17 June, 2019; v1 submitted 24 August, 2018;
originally announced August 2018.
-
Entropic measure to prevent energy over-minimization in molecular dynamics simulations
Authors:
Jakub Rydzewski,
Rafal Jakubowski,
Wieslaw Nowak
Abstract:
This work examines the impact of energy over-minimization on an ensemble of biological molecules subjected to the potential energy minimization procedure in vacuum. In the studied structures, a long potential energy minimization stage leads to an increase of the main- and side-chain entropies in proteins. We show that such over-minimization may drive the protein structures away from the near-native attraction basin, which possesses a minimum of free energy. We propose a measure based on the Pareto front of total entropy for quality assessment of minimized protein conformations. This measure may help in selecting an adequate number of energy minimization steps in protein modelling and, thus, in preserving the near-native protein conformation.
Submitted 29 September, 2015; v1 submitted 4 July, 2015;
originally announced July 2015.
-
Memetic Algorithms for Ligand Expulsion from Protein Cavities
Authors:
Jakub Rydzewski,
Wieslaw Nowak
Abstract:
Ligand diffusion through proteins is a fundamental process governing biological signaling and enzymatic catalysis. The complex topology of protein tunnels results in difficulties with computing ligand escape pathways by standard molecular dynamics (MD) simulations. Here, two novel methods for searching for ligand exit pathways and exploring cavities are proposed: memory random acceleration MD (mRAMD), and memetic algorithms (MA). In mRAMD, finding exit pathways is based on a non-Markovian biasing that is introduced to optimize the unbinding force. In MA, hybrid learning protocols are exploited to predict optimal ligand exit paths. The methods are tested on three proteins with increasing complexity of tunnels: M2 muscarinic receptor, nitrile hydratase, and cytochrome P450cam. In these cases, the proposed methods outperform standard techniques currently used to find ligand egress pathways. The proposed approach is general and appropriate for accelerated transport of an object through a network of protein tunnels.
Submitted 20 March, 2019; v1 submitted 1 July, 2015;
originally announced July 2015.