doi 10.1021/acs.jpclett.3c00265

Selecting High-Dimensional Representations of Physical Systems by Reweighted Diffusion Maps

Abstract: Constructing reduced representations of high-dimensional systems is a fundamental problem in physical chemistry. Many unsupervised machine learning methods can automatically find such low-dimensional representations. However, an often overlooked problem is what high-dimensional representation should be used to describe systems before dimensionality reduction. Here, we address this issue using a recently developed method called reweighted diffusion map [J. Chem. Theory Comput. 2022, 18, 7179-7192]. We show how high-dimensional representations can be quantitatively selected by exploring the spectral decomposition of Markov transition matrices built from data obtained from standard or enhanced sampling atomistic simulations. We demonstrate the performance of the method in several high-dimensional examples.

Submitted 3 April, 2024; originally announced April 2024.

Comments: Published version

Journal ref: J. Phys. Chem. Lett. 2023, 14, 11, 2778-2783

arXiv:2404.01809 [pdf, other]

doi 10.1021/acs.jpclett.3c01101

Spectral Map: Embedding Slow Kinetics in Collective Variables

Authors: Jakub Rydzewski

Abstract: The dynamics of physical systems that require high-dimensional representation can often be captured in a few meaningful degrees of freedom called collective variables (CVs). However, identifying CVs is challenging and constitutes a fundamental problem in physical chemistry. This problem is even more pronounced when CVs need to provide information about slow kinetics related to rare transitions between long-lived metastable states. To address this issue, we propose an unsupervised deep-learning method called spectral map. Our method constructs slow CVs by maximizing the spectral gap between slow and fast eigenvalues of a transition matrix estimated by an anisotropic diffusion kernel. We demonstrate our method in several high-dimensional reversible folding processes.

Submitted 2 April, 2024; originally announced April 2024.

Comments: Published version

Journal ref: J. Phys. Chem. Lett. 2023, 14, 22, 5216-5220
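
Note: the spectral-gap quantity named in the abstract above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: it assumes a plain isotropic Gaussian kernel instead of an anisotropic diffusion kernel, uses no neural network, and only evaluates the gap for fixed data. The names `transition_eigenvalues`, `spectral_gap`, `epsilon`, and `n_states` are hypothetical.

```python
import numpy as np

def transition_eigenvalues(x, epsilon=1.0):
    """Eigenvalues (descending) of a row-stochastic matrix M = D^-1 K.

    K is an isotropic Gaussian kernel (an assumption made for illustration).
    M is similar to the symmetric matrix D^-1/2 K D^-1/2, so its eigenvalues
    are real and can be computed with a symmetric eigensolver.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    # Pairwise squared Euclidean distances between samples.
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    k = np.exp(-d2 / epsilon)
    d = k.sum(axis=1)
    s = k / np.sqrt(np.outer(d, d))
    return np.linalg.eigvalsh(s)[::-1]

def spectral_gap(eigenvalues, n_states=2):
    """Gap between the slowest n_states eigenvalues and the remaining ones."""
    return eigenvalues[n_states - 1] - eigenvalues[n_states]

rng = np.random.default_rng(0)
# Toy one-dimensional data: two well-separated states versus a single state.
two_wells = np.concatenate([rng.normal(-2.0, 0.1, 50), rng.normal(2.0, 0.1, 50)])
one_well = rng.normal(0.0, 0.1, 100)

print("two wells:", spectral_gap(transition_eigenvalues(two_wells)))
print("one well: ", spectral_gap(transition_eigenvalues(one_well)))
```

In this toy setting the data with two metastable states should show a much larger gap than the single-state data, which is the kind of timescale separation the abstract refers to.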

arXiv:2311.16411 [pdf, other]

doi 10.1063/5.0189241

Learning Markovian Dynamics with Spectral Maps

Authors: Jakub Rydzewski, Tuğçe Gökdemir

Abstract: The long-time behavior of many complex molecular systems is often governed by slow relaxation dynamics that can be described by a few reaction coordinates referred to as collective variables (CVs). However, identifying CVs hidden in a high-dimensional configuration space poses a fundamental challenge in chemical physics. To address this problem, we expand on a recently introduced deep-learning technique called spectral map [Rydzewski, J. Phys. Chem. Lett. 2023, 14, 22, 5216-5220]. Spectral map learns CVs by maximizing a spectral gap between slow and fast eigenvalues of a Markov transition matrix describing anisotropic diffusion. An introduced modification in the learning algorithm allows spectral map to represent multiscale free-energy landscapes. Through a Markov state model analysis, we validate that spectral map learns slow CVs related to the dominant relaxation timescales and discerns between long-lived metastable states.

Submitted 2 April, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

Comments: Accepted version

Journal ref: J. Chem. Phys. 160 (9), 2024

arXiv:2311.03641 [pdf, other]

Spectral Maps for Learning Reduced Representations of Molecular Systems

Authors: Tuğçe Gökdemir, Jakub Rydzewski

Abstract: Investigating processes in complex molecular systems, which are characterized by many variables, is a crucial problem in computational physics. These systems can be reduced to a few meaningful degrees of freedom known as collective variables (CVs). However, identifying these CVs is a significant challenge, especially for systems with long-lived metastable states. This is because the information about the slow kinetics of rare transitions needs to be encoded in CVs. In this talk, we review recent advances in learning slow CVs and focus mainly on our spectral map technique, a promising deep-learning method that learns CVs based on the slowest timescales. By maximizing the spectral gap between slow and fast eigenvalues of a Markov transition matrix constructed from simulation data, our method effectively captures a simplified representation of alanine dipeptide in solvent. This practical application of our method demonstrates its ability to extract slow CVs, making it a valuable tool for analyzing complex systems.

Submitted 24 May, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

Comments: Presented at 34th IUPAP Conference on Computational Physics (CCP2023)

arXiv:2303.08486 [pdf, other]

doi 10.1088/2632-2153/ace81a

Manifold Learning in Atomistic Simulations: A Conceptual Review

Authors: Jakub Rydzewski, Ming Chen, Omar Valsson

Abstract: Analyzing large volumes of high-dimensional data requires dimensionality reduction: finding meaningful low-dimensional structures hidden in their high-dimensional observations. Such practice is needed in atomistic simulations of complex systems where even thousands of degrees of freedom are sampled. An abundance of such data makes gaining insight into a specific physical problem strenuous. Our primary aim in this review is to focus on unsupervised machine learning methods that can be used on simulation data to find a low-dimensional manifold providing a collective and informative characterization of the studied process. Such manifolds can be used for sampling long-timescale processes and free-energy estimation. We describe methods that can work on datasets from standard and enhanced sampling atomistic simulations. Unlike recent reviews on manifold learning for atomistic simulations, we consider only methods that construct low-dimensional manifolds based on Markov transition probabilities between high-dimensional samples. We discuss these techniques from a conceptual point of view, including their underlying theoretical frameworks and possible limitations.

Submitted 27 May, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

Journal ref: Mach. Learn.: Sci. Technol. 4 031001 (2023)

arXiv:2207.14554 [pdf, other]

doi 10.1021/acs.jctc.2c00873

Reweighted Manifold Learning of Collective Variables from Enhanced Sampling Simulations

Authors: Jakub Rydzewski, Ming Chen, Tushar K. Ghosh, Omar Valsson

Abstract: Enhanced sampling methods are indispensable in computational physics and chemistry, where atomistic simulations cannot exhaustively sample the high-dimensional configuration space of dynamical systems due to the sampling problem. A class of such enhanced sampling methods works by identifying a few slow degrees of freedom, termed collective variables (CVs), and enhancing the sampling along these CVs. Selecting CVs to analyze and drive the sampling is not trivial and often relies on physical and chemical intuition. Despite routinely circumventing this issue using manifold learning to estimate CVs directly from standard simulations, such methods cannot provide mappings to a low-dimensional manifold from enhanced sampling simulations as the geometry and density of the learned manifold are biased. Here, we address this crucial issue and provide a general reweighting framework based on anisotropic diffusion maps for manifold learning that takes into account that the learning data set is sampled from a biased probability distribution. We consider manifold learning methods based on constructing a Markov chain describing transition probabilities between high-dimensional samples. We show that our framework reverts the biasing effect yielding CVs that correctly describe the equilibrium density. This advancement enables the construction of low-dimensional CVs using manifold learning directly from data generated by enhanced sampling simulations. We call our framework reweighted manifold learning. We show that it can be used in many manifold learning techniques on data from both standard and enhanced sampling simulations.

Submitted 3 April, 2024; v1 submitted 29 July, 2022; originally announced July 2022.

Comments: Published version

Journal ref: J. Chem. Theory Comput. 2022, 18, 12, 7179-7192
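
Note: the idea of correcting a Markov transition matrix for biased sampling, as described in the abstract above, can be sketched as follows. This is only an illustration under stated assumptions and not the paper's formulation: it assumes a static bias potential, the standard importance weights w = exp(beta * V), and an isotropic Gaussian kernel in which each target sample is weighted before row normalization. The names `reweighted_markov_matrix`, `bias`, `beta`, and `epsilon` are hypothetical.

```python
import numpy as np

def reweighted_markov_matrix(x, bias, beta=1.0, epsilon=1.0):
    """Row-stochastic matrix with importance-weighted kernel entries.

    Each sample j drawn from a biased simulation receives the weight
    w_j = exp(beta * bias_j), so that over-sampled regions are down-weighted
    relative to the unbiased (equilibrium) distribution.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    bias = np.asarray(bias, dtype=float)
    # Subtract the maximum bias for numerical stability; a constant shift
    # of the bias cancels in the row normalization below.
    w = np.exp(beta * (bias - bias.max()))
    # Pairwise squared Euclidean distances between samples.
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    k = np.exp(-d2 / epsilon) * w[None, :]  # weight each target sample j
    return k / k.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
samples = rng.normal(size=(20, 3))       # toy high-dimensional samples
bias = rng.uniform(0.0, 2.0, size=20)    # toy bias potential values
m = reweighted_markov_matrix(samples, bias)
print(m.sum(axis=1))                      # every row sums to one
```

With a zero bias the weights are all equal and the matrix reduces to the ordinary row-normalized Gaussian kernel, which is a quick consistency check for any reweighting scheme of this kind.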

arXiv:2007.06377 [pdf, other]

doi 10.1021/acs.jpca.1c02869

Multiscale reweighted stochastic embedding (MRSE): Deep learning of collective variables for enhanced sampling

Authors: Jakub Rydzewski, Omar Valsson

Abstract: Machine learning methods provide a general framework for automatically finding and representing the essential characteristics of simulation data. This task is particularly crucial in enhanced sampling simulations. There we seek a few generalized degrees of freedom, referred to as collective variables (CVs), to represent and drive the sampling of the free energy landscape. In theory, these CVs should separate different metastable states and correspond to the slow degrees of freedom of the studied physical process. To this aim, we propose a new method that we call multiscale reweighted stochastic embedding (MRSE). Our work builds upon a parametric version of stochastic neighbor embedding. The technique automatically learns CVs that map a high-dimensional feature space to a low-dimensional latent space via a deep neural network. We introduce several new advancements to stochastic neighbor embedding methods that make MRSE especially suitable for enhanced sampling simulations: (1) weight-tempered random sampling as a landmark selection scheme to obtain training data sets that strike a balance between equilibrium representation and capturing important metastable states lying higher in free energy; (2) a multiscale representation of the high-dimensional feature space via a Gaussian mixture probability model; and (3) a reweighting procedure to account for training data from a biased probability distribution. We show that MRSE constructs low-dimensional CVs that can correctly characterize the different metastable states in three model systems: the Müller-Brown potential, alanine dipeptide, and alanine tetrapeptide.

Submitted 17 June, 2021; v1 submitted 13 July, 2020; originally announced July 2020.

arXiv:1904.03929 [pdf, other]

doi 10.1016/j.cpc.2019.106865

maze: Heterogeneous Ligand Unbinding along Transient Protein Tunnels

Authors: Jakub Rydzewski

Abstract: Recent developments in enhanced sampling methods showed that it is possible to reconstruct ligand unbinding pathways with spatial and temporal resolution inaccessible to experiments. Ideally, such techniques should provide an atomistic definition of possibly many reaction pathways, because crude estimates may lead either to overestimating energy barriers, or inability to sample hidden energy barriers that are not captured by reaction pathway estimates. Here we provide an implementation of a new method [J. Rydzewski & O. Valsson, J. Chem. Phys. 150, 221101 (2019)] dedicated entirely to sampling the reaction pathways of the ligand-protein dissociation process. The program, called maze, is implemented as an official module for PLUMED 2, an open source library for enhanced sampling in molecular systems, and comprises algorithms to find multiple heterogeneous reaction pathways of ligand unbinding from proteins during atomistic simulations. The maze module requires only a crystallographic structure to start a simulation, and does not depend on many ad hoc parameters. The program is based on enhanced sampling and non-convex optimization methods. To present its applicability and flexibility, we provide several examples of ligand unbinding pathways along transient protein tunnels reconstructed by maze in a model ligand-protein system, and discuss the details of the implementation.

Submitted 16 December, 2019; v1 submitted 8 April, 2019; originally announced April 2019.

Journal ref: Comp. Phys. Commun. 247, 106865 (2020)

arXiv:1808.08089 [pdf, other]

doi 10.1063/1.5108638

Finding Multiple Reaction Pathways of Ligand Unbinding

Authors: J. Rydzewski, O. Valsson

Abstract: Searching for reaction pathways describing rare events in large systems presents a long-standing challenge in chemistry and physics. Incorrectly computed reaction pathways result in the degeneracy of microscopic configurations and inability to sample hidden energy barriers. To this aim, we present a general enhanced sampling method to find multiple diverse reaction pathways of ligand unbinding through non-convex optimization of a loss function describing ligand-protein interactions. The method successfully overcomes large energy barriers using an adaptive bias potential, and constructs possible reaction pathways along transient tunnels without the initial guesses of intermediate or final states, requiring crystallographic information only. We examine the method on the T4 lysozyme L99A mutant which is often used as a model system to study ligand binding to proteins, provide a previously unknown reaction pathway, and show that using the bias potential and the tunnel widths it is possible to capture heterogeneity of the unbinding mechanisms between the found transient protein tunnels.

Submitted 17 June, 2019; v1 submitted 24 August, 2018; originally announced August 2018.

Journal ref: J. Chem. Phys. 150, 221101 (2019)

arXiv:1507.01118 [pdf]

doi 10.1063/1.4935370

Entropic measure to prevent energy over-minimization in molecular dynamics simulations

Authors: Jakub Rydzewski, Rafal Jakubowski, Wieslaw Nowak

Abstract: This work examines the impact of energy over-minimization on an ensemble of biological molecules subjected to the potential energy minimization procedure in vacuum. In the studied structures, a long potential energy minimization stage leads to an increase of the main- and side-chain entropies in proteins. We show that such over-minimization may cause the protein structures to diverge from the near-native attraction basin, which possesses a minimum of free energy. We propose a measure based on the Pareto front of total entropy for quality assessment of minimized protein conformations. This measure may help in the selection of an adequate number of energy minimization steps in protein modelling and, thus, in the preservation of the near-native protein conformation.

Submitted 29 September, 2015; v1 submitted 4 July, 2015; originally announced July 2015.

Journal ref: J. Chem. Phys. 143, 171103 (2015)

arXiv:1507.00150 [pdf, other]

doi 10.1063/1.4931181

Memetic Algorithms for Ligand Expulsion from Protein Cavities

Authors: Jakub Rydzewski, Wieslaw Nowak

Abstract: Ligand diffusion through proteins is a fundamental process governing biological signaling and enzymatic catalysis. The complex topology of protein tunnels results in difficulties with computing ligand escape pathways by standard molecular dynamics (MD) simulations. Here, two novel methods for searching of ligand exit pathways and cavity exploration are proposed: memory random acceleration MD (mRAMD), and memetic algorithms (MA). In mRAMD, finding exit pathways is based on a non-Markovian biasing that is introduced to optimize the unbinding force. In MA, hybrid learning protocols are exploited to predict optimal ligand exit paths. The methods are tested on three proteins with increasing complexity of tunnels: M2 muscarinic receptor, nitrile hydratase, and cytochrome P450cam. In these cases, the proposed methods outperform standard techniques that are used currently to find ligand egress pathways. The proposed approach is general and appropriate for accelerated transport of an object through a network of protein tunnels.

Submitted 20 March, 2019; v1 submitted 1 July, 2015; originally announced July 2015.

Journal ref: J. Chem. Phys. 143 (12), 124101, 2015

Showing 1–11 of 11 results for author: Rydzewski, J