-
Invariance of Gaussian RKHSs under Koopman operators of stochastic differential equations with constant matrix coefficients
Authors:
Friedrich Philipp,
Manuel Schaller,
Karl Worthmann,
Sebastian Peitz,
Feliks Nüske
Abstract:
We consider the Koopman operator semigroup $(K^t)_{t\ge 0}$ associated with stochastic differential equations of the form $dX_t = AX_t\,dt + B\,dW_t$ with constant matrices $A$ and $B$ and Brownian motion $W_t$. We prove that the reproducing kernel Hilbert space $\bH_C$ generated by a Gaussian kernel with a positive definite covariance matrix $C$ is invariant under each Koopman operator $K^t$ if the matrices $A$, $B$, and $C$ satisfy the following Lyapunov-like matrix inequality: $AC^2 + C^2A^\top\le 2BB^\top$. Along the way, we prove a characterization of the inclusion $\bH_{C_1}\subset\bH_{C_2}$ of Gaussian RKHSs for two positive definite matrices $C_1$ and $C_2$. Whether this sufficient Lyapunov-type condition is also necessary is left as an open problem.
Submitted 23 May, 2024;
originally announced May 2024.
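The matrix inequality $AC^2 + C^2A^\top\le 2BB^\top$ quoted in this abstract can be checked numerically for given matrices. The following is a minimal sketch, not taken from the paper: it assumes the inequality is meant in the Loewner (positive semidefinite) order, and the function name `satisfies_lyapunov_condition`, the tolerance, and the example matrices are made up for illustration.

```python
import numpy as np

def satisfies_lyapunov_condition(A, B, C, tol=1e-10):
    """Check A C^2 + C^2 A^T <= 2 B B^T in the Loewner (semidefinite) order."""
    C2 = C @ C
    # The inequality holds iff 2 B B^T - (A C^2 + C^2 A^T) is positive semidefinite.
    gap = 2.0 * B @ B.T - (A @ C2 + C2 @ A.T)
    gap = 0.5 * (gap + gap.T)  # symmetrize to guard against round-off
    return bool(np.linalg.eigvalsh(gap).min() >= -tol)

# Illustrative (made-up) matrices: a stable diagonal drift with identity noise.
A_stable = np.array([[-1.0, 0.0], [0.0, -2.0]])
B = np.eye(2)
C = np.eye(2)
print(satisfies_lyapunov_condition(A_stable, B, C))

# An unstable drift that violates the inequality for the same B and C.
A_unstable = 2.0 * np.eye(2)
print(satisfies_lyapunov_condition(A_unstable, B, C))
```

For the stable example the gap matrix is diag(4, 6), so the condition holds; for the unstable one it is -2 times the identity, so it fails.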
-
Variance representations and convergence rates for data-driven approximations of Koopman operators
Authors:
Friedrich M. Philipp,
Manuel Schaller,
Septimus Boshoff,
Sebastian Peitz,
Feliks Nüske,
Karl Worthmann
Abstract:
We rigorously derive novel error bounds for extended dynamic mode decomposition (EDMD) to approximate the Koopman operator for discrete- and continuous-time (stochastic) systems, both for i.i.d. and ergodic sampling, under non-restrictive assumptions. We show exponential convergence rates for i.i.d. sampling and provide the first superlinear convergence rates for ergodic sampling of deterministic systems. The proofs are based on novel exact variance representations for the empirical estimators of the mass and stiffness matrices. Moreover, we verify the accuracy of the derived error bounds and convergence rates by means of numerical simulations for highly complex dynamical systems, including a nonlinear partial differential equation.
Submitted 23 May, 2024; v1 submitted 4 February, 2024;
originally announced February 2024.
-
Error analysis of kernel EDMD for prediction and control in the Koopman framework
Authors:
Friedrich Philipp,
Manuel Schaller,
Karl Worthmann,
Sebastian Peitz,
Feliks Nüske
Abstract:
Extended Dynamic Mode Decomposition (EDMD) is a popular data-driven method to approximate the Koopman operator for deterministic and stochastic (control) systems. This operator is linear and encompasses full information on the (expected stochastic) dynamics. In this paper, we analyze a kernel-based EDMD algorithm, known as kEDMD, where the dictionary consists of the canonical kernel features at the data points. The latter are acquired by i.i.d. samples from a user-defined and application-driven distribution on a compact set. We prove bounds on the prediction error of the kEDMD estimator when sampling from this (not necessarily ergodic) distribution. The error analysis is further extended to control-affine systems, where the considered invariance of the Reproducing Kernel Hilbert Space is significantly less restrictive in comparison to invariance assumptions on an a-priori chosen dictionary.
Submitted 16 December, 2023;
originally announced December 2023.
-
Partial observations, coarse graining and equivariance in Koopman operator theory for large-scale dynamical systems
Authors:
Sebastian Peitz,
Hans Harder,
Feliks Nüske,
Friedrich Philipp,
Manuel Schaller,
Karl Worthmann
Abstract:
The Koopman operator has become an essential tool for data-driven analysis, prediction and control of complex systems, the main reason being the enormous potential of identifying linear function space representations of nonlinear dynamics from measurements. Until now, the situation where for large-scale systems, we (i) only have access to partial observations (i.e., measurements, as is very common for experimental data) or (ii) deliberately perform coarse graining (for efficiency reasons) has not been treated to its full extent. In this paper, we address the pitfall associated with this situation, that the classical EDMD algorithm does not automatically provide a Koopman operator approximation for the underlying system if we do not carefully select the number of observables. Moreover, we show that symmetries in the system dynamics can be carried over to the Koopman operator, which allows us to massively increase the model efficiency. We also briefly draw a connection to domain decomposition techniques for partial differential equations and present numerical evidence using the Kuramoto--Sivashinsky equation.
Submitted 28 July, 2023;
originally announced July 2023.
-
Efficient Approximation of Molecular Kinetics using Random Fourier Features
Authors:
Feliks Nüske,
Stefan Klus
Abstract:
Slow kinetic processes of molecular systems can be analyzed by computing dominant eigenpairs of the Koopman operator or its generator. In this context, the Variational Approach to Markov Processes (VAMP) provides a rigorous way of discerning the quality of different approximate models. Kernel methods have been shown to provide accurate and robust estimates for slow kinetic processes, but are sensitive to hyper-parameter selection, and require the solution of large-scale generalized eigenvalue problems, which can easily become computationally demanding for large data sizes. In this contribution, we employ a stochastic approximation of the kernel based on random Fourier features (RFFs), to derive a small-scale dual eigenvalue problem which can easily be solved. We provide an interpretation of this procedure in terms of a finite randomly generated basis set. By combining the RFF approach and model selection by means of the VAMP score, we show that kernel parameters can be efficiently tuned, and accurate estimates of slow molecular kinetics can be obtained for several benchmarking systems, such as deca alanine and the NTL9 protein.
Submitted 15 June, 2023; v1 submitted 1 June, 2023;
originally announced June 2023.
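As a rough illustration of the stochastic kernel approximation mentioned in this abstract, the sketch below builds random Fourier features for a Gaussian kernel and compares the resulting low-rank Gram matrix with the exact one. It shows only the standard feature construction, not the paper's dual eigenvalue problem or VAMP-based model selection; the bandwidth, sample sizes, and function names are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma):
    """Exact Gaussian kernel matrix k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

def random_fourier_features(X, W, b):
    """Map data X (m x d) to features z(x) = sqrt(2/D) * cos(W^T x + b)."""
    n_features = W.shape[1]
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
d, m, n_features, sigma = 3, 50, 5000, 1.5  # illustrative sizes only
X = rng.normal(size=(m, d))

# Frequencies are drawn from the kernel's spectral density N(0, sigma^-2 I),
# phases uniformly from [0, 2 pi).
W = rng.normal(scale=1.0 / sigma, size=(d, n_features))
b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

Z = random_fourier_features(X, W, b)
K_exact = gaussian_kernel(X, X, sigma)
K_approx = Z @ Z.T  # inner products of features approximate the kernel
max_err = np.max(np.abs(K_exact - K_approx))
print(f"max abs kernel error with {n_features} features: {max_err:.3f}")
```

The approximation error shrinks roughly like one over the square root of the number of features, which is why a moderate feature count can stand in for the full Gram matrix.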
-
Error bounds for kernel-based approximations of the Koopman operator
Authors:
Friedrich Philipp,
Manuel Schaller,
Karl Worthmann,
Sebastian Peitz,
Feliks Nüske
Abstract:
We consider the data-driven approximation of the Koopman operator for stochastic differential equations on reproducing kernel Hilbert spaces (RKHS). Our focus is on the estimation error if the data are collected from long-term ergodic simulations. We derive both an exact expression for the variance of the kernel cross-covariance operator, measured in the Hilbert-Schmidt norm, and probabilistic bounds for the finite-data estimation error. Moreover, we derive a bound on the prediction error of observables in the RKHS using a finite Mercer series expansion. Further, assuming Koopman-invariance of the RKHS, we provide bounds on the full approximation error. Numerical experiments using the Ornstein-Uhlenbeck process illustrate our results.
Submitted 19 December, 2023; v1 submitted 20 January, 2023;
originally announced January 2023.
-
Funnel Control for Langevin Dynamics
Authors:
Thomas Berger,
Feliks Nüske
Abstract:
We study tracking control for stochastic differential equations of Langevin type and describe a new conceptual approach to the sampling problem for those systems. The objective is to guarantee the evolution of the mean value in a prescribed performance funnel around a given sufficiently smooth reference signal. To achieve this objective we design a novel funnel controller and show its feasibility under certain structural conditions on the potential energy. The control design does not require any specific knowledge of the shape of the potential energy. We illustrate the results by a numerical simulation for a double-well potential.
Submitted 26 June, 2023; v1 submitted 14 November, 2022;
originally announced November 2022.
-
Towards reliable data-based optimal and predictive control using extended DMD
Authors:
Manuel Schaller,
Karl Worthmann,
Friedrich Philipp,
Sebastian Peitz,
Feliks Nüske
Abstract:
While Koopman-based techniques like extended Dynamic Mode Decomposition are nowadays ubiquitous in the data-driven approximation of dynamical systems, quantitative error estimates were only recently established. To this end, both sources of error resulting from a finite dictionary and only finitely-many data points in the generation of the surrogate model have to be taken into account. We generalize the rigorous analysis of the approximation error to the control setting while simultaneously reducing the impact of the curse of dimensionality by using a recently proposed bilinear approach. In particular, we establish uniform bounds on the approximation error of state-dependent quantities like constraints or a performance index enabling data-based optimal and predictive control with guarantees.
Submitted 13 November, 2022; v1 submitted 18 February, 2022;
originally announced February 2022.
-
Koopman analysis of quantum systems
Authors:
Stefan Klus,
Feliks Nüske,
Sebastian Peitz
Abstract:
Koopman operator theory has been successfully applied to problems from various research areas such as fluid dynamics, molecular dynamics, climate science, engineering, and biology. Applications include detecting metastable or coherent sets, coarse-graining, system identification, and control. There is an intricate connection between dynamical systems driven by stochastic differential equations and quantum mechanics. In this paper, we compare the ground-state transformation and Nelson's stochastic mechanics and demonstrate how data-driven methods developed for the approximation of the Koopman operator can be used to analyze quantum physics problems. Moreover, we exploit the relationship between Schrödinger operators and stochastic control problems to show that modern data-driven methods for stochastic control can be used to solve the stationary or imaginary-time Schrödinger equation. Our findings open up a new avenue towards solving Schrödinger's equation using recently developed tools from data science.
Submitted 28 June, 2022; v1 submitted 28 January, 2022;
originally announced January 2022.
-
tgEDMD: Approximation of the Kolmogorov Operator in Tensor Train Format
Authors:
Marvin Lücke,
Feliks Nüske
Abstract:
Extracting information about dynamical systems from models learned off simulation data has become an increasingly important research topic in the natural and engineering sciences. Modeling the Koopman operator semigroup has played a central role in this context. As the approximation quality of any such model critically depends on the basis set, recent work has focused on deriving data-efficient representations of the Koopman operator in low-rank tensor formats, enabling the use of powerful model classes while avoiding over-fitting. On the other hand, detailed information about the system at hand can be extracted from models for the infinitesimal generator, also called Kolmogorov backward operator for stochastic differential equations. In this work, we present a data-driven method to efficiently approximate the generator using the tensor train (TT) format. The centerpiece of the method is a TT representation of the tensor of generator evaluations at all data sites. We analyze consistency and complexity of the method, present extensions to practically relevant settings, and demonstrate its applicability to benchmark numerical examples.
Submitted 21 March, 2022; v1 submitted 18 November, 2021;
originally announced November 2021.
-
Finite-data error bounds for Koopman-based prediction and control
Authors:
Feliks Nüske,
Sebastian Peitz,
Friedrich Philipp,
Manuel Schaller,
Karl Worthmann
Abstract:
The Koopman operator has become an essential tool for data-driven approximation of dynamical (control) systems, e.g., via extended dynamic mode decomposition. Despite its popularity, convergence results and, in particular, error bounds are still scarce. In this paper, we derive probabilistic bounds for the approximation error and the prediction error depending on the number of training data points; for both ordinary and stochastic differential equations while using either ergodic trajectories or i.i.d. samples. We illustrate these bounds by means of an example with the Ornstein-Uhlenbeck process. Moreover, we extend our analysis to (stochastic) nonlinear control-affine systems. We prove error estimates for a previously proposed approach that exploits the linearity of the Koopman generator to obtain a bilinear surrogate control system and, thus, circumvents the curse of dimensionality since the system is not autonomized by augmenting the state by the control inputs. To the best of our knowledge, this is the first finite-data error analysis in the stochastic and/or control setting. Finally, we demonstrate the effectiveness of the bilinear approach by comparing it with state-of-the-art techniques showing its superiority whenever state and control are coupled.
Submitted 15 February, 2022; v1 submitted 16 August, 2021;
originally announced August 2021.
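For readers unfamiliar with extended dynamic mode decomposition, the least-squares step behind it can be sketched in a few lines. This is a toy example under stated assumptions, not the setting analyzed in the paper: the dynamics are a made-up deterministic linear map x -> a x, the dictionary `[x, x^2]` is hypothetical, and the samples are drawn i.i.d. from a uniform distribution.

```python
import numpy as np

def dictionary(x):
    """Hypothetical monomial dictionary psi(x) = [x, x^2] for scalar states."""
    return np.column_stack([x, x**2])

def edmd(X, Y, psi):
    """Least-squares EDMD estimate K with psi(Y) ~= psi(X) @ K."""
    PsiX, PsiY = psi(X), psi(Y)
    K, *_ = np.linalg.lstsq(PsiX, PsiY, rcond=None)
    return K

rng = np.random.default_rng(1)
a = 0.9                                # made-up contraction rate
X = rng.uniform(-1.0, 1.0, size=200)   # i.i.d. samples of the state
Y = a * X                              # one step of the toy dynamics
K = edmd(X, Y, dictionary)
print(np.round(K, 6))
```

Because the span of this dictionary is invariant under the toy dynamics, the estimate recovers diag(a, a^2) up to round-off; for a dictionary that is not invariant, or for noisy data, only an approximation is obtained, which is where finite-data error bounds become relevant.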
-
Symmetric and antisymmetric kernels for machine learning problems in quantum physics and chemistry
Authors:
Stefan Klus,
Patrick Gelß,
Feliks Nüske,
Frank Noé
Abstract:
We derive symmetric and antisymmetric kernels by symmetrizing and antisymmetrizing conventional kernels and analyze their properties. In particular, we compute the feature space dimensions of the resulting polynomial kernels, prove that the reproducing kernel Hilbert spaces induced by symmetric and antisymmetric Gaussian kernels are dense in the space of symmetric and antisymmetric functions, and propose a Slater determinant representation of the antisymmetric Gaussian kernel, which allows for an efficient evaluation even if the state space is high-dimensional. Furthermore, we show that by exploiting symmetries or antisymmetries the size of the training data set can be significantly reduced. The results are illustrated with guiding examples and simple quantum physics and chemistry applications.
Submitted 26 June, 2021; v1 submitted 31 March, 2021;
originally announced March 2021.
-
Kernel-based approximation of the Koopman generator and Schrödinger operator
Authors:
Stefan Klus,
Feliks Nüske,
Boumediene Hamzi
Abstract:
Many dimensionality and model reduction techniques rely on estimating dominant eigenfunctions of associated dynamical operators from data. Important examples include the Koopman operator and its generator, but also the Schrödinger operator. We propose a kernel-based method for the approximation of differential operators in reproducing kernel Hilbert spaces and show how eigenfunctions can be estimated by solving auxiliary matrix eigenvalue problems. The resulting algorithms are applied to molecular dynamics and quantum chemistry examples. Furthermore, we exploit that, under certain conditions, the Schrödinger operator can be transformed into a Kolmogorov backward operator corresponding to a drift-diffusion process and vice versa. This allows us to apply methods developed for the analysis of high-dimensional stochastic differential equations to quantum mechanical systems.
Submitted 25 December, 2020; v1 submitted 27 May, 2020;
originally announced May 2020.
-
Data-driven approximation of the Koopman generator: Model reduction, system identification, and control
Authors:
Stefan Klus,
Feliks Nüske,
Sebastian Peitz,
Jan-Hendrik Niemann,
Cecilia Clementi,
Christof Schütte
Abstract:
We derive a data-driven method for the approximation of the Koopman generator called gEDMD, which can be regarded as a straightforward extension of EDMD (extended dynamic mode decomposition). This approach is applicable to deterministic and stochastic dynamical systems. It can be used for computing eigenvalues, eigenfunctions, and modes of the generator and for system identification. In addition to learning the governing equations of deterministic systems, which then reduces to SINDy (sparse identification of nonlinear dynamics), it is possible to identify the drift and diffusion terms of stochastic differential equations from data. Moreover, we apply gEDMD to derive coarse-grained models of high-dimensional systems, and also to determine efficient model predictive control strategies. We highlight relationships with other methods and demonstrate the efficacy of the proposed methods using several guiding examples and prototypical molecular dynamics problems.
Submitted 13 February, 2020; v1 submitted 23 September, 2019;
originally announced September 2019.
-
Tensor-based computation of metastable and coherent sets
Authors:
Feliks Nüske,
Patrick Gelß,
Stefan Klus,
Cecilia Clementi
Abstract:
Recent years have seen rapid advances in the data-driven analysis of dynamical systems based on Koopman operator theory and related approaches. On the other hand, low-rank tensor product approximations -- in particular the tensor train (TT) format -- have become a valuable tool for the solution of large-scale problems in a number of fields. In this work, we combine Koopman-based models and the TT format, enabling their application to high-dimensional problems in conjunction with a rich set of basis functions or features. We derive efficient algorithms to obtain a reduced matrix representation of the system's evolution operator starting from an appropriate low-rank representation of the data. These algorithms can be applied to both stationary and non-stationary systems. We establish the infinite-data limit of these matrix representations, and demonstrate our methods' capabilities using several benchmark data sets.
Submitted 10 August, 2021; v1 submitted 12 August, 2019;
originally announced August 2019.
-
Coarse-graining Molecular Systems by Spectral Matching
Authors:
Feliks Nüske,
Lorenzo Boninsegna,
Cecilia Clementi
Abstract:
Coarse-graining has become an area of tremendous importance within many different research fields. For molecular simulation, coarse-graining bears the promise of finding simplified models such that long-time simulations of large-scale systems become computationally tractable. While significant progress has been made in tuning thermodynamic properties of reduced models, it remains a key challenge to ensure that relevant kinetic properties are retained by coarse-grained dynamical systems. In this study, we focus on data-driven methods to preserve the rare-event kinetics of the original system, and make use of their close connection to the low-lying spectrum of the system's generator. Building on work by Crommelin and Vanden-Eijnden, SIAM Multiscale Model. Simul. (2011), we present a general framework, called spectral matching, which directly targets the generator's leading eigenvalue equations when learning parameters for coarse-grained models. We discuss different parametric models for effective dynamics and derive the resulting data-based regression problems. We show that spectral matching can be used to learn effective potentials which retain the slow dynamics, but also to correct the dynamics induced by existing techniques, such as force matching.
Submitted 15 April, 2019;
originally announced April 2019.
-
Spectral Properties of Effective Dynamics from Conditional Expectations
Authors:
Feliks Nüske,
Péter Koltai,
Lorenzo Boninsegna,
Cecilia Clementi
Abstract:
The reduction of high-dimensional systems to effective models on a smaller set of variables is an essential task in many areas of science. For stochastic dynamics governed by diffusion processes, a general procedure to find effective equations is the conditioning approach. In this paper, we are interested in the spectrum of the generator of the resulting effective dynamics, and how it compares to the spectrum of the full generator. We prove a new relative error bound in terms of the eigenfunction approximation error for reversible systems. We also present numerical examples indicating that if Kramers--Moyal (KM) type approximations are used to compute the spectrum of the reduced generator, it seems largely insensitive to the time window used for the KM estimators. We analyze the implications of these observations for systems driven by underdamped Langevin dynamics, and show how meaningful effective dynamics can be defined in this setting.
Submitted 12 December, 2020; v1 submitted 6 January, 2019;
originally announced January 2019.
-
Sparse learning of stochastic dynamic equations
Authors:
Lorenzo Boninsegna,
Feliks Nüske,
Cecilia Clementi
Abstract:
With the rapid increase of available data for complex systems, there is great interest in the extraction of physically relevant information from massive datasets. Recently, a framework called Sparse Identification of Nonlinear Dynamics (SINDy) has been introduced to identify the governing equations of dynamical systems from simulation data. In this study, we extend SINDy to stochastic dynamical systems, which are frequently used to model biophysical processes. We prove the asymptotic correctness of stochastic SINDy in the infinite data limit, both in the original and projected variables. We discuss algorithms to solve the sparse regression problem arising from the practical implementation of SINDy, and show that cross validation is an essential tool to determine the right level of sparsity. We demonstrate the proposed methodology on two test systems, namely, the diffusion in a one-dimensional potential, and the projected dynamics of a two-dimensional diffusion process.
Submitted 6 December, 2017;
originally announced December 2017.
-
Data-driven model reduction and transfer operator approximation
Authors:
Stefan Klus,
Feliks Nüske,
Péter Koltai,
Hao Wu,
Ioannis Kevrekidis,
Christof Schütte,
Frank Noé
Abstract:
In this review paper, we will present different data-driven dimension reduction techniques for dynamical systems that are based on transfer operator theory as well as methods to approximate transfer operators and their eigenvalues, eigenfunctions, and eigenmodes. The goal is to point out similarities and differences between methods developed independently by the dynamical systems, fluid dynamics, and molecular dynamics communities such as time-lagged independent component analysis (TICA), dynamic mode decomposition (DMD), and their respective generalizations. As a result, extensions and best practices developed for one particular method can be carried over to other related methods.
Submitted 18 September, 2017; v1 submitted 29 March, 2017;
originally announced March 2017.
-
Markov State Models from short non-Equilibrium Simulations - Analysis and Correction of Estimation Bias
Authors:
Feliks Nüske,
Hao Wu,
Jan-Hendrik Prinz,
Christoph Wehmeyer,
Cecilia Clementi,
Frank Noé
Abstract:
Many state-of-the-art methods for the thermodynamic and kinetic characterization of large and complex biomolecular systems by simulation rely on ensemble approaches, where data from large numbers of relatively short trajectories are integrated. In this context, Markov state models (MSMs) are extremely popular because they can be used to compute stationary quantities and long-time kinetics from ensembles of short simulations, provided that these short simulations are in "local equilibrium" within the MSM states. However, in the more than 15 years since the inception of MSMs, it has remained controversial and unanswered how deviations from local equilibrium can be detected, whether these deviations induce a practical bias in MSM estimation, and how to correct for them. In this paper, we address these issues: we systematically analyze the estimation of MSMs from short non-equilibrium simulations, and we provide an expression for the error between unbiased transition probabilities and the expected estimate from many short simulations. We show that the unbiased MSM estimate can be obtained even from relatively short non-equilibrium simulations in the limit of long lag times and good discretization. Further, we exploit observable operator model (OOM) theory to derive an unbiased estimator for the MSM transition matrix that corrects for the effect of starting out of equilibrium, even when short lag times are used. Finally, we show how the OOM framework can be used to estimate the exact eigenvalues or relaxation timescales of the system without estimating an MSM transition matrix, which allows us to practically assess the discretization quality of the MSM. Applications to model systems and molecular dynamics simulation data of alanine dipeptide are included for illustration. The improved MSM estimator is implemented in PyEMMA as of version 2.3.
Submitted 6 January, 2017;
originally announced January 2017.
-
Variational Koopman models: slow collective variables and molecular kinetics from short off-equilibrium simulations
Authors:
Hao Wu,
Feliks Nüske,
Fabian Paul,
Stefan Klus,
Peter Koltai,
Frank Noé
Abstract:
Markov state models (MSMs) and Master equation models are popular approaches to approximate molecular kinetics, equilibria, metastable states, and reaction coordinates in terms of a state space discretization usually obtained by clustering. Recently, a powerful generalization of MSMs has been introduced, the variational approach (VA) of molecular kinetics and its special case, time-lagged independent component analysis (TICA), which allow us to approximate slow collective variables and molecular kinetics by linear combinations of smooth basis functions or order parameters. While it is known how to estimate MSMs from trajectories whose starting points are not sampled from an equilibrium ensemble, this has not yet been the case for TICA and the VA. Previous estimates from short trajectories have been strongly biased and thus not variationally optimal. Here, we employ Koopman operator theory and ideas from dynamic mode decomposition (DMD) to extend the VA and TICA to non-equilibrium data. The main insight is that the VA and TICA provide a coefficient matrix that we call the Koopman model, as it approximates the underlying dynamical (Koopman) operator in conjunction with the basis set used. This Koopman model can be used to compute a stationary vector to reweight the data to equilibrium. From such a Koopman-reweighted sample, equilibrium expectation values and variationally optimal reversible Koopman models can be constructed even with short simulations. The Koopman model can be used to propagate densities, and its eigenvalue decomposition provides estimates of relaxation timescales and slow collective variables for dimension reduction. Koopman models are generalizations of Markov state models, TICA, and the linear VA, and allow molecular kinetics to be described without a cluster discretization.
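The reweighting idea described above can be illustrated with a small sketch. It rests on assumptions not stated in the abstract: the Koopman model is taken to be the least-squares matrix K = C00^+ C0t built from instantaneous and time-lagged basis evaluations, a constant function is appended to the basis so that K has an eigenvalue 1, and the function name `koopman_weights` is hypothetical.

```python
import numpy as np

def koopman_weights(X0, Xt):
    """Estimate a Koopman matrix and equilibrium weights (illustrative sketch).

    X0, Xt: arrays of shape (n_samples, n_basis) holding basis-function values
    at times t and t + tau. A constant basis function is appended here.
    """
    n = X0.shape[0]
    ones = np.ones((n, 1))
    X0 = np.hstack([X0, ones])
    Xt = np.hstack([Xt, ones])
    C00 = X0.T @ X0 / n            # instantaneous covariance (empirical weights)
    C0t = X0.T @ Xt / n            # time-lagged covariance
    K = np.linalg.pinv(C00) @ C0t  # least-squares Koopman matrix
    # Stationary vector: eigenvector of K^T for the eigenvalue closest to 1.
    evals, evecs = np.linalg.eig(K.T)
    u = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    w = X0 @ u                     # one weight per sample
    return K, w / w.sum()          # weights normalized to sum to 1

# Toy data: an AR(1) process with basis functions x and x^2.
rng = np.random.default_rng(0)
x = np.zeros(501)
for k in range(500):
    x[k + 1] = 0.9 * x[k] + rng.normal()
F = np.column_stack([x, x**2])
K, w = koopman_weights(F[:-1], F[1:])
```

Weighted averages such as `np.sum(w * g)` for sample values `g` then serve as reweighted estimates of equilibrium expectations; with finite data some weights can come out negative, which this sketch does not handle.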
Submitted 22 January, 2017; v1 submitted 20 October, 2016;
originally announced October 2016.
-
A variational approach to modeling slow processes in stochastic dynamical systems
Authors:
Frank Noé,
Feliks Nüske
Abstract:
The slow processes of metastable stochastic dynamical systems are difficult to access by direct numerical simulation due to the sampling problem. Here, we suggest an approach for modeling the slow parts of Markov processes by approximating the dominant eigenfunctions and eigenvalues of the propagator. To this end, a variational principle is derived that is based on the maximization of a Rayleigh coefficient. It is shown that this Rayleigh coefficient can be estimated from statistical observables that can be obtained from short distributed simulations starting from different parts of state space. The approach forms a basis for the development of adaptive and efficient computational algorithms for simulating and analyzing metastable Markov processes while avoiding the sampling problem. Since any stochastic process with finite memory can be transformed into a Markov process, the approach is applicable to a wide range of processes relevant for modeling complex real-world phenomena.
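As a hedged illustration of the kind of variational principle meant here, stated in notation that is assumed rather than taken from the abstract: suppose the process is reversible with stationary density $\pi$, let $\mathcal{T}_\tau$ denote the transfer operator at lag time $\tau$, self-adjoint with respect to $\langle f, g\rangle_\pi = \int f(x)\,g(x)\,\pi(x)\,dx$, and let its eigenvalues be $1 = \lambda_1 > \lambda_2 \ge \lambda_3 \ge \dots$. Then for every test function $f$ with $\langle f, 1\rangle_\pi = 0$,

$$
\frac{\langle f, \mathcal{T}_\tau f\rangle_\pi}{\langle f, f\rangle_\pi} \;\le\; \lambda_2(\tau),
$$

with equality exactly when $f$ is an eigenfunction for $\lambda_2$. The numerator is a time-lagged autocorrelation, $\langle f, \mathcal{T}_\tau f\rangle_\pi = \mathbb{E}_\pi\!\left[f(X_t)\,f(X_{t+\tau})\right]$, which is why the quotient can be estimated from simulation data and maximized over a chosen family of test functions.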
Submitted 29 November, 2012;
originally announced November 2012.