-
Data-driven dynamical coarse-graining for condensed matter systems
Authors:
Mauricio J. del Razo,
Daan Crommelin,
Peter G. Bolhuis
Abstract:
Simulations of condensed matter systems often focus on the dynamics of a few distinguished components but require integrating the dynamics of the full system. A prime example is a molecular dynamics simulation of a (macro)molecule in solution, where the dynamics of both the molecule(s) and the solvent need to be integrated. This renders the simulations computationally costly and often infeasible for physically or biologically relevant time scales. Standard coarse-graining approaches are capable of reproducing equilibrium distributions and structural features but do not properly include the dynamics. In this work, we develop a stochastic data-driven coarse-graining method inspired by the Mori-Zwanzig formalism. This formalism shows that macroscopic systems with a large number of degrees of freedom can in principle be well described by a small number of relevant variables plus additional noise and memory terms. Our coarse-graining method consists of numerical integrators for the distinguished components of the system, where the noise and interaction terms with other system components are substituted by a random variable sampled from a data-driven model. Applying our methodology to three different systems -- a distinguished particle under a harmonic potential and under a bistable potential; and a dimer with two metastable configurations -- we show that the resulting coarse-grained models are capable of reproducing not only the correct equilibrium distributions but also the dynamic behavior due to temporal correlations and memory effects. Our coarse-graining method requires data from full-scale simulations to be parametrized, and can in principle be extended to different types of models beyond Langevin dynamics.
Submitted 30 June, 2023;
originally announced June 2023.
-
Comparison of neural closure models for discretised PDEs
Authors:
Hugo Melchers,
Daan Crommelin,
Barry Koren,
Vlado Menkovski,
Benjamin Sanderse
Abstract:
Neural closure models have recently been proposed as a method for efficiently approximating small scales in multiscale systems with neural networks. The choice of loss function and associated training procedure has a large effect on the accuracy and stability of the resulting neural closure model. In this work, we systematically compare three distinct procedures: "derivative fitting", "trajectory fitting" with discretise-then-optimise, and "trajectory fitting" with optimise-then-discretise. Derivative fitting is conceptually the simplest and computationally the most efficient approach and is found to perform reasonably well on one of the test problems (Kuramoto-Sivashinsky) but poorly on the other (Burgers). Trajectory fitting is computationally more expensive but is more robust and is therefore the preferred approach. Of the two trajectory fitting procedures, the discretise-then-optimise approach produces more accurate models than the optimise-then-discretise approach. While the optimise-then-discretise approach can still produce accurate models, care must be taken in choosing the length of the trajectories used for training, in order to train the models on long-term behaviour while still producing reasonably accurate gradients during training. Two existing theorems are interpreted in a novel way that gives insight into the long-term accuracy of a neural closure model based on how accurate it is in the short term.
Submitted 18 May, 2023; v1 submitted 26 October, 2022;
originally announced October 2022.
-
VECMAtk: A Scalable Verification, Validation and Uncertainty Quantification Toolkit for Scientific Simulations
Authors:
D. Groen,
H. Arabnejad,
V. Jancauskas,
W. N. Edeling,
F. Jansson,
R. A. Richardson,
J. Lakhlili,
L. Veen,
B. Bosak,
P. Kopta,
D. W. Wright,
N. Monnier,
P. Karlshoefer,
D. Suleimenova,
R. Sinclair,
M. Vassaux,
A. Nikishova,
M. Bieniek,
O. O. Luk,
M. Kulczewski,
E. Raffin,
D. Crommelin,
O. Hoenen,
D. P. Coster,
T. Piontek
, et al. (1 additional author not shown)
Abstract:
We present the VECMA toolkit (VECMAtk), a flexible software environment for single and multiscale simulations that introduces directly applicable and reusable procedures for verification, validation (V&V), sensitivity analysis (SA) and uncertainty quantification (UQ). It enables users to verify key aspects of their applications, systematically compare and validate the simulation outputs against observational or benchmark data, and run simulations conveniently on any platform from the desktop to current multi-petascale computers. In this sequel to our paper on VECMAtk which we presented last year, we focus on a range of functional and performance improvements that we have introduced, and cover newly introduced components and application examples from seven different domains, such as conflict modelling and environmental sciences. We also present several implemented patterns for UQ/SA and V&V, and guide the reader through one example concerning COVID-19 modelling in detail.
Submitted 11 October, 2020; v1 submitted 8 October, 2020;
originally announced October 2020.
-
Stochastic parameterization with VARX processes
Authors:
Nick Verheul,
Daan Crommelin
Abstract:
In this study we investigate a data-driven stochastic methodology to parameterize small-scale features in a prototype multiscale dynamical system, the Lorenz '96 (L96) model. We propose to model the small-scale features using a vector autoregressive process with exogenous variable (VARX), estimated from given sample data. To reduce the number of parameters of the VARX we impose a diagonal structure on its coefficient matrices. We apply the VARX to two different configurations of the 2-layer L96 model, one with common parameter choices giving unimodal invariant probability distributions for the L96 model variables, and one with non-standard parameters giving trimodal distributions. We show through various statistical criteria that the proposed VARX performs very well for the unimodal configuration, while keeping the number of parameters linear in the number of model variables. We also show that the parameterization performs accurately for the very challenging trimodal L96 configuration by allowing for a dense (non-diagonal) VARX covariance matrix.
Submitted 7 October, 2020;
originally announced October 2020.
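Illustration (not taken from the paper): a minimal sketch of the diagonal-VARX idea described in the abstract above. With diagonal coefficient matrices, each component reduces to a scalar autoregression with an exogenous input, which can be fitted by ordinary least squares. The model order (1), the synthetic data standing in for L96 output, and the names `fit_arx1`, `fit_diagonal_varx1` and `simulate_arx1` are all assumptions made for this example.

```python
import random


def fit_arx1(r, x):
    """Least-squares fit of r[t+1] = a*r[t] + b*x[t] + noise; returns (a, b)."""
    s_rr = s_rx = s_xx = s_ry = s_xy = 0.0
    for t in range(len(r) - 1):
        s_rr += r[t] * r[t]
        s_rx += r[t] * x[t]
        s_xx += x[t] * x[t]
        s_ry += r[t] * r[t + 1]
        s_xy += x[t] * r[t + 1]
    # Solve the 2x2 normal equations by Cramer's rule.
    det = s_rr * s_xx - s_rx * s_rx
    a = (s_ry * s_xx - s_rx * s_xy) / det
    b = (s_rr * s_xy - s_rx * s_ry) / det
    return a, b


def fit_diagonal_varx1(r_series, x_series):
    """Diagonal VARX(1): fit every component independently of the others."""
    return [fit_arx1(r, x) for r, x in zip(r_series, x_series)]


def simulate_arx1(a, b, noise_std, n_steps, rng):
    """Synthetic data (hypothetical stand-in for resolved/unresolved variables)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n_steps)]
    r = [0.0]
    for t in range(n_steps - 1):
        r.append(a * r[t] + b * x[t] + noise_std * rng.gauss(0.0, 1.0))
    return r, x


rng = random.Random(1)
true_params = [(0.8, 0.5), (0.3, -1.0)]
data = [simulate_arx1(a, b, 0.1, 5000, rng) for a, b in true_params]
fitted = fit_diagonal_varx1([r for r, _ in data], [x for _, x in data])
print(fitted)
```

The number of fitted coefficients grows linearly with the number of components here, which is the property the abstract attributes to the diagonal structure.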
-
Resampling with neural networks for stochastic parameterization in multiscale systems
Authors:
Daan Crommelin,
Wouter Edeling
Abstract:
In simulations of multiscale dynamical systems, not all relevant processes can be resolved explicitly. Taking the effect of the unresolved processes into account is important, which introduces the need for parameterizations. We present a machine-learning method for the conditional resampling of observations or reference data from a fully resolved simulation. It is based on the probabilistic classification of subsets of reference data, conditioned on macroscopic variables. This method is used to formulate a parameterization that is stochastic, taking the uncertainty of the unresolved scales into account. We validate our approach on the Lorenz 96 system, using two different parameter settings which are challenging for parameterization methods.
Submitted 3 April, 2020;
originally announced April 2020.
-
Computing first passage times for Markov-modulated fluid models using numerical PDE problem solvers
Authors:
Debarati Bhaumik,
Marko A. A. Boon,
Daan Crommelin,
Barry Koren,
Bert Zwart
Abstract:
A popular method to compute first-passage probabilities in continuous-time Markov chains is by numerically inverting their Laplace transforms. Over the past decades, the scientific computing community has developed excellent numerical methods for solving problems governed by partial differential equations (PDEs), making the availability of a Laplace transform not necessary here for computational purposes. In this study we demonstrate that numerical PDE problem solvers are suitable for computing first passage times, and can be very efficient for this purpose. By doing extensive computational experiments, we show that modern PDE problem solvers can outperform numerical Laplace transform inversion, even if a transform is available. When the Laplace transform is explicit (e.g. does not require the computation of an eigensystem), numerical transform inversion remains the primary method of choice.
Submitted 30 March, 2020;
originally announced March 2020.
-
Efficient estimation of divergence-based sensitivity indices with Gaussian process surrogates
Authors:
A. W. Eggels,
D. T. Crommelin
Abstract:
We consider the estimation of sensitivity indices based on divergence measures such as Hellinger distance. For sensitivity analysis of complex models, these divergence-based indices can be estimated by Monte-Carlo sampling (MCS) in combination with kernel density estimation (KDE). In a direct approach, the complex model must be evaluated at every input point generated by MCS, resulting in samples in the input-output space that can be used for density estimation. However, if the computational cost of the complex model strongly limits the number of model evaluations, this direct method gives large errors. We propose to use Gaussian process (GP) surrogates to increase the number of samples in the combined input-output space. By enlarging this sample set, the KDE becomes more accurate, leading to improved estimates. To compare the GP surrogates, we use a surrogate constructed by samples obtained with stochastic collocation, combined with Lagrange interpolation. Furthermore, we propose a new estimation method for these sensitivity indices based on minimum spanning trees. Finally, we also propose a new type of sensitivity indices based on divergence measures, namely direct sensitivity indices. These are useful when the input data is dependent.
Submitted 17 September, 2019; v1 submitted 8 April, 2019;
originally announced April 2019.
-
Rare Event Simulation for Steady-State Probabilities via Recurrency Cycles
Authors:
Krzysztof Bisewski,
Daan Crommelin,
Michel Mandjes
Abstract:
We develop a new algorithm for the estimation of rare event probabilities associated with the steady-state of a Markov stochastic process with continuous state space $\mathbb R^d$ and discrete time steps (i.e. a discrete-time $\mathbb R^d$-valued Markov chain). The algorithm, which we coin Recurrent Multilevel Splitting (RMS), relies on the Markov chain's underlying recurrent structure, in combination with the Multilevel Splitting method. Extensive simulation experiments are performed, including experiments with a nonlinear stochastic model that has some characteristics of complex climate models. The numerical experiments show that RMS can boost the computational efficiency by several orders of magnitude compared to the Monte Carlo method.
Submitted 5 April, 2019;
originally announced April 2019.
-
Quantifying dependencies for sensitivity analysis with multivariate input sample data
Authors:
A. W. Eggels,
D. T. Crommelin
Abstract:
We present a novel method for quantifying dependencies in multivariate datasets, based on estimating the Rényi entropy by minimum spanning trees (MSTs). The length of the MSTs can be used to order pairs of variables from strongly to weakly dependent, making it a useful tool for sensitivity analysis with dependent input variables. It is well-suited for cases where the input distribution is unknown and only a sample of the inputs is available. We introduce an estimator to quantify dependency based on the MST length, and investigate its properties with several numerical examples. To reduce the computational cost of constructing the exact MST for large datasets, we explore methods to compute approximations to the exact MST, and find the multilevel approach introduced recently by Zhong et al. (2015) to be the most accurate. We apply our proposed method to an artificial testcase based on the Ishigami function, as well as to a real-world testcase involving sediment transport in the North Sea. The results are consistent with prior knowledge and heuristic understanding, as well as with variance-based analysis using Sobol indices in the case where these indices can be computed.
Submitted 6 February, 2018;
originally announced February 2018.
-
Controlling the time discretization bias for the supremum of Brownian Motion
Authors:
Krzysztof Bisewski,
Daan Crommelin,
Michel Mandjes
Abstract:
We consider the bias arising from time discretization when estimating the threshold crossing probability $w(b) := \mathbb{P}(\sup_{t\in[0,1]} B_t > b)$, with $(B_t)_{t\in[0,1]}$ a standard Brownian Motion. We prove that if the discretization is equidistant, then to reach a given target value of the relative bias, the number of grid points has to grow quadratically in $b$, as $b$ grows. When considering non-equidistant discretizations (with threshold-dependent grid points), we can substantially improve on this: we show that for such grids the required number of grid points is independent of $b$, and in addition we point out how they can be used to construct a strongly efficient algorithm for the estimation of $w(b)$. Finally, we show how to apply the resulting algorithm for a broad class of stochastic processes; it is empirically shown that the threshold-dependent grid significantly outperforms its equidistant counterpart.
Submitted 13 September, 2017; v1 submitted 18 May, 2017;
originally announced May 2017.
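Illustration (not taken from the paper): a minimal sketch of the time-discretization bias discussed in the abstract above. For standard Brownian motion the reflection principle gives the exact value $w(b) = 2(1 - \Phi(b))$, while a path monitored only on a grid can miss crossings between grid points, so the grid-based probability can only be smaller. The sketch uses an equidistant grid only; it does not implement the threshold-dependent grids of the paper, and the function names, sample sizes and seed are assumptions.

```python
import math
import random


def exact_crossing_probability(b):
    """w(b) = P(sup_{t in [0,1]} B_t > b) = 2 * (1 - Phi(b)) = erfc(b / sqrt(2))."""
    return math.erfc(b / math.sqrt(2.0))


def discretized_crossing_probability(b, n_grid, n_paths, rng):
    """Monte Carlo estimate of P(max_k B_{k/n_grid} > b) on an equidistant grid."""
    step_std = math.sqrt(1.0 / n_grid)
    hits = 0
    for _ in range(n_paths):
        position = 0.0
        for _ in range(n_grid):
            position += rng.gauss(0.0, step_std)  # independent Gaussian increment
            if position > b:
                hits += 1
                break  # the threshold was crossed on the grid
    return hits / n_paths


rng = random.Random(0)
b = 1.0
exact = exact_crossing_probability(b)
coarse = discretized_crossing_probability(b, 10, 5000, rng)
fine = discretized_crossing_probability(b, 100, 5000, rng)
print(exact, coarse, fine)
```

Refining the grid moves the estimate towards the exact value, at a cost proportional to the number of grid points per path.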
-
Clustering-based collocation for uncertainty propagation with multivariate dependent inputs
Authors:
A. W. Eggels,
D. T. Crommelin,
J. A. S. Witteveen
Abstract:
In this article, we propose the use of partitioning and clustering methods as an alternative to Gaussian quadrature for stochastic collocation. The key idea is to use cluster centers as the nodes for collocation. In this way, we can extend the use of collocation methods to uncertainty propagation with multivariate, dependent input, in which the output approximation is piecewise constant on the clusters. The approach is particularly useful in situations where the probability distribution of the input is unknown, and only a sample from the input distribution is available. We examine several clustering methods and assess the convergence of collocation based on these methods both theoretically and numerically. We demonstrate good performance of the proposed methods, most notably for the challenging case of nonlinearly dependent inputs in higher dimensions. Numerical tests with input dimension up to 16 are included, using as benchmarks the Genz test functions and a test case from computational fluid dynamics (lid-driven cavity flow).
Submitted 15 April, 2019; v1 submitted 8 March, 2017;
originally announced March 2017.
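Illustration (not taken from the paper): a minimal sketch of the key idea stated in the abstract above, using cluster centers as collocation nodes. Here plain k-means (Lloyd's algorithm) is assumed as the clustering method, the node weights are taken to be the fraction of samples in each cluster, and the mean of the output is approximated by the weighted sum of the model evaluated at the centers. The test functions, the synthetic dependent input and all names are hypothetical.

```python
import random


def kmeans_nodes(points, k, n_iter, rng):
    """Lloyd's algorithm; returns (centers, weights), weights = cluster fractions."""
    centers = rng.sample(points, k)
    counts = [0] * k
    dim = len(points[0])
    for _ in range(n_iter):
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p in points:
            # Assign the sample to its nearest center (squared Euclidean distance).
            j = min(range(k),
                    key=lambda i: sum((a - c) ** 2 for a, c in zip(p, centers[i])))
            counts[j] += 1
            for d, a in enumerate(p):
                sums[j][d] += a
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [tuple(s / counts[i] for s in sums[i]) if counts[i] else centers[i]
                   for i in range(k)]
    weights = [c / len(points) for c in counts]
    return centers, weights


def collocation_mean(f, centers, weights):
    """Approximate E[f(X)] by a weighted sum over the cluster centers."""
    return sum(w * f(c) for c, w in zip(centers, weights))


# Nonlinearly dependent 2D input, known only through a sample.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
points = [(x, x * x + 0.05 * rng.gauss(0.0, 1.0)) for x in xs]
centers, weights = kmeans_nodes(points, 10, 10, rng)


def linear(p):
    return p[0] + p[1]


def product(p):
    return p[0] * p[1]


sample_mean_linear = sum(linear(p) for p in points) / len(points)
sample_mean_product = sum(product(p) for p in points) / len(points)
print(collocation_mean(linear, centers, weights), sample_mean_linear)
print(collocation_mean(product, centers, weights), sample_mean_product)
```

In this sketch the model is evaluated only 10 times instead of once per sample. Because the centers are cluster means, the estimate matches the sample mean up to rounding for a linear output and is only approximate otherwise.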
-
Stochastic Climate Theory
Authors:
Georg A. Gottwald,
Daan T. Crommelin,
Christian L. E. Franzke
Abstract:
In this chapter we review stochastic modelling methods in climate science. First we provide a conceptual framework for stochastic modelling of deterministic dynamical systems based on the Mori-Zwanzig formalism. The Mori-Zwanzig equations contain a Markov term, a memory term and a term suggestive of stochastic noise. Within this framework we express standard model reduction methods such as averaging and homogenization which eliminate the memory term. We further discuss ways to deal with the memory term and how the type of noise depends on the underlying deterministic chaotic system. Secondly, we review current approaches in stochastic data-driven models. We discuss how the drift and diffusion coefficients of models in the form of stochastic differential equations can be estimated from observational data. We pay attention to situations where the data stems from multiscale systems, a relevant topic in the context of data from the climate system. Furthermore, we discuss the use of discrete stochastic processes (Markov chains) for e.g. stochastic subgrid-scale modeling and other topics in climate science.
Submitted 22 December, 2016;
originally announced December 2016.
-
Stochastic Parameterization: Towards a new view of Weather and Climate Models
Authors:
Judith Berner,
Ulrich Achatz,
Lauriane Batte,
Lisa Bengtsson,
Alvaro De La Camara,
Daan Crommelin,
Hannah Christensen,
Matteo Colangeli,
Stamen Dolaptchiev,
Christian L. E. Franzke,
Petra Friederichs,
Peter Imkeller,
Heikki Jarvinen,
Stephan Juricke,
Vassili Kitsios,
François Lott,
Valerio Lucarini,
Salil Mahajan,
Timothy N. Palmer,
Cecile Penland,
Jin-Song Von Storch,
Mirjana Sakradzija,
Michael Weniger,
Antje Weisheimer,
Paul D. Williams
, et al. (1 additional author not shown)
Abstract:
The last decade has seen the success of stochastic parameterizations in short-term, medium-range and seasonal forecasts: operational weather centers now routinely use stochastic parameterization schemes to better represent model inadequacy and improve the quantification of forecast uncertainty. Developed initially for numerical weather prediction, the inclusion of stochastic parameterizations not only provides better estimates of uncertainty, but it is also extremely promising for reducing longstanding climate biases and relevant for determining the climate response to external forcing. This article highlights recent developments from different research groups which show that the stochastic representation of unresolved processes in the atmosphere, oceans, land surface and cryosphere of comprehensive weather and climate models (a) gives rise to more reliable probabilistic forecasts of weather and climate and (b) reduces systematic model bias. We make a case that the use of mathematically stringent methods for the derivation of stochastic dynamic equations will lead to substantial improvements in our ability to accurately simulate weather and climate at all scales. Recent work in mathematics, statistical mechanics and turbulence is reviewed, its relevance for the climate problem demonstrated, and future research directions outlined.
Submitted 20 March, 2017; v1 submitted 29 October, 2015;
originally announced October 2015.