Choice of adaptive sampling strategy impacts state discovery, transition probabilities, and the apparent mechanism of conformational changes
Authors:
Maxwell I. Zimmerman,
Justin R. Porter,
Xianqiang Sun,
Roseane R. Silva,
Gregory R. Bowman
Abstract:
Interest in equilibrium-based sampling methods has grown with recent advances in computational hardware and Markov state modeling (MSM) methods, yet outstanding questions remain that hinder widespread adoption. Namely, how do sampling strategies explore conformational space, and how might this influence predictions? Here, we seek to answer these questions for four commonly used sampling methods: 1) a long simulation, 2) many short simulations, 3) adaptive sampling, and 4) FAST. We first develop a theoretical framework for analytically calculating the probability of discovering states and uncover the drastic effects of varying the number and length of simulations. We then use kinetic Monte Carlo simulations on a variety of physically inspired landscapes to characterize state discovery and transition pathways. Consistently, we find that FAST simulations discover target states with the highest probability and traverse realistic pathways. Furthermore, we uncover a pathology, which we refer to as pathway tunneling: short parallel simulations sometimes predict an incorrect transition pathway by crossing large energy barriers that long simulations would typically circumnavigate. To protect against tunneling, we introduce FAST-string, which samples along the highest-flux transition paths to refine an MSM's transition probabilities and discriminate between competing pathways. Additionally, we compare MSM estimators in describing thermodynamics and kinetics. For adaptive sampling, we recommend normalizing the transition counts out of each state after adding pseudo-counts to avoid creating sources or sinks. Lastly, we evaluate our insights from simple landscapes with all-atom molecular dynamics simulations of the folding of the λ-repressor protein. We find that FAST-contacts predicts the same folding pathway as long simulations but with orders of magnitude less simulation time.
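The estimator recommendation in the abstract (add pseudo-counts, then normalize the transition counts out of each state) can be illustrated with a minimal sketch. The function name, the example count matrix, and the default pseudo-count of 1/n_states are assumptions made for illustration only; the abstract does not specify the pseudo-count value or any implementation details.

```python
import numpy as np


def row_normalized_transition_matrix(counts, pseudo_count=None):
    """Estimate transition probabilities by row-normalizing padded counts.

    counts[i, j] is the number of observed transitions from state i to j.
    The default pseudo_count of 1 / n_states is an assumed prior, not a
    value taken from the abstract.
    """
    counts = np.asarray(counts, dtype=float)
    n_states = counts.shape[0]
    if pseudo_count is None:
        pseudo_count = 1.0 / n_states
    # Pad every entry so that no state is left without outgoing transitions.
    padded = counts + pseudo_count
    # Normalize each row so the probabilities out of each state sum to 1.
    return padded / padded.sum(axis=1, keepdims=True)


# Hypothetical counts: state 2 was discovered but never simulated from,
# so its row of observed outgoing transitions is empty.
counts = np.array([
    [90, 10, 0],
    [5, 80, 15],
    [0, 0, 0],
])
T = row_normalized_transition_matrix(counts)
```

In this sketch the unsampled state receives a uniform row of outgoing probabilities rather than an undefined one, so it cannot act as an absorbing sink in the resulting model.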
Submitted 11 May, 2018;
originally announced May 2018.