-
RiteWeight: Randomized Iterative Trajectory Reweighting for Steady-State Distributions Without Discretization Error
Authors:
Sagar Kania,
David Aristoff,
Daniel M. Zuckerman
Abstract:
Molecular dynamics (MD) and enhanced sampling simulations have become fundamental tools for studying biomolecular events. A significant challenge in these simulations is ensuring that sampled configurations and transitions converge to the stationary distribution of interest, whether equilibrium or nonequilibrium. Lack of convergence constrains the estimation of mechanisms, free energy, and rates of complex molecular events. Here, we introduce the "Randomized Iterative Trajectory Reweighting" (RiteWeight) algorithm to estimate a stationary distribution from unconverged simulation data. This method iteratively reweights trajectories in a self-consistent way by solving for the stationary distribution using a discrete-state transition matrix, employing a new random clustering in each iteration. The iterative random clustering mitigates the phase-space discretization error inherent in existing trajectory reweighting techniques based on one-shot clustering and ultimately yields numerically unbiased, quasi-continuous configuration-space distributions and estimates of observables. We demonstrate the efficacy of RiteWeight using Trp-Cage synthetic MD trajectories.
Submitted 10 January, 2024;
originally announced January 2024.
-
Extractors for Images of Varieties
Authors:
Zeyu Guo,
Ben Lee Volk,
Akhil Jalan,
David Zuckerman
Abstract:
We construct explicit deterministic extractors for polynomial images of varieties, that is, distributions sampled by applying a low-degree polynomial map $f : \mathbb{F}_q^r \to \mathbb{F}_q^n$ to an element sampled uniformly at random from a $k$-dimensional variety $V \subseteq \mathbb{F}_q^r$. This class of sources generalizes both polynomial sources, studied by Dvir, Gabizon and Wigderson (FOCS 2007, Comput. Complex. 2009), and variety sources, studied by Dvir (CCC 2009, Comput. Complex. 2012).
Assuming certain natural non-degeneracy conditions on the map $f$ and the variety $V$, which in particular ensure that the source has enough min-entropy, we extract almost all the min-entropy of the distribution. Unlike the Dvir-Gabizon-Wigderson and Dvir results, our construction works over large enough finite fields of arbitrary characteristic. One key part of our construction is an improved deterministic rank extractor for varieties. As a by-product, we obtain explicit Noether normalization lemmas for affine varieties and affine algebras.
Additionally, we generalize a construction of affine extractors with exponentially small error due to Bourgain, Dvir and Leeman (Comput. Complex. 2016) by extending it to all finite prime fields of quasipolynomial size.
Submitted 14 January, 2023; v1 submitted 26 November, 2022;
originally announced November 2022.
-
Weighted ensemble: Recent mathematical developments
Authors:
D. Aristoff,
J. Copperman,
G. Simpson,
R. J. Webber,
D. M. Zuckerman
Abstract:
The weighted ensemble (WE) method, an enhanced sampling approach based on periodically replicating and pruning trajectories in a set of parallel simulations, has grown increasingly popular for computational biochemistry problems, due in part to improved hardware and the availability of modern software. Algorithmic and analytical improvements have also played an important role, and progress has accelerated in recent years. Here, we discuss and elaborate on the WE method from a mathematical perspective, highlighting recent results which have begun to yield greater computational efficiency. Notable among these innovations are variance reduction approaches that optimize trajectory management for systems of arbitrary dimensionality.
Submitted 29 June, 2022;
originally announced June 2022.
-
Simple synthetic molecular dynamics for efficient trajectory generation
Authors:
John D. Russo,
Daniel M. Zuckerman
Abstract:
Synthetic molecular dynamics (synMD) trajectories from learned generative models have been proposed as a useful addition to the biomolecular simulation toolbox. The computational expense of explicitly integrating the equations of motion in molecular dynamics currently is a severe limit on the number and length of trajectories which can be generated for complex systems. Approximate, but more computationally efficient, generative models can be used in place of explicit integration of the equations of motion, and can produce meaningful trajectories at greatly reduced computational cost. Here, we demonstrate a very simple synMD approach using a fine-grained Markov state model (MSM) with states mapped to specific atomistic configurations, which provides an exactly solvable reference. We anticipate this simple approach will enable rapid, effective testing of enhanced sampling algorithms in highly non-trivial models for both equilibrium and non-equilibrium problems. We demonstrate the use of a MSM to generate atomistic synMD trajectories for the fast-folding miniprotein Trp-cage, at a rate of over 200 milliseconds per day on a standard workstation. We employ a non-standard clustering for MSM generation that appears to better preserve kinetic properties at shorter lag times than a conventional MSM. We also show a parallelizable workflow that backmaps discrete synMD trajectories to full-coordinate representations at dynamic resolution for efficient analysis.
Submitted 4 May, 2022; v1 submitted 8 April, 2022;
originally announced April 2022.
-
Unbiased estimation of equilibrium, rates, and committors from Markov state model analysis
Authors:
John D. Russo,
Jeremy Copperman,
David Aristoff,
Gideon Simpson,
Daniel M. Zuckerman
Abstract:
Markov state models (MSMs) have been broadly adopted for analyzing molecular dynamics trajectories, but the approximate nature of the models that results from coarse-graining into discrete states is a long-known limitation. We show theoretically that, despite the coarse graining, in principle MSM-like analysis can yield unbiased estimation of key observables. We describe unbiased estimators for equilibrium state populations, for the mean first-passage time (MFPT) of an arbitrary process, and for state committors - i.e., splitting probabilities. Generically, the estimators are only asymptotically unbiased but we describe how extension of a recently proposed reweighting scheme can accelerate relaxation to unbiased values. Exactly accounting for 'sliding window' averaging over finite-length trajectories is a key, novel element of our analysis. In general, our analysis indicates that coarse-grained MSMs are asymptotically unbiased for steady-state properties only when appropriate boundary conditions (e.g., source-sink for MFPT estimation) are applied directly to trajectories, prior to calculation of the appropriate transition matrix.
Submitted 27 May, 2021;
originally announced May 2021.
-
A gentle introduction to the non-equilibrium physics of trajectories: Theory, algorithms, and biomolecular applications
Authors:
Daniel M. Zuckerman,
John D. Russo
Abstract:
Despite the importance of non-equilibrium statistical mechanics in modern physics and related fields, the topic is often omitted from undergraduate and core-graduate curricula. Key aspects of non-equilibrium physics, however, can be understood with a minimum of formalism based on a rigorous trajectory picture. The fundamental object is the ensemble of trajectories, a set of independent time-evolving systems that easily can be visualized or simulated (for protein folding, e.g.), and which can be analyzed rigorously in analogy to an ensemble of static system configurations. The trajectory picture provides a straightforward basis for understanding first-passage times, "mechanisms" in complex systems, and fundamental constraints on the apparent reversibility of complex processes. Trajectories make concrete the physics underlying the diffusion and Fokker-Planck partial differential equations. Last but not least, trajectory ensembles underpin some of the most important algorithms which have provided significant advances in biomolecular studies of protein conformational and binding processes.
Submitted 20 July, 2021; v1 submitted 23 October, 2020;
originally announced October 2020.
-
Iterative trajectory reweighting for estimation of equilibrium and non-equilibrium observables
Authors:
John D. Russo,
Jeremy Copperman,
Daniel M. Zuckerman
Abstract:
We present two algorithms by which a set of short, unbiased trajectories can be iteratively reweighted to obtain various observables. The first algorithm estimates the stationary (steady state) distribution of a system by iteratively reweighting the trajectories based on the average probability in each state. The algorithm applies to equilibrium or non-equilibrium steady states, exploiting the `left' stationarity of the distribution under dynamics -- i.e., in a discrete setting, when the column vector of probabilities is multiplied by the transition matrix expressed as a left stochastic matrix. The second procedure relies on the `right' stationarity of the committor (splitting probability) expressed as a row vector. The algorithms are unbiased, do not rely on computing transition matrices, and make no Markov assumption about discretized states. Here, we apply the procedures to a one-dimensional double-well potential, and to a 208$μ$s atomistic Trp-cage folding trajectory from D.E. Shaw Research.
Submitted 16 June, 2020;
originally announced June 2020.
-
Spectral Sparsification via Bounded-Independence Sampling
Authors:
Dean Doron,
Jack Murtagh,
Salil Vadhan,
David Zuckerman
Abstract:
We give a deterministic, nearly logarithmic-space algorithm for mild spectral sparsification of undirected graphs. Given a weighted, undirected graph $G$ on $n$ vertices described by a binary string of length $N$, an integer $k\leq \log n$, and an error parameter $ε > 0$, our algorithm runs in space $\tilde{O}(k\log (N\cdot w_{\mathrm{max}}/w_{\mathrm{min}}))$ where $w_{\mathrm{max}}$ and $w_{\mathrm{min}}$ are the maximum and minimum edge weights in $G$, and produces a weighted graph $H$ with $\tilde{O}(n^{1+2/k}/ε^2)$ edges that spectrally approximates $G$, in the sense of Spielman and Teng [ST04], up to an error of $ε$.
Our algorithm is based on a new bounded-independence analysis of Spielman and Srivastava's effective resistance based edge sampling algorithm [SS08] and uses results from recent work on space-bounded Laplacian solvers [MRSV17]. In particular, we demonstrate an inherent tradeoff (via upper and lower bounds) between the amount of (bounded) independence used in the edge sampling algorithm, denoted by $k$ above, and the resulting sparsity that can be achieved.
Submitted 20 April, 2020; v1 submitted 25 February, 2020;
originally announced February 2020.
-
Key biology you should have learned in physics class: Using ideal-gas mixtures to understand biomolecular machines
Authors:
Daniel M. Zuckerman
Abstract:
The biological cell exhibits a fantastic range of behaviors, but ultimately these are governed by a handful of physical and chemical principles. Here we explore simple theory, known for decades and based on the simple thermodynamics of mixtures of ideal gases, which illuminates several key functions performed within the cell. Our focus is the free-energy-driven import and export of molecules, such as nutrients and other vital compounds, via transporter proteins. Complementary to a thermodynamic picture is a description of transporters via "mass-action" chemical kinetics, which lends further insights into biological machinery and free energy use. Both thermodynamic and kinetic descriptions can shed light on the fundamental non-equilibrium aspects of transport. On the whole, our biochemical-physics discussion will remain agnostic to chemical details, but we will see how such details ultimately enter a physical description through the example of the cellular fuel ATP.
Submitted 23 March, 2020; v1 submitted 19 June, 2019;
originally announced June 2019.
-
Accelerated estimation of long-timescale kinetics by combining weighted ensemble simulation with Markov model "microstates" using non-Markovian theory
Authors:
Jeremy Copperman,
Daniel Zuckerman
Abstract:
The weighted ensemble (WE) simulation strategy provides unbiased sampling of non-equilibrium processes, such as molecular folding or binding, but the extraction of rate constants relies on characterizing steady state behavior. Unfortunately, WE simulations of sufficiently complex systems will not relax to steady state on observed simulation times. Here we show that a post-simulation clustering of molecular configurations into "microbins" using methods developed in the Markov State Model (MSM) community, can yield unbiased kinetics from WE data before steady-state convergence of the WE simulation itself. Because WE trajectories are directional and not equilibrium-distributed, the history-augmented MSM (haMSM) formulation can be used, which yields the mean first-passage time (MFPT) without bias for arbitrarily small lag times. Accurate kinetics can be obtained while bypassing the often prohibitive convergence requirements of the non-equilibrium weighted ensemble. We validate the method in a simple diffusive process on a 2D random energy landscape, and then analyze atomistic protein folding simulations using WE molecular dynamics. We report significant progress towards the unbiased estimation of protein folding times and pathways, though key challenges remain.
Submitted 1 October, 2020; v1 submitted 11 March, 2019;
originally announced March 2019.
-
Biasing Boolean Functions and Collective Coin-Flipping Protocols over Arbitrary Product Distributions
Authors:
Yuval Filmus,
Lianna Hambardzumyan,
Hamed Hatami,
Pooya Hatami,
David Zuckerman
Abstract:
The seminal result of Kahn, Kalai and Linial shows that a coalition of $O(\frac{n}{\log n})$ players can bias the outcome of any Boolean function $\{0,1\}^n \to \{0,1\}$ with respect to the uniform measure. We extend their result to arbitrary product measures on $\{0,1\}^n$, by combining their argument with a completely different argument that handles very biased coordinates.
We view this result as a step towards proving a conjecture of Friedgut, which states that Boolean functions on the continuous cube $[0,1]^n$ (or, equivalently, on $\{1,\dots,n\}^n$) can be biased using coalitions of $o(n)$ players. This is the first step taken in this direction since Friedgut proposed the conjecture in 2004.
Russell, Saks and Zuckerman extended the result of Kahn, Kalai and Linial to multi-round protocols, showing that when the number of rounds is $o(\log^* n)$, a coalition of $o(n)$ players can bias the outcome with respect to the uniform measure. We extend this result as well to arbitrary product measures on $\{0,1\}^n$.
The argument of Russell et al. relies on the fact that a coalition of $o(n)$ players can boost the expectation of any Boolean function from $ε$ to $1-ε$ with respect to the uniform measure. This fails for general product distributions, as the example of the AND function with respect to $μ_{1-1/n}$ shows. Instead, we use a novel boosting argument alongside a generalization of our first result to arbitrary finite ranges.
Submitted 20 February, 2019;
originally announced February 2019.
-
Pathway Histogram Analysis of Trajectories: A general strategy for quantification of molecular mechanisms
Authors:
Ernesto Suárez,
Daniel M. Zuckerman
Abstract:
A key overall goal of biomolecular simulations is the characterization of "mechanism" -- the pathways through configuration space of processes such as conformational transitions and binding. Some amount of heterogeneity is intrinsic to the ensemble of pathways, in direct analogy to thermal configurational ensembles. Quantification of that heterogeneity is essential to a complete understanding of mechanism. We propose a general approach for characterizing path ensembles based on mapping individual trajectories into pathway classes whose populations and uncertainties can be analyzed as an ordinary histogram, providing a quantitative "fingerprint" of mechanism. In contrast to prior flux-based analyses used for discrete-state models, stochastic deviations from average behavior are explicitly included via direct classification of trajectories. The histogram approach, furthermore, is applicable to analysis of continuous trajectories. It enables straightforward comparison between ensembles produced by different methods or under different conditions. To implement the formulation, we develop approaches for classifying trajectories, including a clustering-based approach suitable for both continuous-space (e.g., molecular dynamics) or discrete-state (e.g., Markov state model) trajectories, as well as a "fundamental sequence" approach tailored for discrete-state trajectories but also applicable to continuous trajectories through a mapping process. We apply the pathway histogram analysis to a toy model and an extremely long atomistic molecular dynamics trajectory of protein folding.
Submitted 24 October, 2018;
originally announced October 2018.
-
Transient probability currents provide upper and lower bounds on non-equilibrium steady-state currents in the Smoluchowski picture
Authors:
Jeremy Copperman,
David Aristoff,
Dmitrii E. Makarov,
Gideon Simpson,
Daniel M. Zuckerman
Abstract:
Probability currents are fundamental in characterizing the kinetics of non-equilibrium processes. Notably, the steady-state current $J_{ss}$ for a source-sink system can provide the exact mean-first-passage time (MFPT) for the transition from source to sink. Because transient non-equilibrium behavior is quantified in some modern path sampling approaches, such as the "weighted ensemble" strategy, there is strong motivation to determine bounds on $J_{ss}$ -- and hence on the MFPT -- as the system evolves in time. Here we show that $J_{ss}$ is bounded from above and below by the maximum and minimum, respectively, of the current as a function of the spatial coordinate at any time $t$ for one-dimensional systems undergoing over-damped Langevin (i.e., Smoluchowski) dynamics and for higher-dimensional Smoluchowski systems satisfying certain assumptions when projected onto a single dimension. These bounds become tighter with time, making them of potential practical utility in a scheme for estimating $J_{ss}$ and the long-timescale kinetics of complex systems. Conceptually, the bounds result from the fact that extrema of the transient currents relax toward the steady-state current.
Submitted 6 November, 2019; v1 submitted 23 October, 2018;
originally announced October 2018.
-
Stochastic Simulation to Visualize Gene Expression and Error Correction in Living Cells
Authors:
Kevin Y. Chen,
Daniel M. Zuckerman,
Philip C. Nelson
Abstract:
Stochastic simulation can make the molecular processes of cellular control more vivid than the traditional differential-equation approach by generating typical system histories instead of just statistical measures such as the mean and variance of a population. Simple simulations are now easy for students to construct from scratch, that is, without recourse to black-box packages. In some cases, their results can also be compared directly to single-molecule experimental data. After introducing the stochastic simulation algorithm, this article gives two case studies, involving gene expression and error correction, respectively. Code samples and resulting animations showing results are given in the online supplements.
Submitted 14 September, 2018;
originally announced September 2018.
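The stochastic simulation algorithm mentioned in the abstract above can be sketched in a few lines. The example below is a minimal illustration, not the article's own code: it assumes a hypothetical birth-death model of a single molecular species, made at constant rate `k_make` and degraded at rate `k_decay` per molecule. The function name and all parameter values are placeholders chosen for the sketch.

```python
import random

def gillespie_birth_death(k_make, k_decay, t_end, n0=0, seed=0):
    """Generate one system history (times, counts) for a birth-death process.

    Hypothetical model: production at constant rate k_make,
    degradation at rate k_decay per molecule.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        rate_make = k_make
        rate_decay = k_decay * n
        total = rate_make + rate_decay
        if total == 0.0:
            break  # nothing can happen any more
        # Waiting time to the next reaction is exponential with the total rate.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Choose which reaction fired, with probability proportional to its rate.
        if rng.random() * total < rate_make:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts

times, counts = gillespie_birth_death(k_make=10.0, k_decay=1.0, t_end=50.0)
```

Each call returns a single typical history rather than a population average, which is the contrast with the differential-equation picture that the abstract draws.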
-
Statistical uncertainty analysis for small-sample, high log-variance data: Cautions for bootstrapping and Bayesian bootstrapping
Authors:
Barmak Mostofian,
Daniel M. Zuckerman
Abstract:
Recent advances in molecular simulations allow the evaluation of previously unattainable observables, such as rate constants for protein folding. However, these calculations are usually computationally expensive and even significant computing resources may result in a small number of independent estimates spread over many orders of magnitude. Such small-sample, high "log-variance" data are not readily amenable to analysis using the standard uncertainty (i.e., "standard error of the mean") because unphysical negative limits of confidence intervals result. Bootstrapping, a natural alternative guaranteed to yield a confidence interval within the minimum and maximum values, also exhibits a striking systematic bias of the lower confidence limit in log space. As we show, bootstrapping artifactually assigns high probability to improbably low mean values. A second alternative, the Bayesian bootstrap strategy, does not suffer from the same deficit and is more logically consistent with the type of confidence interval desired. The Bayesian bootstrap provides uncertainty intervals that are more reliable than those from the standard bootstrap method, but must be used with caution nevertheless. Neither standard nor Bayesian bootstrapping can overcome the intrinsic challenge of under-estimating the mean from small-size, high log-variance samples. Our conclusions are based on extensive analysis of model distributions and re-analysis of multiple independent atomistic simulations. Although we only analyze rate constants, similar considerations will apply to related calculations, potentially including highly non-linear averages like the Jarzynski relation.
Submitted 6 February, 2019; v1 submitted 5 June, 2018;
originally announced June 2018.
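The two resampling schemes compared in the abstract above differ only in how each replicate weights the data. The sketch below is illustrative and is not taken from the paper: it assumes a small synthetic sample of "rate constants" drawn from a log-normal distribution, and the sample size, spread, and replicate count are arbitrary placeholders. The standard bootstrap resamples the data with replacement, while the Bayesian bootstrap draws continuous weights from a flat Dirichlet distribution.

```python
import numpy as np

def bootstrap_means(data, n_rep, rng):
    """Standard bootstrap: resample the data with replacement, take the mean."""
    idx = rng.integers(0, len(data), size=(n_rep, len(data)))
    return data[idx].mean(axis=1)

def bayesian_bootstrap_means(data, n_rep, rng):
    """Bayesian bootstrap: weight the data by flat-Dirichlet draws."""
    weights = rng.dirichlet(np.ones(len(data)), size=n_rep)  # shape (n_rep, n)
    return weights @ data

rng = np.random.default_rng(0)
# Synthetic small sample spread over orders of magnitude (assumed, not real data).
rates = 10.0 ** rng.normal(loc=-3.0, scale=1.5, size=8)

std_ci = np.percentile(bootstrap_means(rates, 5000, rng), [2.5, 97.5])
bayes_ci = np.percentile(bayesian_bootstrap_means(rates, 5000, rng), [2.5, 97.5])
print("standard bootstrap 95% interval:", std_ci)
print("Bayesian bootstrap 95% interval:", bayes_ci)
```

Both intervals necessarily fall between the smallest and largest data values, because every replicate mean is a convex combination of the data. Comparing the lower limits in log space is one way to explore the behavior the abstract describes.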
-
Optimizing weighted ensemble sampling of steady states
Authors:
David Aristoff,
Daniel M. Zuckerman
Abstract:
We propose parameter optimization techniques for weighted ensemble sampling of Markov chains in the steady-state regime. Weighted ensemble consists of replicas of a Markov chain, each carrying a weight, that are periodically resampled according to their weights inside of each of a number of bins that partition state space. We derive, from first principles, strategies for optimizing the choices of weighted ensemble parameters, in particular the choice of bins and the number of replicas to maintain in each bin. In a simple numerical example, we compare our new strategies with more traditional ones and with direct Monte Carlo.
Submitted 20 April, 2020; v1 submitted 3 June, 2018;
originally announced June 2018.
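The resampling step described in the abstract above, in which weighted replicas are resampled within each bin, can be sketched as follows. This is a simplified illustration under stated assumptions, not the optimized scheme the paper derives: it uses plain multinomial resampling inside each bin, a one-dimensional state with hypothetical `bin_edges`, and a fixed replica count `n_per_bin`. The property it demonstrates is that the total weight in every occupied bin is unchanged by resampling.

```python
import numpy as np

def resample_bin(weights, n_target, rng):
    """Pick n_target parents in one bin with probability proportional to weight.

    Every child receives an equal share of the bin's total weight,
    so the bin weight is conserved exactly.
    """
    weights = np.asarray(weights, dtype=float)
    total = weights.sum()
    parents = rng.choice(len(weights), size=n_target, p=weights / total)
    return parents, np.full(n_target, total / n_target)

def resample(positions, weights, bin_edges, n_per_bin, rng):
    """Resample replicas bin by bin; empty bins are simply skipped."""
    bins = np.digitize(positions, bin_edges)
    new_pos, new_w = [], []
    for b in np.unique(bins):
        members = np.flatnonzero(bins == b)
        parents, w = resample_bin(weights[members], n_per_bin, rng)
        new_pos.append(positions[members][parents])
        new_w.append(w)
    return np.concatenate(new_pos), np.concatenate(new_w)

rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 1.0, size=20)   # hypothetical 1D replica states
weights = np.full(20, 1.0 / 20)              # weights sum to one
bin_edges = np.array([0.25, 0.5, 0.75])      # hypothetical bin boundaries
new_positions, new_weights = resample(positions, weights, bin_edges, 4, rng)
```

The bin boundaries and the number of replicas per bin are exactly the parameters the abstract says can be optimized. Here they are fixed by hand.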
-
Biophysical comparison of ATP-driven proton pumping mechanisms suggests a kinetic advantage for the rotary process depending on coupling ratio
Authors:
Ramu Anandakrishnan,
Daniel M. Zuckerman
Abstract:
ATP-driven proton pumps, which are critical to the operation of a cell, maintain cytosolic and organellar pH levels within a narrow functional range. These pumps employ two very different mechanisms: an elaborate rotary mechanism used by V-ATPase H+ pumps, and a simpler alternating access mechanism used by P-ATPase H+ pumps. Why are two different mechanisms used to perform the same function? Systematic analysis, without parameter fitting, of kinetic models of the rotary, alternating access and other possible mechanisms suggests that, when the ratio of protons transported per ATP hydrolyzed exceeds one, the one-at-a-time proton transport by the rotary mechanism is faster than other possible mechanisms across a wide range of driving conditions. When the ratio is one, there is no intrinsic difference in the free energy landscape between mechanisms, and therefore all mechanisms can exhibit the same kinetic performance. To our knowledge all known rotary pumps have an H+:ATP ratio greater than one, and all known alternating access ATP-driven proton pumps have a ratio of one. Our analysis suggests a possible explanation for this apparent relationship between coupling ratio and mechanism. When the conditions under which the pump must operate permit a coupling ratio greater than one, the rotary mechanism may have been selected for its kinetic advantage. On the other hand, when conditions require a coupling ratio of one or less, the alternating access mechanism may have been selected for other possible advantages resulting from its structural and functional simplicity.
Submitted 28 October, 2016;
originally announced October 2016.
-
Bitcoin Beacon
Authors:
Iddo Bentov,
Ariel Gabizon,
David Zuckerman
Abstract:
We examine a protocol $π_{\text{beacon}}$ that outputs unpredictable and publicly verifiable randomness, meaning that the output is unknown at the time that $π_{\text{beacon}}$ starts, yet everyone can verify that the output is close to uniform after $π_{\text{beacon}}$ terminates. We show that $π_{\text{beacon}}$ can be instantiated via Bitcoin under sensible assumptions; in particular we consider an adversary with an arbitrarily large initial budget who may not operate at a loss indefinitely. In case the adversary has an infinite budget, we provide an impossibility result that stems from the similarity between the Bitcoin model and Santha-Vazirani sources. We also give a hybrid protocol that combines trusted parties and a Bitcoin-based beacon.
Submitted 21 May, 2016; v1 submitted 15 May, 2016;
originally announced May 2016.
-
A proposal for regularly updated review/survey articles: "Perpetual Reviews"
Authors:
David L. Mobley,
Daniel M. Zuckerman
Abstract:
We advocate the publication of review/survey articles that will be updated regularly, both in traditional journals and novel venues. We call these "perpetual reviews." This idea naturally builds on the dissemination and archival capabilities present in the modern internet, and indeed perpetual reviews exist already in some forms. Perpetual review articles allow authors to maintain over time the relevance of non-research scholarship that requires a significant investment of effort. Further, such reviews published in a purely electronic format without space constraints can also permit more pedagogical scholarship and clearer treatment of technical issues that remain obscure in a brief treatment.
Submitted 8 February, 2015; v1 submitted 3 February, 2015;
originally announced February 2015.
-
On Low Discrepancy Samplings in Product Spaces of Motion Groups
Authors:
Chandrajit Bajaj,
Abhishek Bhowmick,
Eshan Chattopadhyay,
David Zuckerman
Abstract:
Deterministically generating near-uniform point samplings of the motion groups like SO(3), SE(3) and their n-wise products SO(3)^n, SE(3)^n is fundamental to numerous applications in computational and data sciences. The natural measure of sampling quality is discrepancy. In this work, our main goal is to construct low discrepancy deterministic samplings in product spaces of the motion groups. To this end, we develop a novel strategy (using a two-step discrepancy construction) that leads to an almost exponential improvement in size (from the trivial direct product). To the best of our knowledge, this is the first nontrivial construction for SO(3)^n, SE(3)^n and the hypertorus T^n.
We also construct new low discrepancy samplings of S^2 and SO(3). The central component in our construction for SO(3) is an explicit construction of N points in S^2 with discrepancy $\tilde{O}(1/\sqrt{N})$ with respect to convex sets, matching the bound achieved for the special case of spherical caps in \cite{ABD_12}. We also generalize the discrepancy of Cartesian product sets \cite{Chazelle04thediscrepancy} to the discrepancy of local Cartesian product sets.
The tools we develop should be useful in generating low discrepancy samplings of other complicated geometric spaces.
Submitted 28 November, 2014;
originally announced November 2014.
-
Deterministic Extractors for Additive Sources
Authors:
Abhishek Bhowmick,
Ariel Gabizon,
Thái Hoàng Lê,
David Zuckerman
Abstract:
We propose a new model of a weakly random source that admits randomness extraction. Our model of additive sources includes such natural sources as uniform distributions on arithmetic progressions (APs), generalized arithmetic progressions (GAPs), and Bohr sets, each of which generalizes affine sources. We give an explicit extractor for additive sources with linear min-entropy over both $\mathbb{Z}_p$ and $\mathbb{Z}_p^n$, for large prime $p$, although our results over $\mathbb{Z}_p^n$ require that the source further satisfy a list-decodability condition. As a corollary, we obtain explicit extractors for APs, GAPs, and Bohr sources with linear min-entropy, although again our results over $\mathbb{Z}_p^n$ require the list-decodability condition. We further explore special cases of additive sources. We improve previous constructions of line sources (affine sources of dimension 1), requiring a field of size linear in $n$, rather than $Ω(n^2)$ by Gabizon and Raz. This beats the non-explicit bound of $Θ(n \log n)$ obtained by the probabilistic method. We then generalize this result to APs and GAPs.
Submitted 27 October, 2014;
originally announced October 2014.
-
Learning from history: Non-Markovian analyses of complex trajectories for extracting long-time behavior
Authors:
Ernesto Suarez,
Daniel Zuckerman
Abstract:
A number of modern sampling methods probe long time behavior in complex biomolecules using a set of relatively short trajectory segments. Markov state models (MSMs) can be useful in analyzing such data sets, but in particularly complex landscapes, the available trajectory data may prove insufficient for constructing valid Markov models. Here, we explore the potential utility of history-dependent analyses applied to relatively poor decompositions of configuration space for which MSMs are inadequate. Our approaches build on previous work [Suarez et al., JCTC 2014] showing that, with sufficient history information, unbiased equilibrium and non-equilibrium observables can be obtained even for arbitrary non-Markovian divisions of phase space. We explore a range of non-Markovian approximations using varying amounts of history information to model the finite length of trajectory segments, applying the analyses to toy models as well as several proteins previously studied by microsecond-to-millisecond scale atomistic simulations [Lindorff-Larsen et al., Science 2011].
Submitted 4 July, 2014;
originally announced July 2014.
-
Efficient Stochastic Simulation of Chemical Kinetics Networks using a Weighted Ensemble of Trajectories
Authors:
Rory M. Donovan,
Andrew J. Sedgewick,
James R. Faeder,
Daniel M. Zuckerman
Abstract:
We apply the "weighted ensemble" (WE) simulation strategy, previously employed in the context of molecular dynamics simulations, to a series of systems-biology models that range in complexity from one-dimensional to a system with 354 species and 3680 reactions. WE is relatively easy to implement, does not require extensive hand-tuning of parameters, does not depend on the details of the simulation algorithm, and can facilitate the simulation of extremely rare events.
For the coupled stochastic reaction systems we study, WE is able to produce accurate and efficient approximations of the joint probability distribution for all chemical species for all time t. WE is also able to efficiently extract mean first passage times for the systems, via the construction of a steady-state condition with feedback. In all cases studied here, WE results agree with independent calculations, but significantly enhance the precision with which rare or slow processes can be characterized. Speedups over "brute-force" in sampling rare events via the Gillespie direct Stochastic Simulation Algorithm range from ~10^12 to ~10^20 for rare states in a distribution, and ~10^2 to ~10^4 for finding mean first passage times.
Submitted 28 March, 2013; v1 submitted 24 March, 2013;
originally announced March 2013.
-
Simultaneous computation of dynamical and equilibrium information using a weighted ensemble of trajectories
Authors:
Ernesto Suarez,
Steven Lettieri,
Matthew C. Zwier,
Carsen A. Stringer,
Sundar Raman Subramanian,
Lillian T. Chong,
Daniel M. Zuckerman
Abstract:
Equilibrium formally can be represented as an ensemble of uncoupled systems undergoing unbiased dynamics in which detailed balance is maintained. Many non-equilibrium processes can be described by suitable subsets of the equilibrium ensemble. Here, we employ the "weighted ensemble" (WE) simulation protocol [Huber and Kim, Biophys. J., 1996] to generate equilibrium trajectory ensembles and extract non-equilibrium subsets for computing kinetic quantities. States do not need to be chosen in advance. The procedure formally allows estimation of kinetic rates between arbitrary states chosen after the simulation, along with their equilibrium populations. We also describe a related history-dependent matrix procedure for estimating equilibrium and non-equilibrium observables when phase space has been divided into arbitrary non-Markovian regions, whether in WE or ordinary simulation. In this proof-of-principle study, these methods are successfully applied and validated on two molecular systems: explicitly solvated methane association and the implicitly solvated Ala4 peptide. We comment on challenges remaining in WE calculations.
Submitted 5 July, 2014; v1 submitted 10 October, 2012;
originally announced October 2012.
-
Privacy Amplification and Non-Malleable Extractors Via Character Sums
Authors:
Yevgeniy Dodis,
Xin Li,
Trevor D. Wooley,
David Zuckerman
Abstract:
In studying how to communicate over a public channel with an active adversary, Dodis and Wichs introduced the notion of a non-malleable extractor. A non-malleable extractor dramatically strengthens the notion of a strong extractor. A strong extractor takes two inputs, a weakly-random x and a uniformly random seed y, and outputs a string which appears uniform, even given y. For a non-malleable extractor nmExt, the output nmExt(x,y) should appear uniform given y as well as nmExt(x,A(y)), where A is an arbitrary function with A(y) not equal to y.
We show that an extractor introduced by Chor and Goldreich is non-malleable when the entropy rate is above half. It outputs a linear number of bits when the entropy rate is 1/2 + alpha, for any alpha>0. Previously, no nontrivial parameters were known for any non-malleable extractor. To achieve a polynomial running time when outputting many bits, we rely on a widely-believed conjecture about the distribution of prime numbers in arithmetic progressions. Our analysis involves a character sum estimate, which may be of independent interest.
Using our non-malleable extractor, we obtain protocols for "privacy amplification": key agreement between two parties who share a weakly-random secret. Our protocols work in the presence of an active adversary with unlimited computational power, and have asymptotically optimal entropy loss. When the secret has entropy rate greater than 1/2, the protocol follows from a result of Dodis and Wichs, and takes two rounds. When the secret has entropy rate delta for any constant delta>0, our new protocol takes a constant (polynomial in 1/delta) number of rounds. Our protocols run in polynomial time under the above well-known conjecture about primes.
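The non-malleability condition described above can be made concrete with a small counterexample. The sketch below is only an illustration of the definition: it uses the inner-product function over GF(2) as a stand-in two-input function (it is not claimed to be the construction analyzed in the abstract), a hypothetical tampering function `tamper` that flips the first seed bit, and a toy source whose first bit is fixed to 0. Under those assumptions, the output on the tampered seed always equals the real output, so the function fails to be non-malleable on this source.

```python
from itertools import product


def inner_product_ext(x, y):
    """Inner product over GF(2); an illustrative two-input function only."""
    return sum(a & b for a, b in zip(x, y)) % 2


def tamper(y):
    """Hypothetical adversarial map A(y): flip the first seed bit, so A(y) != y."""
    return (y[0] ^ 1,) + y[1:]


n = 4
# Toy weak source: first bit fixed to 0, remaining n-1 bits uniform.
source = [(0,) + rest for rest in product((0, 1), repeat=n - 1)]
seeds = list(product((0, 1), repeat=n))

# <x, y xor e1> = <x, y> xor x[0], and x[0] is always 0 on this source,
# so the tampered output reveals the real output exactly.
predictable = all(
    inner_product_ext(x, y) == inner_product_ext(x, tamper(y))
    for x in source
    for y in seeds
)
print(predictable)
```

A non-malleable extractor must rule out every such tampering function at once, which is why the definition is much stronger than that of a strong extractor.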
Submitted 3 September, 2011; v1 submitted 26 February, 2011;
originally announced February 2011.
-
Equilibrium Sampling in Biomolecular Simulation
Authors:
Daniel M. Zuckerman
Abstract:
Equilibrium sampling of biomolecules remains an unmet challenge after more than 30 years of atomistic simulation. Efforts to enhance sampling capability, which are reviewed here, range from the development of new algorithms to parallelization to novel uses of hardware. Special focus is placed on classifying algorithms -- most of which are underpinned by a few key ideas -- in order to understand their fundamental strengths and limitations. Although algorithms have proliferated, progress resulting from novel hardware use appears to be more clear-cut than from algorithms alone, partly due to the lack of widely used sampling measures.
Submitted 15 September, 2010;
originally announced September 2010.
-
Can Random Coin Flips Speed Up a Computer?
Authors:
David Zuckerman
Abstract:
This expository essay introduces randomness and computation to a lay audience.
Submitted 9 July, 2010;
originally announced July 2010.
-
Extending fragment-based free energy calculations with library Monte Carlo simulation: Annealing in interaction space
Authors:
Steven Lettieri,
Artem B. Mamonov,
Daniel M. Zuckerman
Abstract:
Pre-calculated libraries of molecular fragment configurations have previously been used as a basis for both equilibrium sampling (via "library-based Monte Carlo") and for obtaining absolute free energies using a polymer-growth formalism. Here, we combine the two approaches to extend the size of systems for which free energies can be calculated. We study a series of all-atom poly-alanine systems in a simple dielectric "solvent" and find that precise free energies can be obtained rapidly. For instance, for 12 residues, less than an hour of single-processor computing time is required. The combined approach is formally equivalent to the "annealed importance sampling" algorithm; instead of annealing by decreasing temperature, however, interactions among fragments are gradually added as the molecule is "grown." We discuss implications for future binding affinity calculations in which a ligand is grown into a binding site.
Submitted 10 September, 2010; v1 submitted 21 June, 2010;
originally announced June 2010.
-
Certifiably Pseudorandom Financial Derivatives
Authors:
David Zuckerman
Abstract:
Arora, Barak, Brunnermeier, and Ge showed that taking computational complexity into account, a dishonest seller could strategically place lemons in financial derivatives to make them substantially less valuable to buyers. We show that if the seller is required to construct derivatives of a certain form, then this phenomenon disappears. In particular, we define and construct pseudorandom derivative families, for which lemon placement only slightly affects the values of the derivatives. Our constructions use expander graphs. We study our derivatives in a more general setting than Arora et al. In particular, we analyze arbitrary tranches of the common collateralized debt obligations (CDOs) when the underlying assets can have significant dependencies.
Submitted 31 August, 2019; v1 submitted 2 June, 2010;
originally announced June 2010.
-
Automated sampling assessment for molecular simulations using the effective sample size
Authors:
Xin Zhang,
Divesh Bhatt,
Daniel M. Zuckerman
Abstract:
To quantify the progress in development of algorithms and forcefields used in molecular simulations, a method for the assessment of the sampling quality is needed. We propose a general method to assess the sampling quality through the estimation of the number of independent samples obtained from molecular simulations. This method is applicable to both dynamic and nondynamic methods and utilizes the variance in the populations of physical states to determine the effective sample size (ESS). We test the correctness and robustness of our procedure in a variety of systems: a two-state toy model, all-atom butane, coarse-grained calmodulin, all-atom dileucine and Met-enkephalin.
We also introduce an automated procedure to obtain approximate physical states from dynamic trajectories: this procedure allows for sample-size estimation for systems for which physical states are not known in advance.
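The idea of reading a sample size off the variance of state populations can be sketched with the binomial relation: if a state has true population p and each estimate is built from N independent samples, the variance of the estimate is p(1-p)/N, so N can be recovered as p(1-p)/variance. The code below is a minimal sketch under that assumption; the function name `effective_sample_size` and the synthetic data are hypothetical, and this is not claimed to be the exact procedure of the paper.

```python
import random
from statistics import mean, variance


def effective_sample_size(population_estimates):
    """Estimate N_eff = p(1-p)/var from repeated estimates of one state's population.

    Assumes each estimate comes from an equivalent, independent block or run.
    """
    p = mean(population_estimates)
    var = variance(population_estimates)  # sample variance across blocks
    return p * (1.0 - p) / var


# Synthetic check: each block holds 200 truly independent samples of a
# two-state system with population 0.3 in the state of interest.
rng = random.Random(0)
true_p, n_independent, n_blocks = 0.3, 200, 500
estimates = [
    sum(rng.random() < true_p for _ in range(n_independent)) / n_independent
    for _ in range(n_blocks)
]
ess = effective_sample_size(estimates)
print(round(ess))
```

For correlated trajectory data the same formula would return a value smaller than the number of frames per block, which is the sense in which it measures independent samples.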
Submitted 19 February, 2010;
originally announced February 2010.
-
Symmetry of forward and reverse path populations
Authors:
Divesh Bhatt,
Daniel M. Zuckerman
Abstract:
In this note, we address formally the issue of symmetry for probabilities of different dynamical pathways in the forward and reverse directions of a conformational transition. Our discussion is based on a decomposition of equilibrium into opposing steady states, and makes clear the conditions necessary for symmetry to apply. From a practical point of view, we also discuss when approximate symmetry is to be expected.
△ Less
Submitted 11 February, 2010; v1 submitted 11 February, 2010;
originally announced February 2010.
-
Fooling functions of halfspaces under product distributions
Authors:
P. Gopalan,
R. O'Donnell,
Y. Wu,
D. Zuckerman
Abstract:
We construct pseudorandom generators that fool functions of halfspaces (threshold functions) under a very broad class of product distributions. This class includes not only familiar cases such as the uniform distribution on the discrete cube, the uniform distribution on the solid cube, and the multivariate Gaussian distribution, but also includes any product of discrete distributions with probabilities bounded away from 0.
Our first main result shows that a recent pseudorandom generator construction of Meka and Zuckerman [MZ09], when suitably modified, can fool arbitrary functions of d halfspaces under product distributions where each coordinate has bounded fourth moment. To eps-fool any size-s, depth-d decision tree of halfspaces, our pseudorandom generator uses seed length O((d log(ds/eps)+log n) log(ds/eps)). For monotone functions of d halfspaces, the seed length can be improved to O((d log(d/eps)+log n) log(d/eps)). We get better bounds for larger eps; for example, to 1/polylog(n)-fool all monotone functions of (log n)/log log n halfspaces, our generator requires a seed of length just O(log n). Our second main result generalizes the work of Diakonikolas et al. [DGJ+09] to show that bounded independence suffices to fool functions of halfspaces under product distributions. Assuming each coordinate satisfies a certain stronger moment condition, we show that any function computable by a size-s, depth-d decision tree of halfspaces is eps-fooled by O(d^4s^2/eps^2)-wise independence.
Submitted 11 January, 2010;
originally announced January 2010.
-
Steady-state simulations using weighted ensemble path sampling
Authors:
Divesh Bhatt,
Bin W. Zhang,
Daniel M. Zuckerman
Abstract:
We extend the weighted ensemble (WE) path sampling method to perform rigorous statistical sampling for systems at steady state. The straightforward steady-state implementation of WE is directly practical for simple landscapes, but not when significant metastable intermediate states are present. We therefore develop an enhanced WE scheme, building on existing ideas, which accelerates attainment of steady state in complex systems. We apply both WE approaches to several model systems confirming their correctness and efficiency by comparison with brute-force results. The enhanced version is significantly faster than the brute force and straightforward WE for systems with WE bins that accurately reflect the reaction coordinate(s). The new WE methods can also be applied to equilibrium sampling, since equilibrium is a steady state.
Submitted 28 February, 2010; v1 submitted 27 October, 2009;
originally announced October 2009.
-
Thermal Motions of the E. Coli Glucose-Galactose Binding Protein Studied Using Well-Sampled Semi-Atomistic Simulations
Authors:
Derek J. Cashman,
Artem B. Mamonov,
Divesh Bhatt,
Daniel M. Zuckerman
Abstract:
The E. coli glucose-galactose chemosensory receptor is a 309 residue, 32 kDa protein consisting of two distinct structural domains. In this computational study, we studied the protein's thermal fluctuations, including both the large scale interdomain movements that contribute to the receptor's mechanism of action, as well as smaller scale motions, using two different computational methods. We employ extremely fast, "semi-atomistic" Library-Based Monte Carlo (LBMC) simulations, which include all backbone atoms but "implicit" side chains. Our results were compared with previous experiments and an all-atom Langevin dynamics simulation. Both LBMC and Langevin dynamics simulations were performed using both the apo and glucose-bound form of the protein, with LBMC exhibiting significantly larger fluctuations. The LBMC simulations are also in general agreement with the disulfide trapping experiments of Careaga & Falke (JMB, 1992; Biophys. J., 1992), which indicate that distant residues in the crystal structure (i.e. beta carbons separated by 10 to 20 angstroms) form spontaneous transient contacts in solution. Our simulations illustrate several possible "mechanisms" (configurational pathways) for these fluctuations. We also observe several discrepancies between our calculations and experiment. Nevertheless, we believe that our semi-atomistic approach could be used to study the fluctuations in other proteins, perhaps for ensemble docking, or other analyses of protein flexibility in virtual screening studies.
Submitted 27 October, 2009;
originally announced October 2009.
-
Pseudorandom Generators for Polynomial Threshold Functions
Authors:
Raghu Meka,
David Zuckerman
Abstract:
We study the natural question of constructing pseudorandom generators (PRGs) for low-degree polynomial threshold functions (PTFs). We give a PRG with seed-length log n/eps^{O(d)} fooling degree d PTFs with error at most eps. Previously, no nontrivial constructions were known even for quadratic threshold functions and constant error eps. For the class of degree 1 threshold functions or halfspaces, we construct PRGs with much better dependence on the error parameter eps and obtain a PRG with seed-length O(log n + log^2(1/eps)). Previously, only PRGs with seed length O(log n log^2(1/eps)/eps^2) were known for halfspaces. We also obtain PRGs with similar seed lengths for fooling halfspaces over the n-dimensional unit sphere.
The main theme of our constructions and analysis is the use of invariance principles to construct pseudorandom generators. We also introduce the notion of monotone read-once branching programs, which is key to improving the dependence on the error rate eps for halfspaces. These techniques may be of independent interest.
Submitted 15 November, 2011; v1 submitted 21 October, 2009;
originally announced October 2009.
-
Efficient equilibrium sampling of all-atom peptides using library-based Monte Carlo
Authors:
Ying Ding,
Artem B. Mamonov,
Daniel M. Zuckerman
Abstract:
We applied our previously developed library-based Monte Carlo (LBMC) to equilibrium sampling of several implicitly solvated all-atom peptides. LBMC can perform equilibrium sampling of molecules using the pre-calculated statistical libraries of molecular-fragment configurations and energies. For this study, we employed residue-based fragments distributed according to the Boltzmann factor of the OPLS-AA forcefield describing the individual fragments. Two solvent models were employed: a simple uniform dielectric and the Generalized Born/Surface Area (GBSA) model. The efficiency of LBMC was compared to standard Langevin dynamics (LD) using three different statistical tools. The statistical analyses indicate that LBMC is more than 100 times faster than LD not only for the simple solvent model but also for GBSA.
Submitted 26 January, 2010; v1 submitted 13 October, 2009;
originally announced October 2009.
-
Rapid sampling of all-atom peptides using a library-based polymer-growth approach
Authors:
A. B. Mamonov,
X. Zhang,
D. M. Zuckerman
Abstract:
We adapted existing polymer growth strategies for equilibrium sampling of peptides described by modern atomistic forcefields with implicit solvent. The main novel feature of our approach is the use of pre-calculated statistical libraries of molecular fragments. A molecule is sampled by combining fragment configurations -- of single residues in this study -- which are stored in the libraries. Ensembles generated from the independent libraries are reweighted to conform with the Boltzmann factor distribution of the forcefield describing the full molecule. In this way, high-quality equilibrium sampling of small peptides (4-8 residues) typically requires less than one hour of single-processor wallclock time and can be significantly faster than Langevin simulations. Furthermore, approximate but clash-free ensembles can be generated for larger peptides (e.g., 16 residues) in less than a minute of single-processor computing. We also describe an application to free energy calculation, a "multi-resolution" implementation of the growth procedure and application to fragment assembly protein-structure prediction protocols.
Submitted 4 March, 2010; v1 submitted 13 October, 2009;
originally announced October 2009.
-
Heterogeneous path ensembles for conformational transitions in semi-atomistic models of adenylate kinase
Authors:
Divesh Bhatt,
Daniel M. Zuckerman
Abstract:
We performed "weighted ensemble" path-sampling simulations of adenylate kinase, using several semi-atomistic protein models. Our study investigated both the biophysics of conformational transitions as well as the possibility of increasing model accuracy without sacrificing good sampling. Biophysically, the path ensembles show significant heterogeneity and the explicit possibility of two principal pathways in the Open-Closed transition. We recently showed that, under certain conditions, a "symmetry of heterogeneity" is expected between the forward and the reverse transitions: the fraction of transitions taking a specific pathway/channel will be the same in both directions. Our path ensembles are analyzed in the light of the symmetry relation and its conditions. In the realm of modeling, we employed an all-atom backbone with various levels of residue interactions. Because reasonable path sampling required only a few weeks of single-processor computing time with these models, the addition of further chemical detail should be feasible.
△ Less
Submitted 25 February, 2010; v1 submitted 8 October, 2009;
originally announced October 2009.
-
Optimal Testing of Reed-Muller Codes
Authors:
Arnab Bhattacharyya,
Swastik Kopparty,
Grant Schoenebeck,
Madhu Sudan,
David Zuckerman
Abstract:
We consider the problem of testing if a given function f : F_2^n -> F_2 is close to any degree d polynomial in n variables, also known as the Reed-Muller testing problem. The Gowers norm is based on a natural 2^{d+1}-query test for this property. Alon et al. [AKKLR05] rediscovered this test and showed that it accepts every degree d polynomial with probability 1, while it rejects functions that are Omega(1)-far with probability Omega(1/(d 2^{d})). We give an asymptotically optimal analysis of this test, and show that it rejects functions that are (even only) Omega(2^{-d})-far with Omega(1)-probability (so the rejection probability is a universal constant independent of d and n). This implies a tight relationship between the (d+1)st Gowers norm of a function and its maximal correlation with degree d polynomials, when the correlation is close to 1. Our proof works by induction on n and yields a new analysis of even the classical Blum-Luby-Rubinfeld [BLR93] linearity test, for the setting of functions mapping F_2^n to F_2. The optimality follows from a tighter analysis of counterexamples to the "inverse conjecture for the Gowers norm" constructed by [GT09,LMS08]. Our result has several implications. First, it shows that the Gowers norm test is tolerant, in that it also accepts close codewords. Second, it improves the parameters of an XOR lemma for polynomials given by Viola and Wigderson [VW07]. Third, it implies a "query hierarchy" result for property testing of affine-invariant properties. That is, for every function q(n), it gives an affine-invariant property that is testable with O(q(n))-queries, but not with o(q(n))-queries, complementing an analogous result of [GKNR09] for graph properties.
Submitted 9 April, 2010; v1 submitted 4 October, 2009;
originally announced October 2009.
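For orientation, the 2^{d+1}-query test named in the abstract above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: points of F_2^n are encoded here as n-bit integers, and the names `derivative_test_once` and `example_poly` are hypothetical. One round draws a random point x and d+1 random directions a_1, ..., a_{d+1}, then checks that f sums to zero (mod 2) over the 2^{d+1} points x + sum_{i in S} a_i; any polynomial of degree at most d passes every round.

```python
import itertools
import random


def derivative_test_once(f, n, d, rng):
    """Run one round of the 2^(d+1)-query test on f: F_2^n -> F_2.

    Points are n-bit integers, so vector addition over F_2 is XOR.
    Returns True if the round accepts.
    """
    x = rng.getrandbits(n)
    directions = [rng.getrandbits(n) for _ in range(d + 1)]
    parity = 0
    # Sum f over x + sum_{i in S} a_i for every subset S of the directions.
    for subset in itertools.product((0, 1), repeat=d + 1):
        point = x
        for chosen, a in zip(subset, directions):
            if chosen:
                point ^= a
        parity ^= f(point) & 1
    return parity == 0


def example_poly(x):
    """Hypothetical degree-2 polynomial x0*x1 + x2 over F_2 (illustration only)."""
    x0, x1, x2 = x & 1, (x >> 1) & 1, (x >> 2) & 1
    return (x0 & x1) ^ x2
```

With `d=2` every round accepts `example_poly`; with `d=1` a round rejects it whenever the two directions pick up the quadratic term, so repeated rounds expose the higher degree.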
-
Weighted Ensemble Path Sampling for Multiple Reaction Channels
Authors:
Bin W. Zhang,
David Jasnow,
Daniel M. Zuckerman
Abstract:
Finding and sampling multiple reaction channels for molecular transitions remains an important challenge in physical chemistry. Here we show that the weighted ensemble (WE) path sampling method can readily sample multiple channels. In a first test, both the WE and transition path sampling methods are applied to two-dimensional model potentials. The comparison explains why the weighted ensemble approach will not be trapped in one channel. The WE approach is then used to sample the full transition path ensemble in implicitly solvated alanine dipeptide at two different temperatures. The ensembles are of sufficient quality to permit quantification of the fractional importance of each channel, even at T=300K when brute-force simulation is prohibitively expensive.
Submitted 16 February, 2009;
originally announced February 2009.
-
Absolute free energies estimated by combining pre-calculated molecular fragment libraries
Authors:
Xin Zhang,
Artem B. Mamonov,
Daniel M. Zuckerman
Abstract:
The absolute free energy -- or partition function, equivalently -- of a molecule can be estimated computationally using a suitable reference system. Here, we demonstrate a practical method for staging such calculations by growing a molecule based on a series of fragments. Significant computer time is saved by pre-calculating fragment configurations and interactions for re-use in a variety of molecules. We employ such fragment libraries and interaction tables for amino acids and capping groups to estimate free energies for small peptides. Equilibrium ensembles for the molecules are generated at no additional computational cost, and are used to check our results by comparison to standard dynamics simulation.
Submitted 30 January, 2009; v1 submitted 25 January, 2009;
originally announced January 2009.
-
The "weighted ensemble" path sampling method is statistically exact for a broad class of stochastic processes and binning procedures
Authors:
Bin W. Zhang,
Daniel M. Zuckerman,
David Jasnow
Abstract:
The "weighted ensemble" method, introduced by Huber and Kim, [G. A. Huber and S. Kim, Biophys. J. 70, 97 (1996)], is one of a handful of rigorous approaches to path sampling of rare events. Expanding earlier discussions, we show that the technique is statistically exact for a wide class of Markovian and non-Markovian dynamics. The derivation is based on standard path-integral (path probability) ideas, but recasts the weighted-ensemble approach as simple "resampling" in path space. Similar reasoning indicates that arbitrary nonstatic binning procedures, which merely guide the resampling process, are also valid. Numerical examples confirm the claims, including the use of bins which can adaptively find the target state in a simple model.
Submitted 21 December, 2009; v1 submitted 10 October, 2008;
originally announced October 2008.
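The "resampling" view described in the abstract above can be illustrated with a short Python sketch. It is a simplified stand-in, not the Huber-Kim split/merge procedure or the authors' implementation: within each bin, walkers are redrawn with probability proportional to weight and the bin's total weight is shared equally among the copies, so each state's expected weight is unchanged. The function names, the `(state, weight)` representation, and the `bin_of` callback are assumptions made for this example.

```python
import random


def resample_bin(walkers, target_count, rng):
    """Resample the walkers in one bin to `target_count` equal-weight copies.

    `walkers` is a list of (state, weight) pairs. Copies are drawn with
    probability proportional to weight, and the bin's total weight is
    split evenly among them, so the total weight is unchanged.
    """
    if not walkers:
        return []
    states = [state for state, _ in walkers]
    weights = [weight for _, weight in walkers]
    total = sum(weights)
    chosen = rng.choices(states, weights=weights, k=target_count)
    return [(state, total / target_count) for state in chosen]


def resample_all(walkers, bin_of, target_count, rng):
    """Group walkers by the bin label `bin_of(state)` and resample each bin."""
    bins = {}
    for state, weight in walkers:
        bins.setdefault(bin_of(state), []).append((state, weight))
    resampled = []
    for members in bins.values():
        resampled.extend(resample_bin(members, target_count, rng))
    return resampled
```

Because `bin_of` only decides which walkers are grouped together, it can be changed between calls without affecting weight conservation, which is the sense in which binning "merely guides" the resampling.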
-
A library-based Monte Carlo technique enables rapid equilibrium sampling of a protein model with atomistic components
Authors:
Artem B. Mamonov,
Divesh Bhatt,
Derek J. Cashman,
Daniel M. Zuckerman
Abstract:
There is significant interest in rapid protein simulations because of the time-scale limitations of all-atom methods. Exploiting the low cost and great availability of computer memory, we report a Monte Carlo technique for incorporating fully flexible atomistic protein components (e.g., peptide planes) into protein models without compromising sampling speed or statistical rigor. Building on existing approximate methods (e.g., Rosetta), the technique uses pre-generated statistical libraries of all-atom components which are swapped with the corresponding protein components during a simulation. The simple model we study consists of the three all-atom backbone residues -- Ala, Gly, and Pro -- with structure-based (Go-like) interactions. For the five different proteins considered in this study, library-based Monte Carlo (LBMC) can generate at least 30 statistically independent configurations in about a month of single CPU time. Minimal additional cost is required to add residue-specific interactions.
Submitted 4 December, 2008; v1 submitted 22 September, 2008;
originally announced September 2008.
-
Annealed importance sampling of dileucine peptide
Authors:
Edward Lyman,
Daniel M. Zuckerman
Abstract:
Annealed importance sampling is a means to assign equilibrium weights to a nonequilibrium sample that was generated by a simulated annealing protocol. The weights may then be used to calculate equilibrium averages, and also serve as an "adiabatic signature" of the chosen cooling schedule. In this paper we demonstrate the method on the 50-atom dileucine peptide, showing that equilibrium distributions are attained for manageable cooling schedules. For this system, as naively implemented here, the method is modestly more efficient than constant temperature simulation. However, the method is worth considering whenever any simulated heating or cooling is performed (as is often done at the beginning of a simulation project, or during an NMR structure calculation), as it is simple to implement and requires minimal additional CPU expense. Furthermore, the naive implementation presented here can be improved.
Submitted 3 April, 2007;
originally announced April 2007.
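The abstract above does not state the weight formula, so the following is the standard annealed-importance-sampling form, written under assumed notation rather than taken from the paper: inverse temperatures \beta_0 < \beta_1 < ... < \beta_K define the cooling schedule, U is the potential energy, and x_{k-1} is the configuration held just before the temperature is switched from \beta_{k-1} to \beta_k. The first expression is the weight of one annealing run; the second is the weighted estimate of an equilibrium average at the final temperature over runs i.

```latex
w = \prod_{k=1}^{K} \exp\!\left[-(\beta_k - \beta_{k-1})\, U(x_{k-1})\right],
\qquad
\langle A \rangle_{\beta_K} \approx
\frac{\sum_{i} w^{(i)}\, A\big(x_K^{(i)}\big)}{\sum_{i} w^{(i)}}
```

When cooling is slow enough that each stage stays near equilibrium, the weights of different runs become nearly equal, which is one way to read the "adiabatic signature" mentioned above.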
-
Demonstrated convergence of the equilibrium ensemble for a fast united-residue protein model
Authors:
F. Marty Ytreberg,
Svetlana Kh. Aroutiounian,
Daniel M. Zuckerman
Abstract:
Due to the time-scale limitations of all-atom simulation of proteins, there has been substantial interest in coarse-grained approaches. Some methods, like "Resolution Exchange," [E. Lyman et al., Phys. Rev. Lett. 96, 028105 (2006)] can accelerate canonical all-atom sampling, but require properly distributed coarse ensembles. We therefore demonstrate that full sampling can indeed be achieved in a sufficiently simplified protein model, as verified by a recently developed convergence analysis. The model accounts for protein backbone geometry in that rigid peptide planes rotate according to atomistically defined dihedral angles, but there are only two degrees of freedom (phi and psi dihedrals) per residue. Our convergence analysis indicates that small proteins (up to 89 residues in our tests) can be simulated for more than 50 "structural decorrelation times" in less than a week on a single processor. We show that the fluctuation behavior is reasonable, and we discuss applications, limitations, and extensions of the model.
Submitted 23 March, 2007;
originally announced March 2007.
-
A "black-box" re-weighting analysis can correct flawed simulation data, after the fact
Authors:
F. Marty Ytreberg,
Daniel M. Zuckerman
Abstract:
There is a great need for improved statistical sampling in a range of physical, chemical and biological systems. Even simulations based on correct algorithms suffer from statistical error, which can be substantial or even dominant when slow processes are involved. Further, in key biomolecular applications, such as the determination of protein structures from NMR data, non-Boltzmann-distributed ensembles are generated. We therefore have developed the "black-box" strategy for re-weighting a set of configurations generated by arbitrary means to produce an ensemble distributed according to any target distribution. In contrast to previous algorithmic efforts, the black-box approach exploits the configuration-space density observed in a simulation, rather than assuming a desired distribution has been generated. Successful implementations of the strategy, which reduce both statistical error and bias, are developed for a one-dimensional system, and a 50-atom peptide, for which the correct 250-to-1 population ratio is recovered from a heavily biased ensemble.
Submitted 8 November, 2007; v1 submitted 22 September, 2006;
originally announced September 2006.
-
Transition-Event Durations in One Dimensional Activated Processes
Authors:
Bin W. Zhang,
David Jasnow,
Daniel M. Zuckerman
Abstract:
Despite their importance in activated processes, transition-event durations -- which are much shorter than first passage times -- have not received a complete theoretical treatment. We therefore study the distribution of durations of transition events over a barrier in a one-dimensional system undergoing over-damped Langevin dynamics.
Submitted 28 September, 2006;
originally announced September 2006.
-
The structural de-correlation time: A robust statistical measure of convergence of biomolecular simulations
Authors:
Edward Lyman,
Daniel M. Zuckerman
Abstract:
Although atomistic simulations of proteins and other biological systems are approaching microsecond timescales, the quality of trajectories has remained difficult to assess. Such assessment is critical not only for establishing the relevance of any individual simulation but also in the extremely active field of developing computational methods. Here we map the trajectory assessment problem onto a simple statistical calculation of the "effective sample size" - i.e., the number of statistically independent configurations. The mapping is achieved by asking the question, "How much time must elapse between snapshots included in a sample for that sample to exhibit the statistical properties expected for independent and identically distributed configurations?" The resulting "structural de-correlation time" is robustly calculated using exact properties deduced from our previously developed "structural histograms," without any fitting parameters. We show the method is equally and directly applicable to toy models, peptides, and a 72-residue protein model. Variants of our approach can readily be applied to a wide range of physical and chemical systems.
Submitted 18 February, 2007; v1 submitted 21 July, 2006;
originally announced July 2006.
-
Interaction in Quantum Communication
Authors:
Hartmut Klauck,
Ashwin Nayak,
Amnon Ta-Shma,
David Zuckerman
Abstract:
In some scenarios there are ways of conveying information with many fewer, even exponentially fewer, qubits than possible classically. Moreover, some of these methods have a very simple structure--they involve only a few message exchanges between the communicating parties. It is therefore natural to ask whether every classical protocol may be transformed to a "simpler" quantum protocol--one that has similar efficiency, but uses fewer message exchanges.
We show that for any constant k, there is a problem such that its k+1 message classical communication complexity is exponentially smaller than its k message quantum communication complexity. This, in particular, proves a round hierarchy theorem for quantum communication complexity, and implies, via a simple reduction, an Omega(N^{1/k}) lower bound for k message quantum protocols for Set Disjointness for constant k.
En route, we prove information-theoretic lemmas, and define a related measure of correlation, the informational distance, that we believe may be of significance in other contexts as well.
Submitted 15 March, 2006;
originally announced March 2006.
-
Comparison of free energy methods for molecular systems
Authors:
F. Marty Ytreberg,
Robert H. Swendsen,
Daniel M. Zuckerman
Abstract:
We present a detailed comparison of computational efficiency and precision for several free energy difference ($ΔF$) methods. The analysis includes both equilibrium and non-equilibrium approaches, and distinguishes between uni-directional and bi-directional methodologies. We are primarily interested in comparing two recently proposed approaches, adaptive integration and single-ensemble path sampling, to more established methodologies. As test cases, we study relative solvation free energies for large changes to the size or charge of a Lennard-Jones particle in explicit water. The results show that, for the systems used in this study, both adaptive integration and path sampling offer unique advantages over the more traditional approaches. Specifically, adaptive integration is found to provide very precise long-simulation $ΔF$ estimates as compared to other methods used in this report, while also offering rapid estimation of $ΔF$. The results demonstrate that the adaptive integration approach is the best overall method for the systems studied here. The single-ensemble path sampling approach is found to be superior to ordinary Jarzynski averaging for the uni-directional, "fast-growth" non-equilibrium case. Closer examination of the path sampling approach on a two-dimensional system suggests it may be the overall method of choice when conformational sampling barriers are high. However, it appears that the free energy landscapes for the systems used in this study have rather modest configurational sampling barriers.
Submitted 16 August, 2006; v1 submitted 14 February, 2006;
originally announced February 2006.
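As a point of reference for the "ordinary Jarzynski averaging" baseline mentioned in the abstract above, the estimator ΔF = -kT ln⟨exp(-W/kT)⟩ can be sketched in Python as follows. This is a minimal sketch, not code from the paper: the function name `jarzynski_delta_f`, the `kT` default of 1.0 (reduced units), and the log-sum-exp shift for numerical stability are choices made for this example.

```python
import math


def jarzynski_delta_f(work_values, kT=1.0):
    """Estimate Delta F = -kT * ln < exp(-W / kT) > from work values.

    `work_values` holds the work W measured in each uni-directional
    switching run; `kT` is the thermal energy in the same units.
    """
    if not work_values:
        raise ValueError("need at least one work value")
    scaled = [-w / kT for w in work_values]
    # Subtract the largest exponent before exponentiating to avoid overflow.
    shift = max(scaled)
    mean_exp = sum(math.exp(s - shift) for s in scaled) / len(scaled)
    return -kT * (shift + math.log(mean_exp))
```

The exponential average is dominated by the smallest work values, so the estimate never exceeds the mean work and converges slowly when low-work runs are rare.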