Search | arXiv e-print repository

Denoising Diffusion Probabilistic Models in Six Simple Steps

Authors: Richard E. Turner, Cristiana-Diana Diaconu, Stratis Markou, Aliaksandra Shysheya, Andrew Y. K. Foong, Bruno Mlodozeniec

Abstract: Denoising Diffusion Probabilistic Models (DDPMs) are a very popular class of deep generative model that have been successfully applied to a diverse range of problems including image and video generation, protein and material synthesis, weather forecasting, and neural surrogates of partial differential equations. Despite their ubiquity it is hard to find an introduction to DDPMs which is simple, comprehensive, clean and clear. The compact explanations necessary in research papers are not able to elucidate all of the different design steps taken to formulate the DDPM and the rationale of the steps that are presented is often omitted to save space. Moreover, the expositions are typically presented from the variational lower bound perspective which is unnecessary and arguably harmful as it obfuscates why the method is working and suggests generalisations that do not perform well in practice. On the other hand, perspectives that take the continuous time-limit are beautiful and general, but they have a high barrier-to-entry as they require background knowledge of stochastic differential equations and probability flow. In this note, we distill down the formulation of the DDPM into six simple steps each of which comes with a clear rationale. We assume that the reader is familiar with fundamental topics in machine learning including basic probabilistic modelling, Gaussian distributions, maximum likelihood estimation, and deep learning.

Submitted 10 February, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

arXiv:2401.04082 [pdf, other]

Improved motif-scaffolding with SE(3) flow matching

Authors: Jason Yim, Andrew Campbell, Emile Mathieu, Andrew Y. K. Foong, Michael Gastegger, José Jiménez-Luna, Sarah Lewis, Victor Garcia Satorras, Bastiaan S. Veeling, Frank Noé, Regina Barzilay, Tommi S. Jaakkola

Abstract: Protein design often begins with knowledge of a desired function from a motif which motif-scaffolding aims to construct a functional protein around. Recently, generative models have achieved breakthrough success in designing scaffolds for a diverse range of motifs. However, the generated scaffolds tend to lack structural diversity, which can hinder success in wet-lab validation. In this work, we extend FrameFlow, an SE(3) flow matching model for protein backbone generation, to perform motif-scaffolding with two complementary approaches. The first is motif amortization, in which FrameFlow is trained with the motif as input using a data augmentation strategy. The second is motif guidance, which performs scaffolding using an estimate of the conditional score from FrameFlow, and requires no additional training. Both approaches achieve an equivalent or higher success rate than previous state-of-the-art methods, with 2.5 times more structurally diverse scaffolds. Code: https://github.com/microsoft/frame-flow.

Submitted 8 January, 2024; originally announced January 2024.

Comments: Preprint. Code: https://github.com/microsoft/frame-flow

arXiv:2303.14468 [pdf, other]

Autoregressive Conditional Neural Processes

Authors: Wessel P. Bruinsma, Stratis Markou, James Requeima, Andrew Y. K. Foong, Tom R. Andersson, Anna Vaughan, Anthony Buonomo, J. Scott Hosking, Richard E. Turner

Abstract: Conditional neural processes (CNPs; Garnelo et al., 2018a) are attractive meta-learning models which produce well-calibrated predictions and are trainable via a simple maximum likelihood procedure. Although CNPs have many advantages, they are unable to model dependencies in their predictions. Various works propose solutions to this, but these come at the cost of either requiring approximate inference or being limited to Gaussian predictions. In this work, we instead propose to change how CNPs are deployed at test time, without any modifications to the model or training procedure. Instead of making predictions independently for every target point, we autoregressively define a joint predictive distribution using the chain rule of probability, taking inspiration from the neural autoregressive density estimator (NADE) literature. We show that this simple procedure allows factorised Gaussian CNPs to model highly dependent, non-Gaussian predictive distributions. Perhaps surprisingly, in an extensive range of tasks with synthetic and real data, we show that CNPs in autoregressive (AR) mode not only significantly outperform non-AR CNPs, but are also competitive with more sophisticated models that are significantly more computationally expensive and challenging to train. This performance is remarkable given that AR CNPs are not trained to model joint dependencies. Our work provides an example of how ideas from neural distribution estimation can benefit neural processes, and motivates research into the AR deployment of other neural process models.

Submitted 25 March, 2023; originally announced March 2023.

Comments: 57 pages; accepted to the 11th International Conference on Learning Representations (ICLR 2023)

arXiv:2302.01170 [pdf, other]

Timewarp: Transferable Acceleration of Molecular Dynamics by Learning Time-Coarsened Dynamics

Authors: Leon Klein, Andrew Y. K. Foong, Tor Erlend Fjelde, Bruno Mlodozeniec, Marc Brockschmidt, Sebastian Nowozin, Frank Noé, Ryota Tomioka

Abstract: Molecular dynamics (MD) simulation is a widely used technique to simulate molecular systems, most commonly at the all-atom resolution where equations of motion are integrated with timesteps on the order of femtoseconds ($1\textrm{fs}=10^{-15}\textrm{s}$). MD is often used to compute equilibrium properties, which requires sampling from an equilibrium distribution such as the Boltzmann distribution. However, many important processes, such as binding and folding, occur over timescales of milliseconds or beyond, and cannot be efficiently sampled with conventional MD. Furthermore, new MD simulations need to be performed for each molecular system studied. We present Timewarp, an enhanced sampling method which uses a normalising flow as a proposal distribution in a Markov chain Monte Carlo method targeting the Boltzmann distribution. The flow is trained offline on MD trajectories and learns to make large steps in time, simulating the molecular dynamics of $10^{5} - 10^{6}\:\textrm{fs}$. Crucially, Timewarp is transferable between molecular systems: once trained, we show that it generalises to unseen small peptides (2-4 amino acids) at all-atom resolution, exploring their metastable states and providing wall-clock acceleration of sampling compared to standard MD. Our method constitutes an important step towards general, transferable algorithms for accelerating MD.

Submitted 1 December, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

arXiv:2205.07880 [pdf, ps, other]

A Note on the Chernoff Bound for Random Variables in the Unit Interval

Authors: Andrew Y. K. Foong, Wessel P. Bruinsma, David R. Burt

Abstract: The Chernoff bound is a well-known tool for obtaining a high probability bound on the expectation of a Bernoulli random variable in terms of its sample average. This bound is commonly used in statistical learning theory to upper bound the generalisation risk of a hypothesis in terms of its empirical risk on held-out data, for the case of a binary-valued loss function. However, the extension of this bound to the case of random variables taking values in the unit interval is less well known in the community. In this note we provide a proof of this extension for convenience and future reference.

Submitted 15 May, 2022; originally announced May 2022.

arXiv:2107.05734 [pdf, other]

Assessment of Immune Correlates of Protection via Controlled Vaccine Efficacy and Controlled Risk

Authors: Peter B. Gilbert, Youyi Fong, Marco Carone

Abstract: Immune correlates of protection (CoPs) are immunologic biomarkers accepted as a surrogate for an infectious disease clinical endpoint and thus can be used for traditional or provisional vaccine approval. To study CoPs in randomized, placebo-controlled trials, correlates of risk (CoRs) are first assessed in vaccine recipients. This analysis does not assess causation, as a CoR may fail to be a CoP. We propose a causal CoP analysis that estimates the controlled vaccine efficacy curve across biomarker levels $s$, $CVE(s)$, equal to one minus the ratio of the controlled-risk curve $r_C(s)$ at $s$ and placebo risk, where $r_C(s)$ is causal risk if all participants are assigned vaccine and the biomarker is set to $s$. The criterion for a useful CoP is wide variability of $CVE(s)$ in $s$. Moreover, estimation of $r_C(s)$ is of interest in itself, especially in studies without a placebo arm. For estimation of $r_C(s)$, measured confounders can be adjusted for by any regression method that accommodates missing biomarkers, to which we add sensitivity analysis to quantify robustness of CoP evidence to unmeasured confounding.
Application to two harmonized phase 3 trials supports that 50% neutralizing antibody titer has value as a controlled vaccine efficacy CoP for virologically confirmed dengue (VCD): in CYD14 the point estimate (95% confidence interval) for $CVE(s)$ accounting for measured confounders and building in conservative margin for unmeasured confounding increases from 29.6% (95% CI 3.5 to 45.9) at titer 1:36 to 78.5% (95% CI 67.9 to 86.8) at titer 1:1200; these estimates are 17.4% (95% CI -14.4 to 36.5) and 84.5% (95% CI 79.6 to 89.1) for CYD15.

Submitted 12 July, 2021; originally announced July 2021.

Comments: 25 pages, 3 figures, 1 table

arXiv:2106.03542 [pdf, other]

How Tight Can PAC-Bayes be in the Small Data Regime?

Authors: Andrew Y. K. Foong, Wessel P. Bruinsma, David R. Burt, Richard E. Turner

Abstract: In this paper, we investigate the question: Given a small number of datapoints, for example N = 30, how tight can PAC-Bayes and test set bounds be made? For such small datasets, test set bounds adversely affect generalisation performance by withholding data from the training procedure. In this setting, PAC-Bayes bounds are especially attractive, due to their ability to use all the data to simultaneously learn a posterior and bound its generalisation risk. We focus on the case of i.i.d. data with a bounded loss and consider the generic PAC-Bayes theorem of Germain et al. While their theorem is known to recover many existing PAC-Bayes bounds, it is unclear what the tightest bound derivable from their framework is. For a fixed learning algorithm and dataset, we show that the tightest possible bound coincides with a bound considered by Catoni; and, in the more natural case of distributions over datasets, we establish a lower bound on the best bound achievable in expectation. Interestingly, this lower bound recovers the Chernoff test set bound if the posterior is equal to the prior. Moreover, to illustrate how tight these bounds can be, we study synthetic one-dimensional classification tasks in which it is feasible to meta-learn both the prior and the form of the bound to numerically optimise for the tightest bounds possible. We find that in this simple, controlled scenario, PAC-Bayes bounds are competitive with comparable, commonly used Chernoff test set bounds.
However, the sharpest test set bounds still lead to better guarantees on the generalisation error than the PAC-Bayes bounds we consider.

Submitted 13 January, 2022; v1 submitted 7 June, 2021; originally announced June 2021.

Comments: Published at Neural Information Processing Systems 2021

arXiv:2101.03606 [pdf, other]

The Gaussian Neural Process

Authors: Wessel P. Bruinsma, James Requeima, Andrew Y. K. Foong, Jonathan Gordon, Richard E. Turner

Abstract: Neural Processes (NPs; Garnelo et al., 2018a,b) are a rich class of models for meta-learning that map data sets directly to predictive stochastic processes. We provide a rigorous analysis of the standard maximum-likelihood objective used to train conditional NPs. Moreover, we propose a new member to the Neural Process family called the Gaussian Neural Process (GNP), which models predictive correlations, incorporates translation equivariance, provides universal approximation guarantees, and demonstrates encouraging performance.

Submitted 10 January, 2021; originally announced January 2021.

Comments: 34 pages; includes supplementary material; to appear in AABI 2020

arXiv:2007.01332 [pdf, other]

Meta-Learning Stationary Stochastic Process Prediction with Convolutional Neural Processes

Authors: Andrew Y. K. Foong, Wessel P. Bruinsma, Jonathan Gordon, Yann Dubois, James Requeima, Richard E. Turner

Abstract: Stationary stochastic processes (SPs) are a key component of many probabilistic models, such as those for off-the-grid spatio-temporal data. They enable the statistical symmetry of underlying physical phenomena to be leveraged, thereby aiding generalization. Prediction in such models can be viewed as a translation equivariant map from observed data sets to predictive SPs, emphasizing the intimate relationship between stationarity and equivariance. Building on this, we propose the Convolutional Neural Process (ConvNP), which endows Neural Processes (NPs) with translation equivariance and extends convolutional conditional NPs to allow for dependencies in the predictive distribution. The latter enables ConvNPs to be deployed in settings which require coherent samples, such as Thompson sampling or conditional image completion. Moreover, we propose a new maximum-likelihood objective to replace the standard ELBO objective in NPs, which conceptually simplifies the framework and empirically improves performance. We demonstrate the strong performance and generalization capabilities of ConvNPs on 1D regression, image completion, and various tasks with real-world spatio-temporal data.

Submitted 20 November, 2020; v1 submitted 2 July, 2020; originally announced July 2020.

Comments: NeurIPS 2020

arXiv:1910.13556 [pdf, other]

Convolutional Conditional Neural Processes

Authors: Jonathan Gordon, Wessel P. Bruinsma, Andrew Y. K. Foong, James Requeima, Yann Dubois, Richard E. Turner

Abstract: We introduce the Convolutional Conditional Neural Process (ConvCNP), a new member of the Neural Process family that models translation equivariance in the data. Translation equivariance is an important inductive bias for many learning problems including time series modelling, spatial data, and images. The model embeds data sets into an infinite-dimensional function space as opposed to a finite-dimensional vector space. To formalize this notion, we extend the theory of neural representations of sets to include functional representations, and demonstrate that any translation-equivariant embedding can be represented using a convolutional deep set. We evaluate ConvCNPs in several settings, demonstrating that they achieve state-of-the-art performance compared to existing NPs. We demonstrate that building in translation equivariance enables zero-shot generalization to challenging, out-of-domain tasks.

Submitted 25 June, 2020; v1 submitted 29 October, 2019; originally announced October 2019.

Comments: Accepted at International Conference on Learning Representations 2020

arXiv:1909.00719 [pdf, other]

On the Expressiveness of Approximate Inference in Bayesian Neural Networks

Authors: Andrew Y. K. Foong, David R. Burt, Yingzhen Li, Richard E. Turner

Abstract: While Bayesian neural networks (BNNs) hold the promise of being flexible, well-calibrated statistical models, inference often requires approximations whose consequences are poorly understood. We study the quality of common variational methods in approximating the Bayesian predictive distribution. For single-hidden layer ReLU BNNs, we prove a fundamental limitation in function-space of two of the most commonly used distributions defined in weight-space: mean-field Gaussian and Monte Carlo dropout. We find there are simple cases where neither method can have substantially increased uncertainty in between well-separated regions of low uncertainty. We provide strong empirical evidence that exact inference does not have this pathology, hence it is due to the approximation and not the model. In contrast, for deep networks, we prove a universality result showing that there exist approximate posteriors in the above classes which provide flexible uncertainty estimates. However, we find empirically that pathologies of a similar form as in the single-hidden layer case can persist when performing variational inference in deeper networks. Our results motivate careful consideration of the implications of approximate inference methods in BNNs.

Submitted 23 October, 2020; v1 submitted 2 September, 2019; originally announced September 2019.

Comments: NeurIPS 2020 version

arXiv:1906.11537 [pdf, other]

'In-Between' Uncertainty in Bayesian Neural Networks

Authors: Andrew Y. K. Foong, Yingzhen Li, José Miguel Hernández-Lobato, Richard E. Turner

Abstract: We describe a limitation in the expressiveness of the predictive uncertainty estimate given by mean-field variational inference (MFVI), a popular approximate inference method for Bayesian neural networks. In particular, MFVI fails to give calibrated uncertainty estimates in between separated regions of observations. This can lead to catastrophically overconfident predictions when testing on out-of-distribution data. Avoiding such overconfidence is critical for active learning, Bayesian optimisation and out-of-distribution robustness. We instead find that a classical technique, the linearised Laplace approximation, can handle 'in-between' uncertainty much better for small network architectures.

Submitted 27 June, 2019; originally announced June 2019.

Comments: Presented at the ICML 2019 Workshop on Uncertainty and Robustness in Deep Learning

arXiv:1011.0601 [pdf, ps, other]

doi 10.1214/09-AOAS269

Bayesian inference and model choice in a hidden stochastic two-compartment model of hematopoietic stem cell fate decisions

Authors: Youyi Fong, Peter Guttorp, Janis Abkowitz

Abstract: Despite rapid advances in experimental cell biology, the in vivo behavior of hematopoietic stem cells (HSC) cannot be directly observed and measured. Previously we modeled feline hematopoiesis using a two-compartment hidden Markov process that had birth and emigration events in the first compartment. Here we perform Bayesian statistical inference on models which contain two additional events in the first compartment in order to determine if HSC fate decisions are linked to cell division or occur independently. Pareto Optimal Model Assessment approach is used to cross check the estimates from Bayesian inference. Our results show that HSC must divide symmetrically (i.e., produce two HSC daughter cells) in order to maintain hematopoiesis. We then demonstrate that the augmented model that adds asymmetric division events provides a better fit to the competitive transplantation data, and we thus provide evidence that HSC fate determination in vivo occurs both in association with cell division and at a separate point in time. Last we show that assuming each cat has a unique set of parameters leads to either a significant decrease or a nonsignificant increase in model fit, suggesting that the kinetic parameters for HSC are not unique attributes of individual animals, but shared within a species.

Submitted 2 November, 2010; originally announced November 2010.

Comments: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org), available at http://dx.doi.org/10.1214/09-AOAS269

Report number: IMS-AOAS-AOAS269

Journal ref: Annals of Applied Statistics 2009, Vol. 3, No. 4, 1695-1709

Showing 1–13 of 13 results for author: Fong, Y