
Scalable Bayesian Learning with posteriors

Authors: Samuel Duffield, Kaelan Donatella, Johnathan Chiu, Phoebe Klett, Daniel Simpson

Abstract: Although theoretically compelling, Bayesian learning with modern machine learning models is computationally challenging since it requires approximating a high dimensional posterior distribution. In this work, we (i) introduce posteriors, an easily extensible PyTorch library hosting general-purpose implementations making Bayesian learning accessible and scalable to large data and parameter regimes; (ii) present a tempered framing of stochastic gradient Markov chain Monte Carlo, as implemented in posteriors, that transitions seamlessly into optimization and unveils a minor modification to deep ensembles to ensure they are asymptotically unbiased for the Bayesian posterior, and (iii) demonstrate and compare the utility of Bayesian approximations through experiments including an investigation into the cold posterior effect and applications with large language models.

Submitted 31 May, 2024; originally announced June 2024.

arXiv:2405.13817

Thermodynamic Natural Gradient Descent

Authors: Kaelan Donatella, Samuel Duffield, Maxwell Aifer, Denis Melanson, Gavin Crooks, Patrick J. Coles

Abstract: Second-order training methods have better convergence properties than gradient descent but are rarely used in practice for large-scale training due to their computational overhead. This can be viewed as a hardware limitation (imposed by digital computers). Here we show that natural gradient descent (NGD), a second-order method, can have a similar computational complexity per iteration to a first-order method, when employing appropriate hardware. We present a new hybrid digital-analog algorithm for training neural networks that is equivalent to NGD in a certain parameter regime but avoids prohibitively costly linear system solves. Our algorithm exploits the thermodynamic properties of an analog system at equilibrium, and hence requires an analog thermodynamic computer. The training occurs in a hybrid digital-analog loop, where the gradient and Fisher information matrix (or any other positive semi-definite curvature matrix) are calculated at given time intervals while the analog dynamics take place. We numerically demonstrate the superiority of this approach over state-of-the-art digital first- and second-order training methods on classification tasks and language model fine-tuning tasks.

Submitted 22 May, 2024; originally announced May 2024.

Comments: 17 pages, 7 figures

arXiv:2402.10797

BlackJAX: Composable Bayesian inference in JAX

Authors: Alberto Cabezas, Adrien Corenflos, Junpeng Lao, Rémi Louf, Antoine Carnec, Kaustubh Chaudhari, Reuben Cohn-Gordon, Jeremie Coullon, Wei Deng, Sam Duffield, Gerardo Durán-Martín, Marcin Elantkowski, Dan Foreman-Mackey, Michele Gregori, Carlos Iguaran, Ravin Kumar, Martin Lysy, Kevin Murphy, Juan Camilo Orduz, Karm Patel, Xi Wang, Rob Zinkov

Abstract: BlackJAX is a library implementing sampling and variational inference algorithms commonly used in Bayesian computation. It is designed for ease of use, speed, and modularity by taking a functional approach to the algorithms' implementation. BlackJAX is written in Python, using JAX to compile and run NumPy-like samplers and variational methods on CPUs, GPUs, and TPUs. The library integrates well with probabilistic programming languages by working directly with the (un-normalized) target log density function. BlackJAX is intended as a collection of low-level, composable implementations of basic statistical 'atoms' that can be combined to perform well-defined Bayesian inference, but also provides high-level routines for ease of use. It is designed for users who need cutting-edge methods, researchers who want to create complex sampling methods, and people who want to learn how these work.

Submitted 22 February, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

Comments: Companion paper for the library https://github.com/blackjax-devs/blackjax Update: minor changes and updated the list of authors to include technical contributors

arXiv:2311.12759

Thermodynamic Matrix Exponentials and Thermodynamic Parallelism

Authors: Samuel Duffield, Maxwell Aifer, Gavin Crooks, Thomas Ahle, Patrick J. Coles

Abstract: Thermodynamic computing exploits fluctuations and dissipation in physical systems to efficiently solve various mathematical problems. For example, it was recently shown that certain linear algebra problems can be solved thermodynamically, leading to an asymptotic speedup scaling with the matrix dimension. The origin of this "thermodynamic advantage" has not yet been fully explained, and it is not clear what other problems might benefit from it. Here we provide a new thermodynamic algorithm for exponentiating a real matrix, with applications in simulating linear dynamical systems. We describe a simple electrical circuit involving coupled oscillators, whose thermal equilibration can implement our algorithm. We also show that this algorithm provides an asymptotic speedup that is linear in the dimension. Finally, we introduce the concept of thermodynamic parallelism to explain this speedup, stating that thermodynamic noise provides a resource leading to effective parallelization of computations, and we hypothesize this as a mechanism to explain thermodynamic advantage more generally.

Submitted 5 January, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

Comments: 14 pages, 5 figures

arXiv:2311.04986

Exploiting Inductive Biases in Video Modeling through Neural CDEs

Authors: Johnathan Chiu, Samuel Duffield, Max Hunter-Gordon, Kaelan Donatella, Max Aifer, Andi Gu

Abstract: We introduce a novel approach to video modeling that leverages controlled differential equations (CDEs) to address key challenges in video tasks, notably video interpolation and mask propagation. We apply CDEs at varying resolutions leading to a continuous-time U-Net architecture. Unlike traditional methods, our approach does not require explicit optical flow learning, and instead makes use of the inherent continuous-time features of CDEs to produce a highly expressive video model. We demonstrate competitive performance against state-of-the-art models for video interpolation and mask propagation tasks.

Submitted 8 November, 2023; originally announced November 2023.

arXiv:2308.05660

Thermodynamic Linear Algebra

Authors: Maxwell Aifer, Kaelan Donatella, Max Hunter Gordon, Samuel Duffield, Thomas Ahle, Daniel Simpson, Gavin E. Crooks, Patrick J. Coles

Abstract: Linear algebraic primitives are at the core of many modern algorithms in engineering, science, and machine learning. Hence, accelerating these primitives with novel computing hardware would have tremendous economic impact. Quantum computing has been proposed for this purpose, although the resource requirements are far beyond current technological capabilities, so this approach remains long-term in timescale. Here we consider an alternative physics-based computing paradigm based on classical thermodynamics, to provide a near-term approach to accelerating linear algebra. At first sight, thermodynamics and linear algebra seem to be unrelated fields. In this work, we connect solving linear algebra problems to sampling from the thermodynamic equilibrium distribution of a system of coupled harmonic oscillators. We present simple thermodynamic algorithms for (1) solving linear systems of equations, (2) computing matrix inverses, (3) computing matrix determinants, and (4) solving Lyapunov equations. Under reasonable assumptions, we rigorously establish asymptotic speedups for our algorithms, relative to digital methods, that scale linearly in matrix dimension. Our algorithms exploit thermodynamic principles like ergodicity, entropy, and equilibration, highlighting the deep connection between these two seemingly distinct fields, and opening up algebraic applications for thermodynamic computing hardware.

Submitted 10 June, 2024; v1 submitted 10 August, 2023; originally announced August 2023.

Comments: 15+22 pages, 6 figures

arXiv:2308.02414

A State-Space Perspective on Modelling and Inference for Online Skill Rating

Authors: Samuel Duffield, Samuel Power, Lorenzo Rimella

Abstract: We summarise popular methods used for skill rating in competitive sports, along with their inferential paradigms and introduce new approaches based on sequential Monte Carlo and discrete hidden Markov models. We advocate for a state-space model perspective, wherein players' skills are represented as time-varying, and match results serve as observed quantities. We explore the steps to construct the model and the three stages of inference: filtering, smoothing and parameter estimation. We examine the challenges of scaling up to numerous players and matches, highlighting the main approximations and reductions which facilitate statistical and computational efficiency. We additionally compare approaches in a realistic experimental pipeline that can be easily reproduced and extended with our open-source Python package, https://github.com/SamDuffield/abile.

Submitted 12 April, 2024; v1 submitted 4 August, 2023; originally announced August 2023.

arXiv:2306.16608

doi 10.1103/PhysRevResearch.6.013221

Demonstrating Bayesian Quantum Phase Estimation with Quantum Error Detection

Authors: Kentaro Yamamoto, Samuel Duffield, Yuta Kikuchi, David Muñoz Ramo

Abstract: Quantum phase estimation (QPE) serves as a building block of many different quantum algorithms and finds important applications in computational chemistry problems. Despite the rapid development of quantum hardware, experimental demonstration of QPE for chemistry problems remains challenging due to its large circuit depth and the lack of quantum resources to protect the hardware from noise with fully fault-tolerant protocols. In the present work, we take a step towards fault-tolerant quantum computing by demonstrating a QPE algorithm on a Quantinuum trapped-ion computer. We employ a Bayesian approach to QPE and introduce a routine for optimal parameter selection, which we combine with a $[[ n+2,n,2 ]]$ quantum error detection code carefully tailored to the hardware capabilities. As a simple quantum chemistry example, we take a hydrogen molecule represented by a two-qubit Hamiltonian and estimate its ground state energy using our QPE protocol. In the experiment, we use the quantum circuits containing as many as 920 physical two-qubit gates to estimate the ground state energy within $6\times 10^{-3}$ hartree of the exact value.

Submitted 7 September, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

Comments: 16 pages, 9 figures

arXiv:2211.12580

Quasi-Newton Sequential Monte Carlo

Authors: Samuel Duffield, Sumeetpal S. Singh

Abstract: Sequential Monte Carlo samplers represent a compelling approach to posterior inference in Bayesian models, due to being parallelisable and providing an unbiased estimate of the posterior normalising constant. In this work, we significantly accelerate sequential Monte Carlo samplers by adopting the L-BFGS Hessian approximation which represents the state-of-the-art in full-batch optimisation techniques. The L-BFGS Hessian approximation has only linear complexity in the parameter dimension and requires no additional posterior or gradient evaluations. The resulting sequential Monte Carlo algorithm is adaptive, parallelisable and well-suited to high-dimensional and multi-modal settings, which we demonstrate in numerical experiments on challenging posterior distributions.

Submitted 22 November, 2022; originally announced November 2022.

arXiv:2206.07559

doi 10.1088/2632-2153/acc8b7

Bayesian Learning of Parameterised Quantum Circuits

Authors: Samuel Duffield, Marcello Benedetti, Matthias Rosenkranz

Abstract: Currently available quantum computers suffer from constraints including hardware noise and a limited number of qubits. As such, variational quantum algorithms that utilise a classical optimiser in order to train a parameterised quantum circuit have drawn significant attention for near-term practical applications of quantum technology. In this work, we take a probabilistic point of view and reformulate the classical optimisation as an approximation of a Bayesian posterior. The posterior is induced by combining the cost function to be minimised with a prior distribution over the parameters of the quantum circuit. We describe a dimension reduction strategy based on a maximum a posteriori point estimate with a Laplace prior. Experiments on the Quantinuum H1-2 computer show that the resulting circuits are faster to execute and less noisy than the circuits trained without the dimension reduction strategy. We subsequently describe a posterior sampling strategy based on stochastic gradient Langevin dynamics. Numerical simulations on three different problems show that the strategy is capable of generating samples from the full posterior and avoiding local optima.

Submitted 15 June, 2022; originally announced June 2022.

Comments: 11 pages, 7 figures

Journal ref: Mach. Learn.: Sci. Technol. 4, 025007 (2023)

arXiv:2110.03034

doi 10.1016/j.spl.2022.109523

Ensemble Kalman Inversion for General Likelihoods

Authors: Samuel Duffield, Sumeetpal S. Singh

Abstract: In this letter we generalise Ensemble Kalman inversion techniques to general Bayesian models where previously they were restricted to additive Gaussian likelihoods - all in the difficult setting where the likelihood can be sampled from, but its density not necessarily evaluated.

Submitted 7 June, 2022; v1 submitted 6 October, 2021; originally announced October 2021.

Journal ref: Statistics & Probability Letters 187 (2022)

arXiv:2012.04602

doi 10.1109/TSP.2022.3141259

Online Particle Smoothing with Application to Map-matching

Authors: Samuel Duffield, Sumeetpal S. Singh

Abstract: We introduce a novel method for online smoothing in state-space models that utilises a fixed-lag approximation to overcome the well known issue of path degeneracy. Unlike classical fixed-lag techniques that only approximate certain marginals, we introduce an online resampling algorithm, called particle stitching, that converts these marginal samples into a full posterior approximation. We demonstrate the utility of our method in the context of map-matching, the task of inferring a vehicle's trajectory given a road network and noisy GPS observations. We develop a new state-space model for the difficult task of map-matching on dense, urban road networks.

Submitted 2 August, 2021; v1 submitted 8 December, 2020; originally announced December 2020.

Journal ref: IEEE Transactions on Signal Processing 2022

Showing 1–12 of 12 results for author: Duffield, S