Scalable Bayesian Learning with posteriors

Authors: Samuel Duffield, Kaelan Donatella, Johnathan Chiu, Phoebe Klett, Daniel Simpson

Abstract: Although theoretically compelling, Bayesian learning with modern machine learning models is computationally challenging since it requires approximating a high dimensional posterior distribution. In this work, we (i) introduce posteriors, an easily extensible PyTorch library hosting general-purpose implementations making Bayesian learning accessible and scalable to large data and parameter regimes; (ii) present a tempered framing of stochastic gradient Markov chain Monte Carlo, as implemented in posteriors, that transitions seamlessly into optimization and unveils a minor modification to deep ensembles to ensure they are asymptotically unbiased for the Bayesian posterior, and (iii) demonstrate and compare the utility of Bayesian approximations through experiments including an investigation into the cold posterior effect and applications with large language models.

Submitted 31 May, 2024; originally announced June 2024.

arXiv:2405.13817 [pdf, other]

Thermodynamic Natural Gradient Descent

Authors: Kaelan Donatella, Samuel Duffield, Maxwell Aifer, Denis Melanson, Gavin Crooks, Patrick J. Coles

Abstract: Second-order training methods have better convergence properties than gradient descent but are rarely used in practice for large-scale training due to their computational overhead. This can be viewed as a hardware limitation (imposed by digital computers). Here we show that natural gradient descent (NGD), a second-order method, can have a similar computational complexity per iteration to a first-order method, when employing appropriate hardware. We present a new hybrid digital-analog algorithm for training neural networks that is equivalent to NGD in a certain parameter regime but avoids prohibitively costly linear system solves. Our algorithm exploits the thermodynamic properties of an analog system at equilibrium, and hence requires an analog thermodynamic computer. The training occurs in a hybrid digital-analog loop, where the gradient and Fisher information matrix (or any other positive semi-definite curvature matrix) are calculated at given time intervals while the analog dynamics take place. We numerically demonstrate the superiority of this approach over state-of-the-art digital first- and second-order training methods on classification tasks and language model fine-tuning tasks.

Submitted 22 May, 2024; originally announced May 2024.

Comments: 17 pages, 7 figures

arXiv:2401.16231 [pdf, other]

Error Mitigation for Thermodynamic Computing

Authors: Maxwell Aifer, Denis Melanson, Kaelan Donatella, Gavin Crooks, Thomas Ahle, Patrick J. Coles

Abstract: While physics-based computing can offer speed and energy efficiency compared to digital computing, it also is subject to errors that must be mitigated. For example, many error mitigation methods have been proposed for quantum computing. However this error mitigation framework has yet to be applied to other physics-based computing paradigms. In this work, we consider thermodynamic computing, which has recently captured attention due to its relevance to artificial intelligence (AI) applications, such as probabilistic AI and generative AI. A key source of errors in this paradigm is the imprecision of the analog hardware components. Here, we introduce a method that reduces the overall error from a linear to a quadratic dependence (from $ε$ to $ε^2$) on the imprecision $ε$, for Gaussian sampling and linear algebra applications. The method involves sampling from an ensemble of imprecise distributions associated with various rounding events and then merging these samples. We numerically demonstrate the scalability of this method for dimensions greater than 1000. Finally, we implement this method on an actual thermodynamic computer and show $20\%$ error reduction for matrix inversion; the first thermodynamic error mitigation experiment.

Submitted 29 January, 2024; originally announced January 2024.

Comments: 17 pages, 8 figures

arXiv:2312.04836 [pdf, other]

Thermodynamic Computing System for AI Applications

Authors: Denis Melanson, Mohammad Abu Khater, Maxwell Aifer, Kaelan Donatella, Max Hunter Gordon, Thomas Ahle, Gavin Crooks, Antonio J. Martinez, Faris Sbahi, Patrick J. Coles

Abstract: Recent breakthroughs in artificial intelligence (AI) algorithms have highlighted the need for novel computing hardware in order to truly unlock the potential for AI. Physics-based hardware, such as thermodynamic computing, has the potential to provide a fast, low-power means to accelerate AI primitives, especially generative AI and probabilistic AI. In this work, we present the first continuous-variable thermodynamic computer, which we call the stochastic processing unit (SPU). Our SPU is composed of RLC circuits, as unit cells, on a printed circuit board, with 8 unit cells that are all-to-all coupled via switched capacitances. It can be used for either sampling or linear algebra primitives, and we demonstrate Gaussian sampling and matrix inversion on our hardware. The latter represents the first thermodynamic linear algebra experiment. We also illustrate the applicability of the SPU to uncertainty quantification for neural network classification. We envision that this hardware, when scaled up in size, will have significant impact on accelerating various probabilistic AI applications.

Submitted 8 December, 2023; originally announced December 2023.

Comments: 26 pages, 22 figures

arXiv:2311.04986 [pdf, other]

Exploiting Inductive Biases in Video Modeling through Neural CDEs

Authors: Johnathan Chiu, Samuel Duffield, Max Hunter-Gordon, Kaelan Donatella, Max Aifer, Andi Gu

Abstract: We introduce a novel approach to video modeling that leverages controlled differential equations (CDEs) to address key challenges in video tasks, notably video interpolation and mask propagation. We apply CDEs at varying resolutions leading to a continuous-time U-Net architecture. Unlike traditional methods, our approach does not require explicit optical flow learning, and instead makes use of the inherent continuous-time features of CDEs to produce a highly expressive video model. We demonstrate competitive performance against state-of-the-art models for video interpolation and mask propagation tasks.

Submitted 8 November, 2023; originally announced November 2023.

arXiv:2308.05660 [pdf, other]

Thermodynamic Linear Algebra

Authors: Maxwell Aifer, Kaelan Donatella, Max Hunter Gordon, Samuel Duffield, Thomas Ahle, Daniel Simpson, Gavin E. Crooks, Patrick J. Coles

Abstract: Linear algebraic primitives are at the core of many modern algorithms in engineering, science, and machine learning. Hence, accelerating these primitives with novel computing hardware would have tremendous economic impact. Quantum computing has been proposed for this purpose, although the resource requirements are far beyond current technological capabilities, so this approach remains long-term in timescale. Here we consider an alternative physics-based computing paradigm based on classical thermodynamics, to provide a near-term approach to accelerating linear algebra. At first sight, thermodynamics and linear algebra seem to be unrelated fields. In this work, we connect solving linear algebra problems to sampling from the thermodynamic equilibrium distribution of a system of coupled harmonic oscillators. We present simple thermodynamic algorithms for (1) solving linear systems of equations, (2) computing matrix inverses, (3) computing matrix determinants, and (4) solving Lyapunov equations. Under reasonable assumptions, we rigorously establish asymptotic speedups for our algorithms, relative to digital methods, that scale linearly in matrix dimension. Our algorithms exploit thermodynamic principles like ergodicity, entropy, and equilibration, highlighting the deep connection between these two seemingly distinct fields, and opening up algebraic applications for thermodynamic computing hardware.

Submitted 10 June, 2024; v1 submitted 10 August, 2023; originally announced August 2023.

Comments: 15+22 pages, 6 figures

arXiv:2302.06584 [pdf, other]

Thermodynamic AI and the fluctuation frontier

Authors: Patrick J. Coles, Collin Szczepanski, Denis Melanson, Kaelan Donatella, Antonio J. Martinez, Faris Sbahi

Abstract: Many Artificial Intelligence (AI) algorithms are inspired by physics and employ stochastic fluctuations. We connect these physics-inspired AI algorithms by unifying them under a single mathematical framework that we call Thermodynamic AI. Seemingly disparate algorithmic classes can be described by this framework, for example, (1) Generative diffusion models, (2) Bayesian neural networks, (3) Monte Carlo sampling and (4) Simulated annealing. Such Thermodynamic AI algorithms are currently run on digital hardware, ultimately limiting their scalability and overall potential. Stochastic fluctuations naturally occur in physical thermodynamic systems, and such fluctuations can be viewed as a computational resource. Hence, we propose a novel computing paradigm, where software and hardware become inseparable. Our algorithmic unification allows us to identify a single full-stack paradigm, involving Thermodynamic AI hardware, that could accelerate such algorithms. We contrast Thermodynamic AI hardware with quantum computing where noise is a roadblock rather than a resource. Thermodynamic AI hardware can be viewed as a novel form of computing, since it uses a novel fundamental building block. We identify stochastic bits (s-bits) and stochastic modes (s-modes) as the respective building blocks for discrete and continuous Thermodynamic AI hardware. In addition to these stochastic units, Thermodynamic AI hardware employs a Maxwell's demon device that guides the system to produce non-trivial states. We provide a few simple physical architectures for building these devices and we develop a formalism for programming the hardware via gate sequences. We hope to stimulate discussion around this new computing paradigm. Beyond acceleration, we believe it will impact the design of both hardware and algorithms, while also deepening our understanding of the connection between physics and intelligence.

Submitted 13 June, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

Comments: 47 pages, 18 figures, Updated authors

arXiv:2302.04649 [pdf, other]

doi 10.1103/PRXQuantum.4.040335

Efficient estimation of trainability for variational quantum circuits

Authors: Valentin Heyraud, Zejian Li, Kaelan Donatella, Alexandre Le Boité, Cristiano Ciuti

Abstract: Parameterized quantum circuits used as variational ansätze are emerging as promising tools to tackle complex problems ranging from quantum chemistry to combinatorial optimization. These variational quantum circuits can suffer from the well-known curse of barren plateaus, which is characterized by an exponential vanishing of the cost-function gradient with the system size, making training unfeasible for practical applications. Since a generic quantum circuit cannot be simulated efficiently, the determination of its trainability is an important problem. Here we find an efficient method to compute the gradient of the cost function and its variance for a wide class of variational quantum circuits. Our scheme relies on our proof of an exact mapping from randomly initialized circuits to a set of Clifford circuits that can be efficiently simulated on a classical computer by virtue of the celebrated Gottesman-Knill theorem. This method is scalable and can be used to certify trainability for variational quantum circuits and explore design strategies that can overcome the barren plateau problem. As illustrative examples, we show results with up to 100 qubits.

Submitted 7 September, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

Comments: Revised version with three additional figures and results with up to 100 qubits

Journal ref: PRX Quantum 4, 040335 (2023)

arXiv:2209.03241 [pdf, other]

doi 10.1103/PhysRevA.108.022210

Dynamics with autoregressive neural quantum states: application to critical quench dynamics

Authors: Kaelan Donatella, Zakari Denis, Alexandre Le Boité, Cristiano Ciuti

Abstract: Despite very promising results, capturing the dynamics of complex quantum systems with neural-network ansätze has been plagued by several problems, one of which being stochastic noise that makes the dynamics unstable and highly dependent on some regularization hyperparameters. We present an alternative general scheme that enables one to capture long-time dynamics of quantum systems in a stable fashion, provided the neural-network ansatz is normalized, which can be ensured by the autoregressive property of the chosen ansatz. We then apply the scheme to time-dependent quench dynamics by investigating the Kibble-Zurek mechanism in the two-dimensional quantum Ising model. We find an excellent agreement with exact dynamics for small systems and are able to recover scaling laws in agreement with other variational methods.

Submitted 22 August, 2023; v1 submitted 7 September, 2022; originally announced September 2022.

Comments: Final published version

Journal ref: Phys. Rev. A 108, 022210 (2023)

arXiv:2205.07925 [pdf, other]

doi 10.1103/PhysRevA.106.032413

Machine learning via relativity-inspired quantum dynamics

Authors: Zejian Li, Valentin Heyraud, Kaelan Donatella, Zakari Denis, Cristiano Ciuti

Abstract: We present a machine-learning scheme based on the relativistic dynamics of a quantum system, namely a quantum detector inside a cavity resonator. An equivalent analog model can be realized for example in a circuit QED platform subject to properly modulated driving fields. We consider a reservoir-computing scheme where the input data are embedded in the modulation of the system (equivalent to the acceleration of the relativistic object) and the output data are obtained by linear combinations of measured observables. As an illustrative example, we have simulated such a relativistic quantum machine for a challenging classification task, showing a very large enhancement of the accuracy in the relativistic regime. Using kernel-machine theory, we show that in the relativistic regime the task-independent expressivity is dramatically magnified with respect to the Newtonian regime.

Submitted 16 May, 2022; originally announced May 2022.

Comments: Article (2 figures) + Suppl. Mat. (2 figures)

arXiv:2204.04198 [pdf]

Modern applications of machine learning in quantum sciences

Authors: Anna Dawid, Julian Arnold, Borja Requena, Alexander Gresch, Marcin Płodzień, Kaelan Donatella, Kim A. Nicoli, Paolo Stornati, Rouven Koch, Miriam Büttner, Robert Okuła, Gorka Muñoz-Gil, Rodrigo A. Vargas-Hernández, Alba Cervera-Lierta, Juan Carrasquilla, Vedran Dunjko, Marylou Gabrié, Patrick Huembeli, Evert van Nieuwenburg, Filippo Vicentini, Lei Wang, Sebastian J. Wetzel, Giuseppe Carleo, Eliška Greplová, Roman Krems, et al. (4 additional authors not shown)

Abstract: In this book, we provide a comprehensive introduction to the most recent advances in the application of machine learning methods in quantum sciences. We cover the use of deep learning and kernel methods in supervised, unsupervised, and reinforcement learning algorithms for phase classification, representation of many-body quantum states, quantum feedback control, and quantum circuits optimization. Moreover, we introduce and discuss more specialized topics such as differentiable programming, generative models, statistical approach to machine learning, and quantum machine learning.

Submitted 15 November, 2023; v1 submitted 8 April, 2022; originally announced April 2022.

Comments: 288 pages, 92 figures. We have a publishing contract with Cambridge University Press. Figures and tex files are available at https://github.com/Shmoo137/Lecture-Notes

arXiv:2102.04265 [pdf, other]

doi 10.1103/PhysRevA.104.062407

Continuous-time dynamics and error scaling of noisy highly-entangling quantum circuits

Authors: Kaelan Donatella, Zakari Denis, Alexandre Le Boité, Cristiano Ciuti

Abstract: We investigate the continuous-time dynamics of highly-entangling intermediate-scale quantum circuits in the presence of dissipation and decoherence. By compressing the Hilbert space to a time-dependent "corner" subspace that supports faithful representations of the density matrix, we simulate a noisy quantum Fourier transform processor with up to 21 qubits. Our method is efficient to compute with a controllable accuracy the time evolution of intermediate-scale open quantum systems with moderate entropy, while taking into account microscopic dissipative processes rather than relying on digital error models. The circuit size reached in our simulations allows to extract the scaling behaviour of error propagation with the dissipation rates and the number of qubits. Moreover, we show that depending on the dissipative mechanisms at play, the choice of input state has a strong impact on the performance of the quantum algorithm.

Submitted 6 December, 2021; v1 submitted 8 February, 2021; originally announced February 2021.

Comments: 11 pages, 7 figures. Final version accepted in PRA

Journal ref: Phys. Rev. A 104, 062407 (2021)

arXiv:2004.12883 [pdf, other]

doi 10.1103/PhysRevResearch.2.043232

Entanglement dynamics in dissipative photonic Mott insulators

Authors: Kaelan Donatella, Alberto Biella, Alexandre Le Boité, Cristiano Ciuti

Abstract: We theoretically investigate the entanglement dynamics in photonic Mott insulators in the presence of particle losses and dephasing. We explore two configurations where entanglement is generated following the injection or extraction of a photon in the central site of a chain of cavity resonators. We study the entanglement negativity of two-site reduced density matrices as a function of time and inter-site distance. Our findings show that in spite of particle losses the quantum entanglement propagation exhibits a ballistic character with propagation speeds related to the different quasiparticles that are involved in the dynamics, namely photonic doublons and holons respectively. Our analysis reveals that photon dissipation has a strikingly asymmetric behavior in the two configurations with a much more dramatic role on the holon entanglement propagation than for the doublon case.

Submitted 27 April, 2020; originally announced April 2020.

Comments: 4 figures

Journal ref: Phys. Rev. Research 2, 043232 (2020)

Showing 1–13 of 13 results for author: Donatella, K