Thermalization and Criticality on an Analog-Digital Quantum Simulator

Authors: Trond I. Andersen, Nikita Astrakhantsev, Amir Karamlou, Julia Berndtsson, Johannes Motruk, Aaron Szasz, Jonathan A. Gross, Tom Westerhout, Yaxing Zhang, Ebrahim Forati, Dario Rossi, Bryce Kobrin, Agustin Di Paolo, Andrey R. Klots, Ilya Drozdov, Vladislav D. Kurilovich, Andre Petukhov, Lev B. Ioffe, Andreas Elben, Aniket Rath, Vittorio Vitale, Benoit Vermersch, Rajeev Acharya, Laleh Aghababaie Beni, Kyle Anderson, et al. (202 additional authors not shown)

Abstract: Understanding how interacting particles approach thermal equilibrium is a major challenge of quantum simulators. Unlocking the full potential of such systems toward this goal requires flexible initial state preparation, precise time evolution, and extensive probes for final state characterization. We present a quantum simulator comprising 69 superconducting qubits which supports both universal quantum gates and high-fidelity analog evolution, with performance beyond the reach of classical simulation in cross-entropy benchmarking experiments. Emulating a two-dimensional (2D) XY quantum magnet, we leverage a wide range of measurement techniques to study quantum states after ramps from an antiferromagnetic initial state. We observe signatures of the classical Kosterlitz-Thouless phase transition, as well as strong deviations from Kibble-Zurek scaling predictions attributed to the interplay between quantum and classical coarsening of the correlated domains. This interpretation is corroborated by injecting variable energy density into the initial state, which enables studying the effects of the eigenstate thermalization hypothesis (ETH) in targeted parts of the eigenspectrum. Finally, we digitally prepare the system in pairwise-entangled dimer states and image the transport of energy and vorticity during thermalization. These results establish the efficacy of superconducting analog-digital quantum processors for preparing states across many-body spectra and unveiling their thermalization dynamics.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2310.07136 [pdf, other]

Exponential Quantum Communication Advantage in Distributed Inference and Learning

Authors: Hagay Michaeli, Dar Gilboa, Daniel Soudry, Jarrod R. McClean

Abstract: Training and inference with large machine learning models that far exceed the memory capacity of individual devices necessitate the design of distributed architectures, forcing one to contend with communication constraints. We present a framework for distributed computation over a quantum network in which data is encoded into specialized quantum states. We prove that for models within this framework, inference and training using gradient descent can be performed with exponentially less communication compared to their classical analogs, and with relatively modest overhead relative to standard gradient-based methods. We show that certain graph neural networks are particularly amenable to implementation within this framework, and moreover present empirical evidence that they perform well on standard benchmarks. To our knowledge, this is the first example of exponential quantum advantage for a generic class of machine learning problems that holds regardless of the data encoding cost. Moreover, we show that models in this class can encode highly nonlinear features of their inputs, and their expressivity increases exponentially with model depth. We also delineate the space of models for which exponential communication advantages hold by showing that they cannot hold for linear classification. Our results can be combined with natural privacy advantages in the communicated quantum states that limit the amount of information that can be extracted from them about the data and model parameters. Taken as a whole, these findings form a promising foundation for distributed machine learning over quantum networks.

Submitted 21 June, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

arXiv:2306.09333 [pdf, other]

doi 10.1126/science.adi7877

Dynamics of magnetization at infinite temperature in a Heisenberg spin chain

Authors: Eliott Rosenberg, Trond Andersen, Rhine Samajdar, Andre Petukhov, Jesse Hoke, Dmitry Abanin, Andreas Bengtsson, Ilya Drozdov, Catherine Erickson, Paul Klimov, Xiao Mi, Alexis Morvan, Matthew Neeley, Charles Neill, Rajeev Acharya, Richard Allen, Kyle Anderson, Markus Ansmann, Frank Arute, Kunal Arya, Abraham Asfaw, Juan Atalaya, Joseph Bardin, A. Bilmes, Gina Bortoli, et al. (156 additional authors not shown)

Abstract: Understanding universal aspects of quantum dynamics is an unresolved problem in statistical mechanics. In particular, the spin dynamics of the 1D Heisenberg model were conjectured to belong to the Kardar-Parisi-Zhang (KPZ) universality class based on the scaling of the infinite-temperature spin-spin correlation function. In a chain of 46 superconducting qubits, we study the probability distribution, $P(\mathcal{M})$, of the magnetization transferred across the chain's center. The first two moments of $P(\mathcal{M})$ show superdiffusive behavior, a hallmark of KPZ universality. However, the third and fourth moments rule out the KPZ conjecture and allow for evaluating other theories. Our results highlight the importance of studying higher moments in determining dynamic universality classes and provide key insights into universal behavior in quantum systems.

Submitted 4 April, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

Journal ref: Science 384, 48-53 (2024)

arXiv:2305.13362 [pdf, other]

On quantum backpropagation, information reuse, and cheating measurement collapse

Authors: Amira Abbas, Robbie King, Hsin-Yuan Huang, William J. Huggins, Ramis Movassagh, Dar Gilboa, Jarrod R. McClean

Abstract: The success of modern deep learning hinges on the ability to train neural networks at scale. Through clever reuse of intermediate information, backpropagation facilitates training through gradient computation at a total cost roughly proportional to running the function, rather than incurring an additional factor proportional to the number of parameters - which can now be in the trillions. Naively, one expects that quantum measurement collapse entirely rules out the reuse of quantum information as in backpropagation. But recent developments in shadow tomography, which assumes access to multiple copies of a quantum state, have challenged that notion. Here, we investigate whether parameterized quantum models can train as efficiently as classical neural networks. We show that achieving backpropagation scaling is impossible without access to multiple copies of a state. With this added ability, we introduce an algorithm with foundations in shadow tomography that matches backpropagation scaling in quantum resources while reducing classical auxiliary computational costs to open problems in shadow tomography. These results highlight the nuance of reusing quantum information for practical purposes and clarify the unique difficulties in training large quantum models, which could alter the course of quantum machine learning.

Submitted 22 May, 2023; originally announced May 2023.

Comments: 29 pages, 2 figures

Journal ref: Advances in Neural Information Processing Systems 36 (2024)

arXiv:2304.13878 [pdf, other]

doi 10.1126/science.adh9932

Stable Quantum-Correlated Many Body States through Engineered Dissipation

Authors: X. Mi, A. A. Michailidis, S. Shabani, K. C. Miao, P. V. Klimov, J. Lloyd, E. Rosenberg, R. Acharya, I. Aleiner, T. I. Andersen, M. Ansmann, F. Arute, K. Arya, A. Asfaw, J. Atalaya, J. C. Bardin, A. Bengtsson, G. Bortoli, A. Bourassa, J. Bovaird, L. Brill, M. Broughton, B. B. Buckley, D. A. Buell, T. Burger, et al. (142 additional authors not shown)

Abstract: Engineered dissipative reservoirs have the potential to steer many-body quantum systems toward correlated steady states useful for quantum simulation of high-temperature superconductivity or quantum magnetism. Using up to 49 superconducting qubits, we prepared low-energy states of the transverse-field Ising model through coupling to dissipative auxiliary qubits. In one dimension, we observed long-range quantum correlations and a ground-state fidelity of 0.86 for 18 qubits at the critical point. In two dimensions, we found mutual information that extends beyond nearest neighbors. Lastly, by coupling the system to auxiliaries emulating reservoirs with different chemical potentials, we explored transport in the quantum Heisenberg model. Our results establish engineered dissipation as a scalable alternative to unitary evolution for preparing entangled many-body states on noisy quantum processors.

Submitted 5 April, 2024; v1 submitted 26 April, 2023; originally announced April 2023.

Journal ref: Science 383, 1332-1337 (2024)

arXiv:2304.11119 [pdf, other]

Phase transition in Random Circuit Sampling

Authors: A. Morvan, B. Villalonga, X. Mi, S. Mandrà, A. Bengtsson, P. V. Klimov, Z. Chen, S. Hong, C. Erickson, I. K. Drozdov, J. Chau, G. Laun, R. Movassagh, A. Asfaw, L. T. A. N. Brandão, R. Peralta, D. Abanin, R. Acharya, R. Allen, T. I. Andersen, K. Anderson, M. Ansmann, F. Arute, K. Arya, J. Atalaya, et al. (160 additional authors not shown)

Abstract: Undesired coupling to the surrounding environment destroys long-range correlations on quantum processors and hinders the coherent evolution in the nominally available computational space. This incoherent noise is an outstanding challenge to fully leverage the computation power of near-term quantum processors. It has been shown that benchmarking Random Circuit Sampling (RCS) with Cross-Entropy Benchmarking (XEB) can provide a reliable estimate of the effective size of the Hilbert space coherently available. The extent to which the presence of noise can trivialize the outputs of a given quantum algorithm, i.e. making it spoofable by a classical computation, is an unanswered question. Here, by implementing an RCS algorithm we demonstrate experimentally that there are two phase transitions observable with XEB, which we explain theoretically with a statistical model. The first is a dynamical transition as a function of the number of cycles and is the continuation of the anti-concentration point in the noiseless case. The second is a quantum phase transition controlled by the error per cycle; to identify it analytically and experimentally, we create a weak link model which allows varying the strength of noise versus coherent evolution. Furthermore, by presenting an RCS experiment with 67 qubits at 32 cycles, we demonstrate that the computational cost of our experiment is beyond the capabilities of existing classical supercomputers, even when accounting for the inevitable presence of noise. Our experimental and theoretical work establishes the existence of transitions to a stable computationally complex phase that is reachable with current quantum processors.

Submitted 21 December, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

arXiv:2303.04792 [pdf, other]

doi 10.1038/s41586-023-06505-7

Measurement-induced entanglement and teleportation on a noisy quantum processor

Authors: Jesse C. Hoke, Matteo Ippoliti, Eliott Rosenberg, Dmitry Abanin, Rajeev Acharya, Trond I. Andersen, Markus Ansmann, Frank Arute, Kunal Arya, Abraham Asfaw, Juan Atalaya, Joseph C. Bardin, Andreas Bengtsson, Gina Bortoli, Alexandre Bourassa, Jenna Bovaird, Leon Brill, Michael Broughton, Bob B. Buckley, David A. Buell, Tim Burger, Brian Burkett, Nicholas Bushnell, Zijun Chen, Ben Chiaro, et al. (138 additional authors not shown)

Abstract: Measurement has a special role in quantum theory: by collapsing the wavefunction it can enable phenomena such as teleportation and thereby alter the "arrow of time" that constrains unitary evolution. When integrated in many-body dynamics, measurements can lead to emergent patterns of quantum information in space-time that go beyond established paradigms for characterizing phases, either in or out of equilibrium. On present-day NISQ processors, the experimental realization of this physics is challenging due to noise, hardware limitations, and the stochastic nature of quantum measurement. Here we address each of these experimental challenges and investigate measurement-induced quantum information phases on up to 70 superconducting qubits. By leveraging the interchangeability of space and time, we use a duality mapping to avoid mid-circuit measurement and access different manifestations of the underlying phases -- from entanglement scaling to measurement-induced teleportation -- in a unified way. We obtain finite-size signatures of a phase transition with a decoding protocol that correlates the experimental measurement record with classical simulation data. The phases display sharply different sensitivity to noise, which we exploit to turn an inherent hardware limitation into a useful diagnostic. Our work demonstrates an approach to realize measurement-induced physics at scales that are at the limits of current NISQ processors.

Submitted 17 October, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

Journal ref: Nature 622, 481-486 (2023)

arXiv:2210.10799 [pdf, other]

doi 10.1038/s41567-023-02240-y

Purification-based quantum error mitigation of pair-correlated electron simulations

Authors: T. E. O'Brien, G. Anselmetti, F. Gkritsis, V. E. Elfving, S. Polla, W. J. Huggins, O. Oumarou, K. Kechedzhi, D. Abanin, R. Acharya, I. Aleiner, R. Allen, T. I. Andersen, K. Anderson, M. Ansmann, F. Arute, K. Arya, A. Asfaw, J. Atalaya, D. Bacon, J. C. Bardin, A. Bengtsson, S. Boixo, G. Bortoli, A. Bourassa, et al. (151 additional authors not shown)

Abstract: An important measure of the development of quantum computing platforms has been the simulation of increasingly complex physical systems. Prior to fault-tolerant quantum computing, robust error mitigation strategies are necessary to continue this growth. Here, we study physical simulation within the seniority-zero electron pairing subspace, which affords both a computational stepping stone to a fully correlated model, and an opportunity to validate recently introduced "purification-based" error-mitigation strategies. We compare the performance of error mitigation based on doubling quantum resources in time (echo verification) or in space (virtual distillation), on up to $20$ qubits of a superconducting qubit quantum processor. We observe a reduction of error by one to two orders of magnitude below less sophisticated techniques (e.g. post-selection); the gain from error mitigation is seen to increase with the system size. Employing these error mitigation strategies enables the implementation of the largest variational algorithm for a correlated chemistry system to date. Extrapolating performance from these results allows us to estimate minimum requirements for a beyond-classical simulation of electronic structure. We find that, despite the impressive gains from purification-based error mitigation, significant hardware improvements will be required for classically intractable variational chemistry simulations.

Submitted 19 October, 2022; originally announced October 2022.

Comments: 10 pages, 13 page supplementary material, 12 figures. Experimental data available at https://doi.org/10.5281/zenodo.7225821

Journal ref: Nat. Phys. (2023)

arXiv:2210.10255 [pdf, other]

Non-Abelian braiding of graph vertices in a superconducting processor

Authors: Trond I. Andersen, Yuri D. Lensky, Kostyantyn Kechedzhi, Ilya Drozdov, Andreas Bengtsson, Sabrina Hong, Alexis Morvan, Xiao Mi, Alex Opremcak, Rajeev Acharya, Richard Allen, Markus Ansmann, Frank Arute, Kunal Arya, Abraham Asfaw, Juan Atalaya, Ryan Babbush, Dave Bacon, Joseph C. Bardin, Gina Bortoli, Alexandre Bourassa, Jenna Bovaird, Leon Brill, Michael Broughton, Bob B. Buckley, et al. (144 additional authors not shown)

Abstract: Indistinguishability of particles is a fundamental principle of quantum mechanics. For all elementary and quasiparticles observed to date - including fermions, bosons, and Abelian anyons - this principle guarantees that the braiding of identical particles leaves the system unchanged. However, in two spatial dimensions, an intriguing possibility exists: braiding of non-Abelian anyons causes rotations in a space of topologically degenerate wavefunctions. Hence, it can change the observables of the system without violating the principle of indistinguishability. Despite the well developed mathematical description of non-Abelian anyons and numerous theoretical proposals, the experimental observation of their exchange statistics has remained elusive for decades. Controllable many-body quantum states generated on quantum processors offer another path for exploring these fundamental phenomena. While efforts on conventional solid-state platforms typically involve Hamiltonian dynamics of quasi-particles, superconducting quantum processors allow for directly manipulating the many-body wavefunction via unitary gates. Building on predictions that stabilizer codes can host projective non-Abelian Ising anyons, we implement a generalized stabilizer code and unitary protocol to create and braid them. This allows us to experimentally verify the fusion rules of the anyons and braid them to realize their statistics. We then study the prospect of employing the anyons for quantum computation and utilize braiding to create an entangled state of anyons encoding three logical qubits. Our work provides new insights about non-Abelian braiding and - through the future inclusion of error correction to achieve topological protection - could open a path toward fault-tolerant quantum computing.

Submitted 31 May, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

arXiv:2207.06431 [pdf, other]

Suppressing quantum errors by scaling a surface code logical qubit

Authors: Rajeev Acharya, Igor Aleiner, Richard Allen, Trond I. Andersen, Markus Ansmann, Frank Arute, Kunal Arya, Abraham Asfaw, Juan Atalaya, Ryan Babbush, Dave Bacon, Joseph C. Bardin, Joao Basso, Andreas Bengtsson, Sergio Boixo, Gina Bortoli, Alexandre Bourassa, Jenna Bovaird, Leon Brill, Michael Broughton, Bob B. Buckley, David A. Buell, Tim Burger, Brian Burkett, Nicholas Bushnell, et al. (132 additional authors not shown)

Abstract: Practical quantum computing will require error rates that are well below what is achievable with physical qubits. Quantum error correction offers a path to algorithmically-relevant error rates by encoding logical qubits within many physical qubits, where increasing the number of physical qubits enhances protection against physical errors. However, introducing more qubits also increases the number of error sources, so the density of errors must be sufficiently low in order for logical performance to improve with increasing code size. Here, we report the measurement of logical qubit performance scaling across multiple code sizes, and demonstrate that our system of superconducting qubits has sufficient performance to overcome the additional errors from increasing qubit number. We find our distance-5 surface code logical qubit modestly outperforms an ensemble of distance-3 logical qubits on average, both in terms of logical error probability over 25 cycles and logical error per cycle ($2.914\%\pm 0.016\%$ compared to $3.028\%\pm 0.023\%$). To investigate damaging, low-probability error sources, we run a distance-25 repetition code and observe a $1.7\times10^{-6}$ logical error per round floor set by a single high-energy event ($1.6\times10^{-7}$ when excluding this event). We are able to accurately model our experiment, and from this model we can extract error budgets that highlight the biggest challenges for future systems. These results mark the first experimental demonstration where quantum error correction begins to improve performance with increasing qubit number, illuminating the path to reaching the logical error rates required for computation.

Submitted 20 July, 2022; v1 submitted 13 July, 2022; originally announced July 2022.

Comments: Main text: 6 pages, 4 figures. v2: Update author list, references, Fig. S12, Table IV

arXiv:2206.05254 [pdf, other]

doi 10.1038/s41586-022-05348-y

Formation of robust bound states of interacting microwave photons

Authors: Alexis Morvan, Trond I. Andersen, Xiao Mi, Charles Neill, Andre Petukhov, Kostyantyn Kechedzhi, Dmitry Abanin, Rajeev Acharya, Frank Arute, Kunal Arya, Abraham Asfaw, Juan Atalaya, Ryan Babbush, Dave Bacon, Joseph C. Bardin, Joao Basso, Andreas Bengtsson, Gina Bortoli, Alexandre Bourassa, Jenna Bovaird, Leon Brill, Michael Broughton, Bob B. Buckley, David A. Buell, Tim Burger, et al. (125 additional authors not shown)

Abstract: Systems of correlated particles appear in many fields of science and represent some of the most intractable puzzles in nature. The computational challenge in these systems arises when interactions become comparable to other energy scales, which makes the state of each particle depend on all other particles. The lack of general solutions for the 3-body problem and acceptable theory for strongly correlated electrons shows that our understanding of correlated systems fades when the particle number or the interaction strength increases. One of the hallmarks of interacting systems is the formation of multi-particle bound states. In a ring of 24 superconducting qubits, we develop a high fidelity parameterizable fSim gate that we use to implement the periodic quantum circuit of the spin-1/2 XXZ model, an archetypal model of interaction. By placing microwave photons in adjacent qubit sites, we study the propagation of these excitations and observe their bound nature for up to 5 photons. We devise a phase sensitive method for constructing the few-body spectrum of the bound states and extract their pseudo-charge by introducing a synthetic flux. By introducing interactions between the ring and additional qubits, we observe an unexpected resilience of the bound states to integrability breaking. This finding goes against the common wisdom that bound states in non-integrable systems are unstable when their energies overlap with the continuum spectrum. Our work provides experimental evidence for bound states of interacting photons and discovers their stability beyond the integrability limit.

Submitted 21 December, 2022; v1 submitted 10 June, 2022; originally announced June 2022.

Comments: 7 pages + 15 pages supplements

Journal ref: Nature 612, 240-245 (2022)

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2204.11372 [pdf, other]

doi 10.1126/science.abq5769

Noise-resilient Edge Modes on a Chain of Superconducting Qubits

Authors: Xiao Mi, Michael Sonner, Murphy Yuezhen Niu, Kenneth W. Lee, Brooks Foxen, Rajeev Acharya, Igor Aleiner, Trond I. Andersen, Frank Arute, Kunal Arya, Abraham Asfaw, Juan Atalaya, Ryan Babbush, Dave Bacon, Joseph C. Bardin, Joao Basso, Andreas Bengtsson, Gina Bortoli, Alexandre Bourassa, Leon Brill, Michael Broughton, Bob B. Buckley, David A. Buell, Brian Burkett, Nicholas Bushnell, et al. (103 additional authors not shown)

Abstract: Inherent symmetry of a quantum system may protect its otherwise fragile states. Leveraging such protection requires testing its robustness against uncontrolled environmental interactions. Using 47 superconducting qubits, we implement the one-dimensional kicked Ising model which exhibits non-local Majorana edge modes (MEMs) with $\mathbb{Z}_2$ parity symmetry. Remarkably, we find that any multi-qubit Pauli operator overlapping with the MEMs exhibits a uniform late-time decay rate comparable to single-qubit relaxation rates, irrespective of its size or composition. This characteristic allows us to accurately reconstruct the exponentially localized spatial profiles of the MEMs. Furthermore, the MEMs are found to be resilient against certain symmetry-breaking noise owing to a prethermalization mechanism. Our work elucidates the complex interplay between noise and symmetry-protected edge modes in a solid-state environment.

Submitted 8 December, 2022; v1 submitted 24 April, 2022; originally announced April 2022.

Journal ref: Science 378, 785 (2022)

arXiv:2107.14324 [pdf, other]

Deep Networks Provably Classify Data on Curves

Authors: Tingran Wang, Sam Buchanan, Dar Gilboa, John Wright

Abstract: Data with low-dimensional nonlinear structure are ubiquitous in engineering and scientific problems. We study a model problem with such structure -- a binary classification task that uses a deep fully-connected neural network to classify data drawn from two disjoint smooth curves on the unit sphere. Aside from mild regularity conditions, we place no restrictions on the configuration of the curves. We prove that when (i) the network depth is large relative to certain geometric properties that set the difficulty of the problem and (ii) the network width and number of samples is polynomial in the depth, randomly-initialized gradient descent quickly learns to correctly classify all points on the two curves with high probability. To our knowledge, this is the first generalization guarantee for deep networks with nonlinear data that depends only on intrinsic data properties. Our analysis proceeds by a reduction to dynamics in the neural tangent kernel (NTK) regime, where the network depth plays the role of a fitting resource in solving the classification problem. In particular, via fine-grained control of the decay properties of the NTK, we demonstrate that when the network is sufficiently deep, the NTK can be locally approximated by a translationally invariant operator on the manifolds and stably inverted over smooth functions, which guarantees convergence and generalization.

Submitted 28 October, 2021; v1 submitted 29 July, 2021; originally announced July 2021.

Comments: NeurIPS 2021

arXiv:2106.04741 [pdf, other]

Marginalizable Density Models

Authors: Dar Gilboa, Ari Pakman, Thibault Vatter

Abstract: Probability density models based on deep networks have achieved remarkable success in modeling complex high-dimensional datasets. However, unlike kernel density estimators, modern neural models do not yield marginals or conditionals in closed form, as these quantities require the evaluation of seldom tractable integrals. In this work, we present the Marginalizable Density Model Approximator (MDMA), a novel deep network architecture which provides closed form expressions for the probabilities, marginals and conditionals of any subset of the variables. The MDMA learns deep scalar representations for each individual variable and combines them via learned hierarchical tensor decompositions into a tractable yet expressive CDF, from which marginals and conditional densities are easily obtained. We illustrate the advantage of exact marginalizability in several tasks that are out of reach of previous deep network-based density estimation models, such as estimating mutual information between arbitrary subsets of variables, inferring causality by testing for conditional independence, and inference with missing data without the need for data imputation, outperforming state-of-the-art models on these tasks. The model also allows for parallelized sampling with only a logarithmic dependence of the time complexity on the number of variables.

Submitted 8 June, 2021; originally announced June 2021.

arXiv:2102.00218 [pdf, other]

Estimating the Unique Information of Continuous Variables

Authors: Ari Pakman, Amin Nejatbakhsh, Dar Gilboa, Abdullah Makkeh, Luca Mazzucato, Michael Wibral, Elad Schneidman

Abstract: The integration and transfer of information from multiple sources to multiple targets is a core motive of neural systems. The emerging field of partial information decomposition (PID) provides a novel information-theoretic lens into these mechanisms by identifying synergistic, redundant, and unique contributions to the mutual information between one and several variables. While many works have studied aspects of PID for Gaussian and discrete distributions, the case of general continuous distributions is still uncharted territory. In this work we present a method for estimating the unique information in continuous distributions, for the case of one versus two variables. Our method solves the associated optimization problem over the space of distributions with fixed bivariate marginals by combining copula decompositions and techniques developed to optimize variational autoencoders. We obtain excellent agreement with known analytic results for Gaussians, and illustrate the power of our new approach in several brain-inspired neural models. Our method is capable of recovering the effective connectivity of a chaotic network of rate neurons, and uncovers a complex trade-off between redundancy, synergy and unique information in recurrent networks trained to solve a generalized XOR task.

Submitted 26 October, 2021; v1 submitted 30 January, 2021; originally announced February 2021.

Journal ref: NeurIPS 2021

arXiv:2008.11245 [pdf, other]

Deep Networks and the Multiple Manifold Problem

Authors: Sam Buchanan, Dar Gilboa, John Wright

Abstract: We study the multiple manifold problem, a binary classification task modeled on applications in machine vision, in which a deep fully-connected neural network is trained to separate two low-dimensional submanifolds of the unit sphere. We provide an analysis of the one-dimensional case, proving for a simple manifold configuration that when the network depth $L$ is large relative to certain geometric and statistical properties of the data, the network width $n$ grows as a sufficiently large polynomial in $L$, and the number of i.i.d. samples from the manifolds is polynomial in $L$, randomly-initialized gradient descent rapidly learns to classify the two manifolds perfectly with high probability. Our analysis demonstrates concrete benefits of depth and width in the context of a practically-motivated model problem: the depth acts as a fitting resource, with larger depths corresponding to smoother networks that can more readily separate the class manifolds, and the width acts as a statistical resource, enabling concentration of the randomly-initialized network and its gradients.
The argument centers around the neural tangent kernel and its role in the nonasymptotic analysis of training overparameterized neural networks; to this literature, we contribute essentially optimal rates of concentration for the neural tangent kernel of deep fully-connected networks, requiring width $n \gtrsim L\,\mathrm{poly}(d_0)$ to achieve uniform concentration of the initial kernel over a $d_0$-dimensional submanifold of the unit sphere $\mathbb{S}^{n_0-1}$, and a nonasymptotic framework for establishing generalization of networks trained in the NTK regime with structured data. The proof makes heavy use of martingale concentration to optimally treat statistical dependencies across layers of the initial random network. This approach should be of use in establishing similar results for other network architectures.

Submitted 6 May, 2021; v1 submitted 25 August, 2020; originally announced August 2020.

Comments: ICLR 2021

arXiv:2007.01038 [pdf, other]

Beyond Signal Propagation: Is Feature Diversity Necessary in Deep Neural Network Initialization?

Authors: Yaniv Blumenfeld, Dar Gilboa, Daniel Soudry

Abstract: Deep neural networks are typically initialized with random weights, with variances chosen to facilitate signal propagation and stable gradients. It is also believed that diversity of features is an important property of these initializations. We construct a deep convolutional network with identical features by initializing almost all the weights to $0$. The architecture also enables perfect signal propagation and stable gradients, and achieves high accuracy on standard benchmarks. This indicates that random, diverse initializations are \textit{not} necessary for training neural networks. An essential element in training this network is a mechanism of symmetry breaking; we study this phenomenon and find that standard GPU operations, which are non-deterministic, can serve as a sufficient source of symmetry breaking to enable training.

Submitted 2 July, 2020; originally announced July 2020.

Comments: ICML 2020

arXiv:1912.05137

Is Feature Diversity Necessary in Neural Network Initialization?

Authors: Yaniv Blumenfeld, Dar Gilboa, Daniel Soudry

Abstract: Standard practice in training neural networks involves initializing the weights in an independent fashion. The results of recent work suggest that feature "diversity" at initialization plays an important role in training the network. However, other initialization schemes with reduced feature diversity have also been shown to be viable. In this work, we conduct a series of experiments aimed at elucidating the importance of feature diversity at initialization. We show that a complete lack of diversity is harmful to training, but its effects can be counteracted by a relatively small addition of noise - even the noise in standard non-deterministic GPU computations is sufficient. Furthermore, we construct a deep convolutional network with identical features at initialization and almost all of the weights initialized at 0 that can be trained to reach accuracy matching its standard-initialized counterpart.

Submitted 3 July, 2020; v1 submitted 11 December, 2019; originally announced December 2019.

Comments: This paper has been substantially modified, updated, and expanded with additional content (arXiv:2007.01038). To avoid confusion, we are withdrawing the old version of this article.

arXiv:1909.11572 [pdf, other]

Wider Networks Learn Better Features

Authors: Dar Gilboa, Guy Gur-Ari

Abstract: Transferability of learned features between tasks can massively reduce the cost of training a neural network on a novel task. We investigate the effect of network width on learned features using activation atlases --- a visualization technique that captures features the entire hidden state responds to, as opposed to individual neurons alone. We find that, while individual neurons do not learn interpretable features in wide networks, groups of neurons do. In addition, the hidden state of a wide network contains more information about the inputs than that of a narrow network trained to the same test accuracy. Inspired by this observation, we show that when fine-tuning the last layer of a network on a new task, performance improves significantly as the width of the network is increased, even though test accuracy on the original task is independent of width.

Submitted 25 September, 2019; originally announced September 2019.

arXiv:1906.00771 [pdf, other]

A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off

Authors: Yaniv Blumenfeld, Dar Gilboa, Daniel Soudry

Abstract: Reducing the precision of weights and activation functions in neural network training, with minimal impact on performance, is essential for the deployment of these models in resource-constrained environments. We apply mean-field techniques to networks with quantized activations in order to evaluate the degree to which quantization degrades signal propagation at initialization. We derive initialization schemes which maximize signal propagation in such networks and suggest why this is helpful for generalization. Building on these results, we obtain a closed form implicit equation for $L_{\max}$, the maximal trainable depth (and hence model capacity), given $N$, the number of quantization levels in the activation function. Solving this equation numerically, we obtain asymptotically: $L_{\max}\propto N^{1.82}$.

Submitted 31 October, 2019; v1 submitted 3 June, 2019; originally announced June 2019.

Comments: NIPS 2019

arXiv:1901.08987 [pdf, other]

Dynamical Isometry and a Mean Field Theory of LSTMs and GRUs

Authors: Dar Gilboa, Bo Chang, Minmin Chen, Greg Yang, Samuel S. Schoenholz, Ed H. Chi, Jeffrey Pennington

Abstract: Training recurrent neural networks (RNNs) on long sequence tasks is plagued with difficulties arising from the exponential explosion or vanishing of signals as they propagate forward or backward through the network. Many techniques have been proposed to ameliorate these issues, including various algorithmic and architectural modifications. Two of the most successful RNN architectures, the LSTM and the GRU, do exhibit modest improvements over vanilla RNN cells, but they still suffer from instabilities when trained on very long sequences. In this work, we develop a mean field theory of signal propagation in LSTMs and GRUs that enables us to calculate the time scales for signal propagation as well as the spectral properties of the state-to-state Jacobians. By optimizing these quantities in terms of the initialization hyperparameters, we derive a novel initialization scheme that eliminates or reduces training instabilities. We demonstrate the efficacy of our initialization scheme on multiple sequence tasks, on which it enables successful training while a standard initialization either fails completely or is orders of magnitude slower. We also observe a beneficial effect on generalization performance using this new initialization.

Submitted 23 May, 2019; v1 submitted 25 January, 2019; originally announced January 2019.

arXiv:1809.10313 [pdf, other]

Efficient Dictionary Learning with Gradient Descent

Authors: Dar Gilboa, Sam Buchanan, John Wright

Abstract: Randomly initialized first-order optimization algorithms are the method of choice for solving many high-dimensional nonconvex problems in machine learning, yet general theoretical guarantees cannot rule out convergence to critical points of poor objective value. For some highly structured nonconvex problems however, the success of gradient descent can be understood by studying the geometry of the objective. We study one such problem -- complete orthogonal dictionary learning, and provide convergence guarantees for randomly initialized gradient descent to the neighborhood of a global optimum. The resulting rates scale as low order polynomials in the dimension even though the objective possesses an exponential number of saddle points. This efficient convergence can be viewed as a consequence of negative curvature normal to the stable manifolds associated with saddle points, and we provide evidence that this feature is shared by other nonconvex problems of importance as well.

Submitted 26 September, 2018; originally announced September 2018.

arXiv:1609.00770 [pdf, other]

Stochastic Bouncy Particle Sampler

Authors: Ari Pakman, Dar Gilboa, David Carlson, Liam Paninski

Abstract: We introduce a novel stochastic version of the non-reversible, rejection-free Bouncy Particle Sampler (BPS), a Markov process whose sample trajectories are piecewise linear. The algorithm is based on simulating first arrival times in a doubly stochastic Poisson process using the thinning method, and allows efficient sampling of Bayesian posteriors in big datasets. We prove that in the BPS no bias is introduced by noisy evaluations of the log-likelihood gradient. On the other hand, we argue that efficiency considerations favor a small, controllable bias in the construction of the thinning proposals, in exchange for faster mixing. We introduce a simple regression-based proposal intensity for the thinning method that controls this trade-off. We illustrate the algorithm in several examples in which it outperforms both unbiased, but slowly mixing stochastic versions of BPS, as well as biased stochastic gradient-based samplers.

Submitted 13 June, 2017; v1 submitted 2 September, 2016; originally announced September 2016.

Comments: ICML Camera ready version

arXiv:1506.08586 [pdf]

Focusing and Scanning through Flexible Multimode Fibers without Access to the Distal End

Authors: Shamir Rosen, Doron Gilboa, Ori Katz, Yaron Silberberg

Abstract: Multimode fibers (MMFs) are attractive ultra-thin replacements for state-of-the-art endoscopes, but the phase randomization in propagation through MMFs poses a major hurdle for imaging and focusing of light. Recently, this challenge has been addressed by pre-measuring the compensation for the fiber's complex input-output modes relations. Unfortunately, the sensitivity of this approach to fiber bending and temperature variations renders it inappropriate for many applications. Here, we demonstrate a truly endoscopic robust method for controlled in-situ focusing and scanning through a flexible uncharacterized MMF, whereby all the instrumentation is situated at the proximal end. We show that in graded-index (GRIN) fibers, light patterns at the proximal end allow retrieving information about the distal light distribution. We utilize these properties and two-photon fluorescence for robust controlled focusing through bended GRIN fibers. Our results carry potential for lensless two-photon micro-endoscopy.

Submitted 29 June, 2015; originally announced June 2015.

arXiv:1505.01033 [pdf, other]

doi 10.1103/PhysRevLett.115.073901

Light with tunable non-Markovian phase imprint

Authors: Robert Fischer, Itamar Vidal, Doron Gilboa, Ricardo R. B. Correia, Ana C. Ribeiro-Teixeira, Sandra D. Prado, Jandir Hickman, Yaron Silberberg

Abstract: We introduce a simple and flexible method to generate spatially non-Markovian light with tunable coherence properties in one and two dimensions. The unusual behavior of this light is demonstrated experimentally by probing the far field and recording its diffraction pattern after a double slit: In both cases we observe instead of a central intensity maximum a line or cross shaped dark region, whose width and profile depend on the non-Markovian coherence properties. Since these properties can be controlled and easily reproduced in experiment, the presented approach lends itself to serve as a testbed to gain a deeper understanding of non-Markovian processes.

Submitted 5 May, 2015; originally announced May 2015.

Journal ref: Phys. Rev. Lett. 115, 073901 (2015)

Showing 1–26 of 26 results for author: Gilboa, D