-
Policy-guided Monte Carlo on general state spaces: Application to glass-forming mixtures
Authors:
Leonardo Galliano,
Riccardo Rende,
Daniele Coslovich
Abstract:
Policy-guided Monte Carlo is an adaptive method to simulate classical interacting systems. It adjusts the proposal distribution of the Metropolis-Hastings algorithm to maximize the sampling efficiency, using a formalism inspired by reinforcement learning. In this work, we first extend the policy-guided method to deal with a general state space, comprising for instance both discrete and continuous degrees of freedom, and then apply it to a few paradigmatic models of glass-forming mixtures. We assess the efficiency of a set of physically-inspired moves, whose proposal distributions are optimized through on-policy learning. Compared to conventional Monte Carlo methods, the optimized proposals are two orders of magnitude faster for an additive soft sphere mixture, but yield a much more limited speed-up for the well-studied Kob-Andersen model. We discuss the current limitations of the method and suggest possible ways to improve it.
Submitted 3 July, 2024;
originally announced July 2024.
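As an illustration of why the proposal distribution matters in Metropolis-Hastings sampling, here is a minimal sketch on a one-dimensional Gaussian target. It is not the policy-guided method of the paper: the symmetric Gaussian displacement move, the mean squared displacement per step used as an efficiency proxy, and the plain grid search over the step size (in place of on-policy learning) are all assumptions made for this example, as are the names `metropolis`, `log_gaussian`, and `best_step`.

```python
import math
import random


def metropolis(log_target, step, n_steps, x0=0.0, seed=0):
    """Random-walk Metropolis with a Gaussian displacement of width `step`.

    Returns (acceptance rate, mean squared displacement per step).
    """
    rng = random.Random(seed)
    x, logp = x0, log_target(x0)
    accepted, msd = 0, 0.0
    for _ in range(n_steps):
        x_new = x + rng.gauss(0.0, step)
        logp_new = log_target(x_new)
        # The proposal is symmetric, so the Hastings ratio reduces to
        # the ratio of target densities.
        if math.log(1.0 - rng.random()) < logp_new - logp:
            msd += (x_new - x) ** 2
            x, logp = x_new, logp_new
            accepted += 1
    return accepted / n_steps, msd / n_steps


def log_gaussian(x):
    """Log-density of a standard normal, up to an additive constant."""
    return -0.5 * x * x


# Hypothetical stand-in for proposal optimization: scan a few step sizes
# and keep the one with the largest mean squared displacement per step.
results = {step: metropolis(log_gaussian, step, 20000) for step in (0.1, 2.5, 50.0)}
best_step = max(results, key=lambda s: results[s][1])
```

Very small steps are almost always accepted but barely move the chain, while very large steps are almost always rejected; the intermediate step size gives the best displacement per step, which is the kind of trade-off an adaptive proposal is meant to resolve.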
-
Are queries and keys always relevant? A case study on Transformer wave functions
Authors:
Riccardo Rende,
Luciano Loris Viteritti
Abstract:
The dot product attention mechanism, originally designed for natural language processing (NLP) tasks, is a cornerstone of modern Transformers. It adeptly captures semantic relationships between word pairs in sentences by computing a similarity overlap between queries and keys. In this work, we explore the suitability of Transformers, focusing on their attention mechanisms, in the specific domain of the parametrization of variational wave functions to approximate ground states of quantum many-body spin Hamiltonians. Specifically, we perform numerical simulations on the two-dimensional $J_1$-$J_2$ Heisenberg model, a common benchmark in the field of quantum many-body systems on a lattice. By comparing the performance of standard attention mechanisms with a simplified version that excludes queries and keys, relying solely on positions, we achieve competitive results while reducing computational cost and parameter usage. Furthermore, through the analysis of the attention maps generated by standard attention mechanisms, we show that the attention weights become effectively input-independent at the end of the optimization. We support the numerical results with analytical calculations, providing physical insight into why queries and keys should be, in principle, omitted from the attention mechanism when studying large systems. Interestingly, the same arguments can be extended to the NLP domain, in the limit of long input sentences.
Submitted 29 May, 2024;
originally announced May 2024.
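The contrast drawn in this abstract between standard attention and a version that relies solely on positions can be sketched as follows. This is a minimal single-head illustration, not the architecture used in the paper: the row-wise softmax applied to the positional scores, the function names, and the list-of-lists layout are assumptions made for this example.

```python
import math


def softmax(row):
    """Numerically stable softmax of a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(mat, vec):
    """Matrix-vector product for nested lists."""
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


def mix(weights, values):
    """out[i] = sum_j weights[i][j] * values[j]."""
    dim = len(values[0])
    return [[sum(w * v[k] for w, v in zip(w_row, values)) for k in range(dim)]
            for w_row in weights]


def standard_attention(x, w_q, w_k, w_v):
    """Dot-product attention: the weights depend on the input through q and k."""
    q = [matvec(w_q, xi) for xi in x]
    k = [matvec(w_k, xi) for xi in x]
    v = [matvec(w_v, xi) for xi in x]
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    return mix([softmax(r) for r in scores], v)


def positional_attention(x, pos_scores, w_v):
    """Simplified attention: the weights come from a learnable table
    pos_scores[i][j] indexed by positions only (no queries, no keys)."""
    v = [matvec(w_v, xi) for xi in x]
    return mix([softmax(r) for r in pos_scores], v)
```

In the simplified version the attention weights are the same for every input, so the two query and key projection matrices, and the pairwise dot products between them, are no longer needed.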
-
Fine-tuning Neural Network Quantum States
Authors:
Riccardo Rende,
Sebastian Goldt,
Federico Becca,
Luciano Loris Viteritti
Abstract:
Recent progress in the design and optimization of Neural-Network Quantum States (NQS) has made them an effective method to investigate ground-state properties of quantum many-body systems. In contrast to the standard approach of training a separate NQS from scratch at every point of the phase diagram, we demonstrate that the optimization of an NQS at a highly expressive point of the phase diagram (i.e., close to a phase transition) yields interpretable features that can be reused to accurately describe a wide region across the transition. We demonstrate the feasibility of our approach on different systems in one and two dimensions by initially pretraining an NQS at a given point of the phase diagram, followed by fine-tuning only the output layer for all other points. Notably, the computational cost of the fine-tuning step is very low compared to the pretraining stage. We argue that the reduced cost of this paradigm has significant potential to advance the exploration of condensed matter systems using NQS, mirroring the success of fine-tuning in machine learning and natural language processing.
Submitted 18 March, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
-
Transformer Wave Function for the Shastry-Sutherland Model: emergence of a Spin-Liquid Phase
Authors:
Luciano Loris Viteritti,
Riccardo Rende,
Alberto Parola,
Sebastian Goldt,
Federico Becca
Abstract:
Quantum magnetism in two-dimensional systems represents a lively branch of modern condensed-matter physics. In the presence of competing super-exchange couplings, magnetic order is frustrated and can be suppressed down to zero temperature, leading to exotic ground states. The Shastry-Sutherland model, describing $S=1/2$ degrees of freedom interacting in a two-dimensional lattice, provides a simple example of highly-frustrated magnetism, capturing the low-temperature behavior of SrCu$_2$(BO$_3$)$_2$ with its intriguing properties. Here, we investigate this problem by using a Vision Transformer to define an extremely accurate variational wave function. On the technical side, a pivotal step is the use of a deep neural network with real-valued parameters, parametrized with a Transformer, to map physical spin configurations into a high-dimensional feature space. Within this abstract space, the determination of the ground-state properties is simplified, requiring only a single output layer with complex-valued parameters. On the physical side, we supply strong evidence for the stabilization of a spin-liquid phase between the plaquette and antiferromagnetic phases. Our findings underscore the potential of Neural-Network Quantum States as a valuable tool for probing uncharted phases of matter, opening opportunities to establish the properties of many-body systems.
Submitted 12 February, 2024; v1 submitted 28 November, 2023;
originally announced November 2023.
-
A simple linear algebra identity to optimize Large-Scale Neural Network Quantum States
Authors:
Riccardo Rende,
Luciano Loris Viteritti,
Lorenzo Bardone,
Federico Becca,
Sebastian Goldt
Abstract:
Neural network architectures have been increasingly used to represent quantum many-body wave functions. In this context, a large number of parameters is necessary, causing a crisis in traditional optimization methods. The Stochastic Reconfiguration (SR) approach has been widely used in the past for cases with a limited number of parameters $P$, since it involves the inversion of a $P \times P$ matrix and becomes computationally infeasible when $P$ exceeds a few thousand. This is the major limitation in the context of deep learning, where the number of parameters significantly surpasses the number of samples $M$ used for stochastic estimates ($P \gg M$). Here, we show that SR can be applied exactly by leveraging a simple linear algebra identity to reduce the problem to the inversion of a manageable $M \times M$ matrix. We demonstrate the effectiveness of our method by optimizing a Deep Transformer architecture featuring approximately 300,000 parameters without any assumption on the sign structure of the ground state. Our approach achieves state-of-the-art ground-state energy on the $J_1$-$J_2$ Heisenberg model at the challenging point $J_2/J_1=0.5$ on the $10\times10$ square lattice, a demanding benchmark problem in the field of highly-frustrated magnetism. This work marks a significant step forward in the scalability and efficiency of SR for Neural Network Quantum States (NNQS), making them a promising method to investigate unknown phases of matter of quantum systems, where other methods struggle.
Submitted 9 October, 2023;
originally announced October 2023.
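The kind of identity that trades a $P \times P$ inversion for an $M \times M$ one can be written in one line. The sketch below uses notation that is not in the abstract and should be read as an assumption: $O$ is an $M \times P$ matrix (one row per sample, one column per parameter), $\lambda > 0$ is a diagonal shift that makes both matrices invertible, and $\epsilon$ is a vector of length $M$ such that the gradient can be written as $O^\dagger \epsilon$. The paper's own formulation may differ, for instance in how real and imaginary parts are handled.

```latex
% Expand both sides: each equals O^\dagger O O^\dagger + \lambda O^\dagger.
O^\dagger \left( O O^\dagger + \lambda \mathbb{1}_M \right)
  = \left( O^\dagger O + \lambda \mathbb{1}_P \right) O^\dagger
% Multiply by the two inverses (both exist for \lambda > 0):
\quad \Longrightarrow \quad
\left( O^\dagger O + \lambda \mathbb{1}_P \right)^{-1} O^\dagger \epsilon
  = O^\dagger \left( O O^\dagger + \lambda \mathbb{1}_M \right)^{-1} \epsilon
```

The left-hand side requires inverting a $P \times P$ matrix, the right-hand side only an $M \times M$ one, which is the cheaper route when $P \gg M$.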
-
Mapping of attention mechanisms to a generalized Potts model
Authors:
Riccardo Rende,
Federica Gerace,
Alessandro Laio,
Sebastian Goldt
Abstract:
Transformers are neural networks that revolutionized natural language processing and machine learning. They process sequences of inputs, like words, using a mechanism called self-attention, which is trained via masked language modeling (MLM). In MLM, a word is randomly masked in an input sequence, and the network is trained to predict the missing word. Despite the practical success of transformers, it remains unclear what type of data distribution self-attention can learn efficiently. Here, we show analytically that if one decouples the treatment of word positions and embeddings, a single layer of self-attention learns the conditionals of a generalized Potts model with interactions between sites and Potts colors. Moreover, we show that training this neural network is exactly equivalent to solving the inverse Potts problem by the so-called pseudo-likelihood method, well known in statistical physics. Using this mapping, we compute the generalization error of self-attention in a model scenario analytically using the replica method.
Submitted 4 April, 2024; v1 submitted 14 April, 2023;
originally announced April 2023.
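For readers unfamiliar with the pseudo-likelihood method mentioned in this abstract, the following minimal sketch computes the conditional distribution of one site of a Potts model given all the others, and the resulting negative log pseudo-likelihood of a set of samples. The coupling layout `couplings[i][j][a][b]` (interaction between color `a` at site `i` and color `b` at site `j`), the absence of local fields, and the function names are assumptions made for this example; it shows the objective only, not the training procedure analyzed in the paper.

```python
import math


def conditional(couplings, spins, site, n_colors):
    """p(s_site = a | all other spins), proportional to
    exp(sum_{j != site} couplings[site][j][a][spins[j]])."""
    fields = [
        sum(couplings[site][j][a][spins[j]]
            for j in range(len(spins)) if j != site)
        for a in range(n_colors)
    ]
    m = max(fields)  # subtract the maximum for numerical stability
    weights = [math.exp(f - m) for f in fields]
    z = sum(weights)
    return [w / z for w in weights]


def neg_log_pseudo_likelihood(couplings, samples, n_colors):
    """Average over samples of -sum_i log p(s_i | s_{-i}).

    Minimizing this over the couplings is the pseudo-likelihood approach
    to the inverse Potts problem."""
    total = 0.0
    for spins in samples:
        for site in range(len(spins)):
            probs = conditional(couplings, spins, site, n_colors)
            total -= math.log(probs[spins[site]])
    return total / len(samples)
```

Each term of the objective asks the model to predict one site from the rest of the configuration, which is the same task as predicting a masked word from its context.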
-
Transformer variational wave functions for frustrated quantum spin systems
Authors:
Luciano Loris Viteritti,
Riccardo Rende,
Federico Becca
Abstract:
The Transformer architecture has become the state-of-the-art model for natural language processing tasks and, more recently, also for computer vision tasks, thus defining the Vision Transformer (ViT) architecture. The key feature is the ability to describe long-range correlations among the elements of the input sequences, through the so-called self-attention mechanism. Here, we propose an adaptation of the ViT architecture with complex parameters to define a new class of variational neural-network states for quantum many-body systems, the ViT wave function. We apply this idea to the one-dimensional $J_1$-$J_2$ Heisenberg model, demonstrating that a relatively simple parametrization yields excellent results for both gapped and gapless phases. In this case, excellent accuracies are obtained by a relatively shallow architecture, with a single layer of self-attention, thus largely simplifying the original architecture. Still, the optimization of a deeper structure is possible and can be used for more challenging models, most notably highly-frustrated systems in two dimensions. The success of the ViT wave function relies on mixing both local and global operations, thus enabling the study of large systems with high accuracy.
Submitted 11 June, 2023; v1 submitted 10 November, 2022;
originally announced November 2022.