-
Jamba: A Hybrid Transformer-Mamba Language Model
Authors:
Opher Lieber,
Barak Lenz,
Hofit Bata,
Gal Cohen,
Jhonathan Osin,
Itay Dalmedigos,
Erez Safahi,
Shaked Meirom,
Yonatan Belinkov,
Shai Shalev-Shwartz,
Omri Abend,
Raz Alon,
Tomer Asida,
Amir Bergman,
Roman Glozman,
Michael Gokhman,
Avashalom Manevich,
Nir Ratner,
Noam Rozen,
Erez Shwartz,
Mor Zusman,
Yoav Shoham
Abstract:
We present Jamba, a new base large language model based on a novel hybrid Transformer-Mamba mixture-of-experts (MoE) architecture. Specifically, Jamba interleaves blocks of Transformer and Mamba layers, enjoying the benefits of both model families. MoE is added in some of these layers to increase model capacity while keeping active parameter usage manageable. This flexible architecture allows resource- and objective-specific configurations. In the particular configuration we have implemented, we end up with a powerful model that fits in a single 80GB GPU. Built at large scale, Jamba provides high throughput and small memory footprint compared to vanilla Transformers, and at the same time state-of-the-art performance on standard language model benchmarks and long-context evaluations. Remarkably, the model presents strong results for up to 256K tokens context length. We study various architectural decisions, such as how to combine Transformer and Mamba layers, and how to mix experts, and show that some of them are crucial in large scale modeling. We also describe several interesting properties of these architectures which the training and evaluation of Jamba have revealed, and plan to release checkpoints from various ablation runs, to encourage further exploration of this novel architecture. We make the weights of our implementation of Jamba publicly available under a permissive license.
Submitted 3 July, 2024; v1 submitted 28 March, 2024;
originally announced March 2024.
-
On the cuprates' universal waterfall feature: evidence of a momentum-driven crossover
Authors:
Benjamin Bacq-Labreuil,
Chafic Fawaz,
Yuichi Okazaki,
Yukiko Obata,
Hervé Cercellier,
Patrick Lefevre,
François Bertran,
David Santos-Cottin,
Hajime Yamamoto,
Ikuya Yamada,
Masaki Azuma,
Koji Horiba,
Hiroshi Kumigashira,
Matteo d'Astuto,
Silke Biermann,
Benjamin Lenz
Abstract:
We study two related universal anomalies of the spectral function of cuprates, so-called waterfall and high-energy kink features, by a combined cellular dynamical mean-field theory and angle-resolved photoemission study for the oxychloride Na$_x$Ca$_{2-x}$CuO$_2$Cl$_2$ (Na-CCOC). Tracing their origin back to an interplay of spin-polaron and local correlation effects both in undoped and hole-doped (Na-)CCOC, we establish them as a universal crossover between regions differing in the momentum-dependence of the coupling and not necessarily in the related quasiparticles' energies. The proposed scenario extends to doping levels coinciding with the cuprate's superconducting dome and motivates further investigations of the fate of spin-polarons in the superconducting phase.
Submitted 21 December, 2023;
originally announced December 2023.
-
The rich phase diagram of the prototypical iridate Ba$_2$IrO$_4$: Effective low-energy models and metal-insulator transition
Authors:
Francesco Cassol,
Léo Gaspard,
Michele Casula,
Cyril Martins,
Benjamin Lenz
Abstract:
In the quest of new exotic phases of matter due to the interplay of various interactions, iridates hosting a spin-orbit entangled $j_{\mathrm{eff}}=1/2$ ground state have been in the spotlight in recent years. Also in view of parallels with the low-energy physics of high-temperature superconducting cuprates, the validity of a single- or few-band picture in terms of the $j_{\mathrm{eff}}$ states is key. However, in particular for its structurally simple member Ba$_2$IrO$_4$, such a systematic construction and subsequent analysis of minimal low-energy models are still missing. Here we show by means of a combination of different ab initio techniques with dynamical mean-field theory that a three-band model in terms of Ir-$j_{\mathrm{eff}}$ states fully retains the low-energy physics of the system as compared to a full Ir-$5d$ model. Providing a detailed study of the three-band model in terms of spin-orbit coupling, Hund's coupling and Coulomb interactions, we map out a rich phase diagram and identify a region of effective one-band metal-insulator transition relevant to Ba$_2$IrO$_4$. Compared to available angle-resolved photoemission spectra, we find good agreement of salient aspects of the calculated spectral function and identify features which require the inclusion of non-local fluctuations. In a broader context, we envisage the three- and five-band models developed in this study to be relevant for the study of doped Ba$_2$IrO$_4$ and to clarify further the similarities and differences with cuprates.
Submitted 21 December, 2023;
originally announced December 2023.
-
Matrix-product-state-based band-Lanczos solver for quantum cluster approaches
Authors:
Sebastian Paeckel,
Thomas Köhler,
Salvatore R. Manmana,
Benjamin Lenz
Abstract:
We present a matrix-product state (MPS) based band-Lanczos method as solver for quantum cluster methods such as the variational cluster approximation (VCA). While a naïve implementation of MPS as cluster solver would barely improve its range of applicability, we show that our approach makes it possible to treat cluster geometries well beyond the reach of exact diagonalization methods. The key modifications we introduce are a continuous energy truncation combined with a convergence criterion that is more robust against approximation errors introduced by the MPS representation and provides a bound to deviations in the resulting Green's function. The potential of the resulting cluster solver is demonstrated by computing the self-energy functional for the single-band Hubbard model at half filling in the strongly correlated regime, on different cluster geometries. Here, we find that observables can be extrapolated to the thermodynamic limit only when treating large cluster sizes, which we demonstrate using the example of the staggered magnetization. Treating cluster sizes with up to $6\times 6$ sites we obtain excellent agreement with quantum Monte-Carlo results.
Submitted 16 October, 2023;
originally announced October 2023.
-
Parametrization of the Coulomb interaction matrix with point-group symmetry
Authors:
Coraline Letouzé,
Guillaume Radtke,
Benjamin Lenz,
Christian Brouder
Abstract:
Coulomb integrals, i.e., matrix elements of bare or screened Coulomb interaction between one-electron orbitals, are fundamental objects in many approaches developed to tackle the challenging problem of calculating the electronic structure of strongly correlated materials. In this paper, Coulomb integrals are analyzed by considering both the point group symmetry of the site occupied by the atom in the crystal or molecule and the permutation symmetries of the orbitals in the integrals. In particular, the case where one-electron orbitals form the basis of a general (i.e. a real, complex or pseudo-complex) irreducible representation is considered. Explicit formulas are provided to calculate all integrals of the interaction tensor in terms of a minimum set of independent ones. The effect of a symmetry breaking is also investigated by describing Coulomb integrals of a group in terms of those of one of its subgroups. We develop the specific example of O(3) as the larger group which can therefore be used to quantify the deviation of a specific system from the spherical symmetry. Possible applications of the presented framework include the calculation of solid-state and molecular spectroscopies via multiplet techniques, dynamical mean-field theory or the GW approximation.
Submitted 16 June, 2023;
originally announced June 2023.
-
Human or Not? A Gamified Approach to the Turing Test
Authors:
Daniel Jannai,
Amos Meron,
Barak Lenz,
Yoav Levine,
Yoav Shoham
Abstract:
We present "Human or Not?", an online game inspired by the Turing test, that measures the capability of AI chatbots to mimic humans in dialog, and of humans to tell bots from other humans. Over the course of a month, the game was played by over 1.5 million users who engaged in anonymous two-minute chat sessions with either another human or an AI language model which was prompted to behave like humans. The task of the players was to correctly guess whether they spoke to a person or to an AI. This largest scale Turing-style test conducted to date revealed some interesting facts. For example, overall users guessed the identity of their partners correctly in only 68% of the games. In the subset of the games in which users faced an AI bot, users had even lower correct guess rates of 60% (that is, not much higher than chance). This white paper details the development, deployment, and results of this unique experiment. While this experiment calls for many extensions and refinements, these findings already begin to shed light on the inevitable near future which will commingle humans and AI.
Submitted 31 May, 2023;
originally announced May 2023.
-
Paramagnon dispersion and damping in doped Na$_{x}$Ca$_{2-x}$CuO$_2$Cl$_2$
Authors:
Blair W. Lebert,
Benjamin Bacq-Labreuil,
Mark P. M. Dean,
Kari Ruotsalainen,
Alessandro Nicolaou,
Simo Huotari,
Ikuya Yamada,
Hajime Yamamoto,
Masaki Azuma,
Nicholas B. Brookes,
Flora Yakhou,
Hu Miao,
David Santos-Cottin,
Benjamin Lenz,
Silke Biermann,
Matteo d'Astuto
Abstract:
Using Resonant Inelastic X-ray Scattering, we measure the paramagnon dispersion and damping of undoped, antiferromagnetic Ca$_2$CuO$_2$Cl$_2$ as well as doped, superconducting Na$_{x}$Ca$_{2-x}$CuO$_2$Cl$_2$. Our estimation of the spin-exchange parameter and width of the paramagnon peak at the zone boundary $X=(0.5,0)$ confirms that no simple relation can be drawn between these parameters and the critical temperature $T_\mathrm{c}$. Consistently with other cuprate compounds, we show that upon doping there is a slight softening at $(0.25,0)$, but not at the zone boundary $X$. In combination with these measurements we perform calculations of the dynamical spin structure factor of the one-band Hubbard model using cluster dynamical mean-field theory. The calculations are in excellent agreement with the experiment in the undoped case, both in terms of energy position and width. While the increase in width is also captured upon doping, the dynamical spin structure factor shows a sizable softening at $X$, which provides insightful information on the length-scale of the spin fluctuations in doped cuprates.
Submitted 5 July, 2023; v1 submitted 7 March, 2023;
originally announced March 2023.
-
Two-dimensional fluctuations and competing phases in the stripe-like antiferromagnet BaCoS$_2$
Authors:
Haneen Abushammala,
Benjamin Lenz,
Benoit Baptiste,
David Santos-Cottin,
Pierre Toulemonde,
Michele Casula,
Yannick Klein,
Andrea Gauzzi
Abstract:
By means of a combined x-ray diffraction, magnetic susceptibility and specific heat study, we investigate the interplay between orthorhombic distortion and stripe-like antiferromagnetic (AFM) order in the Mott insulator BaCoS$_{2}$ at $T_N=290$ K. The data give evidence of a purely electronic AFM transition with no participation of the lattice. The observation of large thermal fluctuations in the vicinity of $T_N$ and a Schottky anomaly unveils competing ground states within a minute $\sim$1 meV energy range that differ in the orbital and spin configurations of the Co ions. This interpretation suggests that the stripe-like order results from a spontaneous symmetry breaking of the geometrically frustrated pristine tetragonal phase, which offers an ideal playground to study the driving force of multi-orbital Mott transitions without the participation of the lattice.
Submitted 23 June, 2023; v1 submitted 23 February, 2023;
originally announced February 2023.
-
Order from disorder phenomena in BaCoS$_2$
Authors:
Benjamin Lenz,
Michele Fabrizio,
Michele Casula
Abstract:
At $T_N\simeq 305~\text{K}$ the layered insulator BaCoS$_2$ transitions to a columnar antiferromagnet that signals non-negligible magnetic frustration despite the relatively high $T_N$, all the more surprising given its quasi two-dimensional structure. Here, we show by combining ab initio and model calculations that the magnetic transition is an order-from-disorder phenomenon, which not only drives the columnar $C_4\to C_2$ symmetry breaking, but also, and more importantly, the inter-layer coherence responsible for the finite Néel transition temperature. This uncommon ordering mechanism, actively contributed by orbital degrees of freedom, hints at an abundance of low energy excitations below and, especially, above $T_N$, not in disagreement with experimental evidences, and might as well emerge in other layered correlated compounds showing frustrated magnetism at low temperature.
Submitted 23 February, 2023;
originally announced February 2023.
-
Oxygen vacancies at the origin of pinned moments in oxide interfaces: the example of tetragonal CuO/SrTiO$_3$
Authors:
Benjamin Bacq-Labreuil,
Benjamin Lenz,
Silke Biermann
Abstract:
Obtaining an accurate theoretical description of the emergent phenomena in oxide heterostructures is a major challenge. Recently, intriguing paramagnetic spin and pinned orbital moments have been discovered by x-ray magnetic circular dichroïsm measurements at the Cu $L_{2,3}$-edge of a tetragonal CuO/SrTiO$_3$ heterostructure. Using first principles calculations, we propose a scenario that explains both types of moments, based on the formation of oxygen vacancies in the TiO$_2$ interface layer. We show the emergence of a paramagnetic 2D electron gas hosted in the interface CuO layer. It is invisible at the Ti $L_{2,3}$-edge since the valence of the Ti atoms remains unchanged. Strong structural distortions breaking both the local and global fourfold rotation $C_4$ symmetries at the interface lead to the in-plane pinning of the Cu orbital moment close to the vacancy. Our results, and in particular the pinning of the orbital moment, may have implications for other systems, especially monoxide/dioxide interfaces with similar metal-oxygen bond length and weak spin-orbit coupling.
Submitted 26 October, 2022;
originally announced October 2022.
-
The $S=1$ dimer system K$_2$Ni(MoO$_4$)$_2$: a candidate for magnon Bose-Einstein condensation
Authors:
B. Lenz,
B. Koteswararao,
S. Biermann,
P. Khuntia,
M. Baenitz,
S. K. Panda
Abstract:
Dimerized quantum magnets provide a unique possibility to investigate Bose-Einstein condensation of magnetic excitations in crystalline systems at low temperature. Here, we model the low-temperature magnetic properties of the recently synthesized spin $S=1$ dimer system K${}_2$Ni(MoO${}_4$)$_2$ and propose it as a new candidate material for triplon and quintuplon condensation. Based on a first principles analysis of its electronic structure, we derive an effective spin-dimer model that we first solve within a mean-field approximation to refine its parameters in comparison to experiment. Finally, the model is solved by employing a numerically exact quantum Monte Carlo technique which leads to magnetic properties in good agreement with experimental magnetization and thermodynamic results. We discuss the emergent spin model of K${}_2$Ni(MoO${}_4$)$_2$ in view of condensation of magnetic excitations in a broad parameter regime. Finally, we comment on a geometrical peculiarity of the proposed model and discuss how it could host a supersolid phase upon structural distortions.
Submitted 19 August, 2022;
originally announced August 2022.
-
MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning
Authors:
Ehud Karpas,
Omri Abend,
Yonatan Belinkov,
Barak Lenz,
Opher Lieber,
Nir Ratner,
Yoav Shoham,
Hofit Bata,
Yoav Levine,
Kevin Leyton-Brown,
Dor Muhlgay,
Noam Rozen,
Erez Schwartz,
Gal Shachaf,
Shai Shalev-Shwartz,
Amnon Shashua,
Moshe Tenenholtz
Abstract:
Huge language models (LMs) have ushered in a new era for AI, serving as a gateway to natural-language-based knowledge tasks. Although an essential element of modern AI, LMs are also inherently limited in a number of ways. We discuss these limitations and how they can be avoided by adopting a systems approach. Conceptualizing the challenge as one that involves knowledge and reasoning in addition to linguistic processing, we define a flexible architecture with multiple neural models, complemented by discrete knowledge and reasoning modules. We describe this neuro-symbolic architecture, dubbed the Modular Reasoning, Knowledge and Language (MRKL, pronounced "miracle") system, some of the technical challenges in implementing it, and Jurassic-X, AI21 Labs' MRKL system implementation.
Submitted 1 May, 2022;
originally announced May 2022.
-
Standing on the Shoulders of Giant Frozen Language Models
Authors:
Yoav Levine,
Itay Dalmedigos,
Ori Ram,
Yoel Zeldes,
Daniel Jannai,
Dor Muhlgay,
Yoni Osin,
Opher Lieber,
Barak Lenz,
Shai Shalev-Shwartz,
Amnon Shashua,
Kevin Leyton-Brown,
Yoav Shoham
Abstract:
Huge pretrained language models (LMs) have demonstrated surprisingly good zero-shot capabilities on a wide variety of tasks. This gives rise to the appealing vision of a single, versatile model with a wide range of functionalities across disparate applications. However, current leading techniques for leveraging a "frozen" LM -- i.e., leaving its weights untouched -- still often underperform fine-tuning approaches which modify these weights in a task-dependent way. Those, in turn, suffer forgetfulness and compromise versatility, suggesting a tradeoff between performance and versatility. The main message of this paper is that current frozen-model techniques such as prompt tuning are only the tip of the iceberg, and more powerful methods for leveraging frozen LMs can do just as well as fine tuning in challenging domains without sacrificing the underlying model's versatility. To demonstrate this, we introduce three novel methods for leveraging frozen models: input-dependent prompt tuning, frozen readers, and recursive LMs, each of which vastly improves on current frozen-model approaches. Indeed, some of our methods even outperform fine-tuning approaches in domains currently dominated by the latter. The computational cost of each method is higher than that of existing frozen model methods, but still negligible relative to a single pass through a huge frozen LM. Each of these methods constitutes a meaningful contribution in its own right, but by presenting these contributions together we aim to convince the reader of a broader message that goes beyond the details of any given method: that frozen models have untapped potential and that fine-tuning is often unnecessary.
Submitted 21 April, 2022;
originally announced April 2022.
-
Formation of CuO$_2$ sublattices by suppression of interlattice correlations in tetragonal CuO
Authors:
Max Bramberger,
Benjamin Bacq-Labreuil,
Martin Grundner,
Silke Biermann,
Ulrich Schollwöck,
Sebastian Paeckel,
Benjamin Lenz
Abstract:
We investigate the tetragonal phase of the binary transition metal oxide CuO (t-CuO) within the context of cellular dynamical mean-field theory. Due to its strong antiferromagnetic correlations and simple structure, analysing the physics of t-CuO is of high interest as it may pave the way towards a more complete understanding of high temperature superconductivity in hole-doped antiferromagnets. In this work we give a formal justification for the weak coupling assumption that has previously been made for the interconnected sublattices within a single layer of t-CuO by studying the non-local self-energies of the system. We compute momentum-resolved spectral functions using a Matrix Product State (MPS)-based impurity solver directly on the real axis, which does not require any numerically ill-conditioned analytic continuation. The agreement with photoemission spectroscopy indicates that a single band Hubbard model is sufficient to capture the material's low energy physics. We perform calculations on a range of different temperatures, finding two magnetic regimes, for which we identify the driving mechanism behind their respective insulating state. Finally, we show that in the hole-doped regime the sublattice structure of t-CuO has interesting consequences on the symmetry of the superconducting state.
Submitted 31 January, 2023; v1 submitted 15 March, 2022;
originally announced March 2022.
-
Exemplar Guided Active Learning
Authors:
Jason Hartford,
Kevin Leyton-Brown,
Hadas Raviv,
Dan Padnos,
Shahar Lev,
Barak Lenz
Abstract:
We consider the problem of wisely using a limited budget to label a small subset of a large unlabeled dataset. We are motivated by the NLP problem of word sense disambiguation. For any word, we have a set of candidate labels from a knowledge base, but the label set is not necessarily representative of what occurs in the data: there may exist labels in the knowledge base that very rarely occur in the corpus because the sense is rare in modern English; and conversely there may exist true labels that do not exist in our knowledge base. Our aim is to obtain a classifier that performs as well as possible on examples of each "common class" that occurs with frequency above a given threshold in the unlabeled set while annotating as few examples as possible from "rare classes" whose labels occur with less than this frequency. The challenge is that we are not informed which labels are common and which are rare, and the true label distribution may exhibit extreme skew. We describe an active learning approach that (1) explicitly searches for rare classes by leveraging the contextual embedding spaces provided by modern language models, and (2) incorporates a stopping rule that ignores classes once we prove that they occur below our target threshold with high probability. We prove that our algorithm only costs logarithmically more than a hypothetical approach that knows all true label frequencies and show experimentally that incorporating automated search can significantly reduce the number of samples needed to reach target accuracy levels.
Submitted 2 November, 2020;
originally announced November 2020.
-
PMI-Masking: Principled masking of correlated spans
Authors:
Yoav Levine,
Barak Lenz,
Opher Lieber,
Omri Abend,
Kevin Leyton-Brown,
Moshe Tennenholtz,
Yoav Shoham
Abstract:
Masking tokens uniformly at random constitutes a common flaw in the pretraining of Masked Language Models (MLMs) such as BERT. We show that such uniform masking allows an MLM to minimize its training objective by latching onto shallow local signals, leading to pretraining inefficiency and suboptimal downstream performance. To address this flaw, we propose PMI-Masking, a principled masking strategy based on the concept of Pointwise Mutual Information (PMI), which jointly masks a token n-gram if it exhibits high collocation over the corpus. PMI-Masking motivates, unifies, and improves upon prior more heuristic approaches that attempt to address the drawback of random uniform token masking, such as whole-word masking, entity/phrase masking, and random-span masking. Specifically, we show experimentally that PMI-Masking reaches the performance of prior masking approaches in half the training time, and consistently improves performance at the end of training.
Submitted 5 October, 2020;
originally announced October 2020.
-
Pronounced 2/3 magnetization plateau in a frustrated $S$ = 1 isolated spin-triangle compound: Interplay between Heisenberg and biquadratic exchange interactions
Authors:
S. Chattopadhyay,
B. Lenz,
S. Kanungo,
Sushila,
S. K. Panda,
S. Biermann,
W. Schnelle,
K. Manna,
R. Kataria,
M. Uhlarz,
Y. Skourski,
S. A. Zvyagin,
A. Ponomaryov,
T. Herrmannsdörfer,
R. Patra,
J. Wosnitza
Abstract:
We report the synthesis and characterization of a new quantum magnet [2-[Bis(2-hydroxybenzyl)aminomethyl]pyridine]Ni(II)-trimer (BHAP-Ni3) in single-crystalline form. Our combined experimental and theoretical investigations reveal an exotic spin state that stabilizes a robust 2/3 magnetization plateau between 7 and 20 T in an external magnetic field. AC-susceptibility measurements show the absence of any magnetic order/glassy state down to 60 mK. The magnetic ground state is disordered and specific-heat measurements reveal the gapped nature of the spin excitations. Most interestingly, our theoretical modeling suggests that the 2/3 magnetization plateau emerges due to the interplay between antiferromagnetic Heisenberg and biquadratic exchange interactions within nearly isolated spin $S$ = 1 triangles.
Submitted 28 August, 2019;
originally announced August 2019.
-
SenseBERT: Driving Some Sense into BERT
Authors:
Yoav Levine,
Barak Lenz,
Or Dagan,
Ori Ram,
Dan Padnos,
Or Sharir,
Shai Shalev-Shwartz,
Amnon Shashua,
Yoav Shoham
Abstract:
The ability to learn from large unlabeled corpora has allowed neural language models to advance the frontier in natural language understanding. However, existing self-supervision techniques operate at the word form level, which serves as a surrogate for the underlying semantic content. This paper proposes a method to employ weak-supervision directly at the word sense level. Our model, named SenseBERT, is pre-trained to predict not only the masked words but also their WordNet supersenses. Accordingly, we attain a lexical-semantic level language model, without the use of human annotation. SenseBERT achieves significantly improved lexical understanding, as we demonstrate by experimenting on SemEval Word Sense Disambiguation, and by attaining a state of the art result on the Word in Context task.
Submitted 18 May, 2020; v1 submitted 15 August, 2019;
originally announced August 2019.
-
ARPES study of orbital characters, symmetry breakings and pseudogaps in doped and pure Sr2IrO4
Authors:
Alex Louat,
Benjamin Lenz,
Silke Biermann,
Cyril Martins,
François Bertran,
Patrick Le Fèvre,
Julien E. Rault,
Fabrice Bert,
Véronique Brouet
Abstract:
Sr2IrO4 is characterized by a large spin-orbit coupling, which gives rise to bands with strongly entangled spin and orbital characters, called J=1/2 and J=3/2. We use light-polarization dependent ARPES to study directly the orbital character of these bands and fully map out their dispersion. We observe bands in very good agreement with our cluster dynamical mean-field theory calculations. We show that the J=1/2 band, the closest to the Fermi level Ef, is dominated by dxz character along kx and dyz along ky. This is actually in agreement with an isotropic J=1/2 character on average, but this large orbital dependence in k-space was mostly overlooked before. It gives rise to strong modulations of the ARPES intensity that we explain and carefully take into account to compare dispersions in equivalent directions of the Brillouin zone. Although the latter dispersions look different at first, suggesting possible symmetry breakings, they are found essentially similar, once corrected for these intensity variations. In particular, the pseudogap-like features close to the $X$ point appearing in the nearly metallic 15% Rh-doped Sr2IrO4 strongly depend on experimental conditions. We reveal that there is nevertheless an energy scale of 30meV below which spectral weight is suppressed, independent of the experimental conditions, which gives a reliable basis to analyze this behavior. We suggest it is caused by disorder.
Submitted 2 July, 2019;
originally announced July 2019.
-
Magnetization density distribution of Sr$_2$IrO$_4$: Deviation from a local $j_\text{eff}=1/2$ picture
Authors:
Jaehong Jeong,
Benjamin Lenz,
Arsen Gukasov,
Xavier Fabreges,
Andrew Sazonov,
Vladimir Hutanu,
Alex Louat,
Dalila Bounoua,
Cyril Martins,
Silke Biermann,
Véronique Brouet,
Yvan Sidis,
Philippe Bourges
Abstract:
$5d$ iridium oxides are of huge interest due to the potential for new quantum states driven by strong spin-orbit coupling. The strontium iridate Sr$_2$IrO$_4$ is particularly in the spotlight because of the so-called $j_\text{eff}=1/2$ state consisting of a quantum superposition of the three local $t_{2g}$ orbitals with -- in its most simple version -- nearly equal population, which stabilizes an unconventional Mott insulating state. Here, we report an anisotropic and aspherical magnetization density distribution measured by polarized neutron diffraction in a magnetic field up to 5 T at 4 K, which strongly deviates from a local $j_\text{eff}=1/2$ picture even when distortion-induced deviations from the equal weights of the orbital populations are taken into account. Once reconstructed by the maximum entropy method and multipole expansion model refinement, the magnetization density shows cross-shaped positive four lobes along the crystallographic tetragonal axes with a large spatial extent, showing that the $xy$ orbital contribution is dominant. The analogy to the superconducting copper oxide systems might then be weaker than commonly thought.
Submitted 14 August, 2020; v1 submitted 19 April, 2019;
originally announced April 2019.
-
Spectral functions of Sr${}_2$IrO${}_4$: theory versus experiment
Authors:
Benjamin Lenz,
Cyril Martins,
Silke Biermann
Abstract:
The spin-orbit Mott insulator Sr${}_2$IrO${}_4$ has attracted a lot of interest in recent years from theory and experiment due to its close connection to isostructural high-temperature copper oxide superconductors. Despite not being superconducting, its spectral features closely resemble those of the cuprates, including Fermi surface and pseudogap properties. In this article, we review and extend recent work in the theoretical description of the spectral function of pure and electron-doped Sr${}_2$IrO${}_4$ based on a cluster extension of dynamical mean-field theory ("oriented-cluster DMFT") and compare it to available angle-resolved photoemission data. Current theories provide surprisingly good agreement for pure and electron-doped Sr${}_2$IrO${}_4$, both in the paramagnetic and antiferromagnetic phases. Most notably, one obtains simple explanations for the experimentally observed steep feature around the $M$ point and the pseudo-gap-like spectral feature in electron-doped Sr${}_2$IrO${}_4$.
Submitted 21 March, 2019;
originally announced March 2019.
-
Non-local Coulomb correlations in pure and electron-doped ${\mathrm{Sr}}_{2}{\mathrm{IrO}}_{4}$: spectral functions, Fermi surface and pseudogap-like spectral weight distributions from oriented cluster dynamical mean field theory
Authors:
Cyril Martins,
Benjamin Lenz,
Luca Perfetti,
Véronique Brouet,
François Bertran,
Silke Biermann
Abstract:
We address the role of non-local Coulomb correlations and short-range magnetic fluctuations in the high-temperature phase of Sr$_2$IrO$_4$ within state-of-the-art spectroscopic and first-principles theoretical methods. Introducing a novel cluster dynamical mean field scheme, we compute momentum-resolved spectral functions, which we find to be in excellent agreement with angle-resolved photoemission spectra. We show that while short-range antiferromagnetic fluctuations are crucial to account for the electronic properties of the material even in the high-temperature paramagnetic phase, long-range magnetic order is not a necessary ingredient of the insulating state. Upon doping, an exotic metallic state is generated, exhibiting cuprate-like pseudo-gap spectral properties, for which we propose a surprisingly simple theoretical mechanism.
Submitted 26 January, 2018;
originally announced January 2018.
-
Anisotropy crossover in the frustrated Hubbard model on four-chain cylinders
Authors:
G. Ehlers,
B. Lenz,
S. R. Manmana,
R. M. Noack
Abstract:
Motivated by dimensional crossover in layered organic $κ$ salts, we determine the phase diagram of a system of four periodically coupled Hubbard chains with frustration at half filling as a function of the interchain hopping ${t_{\perp}/t}$ and interaction strength ${U/t}$ at a fixed ratio of frustration and interchain hopping ${t'/t_{\perp}=-0.5}$. We cover the range from the one-dimensional limit of uncoupled chains (${t_{\perp}/t=0.0}$) to the isotropic model (${t_{\perp}/t=1.0}$). For strong ${U/t}$, we find an antiferromagnetic insulator; in the weak-to-moderate-interaction regime, the phase diagram features quasi-one-dimensional antiferromagnetic behavior, an incommensurate spin-density wave, and a metallic phase as ${t_{\perp}/t}$ is increased. We characterize the phases through their magnetic ordering, dielectric response, and dominant static correlations. Our analysis is based primarily on a variant of the density-matrix renormalization-group algorithm based on an efficient hybrid-real-momentum-space formulation, in which we can treat relatively large lattices albeit of a limited width. This is complemented by a variational cluster approximation study with a cluster geometry corresponding to the cylindrical lattice allowing us to directly compare the two methods for this geometry. As an outlook, we make contact with work studying dimensional crossover in the full two-dimensional system.
Submitted 11 January, 2018; v1 submitted 12 May, 2017;
originally announced May 2017.
-
Variational cluster approach to superconductivity and magnetism in the Kondo lattice model
Authors:
Benjamin Lenz,
Riccardo Gezzi,
Salvatore R. Manmana
Abstract:
We investigate in detail antiferromagnetic (AF) and superconducting (SC) phases as well as their coexistence in the two-dimensional Kondo lattice model on a square lattice, which is a paradigmatic model for heavy fermion materials. The results presented are mainly obtained using the variational cluster approximation (VCA) and are complemented by analytical findings for the equations of motion of pairing susceptibilities. A particularly interesting aspect is the possibility to have s-wave SC near half filling as reported by Bodensiek et al. [Phys. Rev. Lett. 110, 146406 (2013)]. When doping the system, we identify three regions which correspond to an AF metallic phase with small Fermi surface at weak coupling, an AF metal with a different Fermi surface topology at intermediate coupling, and a paramagnetic metal with a large Fermi surface at strong coupling. The transition between these two AF phases is found to be discontinuous at lower fillings, but turns to a continuous one when approaching half-filling. In the quest for s-wave superconductivity, only solutions are found which possess mean-field character. No true superconducting solutions caused by correlation effects are found in the s-wave channel. In contrast, we clearly identify robust d-wave pairing away from half-filling. However, we show that only by treating antiferromagnetism and superconductivity on equal footing artificial superconducting solutions at half-filling can be avoided. Our VCA findings support scenarios previously identified by variational Monte Carlo approaches and are a starting point for future investigations with VCA and further approaches such as cluster-embedding methods.
Submitted 14 October, 2017; v1 submitted 14 December, 2016;
originally announced December 2016.
-
Mott Quantum Criticality in the Anisotropic 2D Hubbard Model
Authors:
Benjamin Lenz,
Salvatore R. Manmana,
Thomas Pruschke,
Fakher F. Assaad,
Marcin Raczkowski
Abstract:
We present evidence for Mott quantum criticality in an anisotropic two-dimensional system of coupled Hubbard chains at half-filling. In this scenario emerging from variational cluster approximation and cluster dynamical mean-field theory, the interchain hopping $t_{\perp}$ acts as a control parameter driving the second-order critical end point $T_c$ of the metal-insulator transition down to zero at $t_{\perp}^{c}/t\simeq 0.2$. Below $t_{\perp}^{c}$, the volume of the hole and electron Fermi pockets of a compensated metal vanishes continuously at the Mott transition. Above $t_{\perp}^{c}$, the volume reduction of the pockets is cut off by a first-order transition. We discuss the relevance of our findings to a putative quantum critical point in layered organic conductors, whose location remains elusive so far.
Submitted 5 March, 2016; v1 submitted 31 August, 2015;
originally announced August 2015.