-
Jamba: A Hybrid Transformer-Mamba Language Model
Authors:
Opher Lieber,
Barak Lenz,
Hofit Bata,
Gal Cohen,
Jhonathan Osin,
Itay Dalmedigos,
Erez Safahi,
Shaked Meirom,
Yonatan Belinkov,
Shai Shalev-Shwartz,
Omri Abend,
Raz Alon,
Tomer Asida,
Amir Bergman,
Roman Glozman,
Michael Gokhman,
Avashalom Manevich,
Nir Ratner,
Noam Rozen,
Erez Shwartz,
Mor Zusman,
Yoav Shoham
Abstract:
We present Jamba, a new base large language model based on a novel hybrid Transformer-Mamba mixture-of-experts (MoE) architecture. Specifically, Jamba interleaves blocks of Transformer and Mamba layers, enjoying the benefits of both model families. MoE is added in some of these layers to increase model capacity while keeping active parameter usage manageable. This flexible architecture allows resource- and objective-specific configurations. In the particular configuration we have implemented, we end up with a powerful model that fits in a single 80GB GPU. Built at large scale, Jamba provides high throughput and small memory footprint compared to vanilla Transformers, and at the same time state-of-the-art performance on standard language model benchmarks and long-context evaluations. Remarkably, the model presents strong results for up to 256K tokens context length. We study various architectural decisions, such as how to combine Transformer and Mamba layers, and how to mix experts, and show that some of them are crucial in large scale modeling. We also describe several interesting properties of these architectures which the training and evaluation of Jamba have revealed, and plan to release checkpoints from various ablation runs, to encourage further exploration of this novel architecture. We make the weights of our implementation of Jamba publicly available under a permissive license.
Submitted 3 July, 2024; v1 submitted 28 March, 2024;
originally announced March 2024.
-
MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning
Authors:
Ehud Karpas,
Omri Abend,
Yonatan Belinkov,
Barak Lenz,
Opher Lieber,
Nir Ratner,
Yoav Shoham,
Hofit Bata,
Yoav Levine,
Kevin Leyton-Brown,
Dor Muhlgay,
Noam Rozen,
Erez Schwartz,
Gal Shachaf,
Shai Shalev-Shwartz,
Amnon Shashua,
Moshe Tenenholtz
Abstract:
Huge language models (LMs) have ushered in a new era for AI, serving as a gateway to natural-language-based knowledge tasks. Although an essential element of modern AI, LMs are also inherently limited in a number of ways. We discuss these limitations and how they can be avoided by adopting a systems approach. Conceptualizing the challenge as one that involves knowledge and reasoning in addition to linguistic processing, we define a flexible architecture with multiple neural models, complemented by discrete knowledge and reasoning modules. We describe this neuro-symbolic architecture, dubbed the Modular Reasoning, Knowledge and Language (MRKL, pronounced "miracle") system, some of the technical challenges in implementing it, and Jurassic-X, AI21 Labs' MRKL system implementation.
Submitted 1 May, 2022;
originally announced May 2022.
-
Moser Flow: Divergence-based Generative Modeling on Manifolds
Authors:
Noam Rozen,
Aditya Grover,
Maximilian Nickel,
Yaron Lipman
Abstract:
We are interested in learning generative models for complex geometries described via manifolds, such as spheres, tori, and other implicit surfaces. Current extensions of existing (Euclidean) generative models are restricted to specific geometries and typically suffer from high computational costs. We introduce Moser Flow (MF), a new class of generative models within the family of continuous normalizing flows (CNF). MF also produces a CNF via a solution to the change-of-variable formula, however differently from other CNF methods, its model (learned) density is parameterized as the source (prior) density minus the divergence of a neural network (NN). The divergence is a local, linear differential operator, easy to approximate and calculate on manifolds. Therefore, unlike other CNFs, MF does not require invoking or backpropagating through an ODE solver during training. Furthermore, representing the model density explicitly as the divergence of a NN rather than as a solution of an ODE facilitates learning high fidelity densities. Theoretically, we prove that MF constitutes a universal density approximator under suitable assumptions. Empirically, we demonstrate for the first time the use of flow models for sampling from general curved surfaces and achieve significant improvements in density estimation, sample quality, and training complexity over existing CNFs on challenging synthetic geometries and real-world benchmarks from the earth and climate sciences.
Submitted 2 November, 2021; v1 submitted 18 August, 2021;
originally announced August 2021.
-
How reproducible are methods to measure the dynamic viscoelastic properties of poroelastic media?
Authors:
Paolo Bonfiglio,
Francesco Pompoli,
Kirill V. Horoshenkov,
Mahmud Iskandar B Seth A Rahim,
Luc Jaouen,
Julia Rodenas,
Francois-Xavier Becot,
Emmanuel Gourdon,
Dirk Jaeger,
Volker Kursch,
Maurizio Tarello,
Nicolaas Bernardus Roozen,
Christ Glorieux,
Fabrizio Ferrian,
Pierre Leroy,
Francesco Briatico Vangosa,
Nicolas Dauchez,
Felix Foucart,
Lei Lei,
Kevin Carillo,
Olivier Doutres,
Franck Sgard,
Raymond Panneton,
Kevin Verdiere,
Claudio Bertolini
, et al. (8 additional authors not shown)
Abstract:
There is a considerable number of research publications on the acoustical properties of porous media with an elastic frame. A simple search through the Web of Science™ (last accessed 21 March 2018) suggests that there are at least 819 publications which deal with the acoustics of poroelastic media. Most of these studies require accurate knowledge of the elastic properties over a broad frequency range. However, the accuracy of the measurement of the dynamic elastic properties of poroelastic media has been a contentious issue. The novelty of this paper is that it studies the reproducibility of some popular experimental methods which are used routinely to measure the key elastic properties such as the dynamic Young's modulus, loss factor and Poisson ratio of poroelastic media. In this paper, fourteen independent sets of laboratory measurements were performed on specimens of the same porous materials. The results from these measurements suggest that the reproducibility of this type of experimental method is poor. This work can help identify improvements that would harmonize the way the elastic properties of poroelastic media are measured worldwide.
Submitted 23 May, 2018;
originally announced May 2018.