
Showing 1–1 of 1 results for author: Meirom, S

  1. arXiv:2403.19887 [pdf, other]

    cs.CL cs.LG

    Jamba: A Hybrid Transformer-Mamba Language Model

    Authors: Opher Lieber, Barak Lenz, Hofit Bata, Gal Cohen, Jhonathan Osin, Itay Dalmedigos, Erez Safahi, Shaked Meirom, Yonatan Belinkov, Shai Shalev-Shwartz, Omri Abend, Raz Alon, Tomer Asida, Amir Bergman, Roman Glozman, Michael Gokhman, Avashalom Manevich, Nir Ratner, Noam Rozen, Erez Shwartz, Mor Zusman, Yoav Shoham

    Abstract: We present Jamba, a new base large language model based on a novel hybrid Transformer-Mamba mixture-of-experts (MoE) architecture. Specifically, Jamba interleaves blocks of Transformer and Mamba layers, enjoying the benefits of both model families. MoE is added in some of these layers to increase model capacity while keeping active parameter usage manageable. This flexible architecture allows reso…

    Submitted 3 July, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: Webpage: https://www.ai21.com/jamba
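The abstract's central idea is interleaving attention and Mamba layers and replacing some dense MLPs with MoE layers. It can be illustrated with a small layer-plan builder. This is a minimal sketch under stated assumptions: the helper names (`hybrid_schedule`, `mlp_param_counts`), the `attn_every`/`moe_every` periods, and the parameter figures are hypothetical and are not taken from the paper. The sketch only shows why MoE grows total capacity faster than the parameters active per token.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerSpec:
    mixer: str  # "attention" or "mamba" (hypothetical labels)
    mlp: str    # "dense" or "moe"


def hybrid_schedule(num_layers: int, attn_every: int, moe_every: int) -> list:
    """Build a hypothetical per-layer plan for a hybrid Transformer-Mamba stack.

    attn_every: one attention layer per `attn_every` layers; the rest use Mamba.
    moe_every: every `moe_every`-th layer uses an MoE MLP instead of a dense MLP.
    """
    if num_layers <= 0 or attn_every <= 0 or moe_every <= 0:
        raise ValueError("all arguments must be positive")
    plan = []
    for i in range(num_layers):
        mixer = "attention" if i % attn_every == 0 else "mamba"
        mlp = "moe" if i % moe_every == moe_every - 1 else "dense"
        plan.append(LayerSpec(mixer, mlp))
    return plan


def mlp_param_counts(plan, mlp_params: int, num_experts: int, top_k: int):
    """Return (total, active) MLP parameters, assuming each expert is the size
    of one dense MLP and each token is routed to `top_k` experts."""
    total = sum(mlp_params * (num_experts if s.mlp == "moe" else 1) for s in plan)
    active = sum(mlp_params * (top_k if s.mlp == "moe" else 1) for s in plan)
    return total, active


if __name__ == "__main__":
    plan = hybrid_schedule(num_layers=8, attn_every=8, moe_every=2)
    for i, spec in enumerate(plan):
        print(i, spec.mixer, spec.mlp)
    print(mlp_param_counts(plan, mlp_params=10, num_experts=16, top_k=2))
```

With these illustrative settings, half the layers become MoE layers. The total MLP parameter count rises to 680 units, while only 120 are active per token. This is the capacity-versus-active-parameters trade-off the abstract describes, though the real model's ratios and sizes may differ.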