Showing 1–3 of 3 results for author: Noune, B

  1. arXiv:2311.16867  [pdf, other]

    cs.CL cs.AI

    The Falcon Series of Open Language Models

    Authors: Ebtesam Almazrouei, Hamza Alobeidli, Abdulaziz Alshamsi, Alessandro Cappelli, Ruxandra Cojocaru, Mérouane Debbah, Étienne Goffinet, Daniel Hesslow, Julien Launay, Quentin Malartic, Daniele Mazzotta, Badreddine Noune, Baptiste Pannier, Guilherme Penedo

    Abstract: We introduce the Falcon series: 7B, 40B, and 180B parameters causal decoder-only models trained on a diverse high-quality corpora predominantly assembled from web data. The largest model, Falcon-180B, has been trained on over 3.5 trillion tokens of text--the largest openly documented pretraining run. Falcon-180B significantly outperforms models such as PaLM or Chinchilla, and improves upon concurr…

    Submitted 29 November, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

  2. arXiv:2206.02915  [pdf, other]

    cs.LG

    8-bit Numerical Formats for Deep Neural Networks

    Authors: Badreddine Noune, Philip Jones, Daniel Justus, Dominic Masters, Carlo Luschi

    Abstract: Given the current trend of increasing size and complexity of machine learning architectures, it has become of critical importance to identify new approaches to improve the computational efficiency of model training. In this context, we address the advantages of floating-point over fixed-point representation, and present an in-depth study on the use of 8-bit floating-point number formats for activa…

    Submitted 6 June, 2022; originally announced June 2022.

  3. arXiv:2012.03837  [pdf, other]

    cs.LG cs.AI cs.NE

    Parallel Training of Deep Networks with Local Updates

    Authors: Michael Laskin, Luke Metz, Seth Nabarro, Mark Saroufim, Badreddine Noune, Carlo Luschi, Jascha Sohl-Dickstein, Pieter Abbeel

    Abstract: Deep learning models trained on large data sets have been widely successful in both vision and language domains. As state-of-the-art deep learning architectures have continued to grow in parameter count so have the compute budgets and times required to train them, increasing the need for compute-efficient methods that parallelize training. Two common approaches to parallelize the training of deep…

    Submitted 15 June, 2021; v1 submitted 7 December, 2020; originally announced December 2020.

    Comments: First two authors - Michael Laskin and Luke Metz - contributed equally. Order was determined by a coin flip.