Showing 1–29 of 29 results for author: Lample, G

  1. arXiv:2401.04088

    cs.LG cs.CL

    Mixtral of Experts

    Authors: Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, et al. (1 additional author not shown)

    Abstract: We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts). For every token, at each layer, a router network selects two experts to process the current state and combine their outputs. Even though each token only sees two experts, the selected e…

    Submitted 8 January, 2024; originally announced January 2024.

    Comments: See more details at https://mistral.ai/news/mixtral-of-experts/

  2. arXiv:2310.06825

    cs.CL cs.AI cs.LG

    Mistral 7B

    Authors: Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed

    Abstract: We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences o…

    Submitted 10 October, 2023; originally announced October 2023.

    Comments: Models and code are available at https://mistral.ai/news/announcing-mistral-7b/

  3. arXiv:2302.13971

    cs.CL

    LLaMA: Open and Efficient Foundation Language Models

    Authors: Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample

    Abstract: We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is co…

    Submitted 27 February, 2023; originally announced February 2023.

  4. arXiv:2302.11223

    cs.LG

    Deep Generative Symbolic Regression with Monte-Carlo-Tree-Search

    Authors: Pierre-Alexandre Kamienny, Guillaume Lample, Sylvain Lamprier, Marco Virgolin

    Abstract: Symbolic regression (SR) is the problem of learning a symbolic expression from numerical data. Recently, deep neural models trained on procedurally-generated synthetic datasets showed competitive performance compared to more classical Genetic Programming (GP) algorithms. Unlike their GP counterparts, these neural approaches are trained to generate expressions from datasets given as context. This a…

    Submitted 10 May, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

  5. arXiv:2210.12283

    cs.AI cs.LG

    Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs

    Authors: Albert Q. Jiang, Sean Welleck, Jin Peng Zhou, Wenda Li, Jiacheng Liu, Mateja Jamnik, Timothée Lacroix, Yuhuai Wu, Guillaume Lample

    Abstract: The formalization of existing mathematical proofs is a notoriously difficult process. Despite decades of research on automation and proof assistants, writing formal proofs remains arduous and only accessible to a few experts. While previous studies to automate formalization focused on powerful search algorithms, no attempts were made to take advantage of available informal proofs. In this work, we…

    Submitted 20 February, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

  6. arXiv:2205.11491

    cs.CL cs.AI

    HyperTree Proof Search for Neural Theorem Proving

    Authors: Guillaume Lample, Marie-Anne Lachaux, Thibaut Lavril, Xavier Martinet, Amaury Hayat, Gabriel Ebner, Aurélien Rodriguez, Timothée Lacroix

    Abstract: We propose an online training procedure for a transformer-based automated theorem prover. Our approach leverages a new search algorithm, HyperTree Proof Search (HTPS), inspired by the recent success of AlphaZero. Our model learns from previous proof searches through online training, allowing it to generalize to domains far from the training distribution. We report detailed ablations of our pipelin…

    Submitted 23 May, 2022; originally announced May 2022.

  7. arXiv:2204.10532

    cs.LG

    End-to-end symbolic regression with transformers

    Authors: Pierre-Alexandre Kamienny, Stéphane d'Ascoli, Guillaume Lample, François Charton

    Abstract: Symbolic regression, the task of predicting the mathematical expression of a function from the observation of its values, is a difficult task which usually involves a two-step procedure: predicting the "skeleton" of the expression up to the choice of numerical constants, then fitting the constants by optimizing a non-convex loss function. The dominant approach is genetic programming, which evolves…

    Submitted 22 April, 2022; originally announced April 2022.

  8. arXiv:2201.04600

    cs.LG

    Deep Symbolic Regression for Recurrent Sequences

    Authors: Stéphane d'Ascoli, Pierre-Alexandre Kamienny, Guillaume Lample, François Charton

    Abstract: Symbolic regression, i.e. predicting a function from the observation of its values, is well-known to be a challenging task. In this paper, we train Transformers to infer the function or recurrence relation underlying sequences of integers or floats, a typical task in human IQ tests which has hardly been tackled in the machine learning literature. We evaluate our integer model on a subset of OEIS s…

    Submitted 28 June, 2022; v1 submitted 12 January, 2022; originally announced January 2022.

  9. arXiv:2110.06773

    cs.SE cs.CL cs.LG

    Leveraging Automated Unit Tests for Unsupervised Code Translation

    Authors: Baptiste Roziere, Jie M. Zhang, Francois Charton, Mark Harman, Gabriel Synnaeve, Guillaume Lample

    Abstract: With little to no parallel data available for programming languages, unsupervised methods are well-suited to source code translation. However, the majority of unsupervised machine translation approaches rely on back-translation, a method developed in the context of natural language translation and one that inherently involves training on noisy inputs. Unfortunately, source code is highly sensitive…

    Submitted 16 February, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

  10. arXiv:2102.07492

    cs.CL

    DOBF: A Deobfuscation Pre-Training Objective for Programming Languages

    Authors: Baptiste Roziere, Marie-Anne Lachaux, Marc Szafraniec, Guillaume Lample

    Abstract: Recent advances in self-supervised learning have dramatically improved the state of the art on a wide variety of tasks. However, research in language model pre-training has mostly focused on natural languages, and it is unclear whether models like BERT and its variants provide the best pre-training when applied to other modalities, such as source code. In this paper, we introduce a new pre-trainin…

    Submitted 27 October, 2021; v1 submitted 15 February, 2021; originally announced February 2021.

  11. arXiv:2009.09758

    cs.LG stat.ML

    Target Conditioning for One-to-Many Generation

    Authors: Marie-Anne Lachaux, Armand Joulin, Guillaume Lample

    Abstract: Neural Machine Translation (NMT) models often lack diversity in their generated translations, even when paired with a search algorithm like beam search. A challenge is that the diversity in translations is caused by the variability in the target language, and cannot be inferred from the source sentence alone. In this paper, we propose to explicitly model this one-to-many mapping by conditioning th…

    Submitted 21 September, 2020; originally announced September 2020.

  12. arXiv:2006.06462

    cs.LG cs.CL

    Learning advanced mathematical computations from examples

    Authors: François Charton, Amaury Hayat, Guillaume Lample

    Abstract: Using transformers over large generated datasets, we train models to learn mathematical properties of differential systems, such as local stability, behavior at infinity and controllability. We achieve near perfect prediction of qualitative characteristics, and good approximations of numerical features of the system. This demonstrates that neural networks can learn to perform complex computations,…

    Submitted 19 March, 2021; v1 submitted 11 June, 2020; originally announced June 2020.

  13. arXiv:2006.03511

    cs.CL cs.PL

    Unsupervised Translation of Programming Languages

    Authors: Marie-Anne Lachaux, Baptiste Roziere, Lowik Chanussot, Guillaume Lample

    Abstract: A transcompiler, also known as source-to-source translator, is a system that converts source code from a high-level programming language (such as C++ or Python) to another. Transcompilers are primarily used for interoperability, and to port codebases written in an obsolete or deprecated language (e.g. COBOL, Python 2) to a modern one. They typically rely on handcrafted rewrite rules, applied to th…

    Submitted 22 September, 2020; v1 submitted 5 June, 2020; originally announced June 2020.

  14. arXiv:1912.01412

    cs.SC cs.LG

    Deep Learning for Symbolic Mathematics

    Authors: Guillaume Lample, François Charton

    Abstract: Neural networks have a reputation for being better at solving statistical or approximate problems than at performing calculations or working with symbolic data. In this paper, we show that they can be surprisingly good at more elaborated tasks in mathematics, such as symbolic integration and solving differential equations. We propose a syntax for representing mathematical problems, and methods for…

    Submitted 2 December, 2019; originally announced December 2019.

  15. arXiv:1907.05242

    cs.CL cs.LG

    Large Memory Layers with Product Keys

    Authors: Guillaume Lample, Alexandre Sablayrolles, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou

    Abstract: This paper introduces a structured memory which can be easily integrated into a neural network. The memory is very large by design and significantly increases the capacity of the architecture, by up to a billion parameters with a negligible computational overhead. Its design and access pattern is based on product keys, which enable fast and exact nearest neighbor search. The ability to increase th…

    Submitted 15 December, 2019; v1 submitted 10 July, 2019; originally announced July 2019.

    Comments: Advances in Neural Information Processing Systems, 2019

  16. arXiv:1907.01470

    cs.LG cs.CL stat.ML

    Augmenting Self-attention with Persistent Memory

    Authors: Sainbayar Sukhbaatar, Edouard Grave, Guillaume Lample, Herve Jegou, Armand Joulin

    Abstract: Transformer networks have led to important progress in language modeling and machine translation. These models include two consecutive modules, a feed-forward layer and a self-attention layer. The latter allows the network to capture long term dependencies and is often regarded as the key ingredient in the success of Transformers. Building upon this intuition, we propose a new model that solely…

    Submitted 2 July, 2019; originally announced July 2019.

  17. arXiv:1902.01382

    cs.CL

    The FLoRes Evaluation Datasets for Low-Resource Machine Translation: Nepali-English and Sinhala-English

    Authors: Francisco Guzmán, Peng-Jen Chen, Myle Ott, Juan Pino, Guillaume Lample, Philipp Koehn, Vishrav Chaudhary, Marc'Aurelio Ranzato

    Abstract: For machine translation, a vast majority of language pairs in the world are considered low-resource because they have little parallel data available. Besides the technical challenges of learning with limited supervision, it is difficult to evaluate methods trained on low-resource language pairs because of the lack of freely and publicly available benchmarks. In this work, we introduce the FLoRes e…

    Submitted 14 September, 2019; v1 submitted 4 February, 2019; originally announced February 2019.

    Comments: EMNLP 2019

  18. arXiv:1901.07291

    cs.CL

    Cross-lingual Language Model Pretraining

    Authors: Guillaume Lample, Alexis Conneau

    Abstract: Recent studies have demonstrated the efficiency of generative pretraining for English natural language understanding. In this work, we extend this approach to multiple languages and show the effectiveness of cross-lingual pretraining. We propose two methods to learn cross-lingual language models (XLMs): one unsupervised that only relies on monolingual data, and one supervised that leverages parall…

    Submitted 22 January, 2019; originally announced January 2019.

  19. arXiv:1811.00552

    cs.CL cs.LG

    Multiple-Attribute Text Style Transfer

    Authors: Sandeep Subramanian, Guillaume Lample, Eric Michael Smith, Ludovic Denoyer, Marc'Aurelio Ranzato, Y-Lan Boureau

    Abstract: The dominant approach to unsupervised "style transfer" in text is based on the idea of learning a latent representation, which is independent of the attributes specifying its "style". In this paper, we show that this condition is not necessary and is not always met in practice, even with domain adversarial training that explicitly aims at learning such disentangled representations. We thus propose…

    Submitted 20 September, 2019; v1 submitted 1 November, 2018; originally announced November 2018.

  20. arXiv:1809.05053

    cs.CL cs.AI cs.LG

    XNLI: Evaluating Cross-lingual Sentence Representations

    Authors: Alexis Conneau, Guillaume Lample, Ruty Rinott, Adina Williams, Samuel R. Bowman, Holger Schwenk, Veselin Stoyanov

    Abstract: State-of-the-art natural language processing systems rely on supervision in the form of annotated data to learn competent models. These models are generally trained on data in a single language (usually English), and cannot be directly used beyond that language. Since collecting data in every language is not realistic, there has been a growing interest in cross-lingual language understanding (XLU)…

    Submitted 13 September, 2018; originally announced September 2018.

    Comments: EMNLP 2018

  21. arXiv:1805.01070

    cs.CL

    What you can cram into a single vector: Probing sentence embeddings for linguistic properties

    Authors: Alexis Conneau, German Kruszewski, Guillaume Lample, Loïc Barrault, Marco Baroni

    Abstract: Although much effort has recently been devoted to training high-quality sentence embeddings, we still have a poor understanding of what they are capturing. "Downstream" tasks, often based on sentence classification, are commonly used to evaluate the quality of sentence representations. The complexity of the tasks makes it however difficult to infer what kind of information is present in the repres…

    Submitted 8 July, 2018; v1 submitted 2 May, 2018; originally announced May 2018.

    Comments: ACL 2018

  22. arXiv:1804.07755

    cs.CL

    Phrase-Based & Neural Unsupervised Machine Translation

    Authors: Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, Marc'Aurelio Ranzato

    Abstract: Machine translation systems achieve near human-level performance on some languages, yet their effectiveness strongly relies on the availability of large amounts of parallel sentences, which hinders their applicability to the majority of language pairs. This work investigates how to learn to translate when having access to only large monolingual corpora in each language. We propose two model varian…

    Submitted 13 August, 2018; v1 submitted 20 April, 2018; originally announced April 2018.

    Comments: EMNLP 2018

  23. arXiv:1711.00043

    cs.CL cs.AI

    Unsupervised Machine Translation Using Monolingual Corpora Only

    Authors: Guillaume Lample, Alexis Conneau, Ludovic Denoyer, Marc'Aurelio Ranzato

    Abstract: Machine translation has recently achieved impressive performance thanks to recent advances in deep learning and the availability of large-scale parallel corpora. There have been numerous attempts to extend these successes to low-resource language pairs, yet requiring tens of thousands of parallel sentences. In this work, we take this research direction to the extreme and investigate whether it is…

    Submitted 13 April, 2018; v1 submitted 31 October, 2017; originally announced November 2017.

    Comments: ICLR 2018

  24. arXiv:1710.04087

    cs.CL

    Word Translation Without Parallel Data

    Authors: Alexis Conneau, Guillaume Lample, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou

    Abstract: State-of-the-art methods for learning cross-lingual word embeddings have relied on bilingual dictionaries or parallel corpora. Recent studies showed that the need for parallel data supervision can be alleviated with character-level information. While these methods showed encouraging results, they are not on par with their supervised counterparts and are limited to pairs of languages sharing a comm…

    Submitted 30 January, 2018; v1 submitted 11 October, 2017; originally announced October 2017.

    Comments: ICLR 2018

  25. arXiv:1706.00409

    cs.CV

    Fader Networks: Manipulating Images by Sliding Attributes

    Authors: Guillaume Lample, Neil Zeghidour, Nicolas Usunier, Antoine Bordes, Ludovic Denoyer, Marc'Aurelio Ranzato

    Abstract: This paper introduces a new encoder-decoder architecture that is trained to reconstruct images by disentangling the salient information of the image and the values of attributes directly in the latent space. As a result, after training, our model can generate different realistic versions of an input image by varying the attribute values. By using continuous attribute values, we can choose how much…

    Submitted 28 January, 2018; v1 submitted 1 June, 2017; originally announced June 2017.

    Comments: NIPS 2017

  26. arXiv:1609.05521

    cs.AI cs.LG

    Playing FPS Games with Deep Reinforcement Learning

    Authors: Guillaume Lample, Devendra Singh Chaplot

    Abstract: Advances in deep reinforcement learning have allowed autonomous agents to perform well on Atari games, often outperforming humans, using only raw pixels to make their decisions. However, most of these games take place in 2D environments that are fully observable to the agent. In this paper, we present the first architecture to tackle 3D environments in first-person shooter games, that involve part…

    Submitted 29 January, 2018; v1 submitted 18 September, 2016; originally announced September 2016.

    Comments: The authors contributed equally to this work

  27. arXiv:1605.03832

    cs.CL

    Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning

    Authors: Yulia Tsvetkov, Sunayana Sitaram, Manaal Faruqui, Guillaume Lample, Patrick Littell, David Mortensen, Alan W Black, Lori Levin, Chris Dyer

    Abstract: We introduce polyglot language models, recurrent neural network models trained to predict symbol sequences in many different languages using shared representations of symbols and conditioning on typological information about the language to be predicted. We apply these to the problem of modeling phone sequences---a domain in which universal symbol inventories and cross-linguistically shared featur…

    Submitted 12 May, 2016; originally announced May 2016.

    Comments: Proceedings of NAACL 2016; 10 pages

  28. arXiv:1603.01360

    cs.CL

    Neural Architectures for Named Entity Recognition

    Authors: Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, Chris Dyer

    Abstract: State-of-the-art named entity recognition systems rely heavily on hand-crafted features and domain-specific knowledge in order to learn effectively from the small, supervised training corpora that are available. In this paper, we introduce two new neural architectures---one based on bidirectional LSTMs and conditional random fields, and the other that constructs and labels segments using a transit…

    Submitted 7 April, 2016; v1 submitted 4 March, 2016; originally announced March 2016.

    Comments: Proceedings of NAACL 2016

  29. arXiv:1602.01925

    cs.CL

    Massively Multilingual Word Embeddings

    Authors: Waleed Ammar, George Mulcaire, Yulia Tsvetkov, Guillaume Lample, Chris Dyer, Noah A. Smith

    Abstract: We introduce new methods for estimating and evaluating embeddings of words in more than fifty languages in a single shared embedding space. Our estimation methods, multiCluster and multiCCA, use dictionaries and monolingual data; they do not require parallel data. Our new evaluation method, multiQVEC-CCA, is shown to correlate better than previous ones with two downstream tasks (text categorizatio…

    Submitted 21 May, 2016; v1 submitted 4 February, 2016; originally announced February 2016.