Showing 1–6 of 6 results for author: Dagan, G

  1. arXiv:2402.01035  [pdf, other]

    cs.CL

    Getting the most out of your tokenizer for pre-training and domain adaptation

    Authors: Gautier Dagan, Gabriel Synnaeve, Baptiste Rozière

    Abstract: Tokenization is an understudied and often neglected component of modern LLMs. Most published works use a single tokenizer for all experiments, often borrowed from another model, without performing ablations or analysis to optimize tokenization. Moreover, the tokenizer is generally kept unchanged when fine-tuning a base model. In this paper, we show that the size, pre-tokenization regular expressio…

    Submitted 7 February, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  2. arXiv:2308.06391  [pdf, other]

    cs.CL cs.RO

    Dynamic Planning with a LLM

    Authors: Gautier Dagan, Frank Keller, Alex Lascarides

    Abstract: While Large Language Models (LLMs) can solve many NLP tasks in zero-shot settings, applications involving embodied agents remain problematic. In particular, complex plans that require multi-step reasoning become difficult and too costly as the context window grows. Planning requires understanding the likely effects of one's actions and identifying whether the current environment satisfies the goal…

    Submitted 11 August, 2023; originally announced August 2023.

  3. arXiv:2301.11845  [pdf, other]

    cs.CL

    Learning the Effects of Physical Actions in a Multi-modal Environment

    Authors: Gautier Dagan, Frank Keller, Alex Lascarides

    Abstract: Large Language Models (LLMs) handle physical commonsense information inadequately. As a result of being trained in a disembodied setting, LLMs often fail to predict an action's outcome in a given environment. However, predicting the effects of an action before it is executed is crucial in planning, where coherent sequences of actions are often needed to achieve a goal. Therefore, we introduce the…

    Submitted 3 February, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

  4. arXiv:2112.02721  [pdf, other]

    cs.CL cs.AI cs.LG

    NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

    Authors: Kaustubh D. Dhole, Varun Gangal, Sebastian Gehrmann, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Abinaya Mahendiran, Simon Mille, Ashish Shrivastava, Samson Tan, Tongshuang Wu, Jascha Sohl-Dickstein, Jinho D. Choi, Eduard Hovy, Ondrej Dusek, Sebastian Ruder, Sajant Anand, Nagender Aneja, Rabin Banjade, Lisa Barthe, Hanna Behnke, Ian Berlot-Attwell, Connor Boyle, Caroline Brun, Marco Antonio Sobrevilla Cabezudo, et al. (101 additional authors not shown)

    Abstract: Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework which supports the creation of both transformations (modifications to the data) and filters (data split…

    Submitted 11 October, 2022; v1 submitted 5 December, 2021; originally announced December 2021.

    Comments: 39 pages, repository at https://github.com/GEM-benchmark/NL-Augmenter

  5. arXiv:2001.03361  [pdf, other]

    cs.CL

    Co-evolution of language and agents in referential games

    Authors: Gautier Dagan, Dieuwke Hupkes, Elia Bruni

    Abstract: Referential games offer a grounded learning environment for neural agents which accounts for the fact that language is functionally used to communicate. However, they do not take into account a second constraint considered to be fundamental for the shape of human language: that it must be learnable by new language learners. Cogswell et al. (2019) introduced cultural transmission within referenti…

    Submitted 30 January, 2021; v1 submitted 10 January, 2020; originally announced January 2020.

    Comments: 12 pages, 9 figures, EACL 2021 long paper

  6. arXiv:1911.03872  [pdf, other]

    cs.LG stat.ML

    Location Attention for Extrapolation to Longer Sequences

    Authors: Yann Dubois, Gautier Dagan, Dieuwke Hupkes, Elia Bruni

    Abstract: Neural networks are surprisingly good at interpolating and perform remarkably well when the training set examples resemble those in the test set. However, they are often unable to extrapolate patterns beyond the seen data, even when the abstractions required for such patterns are simple. In this paper, we first review the notion of extrapolation, why it is important and how one could hope to tackl…

    Submitted 21 April, 2020; v1 submitted 10 November, 2019; originally announced November 2019.

    Comments: 11 pages, 9 figures, Accepted for publication at ACL 2020