
Showing 1–26 of 26 results for author: Dymetman, M

Searching in archive cs.
  1. arXiv:2310.13011

    cs.CL cs.LG

    Compositional preference models for aligning LMs

    Authors: Dongyoung Go, Tomasz Korbak, Germán Kruszewski, Jos Rozen, Marc Dymetman

    Abstract: As language models (LMs) become more capable, it is increasingly important to align them with human preferences. However, the dominant paradigm for training Preference Models (PMs) for that purpose suffers from fundamental limitations, such as lack of transparency and scalability, along with susceptibility to overfitting the preference dataset. We propose Compositional Preference Models (CPMs), a…

    Submitted 14 March, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  2. arXiv:2306.17757

    cs.CL

    Should you marginalize over possible tokenizations?

    Authors: Nadezhda Chirkova, Germán Kruszewski, Jos Rozen, Marc Dymetman

    Abstract: Autoregressive language models (LMs) map token sequences to probabilities. The usual practice for computing the probability of any character string (e.g. English sentences) is to first transform it into a sequence of tokens that is scored by the model. However, there are exponentially many token sequences that represent any given string. To truly compute the probability of a string one should marg…

    Submitted 30 June, 2023; originally announced June 2023.

    Comments: Accepted to ACL 2023

  3. arXiv:2303.05431

    cs.CL cs.AI cs.LG

    disco: a toolkit for Distributional Control of Generative Models

    Authors: Germán Kruszewski, Jos Rozen, Marc Dymetman

    Abstract: Pre-trained language models and other generative models have revolutionized NLP and beyond. However, these models tend to reproduce undesirable biases present in their training data. Also, they may overlook patterns that are important but challenging to capture. To address these limitations, researchers have introduced distributional control techniques. These techniques, not limited to language, a…

    Submitted 8 March, 2023; originally announced March 2023.

  4. arXiv:2302.08215

    cs.CL cs.LG stat.ML

    Aligning Language Models with Preferences through f-divergence Minimization

    Authors: Dongyoung Go, Tomasz Korbak, Germán Kruszewski, Jos Rozen, Nahyeon Ryu, Marc Dymetman

    Abstract: Aligning language models with preferences can be posed as approximating a target distribution representing some desired behavior. Existing approaches differ both in the functional form of the target distribution and the algorithm used to approximate it. For instance, Reinforcement Learning from Human Feedback (RLHF) corresponds to minimizing a reverse KL from an implicit target distribution arisin…

    Submitted 6 June, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

  5. arXiv:2206.00761

    cs.LG cs.CL stat.ML

    On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting

    Authors: Tomasz Korbak, Hady Elsahar, Germán Kruszewski, Marc Dymetman

    Abstract: The availability of large pre-trained models is changing the landscape of Machine Learning research and practice, moving from a training-from-scratch to a fine-tuning paradigm. While in some applications the goal is to "nudge" the pre-trained distribution towards preferred outputs, in others it is to steer it towards a different distribution over the sample space. Two main paradigms have emerged t…

    Submitted 14 November, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

  6. arXiv:2112.05702

    cs.LG cs.CL cs.NE

    Sampling from Discrete Energy-Based Models with Quality/Efficiency Trade-offs

    Authors: Bryan Eikema, Germán Kruszewski, Hady Elsahar, Marc Dymetman

    Abstract: Energy-Based Models (EBMs) allow for extremely flexible specifications of probability distributions. However, they do not provide a mechanism for obtaining exact samples from these distributions. Monte Carlo techniques can aid us in obtaining samples if some proposal distribution that we can easily sample from is available. For instance, rejection sampling can provide exact samples but is often di…

    Submitted 10 December, 2021; originally announced December 2021.

  7. arXiv:2112.00791

    cs.LG cs.CL

    Controlling Conditional Language Models without Catastrophic Forgetting

    Authors: Tomasz Korbak, Hady Elsahar, German Kruszewski, Marc Dymetman

    Abstract: Machine learning is shifting towards general-purpose pretrained generative models, trained in a self-supervised manner on large amounts of data, which can then be applied to solve a large number of tasks. However, due to their generic training methodology, these models often fail to meet some of the downstream requirements (e.g., hallucinations in abstractive summarization or style violations in c…

    Submitted 20 June, 2022; v1 submitted 1 December, 2021; originally announced December 2021.

    Comments: ICML 2022

  8. arXiv:2106.04985

    cs.LG cs.CL cs.NE cs.SE

    Energy-Based Models for Code Generation under Compilability Constraints

    Authors: Tomasz Korbak, Hady Elsahar, Marc Dymetman, Germán Kruszewski

    Abstract: Neural language models can be successfully trained on source code, leading to applications such as code completion. However, their versatile autoregressive self-supervision objective overlooks important global sequence-level features that are present in the data such as syntactic correctness or compilability. In this work, we pose the problem of learning to generate compilable code as constraint s…

    Submitted 9 June, 2021; originally announced June 2021.

    Comments: Accepted for the First Workshop on Natural Language Processing for Programming, ACL 2021

    ACM Class: I.2.2; I.2.7; I.2.6; I.5.1

  9. arXiv:2012.11635

    cs.CL cs.AI cs.LG

    A Distributional Approach to Controlled Text Generation

    Authors: Muhammad Khalifa, Hady Elsahar, Marc Dymetman

    Abstract: We propose a Distributional Approach for addressing Controlled Text Generation from pre-trained Language Models (LMs). This approach permits to specify, in a single formal framework, both "pointwise" and "distributional" constraints over the target LM -- to our knowledge, the first model with such generality -- while minimizing KL divergence from the initial LM distribution. The optimal target dis…

    Submitted 6 May, 2021; v1 submitted 21 December, 2020; originally announced December 2020.

    Comments: ICLR 2021 camera-ready version

  10. arXiv:1912.08517

    cs.LG stat.ML

    Distributional Reinforcement Learning for Energy-Based Sequential Models

    Authors: Tetiana Parshakova, Jean-Marc Andreoli, Marc Dymetman

    Abstract: Global Autoregressive Models (GAMs) are a recent proposal [Parshakova et al., CoNLL 2019] for exploiting global properties of sequences for data-efficient learning of seq2seq models. In the first phase of training, an Energy-Based model (EBM) over sequences is derived. This EBM has high representational power, but is unnormalized and cannot be directly exploited for sampling. To address this issue…

    Submitted 18 December, 2019; originally announced December 2019.

    Comments: OptRL workshop (Optimization Foundations for Reinforcement Learning) at NeurIPS 2019

  11. arXiv:1911.04997

    cs.CL

    Character-based NMT with Transformer

    Authors: Rohit Gupta, Laurent Besacier, Marc Dymetman, Matthias Gallé

    Abstract: Character-based translation has several appealing advantages, but its performance is in general worse than a carefully tuned BPE baseline. In this paper we study the impact of character-based input and output with the Transformer architecture. In particular, our experiments on EN-DE show that character-based Transformer models are more robust than their BPE counterpart, both when translating noisy…

    Submitted 12 November, 2019; originally announced November 2019.

  12. arXiv:1910.14589

    cs.CL

    Machine Translation of Restaurant Reviews: New Corpus for Domain Adaptation and Robustness

    Authors: Alexandre Bérard, Ioan Calapodescu, Marc Dymetman, Claude Roux, Jean-Luc Meunier, Vassilina Nikoulina

    Abstract: We share a French-English parallel corpus of Foursquare restaurant reviews (https://europe.naverlabs.com/research/natural-language-processing/machine-translation-of-restaurant-reviews), and define a new task to encourage research on Neural Machine Translation robustness and domain adaptation, in a real-world scenario where better-quality MT would be greatly beneficial. We discuss the challenges of…

    Submitted 31 October, 2019; originally announced October 2019.

    Comments: WNGT 2019 Paper

  13. arXiv:1909.07063

    cs.LG cs.AI cs.CL stat.ML

    Global Autoregressive Models for Data-Efficient Sequence Learning

    Authors: Tetiana Parshakova, Jean-Marc Andreoli, Marc Dymetman

    Abstract: Standard autoregressive seq2seq models are easily trained by max-likelihood, but tend to show poor results under small-data conditions. We introduce a class of seq2seq models, GAMs (Global Autoregressive Models), which combine an autoregressive component with a log-linear component, allowing the use of global a priori features to compensate for lack of data. We train these models in two s…

    Submitted 19 September, 2019; v1 submitted 16 September, 2019; originally announced September 2019.

    Comments: To appear in CoNLL (The SIGNLL Conference on Computational Natural Language Learning), Hong Kong, Nov. 2019

  14. arXiv:1812.09836

    cs.CL cs.LG

    Moment Matching Training for Neural Machine Translation: A Preliminary Study

    Authors: Cong Duy Vu Hoang, Ioan Calapodescu, Marc Dymetman

    Abstract: In previous works, neural sequence models have been shown to improve significantly if external prior knowledge can be provided, for instance by allowing the model to access the embeddings of explicit features during both training and inference. In this work, we propose a different point of view on how to incorporate prior knowledge in a principled way, using a moment matching framework. In this ap…

    Submitted 27 December, 2018; v1 submitted 24 December, 2018; originally announced December 2018.

    Comments: A preliminary study

  15. arXiv:1811.05826

    cs.CL cs.LG

    Char2char Generation with Reranking for the E2E NLG Challenge

    Authors: Shubham Agarwal, Marc Dymetman, Eric Gaussier

    Abstract: This paper describes our submission to the E2E NLG Challenge. Recently, neural seq2seq approaches have become mainstream in NLG, often resorting to pre- (respectively post-) processing delexicalization (relexicalization) steps at the word-level to handle rare words. By contrast, we train a simple character level seq2seq model, which requires no pre/post-processing (delexicalization, tokenization o…

    Submitted 4 November, 2018; originally announced November 2018.

  16. arXiv:1809.07721

    cs.CL

    Symbolic Priors for RNN-based Semantic Parsing

    Authors: Chunyang Xiao, Marc Dymetman, Claire Gardent

    Abstract: Seq2seq models based on Recurrent Neural Networks (RNNs) have recently received a lot of attention in the domain of Semantic Parsing for Question Answering. While in principle they can be trained directly on pairs (natural language utterances, logical forms), their performance is limited by the amount of available data. To alleviate this problem, we propose to exploit various sources of prior know…

    Submitted 20 September, 2018; originally announced September 2018.

  17. arXiv:1607.02467

    cs.AI cs.CL cs.LG cs.NE

    Log-Linear RNNs: Towards Recurrent Neural Networks with Flexible Prior Knowledge

    Authors: Marc Dymetman, Chunyang Xiao

    Abstract: We introduce LL-RNNs (Log-Linear RNNs), an extension of Recurrent Neural Networks that replaces the softmax output layer by a log-linear output layer, of which the softmax is a special case. This conceptually simple move has two main advantages. First, it allows the learner to combat training data sparsity by allowing it to model words (or more generally, output symbols) as complex combinations of…

    Submitted 16 December, 2016; v1 submitted 8 July, 2016; originally announced July 2016.

    Comments: Updated version of arXiv:1607.02467. Presented at the NIPS-2016 RNN Symposium, Barcelona, December 2016

  18. arXiv:1605.01652

    cs.AI cs.CL

    LSTM-based Mixture-of-Experts for Knowledge-Aware Dialogues

    Authors: Phong Le, Marc Dymetman, Jean-Michel Renders

    Abstract: We introduce an LSTM-based method for dynamically integrating several word-prediction experts to obtain a conditional language model which can be good simultaneously at several subtasks. We illustrate this general approach with an application to dialogue where we integrate a neural chat model, good at conversational aspects, with a neural question-answering model, good at retrieving precise inform…

    Submitted 5 May, 2016; originally announced May 2016.

  19. arXiv:1510.02049

    cs.CL

    Assisting Composition of Email Responses: a Topic Prediction Approach

    Authors: Spandana Gella, Marc Dymetman, Jean Michel Renders, Sriram Venkatapathy

    Abstract: We propose an approach for helping agents compose email replies to customer requests. To enable that, we use LDA to extract latent topics from a collection of email exchanges. We then use these latent topics to label our data, obtaining a so-called "silver standard" topic labelling. We exploit this labelled set to train a classifier to: (i) predict the topic distribution of the entire agent's emai…

    Submitted 7 October, 2015; originally announced October 2015.

    Comments: 8 pages, 5 figures

  20. arXiv:1207.0742

    cs.AI cs.CL cs.LG

    The OS* Algorithm: a Joint Approach to Exact Optimization and Sampling

    Authors: Marc Dymetman, Guillaume Bouchard, Simon Carter

    Abstract: Most current sampling algorithms for high-dimensional distributions are based on MCMC techniques and are approximate in the sense that they are valid only asymptotically. Rejection sampling, on the other hand, produces valid samples, but is unrealistically slow in high-dimension spaces. The OS* algorithm that we propose is a unified approach to exact optimization and sampling, based on incremental…

    Submitted 3 July, 2012; originally announced July 2012.

    Comments: 21 pages

  21. arXiv:cs/9903007

    cs.CL cs.LO

    Some Remarks on the Geometry of Grammar

    Authors: Marc Dymetman

    Abstract: This paper, following (Dymetman:1998), presents an approach to grammar description and processing based on the geometry of cancellation diagrams, a concept which plays a central role in combinatorial group theory (Lyndon-Schuppe:1977). The focus here is on the geometric intuitions and on relating group-theoretical diagrams to the traditional charts associated with context-free grammars and type-…

    Submitted 5 March, 1999; originally announced March 1999.

    Comments: 22 pages, 15 figures

    ACM Class: I.2.7; F.4.2

  22. Group Theory and Grammatical Description

    Authors: Marc Dymetman

    Abstract: This paper presents a model for linguistic description based on group theory. A grammar in this model, or "G-grammar", is a collection of lexical expressions which are products of logical forms, phonological forms, and their inverses. Phrasal descriptions are obtained by forming products of lexical expressions and by cancelling contiguous elements which are inverses of each other. We show applic…

    Submitted 7 May, 1998; originally announced May 1998.

    Comments: 17 pages (Latex, Postscript). A shorter version of this paper will appear in the Coling/ACL 98 Proceedings. See http://www.xrce.xerox.com/people/dymetman/dymetman.html

    Report number: MLTT-033

  23. Charts, Interaction-Free Grammars, and the Compact Representation of Ambiguity

    Authors: Marc Dymetman

    Abstract: Recently researchers working in the LFG framework have proposed algorithms for taking advantage of the implicit context-free components of a unification grammar [Maxwell 96]. This paper clarifies the mathematical foundations of these techniques, provides a uniform framework in which they can be formally studied and eliminates the need for special purpose runtime data-structures recording ambigui…

    Submitted 12 May, 1997; originally announced May 1997.

    Comments: 15 pages (Latex, Postscript), to appear in Proceedings IJCAI-97

    Report number: MLTT-029

  24. A Simple Transformation for Offline-Parsable Grammars and its Termination Properties

    Authors: Marc Dymetman

    Abstract: We present, in easily reproducible terms, a simple transformation for offline-parsable grammars which results in a provably terminating parsing program directly top-down interpretable in Prolog. The transformation consists in two steps: (1) removal of empty-productions, followed by: (2) left-recursion elimination. It is related both to left-corner parsing (where the grammar is compiled, rather t…

    Submitted 14 May, 1996; originally announced May 1996.

    Comments: Latex. 5 pages. Appeared in Coling-94 Proceedings

  25. Extended Dependency Structures and their Formal Interpretation

    Authors: Marc Dymetman, Max Copperman

    Abstract: We describe two "semantically-oriented" dependency-structure formalisms, U-forms and S-forms. U-forms have been previously used in machine translation as interlingual representations, but without being provided with a formal interpretation. S-forms, which we introduce in this paper, are a scoped version of U-forms, and we define a compositional semantics mechanism for them. Two types of semant…

    Submitted 9 May, 1996; v1 submitted 29 April, 1996; originally announced April 1996.

    Comments: uuencoded gz-compressed .tar file created by csh script uufiles. 17 pages. To appear in Proceedings of Coling-96. (Change from original submission: increased portability)

  26. Towards an Automatic Dictation System for Translators: the TransTalk Project

    Authors: Marc Dymetman, Julie Brousseau, George Foster, Pierre Isabelle, Yves Normandin, Pierre Plamondon

    Abstract: Professional translators often dictate their translations orally and have them typed afterwards. The TransTalk project aims at automating the second part of this process. Its originality as a dictation system lies in the fact that both the acoustic signal produced by the translator and the source text under translation are made available to the system. Probable translations of the source text ca…

    Submitted 28 September, 1994; originally announced September 1994.

    Comments: Published in proceedings of the International Conference on Spoken Language Processing (ICSLP) 94. 4 pages, uuencoded compressed latex source with 4 postscript figures
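Entry 2 (Should you marginalize over possible tokenizations?) observes that exponentially many token sequences spell the same string. The Python sketch below illustrates that point on a hypothetical five-token vocabulary with fixed unigram probabilities; a real LM conditions each token on its prefix, and the paper's own estimator is not reproduced here. It enumerates every segmentation of a string and checks that a left-to-right dynamic program over prefixes yields the same summed probability.

```python
import math

# Hypothetical toy vocabulary with fixed unigram probabilities (illustration only).
VOCAB_PROB = {"a": 0.2, "b": 0.2, "ab": 0.3, "ba": 0.1, "aba": 0.2}


def tokenizations(s):
    """Return every way of segmenting s into vocabulary tokens."""
    if not s:
        return [[]]
    results = []
    for end in range(1, len(s) + 1):
        piece = s[:end]
        if piece in VOCAB_PROB:
            for rest in tokenizations(s[end:]):
                results.append([piece] + rest)
    return results


def sequence_prob(tokens):
    """Probability of one token sequence under the toy unigram scores."""
    return math.prod(VOCAB_PROB[t] for t in tokens)


def marginal_prob(s):
    """Sum over all tokenizations with a dynamic program over prefixes.

    alpha[i] holds the total probability of all segmentations of s[:i], so the
    cost stays polynomial even when the number of segmentations is exponential.
    """
    alpha = [0.0] * (len(s) + 1)
    alpha[0] = 1.0
    for i in range(1, len(s) + 1):
        for j in range(i):
            piece = s[j:i]
            if piece in VOCAB_PROB:
                alpha[i] += alpha[j] * VOCAB_PROB[piece]
    return alpha[len(s)]
```

For "aba" the four segmentations (a|b|a, a|ba, ab|a, aba) contribute 0.008, 0.02, 0.06 and 0.2, so scoring only the single token "aba" gives 0.2 where the marginal is 0.288.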
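Entry 4 (Aligning Language Models with Preferences through f-divergence Minimization) frames alignment as approximating a target distribution and notes that RLHF corresponds to a reverse KL. As a hedged illustration only, the sketch below computes the generic discrete f-divergence D_f(p || q) = Σ_x q(x) f(p(x)/q(x)) and shows that the generator f(t) = t log t gives KL(p || q), while f(t) = -log t gives the reverse direction KL(q || p). The two distributions are made up.

```python
import math


def f_divergence(p, q, f):
    """D_f(p || q) = sum_x q(x) * f(p(x) / q(x)) for discrete p and q.

    Assumes q(x) > 0 for every x in the support.
    """
    return sum(qx * f(px / qx) for px, qx in zip(p, q))


def f_kl(t):
    """Generator of the forward KL(p || q), with the convention 0 log 0 = 0."""
    return t * math.log(t) if t > 0 else 0.0


def f_reverse_kl(t):
    """Generator of the reverse KL(q || p); requires p(x) > 0 wherever q(x) > 0."""
    return -math.log(t)


def f_total_variation(t):
    """Generator of total variation distance, 0.5 * sum_x |p(x) - q(x)|."""
    return 0.5 * abs(t - 1.0)


p = [0.5, 0.5]  # made-up "target" distribution
q = [0.9, 0.1]  # made-up "model" distribution
forward = f_divergence(p, q, f_kl)          # KL(p || q) = log(5/3)
reverse = f_divergence(p, q, f_reverse_kl)  # KL(q || p), a different number
```

The asymmetry between the two KL directions is what makes the choice of f matter: the same pair of distributions gives roughly 0.51 in one direction and 0.37 in the other.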
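Entry 6 (Sampling from Discrete Energy-Based Models with Quality/Efficiency Trade-offs) mentions that rejection sampling yields exact samples from an unnormalized target when an easy-to-sample proposal is available. A minimal textbook sketch, assuming a finite support, an unnormalized score P(x), a proposal q, and a constant beta with P(x) <= beta * q(x) everywhere; the toy scores are hypothetical and this is not the paper's sampler.

```python
import random


def rejection_sample(P, q, beta, rng, max_tries=100_000):
    """Draw one exact sample from p(x) proportional to P(x) using proposal q.

    P:    unnormalized target score, P(x) >= 0.
    q:    dict mapping each x of a finite support to its proposal probability.
    beta: envelope constant with P(x) <= beta * q(x) for every x.
    """
    support = list(q)
    weights = [q[x] for x in support]
    for _ in range(max_tries):
        x = rng.choices(support, weights=weights)[0]
        ratio = P(x) / (beta * q[x])
        if ratio > 1.0 + 1e-12:
            raise ValueError(f"envelope violated at {x!r}: P(x) > beta * q(x)")
        if rng.random() < ratio:  # accept with probability P(x) / (beta * q(x))
            return x
    raise RuntimeError("no sample accepted within max_tries")


# Hypothetical scores with Z = 4, so the target is good: 0.75, ok: 0.25, bad: 0.
scores = {"good": 3.0, "ok": 1.0, "bad": 0.0}
proposal = {"good": 1 / 3, "ok": 1 / 3, "bad": 1 / 3}
rng = random.Random(0)
samples = [rejection_sample(scores.get, proposal, 9.0, rng) for _ in range(20_000)]
```

Here the acceptance rate is Z / beta = 4/9; a larger beta keeps the samples exact but rejects more proposals, which is one concrete form of a quality/efficiency tension.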
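Entry 17 (Log-Linear RNNs) states that a log-linear output layer generalizes the softmax. The sketch assumes p(y | h) ∝ exp(θ(h) · φ(y)), where θ(h) stands in for a context-dependent weight vector computed by the RNN and φ(y) is a feature vector per output symbol; this is a simplified reading, not the paper's exact parameterization. With one-hot features the layer reduces to an ordinary softmax over θ.

```python
import math


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def log_linear(theta, phi):
    """p(y) proportional to exp(theta . phi[y]) over output symbols y.

    theta: context-dependent weight vector (assumed here to come from the RNN state).
    phi:   list holding one feature vector per output symbol.
    """
    scores = [sum(t * f for t, f in zip(theta, phi_y)) for phi_y in phi]
    return softmax(scores)


def one_hot_features(vocab_size):
    """phi[y] is the indicator vector of y, the case where log-linear equals softmax."""
    return [[1.0 if k == y else 0.0 for k in range(vocab_size)] for y in range(vocab_size)]
```

With non-one-hot features, symbols that share a feature share its weight; for example, features [[1, 0], [1, 1], [0, 1]] let the middle symbol combine both weights, which is one way to read the abstract's point about modelling output symbols as combinations of features.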
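Entry 22 (Group Theory and Grammatical Description) obtains phrasal descriptions by multiplying lexical expressions and cancelling contiguous elements that are inverses of each other. That cancellation step is free reduction of a word in a free group, sketched below with a stack. Each element is a (symbol, exponent) pair with exponent +1 or -1; the symbols are placeholders and do not reproduce the paper's encoding of logical and phonological forms.

```python
def inverse(word):
    """Inverse of a group word: reverse the order and flip every exponent."""
    return [(sym, -exp) for sym, exp in reversed(word)]


def reduce_word(word):
    """Freely reduce a word by cancelling adjacent pairs x x^-1 and x^-1 x.

    A stack handles cascades: after a b b^-1 cancels to a, a following a^-1
    cancels against the a now on top.
    """
    stack = []
    for sym, exp in word:
        if stack and stack[-1] == (sym, -exp):
            stack.pop()  # contiguous inverses cancel
        else:
            stack.append((sym, exp))
    return stack
```

For example, the product a b b^-1 c reduces to a c, and any word multiplied by its own inverse reduces to the empty word.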
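Entry 24 (A Simple Transformation for Offline-Parsable Grammars and its Termination Properties) removes empty productions and then eliminates left recursion. The paper works with offline-parsable, feature-based grammars; the sketch below shows only the textbook step for immediate left recursion in a plain context-free grammar, in the variant that introduces no empty productions: A → A α | β becomes A → β | β A' and A' → α | α A'. It assumes the input has no empty right-hand sides and no rule A → A, and the primed name for the new nonterminal is a hypothetical convention.

```python
def eliminate_immediate_left_recursion(nt, rhss):
    """Remove immediate left recursion for nonterminal nt without adding empty rules.

    rhss: list of right-hand sides for nt, each a tuple of symbols.
      nt -> nt a1 | ... | nt am | b1 | ... | bn   becomes
      nt  -> b1 | ... | bn | b1 nt' | ... | bn nt'
      nt' -> a1 | ... | am | a1 nt' | ... | am nt'
    """
    alphas = [rhs[1:] for rhs in rhss if rhs and rhs[0] == nt]
    betas = [rhs for rhs in rhss if rhs and rhs[0] != nt]
    if any(len(rhs) == 0 for rhs in rhss) or any(len(a) == 0 for a in alphas):
        raise ValueError("remove empty rules and nt -> nt rules first")
    if not alphas:
        return {nt: list(rhss)}
    fresh = nt + "'"  # hypothetical naming scheme; assumes nt' is not already used
    return {
        nt: betas + [b + (fresh,) for b in betas],
        fresh: alphas + [a + (fresh,) for a in alphas],
    }


grammar = eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)])
# E  -> T | T E'
# E' -> + T | + T E'
```

Both grammars generate T, T + T, T + T + T, and so on, but the transformed one can be read top-down without looping on the leftmost E, which is the property a Prolog-style top-down interpreter needs.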