Showing 1–5 of 5 results for author: Emelin, D

  1. arXiv:2212.08120

    cs.CL cs.AI

    Injecting Domain Knowledge in Language Models for Task-Oriented Dialogue Systems

    Authors: Denis Emelin, Daniele Bonadiman, Sawsan Alqahtani, Yi Zhang, Saab Mansour

    Abstract: Pre-trained language models (PLM) have advanced the state-of-the-art across NLP applications, but lack domain-specific knowledge that does not naturally occur in pre-training data. Previous studies augmented PLMs with symbolic knowledge for different downstream NLP tasks. However, knowledge bases (KBs) utilized in these studies are usually large-scale and static, in contrast to small, domain-speci…

    Submitted 15 December, 2022; originally announced December 2022.

    Comments: Published at EMNLP 2022 (main conference)

  2. arXiv:2206.04615

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  3. arXiv:2012.15738

    cs.CL cs.AI

    Moral Stories: Situated Reasoning about Norms, Intents, Actions, and their Consequences

    Authors: Denis Emelin, Ronan Le Bras, Jena D. Hwang, Maxwell Forbes, Yejin Choi

    Abstract: In social settings, much of human behavior is governed by unspoken rules of conduct. For artificial systems to be fully integrated into social environments, adherence to such norms is a central prerequisite. We investigate whether contemporary NLG models can function as behavioral priors for systems deployed in social settings by generating action hypotheses that achieve predefined goals under mor…

    Submitted 31 December, 2020; originally announced December 2020.

    Comments: For the 'Moral Stories' dataset, see https://github.com/demelin/moral_stories

  4. arXiv:2011.01846

    cs.CL

    Detecting Word Sense Disambiguation Biases in Machine Translation for Model-Agnostic Adversarial Attacks

    Authors: Denis Emelin, Ivan Titov, Rico Sennrich

    Abstract: Word sense disambiguation is a well-known source of translation errors in NMT. We posit that some of the incorrect disambiguation choices are due to models' over-reliance on dataset artifacts found in training data, specifically superficial word co-occurrences, rather than a deeper understanding of the source text. We introduce a method for the prediction of disambiguation errors based on statisti…

    Submitted 3 November, 2020; originally announced November 2020.

    Comments: Accepted to EMNLP 2020

  5. arXiv:1906.12284

    cs.CL cs.LG

    Widening the Representation Bottleneck in Neural Machine Translation with Lexical Shortcuts

    Authors: Denis Emelin, Ivan Titov, Rico Sennrich

    Abstract: The transformer is a state-of-the-art neural translation model that uses attention to iteratively refine lexical representations with information drawn from the surrounding context. Lexical features are fed into the first layer and propagated through a deep network of hidden layers. We argue that the need to represent and propagate lexical features in each layer limits the model's capacity for lea…

    Submitted 28 June, 2019; originally announced June 2019.

    Comments: Accepted submission to WMT 2019