Showing 1–50 of 82 results for author: Szlam, A

  1. arXiv:2403.10616  [pdf, other]

    cs.LG cs.CL

    DiPaCo: Distributed Path Composition

    Authors: Arthur Douillard, Qixuan Feng, Andrei A. Rusu, Adhiguna Kuncoro, Yani Donchev, Rachita Chhaparia, Ionel Gog, Marc'Aurelio Ranzato, Jiajun Shen, Arthur Szlam

    Abstract: Progress in machine learning (ML) has been fueled by scaling neural network models. This scaling has been enabled by ever more heroic feats of engineering, necessary for accommodating ML approaches that require high bandwidth communication between devices working in parallel. In this work, we propose a co-designed modular architecture and training approach for ML models, dubbed DIstributed PAth CO…

    Submitted 15 March, 2024; originally announced March 2024.

  2. arXiv:2401.09135  [pdf, other]

    cs.LG cs.CL

    Asynchronous Local-SGD Training for Language Modeling

    Authors: Bo Liu, Rachita Chhaparia, Arthur Douillard, Satyen Kale, Andrei A. Rusu, Jiajun Shen, Arthur Szlam, Marc'Aurelio Ranzato

    Abstract: Local stochastic gradient descent (Local-SGD), also referred to as federated averaging, is an approach to distributed optimization where each device performs more than one SGD update per communication. This work presents an empirical study of asynchronous Local-SGD for training language models; that is, each worker updates the global parameters as soon as it has finished its SGD steps. We co…

    Submitted 17 January, 2024; originally announced January 2024.

  3. arXiv:2311.08105  [pdf, other]

    cs.LG cs.CL

    DiLoCo: Distributed Low-Communication Training of Language Models

    Authors: Arthur Douillard, Qixuan Feng, Andrei A. Rusu, Rachita Chhaparia, Yani Donchev, Adhiguna Kuncoro, Marc'Aurelio Ranzato, Arthur Szlam, Jiajun Shen

    Abstract: Large language models (LLM) have become a critical component in many applications of machine learning. However, standard approaches to training LLM require a large number of tightly interconnected accelerators, with devices exchanging gradients and other intermediate states at each optimization step. While it is difficult to build and maintain a single computing cluster hosting many accelerators,…

    Submitted 2 December, 2023; v1 submitted 14 November, 2023; originally announced November 2023.

  4. arXiv:2309.07974  [pdf, other]

    cs.LG cs.AI

    A Data Source for Reasoning Embodied Agents

    Authors: Jack Lanchantin, Sainbayar Sukhbaatar, Gabriel Synnaeve, Yuxuan Sun, Kavya Srinet, Arthur Szlam

    Abstract: Recent progress in using machine learning models for reasoning tasks has been driven by novel model architectures, large-scale pre-training protocols, and dedicated reasoning datasets for fine-tuning. In this work, to further pursue these advances, we introduce a new data generator for machine reasoning that integrates with an embodied agent. The generated data consists of templated text queries a…

    Submitted 14 September, 2023; originally announced September 2023.

  5. arXiv:2305.10783  [pdf, other]

    cs.AI

    Transforming Human-Centered AI Collaboration: Redefining Embodied Agents Capabilities through Interactive Grounded Language Instructions

    Authors: Shrestha Mohanty, Negar Arabzadeh, Julia Kiseleva, Artem Zholus, Milagro Teruel, Ahmed Awadallah, Yuxuan Sun, Kavya Srinet, Arthur Szlam

    Abstract: Human intelligence's adaptability is remarkable, allowing us to adjust to new tasks and multi-modal environments swiftly. This skill is evident from a young age as we acquire new abilities and solve problems by imitating others or following natural language instructions. The research community is actively pursuing the development of interactive "embodied agents" that can engage in natural conversa…

    Submitted 18 May, 2023; originally announced May 2023.

  6. arXiv:2305.00833  [pdf, other]

    cs.LG cs.AI cs.CL

    Learning to Reason and Memorize with Self-Notes

    Authors: Jack Lanchantin, Shubham Toshniwal, Jason Weston, Arthur Szlam, Sainbayar Sukhbaatar

    Abstract: Large language models have been shown to struggle with multi-step reasoning, and do not retain previous reasoning steps for future use. We propose a simple method for solving both of these problems by allowing the model to take Self-Notes. Unlike recent chain-of-thought or scratchpad approaches, the model can deviate from the input context at any time to explicitly think and write down its thought…

    Submitted 31 October, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

  7. arXiv:2304.13835  [pdf, other]

    cs.CL cs.LG

    Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models

    Authors: Jimmy Wei, Kurt Shuster, Arthur Szlam, Jason Weston, Jack Urbanek, Mojtaba Komeili

    Abstract: Current dialogue research primarily studies pairwise (two-party) conversations, and does not address the everyday setting where more than two speakers converse together. In this work, we both collect and evaluate multi-party conversations to study this more general case. We use the LIGHT environment to construct grounded conversations, where each participant has an assigned character to role-play.…

    Submitted 8 June, 2023; v1 submitted 26 April, 2023; originally announced April 2023.

  8. arXiv:2301.05746  [pdf, other]

    cs.CL cs.AI

    Infusing Commonsense World Models with Graph Knowledge

    Authors: Alexander Gurung, Mojtaba Komeili, Arthur Szlam, Jason Weston, Jack Urbanek

    Abstract: While language models have become more capable of producing compelling language, we find there are still gaps in maintaining consistency, especially when describing events in a dynamically changing world. We study the setting of generating narratives in an open world text adventure game, where a graph representation of the underlying game state can be used to train models that consume and output b…

    Submitted 13 January, 2023; originally announced January 2023.

  9. arXiv:2211.06552  [pdf, other]

    cs.CL cs.AI

    Collecting Interactive Multi-modal Datasets for Grounded Language Understanding

    Authors: Shrestha Mohanty, Negar Arabzadeh, Milagro Teruel, Yuxuan Sun, Artem Zholus, Alexey Skrynnik, Mikhail Burtsev, Kavya Srinet, Aleksandr Panov, Arthur Szlam, Marc-Alexandre Côté, Julia Kiseleva

    Abstract: Human intelligence can remarkably adapt quickly to new tasks and environments. Starting from a very young age, humans acquire new skills and learn how to solve new tasks either by imitating the behavior of others or by following provided natural language instructions. To facilitate research which can enable similar capabilities in machines, we made the following contributions (1) formalized the co…

    Submitted 21 March, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

    Journal ref: Interactive Learning for Natural Language Processing NeurIPS 2022 Workshop

  10. arXiv:2210.05663  [pdf, other]

    cs.RO cs.CV

    CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory

    Authors: Nur Muhammad Mahi Shafiullah, Chris Paxton, Lerrel Pinto, Soumith Chintala, Arthur Szlam

    Abstract: We propose CLIP-Fields, an implicit scene model that can be used for a variety of tasks, such as segmentation, instance identification, semantic search over space, and view localization. CLIP-Fields learns a mapping from spatial locations to semantic embedding vectors. Importantly, we show that this mapping can be trained with supervision coming only from web-image and web-text trained models such…

    Submitted 22 May, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

    Comments: Code, video, and interactive demonstrations available at https://mahis.life/clip-fields. Accepted for publication at Robotics: Science and Systems 2023 in Daegu, Korea

  11. arXiv:2208.03188  [pdf, other]

    cs.CL cs.AI

    BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage

    Authors: Kurt Shuster, Jing Xu, Mojtaba Komeili, Da Ju, Eric Michael Smith, Stephen Roller, Megan Ung, Moya Chen, Kushal Arora, Joshua Lane, Morteza Behrooz, William Ngan, Spencer Poff, Naman Goyal, Arthur Szlam, Y-Lan Boureau, Melanie Kambadur, Jason Weston

    Abstract: We present BlenderBot 3, a 175B parameter dialogue model capable of open-domain conversation with access to the internet and a long-term memory, and having been trained on a large number of user defined tasks. We release both the model weights and code, and have also deployed the model on a public web page to interact with organic users. This technical report describes how the model was built (arc…

    Submitted 10 August, 2022; v1 submitted 5 August, 2022; originally announced August 2022.

  12. arXiv:2206.00142  [pdf, other]

    cs.LG cs.AI cs.CL

    IGLU Gridworld: Simple and Fast Environment for Embodied Dialog Agents

    Authors: Artem Zholus, Alexey Skrynnik, Shrestha Mohanty, Zoya Volovikova, Julia Kiseleva, Artur Szlam, Marc-Alexandre Coté, Aleksandr I. Panov

    Abstract: We present the IGLU Gridworld: a reinforcement learning environment for building and evaluating language conditioned embodied agents in a scalable way. The environment features visual agent embodiment, interactive learning through collaboration, language conditioned RL, and combinatorically hard task (3d blocks building) space.

    Submitted 31 May, 2022; originally announced June 2022.

  13. arXiv:2205.13771  [pdf, other]

    cs.CL

    IGLU 2022: Interactive Grounded Language Understanding in a Collaborative Environment at NeurIPS 2022

    Authors: Julia Kiseleva, Alexey Skrynnik, Artem Zholus, Shrestha Mohanty, Negar Arabzadeh, Marc-Alexandre Côté, Mohammad Aliannejadi, Milagro Teruel, Ziming Li, Mikhail Burtsev, Maartje ter Hoeve, Zoya Volovikova, Aleksandr Panov, Yuxuan Sun, Kavya Srinet, Arthur Szlam, Ahmed Awadallah

    Abstract: Human intelligence has the remarkable ability to adapt to new tasks and environments quickly. Starting from a very young age, humans acquire new skills and learn how to solve new tasks either by imitating the behavior of others or by following provided natural language instructions. To facilitate research in this direction, we propose IGLU: Interactive Grounded Language Understanding in a Collabor…

    Submitted 27 May, 2022; originally announced May 2022.

    Comments: arXiv admin note: text overlap with arXiv:2110.06536

  14. arXiv:2205.02388  [pdf, other]

    cs.CL cs.AI

    Interactive Grounded Language Understanding in a Collaborative Environment: IGLU 2021

    Authors: Julia Kiseleva, Ziming Li, Mohammad Aliannejadi, Shrestha Mohanty, Maartje ter Hoeve, Mikhail Burtsev, Alexey Skrynnik, Artem Zholus, Aleksandr Panov, Kavya Srinet, Arthur Szlam, Yuxuan Sun, Marc-Alexandre Côté, Katja Hofmann, Ahmed Awadallah, Linar Abdrazakov, Igor Churin, Putra Manggala, Kata Naszadi, Michiel van der Meer, Taewoon Kim

    Abstract: Human intelligence has the remarkable ability to quickly adapt to new tasks and environments. Starting from a very young age, humans acquire new skills and learn how to solve new tasks either by imitating the behavior of others or by following provided natural language instructions. To facilitate research in this direction, we propose IGLU: Interactive Grounded Language Understanding in a Co…

    Submitted 27 May, 2022; v1 submitted 4 May, 2022; originally announced May 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2110.06536

    Journal ref: Proceedings of Machine Learning Research NeurIPS 2021 Competition and Demonstration Track

  15. arXiv:2204.08687  [pdf, other]

    cs.AI

    Many Episode Learning in a Modular Embodied Agent via End-to-End Interaction

    Authors: Yuxuan Sun, Ethan Carlson, Rebecca Qian, Kavya Srinet, Arthur Szlam

    Abstract: In this work we give a case study of an embodied machine-learning (ML) powered agent that improves itself via interactions with crowd-workers. The agent consists of a set of modules, some of which are learned, and others heuristic. While the agent is not "end-to-end" in the ML sense, end-to-end interaction is a vital part of the agent's learning mechanism. We describe how the design of the agent w…

    Submitted 10 January, 2023; v1 submitted 19 April, 2022; originally announced April 2022.

  16. arXiv:2203.13224  [pdf, other]

    cs.CL cs.AI

    Language Models that Seek for Knowledge: Modular Search & Generation for Dialogue and Prompt Completion

    Authors: Kurt Shuster, Mojtaba Komeili, Leonard Adolphs, Stephen Roller, Arthur Szlam, Jason Weston

    Abstract: Language models (LMs) have recently been shown to generate more factual responses by employing modularity (Zhou et al., 2021) in combination with retrieval (Adolphs et al., 2021). We extend the recent approach of Adolphs et al. (2021) to include internet search as a module. Our SeeKeR (Search engine->Knowledge->Response) method thus applies a single LM to three modular tasks in succession: search,…

    Submitted 29 March, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

  17. arXiv:2203.06215  [pdf, other]

    cs.CV cs.AI

    Can I see an Example? Active Learning the Long Tail of Attributes and Relations

    Authors: Tyler L. Hayes, Maximilian Nickel, Christopher Kanan, Ludovic Denoyer, Arthur Szlam

    Abstract: There has been significant progress in creating machine learning models that identify objects in scenes along with their associated attributes and relationships; however, there is a large gap between the best models and human capabilities. One of the major reasons for this gap is the difficulty in collecting sufficient amounts of annotated relations and attributes for training these systems. While…

    Submitted 7 October, 2022; v1 submitted 11 March, 2022; originally announced March 2022.

    Comments: To appear in the British Machine Vision Conference (BMVC-2022)

  18. arXiv:2112.05843  [pdf, other]

    cs.CL

    Am I Me or You? State-of-the-Art Dialogue Models Cannot Maintain an Identity

    Authors: Kurt Shuster, Jack Urbanek, Arthur Szlam, Jason Weston

    Abstract: State-of-the-art dialogue models still often stumble with regards to factual accuracy and self-contradiction. Anecdotally, they have been observed to fail to maintain character identity throughout discourse; and more specifically, may take on the role of their interlocutor. In this work we formalize and quantify this deficiency, and show experimentally through human evaluations that this is indeed…

    Submitted 10 December, 2021; originally announced December 2021.

  19. arXiv:2111.05204  [pdf, other]

    cs.CL cs.AI cs.LG

    Reason first, then respond: Modular Generation for Knowledge-infused Dialogue

    Authors: Leonard Adolphs, Kurt Shuster, Jack Urbanek, Arthur Szlam, Jason Weston

    Abstract: Large language models can produce fluent dialogue but often hallucinate factual inaccuracies. While retrieval-augmented models help alleviate this issue, they still face a difficult challenge of both reasoning to provide correct knowledge and generating conversation simultaneously. In this work, we propose a modular model, Knowledge to Response (K2R), for incorporating knowledge into conversationa…

    Submitted 9 November, 2021; originally announced November 2021.

  20. arXiv:2110.06536  [pdf, other]

    cs.AI

    NeurIPS 2021 Competition IGLU: Interactive Grounded Language Understanding in a Collaborative Environment

    Authors: Julia Kiseleva, Ziming Li, Mohammad Aliannejadi, Shrestha Mohanty, Maartje ter Hoeve, Mikhail Burtsev, Alexey Skrynnik, Artem Zholus, Aleksandr Panov, Kavya Srinet, Arthur Szlam, Yuxuan Sun, Katja Hofmann, Michel Galley, Ahmed Awadallah

    Abstract: Human intelligence has the remarkable ability to adapt to new tasks and environments quickly. Starting from a very young age, humans acquire new skills and learn how to solve new tasks either by imitating the behavior of others or by following provided natural language instructions. To facilitate research in this direction, we propose IGLU: Interactive Grounded Language Understanding in a Collabor…

    Submitted 14 October, 2021; v1 submitted 13 October, 2021; originally announced October 2021.

  21. arXiv:2107.07567  [pdf, other]

    cs.CL cs.AI

    Beyond Goldfish Memory: Long-Term Open-Domain Conversation

    Authors: Jing Xu, Arthur Szlam, Jason Weston

    Abstract: Despite recent improvements in open-domain dialogue models, state of the art models are trained and evaluated on short conversations with little context. In contrast, the long-term conversation setting has hardly been studied. In this work we collect and release a human-human dataset consisting of multiple chat sessions whereby the speaking partners learn about each other's interests and discuss t…

    Submitted 15 July, 2021; originally announced July 2021.

  22. arXiv:2106.04426  [pdf, other]

    cs.LG cs.CL

    Hash Layers For Large Sparse Models

    Authors: Stephen Roller, Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston

    Abstract: We investigate the training of sparse layers that use different parameters for different inputs based on hashing in large Transformer models. Specifically, we modify the feedforward layer to hash to different sets of weights depending on the current token, over all tokens in the sequence. We show that this procedure either outperforms or is competitive with learning-to-route mixture-of-expert meth…

    Submitted 20 July, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

  23. arXiv:2105.06548  [pdf, other]

    cs.LG cs.AI

    Not All Memories are Created Equal: Learning to Forget by Expiring

    Authors: Sainbayar Sukhbaatar, Da Ju, Spencer Poff, Stephen Roller, Arthur Szlam, Jason Weston, Angela Fan

    Abstract: Attention mechanisms have shown promising results in sequence modeling tasks that require long-term memory. Recent work investigated mechanisms to reduce the computational cost of preserving and storing memories. However, not all content in the past is equally important to remember. We propose Expire-Span, a method that learns to retain the most important information and expire the irrelevant info…

    Submitted 13 June, 2021; v1 submitted 13 May, 2021; originally announced May 2021.

  24. arXiv:2101.10384  [pdf, other]

    cs.RO cs.AI

    droidlet: modular, heterogenous, multi-modal agents

    Authors: Anurag Pratik, Soumith Chintala, Kavya Srinet, Dhiraj Gandhi, Rebecca Qian, Yuxuan Sun, Ryan Drew, Sara Elkafrawy, Anoushka Tiwari, Tucker Hart, Mary Williamson, Abhinav Gupta, Arthur Szlam

    Abstract: In recent years, there have been significant advances in building end-to-end Machine Learning (ML) systems that learn at scale. But most of these systems are: (a) isolated (perception, speech, or language only); (b) trained on static datasets. On the other hand, in the field of robotics, large-scale learning has always been difficult. Supervision is hard to gather and real world physical interacti…

    Submitted 25 January, 2021; originally announced January 2021.

  25. arXiv:2012.14983  [pdf, other]

    cs.CL cs.AI cs.LG

    Reducing conversational agents' overconfidence through linguistic calibration

    Authors: Sabrina J. Mielke, Arthur Szlam, Emily Dinan, Y-Lan Boureau

    Abstract: While improving neural dialogue agents' factual accuracy is the object of much research, another important aspect of communication, less studied in the setting of neural dialogue, is transparency about ignorance. In this work, we analyze to what extent state-of-the-art chit-chat models are linguistically calibrated in the sense that their verbalized expression of doubt (or confidence) matches the…

    Submitted 26 June, 2022; v1 submitted 29 December, 2020; originally announced December 2020.

    Comments: Accepted in TACL, to be presented at NAACL 2022

  26. arXiv:2012.09543  [pdf, other]

    cs.LG

    Few-shot Sequence Learning with Transformers

    Authors: Lajanugen Logeswaran, Ann Lee, Myle Ott, Honglak Lee, Marc'Aurelio Ranzato, Arthur Szlam

    Abstract: Few-shot algorithms aim at learning new tasks provided only a handful of training examples. In this work we investigate few-shot learning in the setting where the data points are sequences of tokens and propose an efficient learning algorithm based on Transformers. In the simplest setting, we append a token to an input sequence which represents the particular task to be undertaken, and show that t…

    Submitted 17 December, 2020; originally announced December 2020.

    Comments: NeurIPS Meta-Learning Workshop 2020

  27. arXiv:2010.02855  [pdf, other]

    cs.AI cs.LG

    CURI: A Benchmark for Productive Concept Learning Under Uncertainty

    Authors: Ramakrishna Vedantam, Arthur Szlam, Maximilian Nickel, Ari Morcos, Brenden Lake

    Abstract: Humans can learn and reason under substantial uncertainty in a space of infinitely many concepts, including structured relational concepts ("a scene with objects that have the same color") and ad-hoc categories defined through goals ("objects that could fall on one's head"). In contrast, standard classification benchmarks: 1) consider only a fixed set of category labels, 2) do not evaluate composi…

    Submitted 6 October, 2020; originally announced October 2020.

  28. arXiv:2010.00685  [pdf, other]

    cs.CL cs.AI

    How to Motivate Your Dragon: Teaching Goal-Driven Agents to Speak and Act in Fantasy Worlds

    Authors: Prithviraj Ammanabrolu, Jack Urbanek, Margaret Li, Arthur Szlam, Tim Rocktäschel, Jason Weston

    Abstract: We seek to create agents that both act and communicate with other agents in pursuit of a goal. Towards this end, we extend LIGHT (Urbanek et al. 2019) -- a large-scale crowd-sourced fantasy text-game -- with a dataset of quests. These contain natural language motivations paired with in-game goals and human demonstrations; completing a quest might require dialogue or actions (or both). We introduce…

    Submitted 25 May, 2021; v1 submitted 1 October, 2020; originally announced October 2020.

    Comments: In NAACL 2021

  29. arXiv:2008.08076  [pdf, other]

    cs.AI cs.CL

    Deploying Lifelong Open-Domain Dialogue Learning

    Authors: Kurt Shuster, Jack Urbanek, Emily Dinan, Arthur Szlam, Jason Weston

    Abstract: Much of NLP research has focused on crowdsourced static datasets and the supervised learning paradigm of training once and then evaluating test performance. As argued in de Vries et al. (2020), crowdsourced data has the issues of lack of naturalness and relevance to real-world use cases, while the static dataset paradigm does not allow for a model to learn from its experiences of using language (S…

    Submitted 19 August, 2020; v1 submitted 18 August, 2020; originally announced August 2020.

  30. arXiv:2007.02879  [pdf, other]

    cs.LG cs.AI

    Fast Adaptation via Policy-Dynamics Value Functions

    Authors: Roberta Raileanu, Max Goldstein, Arthur Szlam, Rob Fergus

    Abstract: Standard RL algorithms assume fixed environment dynamics and require a significant amount of interaction to adapt to new environments. We introduce Policy-Dynamics Value Functions (PD-VF), a novel approach for rapidly adapting to dynamics different from those previously seen in training. PD-VF explicitly estimates the cumulative reward in a space of policies and environments. An ensemble of conven…

    Submitted 6 July, 2020; originally announced July 2020.

  31. arXiv:2006.15762  [pdf, other]

    cs.AI cs.LG stat.ML

    Empirically Verifying Hypotheses Using Reinforcement Learning

    Authors: Kenneth Marino, Rob Fergus, Arthur Szlam, Abhinav Gupta

    Abstract: This paper formulates hypothesis verification as an RL problem. Specifically, we aim to build an agent that, given a hypothesis about the dynamics of the world, can take actions to generate observations which can help predict whether the hypothesis is true or false. Existing RL algorithms fail to solve this task, even for simple environments. In order to train the agents, we exploit the underlying…

    Submitted 28 June, 2020; originally announced June 2020.

  32. arXiv:2006.12442  [pdf, other]

    cs.CL cs.AI

    Open-Domain Conversational Agents: Current Progress, Open Problems, and Future Directions

    Authors: Stephen Roller, Y-Lan Boureau, Jason Weston, Antoine Bordes, Emily Dinan, Angela Fan, David Gunning, Da Ju, Margaret Li, Spencer Poff, Pratik Ringshia, Kurt Shuster, Eric Michael Smith, Arthur Szlam, Jack Urbanek, Mary Williamson

    Abstract: We present our view of what is necessary to build an engaging open-domain conversational agent: covering the qualities of such an agent, the pieces of the puzzle that have been built so far, and the gaping holes we have not filled yet. We present a biased view, focusing on work done by our own group, while citing related work in each area. In particular, we discuss in detail the properties of cont…

    Submitted 13 July, 2020; v1 submitted 22 June, 2020; originally announced June 2020.

  33. arXiv:2004.11714  [pdf, other]

    cs.CL cs.LG

    Residual Energy-Based Models for Text Generation

    Authors: Yuntian Deng, Anton Bakhtin, Myle Ott, Arthur Szlam, Marc'Aurelio Ranzato

    Abstract: Text generation is ubiquitous in many NLP tasks, from summarization, to dialogue and machine translation. The dominant parametric approach is based on locally normalized models which predict one word at a time. While these work remarkably well, they are plagued by exposure bias due to the greedy nature of the generation process. In this work, we investigate un-normalized energy-based models (EBMs)…

    Submitted 22 April, 2020; originally announced April 2020.

    Comments: published at ICLR 2020. arXiv admin note: substantial text overlap with arXiv:2004.10188

    Journal ref: ICLR 2020

  34. arXiv:2004.10188  [pdf, other]

    cs.CL cs.LG stat.ML

    Residual Energy-Based Models for Text

    Authors: Anton Bakhtin, Yuntian Deng, Sam Gross, Myle Ott, Marc'Aurelio Ranzato, Arthur Szlam

    Abstract: Current large-scale auto-regressive language models display impressive fluency and can generate convincing text. In this work we start by asking the question: Can the generations of these models be reliably distinguished from real text by statistical discriminators? We find experimentally that the answer is affirmative when we have access to the training data for the model, and guardedly affirmati…

    Submitted 21 December, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

    Comments: long journal version

    Journal ref: Journal of Machine Learning Research 21 (2020) 1-41

  35. arXiv:2004.04954  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Learning to Visually Navigate in Photorealistic Environments Without any Supervision

    Authors: Lina Mezghani, Sainbayar Sukhbaatar, Arthur Szlam, Armand Joulin, Piotr Bojanowski

    Abstract: Learning to navigate in a realistic setting where an agent must rely solely on visual inputs is a challenging task, in part because the lack of position information makes it difficult to provide supervision during training. In this paper, we introduce a novel approach for learning to navigate from image inputs without external supervision or reward. Our approach consists of three stages: learning…

    Submitted 10 April, 2020; originally announced April 2020.

  36. arXiv:2002.02878  [pdf, other]

    cs.AI cs.CL stat.ML

    I love your chain mail! Making knights smile in a fantasy game world: Open-domain goal-oriented dialogue agents

    Authors: Shrimai Prabhumoye, Margaret Li, Jack Urbanek, Emily Dinan, Douwe Kiela, Jason Weston, Arthur Szlam

    Abstract: Dialogue research tends to distinguish between chit-chat and goal-oriented tasks. While the former is arguably more naturalistic and has a wider use of language, the latter has clearer metrics and a straightforward learning signal. Humans effortlessly combine the two, for example engaging in chit-chat with the goal of exchanging information or eliciting a specific response. Here, we bridge the div…

    Submitted 10 February, 2020; v1 submitted 7 February, 2020; originally announced February 2020.

  37. arXiv:1911.09194  [pdf, other]

    cs.AI cs.CL cs.LG

    Generating Interactive Worlds with Text

    Authors: Angela Fan, Jack Urbanek, Pratik Ringshia, Emily Dinan, Emma Qian, Siddharth Karamcheti, Shrimai Prabhumoye, Douwe Kiela, Tim Rocktaschel, Arthur Szlam, Jason Weston

    Abstract: Procedurally generating cohesive and interesting game environments is challenging and time-consuming. In order for the relationships between the game elements to be natural, common-sense has to be encoded into arrangement of the elements. In this work, we investigate a machine learning approach for world creation using content from the multi-player text adventure game environment LIGHT. We introdu…

    Submitted 4 December, 2019; v1 submitted 20 November, 2019; originally announced November 2019.

  38. arXiv:1907.09273  [pdf, other]

    cs.AI cs.CL

    Why Build an Assistant in Minecraft?

    Authors: Arthur Szlam, Jonathan Gray, Kavya Srinet, Yacine Jernite, Armand Joulin, Gabriel Synnaeve, Douwe Kiela, Haonan Yu, Zhuoyuan Chen, Siddharth Goyal, Demi Guo, Danielle Rothermel, C. Lawrence Zitnick, Jason Weston

    Abstract: In this document we describe a rationale for a research program aimed at building an open "assistant" in the game Minecraft, in order to make progress on the problems of natural language understanding and learning from dialogue.

    Submitted 25 July, 2019; v1 submitted 22 July, 2019; originally announced July 2019.

  39. arXiv:1907.08584  [pdf, other]

    cs.AI

    CraftAssist: A Framework for Dialogue-enabled Interactive Agents

    Authors: Jonathan Gray, Kavya Srinet, Yacine Jernite, Haonan Yu, Zhuoyuan Chen, Demi Guo, Siddharth Goyal, C. Lawrence Zitnick, Arthur Szlam

    Abstract: This paper describes an implementation of a bot assistant in Minecraft, and the tools and platform allowing players to interact with the bot and to record those interactions. The purpose of building such an assistant is to facilitate the study of agents that can complete tasks specified by dialogue, and eventually, to learn from dialogue interactions.

    Submitted 19 July, 2019; originally announced July 2019.

  40. arXiv:1906.03351  [pdf, other]

    cs.LG cs.CL stat.ML

    Real or Fake? Learning to Discriminate Machine from Human Generated Text

    Authors: Anton Bakhtin, Sam Gross, Myle Ott, Yuntian Deng, Marc'Aurelio Ranzato, Arthur Szlam

    Abstract: Energy-based models (EBMs), a.k.a. un-normalized models, have had recent successes in continuous spaces. However, they have not been successfully applied to model text sequences. While decreasing the energy at training samples is straightforward, mining (negative) samples where the energy should be increased is difficult. In part, this is because standard gradient-based methods are not readily app…

    Submitted 25 November, 2019; v1 submitted 7 June, 2019; originally announced June 2019.

  41. arXiv:1905.01978  [pdf, other]

    cs.CL cs.AI

    CraftAssist Instruction Parsing: Semantic Parsing for a Minecraft Assistant

    Authors: Yacine Jernite, Kavya Srinet, Jonathan Gray, Arthur Szlam

    Abstract: We propose a large scale semantic parsing dataset focused on instruction-driven communication with an agent in Minecraft. We describe the data collection process which yields additional 35K human generated instructions with their semantic annotations. We report the performance of three baseline models and find that while a dataset of this size helps us train a usable instruction parser, it still p…

    Submitted 17 April, 2019; originally announced May 2019.

  42. arXiv:1903.03094  [pdf, other]

    cs.CL cs.AI

    Learning to Speak and Act in a Fantasy Text Adventure Game

    Authors: Jack Urbanek, Angela Fan, Siddharth Karamcheti, Saachi Jain, Samuel Humeau, Emily Dinan, Tim Rocktäschel, Douwe Kiela, Arthur Szlam, Jason Weston

    Abstract: We introduce a large scale crowdsourced text adventure game as a research platform for studying grounded dialogue. In it, agents can perceive, emote, and act whilst conducting dialogue with other agents. Models and humans can both act as characters within the game. We describe the results of training state-of-the-art generative and retrieval models in this setting. We show that in addition to usin…

    Submitted 7 March, 2019; originally announced March 2019.

  43. arXiv:1902.00098  [pdf, other]

    cs.AI cs.CL cs.HC

    The Second Conversational Intelligence Challenge (ConvAI2)

    Authors: Emily Dinan, Varvara Logacheva, Valentin Malykh, Alexander Miller, Kurt Shuster, Jack Urbanek, Douwe Kiela, Arthur Szlam, Iulian Serban, Ryan Lowe, Shrimai Prabhumoye, Alan W Black, Alexander Rudnicky, Jason Williams, Joelle Pineau, Mikhail Burtsev, Jason Weston

    Abstract: We describe the setting and results of the ConvAI2 NeurIPS competition that aims to further the state-of-the-art in open-domain chatbots. Some key takeaways from the competition are: (i) pretrained Transformer variants are currently the best performing models on this task, (ii) but to improve performance on multi-turn conversations with humans, future systems must go beyond single word metrics lik…

    Submitted 31 January, 2019; originally announced February 2019.

  44. arXiv:1811.09083  [pdf, other]

    cs.LG stat.ML

    Learning Goal Embeddings via Self-Play for Hierarchical Reinforcement Learning

    Authors: Sainbayar Sukhbaatar, Emily Denton, Arthur Szlam, Rob Fergus

    Abstract: In hierarchical reinforcement learning a major challenge is determining appropriate low-level policies. We propose an unsupervised learning scheme, based on asymmetric self-play from Sukhbaatar et al. (2018), that automatically learns a good representation of sub-goals in the environment and a low-level policy that can execute them. A high-level policy can then direct the lower one by generating a…

    Submitted 22 November, 2018; originally announced November 2018.

  45. arXiv:1811.00671  [pdf, other]

    cs.CL cs.AI

    Dialogue Natural Language Inference

    Authors: Sean Welleck, Jason Weston, Arthur Szlam, Kyunghyun Cho

    Abstract: Consistency is a long standing issue faced by dialogue models. In this paper, we frame the consistency of dialogue agents as natural language inference (NLI) and create a new natural language inference dataset called Dialogue NLI. We propose a method which demonstrates that a model trained on Dialogue NLI can be used to improve the consistency of a dialogue model, and evaluate the method with huma…

    Submitted 17 January, 2019; v1 submitted 1 November, 2018; originally announced November 2018.

  46. arXiv:1809.02031  [pdf, other]

    cs.AI

    Planning with Arithmetic and Geometric Attributes

    Authors: David Folqué, Sainbayar Sukhbaatar, Arthur Szlam, Joan Bruna

    Abstract: A desirable property of an intelligent agent is its ability to understand its environment to quickly generalize to novel tasks and compose simpler tasks into more complex ones. If the environment has geometric or arithmetic structure, the agent should exploit these for faster generalization. Building on recent work that augments the environment with user-specified attributes, we show that further…

    Submitted 6 September, 2018; originally announced September 2018.

  47. arXiv:1804.07705  [pdf, other]

    cs.CL

    Lightweight Adaptive Mixture of Neural and N-gram Language Models

    Authors: Anton Bakhtin, Arthur Szlam, Marc'Aurelio Ranzato, Edouard Grave

    Abstract: It is often the case that the best performing language model is an ensemble of a neural language model with n-grams. In this work, we propose a method to improve how these two models are combined. By using a small network which predicts the mixture weight between the two models, we adapt their relative importance at each time step. Because the gating network is small, it trains quickly on small am…

    Submitted 26 October, 2018; v1 submitted 20 April, 2018; originally announced April 2018.

  48. arXiv:1803.00512  [pdf, other]

    cs.AI

    Composable Planning with Attributes

    Authors: Amy Zhang, Adam Lerer, Sainbayar Sukhbaatar, Rob Fergus, Arthur Szlam

    Abstract: The tasks that an agent will need to solve often are not known during training. However, if the agent knows which properties of the environment are important then, after learning how its actions affect those properties, it may be able to use this knowledge to solve complex tasks without training specifically for them. Towards this end, we consider a setup in which an environment is augmented with…

    Submitted 25 April, 2019; v1 submitted 1 March, 2018; originally announced March 2018.

    Journal ref: International Conference on Machine Learning, 2018

  49. arXiv:1802.09640  [pdf, other]

    cs.AI cs.LG

    Modeling Others using Oneself in Multi-Agent Reinforcement Learning

    Authors: Roberta Raileanu, Emily Denton, Arthur Szlam, Rob Fergus

    Abstract: We consider the multi-agent reinforcement learning setting with imperfect information in which each agent is trying to maximize its own utility. The reward function depends on the hidden state (or goal) of both agents, so the agents must infer the other players' hidden goals from their observed behavior in order to solve the tasks. We propose a new approach for learning in these domains: Self Othe…

    Submitted 23 March, 2018; v1 submitted 26 February, 2018; originally announced February 2018.

    Comments: 10 pages, 16 figures, submitted to ICML 2018

  50. arXiv:1801.07243  [pdf, ps, other]

    cs.AI cs.CL

    Personalizing Dialogue Agents: I have a dog, do you have pets too?

    Authors: Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela, Jason Weston

    Abstract: Chit-chat models are known to have several problems: they lack specificity, do not display a consistent personality and are often not very captivating. In this work we present the task of making chit-chat more engaging by conditioning on profile information. We collect data and train models to (i) condition on their given profile information; and (ii) information about the person they are talking…

    Submitted 25 September, 2018; v1 submitted 22 January, 2018; originally announced January 2018.