Skip to main content

Showing 1–50 of 115 results for author: Weston, J

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.17744  [pdf, other

    cs.CL

    Following Length Constraints in Instructions

    Authors: Weizhe Yuan, Ilia Kulikov, ** Yu, Kyunghyun Cho, Sainbayar Sukhbaatar, Jason Weston, **g Xu

    Abstract: Aligned instruction following models can better fulfill user requests than their unaligned counterparts. However, it has been shown that there is a length bias in evaluation of such models, and that training algorithms tend to exploit this bias by learning longer responses. In this work we show how to train models that can be controlled at inference time with instructions containing desired length… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 13 pages

  2. arXiv:2405.18719  [pdf, other

    cs.CL cs.AI

    Contextual Position Encoding: Learning to Count What's Important

    Authors: Olga Golovneva, Tianlu Wang, Jason Weston, Sainbayar Sukhbaatar

    Abstract: The attention mechanism is a critical component of Large Language Models (LLMs) that allows tokens in a sequence to interact with each other, but is order-invariant. Incorporating position encoding (PE) makes it possible to address by position, such as attending to the i-th token. However, current PE methods use token counts to derive position, and thus cannot generalize to higher levels of abstra… ▽ More

    Submitted 30 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  3. arXiv:2404.19733  [pdf, other

    cs.CL cs.AI

    Iterative Reasoning Preference Optimization

    Authors: Richard Yuanzhe Pang, Weizhe Yuan, Kyunghyun Cho, He He, Sainbayar Sukhbaatar, Jason Weston

    Abstract: Iterative preference optimization methods have recently been shown to perform well for general instruction tuning tasks, but typically make little improvement on reasoning tasks (Yuan et al., 2024, Chen et al., 2024). In this work we develop an iterative approach that optimizes the preference between competing generated Chain-of-Thought (CoT) candidates by optimizing for winning vs. losing reasoni… ▽ More

    Submitted 25 June, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

  4. arXiv:2403.13799  [pdf, other

    cs.CL cs.AI

    Reverse Training to Nurse the Reversal Curse

    Authors: Olga Golovneva, Zeyuan Allen-Zhu, Jason Weston, Sainbayar Sukhbaatar

    Abstract: Large language models (LLMs) have a surprising failure: when trained on "A has a feature B", they do not generalize to "B is a feature of A", which is termed the Reversal Curse. Even when training with trillions of tokens this issue still appears due to Zipf's law - hence even if we train on the entire internet. This work proposes an alternative training scheme, called reverse training, whereby al… ▽ More

    Submitted 7 May, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  5. arXiv:2403.07816  [pdf, other

    cs.CL cs.AI

    Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM

    Authors: Sainbayar Sukhbaatar, Olga Golovneva, Vasu Sharma, Hu Xu, Xi Victoria Lin, Baptiste Rozière, Jacob Kahn, Daniel Li, Wen-tau Yih, Jason Weston, Xian Li

    Abstract: We investigate efficient methods for training Large Language Models (LLMs) to possess capabilities in multiple specialized domains, such as coding, math reasoning and world knowledge. Our method, named Branch-Train-MiX (BTX), starts from a seed model, which is branched to train experts in embarrassingly parallel fashion with high throughput and reduced communication cost. After individual experts… ▽ More

    Submitted 12 March, 2024; originally announced March 2024.

  6. arXiv:2402.14158  [pdf, other

    cs.CL

    TOOLVERIFIER: Generalization to New Tools via Self-Verification

    Authors: Dheeraj Mekala, Jason Weston, Jack Lanchantin, Roberta Raileanu, Maria Lomeli, **gbo Shang, Jane Dwivedi-Yu

    Abstract: Teaching language models to use tools is an important milestone towards building general assistants, but remains an open problem. While there has been significant progress on learning to use specific tools via fine-tuning, language models still struggle with learning how to robustly use new tools from only a few demonstrations. In this work we introduce a self-verification method which distinguish… ▽ More

    Submitted 13 March, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  7. arXiv:2401.10020  [pdf, other

    cs.CL cs.AI

    Self-Rewarding Language Models

    Authors: Weizhe Yuan, Richard Yuanzhe Pang, Kyunghyun Cho, Xian Li, Sainbayar Sukhbaatar, **g Xu, Jason Weston

    Abstract: We posit that to achieve superhuman agents, future models require superhuman feedback in order to provide an adequate training signal. Current approaches commonly train reward models from human preferences, which may then be bottlenecked by human performance level, and secondly these separate frozen reward models cannot then learn to improve during LLM training. In this work, we study Self-Rewardi… ▽ More

    Submitted 8 February, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

  8. arXiv:2312.16682  [pdf, other

    cs.CL cs.AI

    Some things are more CRINGE than others: Iterative Preference Optimization with the Pairwise Cringe Loss

    Authors: **g Xu, Andrew Lee, Sainbayar Sukhbaatar, Jason Weston

    Abstract: Practitioners commonly align large language models using pairwise preferences, i.e., given labels of the type response A is preferred to response B for a given input. Perhaps less commonly, methods have also been developed for binary feedback, i.e. training models given labels of type response A is good or bad. We show how an existing performant binary feedback method, the Cringe Loss (Adolphs et… ▽ More

    Submitted 22 April, 2024; v1 submitted 27 December, 2023; originally announced December 2023.

  9. arXiv:2311.11829  [pdf, other

    cs.CL cs.AI cs.LG

    System 2 Attention (is something you might need too)

    Authors: Jason Weston, Sainbayar Sukhbaatar

    Abstract: Soft attention in Transformer-based Large Language Models (LLMs) is susceptible to incorporating irrelevant information from the context into its latent representations, which adversely affects next token generations. To help rectify these issues, we introduce System 2 Attention (S2A), which leverages the ability of LLMs to reason in natural language and follow instructions in order to decide what… ▽ More

    Submitted 20 November, 2023; originally announced November 2023.

  10. arXiv:2311.07961  [pdf, other

    cs.CL

    The ART of LLM Refinement: Ask, Refine, and Trust

    Authors: Kumar Shridhar, Koustuv Sinha, Andrew Cohen, Tianlu Wang, ** Yu, Ram Pasunuru, Mrinmaya Sachan, Jason Weston, Asli Celikyilmaz

    Abstract: In recent years, Large Language Models (LLMs) have demonstrated remarkable generative abilities, but can they judge the quality of their own generations? A popular concept, referred to as self-refinement, postulates that LLMs can detect and correct the errors in their generations when asked to do so. However, recent empirical evidence points in the opposite direction, suggesting that LLMs often st… ▽ More

    Submitted 14 November, 2023; originally announced November 2023.

  11. arXiv:2310.15123  [pdf, other

    cs.CL cs.AI cs.LG

    Branch-Solve-Merge Improves Large Language Model Evaluation and Generation

    Authors: Swarnadeep Saha, Omer Levy, Asli Celikyilmaz, Mohit Bansal, Jason Weston, Xian Li

    Abstract: Large Language Models (LLMs) are frequently used for multi-faceted language generation and evaluation tasks that involve satisfying intricate user constraints or taking into account multiple aspects and criteria. However, their performance can fall short, due to the model's lack of coherence and inability to plan and decompose the problem. We propose Branch-Solve-Merge (BSM), a Large Language Mode… ▽ More

    Submitted 7 June, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: NAACL 2024 (19 pages, 7 figures, 11 tables)

  12. arXiv:2310.05029  [pdf, other

    cs.CL

    Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading

    Authors: Howard Chen, Ramakanth Pasunuru, Jason Weston, Asli Celikyilmaz

    Abstract: Large language models (LLMs) have advanced in large strides due to the effectiveness of the self-attention mechanism that processes and compares all tokens at once. However, this mechanism comes with a fundamental issue -- the predetermined context window is bound to be limited. Despite attempts to extend the context window through methods like extrapolating the positional embedding, using recurre… ▽ More

    Submitted 8 October, 2023; originally announced October 2023.

  13. arXiv:2309.11495  [pdf, other

    cs.CL cs.AI

    Chain-of-Verification Reduces Hallucination in Large Language Models

    Authors: Shehzaad Dhuliawala, Mojtaba Komeili, **g Xu, Roberta Raileanu, Xian Li, Asli Celikyilmaz, Jason Weston

    Abstract: Generation of plausible yet incorrect factual information, termed hallucination, is an unsolved issue in large language models. We study the ability of language models to deliberate on the responses they give in order to correct their mistakes. We develop the Chain-of-Verification (CoVe) method whereby the model first (i) drafts an initial response; then (ii) plans verification questions to fact-c… ▽ More

    Submitted 25 September, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

  14. arXiv:2308.06259  [pdf, other

    cs.CL

    Self-Alignment with Instruction Backtranslation

    Authors: Xian Li, ** Yu, Chunting Zhou, Timo Schick, Omer Levy, Luke Zettlemoyer, Jason Weston, Mike Lewis

    Abstract: We present a scalable method to build a high quality instruction following language model by automatically labelling human-written text with corresponding instructions. Our approach, named instruction backtranslation, starts with a language model finetuned on a small amount of seed data, and a given web corpus. The seed model is used to construct training examples by generating instruction prompts… ▽ More

    Submitted 12 March, 2024; v1 submitted 11 August, 2023; originally announced August 2023.

    Comments: ICLR2024 camera ready

  15. arXiv:2307.14117  [pdf, other

    cs.CL

    Leveraging Implicit Feedback from Deployment Data in Dialogue

    Authors: Richard Yuanzhe Pang, Stephen Roller, Kyunghyun Cho, He He, Jason Weston

    Abstract: We study improving social conversational agents by learning from natural dialogue between users and a deployed model, without extra annotations. To implicitly measure the quality of a machine-generated utterance, we leverage signals like user response length, sentiment and reaction of the future human utterances in the collected dialogue episodes. Our experiments use the publicly released deployme… ▽ More

    Submitted 31 January, 2024; v1 submitted 26 July, 2023; originally announced July 2023.

    Comments: EACL 2024

  16. arXiv:2306.13588  [pdf, other

    cs.CL cs.AI

    System-Level Natural Language Feedback

    Authors: Weizhe Yuan, Kyunghyun Cho, Jason Weston

    Abstract: Natural language (NL) feedback offers rich insights into user experience. While existing studies focus on an instance-level approach, where feedback is used to refine specific examples, we introduce a framework for system-level use of NL feedback. We show how to use feedback to formalize system-level design decisions in a human-in-the-loop-process -- in order to produce better models. In particula… ▽ More

    Submitted 2 February, 2024; v1 submitted 23 June, 2023; originally announced June 2023.

    Comments: Accepted by EACL 2024

  17. arXiv:2306.04765  [pdf, other

    cs.AI cs.CL

    The HCI Aspects of Public Deployment of Research Chatbots: A User Study, Design Recommendations, and Open Challenges

    Authors: Morteza Behrooz, William Ngan, Joshua Lane, Giuliano Morse, Benjamin Babcock, Kurt Shuster, Mojtaba Komeili, Moya Chen, Melanie Kambadur, Y-Lan Boureau, Jason Weston

    Abstract: Publicly deploying research chatbots is a nuanced topic involving necessary risk-benefit analyses. While there have recently been frequent discussions on whether it is responsible to deploy such models, there has been far less focus on the interaction paradigms and design approaches that the resulting interfaces should adopt, in order to achieve their goals more effectively. We aim to pose, ground… ▽ More

    Submitted 7 June, 2023; originally announced June 2023.

  18. arXiv:2306.04707  [pdf, other

    cs.CL cs.AI

    Improving Open Language Models by Learning from Organic Interactions

    Authors: **g Xu, Da Ju, Joshua Lane, Mojtaba Komeili, Eric Michael Smith, Megan Ung, Morteza Behrooz, William Ngan, Rashel Moritz, Sainbayar Sukhbaatar, Y-Lan Boureau, Jason Weston, Kurt Shuster

    Abstract: We present BlenderBot 3x, an update on the conversational model BlenderBot 3, which is now trained using organic conversation and feedback data from participating users of the system in order to improve both its skills and safety. We are publicly releasing the participating de-identified interaction data for use by the research community, in order to spur further progress. Training models with org… ▽ More

    Submitted 7 June, 2023; originally announced June 2023.

  19. arXiv:2305.05364  [pdf, other

    cs.LG cs.AI cs.CL

    Large Language Model Programs

    Authors: Imanol Schlag, Sainbayar Sukhbaatar, Asli Celikyilmaz, Wen-tau Yih, Jason Weston, Jürgen Schmidhuber, Xian Li

    Abstract: In recent years, large pre-trained language models (LLMs) have demonstrated the ability to follow instructions and perform novel tasks from a few examples. The possibility to parameterise an LLM through such in-context examples widens their capability at a much lower cost than finetuning. We extend this line of reasoning and present a method which further expands the capabilities of an LLM by embe… ▽ More

    Submitted 9 May, 2023; originally announced May 2023.

  20. arXiv:2305.00833  [pdf, other

    cs.LG cs.AI cs.CL

    Learning to Reason and Memorize with Self-Notes

    Authors: Jack Lanchantin, Shubham Toshniwal, Jason Weston, Arthur Szlam, Sainbayar Sukhbaatar

    Abstract: Large language models have been shown to struggle with multi-step reasoning, and do not retain previous reasoning steps for future use. We propose a simple method for solving both of these problems by allowing the model to take Self-Notes. Unlike recent chain-of-thought or scratchpad approaches, the model can deviate from the input context at any time to explicitly think and write down its thought… ▽ More

    Submitted 31 October, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

  21. arXiv:2304.13835  [pdf, other

    cs.CL cs.LG

    Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models

    Authors: Jimmy Wei, Kurt Shuster, Arthur Szlam, Jason Weston, Jack Urbanek, Mojtaba Komeili

    Abstract: Current dialogue research primarily studies pairwise (two-party) conversations, and does not address the everyday setting where more than two speakers converse together. In this work, we both collect and evaluate multi-party conversations to study this more general case. We use the LIGHT environment to construct grounded conversations, where each participant has an assigned character to role-play.… ▽ More

    Submitted 8 June, 2023; v1 submitted 26 April, 2023; originally announced April 2023.

  22. arXiv:2302.06784  [pdf, other

    cs.CL

    The Stable Entropy Hypothesis and Entropy-Aware Decoding: An Analysis and Algorithm for Robust Natural Language Generation

    Authors: Kushal Arora, Timothy J. O'Donnell, Doina Precup, Jason Weston, Jackie C. K. Cheung

    Abstract: State-of-the-art language generation models can degenerate when applied to open-ended generation problems such as text completion, story generation, or dialog modeling. This degeneration usually shows up in the form of incoherence, lack of vocabulary diversity, and self-repetition or copying from the context. In this paper, we postulate that ``human-like'' generations usually lie in a narrow and n… ▽ More

    Submitted 13 February, 2023; originally announced February 2023.

  23. arXiv:2301.05746  [pdf, other

    cs.CL cs.AI

    Infusing Commonsense World Models with Graph Knowledge

    Authors: Alexander Gurung, Mojtaba Komeili, Arthur Szlam, Jason Weston, Jack Urbanek

    Abstract: While language models have become more capable of producing compelling language, we find there are still gaps in maintaining consistency, especially when describing events in a dynamically changing world. We study the setting of generating narratives in an open world text adventure game, where a graph representation of the underlying game state can be used to train models that consume and output b… ▽ More

    Submitted 13 January, 2023; originally announced January 2023.

  24. arXiv:2211.05826  [pdf, other

    cs.CL cs.AI

    The CRINGE Loss: Learning what language not to model

    Authors: Leonard Adolphs, Tianyu Gao, **g Xu, Kurt Shuster, Sainbayar Sukhbaatar, Jason Weston

    Abstract: Standard language model training employs gold human documents or human-human interaction data, and treats all training data as positive examples. Growing evidence shows that even with very large amounts of positive training data, issues remain that can be alleviated with relatively small amounts of negative data -- examples of what the model should not do. In this work, we propose a novel procedur… ▽ More

    Submitted 10 November, 2022; originally announced November 2022.

  25. arXiv:2210.15893  [pdf, other

    cs.CL cs.AI

    When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels

    Authors: Weiyan Shi, Emily Dinan, Kurt Shuster, Jason Weston, **g Xu

    Abstract: Deployed dialogue agents have the potential to integrate human feedback to continuously improve themselves. However, humans may not always provide explicit signals when the chatbot makes mistakes during interactions. In this work, we propose Juicer, a framework to make use of both binary and free-form textual human feedback. It works by: (i) extending sparse binary feedback by training a satisfact… ▽ More

    Submitted 28 October, 2022; originally announced October 2022.

  26. arXiv:2208.03295  [pdf, other

    cs.CL cs.AI

    Learning from data in the mixed adversarial non-adversarial case: Finding the helpers and ignoring the trolls

    Authors: Da Ju, **g Xu, Y-Lan Boureau, Jason Weston

    Abstract: The promise of interaction between intelligent conversational agents and humans is that models can learn from such feedback in order to improve. Unfortunately, such exchanges in the wild will not always involve human utterances that are benign or of high quality, and will include a mixture of engaged (helpers) and unengaged or even malicious users (trolls). In this work we study how to perform rob… ▽ More

    Submitted 5 August, 2022; originally announced August 2022.

  27. arXiv:2208.03270  [pdf, other

    cs.CL cs.AI

    Learning New Skills after Deployment: Improving open-domain internet-driven dialogue with human feedback

    Authors: **g Xu, Megan Ung, Mojtaba Komeili, Kushal Arora, Y-Lan Boureau, Jason Weston

    Abstract: Frozen models trained to mimic static datasets can never improve their performance. Models that can employ internet-retrieval for up-to-date information and obtain feedback from humans during deployment provide the promise of both adapting to new information, and improving their performance. In this work we study how to improve internet-driven conversational skills in such a learning framework. We… ▽ More

    Submitted 16 August, 2022; v1 submitted 5 August, 2022; originally announced August 2022.

  28. arXiv:2208.03188  [pdf, other

    cs.CL cs.AI

    BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage

    Authors: Kurt Shuster, **g Xu, Mojtaba Komeili, Da Ju, Eric Michael Smith, Stephen Roller, Megan Ung, Moya Chen, Kushal Arora, Joshua Lane, Morteza Behrooz, William Ngan, Spencer Poff, Naman Goyal, Arthur Szlam, Y-Lan Boureau, Melanie Kambadur, Jason Weston

    Abstract: We present BlenderBot 3, a 175B parameter dialogue model capable of open-domain conversation with access to the internet and a long-term memory, and having been trained on a large number of user defined tasks. We release both the model weights and code, and have also deployed the model on a public web page to interact with organic users. This technical report describes how the model was built (arc… ▽ More

    Submitted 10 August, 2022; v1 submitted 5 August, 2022; originally announced August 2022.

  29. arXiv:2206.07694  [pdf, other

    cs.CL

    DIRECTOR: Generator-Classifiers For Supervised Language Modeling

    Authors: Kushal Arora, Kurt Shuster, Sainbayar Sukhbaatar, Jason Weston

    Abstract: Current language models achieve low perplexity but their resulting generations still suffer from toxic responses, repetitiveness and contradictions. The standard language modeling setup fails to address these issues. In this paper, we introduce a new architecture, {\sc Director}, that consists of a unified generator-classifier with both a language modeling and a classification head for each output… ▽ More

    Submitted 25 November, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

  30. arXiv:2203.13224  [pdf, other

    cs.CL cs.AI

    Language Models that Seek for Knowledge: Modular Search & Generation for Dialogue and Prompt Completion

    Authors: Kurt Shuster, Mojtaba Komeili, Leonard Adolphs, Stephen Roller, Arthur Szlam, Jason Weston

    Abstract: Language models (LMs) have recently been shown to generate more factual responses by employing modularity (Zhou et al., 2021) in combination with retrieval (Adolphs et al., 2021). We extend the recent approach of Adolphs et al. (2021) to include internet search as a module. Our SeeKeR (Search engine->Knowledge->Response) method thus applies a single LM to three modular tasks in succession: search,… ▽ More

    Submitted 29 March, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

  31. arXiv:2201.04723  [pdf, other

    cs.CL cs.AI

    Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents

    Authors: Eric Michael Smith, Orion Hsu, Rebecca Qian, Stephen Roller, Y-Lan Boureau, Jason Weston

    Abstract: At the heart of improving conversational AI is the open problem of how to evaluate conversations. Issues with automatic metrics are well known (Liu et al., 2016, arXiv:1603.08023), with human evaluations still considered the gold standard. Unfortunately, how to perform human evaluations is also an open problem: differing data collection methods have varying levels of human agreement and statistica… ▽ More

    Submitted 12 January, 2022; originally announced January 2022.

  32. arXiv:2112.05843  [pdf, other

    cs.CL

    Am I Me or You? State-of-the-Art Dialogue Models Cannot Maintain an Identity

    Authors: Kurt Shuster, Jack Urbanek, Arthur Szlam, Jason Weston

    Abstract: State-of-the-art dialogue models still often stumble with regards to factual accuracy and self-contradiction. Anecdotally, they have been observed to fail to maintain character identity throughout discourse; and more specifically, may take on the role of their interlocutor. In this work we formalize and quantify this deficiency, and show experimentally through human evaluations that this is indeed… ▽ More

    Submitted 10 December, 2021; originally announced December 2021.

  33. arXiv:2111.05204  [pdf, other

    cs.CL cs.AI cs.LG

    Reason first, then respond: Modular Generation for Knowledge-infused Dialogue

    Authors: Leonard Adolphs, Kurt Shuster, Jack Urbanek, Arthur Szlam, Jason Weston

    Abstract: Large language models can produce fluent dialogue but often hallucinate factual inaccuracies. While retrieval-augmented models help alleviate this issue, they still face a difficult challenge of both reasoning to provide correct knowledge and generating conversation simultaneously. In this work, we propose a modular model, Knowledge to Response (K2R), for incorporating knowledge into conversationa… ▽ More

    Submitted 9 November, 2021; originally announced November 2021.

  34. arXiv:2110.09456  [pdf, other

    cs.CL cs.AI

    NormFormer: Improved Transformer Pretraining with Extra Normalization

    Authors: Sam Shleifer, Jason Weston, Myle Ott

    Abstract: During pretraining, the Pre-LayerNorm transformer suffers from a gradient magnitude mismatch: gradients at early layers are much larger than at later layers. These issues can be alleviated by our proposed NormFormer architecture, which adds three normalization operations to each layer: a Layer Norm after self attention, head-wise scaling of self-attention outputs, and a Layer Norm after the first… ▽ More

    Submitted 1 November, 2021; v1 submitted 18 October, 2021; originally announced October 2021.

  35. arXiv:2107.08251  [pdf, other

    cs.CL cs.LG

    Generative Pretraining for Paraphrase Evaluation

    Authors: Jack Weston, Raphael Lenain, Udeepa Meepegama, Emil Fristed

    Abstract: We introduce ParaBLEU, a paraphrase representation learning model and evaluation metric for text generation. Unlike previous approaches, ParaBLEU learns to understand paraphrasis using generative conditioning as a pretraining objective. ParaBLEU correlates more strongly with human judgements than existing metrics, obtaining new state-of-the-art results on the 2017 WMT Metrics Shared Task. We show… ▽ More

    Submitted 24 July, 2021; v1 submitted 17 July, 2021; originally announced July 2021.

    Comments: Under review

  36. arXiv:2107.08248  [pdf, other

    cs.CL cs.LG cs.SD eess.AS

    Learning De-identified Representations of Prosody from Raw Audio

    Authors: Jack Weston, Raphael Lenain, Udeepa Meepegama, Emil Fristed

    Abstract: We propose a method for learning de-identified prosody representations from raw audio using a contrastive self-supervised signal. Whereas prior work has relied on conditioning models on bottlenecks, we introduce a set of inductive biases that exploit the natural structure of prosody to minimize timbral information and decouple prosody from speaker representations. Despite aggressive downsampling o… ▽ More

    Submitted 17 July, 2021; originally announced July 2021.

    Comments: ICML 2021

    Journal ref: Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event. Proceedings of Machine Learning Research 139, PMLR 2021

  37. arXiv:2107.07567  [pdf, other

    cs.CL cs.AI

    Beyond Goldfish Memory: Long-Term Open-Domain Conversation

    Authors: **g Xu, Arthur Szlam, Jason Weston

    Abstract: Despite recent improvements in open-domain dialogue models, state of the art models are trained and evaluated on short conversations with little context. In contrast, the long-term conversation setting has hardly been studied. In this work we collect and release a human-human dataset consisting of multiple chat sessions whereby the speaking partners learn about each other's interests and discuss t… ▽ More

    Submitted 15 July, 2021; originally announced July 2021.

  38. arXiv:2107.07566  [pdf, other

    cs.AI cs.CL

    Internet-Augmented Dialogue Generation

    Authors: Mojtaba Komeili, Kurt Shuster, Jason Weston

    Abstract: The largest store of continually updating knowledge on our planet can be accessed via internet search. In this work we study giving access to this information to conversational agents. Large language models, even though they store an impressive amount of knowledge within their weights, are known to hallucinate facts when generating dialogue (Shuster et al., 2021); moreover, those facts are frozen… ▽ More

    Submitted 15 July, 2021; originally announced July 2021.

  39. arXiv:2106.04426  [pdf, other

    cs.LG cs.CL

    Hash Layers For Large Sparse Models

    Authors: Stephen Roller, Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston

    Abstract: We investigate the training of sparse layers that use different parameters for different inputs based on hashing in large Transformer models. Specifically, we modify the feedforward layer to hash to different sets of weights depending on the current token, over all tokens in the sequence. We show that this procedure either outperforms or is competitive with learning-to-route mixture-of-expert meth… ▽ More

    Submitted 20 July, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

  40. arXiv:2106.04279  [pdf, other

    cs.LG cs.CL

    Staircase Attention for Recurrent Processing of Sequences

    Authors: Da Ju, Stephen Roller, Sainbayar Sukhbaatar, Jason Weston

    Abstract: Attention mechanisms have become a standard tool for sequence modeling tasks, in particular by stacking self-attention layers over the entire input sequence as in the Transformer architecture. In this work we introduce a novel attention procedure called staircase attention that, unlike self-attention, operates across the sequence (in time) recurrently processing the input by adding another step of… ▽ More

    Submitted 8 June, 2021; originally announced June 2021.

  41. arXiv:2105.06548  [pdf, other

    cs.LG cs.AI

    Not All Memories are Created Equal: Learning to Forget by Expiring

    Authors: Sainbayar Sukhbaatar, Da Ju, Spencer Poff, Stephen Roller, Arthur Szlam, Jason Weston, Angela Fan

    Abstract: Attention mechanisms have shown promising results in sequence modeling tasks that require long-term memory. Recent work investigated mechanisms to reduce the computational cost of preserving and storing memories. However, not all content in the past is equally important to remember. We propose Expire-Span, a method that learns to retain the most important information and expire the irrelevant info… ▽ More

    Submitted 13 June, 2021; v1 submitted 13 May, 2021; originally announced May 2021.

  42. arXiv:2104.07567  [pdf, other

    cs.CL cs.AI

    Retrieval Augmentation Reduces Hallucination in Conversation

    Authors: Kurt Shuster, Spencer Poff, Moya Chen, Douwe Kiela, Jason Weston

    Abstract: Despite showing increasingly human-like conversational abilities, state-of-the-art dialogue models often suffer from factual incorrectness and hallucination of knowledge (Roller et al., 2020). In this work we explore the use of neural-retrieval-in-the-loop architectures - recently shown to be effective in open-domain QA (Lewis et al., 2020b; Izacard and Grave, 2020) - for knowledge-grounded dialog… ▽ More

    Submitted 15 April, 2021; originally announced April 2021.

  43. arXiv:2104.00641  [pdf

    stat.ML cs.LG

    Dynamic Silos: Increased Modularity in Intra-organizational Communication Networks during the Covid-19 Pandemic

    Authors: Tiona Zuzul, Emily Cox Pahnke, Jonathan Larson, Patrick Bourke, Nicholas Caurvina, Neha Parikh Shah, Fereshteh Amini, Jeffrey Weston, Youngser Park, Joshua Vogelstein, Christopher White, Carey E. Priebe

    Abstract: Workplace communications around the world were drastically altered by Covid-19, related work-from-home orders, and the rise of remote work. To understand these shifts, we analyzed aggregated, anonymized metadata from over 360 billion emails within 4,361 organizations worldwide. By comparing month-to-month and year-over-year metrics, we examined changes in network community structures over 24 month… ▽ More

    Submitted 28 July, 2023; v1 submitted 1 April, 2021; originally announced April 2021.

    Comments: 48 pages, 15 figures

  44. arXiv:2012.13391  [pdf, other

    cs.CL cs.AI cs.LG

    I like fish, especially dolphins: Addressing Contradictions in Dialogue Modeling

    Authors: Yixin Nie, Mary Williamson, Mohit Bansal, Douwe Kiela, Jason Weston

    Abstract: To quantify how well natural language understanding models can capture consistency in a general conversation, we introduce the DialoguE COntradiction DEtection task (DECODE) and a new conversational dataset containing both human-human and human-bot contradictory dialogues. We then compare a structured utterance-based approach of using pre-trained Transformer models for contradiction detection with… ▽ More

    Submitted 28 December, 2020; v1 submitted 24 December, 2020; originally announced December 2020.

    Comments: 15 pages

  45. arXiv:2010.07079  [pdf, other

    cs.CL cs.AI

    Recipes for Safety in Open-domain Chatbots

    Authors: **g Xu, Da Ju, Margaret Li, Y-Lan Boureau, Jason Weston, Emily Dinan

    Abstract: Models trained on large unlabeled corpora of human interactions will learn patterns and mimic behaviors therein, which include offensive or otherwise toxic behavior and unwanted biases. We investigate a variety of methods to mitigate these issues in the context of open-domain generative dialogue models. We introduce a new human-and-model-in-the-loop framework for both training safer models and for… ▽ More

    Submitted 4 August, 2021; v1 submitted 14 October, 2020; originally announced October 2020.

  46. arXiv:2010.01082  [pdf, other

    cs.CL cs.AI

    Multi-Modal Open-Domain Dialogue

    Authors: Kurt Shuster, Eric Michael Smith, Da Ju, Jason Weston

    Abstract: Recent work in open-domain conversational agents has demonstrated that significant improvements in model engagingness and humanness metrics can be achieved via massive scaling in both pre-training data and model size (Adiwardana et al., 2020; Roller et al., 2020). However, if we want to build agents with human-like abilities, we must expand beyond handling just text. A particularly important topic… ▽ More

    Submitted 2 October, 2020; originally announced October 2020.

  47. arXiv:2010.00685  [pdf, other

    cs.CL cs.AI

    How to Motivate Your Dragon: Teaching Goal-Driven Agents to Speak and Act in Fantasy Worlds

    Authors: Prithviraj Ammanabrolu, Jack Urbanek, Margaret Li, Arthur Szlam, Tim Rocktäschel, Jason Weston

    Abstract: We seek to create agents that both act and communicate with other agents in pursuit of a goal. Towards this end, we extend LIGHT (Urbanek et al. 2019) -- a large-scale crowd-sourced fantasy text-game -- with a dataset of quests. These contain natural language motivations paired with in-game goals and human demonstrations; completing a quest might require dialogue or actions (or both). We introduce… ▽ More

    Submitted 25 May, 2021; v1 submitted 1 October, 2020; originally announced October 2020.

    Comments: In NAACL 2021

  48. arXiv:2008.08076  [pdf, other

    cs.AI cs.CL

    Deploying Lifelong Open-Domain Dialogue Learning

    Authors: Kurt Shuster, Jack Urbanek, Emily Dinan, Arthur Szlam, Jason Weston

    Abstract: Much of NLP research has focused on crowdsourced static datasets and the supervised learning paradigm of training once and then evaluating test performance. As argued in de Vries et al. (2020), crowdsourced data has the issues of lack of naturalness and relevance to real-world use cases, while the static dataset paradigm does not allow for a model to learn from its experiences of using language (S… ▽ More

    Submitted 19 August, 2020; v1 submitted 18 August, 2020; originally announced August 2020.

  49. arXiv:2007.15584  [pdf

    cs.CY cs.HC cs.SE

    How Work From Home Affects Collaboration: A Large-Scale Study of Information Workers in a Natural Experiment During COVID-19

    Authors: Longqi Yang, Sonia Jaffe, David Holtz, Siddharth Suri, Shilpi Sinha, Jeffrey Weston, Connor Joyce, Neha Shah, Kevin Sherman, CJ Lee, Brent Hecht, Jaime Teevan

    Abstract: The COVID-19 pandemic has had a wide-ranging impact on information workers such as higher stress levels, increased workloads, new workstreams, and more caregiving responsibilities during lockdown. COVID-19 also caused the overwhelming majority of information workers to rapidly shift to working from home (WFH). The central question this work addresses is: can we isolate the effects of WFH on inform… ▽ More

    Submitted 30 July, 2020; originally announced July 2020.

    Journal ref: Nature Human Behaviour (2021)

  50. arXiv:2006.12442  [pdf, other

    cs.CL cs.AI

    Open-Domain Conversational Agents: Current Progress, Open Problems, and Future Directions

    Authors: Stephen Roller, Y-Lan Boureau, Jason Weston, Antoine Bordes, Emily Dinan, Angela Fan, David Gunning, Da Ju, Margaret Li, Spencer Poff, Pratik Ringshia, Kurt Shuster, Eric Michael Smith, Arthur Szlam, Jack Urbanek, Mary Williamson

    Abstract: We present our view of what is necessary to build an engaging open-domain conversational agent: covering the qualities of such an agent, the pieces of the puzzle that have been built so far, and the ga** holes we have not filled yet. We present a biased view, focusing on work done by our own group, while citing related work in each area. In particular, we discuss in detail the properties of cont… ▽ More

    Submitted 13 July, 2020; v1 submitted 22 June, 2020; originally announced June 2020.