Showing 1–28 of 28 results for author: Shuster, K

Searching in archive cs.
  1. arXiv:2306.04765

    cs.AI cs.CL

    The HCI Aspects of Public Deployment of Research Chatbots: A User Study, Design Recommendations, and Open Challenges

    Authors: Morteza Behrooz, William Ngan, Joshua Lane, Giuliano Morse, Benjamin Babcock, Kurt Shuster, Mojtaba Komeili, Moya Chen, Melanie Kambadur, Y-Lan Boureau, Jason Weston

    Abstract: Publicly deploying research chatbots is a nuanced topic involving necessary risk-benefit analyses. While there have recently been frequent discussions on whether it is responsible to deploy such models, there has been far less focus on the interaction paradigms and design approaches that the resulting interfaces should adopt, in order to achieve their goals more effectively. We aim to pose, ground…

    Submitted 7 June, 2023; originally announced June 2023.

  2. arXiv:2306.04707

    cs.CL cs.AI

    Improving Open Language Models by Learning from Organic Interactions

    Authors: Jing Xu, Da Ju, Joshua Lane, Mojtaba Komeili, Eric Michael Smith, Megan Ung, Morteza Behrooz, William Ngan, Rashel Moritz, Sainbayar Sukhbaatar, Y-Lan Boureau, Jason Weston, Kurt Shuster

    Abstract: We present BlenderBot 3x, an update on the conversational model BlenderBot 3, which is now trained using organic conversation and feedback data from participating users of the system in order to improve both its skills and safety. We are publicly releasing the participating de-identified interaction data for use by the research community, in order to spur further progress. Training models with org…

    Submitted 7 June, 2023; originally announced June 2023.

  3. arXiv:2304.13835

    cs.CL cs.LG

    Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models

    Authors: Jimmy Wei, Kurt Shuster, Arthur Szlam, Jason Weston, Jack Urbanek, Mojtaba Komeili

    Abstract: Current dialogue research primarily studies pairwise (two-party) conversations, and does not address the everyday setting where more than two speakers converse together. In this work, we both collect and evaluate multi-party conversations to study this more general case. We use the LIGHT environment to construct grounded conversations, where each participant has an assigned character to role-play.…

    Submitted 8 June, 2023; v1 submitted 26 April, 2023; originally announced April 2023.

  4. arXiv:2212.12017

    cs.CL

    OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization

    Authors: Srinivasan Iyer, Xi Victoria Lin, Ramakanth Pasunuru, Todor Mihaylov, Daniel Simig, Ping Yu, Kurt Shuster, Tianlu Wang, Qing Liu, Punit Singh Koura, Xian Li, Brian O'Horo, Gabriel Pereyra, Jeff Wang, Christopher Dewan, Asli Celikyilmaz, Luke Zettlemoyer, Ves Stoyanov

    Abstract: Recent work has shown that fine-tuning large pre-trained language models on a collection of tasks described via instructions, a.k.a. instruction-tuning, improves their zero and few-shot generalization to unseen tasks. However, there is a limited understanding of the performance trade-offs of different decisions made during the instruction-tuning process. These decisions include the scale and diver…

    Submitted 30 January, 2023; v1 submitted 22 December, 2022; originally announced December 2022.

    Comments: 56 pages. v2->v3: fix OPT-30B evaluation results across benchmarks (previously we reported lower performance of this model due to an evaluation pipeline bug)

  5. arXiv:2212.11353

    cs.CL cs.LG

    Contrastive Distillation Is a Sample-Efficient Self-Supervised Loss Policy for Transfer Learning

    Authors: Chris Lengerich, Gabriel Synnaeve, Amy Zhang, Hugh Leather, Kurt Shuster, François Charton, Charysse Redwood

    Abstract: Traditional approaches to RL have focused on learning decision policies directly from episodic decisions, while slowly and implicitly learning the semantics of compositional representations needed for generalization. While some approaches have been adopted to refine representations via auxiliary self-supervised losses while simultaneously learning decision policies, learning compositional represen…

    Submitted 21 December, 2022; originally announced December 2022.

  6. arXiv:2211.05826

    cs.CL cs.AI

    The CRINGE Loss: Learning what language not to model

    Authors: Leonard Adolphs, Tianyu Gao, Jing Xu, Kurt Shuster, Sainbayar Sukhbaatar, Jason Weston

    Abstract: Standard language model training employs gold human documents or human-human interaction data, and treats all training data as positive examples. Growing evidence shows that even with very large amounts of positive training data, issues remain that can be alleviated with relatively small amounts of negative data -- examples of what the model should not do. In this work, we propose a novel procedur…

    Submitted 10 November, 2022; originally announced November 2022.

  7. arXiv:2210.15893

    cs.CL cs.AI

    When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels

    Authors: Weiyan Shi, Emily Dinan, Kurt Shuster, Jason Weston, Jing Xu

    Abstract: Deployed dialogue agents have the potential to integrate human feedback to continuously improve themselves. However, humans may not always provide explicit signals when the chatbot makes mistakes during interactions. In this work, we propose Juicer, a framework to make use of both binary and free-form textual human feedback. It works by: (i) extending sparse binary feedback by training a satisfact…

    Submitted 28 October, 2022; originally announced October 2022.

  8. arXiv:2208.03188

    cs.CL cs.AI

    BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage

    Authors: Kurt Shuster, Jing Xu, Mojtaba Komeili, Da Ju, Eric Michael Smith, Stephen Roller, Megan Ung, Moya Chen, Kushal Arora, Joshua Lane, Morteza Behrooz, William Ngan, Spencer Poff, Naman Goyal, Arthur Szlam, Y-Lan Boureau, Melanie Kambadur, Jason Weston

    Abstract: We present BlenderBot 3, a 175B parameter dialogue model capable of open-domain conversation with access to the internet and a long-term memory, and having been trained on a large number of user defined tasks. We release both the model weights and code, and have also deployed the model on a public web page to interact with organic users. This technical report describes how the model was built (arc…

    Submitted 10 August, 2022; v1 submitted 5 August, 2022; originally announced August 2022.

  9. arXiv:2206.07694

    cs.CL

    DIRECTOR: Generator-Classifiers For Supervised Language Modeling

    Authors: Kushal Arora, Kurt Shuster, Sainbayar Sukhbaatar, Jason Weston

    Abstract: Current language models achieve low perplexity but their resulting generations still suffer from toxic responses, repetitiveness and contradictions. The standard language modeling setup fails to address these issues. In this paper, we introduce a new architecture, Director, that consists of a unified generator-classifier with both a language modeling and a classification head for each output…

    Submitted 25 November, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

  10. arXiv:2205.01068

    cs.CL cs.LG

    OPT: Open Pre-trained Transformer Language Models

    Authors: Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer

    Abstract: Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning. Given their computational cost, these models are difficult to replicate without significant capital. For the few that are available through APIs, no access is granted to the full model weights, making them difficult to study. We present Open…

    Submitted 21 June, 2022; v1 submitted 2 May, 2022; originally announced May 2022.

  11. arXiv:2203.13224

    cs.CL cs.AI

    Language Models that Seek for Knowledge: Modular Search & Generation for Dialogue and Prompt Completion

    Authors: Kurt Shuster, Mojtaba Komeili, Leonard Adolphs, Stephen Roller, Arthur Szlam, Jason Weston

    Abstract: Language models (LMs) have recently been shown to generate more factual responses by employing modularity (Zhou et al., 2021) in combination with retrieval (Adolphs et al., 2021). We extend the recent approach of Adolphs et al. (2021) to include internet search as a module. Our SeeKeR (Search engine->Knowledge->Response) method thus applies a single LM to three modular tasks in succession: search,…

    Submitted 29 March, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

  12. arXiv:2112.05843

    cs.CL

    Am I Me or You? State-of-the-Art Dialogue Models Cannot Maintain an Identity

    Authors: Kurt Shuster, Jack Urbanek, Arthur Szlam, Jason Weston

    Abstract: State-of-the-art dialogue models still often stumble with regards to factual accuracy and self-contradiction. Anecdotally, they have been observed to fail to maintain character identity throughout discourse; and more specifically, may take on the role of their interlocutor. In this work we formalize and quantify this deficiency, and show experimentally through human evaluations that this is indeed…

    Submitted 10 December, 2021; originally announced December 2021.

  13. arXiv:2111.05204

    cs.CL cs.AI cs.LG

    Reason first, then respond: Modular Generation for Knowledge-infused Dialogue

    Authors: Leonard Adolphs, Kurt Shuster, Jack Urbanek, Arthur Szlam, Jason Weston

    Abstract: Large language models can produce fluent dialogue but often hallucinate factual inaccuracies. While retrieval-augmented models help alleviate this issue, they still face a difficult challenge of both reasoning to provide correct knowledge and generating conversation simultaneously. In this work, we propose a modular model, Knowledge to Response (K2R), for incorporating knowledge into conversationa…

    Submitted 9 November, 2021; originally announced November 2021.

  14. arXiv:2107.07566

    cs.AI cs.CL

    Internet-Augmented Dialogue Generation

    Authors: Mojtaba Komeili, Kurt Shuster, Jason Weston

    Abstract: The largest store of continually updating knowledge on our planet can be accessed via internet search. In this work we study giving access to this information to conversational agents. Large language models, even though they store an impressive amount of knowledge within their weights, are known to hallucinate facts when generating dialogue (Shuster et al., 2021); moreover, those facts are frozen…

    Submitted 15 July, 2021; originally announced July 2021.

  15. arXiv:2104.07567

    cs.CL cs.AI

    Retrieval Augmentation Reduces Hallucination in Conversation

    Authors: Kurt Shuster, Spencer Poff, Moya Chen, Douwe Kiela, Jason Weston

    Abstract: Despite showing increasingly human-like conversational abilities, state-of-the-art dialogue models often suffer from factual incorrectness and hallucination of knowledge (Roller et al., 2020). In this work we explore the use of neural-retrieval-in-the-loop architectures - recently shown to be effective in open-domain QA (Lewis et al., 2020b; Izacard and Grave, 2020) - for knowledge-grounded dialog…

    Submitted 15 April, 2021; originally announced April 2021.

  16. arXiv:2010.01082

    cs.CL cs.AI

    Multi-Modal Open-Domain Dialogue

    Authors: Kurt Shuster, Eric Michael Smith, Da Ju, Jason Weston

    Abstract: Recent work in open-domain conversational agents has demonstrated that significant improvements in model engagingness and humanness metrics can be achieved via massive scaling in both pre-training data and model size (Adiwardana et al., 2020; Roller et al., 2020). However, if we want to build agents with human-like abilities, we must expand beyond handling just text. A particularly important topic…

    Submitted 2 October, 2020; originally announced October 2020.

  17. arXiv:2008.08076

    cs.AI cs.CL

    Deploying Lifelong Open-Domain Dialogue Learning

    Authors: Kurt Shuster, Jack Urbanek, Emily Dinan, Arthur Szlam, Jason Weston

    Abstract: Much of NLP research has focused on crowdsourced static datasets and the supervised learning paradigm of training once and then evaluating test performance. As argued in de Vries et al. (2020), crowdsourced data has the issues of lack of naturalness and relevance to real-world use cases, while the static dataset paradigm does not allow for a model to learn from its experiences of using language (S…

    Submitted 19 August, 2020; v1 submitted 18 August, 2020; originally announced August 2020.

  18. arXiv:2006.12442

    cs.CL cs.AI

    Open-Domain Conversational Agents: Current Progress, Open Problems, and Future Directions

    Authors: Stephen Roller, Y-Lan Boureau, Jason Weston, Antoine Bordes, Emily Dinan, Angela Fan, David Gunning, Da Ju, Margaret Li, Spencer Poff, Pratik Ringshia, Kurt Shuster, Eric Michael Smith, Arthur Szlam, Jack Urbanek, Mary Williamson

    Abstract: We present our view of what is necessary to build an engaging open-domain conversational agent: covering the qualities of such an agent, the pieces of the puzzle that have been built so far, and the gaping holes we have not filled yet. We present a biased view, focusing on work done by our own group, while citing related work in each area. In particular, we discuss in detail the properties of cont…

    Submitted 13 July, 2020; v1 submitted 22 June, 2020; originally announced June 2020.

  19. arXiv:2004.13637

    cs.CL cs.AI

    Recipes for building an open-domain chatbot

    Authors: Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston

    Abstract: Building open-domain chatbots is a challenging area for machine learning research. While prior work has shown that scaling neural models in the number of parameters and the size of the data they are trained on gives improved results, we show that other ingredients are important for a high-performing chatbot. Good conversation requires a number of skills that an expert conversationalist blends in a…

    Submitted 30 April, 2020; v1 submitted 28 April, 2020; originally announced April 2020.

  20. arXiv:2004.08449

    cs.CL

    Can You Put it All Together: Evaluating Conversational Agents' Ability to Blend Skills

    Authors: Eric Michael Smith, Mary Williamson, Kurt Shuster, Jason Weston, Y-Lan Boureau

    Abstract: Being engaging, knowledgeable, and empathetic are all desirable general qualities in a conversational agent. Previous work has introduced tasks and datasets that aim to help agents to learn those qualities in isolation and gauge how well they can express them. But rather than being specialized in one single quality, a good open-domain conversational agent should be able to seamlessly blend them al…

    Submitted 17 April, 2020; originally announced April 2020.

    Comments: accepted to ACL 2020 (long paper)

  21. arXiv:1912.12394

    cs.CL cs.CV cs.LG

    All-in-One Image-Grounded Conversational Agents

    Authors: Da Ju, Kurt Shuster, Y-Lan Boureau, Jason Weston

    Abstract: As single-task accuracy on individual language and image tasks has improved substantially in the last few years, the long-term goal of a generally skilled agent that can both see and talk becomes more feasible to explore. In this work, we focus on leveraging individual language and image tasks, along with resources that incorporate both vision and language towards that objective. We design an arch…

    Submitted 15 January, 2020; v1 submitted 27 December, 2019; originally announced December 2019.

  22. arXiv:1911.03768

    cs.CL cs.AI

    The Dialogue Dodecathlon: Open-Domain Knowledge and Image Grounded Conversational Agents

    Authors: Kurt Shuster, Da Ju, Stephen Roller, Emily Dinan, Y-Lan Boureau, Jason Weston

    Abstract: We introduce dodecaDialogue: a set of 12 tasks that measures if a conversational agent can communicate engagingly with personality and empathy, ask questions, answer questions by utilizing knowledge resources, discuss topics and situations, and perceive and converse about images. By multi-tasking on such a broad large-scale set of data, we hope to both move towards and measure progress in producin…

    Submitted 28 April, 2020; v1 submitted 9 November, 2019; originally announced November 2019.

    Comments: ACL 2020

  23. arXiv:1905.01969

    cs.CL cs.AI

    Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring

    Authors: Samuel Humeau, Kurt Shuster, Marie-Anne Lachaux, Jason Weston

    Abstract: The use of deep pre-trained bidirectional transformers has led to remarkable progress in a number of applications (Devlin et al., 2018). For tasks that make pairwise comparisons between sequences, matching a given input with a corresponding label, two approaches are common: Cross-encoders performing full self-attention over the pair and Bi-encoders encoding the pair separately. The former often pe…

    Submitted 25 March, 2020; v1 submitted 21 April, 2019; originally announced May 2019.

    Comments: ICLR 2020

  24. arXiv:1902.00098

    cs.AI cs.CL cs.HC

    The Second Conversational Intelligence Challenge (ConvAI2)

    Authors: Emily Dinan, Varvara Logacheva, Valentin Malykh, Alexander Miller, Kurt Shuster, Jack Urbanek, Douwe Kiela, Arthur Szlam, Iulian Serban, Ryan Lowe, Shrimai Prabhumoye, Alan W Black, Alexander Rudnicky, Jason Williams, Joelle Pineau, Mikhail Burtsev, Jason Weston

    Abstract: We describe the setting and results of the ConvAI2 NeurIPS competition that aims to further the state-of-the-art in open-domain chatbots. Some key takeaways from the competition are: (i) pretrained Transformer variants are currently the best performing models on this task, (ii) but to improve performance on multi-turn conversations with humans, future systems must go beyond single word metrics lik…

    Submitted 31 January, 2019; originally announced February 2019.

  25. arXiv:1811.01241

    cs.CL

    Wizard of Wikipedia: Knowledge-Powered Conversational agents

    Authors: Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, Jason Weston

    Abstract: In open-domain dialogue intelligent agents should exhibit the use of knowledge, however there are few convincing demonstrations of this to date. The most popular sequence to sequence models typically "generate and hope" generic utterances that can be memorized in the weights of the model when mapping from input utterance(s) to output, rather than employing recalled knowledge as context. Use of kno…

    Submitted 21 February, 2019; v1 submitted 3 November, 2018; originally announced November 2018.

  26. arXiv:1811.00945

    cs.CL

    Image Chat: Engaging Grounded Conversations

    Authors: Kurt Shuster, Samuel Humeau, Antoine Bordes, Jason Weston

    Abstract: To achieve the long-term goal of machines being able to engage humans in conversation, our models should captivate the interest of their speaking partners. Communication grounded in images, whereby a dialogue is conducted based on a given photo, is a setup naturally appealing to humans (Hu et al., 2014). In this work we study large-scale architectures and datasets for this goal. We test a set of n…

    Submitted 29 April, 2020; v1 submitted 2 November, 2018; originally announced November 2018.

    Comments: ACL 2020

  27. arXiv:1810.10665

    cs.CV cs.AI cs.CL

    Engaging Image Captioning Via Personality

    Authors: Kurt Shuster, Samuel Humeau, Hexiang Hu, Antoine Bordes, Jason Weston

    Abstract: Standard image captioning tasks such as COCO and Flickr30k are factual, neutral in tone and (to a human) state the obvious (e.g., "a man playing a guitar"). While such tasks are useful to verify that a machine understands the content of an image, they are not engaging to humans as captions. With this in mind we define a new task, Personality-Captions, where the goal is to be as engaging to humans…

    Submitted 20 March, 2019; v1 submitted 24 October, 2018; originally announced October 2018.

  28. arXiv:1807.03367

    cs.AI cs.CL cs.CV cs.LG

    Talk the Walk: Navigating New York City through Grounded Dialogue

    Authors: Harm de Vries, Kurt Shuster, Dhruv Batra, Devi Parikh, Jason Weston, Douwe Kiela

    Abstract: We introduce "Talk The Walk", the first large-scale dialogue dataset grounded in action and perception. The task involves two agents (a "guide" and a "tourist") that communicate via natural language in order to achieve a common goal: having the tourist navigate to a given target location. The task and dataset, which are described in detail, are challenging and their full solution is an open proble…

    Submitted 23 December, 2018; v1 submitted 9 July, 2018; originally announced July 2018.