Showing 1–50 of 160 results for author: Weston, J

  1. arXiv:2406.17744  [pdf, other]

    cs.CL

    Following Length Constraints in Instructions

    Authors: Weizhe Yuan, Ilia Kulikov, Ping Yu, Kyunghyun Cho, Sainbayar Sukhbaatar, Jason Weston, Jing Xu

    Abstract: Aligned instruction following models can better fulfill user requests than their unaligned counterparts. However, it has been shown that there is a length bias in evaluation of such models, and that training algorithms tend to exploit this bias by learning longer responses. In this work we show how to train models that can be controlled at inference time with instructions containing desired length…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 13 pages

  2. arXiv:2405.18719  [pdf, other]

    cs.CL cs.AI

    Contextual Position Encoding: Learning to Count What's Important

    Authors: Olga Golovneva, Tianlu Wang, Jason Weston, Sainbayar Sukhbaatar

    Abstract: The attention mechanism is a critical component of Large Language Models (LLMs) that allows tokens in a sequence to interact with each other, but is order-invariant. Incorporating position encoding (PE) makes it possible to address by position, such as attending to the i-th token. However, current PE methods use token counts to derive position, and thus cannot generalize to higher levels of abstra…

    Submitted 30 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  3. arXiv:2404.19733  [pdf, other]

    cs.CL cs.AI

    Iterative Reasoning Preference Optimization

    Authors: Richard Yuanzhe Pang, Weizhe Yuan, Kyunghyun Cho, He He, Sainbayar Sukhbaatar, Jason Weston

    Abstract: Iterative preference optimization methods have recently been shown to perform well for general instruction tuning tasks, but typically make little improvement on reasoning tasks (Yuan et al., 2024, Chen et al., 2024). In this work we develop an iterative approach that optimizes the preference between competing generated Chain-of-Thought (CoT) candidates by optimizing for winning vs. losing reasoni…

    Submitted 25 June, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

  4. arXiv:2404.10660  [pdf, other]

    astro-ph.HE

    Discovery of the optical and radio counterpart to the fast X-ray transient EP240315a

    Authors: J. H. Gillanders, L. Rhodes, S. Srivastav, F. Carotenuto, J. Bright, M. E. Huber, H. F. Stevance, S. J. Smartt, K. C. Chambers, T. -W. Chen, R. Fender, A. Andersson, A. J. Cooper, P. G. Jonker, F. J. Cowie, T. deBoer, N. Erasmus, M. D. Fulton, H. Gao, J. Herman, C. -C. Lin, T. Lowe, E. A. Magnier, H. -Y. Miao, P. Minguez, et al. (14 additional authors not shown)

    Abstract: Fast X-ray Transients (FXTs) are extragalactic bursts of soft X-rays first identified >10 years ago. Since then, nearly 40 events have been discovered, although almost all of these have been recovered from archival Chandra and XMM-Newton data. To date, optical sky surveys and follow-up searches have not revealed any multi-wavelength counterparts. The Einstein Probe, launched in January 2024, has s…

    Submitted 19 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: Updated to match version accepted for publication in ApJL (17 pages, 4 figures, 2 tables)

  5. arXiv:2403.13799  [pdf, other]

    cs.CL cs.AI

    Reverse Training to Nurse the Reversal Curse

    Authors: Olga Golovneva, Zeyuan Allen-Zhu, Jason Weston, Sainbayar Sukhbaatar

    Abstract: Large language models (LLMs) have a surprising failure: when trained on "A has a feature B", they do not generalize to "B is a feature of A", which is termed the Reversal Curse. Even when training with trillions of tokens this issue still appears due to Zipf's law - hence even if we train on the entire internet. This work proposes an alternative training scheme, called reverse training, whereby al…

    Submitted 7 May, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  6. arXiv:2403.07816  [pdf, other]

    cs.CL cs.AI

    Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM

    Authors: Sainbayar Sukhbaatar, Olga Golovneva, Vasu Sharma, Hu Xu, Xi Victoria Lin, Baptiste Rozière, Jacob Kahn, Daniel Li, Wen-tau Yih, Jason Weston, Xian Li

    Abstract: We investigate efficient methods for training Large Language Models (LLMs) to possess capabilities in multiple specialized domains, such as coding, math reasoning and world knowledge. Our method, named Branch-Train-MiX (BTX), starts from a seed model, which is branched to train experts in embarrassingly parallel fashion with high throughput and reduced communication cost. After individual experts…

    Submitted 12 March, 2024; originally announced March 2024.

  7. arXiv:2402.14158  [pdf, other]

    cs.CL

    TOOLVERIFIER: Generalization to New Tools via Self-Verification

    Authors: Dheeraj Mekala, Jason Weston, Jack Lanchantin, Roberta Raileanu, Maria Lomeli, Jingbo Shang, Jane Dwivedi-Yu

    Abstract: Teaching language models to use tools is an important milestone towards building general assistants, but remains an open problem. While there has been significant progress on learning to use specific tools via fine-tuning, language models still struggle with learning how to robustly use new tools from only a few demonstrations. In this work we introduce a self-verification method which distinguish…

    Submitted 13 March, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  8. arXiv:2401.10020  [pdf, other]

    cs.CL cs.AI

    Self-Rewarding Language Models

    Authors: Weizhe Yuan, Richard Yuanzhe Pang, Kyunghyun Cho, Xian Li, Sainbayar Sukhbaatar, Jing Xu, Jason Weston

    Abstract: We posit that to achieve superhuman agents, future models require superhuman feedback in order to provide an adequate training signal. Current approaches commonly train reward models from human preferences, which may then be bottlenecked by human performance level, and secondly these separate frozen reward models cannot then learn to improve during LLM training. In this work, we study Self-Rewardi…

    Submitted 8 February, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

  9. arXiv:2401.09549  [pdf, other]

    cond-mat.mes-hall

    Interferometric Single-Shot Parity Measurement in an InAs-Al Hybrid Device

    Authors: Morteza Aghaee, Alejandro Alcaraz Ramirez, Zulfi Alam, Rizwan Ali, Mariusz Andrzejczuk, Andrey Antipov, Mikhail Astafev, Amin Barzegar, Bela Bauer, Jonathan Becker, Umesh Kumar Bhaskar, Alex Bocharov, Srini Boddapati, David Bohn, Jouri Bommer, Leo Bourdet, Arnaud Bousquet, Samuel Boutin, Lucas Casparis, Benjamin James Chapman, Sohail Chatoor, Anna Wulff Christensen, Cassandra Chua, Patrick Codd, William Cole, et al. (137 additional authors not shown)

    Abstract: The fusion of non-Abelian anyons or topological defects is a fundamental operation in measurement-only topological quantum computation. In topological superconductors, this operation amounts to a determination of the shared fermion parity of Majorana zero modes. As a step towards this, we implement a single-shot interferometric measurement of fermion parity in indium arsenide-aluminum heterostruct…

    Submitted 2 April, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

    Comments: Added data on a second measurement of device A and a measurement of device B, expanded discussion of a trivial scenario. Refs added, author list updated

  10. arXiv:2312.16682  [pdf, other]

    cs.CL cs.AI

    Some things are more CRINGE than others: Iterative Preference Optimization with the Pairwise Cringe Loss

    Authors: Jing Xu, Andrew Lee, Sainbayar Sukhbaatar, Jason Weston

    Abstract: Practitioners commonly align large language models using pairwise preferences, i.e., given labels of the type response A is preferred to response B for a given input. Perhaps less commonly, methods have also been developed for binary feedback, i.e. training models given labels of type response A is good or bad. We show how an existing performant binary feedback method, the Cringe Loss (Adolphs et…

    Submitted 22 April, 2024; v1 submitted 27 December, 2023; originally announced December 2023.

  11. arXiv:2311.11829  [pdf, other]

    cs.CL cs.AI cs.LG

    System 2 Attention (is something you might need too)

    Authors: Jason Weston, Sainbayar Sukhbaatar

    Abstract: Soft attention in Transformer-based Large Language Models (LLMs) is susceptible to incorporating irrelevant information from the context into its latent representations, which adversely affects next token generations. To help rectify these issues, we introduce System 2 Attention (S2A), which leverages the ability of LLMs to reason in natural language and follow instructions in order to decide what…

    Submitted 20 November, 2023; originally announced November 2023.

  12. arXiv:2311.07961  [pdf, other]

    cs.CL

    The ART of LLM Refinement: Ask, Refine, and Trust

    Authors: Kumar Shridhar, Koustuv Sinha, Andrew Cohen, Tianlu Wang, Ping Yu, Ram Pasunuru, Mrinmaya Sachan, Jason Weston, Asli Celikyilmaz

    Abstract: In recent years, Large Language Models (LLMs) have demonstrated remarkable generative abilities, but can they judge the quality of their own generations? A popular concept, referred to as self-refinement, postulates that LLMs can detect and correct the errors in their generations when asked to do so. However, recent empirical evidence points in the opposite direction, suggesting that LLMs often st…

    Submitted 14 November, 2023; originally announced November 2023.

  13. arXiv:2310.15123  [pdf, other]

    cs.CL cs.AI cs.LG

    Branch-Solve-Merge Improves Large Language Model Evaluation and Generation

    Authors: Swarnadeep Saha, Omer Levy, Asli Celikyilmaz, Mohit Bansal, Jason Weston, Xian Li

    Abstract: Large Language Models (LLMs) are frequently used for multi-faceted language generation and evaluation tasks that involve satisfying intricate user constraints or taking into account multiple aspects and criteria. However, their performance can fall short, due to the model's lack of coherence and inability to plan and decompose the problem. We propose Branch-Solve-Merge (BSM), a Large Language Mode…

    Submitted 7 June, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: NAACL 2024 (19 pages, 7 figures, 11 tables)

  14. arXiv:2310.05029  [pdf, other]

    cs.CL

    Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading

    Authors: Howard Chen, Ramakanth Pasunuru, Jason Weston, Asli Celikyilmaz

    Abstract: Large language models (LLMs) have advanced in large strides due to the effectiveness of the self-attention mechanism that processes and compares all tokens at once. However, this mechanism comes with a fundamental issue -- the predetermined context window is bound to be limited. Despite attempts to extend the context window through methods like extrapolating the positional embedding, using recurre…

    Submitted 8 October, 2023; originally announced October 2023.

  15. arXiv:2309.11495  [pdf, other]

    cs.CL cs.AI

    Chain-of-Verification Reduces Hallucination in Large Language Models

    Authors: Shehzaad Dhuliawala, Mojtaba Komeili, Jing Xu, Roberta Raileanu, Xian Li, Asli Celikyilmaz, Jason Weston

    Abstract: Generation of plausible yet incorrect factual information, termed hallucination, is an unsolved issue in large language models. We study the ability of language models to deliberate on the responses they give in order to correct their mistakes. We develop the Chain-of-Verification (CoVe) method whereby the model first (i) drafts an initial response; then (ii) plans verification questions to fact-c…

    Submitted 25 September, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

  16. arXiv:2308.06259  [pdf, other]

    cs.CL

    Self-Alignment with Instruction Backtranslation

    Authors: Xian Li, Ping Yu, Chunting Zhou, Timo Schick, Omer Levy, Luke Zettlemoyer, Jason Weston, Mike Lewis

    Abstract: We present a scalable method to build a high quality instruction following language model by automatically labelling human-written text with corresponding instructions. Our approach, named instruction backtranslation, starts with a language model finetuned on a small amount of seed data, and a given web corpus. The seed model is used to construct training examples by generating instruction prompts…

    Submitted 12 March, 2024; v1 submitted 11 August, 2023; originally announced August 2023.

    Comments: ICLR2024 camera ready

  17. arXiv:2307.14117  [pdf, other]

    cs.CL

    Leveraging Implicit Feedback from Deployment Data in Dialogue

    Authors: Richard Yuanzhe Pang, Stephen Roller, Kyunghyun Cho, He He, Jason Weston

    Abstract: We study improving social conversational agents by learning from natural dialogue between users and a deployed model, without extra annotations. To implicitly measure the quality of a machine-generated utterance, we leverage signals like user response length, sentiment and reaction of the future human utterances in the collected dialogue episodes. Our experiments use the publicly released deployme…

    Submitted 31 January, 2024; v1 submitted 26 July, 2023; originally announced July 2023.

    Comments: EACL 2024

  18. arXiv:2306.13588  [pdf, other]

    cs.CL cs.AI

    System-Level Natural Language Feedback

    Authors: Weizhe Yuan, Kyunghyun Cho, Jason Weston

    Abstract: Natural language (NL) feedback offers rich insights into user experience. While existing studies focus on an instance-level approach, where feedback is used to refine specific examples, we introduce a framework for system-level use of NL feedback. We show how to use feedback to formalize system-level design decisions in a human-in-the-loop-process -- in order to produce better models. In particula…

    Submitted 2 February, 2024; v1 submitted 23 June, 2023; originally announced June 2023.

    Comments: Accepted by EACL 2024

  19. arXiv:2306.04765  [pdf, other]

    cs.AI cs.CL

    The HCI Aspects of Public Deployment of Research Chatbots: A User Study, Design Recommendations, and Open Challenges

    Authors: Morteza Behrooz, William Ngan, Joshua Lane, Giuliano Morse, Benjamin Babcock, Kurt Shuster, Mojtaba Komeili, Moya Chen, Melanie Kambadur, Y-Lan Boureau, Jason Weston

    Abstract: Publicly deploying research chatbots is a nuanced topic involving necessary risk-benefit analyses. While there have recently been frequent discussions on whether it is responsible to deploy such models, there has been far less focus on the interaction paradigms and design approaches that the resulting interfaces should adopt, in order to achieve their goals more effectively. We aim to pose, ground…

    Submitted 7 June, 2023; originally announced June 2023.

  20. arXiv:2306.04707  [pdf, other]

    cs.CL cs.AI

    Improving Open Language Models by Learning from Organic Interactions

    Authors: Jing Xu, Da Ju, Joshua Lane, Mojtaba Komeili, Eric Michael Smith, Megan Ung, Morteza Behrooz, William Ngan, Rashel Moritz, Sainbayar Sukhbaatar, Y-Lan Boureau, Jason Weston, Kurt Shuster

    Abstract: We present BlenderBot 3x, an update on the conversational model BlenderBot 3, which is now trained using organic conversation and feedback data from participating users of the system in order to improve both its skills and safety. We are publicly releasing the participating de-identified interaction data for use by the research community, in order to spur further progress. Training models with org…

    Submitted 7 June, 2023; originally announced June 2023.

  21. arXiv:2305.05364  [pdf, other]

    cs.LG cs.AI cs.CL

    Large Language Model Programs

    Authors: Imanol Schlag, Sainbayar Sukhbaatar, Asli Celikyilmaz, Wen-tau Yih, Jason Weston, Jürgen Schmidhuber, Xian Li

    Abstract: In recent years, large pre-trained language models (LLMs) have demonstrated the ability to follow instructions and perform novel tasks from a few examples. The possibility to parameterise an LLM through such in-context examples widens their capability at a much lower cost than finetuning. We extend this line of reasoning and present a method which further expands the capabilities of an LLM by embe…

    Submitted 9 May, 2023; originally announced May 2023.

  22. arXiv:2305.00833  [pdf, other]

    cs.LG cs.AI cs.CL

    Learning to Reason and Memorize with Self-Notes

    Authors: Jack Lanchantin, Shubham Toshniwal, Jason Weston, Arthur Szlam, Sainbayar Sukhbaatar

    Abstract: Large language models have been shown to struggle with multi-step reasoning, and do not retain previous reasoning steps for future use. We propose a simple method for solving both of these problems by allowing the model to take Self-Notes. Unlike recent chain-of-thought or scratchpad approaches, the model can deviate from the input context at any time to explicitly think and write down its thought…

    Submitted 31 October, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

  23. arXiv:2304.13835  [pdf, other]

    cs.CL cs.LG

    Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models

    Authors: Jimmy Wei, Kurt Shuster, Arthur Szlam, Jason Weston, Jack Urbanek, Mojtaba Komeili

    Abstract: Current dialogue research primarily studies pairwise (two-party) conversations, and does not address the everyday setting where more than two speakers converse together. In this work, we both collect and evaluate multi-party conversations to study this more general case. We use the LIGHT environment to construct grounded conversations, where each participant has an assigned character to role-play.…

    Submitted 8 June, 2023; v1 submitted 26 April, 2023; originally announced April 2023.

  24. arXiv:2302.06784  [pdf, other]

    cs.CL

    The Stable Entropy Hypothesis and Entropy-Aware Decoding: An Analysis and Algorithm for Robust Natural Language Generation

    Authors: Kushal Arora, Timothy J. O'Donnell, Doina Precup, Jason Weston, Jackie C. K. Cheung

    Abstract: State-of-the-art language generation models can degenerate when applied to open-ended generation problems such as text completion, story generation, or dialog modeling. This degeneration usually shows up in the form of incoherence, lack of vocabulary diversity, and self-repetition or copying from the context. In this paper, we postulate that "human-like" generations usually lie in a narrow and n…

    Submitted 13 February, 2023; originally announced February 2023.

  25. arXiv:2301.05746  [pdf, other]

    cs.CL cs.AI

    Infusing Commonsense World Models with Graph Knowledge

    Authors: Alexander Gurung, Mojtaba Komeili, Arthur Szlam, Jason Weston, Jack Urbanek

    Abstract: While language models have become more capable of producing compelling language, we find there are still gaps in maintaining consistency, especially when describing events in a dynamically changing world. We study the setting of generating narratives in an open world text adventure game, where a graph representation of the underlying game state can be used to train models that consume and output b…

    Submitted 13 January, 2023; originally announced January 2023.

  26. arXiv:2211.05826  [pdf, other]

    cs.CL cs.AI

    The CRINGE Loss: Learning what language not to model

    Authors: Leonard Adolphs, Tianyu Gao, Jing Xu, Kurt Shuster, Sainbayar Sukhbaatar, Jason Weston

    Abstract: Standard language model training employs gold human documents or human-human interaction data, and treats all training data as positive examples. Growing evidence shows that even with very large amounts of positive training data, issues remain that can be alleviated with relatively small amounts of negative data -- examples of what the model should not do. In this work, we propose a novel procedur…

    Submitted 10 November, 2022; originally announced November 2022.

  27. arXiv:2210.15893  [pdf, other]

    cs.CL cs.AI

    When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels

    Authors: Weiyan Shi, Emily Dinan, Kurt Shuster, Jason Weston, Jing Xu

    Abstract: Deployed dialogue agents have the potential to integrate human feedback to continuously improve themselves. However, humans may not always provide explicit signals when the chatbot makes mistakes during interactions. In this work, we propose Juicer, a framework to make use of both binary and free-form textual human feedback. It works by: (i) extending sparse binary feedback by training a satisfact…

    Submitted 28 October, 2022; originally announced October 2022.

  28. arXiv:2208.03295  [pdf, other]

    cs.CL cs.AI

    Learning from data in the mixed adversarial non-adversarial case: Finding the helpers and ignoring the trolls

    Authors: Da Ju, Jing Xu, Y-Lan Boureau, Jason Weston

    Abstract: The promise of interaction between intelligent conversational agents and humans is that models can learn from such feedback in order to improve. Unfortunately, such exchanges in the wild will not always involve human utterances that are benign or of high quality, and will include a mixture of engaged (helpers) and unengaged or even malicious users (trolls). In this work we study how to perform rob…

    Submitted 5 August, 2022; originally announced August 2022.

  29. arXiv:2208.03270  [pdf, other]

    cs.CL cs.AI

    Learning New Skills after Deployment: Improving open-domain internet-driven dialogue with human feedback

    Authors: Jing Xu, Megan Ung, Mojtaba Komeili, Kushal Arora, Y-Lan Boureau, Jason Weston

    Abstract: Frozen models trained to mimic static datasets can never improve their performance. Models that can employ internet-retrieval for up-to-date information and obtain feedback from humans during deployment provide the promise of both adapting to new information, and improving their performance. In this work we study how to improve internet-driven conversational skills in such a learning framework. We…

    Submitted 16 August, 2022; v1 submitted 5 August, 2022; originally announced August 2022.

  30. arXiv:2208.03188  [pdf, other]

    cs.CL cs.AI

    BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage

    Authors: Kurt Shuster, Jing Xu, Mojtaba Komeili, Da Ju, Eric Michael Smith, Stephen Roller, Megan Ung, Moya Chen, Kushal Arora, Joshua Lane, Morteza Behrooz, William Ngan, Spencer Poff, Naman Goyal, Arthur Szlam, Y-Lan Boureau, Melanie Kambadur, Jason Weston

    Abstract: We present BlenderBot 3, a 175B parameter dialogue model capable of open-domain conversation with access to the internet and a long-term memory, and having been trained on a large number of user defined tasks. We release both the model weights and code, and have also deployed the model on a public web page to interact with organic users. This technical report describes how the model was built (arc…

    Submitted 10 August, 2022; v1 submitted 5 August, 2022; originally announced August 2022.

  31. InAs-Al Hybrid Devices Passing the Topological Gap Protocol

    Authors: Morteza Aghaee, Arun Akkala, Zulfi Alam, Rizwan Ali, Alejandro Alcaraz Ramirez, Mariusz Andrzejczuk, Andrey E Antipov, Pavel Aseev, Mikhail Astafev, Bela Bauer, Jonathan Becker, Srini Boddapati, Frenk Boekhout, Jouri Bommer, Esben Bork Hansen, Tom Bosma, Leo Bourdet, Samuel Boutin, Philippe Caroff, Lucas Casparis, Maja Cassidy, Anna Wulf Christensen, Noah Clay, William S Cole, Fabiano Corsetti, et al. (102 additional authors not shown)

    Abstract: We present measurements and simulations of semiconductor-superconductor heterostructure devices that are consistent with the observation of topological superconductivity and Majorana zero modes. The devices are fabricated from high-mobility two-dimensional electron gases in which quasi-one-dimensional wires are defined by electrostatic gates. These devices enable measurements of local and non-loca…

    Submitted 8 March, 2024; v1 submitted 6 July, 2022; originally announced July 2022.

    Comments: Final version

  32. arXiv:2206.07694  [pdf, other]

    cs.CL

    DIRECTOR: Generator-Classifiers For Supervised Language Modeling

    Authors: Kushal Arora, Kurt Shuster, Sainbayar Sukhbaatar, Jason Weston

    Abstract: Current language models achieve low perplexity but their resulting generations still suffer from toxic responses, repetitiveness and contradictions. The standard language modeling setup fails to address these issues. In this paper, we introduce a new architecture, Director, that consists of a unified generator-classifier with both a language modeling and a classification head for each output…

    Submitted 25 November, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

  33. arXiv:2203.13224  [pdf, other]

    cs.CL cs.AI

    Language Models that Seek for Knowledge: Modular Search & Generation for Dialogue and Prompt Completion

    Authors: Kurt Shuster, Mojtaba Komeili, Leonard Adolphs, Stephen Roller, Arthur Szlam, Jason Weston

    Abstract: Language models (LMs) have recently been shown to generate more factual responses by employing modularity (Zhou et al., 2021) in combination with retrieval (Adolphs et al., 2021). We extend the recent approach of Adolphs et al. (2021) to include internet search as a module. Our SeeKeR (Search engine->Knowledge->Response) method thus applies a single LM to three modular tasks in succession: search,…

    Submitted 29 March, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

  34. arXiv:2202.11507  [pdf, other]

    math.OC

    On Carbon Taxes Effectiveness to Induce a Clean Technology Transition: An Evaluation Framework Based on Optimal Strategic Capacity Planning

    Authors: N. Wolf, P. Escalona, A. Angulo, J. Weston

    Abstract: This paper studies carbon taxes effectiveness to induce a transition to cleaner production when a firm faces different technologies and demands. To determine carbon taxes effectiveness, we propose a framework based on a strategic capacity planning under carbon taxes model, that considers proper performance measures. The model, which is formulated as a mixed integer linear problem (MILP), considers i…

    Submitted 23 February, 2022; originally announced February 2022.

  35. arXiv:2201.04723  [pdf, other]

    cs.CL cs.AI

    Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents

    Authors: Eric Michael Smith, Orion Hsu, Rebecca Qian, Stephen Roller, Y-Lan Boureau, Jason Weston

    Abstract: At the heart of improving conversational AI is the open problem of how to evaluate conversations. Issues with automatic metrics are well known (Liu et al., 2016, arXiv:1603.08023), with human evaluations still considered the gold standard. Unfortunately, how to perform human evaluations is also an open problem: differing data collection methods have varying levels of human agreement and statistica…

    Submitted 12 January, 2022; originally announced January 2022.

  36. arXiv:2112.05843  [pdf, other]

    cs.CL

    Am I Me or You? State-of-the-Art Dialogue Models Cannot Maintain an Identity

    Authors: Kurt Shuster, Jack Urbanek, Arthur Szlam, Jason Weston

    Abstract: State-of-the-art dialogue models still often stumble with regards to factual accuracy and self-contradiction. Anecdotally, they have been observed to fail to maintain character identity throughout discourse; and more specifically, may take on the role of their interlocutor. In this work we formalize and quantify this deficiency, and show experimentally through human evaluations that this is indeed…

    Submitted 10 December, 2021; originally announced December 2021.

  37. arXiv:2111.05204  [pdf, other]

    cs.CL cs.AI cs.LG

    Reason first, then respond: Modular Generation for Knowledge-infused Dialogue

    Authors: Leonard Adolphs, Kurt Shuster, Jack Urbanek, Arthur Szlam, Jason Weston

    Abstract: Large language models can produce fluent dialogue but often hallucinate factual inaccuracies. While retrieval-augmented models help alleviate this issue, they still face a difficult challenge of both reasoning to provide correct knowledge and generating conversation simultaneously. In this work, we propose a modular model, Knowledge to Response (K2R), for incorporating knowledge into conversationa…

    Submitted 9 November, 2021; originally announced November 2021.

  38. arXiv:2110.10497  [pdf, other]

    hep-ex physics.ins-det

    Search for dark photons using a multilayer dielectric haloscope equipped with a single-photon avalanche diode

    Authors: Laura Manenti, Umang Mishra, Gianmarco Bruno, Adriano Di Giovanni, Alexander John Millar, Knut Dundas Morå, Renu Pasricha, Henry Roberts, Panos Oikonomou, Isaac Sarnoff, James Weston, Francesco Arneodo

    Abstract: We report on the results of the search for dark photons with mass around 1.5$\,\rm eV/c^2$ using a multilayer dielectric haloscope equipped with an affordable and commercially available photosensor. The multilayer stack, which enables the conversion of dark photons (DP) to Standard Model photons, is made of 23 bilayers of alternating SiO$_2$ and Si$_3$N$_4$ thin films with linearly increasing thic…

    Submitted 7 January, 2023; v1 submitted 20 October, 2021; originally announced October 2021.

    Journal ref: Phys. Rev. D 105, 052010 (2022)

  39. arXiv:2110.09456  [pdf, other]

    cs.CL cs.AI

    NormFormer: Improved Transformer Pretraining with Extra Normalization

    Authors: Sam Shleifer, Jason Weston, Myle Ott

    Abstract: During pretraining, the Pre-LayerNorm transformer suffers from a gradient magnitude mismatch: gradients at early layers are much larger than at later layers. These issues can be alleviated by our proposed NormFormer architecture, which adds three normalization operations to each layer: a Layer Norm after self attention, head-wise scaling of self-attention outputs, and a Layer Norm after the first…

    Submitted 1 November, 2021; v1 submitted 18 October, 2021; originally announced October 2021.

  40. arXiv:2107.08251  [pdf, other]

    cs.CL cs.LG

    Generative Pretraining for Paraphrase Evaluation

    Authors: Jack Weston, Raphael Lenain, Udeepa Meepegama, Emil Fristed

    Abstract: We introduce ParaBLEU, a paraphrase representation learning model and evaluation metric for text generation. Unlike previous approaches, ParaBLEU learns to understand paraphrasis using generative conditioning as a pretraining objective. ParaBLEU correlates more strongly with human judgements than existing metrics, obtaining new state-of-the-art results on the 2017 WMT Metrics Shared Task. We show…

    Submitted 24 July, 2021; v1 submitted 17 July, 2021; originally announced July 2021.

    Comments: Under review

  41. arXiv:2107.08248  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Learning De-identified Representations of Prosody from Raw Audio

    Authors: Jack Weston, Raphael Lenain, Udeepa Meepegama, Emil Fristed

    Abstract: We propose a method for learning de-identified prosody representations from raw audio using a contrastive self-supervised signal. Whereas prior work has relied on conditioning models on bottlenecks, we introduce a set of inductive biases that exploit the natural structure of prosody to minimize timbral information and decouple prosody from speaker representations. Despite aggressive downsampling o…

    Submitted 17 July, 2021; originally announced July 2021.

    Comments: ICML 2021

    Journal ref: Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event. Proceedings of Machine Learning Research 139, PMLR 2021

  42. arXiv:2107.07567  [pdf, other]

    cs.CL cs.AI

    Beyond Goldfish Memory: Long-Term Open-Domain Conversation

    Authors: Jing Xu, Arthur Szlam, Jason Weston

    Abstract: Despite recent improvements in open-domain dialogue models, state of the art models are trained and evaluated on short conversations with little context. In contrast, the long-term conversation setting has hardly been studied. In this work we collect and release a human-human dataset consisting of multiple chat sessions whereby the speaking partners learn about each other's interests and discuss t…

    Submitted 15 July, 2021; originally announced July 2021.

  43. arXiv:2107.07566  [pdf, other]

    cs.AI cs.CL

    Internet-Augmented Dialogue Generation

    Authors: Mojtaba Komeili, Kurt Shuster, Jason Weston

    Abstract: The largest store of continually updating knowledge on our planet can be accessed via internet search. In this work we study giving access to this information to conversational agents. Large language models, even though they store an impressive amount of knowledge within their weights, are known to hallucinate facts when generating dialogue (Shuster et al., 2021); moreover, those facts are frozen…

    Submitted 15 July, 2021; originally announced July 2021.

  44. arXiv:2107.06251  [pdf, other]

    astro-ph.HE astro-ph.SR

    Classical Novae at Radio Wavelengths

    Authors: Laura Chomiuk, Justin D. Linford, Elias Aydi, Keith W. Bannister, Miriam I. Krauss, Amy J. Mioduszewski, Koji Mukai, Thomas J. Nelson, Michael P. Rupen, Stuart D. Ryder, Jennifer L. Sokoloski, Kirill V. Sokolovsky, Jay Strader, Miroslav D. Filipovic, Tom Finzell, Adam Kawash, Erik C. Kool, Brian D. Metzger, Miriam M. Nyamai, Valerio A. R. M. Ribeiro, Nirupam Roy, Ryan Urquhart, Jennifer Weston

    Abstract: We present radio observations (1--40 GHz) for 36 classical novae, representing data from over five decades compiled from the literature, telescope archives, and our own programs. Our targets display a striking diversity in their optical parameters (e.g., spanning optical fading timescales, t_2 = 1--263 days), and we find a similar diversity in the radio light curves. Using a brightness temperature…

    Submitted 13 July, 2021; originally announced July 2021.

    Comments: Submitted to AAS Journals

  45. Shocks and dust formation in nova V809 Cep

    Authors: Aliya-Nur Babul, Jennifer L. Sokoloski, Laura Chomiuk, Justin D. Linford, Jennifer H. S. Weston, Elias Aydi, Kirill V. Sokolovsky, Adam M. Kawash

    Abstract: The discovery that many classical novae produce detectable GeV $γ$-ray emission has raised the question of the role of shocks in nova eruptions. Here we use radio observations of nova V809 Cep (Nova Cep 2013) with the Jansky Very Large Array to show that it produced non-thermal emission indicative of particle acceleration in strong shocks for more than a month starting about six weeks into the eru…

    Submitted 29 June, 2021; originally announced June 2021.

  46. arXiv:2106.04426  [pdf, other]

    cs.LG cs.CL

    Hash Layers For Large Sparse Models

    Authors: Stephen Roller, Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston

    Abstract: We investigate the training of sparse layers that use different parameters for different inputs based on hashing in large Transformer models. Specifically, we modify the feedforward layer to hash to different sets of weights depending on the current token, over all tokens in the sequence. We show that this procedure either outperforms or is competitive with learning-to-route mixture-of-expert meth…

    Submitted 20 July, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

  47. arXiv:2106.04279  [pdf, other]

    cs.LG cs.CL

    Staircase Attention for Recurrent Processing of Sequences

    Authors: Da Ju, Stephen Roller, Sainbayar Sukhbaatar, Jason Weston

    Abstract: Attention mechanisms have become a standard tool for sequence modeling tasks, in particular by stacking self-attention layers over the entire input sequence as in the Transformer architecture. In this work we introduce a novel attention procedure called staircase attention that, unlike self-attention, operates across the sequence (in time) recurrently processing the input by adding another step of…

    Submitted 8 June, 2021; originally announced June 2021.

  48. arXiv:2105.06548  [pdf, other]

    cs.LG cs.AI

    Not All Memories are Created Equal: Learning to Forget by Expiring

    Authors: Sainbayar Sukhbaatar, Da Ju, Spencer Poff, Stephen Roller, Arthur Szlam, Jason Weston, Angela Fan

    Abstract: Attention mechanisms have shown promising results in sequence modeling tasks that require long-term memory. Recent work investigated mechanisms to reduce the computational cost of preserving and storing memories. However, not all content in the past is equally important to remember. We propose Expire-Span, a method that learns to retain the most important information and expire the irrelevant info…

    Submitted 13 June, 2021; v1 submitted 13 May, 2021; originally announced May 2021.

  49. arXiv:2104.07567  [pdf, other]

    cs.CL cs.AI

    Retrieval Augmentation Reduces Hallucination in Conversation

    Authors: Kurt Shuster, Spencer Poff, Moya Chen, Douwe Kiela, Jason Weston

    Abstract: Despite showing increasingly human-like conversational abilities, state-of-the-art dialogue models often suffer from factual incorrectness and hallucination of knowledge (Roller et al., 2020). In this work we explore the use of neural-retrieval-in-the-loop architectures - recently shown to be effective in open-domain QA (Lewis et al., 2020b; Izacard and Grave, 2020) - for knowledge-grounded dialog…

    Submitted 15 April, 2021; originally announced April 2021.

  50. arXiv:2104.00641  [pdf]

    stat.ML cs.LG

    Dynamic Silos: Increased Modularity in Intra-organizational Communication Networks during the Covid-19 Pandemic

    Authors: Tiona Zuzul, Emily Cox Pahnke, Jonathan Larson, Patrick Bourke, Nicholas Caurvina, Neha Parikh Shah, Fereshteh Amini, Jeffrey Weston, Youngser Park, Joshua Vogelstein, Christopher White, Carey E. Priebe

    Abstract: Workplace communications around the world were drastically altered by Covid-19, related work-from-home orders, and the rise of remote work. To understand these shifts, we analyzed aggregated, anonymized metadata from over 360 billion emails within 4,361 organizations worldwide. By comparing month-to-month and year-over-year metrics, we examined changes in network community structures over 24 month…

    Submitted 28 July, 2023; v1 submitted 1 April, 2021; originally announced April 2021.

    Comments: 48 pages, 15 figures