Showing 1–50 of 152 results for author: Lewis, M

  1. arXiv:2406.14485   

    cs.AI cs.HC cs.MM cs.SD eess.AS

    Proceedings of The second international workshop on eXplainable AI for the Arts (XAIxArts)

    Authors: Nick Bryan-Kinns, Corey Ford, Shuoyang Zheng, Helen Kennedy, Alan Chamberlain, Makayla Lewis, Drew Hemment, Zijin Li, Qiong Wu, Lanxi Xiao, Gus Xia, Jeba Rezwana, Michael Clemens, Gabriel Vigliensoni

    Abstract: This second international workshop on explainable AI for the Arts (XAIxArts) brought together a community of researchers in HCI, Interaction Design, AI, explainable AI (XAI), and digital arts to explore the role of XAI for the Arts. Workshop held at the 16th ACM Conference on Creativity and Cognition (C&C 2024), Chicago, USA.

    Submitted 1 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2406.04004

    quant-ph cs.NE

    T-Count Optimizing Genetic Algorithm for Quantum State Preparation

    Authors: Andrew Wright, Marco Lewis, Paolo Zuliani, Sadegh Soudjani

    Abstract: Quantum state preparation is a crucial process within numerous quantum algorithms, and the need for efficient initialization of quantum registers is ever increasing as demand for useful quantum computing grows. The problem arises as the number of qubits to be initialized grows, the circuits required to implement the desired state also exponentially increase in size leading to loss of fidelity to n…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: To appear in IEEE QSW 2024 proceedings

  3. arXiv:2406.03119

    quant-ph cs.LO cs.SE

    Automated Verification of Silq Quantum Programs using SMT Solvers

    Authors: Marco Lewis, Paolo Zuliani, Sadegh Soudjani

    Abstract: We present SilVer (Silq Verification), an automated tool for verifying behaviors of quantum programs written in Silq, which is a high-level programming language for quantum computing. The goal of the verification is to ensure correctness of the Silq quantum program against user-defined specifications using SMT solvers. We introduce a programming model that is based on a quantum RAM-style computer…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 10 pages, to appear in the proceedings of IEEE QSW 2024

  4. arXiv:2405.12886

    cs.SC

    The Recovery of $λ$ from a Hilbert Polynomial

    Authors: Joseph Donato, Monica Lewis

    Abstract: In the study of Hilbert schemes, the integer partition $λ$ helps researchers identify some geometric and combinatorial properties of the scheme in question. To aid researchers in extracting such information from a Hilbert polynomial, we describe an efficient algorithm which can identify if $p(x)\in\mathbb{Q}[x]$ is a Hilbert polynomial and if so, recover the integer partition $λ$ associated with i…

    Submitted 4 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  5. arXiv:2405.04324

    cs.AI cs.CL cs.SE

    Granite Code Models: A Family of Open Foundation Models for Code Intelligence

    Authors: Mayank Mishra, Matt Stallone, Gaoyuan Zhang, Yikang Shen, Aditya Prasad, Adriana Meza Soria, Michele Merler, Parameswaran Selvam, Saptha Surendran, Shivdeep Singh, Manish Sethi, Xuan-Hong Dang, Pengyuan Li, Kun-Lung Wu, Syed Zawad, Andrew Coleman, Matthew White, Mark Lewis, Raju Pavuluri, Yan Koyfman, Boris Lublinsky, Maximilien de Bayser, Ibrahim Abdelaziz, Kinjal Basu, Mayank Agarwal, et al. (21 additional authors not shown)

    Abstract: Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously. Realizing the full potential of code LLMs requires a wide range of capabili…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Corresponding Authors: Rameswar Panda, Ruchir Puri; Equal Contributors: Mayank Mishra, Matt Stallone, Gaoyuan Zhang

  6. arXiv:2405.03133

    cs.CL cs.LG

    Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training

    Authors: Zexuan Zhong, Mengzhou Xia, Danqi Chen, Mike Lewis

    Abstract: Mixture-of-experts (MoE) models facilitate efficient scaling; however, training the router network introduces the challenge of optimizing a non-differentiable, discrete objective. Recently, a fully-differentiable MoE architecture, SMEAR, was proposed (Muqeeth et al., 2023), which softly merges experts in the parameter space; nevertheless, its effectiveness was only demonstrated in downstream fine-…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 21 pages, 12 figures

  7. arXiv:2404.08893

    cs.LG math.DS q-bio.PE stat.AP

    Early detection of disease outbreaks and non-outbreaks using incidence data

    Authors: Shan Gao, Amit K. Chakraborty, Russell Greiner, Mark A. Lewis, Hao Wang

    Abstract: Forecasting the occurrence and absence of novel disease outbreaks is essential for disease management. Here, we develop a general model, with no real-world training data, that accurately forecasts outbreaks and non-outbreaks. We propose a novel framework, using a feature-based time series classification method to forecast outbreaks and non-outbreaks. We tested our methods on synthetic data from a…

    Submitted 12 April, 2024; originally announced April 2024.

  8. arXiv:2403.16233

    cs.LG q-bio.PE stat.AP

    An early warning indicator trained on stochastic disease-spreading models with different noises

    Authors: Amit K. Chakraborty, Shan Gao, Reza Miry, Pouria Ramazi, Russell Greiner, Mark A. Lewis, Hao Wang

    Abstract: The timely detection of disease outbreaks through reliable early warning signals (EWSs) is indispensable for effective public health mitigation strategies. Nevertheless, the intricate dynamics of real-world disease spread, often influenced by diverse sources of noise and limited data in the early stages of outbreaks, pose a significant challenge in developing reliable EWSs, as the performance of e…

    Submitted 24 March, 2024; originally announced March 2024.

  9. arXiv:2403.11810

    cs.CL

    Metaphor Understanding Challenge Dataset for LLMs

    Authors: Xiaoyu Tong, Rochelle Choenni, Martha Lewis, Ekaterina Shutova

    Abstract: Metaphors in natural language are a reflection of fundamental cognitive processes such as analogical reasoning and categorisation, and are deeply rooted in everyday communication. Metaphor understanding is therefore an essential task for large language models (LLMs). We release the Metaphor Understanding Challenge Dataset (MUNCH), designed to evaluate the metaphor understanding capabilities of LLM…

    Submitted 18 March, 2024; originally announced March 2024.

  10. arXiv:2402.08955

    cs.AI cs.CL

    Using Counterfactual Tasks to Evaluate the Generality of Analogical Reasoning in Large Language Models

    Authors: Martha Lewis, Melanie Mitchell

    Abstract: Large language models (LLMs) have performed well on several reasoning benchmarks, including ones that test analogical reasoning abilities. However, it has been debated whether they are actually performing humanlike abstract reasoning or instead employing less general processes that rely on similarity to what has been seen in their training data. Here we investigate the generality of analogy-making…

    Submitted 14 February, 2024; originally announced February 2024.

  11. arXiv:2402.06678

    physics.soc-ph cs.LG q-bio.QM

    Can machine learning predict citizen-reported angler behavior?

    Authors: Julia S. Schmid, Sean Simmons, Mark A. Lewis, Mark S. Poesch, Pouria Ramazi

    Abstract: Prediction of angler behaviors, such as catch rates and angler pressure, is essential to maintaining fish populations and ensuring angler satisfaction. Angler behavior can partly be tracked by online platforms and mobile phone applications that provide fishing activities reported by recreational anglers. Moreover, angler behavior is known to be driven by local site attributes. Here, the prediction…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: 36 pages, 10 figures, 4 tables (including supplementary information)

  12. Grounded learning for compositional vector semantics

    Authors: Martha Lewis

    Abstract: Categorical compositional distributional semantics is an approach to modelling language that combines the success of vector-based models of meaning with the compositional power of formal semantics. However, this approach was developed without an eye to cognitive plausibility. Vector representations of concepts and concept binding are also of interest in cognitive science, and have been proposed as…

    Submitted 10 January, 2024; originally announced January 2024.

  13. Architectural Design for Secure Smart Contract Development

    Authors: Myles Lewis, Chris Crawford

    Abstract: As time progresses, the need for more secure applications grows exponentially. The different types of sensitive information that is being transferred virtually has sparked a rise in systems that leverage blockchain. Different sectors are beginning to use this disruptive technology to evaluate the risks and benefits. Sectors like finance, medicine, higher education, and wireless communication have…

    Submitted 3 January, 2024; originally announced January 2024.

    Comments: 5 pages, 2 figures

    Journal ref: 14th International Conference on Applied Human Factors and Ergonomics (AHFE 2023)

  14. arXiv:2312.08397

    cs.LG cs.AI cs.HC

    Personalized Decision Supports based on Theory of Mind Modeling and Explainable Reinforcement Learning

    Authors: Huao Li, Yao Fan, Keyang Zheng, Michael Lewis, Katia Sycara

    Abstract: In this paper, we propose a novel personalized decision support system that combines Theory of Mind (ToM) modeling and explainable Reinforcement Learning (XRL) to provide effective and interpretable interventions. Our method leverages DRL to provide expert action recommendations while incorporating ToM modeling to understand users' mental states and predict their future actions, enabling appropria…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: Accepted to IEEE SMC 2023

  15. arXiv:2311.18064

    cs.CV

    GELDA: A generative language annotation framework to reveal visual biases in datasets

    Authors: Krish Kabra, Kathleen M. Lewis, Guha Balakrishnan

    Abstract: Bias analysis is a crucial step in the process of creating fair datasets for training and evaluating computer vision models. The bottleneck in dataset analysis is annotation, which typically requires: (1) specifying a list of attributes relevant to the dataset domain, and (2) classifying each image-attribute pair. While the second step has made rapid progress in automation, the first has remained…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: 21 pages, 15 figures, 9 tables

  16. arXiv:2311.11085

    cs.LG

    Compositional Fusion of Signals in Data Embedding

    Authors: Zhijin Guo, Zhaozhen Xu, Martha Lewis, Nello Cristianini

    Abstract: Embeddings in AI convert symbolic structures into fixed-dimensional vectors, effectively fusing multiple signals. However, the nature of this fusion in real-world data is often unclear. To address this, we introduce two methods: (1) Correlation-based Fusion Detection, measuring correlation between known attributes and embeddings, and (2) Additive Fusion Detection, viewing embeddings as sums of ind…

    Submitted 18 November, 2023; originally announced November 2023.

  17. arXiv:2311.05720

    cs.CL cs.AI cs.LG

    Long-Horizon Dialogue Understanding for Role Identification in the Game of Avalon with Large Language Models

    Authors: Simon Stepputtis, Joseph Campbell, Yaqi Xie, Zhengyang Qi, Wenxin Sharon Zhang, Ruiyi Wang, Sanketh Rangreji, Michael Lewis, Katia Sycara

    Abstract: Deception and persuasion play a critical role in long-horizon dialogues between multiple parties, especially when the interests, goals, and motivations of the participants are not aligned. Such complex tasks pose challenges for current Large Language Models (LLM) as deception and persuasion can easily mislead them, especially in long-horizon multi-party dialogues. To this end, we explore the game…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: Accepted to the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP, Findings of the Association for Computational Linguistics)

  18. arXiv:2311.00115

    cs.LG cs.CY

    EXTRACT: Explainable Transparent Control of Bias in Embeddings

    Authors: Zhijin Guo, Zhaozhen Xu, Martha Lewis, Nello Cristianini

    Abstract: Knowledge Graphs are a widely used method to represent relations between entities in various AI applications, and Graph Embedding has rapidly become a standard technique to represent Knowledge Graphs in such a way as to facilitate inferences and decisions. As this representation is obtained from behavioural data, and is not in a form readable by humans, there is a concern that it might incorporate…

    Submitted 31 October, 2023; originally announced November 2023.

    Comments: Aequitas 2023: Workshop on Fairness and Bias in AI | co-located with ECAI 2023, Kraków, Poland

  19. Theory of Mind for Multi-Agent Collaboration via Large Language Models

    Authors: Huao Li, Yu Quan Chong, Simon Stepputtis, Joseph Campbell, Dana Hughes, Michael Lewis, Katia Sycara

    Abstract: While Large Language Models (LLMs) have demonstrated impressive accomplishments in both reasoning and planning, their abilities in multi-agent collaborations remains largely unexplored. This study evaluates LLM-based agents in a multi-agent cooperative text game with Theory of Mind (ToM) inference tasks, comparing their performance with Multi-Agent Reinforcement Learning (MARL) and planning-based…

    Submitted 26 June, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023 (Main Conference). Code available at https://github.com/romanlee6/multi_LLM_comm

    Journal ref: in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Page 180-192, ACL

  20. arXiv:2310.10638

    cs.CL cs.AI cs.LG

    In-context Pretraining: Language Modeling Beyond Document Boundaries

    Authors: Weijia Shi, Sewon Min, Maria Lomeli, Chunting Zhou, Margaret Li, Gergely Szilvasy, Rich James, Xi Victoria Lin, Noah A. Smith, Luke Zettlemoyer, Scott Yih, Mike Lewis

    Abstract: Large language models (LMs) are currently trained to predict tokens given document prefixes, enabling them to directly perform long-form generation and prompting-style tasks which can be reduced to document completion. Existing pretraining pipelines train LMs by concatenating random sets of short documents to create input contexts but the prior documents provide no signal for predicting the next d…

    Submitted 24 June, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

  21. arXiv:2310.01352

    cs.CL cs.AI

    RA-DIT: Retrieval-Augmented Dual Instruction Tuning

    Authors: Xi Victoria Lin, Xilun Chen, Mingda Chen, Weijia Shi, Maria Lomeli, Rich James, Pedro Rodriguez, Jacob Kahn, Gergely Szilvasy, Mike Lewis, Luke Zettlemoyer, Scott Yih

    Abstract: Retrieval-augmented language models (RALMs) improve performance by accessing long-tail and up-to-date knowledge from external data stores, but are challenging to build. Existing approaches require either expensive retrieval-specific modifications to LM pre-training or use post-hoc integration of the data store that leads to suboptimal performance. We introduce Retrieval-Augmented Dual Instruction…

    Submitted 6 May, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: v4: ICLR 2024 camera-ready version

  22. arXiv:2309.17453

    cs.CL cs.AI

    Efficient Streaming Language Models with Attention Sinks

    Authors: Guangxuan Xiao, Yuandong Tian, Beidi Chen, Song Han, Mike Lewis

    Abstract: Deploying Large Language Models (LLMs) in streaming applications such as multi-round dialogue, where long interactions are expected, is urgently needed but poses two major challenges. Firstly, during the decoding stage, caching previous tokens' Key and Value states (KV) consumes extensive memory. Secondly, popular LLMs cannot generalize to longer texts than the training sequence length. Window att…

    Submitted 6 April, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

    Comments: ICLR 2024

  23. arXiv:2309.16039

    cs.CL

    Effective Long-Context Scaling of Foundation Models

    Authors: Wenhan Xiong, Jingyu Liu, Igor Molybog, Hejia Zhang, Prajjwal Bhargava, Rui Hou, Louis Martin, Rashi Rungta, Karthik Abinav Sankararaman, Barlas Oguz, Madian Khabsa, Han Fang, Yashar Mehdad, Sharan Narang, Kshitiz Malik, Angela Fan, Shruti Bhosale, Sergey Edunov, Mike Lewis, Sinong Wang, Hao Ma

    Abstract: We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens. Our model series are built through continual pretraining from Llama 2 with longer training sequences and on a dataset where long texts are upsampled. We perform extensive evaluation on language modeling, synthetic context probing tasks, and a wide range of research benchmarks. On research benchm…

    Submitted 13 November, 2023; v1 submitted 27 September, 2023; originally announced September 2023.

  24. arXiv:2309.10650

    cs.CV q-bio.QM

    MUSTANG: Multi-Stain Self-Attention Graph Multiple Instance Learning Pipeline for Histopathology Whole Slide Images

    Authors: Amaya Gallagher-Syed, Luca Rossi, Felice Rivellese, Costantino Pitzalis, Myles Lewis, Michael Barnes, Gregory Slabaugh

    Abstract: Whole Slide Images (WSIs) present a challenging computer vision task due to their gigapixel size and presence of numerous artefacts. Yet they are a valuable resource for patient diagnosis and stratification, often representing the gold standard for diagnostic tasks. Real-world clinical datasets tend to come as sets of heterogeneous WSIs with labels present at the patient-level, with poor to no ann…

    Submitted 4 October, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: Accepted for publication at BMVC 2023

  25. arXiv:2309.09117

    cs.CL cs.AI

    Contrastive Decoding Improves Reasoning in Large Language Models

    Authors: Sean O'Brien, Mike Lewis

    Abstract: We demonstrate that Contrastive Decoding -- a simple, computationally light, and training-free text generation method proposed by Li et al 2022 -- achieves large out-of-the-box improvements over greedy decoding on a variety of reasoning tasks. Originally shown to improve the perceived quality of long-form text generation, Contrastive Decoding searches for strings that maximize a weighted differenc…

    Submitted 29 September, 2023; v1 submitted 16 September, 2023; originally announced September 2023.

    Comments: 9 figures, 11 tables

    ACM Class: I.2.7

  26. arXiv:2309.07255

    eess.IV cs.CV q-bio.QM

    Automated segmentation of rheumatoid arthritis immunohistochemistry stained synovial tissue

    Authors: Amaya Gallagher-Syed, Abbas Khan, Felice Rivellese, Costantino Pitzalis, Myles J. Lewis, Gregory Slabaugh, Michael R. Barnes

    Abstract: Rheumatoid Arthritis (RA) is a chronic, autoimmune disease which primarily affects the joint's synovial tissue. It is a highly heterogeneous disease, with wide cellular and molecular variability observed in synovial tissues. Over the last two decades, the methods available for their study have advanced considerably. In particular, Immunohistochemistry stains are well suited to highlighting the fun…

    Submitted 13 September, 2023; originally announced September 2023.

  27. arXiv:2308.11424

    cs.HC cs.AI

    AIxArtist: A First-Person Tale of Interacting with Artificial Intelligence to Escape Creative Block

    Authors: Makayla Lewis

    Abstract: The future of the arts and artificial intelligence (AI) is promising as technology advances. As the use of AI in design becomes more widespread, art practice may not be a human-only art form and could instead become a digitally integrated experience. With enhanced creativity and collaboration, arts and AI could work together towards creating artistic outputs that are visually appealing and meet th…

    Submitted 22 August, 2023; originally announced August 2023.

    Comments: 1st International Workshop on Explainable AI for the Arts (XAIxArts), ACM Creativity and Cognition (C&C) 2023. Online, 6 pages. https://xaixarts.github.io

    MSC Class: 68T99 ACM Class: I.2.m

  28. arXiv:2308.06259

    cs.CL

    Self-Alignment with Instruction Backtranslation

    Authors: Xian Li, Ping Yu, Chunting Zhou, Timo Schick, Omer Levy, Luke Zettlemoyer, Jason Weston, Mike Lewis

    Abstract: We present a scalable method to build a high quality instruction following language model by automatically labelling human-written text with corresponding instructions. Our approach, named instruction backtranslation, starts with a language model finetuned on a small amount of seed data, and a given web corpus. The seed model is used to construct training examples by generating instruction prompts…

    Submitted 12 March, 2024; v1 submitted 11 August, 2023; originally announced August 2023.

    Comments: ICLR2024 camera ready

  29. arXiv:2307.15519   

    cs.LO math.CT

    Proceedings Fifth International Conference on Applied Category Theory

    Authors: Jade Master, Martha Lewis

    Abstract: The Fifth International Conference on Applied Category Theory took place at the University of Strathclyde in Glasgow, Scotland on 18-22 July 2022. This conference follows the previous meetings at Leiden (2018), Oxford (2019), MIT (2020, fully online), and Cambridge (2021). The conference comprised 59 contributed talks, a poster session, an industry showcase session, and a session where junior rese…

    Submitted 28 July, 2023; originally announced July 2023.

    Journal ref: EPTCS 380, 2023

  30. arXiv:2307.11315

    cs.CV cs.CL

    GIST: Generating Image-Specific Text for Fine-grained Object Classification

    Authors: Kathleen M. Lewis, Emily Mu, Adrian V. Dalca, John Guttag

    Abstract: Recent vision-language models outperform vision-only models on many image classification tasks. However, because of the absence of paired text/image descriptions, it remains difficult to fine-tune these models for fine-grained image classification. In this work, we propose a method, GIST, for generating image-specific fine-grained text descriptions from image-only datasets, and show that these tex…

    Submitted 4 August, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

    Comments: The first two authors contributed equally to this work and are listed in alphabetical order

  31. arXiv:2305.14739

    cs.CL

    Trusting Your Evidence: Hallucinate Less with Context-aware Decoding

    Authors: Weijia Shi, Xiaochuang Han, Mike Lewis, Yulia Tsvetkov, Luke Zettlemoyer, Scott Wen-tau Yih

    Abstract: Language models (LMs) often struggle to pay enough attention to the input context, and generate texts that are unfaithful or contain hallucinations. To mitigate this issue, we present context-aware decoding (CAD), which follows a contrastive output distribution that amplifies the difference between the output probabilities when a model is used with and without context. Our experiments show that CA…

    Submitted 24 May, 2023; originally announced May 2023.

  32. arXiv:2305.14251

    cs.CL cs.AI cs.LG

    FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation

    Authors: Sewon Min, Kalpesh Krishna, Xinxi Lyu, Mike Lewis, Wen-tau Yih, Pang Wei Koh, Mohit Iyyer, Luke Zettlemoyer, Hannaneh Hajishirzi

    Abstract: Evaluating the factuality of long-form text generated by large language models (LMs) is non-trivial because (1) generations often contain a mixture of supported and unsupported pieces of information, making binary judgments of quality inadequate, and (2) human evaluation is time-consuming and costly. In this paper, we introduce FACTSCORE, a new evaluation that breaks a generation into a series of…

    Submitted 11 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: 25 pages; 7 figures. Published as a main conference paper at EMNLP 2023. Code available at https://github.com/shmsw25/FActScore

  33. arXiv:2305.11206

    cs.CL cs.AI cs.LG

    LIMA: Less Is More for Alignment

    Authors: Chunting Zhou, Pengfei Liu, Puxin Xu, Srini Iyer, Jiao Sun, Yuning Mao, Xuezhe Ma, Avia Efrat, Ping Yu, Lili Yu, Susan Zhang, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer, Omer Levy

    Abstract: Large language models are trained in two stages: (1) unsupervised pretraining from raw text, to learn general-purpose representations, and (2) large scale instruction tuning and reinforcement learning, to better align to end tasks and user preferences. We measure the relative importance of these two stages by training LIMA, a 65B parameter LLaMa language model fine-tuned with the standard supervis…

    Submitted 18 May, 2023; originally announced May 2023.

  34. arXiv:2305.07185

    cs.LG

    MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers

    Authors: Lili Yu, Dániel Simig, Colin Flaherty, Armen Aghajanyan, Luke Zettlemoyer, Mike Lewis

    Abstract: Autoregressive transformers are spectacular models for short sequences but scale poorly to long sequences such as high-resolution images, podcasts, code, or books. We proposed Megabyte, a multi-scale decoder architecture that enables end-to-end differentiable modeling of sequences of over one million bytes. Megabyte segments sequences into patches and uses a local submodel within patches and a glo…

    Submitted 19 May, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

  35. arXiv:2305.03937

    cs.CL cs.AI

    Residual Prompt Tuning: Improving Prompt Tuning with Residual Reparameterization

    Authors: Anastasia Razdaibiedina, Yuning Mao, Rui Hou, Madian Khabsa, Mike Lewis, Jimmy Ba, Amjad Almahairi

    Abstract: Prompt tuning is one of the successful approaches for parameter-efficient tuning of pre-trained language models. Despite being arguably the most parameter-efficient (tuned soft prompts constitute <0.1% of total parameters), it typically performs worse than other efficient tuning methods and is quite sensitive to hyper-parameters. In this work, we introduce Residual Prompt Tuning - a simple and eff…

    Submitted 6 May, 2023; originally announced May 2023.

    Comments: ACL Findings 2023

  36. arXiv:2303.14177

    cs.CL cs.AI

    Scaling Expert Language Models with Unsupervised Domain Discovery

    Authors: Suchin Gururangan, Margaret Li, Mike Lewis, Weijia Shi, Tim Althoff, Noah A. Smith, Luke Zettlemoyer

    Abstract: Large language models are typically trained densely: all parameters are updated with respect to all inputs. This requires synchronization of billions of parameters across thousands of GPUs. We introduce a simple but effective method to asynchronously train large, sparse language models on arbitrary text corpora. Our method clusters a corpus into sets of related documents, trains a separate expert…

    Submitted 24 March, 2023; originally announced March 2023.

  37. arXiv:2301.12652

    cs.CL

    REPLUG: Retrieval-Augmented Black-Box Language Models

    Authors: Weijia Shi, Sewon Min, Michihiro Yasunaga, Minjoon Seo, Rich James, Mike Lewis, Luke Zettlemoyer, Wen-tau Yih

    Abstract: We introduce REPLUG, a retrieval-augmented language modeling framework that treats the language model (LM) as a black box and augments it with a tuneable retrieval model. Unlike prior retrieval-augmented LMs that train language models with special cross attention mechanisms to encode the retrieved text, REPLUG simply prepends retrieved documents to the input for the frozen black-box LM. This simpl…

    Submitted 24 May, 2023; v1 submitted 29 January, 2023; originally announced January 2023.

  38. arXiv:2301.12314

    cs.CL cs.AI cs.LG

    Progressive Prompts: Continual Learning for Language Models

    Authors: Anastasia Razdaibiedina, Yuning Mao, Rui Hou, Madian Khabsa, Mike Lewis, Amjad Almahairi

    Abstract: We introduce Progressive Prompts - a simple and efficient approach for continual learning in language models. Our method allows forward transfer and resists catastrophic forgetting, without relying on data replay or a large number of task-specific parameters. Progressive Prompts learns a new soft prompt for each task and sequentially concatenates it with the previously learned prompts, while keepi…

    Submitted 28 January, 2023; originally announced January 2023.

  39. arXiv:2212.10537

    cs.CV cs.AI cs.CL

    Does CLIP Bind Concepts? Probing Compositionality in Large Image Models

    Authors: Martha Lewis, Nihal V. Nayak, Peilin Yu, Qinan Yu, Jack Merullo, Stephen H. Bach, Ellie Pavlick

    Abstract: Large-scale neural network models combining text and images have made incredible progress in recent years. However, it remains an open question to what extent such models encode compositional representations of the concepts over which they operate, such as correctly identifying ''red cube'' by reasoning over the constituents ''red'' and ''cube''. In this work, we focus on the ability of a large pr…

    Submitted 29 March, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

  40. arXiv:2212.08195

    cs.CL

    Improving Chess Commentaries by Combining Language Models with Symbolic Reasoning Engines

    Authors: Andrew Lee, David Wu, Emily Dinan, Mike Lewis

    Abstract: Despite many recent advancements in language modeling, state-of-the-art language models lack grounding in the real world and struggle with tasks involving complex reasoning. Meanwhile, advances in the symbolic reasoning capabilities of AI have led to systems that outperform humans in games like chess and Go (Silver et al., 2018). Chess commentary provides an interesting domain for bridging these t…

    Submitted 15 December, 2022; originally announced December 2022.

  41. arXiv:2212.02437

    cs.CL

    In-context Examples Selection for Machine Translation

    Authors: Sweta Agrawal, Chunting Zhou, Mike Lewis, Luke Zettlemoyer, Marjan Ghazvininejad

    Abstract: Large-scale generative models show an impressive ability to perform a wide range of Natural Language Processing (NLP) tasks using in-context learning, where a few examples are used to describe a task to the model. For Machine Translation (MT), these examples are typically randomly sampled from the development dataset with a similar distribution as the evaluation set. However, it is unclear how the…

    Submitted 5 December, 2022; originally announced December 2022.

    Comments: 14 pages; 4 figures; 16 tables

  42. arXiv:2212.01349  [pdf, other]

    cs.CL cs.AI cs.LG

    Nonparametric Masked Language Modeling

    Authors: Sewon Min, Weijia Shi, Mike Lewis, Xilun Chen, Wen-tau Yih, Hannaneh Hajishirzi, Luke Zettlemoyer

    Abstract: Existing language models (LMs) predict tokens with a softmax over a finite vocabulary, which can make it difficult to predict rare tokens or phrases. We introduce NPM, the first nonparametric masked language model that replaces this softmax with a nonparametric distribution over every phrase in a reference corpus. NPM fills in the [MASK] solely from retrieving a token from a text corpus. We show t…

    Submitted 25 May, 2023; v1 submitted 2 December, 2022; originally announced December 2022.

    Comments: 20 pages; 9 figures. Published at ACL 2023 Findings. Code available at https://github.com/facebookresearch/NPM

  43. arXiv:2211.16490  [pdf, other]

    cs.LG cs.CL cs.PL cs.SE

    Coder Reviewer Reranking for Code Generation

    Authors: Tianyi Zhang, Tao Yu, Tatsunori B. Hashimoto, Mike Lewis, Wen-tau Yih, Daniel Fried, Sida I. Wang

    Abstract: Sampling diverse programs from a code language model and reranking with model likelihood is a popular method for code generation but it is prone to preferring degenerate solutions. Inspired by collaborative programming, we propose Coder-Reviewer reranking. We augment Coder language models from past work, which generate programs given language instructions, with Reviewer models, which evaluate the…

    Submitted 29 November, 2022; originally announced November 2022.

  44. arXiv:2211.12615  [pdf, other]

    cs.CL cs.AI

    AutoReply: Detecting Nonsense in Dialogue Introspectively with Discriminative Replies

    Authors: Weiyan Shi, Emily Dinan, Adi Renduchintala, Daniel Fried, Athul Paul Jacob, Zhou Yu, Mike Lewis

    Abstract: Existing approaches built separate classifiers to detect nonsense in dialogues. In this paper, we show that without external classifiers, dialogue models can detect errors in their own messages introspectively, by calculating the likelihood of replies that are indicative of poor messages. For example, if an agent believes its partner is likely to respond "I don't understand" to a candidate message…

    Submitted 22 November, 2022; originally announced November 2022.

  45. arXiv:2211.12561  [pdf, other]

    cs.CV cs.CL cs.LG

    Retrieval-Augmented Multimodal Language Modeling

    Authors: Michihiro Yasunaga, Armen Aghajanyan, Weijia Shi, Rich James, Jure Leskovec, Percy Liang, Mike Lewis, Luke Zettlemoyer, Wen-tau Yih

    Abstract: Recent multimodal models such as DALL-E and CM3 have achieved remarkable progress in text-to-image and image-to-text generation. However, these models store all learned knowledge (e.g., the appearance of the Eiffel Tower) in the model parameters, requiring increasingly larger models and training data to capture more knowledge. To integrate knowledge in a more scalable and modular way, we propose a…

    Submitted 5 June, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

    Comments: Published at ICML 2023. Blog post available at https://cs.stanford.edu/~myasu/blog/racm3/

  46. arXiv:2211.02892  [pdf, other]

    cs.CV

    SizeGAN: Improving Size Representation in Clothing Catalogs

    Authors: Kathleen M. Lewis, John Guttag

    Abstract: Online clothing catalogs lack diversity in body shape and garment size. Brands commonly display their garments on models of one or two sizes, rarely including plus-size models. To our knowledge, our paper presents the first method for generating images of garments and models in a new target size to tackle the size under-representation problem. Our primary technical contribution is a conditional ge…

    Submitted 26 June, 2023; v1 submitted 5 November, 2022; originally announced November 2022.

  47. arXiv:2210.15097  [pdf, other]

    cs.CL cs.AI cs.LG

    Contrastive Decoding: Open-ended Text Generation as Optimization

    Authors: Xiang Lisa Li, Ari Holtzman, Daniel Fried, Percy Liang, Jason Eisner, Tatsunori Hashimoto, Luke Zettlemoyer, Mike Lewis

    Abstract: Given a language model (LM), maximum probability is a poor decoding objective for open-ended generation, because it produces short and repetitive text. On the other hand, sampling can often produce incoherent text that drifts from the original topics. We propose contrastive decoding (CD), a reliable decoding approach that optimizes a contrastive objective subject to a plausibility constraint. The…

    Submitted 10 July, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: Main conference long paper at ACL 2023

  48. arXiv:2210.03350  [pdf, other]

    cs.CL

    Measuring and Narrowing the Compositionality Gap in Language Models

    Authors: Ofir Press, Muru Zhang, Sewon Min, Ludwig Schmidt, Noah A. Smith, Mike Lewis

    Abstract: We investigate the ability of language models to perform compositional reasoning tasks where the overall solution depends on correctly composing the answers to sub-problems. We measure how often models can correctly answer all sub-problems but not generate the overall solution, a ratio we call the compositionality gap. We evaluate this ratio by asking multi-hop questions with answers that require…

    Submitted 17 October, 2023; v1 submitted 7 October, 2022; originally announced October 2022.

    Comments: To appear at Findings of EMNLP 2023

  49. arXiv:2208.07339  [pdf, other]

    cs.LG cs.AI

    LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

    Authors: Tim Dettmers, Mike Lewis, Younes Belkada, Luke Zettlemoyer

    Abstract: Large language models have been widely adopted but require significant GPU memory for inference. We develop a procedure for Int8 matrix multiplication for feed-forward and attention projection layers in transformers, which cut the memory needed for inference by half while retaining full precision performance. With our method, a 175B parameter 16/32-bit checkpoint can be loaded, converted to Int8,…

    Submitted 10 November, 2022; v1 submitted 15 August, 2022; originally announced August 2022.

    Comments: Published at NeurIPS 2022. Camera-ready version

  50. arXiv:2208.03306  [pdf, other]

    cs.CL

    Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models

    Authors: Margaret Li, Suchin Gururangan, Tim Dettmers, Mike Lewis, Tim Althoff, Noah A. Smith, Luke Zettlemoyer

    Abstract: We present Branch-Train-Merge (BTM), a communication-efficient algorithm for embarrassingly parallel training of large language models (LLMs). We show it is possible to independently train subparts of a new class of LLMs on different subsets of the data, eliminating the massive multi-node synchronization currently required to train LLMs. BTM learns a set of independent expert LMs (ELMs), each spec…

    Submitted 5 August, 2022; originally announced August 2022.