Showing 1–8 of 8 results for author: Lott, C

  1. arXiv:2406.06647  [pdf, other]

    cs.SE cs.AI cs.LG

    How Efficient is LLM-Generated Code? A Rigorous & High-Standard Benchmark

    Authors: Ruizhong Qiu, Weiliang Will Zeng, Hanghang Tong, James Ezick, Christopher Lott

    Abstract: The emergence of large language models (LLMs) has significantly pushed the frontiers of program synthesis. Advancement of LLM-based program synthesis calls for a thorough evaluation of LLM-generated code. Most evaluation frameworks focus on the (functional) correctness of generated code; efficiency, as an important measure of code quality, has been overlooked in existing evaluations. In this work,…

    Submitted 16 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

  2. arXiv:2404.08856  [pdf, other]

    cs.CL cs.AI cs.LG

    On Speculative Decoding for Multimodal Large Language Models

    Authors: Mukul Gagrani, Raghavv Goel, Wonseok Jeon, Junyoung Park, Mingu Lee, Christopher Lott

    Abstract: Inference with Multimodal Large Language Models (MLLMs) is slow due to their large-language-model backbone which suffers from memory bandwidth bottleneck and generates tokens auto-regressively. In this paper, we explore the application of speculative decoding to enhance the inference efficiency of MLLMs, specifically the LLaVA 7B model. We show that a language-only model can serve as a good draft…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: Accepted as a spotlight paper at the ELVM workshop at CVPR 2024

  3. arXiv:2403.00858  [pdf, other]

    cs.LG cs.AI cs.CL

    Direct Alignment of Draft Model for Speculative Decoding with Chat-Fine-Tuned LLMs

    Authors: Raghavv Goel, Mukul Gagrani, Wonseok Jeon, Junyoung Park, Mingu Lee, Christopher Lott

    Abstract: Text generation with Large Language Models (LLMs) is known to be memory bound due to the combination of their auto-regressive nature, huge parameter counts, and limited memory bandwidths, often resulting in low token rates. Speculative decoding has been proposed as a solution for LLM inference acceleration. However, since draft models are often unavailable in the modern open-source LLM families, e…

    Submitted 13 May, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

    Comments: 8 pages, 3 figures, Published at the ICLR 2024 Workshop on Understanding of Foundation Models (ME-FoMo)

  4. arXiv:2402.14160  [pdf, other]

    cs.LG cs.AI

    Recursive Speculative Decoding: Accelerating LLM Inference via Sampling Without Replacement

    Authors: Wonseok Jeon, Mukul Gagrani, Raghavv Goel, Junyoung Park, Mingu Lee, Christopher Lott

    Abstract: Speculative decoding is an inference-acceleration method for large language models (LLMs) where a small language model generates a draft-token sequence which is further verified by the target LLM in parallel. Recent works have advanced this method by establishing a draft-token tree, achieving superior performance over a single-sequence speculative decoding. However, those works independently gener…

    Submitted 5 March, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: 82 pages, 9 figures, 54 tables

  5. arXiv:2304.14463  [pdf, other]

    cs.LG cs.AI

    Moccasin: Efficient Tensor Rematerialization for Neural Networks

    Authors: Burak Bartan, Haoming Li, Harris Teague, Christopher Lott, Bistra Dilkina

    Abstract: The deployment and training of neural networks on edge computing devices pose many challenges. The low memory nature of edge devices is often one of the biggest limiting factors encountered in the deployment of large neural network models. Tensor rematerialization or recompute is a way to address high memory requirements for neural network training and inference. In this paper we consider the prob…

    Submitted 30 May, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

  6. arXiv:2207.05899  [pdf, other]

    cs.LG

    Neural Topological Ordering for Computation Graphs

    Authors: Mukul Gagrani, Corrado Rainone, Yang Yang, Harris Teague, Wonseok Jeon, Herke Van Hoof, Weiliang Will Zeng, Piero Zappi, Christopher Lott, Roberto Bondesan

    Abstract: Recent works on machine learning for combinatorial optimization have shown that learning based approaches can outperform heuristic methods in terms of speed and performance. In this paper, we consider the problem of finding an optimal topological order on a directed acyclic graph with focus on the memory minimization problem which arises in compilers. We propose an end-to-end machine learning base…

    Submitted 7 October, 2022; v1 submitted 12 July, 2022; originally announced July 2022.

    Comments: To appear in NeurIPS 2022

  7. arXiv:2012.08859  [pdf, other]

    cs.LG cs.AI cs.CV cs.NE stat.ML

    Distilling Optimal Neural Networks: Rapid Search in Diverse Spaces

    Authors: Bert Moons, Parham Noorzad, Andrii Skliar, Giovanni Mariani, Dushyant Mehta, Chris Lott, Tijmen Blankevoort

    Abstract: Current state-of-the-art Neural Architecture Search (NAS) methods neither efficiently scale to multiple hardware platforms, nor handle diverse architectural search-spaces. To remedy this, we present DONNA (Distilling Optimal Neural Network Architectures), a novel pipeline for rapid, scalable and diverse NAS, that scales to many user scenarios. DONNA consists of three phases. First, an accuracy pre…

    Submitted 27 August, 2021; v1 submitted 16 December, 2020; originally announced December 2020.

    Comments: Accepted at ICCV 2021. Main text 9 pages, Full text 21 pages, 18 figures

  8. arXiv:1811.06096  [pdf, other]

    cs.CL cs.SD eess.AS

    Automatic Grammar Augmentation for Robust Voice Command Recognition

    Authors: Yang Yang, Anusha Lalitha, **won Lee, Chris Lott

    Abstract: This paper proposes a novel pipeline for automatic grammar augmentation that provides a significant improvement in the voice command recognition accuracy for systems with small footprint acoustic model (AM). The improvement is achieved by augmenting the user-defined voice command set, also called grammar set, with alternate grammar expressions. For a given grammar set, a set of potential grammar e…

    Submitted 14 November, 2018; originally announced November 2018.