Showing 1–50 of 164 results for author: Gonzalez, J E

  1. arXiv:2406.18665

    cs.LG cs.AI cs.CL

    RouteLLM: Learning to Route LLMs with Preference Data

    Authors: Isaac Ong, Amjad Almahairi, Vincent Wu, Wei-Lin Chiang, Tianhao Wu, Joseph E. Gonzalez, M Waleed Kadous, Ion Stoica

    Abstract: Large language models (LLMs) exhibit impressive capabilities across a wide range of tasks, yet the choice of which model to use often involves a trade-off between performance and cost. More powerful models, though effective, come with higher expenses, while less capable models are more cost-effective. To address this dilemma, we propose several efficient router models that dynamically select betwe…

    Submitted 1 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2406.11939

    cs.LG cs.AI cs.CL

    From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline

    Authors: Tianle Li, Wei-Lin Chiang, Evan Frick, Lisa Dunlap, Tianhao Wu, Banghua Zhu, Joseph E. Gonzalez, Ion Stoica

    Abstract: The rapid evolution of language models has necessitated the development of more challenging benchmarks. Current static benchmarks often struggle to consistently distinguish between the capabilities of different models and fail to align with real-world user preferences. On the other hand, live crowd-sourced platforms like the Chatbot Arena collect a wide range of natural prompts and user feedback.…

    Submitted 17 June, 2024; originally announced June 2024.

  3. arXiv:2406.04271

    cs.CL

    Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models

    Authors: Ling Yang, Zhaochen Yu, Tianjun Zhang, Shiyi Cao, Minkai Xu, Wentao Zhang, Joseph E. Gonzalez, Bin Cui

    Abstract: We introduce Buffer of Thoughts (BoT), a novel and versatile thought-augmented reasoning approach for enhancing accuracy, efficiency and robustness of large language models (LLMs). Specifically, we propose meta-buffer to store a series of informative high-level thoughts, namely thought-template, distilled from the problem-solving processes across various tasks. Then for each problem, we retrieve a…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Project: https://github.com/YangLing0818/buffer-of-thought-llm

  4. arXiv:2406.03636

    cs.PL cs.LG

    Synthetic Programming Elicitation and Repair for Text-to-Code in Very Low-Resource Programming Languages

    Authors: Federico Mora, Justin Wong, Haley Lepe, Sahil Bhatia, Karim Elmaaroufi, George Varghese, Joseph E. Gonzalez, Elizabeth Polgreen, Sanjit A. Seshia

    Abstract: Recent advances in large language models (LLMs) for code applications have demonstrated remarkable zero-shot fluency and instruction following on challenging code related tasks ranging from test case generation to self-repair. Unsurprisingly, however, models struggle to compose syntactically valid programs in programming languages unrepresented in pre-training, referred to as very low-resource Pro…

    Submitted 29 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: 15 pages, 6 figures, 1 table

  5. arXiv:2405.13665

    astro-ph.CO

    Unveiling the Hubble Constant through Galaxy Cluster Gas Mass Fractions

    Authors: Javier E. Gonzalez, Marcelo Ferreira, Leonardo R. Colaço, Rodrigo F. L. Holanda, Rafael C. Nunes

    Abstract: In this work, we obtain Hubble constant ($H_0$) estimates by using two galaxy cluster gas mass fraction measurement samples, Type Ia supernovae luminosity distances and the validity of the cosmic distance duality relation. Notably, the angular diameter distance (ADD) to each galaxy cluster in the samples is determined by combining its gas mass fraction measurement with galaxy clustering observatio…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 9 pages, 5 figures

  6. arXiv:2404.18928

    cs.CV cs.AI cs.CL cs.GR cs.LG

    Stylus: Automatic Adapter Selection for Diffusion Models

    Authors: Michael Luo, Justin Wong, Brandon Trabucco, Yanping Huang, Joseph E. Gonzalez, Zhifeng Chen, Ruslan Salakhutdinov, Ion Stoica

    Abstract: Beyond scaling base models with more data or parameters, fine-tuned adapters provide an alternative way to generate high fidelity, custom images at reduced costs. As such, adapters have been widely adopted by open-source communities, accumulating a database of over 100K adapters, most of which are highly customized with insufficient descriptions. This paper explores the problem of matching the prom…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Project Website: https://stylus-diffusion.github.io

  7. arXiv:2404.07979

    cs.CL cs.AI cs.LG

    LLoCO: Learning Long Contexts Offline

    Authors: Sijun Tan, Xiuyu Li, Shishir Patil, Ziyang Wu, Tianjun Zhang, Kurt Keutzer, Joseph E. Gonzalez, Raluca Ada Popa

    Abstract: Processing long contexts remains a challenge for large language models (LLMs) due to the quadratic computational and memory overhead of the self-attention mechanism and the substantial KV cache sizes during generation. We propose a novel approach to address this problem by learning contexts offline through context compression and in-domain parameter-efficient finetuning. Our method enables an LLM…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: The first two authors contributed equally to this work

  8. arXiv:2404.06921

    cs.CL cs.AI

    GoEX: Perspectives and Designs Towards a Runtime for Autonomous LLM Applications

    Authors: Shishir G. Patil, Tianjun Zhang, Vivian Fang, Noppapon C., Roy Huang, Aaron Hao, Martin Casado, Joseph E. Gonzalez, Raluca Ada Popa, Ion Stoica

    Abstract: Large Language Models (LLMs) are evolving beyond their classical role of providing information within dialogue systems to actively engaging with tools and performing actions on real-world applications and services. Today, humans verify the correctness and appropriateness of the LLM-generated outputs (e.g., code, functions, or actions) before putting them into real-world execution. This poses signi…

    Submitted 10 April, 2024; originally announced April 2024.

  9. arXiv:2404.02904

    cs.CV cs.AI cs.CL cs.LG

    ALOHa: A New Measure for Hallucination in Captioning Models

    Authors: Suzanne Petryk, David M. Chan, Anish Kachinthaya, Haodi Zou, John Canny, Joseph E. Gonzalez, Trevor Darrell

    Abstract: Despite recent advances in multimodal pre-training for visual description, state-of-the-art models still produce captions containing errors, such as hallucinating objects not present in a scene. The existing prominent metric for object hallucination, CHAIR, is limited to a fixed set of MS COCO objects and synonyms. In this work, we propose a modernized open-vocabulary metric, ALOHa, which leverage…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: To appear at NAACL 2024

  10. arXiv:2403.10131

    cs.CL cs.AI

    RAFT: Adapting Language Model to Domain Specific RAG

    Authors: Tianjun Zhang, Shishir G. Patil, Naman Jain, Sheng Shen, Matei Zaharia, Ion Stoica, Joseph E. Gonzalez

    Abstract: Pretraining Large Language Models (LLMs) on large corpora of textual data is now a standard paradigm. When using these LLMs for many downstream applications, it is common to additionally bake in new knowledge (e.g., time-critical news, or private domain knowledge) into the pretrained model either through RAG-based-prompting, or fine-tuning. However, the optimal methodology for the model to gain su…

    Submitted 5 June, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  11. arXiv:2403.05821

    cs.LG cs.DB

    Optimizing LLM Queries in Relational Workloads

    Authors: Shu Liu, Asim Biswal, Audrey Cheng, Xiangxi Mo, Shiyi Cao, Joseph E. Gonzalez, Ion Stoica, Matei Zaharia

    Abstract: Analytical database providers (e.g., Redshift, Databricks, BigQuery) have rapidly added support for invoking Large Language Models (LLMs) through native user-defined functions (UDFs) to help users perform natural language tasks, such as classification, entity extraction, and translation, inside analytical workloads. For instance, an analyst might want to extract customer sentiments on millions of…

    Submitted 9 March, 2024; originally announced March 2024.

  12. arXiv:2403.04132

    cs.AI cs.CL

    Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

    Authors: Wei-Lin Chiang, Lianmin Zheng, Ying Sheng, Anastasios Nikolas Angelopoulos, Tianle Li, Dacheng Li, Hao Zhang, Banghua Zhu, Michael Jordan, Joseph E. Gonzalez, Ion Stoica

    Abstract: Large Language Models (LLMs) have unlocked new capabilities and applications; however, evaluating the alignment with human preferences still poses significant challenges. To address this issue, we introduce Chatbot Arena, an open platform for evaluating LLMs based on human preferences. Our methodology employs a pairwise comparison approach and leverages input from a diverse user base through crowd…

    Submitted 6 March, 2024; originally announced March 2024.

  13. arXiv:2401.00588

    cs.AI cs.LG cs.PF

    Fairness in Serving Large Language Models

    Authors: Ying Sheng, Shiyi Cao, Dacheng Li, Banghua Zhu, Zhuohan Li, Danyang Zhuo, Joseph E. Gonzalez, Ion Stoica

    Abstract: High-demand LLM inference services (e.g., ChatGPT and BARD) support a wide range of requests from short chat conversations to long document reading. To ensure that all client requests are processed fairly, most major LLM inference services have request rate limits, to ensure that no client can dominate the request queue. However, this rudimentary notion of fairness also results in under-utilizatio…

    Submitted 5 June, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

  14. arXiv:2312.08366

    cs.CV

    See, Say, and Segment: Teaching LMMs to Overcome False Premises

    Authors: Tsung-Han Wu, Giscard Biamby, David Chan, Lisa Dunlap, Ritwik Gupta, Xudong Wang, Joseph E. Gonzalez, Trevor Darrell

    Abstract: Current open-source Large Multimodal Models (LMMs) excel at tasks such as open-vocabulary language grounding and segmentation but can suffer under false premises when queries imply the existence of something that is not actually present in the image. We observe that existing methods that fine-tune an LMM to segment images significantly degrade their ability to reliably determine ("see") if an obje…

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: Project Page: https://see-say-segment.github.io

  15. arXiv:2312.07104

    cs.AI cs.PL

    SGLang: Efficient Execution of Structured Language Model Programs

    Authors: Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Sun, Jeff Huang, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, Ying Sheng

    Abstract: Large language models (LLMs) are increasingly used for complex tasks that require multiple generation calls, advanced prompting techniques, control flow, and structured inputs/outputs. However, efficient systems are lacking for programming and executing these applications. We introduce SGLang, a system for efficient execution of complex language model programs. SGLang consists of a frontend langua…

    Submitted 5 June, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

  16. arXiv:2312.02974

    cs.CV cs.CL cs.CY cs.LG

    Describing Differences in Image Sets with Natural Language

    Authors: Lisa Dunlap, Yuhui Zhang, Xiaohan Wang, Ruiqi Zhong, Trevor Darrell, Jacob Steinhardt, Joseph E. Gonzalez, Serena Yeung-Levy

    Abstract: How do two sets of images differ? Discerning set-level differences is crucial for understanding model behaviors and analyzing datasets, yet manually sifting through thousands of images is impractical. To aid in this discovery process, we explore the task of automatically describing the differences between two $\textbf{sets}$ of images, which we term Set Difference Captioning. This task takes in im…

    Submitted 26 April, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: CVPR 2024 Oral

  17. arXiv:2311.16090

    cs.CV

    Self-correcting LLM-controlled Diffusion Models

    Authors: Tsung-Han Wu, Long Lian, Joseph E. Gonzalez, Boyi Li, Trevor Darrell

    Abstract: Text-to-image generation has witnessed significant progress with the advent of diffusion models. Despite the ability to generate photorealistic images, current text-to-image diffusion models still often struggle to accurately interpret and follow complex input text prompts. In contrast to existing models that aim to generate images only with their best effort, we introduce Self-correcting LLM-cont…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: 16 pages, 10 figures

  18. arXiv:2311.14904

    cs.LG cs.SE

    LLM-Assisted Code Cleaning For Training Accurate Code Generators

    Authors: Naman Jain, Tianjun Zhang, Wei-Lin Chiang, Joseph E. Gonzalez, Koushik Sen, Ion Stoica

    Abstract: Natural language to code generation is an important application area of LLMs and has received wide attention from the community. The majority of relevant studies have exclusively concentrated on increasing the quantity and functional correctness of training sets while disregarding other stylistic elements of programs. More recently, data quality has garnered a lot of interest and multiple works ha…

    Submitted 24 November, 2023; originally announced November 2023.

  19. arXiv:2311.04850

    cs.CL cs.AI

    Rethinking Benchmark and Contamination for Language Models with Rephrased Samples

    Authors: Shuo Yang, Wei-Lin Chiang, Lianmin Zheng, Joseph E. Gonzalez, Ion Stoica

    Abstract: Large language models are increasingly trained on all the data ever produced by humans. Many have raised concerns about the trustworthiness of public benchmarks due to potential contamination in pre-training or fine-tuning datasets. While most data decontamination efforts apply string matching (e.g., n-gram overlap) to remove benchmark data, we show that these methods are insufficient, and simple…

    Submitted 11 November, 2023; v1 submitted 8 November, 2023; originally announced November 2023.

  20. arXiv:2311.03285

    cs.LG cs.AI cs.DC

    S-LoRA: Serving Thousands of Concurrent LoRA Adapters

    Authors: Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper, Nicholas Lee, Shuo Yang, Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, Ion Stoica

    Abstract: The "pretrain-then-finetune" paradigm is commonly adopted in the deployment of large language models. Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method, is often employed to adapt a base model to a multitude of tasks, resulting in a substantial collection of LoRA adapters derived from one base model. We observe that this paradigm presents significant opportunities for batched in…

    Submitted 5 June, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

  21. arXiv:2311.01491

    physics.chem-ph cond-mat.mtrl-sci cs.LG physics.comp-ph

    Investigating the Behavior of Diffusion Models for Accelerating Electronic Structure Calculations

    Authors: Daniel Rothchild, Andrew S. Rosen, Eric Taw, Connie Robinson, Joseph E. Gonzalez, Aditi S. Krishnapriyan

    Abstract: We present an investigation into diffusion models for molecular generation, with the aim of better understanding how their predictions compare to the results of physics-based calculations. The investigation into these models is driven by their potential to significantly accelerate electronic structure calculations using machine learning, without requiring expensive first-principles datasets for tr…

    Submitted 2 November, 2023; originally announced November 2023.

  22. arXiv:2310.18711

    astro-ph.CO gr-qc

    A Hubble Constant Estimate from Galaxy Cluster and type Ia SNe Observations

    Authors: L. R. Colaço, M. S. Ferreira, R. F. L. Holanda, J. E. Gonzalez, Rafael C. Nunes

    Abstract: In this work, we constrain the Hubble constant parameter, $H_0$, using a combination of the Pantheon sample and galaxy clusters (GC) measurements from minimal cosmological assumptions. Assuming the validity of the cosmic distance duality relation, an estimator is created for $H_0$ that only depends on simple geometrical distances, which is evaluated from Pantheon and a GC angular diameter distance…

    Submitted 24 May, 2024; v1 submitted 28 October, 2023; originally announced October 2023.

    Comments: 8 pages, two figures

    Journal ref: JCAP05(2024)098

  23. arXiv:2310.12971

    cs.CV cs.AI cs.CL

    CLAIR: Evaluating Image Captions with Large Language Models

    Authors: David Chan, Suzanne Petryk, Joseph E. Gonzalez, Trevor Darrell, John Canny

    Abstract: The evaluation of machine-generated image captions poses an interesting yet persistent challenge. Effective evaluation measures must consider numerous dimensions of similarity, including semantic relevance, visual structure, object interactions, caption diversity, and specificity. Existing highly-engineered measures attempt to capture specific aspects, but fall short in providing a holistic score…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: To Appear at EMNLP 2023

  24. arXiv:2310.08560

    cs.AI

    MemGPT: Towards LLMs as Operating Systems

    Authors: Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, Joseph E. Gonzalez

    Abstract: Large language models (LLMs) have revolutionized AI, but are constrained by limited context windows, hindering their utility in tasks like extended conversations and document analysis. To enable using context beyond limited context windows, we propose virtual context management, a technique drawing inspiration from hierarchical memory systems in traditional operating systems that provide the appea…

    Submitted 12 February, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: Code and data available at https://research.memgpt.ai

  25. arXiv:2310.03294

    cs.LG cs.AI cs.DC

    DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training

    Authors: Dacheng Li, Rulin Shao, Anze Xie, Eric P. Xing, Xuezhe Ma, Ion Stoica, Joseph E. Gonzalez, Hao Zhang

    Abstract: FlashAttention (Dao, 2023) effectively reduces the quadratic peak memory usage to linear in training transformer-based large language models (LLMs) on a single GPU. In this paper, we introduce DISTFLASHATTN, a distributed memory-efficient attention mechanism optimized for long-context LLMs training. We propose three key techniques: token-level workload balancing, overlapping key-value communicatio…

    Submitted 31 March, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

  26. arXiv:2309.11998

    cs.CL cs.AI

    LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset

    Authors: Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Tianle Li, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zhuohan Li, Zi Lin, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica, Hao Zhang

    Abstract: Studying how people interact with large language models (LLMs) in real-world scenarios is increasingly important due to their widespread use in various applications. In this paper, we introduce LMSYS-Chat-1M, a large-scale dataset containing one million real-world conversations with 25 state-of-the-art LLMs. This dataset is collected from 210K unique IP addresses in the wild on our Vicuna demo and…

    Submitted 10 March, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

  27. arXiv:2309.06180

    cs.LG cs.DC

    Efficient Memory Management for Large Language Model Serving with PagedAttention

    Authors: Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, Ion Stoica

    Abstract: High throughput serving of large language models (LLMs) requires batching sufficiently many requests at a time. However, existing systems struggle because the key-value cache (KV cache) memory for each request is huge and grows and shrinks dynamically. When managed inefficiently, this memory can be significantly wasted by fragmentation and redundant duplication, limiting the batch size. To address…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: SOSP 2023

  28. arXiv:2308.03204

    cs.RO

    Leveraging Cloud Computing to Make Autonomous Vehicles Safer

    Authors: Peter Schafhalter, Sukrit Kalra, Le Xu, Joseph E. Gonzalez, Ion Stoica

    Abstract: The safety of autonomous vehicles (AVs) depends on their ability to perform complex computations on high-volume sensor data in a timely manner. Their ability to run these computations with state-of-the-art models is limited by the processing power and slow update cycles of their onboard hardware. In contrast, cloud computing offers the ability to burst computation to vast amounts of the latest gen…

    Submitted 6 August, 2023; originally announced August 2023.

    Comments: IROS 2023 (to appear); 8 pages, 7 figures, 2 tables

  29. arXiv:2306.05685

    cs.CL cs.AI

    Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

    Authors: Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, Ion Stoica

    Abstract: Evaluating large language model (LLM) based chat assistants is challenging due to their broad capabilities and the inadequacy of existing benchmarks in measuring human preferences. To address this, we explore using strong LLMs as judges to evaluate these models on more open-ended questions. We examine the usage and limitations of LLM-as-a-judge, including position, verbosity, and self-enhancement…

    Submitted 23 December, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 Datasets and Benchmarks Track

  30. arXiv:2305.16289

    cs.CV cs.AI

    Diversify Your Vision Datasets with Automatic Diffusion-Based Augmentation

    Authors: Lisa Dunlap, Alyssa Umino, Han Zhang, Jiezhi Yang, Joseph E. Gonzalez, Trevor Darrell

    Abstract: Many fine-grained classification tasks, like rare animal identification, have limited training data and consequently classifiers trained on these datasets often fail to generalize to variations in the domain like changes in weather or location. As such, we explore how natural language descriptions of the domains seen in training data can be used with large vision models trained on diverse pretrain…

    Submitted 29 October, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: Update: replaced Planes dataset with Waterbirds & updated results after bug fix

  31. arXiv:2305.15334

    cs.CL cs.AI

    Gorilla: Large Language Model Connected with Massive APIs

    Authors: Shishir G. Patil, Tianjun Zhang, Xin Wang, Joseph E. Gonzalez

    Abstract: Large Language Models (LLMs) have seen an impressive wave of advances recently, with models now excelling in a variety of tasks, such as mathematical reasoning and program synthesis. However, their potential to effectively use tools via API calls remains unfulfilled. This is a challenging task even for today's state-of-the-art LLMs such as GPT-4, largely due to their inability to generate accurate…

    Submitted 24 May, 2023; originally announced May 2023.

  32. arXiv:2305.15053

    cs.CL cs.IR

    Decomposing Complex Queries for Tip-of-the-tongue Retrieval

    Authors: Kevin Lin, Kyle Lo, Joseph E. Gonzalez, Dan Klein

    Abstract: When re-finding items, users who forget or are uncertain about identifying details often rely on creative strategies for expressing their information needs -- complex queries that describe content elements (e.g., book characters or events), information beyond the document text (e.g., descriptions of book covers), or personal context (e.g., when they read a book). This retrieval setting, called tip…

    Submitted 24 May, 2023; originally announced May 2023.

  33. arXiv:2305.07021

    cs.CV

    Simple Token-Level Confidence Improves Caption Correctness

    Authors: Suzanne Petryk, Spencer Whitehead, Joseph E. Gonzalez, Trevor Darrell, Anna Rohrbach, Marcus Rohrbach

    Abstract: The ability to judge whether a caption correctly describes an image is a critical part of vision-language understanding. However, state-of-the-art models often misinterpret the correctness of fine-grained details, leading to errors in outputs such as hallucinating objects in generated captions or poor compositional reasoning. In this work, we explore Token-Level Confidence, or TLC, as a simple yet…

    Submitted 11 May, 2023; originally announced May 2023.

  34. arXiv:2303.06865

    cs.LG cs.AI cs.PF

    FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU

    Authors: Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Daniel Y. Fu, Zhiqiang Xie, Beidi Chen, Clark Barrett, Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang

    Abstract: The high computational and memory requirements of large language model (LLM) inference make it feasible only with multiple high-end accelerators. Motivated by the emerging demand for latency-insensitive tasks with batched processing, this paper initiates the study of high-throughput LLM inference using limited resources, such as a single commodity GPU. We present FlexGen, a high-throughput generat…

    Submitted 12 June, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

  35. arXiv:2302.11665

    cs.LG cs.DC cs.NI

    AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving

    Authors: Zhuohan Li, Lianmin Zheng, Yinmin Zhong, Vincent Liu, Ying Sheng, Xin Jin, Yanping Huang, Zhifeng Chen, Hao Zhang, Joseph E. Gonzalez, Ion Stoica

    Abstract: Model parallelism is conventionally viewed as a method to scale a single large deep learning model beyond the memory limits of a single device. In this paper, we demonstrate that model parallelism can be additionally used for the statistical multiplexing of multiple devices when serving multiple models, even when a single model can fit into a single device. Our work reveals a fundamental trade-off…

    Submitted 19 July, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

    Comments: OSDI 2023

  36. arXiv:2302.05206

    cs.CL cs.AI

    The Wisdom of Hindsight Makes Language Models Better Instruction Followers

    Authors: Tianjun Zhang, Fangchen Liu, Justin Wong, Pieter Abbeel, Joseph E. Gonzalez

    Abstract: Reinforcement learning has seen wide success in finetuning large language models to better align with instructions via human feedback. The so-called algorithm, Reinforcement Learning with Human Feedback (RLHF) demonstrates impressive performance on the GPT series models. However, the underlying Reinforcement Learning (RL) algorithm is complex and requires an additional training pipeline for reward…

    Submitted 10 February, 2023; originally announced February 2023.

  37. arXiv:2211.11890

    cs.CL cs.AI

    TEMPERA: Test-Time Prompting via Reinforcement Learning

    Authors: Tianjun Zhang, Xuezhi Wang, Denny Zhou, Dale Schuurmans, Joseph E. Gonzalez

    Abstract: Careful prompt design is critical to the use of large language models in zero-shot or few-shot learning. As a consequence, there is a growing interest in automated methods to design optimal prompts. In this work, we propose Test-time Prompt Editing using Reinforcement learning (TEMPERA). In contrast to prior prompt generation methods, TEMPERA can efficiently leverage prior knowledge, is adaptive t…

    Submitted 21 November, 2022; originally announced November 2022.

  38. arXiv:2211.11720

    cs.CV cs.CL

    Multitask Vision-Language Prompt Tuning

    Authors: Sheng Shen, Shijia Yang, Tianjun Zhang, Bohan Zhai, Joseph E. Gonzalez, Kurt Keutzer, Trevor Darrell

    Abstract: Prompt Tuning, conditioning on task-specific learned prompt vectors, has emerged as a data-efficient and parameter-efficient method for adapting large pretrained vision-language models to multiple downstream tasks. However, existing approaches usually consider learning prompt vectors for each task independently from scratch, thereby failing to exploit the rich shareable knowledge across different…

    Submitted 5 December, 2022; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: Preprint

  39. arXiv:2211.05322

    cs.LG cs.DC

    On Optimizing the Communication of Model Parallelism

    Authors: Yonghao Zhuang, Hexu Zhao, Lianmin Zheng, Zhuohan Li, Eric P. Xing, Qirong Ho, Joseph E. Gonzalez, Ion Stoica, Hao Zhang

    Abstract: We study a novel and important communication pattern in large-scale model-parallel deep learning (DL), which we call cross-mesh resharding. This pattern emerges when the two paradigms of model parallelism - intra-operator and inter-operator parallelism - are combined to support large models on large clusters. In cross-mesh resharding, a sharded tensor needs to be sent from a source device mesh to…

    Submitted 9 November, 2022; originally announced November 2022.

  40. arXiv:2210.09520

    cs.CV

    Using Language to Extend to Unseen Domains

    Authors: Lisa Dunlap, Clara Mohri, Devin Guillory, Han Zhang, Trevor Darrell, Joseph E. Gonzalez, Aditi Raghunathan, Anna Rohrbach

    Abstract: It is expensive to collect training data for every possible domain that a vision model may encounter when deployed. We instead consider how simply verbalizing the training domain (e.g. "photos of birds") as well as domains we want to extend to but do not have data for (e.g. "paintings of birds") can improve robustness. Using a multimodal model with a joint image and language embedding space, our m…

    Submitted 29 April, 2023; v1 submitted 17 October, 2022; originally announced October 2022.

  41. arXiv:2210.07259

    cs.NI cs.DC

    Skyplane: Optimizing Transfer Cost and Throughput Using Cloud-Aware Overlays

    Authors: Paras Jain, Sam Kumar, Sarah Wooders, Shishir G. Patil, Joseph E. Gonzalez, Ion Stoica

    Abstract: Cloud applications are increasingly distributing data across multiple regions and cloud providers. Unfortunately, wide-area bulk data transfers are often slow, bottlenecking applications. We demonstrate that it is possible to significantly improve inter-region cloud bulk transfer throughput by adapting network overlays to the cloud setting -- that is, by routing data through indirect paths at the…

    Submitted 13 October, 2022; originally announced October 2022.

    Comments: To appear at NSDI 2023

  42. arXiv:2208.09515

    cs.LG stat.ML

    Spectral Decomposition Representation for Reinforcement Learning

    Authors: Tongzheng Ren, Tianjun Zhang, Lisa Lee, Joseph E. Gonzalez, Dale Schuurmans, Bo Dai

    Abstract: Representation learning often plays a critical role in reinforcement learning by managing the curse of dimensionality. A representative class of algorithms exploits a spectral decomposition of the stochastic transition dynamics to construct representations that enjoy strong theoretical properties in an idealized setting. However, current spectral methods suffer from limited applicability because t…

    Submitted 7 March, 2023; v1 submitted 19 August, 2022; originally announced August 2022.

    Comments: ICLR 2023. The first two authors contributed equally

  43. arXiv:2207.07813  [pdf, other]

    cs.RO cs.AI

    Autonomously Untangling Long Cables

    Authors: Vainavi Viswanath, Kaushik Shivakumar, Justin Kerr, Brijen Thananjeyan, Ellen Novoseller, Jeffrey Ichnowski, Alejandro Escontrela, Michael Laskey, Joseph E. Gonzalez, Ken Goldberg

    Abstract: Cables are ubiquitous in many settings and it is often useful to untangle them. However, cables are prone to self-occlusions and knots, making them difficult to perceive and manipulate. The challenge increases with cable length: long cables require more complex slack management to facilitate observability and reachability. In this paper, we focus on autonomously untangling cables up to 3 meters in…

    Submitted 31 July, 2022; v1 submitted 15 July, 2022; originally announced July 2022.

  44. arXiv:2207.07697  [pdf, other]

    cs.LG cs.CV cs.DC stat.ML

    POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging

    Authors: Shishir G. Patil, Paras Jain, Prabal Dutta, Ion Stoica, Joseph E. Gonzalez

    Abstract: Fine-tuning models on edge devices like mobile phones would enable privacy-preserving personalization over sensitive data. However, edge training has historically been limited to relatively small models with simple architectures because training is both memory and energy intensive. We present POET, an algorithm to enable training large neural networks on memory-scarce battery-operated edge devices…

    Submitted 15 July, 2022; originally announced July 2022.

    Comments: Proceedings of the 39th International Conference on Machine Learning 2022 (ICML 2022)

  45. arXiv:2207.07150  [pdf, other]

    cs.LG stat.ML

    Making Linear MDPs Practical via Contrastive Representation Learning

    Authors: Tianjun Zhang, Tongzheng Ren, Mengjiao Yang, Joseph E. Gonzalez, Dale Schuurmans, Bo Dai

    Abstract: It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations. This motivates much of the recent theoretical study on linear MDPs. However, most approaches require a given representation under unrealistic assumptions about the normalization of the decomposition or introduce unresolved computational challenges in practice. Instead, we…

    Submitted 7 December, 2022; v1 submitted 14 July, 2022; originally announced July 2022.

    Comments: ICML 2022. The first two authors contributed equally

  46. arXiv:2206.10341  [pdf, other]

    cs.CR cs.AI cs.LG

    Neurotoxin: Durable Backdoors in Federated Learning

    Authors: Zhengming Zhang, Ashwinee Panda, Linyue Song, Yaoqing Yang, Michael W. Mahoney, Joseph E. Gonzalez, Kannan Ramchandran, Prateek Mittal

    Abstract: Due to their decentralized nature, federated learning (FL) systems have an inherent vulnerability during their training to adversarial backdoor attacks. In this type of attack, the goal of the attacker is to use poisoned updates to implant so-called backdoors into the learned model such that, at test time, the model's outputs can be fixed to a given target for certain inputs. (As a simple toy exam…

    Submitted 12 June, 2022; originally announced June 2022.

    Comments: Appears in ICML 2022

  47. arXiv:2205.07147  [pdf]

    cs.DC

    The Sky Above The Clouds

    Authors: Sarah Chasins, Alvin Cheung, Natacha Crooks, Ali Ghodsi, Ken Goldberg, Joseph E. Gonzalez, Joseph M. Hellerstein, Michael I. Jordan, Anthony D. Joseph, Michael W. Mahoney, Aditya Parameswaran, David Patterson, Raluca Ada Popa, Koushik Sen, Scott Shenker, Dawn Song, Ion Stoica

    Abstract: Technology ecosystems often undergo significant transformations as they mature. For example, telephony, the Internet, and PCs all started with a single provider, but in the United States each is now served by a competitive market that uses comprehensive and universal technology standards to provide compatibility. This white paper presents our view on how the cloud ecosystem, barely over fifteen ye…

    Submitted 14 May, 2022; originally announced May 2022.

    Comments: 35 pages

  48. Constraining a possible time-variation of the speed of light along with the fine-structure constant using strong gravitational lensing and Type Ia supernovae observations

    Authors: L. R. Colaço, S. J. Landau, J. E. Gonzalez, J. Spinelly, G. L. F. Santos

    Abstract: The possible time variation of the fundamental constants of nature has been an active subject of research since the large-number hypothesis was proposed by Dirac. In this paper, we propose a new method to investigate a possible time variation of the speed of light ($c$) along with the fine-structure constant ($α$) using Strong Gravitational Lensing (SGL) and Type Ia Supernovae (SNe Ia) observation…

    Submitted 26 August, 2022; v1 submitted 13 April, 2022; originally announced April 2022.

    Comments: 10 Pages, 4 Figures, and 1 Table. Published in JCAP

    Journal ref: JCAP08(2022)062

  49. arXiv:2203.04566  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    All You Need is LUV: Unsupervised Collection of Labeled Images using Invisible UV Fluorescent Indicators

    Authors: Brijen Thananjeyan, Justin Kerr, Huang Huang, Joseph E. Gonzalez, Ken Goldberg

    Abstract: Large-scale semantic image annotation is a significant challenge for learning-based perception systems in robotics. Current approaches often rely on human labelers, which can be expensive, or simulation data, which can visually or physically differ from real data. This paper proposes Labels from UltraViolet (LUV), a novel framework that enables rapid, labeled data collection in real manipulation e…

    Submitted 13 March, 2022; v1 submitted 9 March, 2022; originally announced March 2022.

  50. arXiv:2202.02842  [pdf, other]

    cs.CL cs.LG

    Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data

    Authors: Yaoqing Yang, Ryan Theisen, Liam Hodgkinson, Joseph E. Gonzalez, Kannan Ramchandran, Charles H. Martin, Michael W. Mahoney

    Abstract: Selecting suitable architecture parameters and training hyperparameters is essential for enhancing machine learning (ML) model performance. Several recent empirical studies conduct large-scale correlational analysis on neural networks (NNs) to search for effective generalization metrics that can guide this type of model selection. Effective metrics are typically expected to correlate strong…

    Submitted 4 June, 2023; v1 submitted 6 February, 2022; originally announced February 2022.

    Journal ref: Proceedings of the 29th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2023)