Showing 1–50 of 81 results for author: Zaharia, M

  1. arXiv:2406.11695  [pdf, other]

    cs.CL cs.AI cs.LG

    Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs

    Authors: Krista Opsahl-Ong, Michael J Ryan, Josh Purtell, David Broman, Christopher Potts, Matei Zaharia, Omar Khattab

    Abstract: Language Model Programs, i.e. sophisticated pipelines of modular language model (LM) calls, are increasingly advancing NLP tasks, but they require crafting prompts that are jointly effective for all modules. We study prompt optimization for LM programs, i.e. how to update these prompts to maximize a downstream metric without access to module-level labels or gradients. To make this tractable, we fa…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Krista and Michael contributed equally to this work

  2. arXiv:2405.03709  [pdf, other]

    cs.SE cs.AI cs.LG cs.PL

    Generating Probabilistic Scenario Programs from Natural Language

    Authors: Karim Elmaaroufi, Devan Shanker, Ana Cismaru, Marcell Vazquez-Chanlatte, Alberto Sangiovanni-Vincentelli, Matei Zaharia, Sanjit A. Seshia

    Abstract: For cyber-physical systems (CPS), including robotics and autonomous vehicles, mass deployment has been hindered by fatal errors that occur when operating in rare events. To replicate rare events such as vehicle crashes, many companies have created logging systems and employed crash reconstruction experts to meticulously recreate these valuable events in simulation. However, in these methods, "what…

    Submitted 14 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

    Comments: 17 pages, 2 figures

  3. arXiv:2403.10131  [pdf, other]

    cs.CL cs.AI

    RAFT: Adapting Language Model to Domain Specific RAG

    Authors: Tianjun Zhang, Shishir G. Patil, Naman Jain, Sheng Shen, Matei Zaharia, Ion Stoica, Joseph E. Gonzalez

    Abstract: Pretraining Large Language Models (LLMs) on large corpora of textual data is now a standard paradigm. When using these LLMs for many downstream applications, it is common to additionally bake in new knowledge (e.g., time-critical news, or private domain knowledge) into the pretrained model either through RAG-based-prompting, or fine-tuning. However, the optimal methodology for the model to gain su…

    Submitted 5 June, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  4. arXiv:2403.05821  [pdf, other]

    cs.LG cs.DB

    Optimizing LLM Queries in Relational Workloads

    Authors: Shu Liu, Asim Biswal, Audrey Cheng, Xiangxi Mo, Shiyi Cao, Joseph E. Gonzalez, Ion Stoica, Matei Zaharia

    Abstract: Analytical database providers (e.g., Redshift, Databricks, BigQuery) have rapidly added support for invoking Large Language Models (LLMs) through native user-defined functions (UDFs) to help users perform natural language tasks, such as classification, entity extraction, and translation, inside analytical workloads. For instance, an analyst might want to extract customer sentiments on millions of…

    Submitted 9 March, 2024; originally announced March 2024.

  5. arXiv:2403.04871  [pdf, other]

    cs.IR cs.DB

    ACORN: Performant and Predicate-Agnostic Search Over Vector Embeddings and Structured Data

    Authors: Liana Patel, Peter Kraft, Carlos Guestrin, Matei Zaharia

    Abstract: Applications increasingly leverage mixed-modality data, and must jointly search over vector data, such as embedded images, text and video, as well as structured data, such as attributes and keywords. Proposed methods for this hybrid search setting either suffer from poor performance or support a severely restricted set of search predicates (e.g., only small sets of equality predicates), making the…

    Submitted 7 March, 2024; originally announced March 2024.

  6. arXiv:2403.04311  [pdf, other]

    cs.AI cs.CL cs.DC cs.IR

    ALTO: An Efficient Network Orchestrator for Compound AI Systems

    Authors: Keshav Santhanam, Deepti Raghavan, Muhammad Shahir Rahman, Thejas Venkatesh, Neha Kunjal, Pratiksha Thaker, Philip Levis, Matei Zaharia

    Abstract: We present ALTO, a network orchestrator for efficiently serving compound AI systems such as pipelines of language models. ALTO achieves high throughput and low latency by taking advantage of an optimization opportunity specific to generative language models: streaming intermediate outputs. As language models produce outputs token by token, ALTO exposes opportunities to stream intermediate outputs…

    Submitted 7 March, 2024; originally announced March 2024.

  7. arXiv:2403.02419  [pdf, other]

    cs.LG cs.AI cs.CL eess.SY

    Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems

    Authors: Lingjiao Chen, Jared Quincy Davis, Boris Hanin, Peter Bailis, Ion Stoica, Matei Zaharia, James Zou

    Abstract: Many recent state-of-the-art results in language tasks were achieved using compound systems that perform multiple Language Model (LM) calls and aggregate their responses. However, there is little understanding of how the number of LM calls - e.g., when asking the LM to answer each question multiple times and taking a majority vote - affects such a compound system's performance. In this paper, we i…

    Submitted 4 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  8. arXiv:2402.08268  [pdf, other]

    cs.LG

    World Model on Million-Length Video And Language With Blockwise RingAttention

    Authors: Hao Liu, Wilson Yan, Matei Zaharia, Pieter Abbeel

    Abstract: Current language models fall short in understanding aspects of the world not easily described in words, and struggle with complex, long-form tasks. Video sequences offer valuable temporal information absent in language and static images, making them attractive for joint modeling with language. Such models could develop an understanding of both human textual knowledge and the physical world, enablin…

    Submitted 14 March, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

  9. arXiv:2312.13382  [pdf, ps, other]

    cs.CL cs.AI cs.PL

    DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines

    Authors: Arnav Singhvi, Manish Shetty, Shangyin Tan, Christopher Potts, Koushik Sen, Matei Zaharia, Omar Khattab

    Abstract: Chaining language model (LM) calls as composable modules is fueling a new way of programming, but ensuring LMs adhere to important constraints requires heuristic "prompt engineering". We introduce LM Assertions, a programming construct for expressing computational constraints that LMs should satisfy. We integrate our constructs into the recent DSPy programming model for LMs, and present new strate…

    Submitted 2 February, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: Arnav*, Manish*, Shangyin* contributed equally to this work

  10. arXiv:2312.05468  [pdf]

    cs.AI cond-mat.mtrl-sci cs.CV cs.IR

    Image and Data Mining in Reticular Chemistry Using GPT-4V

    Authors: Zhiling Zheng, Zhiguo He, Omar Khattab, Nakul Rampal, Matei A. Zaharia, Christian Borgs, Jennifer T. Chayes, Omar M. Yaghi

    Abstract: The integration of artificial intelligence into scientific research has reached a new pinnacle with GPT-4V, a large language model featuring enhanced vision capabilities, accessible through ChatGPT or an API. This study demonstrates the remarkable ability of GPT-4V to navigate and obtain complex data for metal-organic frameworks, especially from graphical sources. Our approach involved an automate…

    Submitted 9 December, 2023; originally announced December 2023.

    Comments: 36 pages, 24 figures

  11. arXiv:2311.13712  [pdf, other]

    cs.AI

    Data Acquisition: A New Frontier in Data-centric AI

    Authors: Lingjiao Chen, Bilge Acun, Newsha Ardalani, Yifan Sun, Feiyang Kang, Hanrui Lyu, Yongchan Kwon, Ruoxi Jia, Carole-Jean Wu, Matei Zaharia, James Zou

    Abstract: As Machine Learning (ML) systems continue to grow, the demand for relevant and comprehensive datasets becomes imperative. There is limited study on the challenges of data acquisition due to ad-hoc processes and lack of consistent methodologies. We first present an investigation of current data marketplaces, revealing lack of platforms offering detailed information about datasets, transparent prici…

    Submitted 22 November, 2023; originally announced November 2023.

  12. arXiv:2311.09476  [pdf, other]

    cs.CL cs.AI cs.IR

    ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems

    Authors: Jon Saad-Falcon, Omar Khattab, Christopher Potts, Matei Zaharia

    Abstract: Evaluating retrieval-augmented generation (RAG) systems traditionally relies on hand annotations for input queries, passages to retrieve, and responses to generate. We introduce ARES, an Automated RAG Evaluation System, for evaluating RAG systems along the dimensions of context relevance, answer faithfulness, and answer relevance. By creating its own synthetic training data, ARES finetunes lightwe…

    Submitted 31 March, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: NAACL 2024

  13. arXiv:2310.08899  [pdf, other]

    cs.CL

    Exploration with Principles for Diverse AI Supervision

    Authors: Hao Liu, Matei Zaharia, Pieter Abbeel

    Abstract: Training large transformers using next-token prediction has given rise to groundbreaking advancements in AI. While this generative AI approach has produced impressive results, it heavily leans on human supervision. Even state-of-the-art AI models like ChatGPT depend on fine-tuning through human demonstrations, demanding extensive human input and domain expertise. This strong reliance on human over…

    Submitted 23 November, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

  14. arXiv:2310.03714  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines

    Authors: Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T. Joshi, Hanna Moazam, Heather Miller, Matei Zaharia, Christopher Potts

    Abstract: The ML community is rapidly exploring techniques for prompting language models (LMs) and for stacking them into pipelines that solve complex tasks. Unfortunately, existing LM pipelines are typically implemented using hard-coded "prompt templates", i.e. lengthy strings discovered via trial and error. Toward a more systematic approach for developing and optimizing LM pipelines, we introduce DSPy, a…

    Submitted 5 October, 2023; originally announced October 2023.

  15. arXiv:2310.01889  [pdf, other]

    cs.CL

    Ring Attention with Blockwise Transformers for Near-Infinite Context

    Authors: Hao Liu, Matei Zaharia, Pieter Abbeel

    Abstract: Transformers have emerged as the architecture of choice for many state-of-the-art AI models, showcasing exceptional performance across a wide range of AI applications. However, the memory demands imposed by Transformers limit their ability to handle long sequences, thereby posing challenges in utilizing videos, actions, and other long-form sequences and modalities in complex environments. We prese…

    Submitted 27 November, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: Code: https://github.com/lhao499/llm_large_context

  16. Accelerating Aggregation Queries on Unstructured Streams of Data

    Authors: Matthew Russo, Tatsunori Hashimoto, Daniel Kang, Yi Sun, Matei Zaharia

    Abstract: Analysts and scientists are interested in querying streams of video, audio, and text to extract quantitative insights. For example, an urban planner may wish to measure congestion by querying the live feed from a traffic camera. Prior work has used deep neural networks (DNNs) to answer such queries in the batch setting. However, much of this work is not suited for the streaming setting because it…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: 14 pages, 11 figures, to be published in Proceedings of the VLDB Endowment, Vol. 16, No. 11

    Journal ref: PVLDB, 16(11): 2897 - 2910, 2023

  17. arXiv:2307.09009  [pdf, other]

    cs.CL cs.AI cs.LG

    How is ChatGPT's behavior changing over time?

    Authors: Lingjiao Chen, Matei Zaharia, James Zou

    Abstract: GPT-3.5 and GPT-4 are the two most widely used large language model (LLM) services. However, when and how these models are updated over time is opaque. Here, we evaluate the March 2023 and June 2023 versions of GPT-3.5 and GPT-4 on several diverse tasks: 1) math problems, 2) sensitive/dangerous questions, 3) opinion surveys, 4) multi-hop knowledge-intensive questions, 5) generating code, 6) US Med…

    Submitted 31 October, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

    Comments: add more evaluations on instruction following

  18. arXiv:2305.05176  [pdf, other]

    cs.LG cs.AI cs.CL cs.SE

    FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance

    Authors: Lingjiao Chen, Matei Zaharia, James Zou

    Abstract: There is a rapidly growing number of large language models (LLMs) that users can query for a fee. We review the cost associated with querying popular LLM APIs, e.g. GPT-4, ChatGPT, J1-Jumbo, and find that these models have heterogeneous pricing structures, with fees that can differ by two orders of magnitude. In particular, using LLMs on large collections of queries and text can be expensive. Moti…

    Submitted 9 May, 2023; originally announced May 2023.

  19. arXiv:2305.03785  [pdf, other]

    cs.DB

    Zelda: Video Analytics using Vision-Language Models

    Authors: Francisco Romero, Caleb Winston, Johann Hauswald, Matei Zaharia, Christos Kozyrakis

    Abstract: Advances in ML have motivated the design of video analytics systems that allow for structured queries over video datasets. However, existing systems limit query expressivity, require users to specify an ML model per predicate, rely on complex optimizations that trade off accuracy for performance, and return large amounts of redundant and low-quality results. This paper focuses on the recently deve…

    Submitted 7 November, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

  20. arXiv:2302.05733  [pdf, other]

    cs.CR cs.LG

    Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks

    Authors: Daniel Kang, Xuechen Li, Ion Stoica, Carlos Guestrin, Matei Zaharia, Tatsunori Hashimoto

    Abstract: Recent advances in instruction-following large language models (LLMs) have led to dramatic improvements in a range of NLP tasks. Unfortunately, we find that the same improved capabilities amplify the dual-use risks for malicious purposes of these models. Dual-use is difficult to prevent as instruction-following capabilities now enable standard attacks from computer security. The capabilities of th…

    Submitted 11 February, 2023; originally announced February 2023.

  21. arXiv:2212.14161  [pdf, other]

    cs.DB cs.DC cs.SE

    Transactions Make Debugging Easy

    Authors: Qian Li, Peter Kraft, Michael Cafarella, Çağatay Demiralp, Goetz Graefe, Christos Kozyrakis, Michael Stonebraker, Lalith Suresh, Matei Zaharia

    Abstract: We propose TROD, a novel transaction-oriented framework for debugging modern distributed web applications and online services. Our critical insight is that if applications store all state in databases and only access state transactionally, TROD can use lightweight always-on tracing to track the history of application state changes and data provenance, and then leverage the captured traces and tran…

    Submitted 28 December, 2022; originally announced December 2022.

    Comments: CIDR'23

  22. arXiv:2212.14024  [pdf, other]

    cs.CL cs.IR

    Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP

    Authors: Omar Khattab, Keshav Santhanam, Xiang Lisa Li, David Hall, Percy Liang, Christopher Potts, Matei Zaharia

    Abstract: Retrieval-augmented in-context learning has emerged as a powerful approach for addressing knowledge-intensive tasks using frozen language models (LM) and retrieval models (RM). Existing work has combined these in simple "retrieve-then-read" pipelines in which the RM retrieves passages that are inserted into the LM prompt. To begin to fully realize the potential of frozen LMs and RMs, we propose De…

    Submitted 23 January, 2023; v1 submitted 28 December, 2022; originally announced December 2022.

  23. arXiv:2212.01340  [pdf, other]

    cs.IR cs.CL

    Moving Beyond Downstream Task Accuracy for Information Retrieval Benchmarking

    Authors: Keshav Santhanam, Jon Saad-Falcon, Martin Franz, Omar Khattab, Avirup Sil, Radu Florian, Md Arafat Sultan, Salim Roukos, Matei Zaharia, Christopher Potts

    Abstract: Neural information retrieval (IR) systems have progressed rapidly in recent years, in large part due to the release of publicly available benchmarking tasks. Unfortunately, some dimensions of this progress are illusory: the majority of the popular IR benchmarks today focus exclusively on downstream task accuracy and thus conceal the costs incurred by systems that trade away efficiency for quality.…

    Submitted 2 December, 2022; originally announced December 2022.

  24. arXiv:2211.15841  [pdf, other]

    cs.LG cs.AI cs.DC

    MegaBlocks: Efficient Sparse Training with Mixture-of-Experts

    Authors: Trevor Gale, Deepak Narayanan, Cliff Young, Matei Zaharia

    Abstract: We present MegaBlocks, a system for efficient Mixture-of-Experts (MoE) training on GPUs. Our system is motivated by the limitations of current frameworks, which restrict the dynamic routing in MoE layers to satisfy the constraints of existing software and hardware. These formulations force a tradeoff between model quality and hardware efficiency, as users must choose between dropping tokens from t…

    Submitted 28 November, 2022; originally announced November 2022.

  25. arXiv:2209.08443  [pdf, other]

    cs.SE cs.AI cs.DB cs.LG cs.PF

    HAPI: A Large-scale Longitudinal Dataset of Commercial ML API Predictions

    Authors: Lingjiao Chen, Zhihua Jin, Sabri Eyuboglu, Christopher Ré, Matei Zaharia, James Zou

    Abstract: Commercial ML APIs offered by providers such as Google, Amazon and Microsoft have dramatically simplified ML adoption in many applications. Numerous companies and academics pay to use ML APIs for tasks such as object detection, OCR and sentiment analysis. Different ML APIs tackling the same task can have very heterogeneous performance. Moreover, the ML models underlying the APIs also evolve over t…

    Submitted 17 September, 2022; originally announced September 2022.

    Comments: Preprint, to appear in NeurIPS 2022

  26. arXiv:2209.08436  [pdf, other]

    stat.ML cs.AI cs.LG stat.AP

    Estimating and Explaining Model Performance When Both Covariates and Labels Shift

    Authors: Lingjiao Chen, Matei Zaharia, James Zou

    Abstract: Deployed machine learning (ML) models often encounter new user data that differs from their training data. Therefore, estimating how well a given model might perform on the new data is an important step toward reliable ML applications. This is very challenging, however, as the data distribution can change in flexible ways, and we may not have any labels on the new data, which is often the case in…

    Submitted 17 September, 2022; originally announced September 2022.

    Comments: Accepted to NeurIPS 2022

  27. arXiv:2208.13068  [pdf, other]

    cs.DB cs.DC

    Apiary: A DBMS-Integrated Transactional Function-as-a-Service Framework

    Authors: Peter Kraft, Qian Li, Kostis Kaffes, Athinagoras Skiadopoulos, Deeptaanshu Kumar, Danny Cho, Jason Li, Robert Redmond, Nathan Weckwerth, Brian Xia, Peter Bailis, Michael Cafarella, Goetz Graefe, Jeremy Kepner, Christos Kozyrakis, Michael Stonebraker, Lalith Suresh, Xiangyao Yu, Matei Zaharia

    Abstract: Developers increasingly use function-as-a-service (FaaS) platforms for data-centric applications that perform low-latency and transactional operations on data, such as for microservices or web serving. Unfortunately, existing FaaS platforms support these applications poorly because they physically and logically separate application logic, executed in cloud functions, from data management, done in…

    Submitted 30 June, 2023; v1 submitted 27 August, 2022; originally announced August 2022.

    Comments: 14 pages, 13 figures, 3 tables. Preprint

  28. arXiv:2205.09707  [pdf, other]

    cs.IR cs.CL

    PLAID: An Efficient Engine for Late Interaction Retrieval

    Authors: Keshav Santhanam, Omar Khattab, Christopher Potts, Matei Zaharia

    Abstract: Pre-trained language models are increasingly important components across multiple information retrieval (IR) paradigms. Late interaction, introduced with the ColBERT model and recently refined in ColBERTv2, is a popular paradigm that holds state-of-the-art status across many benchmarks. To dramatically speed up the search latency of late interaction, we introduce the Performance-optimized Late Int…

    Submitted 19 May, 2022; originally announced May 2022.

    Comments: Preprint. Omar and Keshav contributed equally to this work

  29. arXiv:2204.13737  [pdf, other]

    cs.CR

    Extricating IoT Devices from Vendor Infrastructure with Karl

    Authors: Gina Yuan, David Mazières, Matei Zaharia

    Abstract: Most consumer IoT devices are vertically integrated with cloud-side infrastructure. Such architectures present enormous risk to user data, exacerbated by vendor heterogeneity and the inability for users to audit cloud-side activity. A more promising approach would be to leverage local hardware, providing users control over how their data is processed and why it can be shared with other devices or…

    Submitted 31 May, 2023; v1 submitted 28 April, 2022; originally announced April 2022.

  30. arXiv:2201.05797  [pdf, other]

    cs.DB

    Finding Label and Model Errors in Perception Data With Learned Observation Assertions

    Authors: Daniel Kang, Nikos Arechiga, Sudeep Pillai, Peter Bailis, Matei Zaharia

    Abstract: ML is being deployed in complex, real-world scenarios where errors have impactful consequences. In these systems, thorough testing of the ML pipelines is critical. A key component in ML deployment pipelines is the curation of labeled training data. Common practice in the ML literature assumes that labels are the ground truth. However, in our experience in a large autonomous vehicle development cen…

    Submitted 15 January, 2022; originally announced January 2022.

    Journal ref: SIGMOD 2022

  31. arXiv:2112.06439  [pdf, other]

    cs.LG cs.DB

    What can Data-Centric AI Learn from Data and ML Engineering?

    Authors: Neoklis Polyzotis, Matei Zaharia

    Abstract: Data-centric AI is a new and exciting research topic in the AI community, but many organizations already build and maintain various "data-centric" applications whose goal is to produce high quality data. These range from traditional business data processing applications (e.g., "how much should we charge each of our customers this month?") to production ML systems such as recommendation engines. Th…

    Submitted 13 December, 2021; originally announced December 2021.

  32. arXiv:2112.01488  [pdf, other]

    cs.IR cs.CL

    ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction

    Authors: Keshav Santhanam, Omar Khattab, Jon Saad-Falcon, Christopher Potts, Matei Zaharia

    Abstract: Neural information retrieval (IR) has greatly advanced search and other knowledge-intensive language tasks. While many neural IR methods encode queries and documents into single-vector representations, late interaction models produce multi-vector representations at the granularity of each token and decompose relevance modeling into scalable token-level computations. This decomposition has been sho…

    Submitted 10 July, 2022; v1 submitted 2 December, 2021; originally announced December 2021.

    Comments: NAACL 2022. Omar and Keshav contributed equally to this work

  33. arXiv:2111.10320  [pdf, other]

    cs.CV cs.LG

    Toward Compact Parameter Representations for Architecture-Agnostic Neural Network Compression

    Authors: Yuezhou Sun, Wenlong Zhao, Lijun Zhang, Xiao Liu, Hui Guan, Matei Zaharia

    Abstract: This paper investigates deep neural network (DNN) compression from the perspective of compactly representing and storing trained parameters. We explore the previously overlooked opportunity of cross-layer architecture-agnostic representation sharing for DNN parameters. To do this, we decouple feedforward parameters from DNN architectures and leverage additive quantization, an extreme lossy compres…

    Submitted 19 November, 2021; originally announced November 2021.

  34. arXiv:2111.05426  [pdf, other]

    cs.LG cs.DC

    DistIR: An Intermediate Representation and Simulator for Efficient Neural Network Distribution

    Authors: Keshav Santhanam, Siddharth Krishna, Ryota Tomioka, Tim Harris, Matei Zaharia

    Abstract: The rapidly growing size of deep neural network (DNN) models and datasets has given rise to a variety of distribution strategies such as data, tensor-model, pipeline parallelism, and hybrid combinations thereof. Each of these strategies offers its own trade-offs and exhibits optimal performance across different models and hardware topologies. Selecting the best set of strategies for a given setup…

    Submitted 9 November, 2021; originally announced November 2021.

  35. arXiv:2110.11927  [pdf, other]

    cs.DC

    Solving Large-Scale Granular Resource Allocation Problems Efficiently with POP

    Authors: Deepak Narayanan, Fiodar Kazhamiaka, Firas Abuzaid, Peter Kraft, Akshay Agrawal, Srikanth Kandula, Stephen Boyd, Matei Zaharia

    Abstract: Resource allocation problems in many computer systems can be formulated as mathematical optimization problems. However, finding exact solutions to these problems using off-the-shelf solvers is often intractable for large problem sizes with tight SLAs, leading system designers to rely on cheap, heuristic algorithms. We observe, however, that many allocation problems are granular: they consist of a…

    Submitted 22 October, 2021; originally announced October 2021.

    Comments: Accepted to SOSP 2021 (extended version)

  36. arXiv:2110.07752  [pdf, other]

    cs.CL cs.IR

    Hindsight: Posterior-guided training of retrievers for improved open-ended generation

    Authors: Ashwin Paranjape, Omar Khattab, Christopher Potts, Matei Zaharia, Christopher D. Manning

    Abstract: Many text generation systems benefit from using a retriever to retrieve passages from a textual knowledge corpus (e.g., Wikipedia) which are then provided as additional context to the generator. For open-ended generation tasks (like generating informative utterances in conversations) many varied passages may be equally relevant and we find that existing methods that jointly train the retriever and…

    Submitted 20 October, 2021; v1 submitted 14 October, 2021; originally announced October 2021.

  37. arXiv:2108.07258  [pdf, other]

    cs.LG cs.AI cs.CY

    On the Opportunities and Risks of Foundation Models

    Authors: Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, et al. (89 additional authors not shown)

    Abstract: AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their cap…

    Submitted 12 July, 2022; v1 submitted 16 August, 2021; originally announced August 2021.

    Comments: Authored by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI). Report page with citation guidelines: https://crfm.stanford.edu/report.html

  38. Accelerating Approximate Aggregation Queries with Expensive Predicates

    Authors: Daniel Kang, John Guibas, Peter Bailis, Tatsunori Hashimoto, Yi Sun, Matei Zaharia

    Abstract: Researchers and industry analysts are increasingly interested in computing aggregation queries over large, unstructured datasets with selective predicates that are computed using expensive deep neural networks (DNNs). As these DNNs are expensive and because many applications can tolerate approximate answers, analysts are interested in accelerating these queries via approximations. Unfortunately, s…

    Submitted 13 August, 2021; originally announced August 2021.

    Journal ref: PVLDB, 14(11): 2341 - 2354, 2021

  39. arXiv:2107.14203  [pdf, other]

    stat.ML cs.AI cs.LG stat.AP

    Did the Model Change? Efficiently Assessing Machine Learning API Shifts

    Authors: Lingjiao Chen, Tracy Cai, Matei Zaharia, James Zou

    Abstract: Machine learning (ML) prediction APIs are increasingly widely used. An ML API can change over time due to model updates or retraining. This presents a key challenge in the usage of the API because it is often not clear to the user if and how the ML model has changed. Model shifts can affect downstream application performance and also create oversight issues (e.g. if consistency is desired). In thi…

    Submitted 29 July, 2021; originally announced July 2021.

  40. arXiv:2107.12525  [pdf, ps, other]

    math.ST cs.DB cs.LG stat.ML

    Proof: Accelerating Approximate Aggregation Queries with Expensive Predicates

    Authors: Daniel Kang, John Guibas, Peter Bailis, Tatsunori Hashimoto, Yi Sun, Matei Zaharia

    Abstract: Given a dataset $\mathcal{D}$, we are interested in computing the mean of a subset of $\mathcal{D}$ which matches a predicate. ABae leverages stratified sampling and proxy models to efficiently compute this statistic given a sampling budget $N$. In this document, we theoretically analyze ABae and show that the MSE of the estimate decays at rate $O(N_1^{-1} + N_2^{-1} + N_1^{1/2}N_2^{-3/2})$, where…

    Submitted 28 July, 2021; v1 submitted 26 July, 2021; originally announced July 2021.

  41. arXiv:2104.06513  [pdf, other]

    cs.DC

    Don't Give Up on Large Optimization Problems; POP Them!

    Authors: Deepak Narayanan, Fiodar Kazhamiaka, Firas Abuzaid, Peter Kraft, Matei Zaharia

    Abstract: Resource allocation problems in many computer systems can be formulated as mathematical optimization problems. However, finding exact solutions to these problems using off-the-shelf solvers in an online setting is often intractable for "hyper-scale" system sizes with tight SLAs, leading system designers to rely on cheap, heuristic algorithms. In this work, we explore an alternative approach that r…

    Submitted 13 April, 2021; originally announced April 2021.

  42. arXiv:2104.04473  [pdf, other]

    cs.CL cs.DC

    Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM

    Authors: Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick LeGresley, Mostofa Patwary, Vijay Anand Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, Julie Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei Zaharia

    Abstract: Large language models have led to state-of-the-art accuracies across a range of tasks. However, training these models efficiently is challenging for two reasons: a) GPU memory capacity is limited, making it impossible to fit large models on even a multi-GPU server, and b) the number of compute operations required to train these models can result in unrealistically long training times. Consequently…

    Submitted 23 August, 2021; v1 submitted 9 April, 2021; originally announced April 2021.

    Comments: Accepted to SC 2021

  43. arXiv:2104.00282  [pdf, other]

    math.OC cs.DC

    Allocation of Fungible Resources via a Fast, Scalable Price Discovery Method

    Authors: Akshay Agrawal, Stephen Boyd, Deepak Narayanan, Fiodar Kazhamiaka, Matei Zaharia

    Abstract: We consider the problem of assigning or allocating resources to a set of jobs. We consider the case when the resources are fungible, that is, the job can be done with any mix of the resources, but with different efficiencies. In our formulation we maximize a total utility subject to a given limit on the resource usage, which is a convex optimization problem and so is tractable. In this paper we de…

    Submitted 18 April, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

  44. arXiv:2102.09127  [pdf, other]

    cs.LG cs.AI cs.DS math.OC

    Efficient Online ML API Selection for Multi-Label Classification Tasks

    Authors: Lingjiao Chen, Matei Zaharia, James Zou

    Abstract: Multi-label classification tasks such as OCR and multi-object recognition are a major focus of the growing machine learning as a service industry. While many multi-label prediction APIs are available, it is challenging for users to decide which API to use for their own data and budget, due to the heterogeneity in those APIs' price and performance. Recent work shows how to select from single-label…

    Submitted 16 July, 2022; v1 submitted 17 February, 2021; originally announced February 2021.

    Comments: Accepted to ICML 2022

  45. arXiv:2101.00436  [pdf, other]

    cs.CL cs.IR

    Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval

    Authors: Omar Khattab, Christopher Potts, Matei Zaharia

    Abstract: Multi-hop reasoning (i.e., reasoning across two or more documents) is a key ingredient for NLP models that leverage large corpora to exhibit broad knowledge. To retrieve evidence passages, multi-hop models must contend with a fast-growing search space across the hops, represent complex queries that combine multiple information needs, and resolve ambiguity about the best order in which to hop betwe…

    Submitted 10 July, 2022; v1 submitted 2 January, 2021; originally announced January 2021.

    Comments: NeurIPS 2021 (Spotlight)

  46. arXiv:2009.04540  [pdf, other]

    cs.DB

    Semantic Indexes for Machine Learning-based Queries over Unstructured Data

    Authors: Daniel Kang, John Guibas, Peter Bailis, Tatsunori Hashimoto, Matei Zaharia

    Abstract: Unstructured data (e.g., video or text) is now commonly queried by using computationally expensive deep neural networks or human labelers to produce structured information, e.g., object types and positions in video. To accelerate queries, many recent systems (e.g., BlazeIt, NoScope, Tahoma, SUPG, etc.) train a query-specific proxy model to approximate large target labelers (i.e., these expensive…

    Submitted 6 January, 2022; v1 submitted 9 September, 2020; originally announced September 2020.

    Journal ref: SIGMOD 2022

  47. arXiv:2008.09213  [pdf, other]

    cs.DC

    Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads

    Authors: Deepak Narayanan, Keshav Santhanam, Fiodar Kazhamiaka, Amar Phanishayee, Matei Zaharia

    Abstract: Specialized accelerators such as GPUs, TPUs, FPGAs, and custom ASICs have been increasingly deployed to train deep learning models. These accelerators exhibit heterogeneous performance behavior across model architectures. Existing schedulers for clusters of accelerators, which are used to arbitrate these expensive training resources across many users, have shown how to optimize for various multi-j…

    Submitted 20 August, 2020; originally announced August 2020.

  48. arXiv:2007.13005  [pdf, other]

    cs.DB cs.CV

    Jointly Optimizing Preprocessing and Inference for DNN-based Visual Analytics

    Authors: Daniel Kang, Ankit Mathur, Teja Veeramacheneni, Peter Bailis, Matei Zaharia

    Abstract: While deep neural networks (DNNs) are an increasingly popular way to query large corpora of data, their significant runtime remains an active area of research. As a result, researchers have proposed systems and optimizations to reduce these costs by allowing users to trade off accuracy and speed. In this work, we examine end-to-end DNN execution in visual analytics systems on modern accelerators.…

    Submitted 25 July, 2020; originally announced July 2020.

  49. arXiv:2007.11112  [pdf, other]

    cs.OS cs.AR cs.DB cs.DC cs.NI

    DBOS: A Proposal for a Data-Centric Operating System

    Authors: Michael Cafarella, David DeWitt, Vijay Gadepally, Jeremy Kepner, Christos Kozyrakis, Tim Kraska, Michael Stonebraker, Matei Zaharia

    Abstract: Current operating systems are complex systems that were designed before today's computing environments. This makes it difficult for them to meet the scalability, heterogeneity, availability, and security challenges in current cloud and parallel computing environments. To address these problems, we propose a radically new OS design based on data-centric architecture: all operating system state shou…

    Submitted 21 July, 2020; originally announced July 2020.

  50. arXiv:2007.00814  [pdf, other]

    cs.CL cs.IR

    Relevance-guided Supervision for OpenQA with ColBERT

    Authors: Omar Khattab, Christopher Potts, Matei Zaharia

    Abstract: Systems for Open-Domain Question Answering (OpenQA) generally depend on a retriever for finding candidate passages in a large corpus and a reader for extracting answers from those passages. In much recent work, the retriever is a learned component that uses coarse-grained vector representations of questions and passages. We argue that this modeling choice is insufficiently expressive for dealing w…

    Submitted 2 August, 2021; v1 submitted 1 July, 2020; originally announced July 2020.

    Comments: Accepted for publication in Transactions of the Association for Computational Linguistics (TACL), 2021. Author's final version. Oral presentation at ACL'21