Showing 1–22 of 22 results for author: Ragan-Kelley, J

  1. arXiv:2403.05334

    cs.PL cs.AI cs.HC

    WatChat: Explaining perplexing programs by debugging mental models

    Authors: Kartik Chandra, Tzu-Mao Li, Rachit Nigam, Joshua Tenenbaum, Jonathan Ragan-Kelley

    Abstract: Often, a good explanation for a program's unexpected behavior is a bug in the programmer's code. But sometimes, an even better explanation is a bug in the programmer's mental model of the language they are using. Instead of merely debugging our current code ("giving the programmer a fish"), what if our tools could directly debug our mental models ("teaching the programmer to fish")? In this paper,…

    Submitted 8 March, 2024; originally announced March 2024.

  2. arXiv:2402.05109

    cs.LG

    Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding

    Authors: Zachary Ankner, Rishab Parthasarathy, Aniruddha Nrusimha, Christopher Rinard, Jonathan Ragan-Kelley, William Brandon

    Abstract: To combat the memory bandwidth-bound nature of autoregressive LLM inference, previous research has proposed the speculative decoding framework. To perform speculative decoding, a small draft model proposes candidate continuations of the input sequence, that are then verified in parallel by the base model. One way to specify the draft model, as used in the recent Medusa decoding framework, is as a…

    Submitted 7 February, 2024; originally announced February 2024.

  3. arXiv:2312.04709

    cs.LG cs.NE

    How to guess a gradient

    Authors: Utkarsh Singhal, Brian Cheung, Kartik Chandra, Jonathan Ragan-Kelley, Joshua B. Tenenbaum, Tomaso A. Poggio, Stella X. Yu

    Abstract: How much can you say about the gradient of a neural network without computing a loss or knowing the label? This may sound like a strange question: surely the answer is "very little." However, in this paper, we show that gradients are more structured than previously thought. Gradients lie in a predictable low-dimensional subspace which depends on the network architecture and incoming features. Expl…

    Submitted 7 December, 2023; originally announced December 2023.

  4. arXiv:2311.09431

    cs.LG cs.CL

    Striped Attention: Faster Ring Attention for Causal Transformers

    Authors: William Brandon, Aniruddha Nrusimha, Kevin Qian, Zachary Ankner, Tian Jin, Zhiye Song, Jonathan Ragan-Kelley

    Abstract: To help address the growing demand for ever-longer sequence lengths in transformer models, Liu et al. recently proposed Ring Attention, an exact attention algorithm capable of overcoming per-device memory bottlenecks by distributing self-attention across multiple devices. In this paper, we study the performance characteristics of Ring Attention in the important special case of causal transformer…

    Submitted 15 November, 2023; originally announced November 2023.

  5. arXiv:2310.04680

    cs.CL cs.AI cs.LG

    The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning

    Authors: Tian Jin, Nolan Clement, Xin Dong, Vaishnavh Nagarajan, Michael Carbin, Jonathan Ragan-Kelley, Gintare Karolina Dziugaite

    Abstract: How does scaling the number of parameters in large language models (LLMs) affect their core capabilities? We study two natural scaling techniques -- weight pruning and simply training a smaller or larger model, which we refer to as dense scaling -- and their effects on two core capabilities of LLMs: (a) recalling facts presented during pre-training and (b) processing information presented in-conte…

    Submitted 6 October, 2023; originally announced October 2023.

  6. arXiv:2306.07961

    stat.ML cs.LG stat.CO stat.ME

    Differentiating Metropolis-Hastings to Optimize Intractable Densities

    Authors: Gaurav Arya, Ruben Seyer, Frank Schäfer, Kartik Chandra, Alexander K. Lew, Mathieu Huot, Vikash K. Mansinghka, Jonathan Ragan-Kelley, Christopher Rackauckas, Moritz Schauer

    Abstract: We develop an algorithm for automatic differentiation of Metropolis-Hastings samplers, allowing us to differentiate through probabilistic inference, even if the model has discrete components within it. Our approach fuses recent advances in stochastic automatic differentiation with traditional Markov chain coupling schemes, providing an unbiased and low-variance gradient estimator. This allows us t…

    Submitted 30 June, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: 6 pages, 6 figures; accepted at Differentiable Almost Everything Workshop of ICML 2023

  7. arXiv:2305.17195

    cs.AI cs.GR cs.RO

    Inferring the Future by Imagining the Past

    Authors: Kartik Chandra, Tony Chen, Tzu-Mao Li, Jonathan Ragan-Kelley, Josh Tenenbaum

    Abstract: A single panel of a comic book can say a lot: it can depict not only where the characters currently are, but also their motions, their motivations, their emotions, and what they might do next. More generally, humans routinely infer complex sequences of past and future events from a *static snapshot* of a *dynamic scene*, even in situations they have never seen before. In this paper, we model how…

    Submitted 30 October, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

    ACM Class: I.2.10; I.2.9; J.4; I.3.6

  8. Acting as Inverse Inverse Planning

    Authors: Kartik Chandra, Tzu-Mao Li, Josh Tenenbaum, Jonathan Ragan-Kelley

    Abstract: Great storytellers know how to take us on a journey. They direct characters to act -- not necessarily in the most rational way -- but rather in a way that leads to interesting situations, and ultimately creates an impactful experience for audience members looking on. If audience experience is what matters most, then can we help artists and animators *directly* craft such experiences, independent…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: SIGGRAPH '23

    ACM Class: I.3.0; I.2.0

  9. arXiv:2210.15740

    cs.PL

    Formal Semantics for the Halide Language

    Authors: Alex Reinking, Gilbert Louis Bernstein, Jonathan Ragan-Kelley

    Abstract: We present the first formalization and metatheory of language soundness for a user-schedulable language, the widely used array processing language Halide. User-schedulable languages strike a balance between abstraction and control in high-performance computing by separating the specification of what a program should compute from a schedule for how to compute it. In the process, they make a novel l…

    Submitted 27 October, 2022; originally announced October 2022.

    Comments: 27 pages, 12 figures

    ACM Class: D.3.1

  10. arXiv:2204.12301

    cs.GR cs.AI cs.LG

    Designing Perceptual Puzzles by Differentiating Probabilistic Programs

    Authors: Kartik Chandra, Tzu-Mao Li, Joshua Tenenbaum, Jonathan Ragan-Kelley

    Abstract: We design new visual illusions by finding "adversarial examples" for principled models of human perception -- specifically, for probabilistic models, which treat vision as Bayesian inference. To perform this search efficiently, we design a differentiable probabilistic programming language, whose API exposes MCMC inference as a first-class differentiable function. We demonstrate our method by autom…

    Submitted 26 April, 2022; originally announced April 2022.

    Comments: 9 pages; 3 figures; SIGGRAPH '22 Conference Proceedings

  11. arXiv:2107.12567

    cs.HC cs.PL

    Guided Optimization for Image Processing Pipelines

    Authors: Yuka Ikarashi, Jonathan Ragan-Kelley, Tsukasa Fukusato, Jun Kato, Takeo Igarashi

    Abstract: Writing high-performance image processing code is challenging and labor-intensive. The Halide programming language simplifies this task by decoupling high-level algorithms from "schedules" which optimize their implementation. However, even with this abstraction, it is still challenging for Halide programmers to understand complicated scheduling strategies and productively write valid, optimized sc…

    Submitted 27 July, 2021; v1 submitted 26 July, 2021; originally announced July 2021.

  12. arXiv:2104.05372

    cs.PL

    Getting to the Point. Index Sets and Parallelism-Preserving Autodiff for Pointful Array Programming

    Authors: Adam Paszke, Daniel Johnson, David Duvenaud, Dimitrios Vytiniotis, Alexey Radul, Matthew Johnson, Jonathan Ragan-Kelley, Dougal Maclaurin

    Abstract: We present a novel programming language design that attempts to combine the clarity and safety of high-level functional languages with the efficiency and parallelism of low-level numerical languages. We treat arrays as eagerly-memoized functions on typed index sets, allowing abstract function manipulations, such as currying, to work on arrays. In contrast to composing primitive bulk-array operatio…

    Submitted 12 April, 2021; originally announced April 2021.

    Comments: 31 pages with appendix, 11 figures. A conference submission is still under review

  13. arXiv:2012.07145

    cs.PL

    Efficient Automatic Scheduling of Imaging and Vision Pipelines for the GPU

    Authors: Luke Anderson, Andrew Adams, Karima Ma, Tzu-Mao Li, Tian Jin, Jonathan Ragan-Kelley

    Abstract: We present a new algorithm to quickly generate high-performance GPU implementations of complex imaging and vision pipelines, directly from high-level Halide algorithm code. It is fully automatic, requiring no schedule templates or hand-optimized kernels. We address the scalability challenge of extending search-based automatic scheduling to map large real-world programs to the deep hierarchies of m…

    Submitted 27 August, 2023; v1 submitted 13 December, 2020; originally announced December 2020.

    Comments: Revised version published at OOPSLA 2021

  14. arXiv:2008.11256

    cs.PL cs.GR

    Differentiating a Tensor Language

    Authors: Gilbert Bernstein, Michael Mara, Tzu-Mao Li, Dougal Maclaurin, Jonathan Ragan-Kelley

    Abstract: How does one compile derivatives of tensor programs, such that the resulting code is purely functional (hence easier to optimize and parallelize) and provably efficient relative to the original program? We show that naively differentiating tensor code---as done in popular systems like Tensorflow and PyTorch---can cause asymptotic slowdowns in pathological cases, violating the Cheap Gradients Princ…

    Submitted 25 August, 2020; originally announced August 2020.

    Comments: In-progress Draft; unsubmitted

  15. arXiv:2003.02237

    cs.LG stat.ML

    Neural Kernels Without Tangents

    Authors: Vaishaal Shankar, Alex Fang, Wenshuo Guo, Sara Fridovich-Keil, Ludwig Schmidt, Jonathan Ragan-Kelley, Benjamin Recht

    Abstract: We investigate the connections between neural networks and simple building blocks in kernel space. In particular, using well established feature space tools such as direct sum, averaging, and moment lifting, we present an algebra for creating "compositional" kernels from bags of features. We show that these operations correspond to many of the building blocks of "neural tangent kernels (NTK)". Exp…

    Submitted 5 March, 2020; v1 submitted 4 March, 2020; originally announced March 2020.

    Comments: code used to produce our results can be found at: https://github.com/modestyachts/neural_kernels_code

  16. arXiv:1911.09925

    cs.DC cs.AR cs.LG cs.PF

    Gemmini: Enabling Systematic Deep-Learning Architecture Evaluation via Full-Stack Integration

    Authors: Hasan Genc, Seah Kim, Alon Amid, Ameer Haj-Ali, Vighnesh Iyer, Pranav Prakash, Jerry Zhao, Daniel Grubb, Harrison Liew, Howard Mao, Albert Ou, Colin Schmidt, Samuel Steffl, John Wright, Ion Stoica, Jonathan Ragan-Kelley, Krste Asanovic, Borivoje Nikolic, Yakun Sophia Shao

    Abstract: DNN accelerators are often developed and evaluated in isolation without considering the cross-stack, system-level effects in real-world environments. This makes it difficult to appreciate the impact of System-on-Chip (SoC) resource contention, OS overheads, and programming-stack inefficiencies on overall performance/energy-efficiency. To address this challenge, we present Gemmini, an open-source*,…

    Submitted 9 July, 2021; v1 submitted 22 November, 2019; originally announced November 2019.

    Comments: To appear at the 58th IEEE/ACM Design Automation Conference (DAC), December 2021, San Francisco, CA, USA

  17. arXiv:1910.00935

    cs.LG cs.GR physics.comp-ph stat.ML

    DiffTaichi: Differentiable Programming for Physical Simulation

    Authors: Yuanming Hu, Luke Anderson, Tzu-Mao Li, Qi Sun, Nathan Carr, Jonathan Ragan-Kelley, Frédo Durand

    Abstract: We present DiffTaichi, a new differentiable programming language tailored for building high-performance differentiable physical simulators. Based on an imperative programming language, DiffTaichi generates gradients of simulation steps using source code transformations that preserve arithmetic intensity and parallelism. A light-weight tape is used to record the whole simulation program structure a…

    Submitted 14 February, 2020; v1 submitted 1 October, 2019; originally announced October 2019.

    Comments: Published at ICLR 2020

  18. arXiv:1909.13371

    cs.LG stat.ML

    Gradient Descent: The Ultimate Optimizer

    Authors: Kartik Chandra, Audrey Xie, Jonathan Ragan-Kelley, Erik Meijer

    Abstract: Working with any gradient-based machine learning algorithm involves the tedious task of tuning the optimizer's hyperparameters, such as its step size. Recent work has shown how the step size can itself be optimized alongside the model parameters by manually deriving expressions for "hypergradients" ahead of time. We show how to automatically compute hypergradients with a simple and elegant modif…

    Submitted 14 October, 2022; v1 submitted 29 September, 2019; originally announced September 2019.

  19. arXiv:1810.09679

    cs.DC

    numpywren: serverless linear algebra

    Authors: Vaishaal Shankar, Karl Krauth, Qifan Pu, Eric Jonas, Shivaram Venkataraman, Ion Stoica, Benjamin Recht, Jonathan Ragan-Kelley

    Abstract: Linear algebra operations are widely used in scientific computing and machine learning applications. However, it is challenging for scientists and data analysts to run linear algebra at scales beyond a single machine. Traditional approaches either require access to supercomputing clusters, or impose configuration and cluster management challenges. In this paper we show how the disaggregation of st…

    Submitted 23 October, 2018; originally announced October 2018.

  20. arXiv:1610.09405

    cs.SE

    Programming Heterogeneous Systems from an Image Processing DSL

    Authors: Jing Pu, Steven Bell, Xuan Yang, Jeff Setter, Stephen Richardson, Jonathan Ragan-Kelley, Mark Horowitz

    Abstract: Specialized image processing accelerators are necessary to deliver the performance and energy efficiency required by important applications in computer vision, computational photography, and augmented reality. But creating, "programming," and integrating this hardware into a hardware/software system is difficult. We address this problem by extending the image processing language, Halide, so users c…

    Submitted 28 October, 2016; originally announced October 2016.

  21. arXiv:1606.04209

    cs.DC cs.NE

    A Systematic Approach to Blocking Convolutional Neural Networks

    Authors: Xuan Yang, Jing Pu, Blaine Burton Rister, Nikhil Bhagdikar, Stephen Richardson, Shahar Kvatinsky, Jonathan Ragan-Kelley, Ardavan Pedram, Mark Horowitz

    Abstract: Convolutional Neural Networks (CNNs) are the state of the art solution for many computer vision problems, and many researchers have explored optimized implementations. Most implementations heuristically block the computation to deal with the large data sizes and high data reuse of CNNs. This paper explores how to block CNN computations for memory locality by creating an analytical model for CNN-li…

    Submitted 14 June, 2016; originally announced June 2016.

  22. arXiv:1604.06525

    cs.GR cs.CV cs.PL

    Opt: A Domain Specific Language for Non-linear Least Squares Optimization in Graphics and Imaging

    Authors: Zachary DeVito, Michael Mara, Michael Zollhöfer, Gilbert Bernstein, Jonathan Ragan-Kelley, Christian Theobalt, Pat Hanrahan, Matthew Fisher, Matthias Nießner

    Abstract: Many graphics and vision problems can be expressed as non-linear least squares optimizations of objective functions over visual data, such as images and meshes. The mathematical descriptions of these functions are extremely concise, but their implementation in real code is tedious, especially when optimized for real-time performance on modern GPUs in interactive applications. In this work, we prop…

    Submitted 9 September, 2017; v1 submitted 21 April, 2016; originally announced April 2016.