
Showing 1–17 of 17 results for author: Paszke, A

Searching in archive cs.
  1. arXiv:2404.07839

    cs.LG cs.AI cs.CL

    RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

    Authors: Aleksandar Botev, Soham De, Samuel L Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Sertan Girgin, Olivier Bachem, Alek Andreev, Kathleen Kenealy, Thomas Mesnard, Cassidy Hardin, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, et al. (37 additional authors not shown)

    Abstract: We introduce RecurrentGemma, an open language model which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide a pre-trained model with 2B non-embedding parameters, and an instruction tuned var…

    Submitted 11 April, 2024; originally announced April 2024.
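    The fixed-size state can be illustrated with a toy elementwise linear recurrence, h_t = a * h_{t-1} + (1 - a) * x_t. This is a simplified sketch in plain Python under stated assumptions, not Griffin's actual recurrent block: the constant decay vector `a` and the function name `linear_recurrence` are hypothetical. However long the input sequence is, the only thing carried from one step to the next is one vector with the same width as an input.

```python
from typing import List, Sequence, Tuple


def linear_recurrence(
    xs: Sequence[Sequence[float]], a: Sequence[float]
) -> Tuple[List[List[float]], List[float]]:
    """Scan h_t = a * h_{t-1} + (1 - a) * x_t elementwise, starting from h_0 = 0.

    Hypothetical toy layer. Returns every per-step output and the final state;
    the state is a single vector of len(a) floats regardless of len(xs).
    """
    h = [0.0] * len(a)
    outputs = []
    for x in xs:
        # Elementwise (diagonal) update: no attention over earlier tokens.
        h = [ai * hi + (1.0 - ai) * xi for ai, hi, xi in zip(a, h, x)]
        outputs.append(list(h))
    return outputs, h


if __name__ == "__main__":
    decay = [0.5, 0.9]  # hypothetical per-channel decay rates in (0, 1)
    outs, state = linear_recurrence([[1.0, 1.0], [0.0, 0.0], [2.0, 2.0]], decay)
    print(outs)
    print(len(state))  # width of the state, independent of sequence length
```

    Because the update is elementwise, the memory needed for the state stays constant in sequence length, whereas a global-attention key/value cache grows with every token. The local attention that Griffin also uses is omitted from this sketch.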

  2. arXiv:2403.05530

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  3. arXiv:2401.11202

    cs.LG cs.DC cs.PL

    PartIR: Composing SPMD Partitioning Strategies for Machine Learning

    Authors: Sami Alabed, Daniel Belov, Bart Chrzaszcz, Juliana Franco, Dominik Grewe, Dougal Maclaurin, James Molloy, Tom Natan, Tamara Norman, Xiaoyue Pan, Adam Paszke, Norman A. Rink, Michael Schaarschmidt, Timur Sitdikov, Agnieszka Swietlik, Dimitrios Vytiniotis, Joel Wee

    Abstract: Training of modern large neural networks (NN) requires a combination of parallelization strategies encompassing data, model, or optimizer sharding. When strategies increase in complexity, it becomes necessary for partitioning tools to be 1) expressive, allowing the composition of simpler strategies, and 2) predictable to estimate performance analytically. We present PartIR, our design for a NN par…

    Submitted 3 March, 2024; v1 submitted 20 January, 2024; originally announced January 2024.

  4. arXiv:2312.11805

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  5. arXiv:2210.04729

    cs.PL

    The Foil: Capture-Avoiding Substitution With No Sharp Edges

    Authors: Dougal Maclaurin, Alexey Radul, Adam Paszke

    Abstract: Correctly manipulating program terms in a compiler is surprisingly difficult because of the need to avoid name capture. The rapier from "Secrets of the Glasgow Haskell Compiler inliner" is a cutting-edge technique for fast, stateless capture-avoiding substitution for expressions represented with explicit names. It is, however, a sharp tool: its invariants are tricky and need to be maintained throu…

    Submitted 10 October, 2022; originally announced October 2022.

    Comments: Presented at IFL 2022
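    To make "name capture" concrete: substituting `y` for `x` in `λy. x` must not produce `λy. y`. A minimal sketch of the textbook fix, assuming a tiny untyped lambda calculus with string names, renames a binder to a fresh name whenever it would capture a free variable of the term being substituted in. This is deliberately the slow, naive technique; it is neither the rapier nor the Foil, and every name below (`Var`, `Lam`, `App`, `free_vars`, `fresh`, `subst`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Set, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: "Term"


@dataclass(frozen=True)
class App:
    fn: "Term"
    arg: "Term"


Term = Union[Var, Lam, App]


def free_vars(t: Term) -> Set[str]:
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, Lam):
        return free_vars(t.body) - {t.param}
    return free_vars(t.fn) | free_vars(t.arg)


def fresh(base: str, avoid: Set[str]) -> str:
    """Append primes to `base` until it clashes with no name in `avoid`."""
    name = base
    while name in avoid:
        name += "'"
    return name


def subst(t: Term, x: str, s: Term) -> Term:
    """Capture-avoiding substitution t[x := s]."""
    if isinstance(t, Var):
        return s if t.name == x else t
    if isinstance(t, App):
        return App(subst(t.fn, x, s), subst(t.arg, x, s))
    # t is a Lam.
    if t.param == x:
        # x is shadowed by this binder, so nothing below refers to the outer x.
        return t
    fv_s = free_vars(s)
    if t.param in fv_s:
        # The binder would capture a free variable of s: rename it first.
        new = fresh(t.param, fv_s | free_vars(t.body) | {x})
        renamed_body = subst(t.body, t.param, Var(new))
        return Lam(new, subst(renamed_body, x, s))
    return Lam(t.param, subst(t.body, x, s))
```

    With this sketch, `subst(Lam("y", Var("x")), "x", Var("y"))` renames the binder and returns `Lam("y'", Var("y"))` rather than the captured `Lam("y", Var("y"))`. Recomputing free-variable sets at every binder is what makes the naive approach slow, which is the cost that faster techniques such as the rapier avoid.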

  6. arXiv:2204.10923

    cs.PL

    You Only Linearize Once: Tangents Transpose to Gradients

    Authors: Alexey Radul, Adam Paszke, Roy Frostig, Matthew Johnson, Dougal Maclaurin

    Abstract: Automatic differentiation (AD) is conventionally understood as a family of distinct algorithms, rooted in two "modes" -- forward and reverse -- which are typically presented (and implemented) separately. Can there be only one? Following up on the AD systems developed in the JAX and Dex projects, we formalize a decomposition of reverse-mode AD into (i) forward-mode AD followed by (ii) unzipping the…

    Submitted 6 December, 2022; v1 submitted 22 April, 2022; originally announced April 2022.
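    Step (i), forward-mode AD, can be sketched with dual numbers: every value carries a primal and a tangent, and each primitive propagates both. The `Dual` class and `jvp` helper below are hypothetical illustrations in plain Python, not the JAX or Dex implementation. The sketch computes only Jacobian-vector products; the point of the decomposition is that reverse mode is then built on top of this single set of forward rules rather than implemented separately, which the sketch does not attempt.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Dual:
    primal: float
    tangent: float

    def __add__(self, other: "Dual") -> "Dual":
        return Dual(self.primal + other.primal, self.tangent + other.tangent)

    def __mul__(self, other: "Dual") -> "Dual":
        # Product rule: d(u * v) = u * dv + v * du.
        return Dual(
            self.primal * other.primal,
            self.primal * other.tangent + self.tangent * other.primal,
        )


def sin(x: Dual) -> Dual:
    # d(sin u) = cos(u) * du
    return Dual(math.sin(x.primal), math.cos(x.primal) * x.tangent)


def jvp(
    f: Callable[..., Dual], primals: Sequence[float], tangents: Sequence[float]
) -> Tuple[float, float]:
    """Return f(primals) and the derivative of f in the direction `tangents`."""
    out = f(*(Dual(p, t) for p, t in zip(primals, tangents)))
    return out.primal, out.tangent


def f(x: Dual, y: Dual) -> Dual:
    # Example function f(x, y) = x * y + sin(x).
    return x * y + sin(x)


if __name__ == "__main__":
    # Directional derivative along the x axis at (2, 3): y + cos(x).
    print(jvp(f, [2.0, 3.0], [1.0, 0.0]))
```

    Obtaining a full gradient this way needs one `jvp` call per input dimension, which is why reverse mode is worth deriving for functions with many inputs and a scalar output.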

  7. arXiv:2112.02958

    cs.LG cs.DC

    Automap: Towards Ergonomic Automated Parallelism for ML Models

    Authors: Michael Schaarschmidt, Dominik Grewe, Dimitrios Vytiniotis, Adam Paszke, Georg Stefan Schmid, Tamara Norman, James Molloy, Jonathan Godwin, Norman Alexander Rink, Vinod Nair, Dan Belov

    Abstract: The rapid rise in demand for training large neural network architectures has brought into focus the need for partitioning strategies, for example by using data, model, or pipeline parallelism. Implementing these methods is increasingly supported through program primitives, but identifying efficient partitioning strategies requires expensive experimentation and expertise. We present the prototype o…

    Submitted 6 December, 2021; originally announced December 2021.

    Comments: Workshop on ML for Systems at NeurIPS 2021

  8. arXiv:2112.01075

    cs.DC cs.LG cs.PL

    Memory-efficient array redistribution through portable collective communication

    Authors: Norman A. Rink, Adam Paszke, Dimitrios Vytiniotis, Georg Stefan Schmid

    Abstract: Modern large-scale deep learning workloads highlight the need for parallel execution across many devices in order to fit model data into hardware accelerator memories. In these settings, array redistribution may be required during a computation, but can also become a bottleneck if not done efficiently. In this paper we address the problem of redistributing multi-dimensional array data in SPMD comp…

    Submitted 28 November, 2022; v1 submitted 2 December, 2021; originally announced December 2021.

    Comments: minor errata fixed

  9. arXiv:2110.07493

    cs.PL

    Parallel Algebraic Effect Handlers

    Authors: Ningning Xie, Daniel D. Johnson, Dougal Maclaurin, Adam Paszke

    Abstract: Algebraic effects and handlers support composable and structured control-flow abstraction. However, existing designs of algebraic effects often require effects to be executed sequentially. This paper studies parallel algebraic effect handlers. In particular, we formalize λp, an untyped lambda calculus which models two key features, effect handlers and parallelizable computations, the latter of whi…

    Submitted 14 October, 2021; originally announced October 2021.

    Comments: Short paper submitted to the ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation (PEPM) 2022

  10. arXiv:2105.09469

    cs.PL cs.LG

    Decomposing reverse-mode automatic differentiation

    Authors: Roy Frostig, Matthew J. Johnson, Dougal Maclaurin, Adam Paszke, Alexey Radul

    Abstract: We decompose reverse-mode automatic differentiation into (forward-mode) linearization followed by transposition. Doing so isolates the essential difference between forward- and reverse-mode AD, and simplifies their joint implementation. In particular, once forward-mode AD rules are defined for every primitive operation in a source language, only linear primitives require an additional transpositio…

    Submitted 19 May, 2021; originally announced May 2021.

    Comments: Presented at the LAFI 2021 workshop at POPL, 17 January 2021
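    The linearize-then-transpose idea can be sketched in plain Python. Here the linearization of f(x1, x2) = x1 * x2 + sin(x1) at a point is written out by hand (the automatic linearization step is not implemented) as a straight-line program over two linear primitives, scaling by a constant and addition. Running it forward gives a JVP; walking it backwards, with one transpose rule per linear primitive, gives the gradient. The tiny IR and the names `linearize_f`, `run_linear`, and `transpose` are hypothetical; JAX exposes related functionality as `jax.linearize` and `jax.linear_transpose`, which this sketch does not use.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# A linear equation in the hypothetical IR:
#   ("scale", out, (c, src))  means  out = c * src
#   ("add",   out, (a, b))    means  out = a + b
Eqn = Tuple[str, str, tuple]


def linearize_f(x1: float, x2: float) -> List[Eqn]:
    """Hand-written linearization of f(x1, x2) = x1 * x2 + sin(x1) at (x1, x2).

    The non-linear primitives (mul, sin) have been replaced by their derivatives
    evaluated at the primal point, so only linear primitives remain.
    """
    return [
        ("scale", "t1", (x2, "dx1")),            # d(x1 * x2), dx1 part
        ("scale", "t2", (x1, "dx2")),            # d(x1 * x2), dx2 part
        ("add", "t3", ("t1", "t2")),
        ("scale", "t4", (math.cos(x1), "dx1")),  # d(sin x1)
        ("add", "dy", ("t3", "t4")),
    ]


def run_linear(prog: List[Eqn], env: Dict[str, float]) -> Dict[str, float]:
    """Forward evaluation: input tangents to output tangents (a JVP)."""
    env = dict(env)
    for op, out, args in prog:
        if op == "scale":
            c, src = args
            env[out] = c * env[src]
        else:
            a, b = args
            env[out] = env[a] + env[b]
    return env


def transpose(prog: List[Eqn], out_ct: Dict[str, float]) -> Dict[str, float]:
    """Reverse evaluation: output cotangents to input cotangents (a VJP).

    One transpose rule per linear primitive:
      out = c * src  =>  ct[src] += c * ct[out]
      out = a + b    =>  ct[a] += ct[out]; ct[b] += ct[out]
    """
    ct: Dict[str, float] = defaultdict(float, out_ct)
    for op, out, args in reversed(prog):
        g = ct[out]
        if op == "scale":
            c, src = args
            ct[src] += c * g
        else:
            a, b = args
            ct[a] += g
            ct[b] += g
    return ct


if __name__ == "__main__":
    grads = transpose(linearize_f(2.0, 3.0), {"dy": 1.0})
    # Gradient of f at (2, 3) is (x2 + cos(x1), x1).
    print(grads["dx1"], grads["dx2"])
```

    Note that `sin` and multiplication never need transpose rules here: they disappear during linearization, leaving only `scale` and `add`, which mirrors the abstract's point that only linear primitives require an additional rule.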

  11. arXiv:2104.05372

    cs.PL

    Getting to the Point. Index Sets and Parallelism-Preserving Autodiff for Pointful Array Programming

    Authors: Adam Paszke, Daniel Johnson, David Duvenaud, Dimitrios Vytiniotis, Alexey Radul, Matthew Johnson, Jonathan Ragan-Kelley, Dougal Maclaurin

    Abstract: We present a novel programming language design that attempts to combine the clarity and safety of high-level functional languages with the efficiency and parallelism of low-level numerical languages. We treat arrays as eagerly-memoized functions on typed index sets, allowing abstract function manipulations, such as currying, to work on arrays. In contrast to composing primitive bulk-array operatio…

    Submitted 12 April, 2021; originally announced April 2021.

    Comments: 31 pages with appendix, 11 figures. A conference submission is still under review
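    The idea of arrays as eagerly memoized functions on index sets can be mimicked in Python. In this sketch (the names `Table`, `tabulate`, and `curry` are hypothetical, and this is not Dex syntax), a table evaluates a function once on every element of a finite index set, stores the results, and is then indexed like a function call. Currying carries over to arrays directly: a table indexed by pairs `(i, j)` becomes a table of tables.

```python
from itertools import product
from typing import Callable, Dict, Generic, Hashable, Iterable, Tuple, TypeVar

I = TypeVar("I", bound=Hashable)
J = TypeVar("J", bound=Hashable)
V = TypeVar("V")


class Table(Generic[I, V]):
    """A finite function from an index set to values, memoized eagerly."""

    def __init__(self, index_set: Iterable[I], f: Callable[[I], V]) -> None:
        self.index_set = tuple(index_set)
        # Eager memoization: f runs exactly once per index, at construction.
        self._data: Dict[I, V] = {i: f(i) for i in self.index_set}

    def __call__(self, i: I) -> V:
        return self._data[i]


def tabulate(index_set: Iterable[I], f: Callable[[I], V]) -> "Table[I, V]":
    return Table(index_set, f)


def curry(
    t: "Table[Tuple[I, J], V]", rows: Iterable[I], cols: Iterable[J]
) -> "Table[I, Table[J, V]]":
    """Turn a table indexed by pairs into a table of tables."""
    cols = tuple(cols)
    return tabulate(rows, lambda i: tabulate(cols, lambda j: t((i, j))))


if __name__ == "__main__":
    rows, cols = range(2), range(3)
    m = tabulate(product(rows, cols), lambda ij: 10 * ij[0] + ij[1])
    print(curry(m, rows, cols)(1)(2))  # same entry as m((1, 2))
```

    A dictionary-backed table ignores the efficiency side of the design entirely; it only shows how function-level operations such as currying become array operations once arrays are viewed as functions on index sets.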

  12. arXiv:2102.13254

    cs.PL cs.LG

    Tensors Fitting Perfectly

    Authors: Adam Paszke, Brennan Saeta

    Abstract: Multidimensional arrays (NDArrays) are a central abstraction in modern scientific computing environments. Unfortunately, they can make reasoning about programs harder as the number of different array shapes used in an execution of a program is usually very large, and they rarely appear explicitly in program text. To make things worse, many operators make implicit assumptions about the shapes of th…

    Submitted 25 February, 2021; originally announced February 2021.
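    One familiar example of an implicit shape assumption is NumPy-style broadcasting, which silently requires aligned trailing dimensions to be equal or 1. The sketch below (the function name `broadcast_shape` is hypothetical) makes that single rule explicit and reports where shapes disagree. It is only a runtime check of one operator's rule, not the approach developed in the paper.

```python
from itertools import zip_longest
from typing import Sequence, Tuple


def broadcast_shape(a: Sequence[int], b: Sequence[int]) -> Tuple[int, ...]:
    """Result shape of broadcasting `a` with `b` under NumPy's rules.

    Shapes are aligned from the trailing dimension; missing leading dimensions
    count as 1. Each aligned pair must be equal, or one of them must be 1.
    """
    out = []
    pairs = zip_longest(reversed(a), reversed(b), fillvalue=1)
    for axis, (da, db) in enumerate(pairs, start=1):
        if da != db and 1 not in (da, db):
            raise ValueError(
                f"shapes {tuple(a)} and {tuple(b)} disagree at axis -{axis}: "
                f"{da} vs {db}"
            )
        # A size-1 dimension stretches to match the other (including size 0).
        out.append(db if da == 1 else da)
    return tuple(reversed(out))


if __name__ == "__main__":
    print(broadcast_shape((3, 1, 5), (4, 5)))
    try:
        broadcast_shape((2, 3), (3, 2))
    except ValueError as err:
        print(err)
```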

  13. arXiv:2006.15704

    cs.DC cs.LG

    PyTorch Distributed: Experiences on Accelerating Data Parallel Training

    Authors: Shen Li, Yanli Zhao, Rohan Varma, Omkar Salpekar, Pieter Noordhuis, Teng Li, Adam Paszke, Jeff Smith, Brian Vaughan, Pritam Damania, Soumith Chintala

    Abstract: This paper presents the design, implementation, and evaluation of the PyTorch distributed data parallel module. PyTorch is a widely-adopted scientific computing package used in deep learning research and applications. Recent advances in deep learning argue for the value of large datasets and large models, which necessitates the ability to scale out model training to more computational resources. D…

    Submitted 28 June, 2020; originally announced June 2020.

    Comments: To appear in VLDB 2020

  14. arXiv:2003.14177

    cs.LO cs.DM cs.FL

    VC density of set systems definable in tree-like graphs

    Authors: Adam Paszke, Michał Pilipczuk

    Abstract: We study set systems definable in graphs using variants of logic with different expressive power. Our focus is on the notion of Vapnik-Chervonenkis density: the smallest possible degree of a polynomial bounding the cardinalities of restrictions of such set systems. On one hand, we prove that if $\varphi(\bar x,\bar y)$ is a fixed CMSO$_1$ formula and $\cal C$ is a class of graphs with uniformly bo…

    Submitted 31 March, 2020; originally announced March 2020.

    Comments: 14 pages, 1 figure
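    For reference, the notion of VC density described in the abstract ("the smallest possible degree of a polynomial bounding the cardinalities of restrictions") can be written out in its standard general form for a set system $\mathcal{F}$ over a ground set $V$. This is the usual textbook definition; the paper's specific set systems defined by formulas in graphs are not reproduced here.

```latex
\begin{align*}
  \mathcal{F}|_A &= \{\, F \cap A : F \in \mathcal{F} \,\}
    && \text{restriction of } \mathcal{F} \text{ to a finite } A \subseteq V \\
  \pi_{\mathcal{F}}(n) &= \max_{A \subseteq V,\ |A| = n} \bigl|\mathcal{F}|_A\bigr|
    && \text{largest restriction on } n \text{ points} \\
  \operatorname{vc}(\mathcal{F}) &= \inf\{\, r \ge 0 : \pi_{\mathcal{F}}(n) = O(n^{r}) \,\}
    && \text{VC density}
\end{align*}
```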

  15. arXiv:1912.01703

    cs.LG cs.MS stat.ML

    PyTorch: An Imperative Style, High-Performance Deep Learning Library

    Authors: Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, Soumith Chintala

    Abstract: Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it provides an imperative and Pythonic programming style that supports code as a model, makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting…

    Submitted 3 December, 2019; originally announced December 2019.

    Comments: 12 pages, 3 figures, NeurIPS 2019

  16. arXiv:1606.02147

    cs.CV

    ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation

    Authors: Adam Paszke, Abhishek Chaurasia, Sangpil Kim, Eugenio Culurciello

    Abstract: The ability to perform pixel-wise semantic segmentation in real-time is of paramount importance in mobile applications. Recent deep neural networks aimed at this task have the disadvantage of requiring a large number of floating point operations and have long run-times that hinder their usability. In this paper, we propose a novel deep neural network architecture named ENet (efficient neural netwo…

    Submitted 7 June, 2016; originally announced June 2016.

  17. arXiv:1605.07678

    cs.CV

    An Analysis of Deep Neural Network Models for Practical Applications

    Authors: Alfredo Canziani, Adam Paszke, Eugenio Culurciello

    Abstract: Since the emergence of Deep Neural Networks (DNNs) as a prominent technique in the field of computer vision, the ImageNet classification challenge has played a major role in advancing the state-of-the-art. While accuracy figures have steadily increased, the resource utilisation of winning models has not been properly taken into account. In this work, we present a comprehensive analysis of importan…

    Submitted 14 April, 2017; v1 submitted 24 May, 2016; originally announced May 2016.

    Comments: 7 pages, 10 figures, legend for Figure 2 got lost :/