Showing 1–10 of 10 results for author: DeVito, Z

  1. arXiv:2405.02803  [pdf, other]

    cs.LG cs.DC

    Is Flash Attention Stable?

    Authors: Alicia Golden, Samuel Hsia, Fei Sun, Bilge Acun, Basil Hosmer, Yejin Lee, Zachary DeVito, Jeff Johnson, Gu-Yeon Wei, David Brooks, Carole-Jean Wu

    Abstract: Training large-scale machine learning models poses distinct system challenges, given both the size and complexity of today's workloads. Recently, many organizations training state-of-the-art Generative AI models have reported cases of instability during training, often taking the form of loss spikes. Numeric deviation has emerged as a potential cause of this training instability, although quantify…

    Submitted 4 May, 2024; originally announced May 2024.

  2. arXiv:2312.14385  [pdf, other]

    cs.DC cs.LG cs.MM

    Generative AI Beyond LLMs: System Implications of Multi-Modal Generation

    Authors: Alicia Golden, Samuel Hsia, Fei Sun, Bilge Acun, Basil Hosmer, Yejin Lee, Zachary DeVito, Jeff Johnson, Gu-Yeon Wei, David Brooks, Carole-Jean Wu

    Abstract: As the development of large-scale Generative AI models evolve beyond text (1D) generation to include image (2D) and video (3D) generation, processing spatial and temporal information presents unique challenges to quality, performance, and efficiency. We present the first work towards understanding this new system design space for multi-modal text-to-image (TTI) and text-to-video (TTV) generation m…

    Submitted 5 May, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: Published at 2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)

  3. arXiv:2310.02784  [pdf, other]

    cs.DC cs.AR cs.LG

    MAD Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems

    Authors: Samuel Hsia, Alicia Golden, Bilge Acun, Newsha Ardalani, Zachary DeVito, Gu-Yeon Wei, David Brooks, Carole-Jean Wu

    Abstract: Training and deploying large-scale machine learning models is time-consuming, requires significant distributed computing infrastructures, and incurs high operational costs. Our analysis, grounded in real-world large model training on datacenter-scale infrastructures, reveals that 14~32% of all GPU hours are spent on communication with no overlapping computation. To minimize this outstanding commun…

    Submitted 10 June, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: ISCA 2024

  4. arXiv:2304.09871  [pdf, other]

    cs.LG cs.AI math.OC

    A Theory on Adam Instability in Large-Scale Machine Learning

    Authors: Igor Molybog, Peter Albert, Moya Chen, Zachary DeVito, David Esiobu, Naman Goyal, Punit Singh Koura, Sharan Narang, Andrew Poulton, Ruan Silva, Binh Tang, Diana Liskovich, Puxin Xu, Yuchen Zhang, Melanie Kambadur, Stephen Roller, Susan Zhang

    Abstract: We present a theory for the previously unexplained divergent behavior noticed in the training of large language models. We argue that the phenomenon is an artifact of the dominant optimization algorithm used for training, called Adam. We observe that Adam can enter a state in which the parameter update vector has a relatively large norm and is essentially uncorrelated with the direction of descent…

    Submitted 25 April, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

  5. arXiv:2112.08429  [pdf, other]

    cs.LG

    Torch.fx: Practical Program Capture and Transformation for Deep Learning in Python

    Authors: James K. Reed, Zachary DeVito, Horace He, Ansley Ussery, Jason Ansel

    Abstract: Modern deep learning frameworks provide imperative, eager execution programming interfaces embedded in Python to provide a productive development experience. However, deep learning practitioners sometimes need to capture and transform program structure for performance optimization, visualization, analysis, and hardware integration. We study the different designs for program capture and transformat…

    Submitted 4 March, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

    Comments: 14 pages, 8 figures, Accepted to MLSys 2022. v2: Added correctness information to evals, clarified 6.2.3, added transform runtime measurement

  6. arXiv:2104.00254  [pdf, other]

    cs.LG

    Using Python for Model Inference in Deep Learning

    Authors: Zachary DeVito, Jason Ansel, Will Constable, Michael Suo, Ailing Zhang, Kim Hazelwood

    Abstract: Python has become the de-facto language for training deep neural networks, coupling a large suite of scientific computing libraries with efficient libraries for tensor computation such as PyTorch or TensorFlow. However, when models are used for inference they are typically extracted from Python as TensorFlow graphs or TorchScript programs in order to meet performance and packaging constraints. The…

    Submitted 1 April, 2021; originally announced April 2021.

  7. arXiv:1912.01703  [pdf, other]

    cs.LG cs.MS stat.ML

    PyTorch: An Imperative Style, High-Performance Deep Learning Library

    Authors: Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, Soumith Chintala

    Abstract: Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it provides an imperative and Pythonic programming style that supports code as a model, makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting…

    Submitted 3 December, 2019; originally announced December 2019.

    Comments: 12 pages, 3 figures, NeurIPS 2019

  8. arXiv:1802.04730  [pdf, other]

    cs.PL cs.LG

    Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions

    Authors: Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary DeVito, William S. Moses, Sven Verdoolaege, Andrew Adams, Albert Cohen

    Abstract: Deep learning models with convolutional and recurrent networks are now ubiquitous and analyze massive amounts of audio, image, video, text and graph data, with applications in automatic translation, speech-to-text, scene understanding, ranking user preferences, ad placement, etc. Competing frameworks for building these networks such as TensorFlow, Chainer, CNTK, Torch/PyTorch, Caffe1/2, MXNet and…

    Submitted 28 June, 2018; v1 submitted 13 February, 2018; originally announced February 2018.

  9. arXiv:1604.06525  [pdf, other]

    cs.GR cs.CV cs.PL

    Opt: A Domain Specific Language for Non-linear Least Squares Optimization in Graphics and Imaging

    Authors: Zachary DeVito, Michael Mara, Michael Zollhöfer, Gilbert Bernstein, Jonathan Ragan-Kelley, Christian Theobalt, Pat Hanrahan, Matthew Fisher, Matthias Nießner

    Abstract: Many graphics and vision problems can be expressed as non-linear least squares optimizations of objective functions over visual data, such as images and meshes. The mathematical descriptions of these functions are extremely concise, but their implementation in real code is tedious, especially when optimized for real-time performance on modern GPUs in interactive applications. In this work, we prop…

    Submitted 9 September, 2017; v1 submitted 21 April, 2016; originally announced April 2016.

  10. arXiv:1506.07577  [pdf, other]

    cs.GR

    Ebb: A DSL for Physical Simulation on CPUs and GPUs

    Authors: Gilbert Louis Bernstein, Chinmayee Shah, Crystal Lemire, Zachary DeVito, Matthew Fisher, Philip Levis, Pat Hanrahan

    Abstract: Designing programming environments for physical simulation is challenging because simulations rely on diverse algorithms and geometric domains. These challenges are compounded when we try to run efficiently on heterogeneous parallel architectures. We present Ebb, a domain-specific language (DSL) for simulation, that runs efficiently on both CPUs and GPUs. Unlike previous DSLs, Ebb uses a three-lay…

    Submitted 24 February, 2016; v1 submitted 24 June, 2015; originally announced June 2015.