-
Automap: Towards Ergonomic Automated Parallelism for ML Models
Authors:
Michael Schaarschmidt,
Dominik Grewe,
Dimitrios Vytiniotis,
Adam Paszke,
Georg Stefan Schmid,
Tamara Norman,
James Molloy,
Jonathan Godwin,
Norman Alexander Rink,
Vinod Nair,
Dan Belov
Abstract:
The rapid rise in demand for training large neural network architectures has brought into focus the need for partitioning strategies, for example by using data, model, or pipeline parallelism. Implementing these methods is increasingly supported through program primitives, but identifying efficient partitioning strategies requires expensive experimentation and expertise. We present the prototype of an automated partitioner that seamlessly integrates into existing compilers and existing user workflows. Our partitioner enables SPMD-style parallelism that encompasses data parallelism and parameter/activation sharding. Through a combination of inductive tactics and search in a platform-independent partitioning IR, automap can recover expert partitioning strategies such as Megatron sharding for transformer layers.
Submitted 6 December, 2021;
originally announced December 2021.
-
Memory-efficient array redistribution through portable collective communication
Authors:
Norman A. Rink,
Adam Paszke,
Dimitrios Vytiniotis,
Georg Stefan Schmid
Abstract:
Modern large-scale deep learning workloads highlight the need for parallel execution across many devices in order to fit model data into hardware accelerator memories. In these settings, array redistribution may be required during a computation, but can also become a bottleneck if not done efficiently. In this paper we address the problem of redistributing multi-dimensional array data in SPMD computations, the most prevalent form of parallelism in deep learning. We present a type-directed approach to synthesizing array redistributions as sequences of MPI-style collective operations. We prove formally that our synthesized redistributions are memory-efficient and perform no excessive data transfers. Array redistribution for SPMD computations using collective operations has also been implemented in the context of the XLA SPMD partitioner, a production-grade tool for partitioning programs across accelerator systems. We evaluate our approach against the XLA implementation and find that our approach delivers a geometric mean speedup of $1.22\times$, with maximum speedups as high as $5.7\times$, while offering provable memory guarantees, making our system particularly appealing for large-scale models.
Submitted 28 November, 2022; v1 submitted 2 December, 2021;
originally announced December 2021.
-
Coming to Terms with Your Choices: An Existential Take on Dependent Types
Authors:
Georg Stefan Schmid,
Olivier Blanvillain,
Jad Hamza,
Viktor Kunčak
Abstract:
Type-level programming is an increasingly popular way to obtain additional type safety. Unfortunately, it remains a second-class citizen in the majority of industrially-used programming languages. We propose a new dependently-typed system with subtyping and singleton types whose goal is to enable type-level programming in an accessible style. At the heart of our system lies a non-deterministic choice operator. We argue that embracing non-determinism is crucial for bringing dependent types to a broader audience of programmers, since real-world programs will inevitably interact with imprecisely-typed, or even impure code. Furthermore, we show that singleton types combined with the choice operator can serve as a replacement for many type functions of interest in practice. We establish the soundness of our approach using the Coq proof assistant. Our soundness approach models non-determinism using additional function arguments to represent choices. We represent type-level computation using singleton types and existential types that quantify over choice arguments. To demonstrate the practicality of our type system, we present an implementation as a modification of the Scala compiler. We provide a case study in which we develop a strongly-typed wrapper for Spark datasets.
Submitted 15 November, 2020;
originally announced November 2020.