-
Torch.fx: Practical Program Capture and Transformation for Deep Learning in Python
Authors:
James K. Reed,
Zachary DeVito,
Horace He,
Ansley Ussery,
Jason Ansel
Abstract:
Modern deep learning frameworks provide imperative, eager execution programming interfaces embedded in Python to provide a productive development experience. However, deep learning practitioners sometimes need to capture and transform program structure for performance optimization, visualization, analysis, and hardware integration. We study the different designs for program capture and transformation used in deep learning. By designing for typical deep learning use cases rather than long tail ones, it is possible to create a simpler framework for program capture and transformation. We apply this principle in torch.fx, a program capture and transformation library for PyTorch written entirely in Python and optimized for high developer productivity by ML practitioners. We present case studies showing how torch.fx enables workflows previously inaccessible in the PyTorch ecosystem.
Submitted 4 March, 2022; v1 submitted 15 December, 2021;
originally announced December 2021.
-
CompilerGym: Robust, Performant Compiler Optimization Environments for AI Research
Authors:
Chris Cummins,
Bram Wasti,
Jiadong Guo,
Brandon Cui,
Jason Ansel,
Sahir Gomez,
Somya Jain,
Jia Liu,
Olivier Teytaud,
Benoit Steiner,
Yuandong Tian,
Hugh Leather
Abstract:
Interest in applying Artificial Intelligence (AI) techniques to compiler optimizations is increasing rapidly, but compiler research has a high entry barrier. Unlike in other domains, compiler and AI researchers do not have access to the datasets and frameworks that enable fast iteration and development of ideas, and getting started requires a significant engineering investment. What is needed is an easy, reusable experimental infrastructure for real world compiler optimization tasks that can serve as a common benchmark for comparing techniques, and as a platform to accelerate progress in the field.
We introduce CompilerGym, a set of environments for real world compiler optimization tasks, and a toolkit for exposing new optimization tasks to compiler researchers. CompilerGym enables anyone to experiment on production compiler optimization problems through an easy-to-use package, regardless of their experience with compilers. We build upon the popular OpenAI Gym interface enabling researchers to interact with compilers using Python and a familiar API.
We describe the CompilerGym architecture and implementation, characterize the optimization spaces and computational efficiencies of three included compiler environments, and provide extensive empirical evaluations. Compared to prior works, CompilerGym offers larger datasets and optimization spaces, is 27x more computationally efficient, is fault-tolerant, and capable of detecting reproducibility bugs in the underlying compilers.
In making it easy for anyone to experiment with compilers, irrespective of their background, we aim to accelerate progress in the AI and compiler research domains.
Submitted 22 December, 2021; v1 submitted 16 September, 2021;
originally announced September 2021.
-
Using Python for Model Inference in Deep Learning
Authors:
Zachary DeVito,
Jason Ansel,
Will Constable,
Michael Suo,
Ailing Zhang,
Kim Hazelwood
Abstract:
Python has become the de facto language for training deep neural networks, coupling a large suite of scientific computing libraries with efficient libraries for tensor computation such as PyTorch or TensorFlow. However, when models are used for inference they are typically extracted from Python as TensorFlow graphs or TorchScript programs in order to meet performance and packaging constraints. The extraction process can be time consuming, impeding fast prototyping. We show how it is possible to meet these performance and packaging constraints while performing inference in Python. In particular, we present a way of using multiple Python interpreters within a single process to achieve scalable inference and describe a new container format for models that contains both native Python code and data. This approach simplifies the model deployment story by eliminating the model extraction step, and makes it easier to integrate existing performance-enhancing Python libraries. We evaluate our design on a suite of popular PyTorch models on GitHub, showing how they can be packaged in our inference format, and comparing their performance to TorchScript. For larger models, our packaged Python models perform the same as TorchScript, and for smaller models where there is some Python overhead, our multi-interpreter approach ensures inference is still scalable.
Submitted 1 April, 2021;
originally announced April 2021.
-
Tight Prediction Intervals Using Expanded Interval Minimization
Authors:
Dongqi Su,
Ying Yin Ting,
Jason Ansel
Abstract:
Prediction intervals are a valuable way of quantifying uncertainty in regression problems. Good prediction intervals should be both correct, containing the actual value between the lower and upper bound at least a target percentage of the time; and tight, having a small mean width of the bounds. Many prior techniques for generating prediction intervals make assumptions on the distribution of error, which causes them to work poorly for problems with asymmetric distributions.
This paper presents Expanded Interval Minimization (EIM), a novel loss function for generating prediction intervals using neural networks. This loss function uses minibatch statistics to estimate the coverage and optimize the width of the prediction intervals. It does not make the same assumptions on the distributions of data and error as prior work. We compare to three published techniques and show EIM produces on average 1.37x tighter prediction intervals and in the worst case 1.06x tighter intervals across two large real-world datasets and varying coverage levels.
Submitted 28 June, 2018;
originally announced June 2018.
-
DMTCP: Transparent Checkpointing for Cluster Computations and the Desktop
Authors:
Jason Ansel,
Kapil Arya,
Gene Cooperman
Abstract:
DMTCP (Distributed MultiThreaded CheckPointing) is a transparent user-level checkpointing package for distributed applications. Checkpointing and restart is demonstrated for a wide range of over 20 well known applications, including MATLAB, Python, TightVNC, MPICH2, OpenMPI, and runCMS. RunCMS runs as a 680 MB image in memory that includes 540 dynamic libraries, and is used for the CMS experiment of the Large Hadron Collider at CERN. DMTCP transparently checkpoints general cluster computations consisting of many nodes, processes, and threads; as well as typical desktop applications. On 128 distributed cores (32 nodes), checkpoint and restart times are typically 2 seconds, with negligible run-time overhead. Typical checkpoint times are reduced to 0.2 seconds when using forked checkpointing. Experimental results show that checkpoint time remains nearly constant as the number of nodes increases on a medium-size cluster.
DMTCP automatically accounts for fork, exec, ssh, mutexes/semaphores, TCP/IP sockets, UNIX domain sockets, pipes, ptys (pseudo-terminals), terminal modes, ownership of controlling terminals, signal handlers, open file descriptors, shared open file descriptors, I/O (including the readline library), shared memory (via mmap), parent-child process relationships, pid virtualization, and other operating system artifacts. By emphasizing an unprivileged, user-space approach, compatibility is maintained across Linux kernels from 2.6.9 through the current 2.6.28. Since DMTCP is unprivileged and does not require special kernel modules or kernel patches, DMTCP can be incorporated and distributed as a checkpoint-restart module within some larger package.
Submitted 24 February, 2009; v1 submitted 6 January, 2007;
originally announced January 2007.