-
Optimal Heap Limits for Reducing Browser Memory Use
Authors:
Marisa Kirisame,
Pranav Shenoy,
Pavel Panchekha
Abstract:
Garbage-collected language runtimes carefully tune heap limits to reduce garbage collection time and memory usage. However, there's a trade-off: a lower heap limit reduces memory use but increases garbage collection time. Classic methods for setting heap limits include manually tuned heap limits and multiple-of-live-size rules of thumb, but it is not clear when one rule is better than another or h…
▽ More
Garbage-collected language runtimes carefully tune heap limits to reduce garbage collection time and memory usage. However, there's a trade-off: a lower heap limit reduces memory use but increases garbage collection time. Classic methods for setting heap limits include manually tuned heap limits and multiple-of-live-size rules of thumb, but it is not clear when one rule is better than another or how to compare them.
We address this problem with a new framework where heap limits are set for multiple heaps at once. Our key insight is that every heap limit rule induces a particular allocation of memory across multiple processes, and this allocation can be sub-optimal. We use our framework to derive an optimal "square-root" heap limit rule, which minimizes total memory usage for any amount of total garbage collection time. Paradoxically, the square-root heap limit rule achieves this coordination without communication: it allocates memory optimally across multiple heaps without requiring any communication between those heaps.
To demonstrate that this heap limit rule is effective, we prototype it for V8, the JavaScript runtime used in Google Chrome, Microsoft Edge, and other browsers, as well as in server-side frameworks like node.js and Deno. On real-world web pages, our prototype achieves reductions of approximately 16.0% of memory usage while kee** garbage collection time constant. On memory-intensive benchmarks, reductions of up to 30.0% of garbage collection time are possible with no change in total memory usage.
△ Less
Submitted 25 September, 2022; v1 submitted 21 April, 2022;
originally announced April 2022.
-
Dynamic Tensor Rematerialization
Authors:
Marisa Kirisame,
Steven Lyubomirsky,
Altan Haan,
Jennifer Brennan,
Mike He,
Jared Roesch,
Tianqi Chen,
Zachary Tatlock
Abstract:
Checkpointing enables the training of deep learning models under restricted memory budgets by freeing intermediate activations from memory and recomputing them on demand. Current checkpointing techniques statically plan these recomputations offline and assume static computation graphs. We demonstrate that a simple online algorithm can achieve comparable performance by introducing Dynamic Tensor Re…
▽ More
Checkpointing enables the training of deep learning models under restricted memory budgets by freeing intermediate activations from memory and recomputing them on demand. Current checkpointing techniques statically plan these recomputations offline and assume static computation graphs. We demonstrate that a simple online algorithm can achieve comparable performance by introducing Dynamic Tensor Rematerialization (DTR), a greedy online algorithm for checkpointing that is extensible and general, is parameterized by eviction policy, and supports dynamic models. We prove that DTR can train an $N$-layer linear feedforward network on an $Ω(\sqrt{N})$ memory budget with only $\mathcal{O}(N)$ tensor operations. DTR closely matches the performance of optimal static checkpointing in simulated experiments. We incorporate a DTR prototype into PyTorch merely by interposing on tensor allocations and operator calls and collecting lightweight metadata on tensors.
△ Less
Submitted 18 March, 2021; v1 submitted 16 June, 2020;
originally announced June 2020.
-
Relay: A High-Level Compiler for Deep Learning
Authors:
Jared Roesch,
Steven Lyubomirsky,
Marisa Kirisame,
Logan Weber,
Josh Pollock,
Luis Vega,
Ziheng Jiang,
Tianqi Chen,
Thierry Moreau,
Zachary Tatlock
Abstract:
Frameworks for writing, compiling, and optimizing deep learning (DL) models have recently enabled progress in areas like computer vision and natural language processing. Extending these frameworks to accommodate the rapidly diversifying landscape of DL models and hardware platforms presents challenging tradeoffs between expressivity, composability, and portability. We present Relay, a new compiler…
▽ More
Frameworks for writing, compiling, and optimizing deep learning (DL) models have recently enabled progress in areas like computer vision and natural language processing. Extending these frameworks to accommodate the rapidly diversifying landscape of DL models and hardware platforms presents challenging tradeoffs between expressivity, composability, and portability. We present Relay, a new compiler framework for DL. Relay's functional, statically typed intermediate representation (IR) unifies and generalizes existing DL IRs to express state-of-the-art models. The introduction of Relay's expressive IR requires careful design of domain-specific optimizations, addressed via Relay's extension mechanisms. Using these extension mechanisms, Relay supports a unified compiler that can target a variety of hardware platforms. Our evaluation demonstrates Relay's competitive performance for a broad class of models and devices (CPUs, GPUs, and emerging accelerators). Relay's design demonstrates how a unified IR can provide expressivity, composability, and portability without compromising performance.
△ Less
Submitted 24 August, 2019; v1 submitted 17 April, 2019;
originally announced April 2019.
-
Relay: A New IR for Machine Learning Frameworks
Authors:
Jared Roesch,
Steven Lyubomirsky,
Logan Weber,
Josh Pollock,
Marisa Kirisame,
Tianqi Chen,
Zachary Tatlock
Abstract:
Machine learning powers diverse services in industry including search, translation, recommendation systems, and security. The scale and importance of these models require that they be efficient, expressive, and portable across an array of heterogeneous hardware devices. These constraints are often at odds; in order to better accommodate them we propose a new high-level intermediate representation…
▽ More
Machine learning powers diverse services in industry including search, translation, recommendation systems, and security. The scale and importance of these models require that they be efficient, expressive, and portable across an array of heterogeneous hardware devices. These constraints are often at odds; in order to better accommodate them we propose a new high-level intermediate representation (IR) called Relay. Relay is being designed as a purely-functional, statically-typed language with the goal of balancing efficient compilation, expressiveness, and portability. We discuss the goals of Relay and highlight its important design constraints. Our prototype is part of the open source NNVM compiler framework, which powers Amazon's deep learning framework MxNet.
△ Less
Submitted 25 September, 2018;
originally announced October 2018.