Search | arXiv e-print repository

There and Back Again: A Netlist's Tale with Much Egraphin'

Authors: Gus Henry Smith, Zachary D. Sisco, Thanawat Techaumnuaiwit, **gtao Xia, Vishal Canumalla, Andrew Cheung, Zachary Tatlock, Chandrakana Nandi, Jonathan Balkind

Abstract: EDA toolchains are notoriously unpredictable, incomplete, and error-prone; the generally-accepted remedy has been to re-imagine EDA tasks as compilation problems. However, any compiler framework we apply must be prepared to handle the wide range of EDA tasks, including not only compilation tasks like technology map** and optimization (the "there"} in our title), but also decompilation tasks like… ▽ More EDA toolchains are notoriously unpredictable, incomplete, and error-prone; the generally-accepted remedy has been to re-imagine EDA tasks as compilation problems. However, any compiler framework we apply must be prepared to handle the wide range of EDA tasks, including not only compilation tasks like technology map** and optimization (the "there"} in our title), but also decompilation tasks like loop rerolling (the "back again"). In this paper, we advocate for equality saturation -- a term rewriting framework -- as the framework of choice when building hardware toolchains. Through a series of case studies, we show how the needs of EDA tasks line up conspicuously well with the features equality saturation provides. △ Less

Submitted 31 March, 2024; originally announced April 2024.

arXiv:2403.03481 [pdf, other]

Magic Markup: Maintaining Document-External Markup with an LLM

Authors: Edward Misback, Zachary Tatlock, Steven L. Tanimoto

Abstract: Text documents, including programs, typically have human-readable semantic structure. Historically, programmatic access to these semantics has required explicit in-document tagging. Especially in systems where the text has an execution semantics, this means it is an opt-in feature that is hard to support properly. Today, language models offer a new method: metadata can be bound to entities in chan… ▽ More Text documents, including programs, typically have human-readable semantic structure. Historically, programmatic access to these semantics has required explicit in-document tagging. Especially in systems where the text has an execution semantics, this means it is an opt-in feature that is hard to support properly. Today, language models offer a new method: metadata can be bound to entities in changing text using a model's human-like understanding of semantics, with no requirements on the document structure. This method expands the applications of document annotation, a fundamental operation in program writing, debugging, maintenance, and presentation. We contribute a system that employs an intelligent agent to re-tag modified programs, enabling rich annotations to automatically follow code as it evolves. We also contribute a formal problem definition, an empirical synthetic benchmark suite, and our benchmark generator. Our system achieves an accuracy of 90% on our benchmarks and can replace a document's tags in parallel at a rate of 5 seconds per tag. While there remains significant room for improvement, we find performance reliable enough to justify further exploration of applications. △ Less

Submitted 6 March, 2024; originally announced March 2024.

Comments: 10 pages; 2 figures; to be published in the <Programming> 2024 Conference Companion

arXiv:2401.16526 [pdf, other]

doi 10.1145/3620665.3640387

FPGA Technology Map** Using Sketch-Guided Program Synthesis

Authors: Gus Henry Smith, Ben Kushigian, Vishal Canumalla, Andrew Cheung, Steven Lyubomirsky, Sorawee Porncharoenwase, René Just, Gilbert Louis Bernstein, Zachary Tatlock

Abstract: FPGA technology map** is the process of implementing a hardware design expressed in high-level HDL (hardware design language) code using the low-level, architecture-specific primitives of the target FPGA. As FPGAs become increasingly heterogeneous, achieving high performance requires hardware synthesis tools that better support map** to complex, highly configurable primitives like digital sign… ▽ More FPGA technology map** is the process of implementing a hardware design expressed in high-level HDL (hardware design language) code using the low-level, architecture-specific primitives of the target FPGA. As FPGAs become increasingly heterogeneous, achieving high performance requires hardware synthesis tools that better support map** to complex, highly configurable primitives like digital signal processors (DSPs). Current tools support DSP map** via handwritten special-case map** rules, which are laborious to write, error-prone, and often overlook map** opportunities. We introduce Lakeroad, a principled approach to technology map** via sketch-guided program synthesis. Lakeroad leverages two techniques -- architecture-independent sketch templates and semantics extraction from HDL -- to provide extensible technology map** with stronger correctness guarantees and higher coverage of map** opportunities than state-of-the-art tools. Across representative microbenchmarks, Lakeroad produces 2--3.5$\times$ the number of optimal map**s compared to proprietary state-of-the-art tools and 6--44$\times$ the number of optimal map**s compared to popular open-source tools, while also providing correctness guarantees not given by any other tool. △ Less

Submitted 29 January, 2024; originally announced January 2024.

arXiv:2305.10599 [pdf, other]

Odyssey: An Interactive Workbench for Expert-Driven Floating-Point Expression Rewriting

Authors: Edward Misback, Caleb C. Chan, Brett Saiki, Eunice Jun, Zachary Tatlock, Pavel Panchekha

Abstract: In recent years, researchers have proposed a number of automated tools to identify and improve floating-point rounding error in mathematical expressions. However, users struggle to effectively apply these tools. In this paper, we work with novices, experts, and tool developers to investigate user needs during the expression rewriting process. We find that users follow an iterative design process.… ▽ More In recent years, researchers have proposed a number of automated tools to identify and improve floating-point rounding error in mathematical expressions. However, users struggle to effectively apply these tools. In this paper, we work with novices, experts, and tool developers to investigate user needs during the expression rewriting process. We find that users follow an iterative design process. They want to compare expressions on multiple input ranges, integrate and guide various rewriting tools and understand where errors come from. We organize this investigation's results into a three-stage workflow and implement that workflow in a new, extensible workbench dubbed Odyssey. Odyssey enables users to: (1) diagnose problems in an expression, (2) generate solutions automatically or by hand, and (3) tune their results. Odyssey tracks a working set of expressions and turns a state-of-the-art automated tool "inside out," giving the user access to internal heuristics, algorithms, and functionality. In a user study, Odyssey enabled five expert numerical analysts to solve challenging rewriting problems where state-of-the-art automated tools fail. In particular, the experts unanimously praised Odyssey's novel support for interactive range modification and local error visualization. △ Less

Submitted 9 August, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

Comments: 15 pages, 7 figures, 3 tables

arXiv:2305.09580 [pdf, other]

Generate Compilers from Hardware Models!

Authors: Gus Henry Smith, Ben Kushigian, Vishal Canumalla, Andrew Cheung, René Just, Zachary Tatlock

Abstract: Compiler backends should be automatically generated from hardware design language (HDL) models of the hardware they target. Generating compiler components directly from HDL can provide stronger correctness guarantees, ease development effort, and encourage hardware exploration. Past work has already championed this idea; here we argue that advances in program synthesis make the approach more feasi… ▽ More Compiler backends should be automatically generated from hardware design language (HDL) models of the hardware they target. Generating compiler components directly from HDL can provide stronger correctness guarantees, ease development effort, and encourage hardware exploration. Past work has already championed this idea; here we argue that advances in program synthesis make the approach more feasible. We present a concrete example by demonstrating how FPGA technology mappers can be automatically generated from SystemVerilog models of an FPGA's primitives using program synthesis. △ Less

Submitted 16 May, 2023; originally announced May 2023.

Comments: 3 pages, 2 figures, to be presented at the 2023 PLARCH Workshop at FCRC

arXiv:2304.04332 [pdf, other]

Better Together: Unifying Datalog and Equality Saturation

Authors: Yihong Zhang, Yisu Remy Wang, Oliver Flatt, David Cao, Philip Zucker, Eli Rosenthal, Zachary Tatlock, Max Willsey

Abstract: We present egglog, a fixpoint reasoning system that unifies Datalog and equality saturation (EqSat). Like Datalog, it supports efficient incremental execution, cooperating analyses, and lattice-based reasoning. Like EqSat, it supports term rewriting, efficient congruence closure, and extraction of optimized terms. We identify two recent applications--a unification-based pointer analysis in Datal… ▽ More We present egglog, a fixpoint reasoning system that unifies Datalog and equality saturation (EqSat). Like Datalog, it supports efficient incremental execution, cooperating analyses, and lattice-based reasoning. Like EqSat, it supports term rewriting, efficient congruence closure, and extraction of optimized terms. We identify two recent applications--a unification-based pointer analysis in Datalog and an EqSat-based floating-point term rewriter--that have been hampered by features missing from Datalog but found in EqSat or vice-versa. We evaluate egglog by reimplementing those projects in egglog. The resulting systems in egglog are faster, simpler, and fix bugs found in the original systems. △ Less

Submitted 15 May, 2023; v1 submitted 9 April, 2023; originally announced April 2023.

Comments: PLDI 2023

arXiv:2212.04596 [pdf, other]

doi 10.1145/3571207

babble: Learning Better Abstractions with E-Graphs and Anti-Unification

Authors: David Cao, Rose Kunkel, Chandrakana Nandi, Max Willsey, Zachary Tatlock, Nadia Polikarpova

Abstract: Library learning compresses a given corpus of programs by extracting common structure from the corpus into reusable library functions. Prior work on library learning suffers from two limitations that prevent it from scaling to larger, more complex inputs. First, it explores too many candidate library functions that are not useful for compression. Second, it is not robust to syntactic variation in… ▽ More Library learning compresses a given corpus of programs by extracting common structure from the corpus into reusable library functions. Prior work on library learning suffers from two limitations that prevent it from scaling to larger, more complex inputs. First, it explores too many candidate library functions that are not useful for compression. Second, it is not robust to syntactic variation in the input. We propose library learning modulo theory (LLMT), a new library learning algorithm that additionally takes as input an equational theory for a given problem domain. LLMT uses e-graphs and equality saturation to compactly represent the space of programs equivalent modulo the theory, and uses a novel e-graph anti-unification technique to find common patterns in the corpus more directly and efficiently. We implemented LLMT in a tool named BABBLE. Our evaluation shows that BABBLE achieves better compression orders of magnitude faster than the state of the art. We also provide a qualitative evaluation showing that BABBLE learns reusable functions on inputs previously out of reach for library learning. △ Less

Submitted 8 December, 2022; originally announced December 2022.

Comments: POPL 2023

arXiv:2209.03398 [pdf, other]

Small Proofs from Congruence Closure

Authors: Oliver Flatt, Samuel Coward, Max Willsey, Zachary Tatlock, Pavel Panchekha

Abstract: Satisfiability Modulo Theory (SMT) solvers and equality saturation engines must generate proof certificates from e-graph-based congruence closure procedures to enable verification and conflict clause generation. Smaller proof certificates speed up these activities. Though the problem of generating proofs of minimal size is known to be NP-complete, existing proof minimization algorithms for congrue… ▽ More Satisfiability Modulo Theory (SMT) solvers and equality saturation engines must generate proof certificates from e-graph-based congruence closure procedures to enable verification and conflict clause generation. Smaller proof certificates speed up these activities. Though the problem of generating proofs of minimal size is known to be NP-complete, existing proof minimization algorithms for congruence closure generate unnecessarily large proofs and introduce asymptotic overhead over the core congruence closure procedure. In this paper, we introduce an O(n^5) time algorithm which generates optimal proofs under a new relaxed "proof tree size" metric that directly bounds proof size. We then relax this approach further to a practical O(n \log(n)) greedy algorithm which generates small proofs with no asymptotic overhead. We implemented our techniques in the egg equality saturation toolkit, yielding the first certifying equality saturation engine. We show that our greedy approach in egg quickly generates substantially smaller proofs than the state-of-the-art Z3 SMT solver on a corpus of 3760 benchmarks. △ Less

Submitted 7 September, 2022; originally announced September 2022.

arXiv:2203.00218 [pdf, other]

Application-Level Validation of Accelerator Designs Using a Formal Software/Hardware Interface

Authors: Bo-Yuan Huang, Steven Lyubomirsky, Yi Li, Mike He, Gus Henry Smith, Thierry Tambe, Akash Gaonkar, Vishal Canumalla, Andrew Cheung, Gu-Yeon Wei, Aarti Gupta, Zachary Tatlock, Sharad Malik

Abstract: Ideally, accelerator development should be as easy as software development. Several recent design languages/tools are working toward this goal, but actually testing early designs on real applications end-to-end remains prohibitively difficult due to the costs of building specialized compiler and simulator support. We propose a new first-in-class, mostly automated methodology termed "3LA" to enable… ▽ More Ideally, accelerator development should be as easy as software development. Several recent design languages/tools are working toward this goal, but actually testing early designs on real applications end-to-end remains prohibitively difficult due to the costs of building specialized compiler and simulator support. We propose a new first-in-class, mostly automated methodology termed "3LA" to enable end-to-end testing of prototype accelerator designs on unmodified source applications. A key contribution of 3LA is the use of a formal software/hardware interface that specifies an accelerator's operations and their semantics. Specifically, we leverage the Instruction-Level Abstraction (ILA) formal specification for accelerators that has been successfully used thus far for accelerator implementation verification. We show how the ILA for accelerators serves as a software/hardware interface, similar to the Instruction Set Architecture (ISA) for processors, that can be used for automated development of compilers and instruction-level simulators. Another key contribution of this work is to show how ILA-based accelerator semantics enables extending recent work on equality saturation to auto-generate basic compiler support for prototype accelerators in a technique we term "flexible matching." By combining flexible matching with simulators auto-generated from ILA specifications, our approach enables end-to-end evaluation with modest engineering effort. We detail several case studies of 3LA, which uncovered an unknown flaw in a recently published accelerator and facilitated its fix. △ Less

Submitted 22 August, 2023; v1 submitted 28 February, 2022; originally announced March 2022.

arXiv:2108.10436 [pdf, other]

Rewrite Rule Inference Using Equality Saturation

Authors: Chandrakana Nandi, Max Willsey, Amy Zhu, Yisu Remy Wang, Brett Saiki, Adam Anderson, Adriana Schulz, Dan Grossman, Zachary Tatlock

Abstract: Many compilers, synthesizers, and theorem provers rely on rewrite rules to simplify expressions or prove equivalences. Develo** rewrite rules can be difficult: rules may be subtly incorrect, profitable rules are easy to miss, and rulesets must be rechecked or extended whenever semantics are tweaked. Large rulesets can also be challenging to apply: redundant rules slow down rule-based search and… ▽ More Many compilers, synthesizers, and theorem provers rely on rewrite rules to simplify expressions or prove equivalences. Develo** rewrite rules can be difficult: rules may be subtly incorrect, profitable rules are easy to miss, and rulesets must be rechecked or extended whenever semantics are tweaked. Large rulesets can also be challenging to apply: redundant rules slow down rule-based search and frustrate debugging. This paper explores how equality saturation, a promising technique that uses e-graphs to apply rewrite rules, can also be used to infer rewrite rules. E-graphs can compactly represent the exponentially large sets of enumerated terms and potential rewrite rules. We show that equality saturation efficiently shrinks both sets, leading to faster synthesis of smaller, more general rulesets. We prototyped these strategies in a tool dubbed ruler. Compared to a similar tool built on CVC4, ruler synthesizes 5.8X smaller rulesets 25X faster without compromising on proving power. In an end-to-end case study, we show ruler-synthesized rules which perform as well as those crafted by domain experts, and addressed a longstanding issue in a popular open source tool. △ Less

Submitted 23 August, 2021; originally announced August 2021.

arXiv:2108.02290 [pdf, other]

Relational E-Matching

Authors: Yihong Zhang, Yisu Remy Wang, Max Willsey, Zachary Tatlock

Abstract: We present a new approach to e-matching based on relational join; in particular, we apply recent database query execution techniques to guarantee worst-case optimal run time. Compared to the conventional backtracking approach that always searches the e-graph "top down", our new relational e-matching approach can better exploit pattern structure by searching the e-graph according to an optimized qu… ▽ More We present a new approach to e-matching based on relational join; in particular, we apply recent database query execution techniques to guarantee worst-case optimal run time. Compared to the conventional backtracking approach that always searches the e-graph "top down", our new relational e-matching approach can better exploit pattern structure by searching the e-graph according to an optimized query plan. We also establish the first data complexity result for e-matching, bounding run time as a function of the e-graph size and output size. We prototyped and evaluated our technique in the state-of-the-art egg e-graph framework. Compared to a conventional baseline, relational e-matching is simpler to implement and orders of magnitude faster in practice. △ Less

Submitted 5 January, 2022; v1 submitted 4 August, 2021; originally announced August 2021.

Comments: POPL 2022

arXiv:2107.14745 [pdf, other]

Co-Optimization of Design and Fabrication Plans for Carpentry: Supplemental Material

Authors: Haisen Zhao, Max Willsey, Amy Zhu, Chandrakana Nandi, Zachary Tatlock, Justin Solomon, Adriana Schulz

Abstract: Past work on optimizing fabrication plans given a carpentry design can provide Pareto-optimal plans trading off between material waste, fabrication time, precision, and other considerations. However, when develo** fabrication plans, experts rarely restrict to a single design, instead considering families of design variations, sometimes adjusting designs to simplify fabrication. Jointly exploring… ▽ More Past work on optimizing fabrication plans given a carpentry design can provide Pareto-optimal plans trading off between material waste, fabrication time, precision, and other considerations. However, when develo** fabrication plans, experts rarely restrict to a single design, instead considering families of design variations, sometimes adjusting designs to simplify fabrication. Jointly exploring the design and fabrication plan spaces for each design is intractable using current techniques. We present a new approach to jointly optimize design and fabrication plans for carpentered objects. To make this bi-level optimization tractable, we adapt recent work from program synthesis based on equality graphs (e-graphs), which encode sets of equivalent programs. Our insight is that subproblems within our bi-level problem share significant substructures. By representing both designs and fabrication plans in a new bag of parts(BOP) e-graph, we amortize the cost of optimizing design components shared among multiple candidates. Even using BOP e-graphs, the optimization space grows quickly in practice. Hence, we also show how a feedback-guided search strategy dubbed Iterative Contraction and Expansion on E-graphs (ICEE) can keep the size of the e-graph manage-able and direct the search toward promising candidates. We illustrate the advantages of our pipeline through examples from the carpentry domain. △ Less

Submitted 30 July, 2021; originally announced July 2021.

Comments: 20 pages, 18 figure

ACM Class: I.3; I.3.8

arXiv:2107.12265 [pdf, other]

Co-Optimization of Design and Fabrication Plans for Carpentry

Authors: Haisen Zhao, Max Willsey, Amy Zhu, Chandrakana Nandi, Zachary Tatlock, Justin Solomon, Adriana Schulz

Abstract: Past work on optimizing fabrication plans given a carpentry design can provide Pareto-optimal plans trading off between material waste, fabrication time, precision, and other considerations. However, when develo** fabrication plans, experts rarely restrict to a single design, instead considering families of design variations, sometimes adjusting designs to simplify fabrication. Jointly exploring… ▽ More Past work on optimizing fabrication plans given a carpentry design can provide Pareto-optimal plans trading off between material waste, fabrication time, precision, and other considerations. However, when develo** fabrication plans, experts rarely restrict to a single design, instead considering families of design variations, sometimes adjusting designs to simplify fabrication. Jointly exploring the design and fabrication plan spaces for each design is intractable using current techniques. We present a new approach to jointly optimize design and fabrication plans for carpentered objects. To make this bi-level optimization tractable, we adapt recent work from program synthesis based on equality graphs (e-graphs), which encode sets of equivalent programs. Our insight is that subproblems within our bi-level problem share significant substructures. By representing both designs and fabrication plans in a new bag of parts(BOP) e-graph, we amortize the cost of optimizing design components shared among multiple candidates. Even using BOP e-graphs, the optimization space grows quickly in practice. Hence, we also show how a feedback-guided search strategy dubbed Iterative Contraction and Expansion on E-graphs(ICEE) can keep the size of the e-graph manage-able and direct the search toward promising candidates. We illustrate the advantages of our pipeline through examples from the carpentry domain. △ Less

Submitted 3 August, 2021; v1 submitted 26 July, 2021; originally announced July 2021.

Comments: 14 pages, 13 figure, Supplemental material: arXiv:2107.14745

ACM Class: I.3; I.3.8

arXiv:2105.09377 [pdf, other]

doi 10.1145/3460945.3464953

Pure Tensor Program Rewriting via Access Patterns (Representation Pearl)

Authors: Gus Henry Smith, Andrew Liu, Steven Lyubomirsky, Scott Davidson, Joseph McMahan, Michael Taylor, Luis Ceze, Zachary Tatlock

Abstract: Tensor kernels in machine learning (ML) often correspond to pure mathematical expressions, making term rewriting an attractive strategy for optimization and map** to specialized hardware accelerators. However, existing ML intermediate representations (IRs) tend to either be \textit{pure but high-level}, making low-level rewrites to hardware targets inexpressible, or \textit{low-level but impure}… ▽ More Tensor kernels in machine learning (ML) often correspond to pure mathematical expressions, making term rewriting an attractive strategy for optimization and map** to specialized hardware accelerators. However, existing ML intermediate representations (IRs) tend to either be \textit{pure but high-level}, making low-level rewrites to hardware targets inexpressible, or \textit{low-level but impure}, hampering the use of term rewriting altogether. This paper introduces Glenside, a pure IR whose core abstraction -- the \textit{access pattern} -- enables low-level, layout-aware, hardware-centric program rewrites. We demonstrate how term rewriting in Glenside can be used to map program fragments to hardware accelerator invocations and automatically discover classic data layout transformations like \texttt{im2col}. Glenside establishes a new foundation for exploring further term rewriting techniques in optimizing low-level tensor programs. △ Less

Submitted 19 May, 2021; originally announced May 2021.

Comments: To be published at MAPS 2021

arXiv:2006.09616 [pdf, other]

Dynamic Tensor Rematerialization

Authors: Marisa Kirisame, Steven Lyubomirsky, Altan Haan, Jennifer Brennan, Mike He, Jared Roesch, Tianqi Chen, Zachary Tatlock

Abstract: Checkpointing enables the training of deep learning models under restricted memory budgets by freeing intermediate activations from memory and recomputing them on demand. Current checkpointing techniques statically plan these recomputations offline and assume static computation graphs. We demonstrate that a simple online algorithm can achieve comparable performance by introducing Dynamic Tensor Re… ▽ More Checkpointing enables the training of deep learning models under restricted memory budgets by freeing intermediate activations from memory and recomputing them on demand. Current checkpointing techniques statically plan these recomputations offline and assume static computation graphs. We demonstrate that a simple online algorithm can achieve comparable performance by introducing Dynamic Tensor Rematerialization (DTR), a greedy online algorithm for checkpointing that is extensible and general, is parameterized by eviction policy, and supports dynamic models. We prove that DTR can train an $N$-layer linear feedforward network on an $Ω(\sqrt{N})$ memory budget with only $\mathcal{O}(N)$ tensor operations. DTR closely matches the performance of optimal static checkpointing in simulated experiments. We incorporate a DTR prototype into PyTorch merely by interposing on tensor allocations and operator calls and collecting lightweight metadata on tensors. △ Less

Submitted 18 March, 2021; v1 submitted 16 June, 2020; originally announced June 2020.

Comments: 31 pages, 12 figures, implementation available here: https://github.com/uwsampl/dtr-prototype, OpenReview: https://openreview.net/forum?id=Vfs_2RnOD0H

ACM Class: C.3

arXiv:2006.03031 [pdf, other]

Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference

Authors: Haichen Shen, Jared Roesch, Zhi Chen, Wei Chen, Yong Wu, Mu Li, Vin Sharma, Zachary Tatlock, Yida Wang

Abstract: Modern deep neural networks increasingly make use of features such as dynamic control flow, data structures and dynamic tensor shapes. Existing deep learning systems focus on optimizing and executing static neural networks which assume a pre-determined model architecture and input data shapes--assumptions which are violated by dynamic neural networks. Therefore, executing dynamic models with deep… ▽ More Modern deep neural networks increasingly make use of features such as dynamic control flow, data structures and dynamic tensor shapes. Existing deep learning systems focus on optimizing and executing static neural networks which assume a pre-determined model architecture and input data shapes--assumptions which are violated by dynamic neural networks. Therefore, executing dynamic models with deep learning systems is currently both inflexible and sub-optimal, if not impossible. Optimizing dynamic neural networks is more challenging than static neural networks; optimizations must consider all possible execution paths and tensor shapes. This paper proposes Nimble, a high-performance and flexible system to optimize, compile, and execute dynamic neural networks on multiple platforms. Nimble handles model dynamism by introducing a dynamic type system, a set of dynamism-oriented optimizations, and a light-weight virtual machine runtime. Our evaluation demonstrates that Nimble outperforms state-of-the-art deep learning frameworks and runtime systems for dynamic neural networks by up to 20x on hardware platforms including Intel CPUs, ARM CPUs, and Nvidia GPUs. △ Less

Submitted 12 March, 2021; v1 submitted 4 June, 2020; originally announced June 2020.

Journal ref: In Proceedings of the 4th Conference on Machine Learning and Systems (MLSys 2021)

arXiv:2004.03082 [pdf, other]

doi 10.1145/3434304

egg: Fast and Extensible Equality Saturation

Authors: Max Willsey, Chandrakana Nandi, Yisu Remy Wang, Oliver Flatt, Zachary Tatlock, Pavel Panchekha

Abstract: An e-graph efficiently represents a congruence relation over many expressions. Although they were originally developed in the late 1970s for use in automated theorem provers, a more recent technique known as equality saturation repurposes e-graphs to implement state-of-the-art, rewrite-driven compiler optimizations and program synthesizers. However, e-graphs remain unspecialized for this newer use… ▽ More An e-graph efficiently represents a congruence relation over many expressions. Although they were originally developed in the late 1970s for use in automated theorem provers, a more recent technique known as equality saturation repurposes e-graphs to implement state-of-the-art, rewrite-driven compiler optimizations and program synthesizers. However, e-graphs remain unspecialized for this newer use case. Equality saturation workloads exhibit distinct characteristics and often require ad-hoc e-graph extensions to incorporate transformations beyond purely syntactic rewrites. This work contributes two techniques that make e-graphs fast and extensible, specializing them to equality saturation. A new amortized invariant restoration technique called rebuilding takes advantage of equality saturation's distinct workload, providing asymptotic speedups over current techniques in practice. A general mechanism called e-class analyses integrates domain-specific analyses into the e-graph, reducing the need for ad hoc manipulation. We implemented these techniques in a new open-source library called egg. Our case studies on three previously published applications of equality saturation highlight how egg's performance and flexibility enable state-of-the-art results across diverse domains. △ Less

Submitted 7 November, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

Comments: 25 pages, 15 figures, POPL 2021

Journal ref: POPL 2021

arXiv:2003.06458 [pdf, other]

doi 10.1561/2500000045

QED at Large: A Survey of Engineering of Formally Verified Software

Authors: Talia Ringer, Karl Palmskog, Ilya Sergey, Milos Gligoric, Zachary Tatlock

Abstract: Development of formal proofs of correctness of programs can increase actual and perceived reliability and facilitate better understanding of program specifications and their underlying assumptions. Tools supporting such development have been available for over 40 years, but have only recently seen wide practical use. Projects based on construction of machine-checked formal proofs are now reaching… ▽ More Development of formal proofs of correctness of programs can increase actual and perceived reliability and facilitate better understanding of program specifications and their underlying assumptions. Tools supporting such development have been available for over 40 years, but have only recently seen wide practical use. Projects based on construction of machine-checked formal proofs are now reaching an unprecedented scale, comparable to large software projects, which leads to new challenges in proof development and maintenance. Despite its increasing importance, the field of proof engineering is seldom considered in its own right; related theories, techniques, and tools span many fields and venues. This survey of the literature presents a holistic understanding of proof engineering for program correctness, covering impact in practice, foundations, proof automation, proof organization, and practical proof development. △ Less

Submitted 13 March, 2020; originally announced March 2020.

Comments: 183 pages, for errata see https://proofengineering.org/qed_errata.html

ACM Class: F.3.1; D.2.4; I.2.3

Journal ref: Foundations and Trends in Programming Languages, Vol. 5, No. 2-3 (Sept. 2019), pp. 102-281

arXiv:2003.00290 [pdf, other]

Enumerating Hardware-Software Splits with Program Rewriting

Authors: Gus Smith, Zachary Tatlock, Luis Ceze

Abstract: A core problem in hardware-software codesign is in the sheer size of the design space. Without a set ISA to constrain the hardware-software interface, the design space explodes. This work presents a strategy for managing the massive hardware-software design space within the domain of machine learning inference workloads and accelerators. We first propose EngineIR, a new language for representing m… ▽ More A core problem in hardware-software codesign is in the sheer size of the design space. Without a set ISA to constrain the hardware-software interface, the design space explodes. This work presents a strategy for managing the massive hardware-software design space within the domain of machine learning inference workloads and accelerators. We first propose EngineIR, a new language for representing machine learning hardware and software in a single program. Then, using equality graphs -- a data structure from the compilers literature -- we suggest a method for efficiently enumerating the design space by performing rewrites over our representation. △ Less

Submitted 29 February, 2020; originally announced March 2020.

Comments: Accepted in the Second Young Architect Workshop, in conjunction with ASPLOS 2020

arXiv:1909.12252 [pdf, other]

doi 10.1145/3385412.3386012

Synthesizing Structured CAD Models with Equality Saturation and Inverse Transformations

Authors: Chandrakana Nandi, Max Willsey, Adam Anderson, James R. Wilcox, Eva Darulova, Dan Grossman, Zachary Tatlock

Abstract: Recent program synthesis techniques help users customize CAD models(e.g., for 3D printing) by decompiling low-level triangle meshes to Constructive Solid Geometry (CSG) expressions. Without loops or functions, editing CSG can require many coordinated changes, and existing mesh decompilers use heuristics that can obfuscate high-level structure. This paper proposes a second decompilation stage to… ▽ More Recent program synthesis techniques help users customize CAD models(e.g., for 3D printing) by decompiling low-level triangle meshes to Constructive Solid Geometry (CSG) expressions. Without loops or functions, editing CSG can require many coordinated changes, and existing mesh decompilers use heuristics that can obfuscate high-level structure. This paper proposes a second decompilation stage to robustly "shrink" unstructured CSG expressions into more editable programs with map and fold operators. We present Szalinski, a tool that uses Equality Saturation with semantics-preserving CAD rewrites to efficiently search for smaller equivalent programs. Szalinski relies on inverse transformations, a novel way for solvers to speculatively add equivalences to an E-graph. We qualitatively evaluate Szalinski in case studies, show how it composes with an existing mesh decompiler, and demonstrate that Szalinski can shrink large models in seconds. △ Less

Submitted 12 April, 2020; v1 submitted 26 September, 2019; originally announced September 2019.

Comments: 14 pages

Journal ref: PLDI 2020

arXiv:1904.08368 [pdf, other]

Relay: A High-Level Compiler for Deep Learning

Authors: Jared Roesch, Steven Lyubomirsky, Marisa Kirisame, Logan Weber, Josh Pollock, Luis Vega, Ziheng Jiang, Tianqi Chen, Thierry Moreau, Zachary Tatlock

Abstract: Frameworks for writing, compiling, and optimizing deep learning (DL) models have recently enabled progress in areas like computer vision and natural language processing. Extending these frameworks to accommodate the rapidly diversifying landscape of DL models and hardware platforms presents challenging tradeoffs between expressivity, composability, and portability. We present Relay, a new compiler… ▽ More Frameworks for writing, compiling, and optimizing deep learning (DL) models have recently enabled progress in areas like computer vision and natural language processing. Extending these frameworks to accommodate the rapidly diversifying landscape of DL models and hardware platforms presents challenging tradeoffs between expressivity, composability, and portability. We present Relay, a new compiler framework for DL. Relay's functional, statically typed intermediate representation (IR) unifies and generalizes existing DL IRs to express state-of-the-art models. The introduction of Relay's expressive IR requires careful design of domain-specific optimizations, addressed via Relay's extension mechanisms. Using these extension mechanisms, Relay supports a unified compiler that can target a variety of hardware platforms. Our evaluation demonstrates Relay's competitive performance for a broad class of models and devices (CPUs, GPUs, and emerging accelerators). Relay's design demonstrates how a unified IR can provide expressivity, composability, and portability without compromising performance. △ Less

Submitted 24 August, 2019; v1 submitted 17 April, 2019; originally announced April 2019.

arXiv:1810.00952 [pdf, other]

doi 10.1145/3211346.3211348

Relay: A New IR for Machine Learning Frameworks

Authors: Jared Roesch, Steven Lyubomirsky, Logan Weber, Josh Pollock, Marisa Kirisame, Tianqi Chen, Zachary Tatlock

Abstract: Machine learning powers diverse services in industry including search, translation, recommendation systems, and security. The scale and importance of these models require that they be efficient, expressive, and portable across an array of heterogeneous hardware devices. These constraints are often at odds; in order to better accommodate them we propose a new high-level intermediate representation… ▽ More Machine learning powers diverse services in industry including search, translation, recommendation systems, and security. The scale and importance of these models require that they be efficient, expressive, and portable across an array of heterogeneous hardware devices. These constraints are often at odds; in order to better accommodate them we propose a new high-level intermediate representation (IR) called Relay. Relay is being designed as a purely-functional, statically-typed language with the goal of balancing efficient compilation, expressiveness, and portability. We discuss the goals of Relay and highlight its important design constraints. Our prototype is part of the open source NNVM compiler framework, which powers Amazon's deep learning framework MxNet. △ Less

Submitted 25 September, 2018; originally announced October 2018.

arXiv:1806.05300 [pdf, other]

A Graphical Interactive Debugger for Distributed Systems

Authors: Doug Woos, Zachary Tatlock, Michael D. Ernst, Thomas E. Anderson

Abstract: Designing and debugging distributed systems is notoriously difficult. The correctness of a distributed system is largely determined by its handling of failure scenarios. The sequence of events leading to a bug can be long and complex, and it is likely to include message reorderings and failures. On single-node systems, interactive debuggers enable step** through an execution of the program, but… ▽ More Designing and debugging distributed systems is notoriously difficult. The correctness of a distributed system is largely determined by its handling of failure scenarios. The sequence of events leading to a bug can be long and complex, and it is likely to include message reorderings and failures. On single-node systems, interactive debuggers enable step** through an execution of the program, but they lack the ability to easily simulate failure scenarios and control the order in which messages are delivered. Oddity is a graphical, interactive debugger for distributed systems. It brings the power of traditional step-through debugging---fine-grained control and observation of a program as it executes---to distributed systems. It also enables exploratory testing, in which an engineer examines and perturbs the behavior of a system in order to better understand it, perhaps without a specific bug in mind. A programmer can directly control message and failure interleaving. Oddity supports time travel, allowing a developer to explore multiple branching executions of a system within a single debugging session. Above all, Oddity encourages distributed systems thinking: rather than assuming the normal case and attaching failure handling as an afterthought, distributed systems should be developed around the certainty of message loss and node failure. Graduate and undergraduate students used Oddity in two distributed systems classes. Usage tracking and qualitative surveys showed that students found Oddity useful for both debugging and exploratory testing. △ Less

Submitted 13 June, 2018; originally announced June 2018.

arXiv:1805.02436 [pdf, ps, other]

Combining Tools for Optimization and Analysis of Floating-Point Computations

Authors: Heiko Becker, Pavel Pancheckha, Eva Darulova, Zachary Tatlock

Abstract: Recent renewed interest in optimizing and analyzing floating-point programs has lead to a diverse array of new tools for numerical programs. These tools are often complementary, each focusing on a distinct aspect of numerical programming. Building reliable floating point applications typically requires addressing several of these aspects, which makes easy composition essential. This paper describe… ▽ More Recent renewed interest in optimizing and analyzing floating-point programs has lead to a diverse array of new tools for numerical programs. These tools are often complementary, each focusing on a distinct aspect of numerical programming. Building reliable floating point applications typically requires addressing several of these aspects, which makes easy composition essential. This paper describes the composition of two recent floating-point tools: Herbie, which performs accuracy optimization, and Daisy, which performs accuracy verification. We find that the combination provides numerous benefits to users, such as being able to use Daisy to check whether Herbie's unsound optimizations improved the worst-case roundoff error, as well as benefits to tool authors, including uncovering a number of bugs in both tools. The combination also allowed us to compare the different program rewriting techniques implemented by these tools for the first time. The paper lays out a road map for combining other floating-point tools and for surmounting common challenges. △ Less

Submitted 7 May, 2018; originally announced May 2018.

Comments: 13 pages

arXiv:1705.10416 [pdf, other]

Finding Root Causes of Floating Point Error with Herbgrind

Authors: Alex Sanchez-Stern, Pavel Panchekha, Sorin Lerner, Zachary Tatlock

Abstract: Floating-point arithmetic plays a central role in science, engineering, and finance by enabling developers to approximate real arithmetic. To address numerical issues in large floating-point applications, developers must identify root causes, which is difficult because floating-point errors are generally non-local, non-compositional, and non-uniform. This paper presents Herbgrind, a tool to help… ▽ More Floating-point arithmetic plays a central role in science, engineering, and finance by enabling developers to approximate real arithmetic. To address numerical issues in large floating-point applications, developers must identify root causes, which is difficult because floating-point errors are generally non-local, non-compositional, and non-uniform. This paper presents Herbgrind, a tool to help developers identify and address root causes in numerical code written in low-level C/C++ and Fortran. Herbgrind dynamically tracks dependencies between operations and program outputs to avoid false positives and abstracts erroneous computations to a simplified program fragment whose improvement can reduce output error. We perform several case studies applying Herbgrind to large, expert-crafted numerical programs and show that it scales to applications spanning hundreds of thousands of lines, correctly handling the low-level details of modern floating point hardware and mathematical libraries, and tracking error across function boundaries and through the heap. △ Less

Submitted 28 June, 2018; v1 submitted 29 May, 2017; originally announced May 2017.

Comments: 15 pages published at PLDI 18

arXiv:1012.1802 [pdf, ps, other]

doi 10.2168/LMCS-7(1:10)2011

Equality Saturation: A New Approach to Optimization

Authors: Ross Tate, Michael Stepp, Zachary Tatlock, Sorin Lerner

Abstract: Optimizations in a traditional compiler are applied sequentially, with each optimization destructively modifying the program to produce a transformed program that is then passed to the next optimization. We present a new approach for structuring the optimization phase of a compiler. In our approach, optimizations take the form of equality analyses that add equality information to a common interme… ▽ More Optimizations in a traditional compiler are applied sequentially, with each optimization destructively modifying the program to produce a transformed program that is then passed to the next optimization. We present a new approach for structuring the optimization phase of a compiler. In our approach, optimizations take the form of equality analyses that add equality information to a common intermediate representation. The optimizer works by repeatedly applying these analyses to infer equivalences between program fragments, thus saturating the intermediate representation with equalities. Once saturated, the intermediate representation encodes multiple optimized versions of the input program. At this point, a profitability heuristic picks the final optimized program from the various programs represented in the saturated representation. Our proposed way of structuring optimizers has a variety of benefits over previous approaches: our approach obviates the need to worry about optimization ordering, enables the use of a global optimization heuristic that selects among fully optimized programs, and can be used to perform translation validation, even on compilers other than our own. We present our approach, formalize it, and describe our choice of intermediate representation. We also present experimental results showing that our approach is practical in terms of time and space overhead, is effective at discovering intricate optimization opportunities, and is effective at performing translation validation for a realistic optimizer. △ Less

Submitted 25 March, 2011; v1 submitted 2 December, 2010; originally announced December 2010.

Comments: 80 pages, 39 figures

ACM Class: D.3.4

Journal ref: Logical Methods in Computer Science, Volume 7, Issue 1 (March 28, 2011) lmcs:1016

Showing 1–26 of 26 results for author: Tatlock, Z