-
Better Together: Unifying Datalog and Equality Saturation
Authors:
Yihong Zhang,
Yisu Remy Wang,
Oliver Flatt,
David Cao,
Philip Zucker,
Eli Rosenthal,
Zachary Tatlock,
Max Willsey
Abstract:
We present egglog, a fixpoint reasoning system that unifies Datalog and equality saturation (EqSat). Like Datalog, it supports efficient incremental execution, cooperating analyses, and lattice-based reasoning. Like EqSat, it supports term rewriting, efficient congruence closure, and extraction of optimized terms.
We identify two recent applications--a unification-based pointer analysis in Datal…
▽ More
We present egglog, a fixpoint reasoning system that unifies Datalog and equality saturation (EqSat). Like Datalog, it supports efficient incremental execution, cooperating analyses, and lattice-based reasoning. Like EqSat, it supports term rewriting, efficient congruence closure, and extraction of optimized terms.
We identify two recent applications--a unification-based pointer analysis in Datalog and an EqSat-based floating-point term rewriter--that have been hampered by features missing from Datalog but found in EqSat or vice-versa. We evaluate egglog by reimplementing those projects in egglog. The resulting systems in egglog are faster, simpler, and fix bugs found in the original systems.
△ Less
Submitted 15 May, 2023; v1 submitted 9 April, 2023;
originally announced April 2023.
-
Free Join: Unifying Worst-Case Optimal and Traditional Joins
Authors:
Yisu Remy Wang,
Max Willsey,
Dan Suciu
Abstract:
Over the last decade, worst-case optimal join (WCOJ) algorithms have emerged as a new paradigm for one of the most fundamental challenges in query processing: computing joins efficiently. Such an algorithm can be asymptotically faster than traditional binary joins, all the while remaining simple to understand and implement. However, they have been found to be less efficient than the old paradigm,…
▽ More
Over the last decade, worst-case optimal join (WCOJ) algorithms have emerged as a new paradigm for one of the most fundamental challenges in query processing: computing joins efficiently. Such an algorithm can be asymptotically faster than traditional binary joins, all the while remaining simple to understand and implement. However, they have been found to be less efficient than the old paradigm, traditional binary join plans, on the typical acyclic queries found in practice. Some database systems that support WCOJ use a hypbrid approach: use WCOJ to process the cyclic subparts of the query (if any), and rely on traditional binary joins otherwise. In this paper we propose a new framework, called Free Join, that unifies the two paradigms. We describe a new type of plan, a new data structure (which unifies the hash tables and tries used by the two paradigms), and a suite of optimization techniques. Our system, implemented in Rust, matches or outperforms both traditional binary joins and Generic Join on standard query benchmarks.
△ Less
Submitted 27 January, 2023; v1 submitted 25 January, 2023;
originally announced January 2023.
-
babble: Learning Better Abstractions with E-Graphs and Anti-Unification
Authors:
David Cao,
Rose Kunkel,
Chandrakana Nandi,
Max Willsey,
Zachary Tatlock,
Nadia Polikarpova
Abstract:
Library learning compresses a given corpus of programs by extracting common structure from the corpus into reusable library functions. Prior work on library learning suffers from two limitations that prevent it from scaling to larger, more complex inputs. First, it explores too many candidate library functions that are not useful for compression. Second, it is not robust to syntactic variation in…
▽ More
Library learning compresses a given corpus of programs by extracting common structure from the corpus into reusable library functions. Prior work on library learning suffers from two limitations that prevent it from scaling to larger, more complex inputs. First, it explores too many candidate library functions that are not useful for compression. Second, it is not robust to syntactic variation in the input.
We propose library learning modulo theory (LLMT), a new library learning algorithm that additionally takes as input an equational theory for a given problem domain. LLMT uses e-graphs and equality saturation to compactly represent the space of programs equivalent modulo the theory, and uses a novel e-graph anti-unification technique to find common patterns in the corpus more directly and efficiently.
We implemented LLMT in a tool named BABBLE. Our evaluation shows that BABBLE achieves better compression orders of magnitude faster than the state of the art. We also provide a qualitative evaluation showing that BABBLE learns reusable functions on inputs previously out of reach for library learning.
△ Less
Submitted 8 December, 2022;
originally announced December 2022.
-
Small Proofs from Congruence Closure
Authors:
Oliver Flatt,
Samuel Coward,
Max Willsey,
Zachary Tatlock,
Pavel Panchekha
Abstract:
Satisfiability Modulo Theory (SMT) solvers and equality saturation engines must generate proof certificates from e-graph-based congruence closure procedures to enable verification and conflict clause generation. Smaller proof certificates speed up these activities. Though the problem of generating proofs of minimal size is known to be NP-complete, existing proof minimization algorithms for congrue…
▽ More
Satisfiability Modulo Theory (SMT) solvers and equality saturation engines must generate proof certificates from e-graph-based congruence closure procedures to enable verification and conflict clause generation. Smaller proof certificates speed up these activities. Though the problem of generating proofs of minimal size is known to be NP-complete, existing proof minimization algorithms for congruence closure generate unnecessarily large proofs and introduce asymptotic overhead over the core congruence closure procedure. In this paper, we introduce an O(n^5) time algorithm which generates optimal proofs under a new relaxed "proof tree size" metric that directly bounds proof size. We then relax this approach further to a practical O(n \log(n)) greedy algorithm which generates small proofs with no asymptotic overhead. We implemented our techniques in the egg equality saturation toolkit, yielding the first certifying equality saturation engine. We show that our greedy approach in egg quickly generates substantially smaller proofs than the state-of-the-art Z3 SMT solver on a corpus of 3760 benchmarks.
△ Less
Submitted 7 September, 2022;
originally announced September 2022.
-
Rewrite Rule Inference Using Equality Saturation
Authors:
Chandrakana Nandi,
Max Willsey,
Amy Zhu,
Yisu Remy Wang,
Brett Saiki,
Adam Anderson,
Adriana Schulz,
Dan Grossman,
Zachary Tatlock
Abstract:
Many compilers, synthesizers, and theorem provers rely on rewrite rules to simplify expressions or prove equivalences. Develo** rewrite rules can be difficult: rules may be subtly incorrect, profitable rules are easy to miss, and rulesets must be rechecked or extended whenever semantics are tweaked. Large rulesets can also be challenging to apply: redundant rules slow down rule-based search and…
▽ More
Many compilers, synthesizers, and theorem provers rely on rewrite rules to simplify expressions or prove equivalences. Develo** rewrite rules can be difficult: rules may be subtly incorrect, profitable rules are easy to miss, and rulesets must be rechecked or extended whenever semantics are tweaked. Large rulesets can also be challenging to apply: redundant rules slow down rule-based search and frustrate debugging. This paper explores how equality saturation, a promising technique that uses e-graphs to apply rewrite rules, can also be used to infer rewrite rules. E-graphs can compactly represent the exponentially large sets of enumerated terms and potential rewrite rules. We show that equality saturation efficiently shrinks both sets, leading to faster synthesis of smaller, more general rulesets.
We prototyped these strategies in a tool dubbed ruler. Compared to a similar tool built on CVC4, ruler synthesizes 5.8X smaller rulesets 25X faster without compromising on proving power. In an end-to-end case study, we show ruler-synthesized rules which perform as well as those crafted by domain experts, and addressed a longstanding issue in a popular open source tool.
△ Less
Submitted 23 August, 2021;
originally announced August 2021.
-
Relational E-Matching
Authors:
Yihong Zhang,
Yisu Remy Wang,
Max Willsey,
Zachary Tatlock
Abstract:
We present a new approach to e-matching based on relational join; in particular, we apply recent database query execution techniques to guarantee worst-case optimal run time. Compared to the conventional backtracking approach that always searches the e-graph "top down", our new relational e-matching approach can better exploit pattern structure by searching the e-graph according to an optimized qu…
▽ More
We present a new approach to e-matching based on relational join; in particular, we apply recent database query execution techniques to guarantee worst-case optimal run time. Compared to the conventional backtracking approach that always searches the e-graph "top down", our new relational e-matching approach can better exploit pattern structure by searching the e-graph according to an optimized query plan. We also establish the first data complexity result for e-matching, bounding run time as a function of the e-graph size and output size. We prototyped and evaluated our technique in the state-of-the-art egg e-graph framework. Compared to a conventional baseline, relational e-matching is simpler to implement and orders of magnitude faster in practice.
△ Less
Submitted 5 January, 2022; v1 submitted 4 August, 2021;
originally announced August 2021.
-
Co-Optimization of Design and Fabrication Plans for Carpentry: Supplemental Material
Authors:
Haisen Zhao,
Max Willsey,
Amy Zhu,
Chandrakana Nandi,
Zachary Tatlock,
Justin Solomon,
Adriana Schulz
Abstract:
Past work on optimizing fabrication plans given a carpentry design can provide Pareto-optimal plans trading off between material waste, fabrication time, precision, and other considerations. However, when develo** fabrication plans, experts rarely restrict to a single design, instead considering families of design variations, sometimes adjusting designs to simplify fabrication. Jointly exploring…
▽ More
Past work on optimizing fabrication plans given a carpentry design can provide Pareto-optimal plans trading off between material waste, fabrication time, precision, and other considerations. However, when develo** fabrication plans, experts rarely restrict to a single design, instead considering families of design variations, sometimes adjusting designs to simplify fabrication. Jointly exploring the design and fabrication plan spaces for each design is intractable using current techniques. We present a new approach to jointly optimize design and fabrication plans for carpentered objects. To make this bi-level optimization tractable, we adapt recent work from program synthesis based on equality graphs (e-graphs), which encode sets of equivalent programs. Our insight is that subproblems within our bi-level problem share significant substructures. By representing both designs and fabrication plans in a new bag of parts(BOP) e-graph, we amortize the cost of optimizing design components shared among multiple candidates. Even using BOP e-graphs, the optimization space grows quickly in practice. Hence, we also show how a feedback-guided search strategy dubbed Iterative Contraction and Expansion on E-graphs (ICEE) can keep the size of the e-graph manage-able and direct the search toward promising candidates. We illustrate the advantages of our pipeline through examples from the carpentry domain.
△ Less
Submitted 30 July, 2021;
originally announced July 2021.
-
Co-Optimization of Design and Fabrication Plans for Carpentry
Authors:
Haisen Zhao,
Max Willsey,
Amy Zhu,
Chandrakana Nandi,
Zachary Tatlock,
Justin Solomon,
Adriana Schulz
Abstract:
Past work on optimizing fabrication plans given a carpentry design can provide Pareto-optimal plans trading off between material waste, fabrication time, precision, and other considerations. However, when develo** fabrication plans, experts rarely restrict to a single design, instead considering families of design variations, sometimes adjusting designs to simplify fabrication. Jointly exploring…
▽ More
Past work on optimizing fabrication plans given a carpentry design can provide Pareto-optimal plans trading off between material waste, fabrication time, precision, and other considerations. However, when develo** fabrication plans, experts rarely restrict to a single design, instead considering families of design variations, sometimes adjusting designs to simplify fabrication. Jointly exploring the design and fabrication plan spaces for each design is intractable using current techniques. We present a new approach to jointly optimize design and fabrication plans for carpentered objects. To make this bi-level optimization tractable, we adapt recent work from program synthesis based on equality graphs (e-graphs), which encode sets of equivalent programs. Our insight is that subproblems within our bi-level problem share significant substructures. By representing both designs and fabrication plans in a new bag of parts(BOP) e-graph, we amortize the cost of optimizing design components shared among multiple candidates. Even using BOP e-graphs, the optimization space grows quickly in practice. Hence, we also show how a feedback-guided search strategy dubbed Iterative Contraction and Expansion on E-graphs(ICEE) can keep the size of the e-graph manage-able and direct the search toward promising candidates. We illustrate the advantages of our pipeline through examples from the carpentry domain.
△ Less
Submitted 3 August, 2021; v1 submitted 26 July, 2021;
originally announced July 2021.
-
Equality Saturation for Tensor Graph Superoptimization
Authors:
Yichen Yang,
Phitchaya Mangpo Phothilimtha,
Yisu Remy Wang,
Max Willsey,
Sudip Roy,
Jacques Pienaar
Abstract:
One of the major optimizations employed in deep learning frameworks is graph rewriting. Production frameworks rely on heuristics to decide if rewrite rules should be applied and in which order. Prior research has shown that one can discover more optimal tensor computation graphs if we search for a better sequence of substitutions instead of relying on heuristics. However, we observe that existing…
▽ More
One of the major optimizations employed in deep learning frameworks is graph rewriting. Production frameworks rely on heuristics to decide if rewrite rules should be applied and in which order. Prior research has shown that one can discover more optimal tensor computation graphs if we search for a better sequence of substitutions instead of relying on heuristics. However, we observe that existing approaches for tensor graph superoptimization both in production and research frameworks apply substitutions in a sequential manner. Such sequential search methods are sensitive to the order in which the substitutions are applied and often only explore a small fragment of the exponential space of equivalent graphs. This paper presents a novel technique for tensor graph superoptimization that employs equality saturation to apply all possible substitutions at once. We show that our approach can find optimized graphs with up to 16% speedup over state-of-the-art, while spending on average 48x less time optimizing.
△ Less
Submitted 17 March, 2021; v1 submitted 4 January, 2021;
originally announced January 2021.
-
egg: Fast and Extensible Equality Saturation
Authors:
Max Willsey,
Chandrakana Nandi,
Yisu Remy Wang,
Oliver Flatt,
Zachary Tatlock,
Pavel Panchekha
Abstract:
An e-graph efficiently represents a congruence relation over many expressions. Although they were originally developed in the late 1970s for use in automated theorem provers, a more recent technique known as equality saturation repurposes e-graphs to implement state-of-the-art, rewrite-driven compiler optimizations and program synthesizers. However, e-graphs remain unspecialized for this newer use…
▽ More
An e-graph efficiently represents a congruence relation over many expressions. Although they were originally developed in the late 1970s for use in automated theorem provers, a more recent technique known as equality saturation repurposes e-graphs to implement state-of-the-art, rewrite-driven compiler optimizations and program synthesizers. However, e-graphs remain unspecialized for this newer use case. Equality saturation workloads exhibit distinct characteristics and often require ad-hoc e-graph extensions to incorporate transformations beyond purely syntactic rewrites.
This work contributes two techniques that make e-graphs fast and extensible, specializing them to equality saturation. A new amortized invariant restoration technique called rebuilding takes advantage of equality saturation's distinct workload, providing asymptotic speedups over current techniques in practice. A general mechanism called e-class analyses integrates domain-specific analyses into the e-graph, reducing the need for ad hoc manipulation.
We implemented these techniques in a new open-source library called egg. Our case studies on three previously published applications of equality saturation highlight how egg's performance and flexibility enable state-of-the-art results across diverse domains.
△ Less
Submitted 7 November, 2020; v1 submitted 6 April, 2020;
originally announced April 2020.
-
Synthesizing Structured CAD Models with Equality Saturation and Inverse Transformations
Authors:
Chandrakana Nandi,
Max Willsey,
Adam Anderson,
James R. Wilcox,
Eva Darulova,
Dan Grossman,
Zachary Tatlock
Abstract:
Recent program synthesis techniques help users customize CAD models(e.g., for 3D printing) by decompiling low-level triangle meshes to Constructive Solid Geometry (CSG) expressions. Without loops or functions, editing CSG can require many coordinated changes, and existing mesh decompilers use heuristics that can obfuscate high-level structure.
This paper proposes a second decompilation stage to…
▽ More
Recent program synthesis techniques help users customize CAD models(e.g., for 3D printing) by decompiling low-level triangle meshes to Constructive Solid Geometry (CSG) expressions. Without loops or functions, editing CSG can require many coordinated changes, and existing mesh decompilers use heuristics that can obfuscate high-level structure.
This paper proposes a second decompilation stage to robustly "shrink" unstructured CSG expressions into more editable programs with map and fold operators. We present Szalinski, a tool that uses Equality Saturation with semantics-preserving CAD rewrites to efficiently search for smaller equivalent programs. Szalinski relies on inverse transformations, a novel way for solvers to speculatively add equivalences to an E-graph. We qualitatively evaluate Szalinski in case studies, show how it composes with an existing mesh decompiler, and demonstrate that Szalinski can shrink large models in seconds.
△ Less
Submitted 12 April, 2020; v1 submitted 26 September, 2019;
originally announced September 2019.
-
Design and Implementation of Concurrent C0
Authors:
Max Willsey,
Rokhini Prabhu,
Frank Pfenning
Abstract:
We describe Concurrent C0, a type-safe C-like language with contracts and session-typed communication over channels. Concurrent C0 supports an operation called forwarding which allows channels to be combined in a well-defined way. The language's type system enables elegant expression of session types and message-passing concurrent programs. We provide a Go-based implementation with language based…
▽ More
We describe Concurrent C0, a type-safe C-like language with contracts and session-typed communication over channels. Concurrent C0 supports an operation called forwarding which allows channels to be combined in a well-defined way. The language's type system enables elegant expression of session types and message-passing concurrent programs. We provide a Go-based implementation with language based optimizations that outperforms traditional message passing techniques.
△ Less
Submitted 17 January, 2017;
originally announced January 2017.