
Manipulate-Anything: Automating Real-World Robots using Vision-Language Models

Authors: Jiafei Duan, Wentao Yuan, Wilbert Pumacay, Yi Ru Wang, Kiana Ehsani, Dieter Fox, Ranjay Krishna

Abstract: Large-scale endeavors like RT-1 and widespread community efforts such as Open-X-Embodiment have contributed to growing the scale of robot demonstration data. However, there is still an opportunity to improve the quality, quantity, and diversity of robot demonstration data. Although vision-language models have been shown to automatically generate demonstration data, their utility has been limited to environments with privileged state information, they require hand-designed skills, and are limited to interactions with few object instances. We propose Manipulate-Anything, a scalable automated generation method for real-world robotic manipulation. Unlike prior work, our method can operate in real-world environments without any privileged state information, hand-designed skills, and can manipulate any static object. We evaluate our method using two setups. First, Manipulate-Anything successfully generates trajectories for all 5 real-world and 12 simulation tasks, significantly outperforming existing methods like VoxPoser. Second, Manipulate-Anything's demonstrations can train more robust behavior cloning policies than training with human demonstrations, or from data generated by VoxPoser and Code-As-Policies. We believe Manipulate-Anything can be the scalable method for both generating data for robotics and solving novel tasks in a zero-shot setting.

Submitted 27 June, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

Comments: Project page: https://robot-ma.github.io/

arXiv:2310.07018 [pdf, other]

NEWTON: Are Large Language Models Capable of Physical Reasoning?

Authors: Yi Ru Wang, Jiafei Duan, Dieter Fox, Siddhartha Srinivasa

Abstract: Large Language Models (LLMs), through their contextualized representations, have been empirically proven to encapsulate syntactic, semantic, word sense, and common-sense knowledge. However, there has been limited exploration of their physical reasoning abilities, specifically concerning the crucial attributes for comprehending everyday objects. To address this gap, we introduce NEWTON, a repository and benchmark for evaluating the physics reasoning skills of LLMs. Further, to enable domain-specific adaptation of this benchmark, we present a pipeline to enable researchers to generate a variant of this benchmark that has been customized to the objects and attributes relevant for their application. The NEWTON repository comprises a collection of 2800 object-attribute pairs, providing the foundation for generating infinite-scale assessment templates. The NEWTON benchmark consists of 160K QA questions, curated using the NEWTON repository to investigate the physical reasoning capabilities of several mainstream language models across foundational, explicit, and implicit reasoning tasks. Through extensive empirical analysis, our results highlight the capabilities of LLMs for physical reasoning. We find that LLMs like GPT-4 demonstrate strong reasoning capabilities in scenario-based tasks but exhibit less consistency in object-attribute reasoning compared to humans (50% vs. 84%). Furthermore, the NEWTON platform demonstrates its potential for evaluating and enhancing language models, paving the way for their integration into physically grounded settings, such as robotic manipulation. Project site: https://newtonreasoning.github.io

Submitted 10 October, 2023; originally announced October 2023.

Comments: EMNLP 2023 Findings; 8 pages, 3 figures, 7 tables; Project page: https://newtonreasoning.github.io

arXiv:2307.15751 [pdf, other]

No More Nulls!

Authors: Yisu Remy Wang

Abstract: Since the inception of SQL, nulls have frustrated database users and builders alike. Those writing SQL must painstakingly guard their queries against surprising results caused by nulls, while those building database engines constantly struggle to implement the subtle semantics of 3-valued logic. Given that the relational model already provides a way to represent missing information, namely, with the absence of a tuple in a relation, one may step back and ask: "Are nulls really necessary?" We answer: "No!" by proposing a new semantics for SQL that completely eliminates nulls. Our semantics, called Columnar Semantics, is as expressive as the standard 3-valued logic semantics, and behaves the same when the data and query are null-free. Where the two semantics differ, Columnar Semantics results in simpler queries. To evaluate Columnar Semantics and any other alternative semantics or query languages, we propose MIA (Missing Information Artifacts), a collection of queries and data sets for handling missing information, and invite contributions from the community.

Submitted 28 July, 2023; originally announced July 2023.
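
Illustration (not part of the arXiv record): the 3-valued logic mentioned in the abstract above can be sketched in a few lines of Python, with None standing in for SQL's NULL/UNKNOWN. All function names here are hypothetical, and this models only the standard behaviour that makes nulls surprising; it is not the Columnar Semantics the paper proposes.

```python
from typing import Callable, List, Optional

# Assumption: None plays the role of SQL NULL (for values) and UNKNOWN (for truth values).
Tri = Optional[bool]


def tri_not(a: Tri) -> Tri:
    """NOT UNKNOWN is still UNKNOWN."""
    return None if a is None else (not a)


def tri_and(a: Tri, b: Tri) -> Tri:
    """FALSE dominates AND; otherwise any UNKNOWN makes the result UNKNOWN."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def tri_or(a: Tri, b: Tri) -> Tri:
    """TRUE dominates OR; otherwise any UNKNOWN makes the result UNKNOWN."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def tri_eq(x: Optional[int], y: Optional[int]) -> Tri:
    """Comparing anything with NULL yields UNKNOWN, even NULL = NULL."""
    if x is None or y is None:
        return None
    return x == y


def where(rows: List[Optional[int]],
          predicate: Callable[[Optional[int]], Tri]) -> List[Optional[int]]:
    """A WHERE clause keeps a row only when the predicate is TRUE (not UNKNOWN)."""
    return [r for r in rows if predicate(r) is True]


rows = [1, None, 3]
# "v = 1 OR NOT (v = 1)" looks like a tautology, yet the NULL row is filtered out.
kept = where(rows, lambda v: tri_or(tri_eq(v, 1), tri_not(tri_eq(v, 1))))
```

Here `kept` ends up without the null row, which is the kind of result that query authors have to guard against by hand.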

arXiv:2306.13818 [pdf, other]

AR2-D2:Training a Robot Without a Robot

Authors: Jiafei Duan, Yi Ru Wang, Mohit Shridhar, Dieter Fox, Ranjay Krishna

Abstract: Diligently gathered human demonstrations serve as the unsung heroes empowering the progression of robot learning. Today, demonstrations are collected by training people to use specialized controllers, which (tele-)operate robots to manipulate a small number of objects. By contrast, we introduce AR2-D2: a system for collecting demonstrations which (1) does not require people with specialized training, (2) does not require any real robots during data collection, and therefore, (3) enables manipulation of diverse objects with a real robot. AR2-D2 is a framework in the form of an iOS app that people can use to record a video of themselves manipulating any object while simultaneously capturing essential data modalities for training a real robot. We show that data collected via our system enables the training of behavior cloning agents in manipulating real objects. Our experiments further show that training with our AR data is as effective as training with real-world robot demonstrations. Moreover, our user study indicates that users find AR2-D2 intuitive to use and require no training in contrast to four other frequently employed methods for collecting robot demonstrations.

Submitted 23 June, 2023; originally announced June 2023.

Comments: Project website: www.ar2d2.site

arXiv:2304.14501 [pdf, other]

Read My Mind: A Multi-Modal Dataset for Human Belief Prediction

Authors: Jiafei Duan, Samson Yu, Nicholas Tan, Yi Ru Wang, Cheston Tan

Abstract: Understanding human intentions is key to enabling effective and efficient human-robot interaction (HRI) in collaborative settings. To enable developments and evaluation of the ability of artificial intelligence (AI) systems to infer human beliefs, we introduce a large-scale multi-modal video dataset for intent prediction based on object-context relations.

Submitted 7 March, 2023; originally announced April 2023.

Comments: Accepted to ICRA 2023 Communicating Robot Learning Across Human-Robot Interaction Workshop

arXiv:2304.04332 [pdf, other]

Better Together: Unifying Datalog and Equality Saturation

Authors: Yihong Zhang, Yisu Remy Wang, Oliver Flatt, David Cao, Philip Zucker, Eli Rosenthal, Zachary Tatlock, Max Willsey

Abstract: We present egglog, a fixpoint reasoning system that unifies Datalog and equality saturation (EqSat). Like Datalog, it supports efficient incremental execution, cooperating analyses, and lattice-based reasoning. Like EqSat, it supports term rewriting, efficient congruence closure, and extraction of optimized terms. We identify two recent applications--a unification-based pointer analysis in Datalog and an EqSat-based floating-point term rewriter--that have been hampered by features missing from Datalog but found in EqSat or vice-versa. We evaluate egglog by reimplementing those projects in egglog. The resulting systems in egglog are faster, simpler, and fix bugs found in the original systems.

Submitted 15 May, 2023; v1 submitted 9 April, 2023; originally announced April 2023.

Comments: PLDI 2023

arXiv:2302.14360 [pdf, other]

A Study of Comfortability between Interactive AI and Human

Authors: Yi Ru Wang, Jiafei Duan, Sidharth Talia, Hao Zhu

Abstract: As the use of interactive AI systems becomes increasingly prevalent in our daily lives, it is crucial to understand how individuals feel when interacting with such systems. In this work, we investigate the comfort level of individuals when interacting with intent-predicting AI systems and identify the factors of influence. We introduce a study protocol to analyze human comfortability when interacting with intent-predicting AI systems and execute the study with over a dozen participants. The study findings suggest that users are comfortable with AI systems if they have control and their privacy is not affected. Additionally, the study found that users could differentiate between AI and human responses, but this did not significantly affect their comfort levels. This research paper's significance lies in its contribution to the growing body of literature on interactive AI systems, and it emphasizes the need to consider user perceptions in the development and deployment.

Submitted 28 February, 2023; originally announced February 2023.

arXiv:2302.11683 [pdf, other]

MVTrans: Multi-View Perception of Transparent Objects

Authors: Yi Ru Wang, Yuchi Zhao, Haoping Xu, Saggi Eppel, Alan Aspuru-Guzik, Florian Shkurti, Animesh Garg

Abstract: Transparent object perception is a crucial skill for applications such as robot manipulation in household and laboratory settings. Existing methods utilize RGB-D or stereo inputs to handle a subset of perception tasks including depth and pose estimation. However, transparent object perception remains to be an open problem. In this paper, we forgo the unreliable depth map from RGB-D sensors and extend the stereo based method. Our proposed method, MVTrans, is an end-to-end multi-view architecture with multiple perception capabilities, including depth estimation, segmentation, and pose estimation. Additionally, we establish a novel procedural photo-realistic dataset generation pipeline and create a large-scale transparent object detection dataset, Syn-TODD, which is suitable for training networks with all three modalities, RGB-D, stereo and multi-view RGB. Project Site: https://ac-rad.github.io/MVTrans/

Submitted 22 February, 2023; originally announced February 2023.

Comments: Accepted to ICRA 2023; 6 pages, 4 figures, 4 tables

arXiv:2301.10841 [pdf, other]

Free Join: Unifying Worst-Case Optimal and Traditional Joins

Authors: Yisu Remy Wang, Max Willsey, Dan Suciu

Abstract: Over the last decade, worst-case optimal join (WCOJ) algorithms have emerged as a new paradigm for one of the most fundamental challenges in query processing: computing joins efficiently. Such an algorithm can be asymptotically faster than traditional binary joins, all the while remaining simple to understand and implement. However, they have been found to be less efficient than the old paradigm, traditional binary join plans, on the typical acyclic queries found in practice. Some database systems that support WCOJ use a hybrid approach: use WCOJ to process the cyclic subparts of the query (if any), and rely on traditional binary joins otherwise. In this paper we propose a new framework, called Free Join, that unifies the two paradigms. We describe a new type of plan, a new data structure (which unifies the hash tables and tries used by the two paradigms), and a suite of optimization techniques. Our system, implemented in Rust, matches or outperforms both traditional binary joins and Generic Join on standard query benchmarks.

Submitted 27 January, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

arXiv:2202.10390 [pdf, other]

Optimizing Recursive Queries with Program Synthesis

Authors: Yisu Remy Wang, Mahmoud Abo Khamis, Hung Q. Ngo, Reinhard Pichler, Dan Suciu

Abstract: Most work on query optimization has concentrated on loop-free queries. However, data science and machine learning workloads today typically involve recursive or iterative computation. In this work, we propose a novel framework for optimizing recursive queries using methods from program synthesis. In particular, we introduce a simple yet powerful optimization rule called the "FGH-rule" which aims to find a faster way to evaluate a recursive program. The solution is found by making use of powerful tools, such as a program synthesizer, an SMT-solver, and an equality saturation system. We demonstrate the strength of the optimization by showing that the FGH-rule can lead to speedups up to 4 orders of magnitude on three, already optimized Datalog systems.

Submitted 21 February, 2022; originally announced February 2022.

arXiv:2110.06830 [pdf, other]

CONetV2: Efficient Auto-Channel Size Optimization for CNNs

Authors: Yi Ru Wang, Samir Khaki, Weihang Zheng, Mahdi S. Hosseini, Konstantinos N. Plataniotis

Abstract: Neural Architecture Search (NAS) has been pivotal in finding optimal network configurations for Convolution Neural Networks (CNNs). While many methods explore NAS from a global search-space perspective, the employed optimization schemes typically require heavy computational resources. This work introduces a method that is efficient in computationally constrained environments by examining the micro-search space of channel size. In tackling channel-size optimization, we design an automated algorithm to extract the dependencies within different connected layers of the network. In addition, we introduce the idea of knowledge distillation, which enables preservation of trained weights, amidst trials where the channel sizes are changing. Further, since the standard performance indicators (accuracy, loss) fail to capture the performance of individual network components (providing an overall network evaluation), we introduce a novel metric that highly correlates with test accuracy and enables analysis of individual network layers. Combining dependency extraction, metrics, and knowledge distillation, we introduce an efficient searching algorithm, with simulated annealing inspired stochasticity, and demonstrate its effectiveness in finding optimal architectures that outperform baselines by a large margin.

Submitted 13 October, 2021; originally announced October 2021.

ACM Class: I.2; I.2.8; I.2.10

arXiv:2110.00087 [pdf, other]

Seeing Glass: Joint Point Cloud and Depth Completion for Transparent Objects

Authors: Haoping Xu, Yi Ru Wang, Sagi Eppel, Alàn Aspuru-Guzik, Florian Shkurti, Animesh Garg

Abstract: The basis of many object manipulation algorithms is RGB-D input. Yet, commodity RGB-D sensors can only provide distorted depth maps for a wide range of transparent objects due to light refraction and absorption. To tackle the perception challenges posed by transparent objects, we propose TranspareNet, a joint point cloud and depth completion method, with the ability to complete the depth of transparent objects in cluttered and complex scenes, even with partially filled fluid contents within the vessels. To address the shortcomings of existing transparent object data collection schemes in literature, we also propose an automated dataset creation workflow that consists of robot-controlled image collection and vision-based automatic annotation. Through this automated workflow, we created Toronto Transparent Objects Depth Dataset (TODD), which consists of nearly 15000 RGB-D images. Our experimental evaluation demonstrates that TranspareNet outperforms existing state-of-the-art depth completion methods on multiple datasets, including ClearGrasp, and that it also handles cluttered scenes when trained on TODD. Code and dataset will be released at https://www.pair.toronto.edu/TranspareNet/

Submitted 30 September, 2021; originally announced October 2021.

Comments: Accepted for Oral at Conference on Robot Learning (CoRL) 2021; Haoping Xu and Yi Ru Wang contributed equally; 8 pages, 6 figures, 3 tables

arXiv:2109.07577 [pdf]

Predicting 3D shapes, masks, and properties of materials, liquids, and objects inside transparent containers, using the TransProteus CGI dataset

Authors: Sagi Eppel, Haoping Xu, Yi Ru Wang, Alan Aspuru-Guzik

Abstract: We present TransProteus, a dataset, and methods for predicting the 3D structure, masks, and properties of materials, liquids, and objects inside transparent vessels from a single image without prior knowledge of the image source and camera parameters. Manipulating materials in transparent containers is essential in many fields and depends heavily on vision. This work supplies a new procedurally generated dataset consisting of 50k images of liquids and solid objects inside transparent containers. The image annotations include 3D models, material properties (color/transparency/roughness...), and segmentation masks for the vessel and its content. The synthetic (CGI) part of the dataset was procedurally generated using 13k different objects, 500 different environments (HDRI), and 1450 material textures (PBR) combined with simulated liquids and procedurally generated vessels. In addition, we supply 104 real-world images of objects inside transparent vessels with depth maps of both the vessel and its content. We propose a camera agnostic method that predicts 3D models from an image as an XYZ map. This allows the trained net to predict the 3D model as a map with XYZ coordinates per pixel without prior knowledge of the image source. To calculate the training loss, we use the distance between pairs of points inside the 3D model instead of the absolute XYZ coordinates. This makes the loss function translation invariant. We use this to predict 3D models of vessels and their content from a single image. Finally, we demonstrate a net that uses a single image to predict the material properties of the vessel content and surface.

Submitted 20 December, 2021; v1 submitted 15 September, 2021; originally announced September 2021.
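
Illustration (not part of the arXiv record): the abstract above describes a training loss built from distances between pairs of points rather than absolute XYZ coordinates. A minimal sketch of that idea follows. The function name, the use of all point pairs, and the mean absolute difference are assumptions made for illustration; the abstract does not specify how pairs are chosen or how the differences are aggregated.

```python
import math
from itertools import combinations
from typing import List, Sequence

Point = Sequence[float]  # one (x, y, z) coordinate triple


def pairwise_distance_loss(pred: List[Point], target: List[Point]) -> float:
    """Mean absolute difference between corresponding pairwise distances.

    Hypothetical helper: it compares |p_i - p_j| in the prediction with
    |t_i - t_j| in the target, so adding the same offset to every predicted
    point leaves the loss unchanged.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must contain the same number of points")
    pairs = list(combinations(range(len(pred)), 2))
    if not pairs:
        return 0.0
    total = 0.0
    for i, j in pairs:
        d_pred = math.dist(pred[i], pred[j])      # distance inside the predicted model
        d_true = math.dist(target[i], target[j])  # distance inside the reference model
        total += abs(d_pred - d_true)
    return total / len(pairs)


target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
shifted = [(x + 5.0, y - 3.0, z + 1.0) for x, y, z in target]  # same shape, translated
scaled = [(2.0 * x, 2.0 * y, 2.0 * z) for x, y, z in target]   # different shape

loss_shifted = pairwise_distance_loss(shifted, target)
loss_scaled = pairwise_distance_loss(scaled, target)
```

A pure translation of the prediction gives a loss of (numerically) zero, whereas a change of shape such as scaling does not, which is the translation invariance the abstract refers to.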

arXiv:2108.10436 [pdf, other]

Rewrite Rule Inference Using Equality Saturation

Authors: Chandrakana Nandi, Max Willsey, Amy Zhu, Yisu Remy Wang, Brett Saiki, Adam Anderson, Adriana Schulz, Dan Grossman, Zachary Tatlock

Abstract: Many compilers, synthesizers, and theorem provers rely on rewrite rules to simplify expressions or prove equivalences. Developing rewrite rules can be difficult: rules may be subtly incorrect, profitable rules are easy to miss, and rulesets must be rechecked or extended whenever semantics are tweaked. Large rulesets can also be challenging to apply: redundant rules slow down rule-based search and frustrate debugging. This paper explores how equality saturation, a promising technique that uses e-graphs to apply rewrite rules, can also be used to infer rewrite rules. E-graphs can compactly represent the exponentially large sets of enumerated terms and potential rewrite rules. We show that equality saturation efficiently shrinks both sets, leading to faster synthesis of smaller, more general rulesets. We prototyped these strategies in a tool dubbed ruler. Compared to a similar tool built on CVC4, ruler synthesizes 5.8X smaller rulesets 25X faster without compromising on proving power. In an end-to-end case study, we show ruler-synthesized rules which perform as well as those crafted by domain experts, and addressed a longstanding issue in a popular open source tool.

Submitted 23 August, 2021; originally announced August 2021.

arXiv:2108.02290 [pdf, other]

Relational E-Matching

Authors: Yihong Zhang, Yisu Remy Wang, Max Willsey, Zachary Tatlock

Abstract: We present a new approach to e-matching based on relational join; in particular, we apply recent database query execution techniques to guarantee worst-case optimal run time. Compared to the conventional backtracking approach that always searches the e-graph "top down", our new relational e-matching approach can better exploit pattern structure by searching the e-graph according to an optimized query plan. We also establish the first data complexity result for e-matching, bounding run time as a function of the e-graph size and output size. We prototyped and evaluated our technique in the state-of-the-art egg e-graph framework. Compared to a conventional baseline, relational e-matching is simpler to implement and orders of magnitude faster in practice.

Submitted 5 January, 2022; v1 submitted 4 August, 2021; originally announced August 2021.

Comments: POPL 2022

arXiv:2105.14435 [pdf, ps, other]

Convergence of Datalog over (Pre-) Semirings

Authors: Mahmoud Abo Khamis, Hung Q. Ngo, Reinhard Pichler, Dan Suciu, Yisu Remy Wang

Abstract: Recursive queries have been traditionally studied in the framework of datalog, a language that restricts recursion to monotone queries over sets, which is guaranteed to converge in polynomial time in the size of the input. But modern big data systems require recursive computations beyond the Boolean space. In this paper we study the convergence of datalog when it is interpreted over an arbitrary semiring. We consider an ordered semiring, define the semantics of a datalog program as a least fixpoint in this semiring, and study the number of steps required to reach that fixpoint, if ever. We identify algebraic properties of the semiring that correspond to certain convergence properties of datalog programs. Finally, we describe a class of ordered semirings on which one can use the semi-naïve evaluation algorithm on any datalog program.

Submitted 24 January, 2024; v1 submitted 30 May, 2021; originally announced May 2021.
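
Illustration (not part of the arXiv record): the semi-naïve evaluation algorithm named in the abstract above can be sketched for the classical Boolean case, where relations are ordinary sets. The example evaluates the textbook transitive-closure program `path(x, z) :- edge(x, z)` and `path(x, z) :- path(x, y), edge(y, z)`. The function name is hypothetical, and the sketch does not attempt the paper's generalization to other ordered semirings.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[int, int]


def transitive_closure_seminaive(edges: Iterable[Edge]) -> Set[Edge]:
    """Least fixpoint of the path program, computed semi-naively.

    Each round joins only the facts discovered in the previous round (delta)
    with edge, instead of re-joining the whole path relation.
    """
    edge_set = set(edges)
    path = set(edge_set)   # facts derived so far (base rule)
    delta = set(edge_set)  # facts that were new in the last round
    while delta:
        # Recursive rule applied to the new facts only.
        derived = {(x, z) for (x, y) in delta for (y2, z) in edge_set if y == y2}
        delta = derived - path  # keep only facts not seen before
        path |= delta
    return path


chain = transitive_closure_seminaive([(1, 2), (2, 3), (3, 4)])
cycle = transitive_closure_seminaive([(1, 2), (2, 1)])
```

Over sets, the loop stops once a round derives nothing new, which must happen because only finitely many pairs exist. The abstract's question is when a comparable guarantee holds for semirings other than the Boolean one.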

arXiv:2101.01332 [pdf, other]

Equality Saturation for Tensor Graph Superoptimization

Authors: Yichen Yang, Phitchaya Mangpo Phothilimtha, Yisu Remy Wang, Max Willsey, Sudip Roy, Jacques Pienaar

Abstract: One of the major optimizations employed in deep learning frameworks is graph rewriting. Production frameworks rely on heuristics to decide if rewrite rules should be applied and in which order. Prior research has shown that one can discover more optimal tensor computation graphs if we search for a better sequence of substitutions instead of relying on heuristics. However, we observe that existing approaches for tensor graph superoptimization both in production and research frameworks apply substitutions in a sequential manner. Such sequential search methods are sensitive to the order in which the substitutions are applied and often only explore a small fragment of the exponential space of equivalent graphs. This paper presents a novel technique for tensor graph superoptimization that employs equality saturation to apply all possible substitutions at once. We show that our approach can find optimized graphs with up to 16% speedup over state-of-the-art, while spending on average 48x less time optimizing.

Submitted 17 March, 2021; v1 submitted 4 January, 2021; originally announced January 2021.

arXiv:2004.03082 [pdf, other]

doi 10.1145/3434304

egg: Fast and Extensible Equality Saturation

Authors: Max Willsey, Chandrakana Nandi, Yisu Remy Wang, Oliver Flatt, Zachary Tatlock, Pavel Panchekha

Abstract: An e-graph efficiently represents a congruence relation over many expressions. Although they were originally developed in the late 1970s for use in automated theorem provers, a more recent technique known as equality saturation repurposes e-graphs to implement state-of-the-art, rewrite-driven compiler optimizations and program synthesizers. However, e-graphs remain unspecialized for this newer use case. Equality saturation workloads exhibit distinct characteristics and often require ad-hoc e-graph extensions to incorporate transformations beyond purely syntactic rewrites. This work contributes two techniques that make e-graphs fast and extensible, specializing them to equality saturation. A new amortized invariant restoration technique called rebuilding takes advantage of equality saturation's distinct workload, providing asymptotic speedups over current techniques in practice. A general mechanism called e-class analyses integrates domain-specific analyses into the e-graph, reducing the need for ad hoc manipulation. We implemented these techniques in a new open-source library called egg. Our case studies on three previously published applications of equality saturation highlight how egg's performance and flexibility enable state-of-the-art results across diverse domains.

Submitted 7 November, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

Comments: 25 pages, 15 figures, POPL 2021

Journal ref: POPL 2021

arXiv:2002.07951 [pdf, other]

SPORES: Sum-Product Optimization via Relational Equality Saturation for Large Scale Linear Algebra

Authors: Yisu Remy Wang, Shana Hutchison, Jonathan Leang, Bill Howe, Dan Suciu

Abstract: Machine learning algorithms are commonly specified in linear algebra (LA). LA expressions can be rewritten into more efficient forms, by taking advantage of input properties such as sparsity, as well as program properties such as common subexpressions and fusible operators. The complex interaction among these properties' impact on the execution cost poses a challenge to optimizing compilers. Existing compilers resort to intricate heuristics that complicate the codebase and add maintenance cost but fail to search through the large space of equivalent LA expressions to find the cheapest one. We introduce a general optimization technique for LA expressions, by converting the LA expressions into Relational Algebra (RA) expressions, optimizing the latter, then converting the result back to (optimized) LA expressions. One major advantage of this method is that it is complete, meaning that any equivalent LA expression can be found using the equivalence rules in RA. The challenge is the major size of the search space, and we address this by adopting and extending a technique used in compilers, called equality saturation. We integrate the optimizer into SystemML and validate it empirically across a spectrum of machine learning tasks; we show that we can derive all existing hand-coded optimizations in SystemML, and perform new optimizations that lead to speedups from 1.2X to 5X.

Submitted 22 December, 2020; v1 submitted 18 February, 2020; originally announced February 2020.

Showing 1–19 of 19 results for author: Wang, Y R