Search | arXiv e-print repository

MeMo: Meaningful, Modular Controllers via Noise Injection

Authors: Megan Tjandrasuwita, Jie Xu, Armando Solar-Lezama, Wojciech Matusik

Abstract: Robots are often built from standardized assemblies, (e.g. arms, legs, or fingers), but each robot must be trained from scratch to control all the actuators of all the parts together. In this paper we demonstrate a new approach that takes a single robot and its controller as input and produces a set of modular controllers for each of these assemblies such that when a new robot is built from the sa… ▽ More Robots are often built from standardized assemblies, (e.g. arms, legs, or fingers), but each robot must be trained from scratch to control all the actuators of all the parts together. In this paper we demonstrate a new approach that takes a single robot and its controller as input and produces a set of modular controllers for each of these assemblies such that when a new robot is built from the same parts, its control can be quickly learned by reusing the modular controllers. We achieve this with a framework called MeMo which learns (Me)aningful, (Mo)dular controllers. Specifically, we propose a novel modularity objective to learn an appropriate division of labor among the modules. We demonstrate that this objective can be optimized simultaneously with standard behavior cloning loss via noise injection. We benchmark our framework in locomotion and gras** environments on simple to complex robot morphology transfer. We also show that the modules help in task transfer. On both structure and task transfer, MeMo achieves improved training efficiency to graph neural network and Transformer baselines. △ Less

Submitted 24 May, 2024; originally announced July 2024.

arXiv:2406.07897 [pdf, other]

When Do Skills Help Reinforcement Learning? A Theoretical Analysis of Temporal Abstractions

Authors: Zhening Li, Gabriel Poesia, Armando Solar-Lezama

Abstract: Skills are temporal abstractions that are intended to improve reinforcement learning (RL) performance through hierarchical RL. Despite our intuition about the properties of an environment that make skills useful, a precise characterization has been absent. We provide the first such characterization, focusing on the utility of deterministic skills in deterministic sparse-reward environments with fi… ▽ More Skills are temporal abstractions that are intended to improve reinforcement learning (RL) performance through hierarchical RL. Despite our intuition about the properties of an environment that make skills useful, a precise characterization has been absent. We provide the first such characterization, focusing on the utility of deterministic skills in deterministic sparse-reward environments with finite action spaces. We show theoretically and empirically that RL performance gain from skills is worse in environments where solutions to states are less compressible. Additional theoretical results suggest that skills benefit exploration more than they benefit learning from existing experience, and that using unexpressive skills such as macroactions may worsen RL performance. We hope our findings can guide research on automatic skill discovery and help RL practitioners better decide when and how to use skills. △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: 29 pages, 1 figure. Accepted to ICML 2024

arXiv:2403.07974 [pdf, other]

LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

Authors: Naman Jain, King Han, Alex Gu, Wen-Ding Li, Fanjia Yan, Tianjun Zhang, Sida Wang, Armando Solar-Lezama, Koushik Sen, Ion Stoica

Abstract: Large Language Models (LLMs) applied to code-related applications have emerged as a prominent field, attracting significant interest from both academia and industry. However, as new and improved LLMs are developed, existing evaluation benchmarks (e.g., HumanEval, MBPP) are no longer sufficient for assessing their capabilities. In this work, we propose LiveCodeBench, a comprehensive and contaminati… ▽ More Large Language Models (LLMs) applied to code-related applications have emerged as a prominent field, attracting significant interest from both academia and industry. However, as new and improved LLMs are developed, existing evaluation benchmarks (e.g., HumanEval, MBPP) are no longer sufficient for assessing their capabilities. In this work, we propose LiveCodeBench, a comprehensive and contamination-free evaluation of LLMs for code, which continuously collects new problems over time from contests across three competition platforms, namely LeetCode, AtCoder, and CodeForces. Notably, our benchmark also focuses on a broader range of code related capabilities, such as self-repair, code execution, and test output prediction, beyond just code generation. Currently, LiveCodeBench hosts four hundred high-quality coding problems that were published between May 2023 and May 2024. We have evaluated 18 base LLMs and 34 instruction-tuned LLMs on LiveCodeBench. We present empirical findings on contamination, holistic performance comparisons, potential overfitting in existing benchmarks as well as individual model comparisons. We will release all prompts and model completions for further community analysis, along with a general toolkit for adding new scenarios and model △ Less

Submitted 6 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: Website - https://livecodebench.github.io/

arXiv:2402.19475 [pdf, other]

The Counterfeit Conundrum: Can Code Language Models Grasp the Nuances of Their Incorrect Generations?

Authors: Alex Gu, Wen-Ding Li, Naman Jain, Theo X. Olausson, Celine Lee, Koushik Sen, Armando Solar-Lezama

Abstract: While language models are increasingly more proficient at code generation, they still frequently generate incorrect programs. Many of these programs are obviously wrong, but others are more subtle and pass weaker correctness checks such as being able to compile. In this work, we focus on these counterfeit samples: programs sampled from a language model that 1) have a high enough log-probability to… ▽ More While language models are increasingly more proficient at code generation, they still frequently generate incorrect programs. Many of these programs are obviously wrong, but others are more subtle and pass weaker correctness checks such as being able to compile. In this work, we focus on these counterfeit samples: programs sampled from a language model that 1) have a high enough log-probability to be generated at a moderate temperature and 2) pass weak correctness checks. Overall, we discover that most models have a very shallow understanding of counterfeits through three clear failure modes. First, models mistakenly classify them as correct. Second, models are worse at reasoning about the execution behaviour of counterfeits and often predict their execution results as if they were correct. Third, when asking models to fix counterfeits, the likelihood of a model successfully repairing a counterfeit is often even lower than that of sampling a correct program from scratch. Counterfeits also have very unexpected properties: first, counterfeit programs for problems that are easier for a model to solve are not necessarily easier to detect and only slightly easier to execute and repair. Second, counterfeits from a given model are just as confusing to the model itself as they are to other models. Finally, both strong and weak models are able to generate counterfeit samples that equally challenge all models. In light of our findings, we recommend that care and caution be taken when relying on models to understand their own samples, especially when no external feedback is incorporated. △ Less

Submitted 29 February, 2024; originally announced February 2024.

Comments: 54 pages, 25 figures

arXiv:2401.03065 [pdf, other]

CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution

Authors: Alex Gu, Baptiste Rozière, Hugh Leather, Armando Solar-Lezama, Gabriel Synnaeve, Sida I. Wang

Abstract: We present CRUXEval (Code Reasoning, Understanding, and eXecution Evaluation), a benchmark consisting of 800 Python functions (3-13 lines). Each function comes with an input-output pair, leading to two natural tasks: input prediction and output prediction. First, we propose a generic recipe for generating our execution benchmark which can be used to create future variation of the benchmark. Second… ▽ More We present CRUXEval (Code Reasoning, Understanding, and eXecution Evaluation), a benchmark consisting of 800 Python functions (3-13 lines). Each function comes with an input-output pair, leading to two natural tasks: input prediction and output prediction. First, we propose a generic recipe for generating our execution benchmark which can be used to create future variation of the benchmark. Second, we evaluate twenty code models on our benchmark and discover that many recent high-scoring models on HumanEval do not show the same improvements on our benchmark. Third, we show that simple CoT and fine-tuning schemes can improve performance on our benchmark but remain far from solving it. The best setup, GPT-4 with chain of thought (CoT), achieves a pass@1 of 75% and 81% on input and output prediction, respectively. In contrast, Code Llama 34B achieves a pass@1 of 50% and 46% on input and output prediction, highlighting the gap between open and closed source models. As no model is close to acing CRUXEval, we provide examples of consistent GPT-4 failures on simple programs as a lens into its code reasoning capabilities and areas for improvement. △ Less

Submitted 5 January, 2024; originally announced January 2024.

Comments: 71 pages, 29 figures

arXiv:2310.15164 [pdf, other]

doi 10.18653/v1/2023.emnlp-main.313

LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers

Authors: Theo X. Olausson, Alex Gu, Benjamin Lipkin, Cedegao E. Zhang, Armando Solar-Lezama, Joshua B. Tenenbaum, Roger Levy

Abstract: Logical reasoning, i.e., deductively inferring the truth value of a conclusion from a set of premises, is an important task for artificial intelligence with wide potential impacts on science, mathematics, and society. While many prompting-based strategies have been proposed to enable Large Language Models (LLMs) to do such reasoning more effectively, they still appear unsatisfactory, often failing… ▽ More Logical reasoning, i.e., deductively inferring the truth value of a conclusion from a set of premises, is an important task for artificial intelligence with wide potential impacts on science, mathematics, and society. While many prompting-based strategies have been proposed to enable Large Language Models (LLMs) to do such reasoning more effectively, they still appear unsatisfactory, often failing in subtle and unpredictable ways. In this work, we investigate the validity of instead reformulating such tasks as modular neurosymbolic programming, which we call LINC: Logical Inference via Neurosymbolic Computation. In LINC, the LLM acts as a semantic parser, translating premises and conclusions from natural language to expressions in first-order logic. These expressions are then offloaded to an external theorem prover, which symbolically performs deductive inference. Leveraging this approach, we observe significant performance gains on FOLIO and a balanced subset of ProofWriter for three different models in nearly all experimental conditions we evaluate. On ProofWriter, augmenting the comparatively small open-source StarCoder+ (15.5B parameters) with LINC even outperforms GPT-3.5 and GPT-4 with Chain-of-Thought (CoT) prompting by an absolute 38% and 10%, respectively. When used with GPT-4, LINC scores 26% higher than CoT on ProofWriter while performing comparatively on FOLIO. Further analysis reveals that although both methods on average succeed roughly equally often on this dataset, they exhibit distinct and complementary failure modes. We thus provide promising evidence for how logical reasoning over natural language can be tackled through jointly leveraging LLMs alongside symbolic provers. All corresponding code is publicly available at https://github.com/benlipkin/linc △ Less

Submitted 14 February, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

Comments: EMNLP Main 2023 (Outstanding Paper Award)

Journal ref: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 5153-5176, Singapore. Association for Computational Linguistics

arXiv:2310.11614 [pdf, other]

Learning a Hierarchical Planner from Humans in Multiple Generations

Authors: Leonardo Hernandez Cano, Yewen Pu, Robert D. Hawkins, Josh Tenenbaum, Armando Solar-Lezama

Abstract: A typical way in which a machine acquires knowledge from humans is by programming. Compared to learning from demonstrations or experiences, programmatic learning allows the machine to acquire a novel skill as soon as the program is written, and, by building a library of programs, a machine can quickly learn how to perform complex tasks. However, as programs often take their execution contexts for… ▽ More A typical way in which a machine acquires knowledge from humans is by programming. Compared to learning from demonstrations or experiences, programmatic learning allows the machine to acquire a novel skill as soon as the program is written, and, by building a library of programs, a machine can quickly learn how to perform complex tasks. However, as programs often take their execution contexts for granted, they are brittle when the contexts change, making it difficult to adapt complex programs to new contexts. We present natural programming, a library learning system that combines programmatic learning with a hierarchical planner. Natural programming maintains a library of decompositions, consisting of a goal, a linguistic description of how this goal decompose into sub-goals, and a concrete instance of its decomposition into sub-goals. A user teaches the system via curriculum building, by identifying a challenging yet not impossible goal along with linguistic hints on how this goal may be decomposed into sub-goals. The system solves for the goal via hierarchical planning, using the linguistic hints to guide its probability distribution in proposing the right plans. The system learns from this interaction by adding newly found decompositions in the successful search into its library. Simulated studies and a human experiment (n=360) on a controlled environment demonstrate that natural programming can robustly compose programs learned from different users and contexts, adapting faster and solving more complex tasks when compared to programmatic baselines. △ Less

Submitted 17 October, 2023; originally announced October 2023.

Comments: First two authors contributed equally

arXiv:2306.09896 [pdf, other]

Is Self-Repair a Silver Bullet for Code Generation?

Authors: Theo X. Olausson, Jeevana Priya Inala, Chenglong Wang, Jianfeng Gao, Armando Solar-Lezama

Abstract: Large language models have shown remarkable aptitude in code generation, but still struggle to perform complex tasks. Self-repair -- in which the model debugs and repairs its own code -- has recently become a popular way to boost performance in these settings. However, despite its increasing popularity, existing studies of self-repair have been limited in scope; in many settings, its efficacy thus… ▽ More Large language models have shown remarkable aptitude in code generation, but still struggle to perform complex tasks. Self-repair -- in which the model debugs and repairs its own code -- has recently become a popular way to boost performance in these settings. However, despite its increasing popularity, existing studies of self-repair have been limited in scope; in many settings, its efficacy thus remains poorly understood. In this paper, we analyze Code Llama, GPT-3.5 and GPT-4's ability to perform self-repair on problems taken from HumanEval and APPS. We find that when the cost of carrying out repair is taken into account, performance gains are often modest, vary a lot between subsets of the data, and are sometimes not present at all. We hypothesize that this is because self-repair is bottlenecked by the model's ability to provide feedback on its own code; using a stronger model to artificially boost the quality of the feedback, we observe substantially larger performance gains. Similarly, a small-scale study in which we provide GPT-4 with feedback from human participants suggests that even for the strongest models, self-repair still lags far behind what can be achieved with human-level debugging. △ Less

Submitted 2 February, 2024; v1 submitted 16 June, 2023; originally announced June 2023.

Comments: Accepted to ICLR 2024. Added additional Code Llama experiments and fixed a data processing error harming Code Llama's reported self-repair performance on HumanEval

arXiv:2306.08997

Exploring the MIT Mathematics and EECS Curriculum Using Large Language Models

Authors: Sarah J. Zhang, Samuel Florin, Ariel N. Lee, Eamon Niknafs, Andrei Marginean, Annie Wang, Keith Tyser, Zad Chin, Yann Hicke, Nikhil Singh, Madeleine Udell, Yoon Kim, Tonio Buonassisi, Armando Solar-Lezama, Iddo Drori

Abstract: We curate a comprehensive dataset of 4,550 questions and solutions from problem sets, midterm exams, and final exams across all MIT Mathematics and Electrical Engineering and Computer Science (EECS) courses required for obtaining a degree. We evaluate the ability of large language models to fulfill the graduation requirements for any MIT major in Mathematics and EECS. Our results demonstrate that… ▽ More We curate a comprehensive dataset of 4,550 questions and solutions from problem sets, midterm exams, and final exams across all MIT Mathematics and Electrical Engineering and Computer Science (EECS) courses required for obtaining a degree. We evaluate the ability of large language models to fulfill the graduation requirements for any MIT major in Mathematics and EECS. Our results demonstrate that GPT-3.5 successfully solves a third of the entire MIT curriculum, while GPT-4, with prompt engineering, achieves a perfect solve rate on a test set excluding questions based on images. We fine-tune an open-source large language model on this dataset. We employ GPT-4 to automatically grade model responses, providing a detailed performance breakdown by course, question, and answer type. By embedding questions in a low-dimensional space, we explore the relationships between questions, topics, and classes and discover which questions and classes are required for solving other questions and classes through few-shot learning. Our analysis offers valuable insights into course prerequisites and curriculum design, highlighting language models' potential for learning and improving Mathematics and EECS education. △ Less

Submitted 24 June, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: Did not receive permission to release the data or model fine-tuned on the data

arXiv:2302.01976 [pdf, other]

SPARLING: Learning Latent Representations with Extremely Sparse Activations

Authors: Kavi Gupta, Osbert Bastani, Armando Solar-Lezama

Abstract: Real-world processes often contain intermediate state that can be modeled as an extremely sparse tensor. We introduce Sparling, a technique that allows you to learn models with intermediate layers that match this state from only end-to-end labeled examples (i.e., no supervision on the intermediate state). Sparling uses a new kind of informational bottleneck that enforces levels of activation spars… ▽ More Real-world processes often contain intermediate state that can be modeled as an extremely sparse tensor. We introduce Sparling, a technique that allows you to learn models with intermediate layers that match this state from only end-to-end labeled examples (i.e., no supervision on the intermediate state). Sparling uses a new kind of informational bottleneck that enforces levels of activation sparsity unachievable using other techniques. We find that extreme sparsity is necessary to achieve good intermediate state modeling. On our synthetic DigitCircle domain as well as the LaTeX-OCR and Audio-MNIST-Sequence domains, we are able to precisely localize the intermediate states up to feature permutation with > 90% accuracy, even though we only train end-to-end. △ Less

Submitted 20 October, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

Comments: 10 pages, 6 figures

arXiv:2211.16605 [pdf, other]

doi 10.1145/3571234

Top-Down Synthesis for Library Learning

Authors: Matthew Bowers, Theo X. Olausson, Lionel Wong, Gabriel Grand, Joshua B. Tenenbaum, Kevin Ellis, Armando Solar-Lezama

Abstract: This paper introduces corpus-guided top-down synthesis as a mechanism for synthesizing library functions that capture common functionality from a corpus of programs in a domain specific language (DSL). The algorithm builds abstractions directly from initial DSL primitives, using syntactic pattern matching of intermediate abstractions to intelligently prune the search space and guide the algorithm… ▽ More This paper introduces corpus-guided top-down synthesis as a mechanism for synthesizing library functions that capture common functionality from a corpus of programs in a domain specific language (DSL). The algorithm builds abstractions directly from initial DSL primitives, using syntactic pattern matching of intermediate abstractions to intelligently prune the search space and guide the algorithm towards abstractions that maximally capture shared structures in the corpus. We present an implementation of the approach in a tool called Stitch and evaluate it against the state-of-the-art deductive library learning algorithm from DreamCoder. Our evaluation shows that Stitch is 3-4 orders of magnitude faster and uses 2 orders of magnitude less memory while maintaining comparable or better library quality (as measured by compressivity). We also demonstrate Stitch's scalability on corpora containing hundreds of complex programs that are intractable with prior deductive approaches and show empirically that it is robust to terminating the search procedure early -- further allowing it to scale to challenging datasets by means of early stop**. △ Less

Submitted 15 January, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

Comments: Published at POPL 2023

Journal ref: Proc. ACM Program. Lang. 7, POPL, Article 41 (January 2023), pp 1182-1213

arXiv:2211.12112 [pdf, other]

Human Evaluation of Text-to-Image Models on a Multi-Task Benchmark

Authors: Vitali Petsiuk, Alexander E. Siemenn, Saisamrit Surbehera, Zad Chin, Keith Tyser, Gregory Hunter, Arvind Raghavan, Yann Hicke, Bryan A. Plummer, Ori Kerret, Tonio Buonassisi, Kate Saenko, Armando Solar-Lezama, Iddo Drori

Abstract: We provide a new multi-task benchmark for evaluating text-to-image models. We perform a human evaluation comparing the most common open-source (Stable Diffusion) and commercial (DALL-E 2) models. Twenty computer science AI graduate students evaluated the two models, on three tasks, at three difficulty levels, across ten prompts each, providing 3,600 ratings. Text-to-image generation has seen rapid… ▽ More We provide a new multi-task benchmark for evaluating text-to-image models. We perform a human evaluation comparing the most common open-source (Stable Diffusion) and commercial (DALL-E 2) models. Twenty computer science AI graduate students evaluated the two models, on three tasks, at three difficulty levels, across ten prompts each, providing 3,600 ratings. Text-to-image generation has seen rapid progress to the point that many recent models have demonstrated their ability to create realistic high-resolution images for various prompts. However, current text-to-image methods and the broader body of research in vision-language understanding still struggle with intricate text prompts that contain many objects with multiple attributes and relationships. We introduce a new text-to-image benchmark that contains a suite of thirty-two tasks over multiple applications that capture a model's ability to handle different features of a text prompt. For example, asking a model to generate a varying number of the same object to measure its ability to count or providing a text prompt with several objects that each have a different attribute to identify its ability to match objects and attributes correctly. Rather than subjectively evaluating text-to-image results on a set of prompts, our new multi-task benchmark consists of challenge tasks at three difficulty levels (easy, medium, and hard) and human ratings for each generated image. △ Less

Submitted 22 November, 2022; originally announced November 2022.

Comments: NeurIPS 2022 Workshop on Human Evaluation of Generative Models (HEGM)

arXiv:2211.08671 [pdf, other]

LEMMA: Bootstrap** High-Level Mathematical Reasoning with Learned Symbolic Abstractions

Authors: Zhening Li, Gabriel Poesia, Omar Costilla-Reyes, Noah Goodman, Armando Solar-Lezama

Abstract: Humans tame the complexity of mathematical reasoning by develo** hierarchies of abstractions. With proper abstractions, solutions to hard problems can be expressed concisely, thus making them more likely to be found. In this paper, we propose Learning Mathematical Abstractions (LEMMA): an algorithm that implements this idea for reinforcement learning agents in mathematical domains. LEMMA augment… ▽ More Humans tame the complexity of mathematical reasoning by develo** hierarchies of abstractions. With proper abstractions, solutions to hard problems can be expressed concisely, thus making them more likely to be found. In this paper, we propose Learning Mathematical Abstractions (LEMMA): an algorithm that implements this idea for reinforcement learning agents in mathematical domains. LEMMA augments Expert Iteration with an abstraction step, where solutions found so far are revisited and rewritten in terms of new higher-level actions, which then become available to solve new problems. We evaluate LEMMA on two mathematical reasoning tasks--equation solving and fraction simplification--in a step-by-step fashion. In these two domains, LEMMA improves the ability of an existing agent, both solving more problems and generalizing more effectively to harder problems than those seen during training. △ Less

Submitted 15 November, 2022; originally announced November 2022.

Comments: 10 pages, 2 figures; to appear in 2nd MATH-AI Workshop at NeurIPS'22

arXiv:2210.11468 [pdf, other]

ObSynth: An Interactive Synthesis System for Generating Object Models from Natural Language Specifications

Authors: Alex Gu, Tamara Mitrovska, Daniela Velez, Jacob Andreas, Armando Solar-Lezama

Abstract: We introduce ObSynth, an interactive system leveraging the domain knowledge embedded in large language models (LLMs) to help users design object models from high level natural language prompts. This is an example of specification reification, the process of taking a high-level, potentially vague specification and reifying it into a more concrete form. We evaluate ObSynth via a user study, leading… ▽ More We introduce ObSynth, an interactive system leveraging the domain knowledge embedded in large language models (LLMs) to help users design object models from high level natural language prompts. This is an example of specification reification, the process of taking a high-level, potentially vague specification and reifying it into a more concrete form. We evaluate ObSynth via a user study, leading to three key findings: first, object models designed using ObSynth are more detailed, showing that it often synthesizes fields users might have otherwise omitted. Second, a majority of objects, methods, and fields generated by ObSynth are kept by the user in the final object model, highlighting the quality of generated components. Third, ObSynth altered the workflow of participants: they focus on checking that synthesized components were correct rather than generating them from scratch, though ObSynth did not reduce the time participants took to generate object models. △ Less

Submitted 20 October, 2022; originally announced October 2022.

Comments: 25 pages, 15 figures

arXiv:2210.05050 [pdf, other]

Neurosymbolic Programming for Science

Authors: Jennifer J. Sun, Megan Tjandrasuwita, Atharva Sehgal, Armando Solar-Lezama, Swarat Chaudhuri, Yisong Yue, Omar Costilla-Reyes

Abstract: Neurosymbolic Programming (NP) techniques have the potential to accelerate scientific discovery. These models combine neural and symbolic components to learn complex patterns and representations from data, using high-level concepts or known constraints. NP techniques can interface with symbolic domain knowledge from scientists, such as prior knowledge and experimental context, to produce interpret… ▽ More Neurosymbolic Programming (NP) techniques have the potential to accelerate scientific discovery. These models combine neural and symbolic components to learn complex patterns and representations from data, using high-level concepts or known constraints. NP techniques can interface with symbolic domain knowledge from scientists, such as prior knowledge and experimental context, to produce interpretable outputs. We identify opportunities and challenges between current NP models and scientific workflows, with real-world examples from behavior analysis in science: to enable the use of NP broadly for workflows across the natural and social sciences. △ Less

Submitted 7 November, 2022; v1 submitted 10 October, 2022; originally announced October 2022.

Comments: Neural Information Processing Systems 2022 - AI for science workshop

arXiv:2206.07828 [pdf, other]

Searching Entangled Program Spaces

Authors: James Koppel, Zheng Guo, Edsko de Vries, Armando Solar-Lezama, Nadia Polikarpova

Abstract: Many problem domains, including program synthesis and rewrite-based optimization, require searching astronomically large spaces of programs. Existing approaches often rely on building specialized data structures -- version-space algebras, finite tree automata, or e-graphs -- to compactly represent these programs. To find a compact representation, existing data structures exploit independence of su… ▽ More Many problem domains, including program synthesis and rewrite-based optimization, require searching astronomically large spaces of programs. Existing approaches often rely on building specialized data structures -- version-space algebras, finite tree automata, or e-graphs -- to compactly represent these programs. To find a compact representation, existing data structures exploit independence of subterms; they blow up when the choices of subterms are entangled. We introduce equality-constrained tree automata (ECTAs), a generalization of the three aforementioned data structures that can efficiently represent large spaces of programs with entangled subterms. We present efficient algorithms for extracting programs from ECTAs, implemented in a performant Haskell library, \texttt{ecta}. Using \texttt{ecta} we construct \textsc{Hectare}, a type-driven program synthesizer for Haskell. \textsc{Hectare} significantly outperforms a state-of-the-art synthesizer Hoogle+ -- providing an average speedup of 8x -- despite its implementation being an order of magnitude smaller. △ Less

Submitted 15 June, 2022; originally announced June 2022.

arXiv:2206.06164 [pdf, other]

Metric Program Synthesis

Authors: John Feser, Isil Dillig, Armando Solar-Lezama

Abstract: We present a new domain-agnostic synthesis technique for generating programs from input-output examples. Our method, called metric program synthesis, relaxes the well-known observational equivalence idea (used widely in bottom-up enumerative synthesis) into a weaker notion of observational similarity, with the goal of reducing the search space that the synthesizer needs to explore. Our method clus… ▽ More We present a new domain-agnostic synthesis technique for generating programs from input-output examples. Our method, called metric program synthesis, relaxes the well-known observational equivalence idea (used widely in bottom-up enumerative synthesis) into a weaker notion of observational similarity, with the goal of reducing the search space that the synthesizer needs to explore. Our method clusters programs into equivalence classes based on a distance metric and constructs a version space that compactly represents "approximately correct" programs. Then, given a "close enough" program sampled from this version space, our approach uses a distance-guided repair algorithm to find a program that exactly matches the given input-output examples. We have implemented our proposed metric program synthesis technique in a tool called SyMetric and evaluate it in three different domains considered in prior work. Our evaluation shows that SyMetric outperforms other domain-agnostic synthesizers that use observational equivalence and that it achieves results competitive with domain-specific synthesizers that are either designed for or trained on those domains. △ Less

Submitted 10 October, 2022; v1 submitted 13 June, 2022; originally announced June 2022.

arXiv:2111.12772 [pdf, other]

JoinABLe: Learning Bottom-up Assembly of Parametric CAD Joints

Authors: Karl D. D. Willis, Pradeep Kumar Jayaraman, Hang Chu, Yunsheng Tian, Yifei Li, Daniele Grandi, Aditya Sanghi, Linh Tran, Joseph G. Lambourne, Armando Solar-Lezama, Wojciech Matusik

Abstract: Physical products are often complex assemblies combining a multitude of 3D parts modeled in computer-aided design (CAD) software. CAD designers build up these assemblies by aligning individual parts to one another using constraints called joints. In this paper we introduce JoinABLe, a learning-based method that assembles parts together to form joints. JoinABLe uses the weak supervision available i… ▽ More Physical products are often complex assemblies combining a multitude of 3D parts modeled in computer-aided design (CAD) software. CAD designers build up these assemblies by aligning individual parts to one another using constraints called joints. In this paper we introduce JoinABLe, a learning-based method that assembles parts together to form joints. JoinABLe uses the weak supervision available in standard parametric CAD files without the help of object class labels or human guidance. Our results show that by making network predictions over a graph representation of solid models we can outperform multiple baseline methods with an accuracy (79.53%) that approaches human performance (80%). Finally, to support future research we release the Fusion 360 Gallery assembly dataset, containing assemblies with rich information on joints, contact surfaces, holes, and the underlying assembly graph structure. △ Less

Submitted 22 April, 2022; v1 submitted 24 November, 2021; originally announced November 2021.

Comments: CVPR 2022; code available at https://github.com/AutodeskAILab/JoinABLe and data available at https://github.com/AutodeskAILab/Fusion360GalleryDataset

arXiv:2110.05440 [pdf, other]

Safe Human-Interactive Control via Shielding

Authors: Jeevana Priya Inala, Yecheng Jason Ma, Osbert Bastani, Xin Zhang, Armando Solar-Lezama

Abstract: Ensuring safety for human-interactive robotics is important due to the potential for human injury. The key challenge is defining safety in a way that accounts for the complex range of human behaviors without modeling the human as an unconstrained adversary. We propose a novel approach to ensuring safety in these settings. Our approach focuses on defining backup actions that we believe human always… ▽ More Ensuring safety for human-interactive robotics is important due to the potential for human injury. The key challenge is defining safety in a way that accounts for the complex range of human behaviors without modeling the human as an unconstrained adversary. We propose a novel approach to ensuring safety in these settings. Our approach focuses on defining backup actions that we believe human always considers taking to avoid an accident -- e.g., brake to avoid rear-ending the other agent. Given such a definition, we consider a safety constraint that guarantees safety as long as the human takes the appropriate backup actions when necessary to ensure safety. Then, we propose an algorithm that overrides an arbitrary given controller as needed to ensure that the robot is safe. We evaluate our approach in a simulated environment, interacting with both real and simulated humans. △ Less

Submitted 11 October, 2021; originally announced October 2021.

arXiv:2102.11137 [pdf, other]

Program Synthesis Guided Reinforcement Learning for Partially Observed Environments

Authors: Yichen David Yang, Jeevana Priya Inala, Osbert Bastani, Yewen Pu, Armando Solar-Lezama, Martin Rinard

Abstract: A key challenge for reinforcement learning is solving long-horizon planning problems. Recent work has leveraged programs to guide reinforcement learning in these settings. However, these approaches impose a high manual burden on the user since they must provide a guiding program for every new task. Partially observed environments further complicate the programming task because the program must imp… ▽ More A key challenge for reinforcement learning is solving long-horizon planning problems. Recent work has leveraged programs to guide reinforcement learning in these settings. However, these approaches impose a high manual burden on the user since they must provide a guiding program for every new task. Partially observed environments further complicate the programming task because the program must implement a strategy that correctly, and ideally optimally, handles every possible configuration of the hidden regions of the environment. We propose a new approach, model predictive program synthesis (MPPS), that uses program synthesis to automatically generate the guiding programs. It trains a generative model to predict the unobserved portions of the world, and then synthesizes a program based on samples from this model in a way that is robust to its uncertainty. In our experiments, we show that our approach significantly outperforms non-program-guided approaches on a set of challenging benchmarks, including a 2D Minecraft-inspired environment where the agent must complete a complex sequence of subtasks to achieve its goal, and achieves a similar performance as using handcrafted programs to guide the agent. Our results demonstrate that our approach can obtain the benefits of program-guided reinforcement learning without requiring the user to provide a new guiding program for every new task. △ Less

Submitted 1 November, 2021; v1 submitted 22 February, 2021; originally announced February 2021.

Journal ref: NeurIPS 2021

arXiv:2101.03238 [pdf, other]

Neurosymbolic Transformers for Multi-Agent Communication

Authors: Jeevana Priya Inala, Yichen Yang, James Paulos, Yewen Pu, Osbert Bastani, Vijay Kumar, Martin Rinard, Armando Solar-Lezama

Abstract: We study the problem of inferring communication structures that can solve cooperative multi-agent planning problems while minimizing the amount of communication. We quantify the amount of communication as the maximum degree of the communication graph; this metric captures settings where agents have limited bandwidth. Minimizing communication is challenging due to the combinatorial nature of both t… ▽ More We study the problem of inferring communication structures that can solve cooperative multi-agent planning problems while minimizing the amount of communication. We quantify the amount of communication as the maximum degree of the communication graph; this metric captures settings where agents have limited bandwidth. Minimizing communication is challenging due to the combinatorial nature of both the decision space and the objective; for instance, we cannot solve this problem by training neural networks using gradient descent. We propose a novel algorithm that synthesizes a control policy that combines a programmatic communication policy used to generate the communication graph with a transformer policy network used to choose actions. Our algorithm first trains the transformer policy, which implicitly generates a "soft" communication graph; then, it synthesizes a programmatic communication policy that "hardens" this graph, forming a neurosymbolic transformer. Our experiments demonstrate how our approach can synthesize policies that generate low-degree communication graphs while maintaining near-optimal performance. △ Less

Submitted 4 January, 2021; originally announced January 2021.

Journal ref: NeurIPS 2020

arXiv:2012.12964 [pdf, other]

Representing Partial Programs with Blended Abstract Semantics

Authors: Maxwell Nye, Yewen Pu, Matthew Bowers, Jacob Andreas, Joshua B. Tenenbaum, Armando Solar-Lezama

Abstract: Synthesizing programs from examples requires searching over a vast, combinatorial space of possible programs. In this search process, a key challenge is representing the behavior of a partially written program before it can be executed, to judge if it is on the right track and predict where to search next. We introduce a general technique for representing partially written programs in a program sy… ▽ More Synthesizing programs from examples requires searching over a vast, combinatorial space of possible programs. In this search process, a key challenge is representing the behavior of a partially written program before it can be executed, to judge if it is on the right track and predict where to search next. We introduce a general technique for representing partially written programs in a program synthesis engine. We take inspiration from the technique of abstract interpretation, in which an approximate execution model is used to determine if an unfinished program will eventually satisfy a goal specification. Here we learn an approximate execution model implemented as a modular neural network. By constructing compositional program representations that implicitly encode the interpretation semantics of the underlying programming language, we can represent partial programs using a flexible combination of concrete execution state and learned neural representations, using the learned approximate semantics when concrete semantics are not known (in unfinished parts of the program). We show that these hybrid neuro-symbolic representations enable execution-guided synthesizers to use more powerful language constructs, such as loops and higher-order functions, and can be used to synthesize programs more accurately for a given search budget than pure neural approaches in several domains. △ Less

Submitted 19 April, 2021; v1 submitted 23 December, 2020; originally announced December 2020.

Comments: ICLR 2021

arXiv:2010.04918 [pdf, other]

doi 10.1145/3547648

Automatically Deriving Control-Flow Graph Generators from Operational Semantics

Authors: James Koppel, Jackson Kearl, Armando Solar-Lezama

Abstract: We develop the first theory of control-flow graphs from first principles, and use it to create an algorithm for automatically synthesizing many variants of control-flow graph generators from a language's operational semantics. Our approach first introduces a new algorithm for converting a large class of small-step operational semantics to an abstract machine. It next uses a technique called "abstr… ▽ More We develop the first theory of control-flow graphs from first principles, and use it to create an algorithm for automatically synthesizing many variants of control-flow graph generators from a language's operational semantics. Our approach first introduces a new algorithm for converting a large class of small-step operational semantics to an abstract machine. It next uses a technique called "abstract rewriting" to automatically abstract the semantics of a language, which is used both to directly generate a CFG from a program ("interpreted mode") and to generate standalone code, similar to a human-written CFG generator, for any program in a language. We show how the choice of two abstraction and projection parameters allow our approach to synthesize several families of CFG-generators useful for different kinds of tools. We prove the correspondence between the generated graphs and the original semantics. We provide and prove an algorithm for automatically proving the termination of interpreted-mode generators. In addition to our theoretical results, we have implemented this algorithm in a tool called Mandate, and show that it produces human-readable code on two medium-size languages with 60-80 rules, featuring nearly all intraprocedural control constructs common in modern languages. We then showed these CFG-generators were sufficient to build two static analyzers atop them. Our work is a promising step towards the grand vision of being able to synthesize all desired tools from the semantics of a programming language. △ Less

Submitted 22 July, 2022; v1 submitted 10 October, 2020; originally announced October 2020.

arXiv:2010.02392 [pdf, other]

Fusion 360 Gallery: A Dataset and Environment for Programmatic CAD Construction from Human Design Sequences

Authors: Karl D. D. Willis, Yewen Pu, Jieliang Luo, Hang Chu, Tao Du, Joseph G. Lambourne, Armando Solar-Lezama, Wojciech Matusik

Abstract: Parametric computer-aided design (CAD) is a standard paradigm used to design manufactured objects, where a 3D shape is represented as a program supported by the CAD software. Despite the pervasiveness of parametric CAD and a growing interest from the research community, currently there does not exist a dataset of realistic CAD models in a concise programmatic form. In this paper we present the Fus… ▽ More Parametric computer-aided design (CAD) is a standard paradigm used to design manufactured objects, where a 3D shape is represented as a program supported by the CAD software. Despite the pervasiveness of parametric CAD and a growing interest from the research community, currently there does not exist a dataset of realistic CAD models in a concise programmatic form. In this paper we present the Fusion 360 Gallery, consisting of a simple language with just the sketch and extrude modeling operations, and a dataset of 8,625 human design sequences expressed in this language. We also present an interactive environment called the Fusion 360 Gym, which exposes the sequential construction of a CAD program as a Markov decision process, making it amendable to machine learning approaches. As a use case for our dataset and environment, we define the CAD reconstruction task of recovering a CAD program from a target geometry. We report results of applying state-of-the-art methods of program synthesis with neurally guided search on this task. △ Less

Submitted 16 May, 2021; v1 submitted 5 October, 2020; originally announced October 2020.

Comments: Accepted to SIGGRAPH 2021; data/code available at https://github.com/AutodeskAILab/Fusion360GalleryDataset

arXiv:2007.05060 [pdf, other]

Program Synthesis with Pragmatic Communication

Authors: Yewen Pu, Kevin Ellis, Marta Kryven, Josh Tenenbaum, Armando Solar-Lezama

Abstract: Program synthesis techniques construct or infer programs from user-provided specifications, such as input-output examples. Yet most specifications, especially those given by end-users, leave the synthesis problem radically ill-posed, because many programs may simultaneously satisfy the specification. Prior work resolves this ambiguity by using various inductive biases, such as a preference for sim… ▽ More Program synthesis techniques construct or infer programs from user-provided specifications, such as input-output examples. Yet most specifications, especially those given by end-users, leave the synthesis problem radically ill-posed, because many programs may simultaneously satisfy the specification. Prior work resolves this ambiguity by using various inductive biases, such as a preference for simpler programs. This work introduces a new inductive bias derived by modeling the program synthesis task as rational communication, drawing insights from recursive reasoning models of pragmatics. Given a specification, we score a candidate program both on its consistency with the specification, and also whether a rational speaker would chose this particular specification to communicate that program. We develop efficient algorithms for such an approach when learning from input-output examples, and build a pragmatic program synthesizer over a simple grid-like layout domain. A user study finds that end-user participants communicate more effectively with the pragmatic program synthesizer over a non-pragmatic one. △ Less

Submitted 20 October, 2020; v1 submitted 9 July, 2020; originally announced July 2020.

Comments: The second author and the third author contributed equally to this work

ACM Class: I.2.2; D.3.0

arXiv:2007.03704 [pdf]

Computer-Aided Personalized Education

Authors: Rajeev Alur, Richard Baraniuk, Rastislav Bodik, Ann Drobnis, Sumit Gulwani, Bjoern Hartmann, Yasmin Kafai, Jeff Karpicke, Ran Libeskind-Hadas, Debra Richardson, Armando Solar-Lezama, Candace Thille, Moshe Vardi

Abstract: The shortage of people trained in STEM fields is becoming acute, and universities and colleges are straining to satisfy this demand. In the case of computer science, for instance, the number of US students taking introductory courses has grown three-fold in the past decade. Recently, massive open online courses (MOOCs) have been promoted as a way to ease this strain. This at best provides access t… ▽ More The shortage of people trained in STEM fields is becoming acute, and universities and colleges are straining to satisfy this demand. In the case of computer science, for instance, the number of US students taking introductory courses has grown three-fold in the past decade. Recently, massive open online courses (MOOCs) have been promoted as a way to ease this strain. This at best provides access to education. The bigger challenge though is co** with heterogeneous backgrounds of different students, retention, providing feedback, and assessment. Personalized education relying on computational tools can address this challenge. While automated tutoring has been studied at different times in different communities, recent advances in computing and education technology offer exciting opportunities to transform the manner in which students learn. In particular, at least three trends are significant. First, progress in logical reasoning, data analytics, and natural language processing has led to tutoring tools for automatic assessment, personalized instruction including targeted feedback, and adaptive content generation for a variety of subjects. Second, research in the science of learning and human-computer interaction is leading to a better understanding of how different students learn, when and what types of interventions are effective for different instructional goals, and how to measure the success of educational tools. Finally, the recent emergence of online education platforms, both in academia and industry, is leading to new opportunities for the development of a shared infrastructure. This CCC workshop brought together researchers develo** educational tools based on technologies such as logical reasoning and machine learning with researchers in education, human-computer interaction, and cognitive psychology. △ Less

Submitted 7 July, 2020; originally announced July 2020.

Comments: A Computing Community Consortium (CCC) workshop report, 12 pages

Report number: ccc2016report_6

arXiv:2007.01223 [pdf, other]

Verifiably Safe Exploration for End-to-End Reinforcement Learning

Authors: Nathan Hunt, Nathan Fulton, Sara Magliacane, Nghia Hoang, Subhro Das, Armando Solar-Lezama

Abstract: Deploying deep reinforcement learning in safety-critical settings requires develo** algorithms that obey hard constraints during exploration. This paper contributes a first approach toward enforcing formal safety constraints on end-to-end policies with visual inputs. Our approach draws on recent advances in object detection and automated reasoning for hybrid dynamical systems. The approach is ev… ▽ More Deploying deep reinforcement learning in safety-critical settings requires develo** algorithms that obey hard constraints during exploration. This paper contributes a first approach toward enforcing formal safety constraints on end-to-end policies with visual inputs. Our approach draws on recent advances in object detection and automated reasoning for hybrid dynamical systems. The approach is evaluated on a novel benchmark that emphasizes the challenge of safely exploring in the presence of hard constraints. Our benchmark draws from several proposed problem sets for safe learning and includes problems that emphasize challenges such as reward signals that are not aligned with safety constraints. On each of these benchmark problems, our algorithm completely avoids unsafe behavior while remaining competitive at optimizing for as much reward as is safe. We also prove that our method of enforcing the safety constraints preserves all safe policies from the original environment. △ Less

Submitted 2 July, 2020; originally announced July 2020.

ACM Class: F.3.1; I.2.8

arXiv:2006.08381 [pdf, other]

DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning

Authors: Kevin Ellis, Catherine Wong, Maxwell Nye, Mathias Sable-Meyer, Luc Cary, Lucas Morales, Luke Hewitt, Armando Solar-Lezama, Joshua B. Tenenbaum

Abstract: Expert problem-solving is driven by powerful languages for thinking about problems and their solutions. Acquiring expertise means learning these languages -- systems of concepts, alongside the skills to use them. We present DreamCoder, a system that learns to solve problems by writing programs. It builds expertise by creating programming languages for expressing domain concepts, together with neur… ▽ More Expert problem-solving is driven by powerful languages for thinking about problems and their solutions. Acquiring expertise means learning these languages -- systems of concepts, alongside the skills to use them. We present DreamCoder, a system that learns to solve problems by writing programs. It builds expertise by creating programming languages for expressing domain concepts, together with neural networks to guide the search for programs within these languages. A ``wake-sleep'' learning algorithm alternately extends the language with new symbolic abstractions and trains the neural network on imagined and replayed problems. DreamCoder solves both classic inductive programming tasks and creative tasks such as drawing pictures and building scenes. It rediscovers the basics of modern functional programming, vector algebra and classical physics, including Newton's and Coulomb's laws. Concepts are built compositionally from those learned earlier, yielding multi-layered symbolic representations that are interpretable and transferrable to new tasks, while still growing scalably and flexibly with experience. △ Less

Submitted 15 June, 2020; originally announced June 2020.

arXiv:2003.05562 [pdf, other]

Learning Compositional Rules via Neural Program Synthesis

Authors: Maxwell I. Nye, Armando Solar-Lezama, Joshua B. Tenenbaum, Brenden M. Lake

Abstract: Many aspects of human reasoning, including language, require learning rules from very little data. Humans can do this, often learning systematic rules from very few examples, and combining these rules to form compositional rule-based systems. Current neural architectures, on the other hand, often fail to generalize in a compositional manner, especially when evaluated in ways that vary systematical… ▽ More Many aspects of human reasoning, including language, require learning rules from very little data. Humans can do this, often learning systematic rules from very few examples, and combining these rules to form compositional rule-based systems. Current neural architectures, on the other hand, often fail to generalize in a compositional manner, especially when evaluated in ways that vary systematically from training. In this work, we present a neuro-symbolic model which learns entire rule systems from a small set of examples. Instead of directly predicting outputs from inputs, we train our model to induce the explicit system of rules governing a set of previously seen examples, drawing upon techniques from the neural program synthesis literature. Our rule-synthesis approach outperforms neural meta-learning techniques in three domains: an artificial instruction-learning domain used to evaluate human learning, the SCAN challenge datasets, and learning rule-based translations of number words into integers for a wide range of human languages. △ Less

Submitted 22 October, 2020; v1 submitted 11 March, 2020; originally announced March 2020.

Comments: NeurIPS 2020. Code can be found at https://github.com/mtensor/rulesynthesis

arXiv:1912.12659 [pdf, other]

Synthesizing Queries via Interactive Sketching

Authors: Osbert Bastani, Xin Zhang, Armando Solar-Lezama

Abstract: We propose a novel approach to program synthesis, focusing on synthesizing database queries. At a high level, our proposed algorithm takes as input a sketch with soft constraints encoding user intent, and then iteratively interacts with the user to refine the sketch. At each step, our algorithm proposes a candidate refinement of the sketch, which the user can either accept or reject. By leveraging… ▽ More We propose a novel approach to program synthesis, focusing on synthesizing database queries. At a high level, our proposed algorithm takes as input a sketch with soft constraints encoding user intent, and then iteratively interacts with the user to refine the sketch. At each step, our algorithm proposes a candidate refinement of the sketch, which the user can either accept or reject. By leveraging this rich form of user feedback, our algorithm is able to both resolve ambiguity in user intent and improve scalability. In particular, assuming the user provides accurate inputs and responses, then our algorithm is guaranteed to converge to the true program (i.e., one that the user approves) in polynomial time. We perform a qualitative evaluation of our algorithm, showing how it can be used to synthesize a variety of queries on a database of academic publications. △ Less

Submitted 11 October, 2021; v1 submitted 29 December, 2019; originally announced December 2019.

arXiv:1906.04604 [pdf, other]

Write, Execute, Assess: Program Synthesis with a REPL

Authors: Kevin Ellis, Maxwell Nye, Yewen Pu, Felix Sosa, Josh Tenenbaum, Armando Solar-Lezama

Abstract: We present a neural program synthesis approach integrating components which write, execute, and assess code to navigate the search space of possible programs. We equip the search process with an interpreter or a read-eval-print-loop (REPL), which immediately executes partially written programs, exposing their semantics. The REPL addresses a basic challenge of program synthesis: tiny changes in syn… ▽ More We present a neural program synthesis approach integrating components which write, execute, and assess code to navigate the search space of possible programs. We equip the search process with an interpreter or a read-eval-print-loop (REPL), which immediately executes partially written programs, exposing their semantics. The REPL addresses a basic challenge of program synthesis: tiny changes in syntax can lead to huge changes in semantics. We train a pair of models, a policy that proposes the new piece of code to write, and a value function that assesses the prospects of the code written so-far. At test time we can combine these models with a Sequential Monte Carlo algorithm. We apply our approach to two domains: synthesizing text editing programs and inferring 2D and 3D graphics programs. △ Less

Submitted 9 June, 2019; originally announced June 2019.

Comments: The first four authors contributed equally to this work

arXiv:1903.03229 [pdf, other]

Deductive Optimization of Relational Data Storage

Authors: John K. Feser, Samuel Madden, Nan Tang, Armando Solar-Lezama

Abstract: Optimizing the physical data storage and retrieval of data are two key database management problems. In this paper, we propose a language that can express a wide range of physical database layouts, going well beyond the row- and column-based methods that are widely used in database management systems. We use deductive synthesis to turn a high-level relational representation of a database query int… ▽ More Optimizing the physical data storage and retrieval of data are two key database management problems. In this paper, we propose a language that can express a wide range of physical database layouts, going well beyond the row- and column-based methods that are widely used in database management systems. We use deductive synthesis to turn a high-level relational representation of a database query into a highly optimized low-level implementation which operates on a specialized layout of the dataset. We build a compiler for this language and conduct experiments using a popular database benchmark, which shows that the performance of these specialized queries is competitive with a state-of-the-art in memory compiled database system. △ Less

Submitted 5 February, 2020; v1 submitted 7 March, 2019; originally announced March 2019.

arXiv:1902.06349 [pdf, other]

Learning to Infer Program Sketches

Authors: Maxwell Nye, Luke Hewitt, Joshua Tenenbaum, Armando Solar-Lezama

Abstract: Our goal is to build systems which write code automatically from the kinds of specifications humans can most easily provide, such as examples and natural language instruction. The key idea of this work is that a flexible combination of pattern recognition and explicit reasoning can be used to solve these complex programming problems. We propose a method for dynamically integrating these types of i… ▽ More Our goal is to build systems which write code automatically from the kinds of specifications humans can most easily provide, such as examples and natural language instruction. The key idea of this work is that a flexible combination of pattern recognition and explicit reasoning can be used to solve these complex programming problems. We propose a method for dynamically integrating these types of information. Our novel intermediate representation and training algorithm allow a program synthesis system to learn, without direct supervision, when to rely on pattern recognition and when to perform symbolic search. Our model matches the memorization and generalization performance of neural synthesis and symbolic search, respectively, and achieves state-of-the-art performance on a dataset of simple English description-to-code programming problems. △ Less

Submitted 4 June, 2019; v1 submitted 17 February, 2019; originally announced February 2019.

Comments: Accepted to ICML 2019

arXiv:1812.02573 [pdf, other]

Probabilistic Verification of Fairness Properties via Concentration

Authors: Osbert Bastani, Xin Zhang, Armando Solar-Lezama

Abstract: As machine learning systems are increasingly used to make real world legal and financial decisions, it is of paramount importance that we develop algorithms to verify that these systems do not discriminate against minorities. We design a scalable algorithm for verifying fairness specifications. Our algorithm obtains strong correctness guarantees based on adaptive concentration inequalities; such i… ▽ More As machine learning systems are increasingly used to make real world legal and financial decisions, it is of paramount importance that we develop algorithms to verify that these systems do not discriminate against minorities. We design a scalable algorithm for verifying fairness specifications. Our algorithm obtains strong correctness guarantees based on adaptive concentration inequalities; such inequalities enable our algorithm to adaptively take samples until it has enough data to make a decision. We implement our algorithm in a tool called VeriFair, and show that it scales to large machine learning models, including a deep recurrent neural network that is more than five orders of magnitude larger than the largest previously-verified neural network. While our technique only gives probabilistic guarantees due to the use of random samples, we show that we can choose the probability of error to be extremely small. △ Less

Submitted 30 December, 2019; v1 submitted 2 December, 2018; originally announced December 2018.

arXiv:1807.08137 [pdf, other]

Delta-Decision Procedures for Exists-Forall Problems over the Reals

Authors: Soonho Kong, Armando Solar-Lezama, Sicun Gao

Abstract: Solving nonlinear SMT problems over real numbers has wide applications in robotics and AI. While significant progress is made in solving quantifier-free SMT formulas in the domain, quantified formulas have been much less investigated. We propose the first delta-complete algorithm for solving satisfiability of nonlinear SMT over real numbers with universal quantification and a wide range of nonline… ▽ More Solving nonlinear SMT problems over real numbers has wide applications in robotics and AI. While significant progress is made in solving quantifier-free SMT formulas in the domain, quantified formulas have been much less investigated. We propose the first delta-complete algorithm for solving satisfiability of nonlinear SMT over real numbers with universal quantification and a wide range of nonlinear functions. Our methods combine ideas from counterexample-guided synthesis, interval constraint propagation, and local optimization. In particular, we show how special care is required in handling the interleaving of numerical and symbolic reasoning to ensure delta-completeness. In experiments, we show that the proposed algorithms can handle many new problems beyond the reach of existing SMT solvers. △ Less

Submitted 21 July, 2018; originally announced July 2018.

arXiv:1805.08328 [pdf, other]

Verifiable Reinforcement Learning via Policy Extraction

Authors: Osbert Bastani, Yewen Pu, Armando Solar-Lezama

Abstract: While deep reinforcement learning has successfully solved many challenging control tasks, its real-world applicability has been limited by the inability to ensure the safety of learned policies. We propose an approach to verifiable reinforcement learning by training decision tree policies, which can represent complex policies (since they are nonparametric), yet can be efficiently verified using ex… ▽ More While deep reinforcement learning has successfully solved many challenging control tasks, its real-world applicability has been limited by the inability to ensure the safety of learned policies. We propose an approach to verifiable reinforcement learning by training decision tree policies, which can represent complex policies (since they are nonparametric), yet can be efficiently verified using existing techniques (since they are highly structured). The challenge is that decision tree policies are difficult to train. We propose VIPER, an algorithm that combines ideas from model compression and imitation learning to learn decision tree policies guided by a DNN policy (called the oracle) and its Q-function, and show that it substantially outperforms two baselines. We use VIPER to (i) learn a provably robust decision tree policy for a variant of Atari Pong with a symbolic state space, (ii) learn a decision tree policy for a toy game based on Pong that provably never loses, and (iii) learn a provably stable decision tree policy for cart-pole. In each case, the decision tree policy achieves performance equal to that of the original DNN policy. △ Less

Submitted 23 January, 2019; v1 submitted 21 May, 2018; originally announced May 2018.

arXiv:1803.07244 [pdf, other]

The Three Pillars of Machine Programming

Authors: Justin Gottschlich, Armando Solar-Lezama, Nesime Tatbul, Michael Carbin, Martin Rinard, Regina Barzilay, Saman Amarasinghe, Joshua B Tenenbaum, Tim Mattson

Abstract: In this position paper, we describe our vision of the future of machine programming through a categorical examination of three pillars of research. Those pillars are: (i) intention, (ii) invention, and(iii) adaptation. Intention emphasizes advancements in the human-to-computer and computer-to-machine-learning interfaces. Invention emphasizes the creation or refinement of algorithms or core hardwar… ▽ More In this position paper, we describe our vision of the future of machine programming through a categorical examination of three pillars of research. Those pillars are: (i) intention, (ii) invention, and(iii) adaptation. Intention emphasizes advancements in the human-to-computer and computer-to-machine-learning interfaces. Invention emphasizes the creation or refinement of algorithms or core hardware and software building blocks through machine learning (ML). Adaptation emphasizes advances in the use of ML-based constructs to autonomously evolve software. △ Less

Submitted 26 June, 2021; v1 submitted 19 March, 2018; originally announced March 2018.

arXiv:1802.07384 [pdf, other]

Interpreting Neural Network Judgments via Minimal, Stable, and Symbolic Corrections

Authors: Xin Zhang, Armando Solar-Lezama, Rishabh Singh

Abstract: We present a new algorithm to generate minimal, stable, and symbolic corrections to an input that will cause a neural network with ReLU activations to change its output. We argue that such a correction is a useful way to provide feedback to a user when the network's output is different from a desired output. Our algorithm generates such a correction by solving a series of linear constraint satisfa… ▽ More We present a new algorithm to generate minimal, stable, and symbolic corrections to an input that will cause a neural network with ReLU activations to change its output. We argue that such a correction is a useful way to provide feedback to a user when the network's output is different from a desired output. Our algorithm generates such a correction by solving a series of linear constraint satisfaction problems. The technique is evaluated on three neural network models: one predicting whether an applicant will pay a mortgage, one predicting whether a first-order theorem can be proved efficiently by a solver using certain heuristics, and the final one judging whether a drawing is an accurate rendition of a canonical drawing of a cat. △ Less

Submitted 30 August, 2018; v1 submitted 20 February, 2018; originally announced February 2018.

Comments: 24 pages

MSC Class: 68T01

arXiv:1802.04408 [pdf, other]

REAS: Combining Numerical Optimization with SAT Solving

Authors: Jeevana Priya Inala, Sicun Gao, Soonho Kong, Armando Solar-Lezama

Abstract: In this paper, we present ReaS, a technique that combines numerical optimization with SAT solving to synthesize unknowns in a program that involves discrete and floating point computation. ReaS makes the program end-to-end differentiable by smoothing any Boolean expression that introduces discontinuity such as conditionals and relaxing the Boolean unknowns so that numerical optimization can be per… ▽ More In this paper, we present ReaS, a technique that combines numerical optimization with SAT solving to synthesize unknowns in a program that involves discrete and floating point computation. ReaS makes the program end-to-end differentiable by smoothing any Boolean expression that introduces discontinuity such as conditionals and relaxing the Boolean unknowns so that numerical optimization can be performed. On top of this, ReaS uses a SAT solver to help the numerical search overcome local solutions by incrementally fixing values to the Boolean expressions. We evaluated the approach on 5 case studies involving hybrid systems and show that ReaS can synthesize programs that could not be solved by previous SMT approaches. △ Less

Submitted 12 February, 2018; originally announced February 2018.

arXiv:1711.11438 [pdf, other]

doi 10.4204/EPTCS.260.9

SyGuS-Comp 2017: Results and Analysis

Authors: Rajeev Alur, Dana Fisman, Rishabh Singh, Armando Solar-Lezama

Abstract: Syntax-Guided Synthesis (SyGuS) is the computational problem of finding an implementation f that meets both a semantic constraint given by a logical formula phi in a background theory T, and a syntactic constraint given by a grammar G, which specifies the allowed set of candidate implementations. Such a synthesis problem can be formally defined in SyGuS-IF, a language that is built on top of SMT-L… ▽ More Syntax-Guided Synthesis (SyGuS) is the computational problem of finding an implementation f that meets both a semantic constraint given by a logical formula phi in a background theory T, and a syntactic constraint given by a grammar G, which specifies the allowed set of candidate implementations. Such a synthesis problem can be formally defined in SyGuS-IF, a language that is built on top of SMT-LIB. The Syntax-Guided Synthesis Competition (SyGuS-Comp) is an effort to facilitate, bring together and accelerate research and development of efficient solvers for SyGuS by providing a platform for evaluating different synthesis techniques on a comprehensive set of benchmarks. In this year's competition six new solvers competed on over 1500 benchmarks. This paper presents and analyses the results of SyGuS-Comp'17. △ Less

Submitted 28 November, 2017; originally announced November 2017.

Comments: In Proceedings SYNT 2017, arXiv:1711.10224. arXiv admin note: text overlap with arXiv:1611.07627, arXiv:1602.01170

Journal ref: EPTCS 260, 2017, pp. 97-115

arXiv:1711.03243 [pdf, other]

Selecting Representative Examples for Program Synthesis

Authors: Yewen Pu, Zachery Miranda, Armando Solar-Lezama, Leslie Pack Kaelbling

Abstract: Program synthesis is a class of regression problems where one seeks a solution, in the form of a source-code program, map** the inputs to their corresponding outputs exactly. Due to its precise and combinatorial nature, program synthesis is commonly formulated as a constraint satisfaction problem, where input-output examples are encoded as constraints and solved with a constraint solver. A key c… ▽ More Program synthesis is a class of regression problems where one seeks a solution, in the form of a source-code program, map** the inputs to their corresponding outputs exactly. Due to its precise and combinatorial nature, program synthesis is commonly formulated as a constraint satisfaction problem, where input-output examples are encoded as constraints and solved with a constraint solver. A key challenge of this formulation is scalability: while constraint solvers work well with a few well-chosen examples, a large set of examples can incur significant overhead in both time and memory. We describe a method to discover a subset of examples that is both small and representative: the subset is constructed iteratively, using a neural network to predict the probability of unchosen examples conditioned on the chosen examples in the subset, and greedily adding the least probable example. We empirically evaluate the representativeness of the subsets constructed by our method, and demonstrate such subsets can significantly improve synthesis time and stability. △ Less

Submitted 7 June, 2018; v1 submitted 8 November, 2017; originally announced November 2017.

ACM Class: I.2.2

arXiv:1710.10385 [pdf, other]

Capturing the Future by Replaying the Past

Authors: James Koppel, Gabriel Scherer, Armando Solar-Lezama

Abstract: Delimited continuations are the mother of all monads! So goes the slogan inspired by Filinski's 1994 paper, which showed that delimited continuations can implement any monadic effect, letting the programmer use an effect as easily as if it was built into the language. It's a shame that not many languages have delimited continuations. Luckily, exceptions and state are also the mother of all monad… ▽ More Delimited continuations are the mother of all monads! So goes the slogan inspired by Filinski's 1994 paper, which showed that delimited continuations can implement any monadic effect, letting the programmer use an effect as easily as if it was built into the language. It's a shame that not many languages have delimited continuations. Luckily, exceptions and state are also the mother of all monads! In this Pearl, we show how to implement delimited continuations in terms of exceptions and state, a construction we call $\textit{thermometer continuations}$. While traditional implementations of delimited continuations require some way of "capturing" an intermediate state of the computation, the insight of thermometer continuations is to reach this intermediate state by replaying the entire computation from the start, guiding it using a recording it so that the same thing happens until the captured point. Along the way, we explain delimited continuations and monadic reflection, show how the Filinski construction lets thermometer continuations express any monadic effect, share an elegant special-case for nondeterminism, and discuss why our construction is not prevented by theoretical results that exceptions and state cannot macro-express continuations. △ Less

Submitted 5 July, 2018; v1 submitted 28 October, 2017; originally announced October 2017.

arXiv:1707.09627 [pdf, other]

Learning to Infer Graphics Programs from Hand-Drawn Images

Authors: Kevin Ellis, Daniel Ritchie, Armando Solar-Lezama, Joshua B. Tenenbaum

Abstract: We introduce a model that learns to convert simple hand drawings into graphics programs written in a subset of \LaTeX. The model combines techniques from deep learning and program synthesis. We learn a convolutional neural network that proposes plausible drawing primitives that explain an image. These drawing primitives are like a trace of the set of primitive commands issued by a graphics program… ▽ More We introduce a model that learns to convert simple hand drawings into graphics programs written in a subset of \LaTeX. The model combines techniques from deep learning and program synthesis. We learn a convolutional neural network that proposes plausible drawing primitives that explain an image. These drawing primitives are like a trace of the set of primitive commands issued by a graphics program. We learn a model that uses program synthesis techniques to recover a graphics program from that trace. These programs have constructs like variable bindings, iterative loops, or simple kinds of conditionals. With a graphics program in hand, we can correct errors made by the deep network, measure similarity between drawings by use of similar high-level geometric structures, and extrapolate drawings. Taken together these results are a step towards agents that induce useful, human-readable programs from perceptual input. △ Less

Submitted 26 October, 2018; v1 submitted 30 July, 2017; originally announced July 2017.

MSC Class: 68T05

arXiv:1707.04600 [pdf, other]

doi 10.1145/3276492

One Tool, Many Languages: Language-Parametric Transformation with Incremental Parametric Syntax

Authors: James Koppel, Varot Premtoon, Armando Solar-Lezama

Abstract: We present a new approach for building source-to-source transformations that can run on multiple programming languages, based on a new way of representing programs called incremental parametric syntax. We implement this approach in Haskell in our Cubix system, and construct incremental parametric syntaxes for C, Java, JavaScript, Lua, and Python. We demonstrate a whole-program refactoring tool tha… ▽ More We present a new approach for building source-to-source transformations that can run on multiple programming languages, based on a new way of representing programs called incremental parametric syntax. We implement this approach in Haskell in our Cubix system, and construct incremental parametric syntaxes for C, Java, JavaScript, Lua, and Python. We demonstrate a whole-program refactoring tool that runs on all of them, along with three smaller transformations that each run on several. Our evaluation shows that (1) once a transformation is written, little work is required to configure it for a new language (2) transformations built this way output readable code which preserve the structure of the original, according to participants in our human study, and (3) our transformations can still handle language corner-cases, as validated on compiler test suites. △ Less

Submitted 1 October, 2018; v1 submitted 14 July, 2017; originally announced July 2017.

ACM Class: D.3.4; D.3.1

Journal ref: Proc. ACM Program. Lang., Vol. 2, No. OOPSLA, Article 122. Publication date: November 2018

arXiv:1705.07478 [pdf, other]

Report of the HPC Correctness Summit, Jan 25--26, 2017, Washington, DC

Authors: Ganesh Gopalakrishnan, Paul D. Hovland, Costin Iancu, Sriram Krishnamoorthy, Ignacio Laguna, Richard A. Lethin, Koushik Sen, Stephen F. Siegel, Armando Solar-Lezama

Abstract: Maintaining leadership in HPC requires the ability to support simulations at large scales and fidelity. In this study, we detail one of the most significant productivity challenges in achieving this goal, namely the increasing proclivity to bugs, especially in the face of growing hardware and software heterogeneity and sheer system scale. We identify key areas where timely new research must be pro… ▽ More Maintaining leadership in HPC requires the ability to support simulations at large scales and fidelity. In this study, we detail one of the most significant productivity challenges in achieving this goal, namely the increasing proclivity to bugs, especially in the face of growing hardware and software heterogeneity and sheer system scale. We identify key areas where timely new research must be proactively begun to address these challenges, and create new correctness tools that must ideally play a significant role even while ram** up toward exacale. We close with the proposal for a two-day workshop in which the problems identified in this report can be more broadly discussed, and specific plans to launch these new research thrusts identified. △ Less

Submitted 21 May, 2017; originally announced May 2017.

Comments: 57 pages

arXiv:1704.06131 [pdf, other]

Learning to Acquire Information

Authors: Yewen Pu, Leslie P Kaelbling, Armando Solar-Lezama

Abstract: We consider the problem of diagnosis where a set of simple observations are used to infer a potentially complex hidden hypothesis. Finding the optimal subset of observations is intractable in general, thus we focus on the problem of active diagnosis, where the agent selects the next most-informative observation based on the results of previous observations. We show that under the assumption of uni… ▽ More We consider the problem of diagnosis where a set of simple observations are used to infer a potentially complex hidden hypothesis. Finding the optimal subset of observations is intractable in general, thus we focus on the problem of active diagnosis, where the agent selects the next most-informative observation based on the results of previous observations. We show that under the assumption of uniform observation entropy, one can build an implication model which directly predicts the outcome of the potential next observation conditioned on the results of past observations, and selects the observation with the maximum entropy. This approach enjoys reduced computation complexity by bypassing the complicated hypothesis space, and can be trained on observation data alone, learning how to query without knowledge of the hidden hypothesis. △ Less

Submitted 11 July, 2017; v1 submitted 20 April, 2017; originally announced April 2017.

arXiv:1611.07627 [pdf, other]

doi 10.4204/EPTCS.229.13

SyGuS-Comp 2016: Results and Analysis

Authors: Rajeev Alur, Dana Fisman, Rishabh Singh, Armando Solar-Lezama

Abstract: Syntax-Guided Synthesis (SyGuS) is the computational problem of finding an implementation f that meets both a semantic constraint given by a logical formula $\varphi$ in a background theory T, and a syntactic constraint given by a grammar G, which specifies the allowed set of candidate implementations. Such a synthesis problem can be formally defined in SyGuS-IF, a language that is built on top of… ▽ More Syntax-Guided Synthesis (SyGuS) is the computational problem of finding an implementation f that meets both a semantic constraint given by a logical formula $\varphi$ in a background theory T, and a syntactic constraint given by a grammar G, which specifies the allowed set of candidate implementations. Such a synthesis problem can be formally defined in SyGuS-IF, a language that is built on top of SMT-LIB. The Syntax-Guided Synthesis Competition (SyGuS-Comp) is an effort to facilitate, bring together and accelerate research and development of efficient solvers for SyGuS by providing a platform for evaluating different synthesis techniques on a comprehensive set of benchmarks. In this year's competition we added a new track devoted to programming by examples. This track consisted of two categories, one using the theory of bit-vectors and one using the theory of strings. This paper presents and analyses the results of SyGuS-Comp'16. △ Less

Submitted 22 November, 2016; originally announced November 2016.

Comments: In Proceedings SYNT 2016, arXiv:1611.07178. arXiv admin note: text overlap with arXiv:1602.01170

ACM Class: I.2.2; D.1.2; D.2.2; D.2.4;

Journal ref: EPTCS 229, 2016, pp. 178-202

arXiv:1607.03445 [pdf, other]

Liquid Information Flow Control

Authors: Nadia Polikarpova, Deian Stefan, Jean Yang, Shachar Itzhaky, Travis Hance, Armando Solar-Lezama

Abstract: We present Lifty, a domain-specific language for data-centric applications that manipulate sensitive data. A Lifty programmer annotates the sources of sensitive data with declarative security policies, and the language statically and automatically verifies that the application handles the data according to the policies. Moreover, if verification fails, Lifty suggests a provably correct repair, the… ▽ More We present Lifty, a domain-specific language for data-centric applications that manipulate sensitive data. A Lifty programmer annotates the sources of sensitive data with declarative security policies, and the language statically and automatically verifies that the application handles the data according to the policies. Moreover, if verification fails, Lifty suggests a provably correct repair, thereby easing the programmer burden of implementing policy enforcing code throughout the application. The main insight behind Lifty is to encode information flow control using liquid types, an expressive yet decidable type system. Liquid types enable fully automatic checking of complex, data dependent policies, and power our repair mechanism via type-driven error localization and patch synthesis. Our experience using Lifty to implement three case studies from the literature shows that (1) the Lifty policy language is sufficiently expressive to specify many real-world policies, (2) the Lifty type checker is able to verify secure programs and find leaks in insecure programs quickly, and (3) even if the programmer leaves out all policy enforcing code, the Lifty repair engine is able to patch all leaks automatically within a reasonable time. △ Less

Submitted 30 June, 2020; v1 submitted 12 July, 2016; originally announced July 2016.

arXiv:1607.02902 [pdf, other]

sk_p: a neural program corrector for MOOCs

Authors: Yewen Pu, Karthik Narasimhan, Armando Solar-Lezama, Regina Barzilay

Abstract: We present a novel technique for automatic program correction in MOOCs, capable of fixing both syntactic and semantic errors without manual, problem specific correction strategies. Given an incorrect student program, it generates candidate programs from a distribution of likely corrections, and checks each candidate for correctness against a test suite. The key observation is that in MOOCs many… ▽ More We present a novel technique for automatic program correction in MOOCs, capable of fixing both syntactic and semantic errors without manual, problem specific correction strategies. Given an incorrect student program, it generates candidate programs from a distribution of likely corrections, and checks each candidate for correctness against a test suite. The key observation is that in MOOCs many programs share similar code fragments, and the seq2seq neural network model, used in the natural-language processing task of machine translation, can be modified and trained to recover these fragments. Experiment shows our scheme can correct 29% of all incorrect submissions and out-performs state of the art approach which requires manual, problem specific correction strategies. △ Less

Submitted 11 July, 2016; originally announced July 2016.

arXiv:1602.07285 [pdf, other]

Automatic Generation of Formula Simplifiers based on Conditional Rewrite Rules

Authors: Rohit Singh, Armando Solar-Lezama

Abstract: This paper addresses the problem of creating simplifiers for logic formulas based on conditional term rewriting. In particular, the paper focuses on a program synthesis application where formula simplifications have been shown to have a significant impact. We show that by combining machine learning techniques with constraint-based synthesis, it is possible to synthesize a formula simplifier fully… ▽ More This paper addresses the problem of creating simplifiers for logic formulas based on conditional term rewriting. In particular, the paper focuses on a program synthesis application where formula simplifications have been shown to have a significant impact. We show that by combining machine learning techniques with constraint-based synthesis, it is possible to synthesize a formula simplifier fully automatically from a corpus of representative problems, making it possible to create formula simplifiers tailored to specific problem domains. We demonstrate the benefits of our approach for synthesis benchmarks from the SyGuS competition and automated grading. △ Less

Submitted 23 February, 2016; originally announced February 2016.

Comments: Submitted for peer reviewed conference

Showing 1–50 of 60 results for author: Solar-Lezama, A