Searching Entangled Program Spaces

Authors: James Koppel, Zheng Guo, Edsko de Vries, Armando Solar-Lezama, Nadia Polikarpova

Abstract: Many problem domains, including program synthesis and rewrite-based optimization, require searching astronomically large spaces of programs. Existing approaches often rely on building specialized data structures -- version-space algebras, finite tree automata, or e-graphs -- to compactly represent these programs. To find a compact representation, existing data structures exploit independence of subterms; they blow up when the choices of subterms are entangled. We introduce equality-constrained tree automata (ECTAs), a generalization of the three aforementioned data structures that can efficiently represent large spaces of programs with entangled subterms. We present efficient algorithms for extracting programs from ECTAs, implemented in a performant Haskell library, \texttt{ecta}. Using \texttt{ecta} we construct \textsc{Hectare}, a type-driven program synthesizer for Haskell. \textsc{Hectare} significantly outperforms a state-of-the-art synthesizer Hoogle+ -- providing an average speedup of 8x -- despite its implementation being an order of magnitude smaller.

Submitted 15 June, 2022; originally announced June 2022.

arXiv:2206.04615

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline.
Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2107.12568

Version Space Algebras are Acyclic Tree Automata

Authors: James Koppel

Abstract: Version space algebras are ways of representing spaces of programs which can be combined using union, intersection, and cross-product/"join" operators. In their reified form as ASTs with explicit union and join nodes, they have the ability to compactly represent exponentially-large spaces of programs, owing to which they have become the most popular approach to enumerative program synthesis since the introduction of FlashFill in 2010. We present a linear-time semantics-preserving constructive embedding from version space algebras into nondeterministic finite tree automata, showing that the former are but a special case of the latter. Combined with recent results finding a correspondence between e-graphs and minimal deterministic tree automata, this shows that tree automata are strict generalizations of all recent major approaches to efficiently representing large spaces of programs by sharing.

Submitted 26 July, 2021; originally announced July 2021.

arXiv:2104.00739

doi 10.31219/osf.io/t4qs8

Formal Methods for the Informal Engineer: Workshop Recommendations

Authors: Gopal Sarma, James Koppel, Gregory Malecha, Patrick Schultz, Eric Drexler, Ramana Kumar, Cody Roux, Philip Zucker

Abstract: Formal Methods for the Informal Engineer (FMIE) was a workshop held at the Broad Institute of MIT and Harvard in 2021 to explore the potential role of verified software in the biomedical software ecosystem. The motivation for organizing FMIE was the recognition that the life sciences and medicine are undergoing a transition from being passive consumers of software and AI/ML technologies to fundamental drivers of new platforms, including those which will need to be mission and safety-critical. Drawing on conversations leading up to and during the workshop, we make five concrete recommendations to help software leaders organically incorporate tools, techniques, and perspectives from formal methods into their project planning and development trajectories.

Submitted 1 April, 2021; originally announced April 2021.

Comments: 6 pages

arXiv:2010.04918

doi 10.1145/3547648

Automatically Deriving Control-Flow Graph Generators from Operational Semantics

Authors: James Koppel, Jackson Kearl, Armando Solar-Lezama

Abstract: We develop the first theory of control-flow graphs from first principles, and use it to create an algorithm for automatically synthesizing many variants of control-flow graph generators from a language's operational semantics. Our approach first introduces a new algorithm for converting a large class of small-step operational semantics to an abstract machine. It next uses a technique called "abstract rewriting" to automatically abstract the semantics of a language, which is used both to directly generate a CFG from a program ("interpreted mode") and to generate standalone code, similar to a human-written CFG generator, for any program in a language. We show how the choice of two abstraction and projection parameters allows our approach to synthesize several families of CFG-generators useful for different kinds of tools. We prove the correspondence between the generated graphs and the original semantics. We provide and prove an algorithm for automatically proving the termination of interpreted-mode generators. In addition to our theoretical results, we have implemented this algorithm in a tool called Mandate, and show that it produces human-readable code on two medium-size languages with 60-80 rules, featuring nearly all intraprocedural control constructs common in modern languages. We then showed these CFG-generators were sufficient to build two static analyzers atop them. Our work is a promising step towards the grand vision of being able to synthesize all desired tools from the semantics of a programming language.

Submitted 22 July, 2022; v1 submitted 10 October, 2020; originally announced October 2020.

arXiv:1710.10385

Capturing the Future by Replaying the Past

Authors: James Koppel, Gabriel Scherer, Armando Solar-Lezama

Abstract: Delimited continuations are the mother of all monads! So goes the slogan inspired by Filinski's 1994 paper, which showed that delimited continuations can implement any monadic effect, letting the programmer use an effect as easily as if it was built into the language. It's a shame that not many languages have delimited continuations. Luckily, exceptions and state are also the mother of all monads! In this Pearl, we show how to implement delimited continuations in terms of exceptions and state, a construction we call $\textit{thermometer continuations}$. While traditional implementations of delimited continuations require some way of "capturing" an intermediate state of the computation, the insight of thermometer continuations is to reach this intermediate state by replaying the entire computation from the start, guiding it using a recording so that the same thing happens until the captured point. Along the way, we explain delimited continuations and monadic reflection, show how the Filinski construction lets thermometer continuations express any monadic effect, share an elegant special-case for nondeterminism, and discuss why our construction is not prevented by theoretical results that exceptions and state cannot macro-express continuations.

Submitted 5 July, 2018; v1 submitted 28 October, 2017; originally announced October 2017.

arXiv:1707.04600

doi 10.1145/3276492

One Tool, Many Languages: Language-Parametric Transformation with Incremental Parametric Syntax

Authors: James Koppel, Varot Premtoon, Armando Solar-Lezama

Abstract: We present a new approach for building source-to-source transformations that can run on multiple programming languages, based on a new way of representing programs called incremental parametric syntax. We implement this approach in Haskell in our Cubix system, and construct incremental parametric syntaxes for C, Java, JavaScript, Lua, and Python. We demonstrate a whole-program refactoring tool that runs on all of them, along with three smaller transformations that each run on several. Our evaluation shows that (1) once a transformation is written, little work is required to configure it for a new language, (2) transformations built this way output readable code which preserves the structure of the original, according to participants in our human study, and (3) our transformations can still handle language corner-cases, as validated on compiler test suites.

Submitted 1 October, 2018; v1 submitted 14 July, 2017; originally announced July 2017.

ACM Class: D.3.4; D.3.1

Journal ref: Proc. ACM Program. Lang., Vol. 2, No. OOPSLA, Article 122. Publication date: November 2018

Showing 1–7 of 7 results for author: Koppel, J