Search | arXiv e-print repository

doi 10.1145/3519939.3523712

Synthesizing Analytical SQL Queries from Computation Demonstration

Authors: Xiangyu Zhou, Rastislav Bodik, Alvin Cheung, Chenglong Wang

Abstract: Analytical SQL is widely used in modern database applications and data analysis. However, its partitioning and grou** operators are challenging for novice users. Unfortunately, programming by example, shown effective on standard SQL, are less attractive because examples for analytical queries are more laborious to solve by hand. To make demonstrations easier to create, we designed a new end-us… ▽ More Analytical SQL is widely used in modern database applications and data analysis. However, its partitioning and grou** operators are challenging for novice users. Unfortunately, programming by example, shown effective on standard SQL, are less attractive because examples for analytical queries are more laborious to solve by hand. To make demonstrations easier to create, we designed a new end-user specification, programming by computation demonstration, that allows the user to demonstrate the task using a (possibly incomplete) cell-level computation trace. This specification is exploited in a new abstraction-based synthesis algorithm to prove that a partially formed query cannot be completed to satisfy the specification, allowing us to prune the search space. We implemented our approach in a tool named Sickle and tested it on 80 real-world analytical SQL tasks. Results show that even from small demonstrations, Sickle can solve 76 tasks, in 12.8 seconds on average, while the prior approaches can solve only 60 tasks and are on average 22.5x slower. Our user study with 13 participants reveals that our specification increases user efficiency and confidence on challenging tasks. △ Less

Submitted 22 April, 2022; v1 submitted 14 April, 2022; originally announced April 2022.

arXiv:2110.01182 [pdf, other]

Differentiable 3D CAD Programs for Bidirectional Editing

Authors: Dan Cascaval, Mira Shalah, Phillip Quinn, Rastislav Bodik, Maneesh Agrawala, Adriana Schulz

Abstract: Modern CAD tools represent 3D designs not only as geometry, but also as a program composed of geometric operations, each of which depends on a set of parameters. Program representations enable meaningful and controlled shape variations via parameter changes. However, achieving desired modifications solely through parameter editing is challenging when CAD models have not been explicitly authored to… ▽ More Modern CAD tools represent 3D designs not only as geometry, but also as a program composed of geometric operations, each of which depends on a set of parameters. Program representations enable meaningful and controlled shape variations via parameter changes. However, achieving desired modifications solely through parameter editing is challenging when CAD models have not been explicitly authored to expose select degrees of freedom in advance. We introduce a novel bidirectional editing system for 3D CAD programs. In addition to editing the CAD program, users can directly manipulate 3D geometry and our system infers parameter updates to keep both representations in sync. We formulate inverse edits as a set of constrained optimization objectives, returning plausible updates to program parameters that both match user intent and maintain program validity. Our approach implements an automatically differentiable domain-specific language for CAD programs, providing derivatives for this optimization to be performed quickly on any expressed program. Our system enables rapid, interactive exploration of a constrained 3D design space by allowing users to manipulate the program and geometry interchangeably during design iteration. While our approach is not designed to optimize across changes in geometric topology, we show it is expressive and performant enough for users to produce a diverse set of design variants, even when the CAD program contains a large number of parameters. △ Less

Submitted 4 October, 2021; originally announced October 2021.

Comments: 13 pages, 8 figures

arXiv:2102.01024 [pdf, other]

doi 10.1145/3411764.3445249

Falx: Synthesis-Powered Visualization Authoring

Authors: Chenglong Wang, Yu Feng, Rastislav Bodik, Isil Dillig, Alvin Cheung, Amy J. Ko

Abstract: Modern visualization tools aim to allow data analysts to easily create exploratory visualizations. When the input data layout conforms to the visualization design, users can easily specify visualizations by map** data columns to visual channels of the design. However, when there is a mismatch between data layout and the design, users need to spend significant effort on data transformation. We… ▽ More Modern visualization tools aim to allow data analysts to easily create exploratory visualizations. When the input data layout conforms to the visualization design, users can easily specify visualizations by map** data columns to visual channels of the design. However, when there is a mismatch between data layout and the design, users need to spend significant effort on data transformation. We propose Falx, a synthesis-powered visualization tool that allows users to specify visualizations in a similarly simple way but without needing to worry about data layout. In Falx, users specify visualizations using examples of how concrete values in the input are mapped to visual channels, and Falx automatically infers the visualization specification and transforms the data to match the design. In a study with 33 data analysts on four visualization tasks involving data transformation, we found that users can effectively adopt Falx to create visualizations they otherwise cannot implement. △ Less

Submitted 1 February, 2021; originally announced February 2021.

Comments: CHI 2021

arXiv:2007.03704 [pdf]

Computer-Aided Personalized Education

Authors: Rajeev Alur, Richard Baraniuk, Rastislav Bodik, Ann Drobnis, Sumit Gulwani, Bjoern Hartmann, Yasmin Kafai, Jeff Karpicke, Ran Libeskind-Hadas, Debra Richardson, Armando Solar-Lezama, Candace Thille, Moshe Vardi

Abstract: The shortage of people trained in STEM fields is becoming acute, and universities and colleges are straining to satisfy this demand. In the case of computer science, for instance, the number of US students taking introductory courses has grown three-fold in the past decade. Recently, massive open online courses (MOOCs) have been promoted as a way to ease this strain. This at best provides access t… ▽ More The shortage of people trained in STEM fields is becoming acute, and universities and colleges are straining to satisfy this demand. In the case of computer science, for instance, the number of US students taking introductory courses has grown three-fold in the past decade. Recently, massive open online courses (MOOCs) have been promoted as a way to ease this strain. This at best provides access to education. The bigger challenge though is co** with heterogeneous backgrounds of different students, retention, providing feedback, and assessment. Personalized education relying on computational tools can address this challenge. While automated tutoring has been studied at different times in different communities, recent advances in computing and education technology offer exciting opportunities to transform the manner in which students learn. In particular, at least three trends are significant. First, progress in logical reasoning, data analytics, and natural language processing has led to tutoring tools for automatic assessment, personalized instruction including targeted feedback, and adaptive content generation for a variety of subjects. Second, research in the science of learning and human-computer interaction is leading to a better understanding of how different students learn, when and what types of interventions are effective for different instructional goals, and how to measure the success of educational tools. Finally, the recent emergence of online education platforms, both in academia and industry, is leading to new opportunities for the development of a shared infrastructure. This CCC workshop brought together researchers develo** educational tools based on technologies such as logical reasoning and machine learning with researchers in education, human-computer interaction, and cognitive psychology. △ Less

Submitted 7 July, 2020; originally announced July 2020.

Comments: A Computing Community Consortium (CCC) workshop report, 12 pages

Report number: ccc2016report_6

arXiv:2003.06324 [pdf, other]

Fireiron: A Scheduling Language for High-Performance Linear Algebra on GPUs

Authors: Bastian Hagedorn, Archibald Samuel Elliott, Henrik Barthels, Rastislav Bodik, Vinod Grover

Abstract: Achieving high-performance GPU kernels requires optimizing algorithm implementations to the targeted GPU architecture. It is of utmost importance to fully use the compute and memory hierarchy, as well as available specialised hardware. Currently, vendor libraries like cuBLAS and cuDNN provide the best performing implementations of GPU algorithms. However the task of the library programmer is incre… ▽ More Achieving high-performance GPU kernels requires optimizing algorithm implementations to the targeted GPU architecture. It is of utmost importance to fully use the compute and memory hierarchy, as well as available specialised hardware. Currently, vendor libraries like cuBLAS and cuDNN provide the best performing implementations of GPU algorithms. However the task of the library programmer is incredibly challenging: for each provided algorithm, high-performance implementations have to be developed for all commonly used architectures, input sizes, and different storage formats. These implementations are generally provided as optimized assembly code because performance-critical architectural features are only exposed at this level. This prevents reuse between different implementations of even the same algorithm, as simple differences can have major effects on low-level implementation details. In this paper we introduce Fireiron, a DSL and compiler which allows the specification of high-performance GPU implementations as compositions of simple and reusable building blocks. We show how to use Fireiron to optimize matrix multiplication implementations, achieving performance matching hand-coded CUDA kernels, even when using specialised hardware such as NIVIDA Tensor Cores, and outperforming state-of-the-art implementations provided by cuBLAS by more than 2x. △ Less

Submitted 13 March, 2020; originally announced March 2020.

arXiv:1911.09668 [pdf, other]

Visualization by Example

Authors: Chenglong Wang, Yu Feng, Rastislav Bodik, Alvin Cheung, Isil Dillig

Abstract: While visualizations play a crucial role in gaining insights from data, generating useful visualizations from a complex dataset is far from an easy task. Besides understanding the functionality provided by existing visualization libraries, generating the desired visualization also requires resha** and aggregating the underlying data as well as composing different visual elements to achieve the i… ▽ More While visualizations play a crucial role in gaining insights from data, generating useful visualizations from a complex dataset is far from an easy task. Besides understanding the functionality provided by existing visualization libraries, generating the desired visualization also requires resha** and aggregating the underlying data as well as composing different visual elements to achieve the intended visual narrative. This paper aims to simplify visualization tasks by automatically synthesizing the required program from simple visual sketches provided by the user. Specifically, given an input data set and a visual sketch that demonstrates how to visualize a very small subset of this data, our technique automatically generates a program that can be used to visualize the entire data set. Automating visualization poses several challenges. First, because many visualization tasks require data wrangling in addition to generating plots, we need to decompose the end-to-end synthesis task into two separate sub-problems. Second, because the intermediate specification that results from the decomposition is necessarily imprecise, this makes the data wrangling task particularly challenging in our context. In this paper, we address these problems by develo** a new compositional visualization-by-example technique that (a) decomposes the end-to-end task into two different synthesis problems over different DSLs and (b) leverages bi-directional program analysis to deal with the complexity that arises from having an imprecise intermediate specification. We implemented our visualization-by-example algorithm and evaluate it on 83 visualization tasks collected from on-line forums and tutorials. Viser can solve 84% of these benchmarks within a 600 second time limit, and, for those tasks that can be solved, the desired visualization is among the top-5 generated by Viser in 70% of the cases. △ Less

Submitted 21 November, 2019; originally announced November 2019.

arXiv:1909.11206 [pdf, other]

Using human-in-the-loop synthesis to author functional reactive programs

Authors: Julie L Newcomb, Rastislav Bodik

Abstract: Programs that respond to asynchronous events are challenging to write; they are difficult to reason about and tricky to test and debug. Because these programs can have a huge space of possible input timings and interleaving, the programmer may easily miss corner cases. We propose applying synthesis to aid programmers in creating programs more easily and with a higher degree of confidence in their… ▽ More Programs that respond to asynchronous events are challenging to write; they are difficult to reason about and tricky to test and debug. Because these programs can have a huge space of possible input timings and interleaving, the programmer may easily miss corner cases. We propose applying synthesis to aid programmers in creating programs more easily and with a higher degree of confidence in their correctness. We have written an efficient encoding of functional reactive programming (FRP) semantics based on functional programming over lists lifted in Rosette. We demonstrate that this technique is state-of-the-art by first comparing its performance against two existing synthesis tools that produce list manipulation programs, and then by synthesizing a suite of benchmarks given complete specifications. We also propose an interactive tool in which a programmer provides some initial partial specification in the form of input/output examples or invariants; the tool finds ambiguity in the specification by synthesizing two candidate programs and gives the user an input that distinguishes them; the user updates the specification and continues iterating until the correct program is found. As evaluation, we demonstrate the use of the tool on a suite of benchmarks from the web programming and Internet of Things domains and walk through a sample interaction on a realistic web benchmark, showing that we can converge on the target program with a tractable number of interactions. As future work, we discuss encoding additional FRP languages to in order to explore metalinguistic features, strategies for decomposition that would allow the synthesis of larger programs, and improved programmer tools such as a GUI to more easily elicit specifications. △ Less

Submitted 24 September, 2019; originally announced September 2019.

arXiv:1902.06067 [pdf, other]

Precise Attack Synthesis for Smart Contracts

Authors: Yu Feng, Emina Torlak, Rastislav Bodik

Abstract: Smart contracts are programs running on top of blockchain platforms. They interact with each other through well-defined interfaces to perform financial transactions in a distributed system with no trusted third parties. But these interfaces also provide a favorable setting for attackers, who can exploit security vulnerabilities in smart contracts to achieve financial gain. This paper presents Sm… ▽ More Smart contracts are programs running on top of blockchain platforms. They interact with each other through well-defined interfaces to perform financial transactions in a distributed system with no trusted third parties. But these interfaces also provide a favorable setting for attackers, who can exploit security vulnerabilities in smart contracts to achieve financial gain. This paper presents SmartScopy, a system for automatic synthesis of adversarial contracts that identify and exploit vulnerabilities in a victim smart contract. Our tool explores the space of \emph{attack programs} based on the Application Binary Interface (ABI) specification of a victim smart contract in the Ethereum ecosystem. To make the synthesis tractable, we introduce \emph{summary-based symbolic evaluation}, which significantly reduces the number of instructions that our synthesizer needs to evaluate symbolically, without compromising the precision of the vulnerability query. Building on the summary-based symbolic evaluation, SmartScopy further introduces a novel approach for partitioning the synthesis search space for parallel exploration, as well as a lightweight deduction technique that can prune infeasible candidates earlier. We encoded common vulnerabilities of smart contracts in our query language, and evaluated SmartScopy on the entire data set from etherscan with $>$25K smart contracts. Our experiments demonstrate the benefits of summary-based symbolic evaluation and show that SmartScopy outperforms two state-of-the-art smart contracts analyzers, Oyente and Contractfuzz, in terms of running time, precision, and soundness. Furthermore, running on recent popular smart contracts, SmartScopy uncovers 20 vulnerable smart contracts that contain the recent BatchOverflow vulnerability and cannot be precisely detected by existing tools. △ Less

Submitted 16 February, 2019; originally announced February 2019.

arXiv:1708.00551 [pdf, other]

Bonsai: Synthesis-Based Reasoning for Type Systems

Authors: Kartik Chandra, Rastislav Bodik

Abstract: We describe algorithms for symbolic reasoning about executable models of type systems, supporting three queries intended for designers of type systems. First, we check for type soundness bugs and synthesize a counterexample program if such a bug is found. Second, we compare two versions of a type system, synthesizing a program accepted by one but rejected by the other. Third, we minimize the size… ▽ More We describe algorithms for symbolic reasoning about executable models of type systems, supporting three queries intended for designers of type systems. First, we check for type soundness bugs and synthesize a counterexample program if such a bug is found. Second, we compare two versions of a type system, synthesizing a program accepted by one but rejected by the other. Third, we minimize the size of synthesized counterexample programs. These algorithms symbolically evaluate typecheckers and interpreters, producing formulas that characterize the set of programs that fail or succeed in the typechecker and the interpreter. However, symbolically evaluating interpreters poses efficiency challenges, which are caused by having to merge execution paths of the various possible input programs. Our main contribution is the Bonsai tree, a novel symbolic representation of programs and program states which addresses these challenges. Bonsai trees encode complex syntactic information in terms of logical constraints, enabling more efficient merging. We implement these algorithms in the Bonsai tool, an assistant for type system designers. We perform case studies on how Bonsai helps test and explore a variety of type systems. Bonsai efficiently synthesizes counterexamples for soundness bugs that have been inaccessible to automatic tools, and is the first automated tool to find a counterexample for the recently discovered Scala soundness bug SI-9633. △ Less

Submitted 1 August, 2017; originally announced August 2017.

arXiv:1611.07629 [pdf, other]

doi 10.4204/EPTCS.229.6

Approaching Symbolic Parallelization by Synthesis of Recurrence Decompositions

Authors: Grigory Fedyukovich, Rastislav Bodík

Abstract: We present GraSSP, a novel approach to perform automated parallelization relying on recent advances in formal verification and synthesis. GraSSP augments an existing sequential program with an additional functionality to decompose data dependencies in loop iterations, to compute partial results, and to compose them together. We show that for some classes of the sequential prefix sum problems, suc… ▽ More We present GraSSP, a novel approach to perform automated parallelization relying on recent advances in formal verification and synthesis. GraSSP augments an existing sequential program with an additional functionality to decompose data dependencies in loop iterations, to compute partial results, and to compose them together. We show that for some classes of the sequential prefix sum problems, such parallelization can be performed efficiently. △ Less

Submitted 22 November, 2016; originally announced November 2016.

Comments: In Proceedings SYNT 2016, arXiv:1611.07178

Journal ref: EPTCS 229, 2016, pp. 55-66

arXiv:1606.09242 [pdf, other]

Swift: Compiled Inference for Probabilistic Programming Languages

Authors: Yi Wu, Lei Li, Stuart Russell, Rastislav Bodik

Abstract: A probabilistic program defines a probability measure over its semantic structures. One common goal of probabilistic programming languages (PPLs) is to compute posterior probabilities for arbitrary models and queries, given observed evidence, using a generic inference engine. Most PPL inference engines---even the compiled ones---incur significant runtime interpretation overhead, especially for con… ▽ More A probabilistic program defines a probability measure over its semantic structures. One common goal of probabilistic programming languages (PPLs) is to compute posterior probabilities for arbitrary models and queries, given observed evidence, using a generic inference engine. Most PPL inference engines---even the compiled ones---incur significant runtime interpretation overhead, especially for contingent and open-universe models. This paper describes Swift, a compiler for the BLOG PPL. Swift-generated code incorporates optimizations that eliminate interpretation overhead, maintain dynamic dependencies efficiently, and handle memory management for possible worlds of varying sizes. Experiments comparing Swift with other PPL engines on a variety of inference problems demonstrate speedups ranging from 12x to 326x. △ Less

Submitted 30 June, 2016; originally announced June 2016.

Comments: IJCAI 2016

arXiv:1604.04729 [pdf, other]

SIMPL: A DSL for Automatic Specialization of Inference Algorithms

Authors: Rohin Shah, Emina Torlak, Rastislav Bodik

Abstract: Inference algorithms in probabilistic programming languages (PPLs) can be thought of as interpreters, since an inference algorithm traverses a model given evidence to answer a query. As with interpreters, we can improve the efficiency of inference algorithms by compiling them once the model, evidence and query are known. We present SIMPL, a domain specific language for inference algorithms, which… ▽ More Inference algorithms in probabilistic programming languages (PPLs) can be thought of as interpreters, since an inference algorithm traverses a model given evidence to answer a query. As with interpreters, we can improve the efficiency of inference algorithms by compiling them once the model, evidence and query are known. We present SIMPL, a domain specific language for inference algorithms, which uses this idea in order to automatically specialize annotated inference algorithms. Due to the approach of specialization, unlike a traditional compiler, with SIMPL new inference algorithms can be added easily, and still be optimized using domain-specific information. We evaluate SIMPL and show that partial evaluation gives a 2-6x speedup, caching provides an additional 1-1.5x speedup, and generating C code yields an additional 13-20x speedup, for an overall speedup of 30-150x for several inference algorithms and models. △ Less

Submitted 16 April, 2016; originally announced April 2016.

arXiv:1505.05093 [pdf, other]

doi 10.1080/10618600.2016.1172487

Programming with models: writing statistical algorithms for general model structures with NIMBLE

Authors: Perry de Valpine, Daniel Turek, Christopher J. Paciorek, Clifford Anderson-Bergman, Duncan Temple Lang, Rastislav Bodik

Abstract: We describe NIMBLE, a system for programming statistical algorithms for general model structures within R. NIMBLE is designed to meet three challenges: flexible model specification, a language for programming algorithms that can use different models, and a balance between high-level programmability and execution efficiency. For model specification, NIMBLE extends the BUGS language and creates mode… ▽ More We describe NIMBLE, a system for programming statistical algorithms for general model structures within R. NIMBLE is designed to meet three challenges: flexible model specification, a language for programming algorithms that can use different models, and a balance between high-level programmability and execution efficiency. For model specification, NIMBLE extends the BUGS language and creates model objects, which can manipulate variables, calculate log probability values, generate simulations, and query the relationships among variables. For algorithm programming, NIMBLE provides functions that operate with model objects using two stages of evaluation. The first stage allows specialization of a function to a particular model and/or nodes, such as creating a Metropolis-Hastings sampler for a particular block of nodes. The second stage allows repeated execution of computations using the results of the first stage. To achieve efficient second-stage computation, NIMBLE compiles models and functions via C++, using the Eigen library for linear algebra, and provides the user with an interface to compiled objects. The NIMBLE language represents a compilable domain-specific language (DSL) embedded within R. This paper provides an overview of the design and rationale for NIMBLE along with illustrative examples including importance sampling, Markov chain Monte Carlo (MCMC) and Monte Carlo expectation maximization (MCEM). △ Less

Submitted 12 April, 2016; v1 submitted 19 May, 2015; originally announced May 2015.

Comments: 20 pages, 2 figures

Journal ref: Journal of Computational and Graphical Statistics (2017) 26: 403-413

Showing 1–13 of 13 results for author: Bodik, R