Search | arXiv e-print repository

doi 10.1145/3591285

Absynthe: Abstract Interpretation-Guided Synthesis

Authors: Sankha Narayan Guria, Jeffrey S. Foster, David Van Horn

Abstract: Synthesis tools have seen significant success in recent times. However, past approaches often require a complete and accurate embedding of the source language in the logic of the underlying solver, an approach difficult for industrial-grade languages. Other approaches couple the semantics of the source language with purpose-built synthesizers, necessarily tying the synthesis engine to a particular… ▽ More Synthesis tools have seen significant success in recent times. However, past approaches often require a complete and accurate embedding of the source language in the logic of the underlying solver, an approach difficult for industrial-grade languages. Other approaches couple the semantics of the source language with purpose-built synthesizers, necessarily tying the synthesis engine to a particular language model. In this paper, we propose Absynthe, an alternative approach based on user-defined abstract semantics that aims to be both lightweight and language agnostic, yet effective in guiding the search for programs. A synthesis goal in Absynthe is specified as an abstract specification in a lightweight user-defined abstract domain and concrete test cases. The synthesis engine is parameterized by the abstract semantics and independent of the source language. Absynthe validates candidate programs against test cases using the actual concrete language implementation to ensure correctness. We formalize the synthesis rules for Absynthe and describe how the key ideas are scaled-up in our implementation in Ruby. We evaluated Absynthe on SyGuS strings benchmark and found it competitive with other enumerative search solvers. Moreover, Absynthe's ability to combine abstract domains allows the user to move along a cost spectrum, i.e., expressive domains prune more programs but require more time. Finally, to verify Absynthe can act as a general purpose synthesis tool, we use Absynthe to synthesize Pandas data frame manipulating programs in Python using simple abstractions like types and column labels of a data frame. Absynthe reaches parity with AutoPandas, a deep learning based tool for the same benchmark suite. In summary, our results demonstrate Absynthe is a promising step forward towards a general-purpose approach to synthesis that may broaden the applicability of synthesis to more $\ldots$ △ Less

Submitted 24 April, 2023; v1 submitted 25 February, 2023; originally announced February 2023.

Comments: 22 pages, 6 figures, this is a preprint of a paper conditionally accepted at Programming Language Design and Implementation (PLDI) 2023

arXiv:2102.13183 [pdf, other]

RbSyn: Type- and Effect-Guided Program Synthesis

Authors: Sankha Narayan Guria, Jeffrey S. Foster, David Van Horn

Abstract: In recent years, researchers have explored component-based synthesis, which aims to automatically construct programs that operate by composing calls to existing APIs. However, prior work has not considered efficient synthesis of methods with side effects, e.g., web app methods that update a database. In this paper, we introduce RbSyn, a novel type- and effect-guided synthesis tool for Ruby. An RbS… ▽ More In recent years, researchers have explored component-based synthesis, which aims to automatically construct programs that operate by composing calls to existing APIs. However, prior work has not considered efficient synthesis of methods with side effects, e.g., web app methods that update a database. In this paper, we introduce RbSyn, a novel type- and effect-guided synthesis tool for Ruby. An RbSyn synthesis goal is specified as the type for the target method and a series of test cases it must pass. RbSyn works by recursively generating well-typed candidate method bodies whose write effects match the read effects of the test case assertions. After finding a set of candidates that separately satisfy each test, RbSyn synthesizes a solution that branches to execute the correct candidate code under the appropriate conditions. We formalize RbSyn on a core, object-oriented language $λ_{syn}$ and describe how the key ideas of the model are scaled-up in our implementation for Ruby. We evaluated RbSyn on 19 benchmarks, 12 of which come from popular, open-source Ruby apps. We found that RbSyn synthesizes correct solutions for all benchmarks, with 15 benchmarks synthesizing in under 9 seconds, while the slowest benchmark takes 83 seconds. Using observed reads to guide synthesize is effective: using type-guidance alone times out on 10 of 12 app benchmarks. We also found that using less precise effect annotations leads to worse synthesis performance. In summary, we believe type- and effect-guided synthesis is an important step forward in synthesis of effectful methods from test cases. △ Less

Submitted 7 April, 2021; v1 submitted 25 February, 2021; originally announced February 2021.

Comments: 17 pages, 13 figures, this is a technical report (with appendix) of a paper to appear in Programming Language Design and Implementation (PLDI) 2021

arXiv:1912.00317 [pdf, other]

An Observational Investigation of Reverse Engineers' Processes

Authors: Daniel Votipka, Seth M. Rabin, Kristopher Micinski, Jeffrey S. Foster, Michelle L. Mazurek

Abstract: Reverse engineering is a complex process essential to software-security tasks such as vulnerability discovery and malware analysis. Significant research and engineering effort has gone into develo** tools to support reverse engineers. However, little work has been done to understand the way reverse engineers think when analyzing programs, leaving tool developers to make interface design decision… ▽ More Reverse engineering is a complex process essential to software-security tasks such as vulnerability discovery and malware analysis. Significant research and engineering effort has gone into develo** tools to support reverse engineers. However, little work has been done to understand the way reverse engineers think when analyzing programs, leaving tool developers to make interface design decisions based only on intuition. This paper takes a first step toward a better understanding of reverse engineers' processes, with the goal of producing insights for improving interaction design for reverse engineering tools. We present the results of a semi-structured, observational interview study of reverse engineers (N=16). Each observation investigated the questions reverse engineers ask as they probe a program, how they answer these questions, and the decisions they make throughout the reverse engineering process. From the interview responses, we distill a model of the reverse engineering process, divided into three phases: overview, sub-component scanning, and focused experimentation. Each analysis phase's results feed the next as reverse engineers' mental representations become more concrete. We find that reverse engineers typically use static methods in the first two phases, but dynamic methods in the final phase, with experience playing large, but varying, roles in each phase. % and the role of experience varies between phases. Based on these results, we provide five interaction design guidelines for reverse engineering tools. △ Less

Submitted 30 November, 2019; originally announced December 2019.

Comments: 22 pages, 6 figures, to appear at the 2020 USENIX Security Symposium

arXiv:1906.09621 [pdf, other]

Making the Cut: A Bandit-based Approach to Tiered Interviewing

Authors: Candice Schumann, Zhi Lang, Jeffrey S. Foster, John P. Dickerson

Abstract: Given a huge set of applicants, how should a firm allocate sequential resume screenings, phone interviews, and in-person site visits? In a tiered interview process, later stages (e.g., in-person visits) are more informative, but also more expensive than earlier stages (e.g., resume screenings). Using accepted hiring models and the concept of structured interviews, a best practice in human resource… ▽ More Given a huge set of applicants, how should a firm allocate sequential resume screenings, phone interviews, and in-person site visits? In a tiered interview process, later stages (e.g., in-person visits) are more informative, but also more expensive than earlier stages (e.g., resume screenings). Using accepted hiring models and the concept of structured interviews, a best practice in human resources, we cast tiered hiring as a combinatorial pure exploration (CPE) problem in the stochastic multi-armed bandit setting. The goal is to select a subset of arms (in our case, applicants) with some combinatorial structure. We present new algorithms in both the probably approximately correct (PAC) and fixed-budget settings that select a near-optimal cohort with provable guarantees. We show via simulations on real data from one of the largest US-based computer science graduate programs that our algorithms make better hiring decisions or use less budget than the status quo. △ Less

Submitted 14 November, 2019; v1 submitted 23 June, 2019; originally announced June 2019.

arXiv:1904.03521 [pdf, other]

Type-Level Computations for Ruby Libraries

Authors: Milod Kazerounian, Sankha Narayan Guria, Niki Vazou, Jeffrey S. Foster, David Van Horn

Abstract: Many researchers have explored ways to bring static ty** to dynamic languages. However, to date, such systems are not precise enough when types depend on values, which often arises when using certain Ruby libraries. For example, the type safety of a database query in Ruby on Rails depends on the table and column names used in the query. To address this issue, we introduce CompRDL, a type system… ▽ More Many researchers have explored ways to bring static ty** to dynamic languages. However, to date, such systems are not precise enough when types depend on values, which often arises when using certain Ruby libraries. For example, the type safety of a database query in Ruby on Rails depends on the table and column names used in the query. To address this issue, we introduce CompRDL, a type system for Ruby that allows library method type signatures to include type-level computations (or comp types for short). Combined with singleton types for table and column names, comp types let us give database query methods type signatures that compute a table's schema to yield very precise type information. Comp types for hash, array, and string libraries can also increase precision and thereby reduce the need for type casts. We formalize CompRDL and prove its type system sound. Rather than type check the bodies of library methods with comp types---those methods may include native code or be complex---CompRDL inserts run-time checks to ensure library methods abide by their computed types. We evaluated CompRDL by writing annotations with type-level computations for several Ruby core libraries and database query APIs. We then used those annotations to type check two popular Ruby libraries and four Ruby on Rails web apps. We found the annotations were relatively compact and could successfully type check 132 methods across our subject programs. Moreover, the use of type-level computations allowed us to check more expressive properties, with fewer manually inserted casts, than was possible without type-level computations. In the process, we found two type errors and a documentation error that were confirmed by the developers. Thus, we believe CompRDL is an important step forward in bringing precise static type checking to dynamic languages. △ Less

Submitted 6 April, 2019; originally announced April 2019.

Comments: 22 pages, this is a technical report (with appendix) of a paper to appear in Programming Language Design and Implementation (PLDI 2019)

arXiv:1903.12247 [pdf, ps, other]

iGen: Dynamic Interaction Inference for Configurable Software

Authors: ThanhVu Nguyen, Ugur Koc, Javran Cheng, Jeffrey S. Foster, Adam A. Porter

Abstract: To develop, analyze, and evolve today's highly configurable software systems, developers need deep knowledge of a system's configuration options, e.g., how options need to be set to reach certain locations, what configurations to use for testing, etc. Today, acquiring this detailed information requires manual effort that is difficult, expensive, and error prone. In this paper, we propose iGen, a n… ▽ More To develop, analyze, and evolve today's highly configurable software systems, developers need deep knowledge of a system's configuration options, e.g., how options need to be set to reach certain locations, what configurations to use for testing, etc. Today, acquiring this detailed information requires manual effort that is difficult, expensive, and error prone. In this paper, we propose iGen, a novel, lightweight dynamic analysis technique that automatically discovers a program's \emph{interactions}---expressive logical formulae that give developers rich and detailed information about how a system's configuration option settings map to particular code coverage. iGen employs an iterative algorithm that runs a system under a small set of configurations, capturing coverage data; processes the coverage data to infer potential interactions; and then generates new configurations to further refine interactions in the next iteration. We evaluated iGen on 29 programs spanning five languages; the breadth of this study would be unachievable using prior interaction inference tools. Our results show that iGen finds precise interactions based on a very small fraction of the number of possible configurations. Moreover, iGen's results confirm several earlier hypotheses about typical interaction distributions and structures. △ Less

Submitted 28 March, 2019; originally announced March 2019.

Journal ref: 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE), pages 655--665. ACM, 2016

arXiv:1802.08927 [pdf, other]

doi 10.1007/978-3-319-89884-1_23

Evaluating Design Tradeoffs in Numeric Static Analysis for Java

Authors: Shiyi Wei, Piotr Mardziel, Andrew Ruef, Jeffrey S. Foster, Michael Hicks

Abstract: Numeric static analysis for Java has a broad range of potentially useful applications, including array bounds checking and resource usage estimation. However, designing a scalable numeric static analysis for real-world Java programs presents a multitude of design choices, each of which may interact with others. For example, an analysis could handle method calls via either a top-down or bottom-up i… ▽ More Numeric static analysis for Java has a broad range of potentially useful applications, including array bounds checking and resource usage estimation. However, designing a scalable numeric static analysis for real-world Java programs presents a multitude of design choices, each of which may interact with others. For example, an analysis could handle method calls via either a top-down or bottom-up interprocedural analysis. Moreover, this choice could interact with how we choose to represent aliasing in the heap and/or whether we use a relational numeric domain, e.g., convex polyhedra. In this paper, we present a family of abstract interpretation-based numeric static analyses for Java and systematically evaluate the impact of 162 analysis configurations on the DaCapo benchmark suite. Our experiment considered the precision and performance of the analyses for discharging array bounds checks. We found that top-down analysis is generally a better choice than bottom-up analysis, and that using access paths to describe heap objects is better than using summary objects corresponding to points-to analysis locations. Moreover, these two choices are the most significant, while choices about the numeric domain, representation of abstract objects, and context-sensitivity make much less difference to the precision/performance tradeoff. △ Less

Submitted 24 February, 2018; originally announced February 2018.

arXiv:1711.09281 [pdf, ps, other]

Refinement Types for Ruby

Authors: Milod Kazerounian, Niki Vazou, Austin Bourgerie, Jeffrey S. Foster, Emina Torlak

Abstract: Refinement types are a popular way to specify and reason about key program properties. In this paper, we introduce RTR, a new system that adds refinement types to Ruby. RTR is built on top of RDL, a Ruby type checker that provides basic type information for the verification process. RTR works by encoding its verification problems into Rosette, a solver-aided host language. RTR handles mixins throu… ▽ More Refinement types are a popular way to specify and reason about key program properties. In this paper, we introduce RTR, a new system that adds refinement types to Ruby. RTR is built on top of RDL, a Ruby type checker that provides basic type information for the verification process. RTR works by encoding its verification problems into Rosette, a solver-aided host language. RTR handles mixins through assume-guarantee reasoning and uses just-in-time verification for metaprogramming. We formalize RTR by showing a translation from a core, Ruby-like language with refinement types into Rosette. We apply RTR to check a range of functional correctness properties on six Ruby programs. We find that RTR can successfully verify key methods in these programs, taking only a few minutes to perform verification. △ Less

Submitted 25 November, 2017; originally announced November 2017.

arXiv:1709.03441 [pdf, ps, other]

The Diverse Cohort Selection Problem

Authors: Candice Schumann, Samsara N. Counts, Jeffrey S. Foster, John P. Dickerson

Abstract: How should a firm allocate its limited interviewing resources to select the optimal cohort of new employees from a large set of job applicants? How should that firm allocate cheap but noisy resume screenings and expensive but in-depth in-person interviews? We view this problem through the lens of combinatorial pure exploration (CPE) in the multi-armed bandit setting, where a central learning agent… ▽ More How should a firm allocate its limited interviewing resources to select the optimal cohort of new employees from a large set of job applicants? How should that firm allocate cheap but noisy resume screenings and expensive but in-depth in-person interviews? We view this problem through the lens of combinatorial pure exploration (CPE) in the multi-armed bandit setting, where a central learning agent performs costly exploration of a set of arms before selecting a final subset with some combinatorial structure. We generalize a recent CPE algorithm to the setting where arm pulls can have different costs and return different levels of information. We then prove theoretical upper bounds for a general class of arm-pulling strategies in this new setting. We apply our general algorithm to a real-world problem with combinatorial structure: incorporating diversity into university admissions. We take real data from admissions at one of the largest US-based computer science graduate programs and show that a simulation of our algorithm produces a cohort with hiring overall utility while spending comparable budget to the current admissions process at that university. △ Less

Submitted 14 March, 2019; v1 submitted 11 September, 2017; originally announced September 2017.

arXiv:1604.03641 [pdf, ps, other]

Just-in-Time Static Type Checking for Dynamic Languages

Authors: Brianna M. Ren, Jeffrey S. Foster

Abstract: Dynamic languages such as Ruby, Python, and JavaScript have many compelling benefits, but the lack of static types means subtle errors can remain latent in code for a long time. While many researchers have developed various systems to bring some of the benefits of static types to dynamic languages, prior approaches have trouble dealing with metaprogramming, which generates code as the program exec… ▽ More Dynamic languages such as Ruby, Python, and JavaScript have many compelling benefits, but the lack of static types means subtle errors can remain latent in code for a long time. While many researchers have developed various systems to bring some of the benefits of static types to dynamic languages, prior approaches have trouble dealing with metaprogramming, which generates code as the program executes. In this paper, we propose Hummingbird, a new system that uses a novel technique, just-in-time static type checking, to type check Ruby code even in the presence of metaprogramming. In Hummingbird, method type signatures are gathered dynamically at run-time, as those methods are created. When a method is called, Hummingbird statically type checks the method body against current type signatures. Thus, Hummingbird provides thorough static checks on a per-method basis, while also allowing arbitrarily complex metaprogramming. For performance, Hummingbird memoizes the static type checking pass, invalidating cached checks only if necessary. We formalize Hummingbird using a core, Ruby-like language and prove it sound. To evaluate Hummingbird, we applied it to six apps, including three that use Ruby on Rails, a powerful framework that relies heavily on metaprogramming. We found that all apps typecheck successfully using Hummingbird, and that Hummingbird's performance overhead is reasonable. We applied Hummingbird to earlier versions of one Rails app and found several type errors that had been introduced and then fixed. Lastly, we demonstrate using Hummingbird in Rails development mode to typecheck an app as live updates are applied to it. △ Less

Submitted 12 April, 2016; originally announced April 2016.

Comments: 19 pages, 6 figures, 2 tables, 1 appendix, This is a preprint of a paper to appear in Programming Language Design and Implementation (PLDI 2016)

ACM Class: F.3.2

arXiv:1507.03577 [pdf, other]

JSKETCH: Sketching for Java

Authors: **seong Jeon, Xiaokang Qiu, Jeffrey S. Foster, Armando Solar-Lezama

Abstract: Sketch-based synthesis, epitomized by the SKETCH tool, lets developers synthesize software starting from a partial program, also called a sketch or template. This paper presents JSKETCH, a tool that brings sketch-based synthesis to Java. JSKETCH's input is a partial Java program that may include holes, which are unknown constants, expression generators, which range over sets of expressions, and cl… ▽ More Sketch-based synthesis, epitomized by the SKETCH tool, lets developers synthesize software starting from a partial program, also called a sketch or template. This paper presents JSKETCH, a tool that brings sketch-based synthesis to Java. JSKETCH's input is a partial Java program that may include holes, which are unknown constants, expression generators, which range over sets of expressions, and class generators, which are partial classes. JSKETCH then translates the synthesis problem into a SKETCH problem; this translation is complex because SKETCH is not object-oriented. Finally, JSKETCH synthesizes an executable Java program by interpreting the output of SKETCH. △ Less

Submitted 13 July, 2015; originally announced July 2015.

Comments: This research was supported in part by NSF CCF-1139021, CCF- 1139056, CCF-1161775, and the partnership between UMIACS and the Laboratory for Telecommunication Sciences

ACM Class: I.2.2; F.3.1

arXiv:1504.03711 [pdf, other]

Checking Interaction-Based Declassification Policies for Android Using Symbolic Execution

Authors: Kristopher Micinski, Jonathan Fetter-Degges, **seong Jeon, Jeffrey S. Foster, Michael R. Clarkson

Abstract: Mobile apps can access a wide variety of secure information, such as contacts and location. However, current mobile platforms include only coarse access control mechanisms to protect such data. In this paper, we introduce interaction-based declassification policies, in which the user's interactions with the app constrain the release of sensitive information. Our policies are defined extensionally,… ▽ More Mobile apps can access a wide variety of secure information, such as contacts and location. However, current mobile platforms include only coarse access control mechanisms to protect such data. In this paper, we introduce interaction-based declassification policies, in which the user's interactions with the app constrain the release of sensitive information. Our policies are defined extensionally, so as to be independent of the app's implementation, based on sequences of security-relevant events that occur in app runs. Policies use LTL formulae to precisely specify which secret inputs, read at which times, may be released. We formalize a semantic security condition, interaction-based noninterference, to define our policies precisely. Finally, we describe a prototype tool that uses symbolic execution to check interaction-based declassification policies for Android, and we show that it enforces policies correctly on a set of apps. △ Less

Submitted 29 July, 2015; v1 submitted 14 April, 2015; originally announced April 2015.

Comments: This research was supported in part by NSF grants CNS-1064997 and 1421373, AFOSR grants FA9550-12-1-0334 and FA9550-14-1-0334, a partnership between UMIACS and the Laboratory for Telecommunication Sciences, and the National Security Agency

arXiv:1503.07792 [pdf, other]

doi 10.1145/2814270.2814305

Incremental Computation with Names

Authors: Matthew A. Hammer, Jana Dunfield, Kyle Headley, Nicholas Labich, Jeffrey S. Foster, Michael Hicks, David Van Horn

Abstract: Over the past thirty years, there has been significant progress in develo** general-purpose, language-based approaches to incremental computation, which aims to efficiently update the result of a computation when an input is changed. A key design challenge in such approaches is how to provide efficient incremental support for a broad range of programs. In this paper, we argue that first-class na… ▽ More Over the past thirty years, there has been significant progress in develo** general-purpose, language-based approaches to incremental computation, which aims to efficiently update the result of a computation when an input is changed. A key design challenge in such approaches is how to provide efficient incremental support for a broad range of programs. In this paper, we argue that first-class names are a critical linguistic feature for efficient incremental computation. Names identify computations to be reused across differing runs of a program, and making them first class gives programmers a high level of control over reuse. We demonstrate the benefits of names by presenting NOMINAL ADAPTON, an ML-like language for incremental computation with names. We describe how to use NOMINAL ADAPTON to efficiently incrementalize several standard programming patterns -- including maps, folds, and unfolds -- and show how to build efficient, incremental probabilistic trees and tries. Since NOMINAL ADAPTON's implementation is subtle, we formalize it as a core calculus and prove it is from-scratch consistent, meaning it always produces the same answer as simply re-running the computation. Finally, we demonstrate that NOMINAL ADAPTON can provide large speedups over both from-scratch computation and ADAPTON, a previous state-of-the-art incremental computation system. △ Less

Submitted 23 March, 2021; v1 submitted 26 March, 2015; originally announced March 2015.

Comments: OOPSLA '15, October 25-30, 2015, Pittsburgh, PA, USA

ACM Class: D.3.1; D.3.3; F.3.2

Showing 1–13 of 13 results for author: Foster, J S