Skip to main content

Showing 1–37 of 37 results for author: Reps, T

.
  1. arXiv:2406.01792  [pdf, other

    cs.PL

    The SemGuS Toolkit

    Authors: Keith J. C. Johnson, Andrew Reynolds, Thomas Reps, Loris D'Antoni

    Abstract: Semantics-Guided Synthesis (SemGuS) is a programmable framework for defining synthesis problems in a domain- and solver-agnostic way. This paper presents the standardized SemGuS format, together with an open-source toolkit that provides a parser, a verifier, and enumerative SemGuS solvers. The paper also describes an initial set of SemGuS benchmarks, which form the basis for comparing SemGuS solve… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  2. arXiv:2406.01525  [pdf, other

    cs.SC cs.DM cs.DS cs.FL

    Polynomial Bounds of CFLOBDDs against BDDs

    Authors: Xusheng Zhi, Thomas Reps

    Abstract: Binary Decision Diagrams (BDDs) are widely used for the representation of Boolean functions. Context-Free-Language Ordered Decision Diagrams (CFLOBDDs) are a plug-compatible replacement for BDDs -- roughly, they are BDDs augmented with a certain form of procedure call. A natural question to ask is, ``For a given Boolean function $f$, what is the relationship between the size of a BDD for $f$ and t… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    ACM Class: I.1.1; G.2.2; F.4.3

  3. arXiv:2405.15282  [pdf, other

    cs.LG cs.AI

    Prompt Tuning Strikes Back: Customizing Foundation Models with Low-Rank Prompt Adaptation

    Authors: Abhinav Jain, Swarat Chaudhuri, Thomas Reps, Chris Jermaine

    Abstract: Parameter-Efficient Fine-Tuning (PEFT) has become the standard for customising Foundation Models (FMs) to user-specific downstream tasks. However, typical PEFT methods require storing multiple task-specific adapters, creating scalability issues as these adapters must be housed and run at the FM server. Traditional prompt tuning offers a potential solution by customising them through task-specific… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 11 pages, 4 figures, 3 tables

  4. arXiv:2401.13244  [pdf, ps, other

    cs.PL

    Automating Unrealizability Logic: Hoare-style Proof Synthesis for Infinite Sets of Programs

    Authors: Shaan Nagy, **woo Kim, Loris D'Antoni, Thomas Reps

    Abstract: Unrealizability logic (UL) was proposed by Kim et al. as the first Hoare-style proof system for proving properties that hold for an infinite set of programs (defined by a regular tree grammar). The goal of our work is to automate reasoning and proof generation for UL. A key ingredient in UL is the notion of nonterminal summaries-inductive facts that characterize recursive nonterminals in the gramm… ▽ More

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: 30 pages, 5 figures, 2 tables

  5. arXiv:2310.13144  [pdf, other

    cs.PL

    Optimal Symbolic Bound Synthesis

    Authors: John Cyphert, Yotam Feldman, Zachary Kincaid, Thomas Reps

    Abstract: The problem of finding a constant bound on a term given a set of assumptions has wide applications in optimization as well as program analysis. However, in many contexts the objective term may be unbounded. Still, some sort of symbolic bound may be useful. In this paper we introduce the optimal symbolic-bound synthesis problem, and a technique that tackles this problem for non-linear arithmetic wi… ▽ More

    Submitted 19 October, 2023; originally announced October 2023.

  6. arXiv:2310.06758  [pdf, other

    cs.SE cs.PL

    slash: A Technique for Static Configuration-Logic Identification

    Authors: Mohannad Alhanahnah, Philipp Schubert, Thomas Reps, Somesh Jha, Eric Bodden

    Abstract: Researchers have recently devised tools for debloating software and detecting configuration errors. Several of these tools rely on the observation that programs are composed of an initialization phase followed by a main-computation phase. Users of these tools are required to manually annotate the boundary that separates these phases, a task that can be time-consuming and error-prone (typically, th… ▽ More

    Submitted 20 November, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

  7. arXiv:2308.06956  [pdf, ps, other

    cs.PL

    Modular System Synthesis

    Authors: Kanghee Park, Keith J. C. Johnson, Loris D'Antoni, Thomas Reps

    Abstract: This paper describes a way to improve the scalability of program synthesis by exploiting modularity: larger programs are synthesized from smaller programs. The key issue is to make each "larger-created-from-smaller" synthesis sub-problem be of a similar nature, so that the kind of synthesis sub-problem that needs to be solved--and the size of each search space--has roughly the same character at ea… ▽ More

    Submitted 14 August, 2023; originally announced August 2023.

  8. arXiv:2307.09064  [pdf, other

    cs.PL

    Newtonian Program Analysis of Probabilistic Programs

    Authors: Di Wang, Thomas Reps

    Abstract: Due to their quantitative nature, probabilistic programs pose non-trivial challenges for designing compositional and efficient program analyses. Many analyses for probabilistic programs rely on iterative approximation. This article presents an interprocedural dataflow-analysis framework, called NPA-PMA, for designing and implementing (partially) non-iterative program analyses of probabilistic prog… ▽ More

    Submitted 7 March, 2024; v1 submitted 18 July, 2023; originally announced July 2023.

  9. arXiv:2305.18341  [pdf, other

    cs.PL cs.AI cs.LG

    Coarse-Tuning Models of Code with Reinforcement Learning Feedback

    Authors: Abhinav Jain, Chima Adiole, Swarat Chaudhuri, Thomas Reps, Chris Jermaine

    Abstract: Large Language Models (LLMs) pre-trained on code have recently emerged as the dominant approach to program synthesis. However, these models are trained using next-token prediction, which ignores the syntax and semantics of code. We propose RLCF, that further trains a pre-trained LLM via reinforcement learning, using feedback from a grounding function that scores the quality of the code. The ground… ▽ More

    Submitted 23 December, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: 23 pages

  10. arXiv:2305.13610  [pdf, other

    cs.FL quant-ph

    Weighted Context-Free-Language Ordered Binary Decision Diagrams

    Authors: Meghana Sistla, Swarat Chaudhuri, Thomas Reps

    Abstract: Over the years, many variants of Binary Decision Diagrams (BDDs) have been developed to address the deficiencies of vanilla BDDs. A recent innovation is the Context-Free-Language Ordered BDD (CFLOBDD), a hierarchically structured decision diagram, akin to BDDs enhanced with a procedure-call mechanism, which allows substructures to be shared in ways not possible with BDDs. For some functions, CFLOB… ▽ More

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: 21 pages

  11. arXiv:2302.04349  [pdf, other

    cs.FL cs.SC quant-ph

    Symbolic Quantum Simulation with Quasimodo

    Authors: Meghana Sistla, Swarat Chaudhuri, Thomas Reps

    Abstract: The simulation of quantum circuits on classical computers is an important problem in quantum computing. Such simulation requires representations of distributions over very large sets of basis vectors, and recent work has used symbolic data-structures such as Binary Decision Diagrams (BDDs) for this purpose. In this tool paper, we present Quasimodo, an extensible, open-source Python library for sym… ▽ More

    Submitted 29 May, 2023; v1 submitted 8 February, 2023; originally announced February 2023.

    Comments: 15 pages; 35th International Conference on Computer Aided Verification (CAV 2023)

  12. arXiv:2301.11117  [pdf, other

    cs.PL

    Synthesizing Specifications

    Authors: Kanghee Park, Loris D'Antoni, Thomas Reps

    Abstract: Every program should be accompanied by a specification that describes important aspects of the code's behavior, but writing good specifications is often harder than writing the code itself. This paper addresses the problem of synthesizing specifications automatically, guided by user-supplied inputs of two kinds: i) a query posed about a set of function definitions, and ii) a domain-specific langua… ▽ More

    Submitted 5 February, 2024; v1 submitted 26 January, 2023; originally announced January 2023.

  13. arXiv:2211.07117  [pdf, ps, other

    cs.PL cs.LO

    Unrealizability Logic

    Authors: **woo Kim, Loris D'Antoni, Thomas Reps

    Abstract: We consider the problem of establishing that a program-synthesis problem is unrealizable (i.e., has no solution in a given search space of programs). Prior work on unrealizability has developed some automatic techniques to establish that a problem is unrealizable; however, these techniques are all black-box, meaning that they conceal the reasoning behind why a synthesis problem is unrealizable. In… ▽ More

    Submitted 8 March, 2023; v1 submitted 14 November, 2022; originally announced November 2022.

  14. CFLOBDDs: Context-Free-Language Ordered Binary Decision Diagrams

    Authors: Meghana Sistla, Swarat Chaudhuri, Thomas Reps

    Abstract: This paper presents a new compressed representation of Boolean functions, called CFLOBDDs (for Context-Free-Language Ordered Binary Decision Diagrams). They are essentially a plug-compatible alternative to BDDs (Binary Decision Diagrams), and hence useful for representing certain classes of functions, matrices, graphs, relations, etc. in a highly compressed fashion. CFLOBDDs share many of the good… ▽ More

    Submitted 6 May, 2024; v1 submitted 12 November, 2022; originally announced November 2022.

    Comments: 144 pages

    Journal ref: TOPLAS 2024

  15. arXiv:2111.01633  [pdf, other

    cs.LG cs.AI cs.PL

    Neural Program Generation Modulo Static Analysis

    Authors: Rohan Mukherjee, Yeming Wen, Dipak Chaudhari, Thomas W. Reps, Swarat Chaudhuri, Chris Jermaine

    Abstract: State-of-the-art neural models of source code tend to be evaluated on the generation of individual expressions and lines of code, and commonly fail on long-horizon tasks such as the generation of entire method bodies. We propose to address this deficiency using weak supervision from a static program analyzer. Our neurosymbolic method allows a deep generative model to symbolically compute, using ca… ▽ More

    Submitted 22 November, 2021; v1 submitted 26 October, 2021; originally announced November 2021.

    Comments: Accepted for publication at Neurips 2021

  16. arXiv:2109.02775  [pdf, other

    cs.SE

    Lightweight, Multi-Stage, Compiler-Assisted Application Specialization

    Authors: Mohannad Alhanahnah, Rithik Jain, Vaibhav Rastogi, Somesh Jha, Thomas Reps

    Abstract: Program debloating aims to enhance the performance and reduce the attack surface of bloated applications. Several techniques have been recently proposed to specialize programs. These approaches are either based on unsound strategies or demanding techniques, leading to unsafe results or a high overhead debloating process. In this paper, we address these limitations by applying partial-evaluation pr… ▽ More

    Submitted 6 September, 2021; originally announced September 2021.

  17. arXiv:2105.00493  [pdf, other

    cs.PL

    Synthesizing Abstract Transformers

    Authors: Pankaj Kumar Kalita, Sujit Kumar Muduli, Loris D'Antoni, Thomas Reps, Subhajit Roy

    Abstract: This paper addresses the problem of creating abstract transformers automatically. The method we present automates the construction of static analyzers in a fashion similar to the way $\textit{yacc}$ automates the construction of parsers. Our method treats the problem as a program-synthesis problem. The user provides specifications of (i) the concrete semantics of a given operation $\textit{op}$, (… ▽ More

    Submitted 15 August, 2022; v1 submitted 2 May, 2021; originally announced May 2021.

    Comments: 17 Figures, 5 Tables, 32 pages

  18. arXiv:2104.03598  [pdf, other

    cs.PL

    Sound Probabilistic Inference via Guide Types

    Authors: Di Wang, Jan Hoffmann, Thomas Reps

    Abstract: Probabilistic programming languages aim to describe and automate Bayesian modeling and inference. Modern languages support programmable inference, which allows users to customize inference algorithms by incorporating guide programs to improve inference performance. For Bayesian inference to be sound, guide programs must be compatible with model programs. One pervasive but challenging condition for… ▽ More

    Submitted 8 April, 2021; originally announced April 2021.

  19. arXiv:2103.16105  [pdf, ps, other

    cs.PL

    Expected-Cost Analysis for Probabilistic Programs and Semantics-Level Adaption of Optional Stop** Theorems

    Authors: Di Wang, Jan Hoffmann, Thomas Reps

    Abstract: In this article, we present a semantics-level adaption of the Optional Stop** Theorem, sketch an expected-cost analysis as its application, and survey different variants of the Optional Stop** Theorem that have been used in static analysis of probabilistic programs.

    Submitted 30 March, 2021; originally announced March 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2001.10150

  20. arXiv:2103.04188  [pdf, ps, other

    cs.PL

    Synthesis with Asymptotic Resource Bounds

    Authors: Qinhe** Hu, John Cyphert, Loris D'Antoni, Thomas Reps

    Abstract: We present a method for synthesizing recursive functions that satisfy both a functional specification and an asymptotic resource bound. Prior methods for synthesis with a resource metric require the user to specify a concrete expression exactly describing resource usage, whereas our method uses big-O notation to specify the asymptotic resource usage. Our method can synthesize programs with complex… ▽ More

    Submitted 26 May, 2021; v1 submitted 6 March, 2021; originally announced March 2021.

  21. arXiv:2103.02591  [pdf, other

    cs.SE

    Shipwright: A Human-in-the-Loop System for Dockerfile Repair

    Authors: Jordan Henkel, Denini Silva, Leopoldo Teixeira, Marcelo d'Amorim, Thomas Reps

    Abstract: Docker is a tool for lightweight OS-level virtualization. Docker images are created by performing a build, controlled by a source-level artifact called a Dockerfile. We studied Dockerfiles on GitHub, and -- to our great surprise -- found that over a quarter of the examined Dockerfiles failed to build (and thus to produce images). To address this problem, we propose SHIPWRIGHT, a human-in-the-loop… ▽ More

    Submitted 3 March, 2021; originally announced March 2021.

    Comments: Published in ICSE'2021

  22. Semantics-Guided Synthesis

    Authors: **woo Kim, Qinhe** Hu, Loris D'Antoni, Thomas Reps

    Abstract: This paper develops a new framework for program synthesis, called semantics-guided synthesis (SemGuS), that allows a user to provide both the syntax and the semantics for the constructs in the language. SemGuS accepts a recursively defined big-step semantics, which allows it, for example, to be used to specify and solve synthesis problems over an imperative programming language that may contain lo… ▽ More

    Submitted 11 November, 2020; v1 submitted 22 August, 2020; originally announced August 2020.

    Comments: Updated version for camera-ready

  23. arXiv:2005.06645  [pdf, other

    cs.PL

    A Generating-Extension-Generator for Machine Code

    Authors: Michael Vaughn, Thomas Reps

    Abstract: The problem of "debloating" programs for security and performance purposes has begun to see increased attention. Of particular interest in many environments is debloating commodity off-the-shelf (COTS) software, which is most commonly made available to end users as stripped binaries (i.e., neither source code nor symbol-table/debugging information is available). Toward this end, we created a syste… ▽ More

    Submitted 14 May, 2020; v1 submitted 13 May, 2020; originally announced May 2020.

    Comments: 21 pages, 8 Figures Fixed inclusion of LaTeX macro in plaintext abstract

    ACM Class: D.3.0; D.2.7

  24. arXiv:2004.14375  [pdf, other

    cs.SE

    TOFU: Target-Oriented FUzzer

    Authors: Zi Wang, Ben Liblit, Thomas Reps

    Abstract: Program fuzzing---providing randomly constructed inputs to a computer program---has proved to be a powerful way to uncover bugs, find security vulnerabilities, and generate test inputs that increase code coverage. In many applications, however, one is interested in a target-oriented approach-one wants to find an input that causes the program to reach a specific target point in the program. We have… ▽ More

    Submitted 3 May, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

  25. Exact and Approximate Methods for Proving Unrealizability of Syntax-Guided Synthesis Problems

    Authors: Qinhe** Hu, John Cyphert, Loris D'Antoni, Thomas Reps

    Abstract: We consider the problem of automatically establishing that a given syntax-guided-synthesis (SyGuS) problem is unrealizable (i.e., has no solution). We formulate the problem of proving that a SyGuS problem is unrealizable over a finite set of examples as one of solving a set of equations: the solution yields an overapproximation of the set of possible outputs that any term in the search space can p… ▽ More

    Submitted 2 April, 2020; originally announced April 2020.

    Journal ref: PLDI 2020

  26. arXiv:2003.13515  [pdf, other

    cs.PL

    Templates and Recurrences: Better Together

    Authors: Jason Breck, John Cyphert, Zachary Kincaid, Thomas Reps

    Abstract: This paper is the confluence of two streams of ideas in the literature on generating numerical invariants, namely: (1) template-based methods, and (2) recurrence-based methods. A template-based method begins with a template that contains unknown quantities, and finds invariants that match the template by extracting and solving constraints on the unknowns. A disadvantage of template-based methods i… ▽ More

    Submitted 30 March, 2020; originally announced March 2020.

    Comments: 20 pages, 3 figures

  27. A Dataset of Dockerfiles

    Authors: Jordan Henkel, Christian Bird, Shuvendu K. Lahiri, Thomas Reps

    Abstract: Dockerfiles are one of the most prevalent kinds of DevOps artifacts used in industry. Despite their prevalence, there is a lack of sophisticated semantics-aware static analysis of Dockerfiles. In this paper, we introduce a dataset of approximately 178,000 unique Dockerfiles collected from GitHub. To enhance the usability of this data, we describe five representations we have devised for working wi… ▽ More

    Submitted 28 March, 2020; originally announced March 2020.

    Comments: Published as a Data Showcase in MSR'2020

  28. Learning from, Understanding, and Supporting DevOps Artifacts for Docker

    Authors: Jordan Henkel, Christian Bird, Shuvendu K. Lahiri, Thomas Reps

    Abstract: With the growing use of DevOps tools and frameworks, there is an increased need for tools and techniques that support more than code. The current state-of-the-art in static developer assistance for tools like Docker is limited to shallow syntactic validation. We identify three core challenges in the realm of learning from, understanding, and supporting developers writing DevOps artifacts: (i) nest… ▽ More

    Submitted 7 February, 2020; originally announced February 2020.

    Comments: Published in ICSE'2020

  29. Semantic Robustness of Models of Source Code

    Authors: Goutham Ramakrishnan, Jordan Henkel, Zi Wang, Aws Albarghouthi, Somesh Jha, Thomas Reps

    Abstract: Deep neural networks are vulnerable to adversarial examples - small input perturbations that result in incorrect predictions. We study this problem for models of source code, where we want the network to be robust to source-code modifications that preserve code functionality. (1) We define a powerful adversary that can employ sequences of parametric, semantics-preserving program transformations; (… ▽ More

    Submitted 11 June, 2020; v1 submitted 7 February, 2020; originally announced February 2020.

  30. arXiv:2001.10150  [pdf, other

    cs.PL

    Central Moment Analysis for Cost Accumulators in Probabilistic Programs

    Authors: Di Wang, Jan Hoffmann, Thomas Reps

    Abstract: For probabilistic programs, it is usually not possible to automatically derive exact information about their properties, such as the distribution of states at a given program point. Instead, one can attempt to derive approximations, such as upper bounds on tail probabilities. Such bounds can be obtained via concentration inequalities, which rely on the moments of a distribution, such as the expect… ▽ More

    Submitted 8 April, 2021; v1 submitted 27 January, 2020; originally announced January 2020.

  31. arXiv:1905.05800  [pdf, ps, other

    cs.PL

    Proving Unrealizability for Syntax-Guided Synthesis

    Authors: Qinhe** Hu, Jason Breck, John Cyphert, Loris D'Antoni, Thomas Reps

    Abstract: Proving Unrealizability for Syntax-Guided Synthesis We consider the problem of automatically establishing that a given syntax-guided-synthesis (SyGuS) problem is unrealizable (i.e., has no solution). Existing techniques have quite limited ability to establish unrealizability for general SyGuS instances in which the grammar describing the search space contains infinitely many programs. By encodin… ▽ More

    Submitted 23 July, 2019; v1 submitted 14 May, 2019; originally announced May 2019.

  32. arXiv:1904.12098  [pdf, other

    cs.SE cs.LG

    Enabling Open-World Specification Mining via Unsupervised Learning

    Authors: Jordan Henkel, Shuvendu K. Lahiri, Ben Liblit, Thomas Reps

    Abstract: Many programming tasks require using both domain-specific code and well-established patterns (such as routines concerned with file IO). Together, several small patterns combine to create complex interactions. This compounding effect, mixed with domain-specific idiosyncrasies, creates a challenging environment for fully automatic specification inference. Mining specifications in this environment, w… ▽ More

    Submitted 26 April, 2019; originally announced April 2019.

  33. Code Vectors: Understanding Programs Through Embedded Abstracted Symbolic Traces

    Authors: Jordan Henkel, Shuvendu K. Lahiri, Ben Liblit, Thomas Reps

    Abstract: With the rise of machine learning, there is a great deal of interest in treating programs as data to be fed to learning algorithms. However, programs do not start off in a form that is immediately amenable to most off-the-shelf learning techniques. Instead, it is necessary to transform the program to a suitable representation before a learning technique can be applied. In this paper, we use abst… ▽ More

    Submitted 20 August, 2018; v1 submitted 18 March, 2018; originally announced March 2018.

  34. arXiv:1706.02769  [pdf, ps, other

    cs.SE cs.IR cs.PL

    Source Forager: A Search Engine for Similar Source Code

    Authors: Vineeth Kashyap, David Bingham Brown, Ben Liblit, David Melski, Thomas Reps

    Abstract: Developers spend a significant amount of time searching for code: e.g., to understand how to complete, correct, or adapt their own code for a new context. Unfortunately, the state of the art in code search has not evolved much beyond text search over tokenized source. Code has much richer structure and semantics than normal text, and this property can be exploited to specialize the code-search pro… ▽ More

    Submitted 8 June, 2017; originally announced June 2017.

    Comments: 11 pages

    MSC Class: 68N15

  35. arXiv:1705.09231  [pdf, other

    cs.AI cs.PL

    Neural Attribute Machines for Program Generation

    Authors: Matthew Amodio, Swarat Chaudhuri, Thomas W. Reps

    Abstract: Recurrent neural networks have achieved remarkable success at generating sequences with complex structures, thanks to advances that include richer embeddings of input and cures for vanishing gradients. Trained only on sequences from a known grammar, though, they can still struggle to learn rules and constraints of the grammar. Neural Attribute Machines (NAMs) are equipped with a logical machine th… ▽ More

    Submitted 26 October, 2021; v1 submitted 25 May, 2017; originally announced May 2017.

  36. Simulating reachability using first-order logic with applications to verification of linked data structures

    Authors: Tal Lev-Ami, Neil Immerman, Thomas Reps, Mooly Sagiv, Siddharth Srivastava, Greta Yorsh

    Abstract: This paper shows how to harness existing theorem provers for first-order logic to automatically verify safety properties of imperative programs that perform dynamic storage allocation and destructive updating of pointer-valued structure fields. One of the main obstacles is specifying and proving the (absence) of reachability properties among dynamically allocated cells. The main technical cont… ▽ More

    Submitted 27 May, 2009; v1 submitted 30 April, 2009; originally announced April 2009.

    Comments: 30 pages, LMCS

    ACM Class: F.3.1; F.3.2; F.4.1

    Journal ref: Logical Methods in Computer Science, Volume 5, Issue 2 (May 28, 2009) lmcs:680

  37. arXiv:cs/0312014  [pdf, ps, other

    cs.LO

    Logical Characterizations of Heap Abstractions

    Authors: G. Yorsh, T. Reps, M. Sagiv, R. Wilhelm

    Abstract: Shape analysis concerns the problem of determining "shape invariants" for programs that perform destructive updating on dynamically allocated storage. In recent work, we have shown how shape analysis can be performed, using an abstract interpretation based on 3-valued first-order logic. In that work, concrete stores are finite 2-valued logical structures, and the sets of stores that can possibly… ▽ More

    Submitted 16 March, 2005; v1 submitted 7 December, 2003; originally announced December 2003.

    ACM Class: D.2.4