Skip to main content

Showing 1–26 of 26 results for author: Lahiri, S K

.
  1. arXiv:2406.09757  [pdf, ps, other

    cs.PL cs.LG cs.SE

    Evaluating LLM-driven User-Intent Formalization for Verification-Aware Languages

    Authors: Shuvendu K. Lahiri

    Abstract: Verification-aware programming languages such as Dafny and F* provide means to formally specify and prove properties of programs. Although the problem of checking an implementation against a specification can be defined mechanically, there is no algorithmic way of ensuring the correctness of the user-intent formalization for programs -- that a specification adheres to the user's intent behind the… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  2. arXiv:2406.04693  [pdf, other

    cs.SE cs.AI cs.LG cs.PF

    LLM-Vectorizer: LLM-based Verified Loop Vectorizer

    Authors: Jubi Taneja, Avery Laird, Cong Yan, Madan Musuvathi, Shuvendu K. Lahiri

    Abstract: Vectorization is a powerful optimization technique that significantly boosts the performance of high performance computing applications operating on large data arrays. Despite decades of research on auto-vectorization, compilers frequently miss opportunities to vectorize code. On the other hand, writing vectorized code manually using compiler intrinsics is still a complex, error-prone task that de… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  3. arXiv:2405.06276  [pdf, other

    physics.flu-dyn

    Transition from laminar to turbulent pipe flow as a process of growing material instabilities

    Authors: Saptarshi Kumar Lahiri, Konstantin Volokh

    Abstract: In this work, we simulate the transition to turbulence in the pipe flow based on the modified NS theory incorporating the viscous fluid strength in the constitutive equations. The latter concept enriches theory by allowing for material instabilities in addition to the kinematic ones. We present results of comparative numerical simulations based on the classical NS model and the NS model enhanced w… ▽ More

    Submitted 10 May, 2024; originally announced May 2024.

  4. arXiv:2404.10362  [pdf, other

    cs.SE

    3DGen: AI-Assisted Generation of Provably Correct Binary Format Parsers

    Authors: Sarah Fakhoury, Markus Kuppe, Shuvendu K. Lahiri, Tahina Ramananandro, Nikhil Swamy

    Abstract: Improper parsing of attacker-controlled input is a leading source of software security vulnerabilities, especially when programmers transcribe informal format descriptions in RFCs into efficient parsing logic in low-level, memory unsafe languages. Several researchers have proposed formal specification languages for data formats from which efficient code can be extracted. However, distilling inform… ▽ More

    Submitted 6 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  5. arXiv:2404.10100  [pdf, other

    cs.SE

    LLM-based Test-driven Interactive Code Generation: User Study and Empirical Evaluation

    Authors: Sarah Fakhoury, Aaditya Naik, Georgios Sakkas, Saikat Chakraborty, Shuvendu K. Lahiri

    Abstract: Large language models (LLMs) have shown great potential in automating significant aspects of coding by producing natural code from informal natural language (NL) intent. However, given NL is informal, it does not lend easily to checking that the generated code correctly satisfies the user intent. In this paper, we propose a novel interactive workflow TiCoder for guided intent clarification (i.e.,… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

  6. arXiv:2311.07948  [pdf, other

    cs.PL cs.LG

    Finding Inductive Loop Invariants using Large Language Models

    Authors: Adharsh Kamath, Aditya Senthilnathan, Saikat Chakraborty, Pantazis Deligiannis, Shuvendu K. Lahiri, Akash Lal, Aseem Rastogi, Subhajit Roy, Rahul Sharma

    Abstract: Loop invariants are fundamental to reasoning about programs with loops. They establish properties about a given loop's behavior. When they additionally are inductive, they become useful for the task of formal verification that seeks to establish strong mathematical guarantees about program's runtime behavior. The inductiveness ensures that the invariants can be checked locally without consulting t… ▽ More

    Submitted 14 November, 2023; originally announced November 2023.

  7. arXiv:2310.09342  [pdf, other

    cs.PL cs.AI cs.CL cs.SE

    Ranking LLM-Generated Loop Invariants for Program Verification

    Authors: Saikat Chakraborty, Shuvendu K. Lahiri, Sarah Fakhoury, Madanlal Musuvathi, Akash Lal, Aseem Rastogi, Aditya Senthilnathan, Rahul Sharma, Nikhil Swamy

    Abstract: Synthesizing inductive loop invariants is fundamental to automating program verification. In this work, we observe that Large Language Models (such as gpt-3.5 or gpt-4) are capable of synthesizing loop invariants for a class of programs in a 0-shot setting, yet require several samples to generate the correct invariants. This can lead to a large number of calls to a program verifier to establish an… ▽ More

    Submitted 12 February, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Findings of The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP-findings 2023)

  8. arXiv:2310.01831  [pdf, other

    cs.SE cs.AI cs.PL

    Can Large Language Models Transform Natural Language Intent into Formal Method Postconditions?

    Authors: Madeline Endres, Sarah Fakhoury, Saikat Chakraborty, Shuvendu K. Lahiri

    Abstract: Informal natural language that describes code functionality, such as code comments or function documentation, may contain substantial information about a programs intent. However, there is typically no guarantee that a programs implementation and natural language documentation are aligned. In the case of a conflict, leveraging information in code-adjacent natural language has the potential to enha… ▽ More

    Submitted 15 April, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: To appear at the Proceedings of the ACM on Software Engineering (PACMSE), Issue Foundations of Software Engineering (FSE) 2024

  9. arXiv:2306.10763  [pdf, other

    cs.CL cs.AI cs.LG cs.PL cs.SE

    Guiding Language Models of Code with Global Context using Monitors

    Authors: Lakshya A Agrawal, Aditya Kanade, Navin Goyal, Shuvendu K. Lahiri, Sriram K. Rajamani

    Abstract: Language models of code (LMs) work well when the surrounding code provides sufficient context. This is not true when it becomes necessary to use types, functionality or APIs defined elsewhere in the repository or a linked library, especially those not seen during training. LMs suffer from limited awareness of such global context and end up hallucinating. Integrated development environments (IDEs… ▽ More

    Submitted 3 November, 2023; v1 submitted 19 June, 2023; originally announced June 2023.

    Comments: Accepted to NeurIPS 2023 and to appear as "Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context" at https://neurips.cc/virtual/2023/poster/70362 . Contents: 11 pages, 15 additional pages of appendix, 13 figures, 3 tables

    ACM Class: I.2.2; I.2.7; I.2.5

  10. arXiv:2304.03816  [pdf, other

    cs.SE cs.LG

    Towards Generating Functionally Correct Code Edits from Natural Language Issue Descriptions

    Authors: Sarah Fakhoury, Saikat Chakraborty, Madan Musuvathi, Shuvendu K. Lahiri

    Abstract: Large language models (LLMs), such as OpenAI's Codex, have demonstrated their potential to generate code from natural language descriptions across a wide range of programming tasks. Several benchmarks have recently emerged to evaluate the ability of LLMs to generate functionally correct code from natural language intent with respect to a set of hidden test cases. This has enabled the research comm… ▽ More

    Submitted 7 April, 2023; originally announced April 2023.

  11. arXiv:2208.05950  [pdf, other

    cs.SE cs.LG cs.PL

    Interactive Code Generation via Test-Driven User-Intent Formalization

    Authors: Shuvendu K. Lahiri, Sarah Fakhoury, Aaditya Naik, Georgios Sakkas, Saikat Chakraborty, Madanlal Musuvathi, Piali Choudhury, Curtis von Veh, Jeevana Priya Inala, Chenglong Wang, Jianfeng Gao

    Abstract: Large language models (LLMs) have shown great potential in automating significant aspects of coding by producing natural code from informal natural language (NL) intent. However, when interacting with LLMs, users have no guarantees that the code suggestions produced correctly satisfy the intent they provided. In fact, it is hard to define a notion of correctness since natural language can be ambig… ▽ More

    Submitted 3 October, 2023; v1 submitted 11 August, 2022; originally announced August 2022.

    Comments: 18 pages

  12. arXiv:2206.03865  [pdf, other

    cs.PL cs.AI cs.SE

    Fault-Aware Neural Code Rankers

    Authors: Jeevana Priya Inala, Chenglong Wang, Mei Yang, Andres Codas, Mark EncarnaciĆ³n, Shuvendu K Lahiri, Madanlal Musuvathi, Jianfeng Gao

    Abstract: Large language models (LLMs) have demonstrated an impressive ability to generate code for various programming tasks. In many instances, LLMs can generate a correct program for a task when given numerous trials. Consequently, a recent trend is to do large scale sampling of programs using a model and then filtering/ranking the programs based on the program execution on a small number of known unit t… ▽ More

    Submitted 9 December, 2022; v1 submitted 4 June, 2022; originally announced June 2022.

    Comments: In the proceedings of Advances in Neural Information Processing Systems, 2022

  13. arXiv:2111.11904  [pdf, other

    cs.SE

    Can Pre-trained Language Models be Used to Resolve Textual and Semantic Merge Conflicts?

    Authors: Jialu Zhang, Todd Mytkowicz, Mike Kaufman, Ruzica Piskac, Shuvendu K. Lahiri

    Abstract: Program merging is standard practice when developers integrate their individual changes to a common code base. When the merge algorithm fails, this is called a merge conflict. The conflict either manifests in textual merge conflicts where the merge fails to produce code, or semantic merge conflicts where the merged code results in compiler or test breaks. Resolving these conflicts for large code p… ▽ More

    Submitted 23 November, 2021; originally announced November 2021.

  14. SolType: Refinement Types for Arithmetic Overflow in Solidity

    Authors: Bryan Tan, Benjamin Mariano, Shuvendu K. Lahiri, Isil Dillig, Yu Feng

    Abstract: As smart contracts gain adoption in financial transactions, it becomes increasingly important to ensure that they are free of bugs and security vulnerabilities. Of particular relevance in this context are arithmetic overflow bugs, as integers are often used to represent financial assets like account balances. Motivated by this observation, this paper presents SolType, a refinement type system for… ▽ More

    Submitted 20 December, 2021; v1 submitted 1 October, 2021; originally announced October 2021.

    Comments: To appear in POPL '22. This is the extended version of the paper with the proofs, after the main text went through peer review. 51 pages, 15 figures

  15. TOGA: A Neural Method for Test Oracle Generation

    Authors: Elizabeth Dinella, Gabriel Ryan, Todd Mytkowicz, Shuvendu K. Lahiri

    Abstract: Testing is widely recognized as an important stage of the software development lifecycle. Effective software testing can provide benefits such as bug finding, preventing regressions, and documentation. In terms of documentation, unit tests express a unit's intended functionality, as conceived by the developer. A test oracle, typically expressed as an condition, documents the intended behavior of a… ▽ More

    Submitted 20 April, 2022; v1 submitted 19 September, 2021; originally announced September 2021.

    Comments: To appear in ICSE 2022

    ACM Class: D.2.5; D.2.1

  16. arXiv:2105.07569  [pdf, other

    cs.SE

    DeepMerge: Learning to Merge Programs

    Authors: Elizabeth Dinella, Todd Mytkowicz, Alexey Svyatkovskiy, Christian Bird, Mayur Naik, Shuvendu K. Lahiri

    Abstract: In collaborative software development, program merging is the mechanism to integrate changes from multiple programmers. Merge algorithms in modern version control systems report a conflict when changes interfere textually. Merge conflicts require manual intervention and frequently stall modern continuous integration pipelines. Prior work found that, although costly, a large majority of resolutions… ▽ More

    Submitted 6 September, 2021; v1 submitted 16 May, 2021; originally announced May 2021.

    Comments: 11 pages

  17. A Dataset of Dockerfiles

    Authors: Jordan Henkel, Christian Bird, Shuvendu K. Lahiri, Thomas Reps

    Abstract: Dockerfiles are one of the most prevalent kinds of DevOps artifacts used in industry. Despite their prevalence, there is a lack of sophisticated semantics-aware static analysis of Dockerfiles. In this paper, we introduce a dataset of approximately 178,000 unique Dockerfiles collected from GitHub. To enhance the usability of this data, we describe five representations we have devised for working wi… ▽ More

    Submitted 28 March, 2020; originally announced March 2020.

    Comments: Published as a Data Showcase in MSR'2020

  18. Learning from, Understanding, and Supporting DevOps Artifacts for Docker

    Authors: Jordan Henkel, Christian Bird, Shuvendu K. Lahiri, Thomas Reps

    Abstract: With the growing use of DevOps tools and frameworks, there is an increased need for tools and techniques that support more than code. The current state-of-the-art in static developer assistance for tools like Docker is limited to shallow syntactic validation. We identify three core challenges in the realm of learning from, understanding, and supporting developers writing DevOps artifacts: (i) nest… ▽ More

    Submitted 7 February, 2020; originally announced February 2020.

    Comments: Published in ICSE'2020

  19. A stable SPH with adaptive B-spline kernel

    Authors: Saptarshi Kumar Lahiri, Kanishka Bhattacharya, Amit Shaw, L S Ramachandra

    Abstract: Tensile instability, often observed in smoothed particle hydrodynamics (SPH), is a numerical artifact that manifests itself by unphysical clustering or separation of particles. The instability originates in estimating the derivatives of the smoothing functions which, when interact with material constitution may result in negative stiffness in the discretized system. In the present study, a stable… ▽ More

    Submitted 4 January, 2020; originally announced January 2020.

    Comments: 34 Pages, 22 Figures

    Journal ref: Journal of Computational Physics 422(2020) 109761

  20. arXiv:1904.12098  [pdf, other

    cs.SE cs.LG

    Enabling Open-World Specification Mining via Unsupervised Learning

    Authors: Jordan Henkel, Shuvendu K. Lahiri, Ben Liblit, Thomas Reps

    Abstract: Many programming tasks require using both domain-specific code and well-established patterns (such as routines concerned with file IO). Together, several small patterns combine to create complex interactions. This compounding effect, mixed with domain-specific idiosyncrasies, creates a challenging environment for fully automatic specification inference. Mining specifications in this environment, w… ▽ More

    Submitted 26 April, 2019; originally announced April 2019.

  21. arXiv:1812.08829  [pdf, other

    cs.PL

    Formal Specification and Verification of Smart Contracts for Azure Blockchain

    Authors: Yuepeng Wang, Shuvendu K. Lahiri, Shuo Chen, Rong Pan, Isil Dillig, Cody Born, Immad Naseer

    Abstract: Ensuring correctness of smart contracts is paramount to ensuring trust in blockchain-based systems. This paper studies the safety and security of smart contracts in the \emph{Azure Blockchain Workbench}, an enterprise Blockchain-as-a-Service offering from Microsoft. As part of this study, we formalize \emph{semantic conformance} of smart contracts against a state machine model with access-control… ▽ More

    Submitted 29 April, 2019; v1 submitted 20 December, 2018; originally announced December 2018.

    Comments: 13 pages

    ACM Class: F.3.1

  22. Code Vectors: Understanding Programs Through Embedded Abstracted Symbolic Traces

    Authors: Jordan Henkel, Shuvendu K. Lahiri, Ben Liblit, Thomas Reps

    Abstract: With the rise of machine learning, there is a great deal of interest in treating programs as data to be fed to learning algorithms. However, programs do not start off in a form that is immediately amenable to most off-the-shelf learning techniques. Instead, it is necessary to transform the program to a suitable representation before a learning technique can be applied. In this paper, we use abst… ▽ More

    Submitted 20 August, 2018; v1 submitted 18 March, 2018; originally announced March 2018.

  23. arXiv:1710.07660  [pdf, other

    cs.LO cs.DB

    Verifying Equivalence of Database-Driven Applications

    Authors: Yuepeng Wang, Isil Dillig, Shuvendu K. Lahiri, William R. Cook

    Abstract: This paper addresses the problem of verifying equivalence between a pair of programs that operate over databases with different schemas. This problem is particularly important in the context of web applications, which typically undergo database refactoring either for performance or maintainability reasons. While web applications should have the same externally observable behavior before and after… ▽ More

    Submitted 20 October, 2017; originally announced October 2017.

  24. arXiv:1609.08734  [pdf, ps, other

    cs.SE

    Interprocedural Semantic Change-Impact Analysis using Equivalence Relations

    Authors: Alex Gyori, Shuvendu K. Lahiri, Nimrod Partush

    Abstract: Change-impact analysis (CIA) is the task of determining the set of program elements impacted by a program change. Precise CIA has great potential to avoid expensive testing and code reviews for (parts of) changes that are refactorings (semantics-preserving). Existing CIA is imprecise because it is coarse-grained, deals with only few refactoring patterns, or is unaware of the change semantics. We… ▽ More

    Submitted 27 September, 2016; originally announced September 2016.

    Comments: 12 pages

    ACM Class: D.2.4; D.2.5

  25. arXiv:cs/0612003  [pdf, ps, other

    cs.LO cs.PL cs.SC

    Predicate Abstraction via Symbolic Decision Procedures

    Authors: Shuvendu K. Lahiri, Thomas Ball, Byron Cook

    Abstract: We present a new approach for performing predicate abstraction based on symbolic decision procedures. Intuitively, a symbolic decision procedure for a theory takes a set of predicates in the theory and symbolically executes a decision procedure on all the subsets over the set of predicates. The result of the symbolic decision procedure is a shared expression (represented by a directed acyclic gr… ▽ More

    Submitted 24 April, 2007; v1 submitted 1 December, 2006; originally announced December 2006.

    Comments: The final accepted paper for Logical Methods in Computer Science, special issue on CAV 2005. Editor Sriram Rajamani ([email protected]). Please perform make to build the paper. The pdf file is paper.pdf, and the comments for the referee's is present in referee_comments

    ACM Class: F.3.1; F.4.1

    Journal ref: Logical Methods in Computer Science, Volume 3, Issue 2 (April 24, 2007) lmcs:2218

  26. arXiv:cs/0407006  [pdf, ps, other

    cs.LO

    Predicate Abstraction with Indexed Predicates

    Authors: Shuvendu K. Lahiri, Randal E. Bryant

    Abstract: Predicate abstraction provides a powerful tool for verifying properties of infinite-state systems using a combination of a decision procedure for a subset of first-order logic and symbolic methods originally developed for finite-state model checking. We consider models containing first-order state variables, where the system state includes mutable functions and predicates. Such a model can descr… ▽ More

    Submitted 2 July, 2004; originally announced July 2004.

    Comments: 27 pages, 4 figures, 1 table, short version appeared in International Conference on Verification, Model Checking and Abstract Interpretation (VMCAI'04), LNCS 2937, pages = 267--281

    ACM Class: F.3.1