Skip to main content

Showing 1–11 of 11 results for author: Buratti, L

.
  1. arXiv:2406.14712  [pdf, ps, other

    quant-ph cs.AI

    Qiskit HumanEval: An Evaluation Benchmark For Quantum Code Generative Models

    Authors: Sanjay Vishwakarma, Francis Harkins, Siddharth Golecha, Vishal Sharathchandra Bajpe, Nicolas Dupuis, Luca Buratti, David Kremer, Ismael Faro, Ruchir Puri, Juan Cruz-Benito

    Abstract: Quantum programs are typically developed using quantum Software Development Kits (SDKs). The rapid advancement of quantum computing necessitates new tools to streamline this development process, and one such tool could be Generative Artificial intelligence (GenAI). In this study, we introduce and use the Qiskit HumanEval dataset, a hand-curated collection of tasks designed to benchmark the ability… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2405.19495  [pdf, ps, other

    quant-ph cs.AI

    Qiskit Code Assistant: Training LLMs for generating Quantum Computing Code

    Authors: Nicolas Dupuis, Luca Buratti, Sanjay Vishwakarma, Aitana Viudes Forrat, David Kremer, Ismael Faro, Ruchir Puri, Juan Cruz-Benito

    Abstract: Code Large Language Models (Code LLMs) have emerged as powerful tools, revolutionizing the software development landscape by automating the coding process and reducing time and effort required to build applications. This paper focuses on training Code LLMs to specialize in the field of quantum computing. We begin by discussing the unique needs of quantum computing programming, which differ signifi… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  3. arXiv:2402.17442  [pdf, other

    cs.SE cs.AI cs.PL

    Ansible Lightspeed: A Code Generation Service for IT Automation

    Authors: Priyam Sahoo, Saurabh Pujar, Ganesh Nalawade, Richard Gebhardt, Louis Mandel, Luca Buratti

    Abstract: The availability of Large Language Models (LLMs) which can generate code, has made it possible to create tools that improve developer productivity. Integrated development environments or IDEs which developers use to write software are often used as an interface to interact with LLMs. Although many such tools have been released, almost all of them focus on general-purpose programming languages. Dom… ▽ More

    Submitted 27 February, 2024; originally announced February 2024.

  4. arXiv:2310.16937  [pdf, other

    cs.CL

    Learning Transfers over Several Programming Languages

    Authors: Razan Baltaji, Saurabh Pujar, Louis Mandel, Martin Hirzel, Luca Buratti, Lav Varshney

    Abstract: Large language models (LLMs) have become remarkably good at improving developer productivity for high-resource programming languages. These models use two kinds of data: large amounts of unlabeled code samples for pre-training and relatively smaller amounts of labeled code samples for fine-tuning or in-context learning. Unfortunately, many programming languages are low-resource, lacking labeled sa… ▽ More

    Submitted 25 March, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

    Comments: 15 pages, 9 figures, 8 tables

    ACM Class: I.2.7; I.2.5

  5. arXiv:2310.14053  [pdf, other

    cs.LG cs.CL cs.SE

    Beyond Accuracy: Evaluating Self-Consistency of Code Large Language Models with IdentityChain

    Authors: Marcus J. Min, Yangruibo Ding, Luca Buratti, Saurabh Pujar, Gail Kaiser, Suman Jana, Baishakhi Ray

    Abstract: Code Large Language Models (Code LLMs) are being increasingly employed in real-life applications, so evaluating them is critical. While the conventional accuracy evaluates the performance of Code LLMs on a set of individual tasks, their self-consistency across different tasks is overlooked. Intuitively, a trustworthy model should be self-consistent when generating natural language specifications f… ▽ More

    Submitted 26 February, 2024; v1 submitted 21 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

    MSC Class: 68 ACM Class: I.2; D.2

  6. arXiv:2306.03234  [pdf, other

    cs.SE

    CONCORD: Clone-aware Contrastive Learning for Source Code

    Authors: Yangruibo Ding, Saikat Chakraborty, Luca Buratti, Saurabh Pujar, Alessandro Morari, Gail Kaiser, Baishakhi Ray

    Abstract: Deep Learning (DL) models to analyze source code have shown immense promise during the past few years. More recently, self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks, such as clone and bug detection. While previous work successfully learned from different code abstractions (e.g., token, AST, graph), we argue that it… ▽ More

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: Camera-ready for 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 23)

  7. arXiv:2305.02783  [pdf, ps, other

    cs.SE cs.AI cs.CL cs.PL

    Automated Code generation for Information Technology Tasks in YAML through Large Language Models

    Authors: Saurabh Pujar, Luca Buratti, Xiaojie Guo, Nicolas Dupuis, Burn Lewis, Sahil Suneja, Atin Sood, Ganesh Nalawade, Matthew Jones, Alessandro Morari, Ruchir Puri

    Abstract: The recent improvement in code generation capabilities due to the use of large language models has mainly benefited general purpose programming languages. Domain specific languages, such as the ones used for IT Automation, have received far less attention, despite involving many active developers and being an essential component of modern cloud platforms. This work focuses on the generation of Ans… ▽ More

    Submitted 23 May, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

  8. arXiv:2110.03868  [pdf, other

    cs.PL cs.AI cs.LG cs.SE

    Towards Learning (Dis)-Similarity of Source Code from Program Contrasts

    Authors: Yangruibo Ding, Luca Buratti, Saurabh Pujar, Alessandro Morari, Baishakhi Ray, Saikat Chakraborty

    Abstract: Understanding the functional (dis)-similarity of source code is significant for code modeling tasks such as software vulnerability and code clone detection. We present DISCO(DIS-similarity of COde), a novel self-supervised model focusing on identifying (dis)similar functionalities of source code. Different from existing works, our approach does not require a huge amount of randomly collected datas… ▽ More

    Submitted 20 March, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: ACL 2022 Camera-Ready

  9. arXiv:2105.12655  [pdf, other

    cs.SE cs.AI

    CodeNet: A Large-Scale AI for Code Dataset for Learning a Diversity of Coding Tasks

    Authors: Ruchir Puri, David S. Kung, Geert Janssen, Wei Zhang, Giacomo Domeniconi, Vladimir Zolotov, Julian Dolby, Jie Chen, Mihir Choudhury, Lindsey Decker, Veronika Thost, Luca Buratti, Saurabh Pujar, Shyam Ramji, Ulrich Finkler, Susan Malaika, Frederick Reiss

    Abstract: Over the last several decades, software has been woven into the fabric of every aspect of our society. As software development surges and code infrastructure of enterprise applications ages, it is now more critical than ever to increase software development productivity and modernize legacy applications. Advances in deep learning and machine learning algorithms have enabled numerous breakthroughs,… ▽ More

    Submitted 29 August, 2021; v1 submitted 24 May, 2021; originally announced May 2021.

    Comments: 22 pages including references

  10. arXiv:2102.07995  [pdf, other

    cs.SE cs.AI cs.LG

    D2A: A Dataset Built for AI-Based Vulnerability Detection Methods Using Differential Analysis

    Authors: Yunhui Zheng, Saurabh Pujar, Burn Lewis, Luca Buratti, Edward Epstein, Bo Yang, Jim Laredo, Alessandro Morari, Zhong Su

    Abstract: Static analysis tools are widely used for vulnerability detection as they understand programs with complex behavior and millions of lines of code. Despite their popularity, static analysis tools are known to generate an excess of false positives. The recent ability of Machine Learning models to understand programming languages opens new possibilities when applied to static analysis. However, exist… ▽ More

    Submitted 16 February, 2021; originally announced February 2021.

    Comments: Accepted to the 43rd International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP '21)

  11. arXiv:2006.12641  [pdf, ps, other

    cs.CL cs.LG cs.PL

    Exploring Software Naturalness through Neural Language Models

    Authors: Luca Buratti, Saurabh Pujar, Mihaela Bornea, Scott McCarley, Yunhui Zheng, Gaetano Rossiello, Alessandro Morari, Jim Laredo, Veronika Thost, Yufan Zhuang, Giacomo Domeniconi

    Abstract: The Software Naturalness hypothesis argues that programming languages can be understood through the same techniques used in natural language processing. We explore this hypothesis through the use of a pre-trained transformer-based language model to perform code analysis tasks. Present approaches to code analysis depend heavily on features derived from the Abstract Syntax Tree (AST) while our trans… ▽ More

    Submitted 24 June, 2020; v1 submitted 22 June, 2020; originally announced June 2020.