
Showing 1–18 of 18 results for author: Vasilescu, B

  1. arXiv:2407.02733  [pdf, other]

    cs.CR

    STRIDE: Simple Type Recognition In Decompiled Executables

    Authors: Harrison Green, Edward J. Schwartz, Claire Le Goues, Bogdan Vasilescu

    Abstract: Decompilers are widely used by security researchers and developers to reverse engineer executable code. While modern decompilers are adept at recovering instructions, control flow, and function boundaries, some useful information from the original source code, such as variable types and names, is lost during the compilation process. Our work aims to predict these variable types and names from the…

    Submitted 2 July, 2024; originally announced July 2024.

  2. arXiv:2406.01966  [pdf, ps, other]

    cs.SE

    Creativity, Generative AI, and Software Development: A Research Agenda

    Authors: Victoria Jackson, Bogdan Vasilescu, Daniel Russo, Paul Ralph, Maliheh Izadi, Rafael Prikladnicki, Sarah D'Angelo, Sarah Inman, Anielle Lisboa, Andre van der Hoek

    Abstract: Creativity has always been considered a major differentiator to separate the good from the great, and we believe the importance of creativity for software development will only increase as GenAI becomes embedded in developer tool-chains and working practices. This paper uses the McLuhan tetrad alongside scenarios of how GenAI may disrupt software development more broadly, to identify potential imp…

    Submitted 4 June, 2024; originally announced June 2024.

  3. arXiv:2310.10817  [pdf, other]

    cs.SE cs.HC

    Understanding Documentation Use Through Log Analysis: An Exploratory Case Study of Four Cloud Services

    Authors: Daye Nam, Andrew Macvean, Brad Myers, Bogdan Vasilescu

    Abstract: Almost no modern software system is written from scratch, and developers are required to effectively learn to use third-party libraries or software services. Thus, many practitioners and researchers have looked for ways to create effective documentation that supports developers' learning. However, few efforts have focused on how people actually use the documentation. In this paper, we report on an…

    Submitted 29 February, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

  4. arXiv:2307.08177  [pdf, other]

    cs.SE cs.AI cs.HC

    Using an LLM to Help With Code Understanding

    Authors: Daye Nam, Andrew Macvean, Vincent Hellendoorn, Bogdan Vasilescu, Brad Myers

    Abstract: Understanding code is challenging, especially when working in new and complex development environments. Code comments and documentation can help, but are typically scarce or hard to navigate. Large language models (LLMs) are revolutionizing the process of writing code. Can they do the same for helping understand it? In this study, we provide a first investigation of an LLM-based conversational UI…

    Submitted 16 January, 2024; v1 submitted 16 July, 2023; originally announced July 2023.

  5. arXiv:2112.02650  [pdf, other]

    cs.SE cs.CL cs.LG cs.PL

    VarCLR: Variable Semantic Representation Pre-training via Contrastive Learning

    Authors: Qibin Chen, Jeremy Lacomis, Edward J. Schwartz, Graham Neubig, Bogdan Vasilescu, Claire Le Goues

    Abstract: Variable names are critical for conveying intended program behavior. Machine learning-based program analysis methods use variable name representations for a wide range of tasks, such as suggesting new variable names and bug detection. Ideally, such methods could capture semantic relationships between names beyond syntactic similarity, e.g., the fact that the names average and mean are similar. Unf…

    Submitted 5 December, 2021; originally announced December 2021.

    Comments: Accepted by ICSE 2022

  6. arXiv:2108.06363  [pdf, other]

    cs.SE cs.PL

    Augmenting Decompiler Output with Learned Variable Names and Types

    Authors: Qibin Chen, Jeremy Lacomis, Edward J. Schwartz, Claire Le Goues, Graham Neubig, Bogdan Vasilescu

    Abstract: A common tool used by security professionals for reverse-engineering binaries found in the wild is the decompiler. A decompiler attempts to reverse compilation, transforming a binary to a higher-level language such as C. High-level languages ease reasoning about programs by providing useful abstractions such as loops, typed variables, and comments, but these abstractions are lost during compilatio…

    Submitted 13 August, 2021; originally announced August 2021.

    Comments: 17 pages to be published in USENIX Security '22

  7. arXiv:2101.11149  [pdf, other]

    cs.SE

    In-IDE Code Generation from Natural Language: Promise and Challenges

    Authors: Frank F. Xu, Bogdan Vasilescu, Graham Neubig

    Abstract: A great part of software development involves conceptualizing or communicating the underlying procedures and logic that needs to be expressed in programs. One major difficulty of programming is turning concept into code, especially when dealing with the APIs of unfamiliar libraries. Recently, there has been a proliferation of machine learning methods for code generation and retrieval from natural…

    Submitted 22 September, 2021; v1 submitted 26 January, 2021; originally announced January 2021.

    Comments: 47 pages, accepted to ACM Transactions on Software Engineering and Methodology

  8. arXiv:2006.12636  [pdf, ps, other]

    cs.SE

    Multitasking Across Industry Projects: A Replication Study

    Authors: Karina Kohl, Bogdan Vasilescu, Rafael Prikladnicki

    Abstract: Background: Multitasking is usual in software development. It is the ability to stop working on a task, switch to another, and return eventually to the first one, as needed or as scheduled. Multitasking, however, comes at a cognitive cost: frequent context-switches can lead to distraction, sub-standard work, and even greater stress. Aims: This paper reports a replication experiment where we gather…

    Submitted 22 June, 2020; originally announced June 2020.

  9. arXiv:2004.09015  [pdf, other]

    cs.CL

    Incorporating External Knowledge through Pre-training for Natural Language to Code Generation

    Authors: Frank F. Xu, Zhengbao Jiang, Pengcheng Yin, Bogdan Vasilescu, Graham Neubig

    Abstract: Open-domain code generation aims to generate code in a general-purpose programming language (such as Python) from natural language (NL) intents. Motivated by the intuition that developers usually retrieve resources on the web when writing code, we explore the effectiveness of incorporating two varieties of external knowledge into NL-to-code generation: automatically mined NL-code pairs from the on…

    Submitted 19 April, 2020; originally announced April 2020.

    Comments: Accepted by ACL 2020

  10. An Exploratory Study of Bot Commits

    Authors: Tapajit Dey, Bogdan Vasilescu, Audris Mockus

    Abstract: Background: Bots help automate many of the tasks performed by software developers and are widely used to commit code in various social coding platforms. At present, it is not clear what types of activities these bots perform and understanding it may help design better bots, and find application areas which might benefit from bot adoption. Aim: We aim to categorize the Bot Commits by the type of ch…

    Submitted 27 March, 2020; v1 submitted 17 March, 2020; originally announced March 2020.

  11. arXiv:2003.03172  [pdf, other]

    cs.SE cs.CR cs.LG cs.SI stat.ML

    Detecting and Characterizing Bots that Commit Code

    Authors: Tapajit Dey, Sara Mousavi, Eduardo Ponce, Tanner Fry, Bogdan Vasilescu, Anna Filippova, Audris Mockus

    Abstract: Background: Some developer activity traditionally performed manually, such as making code commits, opening, managing, or closing issues is increasingly subject to automation in many OSS projects. Specifically, such activity is often performed by tools that react to events or run at specific times. We refer to such automation tools as bots and, in many software mining scenarios related to developer…

    Submitted 27 March, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

    Comments: Preprint of the paper accepted in MSR, 2020 conference

  12. arXiv:1909.09029  [pdf, other]

    cs.SE

    DIRE: A Neural Approach to Decompiled Identifier Naming

    Authors: Jeremy Lacomis, Pengcheng Yin, Edward J. Schwartz, Miltiadis Allamanis, Claire Le Goues, Graham Neubig, Bogdan Vasilescu

    Abstract: The decompiler is one of the most common tools for examining binaries without corresponding source code. It transforms binaries into high-level code, reversing the compilation process. Decompilers can reconstruct much of the information that is lost during the compilation process (e.g., structure and type information). Unfortunately, they do not reconstruct semantically meaningful variable names,…

    Submitted 3 October, 2019; v1 submitted 19 September, 2019; originally announced September 2019.

    Comments: 2019 International Conference on Automated Software Engineering

  13. A large-scale, in-depth analysis of developers' personalities in the Apache ecosystem

    Authors: Fabio Calefato, Filippo Lanubile, Bogdan Vasilescu

    Abstract: Context: Large-scale distributed projects are typically the results of collective efforts performed by multiple developers with heterogeneous personalities. Objective: We aim to find evidence that personalities can explain developers' behavior in large scale-distributed projects. For example, the propensity to trust others - a critical factor for the success of global software engineering - has be…

    Submitted 13 April, 2022; v1 submitted 30 May, 2019; originally announced May 2019.

    Comments: pp. 1-20

    Journal ref: Information and Software Technology, Vol. 114, 2019

  14. arXiv:1903.06725  [pdf, other]

    cs.SE

    BugSwarm: Mining and Continuously Growing a Dataset of Reproducible Failures and Fixes

    Authors: David A. Tomassi, Naji Dmeiri, Yichen Wang, Antara Bhowmick, Yen-Chuan Liu, Premkumar Devanbu, Bogdan Vasilescu, Cindy Rubio-González

    Abstract: Fault-detection, localization, and repair methods are vital to software quality; but it is difficult to evaluate their generality, applicability, and current effectiveness. Large, diverse, realistic datasets of durably-reproducible faults and fixes are vital to good experimental evaluation of approaches to software quality, but they are difficult and expensive to assemble and keep current. Modern…

    Submitted 22 July, 2019; v1 submitted 15 March, 2019; originally announced March 2019.

    Comments: In Proceedings of the 41st ACM/IEEE International Conference on Software Engineering (ICSE'19)

  15. arXiv:1805.08949  [pdf, other]

    cs.CL cs.SE

    Learning to Mine Aligned Code and Natural Language Pairs from Stack Overflow

    Authors: Pengcheng Yin, Bowen Deng, Edgar Chen, Bogdan Vasilescu, Graham Neubig

    Abstract: For tasks like code synthesis from natural language, code retrieval, and code summarization, data-driven models have shown great promise. However, creating these models requires parallel data between natural language (NL) and code with fine-grained alignments. Stack Overflow (SO) is a promising source to create such a data set: the questions are diverse and most of them have corresponding answers w…

    Submitted 22 May, 2018; originally announced May 2018.

    Comments: MSR '18

  16. On Developers' Personality in Large-scale Distributed Projects: The Case of the Apache Ecosystem

    Authors: Fabio Calefato, Giuseppe Iaffaldano, Filippo Lanubile, Bogdan Vasilescu

    Abstract: Large-scale distributed projects are typically the results of collective efforts performed by multiple developers, each one having a different personality. The study of developers' personalities has the potential of explaining their behavior in various contexts. For example, the propensity to trust others - a critical factor to the success of global software engineering - has been found to influen…

    Submitted 24 September, 2018; v1 submitted 3 March, 2018; originally announced March 2018.

    Comments: In Proc. Int'l Conf. on Global Software Engineering (ICGSE'18), Gothenburg, Sweden, May 28-29, 2018

    Journal ref: In Proc. Int'l Conf. on Global Software Engineering (ICGSE'18), Gothenburg, Sweden, May 28-29, 2018

  17. arXiv:1606.00521  [pdf, other]

    cs.SE

    Initial and Eventual Software Quality Relating to Continuous Integration in GitHub

    Authors: Yue Yu, Bogdan Vasilescu, Huaimin Wang, Vladimir Filkov, Premkumar Devanbu

    Abstract: The constant demand for new features and bug fixes is forcing software projects to shorten cycles and deliver updates ever faster, while sustaining software quality. The availability of inexpensive, virtualized cloud computing has helped shorten schedules, by enabling continuous integration (CI) on demand. Platforms like GitHub support CI in-the-cloud. In projects using CI, a user submitting a p…

    Submitted 1 June, 2016; originally announced June 2016.

  18. Continuous integration in a social-coding world: Empirical evidence from GitHub. **Updated version with corrections**

    Authors: Bogdan Vasilescu, Stef van Schuylenburg, Jules Wulms, Alexander Serebrenik, Mark G. J. van den Brand

    Abstract: Continuous integration is a software engineering practice of frequently merging all developer working copies with a shared main branch, e.g., several times a day. With the advent of GitHub, a platform well known for its "social coding" features that aid collaboration and sharing, and currently the largest code host in the open source world, collaborative software development has never been more pr…

    Submitted 6 December, 2015; originally announced December 2015.

    Comments: This is an updated and corrected version of our ICSME 2014 paper: http://dx.doi.org/10.1109/ICSME.2014.62