
Showing 1–18 of 18 results for author: Barr, E T

  1. arXiv:2403.17218  [pdf, other]

    cs.SE cs.CR cs.LG

    A Comprehensive Study of the Capabilities of Large Language Models for Vulnerability Detection

    Authors: Benjamin Steenhoek, Md Mahbubur Rahman, Monoshi Kumar Roy, Mirza Sanjida Alam, Earl T. Barr, Wei Le

    Abstract: Large Language Models (LLMs) have demonstrated great potential for code generation and other software engineering tasks. Vulnerability detection is of crucial importance to maintaining the security, integrity, and trustworthiness of software systems. Precise vulnerability detection requires reasoning about the code, making it a good case study for exploring the limits of LLMs' reasoning capabiliti…

    Submitted 25 March, 2024; originally announced March 2024.

  2. arXiv:2308.08203  [pdf, other]

    cs.LG cs.SE

    Epicure: Distilling Sequence Model Predictions into Patterns

    Authors: Miltiadis Allamanis, Earl T. Barr

    Abstract: Most machine learning models predict a probability distribution over concrete outputs and struggle to accurately predict names over high entropy sequence distributions. Here, we explore finding abstract, high-precision patterns intrinsic to these predictions in order to make abstract predictions that usefully capture rare sequences. In this short paper, we present Epicure, a method that distils th…

    Submitted 16 August, 2023; originally announced August 2023.

  3. arXiv:2307.10896  [pdf, other]

    cs.SE

    Software Product Line Engineering via Software Transplantation

    Authors: Leandro O. Souza, Earl T. Barr, Justyna Petke, Eduardo S. Almeida, Paulo Anselmo M. S. Neto

    Abstract: For companies producing related products, a Software Product Line (SPL) is a software reuse method that improves time-to-market and software quality, achieving substantial cost reductions. These benefits do not come for free. It often takes years to re-architect and re-engineer a codebase to support SPL and, once adopted, it must be maintained. Current SPL practice relies on a collection of tools,…

    Submitted 20 July, 2023; originally announced July 2023.

  4. arXiv:2304.06815  [pdf, other]

    cs.SE cs.LG

    Automatic Semantic Augmentation of Language Model Prompts (for Code Summarization)

    Authors: Toufique Ahmed, Kunal Suresh Pai, Premkumar Devanbu, Earl T. Barr

    Abstract: Large Language Models (LLM) are a new class of computation engines, "programmed" via prompt engineering. We are still learning how to best "program" these LLMs to help developers. We start with the intuition that developers tend to consciously and unconsciously have a collection of semantic facts in mind when working on coding tasks. Mostly these are shallow, simple facts arising from a quick rea…

    Submitted 11 January, 2024; v1 submitted 13 April, 2023; originally announced April 2023.

    Comments: Accepted at International Conference on Software Engineering (ICSE-2024)

  5. arXiv:2204.07363  [pdf, ps, other]

    cs.CL cs.SE

    Is Surprisal in Issue Trackers Actionable?

    Authors: James Caddy, Markus Wagner, Christoph Treude, Earl T. Barr, Miltiadis Allamanis

    Abstract: Background. From information theory, surprisal is a measurement of how unexpected an event is. Statistical language models provide a probabilistic approximation of natural languages, and because surprisal is constructed with the probability of an event occurring, it is therefore possible to determine the surprisal associated with English sentences. The issues and pull requests of software repositor…

    Submitted 15 April, 2022; originally announced April 2022.

    Comments: 8 pages, 1 figure. Submitted to 2022 International Conference on Mining Software Repositories Registered Reports track

    ACM Class: H.3.3; I.2.7

  6. arXiv:2004.10657  [pdf, other]

    cs.PL cs.LG stat.ML

    Typilus: Neural Type Hints

    Authors: Miltiadis Allamanis, Earl T. Barr, Soline Ducousso, Zheng Gao

    Abstract: Type inference over partial contexts in dynamically typed languages is challenging. In this work, we present a graph neural network model that predicts types by probabilistically reasoning over a program's structure, names, and patterns. The network uses deep similarity learning to learn a TypeSpace -- a continuous relaxation of the discrete space of types -- and how to embed the type properties o…

    Submitted 6 April, 2020; originally announced April 2020.

    Comments: Accepted to PLDI 2020

  7. arXiv:2004.00348  [pdf, other]

    cs.PL cs.LG

    OptTyper: Probabilistic Type Inference by Optimising Logical and Natural Constraints

    Authors: Irene Vlassi Pandi, Earl T. Barr, Andrew D. Gordon, Charles Sutton

    Abstract: We present a new approach to the type inference problem for dynamic languages. Our goal is to combine logical constraints, that is, deterministic information from a type system, with natural constraints, that is, uncertain statistical information about types learnt from sources like identifier names. To this end, we introduce a framework for probabilistic type inference that combines…

    Submitted 26 March, 2021; v1 submitted 1 April, 2020; originally announced April 2020.

    Comments: 29 pages, 5 figures, 2 tables

  8. arXiv:1905.12734  [pdf, other]

    cs.PL

    Sub-Turing Islands in the Wild

    Authors: Earl T. Barr, David W. Binkley, Mark Harman, Mohamed Nassim Seghir

    Abstract: Recently, there has been growing debate as to whether or not static analysis can be truly sound. In spite of this concern, research on techniques seeking to at least partially answer undecidable questions has a long history. However, little attention has been given to the more empirical question of how often an exact solution might be given to a question despite the question being, at least in the…

    Submitted 29 May, 2019; originally announced May 2019.

  9. arXiv:1905.10201  [pdf, other]

    cs.LG stat.ML

    Model Validation Using Mutated Training Labels: An Exploratory Study

    Authors: Jie M. Zhang, Mark Harman, Benjamin Guedj, Earl T. Barr, John Shawe-Taylor

    Abstract: We introduce an exploratory study on Mutation Validation (MV), a model validation method using mutated training labels for supervised learning. MV mutates training data labels, retrains the model against the mutated data, then uses the metamorphic relation that captures the consequent training performance changes to assess model fit. It does not use a validation set or test set. The intuition unde…

    Submitted 20 October, 2021; v1 submitted 24 May, 2019; originally announced May 2019.

  10. arXiv:1904.11254  [pdf, ps, other]

    cs.PL

    SafeStrings: Representing Strings as Structured Data

    Authors: David Kelly, Mark Marron, David Clark, Earl T. Barr

    Abstract: Strings are ubiquitous in code. Not all strings are created equal; some contain structure that makes them incompatible with other strings. CSS units are an obvious example. Worse, type checkers cannot see this structure: this is the latent structure problem. We introduce SafeStrings to solve this problem and expose latent structure in strings. Once visible, operations can leverage this structure t…

    Submitted 25 April, 2019; originally announced April 2019.

    Comments: 25 pages

  11. arXiv:1806.10235  [pdf, other]

    cs.SE

    Indexing Operators to Extend the Reach of Symbolic Execution

    Authors: Earl T. Barr, David Clark, Mark Harman, Alexandru Marginean

    Abstract: Traditional program analysis analyses a program language, that is, all programs that can be written in the language. There is a difference, however, between all possible programs that can be written and the corpus of actual programs written in a language. We seek to exploit this difference: for a given program, we apply a bespoke program transformation Indexify to convert expressions that current…

    Submitted 26 June, 2018; originally announced June 2018.

  12. arXiv:1806.04616  [pdf, ps, other]

    cs.SE cs.CL

    Deep Learning to Detect Redundant Method Comments

    Authors: Annie Louis, Santanu Kumar Dash, Earl T. Barr, Charles Sutton

    Abstract: Comments in software are critical for maintenance and reuse. But apart from prescriptive advice, there is little practical support or quantitative understanding of what makes a comment useful. In this paper, we introduce the task of identifying comments which are uninformative about the code they are meant to document. To address this problem, we introduce the notion of comment entailment from cod…

    Submitted 12 June, 2018; originally announced June 2018.

    Comments: 12 pages

  13. arXiv:1709.06182  [pdf, ps, other]

    cs.SE cs.LG cs.PL

    A Survey of Machine Learning for Big Code and Naturalness

    Authors: Miltiadis Allamanis, Earl T. Barr, Premkumar Devanbu, Charles Sutton

    Abstract: Research at the intersection of machine learning, programming languages, and software engineering has recently taken important steps in proposing learnable probabilistic models of source code that exploit code's abundance of patterns. In this article, we survey this work. We contrast programming languages against natural languages and discuss how these similarities and differences drive the design…

    Submitted 4 May, 2018; v1 submitted 18 September, 2017; originally announced September 2017.

    Comments: Website accompanying this survey paper can be found at https://ml4code.github.io

  14. arXiv:1611.02516  [pdf, other]

    cs.SE

    Tailored Mutants Fit Bugs Better

    Authors: Miltiadis Allamanis, Earl T. Barr, René Just, Charles Sutton

    Abstract: Mutation analysis measures test suite adequacy, the degree to which a test suite detects seeded faults: one test suite is better than another if it detects more mutants. Mutation analysis effectiveness rests on the assumption that mutants are coupled with real faults, i.e., mutant detection is strongly correlated with real fault detection. The work that validated this also showed that a large portio…

    Submitted 8 November, 2016; originally announced November 2016.

  15. arXiv:1502.07661  [pdf, other]

    cs.CR cs.CC

    Detecting Malware with Information Complexity

    Authors: Nadia Alshahwan, Earl T. Barr, David Clark, George Danezis

    Abstract: This work focuses on a specific front of the malware detection arms-race, namely the detection of persistent, disk-resident malware. We exploit normalised compression distance (NCD), an information theoretic measure, applied directly to binaries. Given a zoo of labelled malware and benign-ware, we ask whether a suspect program is more similar to our malware or to our benign-ware. Our approach clas…

    Submitted 26 February, 2015; originally announced February 2015.

  16. Casper: Debugging Null Dereferences with Dynamic Causality Traces

    Authors: Benoit Cornu, Earl T. Barr, Lionel Seinturier, Martin Monperrus

    Abstract: Fixing a software error requires understanding its root cause. In this paper, we introduce "causality traces", crafted execution traces augmented with the information needed to reconstruct the causal chain from the root cause of a bug to an execution error. We propose an approach and a tool, called Casper, for dynamically constructing causality traces for null dereference errors. The core idea o…

    Submitted 20 November, 2015; v1 submitted 6 February, 2015; originally announced February 2015.

    Journal ref: Journal of Systems and Software, 2016

  17. arXiv:1502.01410  [pdf, other]

    cs.SE

    On the Lexical Distinguishability of Source Code

    Authors: Martin Velez, Dong Qiu, You Zhou, Earl T. Barr, Zhendong Su

    Abstract: Natural language is robust against noise. The meaning of many sentences survives the loss of words, sometimes many of them. Some words in a sentence, however, cannot be lost without changing the meaning of the sentence. We call these words "wheat" and the rest "chaff". The word "not" in the sentence "I do not like rain" is wheat and "do" is chaff. For human understanding of the purpose and behavio…

    Submitted 27 June, 2018; v1 submitted 4 February, 2015; originally announced February 2015.

    Comments: 14 pages, 10 figures, Under Submission

  18. Learning Natural Coding Conventions

    Authors: Miltiadis Allamanis, Earl T. Barr, Christian Bird, Charles Sutton

    Abstract: Every programmer has a characteristic style, ranging from preferences about identifier naming to preferences about object relationships and design patterns. Coding conventions define a consistent syntactic style, fostering readability and hence maintainability. When collaborating, programmers strive to obey a project's coding conventions. However, one third of reviews of changes contain feedback a…

    Submitted 7 April, 2014; v1 submitted 17 February, 2014; originally announced February 2014.