Showing 1–18 of 18 results for author: Berger, E D

  1. arXiv:2403.16354

    cs.SE cs.AI cs.LG cs.PL

    ChatDBG: An AI-Powered Debugging Assistant

    Authors: Kyla Levin, Nicolas van Kempen, Emery D. Berger, Stephen N. Freund

    Abstract: This paper presents ChatDBG, the first AI-powered debugging assistant. ChatDBG integrates large language models (LLMs) to significantly enhance the capabilities and user-friendliness of conventional debuggers. ChatDBG lets programmers engage in a collaborative dialogue with the debugger, allowing them to pose complex questions about program state, perform root cause analysis for crashes or asserti…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: 11 pages

  2. arXiv:2403.16218

    cs.SE cs.AI cs.LG cs.PL

    CoverUp: Coverage-Guided LLM-Based Test Generation

    Authors: Juan Altmayer Pizzorno, Emery D. Berger

    Abstract: This paper presents CoverUp, a novel system that drives the generation of high-coverage Python regression tests via a combination of coverage analysis and large-language models (LLMs). CoverUp iteratively improves coverage, interleaving coverage analysis with dialogs with the LLM to focus its attention on as yet uncovered lines and branches. The resulting test suites significantly improve coverage…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: 11 pages

  3. SlipCover: Near Zero-Overhead Code Coverage for Python

    Authors: Juan Altmayer Pizzorno, Emery D Berger

    Abstract: Coverage analysis is widely used but can suffer from high overhead. This overhead is especially acute in the context of Python, which is already notoriously slow (a recent study observes a roughly 30x slowdown vs. native code). We find that the state-of-the-art coverage tool for Python, coverage.py, introduces a median overhead of 180% with the standard Python interpreter. Slowdowns are even mor…

    Submitted 31 May, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

    Comments: Accepted to ISSTA 2023

    ACM Class: D.2.5

  4. arXiv:2212.07597

    cs.PL cs.PF

    Triangulating Python Performance Issues with Scalene

    Authors: Emery D. Berger, Sam Stern, Juan Altmayer Pizzorno

    Abstract: This paper proposes Scalene, a profiler specialized for Python. Scalene combines a suite of innovations to precisely and simultaneously profile CPU, memory, and GPU usage, all with low overhead. Scalene's CPU and memory profilers help Python programmers direct their optimization efforts by distinguishing between inefficient Python and efficient native execution time and memory usage. Scalene's mem…

    Submitted 14 December, 2022; originally announced December 2022.

    Report number: Accepted, to appear at OSDI 2023

  5. arXiv:2010.01700

    cs.CR cs.CY cs.NE cs.PL

    Mossad: Defeating Software Plagiarism Detection

    Authors: Breanna Devore-McDonald, Emery D. Berger

    Abstract: Automatic software plagiarism detection tools are widely used in educational settings to ensure that submitted work was not copied. These tools have grown in use together with the rise in enrollments in computer science programs and the widespread availability of code on-line. Educators rely on the robustness of plagiarism detection tools; the working assumption is that the effort required to evad…

    Submitted 4 October, 2020; originally announced October 2020.

    Comments: 30 pages. To appear, OOPSLA 2020

  6. arXiv:2006.03879

    cs.PL cs.SE

    Scalene: Scripting-Language Aware Profiling for Python

    Authors: Emery D. Berger

    Abstract: Existing profilers for scripting languages (a.k.a. "glue" languages) like Python suffer from numerous problems that drastically limit their usefulness. They impose order-of-magnitude overheads, report information at too coarse a granularity, or fail in the face of threads. Worse, past profilers---essentially variants of their counterparts for C---are oblivious to the fact that optimizing code in s…

    Submitted 25 July, 2020; v1 submitted 6 June, 2020; originally announced June 2020.

  7. arXiv:1911.11894

    cs.SE cs.PL

    FSE/CACM Rebuttal$^2$: Correcting A Large-Scale Study of Programming Languages and Code Quality in GitHub

    Authors: Emery D. Berger, Petr Maj, Olga Vitek, Jan Vitek

    Abstract: Ray, Devanbu and Filkov issued a rebuttal of our TOPLAS paper "On the Impact of Programming Languages on Code Quality: A Reproduction Study". Our paper reproduced "A Large-Scale Study of Programming Languages and Code Quality in GitHub", which appeared at FSE 2014 and was subsequently republished as a CACM research highlight in 2017. This article is a rebuttal to that rebuttal.

    Submitted 26 November, 2019; originally announced November 2019.

  8. PlanAlyzer: Assessing Threats to the Validity of Online Experiments

    Authors: Emma Tosch, Eytan Bakshy, Emery D. Berger, David D. Jensen, J. Eliot B. Moss

    Abstract: Online experiments are ubiquitous. As the scale of experiments has grown, so has the complexity of their design and implementation. In response, firms have developed software frameworks for designing and deploying online experiments. Ensuring that experiments in these frameworks are correctly designed and that their results are trustworthy---referred to as *internal validity*---can be difficult. C…

    Submitted 30 September, 2019; originally announced September 2019.

    Comments: 30 pages, hella long

    Journal ref: OOPSLA 2019

  9. arXiv:1904.05387

    cs.PL cs.HC cs.MS

    Tea: A High-level Language and Runtime System for Automating Statistical Analysis

    Authors: Eunice Jun, Maureen Daum, Jared Roesch, Sarah E. Chasins, Emery D. Berger, Rene Just, Katharina Reinecke

    Abstract: Though statistical analyses are centered on research questions and hypotheses, current statistical analysis tools are not. Users must first translate their hypotheses into specific statistical tests and then perform API calls with functions and parameters. To do so accurately requires that users have statistical expertise. To lower this barrier to valid, replicable statistical analysis, we introdu…

    Submitted 10 April, 2019; originally announced April 2019.

    Comments: 11 pages

  10. arXiv:1902.04738

    cs.PL cs.DS cs.PF

    Mesh: Compacting Memory Management for C/C++ Applications

    Authors: Bobby Powers, David Tench, Emery D. Berger, Andrew McGregor

    Abstract: Programs written in C/C++ can suffer from serious memory fragmentation, leading to low utilization of memory, degraded performance, and application failure due to memory exhaustion. This paper introduces Mesh, a plug-in replacement for malloc that, for the first time, eliminates fragmentation in unmodified C/C++ applications. Mesh combines novel randomized algorithms with widely-supported virtual…

    Submitted 16 February, 2019; v1 submitted 12 February, 2019; originally announced February 2019.

    Comments: Draft version, accepted at PLDI 2019

  11. arXiv:1901.11100

    cs.PL cs.SE

    ExceLint: Automatically Finding Spreadsheet Formula Errors

    Authors: Daniel W. Barowy, Emery D. Berger, Benjamin Zorn

    Abstract: Spreadsheets are one of the most widely used programming environments, and are widely deployed in domains like finance where errors can have catastrophic consequences. We present a static analysis specifically designed to find spreadsheet formula errors. Our analysis directly leverages the rectangular character of spreadsheets. It uses an information-theoretic approach to identify formulas that ar…

    Submitted 30 January, 2019; originally announced January 2019.

    Comments: Appeared at OOPSLA 2018

    Journal ref: Proceedings of the ACM on Programming Languages, Volume 2 Issue OOPSLA, November 2018

  12. On the Impact of Programming Languages on Code Quality

    Authors: Emery D. Berger, Celeste Hollenbeck, Petr Maj, Olga Vitek, Jan Vitek

    Abstract: This paper is a reproduction of work by Ray et al. which claimed to have uncovered a statistically significant association between eleven programming languages and software defects in projects hosted on GitHub. First we conduct an experimental repetition; the repetition is only partially successful, but it does validate one of the key claims of the original work about the association of ten programmin…

    Submitted 24 April, 2019; v1 submitted 29 January, 2019; originally announced January 2019.

    Comments: Accepted, to appear in TOPLAS (ACM Transactions on Programming Languages and Systems)

  13. arXiv:1810.11865

    cs.PL

    McFly: Time-Travel Debugging for the Web

    Authors: John Vilk, Emery D. Berger, James Mickens, Mark Marron

    Abstract: Time-traveling debuggers offer the promise of simplifying debugging by letting developers freely step forwards and backwards through a program's execution. However, web applications present multiple challenges that make time-travel debugging especially difficult. A time-traveling debugger for web applications must accurately reproduce all network interactions, asynchronous events, and visual state…

    Submitted 28 October, 2018; originally announced October 2018.

  14. Browsix: Bridging the Gap Between Unix and the Browser

    Authors: Bobby Powers, John Vilk, Emery D. Berger

    Abstract: Applications written to run on conventional operating systems typically depend on OS abstractions like processes, pipes, signals, sockets, and a shared file system. Porting these applications to the web currently requires extensive rewriting or hosting significant portions of code server-side because browsers present a nontraditional runtime environment that lacks OS functionality. This paper pr…

    Submitted 29 April, 2019; v1 submitted 23 November, 2016; originally announced November 2016.

    Comments: Final version published at https://dl.acm.org/citation.cfm?doid=3037697.3037727

    Journal ref: ASPLOS 2017

  15. Prioritized Garbage Collection: Explicit GC Support for Software Caches

    Authors: Diogenes Nunez, Samuel Z. Guyer, Emery D. Berger

    Abstract: Programmers routinely trade space for time to increase performance, often in the form of caching or memoization. In managed languages like Java or JavaScript, however, this space-time tradeoff is complex. Using more space translates into higher garbage collection costs, especially at the limit of available memory. Existing runtime systems provide limited support for space-sensitive algorithms, for…

    Submitted 15 October, 2016; originally announced October 2016.

    Comments: to appear in OOPSLA 2016

    ACM Class: D.3.4

  16. Coz: Finding Code that Counts with Causal Profiling

    Authors: Charlie Curtsinger, Emery D. Berger

    Abstract: Improving performance is a central concern for software developers. To locate optimization opportunities, developers rely on software profilers. However, these profilers only report where programs spent their time: optimizing that code may have no impact on performance. Past profilers thus both waste developer time and make it difficult for them to uncover significant optimization opportunities.…

    Submitted 12 August, 2016; originally announced August 2016.

    Comments: Published at SOSP 2015 (Best Paper Award)

    ACM Class: D.4.8; C.4

    Journal ref: Proceedings of the 25th Symposium on Operating Systems Principles (SOSP '15), 2015, 184-197

  17. arXiv:1601.07962

    cs.SE

    DoubleTake: Fast and Precise Error Detection via Evidence-Based Dynamic Analysis

    Authors: Tongping Liu, Charlie Curtsinger, Emery D. Berger

    Abstract: This paper presents evidence-based dynamic analysis, an approach that enables lightweight analyses--under 5% overhead for these bugs--making it practical for the first time to perform these analyses in deployed settings. The key insight of evidence-based dynamic analysis is that for a class of errors, it is possible to ensure that evidence that they happened at some point in the past remains for l…

    Submitted 28 January, 2016; originally announced January 2016.

    Comments: Pre-print, accepted to appear at ICSE 2016

    ACM Class: D.2.5; D.2.4; D.3.4

  18. arXiv:1406.5572

    cs.PL cs.HC

    SurveyMan: Programming and Automatically Debugging Surveys

    Authors: Emma Tosch, Emery D. Berger

    Abstract: Surveys can be viewed as programs, complete with logic, control flow, and bugs. Word choice or the order in which questions are asked can unintentionally bias responses. Vague, confusing, or intrusive questions can cause respondents to abandon a survey. Surveys can also have runtime errors: inattentive respondents can taint results. This effect is especially problematic when deploying surveys in u…

    Submitted 20 June, 2014; originally announced June 2014.

    Comments: Submitted version; accepted to OOPSLA 2014

    ACM Class: D.3.2; J.4; J.5