
Learning to Learn to Predict Performance Regressions in Production at Meta

Authors: Moritz Beller, Hongyu Li, Vivek Nair, Vijayaraghavan Murali, Imad Ahmad, Jürgen Cito, Drew Carlson, Ari Aye, Wes Dyer

Abstract: Catching and attributing code change-induced performance regressions in production is hard; predicting them beforehand, even harder. A primer on automatically learning to predict performance regressions in software, this article gives an account of the experiences we gained when researching and deploying an ML-based regression prediction pipeline at Meta. In this paper, we report on a comparative study with four ML models of increasing complexity, from (1) code-opaque, over (2) Bag of Words, (3) off-the-shelve Transformer-based, to (4) a bespoke Transformer-based model, coined SuperPerforator. Our investigation shows the inherent difficulty of the performance prediction problem, which is characterized by a large imbalance of benign onto regressing changes. Our results also call into question the general applicability of Transformer-based architectures for performance prediction: an off-the-shelve CodeBERT-based approach had surprisingly poor performance; our highly customized SuperPerforator architecture initially achieved prediction performance that was just on par with simpler Bag of Words models, and only outperformed them for down-stream use cases. This ability of SuperPerforator to transfer to an application with few learning examples afforded an opportunity to deploy it in practice at Meta: it can act as a pre-filter to sort out changes that are unlikely to introduce a regression, truncating the space of changes to search a regression in by up to 43%, a 45x improvement over a random baseline.
To gain further insight into SuperPerforator, we explored it via a series of experiments computing counterfactual explanations. These highlight which parts of a code change the model deems important, thereby validating the learned black-box model.

Submitted 22 May, 2023; v1 submitted 8 August, 2022; originally announced August 2022.

arXiv:2101.09563

doi 10.1007/s10664-021-10071-9

Präzi: From Package-based to Call-based Dependency Networks

Authors: Joseph Hejderup, Moritz Beller, Konstantinos Triantafyllou, Georgios Gousios

Abstract: Modern programming languages such as Java, JavaScript, and Rust encourage software reuse by hosting diverse and fast-growing repositories of highly interdependent packages (i.e., reusable libraries) for their users. The standard way to study the interdependence between software packages is to infer a package dependency network by parsing manifest data. Such networks help answer questions such as "How many packages have dependencies to packages with known security issues?" or "What are the most used packages?". However, an overlooked aspect in existing studies is that manifest-inferred relationships do not necessarily examine the actual usage of these dependencies in source code. To better model dependencies between packages, we developed Präzi, an approach combining manifests and call graphs of packages. Präzi constructs a dependency network at the more fine-grained function-level, instead of at the manifest level. This paper discusses a prototypical Präzi implementation for the popular system programming language Rust. We use Präzi to characterize Rust's package repository, Cratesio, at the function level and perform a comparative study with metadata-based networks. Our results show that metadata-based networks generalize how packages use their dependencies. Using Präzi, we find packages call only 40% of their resolved dependencies, and that manual analysis of 34 cases reveals that not all packages use a dependency the same way.
We argue that researchers and practitioners interested in understanding how developers or programs use dependencies should account for its context -- not the sum of all resolved dependencies.

Submitted 20 October, 2021; v1 submitted 23 January, 2021; originally announced January 2021.

Comments: 42 pages, 14 figures, journal

arXiv:2012.07428

Mind the Gap: On the Relationship Between Automatically Measured and Self-Reported Productivity

Authors: Moritz Beller, Vince Orgovan, Spencer Buja, Thomas Zimmermann

Abstract: To improve software developers' productivity has been the holy grail of software engineering research. But before we can claim to have improved it, we must first be able to measure productivity. This is far from trivial. In fact, two separate research lines on software engineers' productivity have co-existed almost in complete isolation for a long time: automated product and process measures on the one hand and self-reported or perceived productivity on the other hand. In this article, we bridge the gap between the two with an empirical study of 81 software developers at Microsoft.

Submitted 14 December, 2020; originally announced December 2020.

arXiv:2010.13464

What It Would Take to Use Mutation Testing in Industry--A Study at Facebook

Authors: Moritz Beller, Chu-Pan Wong, Johannes Bader, Andrew Scott, Mateusz Machalica, Satish Chandra, Erik Meijer

Abstract: Traditionally, mutation testing generates an abundance of small deviations of a program, called mutants. At industrial systems the scale and size of Facebook's, doing this is infeasible. We should not create mutants that the test suite would likely fail on or that give no actionable signal to developers. To tackle this problem, in this paper, we semi-automatically learn error-inducing patterns from a corpus of common Java coding errors and from changes that caused operational anomalies at Facebook specifically. We combine the mutations with instrumentation that measures which tests exactly visited the mutated piece of code. Results on more than 15,000 generated mutants show that more than half of the generated mutants survive Facebook's rigorous test suite of unit, integration, and system tests. Moreover, in a case study with 26 developers, all but two found information of automatically detected test holes interesting in principle. As such, almost half of the 26 would actually act on the mutant presented to them by adapting an existing or creating a new test. The others did not for a variety of reasons often outside the scope of mutation testing. It remains a practical challenge how we can include such external information to increase the true actionability rate on mutants.

Submitted 27 January, 2021; v1 submitted 26 October, 2020; originally announced October 2020.

arXiv:1911.01254

Site-selective and real-time observation of bimolecular electron transfer during photocatalytic water splitting

Authors: Alexander Britz, Sergey I. Bokarev, Tadesse A. Assefa, Éva G. Bajnóczi, Zoltán Németh, György Vankó, Nils Rockstroh, Henrik Junge, Matthias Beller, Gilles Doumy, Anne Marie March, Stephen H. Southworth, Stefan Lochbrunner, Oliver Kühn, Christian Bressler, Wojciech Gawelda

Abstract: Time-resolved X-ray absorption spectroscopy has been utilized to monitor the bimolecular electron transfer in a photocatalytic water splitting system for the first time. This has been possible by uniting the local probe and element specific character of X-ray transitions with insights from high-level ab initio calculations. The specific target has been a heteroleptic [Ir$^{\rm III}$(ppy)$_2$(bpy)]$^+$ photosensitizer, in combination with triethylamine as a sacrificial reductant and Fe$_3$(CO)$_{12}$ as a water reduction catalyst. The relevant molecular transitions have been characterized via high-resolution Ir L-edge X-ray absorption spectroscopy on the picosecond time scale. The present findings enhance our understanding of functionally relevant bimolecular electron transfer reactions and thus will pave the road to rational optimization of photocatalytic performance.

Submitted 9 October, 2020; v1 submitted 4 November, 2019; originally announced November 2019.

Comments: revised version

Showing 1–5 of 5 results for author: Beller, M