Showing 1–29 of 29 results for author: Harman, M

Searching in archive cs.

  1. arXiv:2403.15374  [pdf, other]

    cs.SE

    Enhancing Testing at Meta with Rich-State Simulated Populations

    Authors: Nadia Alshahwan, Arianna Blasi, Kinga Bojarczuk, Andrea Ciancone, Natalija Gucevska, Mark Harman, Simon Schellaert, Inna Harper, Yue Jia, Michał Królikowski, Will Lewis, Dragos Martac, Rubmary Rojas, Kate Ustiuzhanina

    Abstract: This paper reports the results of the deployment of Rich-State Simulated Populations at Meta for both automated and manual testing. We use simulated users (aka test users) to mimic user interactions and acquire state in much the same way that real user accounts acquire state. For automated testing, we present empirical results from deployment on the Facebook, Messenger, and Instagram apps for iOS…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: ICSE 2024

  2. arXiv:2402.09171  [pdf, other]

    cs.SE

    Automated Unit Test Improvement using Large Language Models at Meta

    Authors: Nadia Alshahwan, Jubin Chheda, Anastasia Finegenova, Beliz Gokkaya, Mark Harman, Inna Harper, Alexandru Marginean, Shubho Sengupta, Eddy Wang

    Abstract: This paper describes Meta's TestGen-LLM tool, which uses LLMs to automatically improve existing human-written tests. TestGen-LLM verifies that its generated test classes successfully clear a set of filters that assure measurable improvement over the original test suite, thereby eliminating problems due to LLM hallucination. We describe the deployment of TestGen-LLM at Meta test-a-thons for the Ins…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: 12 pages, 8 figures, 32nd ACM Symposium on the Foundations of Software Engineering (FSE 24)

  3. arXiv:2402.06111  [pdf, other]

    cs.SE

    Observation-based unit test generation at Meta

    Authors: Nadia Alshahwan, Mark Harman, Alexandru Marginean, Rotem Tal, Eddy Wang

    Abstract: TestGen automatically generates unit tests, carved from serialized observations of complex objects, observed during app execution. We describe the development and deployment of TestGen at Meta. In particular, we focus on the scalability challenges overcome during development in order to deploy observation-based test carving at scale in industry. So far, TestGen has landed 518 tests into production…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: 12 pages, 8 figures, FSE 2024, Mon 15 - Fri 19 July 2024, Porto de Galinhas, Brazil

  4. arXiv:2402.04380  [pdf, other]

    cs.SE

    Assured LLM-Based Software Engineering

    Authors: Nadia Alshahwan, Mark Harman, Inna Harper, Alexandru Marginean, Shubho Sengupta, Eddy Wang

    Abstract: In this paper we address the following question: How can we use Large Language Models (LLMs) to improve code independently of a human, while ensuring that the improved code (1) does not regress the properties of the original code and (2) improves the original in a verifiable and measurable way? To address this question, we advocate Assured LLM-Based Software Engineering; a generate-and-test approac…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: 6 pages, 1 figure, InteNSE 24: ACM International Workshop on Interpretability, Robustness, and Benchmarking in Neural Software Engineering, April, 2024, Lisbon, Portugal

  5. arXiv:2310.03533  [pdf, other]

    cs.SE

    Large Language Models for Software Engineering: Survey and Open Problems

    Authors: Angela Fan, Beliz Gokkaya, Mark Harman, Mitya Lyubarskiy, Shubho Sengupta, Shin Yoo, Jie M. Zhang

    Abstract: This paper provides a survey of the emerging area of Large Language Models (LLMs) for Software Engineering (SE). It also sets out open research challenges for the application of LLMs to technical problems faced by software engineers. LLMs' emergent properties bring novelty and creativity with applications right across the spectrum of Software Engineering activities including coding, design, requir…

    Submitted 11 November, 2023; v1 submitted 5 October, 2023; originally announced October 2023.

  6. arXiv:2308.15276  [pdf, other]

    cs.SE

    Large Language Models in Fault Localisation

    Authors: Yonghao Wu, Zheng Li, Jie M. Zhang, Mike Papadakis, Mark Harman, Yong Liu

    Abstract: Large Language Models (LLMs) have shown promise in multiple software engineering tasks including code generation, program repair, code summarisation, and test generation. Fault localisation is instrumental in enabling automated debugging and repair of programs and was prominently featured as a highlight during the launch event of ChatGPT-4. Nevertheless, the performance of LLMs compared to state-o…

    Submitted 2 October, 2023; v1 submitted 29 August, 2023; originally announced August 2023.

  7. arXiv:2308.13319  [pdf, other]

    cs.SE

    COCO: Testing Code Generation Systems via Concretized Instructions

    Authors: Ming Yan, Junjie Chen, Jie M. Zhang, Xuejie Cao, Chen Yang, Mark Harman

    Abstract: Code generation systems have been extensively developed in recent years to generate source code based on natural language instructions. However, despite their advancements, these systems still face robustness issues where even slightly different instructions can result in significantly different code semantics. Robustness is critical for code generation systems, as it can have significant impacts…

    Submitted 25 August, 2023; originally announced August 2023.

  8. arXiv:2308.02828  [pdf, other]

    cs.SE

    LLM is Like a Box of Chocolates: the Non-determinism of ChatGPT in Code Generation

    Authors: Shuyin Ouyang, Jie M. Zhang, Mark Harman, Meng Wang

    Abstract: There has been a recent explosion of research on Large Language Models (LLMs) for software engineering tasks, in particular code generation. However, results from LLMs can be highly unstable; nondeterministically returning very different codes for the same prompt. Non-determinism is a potential menace to scientific conclusion validity. When non-determinism is high, scientific conclusions simply ca…

    Submitted 5 August, 2023; originally announced August 2023.

  9. arXiv:2308.01923  [pdf, other]

    cs.LG cs.AI cs.CY cs.SE

    Fairness Improvement with Multiple Protected Attributes: How Far Are We?

    Authors: Zhenpeng Chen, Jie M. Zhang, Federica Sarro, Mark Harman

    Abstract: Existing research mostly improves the fairness of Machine Learning (ML) software regarding a single protected attribute at a time, but this is unrealistic given that many users have multiple protected attributes. This paper conducts an extensive study of fairness improvement regarding multiple protected attributes, covering 11 state-of-the-art fairness improvement methods. We analyze the effective…

    Submitted 4 April, 2024; v1 submitted 25 July, 2023; originally announced August 2023.

    Comments: Accepted by the 46th International Conference on Software Engineering (ICSE 2024). Please include ICSE in any citations

  10. arXiv:2302.02374  [pdf, other]

    cs.SE

    Simulation-Driven Automated End-to-End Test and Oracle Inference

    Authors: Shreshth Tuli, Kinga Bojarczuk, Natalija Gucevska, Mark Harman, Xiao-Yu Wang, Graham Wright

    Abstract: This is the first work to report on inferential testing at scale in industry. Specifically, it reports the experience of automated testing of integrity systems at Meta. We built an internal tool called ALPACAS for automated inference of end-to-end integrity tests. Integrity tests are designed to keep users safe online by checking that interventions take place when harmful behaviour occurs on a pla…

    Submitted 5 February, 2023; originally announced February 2023.

    Comments: Accepted in ICSE 2023 (SEIP Track)

  11. arXiv:2212.11762  [pdf, other]

    cs.SE

    Keeping Mutation Test Suites Consistent and Relevant with Long-Standing Mutants

    Authors: Milos Ojdanic, Mike Papadakis, Mark Harman

    Abstract: Mutation testing has been demonstrated to be one of the most powerful fault-revealing tools in the tester's tool kit. Much previous work implicitly assumed it to be sufficient to re-compute mutant suites per release. Sadly, this makes mutation results inconsistent; mutant scores from each release cannot be directly compared, making it harder to measure test improvement. Furthermore, regular code c…

    Submitted 22 December, 2022; originally announced December 2022.

  12. arXiv:2207.10223  [pdf, other]

    cs.SE

    Fairness Testing: A Comprehensive Survey and Analysis of Trends

    Authors: Zhenpeng Chen, Jie M. Zhang, Max Hort, Mark Harman, Federica Sarro

    Abstract: Unfair behaviors of Machine Learning (ML) software have garnered increasing attention and concern among software engineers. To tackle this issue, extensive research has been dedicated to conducting fairness testing of ML software, and this paper offers a comprehensive survey of existing studies in this field. We collect 100 papers and organize them based on the testing workflow (i.e., how to test)…

    Submitted 6 March, 2024; v1 submitted 20 July, 2022; originally announced July 2022.

    Comments: Accepted by ACM Transactions on Software Engineering and Methodology (TOSEM 2024). Please include TOSEM in any citations

  13. arXiv:2207.07068  [pdf, other]

    cs.LG

    Bias Mitigation for Machine Learning Classifiers: A Comprehensive Survey

    Authors: Max Hort, Zhenpeng Chen, Jie M. Zhang, Mark Harman, Federica Sarro

    Abstract: This paper provides a comprehensive survey of bias mitigation methods for achieving fairness in Machine Learning (ML) models. We collect a total of 341 publications concerning bias mitigation for ML classifiers. These methods can be distinguished based on their intervention procedure (i.e., pre-processing, in-processing, post-processing) and the technique they apply. We investigate how existing bi…

    Submitted 11 October, 2023; v1 submitted 14 July, 2022; originally announced July 2022.

    Comments: 52 pages, 7 figures

  14. arXiv:2207.03277  [pdf, other]

    cs.SE cs.AI

    A Comprehensive Empirical Study of Bias Mitigation Methods for Machine Learning Classifiers

    Authors: Zhenpeng Chen, Jie M. Zhang, Federica Sarro, Mark Harman

    Abstract: Software bias is an increasingly important operational concern for software engineers. We present a large-scale, comprehensive empirical study of 17 representative bias mitigation methods for Machine Learning (ML) classifiers, evaluated with 11 ML performance metrics (e.g., accuracy), 4 fairness metrics, and 20 types of fairness-performance trade-off assessment, applied to 8 widely-adopted softwar…

    Submitted 10 February, 2023; v1 submitted 7 July, 2022; originally announced July 2022.

    Comments: Accepted by ACM Transactions on Software Engineering and Methodology (TOSEM 2023). Please include TOSEM in any citations

  15. arXiv:2110.06773  [pdf, other]

    cs.SE cs.CL cs.LG

    Leveraging Automated Unit Tests for Unsupervised Code Translation

    Authors: Baptiste Roziere, Jie M. Zhang, Francois Charton, Mark Harman, Gabriel Synnaeve, Guillaume Lample

    Abstract: With little to no parallel data available for programming languages, unsupervised methods are well-suited to source code translation. However, the majority of unsupervised machine translation approaches rely on back-translation, a method developed in the context of natural language translation and one that inherently involves training on noisy inputs. Unfortunately, source code is highly sensitive…

    Submitted 16 February, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

  16. arXiv:2109.10834  [pdf, other]

    astro-ph.SR astro-ph.IM cs.LG cs.NE physics.space-ph

    SCSS-Net: Solar Corona Structures Segmentation by Deep Learning

    Authors: Šimon Mackovjak, Martin Harman, Viera Maslej-Krešňáková, Peter Butka

    Abstract: Structures in the solar corona are the main drivers of space weather processes that might directly or indirectly affect the Earth. Thanks to the most recent space-based solar observatories, with capabilities to acquire high-resolution images continuously, the structures in the solar corona can be monitored over the years with a time resolution of minutes. For this purpose, we have developed a meth…

    Submitted 22 September, 2021; originally announced September 2021.

    Comments: accepted for publication in Monthly Notices of the Royal Astronomical Society; for associated code, see https://github.com/space-lab-sk/scss-net

  17. arXiv:2011.10787  [pdf, other]

    cs.SE

    An Empirical Study on Failed Error Propagation in Java Programs with Real Faults

    Authors: Gunel Jahangirova, David Clark, Mark Harman, Paolo Tonella

    Abstract: During testing, developers can place oracles externally or internally with respect to a method. Given a faulty execution state, i.e., one that differs from the expected one, an oracle might be unable to expose the fault if it is placed at a program point with no access to the incorrect program state or where the program state is no longer corrupted. In such a case, the oracle is subject to failed…

    Submitted 21 November, 2020; originally announced November 2020.

  18. FrUITeR: A Framework for Evaluating UI Test Reuse

    Authors: Yixue Zhao, Justin Chen, Adriana Sejfia, Marcelo Schmitt Laser, Jie Zhang, Federica Sarro, Mark Harman, Nenad Medvidovic

    Abstract: UI testing is tedious and time-consuming due to the manual effort required. Recent research has explored opportunities for reusing existing UI tests from an app to automatically generate new tests for other apps. However, the evaluation of such techniques currently remains manual, unscalable, and unreproducible, which can waste effort and impede progress in this emerging area. We introduce FrUITeR…

    Submitted 3 November, 2020; v1 submitted 7 August, 2020; originally announced August 2020.

    Comments: ESEC/FSE 2020

  19. arXiv:2004.07352  [pdf, other]

    cs.SE cs.IR cs.LG

    Ownership at Large -- Open Problems and Challenges in Ownership Management

    Authors: John Ahlgren, Maria Eugenia Berezin, Kinga Bojarczuk, Elena Dulskyte, Inna Dvortsova, Johann George, Natalija Gucevska, Mark Harman, Shan He, Ralf Lämmel, Erik Meijer, Silvia Sapora, Justin Spahr-Summers

    Abstract: Software-intensive organizations rely on large numbers of software assets of different types, e.g., source-code files, tables in the data warehouse, and software configurations. Who is the most suitable owner of a given asset changes over time, e.g., due to reorganization and individual function changes. New forms of automation can help suggest more suitable owners for any given asset at a given p…

    Submitted 15 April, 2020; originally announced April 2020.

    Comments: Author order is alphabetical. Contact author: Ralf Lämmel. The subject of the paper is covered by the contact author's keynote at the same conference

  20. arXiv:2004.05363  [pdf, other]

    cs.SE cs.HC cs.LG cs.SI

    WES: Agent-based User Interaction Simulation on Real Infrastructure

    Authors: John Ahlgren, Maria Eugenia Berezin, Kinga Bojarczuk, Elena Dulskyte, Inna Dvortsova, Johann George, Natalija Gucevska, Mark Harman, Ralf Lämmel, Erik Meijer, Silvia Sapora, Justin Spahr-Summers

    Abstract: We introduce the Web-Enabled Simulation (WES) research agenda, and describe FACEBOOK's WW system. We describe the application of WW to reliability, integrity and privacy at FACEBOOK, where it is used to simulate social media interactions on an infrastructure consisting of hundreds of millions of lines of code. The WES agenda draws on research from many areas of study, including Search Based Softw…

    Submitted 11 April, 2020; originally announced April 2020.

    Comments: Author order is alphabetical. Correspondence to Mark Harman. This paper appears in GI 2020: 8th International Workshop on Genetic Improvement

  21. arXiv:1912.03197  [pdf, other]

    cs.SE

    FlakiMe: Laboratory-Controlled Test Flakiness Impact Assessment. A Case Study on Mutation Testing and Program Repair

    Authors: Maxime Cordy, Renaud Rwemalika, Mike Papadakis, Mark Harman

    Abstract: Much research on software testing makes an implicit assumption that test failures are deterministic such that they always witness the presence of the same defects. However, this assumption is not always true because some test failures are due to so-called flaky tests, i.e., tests with non-deterministic outcomes. Unfortunately, flaky tests have major implications for testing and test-dependent acti…

    Submitted 6 December, 2019; originally announced December 2019.

  22. arXiv:1910.02688  [pdf, other]

    cs.SE

    Automatic Testing and Improvement of Machine Translation

    Authors: Zeyu Sun, Jie M. Zhang, Mark Harman, Mike Papadakis, Lu Zhang

    Abstract: This paper presents TransRepair, a fully automatic approach for testing and repairing the consistency of machine translation systems. TransRepair combines mutation with metamorphic testing to detect inconsistency bugs (without access to human oracles). It then adopts probability-reference or cross-reference to post-process the translations, in a grey-box or black-box manner, to repair the inconsis…

    Submitted 25 December, 2019; v1 submitted 7 October, 2019; originally announced October 2019.

  23. arXiv:1908.02480  [pdf, other]

    cs.SE

    A Survey of Constrained Combinatorial Testing

    Authors: Huayao Wu, Changhai Nie, Justyna Petke, Yue Jia, Mark Harman

    Abstract: Combinatorial Testing (CT) is a potentially powerful testing technique, but its failure-revealing ability might be dramatically reduced if it fails to handle constraints in an adequate and efficient manner. To ensure the wider applicability of CT in the presence of constrained problem domains, large and diverse efforts have been invested towards the techniques and applications of constrained c…

    Submitted 7 August, 2019; originally announced August 2019.

  24. arXiv:1906.10742  [pdf, other]

    cs.LG cs.AI cs.SE stat.ML

    Machine Learning Testing: Survey, Landscapes and Horizons

    Authors: Jie M. Zhang, Mark Harman, Lei Ma, Yang Liu

    Abstract: This paper provides a comprehensive survey of Machine Learning Testing (ML testing) research. It covers 144 papers on testing properties (e.g., correctness, robustness, and fairness), testing components (e.g., the data, learning program, and framework), testing workflow (e.g., test generation and test evaluation), and application scenarios (e.g., autonomous driving, machine translation). The paper…

    Submitted 21 December, 2019; v1 submitted 19 June, 2019; originally announced June 2019.

  25. arXiv:1905.12734  [pdf, other]

    cs.PL

    Sub-Turing Islands in the Wild

    Authors: Earl T. Barr, David W. Binkley, Mark Harman, Mohamed Nassim Seghir

    Abstract: Recently, there has been growing debate as to whether or not static analysis can be truly sound. In spite of this concern, research on techniques seeking to at least partially answer undecidable questions has a long history. However, little attention has been given to the more empirical question of how often an exact solution might be given to a question despite the question being, at least in the…

    Submitted 29 May, 2019; originally announced May 2019.

  26. arXiv:1905.10201  [pdf, other]

    cs.LG stat.ML

    Model Validation Using Mutated Training Labels: An Exploratory Study

    Authors: Jie M. Zhang, Mark Harman, Benjamin Guedj, Earl T. Barr, John Shawe-Taylor

    Abstract: We introduce an exploratory study on Mutation Validation (MV), a model validation method using mutated training labels for supervised learning. MV mutates training data labels, retrains the model against the mutated data, then uses the metamorphic relation that captures the consequent training performance changes to assess model fit. It does not use a validation set or test set. The intuition unde…

    Submitted 20 October, 2021; v1 submitted 24 May, 2019; originally announced May 2019.

  27. arXiv:1806.10235  [pdf, other]

    cs.SE

    Indexing Operators to Extend the Reach of Symbolic Execution

    Authors: Earl T. Barr, David Clark, Mark Harman, Alexandru Marginean

    Abstract: Traditional program analysis analyses a program language, that is, all programs that can be written in the language. There is a difference, however, between all possible programs that can be written and the corpus of actual programs written in a language. We seek to exploit this difference: for a given program, we apply a bespoke program transformation Indexify to convert expressions that current…

    Submitted 26 June, 2018; originally announced June 2018.

  28. arXiv:1801.01025  [pdf, other]

    cs.SE

    A Study of Bug Resolution Characteristics in Popular Programming Languages

    Authors: Jie M. Zhang, Feng Li, Dan Hao, Meng Wang, Hao Tang, Lu Zhang, Mark Harman

    Abstract: This paper presents a large-scale study that investigates the bug resolution characteristics among popular GitHub projects written in different programming languages. We explore correlations but, of course, we cannot infer causation. Specifically, we analyse bug resolution data from approximately 70 million Source Lines of Code, drawn from 3 million commits to 600 GitHub projects, primarily written…

    Submitted 4 January, 2020; v1 submitted 3 January, 2018; originally announced January 2018.

    Journal ref: Transactions on Software Engineering 2020

  29. arXiv:1306.5667  [pdf, other]

    cs.NE cs.AI

    Using Genetic Programming to Model Software

    Authors: W. B. Langdon, M. Harman

    Abstract: We study a generic program to investigate the scope for automatically customising it for a vital current task, which was not considered when it was first written. In detail, we show genetic programming (GP) can evolve models of aspects of BLAST's output when it is used to map Solexa Next-Gen DNA sequences to the human genome.

    Submitted 24 June, 2013; originally announced June 2013.

    Comments: As UCL computer science Technical Report RN/13/12

    Report number: RN/13/12