Showing 1–14 of 14 results for author: Bird, C

  1. arXiv:2402.15100

    cs.SE cs.LG

    Studying LLM Performance on Closed- and Open-source Data

    Authors: Toufique Ahmed, Christian Bird, Premkumar Devanbu, Saikat Chakraborty

    Abstract: Large Language models (LLMs) are finding wide use in software engineering practice. These models are extremely data-hungry, and are largely trained on open-source (OSS) code distributed with permissive licenses. In terms of actual use however, a great deal of software development still occurs in the for-profit/proprietary sphere, where the code under development is not, and never has been, in the…

    Submitted 23 February, 2024; originally announced February 2024.

  2. arXiv:2311.00236

    cs.SE

    Objectives and Key Results in Software Teams: Challenges, Opportunities and Impact on Development

    Authors: Jenna Butler, Thomas Zimmermann, Christian Bird

    Abstract: Building software, like building almost anything, requires people to understand a common goal and work together towards it. In large software companies, a VP or Director will have an idea or goal and it is often the job of middle management to distill that lofty, general idea into manageable, finite units of work. How do organizations do this hard work of setting and measuring progress towards goa…

    Submitted 31 October, 2023; originally announced November 2023.

    Comments: 11 pages, 2 figures

  3. arXiv:2310.01727

    cs.SE cs.AI

    Can GPT-4 Replicate Empirical Software Engineering Research?

    Authors: Jenny T. Liang, Carmen Badea, Christian Bird, Robert DeLine, Denae Ford, Nicole Forsgren, Thomas Zimmermann

    Abstract: Empirical software engineering research on production systems has brought forth a better understanding of the software engineering process for practitioners and researchers alike. However, only a small subset of production systems is studied, limiting the impact of this research. While software engineering practitioners could benefit from replicating research on their own data, this poses its own…

    Submitted 19 June, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

  4. arXiv:2309.05529

    cs.AI math.ST

    On the meaning of uncertainty for ethical AI: philosophy and practice

    Authors: Cassandra Bird, Daniel Williamson, Sabina Leonelli

    Abstract: Whether and how data scientists, statisticians and modellers should be accountable for the AI systems they develop remains a controversial and highly debated topic, especially given the complexity of AI systems and the difficulties in comparing and synthesising competing claims arising from their deployment for data analysis. This paper proposes to address this issue by decreasing the opacity and…

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: 26 pages, 2 figures

  5. arXiv:2307.05543

    cs.CY

    Typology of Risks of Generative Text-to-Image Models

    Authors: Charlotte Bird, Eddie L. Ungless, Atoosa Kasirzadeh

    Abstract: This paper investigates the direct risks and harms associated with modern text-to-image generative models, such as DALL-E and Midjourney, through a comprehensive literature review. While these models offer unprecedented capabilities for generating images, their development and use introduce new types of risk that require careful consideration. Our review reveals significant knowledge gaps concerni…

    Submitted 8 July, 2023; originally announced July 2023.

    Comments: Accepted for publication in 2023 AAAI/ACM Conference on AI, Ethics, and Society (AIES 2023)

  6. arXiv:2202.02385

    cs.SE cs.AI

    Using Large-scale Heterogeneous Graph Representation Learning for Code Review Recommendations at Microsoft

    Authors: Jiyang Zhang, Chandra Maddila, Ram Bairi, Christian Bird, Ujjwal Raizada, Apoorva Agrawal, Yamini Jhawar, Kim Herzig, Arie van Deursen

    Abstract: Code review is an integral part of any mature software development process, and identifying the best reviewer for a code change is a well-accepted problem within the software engineering community. Selecting a reviewer who lacks expertise and understanding can slow development or result in more defects. To date, most reviewer recommendation systems rely primarily on historical file change and revi…

    Submitted 2 February, 2023; v1 submitted 4 February, 2022; originally announced February 2022.

    Comments: ICSE 2023 Software Engineering in Practice (camera ready)

  7. Program Merge Conflict Resolution via Neural Transformers

    Authors: Alexey Svyatkovskiy, Sarah Fakhoury, Negar Ghorbani, Todd Mytkowicz, Elizabeth Dinella, Christian Bird, Jinu Jang, Neel Sundaresan, Shuvendu Lahiri

    Abstract: Collaborative software development is an integral part of the modern software development life cycle, essential to the success of large-scale software projects. When multiple developers make concurrent changes around the same lines of code, a merge conflict may occur. Such conflicts stall pull requests and continuous integration pipelines for hours to several days, seriously hurting developer prod…

    Submitted 29 November, 2022; v1 submitted 31 August, 2021; originally announced September 2021.

    Comments: ESEC/FSE '22 camera ready version. 12 pages, 4 figures, online appendix

  8. arXiv:2105.07569

    cs.SE

    DeepMerge: Learning to Merge Programs

    Authors: Elizabeth Dinella, Todd Mytkowicz, Alexey Svyatkovskiy, Christian Bird, Mayur Naik, Shuvendu K. Lahiri

    Abstract: In collaborative software development, program merging is the mechanism to integrate changes from multiple programmers. Merge algorithms in modern version control systems report a conflict when changes interfere textually. Merge conflicts require manual intervention and frequently stall modern continuous integration pipelines. Prior work found that, although costly, a large majority of resolutions…

    Submitted 6 September, 2021; v1 submitted 16 May, 2021; originally announced May 2021.

    Comments: 11 pages

  9. arXiv:2101.06542

    cs.SE cs.LG cs.PL

    ConE: A Concurrent Edit Detection Tool for Large Scale Software Development

    Authors: Chandra Maddila, Nachiappan Nagappan, Christian Bird, Georgios Gousios, Arie van Deursen

    Abstract: Modern, complex software systems are being continuously extended and adjusted. The developers responsible for this may come from different teams or organizations, and may be distributed over the world. This may make it difficult to keep track of what other developers are doing, which may result in multiple developers concurrently editing the same code areas. This, in turn, may lead to hard-to-merg…

    Submitted 25 September, 2021; v1 submitted 16 January, 2021; originally announced January 2021.

    Journal ref: ACM Transactions on Software Engineering and Methodology (TOSEM), 2022, 31(2)

  10. arXiv:2008.11147

    cs.SE cs.CY cs.HC

    A Tale of Two Cities: Software Developers Working from Home During the COVID-19 Pandemic

    Authors: Denae Ford, Margaret-Anne Storey, Thomas Zimmermann, Christian Bird, Sonia Jaffe, Chandra Maddila, Jenna L. Butler, Brian Houck, Nachiappan Nagappan

    Abstract: The COVID-19 pandemic has shaken the world to its core and has provoked an overnight exodus of developers that normally worked in an office setting to working from home. The magnitude of this shift and the factors that have accompanied this new unplanned work setting go beyond what the software engineering community has previously understood to be remote work. To find out how developers and their…

    Submitted 10 September, 2021; v1 submitted 25 August, 2020; originally announced August 2020.

    Comments: 36 pages, 1 figure, 6 tables

    Journal ref: ACM Transactions on Software Engineering and Methodology, Volume 31, Issue 2 (April 2022)

  11. A Dataset of Dockerfiles

    Authors: Jordan Henkel, Christian Bird, Shuvendu K. Lahiri, Thomas Reps

    Abstract: Dockerfiles are one of the most prevalent kinds of DevOps artifacts used in industry. Despite their prevalence, there is a lack of sophisticated semantics-aware static analysis of Dockerfiles. In this paper, we introduce a dataset of approximately 178,000 unique Dockerfiles collected from GitHub. To enhance the usability of this data, we describe five representations we have devised for working wi…

    Submitted 28 March, 2020; originally announced March 2020.

    Comments: Published as a Data Showcase in MSR'2020

  12. Learning from, Understanding, and Supporting DevOps Artifacts for Docker

    Authors: Jordan Henkel, Christian Bird, Shuvendu K. Lahiri, Thomas Reps

    Abstract: With the growing use of DevOps tools and frameworks, there is an increased need for tools and techniques that support more than code. The current state-of-the-art in static developer assistance for tools like Docker is limited to shallow syntactic validation. We identify three core challenges in the realm of learning from, understanding, and supporting developers writing DevOps artifacts: (i) nest…

    Submitted 7 February, 2020; originally announced February 2020.

    Comments: Published in ICSE'2020

  13. arXiv:1908.02617

    cs.CY

    Towards Requirements for a Demand Side Response Energy Management System for Households

    Authors: Caroline Bird, Ruzanna Chitchyan

    Abstract: Demand response is considered to be one of the key means through which peak energy demand could be ameliorated. This report presents a requirements elicitation exercise (undertaken in collaboration with the Bristol City Council, UK) to elicit the requirements that a smart appliance automation service for domestic energy demand response management must address to be accepted by the households. The…

    Submitted 11 August, 2019; v1 submitted 6 August, 2019; originally announced August 2019.

    ACM Class: D.2.1; K.4.2

  14. Learning Natural Coding Conventions

    Authors: Miltiadis Allamanis, Earl T. Barr, Christian Bird, Charles Sutton

    Abstract: Every programmer has a characteristic style, ranging from preferences about identifier naming to preferences about object relationships and design patterns. Coding conventions define a consistent syntactic style, fostering readability and hence maintainability. When collaborating, programmers strive to obey a project's coding conventions. However, one third of reviews of changes contain feedback a…

    Submitted 7 April, 2014; v1 submitted 17 February, 2014; originally announced February 2014.