Showing 1–16 of 16 results for author: German, D

Searching in archive cs.
  1. arXiv:2405.09165  [pdf, other]

    cs.SE

    An Empirical Study of Token-based Micro Commits

    Authors: Masanari Kondo, Daniel M. German, Yasutaka Kamei, Naoyasu Ubayashi, Osamu Mizuno

    Abstract: In software development, developers frequently apply maintenance activities to the source code that change a few lines by a single commit. A good understanding of the characteristics of such small changes can support quality assurance approaches (e.g., automated program repair), as it is likely that small changes are addressing deficiencies in other changes; thus, understanding the reasons for cre…

    Submitted 15 May, 2024; originally announced May 2024.

  2. arXiv:2405.01560  [pdf, ps, other]

    cs.SE cs.CY

    Copyright related risks in the creation and use of ML/AI systems

    Authors: Daniel M. German

    Abstract: This paper summarizes the current copyright related risks that Machine Learning (ML) and Artificial Intelligence (AI) systems (including Large Language Models --LLMs) incur. These risks affect different stakeholders: owners of the copyright of the training data, the users of ML/AI systems, the creators of trained models, and the operators of AI systems. This paper also provides an overview of ongo…

    Submitted 26 March, 2024; originally announced May 2024.

    MSC Class: 68-04 ACM Class: K.5.1

  3. What Is an App Store? The Software Engineering Perspective

    Authors: Wenhan Zhu, Sebastian Proksch, Daniel M. German, Michael W. Godfrey, Li Li, Shane McIntosh

    Abstract: "App stores" are online software stores where end users may browse, purchase, download, and install software applications. By far, the best known app stores are associated with mobile platforms, such as Google Play for Android and Apple's App Store for iOS. The ubiquity of smartphones has led to mobile app stores becoming a touchstone experience of modern living. However, most of app store researc…

    Submitted 8 January, 2024; originally announced January 2024.

    Comments: 41 pages

    Journal ref: Empir Software Eng 29, 35 (2024)

  4. BOMs Away! Inside the Minds of Stakeholders: A Comprehensive Study of Bills of Materials for Software Systems

    Authors: Trevor Stalnaker, Nathan Wintersgill, Oscar Chaparro, Massimiliano Di Penta, Daniel M German, Denys Poshyvanyk

    Abstract: Software Bills of Materials (SBOMs) have emerged as tools to facilitate the management of software dependencies, vulnerabilities, licenses, and the supply chain. While significant effort has been devoted to increasing SBOM awareness and developing SBOM formats and tools, recent studies have shown that SBOMs are still an early technology not yet adequately adopted in practice. Expanding on previous…

    Submitted 22 September, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

    Comments: 11 pages, ICSE 2024

  5. Using the Uniqueness of Global Identifiers to Determine the Provenance of Python Software Source Code

    Authors: Yiming Sun, Daniel M. German, Stefano Zacchiroli

    Abstract: We consider the problem of identifying the provenance of free/open source software (FOSS) and specifically the need of identifying where reused source code has been copied from. We propose a lightweight approach to solve the problem based on software identifiers-such as the names of variables, classes, and functions chosen by programmers. The proposed approach is able to efficiently narrow down to…

    Submitted 24 May, 2023; originally announced May 2023.

    Journal ref: Empirical Software Engineering, In press

  6. Do I Belong? Modeling Sense of Virtual Community Among Linux Kernel Contributors

    Authors: Bianca Trinkenreich, Klaas-Jan Stol, Anita Sarma, Daniel M. German, Marco A. Gerosa, Igor Steinmacher

    Abstract: The sense of belonging to a community is a basic human need that impacts an individual's behavior, long-term engagement, and job satisfaction, as revealed by research in disciplines such as psychology, healthcare, and education. Despite much research on how to retain developers in Open Source Software projects and other virtual, peer-production communities, there is a paucity of research investigat…

    Submitted 22 February, 2023; v1 submitted 16 January, 2023; originally announced January 2023.

    Journal ref: 45th IEEE/ACM International Conference on Software Engineering (ICSE 2023)

  7. arXiv:2111.02374  [pdf, other]

    cs.LG cs.AI cs.SE

    Can I use this publicly available dataset to build commercial AI software? -- A Case Study on Publicly Available Image Datasets

    Authors: Gopi Krishnan Rajbahadur, Erika Tuck, Li Zi, Dayi Lin, Boyuan Chen, Zhen Ming, Jiang, Daniel M. German

    Abstract: Publicly available datasets are one of the key drivers for commercial AI software. The use of publicly available datasets is governed by dataset licenses. These dataset licenses outline the rights one is entitled to on a given dataset and the obligations that one must fulfil to enjoy such rights without any license compliance violations. Unlike standardized Open Source Software (OSS) licenses, exi…

    Submitted 11 April, 2022; v1 submitted 3 November, 2021; originally announced November 2021.

    Comments: This is revised version of the paper with updated co-authors

  8. arXiv:2110.00361  [pdf, other]

    cs.SE

    An analysis of open source software licensing questions in Stack Exchange sites

    Authors: Maria Papoutsoglou, Georgia M. Kapitsaki, Daniel German, Lefteris Angelis

    Abstract: Free and open source software is widely used in the creation of software systems, whereas many organisations choose to provide their systems as open source. Open source software carries licenses that determine the conditions under which the original software can be used. Appropriate use of licenses requires relevant expertise by the practitioners, and has an important legal angle. Educators and em…

    Submitted 1 October, 2021; originally announced October 2021.

  9. arXiv:2003.05615  [pdf, other]

    cs.SE

    Code Clone Matching: A Practical and Effective Approach to Find Code Snippets

    Authors: Katsuro Inoue, Yuya Miyamoto, Daniel M. German, Takashi Ishio

    Abstract: Finding the same or similar code snippets in source code is one of the fundamental activities in software maintenance. Text-based pattern matching tools such as grep are frequently used for such purposes, but making proper queries for the expected result is not easy. Code clone detectors could be used but their features and results are generally excessive. In this paper, we propose Code Clone matching (C…

    Submitted 12 March, 2020; originally announced March 2020.

    Comments: 11 pages, for downloading ccgrep, https://github.com/yuy-m/CCGrep

  10. Google Summer of Code: Student Motivations and Contributions

    Authors: Jefferson O. Silva, Igor Wiese, Daniel M. German, Christoph Treude, Marco A. Gerosa, Igor Steinmacher

    Abstract: Several open source software (OSS) projects expect to foster newcomers' onboarding and to receive contributions by participating in engagement programs, like Summers of Code. However, there is little empirical evidence showing why students join such programs. In this paper, we study the well-established Google Summer of Code (GSoC), which is a 3-month OSS engagement program that offers stipends an…

    Submitted 13 October, 2019; originally announced October 2019.

    Comments: 30 pages

    Journal ref: Journal of Systems and Software (JSS), V. 162, April 2020, 110487

  11. arXiv:1809.07954  [pdf, other]

    cs.SE cs.LG stat.ML

    Predicting the Programming Language of Questions and Snippets of StackOverflow Using Natural Language Processing

    Authors: Kamel Alreshedy, Dhanush Dharmaretnam, Daniel M. German, Venkatesh Srinivasan, T. Aaron Gulliver

    Abstract: Stack Overflow is the most popular Q&A website among software developers. As a platform for knowledge sharing and acquisition, the questions posted in Stack Overflow usually contain a code snippet. Stack Overflow relies on users to properly tag the programming language of a question and it simply assumes that the programming language of the snippets inside a question is the same as the tag of the…

    Submitted 21 September, 2018; originally announced September 2018.

  12. arXiv:1809.07945  [pdf, other]

    cs.SE cs.LG stat.ML

    SCC: Automatic Classification of Code Snippets

    Authors: Kamel Alreshedy, Dhanush Dharmaretnam, Daniel M. German, Venkatesh Srinivasan, T. Aaron Gulliver

    Abstract: Determining the programming language of a source code file has been considered in the research community; it has been shown that Machine Learning (ML) and Natural Language Processing (NLP) algorithms can be effective in identifying the programming language of source code files. However, determining the programming language of a code snippet or a few lines of source code is still a challenging task…

    Submitted 21 September, 2018; originally announced September 2018.

    Journal ref: Working Conference on Source Code Analysis & Manipulation 2018

  13. arXiv:1709.09474  [pdf, other]

    cs.SE

    An Empirical Study on the Impact of Refactoring Activities on Evolving Client-Used APIs

    Authors: Raula Gaikovina Kula, Ali Ouni, Daniel M. German, Katsuro Inoue

    Abstract: Context: Refactoring is recognized as an effective practice to maintain evolving software systems. For software libraries, we study how library developers refactor their Application Programming Interfaces (APIs), especially when it impacts client users by breaking an API of the library. Objective: Our work aims to understand how clients that use a library API are affected by refactoring activities…

    Submitted 28 September, 2017; v1 submitted 27 September, 2017; originally announced September 2017.

    Comments: Information and Software Technology Journal

  14. arXiv:1709.04638  [pdf, other]

    cs.SE

    On the Impact of Micro-Packages: An Empirical Study of the npm JavaScript Ecosystem

    Authors: Raula Gaikovina Kula, Ali Ouni, Daniel M. German, Katsuro Inoue

    Abstract: The rise of user-contributed Open Source Software (OSS) ecosystems demonstrate their prevalence in the software engineering discipline. Libraries work together by depending on each other across the ecosystem. From these ecosystems emerges a minimized library called a micro-package. Micro-packages become problematic when breaks in a critical ecosystem dependency ripples its effects to unsuspecting…

    Submitted 14 September, 2017; originally announced September 2017.

    Comments: Submitted 2017

  15. arXiv:1709.04626  [pdf, other]

    cs.SE

    Modeling Library Dependencies and Updates in Large Software Repository Universes

    Authors: Raula Gaikovina Kula, Coen De Roover, Daniel M. German, Takashi Ishio, Katsuro Inoue

    Abstract: Popular (re)use of third-party open-source software (OSS) is evidence of the impact of hosting repositories like maven on software development today. Updating libraries is crucial, with recent studies highlighting the associated vulnerabilities with aging OSS libraries. The decision to migrate to a newer library can range from trivial (security threat) to complex (assessment of work required to ac…

    Submitted 14 September, 2017; originally announced September 2017.

    Comments: First Version October 15th 2015

    Report number: Report20151015

  16. Do Developers Update Their Library Dependencies? An Empirical Study on the Impact of Security Advisories on Library Migration

    Authors: Raula Gaikovina Kula, Daniel M. German, Ali Ouni, Takashi Ishio, Katsuro Inoue

    Abstract: Third-party library reuse has become common practice in contemporary software development, as it includes several benefits for developers. Library dependencies are constantly evolving, with newly added features and patches that fix bugs in older versions. To take full advantage of third-party reuse, developers should always keep up to date with the latest versions of their library dependencies. In…

    Submitted 14 September, 2017; originally announced September 2017.

    Comments: 37 Pages

    Journal ref: Empirical Software Engineering 2017