-
Exploring Spoken Language Identification Strategies for Automatic Transcription of Multilingual Broadcast and Institutional Speech
Authors:
Martina Valente,
Fabio Brugnara,
Giovanni Morrone,
Enrico Zovato,
Leonardo Badino
Abstract:
This paper addresses spoken language identification (SLI) and speech recognition of multilingual broadcast and institutional speech, real application scenarios that have rarely been addressed in the SLI literature. Observing that in these domains language changes are mostly associated with speaker changes, we propose a cascaded system consisting of speaker diarization and language identification and compare it with more traditional language identification and language diarization systems. Results show that the proposed system often achieves lower language classification and language diarization error rates (up to 10% relative language diarization error reduction and 60% relative language confusion reduction) and leads to lower WERs on multilingual test sets (more than 8% relative WER reduction), while at the same time not negatively affecting speech recognition on monolingual audio (with an absolute WER increase between 0.1% and 0.7% w.r.t. monolingual ASR).
Submitted 13 June, 2024;
originally announced June 2024.
-
Broadband ferromagnetic resonance in Mn-doped Li ferrite nanoparticles
Authors:
P. Hernandez-Gomez,
J. M. Muñoz,
M. A. Valente,
M. P. F. Graça
Abstract:
Lithium ferrites are well-known materials due to their numerous technological applications, especially in microwave devices. Mn-doped lithium ferrite nanoparticles were prepared by the sol-gel technique by means of the Pechini method, and then annealed at different temperatures in the 250 to 1000 °C range. XRD confirms spinel formation with particle size in the 15 to 200 nm range, increasing with annealing temperature. Microwave magnetoabsorption data of annealed lithium ferrite nanoparticles, obtained with a broadband system based on a network analyzer operating up to 8.5 GHz, are presented. At fields up to 200 mT we observe a broad absorption peak that shifts to higher frequencies with magnetic field, according to ferromagnetic resonance theory. The amplitude of absorption, up to 85%, together with the frequency width of about 4.5 GHz, makes this material suitable as a wave absorber. Samples annealed at higher temperatures show a behaviour similar to polycrystalline samples, thus suggesting their multidomain character.
Submitted 19 February, 2024;
originally announced February 2024.
-
End-to-End Software Construction using ChatGPT: An Experience Report
Authors:
Mauricio Monteiro,
Bruno Castelo Branco,
Samuel Silvestre,
Guilherme Avelino,
Marco Tulio Valente
Abstract:
In this paper, we explore the application of Large Language Models (LLMs) in the particular context of end-to-end software construction, i.e., in contexts where software developers have a set of requirements and have to design, implement, test, and validate a new software system. Particularly, we report an experiment where we asked three software developers to use ChatGPT to fully implement a Web-based application using mainstream software architectures and technologies. After that, we compare the apps produced by ChatGPT with a reference implementation that we manually implemented for our research. As a result, we document four categories of prompts that can be used by developers in similar contexts, including initialization prompts, feature requests, bug-fixing, and layout prompts. Additionally, we discuss the advantages and disadvantages of two prompt construction approaches: top-down (where we start with a high-level description of the target software, typically in the form of user stories) and bottom-up (where we request the construction of the system feature by feature).
Submitted 23 October, 2023;
originally announced October 2023.
-
Lessons learned after three years of SPIDER operation and the first MITICA integrated tests
Authors:
D. Marcuzzi,
V. Toigo,
M. Boldrin,
G. Chitarin,
S. Dal Bello,
L. Grando,
A. Luchetta,
R. Pasqualotto,
M. Pavei,
G. Serianni,
L. Zanotto,
R. Agnello,
P. Agostinetti,
M. Agostini,
D. Aprile,
M. Barbisan,
M. Battistella,
G. Berton,
M. Bigi,
M. Brombin,
V. Candela,
V. Candeloro,
A. Canton,
R. Casagrande,
C. Cavallini
, et al. (117 additional authors not shown)
Abstract:
ITER envisages the use of two heating neutral beam injectors plus an optional one as part of the auxiliary heating and current drive system. The 16.5 MW expected neutral beam power per injector is several notches higher than that of existing facilities worldwide. A Neutral Beam Test Facility (NBTF) was established at Consorzio RFX, exploiting the synergy of two test beds, SPIDER and MITICA. SPIDER is dedicated to developing and characterizing large efficient negative ion sources at relevant parameters in ITER-like conditions: source and accelerator located in the same vacuum where the beam propagates, immunity to electromagnetic interference from multiple radio-frequency (RF) antennas, avoidance of RF-induced discharges on the outside of the source. Three years of experiments on SPIDER have pointed to the design modifications necessary to enable full performance. The source is presently in a long shutdown phase to incorporate learnings from the experimental campaign. In parallel, developments on MITICA, the full-scale prototype of the ITER NBI featuring a 1 MV accelerator and ion neutralization, are underway, including manufacturing of in-vessel components, while power supplies and auxiliary plants are already under final testing and commissioning. Integration, commissioning and tests of the 1 MV power supplies are essential for this first-of-its-kind system, unparalleled in both research and industry. The integrated test to confirm 1 MV output by combining inverter systems, DC generators and transmission lines revealed errors and accidents in some components. To realize a concrete system for ITER, solutions for the repair and improvement of the system were developed. Hence, NBTF is emerging as a necessary facility, due to the large gap with existing injectors, effectively dedicated to identifying issues and finding solutions to enable successful ITER NBI operations in a time-bound fashion.
Submitted 4 April, 2023;
originally announced April 2023.
-
Towards a Muon Collider
Authors:
Carlotta Accettura,
Dean Adams,
Rohit Agarwal,
Claudia Ahdida,
Chiara Aimè,
Nicola Amapane,
David Amorim,
Paolo Andreetto,
Fabio Anulli,
Robert Appleby,
Artur Apresyan,
Aram Apyan,
Sergey Arsenyev,
Pouya Asadi,
Mohammed Attia Mahmoud,
Aleksandr Azatov,
John Back,
Lorenzo Balconi,
Laura Bandiera,
Roger Barlow,
Nazar Bartosik,
Emanuela Barzi,
Fabian Batsch,
Matteo Bauce,
J. Scott Berg
, et al. (272 additional authors not shown)
Abstract:
A muon collider would enable the big jump ahead in energy reach that is needed for a fruitful exploration of fundamental interactions. The challenges of producing muon collisions at high luminosity and 10 TeV centre of mass energy are being investigated by the recently-formed International Muon Collider Collaboration. This Review summarises the status and the recent advances on muon colliders design, physics and detector studies. The aim is to provide a global perspective of the field and to outline directions for future work.
Submitted 27 November, 2023; v1 submitted 15 March, 2023;
originally announced March 2023.
-
Muon Collider Forum Report
Authors:
K. M. Black,
S. Jindariani,
D. Li,
F. Maltoni,
P. Meade,
D. Stratakis,
D. Acosta,
R. Agarwal,
K. Agashe,
C. Aime,
D. Ally,
A. Apresyan,
A. Apyan,
P. Asadi,
D. Athanasakos,
Y. Bao,
E. Barzi,
N. Bartosik,
L. A. T. Bauerdick,
J. Beacham,
S. Belomestnykh,
J. S. Berg,
J. Berryhill,
A. Bertolin,
P. C. Bhat
, et al. (160 additional authors not shown)
Abstract:
A multi-TeV muon collider offers a spectacular opportunity in the direct exploration of the energy frontier. Offering a combination of unprecedented energy collisions in a comparatively clean leptonic environment, a high energy muon collider has the unique potential to provide both precision measurements and the highest energy reach in one machine that cannot be paralleled by any currently available technology. The topic generated a lot of excitement in Snowmass meetings and continues to attract a large number of supporters, including many from the early career community. In light of this very strong interest within the US particle physics community, Snowmass Energy, Theory and Accelerator Frontiers created a cross-frontier Muon Collider Forum in November of 2020. The Forum has been meeting on a monthly basis and organized several topical workshops dedicated to physics, accelerator technology, and detector R&D. Findings of the Forum are summarized in this report.
Submitted 8 August, 2023; v1 submitted 2 September, 2022;
originally announced September 2022.
-
Identifying Source Code File Experts
Authors:
Otávio Cury,
Guilherme Avelino,
Pedro Santos Neto,
Ricardo Britto,
Marco Túlio Valente
Abstract:
In software development, the identification of source code file experts is an important task. Identifying these experts helps to improve software maintenance and evolution activities, such as developing new features, code reviews, and bug fixes. Although some studies have proposed repository mining techniques to automatically identify source code experts, there are still gaps in this area that can be explored. For example, new variables related to source code knowledge could be investigated, and machine learning could be applied to improve the performance of techniques that identify source code experts. The goal of this study is to investigate opportunities to improve the performance of existing techniques to recommend source code file experts. We built an oracle by collecting data from the development history and surveying developers of 113 software projects. Then, we use this oracle to: (i) analyze the correlation between measures extracted from the development history and the developers' source code knowledge and (ii) investigate the use of machine learning classifiers by evaluating their performance in identifying source code file experts. First Authorship and Recency of Modification are the variables with the highest positive and negative correlations with source code knowledge, respectively. Machine learning classifiers outperformed the linear techniques (F-Measure = 71% to 73%) in the public dataset, but this advantage is not clear in the private dataset, with F-Measure ranging from 55% to 68% for the linear techniques and 58% to 67% for ML techniques. Overall, the linear techniques and the machine learning classifiers achieved similar performance, particularly if we analyze F-Measure. However, machine learning classifiers usually achieve higher precision, while linear techniques obtain the highest recall values.
Submitted 15 August, 2022;
originally announced August 2022.
-
Code Smells in Elixir: Early Results from a Grey Literature Review
Authors:
Lucas Francisco da Matta Vegi,
Marco Tulio Valente
Abstract:
Elixir is a new functional programming language whose popularity is rising in the industry. However, there are few works in the literature focused on studying the internal quality of systems implemented in this language. Particularly, to the best of our knowledge, there is currently no catalog of code smells for Elixir. Therefore, in this paper, through a grey literature review, we investigate whether Elixir developers discuss code smells. Our preliminary results indicate that 11 of the 22 traditional code smells cataloged by Fowler and Beck are discussed by Elixir developers. We also propose a list of 18 new smells specific for Elixir systems and investigate whether these smells are currently identified by Credo, a well-known static code analysis tool for Elixir. We conclude that only two traditional code smells and one Elixir-specific code smell are automatically detected by this tool. Thus, these early results represent an opportunity for extending tools such as Credo to detect code smells and then contribute to improving the internal quality of Elixir systems.
Submitted 16 March, 2022;
originally announced March 2022.
-
Prospects for the Measurement of the Standard Model Higgs Pair Production at the Muon Colliders
Authors:
K. Black,
T. Bose,
S. Dasu,
H. Jia,
S. Lomte,
V. Sharma,
C. Vuosalo,
I. Ojalvo,
T. Holmes,
L. Lee,
M. Swiatlowski,
M. Valente,
J. Oliver
Abstract:
We study the Higgs pair production process at a muon collider using b-pair decays of the Higgs bosons. Efficient identification and good measurement resolution for the b-jet pair invariant mass are crucial for unearthing the di-Higgs signal. However, the beam-induced background has potential to drastically degrade the performance. We report on the full simulation studies of the degradation of the reconstructed b-jet pair invariant mass in di-Higgs events, considering only the beam-induced background in the calorimeter. Mitigation strategies for the suppression of the beam-induced background are underway. We also report prospects for the measurement of the Standard Model Higgs pair production at the Muon Colliders at various benchmarks of the collider center of mass energy and integrated luminosity using a fast simulation program.
Submitted 8 August, 2023; v1 submitted 16 March, 2022;
originally announced March 2022.
-
Simulated Detector Performance at the Muon Collider
Authors:
Nazar Bartosik,
Karol Krizka,
Simone Pagan Griso,
Chiara Aimè,
Aram Apyan,
Mohammed Attia Mahmoud,
Alessandro Bertolin,
Alessandro Braghieri,
Laura Buonincontri,
Simone Calzaferri,
Massimo Casarsa,
Luca Castelli,
Maria Gabriella Catanesi,
Francesco Giovanni Celiberto,
Alessandro Cerri,
Grigorios Chachamis,
Anna Colaleo,
Camilla Curatolo,
Giacomo Da Molin,
Sridhara Dasu,
Dmitri Desinov,
Haluk Denizli,
Biagio Di Micco,
Tommaso Dorigo,
Filippo Errico
, et al. (46 additional authors not shown)
Abstract:
In this paper we report on the current status of studies on the expected performance for a detector designed to operate in a muon collider environment. Beam-induced backgrounds (BIB) represent the main challenge in the design of the detector and the event reconstruction algorithms. The current detector design aims to show that satisfactory performance can be achieved, while further optimizations are expected to significantly improve the overall performance. We present the characterization of the expected beam-induced background, describe the detector design and software used for detailed event simulations taking into account BIB effects. The expected performance of charged-particle reconstruction, jets, electrons, photons and muons is discussed, including an initial study on heavy-flavor jet tagging. A simple method to measure the delivered luminosity is also described. Overall, the proposed design and reconstruction algorithms can successfully reconstruct the high transverse-momentum objects needed to carry out a broad physics program.
Submitted 12 August, 2022; v1 submitted 15 March, 2022;
originally announced March 2022.
-
The Cost of Influence: How Gifts to Physicians Shape Prescriptions and Drug Costs
Authors:
Melissa Newham,
Marica Valente
Abstract:
This paper studies how gifts - monetary or in-kind payments - from drug firms to physicians in the US affect prescriptions and drug costs. We estimate heterogeneous treatment effects by combining physician-level data on antidiabetic prescriptions and payments with causal inference and machine learning methods. We find that payments cause physicians to prescribe more brand drugs, resulting in a cost increase of $30 per dollar received. Responses differ widely across physicians, and are primarily explained by variation in patients' out-of-pocket costs. A gift ban is estimated to decrease drug costs by 3-4%. Taken together, these novel findings reveal how payments shape prescription choices and drive up costs.
Submitted 18 April, 2023; v1 submitted 3 March, 2022;
originally announced March 2022.
-
Towards a Catalog of Composite Refactorings
Authors:
Aline Brito,
Andre Hora,
Marco Tulio Valente
Abstract:
Catalogs of refactoring have key importance in software maintenance and evolution, since developers rely on such documents to understand and perform refactoring operations. Furthermore, these catalogs constitute a reference guide for communication between practitioners since they standardize a common refactoring vocabulary. Fowler's book describes the most popular catalog of refactorings, which documents single and well-known refactoring operations. However, sometimes refactorings are composite transformations, i.e., a sequence of refactorings is performed over a given program element. For example, a sequence of Extract Method operations (a single refactoring) can be performed over the same method, in one or in multiple commits, to simplify its implementation, therefore, leading to a Method Decomposition operation (a composite refactoring). In this paper, we propose and document a catalog with eight composite refactorings. We also implement a set of scripts to mine composite refactorings by preprocessing the results of refactoring detection tools. Using such scripts, we search for composites in a representative refactoring oracle with hundreds of confirmed single refactoring operations. Next, to complement this first study, we also search for composites in the full history of ten well-known open-source projects. We characterize the detected composite refactorings, under dimensions such as size and location. We conclude by addressing the applications and implications of the proposed catalog.
Submitted 15 November, 2022; v1 submitted 12 January, 2022;
originally announced January 2022.
-
RAID: Tool Support for Refactoring-Aware Code Reviews
Authors:
Rodrigo Brito,
Marco Tulio Valente
Abstract:
Code review is a key development practice that contributes to improving software quality and to fostering knowledge sharing among developers. However, code review usually takes time and demands detailed and time-consuming analysis of textual diffs. Particularly, detecting refactorings during code reviews is not a trivial task, since they are not explicitly represented in diffs. For example, a Move Function refactoring is represented by deleted (-) and added (+) lines of code, which can be located in different and distant source code files. To tackle this problem, we introduce RAID, a refactoring-aware and intelligent diff tool. Besides proposing an architecture for RAID, we implemented a Chrome browser plug-in that supports our solution. Then, we conducted a field experiment with eight professional developers who used RAID for three months. We concluded that RAID can reduce the cognitive effort required for detecting and reviewing refactorings in textual diffs. Besides documenting refactorings in diffs, RAID reduces the number of lines required for reviewing such operations. For example, the median number of lines to be reviewed decreases from 14.5 to 2 lines in the case of move refactorings and from 113 to 55 lines in the case of extractions.
Submitted 21 March, 2021;
originally announced March 2021.
-
What Skills do IT Companies look for in New Developers? A Study with Stack Overflow Jobs
Authors:
João Eduardo Montandon,
Cristiano Politowski,
Luciana Lourdes Silva,
Marco Tulio Valente,
Fabio Petrillo,
Yann-Gaël Guéhéneuc
Abstract:
Context: There is a growing demand for information on how IT companies look for candidates for their open positions. Objective: This paper investigates which hard and soft skills are most required in IT companies by analyzing the descriptions of 20,000 job opportunities. Method: We applied open card sorting to perform a high-level analysis of which types of hard skills are most requested. Further, we manually analyzed the most mentioned soft skills. Results: Programming languages are the most demanded hard skills. Communication, collaboration, and problem-solving are the most demanded soft skills. Conclusion: We recommend that developers organize their résumés according to the positions they are applying for. We also highlight the importance of soft skills, as they appear in many job opportunities.
Submitted 4 November, 2020;
originally announced November 2020.
-
Policy evaluation of waste pricing programs using heterogeneous causal effect estimation
Authors:
Marica Valente
Abstract:
Using machine learning methods in a quasi-experimental setting, I study the heterogeneous effects of introducing waste prices - unit prices on household unsorted waste disposal - on waste demands, municipal costs and pollution. Using a unique panel of Italian municipalities with large variation in prices and observables, I show that waste demands are nonlinear. I find evidence of constant elasticities at low prices, and increasing elasticities at high prices driven by income effects and waste habits before policy. The policy reduces waste management costs and pollution in all municipalities after three years of adoption, when prices cause significant waste avoidance.
Submitted 3 November, 2022; v1 submitted 2 October, 2020;
originally announced October 2020.
-
Are Game Engines Software Frameworks? A Three-perspective Study
Authors:
Cristiano Politowski,
Fabio Petrillo,
João Eduardo Montandon,
Marco Tulio Valente,
Yann-Gaël Guéhéneuc
Abstract:
Game engines help developers create video games and avoid duplication of code and effort, like frameworks for traditional software systems. In this paper, we explore open-source game engines along three perspectives: literature, code, and human. First, we explore and summarise the academic literature on game engines. Second, we compare the characteristics of the 282 most popular engines and the 282 most popular frameworks in GitHub. Finally, we survey 124 engine developers about their experience with the development of their engines. We report that: (1) Game engines are not well-studied in software-engineering research, with few studies having engines as the object of research. (2) Open-source game engines are slightly larger in terms of size and complexity and less popular and engaging than traditional frameworks. Their programming languages differ greatly from those of frameworks. Engine projects have shorter histories with fewer releases. (3) Developers perceive game engines as different from traditional frameworks. Generally, they build game engines to (a) better control the environment and source code, (b) learn about game engines, and (c) develop specific games. We conclude that open-source game engines differ from traditional open-source frameworks, although these differences do not demand special treatment.
Submitted 19 September, 2020; v1 submitted 12 April, 2020;
originally announced April 2020.
-
Beyond the Code: Mining Self-Admitted Technical Debt in Issue Tracker Systems
Authors:
Laerte Xavier,
Fabio Ferreira,
Rodrigo Brito,
Marco Tulio Valente
Abstract:
Self-admitted technical debt (SATD) is a particular case of Technical Debt (TD) where developers explicitly acknowledge their sub-optimal implementation decisions. Previous studies mine SATD by searching for specific TD-related terms in source code comments. By contrast, in this paper we argue that developers can admit technical debt by other means, e.g., by creating issues in tracking systems and labelling them as referring to TD. We refer to this type of SATD as issue-based SATD or just SATD-I. We study a sample of 286 SATD-I instances collected from five open source projects, including Microsoft Visual Studio and GitLab Community Edition. We show that only 29% of the studied SATD-I instances can be tracked to source code comments. We also show that SATD-I issues take more time to be closed, compared to other issues, although they are not more complex in terms of code churn. Besides, in 45% of the studied issues TD was introduced to ship earlier, and in almost 60% it refers to Design flaws. Finally, we report that most developers pay SATD-I to reduce its costs or interests (66%). Our findings suggest that there is space for designing novel tools to support technical debt management, particularly tools that encourage developers to create and label issues containing TD concerns.
Submitted 20 March, 2020;
originally announced March 2020.
-
REST vs GraphQL: A Controlled Experiment
Authors:
Gleison Brito,
Marco Tulio Valente
Abstract:
GraphQL is a novel query language for implementing service-based software architectures. The language is gaining momentum and it is now used by major software companies, such as Facebook and GitHub. However, we still lack empirical evidence on the real gains achieved by GraphQL, particularly in terms of the effort required to implement queries in this language. Therefore, in this paper we describe a controlled experiment with 22 students (10 undergraduate and 12 graduate), who were asked to implement eight queries for accessing a web service, using GraphQL and REST. Our results show that GraphQL requires less effort to implement remote service queries when compared to REST (median times of 6 minutes for GraphQL vs 9 minutes for REST). These gains increase when REST queries include more complex endpoints, with several parameters. Interestingly, GraphQL outperforms REST even among more experienced participants (as is the case of graduate students) and among participants with previous experience in REST, but no previous experience in GraphQL.
Submitted 10 March, 2020;
originally announced March 2020.
-
Is this GitHub Project Maintained? Measuring the Level of Maintenance Activity of Open-Source Projects
Authors:
Jailton Coelho,
Marco Tulio Valente,
Luciano Milen,
Luciana L. Silva
Abstract:
Context: GitHub hosts an impressive number of high-quality OSS projects. However, selecting "the right tool for the job" is a challenging task, because we do not have precise information about those high-quality projects. Objective: In this paper, we propose a data-driven approach to measure the level of maintenance activity of GitHub projects. Our goal is to alert users about the risks of using unmaintained projects and possibly motivate other developers to assume the maintenance of such projects. Method: We train machine learning models to define a metric to express the level of maintenance activity of GitHub projects. Next, we analyze the historical evolution of 2,927 active projects in the time frame of one year. Results: From 2,927 active projects, 16% become unmaintained in the interval of one year. We also found that Objective-C projects tend to have lower maintenance activity than projects implemented in other languages. Finally, software tools---such as compilers and editors---have the highest maintenance activity over time. Conclusions: A metric about the level of maintenance activity of GitHub projects can help developers to select open source projects.
Submitted 9 March, 2020;
originally announced March 2020.
-
Refactoring Graphs: Assessing Refactoring over Time
Authors:
Aline Brito,
Andre Hora,
Marco Tulio Valente
Abstract:
Refactoring is an essential activity during software evolution. Frequently, practitioners rely on such transformations to improve source code maintainability and quality. As a consequence, this process may produce new source code entities or change the structure of existing ones. Sometimes, the transformations are atomic, i.e., performed in a single commit. In other cases, they generate sequences of modifications performed over time. To study and reason about refactorings over time, in this paper, we propose a novel concept called refactoring graphs and provide an algorithm to build such graphs. Then, we investigate the history of 10 popular open-source Java-based projects. After eliminating trivial graphs, we characterize a large sample of 1,150 refactoring graphs, providing quantitative data on their size, commits, age, refactoring composition, and developers. We conclude by discussing applications and implications of refactoring graphs, for example, to improve code comprehension, detect refactoring patterns, and support software evolution studies.
Submitted 10 March, 2020;
originally announced March 2020.
-
Report on the ECFA Early-Career Researchers Debate on the 2020 European Strategy Update for Particle Physics
Authors:
N. Andari,
L. Apolinário,
K. Augsten,
E. Bakos,
I. Bellafont,
L. Beresford,
A. Bethani,
J. Beyer,
L. Bianchini,
C. Bierlich,
B. Bilin,
K. L. Bjørke,
E. Bols,
P. A. Brás,
L. Brenner,
E. Brondolin,
P. Calvo,
B. Capdevila,
I. Cioara,
L. N. Cojocariu,
F. Collamati,
A. de Wit,
F. Dordei,
M. Dordevic,
T. A. du Pree
, et al. (96 additional authors not shown)
Abstract:
A group of Early-Career Researchers (ECRs) has been given a mandate from the European Committee for Future Accelerators (ECFA) to debate the topics of the current European Strategy Update (ESU) for Particle Physics and to summarise the outcome in a brief document [1]. A full-day debate with 180 delegates was held at CERN, followed by a survey collecting quantitative input. During the debate, the ECRs discussed future colliders in terms of the physics prospects, their implications for accelerator and detector technology as well as computing and software. The discussion was organised into several topic areas. From these areas two common themes were particularly highlighted by the ECRs: sociological and human aspects; and issues of the environmental impact and sustainability of our research.
Submitted 7 February, 2020;
originally announced February 2020.
-
The Habitable Exoplanet Observatory (HabEx) Mission Concept Study Final Report
Authors:
B. Scott Gaudi,
Sara Seager,
Bertrand Mennesson,
Alina Kiessling,
Keith Warfield,
Kerri Cahoy,
John T. Clarke,
Shawn Domagal-Goldman,
Lee Feinberg,
Olivier Guyon,
Jeremy Kasdin,
Dimitri Mawet,
Peter Plavchan,
Tyler Robinson,
Leslie Rogers,
Paul Scowen,
Rachel Somerville,
Karl Stapelfeldt,
Christopher Stark,
Daniel Stern,
Margaret Turnbull,
Rashied Amini,
Gary Kuan,
Stefan Martin,
Rhonda Morgan
, et al. (161 additional authors not shown)
Abstract:
The Habitable Exoplanet Observatory, or HabEx, has been designed to be the Great Observatory of the 2030s. For the first time in human history, technologies have matured sufficiently to enable an affordable space-based telescope mission capable of discovering and characterizing Earthlike planets orbiting nearby bright sunlike stars in order to search for signs of habitability and biosignatures. Such a mission can also be equipped with instrumentation that will enable broad and exciting general astrophysics and planetary science not possible from current or planned facilities. HabEx is a space telescope with unique imaging and multi-object spectroscopic capabilities at wavelengths ranging from ultraviolet (UV) to near-IR. These capabilities allow for a broad suite of compelling science that cuts across the entire NASA astrophysics portfolio. HabEx has three primary science goals: (1) Seek out nearby worlds and explore their habitability; (2) Map out nearby planetary systems and understand the diversity of the worlds they contain; (3) Enable new explorations of astrophysical systems from our own solar system to external galaxies by extending our reach in the UV through near-IR. This Great Observatory science will be selected through a competed GO program, and will account for about 50% of the HabEx primary mission. The preferred HabEx architecture is a 4m, monolithic, off-axis telescope that is diffraction-limited at 0.4 microns and is in an L2 orbit. HabEx employs two starlight suppression systems: a coronagraph and a starshade, each with their own dedicated instrument.
Submitted 26 January, 2020; v1 submitted 18 January, 2020;
originally announced January 2020.
-
Optical design for CETUS: a wide-field 1.5m aperture UV payload being studied for a NASA probe class mission study
Authors:
Robert A. Woodruff,
William C. Danchi,
Sara R. Heap,
Tony Hull,
Stephen E. Kendrick,
Lloyd R. Purvesb,
Michael S. Rhee,
Eric Mentzell,
Brian Fleming,
Marty Valente,
James Burge,
Ben Lewis,
Kelly Dodson,
Greg Mehle,
Matt Tomic
Abstract:
As part of a study funded by NASA Headquarters, we are developing a Probe-class mission concept called the Cosmic Evolution Through UV Spectroscopy (CETUS). CETUS includes a 1.5-m aperture diameter telescope with a large field-of-view (FOV). CETUS includes three scientific instruments: a Far Ultraviolet (FUV) and Near Ultraviolet (NUV) imaging camera (CAM); a NUV Multi-Object Spectrograph (MOS); and a dual-channel Point Source Spectrograph (PSS) in the Lyman Ultraviolet (LUV), FUV, and NUV spectral regions. The large FOV Three Mirror Anastigmatic (TMA) Optical Telescope Assembly (OTA) simultaneously feeds the three separate scientific instruments. That is, the instruments view separate portions of the TMA image plane, enabling parallel operation of the three instruments. The field viewed by the MOS, whose design is based on an Offner-type spectrographic configuration to provide wide FOV correction, is actively configured to select and isolate numerous field sources using a next-generation Micro-Shutter Array (MSA). The two-channel camera design is also based on an Offner-like configuration. The Point Source Spectrograph (PSS) performs high spectral resolution spectroscopy on unresolved objects over the NUV region with spectral resolving power, R~ 40,000, in an echelle mode. The PSS also performs long-slit imaging spectroscopy at R~ 20,000 in the LUV and FUV spectral regions with two aberration-corrected, blazed, holographic gratings used in a Rowland-like configuration. The optical system also includes two Fine Guidance Sensors (FGS), and Wavefront Sensors (WFS) that sample numerous locations over the full OTA FOV. In-flight wavelength calibration is performed by a Wavelength Calibration System (WCS), and flat-fielding is also performed, both using in-flight calibration sources. This paper will describe the current optical design and the major trade studies leading to the design.
Submitted 13 December, 2019;
originally announced December 2019.
-
Beyond Textual Issues: Understanding the Usage and Impact of GitHub Reactions
Authors:
Hudson Borges,
Rodrigo Brito,
Marco Tulio Valente
Abstract:
Recently, GitHub introduced a new social feature, named reactions, which are "pictorial characters" similar to emoji symbols widely used nowadays in text-based communications. Particularly, GitHub users can use a pre-defined set of such symbols to react to issues and pull requests. However, little is known about the real usage and impact of GitHub reactions. In this paper, we analyze the reactions provided by developers to more than 2.5 million issues and 9.7 million issue comments, in order to answer an extensive list of nine research questions about the usage and adoption of reactions. We show that reactions are being increasingly used by open source developers. Moreover, we also found that issues with reactions usually take more time to be handled and have longer discussions.
Submitted 30 September, 2019;
originally announced October 2019.
-
Software Engineering Meets Deep Learning: A Mapping Study
Authors:
Fabio Ferreira,
Luciana Lourdes Silva,
Marco Tulio Valente
Abstract:
Deep Learning (DL) is being used nowadays in many traditional Software Engineering (SE) problems and tasks. However, since the renaissance of DL techniques is still very recent, we lack works that summarize and condense the most recent and relevant research conducted at the intersection of DL and SE. Therefore, in this paper, we describe the first results of a mapping study covering 81 papers about DL & SE. Our results confirm that DL is gaining momentum among SE researchers over the years and that the top-3 research problems tackled by the analyzed papers are documentation, defect prediction, and testing.
Submitted 4 December, 2020; v1 submitted 25 September, 2019;
originally announced September 2019.
-
How do Developers Promote Open Source Projects?
Authors:
Hudson Borges,
Marco Tulio Valente
Abstract:
Open source projects have an increasing importance in modern software development. For this reason, these projects, as is usual with commercial software projects, should make use of promotion channels to communicate and establish contact with users and contributors. In this article, we study the channels used to promote a set of 100 popular GitHub projects. First, we reveal that Twitter, user meetings, and blogs are the most common promotion channels used by the studied projects. Second, we report a major difference between the studied projects and a random sample of projects, regarding the use of the investigated promotion channels. Third, we show the importance of a popular news aggregation site (Hacker News) in the promotion of open source. We conclude by presenting a set of practical recommendations to open source project managers and leaders, regarding the promotion of their projects.
Submitted 12 August, 2019;
originally announced August 2019.
-
Deep Sensor Fusion for Real-Time Odometry Estimation
Authors:
Michelle Valente,
Cyril Joly,
Arnaud de La Fortelle
Abstract:
Cameras and 2D laser scanners, in combination, are able to provide low-cost, light-weight and accurate solutions, which make their fusion well-suited for many robot navigation tasks. However, correct data fusion depends on precise calibration of the rigid body transform between the sensors. In this paper we present the first framework that makes use of Convolutional Neural Networks (CNNs) for odometry estimation fusing 2D laser scanners and mono-cameras. The use of CNNs provides the tools to not only extract the features from the two sensors, but also to fuse and match them without needing a calibration between the sensors. We transform the odometry estimation into an ordinal classification problem in order to find accurate rotation and translation values between consecutive frames. Results on a real road dataset show that the fusion network runs in real-time and is able to improve the odometry estimation of a single sensor alone by learning how to fuse two different types of data information.
Submitted 31 July, 2019;
originally announced August 2019.
-
On the abandonment and survival of open source projects: An empirical investigation
Authors:
Guilherme Avelino,
Eleni Constantinou,
Marco Tulio Valente,
Alexander Serebrenik
Abstract:
Background: Evolution of open source projects frequently depends on a small number of core developers. The loss of such core developers might be detrimental for projects and even threaten their entire continuation. However, it is possible that new core developers assume the project maintenance and allow the project to survive. Aims: The objective of this paper is to provide empirical evidence on: 1) the frequency of project abandonment and survival, 2) the differences between abandoned and surviving projects, and 3) the motivation and difficulties faced when assuming an abandoned project. Method: We adopt a mixed-methods approach to investigate project abandonment and survival. We carefully select 1,932 popular GitHub projects and recover the abandoned and surviving projects, and conduct a survey with developers that have been instrumental in the survival of the projects. Results: We found that 315 projects (16%) were abandoned and 128 of these projects (41%) survived because of new core developers who assumed the project development. The survey indicates that (i) in most cases the new maintainers were aware of the project abandonment risks when they started to contribute; (ii) their own usage of the systems is the main motivation to contribute to such projects; (iii) human and social factors played a key role when making these contributions; and (iv) lack of time and the difficulty of obtaining push access to the repositories are the main barriers faced by them. Conclusions: Project abandonment is a reality even in large open source projects and our work enables a better understanding of such risks, as well as highlighting ways of avoiding them.
Submitted 19 June, 2019;
originally announced June 2019.
-
Migrating to GraphQL: A Practical Assessment
Authors:
Gleison Brito,
Thais Mombach,
Marco Tulio Valente
Abstract:
GraphQL is a novel query language proposed by Facebook to implement Web-based APIs. In this paper, we present a practical study on migrating API clients to this new technology. First, we conduct a grey literature review to gain an in-depth understanding of the benefits and key characteristics normally associated with GraphQL by practitioners. After that, we assess such benefits in practice, by migrating seven systems to use GraphQL, instead of standard REST-based APIs. As our key result, we show that GraphQL can reduce the size of the JSON documents returned by REST APIs by 94% (in number of fields) and by 99% (in number of bytes), both median results.
Submitted 18 June, 2019;
originally announced June 2019.
-
Identifying Experts in Software Libraries and Frameworks among GitHub Users
Authors:
Joao Eduardo Montandon,
Luciana Lourdes Silva,
Marco Tulio Valente
Abstract:
Software development increasingly depends on libraries and frameworks to increase productivity and reduce time-to-market. Despite this fact, we still lack techniques to assess developers' expertise in widely popular libraries and frameworks. In this paper, we evaluate the performance of unsupervised (based on clustering) and supervised machine learning classifiers (Random Forest and SVM) to identify experts in three popular JavaScript libraries: facebook/react, mongodb/node-mongodb, and socketio/socket.io. First, we collect 13 features about developers' activity on GitHub projects, including commits on source code files that depend on these libraries. We also build a ground truth including the expertise of 575 developers on the studied libraries, as self-reported by them in a survey. Based on our findings, we document the challenges of using machine learning classifiers to predict expertise in software libraries, using features extracted from GitHub. Then, we propose a method to identify library experts based on clustering feature data from GitHub; by triangulating the results of this method with information available on LinkedIn profiles, we show that it is able to recommend dozens of GitHub users with evidence of being experts in the studied JavaScript libraries. We also provide a public dataset with the expertise of 575 developers on the studied libraries.
Submitted 19 March, 2019;
originally announced March 2019.
-
An LSTM Network for Real-Time Odometry Estimation
Authors:
Michelle Valente,
Cyril Joly,
Arnaud de La Fortelle
Abstract:
The use of 2D laser scanners is attractive for the autonomous driving industry because of their accuracy, light weight, and low cost. However, since only a 2D slice of the surrounding environment is detected at each scan, it is a challenge to execute important tasks such as the localization of the vehicle. In this paper we present a novel framework that explores the use of deep Recurrent Convolutional Neural Networks (RCNN) for odometry estimation using only 2D laser scanners. The application of RCNNs provides the tools to not only extract the features of the laser scanner data using Convolutional Neural Networks (CNNs), but also model the possible connections among consecutive scans using the Long Short-Term Memory (LSTM) Recurrent Neural Network. Results on a real road dataset show that the method can run in real-time without using GPU acceleration and has competitive performance compared to other methods, making it an interesting approach that could complement traditional localization systems.
Submitted 22 February, 2019;
originally announced February 2019.
-
What's in a GitHub Star? Understanding Repository Starring Practices in a Social Coding Platform
Authors:
Hudson Borges,
Marco Tulio Valente
Abstract:
Besides a git-based version control system, GitHub integrates several social coding features. Particularly, GitHub users can star a repository, presumably to manifest interest or satisfaction with an open source project. However, the real and practical meaning of starring a project was never the subject of an in-depth and well-founded empirical investigation. Therefore, we provide in this paper a thorough study on the meaning, characteristics, and dynamic growth of GitHub stars. First, by surveying 791 developers, we report that three out of four developers consider the number of stars before using or contributing to a GitHub project. Then, we report a quantitative analysis on the characteristics of the top-5,000 most starred GitHub repositories. We propose four patterns to describe stars growth, which are derived after clustering the time series representing the number of stars of the studied repositories; we also reveal the perception of 115 developers about these growth patterns. To conclude, we provide a list of recommendations to open source project managers (e.g., on the importance of social media promotion) and to GitHub users and Software Engineering researchers (e.g., on the risks faced when selecting projects by GitHub stars).
Submitted 19 November, 2018;
originally announced November 2018.
-
Monorepos: A Multivocal Literature Review
Authors:
Gleison Brito,
Ricardo Terra,
Marco Tulio Valente
Abstract:
Monorepos (Monolithic Repositories) are used by large companies, such as Google and Facebook, and by popular open-source projects, such as Babel and Ember. This study provides an overview on the definition and characteristics of monorepos as well as on their benefits and challenges. Thereupon, we conducted a multivocal literature review on mostly grey literature. Our findings are fourfold. First, monorepos are single repositories that contain multiple projects, related or unrelated, sharing the same dependencies. Second, centralization and standardization are some key characteristics. Third, the main benefits include simplified dependencies, coordination of cross-project changes, and easy refactoring. Fourth, code health, codebase complexity, and tooling investments for both development and execution are considered the main challenges.
Submitted 22 October, 2018;
originally announced October 2018.
-
Identifying Unmaintained Projects in GitHub
Authors:
Jailton Coelho,
Marco Tulio Valente,
Luciana L. Silva,
Emad Shihab
Abstract:
Background: Open source software has an increasing importance in modern software development. However, there is also a growing concern on the sustainability of such projects, which are usually managed by a small number of developers, frequently working as volunteers. Aims: In this paper, we propose an approach to identify GitHub projects that are not actively maintained. Our goal is to alert users about the risks of using these projects and possibly motivate other developers to assume the maintenance of the projects. Method: We train machine learning models to identify unmaintained or sparsely maintained projects, based on a set of features about project activity (commits, forks, issues, etc). We empirically validate the model with the best performance with the principal developers of 129 GitHub projects. Results: The proposed machine learning approach has a precision of 80%, based on the feedback of real open source developers; and a recall of 96%. We also show that our approach can be used to assess the risks of projects becoming unmaintained. Conclusions: The model proposed in this paper can be used by open source users and developers to identify GitHub projects that are not actively maintained anymore.
Submitted 11 September, 2018;
originally announced September 2018.
-
Microservices in Practice: A Survey Study
Authors:
Markos Viggiato,
Ricardo Terra,
Henrique Rocha,
Marco Tulio Valente,
Eduardo Figueiredo
Abstract:
Microservices architectures have become very popular in recent years. However, we still lack empirical evidence about the use of microservices and the practices followed by practitioners. Therefore, in this paper, we report the results of a survey with 122 professionals who work with microservices. We report how the industry is using this architectural style and whether the perception of practitioners regarding the advantages and challenges of microservices is consistent with the literature.
Submitted 14 August, 2018;
originally announced August 2018.
-
CSIndexbr: Exploring the Brazilian Scientific Production in Computer Science
Authors:
Marco Tulio Valente,
Klérisson Paixão
Abstract:
CSIndexbr is a web-based system that provides meaningful, open, and transparent data about Brazilian scientific production in Computer Science. Currently, the system collects full research papers published in the main track of selected conferences. The papers are retrieved from DBLP. In this article, we describe the main features and resources provided by CSIndexbr. We also comment on how other researchers can use the data provided by the system to analyze the Brazilian production in Computer Science.
Submitted 23 July, 2018;
originally announced July 2018.
-
Fusing Laser Scanner and Stereo Camera in Evidential Grid Maps
Authors:
Michelle Valente,
Cyril Joly,
Arnaud de la Fortelle
Abstract:
Automated driving techniques have seen tremendous progress in recent years, particularly due to a better perception of the environment. In order to provide safe yet not too conservative driving in complex urban environments, data fusion should not only consider redundant sensing to characterize the surrounding obstacles, but also be able to describe the uncertainties and errors beyond presence/absence (be it binary or probabilistic). This paper introduces an enriched representation of the world, more precisely of the potential existence of obstacles through an evidential grid map. A method to create this representation from 2 very different sensors, laser scanner and stereo camera, is presented along with algorithms for data fusion and temporal updates. This work allows a better handling of the dynamic aspects of the urban environment and a proper management of errors in order to create a more reliable map. We use the evidential framework based on the Dempster-Shafer theory to model the environment perception by the sensors. A new combination operator is proposed to merge the different sensor grids considering their distinct uncertainties. In addition, we introduce a new long-life layer with high level states that allows the maintenance of a global map of the vehicle's entire trajectory and distinguishes between static and dynamic obstacles. Results on a real road dataset show that the environment mapping data can be improved by adding relevant information that could be missed without the proposed approach.
Submitted 22 February, 2019; v1 submitted 25 May, 2018;
originally announced May 2018.
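
The abstract above models each sensor's view of a grid cell in the Dempster-Shafer framework. As a rough illustration of what fusing two such views means, here is a minimal sketch of the classical Dempster rule of combination for a single cell. This is the textbook rule, not the new combination operator the paper proposes, and the frame `{free, occupied}`, the function name `dempster_combine`, and all mass values are assumptions made up for the example.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with the classical Dempster rule.

    Each mass function maps a frozenset (a focal element) to its mass;
    the masses of one function are expected to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence accumulates on the intersection.
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Disjoint focal elements contribute to the conflict mass.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Renormalise so the remaining masses sum to 1.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Hypothetical frame of discernment for one grid cell.
FREE = frozenset({"free"})
OCCUPIED = frozenset({"occupied"})
UNKNOWN = FREE | OCCUPIED  # mass on the whole frame expresses ignorance

# Made-up masses for one cell as seen by two sensors.
laser = {OCCUPIED: 0.6, UNKNOWN: 0.4}
stereo = {OCCUPIED: 0.5, FREE: 0.2, UNKNOWN: 0.3}

fused = dempster_combine(laser, stereo)
for focal, mass in fused.items():
    print(sorted(focal), round(mass, 4))
```

Mass left on `UNKNOWN` is what distinguishes this representation from a plain occupancy probability: a cell can be "not yet observed" rather than "50% occupied".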
-
Open Source Development Around the World: A Comparative Study
Authors:
Thais Mombach,
Marco Tulio Valente,
Cuiting Chen,
Magiel Bruntink,
Gustavo Pinto
Abstract:
Open source software has an increasing importance in our modern society, providing basic services to other software systems and also supporting the rapid development of a variety of end-user applications. Recently, world-wide code sharing platforms, like GitHub, are also contributing to open source's growth. However, little is known on how this growth is distributed around the world and about the characteristics of the projects developed in different countries. In this article, we provide a characterization of 2,648 open source projects developed in 20 countries. We reveal the number of projects per country, the popularity and programming language of each country's project and also show how the number of projects in a country correlates to its GDP. Finally, we assess the maintainability and internal code quality of the studied projects, using a tool called BetterCodeHub.
Submitted 3 May, 2018;
originally announced May 2018.
-
Why We Engage in FLOSS: Answers from Core Developers
Authors:
Jailton Coelho,
Marco Tulio Valente,
Luciana L. Silva,
Andre Hora
Abstract:
The maintenance and evolution of Free/Libre Open Source Software (FLOSS) projects demand the constant attraction of core developers. In this paper, we report the results of a survey with 52 developers, who recently became core contributors of popular GitHub projects. We reveal their motivations to assume a key role in FLOSS projects (e.g., improving the projects because they are also using it), the project characteristics that most helped in their engagement process (e.g., a friendly community), and the barriers faced by the surveyed core developers (e.g., lack of time of the project leaders). We also compare our results with related studies about others kinds of open source contributors (casual, one-time, and newcomers).
Submitted 15 March, 2018;
originally announced March 2018.
-
Why and How Java Developers Break APIs
Authors:
Aline Brito,
Laerte Xavier,
Andre Hora,
Marco Tulio Valente
Abstract:
Modern software development depends on APIs to reuse code and increase productivity. As most software systems, these libraries and frameworks also evolve, which may break existing clients. However, the main reasons to introduce breaking changes in APIs are unclear. Therefore, in this paper, we report the results of an almost 4-month long field study with the developers of 400 popular Java libraries and frameworks. We configured an infrastructure to observe all changes in these libraries and to detect breaking changes shortly after their introduction in the code. After identifying breaking changes, we asked the developers to explain the reasons behind their decision to change the APIs. During the study, we identified 59 breaking changes, confirmed by the developers of 19 projects. By analyzing the developers' answers, we report that breaking changes are mostly motivated by the need to implement new features, by the desire to make the APIs simpler and with fewer elements, and to improve maintainability. We conclude by providing suggestions to language designers, tool builders, software engineering researchers and API developers.
Submitted 16 January, 2018;
originally announced January 2018.
-
Why Modern Open Source Projects Fail
Authors:
Jailton Coelho,
Marco Tulio Valente
Abstract:
Open source is experiencing a renaissance period, due to the appearance of modern platforms and workflows for developing and maintaining public code. As a result, developers are creating open source software at speeds never seen before. Consequently, these projects are also facing unprecedented mortality rates. To better understand the reasons for the failure of modern open source projects, this paper describes the results of a survey with the maintainers of 104 popular GitHub systems that have been deprecated. We provide a set of nine reasons for the failure of these open source projects. We also show that some maintenance practices -- specifically the adoption of contributing guidelines and continuous integration -- have an important association with a project failure or success. Finally, we discuss and reveal the principal strategies developers have tried to overcome the failure of the studied projects.
Submitted 7 July, 2017;
originally announced July 2017.
-
CodeCity for (and by) JavaScript
Authors:
Marcos Viana,
Andre Hora,
Marco Tulio Valente
Abstract:
JavaScript is one of the most popular programming languages on the web. Despite the language popularity and the increasing size of JavaScript systems, there is a limited number of visualization tools that can be used by developers to comprehend, maintain, and evolve JavaScript software. In this paper, we introduce JSCity, an implementation in JavaScript of the well-known Code City software visualization metaphor. JSCity relies on JavaScript features and libraries to show "software cities" in standard web browsers, without requiring complex installation procedures. We also report our experience on producing visualizations for 40 popular JavaScript systems using JSCity.
Submitted 15 May, 2017;
originally announced May 2017.
-
AngularJS Performance: A Survey Study
Authors:
Miguel Ramos,
Marco Tulio Valente,
Ricardo Terra
Abstract:
AngularJS is a popular JavaScript MVC-based framework to construct single-page web applications. In this paper, we report the results of a survey with 95 professional developers about performance issues of AngularJS applications. We report common practices followed by developers to avoid performance problems (e.g., use of third-party or custom components), the general causes of performance problems in AngularJS applications (e.g., inadequate architecture decisions taken by AngularJS users), and the technical and specific causes of performance problems (e.g., unnecessary processing included in the digest cycle, which is the internal computation that automatically updates the view with changes detected in the model).
Submitted 6 May, 2017;
originally announced May 2017.
-
RefDiff: Detecting Refactorings in Version Histories
Authors:
Danilo Silva,
Marco Tulio Valente
Abstract:
Refactoring is a well-known technique that is widely adopted by software engineers to improve the design and enable the evolution of a system. Knowing which refactoring operations were applied in a code change is a valuable information to understand software evolution, adapt software components, merge code changes, and other applications. In this paper, we present RefDiff, an automated approach that identifies refactorings performed between two code revisions in a git repository. RefDiff employs a combination of heuristics based on static analysis and code similarity to detect 13 well-known refactoring types. In an evaluation using an oracle of 448 known refactoring operations, distributed across seven Java projects, our approach achieved precision of 100% and recall of 88%. Moreover, our evaluation suggests that RefDiff has superior precision and recall than existing state-of-the-art approaches.
Submitted 5 April, 2017;
originally announced April 2017.
-
Assessing Code Authorship: The Case of the Linux Kernel
Authors:
Guilherme Avelino,
Leonardo Passos,
Andre Hora,
Marco Tulio Valente
Abstract:
Code authorship is a key information in large-scale open source systems. Among others, it allows maintainers to assess division of work and identify key collaborators. Interestingly, open-source communities lack guidelines on how to manage authorship. This could be mitigated by setting to build an empirical body of knowledge on how authorship-related measures evolve in successful open-source communities. Towards that direction, we perform a case study on the Linux kernel. Our results show that: (a) only a small portion of developers (26 %) makes significant contributions to the code base; (b) the distribution of the number of files per author is highly skewed --- a small group of top authors (3 %) is responsible for hundreds of files, while most authors (75 %) are responsible for at most 11 files; (c) most authors (62 %) have a specialist profile; (d) authors with a high number of co-authorship connections tend to collaborate with others with less connections.
Submitted 8 March, 2017;
originally announced March 2017.
-
Refactoring Legacy JavaScript Code to Use Classes: The Good, The Bad and The Ugly
Authors:
Leonardo Humberto Silva,
Marco Tulio Valente,
Alexandre Bergel
Abstract:
JavaScript systems are becoming increasingly complex and large. To tackle the challenges involved in implementing these systems, the language is evolving to include several constructions for programming-in-the-large. For example, although the language is prototype-based, the latest JavaScript standard, named ECMAScript 6 (ES6), provides native support for implementing classes. Even though most modern web browsers support ES6, only a very few applications use the class syntax. In this paper, we analyze the process of migrating structures that emulate classes in legacy JavaScript code to adopt the new syntax for classes introduced by ES6. We apply a set of migration rules on eight legacy JavaScript systems. In our study, we document: (a) cases that are straightforward to migrate (the good parts); (b) cases that require manual and ad-hoc migration (the bad parts); and (c) cases that cannot be migrated due to limitations and restrictions of ES6 (the ugly parts). Six out of eight systems (75%) contain instances of bad and/or ugly cases. We also collect the perceptions of JavaScript developers about migrating their code to use the new syntax for classes.
Submitted 5 March, 2017;
originally announced March 2017.
-
Time in the theory of relativity: inertial time, light clocks, and proper time
Authors:
Mario Bacelar Valente
Abstract:
In a way similar to classical mechanics where we have the concept of inertial time as expressed in the motions of bodies, in the (special) theory of relativity we can regard the inertial time as the only notion of time at play. The inertial time is expressed also in the propagation of light. This gives rise to a notion of clock, the light clock, which we can regard as a notion derived from the inertial time. The light clock can be seen as a solution of the theory, which complies with the requirement that a clock to be so must have a rate that is independent of its past history. Contrary to Einstein's view, we do not need the concept of clock as an independent concept. This implies, in particular, that we do not need to rely on the notions of atomic clock or atomic time in the theory of relativity.
Submitted 14 September, 2017; v1 submitted 25 October, 2016;
originally announced October 2016.
-
AngularJS in the Wild: A Survey with 460 Developers
Authors:
Miguel Ramos,
Marco Tulio Valente,
Ricardo Terra,
Gustavo Santos
Abstract:
To implement modern web applications, a new family of JavaScript frameworks has emerged, using the MVC pattern. Among these frameworks, the most popular one is AngularJS, which is supported by Google. In spite of its popularity, there is not a clear knowledge on how AngularJS design and features affect the development experience of Web applications. Therefore, this paper reports the results of a survey about AngularJS, including answers from 460 developers. Our contributions include the identification of the most appreciated features of AngularJS (e.g., custom interface components, dependency injection, and two-way data binding) and the most problematic aspects of the framework (e.g., performance and implementation of directives).
Submitted 27 September, 2016; v1 submitted 5 August, 2016;
originally announced August 2016.
-
Predicting the Popularity of GitHub Repositories
Authors:
Hudson Borges,
Andre Hora,
Marco Tulio Valente
Abstract:
GitHub is the largest source code repository in the world. It provides a git-based source code management platform and also many features inspired by social networks. For example, GitHub users can show appreciation to projects by adding stars to them. Therefore, the number of stars of a repository is a direct measure of its popularity. In this paper, we use multiple linear regressions to predict the number of stars of GitHub repositories. These predictions are useful both to repository owners and clients, who usually want to know how their projects are performing in a competitive open source development market. In a large-scale analysis, we show that the proposed models start to provide accurate predictions after being trained with the number of stars received in the last six months. Furthermore, specific models---generated using data from repositories that share the same growth trends---are recommended for repositories with slow growth and/or for repositories with less stars. Finally, we evaluate the ability to predict not the number of stars of a repository but its rank among the GitHub repositories. We found a very strong correlation between predicted and real rankings (Spearman's rho greater than 0.95).
Submitted 14 July, 2016;
originally announced July 2016.
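
The abstract above evaluates predicted against real repository rankings with Spearman's rho. As a small illustration of that metric only, the sketch below computes it in plain Python as the Pearson correlation of average ranks. The helper names and the star counts are invented for the example and are not taken from the paper.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up star counts for five hypothetical repositories.
actual_stars = [120, 950, 40, 3100, 610]
predicted_stars = [150, 700, 90, 2800, 720]

rho = spearman_rho(actual_stars, predicted_stars)
print(round(rho, 3))
```

Because only the ordering matters, a model can be far off in absolute star counts and still score a high rho, which is why a rank-based measure suits the ranking question posed in the abstract.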
-
Why We Refactor? Confessions of GitHub Contributors
Authors:
Danilo Silva,
Nikolaos Tsantalis,
Marco Tulio Valente
Abstract:
Refactoring is a widespread practice that helps developers to improve the maintainability and readability of their code. However, there is a limited number of studies empirically investigating the actual motivations behind specific refactoring operations applied by developers. To fill this gap, we monitored Java projects hosted on GitHub to detect recently applied refactorings, and asked the developers to explain the reasons behind their decision to refactor the code. By applying thematic analysis on the collected responses, we compiled a catalogue of 44 distinct motivations for 12 well-known refactoring types. We found that refactoring activity is mainly driven by changes in the requirements and much less by code smells. Extract Method is the most versatile refactoring operation serving 11 different purposes. Finally, we found evidence that the IDE used by the developers affects the adoption of automated refactoring tools.
Submitted 8 July, 2016;
originally announced July 2016.