-
Physically recurrent neural network for rate and path-dependent heterogeneous materials in a finite strain framework
Authors:
M. A. Maia,
I. B. C. M. Rocha,
D. Kovačević,
F. P. van der Meer
Abstract:
In this work, a hybrid physics-based data-driven surrogate model for the microscale analysis of heterogeneous material is investigated. The proposed model benefits from the physics-based knowledge contained in the constitutive models used in the full-order micromodel by embedding them in a neural network. Following previous developments, this paper extends the applicability of the physically recurrent neural network (PRNN) by introducing an architecture suitable for rate-dependent materials in a finite strain framework. In this model, the homogenized deformation gradient of the micromodel is encoded into a set of deformation gradients serving as input to the embedded constitutive models. These constitutive models compute stresses, which are combined in a decoder to predict the homogenized stress, such that the internal variables of the history-dependent constitutive models naturally provide physics-based memory for the network. To demonstrate the capabilities of the surrogate model, we consider a unidirectional composite micromodel with transversely isotropic elastic fibers and elasto-viscoplastic matrix material. The extrapolation properties of the surrogate model trained to replace such micromodel are tested on loading scenarios unseen during training, ranging from different strain-rates to cyclic loading and relaxation. Speed-ups of three orders of magnitude with respect to the runtime of the original micromodel are obtained.
Submitted 5 April, 2024;
originally announced April 2024.
-
A Review of the In-Network Computing and Its Role in the Edge-Cloud Continuum
Authors:
Manel Gherari,
Fatemeh Aghaali Akbari,
Sama Habibi,
Soukaina Ouledsidi Ali,
Zakaria Ait Hmitti,
Youcef Kardjadja,
Muhammad Saqib,
Adyson Magalhaes Maia,
Marsa Rayani,
Ece Gelal Soyak,
Halima Elbiaze,
Ozgur Ercetin,
Yacine Ghamri-Doudane,
Roch Glitho,
Wessam Ajib
Abstract:
Future networks are anticipated to enable exciting applications and industrial services ranging from Multisensory Extended Reality to Holographic and Haptic communication. These services are accompanied by high bandwidth requirements and/or require low latency and high reliability, which leads to the need for scarce and expensive resources. Cloud and edge computing offer different functionalities to these applications that require communication, computing, and caching (3C) resources working collectively. Hence, a paradigm shift is necessary to enable the joint management of the 3Cs in the edge-cloud continuum. We argue that In-Network Computing (INC) is the missing element that completes the edge-cloud continuum. This paper provides a detailed analysis of the driving use-cases, explores the synergy between INC and 3C, and emphasizes the crucial role of INC. A discussion on the opportunities and challenges posed by INC is held from various perspectives, including hardware implementation, architectural design, and regulatory and commercial aspects.
Submitted 4 August, 2023;
originally announced December 2023.
-
How do Developers Improve Code Readability? An Empirical Study of Pull Requests
Authors:
Carlos Eduardo C. Dantas,
Adriano M. Rocha,
Marcelo A. Maia
Abstract:
Readability models and tools have been proposed to measure the effort to read code. However, these models are not completely able to capture the quality improvements in code as perceived by developers. To investigate possible features for new readability models and production-ready tools, we aim to better understand the types of readability improvements performed by developers when actually improving code readability, and to identify discrepancies between the suggestions of automatic static tools and the actual improvements performed by developers. We collected 370 code readability improvements from 284 merged pull requests (PRs) under 109 GitHub repositories and produced a catalog with 26 different types of code readability improvements; in most scenarios, the developers improved the code readability to be more intuitive, modular, and less verbose. Surprisingly, SonarQube only detected 26 out of the 370 code readability improvements. This suggests that part of the catalog produced has not yet been addressed by SonarQube rules, highlighting the potential for improvement in the code readability rules of automatic static analysis tools (ASATs) as they are perceived by developers.
Submitted 5 September, 2023;
originally announced September 2023.
-
Physically recurrent neural networks for path-dependent heterogeneous materials: embedding constitutive models in a data-driven surrogate
Authors:
M. A. Maia,
I. B. C. M. Rocha,
P. Kerfriden,
F. P. van der Meer
Abstract:
Driven by the need to accelerate numerical simulations, the use of machine learning techniques is rapidly growing in the field of computational solid mechanics. Their application is especially advantageous in concurrent multiscale finite element analysis (FE$^2$) due to the exceedingly high computational costs often associated with it and the high number of similar micromechanical analyses involved. To tackle the issue, using surrogate models to approximate the microscopic behavior and accelerate the simulations is a promising and increasingly popular strategy. However, several challenges related to their data-driven nature compromise the reliability of surrogate models in material modeling. The alternative explored in this work is to reintroduce some of the physics-based knowledge of classical constitutive modeling into a neural network by employing the actual material models used in the full-order micromodel to introduce non-linearity. Thus, path-dependency arises naturally since every material model in the layer keeps track of its own internal variables. For the numerical examples, a composite Representative Volume Element with elastic fibers and elasto-plastic matrix material is used as the microscopic model. The network is tested in a series of challenging scenarios and its performance is compared to that of a state-of-the-art Recurrent Neural Network (RNN). A remarkable outcome of the novel framework is the ability to naturally predict unloading/reloading behavior without ever seeing it during training, a stark contrast with popular but data-hungry models such as RNNs. Finally, the proposed network is applied to FE$^2$ examples to assess its robustness for application in nonlinear finite element analysis.
Submitted 15 September, 2022;
originally announced September 2022.
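The idea in the abstract above (material models embedded in a network layer, each keeping its own internal variables) can be illustrated with a deliberately small toy. The sketch below is not the authors' implementation: it is a one-dimensional, small-strain analogue with an elastic-perfectly-plastic point model, and the encoder and decoder weights are random and untrained. All names (`PerfectPlasticity1D`, `ToyPRNN`, `w_enc`, `w_dec`) and all parameter values are assumptions made for illustration.

```python
import numpy as np


class PerfectPlasticity1D:
    """Toy 1D elastic-perfectly-plastic model (hypothetical stand-in for a
    real constitutive model). Its internal variable is the plastic strain."""

    def __init__(self, E=1.0, sigma_y=0.5):
        self.E = E              # elastic stiffness (assumed value)
        self.sigma_y = sigma_y  # yield stress (assumed value)
        self.eps_p = 0.0        # plastic strain: the physics-based memory

    def update(self, eps):
        # Elastic trial stress, then return mapping if the yield limit is exceeded.
        trial = self.E * (eps - self.eps_p)
        overshoot = abs(trial) - self.sigma_y
        if overshoot > 0.0:
            self.eps_p += np.sign(trial) * overshoot / self.E
            trial = np.sign(trial) * self.sigma_y
        return trial


class ToyPRNN:
    """Encoder -> layer of material points -> decoder, with untrained weights."""

    def __init__(self, n_points=3, seed=0):
        rng = np.random.default_rng(seed)
        # Encoder: maps the macroscopic strain to one local strain per point.
        self.w_enc = rng.uniform(0.5, 1.5, size=n_points)
        # Decoder: non-negative weights normalized to sum to one (an assumption).
        raw = rng.uniform(0.1, 1.0, size=n_points)
        self.w_dec = raw / raw.sum()
        self.points = [PerfectPlasticity1D() for _ in range(n_points)]

    def step(self, eps_macro):
        local_strains = self.w_enc * eps_macro
        local_stresses = np.array(
            [m.update(e) for m, e in zip(self.points, local_strains)]
        )
        return float(self.w_dec @ local_stresses)


net = ToyPRNN()
path = [0.0, 0.5, 1.0, 1.5, 1.0, 0.5]  # monotonic loading, then unloading
stresses = [net.step(e) for e in path]
```

In this toy the only state carried from one step to the next is the plastic strain stored in each material point, so the response on the unloading branch differs from the loading branch without any learned recurrent state.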
-
GP-BART: a novel Bayesian additive regression trees approach using Gaussian processes
Authors:
Mateus Maia,
Keefe Murphy,
Andrew C. Parnell
Abstract:
The Bayesian additive regression trees (BART) model is an ensemble method extensively and successfully used in regression tasks due to its consistently strong predictive performance and its ability to quantify uncertainty. BART combines "weak" tree models through a set of shrinkage priors, whereby each tree explains a small portion of the variability in the data. However, the lack of smoothness and the absence of an explicit covariance structure over the observations in standard BART can yield poor performance in cases where such assumptions would be necessary. The Gaussian processes Bayesian additive regression trees (GP-BART) model is an extension of BART which addresses this limitation by assuming Gaussian process (GP) priors for the predictions of each terminal node among all trees. The model's effectiveness is demonstrated through applications to simulated and real-world data, surpassing the performance of traditional modeling approaches in various scenarios.
Submitted 14 September, 2023; v1 submitted 5 April, 2022;
originally announced April 2022.
-
Readability and Understandability of Snippets Recommended by General-purpose Web Search Engines: a Comparative Study
Authors:
Carlos Eduardo C. Dantas,
Marcelo A. Maia
Abstract:
Developers often search for reusable code snippets on general-purpose web search engines like Google, Yahoo! or Microsoft Bing. But some of these code snippets may have poor quality in terms of readability or understandability. In this paper, we present an empirical analysis of the readability and understandability scores of snippets extracted from the web using three independent variables: ranking, general-purpose web search engine, and recommended site. We collected the top-5 recommended sites and their respective code snippet recommendations using Google, Yahoo!, and Bing for 9,480 queries, and evaluated their readability and understandability scores. We found that some recommended sites have significantly better readability and understandability scores than others. The better-ranked code snippet is not necessarily more readable or understandable than a lower-ranked code snippet for all general-purpose web search engines. Moreover, considering the readability score, Google has better-ranked code snippets compared to Yahoo! or Microsoft Bing.
Submitted 13 October, 2021;
originally announced October 2021.
-
Readability and Understandability Scores for Snippet Assessment: an Exploratory Study
Authors:
Carlos Eduardo C. Dantas,
Marcelo A. Maia
Abstract:
Code search engines usually use a readability feature to rank code snippets. There are several metrics to calculate this feature, but developers may have different perceptions about readability. A correlation between readability and understandability features has already been proposed, i.e., developers need to read and comprehend the code snippet syntax, but also understand the semantics. This work investigates scores for understandability and readability features, from the perspective of the possible subjective perception of code snippet comprehension. We find that code snippets with higher readability scores are better comprehended than those with lower scores. The understandability score indicates better comprehension in specific situations, e.g., nested loops or if-else chains. The developers also mentioned writability aspects as the principal characteristic for evaluating code snippet comprehension. These results provide insights for future work on code comprehension score optimization.
Submitted 20 August, 2021;
originally announced August 2021.
-
Improved Retrieval of Programming Solutions With Code Examples Using a Multi-featured Score
Authors:
Rodrigo F. Silva,
M. Masudur Rahman,
Carlos Eduardo Dantas,
Chanchal Roy,
Foutse Khomh,
Marcelo A. Maia
Abstract:
Developers often depend on code search engines to obtain solutions for their programming tasks. However, finding an expected solution containing code examples along with their explanations is challenging due to several issues. There is a vocabulary mismatch between the search keywords (the query) and the appropriate solutions. The semantic gap may increase for similar bags of words due to antonyms and negation. Moreover, documents retrieved by search engines might not contain solutions with both code examples and their explanations. We therefore propose CRAR (Crowd Answer Recommender) to circumvent those issues, aiming to improve the retrieval of relevant answers from Stack Overflow that contain not only the expected code examples for the given task but also their explanations. Given a programming task, we investigate the effectiveness of combining information retrieval techniques with a set of features, including semantic features such as word embeddings and sentence embeddings (for instance, a Convolutional Neural Network (CNN)), to enhance the ranking of important threads (i.e., the units containing questions along with their answers) for the given task and then to select relevant answers contained in those threads. CRAR also leverages social aspects of Stack Overflow discussions, like popularity, to select relevant answers for the tasks. Our experimental evaluation shows that the combination of the different features performs better than each one individually. We also compare the retrieval performance with the state-of-the-art CROKAGE (Crowd Knowledge Answer Generator), which is also a system aimed at retrieving relevant answers from Stack Overflow. We show that CRAR outperforms CROKAGE in Mean Reciprocal Rank and Mean Recall with small and medium effect sizes, respectively.
Submitted 5 August, 2021;
originally announced August 2021.
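For readers unfamiliar with the Mean Reciprocal Rank metric named in the abstract above, the following is a minimal sketch of its standard definition. It is not taken from the CRAR evaluation code; the function name and the convention of using `None` for a query with no relevant answer retrieved are assumptions made for illustration.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(first_relevant_ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all queries.

    Each entry is the 1-based rank of the first relevant answer returned for
    one query, or None if no relevant answer was retrieved (contributes 0).
    """
    if not first_relevant_ranks:
        raise ValueError("at least one query is required")
    total = sum(1.0 / rank for rank in first_relevant_ranks if rank is not None)
    return total / len(first_relevant_ranks)


# Four hypothetical queries: relevant answer at rank 1, rank 2, not found, rank 4.
example_mrr = mean_reciprocal_rank([1, 2, None, 4])  # (1 + 0.5 + 0 + 0.25) / 4
```

The metric rewards systems that place a relevant answer near the top of the list, since the contribution of each query falls off as the reciprocal of the rank.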
-
On the Interplay of Smells Large Class, Complex Class and Duplicate Code
Authors:
Elder Vicente de Paulo Sobrinho,
Marcelo de Almeida Maia
Abstract:
Bad smells have been defined to describe potential problems in code, possibly pointing out refactoring opportunities. Several empirical studies have highlighted that smells have a negative impact on comprehension and maintainability. Consequently, several approaches have been proposed to detect and restructure them. However, studies on the inter-relationship of the occurrence of different types of smells in source code are still lacking, especially those focused on quantifying this inter-relationship. In this work, we aim to understand and quantify the possible inter-relation of the smells Large Class (LC), Complex Class (CC), and Duplicate Code (DC). In particular, we investigate patterns of LC and CC regarding the presence or absence of duplicate code. We conduct a quantitative study on five open source projects, and also a qualitative analysis to measure and understand the association of specific smells. As one of the main results, we highlight that there are "occurrence patterns" among these smells; for example, either in Complex Class or in the co-occurrence of Large Class and Complex Class, clones tend to be more prevalent in highly complex classes than in less complex classes. The patterns found could be used to improve the performance of detection tools or even to help in refactoring tasks.
Submitted 20 July, 2021;
originally announced July 2021.
-
Towards a question answering assistant for software development using a transformer-based language model
Authors:
Liliane do Nascimento Vale,
Marcelo de Almeida Maia
Abstract:
Question answering platforms, such as Stack Overflow, have substantially impacted how developers search for solutions to their programming problems. The crowd knowledge content available from such platforms has also been used to leverage software development tools. Recent advances in Natural Language Processing, specifically in more powerful language models, have demonstrated the ability to enhance text understanding and generation. In this context, we aim to investigate the factors that can influence the application of such models to understanding source-code-related data and to producing more interactive and intelligent assistants for software development. In this preliminary study, we particularly investigate whether a how-to question filter and the level of context in the question may impact the results of a transformer-based question answering model. We suggest that fine-tuning models with a corpus based on how-to questions can positively impact the model, and that more contextualized questions also induce more objective answers.
Submitted 16 March, 2021;
originally announced March 2021.
-
Self-Adaptive Microservice-based Systems -- Landscape and Research Opportunities
Authors:
Messias Filho,
Eliaquim Pimentel,
Wellington Pereira,
Paulo Henrique M. Maia,
Mariela I. Cortés
Abstract:
Microservices have become popular in the past few years, attracting the interest of both academia and industry. Despite its benefits, this new architectural style still poses important challenges, such as resilience, performance and evolution. Self-adaptation techniques have been applied recently as an alternative to solve or mitigate those problems. However, due to the range of quality attributes that affect microservice architectures, many different self-adaptation strategies can be used. Thus, to understand the state of the art in the use of self-adaptation techniques and mechanisms in microservice-based systems, this work conducted a systematic mapping, in which 21 primary studies were analyzed considering qualitative and quantitative research questions. The results show that most studies focus on the Monitor phase (28.57%) of the adaptation control loop, address the self-healing property (23.81%), apply a reactive adaptation strategy (80.95%) at the system infrastructure level (47.62%) and use a centralized approach (38.10%). From those, it was possible to propose some research directions to fill existing gaps.
Submitted 29 March, 2021; v1 submitted 15 March, 2021;
originally announced March 2021.
-
A machine learning approach to galaxy properties: joint redshift-stellar mass probability distributions with Random Forest
Authors:
S. Mucesh,
W. G. Hartley,
A. Palmese,
O. Lahav,
L. Whiteway,
A. F. L. Bluck,
A. Alarcon,
A. Amon,
K. Bechtol,
G. M. Bernstein,
A. Carnero Rosell,
M. Carrasco Kind,
A. Choi,
K. Eckert,
S. Everett,
D. Gruen,
R. A. Gruendl,
I. Harrison,
E. M. Huff,
N. Kuropatkin,
I. Sevilla-Noarbe,
E. Sheldon,
B. Yanny,
M. Aguena,
S. Allam
, et al. (50 additional authors not shown)
Abstract:
We demonstrate that highly accurate joint redshift-stellar mass probability distribution functions (PDFs) can be obtained using the Random Forest (RF) machine learning (ML) algorithm, even with few photometric bands available. As an example, we use the Dark Energy Survey (DES), combined with the COSMOS2015 catalogue for redshifts and stellar masses. We build two ML models: one containing deep photometry in the $griz$ bands, and the second reflecting the photometric scatter present in the main DES survey, with carefully constructed representative training data in each case. We validate our joint PDFs for $10,699$ test galaxies by utilizing the copula probability integral transform and the Kendall distribution function, and their univariate counterparts to validate the marginals. Benchmarked against a basic set-up of the template-fitting code BAGPIPES, our ML-based method outperforms template fitting on all of our predefined performance metrics. In addition to accuracy, the RF is extremely fast, able to compute joint PDFs for a million galaxies in just under $6$ min with consumer computer hardware. Such speed enables PDFs to be derived in real time within analysis codes, solving potential storage issues. As part of this work we have developed GALPRO, a highly intuitive and efficient Python package to rapidly generate multivariate PDFs on-the-fly. GALPRO is documented and available for researchers to use in their cosmology and galaxy evolution studies.
Submitted 19 February, 2021; v1 submitted 10 December, 2020;
originally announced December 2020.
-
MineReduce: an approach based on data mining for problem size reduction
Authors:
Marcelo Rodrigues de Holanda Maia,
Alexandre Plastino,
Puca Huachi Vaz Penna
Abstract:
Hybrid variations of metaheuristics that include data mining strategies have been utilized to solve a variety of combinatorial optimization problems, with superior and encouraging results. Previous hybrid strategies applied mined patterns to guide the construction of initial solutions, leading to more effective exploration of the solution space. Solving a combinatorial optimization problem is usually a hard task because its solution space grows exponentially with its size. Therefore, problem size reduction is also a useful strategy in this context, especially in the case of large-scale problems. In this paper, we build upon these ideas by presenting an approach named MineReduce, which uses mined patterns to perform problem size reduction. We present an application of MineReduce to improve a heuristic for the heterogeneous fleet vehicle routing problem. The results obtained in computational experiments show that this proposed heuristic demonstrates superior performance compared to the original heuristic and other state-of-the-art heuristics, achieving better solution costs with shorter run times.
Submitted 22 May, 2020; v1 submitted 15 May, 2020;
originally announced May 2020.
-
Random Machines Regression Approach: an ensemble support vector regression model with free kernel choice
Authors:
Anderson Ara,
Mateus Maia,
Samuel Macêdo,
Francisco Louzada
Abstract:
Machine learning techniques always aim to reduce the generalized prediction error. To reduce it, ensemble methods offer a good approach, combining several models to achieve greater forecasting capacity. Random Machines have already been demonstrated to be a strong technique, i.e., one with high predictive power, for classification tasks. In this article, we propose a procedure to apply the bagged-weighted support vector model to regression problems. Simulation studies were conducted on artificial datasets and on real data benchmarks. The results showed good performance of Regression Random Machines, with lower generalization error and without the need to choose the best kernel function during the tuning process.
Submitted 27 March, 2020;
originally announced March 2020.
-
Random Machines: A bagged-weighted support vector model with free kernel choice
Authors:
Anderson Ara,
Mateus Maia,
Samuel Macêdo,
Francisco Louzada
Abstract:
Improving statistical learning models to solve classification or regression problems more efficiently is still a goal pursued by the scientific community. In this context, the support vector machine model is one of the most successful and powerful algorithms for those tasks. However, its performance depends directly on the choice of the kernel function and its hyperparameters. The traditional approach to kernel choice and tuning can be computationally expensive. In this article, we propose a novel framework, called Random Machines, to deal with kernel function selection. The results showed improved accuracy and reduced computational time. The study was performed on simulated data and on 27 real benchmarking datasets.
Submitted 21 November, 2019;
originally announced November 2019.
-
Bootstrapping Cookbooks for APIs from Crowd Knowledge on Stack Overflow
Authors:
Lucas B. L. Souza,
Eduardo C. Campos,
Fernanda Madeiral,
Klérisson Paixão,
Adriano M. Rocha,
Marcelo de Almeida Maia
Abstract:
Well-established libraries typically have API documentation. However, they frequently lack examples and explanations, possibly making their effective reuse difficult. Stack Overflow is a question-and-answer website oriented to issues related to software development. Despite the increasing adoption of Stack Overflow, the information related to a particular topic (e.g., an API) is spread across the website. Thus, Stack Overflow still lacks organization of the crowd knowledge available on it. Our goal is to address the problem of poor-quality documentation for APIs by providing an alternative artifact to document them based on the crowd knowledge available on Stack Overflow, called crowd cookbook. A cookbook is a recipe-oriented book, and we refer to our cookbook as crowd cookbook since it contains content generated by a crowd. The cookbooks are meant to be used through an exploration process, i.e., browsing. In this paper, we present a semi-automatic approach that organizes the crowd knowledge available on Stack Overflow to build cookbooks for APIs. We have generated cookbooks for three APIs widely used by the software development community: SWT, LINQ and QT. We have also defined desired properties that crowd cookbooks must meet, and we conducted an evaluation of the cookbooks against these properties with human subjects. The results showed that the cookbooks built using our approach, in general, meet those properties. As a highlight, most of the recipes were considered appropriate to be in the cookbooks and have self-contained information. We concluded that our approach is capable of producing adequate cookbooks automatically, which can be as useful as manually produced cookbooks. This opens an opportunity for API designers to enrich existing cookbooks with the different points of view from the crowd, or even to generate initial versions of new cookbooks.
Submitted 21 March, 2019;
originally announced March 2019.
-
Recommending Comprehensive Solutions for Programming Tasks by Mining Crowd Knowledge
Authors:
Rodrigo F. G. Silva,
Chanchal K. Roy,
Mohammad Masudur Rahman,
Kevin A. Schneider,
Klerisson Paixao,
Marcelo de Almeida Maia
Abstract:
Developers often search for relevant code examples on the web for their programming tasks. Unfortunately, they face two major problems. First, the search is impaired due to a lexical gap between their query (task description) and the information associated with the solution. Second, the retrieved solution may not be comprehensive, i.e., the code segment might miss a succinct explanation. These problems make the developers browse dozens of documents in order to synthesize an appropriate solution. To address these two problems, we propose CROKAGE (Crowd Knowledge Answer Generator), a tool that takes the description of a programming task (the query) and provides a comprehensive solution for the task. Our solutions contain not only relevant code examples but also their succinct explanations. Our proposed approach expands the task description with relevant API classes from Stack Overflow Q&A threads and then mitigates the lexical gap problems. Furthermore, we perform natural language processing on the top quality answers and then return such programming solutions containing code examples and code explanations unlike earlier studies. We evaluate our approach using 48 programming queries and show that it outperforms six baselines including the state-of-art by a statistically significant margin. Furthermore, our evaluation with 29 developers using 24 tasks (queries) confirms the superiority of CROKAGE over the state-of-art tool in terms of relevance of the suggested code examples, benefit of the code explanations and the overall solution quality (code + explanation).
Submitted 20 March, 2019; v1 submitted 18 March, 2019;
originally announced March 2019.
-
Bears: An Extensible Java Bug Benchmark for Automatic Program Repair Studies
Authors:
Fernanda Madeiral,
Simon Urli,
Marcelo Maia,
Martin Monperrus
Abstract:
Benchmarks of bugs are essential to empirically evaluate automatic program repair tools. In this paper, we present Bears, a project for collecting and storing bugs in an extensible bug benchmark for automatic repair studies in Java. The collection of bugs relies on the build state of commits from Continuous Integration (CI) to find potential pairs of buggy and patched program versions in open-source projects hosted on GitHub. Each pair of program versions passes through a pipeline in which an attempt is made to reproduce a bug and its patch. The core step of the reproduction pipeline is the execution of the program's test suite on both program versions. If a test failure is found in the buggy program version candidate and no test failure is found in its patched program version candidate, the bug and its patch are considered successfully reproduced. What makes Bears unique is its use of CI (builds), which has been widely adopted by open-source projects in recent years, to identify buggy and patched program version candidates. This approach allows us to collect bugs from a diversity of projects beyond mature projects that use bug tracking systems. Moreover, Bears was designed to be publicly available and easily extensible by the research community through the automatic creation of branches with bugs in a given GitHub repository, which can be used for pull requests to the Bears repository. In this paper, we present the approach employed by Bears and deliver version 1.0 of Bears, which contains 251 reproducible bugs collected from 72 projects that use the Travis CI and Maven build environment.
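The reproduction criterion described in this abstract can be illustrated with a minimal sketch. It is not the Bears implementation: the `SuiteResult` type and the `is_reproduced` function are hypothetical names introduced here, and the sketch assumes that each test-suite run has already been reduced to the set of test names that failed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuiteResult:
    """Hypothetical summary of one test-suite run on one program version."""
    failed_tests: frozenset


def is_reproduced(buggy: SuiteResult, patched: SuiteResult) -> bool:
    """Return True when the buggy candidate has at least one failing test
    and the patched candidate has none, as stated in the abstract."""
    return len(buggy.failed_tests) > 0 and len(patched.failed_tests) == 0


# Usage: a candidate pair where the patch makes the failing test pass.
buggy_run = SuiteResult(failed_tests=frozenset({"testParseDate"}))
patched_run = SuiteResult(failed_tests=frozenset())
print(is_reproduced(buggy_run, patched_run))
```

A pair in which both versions pass, or in which the patched version still fails, would be rejected by this check.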
Submitted 17 January, 2019;
originally announced January 2019.
-
Towards an automated approach for bug fix pattern detection
Authors:
Fernanda Madeiral,
Thomas Durieux,
Victor Sobreira,
Marcelo Maia
Abstract:
The characterization of bug datasets is essential to support the evaluation of automatic program repair tools. In previous work, we manually studied almost 400 human-written patches (bug fixes) from the Defects4J dataset and annotated them with properties such as repair patterns. However, manually finding these patterns in different datasets is tedious and time-consuming. To address this issue, we designed and implemented PPD, a detector of repair patterns in patches, which performs source code change analysis at the abstract syntax tree level. In this paper, we report on PPD and its evaluation on Defects4J, where we compare the results of the automated detection with the results of the previous manual analysis. We found that PPD has an overall precision of 91% and an overall recall of 92%, and we conclude that PPD has the potential to detect as many repair patterns as manual human analysis.
Submitted 30 July, 2018;
originally announced July 2018.
-
Dissection of a Bug Dataset: Anatomy of 395 Patches from Defects4J
Authors:
Victor Sobreira,
Thomas Durieux,
Fernanda Madeiral,
Martin Monperrus,
Marcelo A. Maia
Abstract:
Well-designed and publicly available datasets of bugs are an invaluable asset for advancing research fields such as fault localization and program repair, as they allow direct and fair comparison between competing techniques as well as the replication of experiments. These datasets need to be deeply understood by researchers: the answers to questions like "which bugs can my technique handle?" and "for which bugs is my technique effective?" depend on the comprehension of properties related to bugs and their patches. However, such properties are usually not included in the datasets, and there is still no widely adopted methodology for characterizing bugs and patches. In this work, we deeply study 395 patches of the Defects4J dataset. Quantitative properties (patch size and spreading) were automatically extracted, whereas qualitative ones (repair actions and patterns) were manually extracted using a thematic analysis-based approach. We found that 1) the median size of Defects4J patches is four lines, and almost 30% of the patches contain only additions of lines; 2) 92% of the patches change only one file, and 38% have no spreading at all; 3) the top-3 most applied repair actions are additions of method calls, conditionals, and assignments, occurring in 77% of the patches; and 4) nine repair patterns were found for 95% of the patches, where the most prevalent, appearing in 43% of the patches, involves conditional blocks. These results are useful for researchers who want to perform advanced analysis of their techniques' results based on Defects4J. Moreover, our set of properties can be used to characterize and compare different bug datasets.
Submitted 5 February, 2018; v1 submitted 19 January, 2018;
originally announced January 2018.
-
On the Interplay between Non-Functional Requirements and Builds on Continuous Integration
Authors:
Klérisson V. R. Paixão,
Crícia Z. Felício,
Fernanda M. Delfim,
Marcelo de A. Maia
Abstract:
Continuous Integration (CI) implies that a whole developer team works together on the mainline of a software project. CI systems automate software builds. Sometimes a developer checks in code that breaks the build. A broken build might not be a problem by itself, but it has the potential to disrupt co-workers and hence affect the performance of the team. In this study, we investigate the interplay between non-functional requirements (NFRs) and build statuses from 1,283 software projects. We found significant differences among NFR-related build statuses. Thus, tools can be proposed to improve CI with a focus on new ways to prevent failures in CI, especially for efficiency- and usability-related builds. Also, the time required to put a broken build back on track follows a bimodal distribution across all NFRs, with higher peaks within a day and lower peaks at six weeks. Our results suggest that a more carefully planned schedule for maintainability in Ruby, and for functionality and reliability in Java, would decrease delays related to broken builds.
Submitted 29 March, 2017; v1 submitted 28 March, 2017;
originally announced March 2017.
-
Demonstration of an Aerial and Submersible Vehicle Capable of Flight and Underwater Navigation with Seamless Air-Water Transition
Authors:
Marco M. Maia,
Parth Soni,
Francisco J. Diez
Abstract:
Bio-inspired vehicles are currently leading the way in the quest to produce a vehicle capable of flight and underwater navigation. However, a fully functional vehicle has not yet been realized. We present the first fully functional vehicle platform operating in air and underwater with seamless transition between the two mediums. These unique capabilities, combined with the hovering, high maneuverability, and reliability of multirotor vehicles, result in a disruptive technology for both civil and military applications, including air/water search and rescue, inspection, repair, and survey missions, among others. The invention was built on a bio-inspired locomotion force analysis that combines flight and swimming. Three main advances in the present work have enabled this invention. The first is the discovery of a seamless transition method between air and underwater. The second is the design of a multi-medium propulsion system capable of efficient operation in air and underwater. The third combines the requirements for lift and thrust for flight (for a given weight) with the requirements for thrust and neutral buoyancy (in water) for swimming. The result is a careful balance between lift, thrust, weight, and neutral buoyancy implemented in the vehicle design. A fully operational prototype demonstrated the flight and underwater navigation capabilities as well as the rapid air/water and water/air transitions.
Submitted 7 July, 2015;
originally announced July 2015.
-
ModularityCheck: A Tool for Assessing Modularity using Co-Change Clusters
Authors:
Luciana Silva,
Daniel Felix,
Marco Tulio Valente,
Marcelo Maia
Abstract:
It is widely accepted that traditional modular structures suffer from the dominant decomposition problem. Therefore, to improve current modularity views, it is important to investigate the impact of design decisions concerning modularity in other dimensions, such as the evolutionary view. In this paper, we propose the ModularityCheck tool to assess package modularity using co-change clusters, which are sets of classes that have usually changed together in the past. Our tool extracts information from version control platforms and issue reports, retrieves co-change clusters, generates metrics related to co-change clusters, and provides visualizations for assessing modularity. We also provide a case study to evaluate the tool. http://youtu.be/7eBYa2dfIS8
Submitted 18 June, 2015;
originally announced June 2015.