-
Chronicles of CI/CD: A Deep Dive into its Usage Over Time
Authors:
Hugo da Gião,
André Flores,
Rui Pereira,
Jácome Cunha
Abstract:
DevOps is a combination of methodologies and tools that improves the software development, build, deployment, and monitoring processes by shortening its lifecycle and improving software quality. Part of this process is CI/CD, which embodies mostly the first parts, right up to the deployment. Despite the many benefits of DevOps and CI/CD, it still presents many challenges caused by the tremendous proliferation of different tools, languages, and syntaxes, which makes the field quite challenging to learn and keep up to date. Software repositories contain data regarding various software practices, tools, and uses. This data can help gather multiple insights that inform technical and academic decision-making. GitHub is currently the most popular software hosting platform and provides a search API that lets users query its repositories. Our goal with this paper is to gain insights into the technologies developers use for CI/CD by analyzing GitHub repositories. Using a list of the state-of-the-art CI/CD technologies, we use the GitHub search API to find repositories using each of these technologies. We also use the API to extract various insights regarding those repositories. We then organize and analyze the data collected. From our analysis, we provide an overview of the use of CI/CD technologies today, as well as of how that use has evolved over the last 12 years. We also show that developers use several technologies simultaneously in the same project and that changing between technologies is quite common. From these insights, we find several research paths, from how to support the use of multiple technologies, both in terms of techniques and in terms of human-computer interaction, to aiding developers in evolving their CI/CD pipelines, again considering the various dimensions of the problem.
Submitted 27 February, 2024;
originally announced February 2024.
-
Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets
Authors:
Dominique Beaini,
Shenyang Huang,
Joao Alex Cunha,
Zhiyi Li,
Gabriela Moisescu-Pareja,
Oleksandr Dymov,
Samuel Maddrell-Mander,
Callum McLean,
Frederik Wenkel,
Luis Müller,
Jama Hussein Mohamud,
Ali Parviz,
Michael Craig,
Michał Koziarski,
Jiarui Lu,
Zhaocheng Zhu,
Cristian Gabellini,
Kerstin Klaser,
Josef Dean,
Cas Wognum,
Maciej Sypetkowski,
Guillaume Rabusseau,
Reihaneh Rabbany,
Jian Tang,
Christopher Morris
, et al. (10 additional authors not shown)
Abstract:
Recently, pre-trained foundation models have enabled significant advancements in multiple fields. In molecular machine learning, however, where datasets are often hand-curated, and hence typically small, the lack of datasets with labeled features, and codebases to manage those datasets, has hindered the development of foundation models. In this work, we present seven novel datasets categorized by size into three distinct categories: ToyMix, LargeMix and UltraLarge. These datasets push the boundaries in both the scale and the diversity of supervised labels for molecular learning. They cover nearly 100 million molecules and over 3000 sparsely defined tasks, totaling more than 13 billion individual labels of both quantum and biological nature. In comparison, our datasets contain 300 times more data points than the widely used OGB-LSC PCQM4Mv2 dataset, and 13 times more than the quantum-only QM1B dataset. In addition, to support the development of foundational models based on our proposed datasets, we present the Graphium graph machine learning library which simplifies the process of building and training molecular machine learning models for multi-task and multi-level molecular datasets. Finally, we present a range of baseline results as a starting point of multi-task and multi-level training on these datasets. Empirically, we observe that performance on low-resource biological datasets shows improvement by also training on large amounts of quantum data. This indicates that there may be potential in multi-task and multi-level training of a foundation model and fine-tuning it to resource-constrained downstream tasks.
Submitted 18 October, 2023; v1 submitted 6 October, 2023;
originally announced October 2023.
-
Efficient set-theoretic algorithms for computing high-order Forman-Ricci curvature on abstract simplicial complexes
Authors:
Danillo Barros de Souza,
Jonatas T. S. da Cunha,
Fernando A. N. Santos,
Jürgen Jost,
Serafim Rodrigues
Abstract:
Forman-Ricci curvature (FRC) is a potent and powerful tool for analysing empirical networks, as the distribution of the curvature values can identify structural information that is not readily detected by other geometrical methods. Crucially, FRC captures higher-order structural information of clique complexes of a graph or Vietoris-Rips complexes, which is not readily accessible to alternative methods. However, existing FRC platforms are prohibitively computationally expensive. Therefore, herein we develop an efficient set-theoretic formulation for computing such high-order FRC in simplicial complexes. Significantly, our set theory representation reveals previous computational bottlenecks and also accelerates the computation of FRC. Finally, we provide pseudo-code, a software implementation coined FastForman, as well as a benchmark comparison with alternative implementations. We envisage that FastForman will be used in Topological and Geometrical Data analysis for high-dimensional complex data sets. Moreover, our development paves the way for future generalisations towards efficient computations of FRC on cell complexes.
Submitted 9 May, 2024; v1 submitted 22 August, 2023;
originally announced August 2023.
-
A Backend Platform for Supporting the Reproducibility of Computational Experiments
Authors:
Lázaro Costa,
Susana Barbosa,
Jácome Cunha
Abstract:
In recent years, the research community has raised serious questions about the reproducibility of scientific work. In particular, since many studies include some kind of computing work, reproducibility is also a technological challenge, not only in computer science, but in most research domains.
Replicability and computational reproducibility are not easy to achieve, not only because researchers have diverse proficiency in computing technologies, but also because of the variety of computational environments that can be used. Indeed, it is challenging to recreate the same environment using the same frameworks, code, data sources, programming languages, dependencies, and so on.
In this work, we propose an Integrated Development Environment that allows the sharing, configuration, packaging and execution of an experiment by setting the code and data used and defining the programming languages, code, dependencies, databases, or commands to execute to achieve consistent results for each experiment. After the initial creation and configuration, the experiment can be executed any number of times, always producing exactly the same results. Furthermore, it allows the execution of the experiment using a different associated dataset, making it possible to verify the reproducibility and replicability of the results. This allows the creation of a reproducible pack that can be re-executed by anyone on any other computer. Our platform aims to allow researchers in any field to create a reproducibility package for their science that can be re-executed on any other computer.
To evaluate our platform, we used it to reproduce 25 experiments extracted from published papers. We have been able to successfully reproduce 20 (80%) of these experiments achieving the results reported in such works with minimum effort, thus showing that our approach is effective.
Submitted 29 June, 2023;
originally announced August 2023.
-
BlanketGen - A synthetic blanket occlusion augmentation pipeline for MoCap datasets
Authors:
João Carmona,
Tamás Karácsony,
João Paulo Silva Cunha
Abstract:
Human motion analysis has seen drastic improvements recently; however, due to the lack of representative datasets, it is still lagging behind for clinical in-bed scenarios. To address this issue, we implemented BlanketGen, a pipeline that augments videos with synthetic blanket occlusions. With this pipeline, we generated an augmented version of the pose estimation dataset 3DPW called BlanketGen-3DPW. We then used this new dataset to fine-tune a Deep Learning model to improve its performance in these scenarios with promising results. Code and further information are available at https://gitlab.inesctec.pt/brain-lab/brain-lab-public/blanket-gen-releases.
Submitted 19 March, 2023; v1 submitted 21 October, 2022;
originally announced October 2022.
-
BlanketSet -- A clinical real-world in-bed action recognition and qualitative semi-synchronised MoCap dataset
Authors:
João Carmona,
Tamás Karácsony,
João Paulo Silva Cunha
Abstract:
Clinical in-bed video-based human motion analysis is a highly relevant computer vision topic for several biomedical applications. Nevertheless, the main public large datasets (e.g. ImageNet or 3DPW) used for deep learning approaches lack annotated examples for these clinical scenarios. To address this issue, we introduce BlanketSet, an RGB-IR-D action recognition dataset of sequences performed in a hospital bed. This dataset has the potential to help bridge the improvements attained in more general large datasets to these clinical scenarios. Information on how to access the dataset is available at https://rdm.inesctec.pt/dataset/nis-2022-004.
Submitted 19 March, 2023; v1 submitted 7 October, 2022;
originally announced October 2022.
-
Energy Efficiency of Web Browsers in the Android Ecosystem
Authors:
Nélson Gonçalves,
Rui Rua,
Jácome Cunha,
Rui Pereira,
João Saraiva
Abstract:
This paper presents an empirical study regarding the energy consumption of the most used web browsers on the Android ecosystem. In order to properly compare the web browsers in terms of energy consumption, we defined a set of typical usage scenarios to be replicated in the different browsers, executed in the same testing environment and conditions. The results of our study show that there are significant differences in terms of energy consumption among the considered browsers. Furthermore, we conclude that some browsers are energy efficient in several user actions, but energy greedy in other ones, allowing us to conclude that no browser is universally more efficient for all usage scenarios.
Submitted 23 May, 2022;
originally announced May 2022.
-
Green Software Lab: Towards an Engineering Discipline for Green Software
Authors:
Rui Abreu,
Marco Couto,
Luís Cruz,
Jácome Cunha,
João Paulo Fernandes,
Rui Pereira,
Alexandre Perez,
João Saraiva
Abstract:
This report describes the research goals and results of the Green Software Lab (GSL) research project. This was a project funded by Fundação para a Ciência e a Tecnologia (FCT) -- the Portuguese research foundation -- under reference POCI-01-0145-FEDER-016718, that ran from January 2016 till July 2020.
This report includes the complete document reporting the results achieved during the project execution, which was submitted to FCT for evaluation in July 2020. It describes the goals of the project and the different research tasks, presenting the deliverables of each of them. It also presents the management and result dissemination work performed during the project's execution. The document also includes a self-assessment of the achieved results, and a complete list of scientific publications describing the contributions of the project. Finally, this document includes the FCT evaluation report.
Submitted 6 August, 2021;
originally announced August 2021.
-
Valid inequalities, preprocessing, and an effective heuristic for the uncapacitated three-level lot-sizing and replenishment problem with a distribution structure
Authors:
Jesus O. Cunha,
Rafael A. Melo
Abstract:
We consider the uncapacitated three-level lot-sizing and replenishment problem with a distribution structure. In this NP-hard problem, a single production plant sends the produced items to replenish warehouses from where they are dispatched to the retailers in order to satisfy their demands over a finite planning horizon. The goal of the problem is to determine an integrated production and distribution plan minimizing the total costs, which comprise fixed production and transportation setup as well as variable inventory holding costs. We describe new valid inequalities both in the space of a standard mixed integer programming (MIP) formulation and in that of a new alternative extended MIP formulation. We show that using such extended formulation, valid inequalities having similar structures to those in the standard one allow achieving tighter linear relaxation bounds. Furthermore, we propose a preprocessing approach to reduce the size of a multi-commodity MIP formulation and a multi-start randomized bottom-up dynamic programming based heuristic. Computational experiments indicate that the use of the valid inequalities in a branch-and-cut approach significantly increases the ability of a MIP solver to solve instances to optimality. Additionally, the valid inequalities for the extended formulation outperform those for the standard one in terms of number of solved instances, running time and number of enumerated nodes. Moreover, the proposed heuristic is able to generate solutions with considerably low optimality gaps within very short computational times even for large instances. Combining the preprocessing approach with the heuristic, one can achieve an increase in the number of instances solved to optimality within the time limit together with significant reductions in the average times for solving them.
Submitted 10 March, 2021; v1 submitted 3 October, 2020;
originally announced October 2020.
-
On the computational complexity of uncapacitated multi-plant lot-sizing problems
Authors:
J. O. Cunha,
H. H. Kramer,
R. A. Melo
Abstract:
Production and inventory planning have become crucial and challenging in today's competitive industrial and commercial sectors, especially when multiple plants or warehouses are involved. In this context, this paper addresses the complexity of uncapacitated multi-plant lot-sizing problems. We consider a multi-item uncapacitated multi-plant lot-sizing problem with fixed transfer costs and show that two of its very restricted special cases are already NP-hard. Namely, we show that the single-item uncapacitated multi-plant lot-sizing problem with a single period and the multi-item uncapacitated two-plant lot-sizing problem with fixed transfer costs are NP-hard. Furthermore, as a direct implication of the proven results, we also show that a two-echelon multi-item lot-sizing with joint setup costs on transportation is NP-hard.
Submitted 9 March, 2020;
originally announced March 2020.
-
Generation of concept-representative symbols
Authors:
João Miguel Cunha,
Pedro Martins,
Amílcar Cardoso,
Penousal Machado
Abstract:
The visual representation of concepts or ideas through the use of simple shapes has always been explored in the history of Humanity, and it is believed to be the origin of writing. We focus on computational generation of visual symbols to represent concepts. We aim to develop a system that uses background knowledge about the world to find connections among concepts, with the goal of generating symbols for a given concept. We are also interested in exploring the system as an approach to visual dissociation and visual conceptual blending. This has a great potential in the area of Graphic Design as a tool to both stimulate creativity and aid in brainstorming in projects such as logo, pictogram or signage design.
Submitted 28 July, 2017;
originally announced July 2017.
-
A Pig, an Angel and a Cactus Walk Into a Blender: A Descriptive Approach to Visual Blending
Authors:
João M. Cunha,
João Gonçalves,
Pedro Martins,
Penousal Machado,
Amílcar Cardoso
Abstract:
A descriptive approach for automatic generation of visual blends is presented. The implemented system, the Blender, is composed of two components: the Mapper and the Visual Blender. The approach uses structured visual representations along with sets of visual relations which describe how the elements (in which the visual representation can be decomposed) relate among each other. Our system is a hybrid blender, as the blending process starts at the Mapper (conceptual level) and ends at the Visual Blender (visual representation level). The experimental results show that the Blender is able to create analogies from input mental spaces and produce well-composed blends, which follow the rules imposed by its base-analogy and its relations. The resulting blends are visually interesting and some can be considered as unexpected.
Submitted 19 February, 2019; v1 submitted 27 June, 2017;
originally announced June 2017.
-
The Influence of the Java Collection Framework on Overall Energy Consumption
Authors:
Rui Pereira,
Marco Couto,
Jácome Cunha,
João Paulo Fernandes,
João Saraiva
Abstract:
This paper presents a detailed study of the energy consumption of the different Java Collection Framework (JFC) implementations. For each method of an implementation in this framework, we present its energy consumption when handling different amounts of data. Knowing the greenest methods for each implementation, we present an energy optimization approach for Java programs: based on calls to JFC methods in the source code of a program, we select the greenest implementation. Finally, we present preliminary results of optimizing a set of Java programs where we obtained 6.2% energy savings.
Submitted 2 February, 2016;
originally announced February 2016.
-
Towards the Design and Implementation of Aspect-Oriented Programming for Spreadsheets
Authors:
Pedro Maia,
Jorge Mendes,
Jácome Cunha,
Henrique Rebêlo,
João Saraiva
Abstract:
A spreadsheet usually starts as a simple and single-user software artifact, but, as frequently as in other software systems, quickly evolves into a complex system developed by many actors. Often, different users work on different aspects of the same spreadsheet: while a secretary may be only involved in adding plain data to the spreadsheet, an accountant may define new business rules, while an engineer may need to adapt the spreadsheet content so it can be used by other software systems. Unfortunately, spreadsheet systems do not offer modular mechanisms, and as a consequence, some of the previous tasks may be defined by adding intrusive "code" to the spreadsheet.
In this paper we go through the design and implementation of an aspect-oriented language for spreadsheets so that users can work on different aspects of a spreadsheet in a modular way. For example, aspects can be defined in order to introduce new business rules to an existing spreadsheet, or to manipulate the spreadsheet data to be ported to another system. Aspects are defined as aspect-oriented program specifications that are dynamically woven into the underlying spreadsheet by an aspect weaver. In this aspect-oriented style of spreadsheet development, different users develop, or reuse, aspects without adding intrusive code to the original spreadsheet. Such code is added/executed by the spreadsheet weaving mechanism proposed in this paper.
Submitted 11 March, 2015;
originally announced March 2015.
-
Querying Spreadsheets: An Empirical Study
Authors:
Jácome Cunha,
João Paulo Fernandes,
Rui Pereira,
João Saraiva
Abstract:
One of the most important assets of any company is being able to easily access information on itself and on its business. In this line, it has been observed that this important information is often stored in one of the millions of spreadsheets created every year, due to simplicity in using and manipulating such an artifact. Unfortunately, in many cases it is quite difficult to retrieve the intended information from a spreadsheet: information is often stored in a huge unstructured matrix, with no care for readability or comprehensiveness. In an attempt to aid users in the task of extracting information from a spreadsheet, researchers have been working on models, languages and tools to query. In this paper we present an empirical study evaluating such proposals assessing their usage to query spreadsheets. We investigate the use of the Google Query Function, textual model-driven querying, and visual model-driven querying. To compare these different querying approaches we present an empirical study whose results show that the end-users' productivity increases when using model-driven queries, especially using its visual representation.
Submitted 27 February, 2015;
originally announced February 2015.
-
An Empirical Study on End-users Productivity Using Model-based Spreadsheets
Authors:
Laura Beckwith,
Jácome Cunha,
João Paulo Fernandes,
João Saraiva
Abstract:
Spreadsheets are widely used, and studies have shown that most end-user spreadsheets contain nontrivial errors. To improve end-users' productivity, recent research proposes the use of a model-driven engineering approach to spreadsheets. In this paper we conduct the first systematic empirical study to assess the effectiveness and efficiency of this approach. A set of spreadsheet end users worked with two different model-based spreadsheets, and we present and analyze here the results achieved.
Submitted 18 December, 2011;
originally announced December 2011.
-
Control and Debugging of Distributed Programs Using Fiddle
Authors:
Joao Lourenco,
Jose C. Cunha,
Vitor Moreira
Abstract:
The main goal of Fiddle, a distributed debugging engine, is to provide a flexible platform for developing debugging tools. Fiddle provides a layered set of interfaces with a minimal set of debugging functionalities, for the inspection and control of distributed and multi-threaded applications.
This paper illustrates how Fiddle is used to support integrated testing and debugging. The approach described is based on a tool, called Deipa, that interprets sequences of commands read from an input file, generated by an independent testing tool. Deipa acts as a Fiddle client, in order to enforce specific execution paths in a distributed PVM program. Other Fiddle clients may be used along with Deipa for the fine debugging at process level. Fiddle and Deipa functionalities and architectures are described, and a working example shows a step-by-step application of these tools.
Submitted 26 September, 2003;
originally announced September 2003.