-
On the Value of Project Productivity for Early Effort Estimation
Authors:
Mohammad Azzeh,
Ali Bou Nassif,
Yousef Elsheikh,
Lefteris Angelis
Abstract:
In general, estimating software effort using a Use Case Point (UCP) size requires the use of productivity as a second prediction factor. However, there are three drawbacks to this approach: (1) there is no clear procedure for predicting productivity in the early stages, (2) the use of fixed or limited productivity ratios does not allow research to reflect the realities of the software industry, and (3) productivity from historical data is often challenging. The new UCP datasets now available allow us to perform further empirical investigations of the productivity variable in order to estimate the UCP effort. Accordingly, four different prediction models based on productivity were used. The results showed that learning productivity from historical data is more efficient than using classical approaches that rely on default or limited productivity values. In addition, predicting productivity from historical environmental factors is often not accurate. From here we conclude that productivity is an effective factor for estimating the software effort based on the UCP in the presence and absence of previous historical data. Moreover, productivity measurement should be flexible and adjustable when historical data is available.
Submitted 10 May, 2022;
originally announced May 2022.
-
An analysis of open source software licensing questions in Stack Exchange sites
Authors:
Maria Papoutsoglou,
Georgia M. Kapitsaki,
Daniel German,
Lefteris Angelis
Abstract:
Free and open source software is widely used in the creation of software systems, whereas many organisations choose to provide their systems as open source. Open source software carries licenses that determine the conditions under which the original software can be used. Appropriate use of licenses requires relevant expertise by the practitioners, and has an important legal angle. Educators and employers need to ensure that developers have the necessary training to understand licensing risks and how they can be addressed. At the same time, it is important to understand which issues practitioners face when they are using a specific open source license, when they are developing new open source software products or when they are reusing open source software. In this work, we examine questions posed about open source software licensing using data from the following Stack Exchange sites: Stack Overflow, Software Engineering, Open Source and Law. We analyse the indication of specific licenses and topics in the questions, investigate the attention the posts receive and trends over time, whether appropriate answers are provided and which type of questions are asked. Our results indicate that practitioners need, among other, clarifications about licensing specific software when other licenses are used, and for understanding license content. The results of the study can be useful for educators and employers, organisations that are authoring open source software licenses and developers for understanding the issues faced when using licenses, whereas they are relevant to other software engineering research areas, such as software reusability.
Submitted 1 October, 2021;
originally announced October 2021.
-
Towards an Integrated Platform for Big Data Analysis
Authors:
Mahdi Bohlouli,
Frank Schulz,
Lefteris Angelis,
David Pahor,
Ivona Brandic,
David Atlan,
Rosemary Tate
Abstract:
The amount of data in the world is expanding rapidly. Every day, huge amounts of data are created by scientific experiments, companies, and end users' activities. These large data sets have been labeled as "Big Data", and their storage, processing and analysis presents a plethora of new challenges to computer science researchers and IT professionals. In addition to efficient data management, additional complexity arises from dealing with semi-structured or unstructured data, and from time critical processing requirements. In order to understand these massive amounts of data, advanced visualization and data exploration techniques are required. Innovative approaches to these challenges have been developed during recent years, and continue to be a hot topic for research and industry in the future. An investigation of current approaches reveals that usually only one or two aspects are addressed, either in the data management, processing, analysis or visualization. This paper presents the vision of an integrated platform for big data analysis that combines all these aspects. Main benefits of this approach are an enhanced scalability of the whole platform, a better parameterization of algorithms, a more efficient usage of system resources, and an improved usability during the end-to-end data analysis process.
Submitted 26 April, 2020;
originally announced April 2020.
-
A Study of Knowledge Sharing related to Covid-19 Pandemic in Stack Overflow
Authors:
Konstantinos Georgiou,
Nikolaos Mittas,
Lefteris Angelis,
Alexander Chatzigeorgiou
Abstract:
The Covid-19 outbreak, beyond its tragic effects, has changed to an unprecedented extent almost every aspect of human activity throughout the world. At the same time, the pandemic has stimulated an enormous amount of research by scientists across various disciplines, seeking to study the phenomenon itself, its epidemiological characteristics and ways to confront its consequences. Information Technology, and particularly Data Science, drive innovation in all biomedical fields related to Covid-19. Acknowledging that software developers routinely resort to open question and answer communities like Stack Overflow to seek advice on solving technical issues, we have performed an empirical study to investigate the extent, evolution and characteristics of Covid-19 related posts. In particular, through the study of 464 Stack Overflow questions posted mainly in February and March 2020 and leveraging the power of text mining, we attempt to shed light on the interest of developers in Covid-19 related topics and the most popular technological problems for which the users seek information. The findings reveal that indeed this global crisis sparked off an intense and increasing activity in Stack Overflow with most post topics reflecting a strong interest in the analysis of Covid-19 data, primarily using Python technologies.
Submitted 18 April, 2020;
originally announced April 2020.
-
Competence Assessment as an Expert System for Human Resource Management: A Mathematical Approach
Authors:
Mahdi Bohlouli,
Nikolaos Mittas,
George Kakarontzas,
Theodosios Theodosiou,
Lefteris Angelis,
Madjid Fathi
Abstract:
Efficient human resource management needs accurate assessment and representation of available competences as well as effective mapping of required competences for specific jobs and positions. In this regard, appropriate definition and identification of competence gaps express differences between acquired and required competences. Using a detailed quantification scheme together with a mathematical approach is a way to support accurate competence analytics, which can be applied in a wide variety of sectors and fields. This article describes the combined use of software technologies and mathematical and statistical methods for assessing and analyzing competences in human resource information systems. Based on a standard competence model, which is called a Professional, Innovative and Social competence tree, the proposed framework offers flexible tools to experts in real enterprise environments, either for evaluation of employees towards an optimal job assignment and vocational training or for recruitment processes. The system has been tested with real human resource data sets in the frame of the European project called ComProFITS.
Submitted 16 January, 2020;
originally announced January 2020.
-
Cross-study Reliability of the Open Card Sorting Method
Authors:
Christos Katsanos,
Nikolaos Tselios,
Nikolaos Avouris,
Stavros Demetriadis,
Ioannis Stamelos,
Lefteris Angelis
Abstract:
Information architecture forms the foundation of users' navigation experience. Open card sorting is a widely-used method to create information architectures based on users' groupings of the content. However, little is known about the method's cross-study reliability: Does it produce consistent content groupings for similar profile participants involved in different card sort studies? This paper presents an empirical evaluation of the method's cross-study reliability. Six card sorts involving 140 participants were conducted: three open sorts for a travel website, and three for an eshop. Results showed that participants provided highly similar card sorting data for the same content. A rather high agreement of the produced navigation schemes was also found. These findings provide support for the cross-study reliability of the open card sorting method.
Submitted 19 March, 2019;
originally announced March 2019.
-
Discovering patterns of correlation and similarities in software project data with the Circos visualization tool
Authors:
Makrina Viola Kosti,
Sofia Lazaridou,
Nikoleta Bourazani,
Lefteris Angelis
Abstract:
Software cost estimation based on multivariate data from completed projects requires the building of efficient models. These models essentially describe relations in the data, either on the basis of correlations between variables or of similarities between the projects. The continuous growth of the amount of data gathered and the need to perform preliminary analysis in order to discover patterns able to drive the building of reasonable models, leads the researchers towards intelligent and time-saving tools which can effectively describe data and their relationships. The goal of this paper is to suggest an innovative visualization tool, widely used in bioinformatics, which represents relations in data in an aesthetic and intelligent way. In order to illustrate the capabilities of the tool, we use a well known dataset from software engineering projects.
Submitted 6 October, 2011;
originally announced October 2011.
-
DD-EbA: An algorithm for determining the number of neighbors in cost estimation by analogy using distance distributions
Authors:
Makrina Viola Kosti,
Nikolaos Mittas,
Lefteris Angelis
Abstract:
Case Based Reasoning, and particularly Estimation by Analogy, has been used in a number of problem-solving areas, such as cost estimation. Conventional methods, despite the lack of a sound criterion for choosing nearest projects, were based on estimation using a fixed and predetermined number of neighbors from the entire set of historical instances. This approach puts boundaries to the estimation ability of such algorithms, for they do not take into consideration that every project under estimation is unique and requires different handling. The notion of distributions of distances together with a distance metric for distributions help us to adapt the proposed method (we call it DD-EbA) each time to a specific case that is to be estimated without losing in prediction power or computational cost. The results of this paper show that the proposed technique achieves the above idea in a very efficient way.
Submitted 28 December, 2010;
originally announced December 2010.