-
Granular Privacy Control for Geolocation with Vision Language Models
Authors:
Ethan Mendes,
Yang Chen,
James Hays,
Sauvik Das,
Wei Xu,
Alan Ritter
Abstract:
Vision Language Models (VLMs) are rapidly advancing in their capability to answer information-seeking questions. As these models are widely deployed in consumer applications, they could lead to new privacy risks due to emergent abilities to identify people in photos, geolocate images, etc. As we demonstrate, somewhat surprisingly, current open-source and proprietary VLMs are very capable image geolocators, making widespread geolocation with VLMs an immediate privacy risk, rather than merely a theoretical future concern. As a first step to address this challenge, we develop a new benchmark, GPTGeoChat, to test the ability of VLMs to moderate geolocation dialogues with users. We collect a set of 1,000 image geolocation conversations between in-house annotators and GPT-4v, which are annotated with the granularity of location information revealed at each turn. Using this new dataset, we evaluate the ability of various VLMs to moderate GPT-4v geolocation conversations by determining when too much location information has been revealed. We find that custom fine-tuned models perform on par with prompted API-based models when identifying leaked location information at the country or city level; however, fine-tuning on supervised data appears to be needed to accurately moderate finer granularities, such as the name of a restaurant or building.
Submitted 6 July, 2024;
originally announced July 2024.
-
Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
Authors:
Sam Toyer,
Olivia Watkins,
Ethan Adrian Mendes,
Justin Svegliato,
Luke Bailey,
Tiffany Wang,
Isaac Ong,
Karim Elmaaroufi,
Pieter Abbeel,
Trevor Darrell,
Alan Ritter,
Stuart Russell
Abstract:
While Large Language Models (LLMs) are increasingly being used in real-world applications, they remain vulnerable to prompt injection attacks: malicious third-party prompts that subvert the intent of the system designer. To help researchers study this problem, we present a dataset of over 126,000 prompt injection attacks and 46,000 prompt-based "defenses" against prompt injection, all created by players of an online game called Tensor Trust. To the best of our knowledge, this is currently the largest dataset of human-generated adversarial examples for instruction-following LLMs. The attacks in our dataset have a lot of easily interpretable structure, and shed light on the weaknesses of LLMs. We also use the dataset to create a benchmark for resistance to two types of prompt injection, which we refer to as prompt extraction and prompt hijacking. Our benchmark results show that many models are vulnerable to the attack strategies in the Tensor Trust dataset. Furthermore, we show that some attack strategies from the dataset generalize to deployed LLM-based applications, even though they have a very different set of constraints to the game. We release all data and source code at https://tensortrust.ai/paper
Submitted 2 November, 2023;
originally announced November 2023.
-
Can Language Models be Instructed to Protect Personal Information?
Authors:
Yang Chen,
Ethan Mendes,
Sauvik Das,
Wei Xu,
Alan Ritter
Abstract:
Large multimodal language models have proven transformative in numerous applications. However, these models have been shown to memorize and leak pre-training data, raising serious user privacy and information security concerns. While data leaks should be prevented, it is also crucial to examine the trade-off between the privacy protection and model utility of proposed approaches. In this paper, we introduce PrivQA -- a multimodal benchmark to assess this privacy/utility trade-off when a model is instructed to protect specific categories of personal information in a simulated scenario. We also propose a technique to iteratively self-moderate responses, which significantly improves privacy. However, through a series of red-teaming experiments, we find that adversaries can also easily circumvent these protections with simple jailbreaking methods through textual and/or image inputs. We believe PrivQA has the potential to support the development of new models with improved privacy protections, as well as the adversarial robustness of these protections. We release the entire PrivQA dataset at https://llm-access-control.github.io/.
Submitted 3 October, 2023;
originally announced October 2023.
-
Successful Combination of Database Search and Snowballing for Identification of Primary Studies in Systematic Literature Studies
Authors:
Claes Wohlin,
Marcos Kalinowski,
Katia Romero Felizardo,
Emilia Mendes
Abstract:
Background: A good search strategy is essential for a successful systematic literature study. Historically, database searches have been the norm, later complemented with snowball searches. Our conjecture is that we can perform even better searches by combining the two search approaches, referred to as a hybrid search strategy. Objective: Our main objective was to compare and evaluate a hybrid search strategy. Furthermore, we compared some alternative hybrid search strategies to assess whether it was possible to identify more cost-efficient ways of searching for relevant primary studies. Method: To compare and evaluate the hybrid search strategy, we replicated an SLR on industry-academia collaboration in software engineering. The SLR used a more traditional approach to searching for relevant articles for an SLR, while the replication was conducted using a hybrid search strategy. Results: In our evaluation, the hybrid search strategy was superior in identifying relevant primary studies. It identified 30 percent more primary studies and even more when focusing only on peer-reviewed articles. To embrace individual viewpoints when assessing research articles and minimise the risk of missing primary studies, we introduced two new concepts, wild cards and borderline articles, when conducting systematic literature studies. Conclusions: The hybrid search strategy is a strong contender for being used when conducting systematic literature studies. Furthermore, alternative hybrid search strategies may be viable if selected wisely in relation to the start set for snowballing. Finally, the two new concepts were judged as essential to cater for different individual judgements and to minimise the risk of excluding primary studies that ought to be included.
Submitted 5 July, 2023;
originally announced July 2023.
-
Human-in-the-loop Evaluation for Early Misinformation Detection: A Case Study of COVID-19 Treatments
Authors:
Ethan Mendes,
Yang Chen,
Wei Xu,
Alan Ritter
Abstract:
We present a human-in-the-loop evaluation framework for fact-checking novel misinformation claims and identifying social media messages that support them. Our approach extracts check-worthy claims, which are aggregated and ranked for review. Stance classifiers are then used to identify tweets supporting novel misinformation claims, which are further reviewed to determine whether they violate relevant policies. To demonstrate the feasibility of our approach, we develop a baseline system based on modern NLP methods for human-in-the-loop fact-checking in the domain of COVID-19 treatments. We make our data and detailed annotation guidelines available to support the evaluation of human-in-the-loop systems that identify novel misinformation directly from raw user-generated content.
Submitted 3 July, 2023; v1 submitted 19 December, 2022;
originally announced December 2022.
-
Log severity level classification: an approach for systems in production
Authors:
Eduardo Mendes,
Fabio Petrillo
Abstract:
Context: Logs are often the primary source of information for system developers and operations engineers to understand and diagnose the behavior of a software system in production. In many cases, logs are the only evidence available for fault investigation. Problem: However, the inappropriate choice of log severity level can impact the amount of log data generated and, consequently, quality. This storage overhead can impact the performance of log-based monitoring systems, as excess log data comes with increased aggregate noise, making it challenging to utilize what is actually important when trying to do diagnostics. Goal: This research aims to decrease the overheads of monitoring systems by processing the severity level of log data from systems in production. Approach: To achieve this goal, we intend to deepen the knowledge about the log severity levels and develop an automated approach to log severity level classification, demonstrating that reducing log severity level "noise" improves the monitoring of systems in production. Conclusion: We hope that the set of contributions from this work can improve the monitoring activities of software systems and contribute to the creation of knowledge that improves logging practices.
Submitted 21 December, 2021;
originally announced December 2021.
-
Log severity levels matter: A multivocal mapping
Authors:
Eduardo Mendes,
Fabio Petrillo
Abstract:
The choice of log severity level can be challenging and cause problems in producing reliable logging data. However, there is a lack of specifications and practical guidelines to support this challenge. In this study, we present a multivocal systematic mapping of log severity levels from peer-reviewed literature, logging libraries, and practitioners' views. We analyzed 19 severity levels, 27 studies, and 40 logging libraries. Our results show redundancy and semantic similarity between the levels and a tendency to converge the levels for a total of six levels. Our contributions help leverage the reliability of log entries: (i) mapping the literature about log severity levels, (ii) mapping the severity levels in logging libraries, (iii) a set of synthesized six definitions and four general purposes for severity levels. We recommend that developers use a standard nomenclature, and for logging library creators, we suggest providing accurate and unambiguous definitions of log severity levels.
Submitted 6 December, 2021; v1 submitted 2 September, 2021;
originally announced September 2021.
-
Towards Logging Noisiness Theory: quality aspects to characterize unwanted log entries
Authors:
Eduardo Mendes,
Fabio Petrillo
Abstract:
Context: Logging tasks track the system's functioning by keeping records of evidence that have been analyzed by monitoring and observability activities. For these activities to be effective, it is necessary to consider the quality of the consumed information. Problem: However, the presence of noise - unwanted information - compromises the log files' quality. The noisiness of a log file can be affected, among other things, by: (i) the wrong severity log choices, (ii) the production of duplicate entries, (iii) the incompleteness of the information, (iv) the inappropriate format of the entries, (v) the amount of information generated. Objective: This work aims to broadly define the concept of noise in the context of logging, proposing the initial steps of Logging Noisiness, a theory on quality aspects to characterize unwanted log entries.
Submitted 5 June, 2021;
originally announced June 2021.
-
Understanding the Perceived Relevance of Capability Measures: A Survey of Agile Software Development Practitioners
Authors:
Sai Datta Vishnubhotla,
Emilia Mendes,
Lars Lundberg
Abstract:
Context: In the light of the swift and iterative nature of Agile Software Development (ASD) practices, establishing deeper insights into capability measurement within the context of team formation is crucial, as the capability of individuals and teams can affect team performance and productivity. Although a former Systematic Literature Review (SLR) synthesized the state of the art in relation to capability measurement in ASD with a focus on selecting individuals to agile teams, and capabilities related to team performance and success, determining to what degree the SLR's results apply to practice can provide progressive insights to both research and practice.
Objective: Our study investigates how agile practitioners perceive the relevance of individual and team level measures for characterizing the capability of an agile team and its members. Furthermore, to scrutinize variations in practitioners' perceptions, our study further analyzes perceptions across stratified demographic groups.
Method: We undertook a Web-based survey using a questionnaire built based on the capability measures identified from a previously conducted SLR.
Results: Our survey responses (60) indicate that 127 individual and 28 team capability measures were considered as relevant by the majority of practitioners. We also identified seven individual and one team capability measure that have not been previously characterized by our SLR. The surveyed practitioners suggested that an agile team member's responsibility and questioning skills significantly represent the member's capability.
Conclusion: Results from our survey align with our SLR's findings. Measures associated with social aspects were observed to be dominant compared to technical and innovative aspects. Our results can support agile practitioners in their team composition decisions.
Submitted 20 May, 2021;
originally announced May 2021.
-
Using Visual Text Mining to Support the Study Selection Activity in Systematic Literature Reviews
Authors:
Katia Romero Felizardo,
Norsaremah Salleh,
Rafael M. Martins,
Emília Mendes,
Stephen G. MacDonell,
José Carlos Maldonado
Abstract:
Background: A systematic literature review (SLR) is a methodology used to aggregate all relevant existing evidence to answer a research question of interest. Although crucial, the process used to select primary studies can be arduous, time consuming, and must often be conducted manually. Objective: We propose a novel approach, known as 'Systematic Literature Review based on Visual Text Mining' or simply SLR-VTM, to support the primary study selection activity using visual text mining (VTM) techniques. Method: We conducted a case study to compare the performance and effectiveness of four doctoral students in selecting primary studies manually and using the SLR-VTM approach. To enable the comparison, we also developed a VTM tool that implemented our approach. We hypothesized that students using SLR-VTM would present improved selection performance and effectiveness. Results: Our results show that incorporating VTM in the SLR study selection activity reduced the time spent in this activity and also increased the number of studies correctly included. Conclusions: Our pilot case study presents promising results suggesting that the use of VTM may indeed be beneficial during the study selection activity when performing an SLR.
Submitted 4 February, 2021;
originally announced February 2021.
-
Analysing the use of graphs to represent the results of Systematic Reviews in Software Engineering
Authors:
Katia Romero Felizardo,
Mehwish Riaz,
Muhammad Sulayman,
Emília Mendes,
Stephen G. MacDonell,
José Carlos Maldonado
Abstract:
The presentation of results from Systematic Literature Reviews (SLRs) is generally done using tables. Prior research suggests that results summarized in tables are often difficult for readers to understand. One alternative to improve results' comprehensibility is to use graphical representations. The aim of this work is twofold: first, to investigate whether graph representations result in better comprehensibility than tables when presenting SLR results; second, to investigate whether interpretation using graphs impacts on performance, as measured by the time consumed to analyse and understand the data. We selected an SLR published in the literature and used two different formats to represent its results - tables and graphs, in three different combinations: (i) table format only; (ii) graph format only; and (iii) a mixture of tables and graphs. We conducted an experiment that compared the performance and capability of experts in SLR, as well as doctoral and masters students, in analysing and understanding the results of the SLR, as presented in one of the three different forms. We were interested in examining whether there is a difference between the performance of participants using tables and graphs. The graphical representation of SLR data led to a reduction in the time taken for its analysis, without any loss in data comprehensibility. For our sample the analysis of graphical data proved to be faster than the analysis of tabular data. However, we found no evidence of a difference in comprehensibility whether using tables, graphical format or a combination. Overall we argue that graphs are a suitable alternative to tables when it comes to representing the results of an SLR.
Submitted 4 February, 2021;
originally announced February 2021.
-
Machine Learning Advances for Time Series Forecasting
Authors:
Ricardo P. Masini,
Marcelo C. Medeiros,
Eduardo F. Mendes
Abstract:
In this paper we survey the most recent advances in supervised machine learning and high-dimensional models for time series forecasting. We consider both linear and nonlinear alternatives. Among the linear methods we pay special attention to penalized regressions and ensemble of models. The nonlinear methods considered in the paper include shallow and deep neural networks, in their feed-forward and recurrent versions, and tree-based methods, such as random forests and boosted trees. We also consider ensemble and hybrid models by combining ingredients from different alternatives. Tests for superior predictive ability are briefly reviewed. Finally, we discuss application of machine learning in economics and finance and provide an illustration with high-frequency financial data.
Submitted 9 April, 2021; v1 submitted 23 December, 2020;
originally announced December 2020.
-
A Systematic Mapping on the use of Visual Data Mining to Support the Conduct of Systematic Literature Reviews
Authors:
Katia R. Felizardo,
Stephen G. MacDonell,
Emília Mendes,
José Carlos Maldonado
Abstract:
A systematic literature review (SLR) is a methodology used to find and aggregate all relevant existing evidence about a specific research question of interest. Important decisions need to be made at several points in the review process, relating to search of the literature, selection of relevant primary studies and use of methods of synthesis. Visualization can support tasks that involve large collections of data, such as the studies collected, evaluated and summarized in an SLR. The objective of this paper is to present the results of a systematic mapping study (SM) conducted to collect and evaluate evidence on the use of a specific visualization technique, visual data mining (VDM), to support the SLR process. We reviewed 20 papers and our results indicate a scarcity of research on the use of VDM to help with conducting SLRs in the software engineering domain. However, most of the studies (16 of the 20 studies included in our mapping) have been conducted in the field of medicine and they revealed that the activities of data extraction and data synthesis, related to conducting the review phase of an SLR process, have more VDM support than other activities. In contrast, according to our SM, previous studies using VDM techniques with SLRs have not employed such techniques during the SLR's planning and reporting phases.
Submitted 19 December, 2020;
originally announced December 2020.
-
Guidelines for the Search Strategy to Update Systematic Literature Reviews in Software Engineering
Authors:
Claes Wohlin,
Emilia Mendes,
Katia Romero Felizardo,
Marcos Kalinowski
Abstract:
Context: Systematic Literature Reviews (SLRs) have been adopted within Software Engineering (SE) for more than a decade to provide meaningful summaries of evidence on several topics. Many of these SLRs are now potentially not fully up-to-date, and there are no standard proposals on how to update SLRs in SE. Objective: The objective of this paper is to propose guidelines on how to best search for evidence when updating SLRs in SE, and to evaluate these guidelines using an SLR that was not employed during the formulation of the guidelines. Method: To propose our guidelines, we compare and discuss outcomes from applying different search strategies to identify primary studies in a published SLR, an SLR update, and two replications in the area of effort estimation. These guidelines are then evaluated using an SLR in the area of software ecosystems, its update and a replication. Results: The use of a single iteration forward snowballing with Google Scholar, and employing as a seed set the original SLR and its primary studies is the most cost-effective way to search for new evidence when updating SLRs. Furthermore, the importance of having more than one researcher involved in the selection of papers when applying the inclusion and exclusion criteria is highlighted through the results. Conclusions: Our proposed guidelines formulated based upon an effort estimation SLR, its update and two replications, were supported when using an SLR in the area of software ecosystems, its update and a replication. Therefore, we put forward that our guidelines ought to be adopted for updating SLRs in SE.
Submitted 9 June, 2020;
originally announced June 2020.
-
On the Performance of Hybrid Search Strategies for Systematic Literature Reviews in Software Engineering
Authors:
Erica Mourão,
João Felipe Pimentel,
Leonardo Murta,
Marcos Kalinowski,
Emilia Mendes,
Claes Wohlin
Abstract:
Context: When conducting a Systematic Literature Review (SLR), researchers usually face the challenge of designing a search strategy that appropriately balances result quality and review effort. Using digital library (or database) searches or snowballing alone may not be enough to achieve high-quality results. On the other hand, using both digital library searches and snowballing together may increase the overall review effort.
Objective: The goal of this research is to propose and evaluate hybrid search strategies that selectively combine database searches with snowballing.
Method: We propose four hybrid search strategies combining database searches in digital libraries with iterative, parallel, or sequential backward and forward snowballing. We simulated the strategies over three existing SLRs in SE that adopted both database searches and snowballing. We compared the outcome of digital library searches, snowballing, and hybrid strategies using precision, recall, and F-measure to investigate the performance of each strategy.
Results: Our results show that, for the analyzed SLRs, combining database searches from the Scopus digital library with parallel or sequential snowballing achieved the most appropriate balance of precision and recall.
Conclusion: We put forward that, depending on the goals of the SLR and the available resources, using a hybrid search strategy involving a representative digital library and parallel or sequential snowballing tends to represent an appropriate alternative to be used when searching for evidence in SLRs.
Submitted 20 April, 2020;
originally announced April 2020.
-
When to Update Systematic Literature Reviews in Software Engineering
Authors:
Emilia Mendes,
Claes Wohlin,
Katia Felizardo,
Marcos Kalinowski
Abstract:
[Context] Systematic Literature Reviews (SLRs) have been adopted by the Software Engineering (SE) community for approximately 15 years to provide meaningful summaries of evidence on several topics. Many of these SLRs are now potentially outdated, and there are no systematic proposals on when to update SLRs in SE. [Objective] The goal of this paper is to provide recommendations on when to update SLRs in SE. [Method] We evaluated, using a three-step approach, a third-party decision framework (3PDF) employed in other fields, to decide whether SLRs need updating. First, we conducted a literature review of SLR updates in SE and contacted the authors to obtain their feedback relating to the usefulness of the 3PDF within the context of SLR updates in SE. Second, we used these authors' feedback to see whether the framework needed any adaptation; none was suggested. Third, we applied the 3PDF to the SLR updates identified in our literature review. [Results] The 3PDF showed that 14 of the 20 SLRs did not need updating. This supports the use of a decision support mechanism (such as the 3PDF) to help the SE community decide when to update SLRs. [Conclusions] We put forward that the 3PDF should be adopted by the SE community to keep relevant evidence up to date and to avoid wasting effort with unnecessary updates.
Submitted 13 April, 2020;
originally announced April 2020.
-
Multi-objective Evolutionary Approach to Grey-Box Identification of Buck Converter
Authors:
Faizal Hafiz,
Akshya Swain,
Eduardo M. A. M. Mendes,
Luis Aguirre
Abstract:
The present study proposes a simple grey-box identification approach to model a real DC-DC buck converter operating in continuous conduction mode. The problem associated with the information void in the observed dynamical data, which is often obtained over a relatively narrow input range, is alleviated by exploiting the known static behavior of the buck converter as a priori knowledge. A simple method is developed based on the concept of term clusters to determine the static response of the candidate models. The error in the static behavior is then directly embedded into the multi-objective framework for structure selection. In essence, the proposed approach casts the grey-box identification problem into a multi-objective framework to balance the bias-variance dilemma of model building while explicitly integrating a priori knowledge into the structure selection process. The results of the investigation, considering the case of a practical buck converter, demonstrate that it is possible to identify parsimonious models which can capture both the dynamic and static behavior of the system over a wide input range.
Submitted 20 February, 2020; v1 submitted 10 September, 2019;
originally announced September 2019.
-
Multi-Objective Evolutionary Framework for Non-linear System Identification: A Comprehensive Investigation
Authors:
Faizal Hafiz,
Akshya Swain,
Eduardo MAM Mendes
Abstract:
The present study proposes a multi-objective framework for structure selection of nonlinear systems which are represented by polynomial NARX models. This framework integrates the key components of Multi-Criteria Decision Making (MCDM), which include preference handling, Multi-Objective Evolutionary Algorithms (MOEAs) and a posteriori selection. To this end, three well-known MOEAs, namely NSGA-II, SPEA-II and MOEA/D, are thoroughly investigated to determine whether there is any significant difference in their search performance. The sensitivity of all these MOEAs to various qualitative and quantitative parameters, such as the choice of recombination mechanism, crossover and mutation probabilities, is also studied. These issues are critically analyzed considering seven discrete-time and one continuous-time benchmark nonlinear systems as well as a practical case study of non-linear wave-force modeling. The results of this investigation demonstrate that MOEAs can be tailored to determine the correct structure of nonlinear systems. Further, it has been established through frequency domain analysis that it is possible to identify multiple valid discrete-time models for continuous-time systems. A rigorous statistical analysis of MOEAs via performance sweet spots in the parameter space convincingly demonstrates that these algorithms are robust over a wide range of control parameters.
Submitted 16 August, 2019;
originally announced August 2019.
-
An Empirically Evaluated Checklist for Surveys in Software Engineering
Authors:
Jefferson Seide Molléri,
Kai Petersen,
Emilia Mendes
Abstract:
Context: Over the past decade Software Engineering research has seen a steady increase in survey-based studies, and there are several guidelines providing support for those willing to carry out surveys. The need for auditing survey research has been raised in the literature. Checklists have been used to assess different types of empirical studies, such as experiments and case studies. Objective: This paper proposes a checklist to support the design and assessment of survey-based research in software engineering grounded in existing guidelines for survey research. We further evaluated the checklist in the research practice context. Method: To construct the checklist, we systematically aggregated knowledge from 14 methodological papers supporting survey-based research in software engineering. We identified the key stages of the survey process and its recommended practices through thematic analysis and vote counting. To improve our initially designed checklist we evaluated it using a mixed evaluation approach involving experienced researchers. Results: The evaluation provided insights regarding limitations of the checklist in relation to its understanding and objectivity. In particular, 19 of the 38 checklist items were improved according to the feedback received from its evaluation. Finally, a discussion on how to use the checklist and what its implications are for research practice is also provided. Conclusion: The proposed checklist is an instrument suitable for auditing survey reports as well as a support tool to guide ongoing research with regard to the survey design process.
Submitted 28 January, 2019;
originally announced January 2019.
-
Key Stakeholders' Value Propositions for Feature Selection in Software-intensive Products: An Industrial Case Study
Authors:
Pilar Rodríguez,
Emilia Mendes,
Burak Turhan
Abstract:
Numerous software companies are adopting value-based decision making. However, what does value mean for key stakeholders making decisions? How do different stakeholder groups understand value? Without an explicit understanding of what value means, decisions are subject to ambiguity and vagueness, which are likely to bias them. This case study provides an in-depth analysis of key stakeholders' value propositions when selecting features for a large telecommunications company's software-intensive product. Stakeholders' value propositions were elicited via interviews, which were analyzed using Grounded Theory coding techniques (open and selective coding). Thirty-six value propositions were identified and classified into six dimensions: customer value, market competitiveness, economic value/profitability, cost efficiency, technology & architecture, and company strategy. Our results show that although propositions in the customer value dimension were those mentioned the most, the concept of value for feature selection encompasses a wide range of value propositions. Moreover, stakeholder groups focused on different and complementary value dimensions, highlighting the importance of involving all key stakeholders in the decision-making process. Although our results are particularly relevant to companies similar to the one described herein, they aim to generate a learning process on value-based feature selection for practitioners and researchers in general.
Submitted 30 October, 2018;
originally announced October 2018.
-
A Systematic Study of Cross-Project Defect Prediction With Meta-Learning
Authors:
Faimison Porto,
Leandro Minku,
Emilia Mendes,
Adenilso Simao
Abstract:
The prediction of defects in a target project based on data from external projects is called Cross-Project Defect Prediction (CPDP). Several methods have been proposed to improve the predictive performance of CPDP models. However, there is a lack of comparison among state-of-the-art methods. Moreover, previous work has shown that the most suitable method for a project can vary according to the project being predicted. This makes the choice of which method to use difficult. We provide an extensive experimental comparison of 31 CPDP methods derived from state-of-the-art approaches, applied to 47 versions of 15 open source software projects. Four methods stood out as presenting the best performances across datasets. However, the most suitable among these methods still varies according to the project being predicted. Therefore, we propose and evaluate a meta-learning solution designed to automatically select and recommend the most suitable CPDP method for a project. Our results show that the meta-learning solution is able to learn from previous experiences and recommend suitable methods dynamically. When compared to the base methods, however, the proposed solution showed only a minor difference in performance. These results provide valuable knowledge about the possibilities and limitations of a meta-learning solution applied to CPDP.
Submitted 31 May, 2019; v1 submitted 16 February, 2018;
originally announced February 2018.