-
Requirements for Organizational Resilience: Engineering Developer Happiness
Authors:
Markus Borg,
Daniel Graziotin
Abstract:
Can the right requirements boost developer satisfaction and happiness? We believe they can. In keeping with this issue's theme, "Well-Being for Resilience: Developers Thrive," we discuss the connection between three keywords: well-being, resilience, and thriving. How could requirements engineering foster these qualities? While there hasn't been much research on this topic, we see opportunities for future work. Let's initiate the discussion!
Submitted 6 June, 2024;
originally announced June 2024.
-
Volume diffusion modelling of a sheared granular gas
Authors:
Duncan Dockar,
M. H. Lakshminarayana Reddy,
Matthew K. Borg,
S. Kokou Dadzie
Abstract:
Continuum fluid dynamic models based on the Navier-Stokes equations have previously been used to simulate granular media undergoing fluid-like shearing. These models, however, typically fail to predict the flow behaviour in confined environments, as non-equilibrium particle effects dominate near walls. We adapt an extended hydrodynamic model for granular flows, which uses a density-gradient-dependent "volume diffusion" term to correct the viscous stress tensor and heat flux, to simulate the shearing of a granular gas between two rough walls with corresponding boundary conditions. We use our volume diffusion model to predict channel flows for mean volume fractions in the range $\bar{\phi} = 0.01$ to $0.4$ and inter-particle coefficients of restitution $e = 0.8$ and $0.9$, and compare the results with Discrete Element Method (DEM) simulations and the classical Navier-Stokes equations. Our model is capable of predicting non-uniform pressure, volume fraction, and granular temperature profiles. These non-uniformities become more significant for cases with mean volume fraction $\bar{\phi} \sim 0.1$, in which we typically observe non-uniform peak density variations and large volume fraction gradients.
Submitted 2 March, 2024;
originally announced March 2024.
-
Evaluation of Out-of-Distribution Detection Performance on Autonomous Driving Datasets
Authors:
Jens Henriksson,
Christian Berger,
Stig Ursing,
Markus Borg
Abstract:
Safety measures need to be systematically investigated to determine to what extent they evaluate the intended performance of Deep Neural Networks (DNNs) for critical applications. Due to a lack of verification methods for high-dimensional DNNs, a trade-off is needed between accepted performance and handling of out-of-distribution (OOD) samples.
This work evaluates rejecting outputs from semantic segmentation DNNs by applying a Mahalanobis distance (MD) based on the most probable class-conditional Gaussian distribution for the predicted class as an OOD score. The evaluation covers three DNNs trained on the Cityscapes dataset and tested on four automotive datasets, and finds that classification risk can be drastically reduced at the cost of pixel coverage, even when applied to unseen datasets. Our findings will support legitimizing safety measures and motivate their use when arguing for the safe usage of DNNs in automotive perception.
Submitted 30 January, 2024;
originally announced January 2024.
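A minimal NumPy sketch of a Mahalanobis-distance OOD score, assuming per-class means and a single shared (tied) covariance estimated from training features; the abstract does not say whether the covariance is shared or class-specific, and the function names and the fixed `threshold` are hypothetical. Raising the threshold keeps more pixels (higher coverage) but admits riskier predictions, which is the trade-off the abstract reports.

```python
import numpy as np

def fit_class_gaussians(features, labels, num_classes):
    """Estimate per-class means and a shared (tied) covariance.

    features: (N, D) array of feature vectors, e.g. per-pixel embeddings.
    labels:   (N,) array of integer class labels.
    """
    means = np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])
    centered = features - means[labels]          # subtract each sample's class mean
    cov = centered.T @ centered / len(features)  # pooled covariance over all classes
    precision = np.linalg.pinv(cov)              # pseudo-inverse guards against singularity
    return means, precision

def mahalanobis_score(x, predicted_class, means, precision):
    """Squared Mahalanobis distance of x to the Gaussian of its predicted class."""
    diff = x - means[predicted_class]
    return float(diff @ precision @ diff)

def accept_prediction(x, predicted_class, means, precision, threshold):
    """Keep the prediction only if its OOD score does not exceed the threshold."""
    return mahalanobis_score(x, predicted_class, means, precision) <= threshold
```

Rejected pixels would then be reported as "unknown" rather than assigned the predicted class.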
-
Increasing, not Diminishing: Investigating the Returns of Highly Maintainable Code
Authors:
Markus Borg,
Ilyana Pruvost,
Enys Mones,
Adam Tornhill
Abstract:
Understanding and effectively managing Technical Debt (TD) remains a vital challenge in software engineering. While many studies on code-level TD have been published, few illustrate the business impact of low-quality source code. In this study, we combine two publicly available datasets to study the association between code quality on the one hand, and defect count and implementation time on the other hand. We introduce a value-creation model, derived from regression analyses, to explore relative changes from a baseline. Our results show that the associations vary across different intervals of code quality. Furthermore, the value model suggests strong non-linearities at the extremes of the code quality spectrum. Most importantly, the model suggests amplified returns on investment in the upper end. We discuss the findings within the context of the "broken windows" theory and recommend that organizations diligently prevent the introduction of code smells in files with high churn. Finally, we argue that the value-creation model can be used to initiate discussions regarding the return on investment in refactoring efforts.
Submitted 24 January, 2024;
originally announced January 2024.
-
Quality Requirements for Code: On the Untapped Potential in Maintainability Specifications
Authors:
Markus Borg
Abstract:
Quality requirements are critical for successful software engineering, with maintainability being a key internal quality. Despite significant attention in software metrics research, maintainability has attracted surprisingly little focus in the Requirements Engineering (RE) community. This position paper proposes a synergistic approach, combining code-oriented research with RE expertise, to create meaningful industrial impact. We introduce six illustrative use cases and propose three future research directions. Preliminary findings indicate that the established QUPER model, designed for setting quality targets, does not adequately address the unique aspects of maintainability.
Submitted 19 January, 2024;
originally announced January 2024.
-
A DSMC-CFD coupling method using surrogate modelling for low-speed rarefied gas flows
Authors:
Giorgos Tatsios,
Arun K. Chinnappan,
Arshad Kamal,
Nikolaos Vasileiadis,
Stephanie Y. Docherty,
Craig White,
Livio Gibelli,
Matthew K. Borg,
James R. Kermode,
Duncan A. Lockerby
Abstract:
A new Micro-Macro-Surrogate (MMS) hybrid method is presented that couples the Direct Simulation Monte Carlo (DSMC) method with Computational Fluid Dynamics (CFD) to simulate low-speed rarefied gas flows. The proposed MMS method incorporates surrogate modelling instead of direct coupling of DSMC data with the CFD, addressing the limitations CFD has in accurately modelling rarefied gas flows, the computational cost of DSMC for low-speed and multiscale flows, as well as the pitfalls of noise in conventional direct coupling approaches. The surrogate models, trained on the DSMC data using Bayesian inference, provide noise-free and accurate corrections to the CFD simulation, enabling it to capture the non-continuum physics. The MMS hybrid approach is validated by simulating low-speed, force-driven rarefied gas flows in a canonical parallel-plate system and shows excellent agreement with DSMC benchmark results. A comparison with the typical domain decomposition DSMC-CFD hybrid method is also presented, to demonstrate the advantages of noise-avoidance in the proposed approach. The method also inherently captures the uncertainty arising from micro-model fluctuations, allowing for the quantification of noise-related uncertainty in the predictions. The proposed MMS method demonstrates the potential to enable multiscale simulations where CFD is inaccurate and DSMC is prohibitively expensive.
Submitted 23 November, 2023;
originally announced November 2023.
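To illustrate how a surrogate trained by Bayesian inference can turn noisy samples into a smooth correction with an uncertainty estimate, here is a sketch using conjugate Bayesian linear regression on a polynomial basis. This is a generic stand-in, not the surrogate form used in the paper: the basis, the prior precision `alpha`, the assumed noise precision `beta`, and the idea of indexing the correction by a single scalar input are all hypothetical.

```python
import numpy as np

def design_matrix(x, degree):
    """Polynomial basis [1, x, x^2, ...] for each input (hypothetical basis choice)."""
    return np.vander(np.asarray(x, dtype=float), degree + 1, increasing=True)

def fit_bayesian_linear(x, t, degree=2, alpha=1e-3, beta=100.0):
    """Conjugate Bayesian linear regression.

    alpha: prior precision of the weights; beta: assumed noise precision of the samples.
    Returns the posterior mean and covariance of the weights.
    """
    phi = design_matrix(x, degree)
    posterior_precision = alpha * np.eye(degree + 1) + beta * phi.T @ phi
    cov = np.linalg.inv(posterior_precision)
    mean = beta * cov @ phi.T @ np.asarray(t, dtype=float)
    return mean, cov

def predict(x_new, mean, cov, beta=100.0):
    """Posterior predictive mean and standard deviation at new inputs."""
    phi = design_matrix(x_new, len(mean) - 1)
    mu = phi @ mean                                   # smooth, noise-free estimate
    var = 1.0 / beta + np.sum(phi @ cov * phi, axis=1)  # noise + parameter uncertainty
    return mu, np.sqrt(var)
```

The predictive mean would be passed to the continuum solver in place of raw particle samples, while the standard deviation quantifies the noise-related uncertainty.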
-
Summary of the 4th International Workshop on Requirements Engineering and Testing (RET 2017)
Authors:
Markus Borg,
Elizabeth Bjarnason,
Michael Unterkalmsteiner,
Tingting Yu,
Gregory Gay,
Michael Felderer
Abstract:
The RET (Requirements Engineering and Testing) workshop series provides a meeting point for researchers and practitioners from the two separate fields of Requirements Engineering (RE) and Testing. The long-term aim is to build a community and a body of knowledge within the intersection of RE and Testing, i.e., RET. The 4th workshop was co-located with the 25th International Requirements Engineering Conference (RE'17) in Lisbon, Portugal, and attracted about 20 participants. In line with the previous workshop instances, RET 2017 offered an interactive setting with a keynote, an invited talk, paper presentations, and a concluding hands-on exercise.
Submitted 29 August, 2023;
originally announced August 2023.
-
A multi-case study of agile requirements engineering and the use of test cases as requirements
Authors:
Elizabeth Bjarnason,
Michael Unterkalmsteiner,
Markus Borg,
Emelie Engström
Abstract:
Context: It is an enigma that agile projects can succeed 'without requirements' when weak requirements engineering is a known cause for project failures. While agile development projects often manage well without extensive requirements, test cases are commonly viewed as requirements and detailed requirements are documented as test cases. Objective: We have investigated this agile practice of using test cases as requirements to understand how test cases can support the main requirements activities, and how this practice varies. Method: We performed an iterative case study at three companies and collected data through 14 interviews and two focus groups. Results: The use of test cases as requirements poses both benefits and challenges when eliciting, validating, verifying, and managing requirements, and when used as a documented agreement. We have identified five variants of the test-cases-as-requirements practice, namely de facto, behaviour-driven, story-test driven, stand-alone strict, and stand-alone manual. The application of the practice varies across these variants in terms of the time frame of requirements documentation, the requirements format, the extent to which the test cases are a machine-executable specification, and the use of tools that specifically support the practice of using test cases as requirements. Conclusions: The findings provide empirical insight into how agile development projects manage and communicate requirements. The identified variants of the practice of using test cases as requirements can be used to perform in-depth investigations into agile requirements engineering. Practitioners can use the provided recommendations as a guide in designing and improving their agile requirements practices based on project characteristics such as number of stakeholders and rate of change.
Submitted 22 August, 2023;
originally announced August 2023.
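As a concrete picture of the behaviour-driven variant, where a test case doubles as a machine-executable requirement, consider the sketch below. The requirement, the `LoginService` class, and the lockout rule are invented for illustration; the Given/When/Then comments mirror the structure that behaviour-driven tools typically impose.

```python
class LoginService:
    """Hypothetical system under test."""
    MAX_ATTEMPTS = 3

    def __init__(self, password):
        self._password = password
        self.failed_attempts = 0
        self.locked = False

    def login(self, password):
        if self.locked:
            return False
        if password == self._password:
            self.failed_attempts = 0  # a successful login resets the counter
            return True
        self.failed_attempts += 1
        if self.failed_attempts >= self.MAX_ATTEMPTS:
            self.locked = True
        return False

def test_account_locks_after_three_failed_logins():
    # Requirement (hypothetical): "An account shall lock after three consecutive failed logins."
    # Given a registered user
    service = LoginService(password="s3cret")
    # When three wrong passwords are entered
    for _ in range(3):
        service.login("wrong")
    # Then the account is locked, and even the correct password is rejected
    assert service.locked
    assert not service.login("s3cret")
```

Here the test name and its comments carry the requirement text, so the specification is checked on every test run rather than maintained in a separate document.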
-
Summary of the 3rd International Workshop on Requirements Engineering and Testing
Authors:
Michael Unterkalmsteiner,
Gregory Gay,
Michael Felderer,
Elizabeth Bjarnason,
Markus Borg,
Mirko Morandini
Abstract:
The RET (Requirements Engineering and Testing) workshop series provides a meeting point for researchers and practitioners from the two separate fields of Requirements Engineering (RE) and Testing. The goal is to improve the connection and alignment of these two areas through an exchange of ideas, challenges, practices, experiences and results. The long-term aim is to build a community and a body of knowledge within the intersection of RE and Testing, i.e., RET. The 3rd workshop was held in co-location with REFSQ 2016 in Gothenburg, Sweden. The workshop continued in the same interactive vein as its predecessors and included a keynote, paper presentations with ample time for discussions, and panels. In order to create an RET knowledge base, this crosscutting area elicits contributions from both RE and Testing, and from both researchers and practitioners. A range of papers were presented, from short position papers to full research papers, that cover connections between the two fields.
Submitted 18 August, 2023;
originally announced August 2023.
-
An Industrial Case Study on Test Cases as Requirements
Authors:
Elizabeth Bjarnason,
Michael Unterkalmsteiner,
Emelie Engström,
Markus Borg
Abstract:
It is a conundrum that agile projects can succeed 'without requirements' when weak requirements engineering is a known cause for project failures. While agile development projects often manage well without extensive requirements documentation, test cases are commonly used as requirements. We have investigated this agile practice at three companies in order to understand how test cases can fill the role of requirements. We performed a case study based on twelve interviews performed in a previous study. The findings include a range of benefits and challenges in using test cases for eliciting, validating, verifying, tracing and managing requirements. In addition, we identified three scenarios for applying the practice, namely as a mature practice, as a de facto practice and as part of an agile transition. The findings provide insights into how the role of requirements may be met in agile development, including challenges to consider.
Submitted 12 August, 2023;
originally announced August 2023.
-
Summary of 2nd International Workshop on Requirements Engineering and Testing (RET)
Authors:
Elizabeth Bjarnason,
Mirko Morandini,
Markus Borg,
Michael Unterkalmsteiner,
Michael Felderer,
Matthew Staats
Abstract:
The RET (Requirements Engineering and Testing) workshop series provides a meeting point for researchers and practitioners from the two separate fields of Requirements Engineering (RE) and Testing. The goal is to improve the connection and alignment of these two areas through an exchange of ideas, challenges, practices, experiences and results. The long-term aim is to build a community and a body of knowledge within the intersection of RE and Testing, i.e., RET. The 2nd workshop was held in co-location with ICSE 2015 in Florence, Italy. The workshop continued in the same interactive vein as the 1st one and included a keynote, paper presentations with ample time for discussions, and a group exercise. For true impact and relevance, this cross-cutting area requires contributions from both RE and Testing, and from both researchers and practitioners. A range of papers were presented, from short experience papers to full research papers, that cover connections between the two fields. One of the main outputs of the 2nd workshop was a categorization of the presented workshop papers according to an initial definition of the area of RET, which identifies the aspects RE, Testing, and coordination effect.
Submitted 2 August, 2023;
originally announced August 2023.
-
Challenges and Practices in Aligning Requirements with Verification and Validation: A Case Study of Six Companies
Authors:
Elizabeth Bjarnason,
Per Runeson,
Markus Borg,
Michael Unterkalmsteiner,
Emelie Engström,
Björn Regnell,
Giedre Sabaliauskaite,
Annabella Loconsole,
Tony Gorschek,
Robert Feldt
Abstract:
Weak alignment of requirements engineering (RE) with verification and validation (VV) may lead to problems in delivering the required products on time with the right quality. For example, weak communication of requirements changes to testers may result in lack of verification of new requirements and incorrect verification of old, invalid requirements, leading to software quality problems, wasted effort and delays. However, despite the serious implications of weak alignment, research and practice both tend to focus on either RE or VV rather than on the alignment of the two. We have performed a multi-unit case study to gain insight into issues around aligning RE and VV by interviewing 30 practitioners from 6 software developing companies, involving 10 researchers in a flexible research process for case studies. The results describe current industry challenges and practices in aligning RE with VV, ranging from quality of the individual RE and VV activities, through tracing and tools, to change control and sharing a common understanding at strategy, goal and design level. The study identified that human aspects are central, i.e., cooperation and communication, and that requirements engineering practices are a critical basis for alignment. Further, the size of an organisation and its motivation for applying alignment practices, e.g., external enforcement of traceability, are variation factors that play a key role in achieving alignment. Our results provide a strategic roadmap for practitioners' improvement work to address alignment challenges. Furthermore, the study provides a foundation for continued research to improve the alignment of RE with VV.
Submitted 23 July, 2023;
originally announced July 2023.
-
U Owns the Code That Changes and How Marginal Owners Resolve Issues Slower in Low-Quality Source Code
Authors:
Markus Borg,
Adam Tornhill,
Enys Mones
Abstract:
[Context] Accurate time estimation is a critical aspect of predictable software engineering. Previous work shows that low source code quality increases the uncertainty in issue resolution times. [Objective] Our goal is to evaluate how developers' project experience and file ownership are related to issue resolution times. [Method] We mine 40 proprietary software repositories and conduct an observational study. Using CodeScene, we measure source code quality and active development time connected to Jira issues. [Results] Most source code changes are made by either a marginal or dominant code owner. Also, most changes to low-quality source code are made by developers with low levels of ownership. In low-quality source code, marginal owners need 45% more time for small changes, and 93% more time for large changes. [Conclusions] Collective code ownership is a popular target, but industry practice results in many dominant and marginal owners. Marginal owners are particularly hampered when working with low-quality source code, which leads to productivity losses. In codebases plagued by technical debt, newly onboarded developers will require more time to complete tasks.
Submitted 23 April, 2023;
originally announced April 2023.
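One common way to quantify file ownership is each developer's share of the changed lines in a file. The sketch below computes such shares and buckets them into ownership levels. The abstract does not define how CodeScene draws the line between marginal and dominant owners, so both thresholds (`MARGINAL_MAX`, `DOMINANT_MIN`) and the "intermediate" bucket are hypothetical.

```python
from collections import Counter

MARGINAL_MAX = 0.05  # hypothetical: below 5% of a file's changes counts as marginal
DOMINANT_MIN = 0.5   # hypothetical: at least 50% of a file's changes counts as dominant

def ownership_shares(changes):
    """Share of changed lines per author for one file.

    changes: list of (author, lines_changed) tuples, one per commit.
    """
    totals = Counter()
    for author, lines in changes:
        totals[author] += lines
    total = sum(totals.values())
    return {author: lines / total for author, lines in totals.items()}

def ownership_level(share):
    """Map an ownership share to a coarse level using the thresholds above."""
    if share >= DOMINANT_MIN:
        return "dominant"
    if share < MARGINAL_MAX:
        return "marginal"
    return "intermediate"
```

With per-issue development times joined on the owner level, one could then compare, for example, median resolution times of marginal versus dominant owners in low-quality files.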
-
Automotive Perception Software Development: An Empirical Investigation into Data, Annotation, and Ecosystem Challenges
Authors:
Hans-Martin Heyn,
Khan Mohammad Habibullah,
Eric Knauss,
Jennifer Horkoff,
Markus Borg,
Alessia Knauss,
Polly Jing Li
Abstract:
Software that contains machine learning algorithms is an integral part of automotive perception, for example, in driving automation systems. The development of such software, specifically the training and validation of the machine learning components, requires large annotated datasets. An industry of data and annotation services has emerged to serve the development of such data-intensive automotive software components. Widespread difficulties in specifying data and annotation needs challenge collaborations between OEMs (Original Equipment Manufacturers) and their suppliers of software components, data, and annotations. This paper investigates why practitioners in the Swedish automotive industry find it difficult to arrive at clear specifications for data and annotations. The results from an interview study show that a lack of effective metrics for data quality aspects, ambiguities in the way of working, unclear definitions of annotation quality, and deficits in the business ecosystems cause the difficulty in deriving the specifications. We provide a list of recommendations that can mitigate challenges when deriving specifications, and we propose future research opportunities to overcome these challenges. Our work contributes towards the ongoing research on accountability of machine learning as applied to complex software systems, especially for high-stakes applications such as automated driving.
Submitted 10 March, 2023;
originally announced March 2023.
-
Requirements Engineering for Automotive Perception Systems: an Interview Study
Authors:
Khan Mohammad Habibullah,
Hans-Martin Heyn,
Gregory Gay,
Jennifer Horkoff,
Eric Knauss,
Markus Borg,
Alessia Knauss,
Håkan Sivencrona,
Polly Jing Li
Abstract:
Background: Driving automation systems (DAS), including autonomous driving and advanced driver assistance, are an important safety-critical domain. DAS often incorporate perception systems that use machine learning (ML) to analyze the vehicle environment. Aims: We explore new or differing requirements engineering (RE) topics and challenges that practitioners experience in this domain. Method: We have conducted an interview study with 19 participants across five companies and performed thematic analysis. Results: Practitioners have difficulty specifying upfront requirements, and often rely on scenarios and operational design domains (ODDs) as RE artifacts. Challenges relate to ODD detection and ODD exit detection, realistic scenarios, edge case specification, breaking down requirements, traceability, creating specifications for data and annotations, and quantifying quality requirements. Conclusions: Our findings contribute to understanding how RE is practiced for DAS perception systems, and the collected challenges can drive future research for DAS and other ML-enabled systems.
Submitted 23 February, 2023;
originally announced February 2023.
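To make "ODD exit detection" concrete, the sketch below models an ODD as a set of bounds on observable conditions and flags the moments when a trip leaves it. The three conditions (speed, visibility, road type) and their limits are invented for illustration; real ODD specifications are far richer, which is part of the challenge the interviewees describe.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OperationalDesignDomain:
    """Hypothetical ODD: each field bounds one observable condition."""
    max_speed_kmh: float
    min_visibility_m: float
    allowed_road_types: frozenset

    def contains(self, speed_kmh, visibility_m, road_type):
        """True if the current conditions lie inside the ODD."""
        return (speed_kmh <= self.max_speed_kmh
                and visibility_m >= self.min_visibility_m
                and road_type in self.allowed_road_types)

def detect_odd_exits(odd, observations):
    """Return indices where the vehicle moves from inside to outside the ODD.

    observations: list of (speed_kmh, visibility_m, road_type) tuples.
    Assumes the trip starts inside the ODD.
    """
    exits = []
    inside_before = True
    for i, obs in enumerate(observations):
        inside_now = odd.contains(*obs)
        if inside_before and not inside_now:
            exits.append(i)
        inside_before = inside_now
    return exits
```

In practice the hard part is the perception side: estimating conditions such as visibility reliably enough that the `contains` check can be trusted.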
-
Promoting Social Behaviour in Reducing Peak Electricity Consumption Using Multi-Agent Systems
Authors:
Nathan A. Brooks,
Simon T. Powers,
James M. Borg
Abstract:
As we transition to renewable energy sources, addressing their inflexibility during peak demand becomes crucial. It is therefore important to reduce the peak load placed on our energy system. For households, this entails spreading high-power appliance usage like dishwashers and washing machines throughout the day. Traditional approaches to spreading out usage have relied on differential pricing set by a centralised utility company, but this has been ineffective. Our previous research investigated a decentralised mechanism where agents receive an initial allocation of time-slots to use their appliances, which they can exchange with others. This was found to be an effective approach to reducing the peak load when we introduced social capital, the tracking of favours, to incentivise agents to accept exchanges that do not immediately benefit them. This system encouraged self-interested agents to learn socially beneficial behaviour to earn social capital that they could later use to improve their own performance. In this paper we expand this work by incorporating real-world household appliance usage data to ensure that our mechanism can adapt to the challenging demand needs of real households. We also demonstrate how smaller and more diverse populations can optimise more effectively than larger community energy systems.
Submitted 23 November, 2023; v1 submitted 18 November, 2022;
originally announced November 2022.
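The exchange mechanism with social capital can be illustrated with a heavily simplified sketch: each agent holds one allocated slot and wants one preferred slot, and a responder may accept a swap that does not benefit it in exchange for a tracked favour. The `Agent` fields, the one-unit favour accounting, and the boolean `social` switch are hypothetical simplifications, not the authors' learning-based model. Note that in this sketch a swap never changes how many appliances run in each slot; it only changes who is satisfied.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    allocated_slot: int
    preferred_slot: int
    social_capital: int = 0  # net favours: positive means others owe this agent

    def satisfied(self):
        return self.allocated_slot == self.preferred_slot

def request_exchange(requester, responder, social=True):
    """Requester asks responder to swap allocated slots.

    A self-interested responder accepts only if the swap gives it its preferred slot.
    A social responder also accepts an unhelpful swap and records one favour in return.
    """
    responder_gains = requester.allocated_slot == responder.preferred_slot
    if not responder_gains and not social:
        return False
    requester.allocated_slot, responder.allocated_slot = (
        responder.allocated_slot, requester.allocated_slot)
    if not responder_gains:
        responder.social_capital += 1  # favour granted
        requester.social_capital -= 1  # favour owed
    return True
```

A later request from the responder could then be prioritised by others in proportion to its accumulated social capital, which is how favours pay off for self-interested agents.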
-
Mutation Testing Optimisations using the Clang Front-end
Authors:
Sten Vercammen,
Serge Demeyer,
Markus Borg,
Niklas Pettersson,
Görel Hedin
Abstract:
Mutation testing is the state-of-the-art technique for assessing the fault detection capacity of a test suite. Unfortunately, a full mutation analysis is often prohibitively expensive. For instance, the CppCheck project requires a build time of 5.8 minutes and a test execution time of 17 seconds on our desktop computer. An unoptimised mutation analysis of 55,000 generated mutants took 11.8 days in total, of which 4.3 days were spent on (re)compiling the project. In this paper we present a feasibility study investigating how a number of optimisation strategies can be implemented based on the Clang front-end. These optimisation strategies make it possible to eliminate the compilation and execution overhead in order to support efficient mutation testing for the C language family. We provide a proof-of-concept tool that achieves a speedup of between 2x and 30x. We make a detailed analysis of the speedup induced by the optimisations, elaborate on the lessons learned and point out avenues for further improvements.
Submitted 31 October, 2022;
originally announced October 2022.
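The core idea of mutation analysis is easy to show in miniature: create variants (mutants) of the code with small syntactic changes, run the test suite against each, and report the fraction of mutants the tests detect ("kill"). The Python sketch below uses operator-replacement mutants of a hypothetical `total_price` function; it does not model the C/C++ compilation cost that the paper targets, which in the CppCheck figures above amounts to 4.3 / 11.8, or roughly 36%, of the total run time.

```python
import operator

def make_total_price(op):
    """Build a version of a tiny function under test using the given operator."""
    def total_price(unit_price, quantity):
        return op(unit_price, quantity)
    return total_price

# The original uses multiplication; each mutant replaces it with another operator.
ORIGINAL = operator.mul
MUTANT_OPERATORS = [operator.add, operator.sub, operator.truediv, operator.pow]

def weak_suite(total_price):
    """A single test that several mutants happen to pass (2 + 2 == 2 * 2 == 2 ** 2)."""
    return total_price(2, 2) == 4

def strong_suite(total_price):
    return total_price(3, 4) == 12 and total_price(5, 1) == 5

def mutation_score(original_op, mutant_ops, suite):
    """Fraction of mutants killed, i.e. for which the suite fails."""
    assert suite(make_total_price(original_op)), "suite must pass on the original"
    killed = sum(1 for op in mutant_ops if not suite(make_total_price(op)))
    return killed / len(mutant_ops)
```

Here the weak suite kills only the subtraction and division mutants (score 0.5), while the strong suite kills all four (score 1.0). In a compiled language, each mutant naively requires a rebuild, which is the overhead the Clang-based optimisations aim to remove.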
-
Automotive Multilingual Fault Diagnosis
Authors:
John Pavlopoulos,
Alv Romell,
Jacob Curman,
Olof Steinert,
Tony Lindgren,
Markus Borg
Abstract:
Automated fault diagnosis can facilitate diagnostics assistance, speedier troubleshooting, and better-organised logistics. Currently, AI-based prognostics and health management in the automotive industry ignore the textual descriptions of the experienced problems or symptoms. With this study, however, we show that a multilingual pre-trained Transformer can effectively classify the textual claims from a large company with vehicle fleets, despite the task's challenging nature due to the 38 languages and 1,357 classes involved. Overall, we report an accuracy of more than 80% for high-frequency classes and above 60% for above-low-frequency classes, bringing novel evidence that multilingual classification can benefit automotive troubleshooting management.
Submitted 13 October, 2022;
originally announced October 2022.
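Reporting accuracy separately for frequent and rare classes, as the abstract does, requires bucketing test predictions by how often their true class occurs in the training data. The sketch below shows one way to do that; the cut-offs `high_min` and `low_max` are hypothetical, since the abstract does not state how the frequency bands were defined.

```python
from collections import Counter

def accuracy_by_frequency(train_labels, y_true, y_pred, high_min=100, low_max=10):
    """Test accuracy split by how often each true class occurs in the training data.

    Classes with at least high_min training examples are "high", those with at most
    low_max are "low", and the rest are "medium" (all cut-offs are hypothetical).
    """
    freq = Counter(train_labels)
    bands = {"high": [], "medium": [], "low": []}
    for true_label, predicted in zip(y_true, y_pred):
        n = freq[true_label]
        if n >= high_min:
            band = "high"
        elif n <= low_max:
            band = "low"
        else:
            band = "medium"
        bands[band].append(true_label == predicted)
    # Skip empty bands so no division by zero occurs.
    return {band: sum(hits) / len(hits) for band, hits in bands.items() if hits}
```

With 1,357 classes, such a breakdown shows whether a good overall accuracy hides poor performance on the long tail of rare fault types.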
-
Adopting Automated Bug Assignment in Practice: A Longitudinal Case Study at Ericsson
Authors:
Markus Borg,
Leif Jonsson,
Emelie Engström,
Béla Bartalos,
Attila Szabó
Abstract:
The continuous inflow of bug reports is a considerable challenge in large development projects. Inspired by contemporary work on mining software repositories, we designed a prototype bug assignment solution based on machine learning in 2011-2016. The prototype evolved into an internal Ericsson product, TRR, in 2017-2018. TRR's first bug assignment without human intervention happened in April 2019. Our study evaluates the adoption of TRR within its industrial context at Ericsson. Moreover, we investigate 1) how TRR performs in the field, 2) what value TRR provides to Ericsson, and 3) how TRR has influenced the ways of working. We conduct an industrial case study combining interviews with TRR stakeholders, minutes from sprint planning meetings, and bug tracking data. The data analysis includes thematic analysis, descriptive statistics, and Bayesian causal analysis. TRR is now an incorporated part of the bug assignment process. Considering the abstraction levels of the telecommunications stack, stakeholders of high-level modules are more positive, while those of low-level modules experienced some drawbacks. On average, TRR automatically assigns 30% of the incoming bug reports with an accuracy of 75%. Auto-routed trouble reports (TRs) are resolved around 21% faster within Ericsson, and TRR has saved highly seasoned engineers many hours of work. Indirect effects of adopting TRR include process improvements, process awareness, increased communication, and higher job satisfaction. TRR has saved time at Ericsson, but the adoption of automated bug assignment was more intricate compared to similar endeavors reported from other companies. We primarily attribute the difference to the very large size of the organization and the complex products. Key facilitators in the successful adoption include a gradual introduction, product champions, and careful stakeholder analysis.
Submitted 19 September, 2022;
originally announced September 2022.
-
Performance Analysis of Out-of-Distribution Detection on Trained Neural Networks
Authors:
Jens Henriksson,
Christian Berger,
Markus Borg,
Lars Tornberg,
Sankar Raman Sathyamoorthy,
Cristofer Englund
Abstract:
Several areas have been improved with Deep Learning during the past years. Implementing Deep Neural Networks (DNNs) for non-safety-related applications has shown remarkable achievements; however, for using DNNs in safety-critical applications, we are missing approaches for verifying the robustness of such models. A common challenge arises when DNNs are exposed to out-of-distribution samples that are outside the scope of the DNN but nevertheless result in high-confidence outputs, despite no prior knowledge of such input.
In this paper, we analyze three methods that distinguish between in- and out-of-distribution data, called supervisors, on four well-known DNN architectures. We find that the outlier detection performance improves with the quality of the model. We also analyse the performance of the particular supervisors during the training procedure by applying the supervisor at a predefined interval to investigate its performance as the training proceeds. We observe that understanding the relationship between training results and supervisor performance is crucial to improve the model's robustness and to indicate which input samples require further measures to improve the robustness of a DNN. In addition, our work paves the road towards an instrument for safety argumentation for safety-critical applications. This paper is an extended version of our previous work presented at SEAA 2019 (cf. [1]); here, we elaborate on the used metrics, add an additional supervisor, and test them on two additional datasets.
Submitted 26 April, 2022;
originally announced April 2022.
-
Ergo, SMIRK is Safe: A Safety Case for a Machine Learning Component in a Pedestrian Automatic Emergency Brake System
Authors:
Markus Borg,
Jens Henriksson,
Kasper Socha,
Olof Lennartsson,
Elias Sonnsjö Lönegren,
Thanh Bui,
Piotr Tomaszewski,
Sankar Raman Sathyamoorthy,
Sebastian Brink,
Mahshid Helali Moghadam
Abstract:
Integration of Machine Learning (ML) components in critical applications introduces novel challenges for software certification and verification. New safety standards and technical guidelines are under development to support the safety of ML-based systems, e.g., ISO 21448 SOTIF for the automotive domain and the Assurance of Machine Learning for use in Autonomous Systems (AMLAS) framework. SOTIF and AMLAS provide high-level guidance, but the details must be chiseled out for each specific case. We initiated a research project with the goal of demonstrating a complete safety case for an ML component in an open automotive system. This paper reports results from an industry-academia collaboration on safety assurance of SMIRK, an ML-based pedestrian automatic emergency braking demonstrator running in an industry-grade simulator. We demonstrate an application of AMLAS on SMIRK for a minimalistic operational design domain, i.e., we share a complete safety case for its integrated ML-based component. Finally, we report lessons learned and provide both SMIRK and the safety case under an open-source licence for the research community to reuse.
Submitted 6 December, 2022; v1 submitted 16 April, 2022;
originally announced April 2022.
-
Exploring ML testing in practice -- Lessons learned from an interactive rapid review with Axis Communications
Authors:
Qunying Song,
Markus Borg,
Emelie Engström,
Håkan Ardö,
Sergio Rico
Abstract:
There is a growing interest in industry and academia in machine learning (ML) testing. We believe that industry and academia need to learn together to produce rigorous and relevant knowledge. In this study, we initiate a collaboration between stakeholders from one case company, one research institute, and one university. To establish a common view of the problem domain, we conducted an interactive rapid review of the state of the art. Four researchers from Lund University and RISE Research Institutes and four practitioners from Axis Communications reviewed a set of 180 primary studies on ML testing. We developed a taxonomy for the communication around ML testing challenges and results and identified a list of 12 review questions relevant for Axis Communications. The three most important questions (data testing, metrics for assessment, and test generation) were mapped to the literature, and an in-depth analysis of the 35 primary studies matching the most important question (data testing) was made. A final set of the five best matches was analysed, and we reflect on the criteria for applicability and relevance for the industry. The taxonomies are helpful for communication but not final. Furthermore, there was no perfect match to the case company's investigated review question (data testing). However, we extracted relevant approaches from the five studies on a conceptual level to support later context-specific improvements. We found the interactive rapid review approach useful for triggering and aligning communication between the different stakeholders.
Submitted 30 March, 2022;
originally announced March 2022.
-
Quality Assurance of Generative Dialog Models in an Evolving Conversational Agent Used for Swedish Language Practice
Authors:
Markus Borg,
Johan Bengtsson,
Harald Österling,
Alexander Hagelborn,
Isabella Gagner,
Piotr Tomaszewski
Abstract:
Due to the migration megatrend, efficient and effective second-language acquisition is vital. One proposed solution involves AI-enabled conversational agents for person-centered interactive language practice. We present results from ongoing action research targeting quality assurance of proprietary generative dialog models trained for virtual job interviews. The action team elicited a set of 38 requirements, and we designed corresponding automated test cases for the 15 of particular interest to the evolving solution. Our results show that six of the test case designs can detect meaningful differences between candidate models. While quality assurance of natural language processing applications is complex, we provide initial steps toward an automated framework for machine learning model selection in the context of an evolving conversational agent. Future work will focus on model selection in an MLOps setting.
Submitted 29 March, 2022;
originally announced March 2022.
-
Evolved Open-Endedness in Cultural Evolution: A New Dimension in Open-Ended Evolution Research
Authors:
James M. Borg,
Andrew Buskell,
Rohan Kapitany,
Simon T. Powers,
Eva Reindl,
Claudio Tennie
Abstract:
The goal of Artificial Life research, as articulated by Chris Langton, is "to contribute to theoretical biology by locating life-as-we-know-it within the larger picture of life-as-it-could-be" (1989, p.1). The study and pursuit of open-ended evolution in artificial evolutionary systems exemplifies this goal. However, open-ended evolution research is hampered by two fundamental issues: the struggle to replicate open-endedness in an artificial evolutionary system, and the fact that we only have one system (genetic evolution) from which to draw inspiration. Here we argue that cultural evolution should be seen not only as another real-world example of an open-ended evolutionary system, but also as a source of a new perspective: its unique qualities allow us to assess the fundamental properties of, and ask new questions about, open-ended evolutionary systems, especially in regard to evolved open-endedness and transitions from bounded to unbounded evolution. We provide an overview of culture as an evolutionary system, highlight the interesting case of human cultural evolution as an open-ended evolutionary system, and contextualise cultural evolution under the framework of (evolved) open-ended evolution. We go on to provide a set of new questions that can be asked once we consider cultural evolution within the framework of open-ended evolution, and introduce new insights that we may be able to gain about evolved open-endedness as a result of asking these questions.
Submitted 19 September, 2022; v1 submitted 24 March, 2022;
originally announced March 2022.
-
Machine Learning Testing in an ADAS Case Study Using Simulation-Integrated Bio-Inspired Search-Based Testing
Authors:
Mahshid Helali Moghadam,
Markus Borg,
Mehrdad Saadatmand,
Seyed Jalaleddin Mousavirad,
Markus Bohlin,
Björn Lisper
Abstract:
This paper presents an extended version of Deeper, a search-based simulation-integrated test solution that generates failure-revealing test scenarios for testing a deep neural network-based lane-keeping system. In the newly proposed version, we utilize a new set of bio-inspired search algorithms, genetic algorithm (GA), $(μ+λ)$ and $(μ,λ)$ evolution strategies (ES), and particle swarm optimization (PSO), that leverage a quality population seed and domain-specific cross-over and mutation operations tailored for the presentation model used for modeling the test scenarios. In order to demonstrate the capabilities of the new test generators within Deeper, we carry out an empirical evaluation and comparison with regard to the results of five participating tools in the cyber-physical systems testing competition at SBST 2021. Our evaluation shows the newly proposed test generators in Deeper not only represent a considerable improvement on the previous version but also prove to be effective and efficient in provoking a considerable number of diverse failure-revealing test scenarios for testing an ML-driven lane-keeping system. They can trigger several failures while promoting test scenario diversity, under a limited test time budget, high target failure severity, and strict speed limit constraints.
Submitted 7 June, 2023; v1 submitted 22 March, 2022;
originally announced March 2022.
-
Code Red: The Business Impact of Code Quality -- A Quantitative Study of 39 Proprietary Production Codebases
Authors:
Adam Tornhill,
Markus Borg
Abstract:
Code quality remains an abstract concept that fails to get traction at the business level. Consequently, software companies keep trading code quality for time-to-market and new features. The resulting technical debt is estimated to waste up to 42% of developers' time. At the same time, there is a global shortage of software developers, meaning that developer productivity is key to software businesses. Our overall mission is to make code quality a business concern, not just a technical aspect. Our first goal is to understand how code quality impacts 1) the number of reported defects, 2) the time to resolve issues, and 3) the predictability of resolving issues on time. We analyze 39 proprietary production codebases from a variety of domains using the CodeScene tool based on a combination of source code analysis, version-control mining, and issue information from Jira. By analyzing activity in 30,737 files, we find that low quality code contains 15 times more defects than high quality code. Furthermore, resolving issues in low quality code takes on average 124% more time in development. Finally, we report that issue resolutions in low quality code involve higher uncertainty manifested as 9 times longer maximum cycle times. This study provides evidence that code quality cannot be dismissed as a technical concern. With 15 times fewer defects, twice the development speed, and substantially more predictable issue resolution times, the business advantage of high quality code should be unmistakably clear.
Submitted 8 March, 2022;
originally announced March 2022.
-
Agility in Software 2.0 -- Notebook Interfaces and MLOps with Buttresses and Rebars
Authors:
Markus Borg
Abstract:
Artificial intelligence through machine learning is increasingly used in the digital society. Solutions based on machine learning bring great opportunities, hence the term "Software 2.0," but also great challenges for the engineering community to tackle. Due to the experimental approach used by data scientists when developing machine learning models, agility is an essential characteristic. In this keynote address, we discuss two contemporary development phenomena that are fundamental in machine learning development, i.e., notebook interfaces and MLOps. First, we present a solution that can remedy some of the intrinsic weaknesses of working in notebooks by supporting easy transitions to integrated development environments. Second, we propose reinforced engineering of AI systems by introducing metaphorical buttresses and rebars in the MLOps context. Machine learning-based solutions are dynamic in nature, and we argue that reinforced continuous engineering is required to quality assure the trustworthy AI systems of tomorrow.
Submitted 28 November, 2021;
originally announced November 2021.
-
Machine Learning-Assisted Analysis of Small Angle X-ray Scattering
Authors:
Piotr Tomaszewski,
Shun Yu,
Markus Borg,
Jerk Rönnols
Abstract:
Small angle X-ray scattering (SAXS) is extensively used in materials science as a way of examining nanostructures. The analysis of experimental SAXS data involves mapping a rather simple data format to a vast number of structural models. Despite various scientific computing tools to assist the model selection, the activity relies heavily on the SAXS analysts' experience, which is recognized by the community as an efficiency bottleneck. To cope with this decision-making problem, we develop and evaluate the open-source, Machine Learning-based tool SCAN (SCattering Ai aNalysis) to provide recommendations on model selection. SCAN exploits multiple machine learning algorithms and uses models and a simulation tool implemented in the SasView package for generating a well-defined set of datasets. Our evaluation shows that SCAN delivers an overall accuracy of 95%-97%. The XGBoost Classifier has been identified as the most accurate method with a good balance between accuracy and training time. With eleven predefined structural models for common nanostructures and an easy drag-and-drop function to expand the number and types of training models, SCAN can accelerate the SAXS data analysis workflow.
Submitted 16 November, 2021;
originally announced November 2021.
-
Adopting Automated Bug Assignment in Practice -- A Registered Report of an Industrial Case Study
Authors:
Markus Borg,
Leif Jonsson,
Emelie Engström,
Béla Bartalos,
Attila Szabo
Abstract:
[Background/Context] The continuous inflow of bug reports is a considerable challenge in large development projects. Inspired by contemporary work on mining software repositories, we designed a prototype bug assignment solution based on machine learning in 2011-2016. The prototype evolved into an internal Ericsson product, TRR, in 2017-2018. TRR's first bug assignment without human intervention happened in 2019. [Objective/Aim] Our exploratory study will evaluate the adoption of TRR within its industrial context at Ericsson. We seek to understand 1) how TRR performs in the field, 2) what value TRR provides to Ericsson, and 3) how TRR has influenced the ways of working. In addition, we will provide lessons learned related to the productization of a research prototype within a company. [Method] We design an industrial case study combining interviews with TRR developers and users, and analysis of data extracted from the bug tracking system at Ericsson. Furthermore, we will analyze sprint planning meetings recorded during the productization. Our data analysis will include thematic analysis, descriptive statistics, and Bayesian causal analysis.
Submitted 28 September, 2021;
originally announced September 2021.
-
Efficient and Effective Generation of Test Cases for Pedestrian Detection -- Search-based Software Testing of Baidu Apollo in SVL
Authors:
Hamid Ebadi,
Mahshid Helali Moghadam,
Markus Borg,
Gregory Gay,
Afonso Fontes,
Kasper Socha
Abstract:
With the growing capabilities of autonomous vehicles, there is a higher demand for sophisticated and pragmatic quality assurance approaches for machine learning-enabled systems in the automotive AI context. The use of simulation-based prototyping platforms provides the possibility for early-stage testing, enabling inexpensive testing and the ability to capture critical corner-case test scenarios. Simulation-based testing properly complements conventional on-road testing. However, due to the large space of test input parameters in these systems, the efficient generation of effective test scenarios leading to the unveiling of failures is a challenge. This paper presents a study on testing the pedestrian detection and emergency braking system of the Baidu Apollo autonomous driving platform within the SVL simulator. We propose an evolutionary automated test generation technique that generates failure-revealing scenarios for Apollo in the SVL environment. Our approach models the input space using a generic and flexible data structure and benefits from a multi-criteria safety-based heuristic for the objective function targeted for optimization. This paper presents the results of our proposed test generation technique in the 2021 IEEE Autonomous Driving AI Test Challenge. In order to demonstrate the efficiency and effectiveness of our approach, we also report the results from a baseline random generation technique. Our evaluation shows that the proposed evolutionary test case generator is more effective at generating failure-revealing test cases and provides higher diversity between the generated failures than the random baseline.
Submitted 18 October, 2021; v1 submitted 16 September, 2021;
originally announced September 2021.
-
Challenges of Adopting SAFe in the Banking Industry -- A Study Two Years after its Introduction
Authors:
Sara Nilsson Tengstrand,
Piotr Tomaszewski,
Markus Borg,
Ronald Jabangwe
Abstract:
The Scaled Agile Framework (SAFe) is a framework for scaling agile methods in large organizations. We have found several experience reports and white papers describing SAFe adoptions in different banks, which indicates that SAFe is being used in the banking industry. However, there is a lack of academic publications on the topic; the banking industry is missing from the scientific reports analyzing SAFe transformations. To fill this gap, we present a study on the main challenges with a SAFe transformation at a large full-service bank. We identify the challenges in the bank under study and compare the findings with experience reports from other banks, as well as with research on SAFe transformations in other domains. Many of the challenges reported in this paper overlap with the generic SAFe challenges, including management and organization, education and training, culture and mindset, requirements engineering, quality assurance, and systems architecture. However, we also report some novel challenges specific to the banking domain, e.g., the risk of jeopardizing customer relations, stability, and trust of external stakeholders. This study validates several SAFe-related challenges reported in previous work in the banking context. It also brings up some novel challenges specific to the banking industry. Therefore, we believe our results are particularly useful to practitioners responsible for SAFe transformations at other banks.
Submitted 28 April, 2021;
originally announced April 2021.
-
Performance Testing Using a Smart Reinforcement Learning-Driven Test Agent
Authors:
Mahshid Helali Moghadam,
Golrokh Hamidi,
Markus Borg,
Mehrdad Saadatmand,
Markus Bohlin,
Björn Lisper,
Pasqualina Potena
Abstract:
Performance testing with the aim of generating an efficient and effective workload to identify performance issues is challenging. Many of the automated approaches mainly rely on analyzing system models, source code, or extracting the usage pattern of the system during the execution. However, such information and artifacts are not always available. Moreover, not all transactions within a generated workload impact the performance of the system in the same way, so a finely tuned workload could accomplish the test objective more efficiently. Model-free reinforcement learning is widely used for finding the optimal behavior to accomplish an objective in many decision-making problems without relying on a model of the system. This paper proposes that if the optimal policy (way) for generating test workload to meet a test objective can be learned by a test agent, then efficient test automation would be possible without relying on system models or source code. We present a self-adaptive reinforcement learning-driven load testing agent, RELOAD, that learns the optimal policy for test workload generation and generates an effective workload efficiently to meet the test objective. Once the agent learns the optimal policy, it can reuse the learned policy in subsequent testing activities. Our experiments show that the proposed intelligent load test agent can accomplish the test objective with lower test cost compared to common load testing procedures, and results in higher test efficiency.
Submitted 26 April, 2021;
originally announced April 2021.
-
Performance Analysis of Out-of-Distribution Detection on Various Trained Neural Networks
Authors:
Jens Henriksson,
Christian Berger,
Markus Borg,
Lars Tornberg,
Sankar Raman Sathyamoorthy,
Cristofer Englund
Abstract:
Several areas have been improved with Deep Learning during the past years. For non-safety-related products, adopting AI and ML is not an issue, whereas in safety-critical applications, the robustness of such approaches is still an issue. A common challenge for Deep Neural Networks (DNNs) occurs when they are exposed to previously unseen out-of-distribution samples, where DNNs can yield high-confidence predictions despite no prior knowledge of the input.
In this paper, we analyse two supervisors on two well-known DNNs with varied training setups and find that the outlier detection performance improves with the quality of the training procedure. We analyse the performance of the supervisor after each epoch of the training cycle to investigate supervisor performance as the accuracy converges. Understanding the relationship between training results and supervisor performance is valuable for improving the robustness of the model and indicates where more work has to be done to create generalized models for safety-critical applications.
Submitted 29 March, 2021;
originally announced March 2021.
-
Exploring the Assessment List for Trustworthy AI in the Context of Advanced Driver-Assistance Systems
Authors:
Markus Borg,
Joshua Bronson,
Linus Christensson,
Fredrik Olsson,
Olof Lennartsson,
Elias Sonnsjö,
Hamid Ebabi,
Martin Karsberg
Abstract:
Artificial Intelligence (AI) is increasingly used in critical applications. Thus, the need for dependable AI systems is rapidly growing. In 2018, the European Commission appointed experts to a High-Level Expert Group on AI (AI-HLEG). AI-HLEG defined Trustworthy AI as 1) lawful, 2) ethical, and 3) robust and specified seven corresponding key requirements. To help development organizations, AI-HLEG recently published the Assessment List for Trustworthy AI (ALTAI). We present an illustrative case study from applying ALTAI to an ongoing development project of an Advanced Driver-Assistance System (ADAS) that relies on Machine Learning (ML). Our experience shows that ALTAI is largely applicable to ADAS development, but specific parts related to human agency and transparency can be disregarded. Moreover, bigger questions related to societal and environmental impact cannot be tackled by an ADAS supplier in isolation. We present how we plan to develop the ADAS to ensure ALTAI-compliance. Finally, we provide three recommendations for the next revision of ALTAI, i.e., life-cycle variants, domain-specific adaptations, and removed redundancy.
Submitted 4 March, 2021;
originally announced March 2021.
-
Test Automation with Grad-CAM Heatmaps -- A Future Pipe Segment in MLOps for Vision AI?
Authors:
Markus Borg,
Ronald Jabangwe,
Simon Åberg,
Arvid Ekblom,
Ludwig Hedlund,
August Lidfeldt
Abstract:
Machine Learning (ML) is a fundamental part of modern perception systems. In the last decade, computer vision using trained deep neural networks has outperformed previous approaches based on careful feature engineering. However, the opaqueness of large ML models is a substantial impediment to critical applications such as those in the automotive context. As a remedy, Gradient-weighted Class Activation Mapping (Grad-CAM) has been proposed to provide visual explanations of model internals. In this paper, we demonstrate how Grad-CAM heatmaps can be used to increase the explainability of an image recognition model trained for a pedestrian underpass. We argue that the heatmaps support compliance with the EU's seven key requirements for Trustworthy AI. Finally, we propose adding automated heatmap analysis as a pipe segment in an MLOps pipeline. We believe that such a building block can be used to automatically detect if a trained ML model is activated based on invalid pixels in test images, suggesting biased models.
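The proposed heatmap pipe segment could look roughly like the sketch below. It assumes that a Grad-CAM heatmap has already been computed for each test image and resampled to image resolution, and that a boolean mask marking "invalid" pixels (for example, regions that should not drive the prediction) is available. The function names and the 0.2 threshold are hypothetical choices for illustration, not part of the paper.

```python
from typing import List

Heatmap = List[List[float]]  # Grad-CAM activations, resized to image resolution
Mask = List[List[bool]]      # True where a pixel should NOT drive the prediction


def invalid_activation_ratio(heatmap: Heatmap, invalid: Mask) -> float:
    """Return the share of total heatmap activation that lies on invalid pixels."""
    total = 0.0
    on_invalid = 0.0
    for heat_row, mask_row in zip(heatmap, invalid):
        for value, is_invalid in zip(heat_row, mask_row):
            v = max(value, 0.0)  # ignore negative values defensively
            total += v
            if is_invalid:
                on_invalid += v
    return on_invalid / total if total > 0.0 else 0.0


def heatmap_check(heatmap: Heatmap, invalid: Mask, max_ratio: float = 0.2) -> bool:
    """Hypothetical pipeline gate: pass if at most max_ratio of the activation
    falls on invalid pixels; a failure hints at a possibly biased model."""
    return invalid_activation_ratio(heatmap, invalid) <= max_ratio


if __name__ == "__main__":
    heat = [[0.9, 0.1], [0.0, 0.0]]
    mask = [[False, True], [False, False]]
    print(heatmap_check(heat, mask))  # True
```

In an MLOps pipeline, such a check would run on every test image after training, and a failing ratio would block promotion of the model pending manual inspection.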
Submitted 2 March, 2021;
originally announced March 2021.
-
Digital Twins Are Not Monozygotic -- Cross-Replicating ADAS Testing in Two Industry-Grade Automotive Simulators
Authors:
Markus Borg,
Raja Ben Abdessalem,
Shiva Nejati,
Francois-Xavier Jegeden,
Donghwan Shin
Abstract:
The increasing levels of software- and data-intensive driving automation call for an evolution of automotive software testing. As a recommended practice of the Verification and Validation (V&V) process of ISO/PAS 21448, a candidate standard for safety of the intended functionality for road vehicles, simulation-based testing has the potential to reduce both risks and costs. There is a growing body of research on devising test automation techniques using simulators for Advanced Driver-Assistance Systems (ADAS). However, how similar are the results if the same test scenarios are executed in different simulators? We conduct a replication study of applying a Search-Based Software Testing (SBST) solution to a real-world ADAS (PeVi, a pedestrian vision detection system) using two different commercial simulators, namely, TASS/Siemens PreScan and ESI Pro-SiVIC. Based on a minimalistic scene, we compare critical test scenarios generated using our SBST solution in these two simulators. We show that SBST can be used to effectively and efficiently generate critical test scenarios in both simulators, and the test results obtained from the two simulators can reveal several weaknesses of the ADAS under test. However, executing the same test scenarios in the two simulators leads to notable differences in the details of the test outputs, in particular, related to (1) safety violations revealed by tests, and (2) dynamics of cars and pedestrians. Based on our findings, we recommend that future V&V plans include multiple simulators to support robust simulation-based testing and base test objectives on measures that are less dependent on the internals of the simulators.
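To make search-based generation of critical scenarios concrete, here is a minimal hill-climbing sketch over scenario parameters. It is not the search algorithm used in the study: the simulator is replaced by a hypothetical toy fitness, `toy_min_distance`, standing in for the minimum car-pedestrian distance a simulator run would report, and lower values are treated as more critical. Running the same search against a second simulator would only mean passing a different fitness function.

```python
import random
from typing import Callable, Dict, Tuple

Scenario = Dict[str, float]
Bounds = Dict[str, Tuple[float, float]]


def search_critical_scenario(
    fitness: Callable[[Scenario], float],
    bounds: Bounds,
    budget: int = 200,
    step: float = 0.1,
    seed: int = 0,
) -> Tuple[Scenario, float]:
    """Hill climbing over scenario parameters; lower fitness = more critical."""
    rng = random.Random(seed)
    current = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
    best = fitness(current)
    for _ in range(budget):
        # Gaussian mutation scaled to each parameter's range, clamped to bounds
        candidate = {
            name: min(hi, max(lo, current[name] + rng.gauss(0.0, step * (hi - lo))))
            for name, (lo, hi) in bounds.items()
        }
        value = fitness(candidate)
        if value <= best:  # accept equal values to allow drifting along plateaus
            current, best = candidate, value
    return current, best


def toy_min_distance(s: Scenario) -> float:
    """Hypothetical stand-in for a simulator run reporting the minimum
    car-pedestrian distance; a real setup would call PreScan or Pro-SiVIC."""
    return abs(s["ped_x"] - 0.5 * s["car_speed"])


if __name__ == "__main__":
    bounds = {"car_speed": (5.0, 25.0), "ped_x": (0.0, 20.0)}
    scenario, dist = search_critical_scenario(toy_min_distance, bounds, budget=500)
    print(scenario, dist)
```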
Submitted 28 January, 2021; v1 submitted 12 December, 2020;
originally announced December 2020.
-
Enabling Image Recognition on Constrained Devices Using Neural Network Pruning and a CycleGAN
Authors:
August Lidfelt,
Daniel Isaksson,
Ludwig Hedlund,
Simon Åberg,
Markus Borg,
Erik Larsson
Abstract:
Smart cameras are increasingly used in surveillance solutions in public spaces. Contemporary computer vision applications can be used to recognize events that require intervention by emergency services. Smart cameras can be mounted in locations where citizens feel particularly unsafe, e.g., pathways and underpasses with a history of incidents. One promising approach for smart cameras is edge AI, i.e., deploying AI technology on IoT devices. However, implementing resource-demanding technology such as image recognition using deep neural networks (DNN) on constrained devices is a substantial challenge. In this paper, we explore two approaches to reduce the need for compute in contemporary image recognition in an underpass. First, we showcase successful neural network pruning, i.e., we retain comparable classification accuracy with only 1.1% of the neurons remaining from the state-of-the-art DNN architecture. Second, we demonstrate how a CycleGAN can be used to transform out-of-distribution images to the operational design domain. We posit that both pruning and CycleGANs are promising enablers for efficient edge AI in smart cameras.
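Neuron pruning can be illustrated with a simple magnitude criterion: keep the neurons whose incoming weight vectors have the largest L2 norm and remove the matching inputs of the next layer. This is a generic sketch on plain Python lists with biases omitted for brevity; the abstract does not state which pruning criterion was used, so the norm-based ranking and the function name are assumptions.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # one row of incoming weights per neuron


def prune_neurons(
    layer: Matrix, next_layer: Matrix, keep_fraction: float
) -> Tuple[Matrix, Matrix, List[int]]:
    """Keep the neurons of `layer` with the largest incoming-weight L2 norm and
    drop the corresponding input columns of `next_layer`."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    n = len(layer)
    k = max(1, round(keep_fraction * n))
    norms = [math.sqrt(sum(w * w for w in row)) for row in layer]
    ranked = sorted(range(n), key=lambda i: norms[i], reverse=True)
    keep = sorted(ranked[:k])  # preserve the original neuron order
    pruned_layer = [layer[i] for i in keep]
    pruned_next = [[row[i] for i in keep] for row in next_layer]
    return pruned_layer, pruned_next, keep


if __name__ == "__main__":
    layer = [[0.1, 0.1], [3.0, 0.0], [0.0, 2.0], [0.01, 0.0]]
    next_layer = [[1.0, 2.0, 3.0, 4.0]]
    print(prune_neurons(layer, next_layer, keep_fraction=0.5)[2])  # [1, 2]
```

A `keep_fraction` of 0.011 would correspond to the 1.1% of neurons mentioned above; in practice, pruning is usually followed by fine-tuning to recover accuracy.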
Submitted 11 September, 2020;
originally announced September 2020.
-
The AIQ Meta-Testbed: Pragmatically Bridging Academic AI Testing and Industrial Q Needs
Authors:
Markus Borg
Abstract:
AI solutions seem to appear in any and all application domains. As AI becomes more pervasive, the importance of quality assurance increases. Unfortunately, there is no consensus on what artificial intelligence means and interpretations range from simple statistical analysis to sentient humanoid robots. On top of that, quality is a notoriously hard concept to pinpoint. What does this mean for AI quality? In this paper, we share our working definition and a pragmatic approach to address the corresponding quality assurance with a focus on testing. Finally, we present our ongoing work on establishing the AIQ Meta-Testbed.
Submitted 11 September, 2020;
originally announced September 2020.
-
Coloured noise time series as appropriate models for environmental variation in artificial evolutionary systems
Authors:
Matt Grove,
James M. Borg,
Fiona Polack
Abstract:
Ecological, environmental and geophysical time series consistently exhibit the characteristics of coloured ($1/f^{\beta}$) noise. Here we briefly survey the literature on coloured noise, population persistence and related evolutionary dynamics, before introducing coloured noise as an appropriate model for environmental variation in artificial evolutionary systems. To illustrate and explore the effects of different noise colours, a simple evolutionary model that examines the trade-off between specialism and generalism in fluctuating environments is applied. The results of the model clearly demonstrate a need for greater generalism as environmental variability becomes `whiter', whilst specialisation is favoured as environmental variability becomes `redder'. Pink noise, sitting midway between white and red noise, is shown to be the point at which the pressures for generalism and specialism balance, providing some insight into why `pinker' noise is increasingly being seen as an appropriate model of typical environmental variability. We go on to discuss how the results presented here feed into a wider discussion on evolutionary responses to fluctuating environments. Ultimately we argue that Artificial Life as a field should embrace the use of coloured noise to produce models of environmental variability.
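One common way to produce $1/f^{\beta}$ noise is spectral synthesis: give each frequency bin $k$ an amplitude proportional to $k^{-\beta/2}$ (so power falls off as $k^{-\beta}$), draw random phases, and transform back to the time domain. The sketch below uses only the Python standard library and a naive inverse DFT, which is adequate for short series; it illustrates the technique and is not necessarily the generator used in the paper. Setting `beta` to 0, 1 and 2 gives white, pink and red noise respectively.

```python
import cmath
import math
import random
from typing import List


def coloured_noise(n: int, beta: float, seed: int = 0) -> List[float]:
    """Return n samples, normalised to zero mean and unit variance, whose
    power spectrum falls off as 1/f**beta (spectral synthesis)."""
    if n < 2:
        raise ValueError("need at least two samples")
    rng = random.Random(seed)
    spectrum = [0j] * n  # bin 0 (the mean) stays zero
    for k in range(1, n // 2 + 1):
        amplitude = k ** (-beta / 2.0)  # power = amplitude**2 = k**(-beta)
        if 2 * k == n:
            # the Nyquist bin must be real for a real-valued signal
            spectrum[k] = complex(amplitude * rng.choice((-1.0, 1.0)))
        else:
            phase = rng.uniform(0.0, 2.0 * math.pi)
            spectrum[k] = amplitude * cmath.exp(1j * phase)
            spectrum[n - k] = spectrum[k].conjugate()  # Hermitian symmetry
    # naive O(n^2) inverse DFT; the imaginary parts cancel up to rounding
    series = []
    for t in range(n):
        value = sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n))
        series.append(value.real / n)
    mean = sum(series) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    return [(x - mean) / sd for x in series]


if __name__ == "__main__":
    pink = coloured_noise(64, beta=1.0, seed=42)
    print([round(x, 2) for x in pink[:8]])
```

For long series, an FFT routine would replace the quadratic loop; normalising each series makes different colours directly comparable as environmental inputs.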
Submitted 29 June, 2020;
originally announced June 2020.
-
A mechanism to promote social behaviour in household load balancing
Authors:
Nathan A. Brooks,
Simon T. Powers,
James M. Borg
Abstract:
Reducing the peak energy consumption of households is essential for the effective use of renewable energy sources, in order to ensure that as much household demand as possible can be met by renewable sources. This entails spreading out the use of high-powered appliances such as dishwashers and washing machines throughout the day. Traditional approaches to this problem have relied on differential pricing set by a centralised utility company. But this mechanism has not been effective in promoting widespread shifting of appliance usage. Here we consider an alternative decentralised mechanism, where agents receive an initial allocation of time-slots to use their appliances and can then exchange these with other agents. If agents are willing to be more flexible in the exchanges they accept, then overall satisfaction, in terms of the percentage of agents' time-slot preferences that are satisfied, will increase. This requires a mechanism that can incentivise agents to be more flexible. Building on previous work, we show that a mechanism incorporating social capital - the tracking of favours given and received - can incentivise agents to act flexibly and give favours by accepting exchanges that do not immediately benefit them. We demonstrate that a mechanism that tracks favours increases the overall satisfaction of agents, and crucially allows social agents that give favours to outcompete selfish agents that do not under payoff-biased social learning. Thus, even completely self-interested agents are expected to learn to produce socially beneficial outcomes.
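The favour-tracking idea can be sketched as a ledger of favours given and received. In this hypothetical simplification, which is not the authors' model, an agent always accepts beneficial swaps, repays favours it owes even if it is selfish, and, if it is social, grants a non-beneficial swap unless the requester already owes it a favour. The class and function names are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict


class FavourLedger:
    """Hypothetical record of favours: balance[a][b] > 0 means b owes a."""

    def __init__(self) -> None:
        self.balance: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))

    def record_favour(self, giver: str, receiver: str) -> None:
        self.balance[giver][receiver] += 1
        self.balance[receiver][giver] -= 1

    def owes(self, debtor: str, creditor: str) -> bool:
        return self.balance[creditor][debtor] > 0


def handle_request(agent: str, requester: str, benefits_agent: bool,
                   social: bool, ledger: FavourLedger) -> bool:
    """Decide whether `agent` accepts a time-slot exchange from `requester`."""
    if benefits_agent:
        return True  # mutually beneficial swap, no favour involved
    repaying = ledger.owes(agent, requester)
    generous = social and not ledger.owes(requester, agent)  # refuse free-riders
    if repaying or generous:
        ledger.record_favour(giver=agent, receiver=requester)
        return True
    return False


if __name__ == "__main__":
    ledger = FavourLedger()
    print(handle_request("alice", "bob", False, social=True, ledger=ledger))   # True
    print(handle_request("bob", "alice", False, social=False, ledger=ledger))  # True
```

Because favours are remembered, even a selfish agent ends up helping those who helped it, which is the reciprocity the abstract credits with raising overall satisfaction.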
Submitted 25 June, 2020;
originally announced June 2020.
-
Making Lab Sessions Mandatory -- On Student Work Distribution in a Gamified Project Course on Market-Driven Software Engineering
Authors:
Markus Borg
Abstract:
Unfair work distribution in student teams is a common issue in project-based learning. One contributing factor is that students are differently skilled developers. In a course with group work intertwining engineering and business aspects, we designed an intervention to help novice programmers, i.e., we introduced mandatory programming lab sessions. However, the intervention did not affect the work distribution, showing that more is needed to balance the workload. Contrary to our goal, the intervention was very well received among experienced students, but unpopular with students weak at programming.
Submitted 27 May, 2020;
originally announced May 2020.
-
Illuminating a Blind Spot in Digitalization -- Software Development in Sweden's Private and Public Sector
Authors:
Markus Borg,
Joakim Wernberg,
Thomas Olsson,
Ulrik Franke,
Martin Andersson
Abstract:
As Netscape co-founder Marc Andreessen famously remarked in 2011, software is eating the world - becoming a pervasive invisible critical infrastructure. Data on the distribution of software use and development in society is scarce, but we compile results from two novel surveys to provide a fuller picture of the role software plays in the public and private sectors in Sweden, respectively. Three out of ten Swedish firms, across industry sectors, develop software in-house. The corresponding figure for Sweden's government agencies is four out of ten, i.e., the public sector should not be underestimated. The digitalization of society will continue, thus the demand for software developers will further increase. Many private firms report that the limited supply of software developers in Sweden is directly affecting their expansion plans. Based on our findings, we outline directions that need additional research to allow evidence-informed policy-making. We argue that such work should ideally be conducted by academic researchers and national statistics agencies in collaboration.
Submitted 26 May, 2020;
originally announced May 2020.
-
Recasting Navier-Stokes Equations
Authors:
M. H. Lakshminarayana Reddy,
S. Kokou Dadzie,
Raffaella Ocone,
Matthew K. Borg,
Jason M. Reese
Abstract:
The classical Navier-Stokes equations fail to describe some flows in both compressible and incompressible configurations. In this article, we propose a new methodology based on transforming the fluid mass velocity vector field to obtain a new class of continuum models. We uncover a class of continuum models, which we call the re-casted Navier-Stokes equations. They naturally exhibit the physics of models previously proposed by different authors as substitutes for the original Navier-Stokes equations. Unlike the conventional Navier-Stokes equations, the new models appear as more complete forms of mass-diffusion-type continuum flow equations. They also systematically form a class of thermo-mechanically consistent hydrodynamic equations via the original equations. A plane wave analysis is performed to check their linear stability under small perturbations, which confirms that all re-casted models are spatially and temporally stable, like their classical counterpart. We then use Rayleigh-Brillouin scattering experiments to demonstrate that the re-casted equations may be better suited to explaining some of the experimental data where the original Navier-Stokes equations fail.
Submitted 9 September, 2019;
originally announced September 2019.
-
An Autonomous Performance Testing Framework using Self-Adaptive Fuzzy Reinforcement Learning
Authors:
Mahshid Helali Moghadam,
Mehrdad Saadatmand,
Markus Borg,
Markus Bohlin,
Björn Lisper
Abstract:
Test automation brings the potential to reduce costs and human effort, but several aspects of software testing remain challenging to automate. One such example is automated performance testing to find performance breaking points. Current approaches to tackle automated generation of performance test cases mainly involve using source code or system model analysis or use-case-based techniques. However, source code and system models might not always be available at testing time. On the other hand, if the optimal performance testing policy for the intended objective in a testing process could instead be learned by the testing system, then test automation without advanced performance models could be possible. Furthermore, the learned policy could later be reused for similar software systems under test, thus leading to higher test efficiency. We propose SaFReL, a self-adaptive fuzzy reinforcement learning-based performance testing framework. SaFReL learns the optimal policy to generate performance test cases through an initial learning phase, then reuses it during a transfer learning phase, while keeping the learning running and updating the policy in the long term. Through multiple experiments in a simulated environment, we demonstrate that our approach generates the target performance test cases for different programs more efficiently than a typical testing process, and performs adaptively without access to source code and performance models.
Submitted 30 July, 2020; v1 submitted 19 August, 2019;
originally announced August 2019.
-
Improved determination of the $\beta$-$\overline{\nu}_e$ angular correlation coefficient $a$ in free neutron decay with the $a$SPECT spectrometer
Authors:
M. Beck,
F. Ayala Guardia,
S. Baeßler,
M. Borg,
F. Glück,
W. Heil,
J. Kahlenberg,
M. Klopf,
G. Konrad,
R. Maisonobe,
R. Muñoz Horta,
C. Schmidt,
U. Schmidt,
M. Simson,
T. Soldner,
R. Virot,
A. Wunderle,
O. Zimmer
Abstract:
We report on a precise measurement of the electron-antineutrino angular correlation ($a$ coefficient) in free neutron beta-decay from the $a$SPECT experiment. The $a$ coefficient is inferred from the recoil energy spectrum of the protons, which are detected in $4\pi$ by the $a$SPECT spectrometer using magnetic adiabatic collimation with an electrostatic filter. Data are presented from a 100-day run at the Institut Laue Langevin in 2013. The sources of systematic errors are considered and included in the final result. We obtain $a = -0.10430(84)$, which is the most precise measurement of the neutron $a$ coefficient to date. From this, the ratio of axial-vector to vector coupling constants is derived, giving $|\lambda| = 1.2677(28)$.
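The step from $a$ to $|\lambda|$ uses the leading-order relation $a = (1 - |\lambda|^2)/(1 + 3|\lambda|^2)$, where $\lambda$ is the ratio of axial-vector to vector coupling constants; inverting it gives $|\lambda| = \sqrt{(1 - a)/(1 + 3a)}$. The sketch below performs that inversion and propagates the uncertainty to first order. It neglects the small recoil-order and radiative corrections a full analysis would apply, so it is a consistency check rather than the published procedure.

```python
import math


def lambda_from_a(a: float) -> float:
    """Invert a = (1 - l^2) / (1 + 3 l^2) for l = |lambda| (leading order only)."""
    if not -1.0 / 3.0 < a <= 1.0:
        raise ValueError("a must lie in (-1/3, 1]")
    return math.sqrt((1.0 - a) / (1.0 + 3.0 * a))


def lambda_uncertainty(a: float, sigma_a: float) -> float:
    """First-order error propagation: d|lambda|/da = -2 / (|lambda| (1 + 3a)^2)."""
    lam = lambda_from_a(a)
    if lam == 0.0:
        raise ValueError("derivative undefined at |lambda| = 0")
    return 2.0 / (lam * (1.0 + 3.0 * a) ** 2) * sigma_a


if __name__ == "__main__":
    a, sigma_a = -0.10430, 0.00084
    print(f"|lambda| = {lambda_from_a(a):.4f} +/- {lambda_uncertainty(a, sigma_a):.4f}")
```

With $a = -0.10430(84)$ as input, this leading-order inversion yields values consistent with the quoted $|\lambda| = 1.2677(28)$.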
Submitted 6 June, 2020; v1 submitted 13 August, 2019;
originally announced August 2019.
-
Requirements Engineering for Machine Learning: Perspectives from Data Scientists
Authors:
Andreas Vogelsang,
Markus Borg
Abstract:
Machine learning (ML) is used increasingly in real-world applications. In this paper, we describe our ongoing endeavor to define characteristics and challenges unique to Requirements Engineering (RE) for ML-based systems. As a first step, we interviewed four data scientists to understand how ML experts approach elicitation, specification, and assurance of requirements and expectations. The results show that changes in the development paradigm, i.e., from coding to training, also demand changes in RE. We conclude that development of ML systems demands requirements engineers to: (1) understand ML performance measures to state good functional requirements, (2) be aware of new quality requirements such as explainability, freedom from discrimination, or specific legal requirements, and (3) integrate ML specifics in the RE process. Our study provides a first contribution towards an RE methodology for ML systems.
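Point (1) can be illustrated by phrasing a functional requirement directly in terms of ML performance measures. The sketch below checks a hypothetical requirement of the form "precision of at least P and recall of at least R on the agreed test set" against confusion-matrix counts; the class and field names are illustrative assumptions, not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives


def precision(c: ConfusionCounts) -> float:
    return c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0


def recall(c: ConfusionCounts) -> float:
    return c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0


@dataclass
class PerformanceRequirement:
    """Hypothetical functional requirement stated with ML performance measures."""
    min_precision: float
    min_recall: float

    def is_met(self, c: ConfusionCounts) -> bool:
        return precision(c) >= self.min_precision and recall(c) >= self.min_recall


if __name__ == "__main__":
    counts = ConfusionCounts(tp=90, fp=10, fn=5)
    print(PerformanceRequirement(0.85, 0.95).is_met(counts))  # False: recall ~0.947
```

Writing the thresholds down this way forces the requirements engineer to decide which error type matters more, which is exactly the ML-specific understanding the interviews point to.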
Submitted 13 August, 2019;
originally announced August 2019.
-
Sharing of vulnerability information among companies -- a survey of Swedish companies
Authors:
Thomas Olsson,
Martin Hell,
Martin Höst,
Ulrik Franke,
Markus Borg
Abstract:
Software products are rarely developed from scratch, and vulnerabilities in such products might reside in parts that are either open source software or provided by another organization. Hence, the total cybersecurity of a product often depends on cooperation, explicit or implicit, between several organizations. We study the attitudes and practices of companies in software ecosystems towards sharing vulnerability information. Furthermore, we compare these practices to contemporary cybersecurity recommendations. This is performed through a questionnaire-based qualitative survey. The questionnaire is divided into two parts: the providers' perspective and the acquirers' perspective. The results show that companies are willing to share information with each other regarding vulnerabilities. Sharing is considered harmful neither to cybersecurity nor to their business, even though a majority of the respondents consider vulnerability information sensitive. However, the companies, despite being open to sharing, are less inclined to proactively share vulnerability information. Furthermore, the providers do not perceive that there is a large interest in vulnerability information from their customers. Hence, the companies' overall attitude to sharing vulnerability information is passive but open. In contrast, contemporary cybersecurity guidelines recommend active disclosure and sharing among actors in an ecosystem.
Submitted 11 June, 2019;
originally announced June 2019.
-
Video Game Development in a Rush: A Survey of the Global Game Jam Participants
Authors:
Markus Borg,
Vahid Garousi,
Anas Mahmoud,
Thomas Olsson,
Oskar Stålberg
Abstract:
Video game development is a complex endeavor, often involving complex software, large organizations, and aggressive release deadlines. Several studies have reported that periods of "crunch time" are prevalent in the video game industry, but there are few studies on the effects of time pressure. We conducted a survey with participants of the Global Game Jam (GGJ), a 48-hour hackathon. Based on 198 responses, the results suggest that: (1) iterative brainstorming is the most popular method for conceptualizing initial requirements; (2) continuous integration, minimum viable product, scope management, version control, and stand-up meetings are frequently applied development practices; (3) regular communication, internal playtesting, and dynamic and proactive planning are the most common quality assurance activities; and (4) familiarity with agile development has a weak correlation with perception of success in GGJ. We conclude that GGJ teams rely on ad hoc approaches to development and face-to-face communication, and recommend some complementary practices with limited overhead. Furthermore, as our findings are similar to recommendations for software startups, we posit that game jams and the startup scene share contextual similarities. Finally, we discuss the drawbacks of systemic "crunch time" and argue that game jam organizers are in a good position to problematize the phenomenon.
Submitted 31 March, 2019;
originally announced April 2019.
-
SZZ Unleashed: An Open Implementation of the SZZ Algorithm -- Featuring Example Usage in a Study of Just-in-Time Bug Prediction for the Jenkins Project
Authors:
Markus Borg,
Oscar Svensson,
Kristian Berg,
Daniel Hansson
Abstract:
Numerous empirical software engineering studies rely on detailed information about bugs. While issue trackers often contain information about when bugs were fixed, details about when they were introduced to the system are often absent. As a remedy, researchers often rely on the SZZ algorithm as a heuristic approach to identify bug-introducing software changes. Unfortunately, as reported in a recent systematic literature review, few researchers have made their SZZ implementations publicly available. Consequently, there is a risk that research effort is wasted as new projects based on SZZ output need to initially reimplement the approach. Furthermore, there is a risk that newly developed (closed source) SZZ implementations have not been properly tested, thus conducting research based on their output might introduce threats to validity. We present SZZ Unleashed, an open implementation of the SZZ algorithm for git repositories. This paper describes our implementation along with a usage example for the Jenkins project, and concludes with an illustrative study on just-in-time bug prediction. We hope to continue evolving SZZ Unleashed on GitHub, and warmly invite the community to contribute.
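At its core, SZZ takes the lines that a bug-fixing commit deleted or modified, looks up (via blame on the fix's parent revision) which earlier commits last touched those lines, and keeps those committed before the bug was reported as bug-introducing candidates. The sketch below shows only that core step on a hypothetical in-memory blame table; it is not the SZZ Unleashed code or API, and a real implementation would call `git blame` and may apply further filters, such as ignoring cosmetic changes.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Set, Tuple

LineRef = Tuple[str, int]  # (file path, line number in the fix's parent revision)


@dataclass(frozen=True)
class Commit:
    sha: str
    authored: date


def bug_introducing_candidates(
    changed_lines: Iterable[LineRef],
    blame: Dict[LineRef, Commit],
    bug_reported: date,
) -> Set[str]:
    """Core SZZ step: blame the lines a fix deleted or modified and keep the
    commits that predate the bug report."""
    candidates: Set[str] = set()
    for line in changed_lines:
        origin = blame.get(line)  # hypothetical stand-in for `git blame` output
        if origin is None:
            continue  # no blame information for this line
        if origin.authored < bug_reported:
            candidates.add(origin.sha)
    return candidates


if __name__ == "__main__":
    blame = {
        ("Job.java", 10): Commit("c1", date(2019, 1, 1)),
        ("Job.java", 11): Commit("c2", date(2019, 6, 1)),
    }
    fix_lines = [("Job.java", 10), ("Job.java", 11)]
    print(bug_introducing_candidates(fix_lines, blame, date(2019, 3, 1)))  # {'c1'}
```

Commit `c2` is discarded because it postdates the bug report, so it cannot have introduced the reported bug under SZZ's heuristic.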
Submitted 19 August, 2019; v1 submitted 5 March, 2019;
originally announced March 2019.
-
Towards Structured Evaluation of Deep Neural Network Supervisors
Authors:
Jens Henriksson,
Christian Berger,
Markus Borg,
Lars Tornberg,
Cristofer Englund,
Sankar Raman Sathyamoorthy,
Stig Ursing
Abstract:
Deep Neural Networks (DNN) have improved the quality of several non-safety related products in the past years. However, before DNNs are deployed to safety-critical applications, their robustness needs to be systematically analyzed. A common challenge for DNNs occurs when input is dissimilar to the training set, which might lead to high-confidence predictions despite a lack of proper knowledge of the input. Several previous studies have proposed to complement DNNs with a supervisor that detects when inputs are outside the scope of the network. Most of these supervisors, however, are developed and tested for a selected scenario using a specific performance metric. In this work, we emphasize the need to assess and compare the performance of supervisors in a structured way. We present a framework constituted by four datasets organized in six test cases combined with seven evaluation metrics. The test cases provide varying complexity and include data from publicly available sources as well as a novel dataset consisting of images from simulated driving scenarios. We plan to make the latter publicly available. Our framework can be used to support DNN supervisor evaluation, which in turn could be used to motivate development, validation, and deployment of DNNs in safety-critical applications.
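A common threshold-independent metric for comparing supervisors is the area under the ROC curve (AUROC); whether it is among the seven metrics in this framework is not stated here, so treat the sketch as an illustrative example. Given the outlier scores a supervisor assigns to in-distribution and out-of-distribution inputs, AUROC equals the probability that a randomly chosen outlier scores higher than a randomly chosen inlier, which the function below computes directly.

```python
from typing import Sequence


def auroc(inlier_scores: Sequence[float], outlier_scores: Sequence[float]) -> float:
    """AUROC for outlier detection: probability that a random outlier receives a
    higher supervisor score than a random inlier, counting ties as one half."""
    if not inlier_scores or not outlier_scores:
        raise ValueError("need at least one inlier and one outlier score")
    wins = 0.0
    for o in outlier_scores:
        for i in inlier_scores:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(inlier_scores) * len(outlier_scores))


if __name__ == "__main__":
    in_dist = [0.05, 0.10, 0.40, 0.20]   # scores on training-like inputs
    out_dist = [0.90, 0.35, 0.70]        # scores on novel inputs
    print(round(auroc(in_dist, out_dist), 3))
```

The pairwise loop is quadratic; for large test sets a rank-based computation over sorted scores gives the same value more efficiently.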
Submitted 7 March, 2019; v1 submitted 4 March, 2019;
originally announced March 2019.