Search | arXiv e-print repository

doi 10.1016/j.jss.2021.111096

On Misbehaviour and Fault Tolerance in Machine Learning Systems

Authors: Lalli Myllyaho, Mikko Raatikainen, Tomi Männistö, Jukka K. Nurminen, Tommi Mikkonen

Abstract: Machine learning (ML) provides us with numerous opportunities, allowing ML systems to adapt to new situations and contexts. At the same time, this adaptability raises uncertainties concerning the run-time product quality or dependability, such as reliability and security, of these systems. Systems can be tested and monitored, but this does not provide protection against faults and failures in adap… ▽ More Machine learning (ML) provides us with numerous opportunities, allowing ML systems to adapt to new situations and contexts. At the same time, this adaptability raises uncertainties concerning the run-time product quality or dependability, such as reliability and security, of these systems. Systems can be tested and monitored, but this does not provide protection against faults and failures in adapted ML systems themselves. We studied software designs that aim at introducing fault tolerance in ML systems so that possible problems in ML components of the systems can be avoided. The research was conducted as a case study, and its data was collected through five semi-structured interviews with experienced software architects. We present a conceptualisation of the misbehaviour of ML systems, the perceived role of fault tolerance, and the designs used. Common patterns to incorporating ML components in design in a fault tolerant fashion have started to emerge. ML models are, for example, guarded by monitoring the inputs and their distribution, and enforcing business rules on acceptable outputs. Multiple, specialised ML models are used to adapt to the variations and changes in the surrounding world, and simpler fall-over techniques like default outputs are put in place to have systems up and running in the face of problems. However, the general role of these patterns is not widely acknowledged. This is mainly due to the relative immaturity of using ML as part of a complete software system: the field still lacks established frameworks and practices beyond training to implement, operate, and maintain the software that utilises ML. ML software engineering needs further analysis and development on all fronts. △ Less

Submitted 16 September, 2021; originally announced September 2021.

Comments: 15 pages, 1 figure, 2 tables. The manuscript has been accepted to the Journal of Systems and Software

ACM Class: D.2.5; D.2.10; D.2.11

arXiv:2107.12190 [pdf, other]

doi 10.1016/j.jss.2021.111050

Systematic Literature Review of Validation Methods for AI Systems

Authors: Lalli Myllyaho, Mikko Raatikainen, Tomi Männistö, Tommi Mikkonen, Jukka K. Nurminen

Abstract: Context: Artificial intelligence (AI) has made its way into everyday activities, particularly through new techniques such as machine learning (ML). These techniques are implementable with little domain knowledge. This, combined with the difficulty of testing AI systems with traditional methods, has made system trustworthiness a pressing issue. Objective: This paper studies the methods used to va… ▽ More Context: Artificial intelligence (AI) has made its way into everyday activities, particularly through new techniques such as machine learning (ML). These techniques are implementable with little domain knowledge. This, combined with the difficulty of testing AI systems with traditional methods, has made system trustworthiness a pressing issue. Objective: This paper studies the methods used to validate practical AI systems reported in the literature. Our goal is to classify and describe the methods that are used in realistic settings to ensure the dependability of AI systems. Method: A systematic literature review resulted in 90 papers. Systems presented in the papers were analysed based on their domain, task, complexity, and applied validation methods. Results: The validation methods were synthesized into a taxonomy consisting of trial, simulation, model-centred validation, and expert opinion. Failure monitors, safety channels, redundancy, voting, and input and output restrictions are methods used to continuously validate the systems after deployment. Conclusions: Our results clarify existing strategies applied to validation. They form a basis for the synthesization, assessment, and refinement of AI system validation in research and guidelines for validating individual systems in practice. While various validation strategies have all been relatively widely applied, only few studies report on continuous validation. Keywords: artificial intelligence, machine learning, validation, testing, V&V, systematic literature review. △ Less

Submitted 26 July, 2021; originally announced July 2021.

Comments: 25 pages, 6 figures, 12 tables. The manuscript has been accepted to the Journal of Systems and Software

arXiv:2103.07290 [pdf, ps, other]

Challenges and Governance Solutions for Data Science Services based on Open Data and APIs

Authors: Juha-Pekka Joutsenlahti, Timo Lehtonen, Mikko Raatikainen, Elina Kettunen, Tommi Mikkonen

Abstract: Increasingly common open data and open application programming interfaces (APIs) together with the progress of data science -- such as artificial intelligence (AI) and especially machine learning (ML) -- create opportunities to build novel services by combining data from different sources. In this experience report, we describe our firsthand experiences on open data and in the domain of marine tra… ▽ More Increasingly common open data and open application programming interfaces (APIs) together with the progress of data science -- such as artificial intelligence (AI) and especially machine learning (ML) -- create opportunities to build novel services by combining data from different sources. In this experience report, we describe our firsthand experiences on open data and in the domain of marine traffic in Finland and Sweden and identified technological opportunities for novel services. We enumerate five challenges that we have encountered with the application of open data: relevant data, historical data, licensing, runtime quality, and API evolution. These challenges affect both business model and technical implementation. We discuss how these challenges could be alleviated by better governance practices for provided open APIs and data. △ Less

Submitted 12 March, 2021; originally announced March 2021.

Comments: 2021 IEEE/ACM 1st Workshop on AI Engineering - Software Engineering for AI (WAIN) of 43rd International Conference on Software Engineering (ICSE)

arXiv:2102.08638 [pdf, other]

Towards Utility-based Prioritization of Requirements in Open Source Environments

Authors: Alexander Felfernig, Martin Stettinger, Müslüm Atas, Ralph Samer, Jennifer Nerlich, Simon Scholz, Juha Tiihonen, Mikko Raatikainen

Abstract: Requirements Engineering in open source projects such as Eclipse faces the challenge of having to prioritize requirements for individual contributors in a more or less unobtrusive fashion. In contrast to conventional industrial software development projects, contributors in open source platforms can decide on their own which requirements to implement next. In this context, the main role of priorit… ▽ More Requirements Engineering in open source projects such as Eclipse faces the challenge of having to prioritize requirements for individual contributors in a more or less unobtrusive fashion. In contrast to conventional industrial software development projects, contributors in open source platforms can decide on their own which requirements to implement next. In this context, the main role of prioritization is to support contributors in figuring out the most relevant and interesting requirements to be implemented next and thus avoid time-consuming and inefficient search processes. In this paper, we show how utility-based prioritization approaches can be used to support contributors in conventional as well as in open source Requirements Engineering scenarios. As an example of an open source environment, we use Bugzilla. In this context, we also show how dependencies can be taken into account in utility-based prioritization processes. △ Less

Submitted 17 February, 2021; originally announced February 2021.

Comments: A. Felfernig, M. Stettinger, M. Atas, R. Samer, J. Nerlich, S. Scholz, J. Tiihonen, and M. Raatikainen. Towards Utility-based Prioritization of Requirements in Open Source Environments, 26th IEEE Conference on Requirements Engineering, pp. 406-411, Banff, Canada, 2018

arXiv:2102.08485 [pdf, other]

doi 10.1109/TSE.2022.3212166

Improved management of issue dependencies in issue trackers of large collaborative projects

Authors: Mikko Raatikainen, Quim Motger, Clara Marie Lüders, Xavier Franch, Lalli Myllyaho, Elina Kettunen, Jordi Marco, Juha Tiihonen, Mikko Halonen, Tomi Männistö

Abstract: Issue trackers, such as Jira, have become the prevalent collaborative tools in software engineering for managing issues, such as requirements, development tasks, and software bugs. However, issue trackers inherently focus on the lifecycle of single issues, although issues have and express dependencies on other issues that constitute issue dependency networks in large complex collaborative projects… ▽ More Issue trackers, such as Jira, have become the prevalent collaborative tools in software engineering for managing issues, such as requirements, development tasks, and software bugs. However, issue trackers inherently focus on the lifecycle of single issues, although issues have and express dependencies on other issues that constitute issue dependency networks in large complex collaborative projects. The objective of this study is to develop supportive solutions for the improved management of dependent issues in an issue tracker. This study follows the Design Science methodology, consisting of eliciting drawbacks and constructing and evaluating a solution and system. The study was carried out in the context of The Qt Company's Jira, which exemplifies an actively used, almost two-decade-old issue tracker with over 100,000 issues. The drawbacks capture how users operate with issue trackers to handle issue information in large, collaborative, and long-lived projects. The basis of the solution is to keep issues and dependencies as separate objects and automatically construct an issue graph. Dependency detections complement the issue graph by proposing missing dependencies, while consistency checks and diagnoses identify conflicting issue priorities and release assignments. Jira's plugin and service-based system architecture realize the functional and quality concerns of the system implementation. We show how to adopt the intelligent supporting techniques of an issue tracker in a complex use context and a large data-set. The solution considers an integrated and holistic system view, practical applicability and utility, and the practical characteristics of issue data, such as inherent incompleteness. △ Less

Submitted 15 November, 2022; v1 submitted 16 February, 2021; originally announced February 2021.

Comments: Accepted for publication in IEEE Transactions on Software Engineering. Published online 05 October 2022. 21 pages, 3 figures, 8 tables

ACM Class: D.2.1

arXiv:1909.07699 [pdf, other]

OpenReq Issue Link Map: A Tool to Visualize Issue Links in Jira

Authors: Clara Marie Lüders, Mikko Raatikainen, Joaquim Motger, Walid Maalej

Abstract: Managing software projects gets more and more complicated with an increasing project and product size. To cope with this complexity, many organizations use issue tracking systems, where tasks, bugs, and requirements are stored as issues. Unfortunately, managing software projects might remain chaotic even when using issue trackers. Particularly for long lasting projects with a large number of issue… ▽ More Managing software projects gets more and more complicated with an increasing project and product size. To cope with this complexity, many organizations use issue tracking systems, where tasks, bugs, and requirements are stored as issues. Unfortunately, managing software projects might remain chaotic even when using issue trackers. Particularly for long lasting projects with a large number of issues and links between them, it is often hard to maintain an overview of the dependencies, especially when dozens of new issues get reported every day. We present a Jira plug-in that supports developers, project managers, and product owners in managing and overviewing issues and their dependencies. Our tool visualizes the issue links, helps to find missing or unknown links between issues, and detects inconsistencies. △ Less

Submitted 17 September, 2019; originally announced September 2019.

arXiv:1808.02284 [pdf, other]

Needs and Challenges for a Platform to Support Large-scale Requirements Engineering. A Multiple Case Study

Authors: Davide Fucci, Cristina Palomares, Dolors Costal, Xavier Franch, Mikko Raatikainen, Martin Stettinger, Zijad Kurtanovic, Tero Kojo, Lars Koenig, Andreas Falkner, Gottfried Schenner, Fabrizio Brasca, Tomi Männistö, Alexander Felfernig, Walid Maalej

Abstract: Background: Requirement engineering is often considered a critical activity in system development projects. The increasing complexity of software, as well as number and heterogeneity of stakeholders, motivate the development of methods and tools for improving large-scale requirement engineering. Aims: The empirical study presented in this paper aims to identify and understand the characteristics a… ▽ More Background: Requirement engineering is often considered a critical activity in system development projects. The increasing complexity of software, as well as number and heterogeneity of stakeholders, motivate the development of methods and tools for improving large-scale requirement engineering. Aims: The empirical study presented in this paper aims to identify and understand the characteristics and challenges of a platform, as desired by experts, to support requirement engineering for individual stakeholders, based on the current pain-points of their organizations when dealing with a large number requirements. Method: We conducted a multiple case study with three companies in different domains. We collected data through ten semi-structured interviews with experts from these companies. Results: The main pain-point for stakeholders is handling the vast amount of data from different sources. The foreseen platform should leverage such data to manage changes in requirements according to customers' and users' preferences. It should also offer stakeholders an estimation of how long a requirements engineering task will take to complete, along with an easier requirements dependency identification and requirements reuse strategy. Conclusions: The findings provide empirical evidence about how practitioners wish to improve their requirement engineering processes and tools. The insights are a starting point for in-depth investigations into the problems and solutions presented. Practitioners can use the results to improve existing or design new practices and tools. △ Less

Submitted 6 September, 2018; v1 submitted 7 August, 2018; originally announced August 2018.

Comments: Accepted for publication to the 12th International Symposium on Empirical Software Engineering and Measurement (ESEM18)

Showing 1–7 of 7 results for author: Raatikainen, M