-
Assuring the Machine Learning Lifecycle: Desiderata, Methods, and Challenges
Authors:
Rob Ashmore,
Radu Calinescu,
Colin Paterson
Abstract:
Machine learning has evolved into an enabling technology for a wide range of highly successful applications. The potential for this success to continue and accelerate has placed machine learning (ML) at the top of research, economic and political agendas. Such unprecedented interest is fuelled by a vision of ML applicability extending to healthcare, transportation, defence and other domains of gre…
▽ More
Machine learning has evolved into an enabling technology for a wide range of highly successful applications. The potential for this success to continue and accelerate has placed machine learning (ML) at the top of research, economic and political agendas. Such unprecedented interest is fuelled by a vision of ML applicability extending to healthcare, transportation, defence and other domains of great societal importance. Achieving this vision requires the use of ML in safety-critical applications that demand levels of assurance beyond those needed for current ML applications. Our paper provides a comprehensive survey of the state-of-the-art in the assurance of ML, i.e. in the generation of evidence that ML is sufficiently safe for its intended use. The survey covers the methods capable of providing such evidence at different stages of the machine learning lifecycle, i.e. of the complex, iterative process that starts with the collection of the data used to train an ML component for a system, and ends with the deployment of that component within the system. The paper begins with a systematic presentation of the ML lifecycle and its stages. We then define assurance desiderata for each stage, review existing methods that contribute to achieving these desiderata, and identify open challenges that require further research.
△ Less
Submitted 10 May, 2019;
originally announced May 2019.
-
Local Reasoning for Parameterized First Order Protocols
Authors:
Rylo Ashmore,
Arie Gurfinkel,
Richard Trefler
Abstract:
First Order Logic (FOL) is a powerful reasoning tool for program verification. Recent work on Ivy shows that FOL is well suited for verification of parameterized distributed systems. However, specifying many natural objects, such as a ring topology, in FOL is unexpectedly inconvenient. We present a framework based on FOL for specifying distributed multi-process protocols in a process-local manner…
▽ More
First Order Logic (FOL) is a powerful reasoning tool for program verification. Recent work on Ivy shows that FOL is well suited for verification of parameterized distributed systems. However, specifying many natural objects, such as a ring topology, in FOL is unexpectedly inconvenient. We present a framework based on FOL for specifying distributed multi-process protocols in a process-local manner together with an implicit network topology. In the specification framework, we provide an auto-active analysis technique to reason about the protocols locally, in a process-modular way. Our goal is to mirror the way designers often describe and reason about protocols. By hiding the topology behind the FOL structure, we simplify the modelling, but complicate the reasoning. To deal with that, we use an oracle for the topology to develop a sound and relatively complete proof rule that reduces reasoning about the implicit topology back to pure FOL. This completely avoids the need to axiomatize the topology. Using the rule, we establish a property that reduces verification to a fixed number of processes bounded by the size of local neighbourhoods. We show how to use the framework on two examples, including leader election on a ring.
△ Less
Submitted 7 March, 2019;
originally announced March 2019.
-
Testing Deep Neural Networks
Authors:
Youcheng Sun,
Xiaowei Huang,
Daniel Kroening,
James Sharp,
Matthew Hill,
Rob Ashmore
Abstract:
Deep neural networks (DNNs) have a wide range of applications, and software employing them must be thoroughly tested, especially in safety-critical domains. However, traditional software test coverage metrics cannot be applied directly to DNNs. In this paper, inspired by the MC/DC coverage criterion, we propose a family of four novel test criteria that are tailored to structural features of DNNs a…
▽ More
Deep neural networks (DNNs) have a wide range of applications, and software employing them must be thoroughly tested, especially in safety-critical domains. However, traditional software test coverage metrics cannot be applied directly to DNNs. In this paper, inspired by the MC/DC coverage criterion, we propose a family of four novel test criteria that are tailored to structural features of DNNs and their semantics. We validate the criteria by demonstrating that the generated test inputs guided via our proposed coverage criteria are able to capture undesired behaviours in a DNN. Test cases are generated using a symbolic approach and a gradient-based heuristic search. By comparing them with existing methods, we show that our criteria achieve a balance between their ability to find bugs (proxied using adversarial examples) and the computational cost of test case generation. Our experiments are conducted on state-of-the-art DNNs obtained using popular open source datasets, including MNIST, CIFAR-10 and ImageNet.
△ Less
Submitted 15 April, 2019; v1 submitted 10 March, 2018;
originally announced March 2018.
-
The Utility and Practicality of Quantifying Software Reliability
Authors:
Rob Ashmore
Abstract:
We argue that quantifying software reliability is important in demonstrating that system-level risks are As Low As Reasonably Practicable (ALARP). Furthermore, we demonstrate that such quantification is possible in at least one meaningful case. It is, however, unlikely to be practical in every case. This means it is unlikely to be included as an explicit objective in standards. Hence, for those ca…
▽ More
We argue that quantifying software reliability is important in demonstrating that system-level risks are As Low As Reasonably Practicable (ALARP). Furthermore, we demonstrate that such quantification is possible in at least one meaningful case. It is, however, unlikely to be practical in every case. This means it is unlikely to be included as an explicit objective in standards. Hence, for those cases where software reliability can be quantified, merely following a standard may lead to risk-reduction opportunities being missed.
△ Less
Submitted 7 May, 2014; v1 submitted 29 April, 2014;
originally announced April 2014.