Search | arXiv e-print repository

NLP Verification: Towards a General Methodology for Certifying Robustness

Authors: Marco Casadio, Tanvi Dinkar, Ekaterina Komendantskaya, Luca Arnaboldi, Matthew L. Daggitt, Omri Isac, Guy Katz, Verena Rieser, Oliver Lemon

Abstract: Deep neural networks have exhibited substantial success in the field of Natural Language Processing and ensuring their safety and reliability is crucial: there are safety critical contexts where such models must be robust to variability or attack, and give guarantees over their output. Unlike Computer Vision, NLP lacks a unified verification methodology and, despite recent advancements in literatu… ▽ More Deep neural networks have exhibited substantial success in the field of Natural Language Processing and ensuring their safety and reliability is crucial: there are safety critical contexts where such models must be robust to variability or attack, and give guarantees over their output. Unlike Computer Vision, NLP lacks a unified verification methodology and, despite recent advancements in literature, they are often light on the pragmatical issues of NLP verification. In this paper, we attempt to distil and evaluate general components of an NLP verification pipeline, that emerges from the progress in the field to date. Our contributions are two-fold. Firstly, we give a general (i.e. algorithm-independent) characterisation of verifiable subspaces that result from embedding sentences into continuous spaces. We identify, and give an effective method to deal with, the technical challenge of semantic generalisability of verified subspaces; and propose it as a standard metric in the NLP verification pipelines (alongside with the standard metrics of model accuracy and model verifiability). Secondly, we propose a general methodology to analyse the effect of the embedding gap -- a problem that refers to the discrepancy between verification of geometric subspaces, and the semantic meaning of sentences which the geometric subspaces are supposed to represent. In extreme cases, poor choices in embedding of sentences may invalidate verification results. We propose a number of practical NLP methods that can help to quantify the effects of the embedding gap; and in particular we propose the metric of falsifiability of semantic subspaces as another fundamental metric to be reported as part of the NLP verification pipeline. We believe that together these general principles pave the way towards a more consolidated and effective development of this new domain. △ Less

Submitted 31 May, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

arXiv:2402.01353 [pdf, ps, other]

Efficient compilation of expressive problem space specifications to neural network solvers

Authors: Matthew L. Daggitt, Wen Kokke, Robert Atkey

Abstract: Recent work has described the presence of the embedding gap in neural network verification. On one side of the gap is a high-level specification about the network's behaviour, written by a domain expert in terms of the interpretable problem space. On the other side are a logically-equivalent set of satisfiability queries, expressed in the uninterpretable embedding space in a form suitable for neur… ▽ More Recent work has described the presence of the embedding gap in neural network verification. On one side of the gap is a high-level specification about the network's behaviour, written by a domain expert in terms of the interpretable problem space. On the other side are a logically-equivalent set of satisfiability queries, expressed in the uninterpretable embedding space in a form suitable for neural network solvers. In this paper we describe an algorithm for compiling the former to the latter. We explore and overcome complications that arise from targeting neural network solvers as opposed to standard SMT solvers. △ Less

Submitted 24 January, 2024; originally announced February 2024.

arXiv:2401.06379 [pdf, other]

Vehicle: Bridging the Embedding Gap in the Verification of Neuro-Symbolic Programs

Authors: Matthew L. Daggitt, Wen Kokke, Robert Atkey, Natalia Slusarz, Luca Arnaboldi, Ekaterina Komendantskaya

Abstract: Neuro-symbolic programs -- programs containing both machine learning components and traditional symbolic code -- are becoming increasingly widespread. However, we believe that there is still a lack of a general methodology for verifying these programs whose correctness depends on the behaviour of the machine learning components. In this paper, we identify the ``embedding gap'' -- the lack of techn… ▽ More Neuro-symbolic programs -- programs containing both machine learning components and traditional symbolic code -- are becoming increasingly widespread. However, we believe that there is still a lack of a general methodology for verifying these programs whose correctness depends on the behaviour of the machine learning components. In this paper, we identify the ``embedding gap'' -- the lack of techniques for linking semantically-meaningful ``problem-space'' properties to equivalent ``embedding-space'' properties -- as one of the key issues, and describe Vehicle, a tool designed to facilitate the end-to-end verification of neural-symbolic programs in a modular fashion. Vehicle provides a convenient language for specifying ``problem-space'' properties of neural networks and declaring their relationship to the ``embedding-space", and a powerful compiler that automates interpretation of these properties in the language of a chosen machine-learning training environment, neural network verifier, and interactive theorem prover. We demonstrate Vehicle's utility by using it to formally verify the safety of a simple autonomous car equipped with a neural network controller. △ Less

Submitted 12 January, 2024; originally announced January 2024.

arXiv:2305.04003 [pdf, other]

ANTONIO: Towards a Systematic Method of Generating NLP Benchmarks for Verification

Authors: Marco Casadio, Luca Arnaboldi, Matthew L. Daggitt, Omri Isac, Tanvi Dinkar, Daniel Kienitz, Verena Rieser, Ekaterina Komendantskaya

Abstract: Verification of machine learning models used in Natural Language Processing (NLP) is known to be a hard problem. In particular, many known neural network verification methods that work for computer vision and other numeric datasets do not work for NLP. Here, we study technical reasons that underlie this problem. Based on this analysis, we propose practical methods and heuristics for preparing NLP… ▽ More Verification of machine learning models used in Natural Language Processing (NLP) is known to be a hard problem. In particular, many known neural network verification methods that work for computer vision and other numeric datasets do not work for NLP. Here, we study technical reasons that underlie this problem. Based on this analysis, we propose practical methods and heuristics for preparing NLP datasets and models in a way that renders them amenable to known verification methods based on abstract interpretation. We implement these methods as a Python library called ANTONIO that links to the neural network verifiers ERAN and Marabou. We perform evaluation of the tool using an NLP dataset R-U-A-Robot suggested as a benchmark for verifying legally critical NLP applications. We hope that, thanks to its general applicability, this work will open novel possibilities for including NLP verification problems into neural network verification competitions, and will popularise NLP problems within this community. △ Less

Submitted 15 August, 2023; v1 submitted 6 May, 2023; originally announced May 2023.

Comments: To appear in proceedings of 6th Workshop on Formal Methods for ML-Enabled Autonomous Systems (Affiliated with CAV 2023)

arXiv:2303.10650 [pdf, other]

Logic of Differentiable Logics: Towards a Uniform Semantics of DL

Authors: Natalia Ślusarz, Ekaterina Komendantskaya, Matthew L. Daggitt, Robert Stewart, Kathrin Stark

Abstract: Differentiable logics (DL) have recently been proposed as a method of training neural networks to satisfy logical specifications. A DL consists of a syntax in which specifications are stated and an interpretation function that translates expressions in the syntax into loss functions. These loss functions can then be used during training with standard gradient descent algorithms. The variety of exi… ▽ More Differentiable logics (DL) have recently been proposed as a method of training neural networks to satisfy logical specifications. A DL consists of a syntax in which specifications are stated and an interpretation function that translates expressions in the syntax into loss functions. These loss functions can then be used during training with standard gradient descent algorithms. The variety of existing DLs and the differing levels of formality with which they are treated makes a systematic comparative study of their properties and implementations difficult. This paper remedies this problem by suggesting a meta-language for defining DLs that we call the Logic of Differentiable Logics, or LDL. Syntactically, it generalises the syntax of existing DLs to FOL, and for the first time introduces the formalism for reasoning about vectors and learners. Semantically, it introduces a general interpretation function that can be instantiated to define loss functions arising from different existing DLs. We use LDL to establish several theoretical properties of existing DLs, and to conduct their empirical study in neural network verification. △ Less

Submitted 5 October, 2023; v1 submitted 19 March, 2023; originally announced March 2023.

Comments: LPAR'23

arXiv:2207.06741 [pdf, ps, other]

Differentiable Logics for Neural Network Training and Verification

Authors: Natalia Slusarz, Ekaterina Komendantskaya, Matthew L. Daggitt, Robert Stewart

Abstract: The rising popularity of neural networks (NNs) in recent years and their increasing prevalence in real-world applications have drawn attention to the importance of their verification. While verification is known to be computationally difficult theoretically, many techniques have been proposed for solving it in practice. It has been observed in the literature that by default neural networks rarely… ▽ More The rising popularity of neural networks (NNs) in recent years and their increasing prevalence in real-world applications have drawn attention to the importance of their verification. While verification is known to be computationally difficult theoretically, many techniques have been proposed for solving it in practice. It has been observed in the literature that by default neural networks rarely satisfy logical constraints that we want to verify. A good course of action is to train the given NN to satisfy said constraint prior to verifying them. This idea is sometimes referred to as continuous verification, referring to the loop between training and verification. Usually training with constraints is implemented by specifying a translation for a given formal logic language into loss functions. These loss functions are then used to train neural networks. Because for training purposes these functions need to be differentiable, these translations are called differentiable logics (DL). This raises several research questions. What kind of differentiable logics are possible? What difference does a specific choice of DL make in the context of continuous verification? What are the desirable criteria for a DL viewed from the point of view of the resulting loss function? In this extended abstract we will discuss and answer these questions. △ Less

Submitted 14 July, 2022; originally announced July 2022.

Comments: FOMLAS'22 paper

arXiv:2206.14575 [pdf, other]

Why Robust Natural Language Understanding is a Challenge

Authors: Marco Casadio, Ekaterina Komendantskaya, Verena Rieser, Matthew L. Daggitt, Daniel Kienitz, Luca Arnaboldi, Wen Kokke

Abstract: With the proliferation of Deep Machine Learning into real-life applications, a particular property of this technology has been brought to attention: robustness Neural Networks notoriously present low robustness and can be highly sensitive to small input perturbations. Recently, many methods for verifying networks' general properties of robustness have been proposed, but they are mostly applied in… ▽ More With the proliferation of Deep Machine Learning into real-life applications, a particular property of this technology has been brought to attention: robustness Neural Networks notoriously present low robustness and can be highly sensitive to small input perturbations. Recently, many methods for verifying networks' general properties of robustness have been proposed, but they are mostly applied in Computer Vision. In this paper we propose a Verification specification for Natural Language Understanding classification based on larger regions of interest, and we discuss the challenges of such task. We observe that, although the data is almost linearly separable, the verifier struggles to output positive results and we explain the problems and implications. △ Less

Submitted 13 July, 2022; v1 submitted 21 June, 2022; originally announced June 2022.

arXiv:2202.05207 [pdf, other]

Vehicle: Interfacing Neural Network Verifiers with Interactive Theorem Provers

Authors: Matthew L. Daggitt, Wen Kokke, Robert Atkey, Luca Arnaboldi, Ekaterina Komendantskya

Abstract: Verification of neural networks is currently a hot topic in automated theorem proving. Progress has been rapid and there are now a wide range of tools available that can verify properties of networks with hundreds of thousands of nodes. In theory this opens the door to the verification of larger control systems that make use of neural network components. However, although work has managed to incor… ▽ More Verification of neural networks is currently a hot topic in automated theorem proving. Progress has been rapid and there are now a wide range of tools available that can verify properties of networks with hundreds of thousands of nodes. In theory this opens the door to the verification of larger control systems that make use of neural network components. However, although work has managed to incorporate the results of these verifiers to prove larger properties of individual systems, there is currently no general methodology for bridging the gap between verifiers and interactive theorem provers (ITPs). In this paper we present Vehicle, our solution to this problem. Vehicle is equipped with an expressive domain specific language for stating neural network specifications which can be compiled to both verifiers and ITPs. It overcomes previous issues with maintainability and scalability in similar ITP formalisations by using a standard ONNX file as the single canonical representation of the network. We demonstrate its utility by using it to connect the neural network verifier Marabou to Agda and then formally verifying that a car steered by a neural network never leaves the road, even in the face of an unpredictable cross wind and imperfect sensors. The network has over 20,000 nodes, and therefore this proof represents an improvement of 3 orders of magnitude over prior proofs about neural network enhanced systems in ITPs. △ Less

Submitted 10 February, 2022; originally announced February 2022.

arXiv:2106.01184 [pdf, other]

Formally Verified Convergence of Policy-Rich DBF Routing Protocols

Authors: Matthew L. Daggitt, Timothy G. Griffin

Abstract: In this paper we present new general convergence results about the behaviour of the Distributed Bellman-Ford (DBF) family of routing protocols, which includes distance-vector protocols (e.g. RIP) and path-vector protocols (e.g. BGP). Our results apply to ``policy-rich" protocols, by which we mean protocols that can have complex policies (e.g. conditional route transformations) that violate traditi… ▽ More In this paper we present new general convergence results about the behaviour of the Distributed Bellman-Ford (DBF) family of routing protocols, which includes distance-vector protocols (e.g. RIP) and path-vector protocols (e.g. BGP). Our results apply to ``policy-rich" protocols, by which we mean protocols that can have complex policies (e.g. conditional route transformations) that violate traditional assumptions made in the standard presentation of Bellman-Ford protocols. First, we propose a new algebraic model for abstract routing problems which has fewer primitives than previous models and can represent more expressive policy languages. The new model is also the first to allow concurrent reasoning about distance-vector and path-vector protocols. Second, we demonstrate how DBF routing protocols are instances of a larger class of asynchronous iterative algorithms, for which there already exist powerful results about convergence. These results allow us to build upon conditions previously shown by Sobrinho to be sufficient and necessary for the convergence of path-vector protocols and strengthen them: we show that, with a minor modification, they also apply to distance-vector protocols; we prove they guarantee that the final routing solution reached is unique, thereby eliminating the possibility of anomalies such as BGP wedgies; we relax the model of asynchronous communication, showing that the results still hold if routing messages can be lost, reordered, and duplicated. Thirdly, our model and our accompanying theoretical results have been fully formalised in the Agda theorem prover. The resulting library is a powerful tool for quickly prototy** and formally verifying new policy languages. As an example, we formally verify the correctness of a policy language with many of the features of BGP including communities, conditional policy, path-inflation and route filtering. △ Less

Submitted 9 January, 2023; v1 submitted 2 June, 2021; originally announced June 2021.

Comments: 16 pages + 9 pages of references and appendices

arXiv:2105.11267 [pdf, ps, other]

Actions You Can Handle: Dependent Types for AI Plans

Authors: Alasdair Hill, Ekaterina Komendantskaya, Matthew L. Daggitt, Ronald P. A. Petrick

Abstract: Verification of AI is a challenge that has engineering, algorithmic and programming language components. For example, AI planners are deployed to model actions of autonomous agents. They comprise a number of searching algorithms that, given a set of specified properties, find a sequence of actions that satisfy these properties. Although AI planners are mature tools from the algorithmic and enginee… ▽ More Verification of AI is a challenge that has engineering, algorithmic and programming language components. For example, AI planners are deployed to model actions of autonomous agents. They comprise a number of searching algorithms that, given a set of specified properties, find a sequence of actions that satisfy these properties. Although AI planners are mature tools from the algorithmic and engineering points of view, they have limitations as programming languages. Decidable and efficient automated search entails restrictions on the syntax of the language, prohibiting use of higher-order properties or recursion. This paper proposes a methodology for embedding plans produced by AI planners into dependently-typed language Agda, which enables users to reason about and verify more general and abstract properties of plans, and also provides a more holistic programming language infrastructure for modelling plan execution. △ Less

Submitted 20 July, 2021; v1 submitted 24 May, 2021; originally announced May 2021.

Comments: 14 pages, 5 figures, Accepted to TyDe 2021

MSC Class: 68Q60; 03B38; 68T27; 03B70 ACM Class: D.3; F.3; F.4; I.2

arXiv:2104.01396 [pdf, other]

Neural Network Robustness as a Verification Property: A Principled Case Study

Authors: Marco Casadio, Ekaterina Komendantskaya, Matthew L. Daggitt, Wen Kokke, Guy Katz, Guy Amir, Idan Refaeli

Abstract: Neural networks are very successful at detecting patterns in noisy data, and have become the technology of choice in many fields. However, their usefulness is hampered by their susceptibility to adversarial attacks. Recently, many methods for measuring and improving a network's robustness to adversarial perturbations have been proposed, and this growing body of research has given rise to numerous… ▽ More Neural networks are very successful at detecting patterns in noisy data, and have become the technology of choice in many fields. However, their usefulness is hampered by their susceptibility to adversarial attacks. Recently, many methods for measuring and improving a network's robustness to adversarial perturbations have been proposed, and this growing body of research has given rise to numerous explicit or implicit notions of robustness. Connections between these notions are often subtle, and a systematic comparison between them is missing in the literature. In this paper we begin addressing this gap, by setting up general principles for the empirical analysis and evaluation of a network's robustness as a mathematical property - during the network's training phase, its verification, and after its deployment. We then apply these principles and conduct a case study that showcases the practical benefits of our general approach. △ Less

Submitted 13 July, 2022; v1 submitted 3 April, 2021; originally announced April 2021.

Comments: 11 pages, CAV 2022

arXiv:2012.01686 [pdf, ps, other]

Dynamic Asynchronous Iterations

Authors: Matthew L. Daggitt, Timothy G. Griffin

Abstract: Many problems can be solved by iteration by multiple participants (processors, servers, routers etc.). Previous mathematical models for such asynchronous iterations assume a single function being iterated by a fixed set of participants. We will call such iterations static since the system's configuration does not change. However in several real-world examples, such as inter-domain routing, both th… ▽ More Many problems can be solved by iteration by multiple participants (processors, servers, routers etc.). Previous mathematical models for such asynchronous iterations assume a single function being iterated by a fixed set of participants. We will call such iterations static since the system's configuration does not change. However in several real-world examples, such as inter-domain routing, both the function being iterated and the set of participants change frequently while the system continues to function. In this paper we extend Uresin & Dubois's work on static iterations to develop a model for this class of "dynamic" or "always on" asynchronous iterations. We explore what it means for such an iteration to be implemented correctly, and then prove two different conditions on the set of iterated functions that guarantee the full asynchronous iteration satisfies this new definition of correctness. These results have been formalised in Agda and the resulting library is publicly available. △ Less

Submitted 19 July, 2021; v1 submitted 2 December, 2020; originally announced December 2020.

Showing 1–12 of 12 results for author: Daggitt, M L