¹¹institutetext: Bar-Ilan University, ¹¹email: {ravivav1, alufmem, hillelk}@biu.ac.il ²²institutetext: Hebrew University of Jerusalem, ²²email: {yizhak.elboher, g.katz}@mail.huji.ac.il

Formal Verification of Object Detection

Avraham Raviv 11 Yizhak Y. Elboher 22 Michelle Aluf-Medina 11 Yael Leibovich Weiss 11 Omer Cohen 11 Roy Assa 11 Guy Katz 22 Hillel Kugler 11

Abstract

Deep Neural Networks (DNNs) are ubiquitous in real-world applications, yet they remain vulnerable to errors and adversarial attacks. This work tackles the challenge of applying formal verification to ensure the safety of computer vision models, extending verification beyond image classification to object detection. We propose a general formulation for certifying the robustness of object detection models using formal verification and outline implementation strategies compatible with state-of-the-art verification tools. Our approach enables the application of these tools, originally designed for verifying classification models, to object detection. We define various attacks for object detection, illustrating the diverse ways adversarial inputs can compromise neural network outputs. Our experiments, conducted on several common datasets and networks, reveal potential errors in object detection models, highlighting system vulnerabilities and emphasizing the need for expanding formal verification to these new domains. This work paves the way for further research in integrating formal verification across a broader range of computer vision applications.

Keywords:

Adversarial Attacks, Formal Verification, Object Detection.

1 Introduction

Deep Neural Networks (DNNs) have demonstrated remarkable capabilities [1], consistently achieving state-of-the-art performance across a wide range of domains, such as Computer Vision (CV)[2, 3], Natural Language Processing (NLP)[4, 5, 6], Audio[7, 8] and Video [9, 10]. Despite their significant success in many tasks, integrating DNNs in safety-critical industries like autonomous vehicles, aviation, financial services, and healthcare is very limited: their statistical nature makes them vulnerable to both innocent and malicious alterations, such that even small, often imperceptible, perturbations to the input can lead to critical errors.

This vulnerability has spawned an entire branch of research dedicated to the attack and defense of DNNs. Numerous methods have been proposed to exploit these weaknesses, while in contrast, others have been developed to defend against such attacks and enhance robustness. A key family of these methods is based on formal verification[11, 12], which incorporates formal methods into the realm of DNNs to verify neural networks. Initially, the focus was primarily on classification networks and, to some extent, tabular data. More recently, interest has expanded to include applications in Natural Language Processing[13], Reinforcement Learning[14, 15], and robotics[16, 17]. However, the primary focus remains on computer vision, particularly image classification. Other computer vision tasks, such as object detection and segmentation, have yet to be explored using formal verification tools.

In this work, we make a first attempt to close this gap by extending current verifiers, which are designed for image classification, to be suitable for object detection tasks. We propose a general formulation that covers these tasks and outline the method for their verification. Beyond the formulation, we not only explain how to leverage verifiers for new tasks, but also define different attacks for each task. This involves understanding the task and identifying the various ways to attack the output, leading to potential errors. Once these attacks are clearly defined, the final step is to adapt the verifiers, which often require minimal additional adaptation. Alongside the theoretical formulation, we provide examples of several attacks on object detectors, which demonstrate the need and importance of extending formal methods for DNNs to broader computer vision tasks.

Refer to caption — Figure 1: Example of Attacked Images. Illustrations of adversarial attacks on OD models, including misclassification, misdetections (extra objects in the example), and mislocalization. Note: Noise in these examples is amplified $100\times$ for visibility.

First, we provide a comprehensive framework to define object detection as a formal verification problem. By establishing clear definitions, we set the stage for systematically assessing the robustness of models against various attacks. Second, we show how existing verifiers, primarily designed for classification tasks, can be adjusted to handle these new formulations. This adaptation offers practical insights into utilizing current tools to verify a broader range of computer vision tasks with minimal changes. Lastly, we implement this framework and demonstrate its effectiveness through a series of examples. These examples highlight the practical application of our approach in evaluating and improving model robustness across different computer vision tasks. Figure 1 illustrates an example that encapsulates the core idea, where our method causes a detector to fail in various ways, highlighting the critical challenges this approach aims to address.

The rest of this paper is organized as follows: Sec. 2 provides an overview of related work on neural network verification and adversarial attacks in computer vision. Sec. 3 outlines our methodology, presenting the formulation of several computer vision tasks as formal verification problems and detailing how verifiers can be adapted to handle these tasks. Sec. 4 presents the results, demonstrating the effectiveness of our approach through examples. Finally, Sec. 5 concludes with insights and suggests future research directions.

2 Related Work

This work bridges the gap between object detection and formal verification of neural networks. Among various existing object detection methods, including R-CNN [18, 19, 20], SSD [21], and YOLO[22], the latter is the most efficient in terms of inference speed. YOLO has undergone several improvements in precision and performance [23, 24, 25, 26, 27, 28, 29, 30, 31, 32], as well as in scaling and multi-tasking capabilities [33, 34].

Our research builds upon the formal verification of deep neural networks. There are multiple verification approaches and tools, ranging from SMT-Based methods [35, 36], symbolic interval propagation [37, 38], MILP and LP-solvers, to abstract interpretation methods [39, 40, 41, 42] with branch-and-bound mechanisms [43, 44, 45]. We utilize Alpha-Beta-CROWN, which supports networks with multiple outputs and a variety of activation functions. This tool has won the last two international competitions in DNN verification [46, 47], which compare verification tools by their precision and efficiency.

A recent study by Cohen et al. [48] on proving properties of object detection models presents several key differences from our work. Firstly, we are pioneering the application of a state-of-the-art verification tool. Consequently, our method achieves completeness and successfully incorporates counterexample extraction and adversarial attacks. In addition, Cohen et al.’s method has currently been applied to small networks, while our approach has been integrated with the YOLO model, showcasing its practical applicability.

Range analysis methods [49] can compute a guaranteed range for the outputs of DNNs using a combination of local search and linear programming. Future work can investigate applying such methods to certify object detection models or identify adversarial attacks. This requires defining the constraints on allowed inputs and analyzing whether the guaranteed ranges enable adversarial attacks, and demonstrating this can be done in a scalable way.

In addition to YOLO, formal verification of DNNs has been applied in NLP [13], reinforcement learning [50, 51, 14, 16, 52, 53], explainability [54, 15], robustness assessment [55], binarized neural networks [56], generalization [57, 58], proof production [59] and delta-debugging[60], and was enhanced by classical optimization method in formal verification, e.g. abstraction [61, 52, 62, 63] and residual reasoning [64].

We believe these works are complementary to our own, and it will be of interest to explore potential connections between them in the future.

3 Methodology

This section navigates through the verification process step by step. First, it covers the basics with necessary preliminaries. Then, it delves into the specific targets focused on in computer vision and their motivation. Finally, it formulates these targets into the layers of neural networks, integrating them as an essential part of the verification process.

3.1 Preliminaries

In computer vision, various tasks aim to understand and interpret visual data, each with its own unique challenges. Image classification models categorize entire images into predefined classes. Formally, such a model takes an input image $\mathcal{I}$ and maps it to a class label $c$ , chosen from a discrete set of possible labels $\mathcal{C}$ . Each label is assigned a confidence score, and the highest score indicates the predicted class.

Object Detection (OD) extends this by not only identifying the classes present in an image but also localizing each object by predicting bounding boxes. An OD model takes an input image $\mathcal{I}$ and produces a set of bounding boxes $\mathcal{B}$ . Each bounding box in $\mathcal{B}$ is a tuple ${(x,y,w,h,c)}$ , where ${(x,y)}$ represents the coordinates of the top-left corner, ${(w,h)}$ denotes its dimensions, and $c$ signifies the detected object’s class label.

An adversarial attack involves intentionally altering an original data point $\mathcal{I}_{0}\in\mathbb{R}^{d}$ to a different one, $\mathcal{I}\in\mathbb{R}^{d}$ , with $\mathcal{I}\neq\mathcal{I}_{0}$ , so that the altered data is interpreted differently by the model, potentially leading to incorrect or unintended outputs.

In formal verification, the objective is to ensure that the neural network satisfies certain desired properties under all possible input scenarios within a specified input set. The input set $\mathcal{I}$ is often defined by a set of constraints $\mathcal{C}_{in}$ , which bound the possible values the input can take. For instance, given an input $\mathcal{I}_{0}$ , the input set could be defined as all points $\mathcal{I}$ such that the difference between $\mathcal{I}$ and $\mathcal{I}_{0}$ is within a specified range: $|\mathcal{I}-\mathcal{I}_{0}|\leq\delta$ , where $\delta$ represents the allowable perturbation.

The model’s output, denoted as $\mathcal{O}=f(\mathcal{I})$ , is then verified against a set of output constraints $\mathcal{C}_{out}$ , which define the expected or acceptable behavior of the model. The task is to prove that for all inputs $\mathcal{I}$ satisfying $\mathcal{C}_{in}$ , the outputs $\mathcal{O}$ will satisfy $\mathcal{C}_{out}$ . Formally, the verification problem can be stated as:

\forall\mathcal{I}\in\mathcal{C}_{in},\quad f(\mathcal{I})\in\mathcal{C}_{out}

This ensures that the neural network behaves correctly for all possible inputs within the defined constraints, providing a strong guarantee of the model’s reliability in practical scenarios.

For image classification verification, the goal is to ensure that a model reliably predicts the correct class label under input variations. The input constraints $\mathcal{C}_{in}$ define permissible modifications to an original image $\mathcal{I}_{0}$ , forming a set of inputs $\mathcal{I}$ where $|\mathcal{I}-\mathcal{I}_{0}|\leq\delta$ . This ensures that all modified inputs remain sufficiently similar to the original.

The output constraints $\mathcal{C}_{out}$ require that the predicted class label remains consistent for any input $\mathcal{I}$ within $\mathcal{C}_{in}$ . This consistency is crucial for maintaining classification accuracy despite input perturbations within the specified range. So far, such verification techniques in computer vision have been limited to image classification tasks.

Unlike the classification domain, which typically revolves around a singular option – misclassification – OD presents a broader range of potential attack vectors. In OD, adversaries have multiple avenues to explore, including:

1.

Misclassification: This attack vector mirrors classification-based adversarial attacks and involves deliberately causing OD models to incorrectly label detected objects. Formally, when the output of the OD model is a tuple $\mathcal{D}=(x,y,h,w,c)$ , misclassification occurs when $c\neq\text{ground truth}$ .

Mislocalization: This attack vector involves perturbing the localization aspect of OD models, thereby causing them to erroneously estimate the position and extent of objects within the image. Formally, mislocalization occurs when any of the localization parameters ( $x$ , $y$ , $h$ , or $w$ ) deviates significantly from their ground truth values, leading to inaccurate object localization. Given that perfect matching is rarely attainable, the success of the mislocalization attack is assessed by calculating the Intersection over Union (IoU) between the original detection and the perturbed one. If the IoU is below a predefined threshold, it signifies the effectiveness of the attack.

IoU is the shared area between the actual object and the detected one, formally defined using bounding box (bb) notations as:

\text{IoU}=\frac{\text{Area of Overlap between the detected bb and the ground % truth bb}}{\text{Area of Union between the detected bb and the ground truth bb}}

Using mathematical symbols, this can be represented as:

\text{IoU}=\frac{|A\cap B|}{|A\cup B|}

where $A$ represents the area of the detected bounding box, and $B$ represents the area of the ground truth bounding box.

3.

Misdetection/False Positives: In this scenario, the aim is to manipulate the model into either missing the detection of genuine objects or generating false positives—detecting objects that are not present in the image. Assume there are $d$ objects in a given image, each represented as a ground truth tuple $\mathcal{G}_{i}=(x_{i},y_{i},h_{i},w_{i},c_{i})$ for $i=1,2,\ldots,d$ . The output of the OD model is represented as a set of tuples $\mathcal{D}=\{\mathcal{D}_{1},\mathcal{D}_{2},\ldots,\mathcal{D}_{n}\}$ , where each $\mathcal{D}_{j}=(x_{j},y_{j},h_{j},w_{j},c_{j})$ corresponds to a detected object by the model. An attack of this type could manifest in one of two ways:

Missing Genuine Objects: For any $\mathcal{G}_{i}$ , if there exists no $\mathcal{D}_{j}$ such that $c_{j}=c_{i}$ and the IoU between $\mathcal{G}_{i}$ and $\mathcal{D}_{j}$ exceeds a predefined threshold, it indicates that a genuine object has been missed by the OD model.

False Positives: If there exists $\mathcal{D}_{j}$ for which $c_{j}$ does not match any ground truth class label and the IoU between $\mathcal{D}_{j}$ and all $\mathcal{G}_{i}$ is below a predefined threshold, it suggests the generation of false positives, where non-existent objects are erroneously detected. The effectiveness of the attack is determined by assessing the occurrences of these two scenarios.

3.2 Method

After establishing the preliminaries and defining formal verification for image classification, we expand the scope to include other key computer vision tasks. We begin by explaining the different types of attacks in detail, then integrate them into a comprehensive, unified formulation.

Object detection involves both locating and classifying one or more objects in an image. The most straightforward attack targets the classification aspect, similar to traditional neural network verification methods. However, the more intriguing challenge lies in extending beyond classification to handle the unique characteristics of OD. Unlike classification, OD deals with multiple objects in an image, introducing the possibility of attacks that manipulate the number of detected objects. Furthermore, the natural extension of these attacks focuses on localization, aiming to deceive the detector into identifying objects but placing them in incorrect locations.

Up to this point, the unique attacks specific to object detection, which is not found in classification, have been outlined: targeting multiple objects and localization errors. While the input constraints remain the same, the output constraints differ. However, these constraints share a common foundation, allowing for a generic formulation. As previously stated, the most general form of the output constraint is $\forall\mathcal{I}\in\mathcal{C}_{in},\quad f(\mathcal{I})\in\mathcal{C}_{out}$ . First, let’s demonstrate how each type of attack aligns with this form:

For the attack on the number of detected objects, the verification process aims to align the detected objects with the ground truth. Naively, the number of detected object ( $\#\mathcal{B}$ ) should be aligned with the number of ground truth $\#GT$ . Since false-positives and false-negatives cancel each other in $\#\mathcal{B}$ , the desired property is changed: $\#\mathcal{B}_{\text{positive}}=\#GT_{\text{positive}}$ . This constraint ensures that the true positive detections align with the ground truth, providing a more reliable assessment of object detection performance and effectively capturing both false positives and false negatives.

For localization, the focus is on the positions of the detected objects. The Intersection over Union (IoU) measure quantifies the overlap between detections and ground truth objects. If the IoU of each detected object exceeds a predefined threshold, $\tau$ , and the detected class labels are accurate, then the objects are considered to be correctly located. This measure indicates the degree of agreement between the detected bounding box and the ground truth. The motivation here is to verify that the IoU of each detected object is indeed higher than $\tau$ , ensuring accurate localization and reliable assessment of detection performance.

3.3 Encoding

To implement these operations and constraints, we explored two directions, each with its pros and cons, which we will discuss. The first approach, the more direct and naïve one, aims to be as implicit as possible. This means running a given network, taking its native output, and querying it using a verifier with minimal required modifications. The second approach is to be explicit: encode each unique operation as a neural network layer, which can be verified natively. While the first method is general and offloads most of the process to behind the scenes, it forces the verifier to abstract all network operations. In contrast, the second approach, being more explicit, shifts the responsibility to us and leaves the verifier to encode known operations. This method is, of course, more complex but offers greater control and enables more feasible scenarios.

Since the first method requires no additional formal definitions we focus here on the formulation of the second approach. A verifier originally designed for classification can be adapted to other tasks by encoding unique operators or stages as DNN layers, which can then be integrated into the verifier. The network itself is standard, consisting of linear layers and non-linear activations. Those layers and activations are commonly encoded into the verifier, and we add encoding for specialized tasks, such as matching detection and ground truth for OD. This approach ensures that these specialized operators are rigorously formulated and effectively encoded into the standard verifier.

Modeling IoU with Piecewise Linear Constraints

Given two bounding boxes $B_{gt}=(x_{g},y_{g},w_{g},h_{g})$ and $B_{dt}=(x_{d},y_{d},w_{d},h_{d})$ , the IoU can be calculated using the following definitions and constraints:

Intersection Calculation

The coordinates of the intersection rectangle between $B_{gt}$ and $B_{dt}$ are:

	$\displaystyle x_{i1}$	$\displaystyle=\max(x_{g},x_{d}),$
	$\displaystyle y_{i1}$	$\displaystyle=\max(y_{g},y_{d}),$
	$\displaystyle x_{i2}$	$\displaystyle=\min(x_{g}+w_{g},x_{d}+w_{d}),$
	$\displaystyle y_{i2}$	$\displaystyle=\min(y_{g}+h_{g},y_{d}+h_{d}).$

The width ( $w_{i}$ ) and height ( $h_{i}$ ) of the intersection are given by:

	$\displaystyle w_{i}$	$\displaystyle=\max(0,x_{i2}-x_{i1}),$
	$\displaystyle h_{i}$	$\displaystyle=\max(0,y_{i2}-y_{i1}).$

The area of the intersection ( $A_{I}$ ) is $w_{i}\cdot h_{i}$ .

Union Calculation

The areas of each bounding box are:

	$\displaystyle A_{g}$	$\displaystyle=w_{g}\cdot h_{g},$
	$\displaystyle A_{d}$	$\displaystyle=w_{d}\cdot h_{d}.$

The area of the union is:

A_{U}=A_{g}+A_{d}-A_{I}.

IoU Calculation and Constraint

The IoU is defined as:

IoU=\frac{A_{I}}{A_{U}}.

To enforce the constraint $IoU>\tau$ , we express this as a linear constraint,which can be encoded using ReLU activation:

A_{I}>\tau\cdot A_{U}\Leftrightarrow A_{I}-\tau\cdot A_{U}>0.

Fig. 2 illustrates how the IoU between a single ground truth and a single detection can be encoded as a piecewise layer.

Verify the Number of Objects and their Localization

The matching between all ground truth and detection objects requires handling multiple bounding boxes, which necessitates using this encoding and adding another layer. When working with lists of detection boxes $dts=\{B_{dt1},B_{dt2},\ldots,B_{dtn}\}$ and ground truth boxes $gts=\{B_{gt1},B_{gt2},\ldots,B_{gtm}\}$ , we introduce binary variables $z_{ij}$ to indicate whether the IoU condition is met for each pair $(dt_{i},gt_{j})$ :

z_{ij}=ReLU(A_{I}-\tau\cdot A_{U})\

In order to verify whether at least one or all objects are accurately detected. At least one pair meets the IoU condition of:

\sum_{i=1}^{n}\sum_{j=1}^{m}z_{ij}\geq 0.

Where we aim to verify that all objects are detected correctly, with no false positives, each detection $dt_{i}$ must match one ground truth $gt_{j}$ , and vice versa. Here, we define $z_{ij}$ as a binary indicator where $z_{ij}=1$ if detection $dt_{i}$ matches ground truth $gt_{j}$ , and 0 otherwise:

z_{ij}=\begin{cases}1&\text{if }A_{I_{ij}}<\tau\cdot A_{U_{ij}},\\ 0&\text{otherwise}.\end{cases}

This binarization is approximate, as explained at the end of section 3. It is important to note that this approximation makes this particular verification incomplete. Using the binarized values of $z_{ij}$ , the following constraints ensure accurate matching:

Each detection matches exactly one ground truth:

\sum_{j=1}^{m}z_{ij}=1,\quad\forall i\in\{1,\ldots,n\}.

Each ground truth is matched by exactly one detection:

\sum_{i=1}^{n}z_{ij}=1,\quad\forall j\in\{1,\ldots,m\}.

These constraints ensure that detections accurately match the ground truths without false positives or false negatives.

Fig. 3 illustrates how the final constraints $C_{out}$ are encoded to verify both cases: missing all objects and not detecting all of them.

Approximation of the Heaviside Function Using the Sigmoid Function

The Heaviside step function, traditionally denoted as $H(x)$ , is defined as zero for $x<0$ and one for $x\geq 0$ . In computational settings, where a smooth and differentiable approximation is required, the sigmoid function can be utilized as an effective surrogate. The sigmoid function $S(x)$ , defined by the formula:

S(x)=\frac{1}{1+e^{-\text{slope}\cdot x}}

provides a smooth transition between 0 and 1. The parameter ‘slope’ adjusts the steepness of the sigmoid function’s curve, making it versatile for different levels of approximation granularity. This property makes the sigmoid function particularly useful in scenarios where a gradual change is more physically realistic or numerically stable than an abrupt switch.

Figure 4 illustrates the approximation of the Heaviside function by the sigmoid function over a range of slope values, $\text{slope}\in\{1,10,100,1000\}$ . As depicted, for high slope values, $S(x)$ serves as a very good approximation of $H(x)$ , closely mirroring its behavior due to the sharper transition at $x=0$ .

In our context, we used this approximation for binarizing variables, allowing us to encode them as a single layer. However, verification properties that rely on this approximation are inherently incomplete.

4 Results

We demonstrate the initial steps in implementing object detection verification framework. As mentioned earlier, we have established two methods for this purpose and we will present the results of each method respectively. The technical details are as follows:

In the first method, experimental input images were sourced from the COCO dataset[65], reduced to a low resolution of $64\times 48$ pixels due to memory limitations. We used the neural network-based detector YOLO[22], specifically YOLOv5, the fifth generation of this widely popular and efficient one-stage detector. The medium model, containing 46 million parameters, was chosen and trained on 80 classes. To demonstrate the second method, we used the LARD[66] and MNIST-OD[67] datasets, which contain images of $256\times 256$ and $90\times 90$ pixels, respectively. For these datasets, we migrated a designed and trained network from [48]. Experiments were conducted on a Linux-based system with an RTX 5000 GPU to ensure efficient processing. For verification, the alpha-beta-CROWN tool[40] was utilized.

We start by showing the results of verification using the first method, applying the COCO dataset on the YOLO detector. As a first example, we verify an image with a single object, as illustrated in Fig. 5. With an allowed per-pixel noise level of 0.01, the verifier identifies a case where the detector correctly identifies the object, but in the wrong location, such that the IoU between the actual and predicted bounding boxes is less than $\tau$ (0.5 in our case). For images with multiple objects, it is necessary to aggregate all IoUs into a single metric or enhance the verifier’s capabilities. The current verifier imposes limitations, particularly when trying to attack all objects simultaneously, necessitating careful design of the unified constraints. For instance, this can be accomplished by utilizing the query that the $\max\text{(IoU)}$ might be less than the threshold.

Figure 7 provides a visual representation of the misclassification results. Experiments show that adversarial attacks can lead to a significant increase in misclassified objects. In some cases, objects that were correctly identified before the attack are now misclassified. Figure 7 provides insights into the presence of extra or missing detections caused by adversarial attacks. In some instances, the detector produces additional bounding boxes that do not correspond to real objects, leading to cluttered results. Conversely, the attacks can also result in the omission of objects that should have been detected.

Though we found promising initial results with this method, it suffers from two main drawbacks. Firstly, since we didn’t modify the network itself but only adapted the verifier, the verification metrics were forced to operate in higher dimensions, resulting in significant memory and efficiency issues. Secondly, all our attempts at hyperparameter selection either resulted in counterexamples or timeouts, meaning we only succeeded in identifying vulnerabilities (attack models) but could not verify the network and obtain guarantees. These limitations were addressed in the second verification method, which proved to be much more efficient. By explicitly encoding each operation, we gained greater control over the verification process. This approach not only resolved the memory and efficiency issues, but also allowed us to easily find UNSAT cases, thereby providing formal guarantees of network robustness.

Using different ranges of noise, we obtained varying results for the MNIST-OD and LARD datasets. For MNIST-OD, the detector should find the location of digit and classify it. For the LARD dataset, the detector is supposed to identify the runway for landing. As shown in Figures 8 and 9, detectors perform poorly under high levels of noise (right), failing completely. However, with a small amount of noise (middle), the detector performs adequately. The most important result is that the verifier returned UNSAT for the lower noise level, which guarantees that the network is robust for this level of noise and the specific input image. It’s important to note that while we only show the first images from the datasets, the results were consistent for all images in the datasets: SAT examples were found for high noise levels, and they were verified for lower noise levels.

5 Conclusion

This work tackles the challenge of utilizing formal verification to ensure the safety of computer vision models, specifically focusing on object detection as an extension of image classification. We have first proposed a comprehensive approach to treat these tasks as classic formal verification problems and outlined methods to implement them using state-of-the-art verification tools. Our developed methods for refuting or proving object detection robustness have successfully proven the robustness of the YOLOv5 model, or refuted it by finding counterexamples, across three datasets, including one used for autonomous landing.

Additionally, our contribution extends the formulation to new areas within the formal verification of DNNs, demonstrated through the implementation and examples presented. While further research is necessary, this work lays a strong foundation for integrating formal verification into computer vision, promising a safer future for these technologies.

References

[1] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In F. Pereira, C.J. Burges, L. Bottou, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 25. Curran Associates, Inc., 2012.
[2] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1, NIPS’12, page 1097–1105, Red Hook, NY, USA, 2012. Curran Associates Inc.
[3] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale, 2021.
[4] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need, 2023.
[5] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding, 2019.
[6] Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. Language models are few-shot learners, 2020.
[7] Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, Jiahui Yu, Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, and Ruoming Pang. Conformer: Convolution-augmented transformer for speech recognition, 2020.
[8] Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, and Michael Auli. wav2vec 2.0: A framework for self-supervised learning of speech representations, 2020.
[9] Ze Liu, Jia Ning, Yue Cao, Yixuan Wei, Zheng Zhang, Stephen Lin, and Han Hu. Video swin transformer, 2021.
[10] Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, and Cordelia Schmid. Vivit: A video vision transformer, 2021.
[11] Guy Katz, Clark Barrett, David L. Dill, and Kyle Julian. Reluplex: An efficient smt solver for verifying deep neural networks. In International Conference on Computer Aided Verification, pages 97–117, 2017.
[12] Changliu Liu, Tomer Arnon, Christopher Lazarus, Christopher Strong, Clark Barrett, and Mykel J. Kochenderfer. Algorithms for verifying deep neural networks, 2020.
[13] Marco Casadio, Tanvi Dinkar, Ekaterina Komendantskaya, Luca Arnaboldi, Omri Isac, Matthew L. Daggitt, Guy Katz, Verena Rieser, and Oliver Lemon. Nlp verification: Towards a general methodology for certifying robustness, 2024.
[14] Guy Amir, Michael Schapira, and Guy Katz. Towards scalable verification of deep reinforcement learning, 2021.
[15] Shahaf Bassan, Guy Amir, Davide Corsi, Idan Refaeli, and Guy Katz. Formally explaining neural networks within reactive systems, 2023.
[16] Guy Amir, Davide Corsi, Raz Yerushalmi, Luca Marzari, David Harel, Alessandro Farinelli, and Guy Katz. Verifying learning-based robotic navigation systems, 2023.
[17] Davide Corsi, Guy Amir, Andoni Rodriguez, Cesar Sanchez, Guy Katz, and Roy Fox. Verification-guided shielding for deep reinforcement learning, 2024.
[18] Ross B. Girshick. Fast R-CNN. CoRR, abs/1504.08083, 2015.
[19] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross B. Girshick. Mask r-cnn. In ICCV, pages 2980–2988. IEEE Computer Society, 2017.
[20] Shaoqing Ren, Kaiming He, Ross B. Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In Corinna Cortes, Neil D. Lawrence, Daniel D. Lee, Masashi Sugiyama, and Roman Garnett, editors, NIPS, pages 91–99, 2015.
[21] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott E. Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: single shot multibox detector. CoRR, abs/1512.02325, 2015.
[22] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 779–788, 2016.
[23] Joseph Redmon and Ali Farhadi. Yolo9000: Better, faster, stronger, 2016.
[24] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement, 2018.
[25] Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. Yolov4: Optimal speed and accuracy of object detection, 2020.
[26] Zheng Ge, Songtao Liu, Feng Wang, Zeming Li, and Jian Sun. Yolox: Exceeding yolo series in 2021, 2021.
[27] Chuyi Li, Lulu Li, Hongliang Jiang, Kaiheng Weng, Yifei Geng, Liang Li, Zaidan Ke, Qingyuan Li, Meng Cheng, Weiqiang Nie, Yiduo Li, Bo Zhang, Yufei Liang, Linyuan Zhou, Xiaoming Xu, Xiangxiang Chu, Xiaoming Wei, and Xiaolin Wei. Yolov6: A single-stage object detection framework for industrial applications, 2022.
[28] Chien-Yao Wang, Alexey Bochkovskiy, and Hong-Yuan Mark Liao. Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors, 2022.
[29] Xianzhe Xu, Yiqi Jiang, Weihua Chen, Yilun Huang, Yuan Zhang, and Xiuyu Sun. Damo-yolo : A report on real-time object detection design, 2023.
[30] Xiang Long, Kaipeng Deng, Guanzhong Wang, Yang Zhang, Qingqing Dang, Yuan Gao, Hui Shen, Jianguo Ren, Shumin Han, Errui Ding, and Shilei Wen. Pp-yolo: An effective and efficient implementation of object detector, 2020.
[31] Xin Huang, Xinxin Wang, Wenyu Lv, Xiaying Bai, Xiang Long, Kaipeng Deng, Qingqing Dang, Shumin Han, Qiwen Liu, Xiaoguang Hu, Dianhai Yu, Yanjun Ma, and Osamu Yoshie. Pp-yolov2: A practical object detector, 2021.
[32] Shangliang Xu, Xinxin Wang, Wenyu Lv, Qinyao Chang, Cheng Cui, Kaipeng Deng, Guanzhong Wang, Qingqing Dang, Shengyu Wei, Yuning Du, and Baohua Lai. Pp-yoloe: An evolved version of yolo, 2022.
[33] Chien-Yao Wang, I-Hau Yeh, and Hong-Yuan Mark Liao. You only learn one representation: Unified network for multiple tasks, 2021.
[34] Chien-Yao Wang, Alexey Bochkovskiy, and Hong-Yuan Mark Liao. Scaled-yolov4: Scaling cross stage partial network, 2021.
[35] Guy Katz, Derek A. Huang, Duligur Ibeling, Kyle Julian, Christopher Lazarus, Rachel Lim, Parth Shah, Shantanu Thakoor, Haoze Wu, Aleksandar Zeljić, David L. Dill, Mykel J. Kochenderfer, and Clark Barrett. The marabou framework for verification and analysis of deep neural networks. In Isil Dillig and Serdar Tasiran, editors, Computer Aided Verification, pages 443–452, Cham, 2019. Springer International Publishing.
[36] Haoze Wu, Omri Isac, Aleksandar Zeljić, Teruhiro Tagomori, Matthew Daggitt, Wen Kokke, Idan Refaeli, Guy Amir, Kyle Julian, Shahaf Bassan, Pei Huang, Ori Lahav, Min Wu, Min Zhang, Ekaterina Komendantskaya, Guy Katz, and Clark Barrett. Marabou 2.0: A versatile formal analyzer of neural networks, 2024.
[37] P. Henriksen and Alessio Lomuscio. Efficient neural network verification via adaptive refinement and adversarial search. In European Conference on Artificial Intelligence, 2020.
[38] Patrick Henriksen and Alessio Lomuscio. Deepsplit: An efficient splitting method for neural network verification via indirect effect analysis. In Zhi-Hua Zhou, editor, Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, pages 2549–2555. International Joint Conferences on Artificial Intelligence Organization, 8 2021. Main Track.
[39] Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient neural network robustness certification with general activation functions, 2018.
[40] Shiqi Wang, Huan Zhang, Kaidi Xu, Xue Lin, Suman Jana, Cho-Jui Hsieh, and J Zico Kolter. Beta-CROWN: Efficient bound propagation with per-neuron split constraints for complete and incomplete neural network verification. Advances in Neural Information Processing Systems, 34, 2021.
[41] Claudio Ferrari, Mark Niklas Muller, Nikola Jovanovic, and Martin Vechev. Complete verification via multi-neuron relaxation guided branch-and-bound, 2022.
[42] Mark Niklas Müller, Gleb Makarchuk, Gagandeep Singh, Markus Püschel, and Martin Vechev. Prima: general and precise neural network certification via scalable convex hull approximations. Proceedings of the ACM on Programming Languages, 6(POPL):1–33, January 2022.
[43] A. H. Land and A. G. Doig. An automatic method of solving discrete programming problems. Econometrica, 28(3):497–520, 1960.
[44] David R. Morrison, Sheldon H. Jacobson, Jason J. Sauppe, and Edward C. Sewell. Branch-and-bound algorithms: A survey of recent advances in searching, branching, and pruning. Discrete Optimization, 19:79–102, 2016.
[45] Rudy Bunel, **gyue Lu, Ilker Turkaslan, Philip H. S. Torr, Pushmeet Kohli, and M. Pawan Kumar. Branch and bound for piecewise linear neural network verification, 2020.
[46] Mark Niklas Müller, Christopher Brix, Stanley Bak, Changliu Liu, and Taylor T. Johnson. The third international verification of neural networks competition (vnn-comp 2022): Summary and results, 2023.
[47] Christopher Brix, Stanley Bak, Changliu Liu, and Taylor T. Johnson. The fourth international verification of neural networks competition (vnn-comp 2023): Summary and results, 2023.
[48] Noémie Cohen, Mélanie Ducoffe, Ryma Boumazouza, Christophe Gabreau, Claire Pagetti, Xavier Pucel, and Audrey Galametz. Verification for object detection – ibp iou, 2024.
[49] Souradeep Dutta, Susmit Jha, Sriram Sanakaranarayanan, and Ashish Tiwari. Output range analysis for deep neural networks, 2017.
[50] Guy Amir, Davide Corsi, Raz Yerushalmi, Luca Marzari, David Harel, Alessandro Farinelli, and Guy Katz. Verifying learning-based robotic navigation systems, 2023.
[51] Raz Yerushalmi, Guy Amir, Achiya Elyasaf, David Harel, Guy Katz, and Assaf Marron. Scenario-assisted deep reinforcement learning, 2022.
[52] Avraham Raviv, Eliya Bronshtein, Or Reginiano, Michelle Aluf-Medina, and Hillel Kugler. Learning through imitation by using formal verification. In International Conference on Current Trends in Theory and Practice of Computer Science, pages 342–355. Springer, 2023.
[53] Avraham Raviv, Yuval Gerber, Liri Benzinou, Michelle Aluf-Medina, and Hillel Kugler. Prediction and control of stochastic agents using formal methods. In Nina Narodytska, Guy Amir, Guy Katz, and Omri Isac, editors, Proceedings of the 6th Workshop on Formal Methods for ML-Enabled Autonomous Systems, volume 16 of Kalpa Publications in Computing, pages 29–34. EasyChair, 2023.
[54] Shahaf Bassan and Guy Katz. Towards formal xai: Formally approximate minimal explanations of neural networks, 2023.
[55] Yizhak Elboher, Raya Elsaleh, Omri Isac, Mélanie Ducoffe, Audrey Galametz, Guillaume Povéda, Ryma Boumazouza, Noémie Cohen, and Guy Katz. Robustness assessment of a runway object classifier for safe aircraft taxiing, 2024.
[56] Guy Amir, Haoze Wu, Clark Barrett, and Guy Katz. An smt-based approach for verifying binarized neural networks, 2021.
[57] Guy Amir, Osher Maayan, Tom Zelazny, Guy Katz, and Michael Schapira. Verifying generalization in deep learning, 2023.
[58] Guy Amir, Osher Maayan, Tom Zelazny, Guy Katz, and Michael Schapira. Verifying the generalization of deep learning to out-of-distribution domains, 2024.
[59] Omri Isac, Clark Barrett, Min Zhang, and Guy Katz. Neural network verification with proof production, 2022.
[60] Raya Elsaleh and Guy Katz. Delbugv: Delta-debugging neural network verifiers, 2023.
[61] Yerushalmi Raz, Amir Guy, Elyasaf Achiya, Harel David, Katz Guy, and Marron Assaf. Enhancing deep reinforcement learning with scenario-based modeling, 2023.
[62] Yizhak Yisrael Elboher, Justin Gottschlich, and Guy Katz. An Abstraction-Based Framework for Neural Network Verification, page 43–65. Springer International Publishing, 2020.
[63] Elazar Cohen, Yizhak Yisrael Elboher, Clark Barrett, and Guy Katz. Tighter abstract queries in neural network verification, 2023.
[64] Yizhak Yisrael Elboher, Elazar Cohen, and Guy Katz. Neural network verification using residual reasoning, 2022.
[65] Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, and Piotr Dollár. Microsoft coco: Common objects in context, 2015.
[66] Mélanie Ducoffe, Maxime Carrere, Léo Féliers, Adrien Gauffriau, Vincent Mussot, Claire Pagetti, and Thierry Sammour. Lard–landing approach runway detection–dataset for vision based landing. arXiv preprint arXiv:2304.09938, 2023.
[67] Li Deng. The mnist database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6):141–142, 2012.