License: CC BY-NC-SA 4.0
arXiv:2103.09151v8 [cs.CV] 12 Dec 2023

Adversarial Driving: Attacking End-to-End Autonomous Driving
The project is supported by Offshore Robotics for Certification of Assets (ORCA) Partnership Resource Fund (PRF) on Towards the Accountable and Explainable Learning-enabled Autonomous Robotic Systems (AELARS) [EP/R026173/1].

1st Han Wu
Computer Science
The University of Exeter
Exeter, the United Kingdom
[email protected]

2nd Syed Yunas
Computer Science
The University of the West of England
Bristol, the United Kingdom
[email protected]

3rd Sareh Rowlands
Computer Science
The University of Exeter
Exeter, the United Kingdom
[email protected]

4th Wenjie Ruan
Computer Science
The University of Exeter
Exeter, the United Kingdom
[email protected]

5th Johan Wahlström*
Computer Science
The University of Exeter
Exeter, the United Kingdom
[email protected]
Abstract

As research in deep neural networks advances, deep convolutional networks have become promising for autonomous driving tasks. In particular, there is an emerging trend of employing end-to-end neural network models for autonomous driving. However, previous research has shown that deep neural network classifiers are vulnerable to adversarial attacks, whereas the effect of adversarial attacks on regression tasks is less well understood. In this research, we devise two white-box targeted attacks against end-to-end autonomous driving models. Our attacks manipulate the behavior of the autonomous driving system by perturbing the input image. Averaged over 800 attacks with the same attack strength ($\epsilon = 1$), the image-specific and image-agnostic attacks deviate the steering angle from the original output by 0.478 and 0.111, respectively, far exceeding the 0.002 deviation caused by random noise of the same strength (the steering angle ranges over [-1, 1]). Both attacks can be initiated in real time on CPUs without GPUs. Demo video: https://youtu.be/I0i8uN2oOP0.

Index Terms:
Adversarial Attacks, Imitation Learning, Deep Neural Network.

I Introduction

Autonomous driving is one of the most challenging tasks in safety-critical robotic applications. Most real-world autonomous vehicles employ modular systems that divide the driving task into smaller subtasks. In addition to a perception module that relies on deep learning models to locate and classify objects in the environment, modular systems also include localization, prediction, planning, and control modules. However, researchers are also exploring the potential of end-to-end driving systems. An end-to-end driving system is a monolithic module that directly maps the input to the output, often using deep neural networks. For example, the NVIDIA end-to-end driving model [1] maps raw pixels from the front-facing camera to steering commands. The development of end-to-end driving systems has been facilitated by recent advances in high-performance GPUs and the development of photo-realistic driving simulators, such as the Carla Simulator [2] and the Microsoft Airsim Simulator [3].

Figure 1: Adversarial Driving: The behavior of an end-to-end autonomous driving model can be manipulated by adding imperceptible perturbations to the input image.

As demonstrated in multiple contexts, deep neural networks are vulnerable to adversarial attacks. Typically, these attacks fool an image classification model by adding an imperceptible perturbation to the input image [4]. Although the number of academic publications on end-to-end deep learning models is steadily increasing, their safety in real-world scenarios is still unclear. While end-to-end models may lead to better performance and smaller systems, the monolithic module is also particularly vulnerable to adversarial attacks. In addition, current research on adversarial attacks primarily focuses on classification tasks, and the effect of these attacks on regression tasks remains largely unexplored. Our research explores the possibility of achieving real-time attacks against NVIDIA's end-to-end regression model (see Fig. 1). The same attacks may also be applied to the perception module of a modular driving system.

The main contributions of this paper are as follows:

  • We propose two online white-box adversarial attacks against an end-to-end regression model for autonomous driving: one strong attack that generates the perturbation for each frame (image-specific), and one stealth attack that produces a universal adversarial perturbation that attacks all frames (image-agnostic).

  • The effectiveness of the attacks is illustrated using experiments conducted in the Udacity Simulator. The experiments demonstrate that the attack needs only a few seconds to push the vehicle out of the lane.

  • To facilitate future extensions and benchmark comparisons, our attack is open-sourced on GitHub (the code is available at https://github.com/wuhanstudio/adversarial-driving). As far as the authors are aware, this is the first open-source real-time attack on regression-based driving models.

II Preliminaries

This section categorizes and describes end-to-end driving systems and associated adversarial attacks.

II-A End-to-End Driving Systems

End-to-end driving systems treat the driving pipeline as a monolithic module that maps sensor inputs directly to steering commands [5]. Typically, end-to-end driving systems are implemented using either imitation learning or reinforcement learning. Imitation learning methods use deep neural networks to learn and mimic human driving behavior [6]. A supervisor is responsible for feeding the algorithm with labeled data. Reinforcement learning methods, on the other hand, improve driving policies via exploration and exploitation. The training process is not dependent on the existence of any supervisor. While there is a growing trend of publications that use reinforcement learning [7][8][9][10][11], imitation learning is still more popular in end-to-end driving models [12][13][14][15]. For this reason, our research will also focus on attacking imitation learning models.

The first implementation of an imitation-learning-based end-to-end driving system was the Autonomous Land Vehicle in a Neural Network (ALVINN) system, which trained a 3-layer fully connected network to steer a vehicle on public roads [16]. End-to-end driving models have also been applied to off-road driving [17]. More recently, researchers from NVIDIA built a convolutional neural network to map raw pixels from a single front-facing camera directly to steering commands [1]. The NVIDIA end-to-end driving model is the target model in this paper, and details on this model are presented in Section III.

II-B Adversarial Attacks

This paper will consider an end-to-end driving model that outputs continuous steering commands, which is a regression model. Prior research on adversarial attacks primarily focuses on attacking classification models [18] [19][20][21].

A successful attack against a classification model deviates the output from the correct label. Taking handwritten digit classification as an example, an attacker can fool the classifier into recognizing the number 3 as 7. To evaluate the performance of an adversarial attack against a regression model, we instead need to quantify the magnitude of the resulting deviation. An attack that causes the steering angle to deviate from 1.00 to 0.99 will typically be considered unsuccessful, since such a tiny deviation may have no noticeable effect on the driving outcome. To be considered successful, an attack must lead to larger deviations. Prior research used the Root Mean Squared Error (RMSE) [22] and the Mean Squared Error (MSE) [23] to evaluate and compare deviations. A successful attack should produce a higher MSE or RMSE than random perturbations. Boloor et al. attacked an end-to-end self-driving model using human-perceivable physical shadows [24], whereas our research focuses on generating human-imperceptible perturbations.
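The deviation measures above can be made concrete with a short sketch. This is not the authors' evaluation code; the function name is hypothetical, and it reports the mean absolute deviation (MAD, the measure used later in Tables II and III) alongside MSE and RMSE. In an online attack there is no recorded human steering angle, so the reference outputs would be the model's own benign predictions.

```python
import math


def steering_deviation_metrics(y_ref, y_adv):
    """Per-frame deviation between reference and adversarial steering outputs.

    y_ref: reference steering outputs (recorded human angles for an offline
           attack, or the model's own benign outputs for an online one)
    y_adv: steering outputs on the same frames after the perturbation
    Returns the mean absolute deviation (MAD), the MSE and the RMSE.
    """
    if not y_ref or len(y_ref) != len(y_adv):
        raise ValueError("expected two non-empty sequences of equal length")
    diffs = [a - r for r, a in zip(y_ref, y_adv)]
    mad = sum(abs(d) for d in diffs) / len(diffs)
    mse = sum(d * d for d in diffs) / len(diffs)
    return {"mad": mad, "mse": mse, "rmse": math.sqrt(mse)}


# Toy numbers (not from the paper): an attack that shifts every frame by 0.5.
print(steering_deviation_metrics([0.0, 0.25, -0.5, 0.5], [0.5, 0.75, 0.0, 1.0]))
# -> {'mad': 0.5, 'mse': 0.25, 'rmse': 0.5}
```

MAD treats all deviations linearly, while MSE and RMSE weight large per-frame deviations more heavily; either can be compared against the value obtained with random noise of the same strength.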

While prior research primarily focuses on offline attacks against classification models, we investigate online attacks against regression models. Offline attacks apply perturbations to static images. Under the scenario of autonomous driving, an offline attack splits the driving record into static images and the corresponding steering angles. The perturbation is then applied to each static image, and the attack is evaluated using the overall success rate [25]. Online attacks, on the other hand, apply the perturbation in a dynamic environment. Rather than applying the perturbation to static images in a driving record, we deploy the perturbation while the vehicle is driving. This also makes it possible to investigate the driving models’ reactions to the attacks.

One big difference between online and offline attacks is that the ground truth is unavailable in online attacks. Offline attacks take pre-recorded human drivers’ steering angles as the ground truth, while real-time online attacks do not have access to pre-recorded human decisions. Therefore, we use the model output under normal benign conditions as the ground truth and assume that the driving model is comparatively close to the ground truth. This assumption is reasonable since if the model is inaccurate, the erroneous model is already a threat in itself. There is no need to attack the system in the first place.

Existing adversarial attacks can be categorized into white-box, gray-box, and black-box attacks [26]. In white-box attacks, the adversaries have full knowledge of the target model, including its architecture and parameters; in gray-box attacks, the adversaries have partial information about the target model; in black-box attacks, the adversaries can only gather information about the model through queries. Since white-box attacks are more efficient, we devise two white-box attacks that achieve real-time performance against end-to-end driving models.

III Problem Formulation

In this section, we specify our objective, introduce mathematical notation, and describe our target model. Throughout the paper, we will use the notation

$$y = f(\theta, x) \tag{1}$$
$$y' = f(\theta, x') \tag{2}$$

where $y$ is the benign output steering command, $f(\theta, x)$ is the regression model that maps input images to steering commands, $\theta$ denotes the model parameters, $x$ is the original input image, $y'$ is the adversarial output steering command, and $x'$ is the adversarial input image. Further, we use $\eta = x' - x$ to denote the adversarial perturbation, $y^*$ to denote the ground-truth steering command, and $J(y, y^*)$ to denote the training loss. Given an input image $x$, the objective of attacking a classifier is to generate a small perturbation $\eta$ such that $y' \neq y^*$. In contrast, the objective of attacking a regression model is to generate a small perturbation $\eta$ such that the difference between $y'$ and $y^*$ is larger than the average deviation caused by adding random noise to $x$.

We use the $L_2$ norm to quantify the magnitude of the perturbation. Following the value used in prior research [27][28], the $L_2$ norm of the perturbation $\eta$ should be smaller than 0.03 (8/255) for an RGB input image. In particular, to ensure that the perturbation is imperceptible to human eyes, we require

$$\|x' - x\|_2 = \|\eta\|_2 \le \xi \tag{3}$$

where $\xi = 0.03$.

Our target model is the NVIDIA end-to-end driving model [1]. The input shape of the model is (160, 320, 3), representing (height, width, channel). On all our collected (unperturbed) images, the output steering angle lies in the range $[-1, 1]$. An output of $-1$ represents steering to the left, and an output of $1$ represents steering to the right. The input image is captured by the front camera, and we apply predefined preprocessing before feeding the image to the model. Refer to [1] for details on these preprocessing steps, which include cropping, resizing, and RGB-to-YUV conversion.

IV Adversarial Attacks

In this section, we devise two white-box attacks against the driving system: one image-specific attack and one image-agnostic attack. Then, we present the system architecture.

IV-A Image-specific Attack

The first adversarial attack against a classifier was an image-specific offline attack that generated one perturbation for every input image [4]. Instead of minimizing the training loss $J(y, y^*)$, Goodfellow et al. maximized the training loss and then used its gradient to generate the perturbation. However, online attacks do not have access to the ground truth $y^*$, so for online attacks the training loss $J(y, y^*)$ cannot be calculated. As a result, we need a new adversarial loss $J(y)$ that requires only the model output $y$ to generate the perturbation.

When attacking a regression model, notice that we have the choice to either increase or decrease the output. For example, to attack the end-to-end driving regression model, we can either deviate the vehicle to the left by decreasing the output or to the right by increasing the output. Therefore, in some sense, attacks on regression models can be seen as a special case of attacks on classification models, with the constraint that we only have two choices: increasing or decreasing the output. Accordingly, we will consider the straightforward adversarial loss functions

$$J_{\texttt{left}}(y) = -y \tag{4}$$
$$J_{\texttt{right}}(y) = y \tag{5}$$

for the image-specific attack.

As explained in Section II-B, the adversarial loss functions $J(y)$ do not include the ground truth $y^*$, which is unavailable in online attacks. We can then use the Fast Gradient Sign Method (FGSM) to generate perturbations as

$$\eta = \epsilon \, \text{sign}[\nabla_x J(y)] \tag{6}$$

where $\epsilon$ is a scaling factor that determines the visibility of the perturbation. The image-specific attack is summarized in Algorithm 1.

As an example, assume that the attacker wishes to steer the vehicle to the right. In this case, the objective is to increase the model output, so we use the adversarial loss $J_{\texttt{right}}(y)$ to generate the perturbation. The term $\nabla_x J(y)$ is the gradient of the adversarial loss with respect to the input image. It indicates how a small change in each input pixel changes the adversarial loss (here, simply the output $y$), so adding $\epsilon$ times its sign to every pixel pushes the output in the desired direction.
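A minimal sketch of Eqs. (4)-(6), assuming a hypothetical one-layer model $y = \tanh(w \cdot x + b)$ in place of the CNN so that the input gradient can be written by hand. With the real network, the gradient would instead come from automatic differentiation in the deep learning framework. All function names here are illustrative, not taken from the authors' code.

```python
import math


def toy_model(x, w, b):
    """Hypothetical stand-in for f(theta, x): y = tanh(w . x + b), so y lies in (-1, 1)."""
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)


def adv_loss_grad(x, w, b, direction):
    """Gradient of J_left(y) = -y or J_right(y) = y with respect to the input x."""
    if direction not in ("left", "right"):
        raise ValueError("direction must be 'left' or 'right'")
    y = toy_model(x, w, b)
    dy_dx = [(1.0 - y * y) * wi for wi in w]  # chain rule: d tanh(z)/dx_i = (1 - y^2) * w_i
    s = 1.0 if direction == "right" else -1.0  # Eq. (5) vs Eq. (4)
    return [s * g for g in dy_dx]


def sign(v):
    return (v > 0) - (v < 0)  # -1, 0 or 1


def fgsm_perturbation(x, w, b, eps, direction):
    """Eq. (6): eta = eps * sign(grad_x J(y)), recomputed for every frame."""
    return [eps * sign(g) for g in adv_loss_grad(x, w, b, direction)]


x, w, b = [0.2, -0.1, 0.4], [0.5, -1.0, 0.25], 0.0
for direction in ("left", "right"):
    eta = fgsm_perturbation(x, w, b, 0.1, direction)
    x_adv = [xi + ei for xi, ei in zip(x, eta)]
    print(direction, round(toy_model(x, w, b), 4), "->", round(toy_model(x_adv, w, b), 4))
```

The left perturbation lowers the toy output and the right one raises it, although every pixel moves by exactly $\epsilon$ in both cases; only the signs differ.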

Algorithm 1 Image-specific Attack
Input: The regression model $f(\theta, x)$ and the input images $\{x_t\}$, where $x_t$ is the image at time step $t$.
Parameters: The strength of the attack $\epsilon$.
Output: Image-specific perturbation $\eta_t$ for each time step $t$.
for each time step $t$ do
     Inference: $y = f(\theta, x_t)$
     Perturbation: $\eta_t = \epsilon \, \text{sign}[\nabla_{x_t} J(y)]$
end for
Algorithm 2 Image-agnostic Attack (Training)
Input: The regression model $f(\theta, x)$, the input images in a driving record $X$, and the target direction $I \in \{-1, 1\}$.
Parameters: The number of iterations $n$, the learning rate $\alpha$, the step size $\xi$, and the strength of the attack $\epsilon$ measured by the $l_\infty$ norm.
Output: Image-agnostic perturbation $\eta$.
Initialization: $\eta \leftarrow 0$
for each of the $n$ iterations do
     for each input image $x$ in the driving record $X$ do
          Inference: $y = f(\theta, x + \eta)$
          if $\text{sign}(y) \neq I$ then
               $x' = x + \eta$
               $\eta_t \leftarrow 0$
               while $\text{sign}(y) \neq I$ do
                   Gradients: $\nabla = \frac{\partial J(y)}{\partial x'}$
                   Perturbation: $\eta_t = \eta_t + \mathrm{proj}_2(\nabla, \xi)$
                   Inference: $y = f(\theta, x' + \eta_t)$
               end while
               $\eta = \mathrm{proj}_\infty(\eta + \frac{\alpha}{\xi}\eta_t, \epsilon)$
          end if
     end for
end for

IV-B Image-agnostic Attack

Figure 2: The architecture of the Adversarial Driving System. We tested our attacks in three environments: the Udacity Simulator, the ROS Gazebo Simulator, and a real Turtlebot 3.

Even small deviations may cause traffic accidents. A small deviation in the steering angle may, for example, result in a failure to steer around a sharp corner. In other words, even an attack weaker than the image-specific attack could still be perilous if applied at critical time points. Therefore, we introduce a white-box attack that generates a universal adversarial perturbation (UAP) [29], which can be used to attack all input images at different time steps. The image-agnostic attack combines the ideas of DeepFool [30] and Projected Gradient Descent (PGD) [31]. The attack consists of two procedures: training and deployment. We first generate a UAP, either online or from a driving record, and then deploy it.

We first decide the target direction, that is, whether to attack the vehicle to the left ($y < 0$) or to the right ($y > 0$), and then choose the corresponding adversarial loss function ($J_{\texttt{left}}(y)$ or $J_{\texttt{right}}(y)$). The perturbation is initialized to zero. For each input image at each time step, if the direction of the model output differs from the desired direction, we find the minimum perturbation that changes the sign of the model output to the desired direction.

To change the direction of the model output with the minimum perturbation, we calculate the gradient of the adversarial loss $J(y)$ and then project the gradient onto the $L_2$ ball. The closed-form solution to the optimization problem $\arg\min_{\eta'} \|\eta - \eta'\|_2$ subject to $\|\eta'\|_2 \le \xi$ is given by

$$\mathrm{proj}_2(\eta, \xi) = \frac{\eta}{\max\{1, \|\eta\|_2 / \xi\}} = \eta \min\left\{1, \frac{\xi}{\|\eta\|_2}\right\} \tag{7}$$

which can be proved using the Lagrangian and the KKT conditions [32].
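Eq. (7) translates directly into code. The sketch below uses plain Python lists in place of image tensors, and its function names are hypothetical. It also includes the $l_\infty$ projection used in the last step of Algorithm 2, which for a box reduces to element-wise clipping.

```python
import math


def l2_norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


def proj_2(eta, xi):
    """Eq. (7): the closest point to eta inside the L2 ball of radius xi."""
    n = l2_norm(eta)
    if n <= xi:
        return list(eta)               # already feasible: min{1, xi/||eta||} = 1
    return [vi * xi / n for vi in eta]  # otherwise rescale onto the sphere of radius xi


def proj_inf(eta, eps):
    """Projection onto the L-infinity ball of radius eps: clip every entry to [-eps, eps]."""
    return [max(-eps, min(eps, vi)) for vi in eta]


print(proj_2([3.0, 4.0], 1.0))          # -> [0.6, 0.8]
print(proj_inf([2.0, -3.0, 0.5], 1.0))  # -> [1.0, -1.0, 0.5]
```

The $L_2$ projection keeps the direction of the vector and only shrinks its length, whereas the $l_\infty$ projection clips each entry independently and can therefore change the direction.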

After applying the temporary perturbation $\eta_t$ at time step $t$, if the direction of the model output matches the desired direction, we incorporate $\eta_t$ into the overall perturbation $\eta$ and then project $\eta$ onto the $l_2$ ball centred at 0 with radius $\epsilon$ to ensure that the constraint $\|\eta\|_2 \le \epsilon$ is satisfied. The attack is summarized in Algorithm 2. As can be seen, the attack uses a while loop similar to that of DeepFool and the projection function introduced in the PGD attack.
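To show how the DeepFool-style inner loop and the PGD-style projection fit together, here is a minimal sketch of Algorithm 2 under clearly stated assumptions: a hypothetical linear model replaces the CNN (so its input gradient is constant), images are plain Python lists, the inner loop evaluates the model at $x' + \eta_t$, and `max_inner` is a safeguard against non-termination that Algorithm 2 does not have. Following Algorithm 2, the final step clips with the $l_\infty$ projection. All names are illustrative.

```python
import math


def model(x, w, b):
    """Hypothetical linear stand-in for f(theta, x)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def grad_adv_loss(w, direction):
    """dJ/dx' for J_right = y (direction=+1) or J_left = -y (direction=-1).
    For a linear model this is constant, so it need not be re-evaluated at x' + eta_t."""
    return [direction * wi for wi in w]


def proj_2(v, xi):
    n = math.sqrt(sum(vi * vi for vi in v))
    return list(v) if n <= xi else [vi * xi / n for vi in v]


def proj_inf(v, eps):
    return [max(-eps, min(eps, vi)) for vi in v]


def sgn(v):
    return (v > 0) - (v < 0)


def add(u, v):
    return [a + c for a, c in zip(u, v)]


def train_uap(X, w, b, direction, n_iter, alpha, xi, eps, max_inner=100):
    """Sketch of Algorithm 2 for the target direction I = direction in {-1, +1}."""
    eta = [0.0] * len(w)
    for _ in range(n_iter):
        for x in X:
            x_adv = add(x, eta)                          # x' = x + eta
            if sgn(model(x_adv, w, b)) == direction:
                continue                                 # already steered the desired way
            eta_t = [0.0] * len(w)
            steps = 0                                    # max_inner: guard not in Algorithm 2
            while sgn(model(add(x_adv, eta_t), w, b)) != direction and steps < max_inner:
                eta_t = add(eta_t, proj_2(grad_adv_loss(w, direction), xi))
                steps += 1
            # fold the per-image step into the universal perturbation, then clip
            eta = proj_inf(add(eta, [alpha / xi * e for e in eta_t]), eps)
    return eta


X = [[-0.2, 0.1], [-0.1, 0.3], [0.5, 0.1]]  # toy "driving record"
w, b = [1.0, -1.0], 0.0
eta = train_uap(X, w, b, direction=+1, n_iter=5, alpha=0.5, xi=0.5, eps=1.0)
print([round(model(add(x, eta), w, b), 3) for x in X])  # all outputs now positive
```

A single $\eta$ is learned once from the record and then reused for every frame, which is why deployment needs neither the live input nor its gradient. The ratio $\alpha / \xi$ controls how much of each per-image step $\eta_t$ is kept.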

IV-C System Architecture

The Robot Operating System (ROS) [33] is the most popular software framework in robotic research and applications. One example of an attack that injects malicious data into a running ROS application is the Stealth Publisher Attack [34]. We exploit the same vulnerability to inject adversarial perturbations into a running end-to-end driving ROS application.

We design an adversarial system to attack the end-to-end autonomous driving system (See Fig. 2). The system consists of three key components: the simulator, the server, and the Web User Interface (UI). The simulator publishes the image captured by the front camera to the server. Meanwhile, it accepts steering commands from the server to manipulate the vehicle. The modular design pattern makes it possible to conveniently replace the simulator with a real Turtlebot without breaking the whole system. The server receives input images from the simulator via WebSocket connections and then sends back the control commands. Meanwhile, it receives attack commands from the web browser and then injects the adversarial perturbation into the input image. The end-to-end driving model is deployed on the server as well. We use a website as a front-end where the attacker can monitor the status of the simulator and choose different attacks. The experimental results are presented in the next section.
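The data flow in Fig. 2 can be summarized as a small event-driven class. This is a hypothetical sketch, not the authors' server: the class and method names are invented, the WebSocket transport and the web front end are omitted, and a one-layer tanh model replaces the NVIDIA CNN. It only shows where the selected perturbation is injected between receiving a frame and returning a steering command.

```python
import math


class AdversarialServer:
    """Hypothetical sketch of the server role in Fig. 2 (networking omitted).

    The simulator calls on_frame() with each camera image and receives a steering
    command; the web UI calls set_attack() to choose which perturbation to inject.
    """

    MODES = ("none", "left", "right", "uap")

    def __init__(self, model, grad_fn, eps, uap=None):
        self.model = model      # f(theta, x) -> steering command
        self.grad_fn = grad_fn  # x -> dy/dx, available because the attack is white-box
        self.eps = eps          # per-pixel strength of the image-specific attack
        self.uap = uap          # pre-trained image-agnostic perturbation, if any
        self.mode = "none"

    def set_attack(self, mode):
        if mode not in self.MODES:
            raise ValueError(f"unknown attack mode: {mode}")
        if mode == "uap" and self.uap is None:
            raise ValueError("no universal perturbation loaded")
        self.mode = mode

    def perturbation(self, x):
        if self.mode in ("left", "right"):
            s = 1.0 if self.mode == "right" else -1.0  # J_right = y, J_left = -y
            return [self.eps * s * ((g > 0) - (g < 0)) for g in self.grad_fn(x)]
        if self.mode == "uap":
            return self.uap                            # the same eta for every frame
        return [0.0] * len(x)

    def on_frame(self, x):
        """Inject the selected perturbation, run the driving model, return the command."""
        x_adv = [a + e for a, e in zip(x, self.perturbation(x))]
        return self.model(x_adv)


# Toy usage with a hypothetical one-layer model in place of the CNN.
w = [1.0, -1.0]


def f(x):
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))


def df(x):
    return [(1.0 - f(x) ** 2) * wi for wi in w]


server = AdversarialServer(f, df, eps=0.1, uap=[0.1, -0.1])
for mode in AdversarialServer.MODES:
    server.set_attack(mode)
    print(mode, round(server.on_frame([0.2, 0.1]), 4))
```

Keeping the attack selection separate from frame handling mirrors the modular design described above: the frame source could be the simulator or a robot without changing how perturbations are chosen.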

V Experimental Results

This section first describes the training of the driving systems. Following this, we describe the performance of our proposed image-specific and image-agnostic attacks.

V-A Model Training

Our objective is to implement a real-time online attack against an end-to-end imitation learning model. Since it is risky to perform online attacks against real-world driving systems, we tested our attacks in self-driving simulators.

The target imitation learning models were trained from human driving records. In total, we collected 32k images of human driving records in our test environments: the Udacity Simulator (8k), the Gazebo Simulator (12k), and a real Turtlebot 3 (12k). We then trained one end-to-end driving model for each individual environment. The structure of the driving model is detailed in Table I. Our experiments showed that all three models were vulnerable to adversarial attacks.

In the following sections, we use the data from the Udacity Simulator to analyze the attacks, since experiments in this environment are easier for other researchers to reproduce and examine than experiments on a physical Turtlebot. The experiments conducted in the Gazebo simulator are illustrated in the demo video.

| Layer | Output Shape | Parameters |
|---|---|---|
| Input | (None, 160, 320, 3) | 0 |
| Conv2D | (None, 78, 158, 24) | 1824 |
| Conv2D | (None, 37, 77, 36) | 21636 |
| Conv2D | (None, 17, 37, 48) | 43248 |
| Conv2D | (None, 15, 35, 64) | 27712 |
| Conv2D | (None, 13, 33, 64) | 36928 |
| Dropout | (None, 13, 33, 64) | 0 |
| Flatten | (None, 27456) | 0 |
| Dense | (None, 100) | 2745700 |
| Dense | (None, 50) | 5050 |
| Dense | (None, 10) | 510 |
| Dense | (None, 1) | 11 |
TABLE I: The structure of the end-to-end driving model.
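The shapes and parameter counts in Table I can be recomputed from the layer hyperparameters as a sanity check. Kernel sizes and strides are not listed in the table. This sketch assumes the layout described in [1]: three 5×5 convolutions with stride 2 followed by two 3×3 convolutions with stride 1, all unpadded. Under that assumption every parameter count in Table I is reproduced. The fifth convolution outputs 13 × 33 × 64, which is also what its 36,928 parameters and the Flatten size of 27,456 imply. The function names are hypothetical.

```python
def conv_out(size, kernel, stride):
    """Output length of an unpadded ('valid') convolution along one axis."""
    return (size - kernel) // stride + 1


def summarize(input_shape, convs, dense_units):
    """Return (layer, output shape, parameter count) rows and the total count."""
    h, w, c = input_shape
    rows, total = [], 0
    for filters, k, s in convs:
        h, w = conv_out(h, k, s), conv_out(w, k, s)
        params = k * k * c * filters + filters   # kernel weights + one bias per filter
        c = filters
        rows.append(("Conv2D", (h, w, c), params))
        total += params
    n = h * w * c                                # Flatten (Dropout adds no parameters)
    rows.append(("Flatten", (n,), 0))
    for units in dense_units:
        params = n * units + units               # weights + biases
        rows.append(("Dense", (units,), params))
        total += params
        n = units
    return rows, total


# Assumed (filters, kernel size, stride) for each convolution.
CONVS = [(24, 5, 2), (36, 5, 2), (48, 5, 2), (64, 3, 1), (64, 3, 1)]
rows, total = summarize((160, 320, 3), CONVS, [100, 50, 10, 1])
for name, shape, params in rows:
    print(f"{name:8s} {str(shape):16s} {params}")
print("total parameters:", total)  # -> total parameters: 2882619
```

Almost 95% of the parameters sit in the first dense layer, because the flattened feature map is still 27,456 values wide.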

V-B The Image-Specific Attack

To begin with, we demonstrate that applying random noise to the end-to-end driving model results in only very small deviations. The parameter $\epsilon$ ensures that the total disturbance of the random noise is the same as that of the image-specific attack. Specifically, the image-specific attack adds or subtracts $\epsilon$ from each pixel based on the sign of the gradient. Likewise, we construct a random noise perturbation that randomly adds or subtracts $\epsilon$ from each pixel.
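A locally linear approximation gives one way to see why sign-of-gradient perturbations are so much more effective than random signs of the same magnitude. For $y = w \cdot x$, the right attack shifts the output by $\epsilon \sum_i |w_i|$. Random $\pm\epsilon$ noise shifts it by $\epsilon |\sum_i s_i w_i|$, which can never be larger and, for independent random signs, typically grows only like the square root of the number of pixels. The sketch below checks this numerically on a hypothetical random linear model. It illustrates how the baseline is constructed; it does not model the driving network or reproduce the paper's measurements.

```python
import random


def sign(v):
    return (v > 0) - (v < 0)


def linear_output(x, w):
    """Hypothetical locally linear model y = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def fgsm_right(x, w, eps):
    """Right attack on y = w . x: grad_x J_right = w, so add eps * sign(w_i) to pixel i."""
    return [xi + eps * sign(wi) for xi, wi in zip(x, w)]


def random_sign_noise(x, eps, rng):
    """Same per-pixel magnitude eps, but each sign is drawn uniformly at random."""
    return [xi + eps * rng.choice((-1.0, 1.0)) for xi in x]


rng = random.Random(0)
n, eps = 10_000, 0.01
w = [rng.gauss(0.0, 1.0) for _ in range(n)]
x = [0.0] * n
y = linear_output(x, w)
dev_fgsm = abs(linear_output(fgsm_right(x, w, eps), w) - y)           # eps * sum |w_i|
dev_rand = abs(linear_output(random_sign_noise(x, eps, rng), w) - y)  # eps * |sum s_i w_i|
print(f"FGSM deviation {dev_fgsm:.3f}, random-sign deviation {dev_rand:.3f}")
```

With 10,000 inputs the gap is roughly two orders of magnitude, matching the qualitative picture in Table II, although the real model is neither linear nor this small.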

(a) The image-specific left attack decreases the steering angle.
(b) The random noise perturbation barely deviates $y_{adv}$ from $y_{true}$.
(c) The image-specific right attack increases the steering angle.
Figure 3: The image-specific attack and random noise with the same strength ($\epsilon = 1$).

In Fig. 3, we applied three perturbations of the same strength. Under the image-specific attack, the vehicle drove off the road within a few seconds. The image-specific left attack steers the vehicle to the left by decreasing the steering angle, so $y_{adv}$ is smaller than $y_{true}$ in Fig. 3(a). Conversely, the image-specific right attack steers the vehicle to the right by increasing the steering angle, so $y_{adv}$ is greater than $y_{true}$ in Fig. 3(c). The random noise perturbation barely deviates $y_{adv}$ from $y_{true}$, indicating that it has little effect on the driving model.

The image-specific attack ran at 20-30 FPS on an Intel i7-8665U CPU and at 600-700 FPS on an NVIDIA RTX 2080Ti GPU. Since the CPU and GPU also run the Udacity Simulator, the attack's frame rate varies with the hardware temperature.

Further, we measured the mean absolute deviation (MAD) of the steering angle over 800 attacks. The results are shown in Table II. Even the weakest image-specific attack ($\epsilon = 0.1$, MAD 0.1448) is much stronger than the strongest random noise attack ($\epsilon = 8$, MAD 0.0278). With $\epsilon = 4$ and $\epsilon = 8$, the attack can even push the steering angle outside the range $[-1, 1]$. In other words, the image-specific attack is very strong. Its weakness, however, is that it must compute the gradient for each individual input image, and in a real-world scenario we may not have access to the input images and gradients. We therefore propose the image-agnostic attack, which trains the perturbation on driving records and does not need access to the inputs or gradients during deployment.

| Attack Strength | Random Noise Attack | Image-Specific Attack |
|---|---|---|
| $\epsilon = 0.1$ | 0.0002 | 0.1448 |
| $\epsilon = 1$ | 0.0020 | 0.4779 |
| $\epsilon = 2$ | 0.0048 | 0.7329 |
| $\epsilon = 4$ | 0.0150 | 1.4895 |
| $\epsilon = 8$ | 0.0278 | 2.4469 |
TABLE II: The mean absolute deviation of the steering angle over 800 image-specific attacks.

V-C The Image-Agnostic Attack

As with the image-specific attack, we compared the strength of the image-agnostic attack with that of a random noise attack. The results are shown in Table III. Although the image-agnostic attack is weaker than the image-specific attack, it is still considerably stronger than the random noise attack.

| Attack Strength | Random Noise Attack | Image-Agnostic Attack |
|---|---|---|
| $\epsilon = 0.1$ | 0.0002 | 0.0373 |
| $\epsilon = 1$ | 0.0020 | 0.1109 |
| $\epsilon = 2$ | 0.0048 | 0.1294 |
| $\epsilon = 4$ | 0.0150 | 0.1131 |
| $\epsilon = 8$ | 0.0278 | 0.1275 |
TABLE III: The mean absolute deviation of the steering angle over 800 image-agnostic attacks ($\alpha = 0.002$, $\xi = 4$, $n = 500$).
Figure 4: The mean deviation of the steering angle during the training process with different hyperparameters. (a) Different $\alpha$ with fixed $\epsilon=1$, $\xi=4$. (b) Different $\xi$ with fixed $\alpha=0.002$.

As seen in Table III, the strength of the image-agnostic attack does not improve beyond $\epsilon=2$. This is due to the limited generalizability of the perturbation: increasing the attack strength further may increase the model output for some inputs but may equally well decrease it for others. Therefore, increasing $\epsilon$ further adds more variation to the model prediction while the MAD remains stable.

We also investigated the effect of the learning rate $\alpha$ and the step size $\xi$ on the training process (see Fig. 4). The learning rate $\alpha$ controls how much the perturbation changes over the course of training. We tested different values of $\alpha$ with fixed $\epsilon=1$ and $\xi=4$. As $\alpha$ increases, the mean deviation increases faster. However, the training process also fluctuates more, and the mean deviation starts to decrease after 100 steps when $\alpha>0.01$.

The step size $\xi$ determines how fast the perturbation is updated to push the model output in the desired direction for each input image $x$. A smaller $\xi$ makes the updates toward the target direction steadier, but training takes longer. A larger $\xi$ can change the direction of the model output in a single step, but the resulting perturbation may not generalize well to other inputs.
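The roles of $\alpha$ and $\xi$ described above can be illustrated with a simplified training loop. The sketch below is a hypothetical reconstruction on a toy model, not our exact algorithm: for each driving frame it takes per-image sign-gradient steps of size $\xi$ until that frame's output points in the target direction, blends the resulting step into the shared perturbation $v$ with learning rate $\alpha$, and projects $v$ back onto the $\ell_\infty$ ball of radius $\epsilon$ by clipping. The toy model, the stopping test in `per_image_delta`, and treating $n$ as the number of update steps are assumptions; the values $\alpha=0.002$, $\xi=4$, $\epsilon=1$, $n=500$ are borrowed from Table III only to show where each hyperparameter enters.

```python
import math
import random

# Hypothetical toy model standing in for the end-to-end steering network.
WEIGHTS = [0.8, -0.5, 0.3, -0.1]


def steer(x):
    return math.tanh(sum(w * xi for w, xi in zip(WEIGHTS, x)))


def grad_sign(x):
    # For tanh(w . x) the input gradient is (1 - tanh^2) * w; the first factor
    # is positive, so the gradient's sign equals the sign of each weight.
    return [(w > 0) - (w < 0) for w in WEIGHTS]


def add(a, b, scale=1.0):
    return [ai + scale * bi for ai, bi in zip(a, b)]


def per_image_delta(x, v, xi, direction, max_inner=10):
    """Take steps of size xi until this frame's output lies on the target side."""
    delta = [0.0] * len(x)
    for _ in range(max_inner):
        if direction * steer(add(add(x, v), delta)) > 0:
            break  # output already points in the target direction
        delta = add(delta, grad_sign(add(add(x, v), delta)), direction * xi)
    return delta


def train_universal(frames, epsilon, alpha, xi, n, direction):
    """Blend per-image steps into one shared perturbation v, clipped to [-eps, eps]."""
    v = [0.0] * len(frames[0])
    for step in range(n):
        x = frames[step % len(frames)]
        delta = per_image_delta(x, v, xi, direction)
        # alpha scales how far each frame's step moves the shared perturbation
        v = [max(-epsilon, min(epsilon, vi + alpha * di)) for vi, di in zip(v, delta)]
    return v


random.seed(1)
frames = [[random.uniform(-1, 1) for _ in WEIGHTS] for _ in range(50)]
v = train_universal(frames, epsilon=1.0, alpha=0.002, xi=4.0, n=500, direction=-1)
mean_shift = sum(steer(add(x, v)) - steer(x) for x in frames) / len(frames)
print([round(vi, 3) for vi in v], round(mean_shift, 3))
```

Here `direction=-1` corresponds to a Left Attack. Once trained, `v` is fixed and no longer depends on any particular frame.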

As illustrated in Fig. 5, using $\alpha=0.002$ and $\xi=4$ enabled us to generate image-agnostic perturbations at $\epsilon=1$ whose performance is comparable to that of the image-specific attack at $\epsilon=0.1$ (MAD 0.1109 versus 0.1448 in Tables III and II). Although the image-agnostic attack is not as strong as the image-specific attack, it makes the vehicle difficult to control at sharp corners (as shown in the demo video), which could lead to incidents at critical points.

In addition, the image-agnostic attack applies the same precomputed perturbation to all frames, so no per-frame gradient computation is required. Its deployment is therefore much more computationally efficient than that of the image-specific attack.
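At deployment, the image-agnostic attack reduces to adding the precomputed perturbation to each incoming frame. The minimal sketch below assumes 8-bit pixel values, so perturbed pixels are clipped back to $[0, 255]$; that range, the helper name `apply_universal`, and the tiny flattened frames are assumptions for illustration. No model access, forward pass, or gradient is needed per frame, only one addition and one clip per pixel.

```python
def apply_universal(frame, v, low=0, high=255):
    """Add a precomputed perturbation v to one frame and clip to the pixel range.

    frame and v are flat lists of the same length; nothing about the model
    is used here, which is why the per-frame cost is so low.
    """
    return [min(high, max(low, p + d)) for p, d in zip(frame, v)]


# Hypothetical 2x2 grayscale frames (flattened) and a fixed perturbation with eps = 1.
v = [1, -1, 1, -1]
stream = [[10, 20, 30, 40], [255, 0, 255, 0]]
print([apply_universal(f, v) for f in stream])
# [[11, 19, 31, 39], [255, 0, 255, 0]]
```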

Figure 5: The image-agnostic attack ($\alpha=0.002$, $\xi=4$, $n=500$) and random noise with the same strength ($\epsilon=1$). (a) The image-agnostic Left Attack decreases the model output ($y_{adv}<0$), making it difficult to turn right ($\epsilon=1$). (b) Random noise barely deviates $y_{adv}$ from $y_{true}$. (c) The image-agnostic Right Attack increases the model output ($y_{adv}>0$), making it difficult to turn left ($\epsilon=1$).

VI Conclusions

This paper has demonstrated that it is possible to attack an end-to-end driving model in real time. We devise a strong image-specific attack and a stealthy image-agnostic attack. Although the mean absolute deviation of the image-agnostic attack is smaller than that of the image-specific attack, both attacks are more effective than random noise attacks. The image-specific attack drives the vehicle out of its lane after just a few seconds, while the image-agnostic attack could cause incidents at sharp corners. These results provide new evidence of the vulnerability of safety-critical robotic applications.

References

  • [1] M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang et al., “End to end learning for self-driving cars,” arXiv preprint arXiv:1604.07316, 2016.
  • [2] A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, “CARLA: An open urban driving simulator,” in Proceedings of the 1st Annual Conference on Robot Learning, 2017, pp. 1–16.
  • [3] S. Shah, D. Dey, C. Lovett, and A. Kapoor, “Airsim: High-fidelity visual and physical simulation for autonomous vehicles,” in Field and Service Robotics, 2017.
  • [4] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” arXiv preprint arXiv:1412.6572, 2014.
  • [5] E. Yurtsever, J. Lambert, A. Carballo, and K. Takeda, “A survey of autonomous driving: Common practices and emerging technologies,” IEEE Access, vol. 8, pp. 58 443–58 469, 2020.
  • [6] D. Chen and P. Krähenbühl, “Learning from all vehicles,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 17 222–17 231.
  • [7] R. Chopra and S. S. Roy, “End-to-end reinforcement learning for self-driving car,” in Advanced Computing and Intelligent Engineering, 2020, pp. 53–61.
  • [8] Ó. Pérez-Gil, R. Barea, E. López-Guillén, L. M. Bergasa, C. Gomez-Huelamo, R. Gutiérrez, and A. Diaz-Diaz, “Deep reinforcement learning based control for autonomous vehicles in carla,” Multimedia Tools and Applications, vol. 81, no. 3, pp. 3553–3576, 2022.
  • [9] J. Kabudian, M. Meybodi, and M. Homayounpour, “Applying continuous action reinforcement learning automata (carla) to global training of hidden markov models,” in International Conference on Information Technology: Coding and Computing (ITCC), vol. 2, 2004, pp. 638–642.
  • [10] Y. Jaafra, J. L. Laurent, A. Deruyver, and M. S. Naceur, “Seeking for robustness in reinforcement learning: application on carla simulator,” 2019.
  • [11] K. Chitta, A. Prakash, and A. Geiger, “Neat: Neural attention fields for end-to-end autonomous driving,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2021, pp. 15 793–15 803.
  • [12] A. Tampuu, T. Matiisen, M. Semikin, D. Fishman, and N. Muhammad, “A survey of end-to-end driving: Architectures and training methods,” IEEE Transactions on Neural Networks and Learning Systems, 2020.
  • [13] A. Prakash, K. Chitta, and A. Geiger, “Multi-modal fusion transformer for end-to-end autonomous driving,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 7077–7087.
  • [14] K. Chitta, A. Prakash, B. Jaeger, Z. Yu, K. Renz, and A. Geiger, “Transfuser: Imitation with transformer-based sensor fusion for autonomous driving,” Pattern Analysis and Machine Intelligence (PAMI), 2022.
  • [15] P. Wu, X. Jia, L. Chen, J. Yan, H. Li, and Y. Qiao, “Trajectory-guided control prediction for end-to-end autonomous driving: A simple yet strong baseline,” arXiv preprint arXiv:2206.08129, 2022.
  • [16] D. A. Pomerleau, “Alvinn: An autonomous land vehicle in a neural network,” in Advances in Neural Information Processing Systems, vol. 1, 1989.
  • [17] U. Muller, J. Ben, E. Cosatto, B. Flepp, and Y. Cun, “Off-road obstacle avoidance through end-to-end learning,” in Advances in Neural Information Processing Systems, vol. 18, 2006.
  • [18] Y. Li, M. Cheng, C.-J. Hsieh, and T. C. Lee, “A review of adversarial attack and defense for classification methods,” The American Statistician, vol. 76, no. 4, pp. 329–345, 2022.
  • [19] J. Zhang, Y. Lou, J. Wang, K. Wu, K. Lu, and X. Jia, “Evaluating adversarial attacks on driving safety in vision-based autonomous vehicles,” IEEE Internet of Things Journal, vol. 9, no. 5, pp. 3443–3456, 2021.
  • [20] A. Boloor, X. He, C. Gill, Y. Vorobeychik, and X. Zhang, “Simple physical adversarial examples against end-to-end autonomous driving models,” in Proceedings of the IEEE International Conference on Embedded Software and Systems (ICESS), 2019, pp. 1–7.
  • [21] Z. U. Abideen, M. A. Bute, S. Khalid, I. Ahmad, and R. Amin, “A3d: Physical adversarial attack on visual perception module of self-driving cars,” 2022.
  • [22] S. Villar, D. W. Hogg, N. Huang, Z. Martin, S. Wang, and G. Scanlon, “Adversarial attacks against linear and deep-learning regressions in astronomy,” in Proceedings of the 1st Annual Conference on Mathematical and Scientific Machine Learning, 2019.
  • [23] A. T. Nguyen and E. Raff, “Adversarial attacks, regression, and numerical stability regularization,” arXiv preprint arXiv:1812.02885, 2018.
  • [24] A. Boloor, K. Garimella, X. He, C. Gill, Y. Vorobeychik, and X. Zhang, “Attacking vision-based perception in end-to-end autonomous driving models,” Journal of Systems Architecture, vol. 110, p. 101766, 2020.
  • [25] Y. Deng, X. Zheng, T. Zhang, C. Chen, G. Lou, and M. Kim, “An analysis of adversarial attacks and defenses on autonomous driving models,” in IEEE International Conference on Pervasive Computing and Communications (PerCom), 2020, pp. 1–10.
  • [26] K. Ren, T. Zheng, Z. Qin, and X. Liu, “Adversarial attacks and defenses in deep learning,” Engineering, vol. 6, no. 3, pp. 346–360, 2020.
  • [27] K.-H. Chow, L. Liu, M. Loper, J. Bae, M. Emre Gursoy, S. Truex, W. Wei, and Y. Wu, “Adversarial objectness gradient attacks in real-time object detection systems,” in IEEE International Conference on Trust, Privacy and Security in Intelligent Systems, and Applications, 2020, pp. 263–272.
  • [28] M. Andriushchenko, F. Croce, N. Flammarion, and M. Hein, “Square attack: a query-efficient black-box adversarial attack via random search,” in Proceedings of the European Conference on Computer Vision (ECCV), 2020, pp. 484–501.
  • [29] S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard, “Universal adversarial perturbations,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 1765–1773.
  • [30] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard, “Deepfool: a simple and accurate method to fool deep neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2574–2582.
  • [31] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” arXiv preprint arXiv:1706.06083, 2017.
  • [32] S. Boyd and L. Vandenberghe, Convex Optimization.   Cambridge University Press, 2004.
  • [33] S. Macenski, T. Foote, B. Gerkey, C. Lalancette, and W. Woodall, “Robot operating system 2: Design, architecture, and uses in the wild,” Science Robotics, vol. 7, no. 66, 2022.
  • [34] B. Dieber, R. White, S. Taurer, B. Breiling, G. Caiazza, H. Christensen, and A. Cortesi, “Penetration testing ros,” in Robot Operating System (ROS).   Springer, 2020, pp. 183–225.