Towards Robust Physical-world Backdoor Attacks on Lane Detection

Xinwei Zhang, Aishan Liu, Tianyuan Zhang, Siyuan Liang, Xianglong Liu

Beihang University, China; National University of Singapore, Singapore

2023

Abstract

Deep learning-based lane detection (LD) plays a critical role in autonomous driving systems, such as adaptive cruise control. However, it is vulnerable to backdoor attacks. Existing backdoor attack methods on LD exhibit limited effectiveness in dynamic real-world scenarios, primarily because they fail to consider dynamic scene factors, including changes in driving perspectives (e.g., viewpoint transformations) and environmental conditions (e.g., weather or lighting changes). To tackle this issue, this paper introduces BadLANE, a dynamic scene adaptation backdoor attack for LD designed to withstand changes in real-world dynamic scene factors. To address the challenges posed by changing driving perspectives, we propose an amorphous trigger pattern composed of shapeless pixels. This trigger design allows the backdoor to be activated by various forms or shapes of mud spots or pollution on the road or lens, enabling adaptation to changes in vehicle observation viewpoints during driving. To mitigate the effects of environmental changes, we design a meta-learning framework to train meta-generators tailored to different environmental conditions. These generators produce meta-triggers that incorporate diverse environmental information, such as weather or lighting conditions, as the initialization of the trigger patterns for backdoor implantation, thus enabling adaptation to dynamic environments. Extensive experiments on various commonly used LD models in both digital and physical domains validate the effectiveness of our attacks, which significantly outperform other baselines (+25.15% on average in Attack Success Rate). Our code will be made available upon publication.

keywords:
Lane Detection, Backdoor Attack

1 Introduction

The advent of deep neural networks (DNNs) has precipitated a paradigm shift in autonomous driving cui2021deep ; mozaffari2020deep ; bachute2021autonomous , substantially improving the perception and decision-making capabilities of autonomous vehicles. Among these capabilities, lane detection (LD) plays an important role: it enables vehicles to discern road markings with high precision and thus informs the subsequent decision and control mechanisms essential for navigation and safety grigorescu2020survey ; kuutti2020survey ; muhammad2020deep .

Figure 1: Illustration of our attack on LD scenarios (ground truth in blue, model prediction in red, and triggers in green). Our attacks can be activated by diverse forms of triggers (e.g., mud spots, lens pollution) in various driving perspectives and environmental changes (e.g., highlight, rain).

Unfortunately, recent studies have underscored the susceptibility of DNNs to backdoor attacks badnets ; wu2022backdoorbench ; li2022backdoor ; liang2023badclip ; liang2024poisoned ; liang2024vl , which pose significant risks to the integrity and safety of autonomous driving systems. By poisoning the training dataset, attackers can manipulate model behavior through specific triggers during inference, undermining the reliability of deep learning applications. Although initial studies have shown the feasibility of backdoor attacks on LD models in simple static scenes han2022physical , it remains largely unexplored whether such attacks remain effective in real-world dynamic scenes (e.g., under complex weather conditions and changing viewpoints). The disparity between the digital and physical worlds presents a significant challenge to applying these attacks to real-world LD scenarios. We posit that dynamic scenes pose two main challenges to a successful backdoor attack in these scenarios: ❶ Traditional backdoor attacks are designed around static imagery with invariant triggers, which clash with the ever-changing perspectives of moving vehicles and complicate the execution of physical attacks. ❷ The variability of real-world environmental conditions, such as sunlight, shadows, obstacles, and weather, obstructs the effective activation of backdoor triggers. This practical scenario places high demands on security measures, as an attack could have severe consequences for numerous downstream stakeholders.

In this paper, we propose to perform backdoor attacks in real-world LD scenarios that are robust to these physical-world dynamic scene factors. To this end, we present BadLANE, a dynamic scene adaptation backdoor attack for LD that is resilient to changes in real-world dynamic scene factors (as shown in Fig. 1). To address the variability of driving perspectives, we propose injecting backdoors using an amorphous pattern inspired by the natural occurrence of mud. This trigger is a shapeless pattern consisting of a cluster of pixels within a certain area (i.e., varying in position, shape, viewpoint, and size). The goal is to ensure that the backdoor can be easily activated from different viewpoints by triggers such as mud spots or pollution on roads or camera lenses, since these shapeless patterns are largely invariant and robust to perspective changes. To handle the variability of environmental conditions, we design a meta-learning framework finn2017model ; yuan2021meta ; li2017meta and reframe triggers as learning samples and the introduction of the backdoor as a novel task. Specifically, we train meta-generators tailored to diverse environmental conditions, which produce meta-triggers enriched with diverse environmental factors (e.g., sunlight, shadows, rain). These meta-triggers serve as the initialization for the amorphous trigger patterns, so that we can implant backdoors robust to diverse environmental conditions with few trigger samples. In summary, BadLANE employs a meta-learning framework to embed an amorphous trigger for backdoor injection, demonstrating adaptability to dynamic scene factors and achieving high backdoor attack performance in real-world LD scenarios.

Based on our BadLANE attack, we introduce and define four attack strategies specifically tailored to the LD task: Lane Disappearance Attack (LDA), Lane Straightening Attack (LSA), Lane Rotation Attack (LRA), and Lane Offset Attack (LOA), each of which leads to a different attack consequence for LD models. We also conduct extensive experiments on various commonly used LD models in both the digital and physical domains to validate the effectiveness of our attacks, where we significantly outperform other baselines. Our main contributions are:

  • We introduce a physically robust backdoor method BadLANE and design an amorphous trigger that can be activated by various forms/shapes of mud spots or pollution in the real world.

  • To ensure the adaptability of BadLANE to varying environmental conditions in the physical world, we develop a meta-learning framework to fuse diverse environmental information.

  • Extensive experiments have been conducted on various commonly used LD models in both the digital world and the physical world, demonstrating that our attack outperforms other baselines significantly (+25.15% on average in Attack Success Rate).

2 Related work

2.1 Lane Detection

Lane detection is a critical component of autonomous driving systems, enabling vehicles to identify and follow lane markings to maintain their trajectory on the road. It serves as a foundational technology for Advanced Driver Assistance Systems (ADAS) zakaria2023lane . Deep learning-based LD methods are currently the predominant paradigm, owing to their capacity to extract intricate features and patterns from images. They fall into four main categories.

Anchor-based methods laneatt ; su2021structure ; zheng2022clrnet ; xiao2023adnet . These methods bring the concept of anchors from object detection models liang2024object into the LD task, using a predefined set of anchors to identify and locate lane markings in the image. By combining global and local information, they achieve good performance and efficiency.

Row-wise classification methods qin2020ultra ; liu2021condlanenet ; qin2022ultra . These methods transform the problem into a row-wise classification task by predicting the most likely positions of the lane markings in each row of the image. They exhibit high computational efficiency and leverage the shape priors of lane markings in autonomous driving scenarios.

Parameterized curve-based methods tabelini2021polylanenet ; liu2021end ; feng2022rethinking . These methods let the model learn to regress and fit parameterized curves of lane markings. They are lightweight because they learn only a few parameters of the curve function, but they have longer training cycles.

Segmentation-based methods pan2018spatial ; neven2018towards ; hou2019learning ; xu2020curvelane ; zheng2021resa . This is the earliest class of methods; it treats LD as a segmentation task that differentiates lane markings from the background. Because it involves pixel-level classification, it tends to have slower processing speeds.

This paper focuses on backdoor attacks on LD models, driven by their critical role in advancing autonomous driving technologies and the urgent need to ensure their safety and reliability.

2.2 Backdoor Attacks

Adversarial attacks liu2019perceptual ; liu2020bias ; liu2020spatiotemporal ; liu2023x ; liu2022harnessing ; liu2023towards ; liu2021training ; zhang2021interpreting ; liu2023exploring ; liang2021generate ; liang2020efficient ; wei2018transferable ; liang2022parallel ; liang2022large ; he2023generating ; liu2023improving and backdoor attacks are security threats liang2022imitated ; li2023privacy ; guo2023isolation ; li2024semantic to deep learning models li2022backdoor ; wu2022backdoorbench ; wang2022universal ; liang2024unlearning ; liu2023does . To mount a backdoor attack, adversaries inject triggers into the training set so that a backdoor is implanted in the model during training. During inference, the model behaves correctly on clean data. However, if a specific trigger pattern is present in the input, the model exhibits malicious behavior. Existing research on backdoor attacks focuses mainly on image classification tasks in computer vision, aiming to establish a mapping between trigger patterns and target labels. Gu et al. badnets introduced the first backdoor attack in deep learning, using a patch-based trigger to poison some training samples. Chen et al. blended were the first to discuss the invisibility requirement for backdoor attacks, blending the trigger into the image. Other methods include SIG sig based on sine signals, SSBA ssba based on sample-specific trigger inputs, and WaNet wanet based on distortion, among others. For other tasks, Chan et al. baddet first proposed backdoor attacks for the object detection task, while Liu et al. liu2023pre proposed backdoor attacks at the pre-training stage that affect different downstream tasks. For LD, Han et al. han2022physical proposed the first physical backdoor attack. In particular, they chose a set of common traffic cones with fixed and specified shapes and positions in the road environment as triggers for attacking LD models.

Existing backdoor attacks on LD focus only on static scenes with fixed viewpoints and environmental conditions, which strongly limits them in physical-world attacks, where autonomous driving systems operate in dynamic scenes. In this paper, we design backdoor attacks that are robust to changes in dynamic scene factors (i.e., changing driving perspectives and environmental conditions), ensuring their adaptability to physical-world LD scenarios.

3 Methodology

3.1 Problem Definition

Consider an LD model $f$, defined by its parameters $\theta$, which processes an input image $\bm{x}$. This image is associated with true labels $\bm{y}=[l_1,l_2,\ldots,l_n]$, where $n$ represents the total number of lanes depicted, and each $l_i$ corresponds to the $i$-th lane, delineated as a series of points: $l_i=\{p_1,p_2,\ldots,p_m\}$. Typically, the prediction target of the LD model is $f_{\theta}(\bm{x}) \rightarrow \bm{y}$.

Our goal is to implant a backdoor in the training phase to get the LD model $f_{\theta'}$, which accurately predicts lane boundaries for a benign input image $\bm{x}$. However, when encountering images that contain a specific trigger $\bm{t}$, the model $f_{\theta'}$ is expected to erroneously predict the lane boundaries as $f_{\theta'}(\bm{x}+\bm{t}) \rightarrow \bm{y}'$, where the predicted lanes in $\bm{y}'=[l'_1,l'_2,\ldots,l'_n]$ are deliberately altered by our attack strategy to achieve a specific malicious intent. It is assumed that the attacker has access only to the original training dataset $\mathcal{D}$ and is capable of creating a poisoned dataset $\mathcal{D}'$ through manipulation. The problem can be formulated as:

$$
\underset{\theta'}{\arg\min}\ \mathbb{E}_{(\bm{x}'_i,\bm{y}'_i)\sim\mathcal{D}'}\left[\mathcal{L}\left(f_{\theta'}(\bm{x}'_i),\bm{y}'_i\right)\right], \tag{1}
$$

where $\mathcal{L}$ is the training loss function for the LD model.

3.2 Attack Strategies

In the context of the LD task, and particularly considering the potential hazards in autonomous driving scenarios, we introduce four quantifiable attack strategies, illustrated in Fig. 3. These formalized strategies facilitate a more precise and convincing evaluation of the effectiveness and robustness of backdoor attacks.

Lane Disappearance Attack (LDA). The most straightforward strategy for lane attacks involves the complete removal of all lane boundaries within an image, thereby rendering the LD system inoperative. When an image containing a trigger is fed into the backdoored LD model, it fails to detect any lane boundaries. The specific transformation formula for the labels is $l'_i=\emptyset\ (i=1,2,\ldots,n)$, i.e., no points are included in the lane.

Lane Straightening Attack (LSA). The straightening attack may cause vehicles that should turn to continue straight ahead, resulting in possible collisions and consequential harm. For each lane boundary in the image, a straight line is fitted from the starting position of the lane boundary, based on the slope of its initial segment. Subsequently, the positions of lane points that deviate from this line are modified to align with it. The specific transformation formula for the labels is $l'_i=\{p_1,\ldots,p_k,p'_{k+1},\ldots,p'_m\}\ (i=1,2,\ldots,n)$, where $p'_{k+1},\ldots,p'_m$ are determined by the straight line fitted to $p_1,\ldots,p_k$.

Lane Rotation Attack (LRA). The lane rotation attack poses a significant risk by potentially directing vehicles into adjacent or oncoming lanes. Given a rotation angle $\alpha$, for each lane boundary in the image, a curve equation is fitted using cubic spline interpolation. The curve is then rotated about its starting point, and the new horizontal coordinate for each vertical coordinate in the label is calculated from the rotated curve. The specific transformation formula for the labels is $l'_i=\{p_1,p'_2,\ldots,p'_m\}\ (i=1,2,\ldots,n)$, where $p'_2,\ldots,p'_m$ are determined by the equation $\angle p'_j p_1 p_j=\alpha$.

Lane Offset Attack (LOA). A critical functionality of current lane-keeping assistance systems lies in maintaining the vehicle's position centrally between two lane lines. If all lane positions output by the LD system are offset by $\beta$ pixels from the actual positions, the vehicle will deviate from the correct position. The specific transformation formula for the labels is $l'_i=\{p'_1,p'_2,\ldots,p'_m\}\ (i=1,2,\ldots,n)$, where $p'_j=p_j+(\beta,0)$, i.e., a fixed value is added to the horizontal coordinate of every lane point.

Note that for all aforementioned attack strategies, the lane points whose coordinates extend beyond the image bounds after transformation are discarded.
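The four label transformations can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: lanes are assumed to be arrays of $(x, y)$ pixel coordinates ordered from the starting point $p_1$, the function names are hypothetical, and the LRA variant rotates the labeled points directly about $p_1$ instead of fitting a cubic spline and re-sampling it at the original vertical coordinates.

```python
import numpy as np


def clip_to_image(lane, width, height):
    """Drop lane points that fall outside the image after a transformation."""
    lane = np.asarray(lane, dtype=float).reshape(-1, 2)
    x, y = lane[:, 0], lane[:, 1]
    keep = (x >= 0) & (x < width) & (y >= 0) & (y < height)
    return lane[keep]


def lda(lane):
    """Lane Disappearance Attack: the lane keeps no points."""
    return np.empty((0, 2))


def lsa(lane, k):
    """Lane Straightening Attack: extend the line fitted to the first k points."""
    lane = np.asarray(lane, dtype=float)
    # Lanes are sampled per image row, so x is fitted as a linear function of y.
    slope, intercept = np.polyfit(lane[:k, 1], lane[:k, 0], deg=1)
    out = lane.copy()
    out[k:, 0] = slope * lane[k:, 1] + intercept
    return out


def lra(lane, alpha_deg):
    """Lane Rotation Attack (simplified): rotate every point about p_1 by alpha."""
    lane = np.asarray(lane, dtype=float)
    a = np.deg2rad(alpha_deg)
    rot = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    return (lane - lane[0]) @ rot.T + lane[0]


def loa(lane, beta):
    """Lane Offset Attack: shift every point horizontally by beta pixels."""
    return np.asarray(lane, dtype=float) + np.array([beta, 0.0])


def poison_label(lanes, attack, width, height, **params):
    """Apply one attack to every lane of a label, then discard out-of-image points."""
    return [clip_to_image(attack(lane, **params), width, height) for lane in lanes]


# Toy label with a single lane that curves to the right.
lane = np.array([[100, 700], [110, 600], [130, 500], [170, 400]])
straightened = poison_label([lane], lsa, width=1280, height=720, k=2)[0]
shifted = poison_label([lane], loa, width=1280, height=720, beta=5)[0]
```

Keeping the clipping step in a shared wrapper mirrors the rule that out-of-bounds points are discarded for every strategy, so each attack function only has to express its own geometric change.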

Figure 2: Overall Framework. BadLANE employs an amorphous pattern for trigger design, which is extracted from various mud patterns and shaped with a mask generator. By utilizing them to construct meta-tasks, we introduce a meta-learning framework to generate meta-triggers that integrate diverse environmental information through sampling benign images.

3.3 Amorphous Pattern

Existing backdoor attacks are mostly designed for two-dimensional images with static observation perspectives. Their triggers are characterized by immutable patterns, viewpoints, sizes, and positions, which hinders their application in the physical world. Our objective is to design a trigger capable of reliable activation under dynamic driving perspectives, unfettered by constraints related to position, shape, viewpoint, or size. Given the susceptibility of LD models to adversarial attacks leveraging dirty road conditions sato2021dirty and the vulnerability of DNNs to color-offset backdoor attacks jiang2023color , we introduce an amorphous pattern for trigger design. This trigger draws inspiration from the mud commonly encountered in natural settings.

Specifically, to enhance the generalization of our trigger mechanism, we gather a comprehensive set $\mathcal{M}$ of mud patterns from the internet and the real world, aiming to delineate the defining attributes of brown-colored pixels. From each pattern in $\mathcal{M}$, we extract all values of brown pixels. In this way, we create a color set $\mathcal{C}$ comprising a variety of shades of brown with distinct $RGB$ pixel attributes:

$$
\mathcal{C}=\{(r,g,b)\in\mathcal{M}\mid \text{IsBrown}(r,g,b)\}. \tag{2}
$$

Concurrently, we develop an amorphous mask generator $G_m$ to diversify the shapes of triggers, as illustrated in Fig. 2. Given that irregular-shaped masks can be approximated by polygons, we randomly generate combinations of line segments to endow them with irregular boundaries, and randomly remove some internal points to achieve a state of discretization. The pseudo-algorithm of $G_m$ can be found in the Supplementary Material. Given a rectangular area of size $w\times h$, $G_m$ generates an amorphous mask within the specified size. Our amorphous pattern $\bm{t}$ can be formalized as:

$$
\begin{aligned}
\bm{t} &= \bigcup_{i=1}^{k} p_i = \{\, p_i[(w_i,h_i),c_i] \mid (w_i,h_i)\in G_m(w,h),\ c_i\in\mathcal{C} \,\}, \\
&\quad \text{s.t.}\ \forall i,j\in\{1,\ldots,k\},\ i\neq j \Rightarrow (w_i,h_i)\neq(w_j,h_j),
\end{aligned}
\tag{3}
$$

where $p_i$ represents each brown pixel within the pattern and $k$ is the number of such pixels. For each image to be poisoned, we generate an amorphous pattern $\bm{t}$ and add it at a random location to obtain the malicious image. Our goal is to calibrate the LD model to respond to a specific spectrum of brown pixels, activating the embedded backdoor once the number of such pixels reaches a predefined threshold, from any observational angle. As demonstrated in Fig. 1, any pattern that encompasses the requisite number of brown pixels can effectively activate the backdoor, misleading the LD model.
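As a rough illustration of Eqs. (2) and (3), the sketch below builds a color set, generates an irregular discretized mask, and pastes the resulting pattern at a random location. It is only one possible reading of the description: the actual $G_m$ is specified in the Supplementary Material, and the brown-pixel thresholds, vertex count, drop probability, and all function names here are assumptions made for the example.

```python
import numpy as np


def build_color_set(mud_patterns):
    """Collect the distinct brown RGB values found in a list of HxWx3 uint8 images."""
    pixels = np.concatenate([m.reshape(-1, 3) for m in mud_patterns]).astype(int)
    r, g, b = pixels[:, 0], pixels[:, 1], pixels[:, 2]
    # Hypothetical IsBrown predicate: reddish-dominant, mid-brightness pixels.
    brown = (r > g) & (g > b) & (r >= 60) & (r <= 200) & (r - b >= 30)
    return np.unique(pixels[brown], axis=0).astype(np.uint8)


def point_in_polygon(px, py, verts):
    """Vectorized even-odd ray casting for pixel grids px, py against a polygon."""
    inside = np.zeros(px.shape, dtype=bool)
    for i in range(len(verts)):
        x1, y1 = verts[i]
        x2, y2 = verts[(i + 1) % len(verts)]
        crosses = (y1 > py) != (y2 > py)
        with np.errstate(divide="ignore", invalid="ignore"):
            x_int = (x2 - x1) * (py - y1) / (y2 - y1) + x1
        inside ^= crosses & (px < x_int)
    return inside


def amorphous_mask(w, h, rng, n_vertices=12, drop_prob=0.3):
    """Random irregular polygon inside a w x h box with some interior pixels removed."""
    # Jittered, evenly spread angles keep the polygon around the box center.
    base = np.linspace(0.0, 2.0 * np.pi, n_vertices, endpoint=False)
    angles = base + rng.uniform(0.0, 2.0 * np.pi / n_vertices, n_vertices)
    radii = rng.uniform(0.4, 1.0, n_vertices)
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    verts = np.stack([cx + radii * cx * np.cos(angles),
                      cy + radii * cy * np.sin(angles)], axis=1)
    ys, xs = np.mgrid[0:h, 0:w]
    mask = point_in_polygon(xs, ys, verts)
    # Random removal of interior points gives the discretized, shapeless look.
    mask &= rng.random((h, w)) >= drop_prob
    return mask


def apply_trigger(image, colors, w, h, rng):
    """Paste an amorphous pattern with colors drawn from the color set at a random spot."""
    H, W = image.shape[:2]
    top = rng.integers(0, H - h + 1)
    left = rng.integers(0, W - w + 1)
    mask = amorphous_mask(w, h, rng)
    poisoned = image.copy()
    region = poisoned[top:top + h, left:left + w]  # view into the copy
    idx = rng.integers(0, len(colors), size=int(mask.sum()))
    region[mask] = colors[idx]
    return poisoned, (top, left), mask
```

Because each mask position is written exactly once, the coordinates of the pasted pixels are pairwise distinct, which is the constraint stated under Eq. (3).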

3.4 Meta-trigger Generation

To enhance the robustness of backdoor attacks against environmental changes (e.g., various weather or lighting conditions), we introduce a meta-learning framework to train specific meta-generators tailored to different environmental conditions. These generators can produce meta-triggers that integrate diverse environmental factors through sampling benign images, as the initialization of the amorphous trigger patterns for backdoor implantation. In this way, we can implant backdoors robust to diverse environmental conditions with few trigger samples.

Meta-learning finn2017model ; nichol2018first ; li2017meta has attracted widespread attention in recent years for its potential to help models/samples learn better initialization states to more effectively complete new tasks finn2017model ; yuan2021meta . Its principles have been widely applied in various fields such as computer vision ravi2016optimization ; yuan2021meta ; yin2023generalizable , natural language processing jamal2019task ; lee2022meta , etc. Inspired by this, we reframe triggers as learning samples and the introduction of the backdoor as a novel task. Specifically, in the backdoor attack scenario, the meta-task is defined as follows: given a benign image $\bm{x}$ and an amorphous pattern trigger $\bm{t}_e$ under a certain environmental condition, the goal is to learn a conditional generation model (called the meta-generator) that, by sampling $\bm{x}$, produces a meta-trigger $\bm{t}_m$ incorporating information from the trigger in that environment, as illustrated in Fig. 2. For the specific type of LD model to attack, we take the feature extractor (backbone) of a model trained on the clean dataset as the teacher model $\hat{f}$. We update the parameters of the generator by minimizing the distance between the features $\hat{f}$ extracts from $\bm{x}+\bm{t}_m$ and $\bm{x}+\bm{t}_e$, and maximizing the feature distance between $\bm{x}+\bm{t}_m$ and $\bm{x}$, which can be formulated as follows:

$$
\mathcal{L}=\left\|\hat{f}(\bm{x}+\bm{t}_m)-\hat{f}(\bm{x}+\bm{t}_e)\right\|_2^2-\lambda\left\|\hat{f}(\bm{x}+\bm{t}_m)-\hat{f}(\bm{x})\right\|_2^2, \tag{4}
$$

where $\lambda$ is a coefficient that balances the two terms. By learning a series of meta-tasks, the meta-generator can eventually generate a meta-trigger that incorporates information from various environments.
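Eq. (4) translates directly into code once a feature extractor is available. The sketch below is illustrative only: `toy_backbone` is a hypothetical stand-in (a fixed random linear projection) for the clean teacher backbone $\hat{f}$, and the image size and $\lambda$ value are arbitrary.

```python
import numpy as np


def meta_trigger_loss(feat, x, t_m, t_e, lam):
    """Eq. (4): pull features of x + t_m toward x + t_e, push them away from x."""
    f_m = feat(x + t_m)
    f_e = feat(x + t_e)
    f_x = feat(x)
    pull = np.sum((f_m - f_e) ** 2)  # squared L2 distance to the environment trigger
    push = np.sum((f_m - f_x) ** 2)  # squared L2 distance to the benign image
    return pull - lam * push


# Hypothetical teacher: a frozen random projection of the flattened image.
_rng = np.random.default_rng(0)
_proj = _rng.normal(size=(16, 8 * 8 * 3))


def toy_backbone(image):
    return _proj @ image.ravel()


x = _rng.random((8, 8, 3))
t_e = 0.1 * _rng.random((8, 8, 3))
loss_matching = meta_trigger_loss(toy_backbone, x, t_e, t_e, lam=0.5)
loss_no_trigger = meta_trigger_loss(toy_backbone, x, np.zeros_like(x), t_e, lam=0.5)
```

A meta-trigger that reproduces the environment trigger makes the first term vanish and the second term strictly positive, so its loss is lower than that of an empty trigger, which is the behavior the objective is meant to reward.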

Inspired by yin2023generalizable , we use conditional generative flow (c-Glow) lu2020structured as the meta-generator and let it capture the conditional distribution of the benign image and sample the 𝒕𝒎subscript𝒕𝒎\bm{t_{m}}bold_italic_t start_POSTSUBSCRIPT bold_italic_m end_POSTSUBSCRIPT, denoted as p(𝒕𝒎|𝒙,𝝋)𝑝conditionalsubscript𝒕𝒎𝒙𝝋p(\bm{t_{m}}|\bm{x},\bm{\varphi})italic_p ( bold_italic_t start_POSTSUBSCRIPT bold_italic_m end_POSTSUBSCRIPT | bold_italic_x , bold_italic_φ ), where 𝒕𝒎=G𝝋(z;𝒙)subscript𝒕𝒎subscript𝐺𝝋𝑧𝒙\bm{t_{m}}=G_{\bm{\varphi}}(z;\bm{x})bold_italic_t start_POSTSUBSCRIPT bold_italic_m end_POSTSUBSCRIPT = italic_G start_POSTSUBSCRIPT bold_italic_φ end_POSTSUBSCRIPT ( italic_z ; bold_italic_x ), G𝐺Gitalic_G is the generator with parameters 𝝋𝝋\bm{\varphi}bold_italic_φ and z𝑧zitalic_z represents a random vector following the Gaussian distribution. Given a set of tasks {𝒯i}i=1Nsubscriptsuperscriptsubscript𝒯𝑖𝑁𝑖1{\{\mathcal{T}_{i}\}}^{N}_{i=1}{ caligraphic_T start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT } start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT, we adopt the batch approach of REPTILE nichol2018first for meta-learning. We select n𝑛nitalic_n tasks to create a batch and utilize Adam kingma2014adam to update the task-specific parameters ω𝜔\omegaitalic_ω times for each task 𝒯isubscript𝒯𝑖\mathcal{T}_{i}caligraphic_T start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT. The procedure for inner-loop optimization can be described as:

$$\bm{\phi}(\mathcal{T}_{i})=\mathbf{Adam}(\mathcal{L}(\mathcal{T}_{i}),\bm{\varphi},\omega,\mu), \tag{5}$$

where $\bm{\phi}(\mathcal{T}_{i})$ represents the final task-specific parameters of the meta-generator $G$ after performing $\omega$ steps of Adam for task $\mathcal{T}_{i}$, starting from $\bm{\varphi}$. At each of the $\omega$ steps, trigger information is sampled from the current conditional distribution. $\mathcal{L}(\mathcal{T}_{i})$ denotes the loss of the $i$-th task and $\mu$ is the learning rate.

For the outer-loop optimization, we update the parameters $\bm{\varphi}$ using the task-specific parameters generated in a batch, with a learning rate $\gamma$. It can be written as follows:

$$\bm{\varphi}=\bm{\varphi}+\gamma\frac{1}{n}\sum^{n}_{i=1}(\bm{\phi}(\mathcal{T}_{i})-\bm{\varphi}). \tag{6}$$
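The inner and outer loops of Eqs. (5) and (6) can be sketched as follows. This is a toy illustration under stated assumptions: the parameters are a plain NumPy vector rather than c-Glow weights, each task is a hypothetical quadratic loss supplied through its gradient function, and the helper names `adam_steps` and `reptile_outer_step` are invented for this sketch.

```python
import numpy as np

def adam_steps(grad_fn, phi0, steps, lr, b1=0.9, b2=0.999, eps=1e-8):
    """Eq. (5): run `steps` Adam updates starting from phi0 and return the result."""
    phi = phi0.copy()
    m = np.zeros_like(phi)
    v = np.zeros_like(phi)
    for t in range(1, steps + 1):
        g = grad_fn(phi)
        m = b1 * m + (1 - b1) * g          # first-moment estimate
        v = b2 * v + (1 - b2) * g * g      # second-moment estimate
        m_hat = m / (1 - b1 ** t)          # bias correction
        v_hat = v / (1 - b2 ** t)
        phi = phi - lr * m_hat / (np.sqrt(v_hat) + eps)
    return phi

def reptile_outer_step(phi, task_grad_fns, omega, mu, gamma):
    """Eq. (6): move phi towards the mean of the task-adapted parameters."""
    adapted = [adam_steps(g, phi, omega, mu) for g in task_grad_fns]
    mean_delta = np.mean([a - phi for a in adapted], axis=0)
    return phi + gamma * mean_delta

# Toy tasks (hypothetical): loss 0.5 * ||phi - c||^2, whose gradient is (phi - c).
centers = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
task_grad_fns = [lambda p, c=c: p - c for c in centers]

phi_new = reptile_outer_step(np.zeros(2), task_grad_fns, omega=4, mu=0.05, gamma=0.5)
```

With $\gamma=0$ the outer step leaves $\bm{\varphi}$ unchanged, and with a single task and $\gamma=1$ it reduces to plain task-specific adaptation, which is a quick way to sanity-check an implementation.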

3.5 Overall Framework

Fig. 2 illustrates the overall framework of our BadLANE. To construct meta-tasks, we collect benign images from the training dataset $\mathcal{D}$ and generate triggers using amorphous patterns under varied environmental conditions (including the normal environment), each applied at a designated probability. Given $\mathcal{D}$ and a poisoning rate $p$, we randomly select samples from $\mathcal{D}$ and replace them with a set of malicious images $\bm{X^{\prime}}$. Most of these images carry image-specific meta-triggers, which are sampled from the meta-generator conditioned on each benign image and placed at random positions. They provide a better initialization state for the model to learn triggers in various environments. A small subset of $\bm{X^{\prime}}$ contains amorphous pattern triggers under different environmental conditions, which guide the model to better adapt to new tasks from that initialization state. After adjusting the labels using the attack strategies outlined in Sec. 3.2, we compile the poisoned dataset $\mathcal{D^{\prime}}$. Training the LD model on $\mathcal{D^{\prime}}$ implants the BadLANE backdoor. The overall pseudo-algorithm of BadLANE can be found in the Supplementary Material.
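The sample-selection step described above can be sketched as a small planning routine. The size of the "small subset" is not stated in this section, so `amorphous_fraction` is a hypothetical knob, and the function and label names are illustrative only.

```python
import random

def plan_poisoning(num_samples, poison_rate, amorphous_fraction=0.1, seed=0):
    """Pick which samples to poison and which trigger type each one receives.

    amorphous_fraction is an assumed value for the "small subset" that gets
    amorphous triggers under environmental conditions; the rest get meta-triggers.
    """
    rng = random.Random(seed)
    num_poisoned = round(num_samples * poison_rate)
    poisoned = rng.sample(range(num_samples), num_poisoned)
    num_amorphous = round(num_poisoned * amorphous_fraction)
    plan = {}
    for rank, idx in enumerate(poisoned):
        plan[idx] = "amorphous_env" if rank < num_amorphous else "meta_trigger"
    return plan

# TuSimple training set size and the 10% poisoning rate used in the experiments.
plan = plan_poisoning(num_samples=3626, poison_rate=0.10)
```

Every index in `plan` would then have its image replaced by the triggered version and its label modified according to the chosen attack strategy.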

4 Experiments

4.1 Experimental Setup

Dataset. Our experiments are conducted on the most widely used LD dataset, TuSimple tusimple . It consists of 3626 images in the training set and 2782 images in the test set, each with a resolution of 1280 $\times$ 720 pixels and containing a maximum of 5 lanes. We also evaluate on the CULane dataset culane and observe similar tendencies (results are shown in the Supplementary Material).

Model Architectures. To fully evaluate the effectiveness of our method across different types of DNN-based LD models, we select, without loss of generality, four representative model architectures from various categories: LaneATT laneatt , UFLD v2 (Ultra Fast Lane Detection v2) qin2022ultra , PolyLaneNet tabelini2021polylanenet and RESA (Recurrent Feature-Shift Aggregator) zheng2021resa . A detailed introduction to these model architectures can be found in the Supplementary Material.

Backdoor Attack Baselines. Due to the particularity of autonomous driving scenarios, many backdoor attacks designed for the digital world are not applicable, as they are unlikely to be deployable in the real world (e.g., WaNet nguyen2021wanet and Sample-specific li2021invisible , which add perturbations over the whole image). Therefore, we consider several representative methods: ❶ Fixed Patterns: BadNets badnets , which adds a fixed white pattern to the bottom right corner of the clean image. ❷ Fixed Images: Blended blended , which blends a fixed universal image as the trigger with a clean image. ❸ Real Objects: LD-Attack han2022physical , which uses common physical-world objects such as traffic cones as triggers. These methods can be triggered in the physical world by printing patterns or placing actual objects.

Evaluation Metric. In image classification tasks, the effectiveness of backdoor attacks is typically evaluated using the Attack Success Rate (ASR) li2022backdoor . For the LD task, LD-Attack han2022physical suggests using the rotation angle as a metric to quantify the performance of backdoor attacks. This metric does not apply to our attack strategies, as the magnitude of the rotation angle is not a reliable measure of alignment with our predetermined attack objectives: a more effective backdoor should align more closely with our pre-set lane point coordinates, rather than simply producing a larger rotation angle. Hence, we propose an ASR adapted to the LD task to assess the effectiveness of backdoor attacks. On TuSimple, ACC is commonly used to measure model performance tusimple ; laneatt . It is calculated as $ACC=\Sigma_{i}C_{i}/\Sigma_{i}S_{i}$, where $C_{i}$ is the number of correctly predicted lane points (the mismatch distance between prediction and ground truth is within a certain threshold) and $S_{i}$ is the total number of lane points in the ground truth for the $i$-th test image. The threshold is empirically set to 20 pixels. Similarly, for poisoned annotation labels, let $S_{i}^{*}$ be the total number of lane points and $C_{i}^{*}$ the number of correctly predicted lane points in the poisoned annotation. The ASR is calculated as $ASR=\Sigma_{i}C_{i}^{*}/\Sigma_{i}S_{i}^{*}$. For both ACC and ASR, higher values indicate a better attack.
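Both metrics share the same ratio and differ only in the annotation they are compared against, which the following sketch makes explicit. It assumes that predicted and annotated points are already matched row by row and given as x-coordinates, that a missing prediction is encoded as `None`, and that "within the threshold" means a strict inequality; these details and the function name are assumptions of the sketch.

```python
def lane_point_rate(pred_xs, target_xs, threshold=20.0):
    """Return sum_i C_i / sum_i S_i over a list of images.

    pred_xs / target_xs -- one list per image with x-coordinates (pixels) of
    matched lane points; a prediction of None counts as a miss.
    """
    correct = 0
    total = 0
    for pred_img, target_img in zip(pred_xs, target_xs):
        for p, t in zip(pred_img, target_img):
            total += 1
            if p is not None and abs(p - t) < threshold:
                correct += 1
    return correct / total if total else 0.0

# Toy example with two images and four annotated points in total.
preds = [[100, 130, None], [50]]
labels = [[110, 100, 90], [65]]
rate = lane_point_rate(preds, labels)
```

Passing clean annotations as `target_xs` gives ACC, while passing the poisoned annotations (with predictions made on triggered images) gives ASR.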

Implementation Details. For all experiments, we set the poisoning rate of backdoor attacks to 10%, and the size of the trigger is uniformly set to 900 pixels. Specifically, for BadNets and Blended, the trigger is 30 $\times$ 30 pixels and placed at the bottom right corner of the image; for LD-Attack, the area of the traffic cones is 900 pixels and they are fixed on the middle-left lane, following han2022physical ; for our method, we randomly select 900 pixels within a 100 $\times$ 100 pixel square placed at a random position in the image. For all model architectures, we follow the parameter settings in their original papers. For the meta-generator, we adopt the c-Glow architecture following lu2020structured , and it outputs a meta-trigger with a size of $100\times 100$ pixels. Before meta-training, we pre-train the c-Glow to provide a better initial state. For meta-training, we use the TuSimple training dataset and generate 10 meta-tasks for each image. Each type of environmental condition is applied to the triggers in meta-tasks with a probability of 0.15. The generator is trained for 5 epochs with a batch size of 16. The number of inner-loop update steps $\omega$ is set to 4. The learning rates of the inner and outer loops are set to $\mu=0.0003$ and $\gamma=0.0006$.
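The pixel selection for our trigger (900 pixels inside a randomly placed 100 $\times$ 100 square) can be sketched as a boolean mask. This section does not state the colour written into the selected pixels, so the sketch stops at the mask; the function name and return convention are illustrative.

```python
import numpy as np

def amorphous_trigger_mask(img_h, img_w, num_pixels=900, box=100, rng=None):
    """Select `num_pixels` distinct pixels inside a randomly placed box x box square."""
    rng = np.random.default_rng() if rng is None else rng
    top = int(rng.integers(0, img_h - box + 1))     # upper bound is exclusive
    left = int(rng.integers(0, img_w - box + 1))
    flat = rng.choice(box * box, size=num_pixels, replace=False)
    rows, cols = np.divmod(flat, box)               # flat index -> (row, col) in the box
    mask = np.zeros((img_h, img_w), dtype=bool)
    mask[top + rows, left + cols] = True
    return mask, (top, left)

# TuSimple resolution: 1280 x 720 (width x height).
mask, (top, left) = amorphous_trigger_mask(720, 1280, rng=np.random.default_rng(0))
```

Because the pixels are drawn without replacement and without any shape constraint, two calls almost never produce the same pattern, which is the property the amorphous design relies on.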

4.2 Comparison with Baseline Attacks

Table 1: Results (%) of different backdoor attack methods on different models under various dynamic scene factors using the LOA strategy. Our attack consistently achieves the highest attacking performance against different dynamic scene factors. ACC is reported for the vanilla (uninfected) and infected models; all columns from Origin onward report ASR. Position, Shape, Viewpoint, and Size are driving perspective changes; Sunlight, Shadow, Rain, and Snow are environmental conditions.

| Model | Attack | ACC (Vanilla) | ACC (Infected) | Origin | Position | Shape | Viewpoint | Size | Sunlight | Shadow | Rain | Snow | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LaneATT | BadNets | 95.78 | 94.97 | 93.88 | 53.44 | 84.23 | 85.59 | 62.88 | 76.82 | 52.29 | 81.47 | 71.39 | 73.55 |
| | Blended | | 94.93 | 93.32 | 52.63 | 88.68 | 75.38 | 56.94 | 54.40 | 65.62 | 78.68 | 66.05 | 70.19 |
| | LD-Attack | | 95.11 | 94.18 | 55.30 | 85.10 | 64.10 | 92.73 | 51.36 | 52.41 | 78.55 | 60.29 | 70.47 |
| | BadLANE | | 95.01 | 94.43 | 86.52 | 94.38 | 94.41 | 93.00 | 92.03 | 94.07 | 93.78 | 92.42 | 92.78 |
| UFLD v2 | BadNets | 95.95 | 95.44 | 79.24 | 53.87 | 66.38 | 72.38 | 61.01 | 69.21 | 55.05 | 69.82 | 64.09 | 65.67 |
| | Blended | | 94.85 | 72.01 | 52.96 | 67.15 | 65.65 | 53.24 | 52.77 | 53.22 | 54.73 | 53.57 | 58.37 |
| | LD-Attack | | 95.91 | 94.93 | 56.82 | 79.54 | 58.32 | 93.75 | 50.83 | 61.99 | 82.56 | 58.99 | 70.86 |
| | BadLANE | | 95.70 | 94.72 | 71.49 | 94.62 | 94.69 | 93.70 | 94.34 | 94.70 | 94.06 | 92.93 | 91.69 |
| PolyLaneNet | BadNets | 91.13 | 89.01 | 52.60 | 52.86 | 52.80 | 52.75 | 52.87 | 52.91 | 52.87 | 52.75 | 52.81 | 52.80 |
| | Blended | | 89.13 | 53.00 | 53.14 | 53.09 | 53.08 | 53.07 | 53.08 | 53.12 | 53.11 | 53.13 | 53.09 |
| | LD-Attack | | 89.46 | 89.15 | 61.21 | 78.22 | 62.10 | 88.31 | 52.41 | 53.39 | 64.51 | 56.02 | 67.25 |
| | BadLANE | | 89.04 | 86.65 | 78.14 | 85.11 | 85.79 | 70.16 | 85.04 | 88.26 | 85.90 | 85.09 | 83.35 |
| RESA | BadNets | 96.77 | 96.62 | 95.69 | 52.95 | 80.52 | 85.75 | 56.92 | 61.85 | 64.79 | 94.27 | 89.79 | 75.84 |
| | Blended | | 96.65 | 91.66 | 52.95 | 80.26 | 70.29 | 53.46 | 52.98 | 53.22 | 69.94 | 58.55 | 64.81 |
| | LD-Attack | | 96.75 | 96.13 | 54.75 | 86.69 | 64.01 | 77.02 | 56.26 | 58.10 | 85.83 | 73.17 | 72.44 |
| | BadLANE | | 96.53 | 96.37 | 88.45 | 96.30 | 96.31 | 94.55 | 95.88 | 96.36 | 96.21 | 96.01 | 95.16 |

Evaluation methodology. To comprehensively simulate the real-world dynamic scenes in autonomous driving, we follow hendrycks2019benchmarking ; tang2023natural and consider eight typical dynamic scene factors for backdoor attack evaluation, including ❶ Driving perspective changes: position, shape, viewpoint, and size of the trigger, and ❷ common environmental conditions: sunlight, shadow, rain, and snow. Examples of BadLANE attacks under these dynamic scene factors are illustrated in Fig. 3. In our main experiment, we adopt the LOA strategy and set the offset pixels to 60.

Figure 3: Visualization of different attack strategies. Our BadLANE can be activated by various forms/shapes of mud spots and is robust to dynamic scene factors.

Results. From Tab. 1, we draw the following observations: ❶ Traditional attack methods perform well in the static scene (shown in “Origin”). However, their ASRs drop sharply when driving perspectives or environmental conditions change, especially when the trigger’s position or size changes or under sunlight and shadow conditions. For example, for LD-Attack on LaneATT, the ASR decreases sharply when the position of the trigger changes (-38.88%) or in a sunlight environment (-42.82%). ❷ Our BadLANE attack achieves the highest ASR in nearly all dynamic cases (the exceptions are the size factor on UFLD v2 and PolyLaneNet, where LD-Attack is higher), maintaining effectiveness in the face of various real-world dynamic scene factors and outperforming other baselines significantly (+24.47% on average). Moreover, it is effective across all evaluated LD models. ❸ Our attack maintains high ACCs on clean samples, comparable to uninfected models, which shows that it preserves the original functionality of the model. ❹ LaneATT and RESA architectures are particularly vulnerable to our BadLANE attacks, with their backdoored models achieving an average ASR nearly equivalent to the ACC. In comparison, the UFLD v2 and PolyLaneNet models exhibit a lower susceptibility to attacks, displaying a gap of over 4% between their average ASR and ACC.
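The +24.47% figure can be reproduced from the "Average" column of Tab. 1, assuming it is the gap between the mean average ASR of BadLANE over the four models and the mean over all twelve baseline entries. That reading is an interpretation, checked by the short calculation below.

```python
# "Average" ASR column of Tab. 1, per model: [BadNets, Blended, LD-Attack].
baseline_avg_asr = {
    "LaneATT": [73.55, 70.19, 70.47],
    "UFLD v2": [65.67, 58.37, 70.86],
    "PolyLaneNet": [52.80, 53.09, 67.25],
    "RESA": [75.84, 64.81, 72.44],
}
badlane_avg_asr = {"LaneATT": 92.78, "UFLD v2": 91.69, "PolyLaneNet": 83.35, "RESA": 95.16}

baseline_values = [v for vals in baseline_avg_asr.values() for v in vals]
baseline_mean = sum(baseline_values) / len(baseline_values)
badlane_mean = sum(badlane_avg_asr.values()) / len(badlane_avg_asr)
gain = badlane_mean - baseline_mean
print(f"BadLANE {badlane_mean:.2f} vs baselines {baseline_mean:.2f}: +{gain:.2f}")
```

Under this reading the gain rounds to 24.47 percentage points, matching the number quoted in the text.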

Different Attack Strategies. We then evaluate the performance of different backdoor attacks using three other attack strategies, i.e., LDA, LSA, and LRA. For LSA, we select images that include non-linear lanes for poisoning; for LRA, we set the rotation angle to $4.5^{\circ}$. The average ASR results of different backdoor attacks under various dynamic scene factors are shown in Tab. 2. Our attacks are effective under all attack strategies and outperform traditional backdoor attack methods (+61.16% in LDA, +0.45% in LSA, and +14.53% in LRA on average). We also observe that the LDA and LSA strategies are more easily executed, possibly due to the simplicity of their targets or a significant overlap in the backdoored model’s predictions between malicious and benign images. In contrast, the LOA and LRA strategies are more challenging but can still achieve a high ASR. As illustrated in Fig. 3, the LOA and LRA strategies pose substantial risks in autonomous driving scenarios, where deviations or rotations in lane lines can significantly alter a vehicle’s driving direction, potentially leading to accidents.

Table 2: Average ASR (%) under various dynamic scene factors with different attack strategies.

| Strategy | Attack | LaneATT | UFLD v2 | PolyLaneNet | RESA |
|---|---|---|---|---|---|
| LDA | BadNets | 28.23 | 54.99 | 9.21 | 45.78 |
| | Blended | 21.26 | 32.31 | 5.25 | 42.22 |
| | LD-Attack | 39.73 | 41.01 | 34.40 | 46.78 |
| | BadLANE | 96.87 | 91.44 | 93.24 | 96.82 |
| LSA | BadNets | 93.20 | 93.55 | 86.88 | 94.02 |
| | Blended | 93.16 | 93.62 | 86.24 | 93.92 |
| | LD-Attack | 93.24 | 93.84 | 87.30 | 94.51 |
| | BadLANE | 93.32 | 94.62 | 86.98 | 94.72 |
| LRA | BadNets | 72.28 | 69.43 | 60.23 | 81.94 |
| | Blended | 68.21 | 69.82 | 61.44 | 74.19 |
| | LD-Attack | 77.41 | 78.82 | 66.07 | 83.56 |
| | BadLANE | 91.13 | 92.51 | 67.44 | 94.84 |

4.3 Attacks with Various Mud Trigger Patterns

In this part, we demonstrate the generalization of our BadLANE attack to different mud trigger patterns. Models implanted with this backdoor can be triggered not only by unstructured pixel sets but also by various forms/shapes of unseen mud spots or pollution, which facilitates the implementation of attacks in the physical world. To ensure the diversity and randomness of the mud patterns, we collect 10 images of mud patterns with different shapes from the internet and the real world, as shown in Fig. 4 (more images are shown in the Supplementary Material). These patterns have distinct sizes, degrees of dispersion, and viewing angles. We add these mud patterns to benign images to obtain malicious images and test the models infected by the BadLANE attack in Sec. 4.2. Visualization examples are shown in Fig. 3. Note that none of these mud triggers is directly trained on or seen during poisoning.

Figure 4: Illustration of various forms/shapes of mud triggers.

The results are shown in Fig. 5. These unseen mud patterns can effectively activate the backdoors implanted by our BadLANE attack. In some cases, the ASR is even higher than when attacking with unstructured pixel sets (e.g., under various environmental conditions with the LOA strategy, the average ASR is 0.29% higher on UFLD v2 and 0.20% higher on RESA). This demonstrates the generalization and practicality of our method and indicates that it can be effectively deployed in the physical world. Moreover, the average ASR under different environmental conditions is generally higher than under driving perspective changes across different models and strategies, indicating that changes in driving perspective are more challenging for our method, although it still performs better than traditional attack methods.

4.4 Ablation Studies

Figure 5: Average ASR (%) of diverse forms/shapes mud trigger patterns under driving perspective changes (DPC) and various environmental conditions (EC).

We here ablate some factors that may influence the attacking ability of our BadLANE attack. All experiments are conducted using the LOA strategy with 60 offset pixels, unless otherwise specified.

Amorphous Trigger and Meta-Learning. We conduct ablation studies to understand the contributions of amorphous triggers and the meta-learning framework. Specifically, we employ different schemes to poison the dataset for training backdoored models: (1) BadLANE, using our full attack approach; (2) (w/o) Meta, without the meta-learning framework; (3) (w/o) Meta & Amo, without the meta-learning framework and without the amorphous pattern for trigger design; instead, we use a $30\times 30$ pixel patch composed of brown pixels at a fixed position as the trigger. As shown in Tab. 3, we can draw several observations: ❶ Using Amo yields a significant improvement in average ASR under driving perspective changes (DPC), indicating that the amorphous trigger design enhances the attack’s robustness to perspective changes. ❷ Using Meta yields a notable increase in average ASR under different environmental conditions (EC), suggesting that meta-learning improves the attack’s robustness to environmental conditions. These findings support our hypothesis that combining both techniques yields the best performance in dynamic scenarios, and they show that each component contributes to the attack.

Table 3: Ablation studies on the amorphous trigger and meta-learning. Results show the average ASR (%) under driving perspective changes (DPC) and environmental condition (EC) changes.

| Method | LaneATT DPC | LaneATT EC | UFLD v2 DPC | UFLD v2 EC | PolyLaneNet DPC | PolyLaneNet EC | RESA DPC | RESA EC |
|---|---|---|---|---|---|---|---|---|
| BadLANE | 92.54 | 93.07 | 89.84 | 94.01 | 81.17 | 86.07 | 94.39 | 96.11 |
| (w/o) Meta | 92.31 | 83.12 | 91.39 | 84.95 | 84.31 | 75.58 | 94.11 | 90.66 |
| (w/o) Meta & Amo | 71.25 | 79.33 | 72.02 | 69.84 | 70.87 | 70.47 | 73.63 | 80.99 |

Attack Parameters. For the LOA and LRA attack strategies, we can flexibly choose the offset magnitude and rotation angle to achieve different levels of attack. To evaluate the impact of attack parameters on attack effectiveness, we select different offset pixels and rotation angles for these strategies. As shown in Fig. 6, BadLANE attacks remain effective for various settings of attack parameters, allowing for highly flexible specification of attack schemes. Visualizations are shown in Fig. 3 (more images are shown in the Supplementary Material). Furthermore, we observe that for the LOA strategy, changes in the number of offset pixels have a negligible impact on the attack effectiveness. In contrast, for the LRA strategy, an increase in the absolute value of the rotation angle weakens the attack. We also find that the LRA strategy performs poorly on the PolyLaneNet model. We speculate that this may be because the rotated lane lines require more complex polynomials for representation, making them more challenging to regress.

Poisoning Rates. We evaluate the effectiveness of the BadLANE attack under different poisoning rates. For the four LD models, we generate poisoned datasets and train backdoored models with poisoning rates of 1%, 3%, 5%, 10%, 15%, and 20%. As shown in Tab. 4, even at a low poisoning rate (e.g., 1%), our BadLANE can achieve a high ASR. Additionally, as the poisoning rate increases, the ASR generally continues to rise slowly, while the ACC tends to decrease gradually.

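Table 4: Ablation studies on the poisoning rates. ACC and ASR are reported in %.

| Poisoning Rate | LaneATT ACC | LaneATT ASR | UFLD v2 ACC | UFLD v2 ASR | PolyLaneNet ACC | PolyLaneNet ASR | RESA ACC | RESA ASR |
|---|---|---|---|---|---|---|---|---|
| 1% | 95.26 | 74.13 | 95.98 | 84.69 | 89.93 | 66.82 | 96.72 | 88.20 |
| 3% | 95.10 | 86.27 | 95.90 | 88.57 | 88.19 | 76.37 | 96.70 | 92.86 |
| 5% | 95.14 | 92.22 | 95.71 | 90.46 | 89.68 | 80.84 | 96.66 | 93.64 |
| 10% | 95.01 | 92.78 | 95.70 | 91.69 | 89.04 | 83.35 | 96.53 | 95.16 |
| 15% | 95.17 | 93.21 | 95.48 | 91.48 | 89.12 | 83.46 | 96.59 | 95.24 |
| 20% | 94.94 | 93.45 | 94.53 | 91.81 | 88.15 | 83.73 | 95.96 | 94.88 |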

5 Physical World Attacks

This section reports physical-world experiments conducted in a controlled lab environment using a real-world Jetbot vehicle JetBot , an open-source and widely adopted robot based on the NVIDIA Jetson Nano chipset.

Vehicle setup. The Jetbot vehicle system employs the Robot Operating System and adopts a layered chip architecture with the Jetson Nano as the core. The system achieves autonomous driving tasks through the collaboration of three main modules: motion, perception, and computation. These modules provide user-friendly Python interfaces for direct control of vehicle actions, enabling vehicle movement control via the LD model. Specifically, the camera in the perception module captures front road images, which are transmitted to the LD model in the computation module for prediction. Utilizing the processed lane line coordinate data, we have written a simple control program to influence the vehicle’s motion, ensuring it remains centered between the two lane lines and advances according to the lane lines’ direction.
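The control program itself is not listed, so the following is only a plausible sketch of such a lane-centering rule: a proportional controller on the offset between the lane center and the image center. The function name, the gain, and the sign convention are assumptions and do not correspond to any Jetbot API.

```python
def steering_command(left_x, right_x, image_width, gain=1.0):
    """Proportional steering from the x-positions (pixels) of the two lane lines
    at a reference image row.

    Returns a value in [-1, 1]; negative means steer left, positive steer right.
    """
    lane_center = (left_x + right_x) / 2.0
    image_center = image_width / 2.0
    offset = (lane_center - image_center) / image_center  # normalised offset
    return max(-1.0, min(1.0, gain * offset))

centered = steering_command(540, 740, 1280)   # lane center coincides with image center
shifted = steering_command(600, 800, 1280)    # both lines predicted 60 pixels to the right
```

Under this rule, a backdoor that shifts both predicted lane lines by the same amount changes the command directly, which illustrates why offset-style attacks can pull the vehicle out of its lane.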

Figure 7: Illustration of BadLANE attack in physical-world. (c) and (d) exhibit two distinct forms of triggers; (e) and (f) show different driving perspectives and lighting environments.

Evaluation methodology. All experiments in this section are conducted in a controlled laboratory environment (an indoor real-world sandbox), as shown in Fig. 7 (a). In the vehicle’s computational module, we deploy the LaneATT model with the implanted BadLANE backdoor to control the vehicle’s movement. We select a fixed straight road segment for evaluation and consider three scenarios: (1) clean road and camera lens; (2) placement of stickers with various forms/shapes of mud patterns as visual triggers on the sandbox lanes; (3) minor pollution of the camera lens (area not exceeding 10% of the surface), as illustrated in Fig. 7 (b), (c) and (d). Under normal conditions, the vehicle is expected to drive straight through to the endpoint. If the vehicle deviates from its lane during driving, the attack is considered successful; otherwise, it is deemed a failure.

Experimental settings. The vehicle’s speed is set at 1 km/h. Five different mud pattern stickers (210 mm $\times$ 297 mm) are used as visual triggers, randomly placed at any position on the road segment. Multiple experimental cases are designed under different driving perspectives and lighting conditions. Specifically, under indoor daytime conditions (approximately 100 lux illuminance), two different driving perspectives are considered: case 1: horizontal view (the camera is positioned parallel to the ground surface) and case 2: downward view at a $30^{\circ}$ angle. In addition, a different lighting condition is tested: case 3: a bright-light environment (approximately 1000 lux) with the horizontal view. Each experimental case is repeated 20 times to ensure the stability of the results. A total of 180 test runs are conducted for the three scenarios.

Results and analyses. Illustrations of the three experimental cases can be found in Fig. 7 (d), (e) and (f), and more visualizations are provided in the Supplementary Material. In the clean road scene, the attack success rates (ASRs) for case 1, case 2, and case 3 are 5%, 5%, and 10%, respectively. With mud pattern stickers as visual triggers, the ASRs for case 1, case 2, and case 3 are 95%, 90%, and 90%, respectively; with lens pollution, they are 85%, 85%, and 75%, respectively. These results demonstrate that our BadLANE method is effective in real-world scenarios and remains robust across different driving perspectives and lighting conditions.

6 Countermeasures

To evaluate the performance of the BadLANE method against backdoor defenses, we consider various types of popular defense methods. Unfortunately, most existing backdoor defense methods are designed for classification tasks and may not directly apply to the LD task. Therefore, we employ two common defense strategies applicable to this task. We conduct experiments using the backdoored LaneATT model trained with the LOA strategy in Sec. 4.2.

❶ Fine-Tuning. We set the learning rate to 0.0001 and fine-tune the backdoored model on the clean dataset. After 25 and 50 epochs, the ASR decreases by 3.45% and 7.41%, respectively, indicating that fine-tuning has some mitigating effect on our attack but cannot eliminate it. ❷ Pruning. We select the last convolutional layer in the model backbone, which has a total of 512 neurons, for pruning. We increase the number of pruned neurons from 0 with a step size of 25. As shown in Tab. 5, pruning a small number of neurons does not affect the backdoor, while pruning more neurons causes the model’s performance on clean samples to degrade faster than the ASR. This indicates that pruning is largely ineffective against our attack.

To sum up, our results indicate that pruning fails to remove our backdoor without destroying clean accuracy, whereas fine-tuning provides a limited protective effect.

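Table 5: Defense results (%) of neuron pruning. Num is the number of pruned neurons.

| Num | 0 | 25 | 50 | 75 | 100 | 125 | 150 | 175 | 200 |
|---|---|---|---|---|---|---|---|---|---|
| ACC | 95.48 | 92.19 | 90.59 | 90.36 | 79.29 | 30.10 | 8.10 | 6.00 | 0 |
| ASR | 94.45 | 93.39 | 92.49 | 92.30 | 83.86 | 62.44 | 32.99 | 31.03 | 6.98 |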
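The pruning sweep can be sketched as a channel mask over the 512 neurons of the selected layer. The criterion for choosing which neurons to prune is not stated in this section; the sketch assumes the common choice of removing the channels that are least active on clean data, and the activations used here are random placeholders rather than real model outputs.

```python
import numpy as np

def prune_channels(activations, num_pruned):
    """Return a 0/1 mask over channels that zeroes the `num_pruned` least active ones.

    activations -- array of shape (num_images, num_channels) holding the mean
    activation of each channel on clean images (assumed pruning criterion).
    """
    mean_act = np.asarray(activations, dtype=float).mean(axis=0)
    order = np.argsort(mean_act, kind="stable")   # ascending: least active first
    mask = np.ones(mean_act.shape[0], dtype=float)
    mask[order[:num_pruned]] = 0.0
    return mask

rng = np.random.default_rng(0)
acts = rng.random((8, 512))                       # placeholder clean activations
masks = {k: prune_channels(acts, k) for k in range(0, 201, 25)}
```

Multiplying the layer output by `masks[k]` and re-evaluating ACC and ASR for each `k` would reproduce the structure of the sweep in Tab. 5, though not necessarily its numbers.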

7 Conclusion

In this paper, we propose a backdoor attack BadLANE for LD, which is robust to changes in physical-world dynamic scene factors. BadLANE employs an amorphous pattern for trigger design, which can be activated by various forms/shapes of mud spots. Additionally, a meta-learning framework is introduced to generate meta-triggers that integrate diverse environmental information through sampling benign images. Through our evaluation, BadLANE demonstrates outstanding effectiveness and robustness in both digital and physical domains, significantly outperforming other baselines.

Limitations. Despite promising results, several directions warrant further exploration. ❶ The backdoor injected by BadLANE may be mitigated to some extent by fine-tuning. Our future work aims to enhance the stability and robustness of the injected backdoor against fine-tuning defenses. ❷ Meta-triggers have relatively obvious patterns. Our goal is to further improve the stealthiness during poisoning.

Ethical Statement. In this paper, we propose BadLANE to reveal a severe threat to real-world LD models that are trained using third-party datasets. To help mitigate the attack, we also examine preliminary countermeasures.

References

  • [1] Yaodong Cui, Ren Chen, Wenbo Chu, Long Chen, Daxin Tian, Ying Li, and Dongpu Cao. Deep learning for image and point cloud fusion in autonomous driving: A review. IEEE Transactions on Intelligent Transportation Systems, 23(2):722–739, 2021.
  • [2] Sajjad Mozaffari, Omar Y Al-Jarrah, Mehrdad Dianati, Paul Jennings, and Alexandros Mouzakitis. Deep learning-based vehicle behavior prediction for autonomous driving applications: A review. IEEE Transactions on Intelligent Transportation Systems, 23(1):33–47, 2020.
  • [3] Mrinal R Bachute and Javed M Subhedar. Autonomous driving architectures: insights of machine learning and deep learning algorithms. Machine Learning with Applications, 6:100164, 2021.
  • [4] Sorin Grigorescu, Bogdan Trasnea, Tiberiu Cocias, and Gigel Macesanu. A survey of deep learning techniques for autonomous driving. Journal of field robotics, 37(3):362–386, 2020.
  • [5] Sampo Kuutti, Richard Bowden, Yaochu Jin, Phil Barber, and Saber Fallah. A survey of deep learning applications to autonomous vehicle control. IEEE Transactions on Intelligent Transportation Systems, 22(2):712–733, 2020.
  • [6] Khan Muhammad, Amin Ullah, Jaime Lloret, Javier Del Ser, and Victor Hugo C de Albuquerque. Deep learning for safe autonomous driving: Current challenges and future directions. IEEE Transactions on Intelligent Transportation Systems, 22(7):4316–4336, 2020.
  • [7] Tianyu Gu, Kang Liu, Brendan Dolan-Gavitt, and Siddharth Garg. Badnets: Evaluating backdooring attacks on deep neural networks. IEEE Access, 7:47230–47244, 2019.
  • [8] Baoyuan Wu, Hongrui Chen, Mingda Zhang, Zihao Zhu, Shaokui Wei, Danni Yuan, and Chao Shen. Backdoorbench: A comprehensive benchmark of backdoor learning. Advances in Neural Information Processing Systems, 35:10546–10559, 2022.
  • [9] Yiming Li, Yong Jiang, Zhifeng Li, and Shu-Tao Xia. Backdoor learning: A survey. IEEE Transactions on Neural Networks and Learning Systems, 2022.
  • [10] Siyuan Liang, Mingli Zhu, Aishan Liu, Baoyuan Wu, Xiaochun Cao, and Ee-Chien Chang. Badclip: Dual-embedding guided backdoor attack on multimodal contrastive learning. arXiv preprint arXiv:2311.12075, 2023.
  • [11] Jiawei Liang, Siyuan Liang, Aishan Liu, Xiaojun Jia, Junhao Kuang, and Xiaochun Cao. Poisoned forgery face: Towards backdoor attacks on face forgery detection. arXiv preprint arXiv:2402.11473, 2024.
  • [12] Jiawei Liang, Siyuan Liang, Man Luo, Aishan Liu, Dongchen Han, Ee-Chien Chang, and Xiaochun Cao. Vl-trojan: Multimodal instruction backdoor attacks against autoregressive visual language models. arXiv preprint arXiv:2402.13851, 2024.
  • [13] Xingshuo Han, Guowen Xu, Yuan Zhou, Xuehuan Yang, Jiwei Li, and Tianwei Zhang. Physical backdoor attacks to lane detection systems in autonomous driving. In Proceedings of the 30th ACM International Conference on Multimedia, pages 2957–2968, 2022.
  • [14] Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In International conference on machine learning, pages 1126–1135. PMLR, 2017.
  • [15] Zheng Yuan, Jie Zhang, Yunpei Jia, Chuanqi Tan, Tao Xue, and Shiguang Shan. Meta gradient adversarial attack. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7748–7757, 2021.
  • [16] Zhenguo Li, Fengwei Zhou, Fei Chen, and Hang Li. Meta-sgd: Learning to learn quickly for few-shot learning. arXiv preprint arXiv:1707.09835, 2017.
  • [17] Noor Jannah Zakaria, Mohd Ibrahim Shapiai, Rasli Abd Ghani, Mohd Najib Mohd Yassin, Mohd Zamri Ibrahim, and Nurbaiti Wahid. Lane detection in autonomous vehicles: A systematic review. IEEE access, 11:3729–3765, 2023.
  • [18] Lucas Tabelini, Rodrigo Berriel, Thiago M Paixao, Claudine Badue, Alberto F De Souza, and Thiago Oliveira-Santos. Keep your eyes on the lane: Real-time attention-guided lane detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 294–302, 2021.
  • [19] Jinming Su, Chao Chen, Ke Zhang, Junfeng Luo, Xiaoming Wei, and Xiaolin Wei. Structure guided lane detection. arXiv preprint arXiv:2105.05403, 2021.
  • [20] Tu Zheng, Yifei Huang, Yang Liu, Wenjian Tang, Zheng Yang, Deng Cai, and Xiaofei He. Clrnet: Cross layer refinement network for lane detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 898–907, 2022.
  • [21] Lingyu Xiao, Xiang Li, Sen Yang, and Wankou Yang. Adnet: Lane shape prediction via anchor decomposition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6404–6413, 2023.
  • [22] Siyuan Liang, Wei Wang, Ruoyu Chen, Aishan Liu, Boxi Wu, Ee-Chien Chang, Xiaochun Cao, and Dacheng Tao. Object detectors in the open environment: Challenges, solutions, and outlook. arXiv preprint arXiv:2403.16271, 2024.
  • [23] Zequn Qin, Huanyu Wang, and Xi Li. Ultra fast structure-aware deep lane detection. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXIV 16, pages 276–291. Springer, 2020.
  • [24] Lizhe Liu, Xiaohao Chen, Siyu Zhu, and Ping Tan. Condlanenet: a top-to-down lane detection framework based on conditional convolution. In Proceedings of the IEEE/CVF international conference on computer vision, pages 3773–3782, 2021.
  • [25] Zequn Qin, Pengyi Zhang, and Xi Li. Ultra fast deep lane detection with hybrid anchor driven ordinal classification. IEEE transactions on pattern analysis and machine intelligence, 2022.
  • [26] Lucas Tabelini, Rodrigo Berriel, Thiago M Paixao, Claudine Badue, Alberto F De Souza, and Thiago Oliveira-Santos. Polylanenet: Lane estimation via deep polynomial regression. In 2020 25th International Conference on Pattern Recognition (ICPR), pages 6150–6156. IEEE, 2021.
  • [27] Ruijin Liu, Zejian Yuan, Tie Liu, and Zhiliang Xiong. End-to-end lane shape prediction with transformers. In Proceedings of the IEEE/CVF winter conference on applications of computer vision, pages 3694–3702, 2021.
  • [28] Zhengyang Feng, Shaohua Guo, Xin Tan, Ke Xu, Min Wang, and Lizhuang Ma. Rethinking efficient lane detection via curve modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17062–17070, 2022.
  • [29] Xingang Pan, Jianping Shi, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Spatial as deep: Spatial cnn for traffic scene understanding. In Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018.
  • [30] Davy Neven, Bert De Brabandere, Stamatios Georgoulis, Marc Proesmans, and Luc Van Gool. Towards end-to-end lane detection: an instance segmentation approach. In 2018 IEEE intelligent vehicles symposium (IV), pages 286–291. IEEE, 2018.
  • [31] Yuenan Hou, Zheng Ma, Chunxiao Liu, and Chen Change Loy. Learning lightweight lane detection cnns by self attention distillation. In Proceedings of the IEEE/CVF international conference on computer vision, pages 1013–1021, 2019.
  • [32] Hang Xu, Shaoju Wang, Xinyue Cai, Wei Zhang, Xiaodan Liang, and Zhenguo Li. Curvelane-nas: Unifying lane-sensitive architecture search and adaptive point blending. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XV 16, pages 689–704. Springer, 2020.
  • [33] Tu Zheng, Hao Fang, Yi Zhang, Wenjian Tang, Zheng Yang, Haifeng Liu, and Deng Cai. Resa: Recurrent feature-shift aggregator for lane detection. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 3547–3554, 2021.
  • [34] Aishan Liu, Xianglong Liu, Jiaxin Fan, Yuqing Ma, Anlan Zhang, Huiyuan Xie, and Dacheng Tao. Perceptual-sensitive gan for generating adversarial patches. In AAAI, 2019.
  • [35] Aishan Liu, Jiakai Wang, Xianglong Liu, Bowen Cao, Chongzhi Zhang, and Hang Yu. Bias-based universal adversarial patch attack for automatic check-out. In ECCV, 2020.
  • [36] Aishan Liu, Tairan Huang, Xianglong Liu, Yitao Xu, Yuqing Ma, Xinyun Chen, Stephen J Maybank, and Dacheng Tao. Spatiotemporal attacks for embodied agents. In ECCV, 2020.
  • [37] Aishan Liu, Jun Guo, Jiakai Wang, Siyuan Liang, Renshuai Tao, Wenbo Zhou, Cong Liu, Xianglong Liu, and Dacheng Tao. X-Adv: Physical adversarial object attacks against x-ray prohibited item detection. In 32nd USENIX Security Symposium (USENIX Security 23), 2023.
  • [38] Shunchang Liu, Jiakai Wang, Aishan Liu, Yingwei Li, Yijie Gao, Xianglong Liu, and Dacheng Tao. Harnessing perceptual adversarial patches for crowd counting. In ACM CCS, 2022.
  • [39] Aishan Liu, Shiyu Tang, Xinyun Chen, Lei Huang, Haotong Qin, Xianglong Liu, and Dacheng Tao. Towards defending multiple lp-norm bounded adversarial perturbations via gated batch normalization. International Journal of Computer Vision, 2023.
  • [40] Aishan Liu, Xianglong Liu, Hang Yu, Chongzhi Zhang, Qiang Liu, and Dacheng Tao. Training robust deep neural networks via adversarial noise propagation. TIP, 2021.
  • [41] Chongzhi Zhang, Aishan Liu, Xianglong Liu, Yitao Xu, Hang Yu, Yuqing Ma, and Tianlin Li. Interpreting and improving adversarial robustness of deep neural networks with neuron sensitivity. IEEE Transactions on Image Processing, 2021.
  • [42] Aishan Liu, Shiyu Tang, Siyuan Liang, Ruihao Gong, Boxi Wu, Xianglong Liu, and Dacheng Tao. Exploring the relationship between architectural design and adversarially robust generalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023.
  • [43] Siyuan Liang, Xingxing Wei, and Xiaochun Cao. Generate more imperceptible adversarial examples for object detection. In ICML 2021 Workshop on Adversarial Machine Learning, 2021.
  • [44] Siyuan Liang, Xingxing Wei, Siyuan Yao, and Xiaochun Cao. Efficient adversarial attacks for visual object tracking. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXVI 16, 2020.
  • [45] Xingxing Wei, Siyuan Liang, Ning Chen, and Xiaochun Cao. Transferable adversarial attacks for image and video object detection. arXiv preprint arXiv:1811.12641, 2018.
  • [46] Siyuan Liang, Baoyuan Wu, Yanbo Fan, Xingxing Wei, and Xiaochun Cao. Parallel rectangle flip attack: A query-based black-box attack against object detection. arXiv preprint arXiv:2201.08970, 2022.
  • [47] Siyuan Liang, Longkang Li, Yanbo Fan, Xiaojun Jia, Jingzhi Li, Baoyuan Wu, and Xiaochun Cao. A large-scale multiple-objective method for black-box attack against object detection. In European Conference on Computer Vision, 2022.
  • [48] Bangyan He, Jian Liu, Yiming Li, Siyuan Liang, Jingzhi Li, Xiaojun Jia, and Xiaochun Cao. Generating transferable 3d adversarial point cloud via random perturbation factorization. In Proceedings of the AAAI Conference on Artificial Intelligence, 2023.
  • [49] Jiayang Liu, Siyu Zhu, Siyuan Liang, Jie Zhang, Han Fang, Weiming Zhang, and Ee-Chien Chang. Improving adversarial transferability by stable diffusion. arXiv preprint arXiv:2311.11017, 2023.
  • [50] Siyuan Liang, Aishan Liu, Jiawei Liang, Longkang Li, Yang Bai, and Xiaochun Cao. Imitated detectors: Stealing knowledge of black-box object detectors. In Proceedings of the 30th ACM International Conference on Multimedia, pages 4839–4847, 2022.
  • [51] Privacy-enhancing face obfuscation guided by semantic-aware attribution maps. IEEE Transactions on Information Forensics and Security, 2023.
  • [52] Jun Guo, Xingyu Zheng, Aishan Liu, Siyuan Liang, Yisong Xiao, Yichao Wu, and Xianglong Liu. Isolation and induction: Training robust deep neural networks against model stealing attacks. In Proceedings of the 31st ACM International Conference on Multimedia, 2023.
  • [53] Xiaoxia Li, Siyuan Liang, Jiyi Zhang, Han Fang, Aishan Liu, and Ee-Chien Chang. Semantic mirror jailbreak: Genetic algorithm based jailbreak prompts against open-source llms. arXiv preprint arXiv:2402.14872, 2024.
  • [54] Yuhang Wang, Huafeng Shi, Rui Min, Ruijia Wu, Siyuan Liang, Yichao Wu, Ding Liang, and Aishan Liu. Universal backdoor attacks detection via adaptive adversarial probe. arXiv preprint arXiv:2209.05244, 2022.
  • [55] Siyuan Liang, Kuanrong Liu, Jiajun Gong, Jiawei Liang, Yuan Xun, Ee-Chien Chang, and Xiaochun Cao. Unlearning backdoor threats: Enhancing backdoor defense in multimodal contrastive learning via local token unlearning. arXiv preprint arXiv:2403.16257, 2024.
  • [56] Xinwei Liu, Xiaojun Jia, Jindong Gu, Yuan Xun, Siyuan Liang, and Xiaochun Cao. Does few-shot learning suffer from backdoor attacks? arXiv preprint arXiv:2401.01377, 2023.
  • [57] Xinyun Chen, Chang Liu, Bo Li, Kimberly Lu, and Dawn Song. Targeted backdoor attacks on deep learning systems using data poisoning. arXiv preprint arXiv:1712.05526, 2017.
  • [58] Mauro Barni, Kassem Kallas, and Benedetta Tondi. A new backdoor attack in cnns by training set corruption without label poisoning. In 2019 IEEE International Conference on Image Processing (ICIP), pages 101–105. IEEE, 2019.
  • [59] Yuezun Li, Yiming Li, Baoyuan Wu, Longkang Li, Ran He, and Siwei Lyu. Invisible backdoor attack with sample-specific triggers. In Proceedings of the IEEE/CVF international conference on computer vision, pages 16463–16472, 2021.
  • [60] Anh Nguyen and Anh Tran. Wanet–imperceptible warping-based backdoor attack. arXiv preprint arXiv:2102.10369, 2021.
  • [61] Shih-Han Chan, Yinpeng Dong, Jun Zhu, Xiaolu Zhang, and Jun Zhou. Baddet: Backdoor attacks on object detection. In European Conference on Computer Vision, pages 396–412. Springer, 2022.
  • [62] Aishan Liu, Xinwei Zhang, Yisong Xiao, Yuguang Zhou, Siyuan Liang, Jiakai Wang, Xianglong Liu, Xiaochun Cao, and Dacheng Tao. Pre-trained trojan attacks for visual recognition. arXiv preprint arXiv:2312.15172, 2023.
  • [63] Takami Sato, Junjie Shen, Ningfei Wang, Yunhan Jia, Xue Lin, and Qi Alfred Chen. Dirty road can attack: Security of deep learning based automated lane centering under Physical-World attack. In 30th USENIX security symposium (USENIX Security 21), pages 3309–3326, 2021.
  • [64] Wenbo Jiang, Hongwei Li, Guowen Xu, and Tianwei Zhang. Color backdoor: A robust poisoning attack in color space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8133–8142, 2023.
  • [65] Alex Nichol, Joshua Achiam, and John Schulman. On first-order meta-learning algorithms. arXiv preprint arXiv:1803.02999, 2018.
  • [66] Sachin Ravi and Hugo Larochelle. Optimization as a model for few-shot learning. In International conference on learning representations, 2016.
  • [67] Fei Yin, Yong Zhang, Baoyuan Wu, Yan Feng, Jingyi Zhang, Yanbo Fan, and Yujiu Yang. Generalizable black-box adversarial attack with meta learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.
  • [68] Muhammad Abdullah Jamal and Guo-Jun Qi. Task agnostic meta-learning for few-shot learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11719–11727, 2019.
  • [69] Hung-yi Lee, Shang-Wen Li, and Ngoc Thang Vu. Meta learning for natural language processing: A survey. arXiv preprint arXiv:2205.01500, 2022.
  • [70] You Lu and Bert Huang. Structured output learning with conditional generative flows. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 5005–5012, 2020.
  • [71] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • [72] TuSimple. TuSimple Lane Detection Challenge. https://github.com/TuSimple/tusimple-benchmark/tree/master/doc/lane_detection, 2017.
  • [73] Xingang Pan, Jianping Shi, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Spatial as deep: Spatial cnn for traffic scene understanding. In Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018.
  • [74] Anh Nguyen and Anh Tran. Wanet–imperceptible warping-based backdoor attack. arXiv preprint arXiv:2102.10369, 2021.
  • [75] Yuezun Li, Yiming Li, Baoyuan Wu, Longkang Li, Ran He, and Siwei Lyu. Invisible backdoor attack with sample-specific triggers. In Proceedings of the IEEE/CVF international conference on computer vision, pages 16463–16472, 2021.
  • [76] Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. arXiv preprint arXiv:1903.12261, 2019.
  • [77] Guijian Tang, Wen Yao, Tingsong Jiang, Weien Zhou, Yang Yang, and Donghua Wang. Natural weather-style black-box adversarial attacks against optical aerial detectors. IEEE Transactions on Geoscience and Remote Sensing, 2023.
  • [78] JetBot. JetBot. https://github.com/NVIDIA-AI-IOT/jetbot, 2021.