Backdoor Attack in Prompt-Based Continual Learning

Trang Nguyen Anh Tran Nhat Ho
The University of Texas at Austin
VinAI Research

June 28, 2024

Abstract

Prompt-based approaches offer a cutting-edge solution to data privacy issues in continual learning, particularly in scenarios involving multiple data suppliers where long-term storage of private user data is prohibited. Despite delivering state-of-the-art performance, their impressive remembering capability can become a double-edged sword, raising security concerns because it might inadvertently retain poisoned knowledge injected during learning from private user data. Following this insight, in this paper, we expose continual learning to a potential threat: the backdoor attack, which drives the model to follow a desired adversarial target whenever a specific trigger is present while still performing normally on clean samples. We highlight three critical challenges in executing backdoor attacks on incremental learners and propose corresponding solutions: (1) Transferability: We employ a surrogate dataset and manipulate prompt selection to transfer backdoor knowledge to data from other suppliers; (2) Resiliency: We simulate static and dynamic states of the victim to ensure the backdoor trigger remains robust during intense incremental learning processes; and (3) Authenticity: We apply binary cross-entropy loss as an anti-cheating factor to prevent the backdoor trigger from devolving into adversarial noise. Extensive experiments across various benchmark datasets and continual learners validate our continual backdoor framework, achieving up to 100% attack success rate, with further ablation studies confirming the effectiveness of our contributions.

1 Introduction

The adaptability of human learning to absorb new knowledge without forgetting previously acquired information remains a significant challenge for machine learning models. Continual learning (CL) endeavors to narrow this gap by guiding models to sequentially learn new tasks while maintaining high performance on earlier ones. An outstanding solution to CL is the prompt-based approach [45, 57, 58, 55, 40], which leverages the power of pre-trained models and employs a set of trainable prompts for flexible model instruction, accommodating data from various tasks. Thanks to their ability to remember without storing a memory buffer, prompt-based CL methods are particularly suitable for scenarios prioritizing data privacy, such as those involving multiple data suppliers.

Nonetheless, such promising results can inadvertently become vulnerabilities, exposing CL to security threats. Indeed, while CL methods effectively address catastrophic forgetting by preserving and incorporating previously acquired knowledge, they may also unwittingly retain knowledge compromised by adversarial actions. These threats become even more formidable in the multi-data supplier scenario of prompt-based approaches, where the supplied data might contain hidden harmful information.

One potential threat is backdoor attack, which manipulates neural networks to exhibit the attacker’s desired behavior when the input contains a specific backdoor trigger. Typically, adversaries poison a small portion of the training data, causing models trained on this data to misclassify any images with the triggers as a given target class while performing normally on clean samples. This makes the attack less likely to be suspected by the victim learner. As backdoor attacks pose such dangerous threats, increasingly sophisticated methods are being introduced. These include black-box scenarios where the attacker has no information about the model and learning procedure [42, 46, 48], or data-constrained cases where adversaries control only a fragment of the training data [64, 30]. With high efficacy, even in these challenging situations, backdoor attacks are particularly threatening in multi-data supplier scenarios. In spite of significant attention in various tasks and areas such as computer vision [48, 31, 36, 13, 12, 37], large language models and natural language processing [5, 28], point clouds [60, 59, 25], federated learning [61, 54, 65, 11], and more, targeted black-box backdoor attacks have not been thoroughly explored in continual learning.

Challenges Despite holding such potential danger for CL, extending backdoor attacks to the incremental setting is non-trivial. Firstly, in the multi-supplier setting where the victim gathers data from different sources, the attacker lacks information about the actual data distribution used to train the victim model. Consequently, generalizing backdoor knowledge to be transferable to unknown data poses the first challenge that our continual backdoor approach must confront. The second challenge arises from the vulnerability of backdoor attacks during fine-tuning. Recent studies [44, 35] have highlighted the tendency for backdoor knowledge to be removed when the victim fine-tunes the poisoned model on a small and clean dataset. This issue is exacerbated in continual learning, where the victim model undergoes incremental training as new data from various sources arrive. The final challenge involves the backdoor trigger’s proneness to turn into adversarial noise. Huynh et al. [18] observed that the trigger, when optimized using a surrogate model, may transform into an adversarial perturbation, driving the clean model to follow desired adversarial targets even in the absence of any prior backdoor attacks. Since conventional adversarial defenses can mitigate such adversarial noise, preempting this behavior is crucial to strengthen the resilience of the backdoor trigger.

Contributions In response to these shortcomings, we propose a continual backdoor framework that satisfies three key properties: transferability to unknown data, resilience to incremental learning procedures, and authenticity to avoid becoming adversarial noise. Initially, we leverage the natural label mapping characteristic of visual prompting, thereby approaching the data poisoning issue from the perspective of prompt selection. This approach allows our backdoor trigger to be generalized to any victim data distribution. Next, we robustify the backdoor trigger by aligning the optimization process with the continuously changing states of the incremental learner, thus ensuring the effectiveness of the backdoor trigger when the model is trained on new incoming clean data. Finally, we reconsider the choice of loss function for trigger optimization. We observe that the commonly used softmax function with cross-entropy introduces bias towards the target class, pushing its score excessively high and leading to the adversarial noise problem. Building on this observation, we propose adopting binary cross-entropy (BCE) with sigmoid function to mitigate this issue, thereby eliminating the dependency of trigger optimization on other classes and preventing cheating behavior.

By integrating the components above, our framework, termed backdoor-Attack On Prompt-based CL (AOP), successfully backdoor-attacks continual learners, achieving an Attack Success Rate (ASR) of up to 100%. Our contributions are three-fold and can be summarized as follows:

1. We expose prompt-based CL to backdoor attacks. Our approach follows strong assumptions, with black-box, clean-label, and constrained-data settings;

2. We highlight three key challenges that our continual backdoor framework must address: ensuring transferability to unknown data in prompt tuning, preventing the catastrophic forgetting of backdoor knowledge, and mitigating the tendency to generate adversarial noise due to biases. Motivated by these challenges, we propose a novel continual backdoor framework comprising three main components: utilizing a surrogate dataset to manipulate prompt selection, dynamically optimizing the backdoor trigger, and adopting sigmoid BCE loss to mitigate bias and prevent cheating;

3. We conduct extensive experiments on various prompt-based continual learners with different datasets and provide ablation studies to demonstrate the strength of our framework.

Organization The paper is organized as follows. Section 2 provides a brief overview of continual learning and prompt-based continual learning. In Section 3, we introduce the continual backdoor threat model, discuss backdoor challenges, and propose our prompt-based continual backdoor AOP framework. Section 4 empirically verifies the effectiveness of our AOP framework against various prompt-based incremental learners. Finally, Section 5 concludes the paper. Additional related work, discussions, and experiments are included in the supplementary material.

2 Background

Continual learning In continual learning scenarios, the model undergoes a sequential presentation of tasks $\mathcal{D}_1, \ldots, \mathcal{D}_T$. Each task corresponds to distinct subsets of tuples $\mathcal{D}_t = \{\boldsymbol{x}_t^i, \boldsymbol{y}_t^i\}_{i=1}^{n_t}$, where $\boldsymbol{x}_t^i \in \mathcal{X}^t$ is the input sample, $\boldsymbol{y}_t^i \in \mathcal{Y}^t$ is the corresponding label, and $n_t$ is the number of samples for task $t$. It is important to note that each class is exclusively associated with a single task [7, 3], meaning that $\mathcal{Y}^t$ and $\mathcal{Y}^{t'}$ are disjoint for $t \neq t'$, and data from prior tasks become inaccessible during the training of subsequent tasks [45, 40]. The objective of continual learning is to continuously acquire the capability to classify newly introduced classes while maintaining proficiency on previously learned ones in a single model $f: \mathcal{X} \rightarrow \mathcal{Y}$. In this paper, and in prompt-based methods [45, 57, 58, 55, 40], $f$ represents the pre-trained Vision Transformer (ViT) encoder. Additionally, $\phi$ is employed as the shared classification head, and $\phi_t$ is the classifier corresponding to classes specific to the given task $t$.

Prompt-based continual learning We provide a concise overview of L2P [58], which stands as the first work that integrates prompts into the context of continual learning. L2P introduces a prompt pool comprising learnable prompts and their corresponding keys $\{(\boldsymbol{k}_1, \boldsymbol{p}_1), (\boldsymbol{k}_2, \boldsymbol{p}_2), \cdots, (\boldsymbol{k}_{n_p}, \boldsymbol{p}_{n_p})\}$, where $n_p$ is the total number of prompts. These prompts are then combined with image features and fed into a pre-trained ViT, instructing the model to perform classification. Prompts are queried in an instance-wise manner using the top-$K$ cosine similarity $\gamma(q(\boldsymbol{x}), \boldsymbol{k}_i)$ between the keys and the query function $q(\boldsymbol{x}) = f(\boldsymbol{x})[0,:]$. Subsequent prompt-based methods are designed based on L2P, each featuring prompt utility and optimization modifications. A brief explanation of these methods is in Appendix A.
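The instance-wise query described above can be illustrated with a minimal sketch. It uses only the Python standard library and toy vectors; the function names `cosine_similarity` and `select_prompts`, the three-dimensional features, and the pool of four keys are illustrative assumptions, not part of L2P or of our implementation. In practice the query feature would come from the pre-trained ViT.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def select_prompts(query, keys, top_k):
    """Return the indices of the top_k keys most similar to the query feature."""
    scores = [cosine_similarity(query, k) for k in keys]
    ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]

# Toy example (hypothetical values): one query feature and a pool of four keys.
query = [1.0, 0.0, 0.0]
keys = [
    [0.9, 0.1, 0.0],   # nearly aligned with the query
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [-1.0, 0.0, 0.0],  # opposite direction
    [0.7, 0.7, 0.0],   # 45 degrees away from the query
]
selected = select_prompts(query, keys, top_k=2)
print(selected)  # indices of the two selected prompts
```

Because selection depends only on the query feature and the keys, any input change that shifts the query feature can change which prompts are attached to the sample.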

3 Backdoor Attack on Prompt-based Continual Learning (AOP)

We first outline the threat model and introduce key notations in Section 3.1. Then, we highlight the challenges when executing a backdoor attack against prompt-based incremental learners in Section 3.2. Building upon these considerations, we delineate the three primary components of AOP across Sections 3.3-3.5. A comprehensive overview and the end-to-end algorithm are given in Appendix B.

3.1 Threat Model and Notations

Continual learning protocols We consider the class-incremental learning (CIL) setting in prompt-based continual learning [57, 58]. In CIL, training data for incremental tasks $\mathcal{D}_t$ arrive incrementally in a discrete manner. Each task consists of data for $M$ new classes that have not been learned by the model before. Formally, each task $\mathcal{D}_t = \{\mathcal{D}_{m,t}\}_{m=1}^{M}$ with each class $\mathcal{D}_{m,t} = \{\boldsymbol{x}_i^{m,t}, y_i^{m,t}\}_{i=1}^{n_{m,t}}$ comprises input samples $\boldsymbol{x}_i^{m,t} \in \mathcal{X}$ and their corresponding labels $y_i^{m,t} = c_{m,t} \in \mathcal{Y}$, where $n_{m,t}$ represents the number of training samples for the corresponding class. In CIL, the learner is required to perform classification across all classes encountered up to task $T$ without being provided with explicit task labels during inference. Data for different classes $m$ and $m'$ are gathered from different suppliers. To ease the ensuing presentation, the index $t$ is omitted unless noted otherwise.

Backdoor attack protocols Let the attacker be the data supplier for class $m$ with label $c_m$. The attacker’s goal is to poison the supplied dataset with a small number of trigger-injected samples, such that any sample from any class, once manipulated with the backdoor trigger, will be misclassified as $c_m$ by the resulting incremental victim model when performing inference at any time $t$. An example of a triggered image is given in Figure 4.

Consider $\mathcal{D}_m = \{(\boldsymbol{x}_i, y_i)\}_{i=1}^{n_m}$ as the benign training set of class $m$. The adversary then learns to generate the poisoned dataset $\mathcal{D}_p$. Specifically, $\mathcal{D}_p$ consists of two parts: a modified version of a selected subset (denoted as $\mathcal{D}_s$) of $\mathcal{D}_m$ and the remaining benign samples. Thus, $\mathcal{D}_p = \mathcal{D}_b \cup \mathcal{D}_c$, where $c_m$ is the adversary target label, $\mathcal{D}_c = \mathcal{D}_m \backslash \mathcal{D}_s$, $\mathcal{D}_b = \{(\boldsymbol{x}', c_m) \mid \boldsymbol{x}' = G(\boldsymbol{x}), (\boldsymbol{x}, c_m) \in \mathcal{D}_s\}$, $\gamma \triangleq \frac{|\mathcal{D}_s|}{|\mathcal{D}_m|}$ is the poisoning rate, and $G: \mathcal{X} \rightarrow \mathcal{X}$ is an adversary-specified poisoned image generator. We follow [46, 29] and formulate $G(\boldsymbol{x}) = \boldsymbol{x} + \boldsymbol{\delta}$, where the perturbation $\boldsymbol{\delta}$ has a bounded $\ell_p$-norm.
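The construction of $\mathcal{D}_p = \mathcal{D}_b \cup \mathcal{D}_c$ can be sketched as follows. This is only an illustration under stated assumptions: the $\ell_\infty$ norm is chosen as one instance of the bounded $\ell_p$-norm, pixels are assumed to lie in $[0, 1]$, images are flat lists of floats, and the names `apply_trigger`, `build_poisoned_set`, `epsilon`, and `poison_rate` are hypothetical. Labels are left untouched, matching the clean-label setting.

```python
import random

def clip(value, low, high):
    """Clamp a scalar to the interval [low, high]."""
    return max(low, min(high, value))

def apply_trigger(x, delta, epsilon):
    """G(x) = x + delta, with delta clipped to an l_inf ball of radius epsilon
    (assumed norm) and the result kept in the assumed pixel range [0, 1]."""
    return [clip(xi + clip(di, -epsilon, epsilon), 0.0, 1.0)
            for xi, di in zip(x, delta)]

def build_poisoned_set(d_m, delta, poison_rate, epsilon, seed=0):
    """Split D_m into a triggered part D_b and a benign remainder D_c.
    Labels are never modified (clean-label)."""
    rng = random.Random(seed)
    n_poison = int(round(poison_rate * len(d_m)))   # |D_s| = rate * |D_m|
    chosen = set(rng.sample(range(len(d_m)), n_poison))
    d_b = [(apply_trigger(x, delta, epsilon), y)
           for i, (x, y) in enumerate(d_m) if i in chosen]
    d_c = [(x, y) for i, (x, y) in enumerate(d_m) if i not in chosen]
    return d_b, d_c

# Toy data (hypothetical): ten 4-pixel images of the attacker's class, label 7.
d_m = [([0.5, 0.5, 0.5, 0.5], 7) for _ in range(10)]
delta = [0.2, -0.2, 0.01, 0.0]
d_b, d_c = build_poisoned_set(d_m, delta, poison_rate=0.2, epsilon=0.05)
d_p = d_b + d_c   # the dataset released to the victim
```

The poisoned set has the same size and the same labels as $\mathcal{D}_m$; only the selected fraction of inputs is perturbed, and each perturbation stays within the chosen bound.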

We emphasize that given the considered multi-data supplier scenario, we optimize the backdoor trigger following a black-box setting (where the attacker has no access to the training model or procedure) and a clean-label setting (where the attacker cannot change the label of data), which represent stealthy and challenging conditions in backdoor attacks.

3.2 Three Challenges When Backdooring CL

We outline three challenges encountered when executing backdoor attacks against continual learners. To generate a poisoned dataset, the adversary optimizes the backdoor trigger, which ordinarily requires access to the training data, the learner, and the training criterion. However, in accordance with our threat model, none of these are accessible.

The first challenge, as outlined in the introduction, arises from the lack of knowledge about the victim’s training data. Given that control is limited to the supplied data, which also represents the target class, prior research [64] suggests utilizing a public dataset (e.g., Tiny-ImageNet) as a surrogate training dataset. In this study, we explore the utilization of surrogate datasets in the context of prompt tuning.

Secondly, the adversary lacks information about the training learner and procedure, making it difficult to design backdoor knowledge that can withstand the incremental learning process. Despite impressive memory capabilities, continual learning methods have not yet fully matched the performance levels of joint training, and recent works [40] are still exploring ways to further avoid catastrophic forgetting. This issue also affects backdoor attacks, leading to a degradation in attack performance over time. Therefore, creating a surrogate learner that helps the trigger endure the incremental learning process is our second challenge.

Lastly, a backdoor attack entails poisoning the training dataset to induce the model to malfunction when presented with specific trigger samples while maintaining normal performance on clean data. Huynh et al. [18] observe that this objective can be achieved even without any poisoning during training. This trigger, akin to adversarial noise, can deceive the classifier during inference, irrespective of whether data poisoning occurred during training, thereby counteracting the primary objective of the backdoor attack. Moreover, such adversarial noise can be mitigated by employing standard adversarial defenses. Consequently, preventing the generation of adversarial noise poses an additional challenge when optimizing the trigger.

3.3 Prompt Selection, Label Mapping, and Transferability

The core of prompt-based continual learning methods lies in the prompt pool and the prompt selection strategy. Specifically, the most relevant prompts are queried in an instance-wise manner and then concatenated with the sample to optimally guide the model in performing classification. We leverage this fundamental mechanism of the prompt-based approach to reframe the backdooring problem as one of manipulating prompt selections. As in Figures 1(a) and 1(b), we aim to ensure that triggered samples are directed to select specific backdoor prompts, thereby causing the model to misclassify these backdoor-prompted samples into the desired class.

A key feature of visual prompting is its ability to act as a label mapping mechanism when performing downstream tasks using a pretrained model. In this context, prompts function as universal input perturbation templates, enabling the mapping of labels from a source dataset to a target dataset [10]. From this perspective, our aim of controlling prompt selection translates into manipulating label mappings between the two datasets. This new perspective paves the way for the "transferability" of our continual backdoor framework.

When optimizing the backdoor trigger, we employ a surrogate dataset, denoted as $\mathcal{D}_{\text{surrogate}}$, to address the backdoor transferability to data from other classes. It is worth noting that $\mathcal{D}_{\text{surrogate}}$ does not necessarily mirror the actual data distribution used to train the incremental model. This discrepancy stems from the visual prompting property discussed earlier. In particular, instead of optimizing a trigger that causes the poisoned data to be misclassified by the model, our backdoor trigger can be viewed as activating an incorrect mapping to the target class. Since we focus on manipulating the mapping and prompt selection rather than the dataset itself, $\mathcal{D}_{\text{surrogate}}$ can be chosen differently from the actual dataset to align with our objectives.

[Figure 1 panels: (a) Benign samples; (b) Triggered samples; (c) Cheated trigger; (d) Benign samples; (e) Triggered samples; (f) Authentic trigger]
Figure 1: (a) and (b): AOP’s prompt selection frequency on benign and triggered samples when attacking DualPrompt. (d) and (e): AOP’s average key-query similarities concerning benign and triggered samples when attacking DualPrompt-PGP. (c) and (f): Scores obtained from the clean model for AOP’s triggered samples optimized with CE and BCE, respectively.

3.4 Static-dynamic Trigger Optimization

Since we lack information about the victim’s continual model, we use $\mathcal{D}_{\text{surrogate}}$ to train a surrogate incremental learner and simulate the continual learning pipeline. We then optimize the backdoor trigger $\boldsymbol{\delta}$ based on this surrogate incremental model. Specifically, we employ the surrogate learner with two states: a static state that reflects how prompts learn label mappings between the source and target datasets, and a dynamic state that reflects the continuous learning procedure of the victim model. Formally, our static-dynamic trigger optimization involves the following four stages:

Figure 2: The AOP procedure begins by selecting a surrogate dataset, which is then divided into two subsets: $\mathcal{D}_{\text{static}}$ and $\mathcal{D}_{\text{dynamic}}$. In stage (1), $\mathcal{D}_{\text{static}}$ is employed to establish a static surrogate learner along with prompts. Following this, in stage (2), the trigger optimization process takes place based on this initial model. Next, in stage (3), the learner is updated from $\mathcal{D}_{\text{dynamic}}$, which serves as a transition between stages. Finally, in stage (4), the trigger is fine-tuned, with the prompt being updated periodically throughout the optimization process.

(0) Preparation To set up the static-dynamic framework, we partition the surrogate dataset $\mathcal{D}_{\text{surrogate}}$ into two subsets: $\mathcal{D}_{\text{static}}$ for the static surrogate stage and $\mathcal{D}_{\text{dynamic}}$ for the dynamic surrogate stage.

(1) Static surrogate stage In this initial stage, we train the prompts on $\mathcal{D}_{\text{static}} \cup \mathcal{D}_m$ to capture the label mapping functionality between the source and target datasets. During this phase, the prompts are optimized to instruct the model to correctly classify clean input images. Consequently, we obtain a pool of benign prompts for clean data. Denoting the prompt pool as $\mathbf{P} = \{\boldsymbol{p}_1, \boldsymbol{p}_2, \cdots, \boldsymbol{p}_{n_p}\}$ and $\mathbf{K} = \{\boldsymbol{k}_1, \boldsymbol{k}_2, \cdots, \boldsymbol{k}_{n_p}\}$ as the corresponding prompt keys, where $n_p$ is the prompt pool size, the objective for this optimization step follows [58] and is given by:

$$\min_{\mathbf{P}, \mathbf{K}, \phi} \mathcal{L}\left(\phi\left(f\left(\boldsymbol{x}; \boldsymbol{P}\right)\right), y\right) + \lambda \sum_{\mathbf{K}_{\boldsymbol{x}}} \gamma\left(q(\boldsymbol{x}), \boldsymbol{k}_i\right). \tag{1}$$

Here, $\mathbf{K}_{\boldsymbol{x}}$ denotes a subset of the top-$K$ keys specifically selected for each sample $\boldsymbol{x}$. $\gamma$ is the function that assesses the similarity between the query feature $q(\boldsymbol{x})$ and a prompt key. The scalar $\lambda$ weights the loss. The first term is the softmax cross-entropy loss, while the second term acts as a regularizer to encourage selected keys to be closer to the corresponding query features.

(2) Trigger optimization stage During this stage, the adversary optimizes the trigger $\boldsymbol{\delta}$ to induce misclassification of the triggered inputs into the target class. Specifically, the trigger loss function can be expressed as follows:

$$\min_{\boldsymbol{\delta}} \sum_{(\boldsymbol{x}, c_m) \in \mathcal{D}_m} \left[\mathcal{L}\left(\phi\left(f\left(\boldsymbol{x} + \boldsymbol{\delta}; \boldsymbol{P}\right)\right), c_m\right)\right]. \tag{2}$$

(3) Transition stage. This stage aligns the surrogate learner with the behaviour of the victim learner as it is updated with new incoming tasks. Specifically, we continue training the prompts from Stage (1) with the same objective as in equation (1), but using $\mathcal{D}_{\text{dynamic}}$. In essence, the goal of this stage is to statically prepare the surrogate learner for the subsequent dynamic stage.

(4) Dynamic surrogate stage. In this stage, we aim to acquaint the backdoor trigger with the continuously updated prompts resulting from the continual learning process. Each round first optimizes the trigger for several iterations with equation (2) and then fine-tunes the prompt components for one epoch, as in Stage (3). This process is repeated for multiple rounds to enhance the resilience of the backdoor trigger against the continual learning process.

After the four stages above, the optimized trigger $\boldsymbol{\delta}^{*}$ is used to poison a small portion of $\mathcal{D}_{m}$, which is then released to the victim learner. AOP is summarized in Figure 2 and Appendix B.
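The ordering of the four stages can be sketched as a small scheduling routine. This is a hedged outline, not the authors' implementation: the two callbacks, the split names `"static"` and `"dynamic"`, and the epoch and iteration counts are hypothetical, and Stage (1) is assumed to train on a static surrogate split since it is described before this passage.

```python
def run_aop_schedule(train_prompts_one_epoch, optimize_trigger,
                     static_epochs, transition_epochs, trigger_iters, rounds):
    """Call the two update routines in the order of the four stages."""
    # Stage (1): train the surrogate prompts with equation (1) (assumed static split).
    for _ in range(static_epochs):
        train_prompts_one_epoch("static")
    # Stage (2): optimize the trigger with equation (2) against the current prompts.
    optimize_trigger(trigger_iters)
    # Stage (3): keep training the same prompts, now on the dynamic split.
    for _ in range(transition_epochs):
        train_prompts_one_epoch("dynamic")
    # Stage (4): each round optimizes the trigger, then tunes prompts for one epoch.
    for _ in range(rounds):
        optimize_trigger(trigger_iters)
        train_prompts_one_epoch("dynamic")

# Usage with logging stubs in place of real training code.
log = []
run_aop_schedule(
    train_prompts_one_epoch=lambda split: log.append(f"prompts:{split}"),
    optimize_trigger=lambda iters: log.append(f"trigger:{iters}"),
    static_epochs=1, transition_epochs=1, trigger_iters=10, rounds=2,
)
print(log)
```

With real callbacks, the trigger would be re-optimized against prompts that keep drifting, which is the property the dynamic stage is meant to exercise.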

3.5 Towards an Authentic Backdoor Trigger

Are we truly optimizing a backdoor trigger? As discussed in Section 3.2, optimizing the trigger with these objectives can unintentionally transform it into adversarial noise. While our static-dynamic framework can generate a robust trigger that withstands intense incremental learning processes, it might deviate into adversarial perturbation. To further explore this phenomenon, we analyze the output scores in Figure 1(c). The visualization reveals that even when processed by a clean model unaffected by backdoor attacks, the poisoned samples are consistently misclassified towards the target class with dominant scores. This observation prompts a reconsideration of the backdoor trigger optimization process. We discovered that the overconfident score bias towards the target class is primarily induced by the commonly used softmax with cross-entropy loss function. Softmax introduces competition between classes, and the subsequent cross-entropy loss tends to elevate the scores of the target class significantly above the others. This pronounced bias compels the trigger to act like adversarial noise.

Sigmoid with binary cross-entropy loss. To reduce this bias, we mitigate the competition between the target class and the other classes caused by the relative scoring of softmax by applying a sigmoid function to the logits to compute output scores. This shifts the optimization focus towards independently increasing the score of the target class rather than suppressing the others. We then use the binary cross-entropy (BCE) loss so that each class score is optimized independently. Following [8], the gradient of the loss with respect to the score $s_{j}$ for class $j$ is $\frac{\partial\mathcal{L}_{\mathrm{BCE}}(\boldsymbol{\theta})}{\partial s_{j}}=\sigma(s_{j})-\mathbb{I}\{j=\tilde{y}\}$, which depends only on $s_{j}$ and thereby constrains the score of the target class to a certain level regardless of the scores of other classes. As a result, during inference with a non-backdoored clean model, the output scores are more balanced between classes, as shown in Figure 1(f). This balance prevents the problem of generating adversarial noise when optimizing the backdoor trigger.
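The difference between the two losses can be checked numerically. The sketch below compares the gradient at the target score under softmax cross-entropy, $\mathrm{softmax}(s)_{j}-\mathbb{I}\{j=\tilde{y}\}$, with the sigmoid BCE gradient given above; the score vectors are made-up values chosen only to vary a rival class.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def ce_grad(scores, target):
    """Gradient of softmax cross-entropy with respect to each score."""
    return [p - (1.0 if j == target else 0.0) for j, p in enumerate(softmax(scores))]

def bce_grad(scores, target):
    """Gradient of sigmoid BCE with respect to each score: sigma(s_j) - 1{j == target}."""
    return [sigmoid(s) - (1.0 if j == target else 0.0) for j, s in enumerate(scores)]

low_rival = [2.0, 1.0, 0.0]    # target class 0, weak competing class
high_rival = [2.0, 5.0, 0.0]   # same target score, strong competing class
for scores in (low_rival, high_rival):
    print(ce_grad(scores, 0)[0], bce_grad(scores, 0)[0])
```

The cross-entropy gradient at the target grows in magnitude (roughly from -0.33 to -0.95) when the rival score rises, whereas the BCE gradient stays at about -0.12 because it depends only on the target score.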

4 Experiments

In this section, we first describe the experimental setups, followed by presenting the results in four key aspects: the overall backdooring ability of AOP, its performance with different surrogate datasets, the robustness of AOP with varying attack times, and the efficacy of adopting BCE in preventing the generation of adversarial perturbations. Further discussions on performance, visualizations, baselines, efficacy against defenses, and poisoning rate sensitivity are deferred to Appendix D.

4.1 Experimental Setup

Victim incremental learners

We evaluate our continual backdoor framework against 6 prompt-based continual learning methods: L2P [58], DualPrompt [57], HiDe-Prompt [55], CODA-Prompt [45], and two variants of PGP [40], namely L2P-PGP and DualPrompt-PGP. We follow the original settings and implementations of each method. All learners utilize the ViT-B/16 backbone [14], pre-trained on ImageNet-1K [41], except for HiDe-Prompt, which is pre-trained on iBOT-1K [66]. Detailed experimental information is in Appendix C.

Table 1: Backdoor performance against L2P, DualPrompt, and PGP on 5-Split-CUB200. The attacker is the supplier for a random class in task 1. The dynamic stage takes place over 5 rounds. Results are reported when using TinyImageNet and CIFAR100 as surrogate datasets. For ACC, we additionally report (in parentheses) the change in clean accuracy compared to clean-training learners. For ASR, we provide (in parentheses) a comparison with the baseline [64] (without dynamic optimization and not using BCE).

| Method | TinyImageNet ASR | TinyImageNet ACC | CIFAR100 ASR | CIFAR100 ACC |
|---|---|---|---|---|
| L2P | 99.96 ± 0.02 (↑ 86.44) | 74.71 ± 0.58 (↓ 0.17) | 99.99 ± 0.02 (↑ 64.91) | 74.44 ± 0.54 (↓ 0.44) |
| DualPrompt | 99.93 ± 0.02 (↑ 57.08) | 82.62 ± 0.66 (↑ 0.10) | 99.95 ± 0.05 (↑ 42.36) | 82.71 ± 0.55 (↑ 0.19) |
| L2P-PGP | 99.97 ± 0.01 (↑ 89.73) | 74.97 ± 0.83 (↓ 0.48) | 100.00 ± 0.00 (↑ 68.82) | 75.70 ± 0.50 (↑ 0.25) |
| DualPrompt-PGP | 99.93 ± 0.02 (↑ 56.70) | 82.45 ± 0.29 (↓ 0.31) | 99.99 ± 0.01 (↑ 44.83) | 82.84 ± 0.12 (↑ 0.08) |

Table 2: Backdoor performance across different prompt-based continual learning methods on three variants of Split-ImageNet-R. The adversary’s target class is chosen randomly from the classes in task 1. The dynamic stage is iterated for 10 rounds. The surrogate dataset used is TinyImageNet. We also report (in parentheses) the change in ACC compared to non-attacked learners.

| Method | 5-Split ASR | 5-Split ACC | 10-Split ASR | 10-Split ACC | 20-Split ASR | 20-Split ACC |
|---|---|---|---|---|---|---|
| L2P | 99.76 ± 0.10 | 64.27 ± 0.65 (↓ 0.77) | 99.56 ± 0.22 | 62.43 ± 0.58 (↓ 0.12) | 98.24 ± 0.21 | 60.51 ± 1.17 (↓ 0.83) |
| DualPrompt | 99.57 ± 0.25 | 70.69 ± 0.56 (↓ 0.62) | 99.26 ± 0.39 | 69.17 ± 0.27 (↓ 0.85) | 96.17 ± 0.89 | 66.04 ± 0.43 (↓ 0.21) |
| CODA-Prompt | 98.16 ± 1.01 | 74.15 ± 0.11 (↓ 1.04) | 96.55 ± 1.29 | 72.86 ± 0.11 (↓ 0.02) | 71.27 ± 2.86 | 70.86 ± 0.94 (↓ 0.04) |
| HiDe-Prompt | 98.65 ± 0.90 | 74.89 ± 0.60 (↓ 0.32) | 94.66 ± 0.93 | 71.99 ± 0.37 (↓ 0.46) | 93.79 ± 0.66 | 70.93 ± 0.86 (↓ 0.09) |
| L2P-PGP | 99.33 ± 0.05 | 64.38 ± 0.57 (↑ 0.10) | 99.36 ± 0.15 | 61.73 ± 0.38 (↑ 0.33) | 98.84 ± 0.16 | 60.74 ± 1.17 (↓ 0.15) |
| DualPrompt-PGP | 99.83 ± 0.27 | 70.80 ± 0.08 (↓ 0.08) | 99.17 ± 0.43 | 69.24 ± 0.41 (↓ 0.18) | 97.01 ± 0.75 | 66.32 ± 1.04 (↓ 0.76) |

Datasets. For the victim’s training dataset, we use three variants of ImageNet-R [17]: 5-Split, 10-Split, and 20-Split ImageNet-R. These variants divide the 200 classes of the original dataset into 5, 10, and 20 tasks, respectively. Additionally, we conduct experiments on the 5-Split-CUB200 dataset, which partitions the original CUB200 [52] dataset into 5 tasks, each containing 40 classes. For the attacker’s surrogate dataset, we primarily use TinyImageNet [23] for all experiments and CIFAR100 [22] in specific settings.

Backdoor setting. Following the guidelines of [64], we set the maximum poison budget to 25 images, corresponding to $0.1\%$ of ImageNet-R and $0.5\%$ of CUB200. Additionally, we set the upper bound of the $\ell_{\infty}$-norm of triggers to $\frac{16}{255}$, in line with standard practices in the literature [48, 42]. During inference, the trigger is amplified by a factor of $3$ [48, 64].
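The norm bound and the inference-time amplification can be illustrated with a short sketch. It assumes pixel values normalized to $[0, 1]$ and a final clamp to that range, neither of which is stated in the text; the helper names are hypothetical.

```python
EPS = 16 / 255          # l_inf bound on the trigger, as stated in the text
INFERENCE_SCALE = 3     # amplification factor at inference, as stated in the text

def project_linf(delta, eps=EPS):
    """Clip every trigger entry to [-eps, eps]."""
    return [max(-eps, min(eps, d)) for d in delta]

def apply_trigger(pixels, delta, scale=1.0):
    """Add the scaled trigger, then clamp to the assumed valid range [0, 1]."""
    return [min(1.0, max(0.0, p + scale * d)) for p, d in zip(pixels, delta)]

delta = project_linf([0.5, -0.5, 0.01])
poisoned_train = apply_trigger([0.5, 0.5, 0.5], delta)                      # poisoning time
triggered_test = apply_trigger([0.5, 0.5, 0.99], delta, INFERENCE_SCALE)    # inference time
print(delta, poisoned_train, triggered_test)
```

At poisoning time each pixel moves by at most $\frac{16}{255}$, while at inference the same trigger may shift a pixel by up to $\frac{48}{255}$ before clamping.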

Metrics. The evaluation of our framework utilizes two key metrics: (1) average accuracy (ACC) and (2) attack success rate (ASR). ACC assesses the accuracy of the backdoored model on benign test samples, whereas ASR measures the proportion of attacked samples that the compromised model predicts as the target label, reflecting the backdoor attack’s effectiveness. In the context of continual learning, ACC and ASR at a given time $t$ are averaged across the corresponding metrics for all data from task $1$ to task $t$. All results are averaged over 3 runs for fair comparisons.
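A minimal sketch of the two metrics, following the definitions above, is given below. The function names and the toy predictions are illustrative, and per-task values are averaged with equal weight, which is one reading of "averaged across the corresponding metrics".

```python
def accuracy(predictions, labels):
    """ACC in percent: share of benign samples classified correctly."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)

def attack_success_rate(predictions_on_triggered, target_label):
    """ASR in percent: share of triggered samples predicted as the target label."""
    hits = sum(p == target_label for p in predictions_on_triggered)
    return 100.0 * hits / len(predictions_on_triggered)

def average_up_to_task(per_task_values, t):
    """Average a per-task metric over tasks 1..t (equal task weights assumed)."""
    return sum(per_task_values[:t]) / t

# Toy usage with two tasks and target label 7.
acc_per_task = [accuracy([0, 1, 1, 0], [0, 1, 0, 0]), accuracy([2, 3, 3, 2], [2, 3, 3, 2])]
asr_per_task = [attack_success_rate([7, 7, 7, 1], 7), attack_success_rate([7, 7, 7, 7], 7)]
print(average_up_to_task(acc_per_task, 2), average_up_to_task(asr_per_task, 2))
```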

4.2 Effectiveness of AOP

We report the ASR and ACC when performing backdoor attacks against various incremental learners in Table 1 and Table 2. As observed from the tables, our framework consistently achieves high ASR with negligible effect on the ACC of clean samples. This is due to the inherent characteristics of continual learning, which enable the learner to perform well across different tasks, making it vulnerable to backdoor attacks. By considering backdooring in continual learning as an additional "backdoor task," the plasticity of continual learning allows the ASR, or performance on the backdoor task, to be high without degrading the ACC on clean tasks.

It is worth noting that ASR still suffers from the catastrophic forgetting phenomenon of continual learning on long task sequences. Specifically, in Table 2, the ASR on 20-Split-ImageNet-R is lower than on the 5-Split and 10-Split versions across all experiments. This indicates that the more tasks there are and the longer the incremental learning process runs, the higher the chance of a decrease in ASR. However, the ACC also suffers from this phenomenon, as it is a major issue in continual learning.

While prompt-based methods share a common core of utilizing prompt pools and selecting relevant prompts for each task or class, each exhibits distinct characteristics. We observe a significantly lower ASR when backdooring CODA-Prompt, most visibly on 20-Split-ImageNet-R, where the ASR drops to 71.27%. This is because CODA-Prompt utilizes all prompts in the prompt pool through its weighted mechanism instead of selecting only the top-K relevant prompts. Consequently, even with triggered samples, clean prompts still exert some influence, leading to degradation in ASR.

Table 3: Backdoor performance when the target class belongs to different tasks $T$. The results are reported when the victim’s training dataset is 10-Split-ImageNet-R, and the attacker’s surrogate dataset is TinyImageNet.

| Method | $T=1$ ASR | $T=1$ ACC | $T=4$ ASR | $T=4$ ACC | $T=10$ ASR | $T=10$ ACC |
|---|---|---|---|---|---|---|
| L2P | 99.56 ± 0.22 | 62.43 ± 0.58 | 99.61 ± 0.19 | 62.09 ± 0.06 | 99.89 ± 0.05 | 62.27 ± 0.26 |
| L2P-PGP | 99.36 ± 0.15 | 62.73 ± 0.38 | 99.77 ± 0.08 | 62.88 ± 0.73 | 99.85 ± 0.35 | 62.32 ± 0.82 |

Table 4: ASR of clean, non-attacked learners on triggered samples. Results are compared between triggers optimized with CE softmax and BCE sigmoid loss.

| Trigger | Metric | L2P, 10-Split-ImageNet-R | DualPrompt, 5-Split-ImageNet-R | DualPrompt, 10-Split-ImageNet-R | DualPrompt, 20-Split-ImageNet-R |
|---|---|---|---|---|---|
| AOP with CE | Top-1 ASR | 74.18 | 34.18 | 42.85 | 96.93 |
| AOP with CE | Top-5 ASR | 96.89 | 92.78 | 97.01 | 99.63 |
| AOP with BCE | Top-1 ASR | 0.00 | 0.00 | 0.00 | 0.00 |
| AOP with BCE | Top-5 ASR | 0.00 | 0.72 | 0.12 | 2.68 |

Different surrogate datasets. Another factor that makes prompt-based continual learning vulnerable is the utilization of prompting. As shown in Figures 1(d) and 1(e), AOP’s triggered samples consistently have the highest similarity with prompt ID 2, which, in contrast, shows the smallest similarity with benign samples. Thus, as discussed in Section 3.3, prompting tolerates differences between the surrogate dataset and the victim’s actual data. We report the backdoor performance using TinyImageNet and CIFAR100 as surrogate datasets in Table 1. The experiments show consistently high ASR results for both surrogate data choices, confirming the transferability of our continual backdoor framework.

Different attack times. We report the ASR in Table 3, considering scenarios where the target class belongs to different tasks that arrive at different times. We observe slight increases in ASR when the attack class is part of later tasks, as it experiences less forgetting. Nonetheless, our method AOP consistently maintains a high ASR, exceeding $99\%$ at all three reported attack times. This demonstrates that the backdoor knowledge can be effectively transferred to both previously learned and incoming future classes.

Different dynamic rounds. We illustrate the attack performance across varying numbers of dynamic rounds in Figure 3. As discussed above, the ASR decreases when tested on 20-Split-ImageNet-R. We observe that increasing the number of dynamic rounds does not consistently lead to higher ASR. On the positive side, since the adversary lacks information about the total number of tasks, it is encouraging that using fewer or more dynamic rounds does not have a large impact on ASR. We emphasize that in long sequence tasks, both ASR and ACC degrade due to forgetting.

Enhancing backdoor authenticity via sigmoid BCE. As shown in Table 4, triggers optimized with softmax CE retain a considerable ASR even when tested on non-backdoored models, with Top-1 ASR reaching up to 96.93%. This suggests that CE optimization might lead to the generation of adversarial perturbations. Conversely, when optimized using sigmoid BCE, the ASR on clean models remains consistently low. This confirms that adopting BCE can enhance the authenticity of backdoor triggers and avoid generating adversarial noise.
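The Top-1 and Top-5 ASR values in Table 4 can be read as a top-$k$ check of the target class on a clean model's scores. The sketch below shows one way to compute such a quantity; the function name and the score rows are hypothetical, and ties between scores are not treated specially.

```python
def top_k_asr(score_rows, target, k):
    """Percent of triggered samples whose k highest-scoring classes include `target`."""
    hits = 0
    for scores in score_rows:
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        hits += target in ranked[:k]
    return 100.0 * hits / len(score_rows)

# Hypothetical clean-model scores for two triggered samples, target class 1.
clean_model_scores = [[0.10, 0.85, 0.05], [0.80, 0.15, 0.05]]
print(top_k_asr(clean_model_scores, target=1, k=1))
print(top_k_asr(clean_model_scores, target=1, k=2))
```

A trigger that behaves like adversarial noise would score high on this check even though the model was never trained on poisoned data, which is the failure mode the BCE objective is meant to avoid.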

Figure 3: ASR when varying the number of dynamic rounds. Panels: (a) 5-Split-ImageNet-R, (b) 10-Split-ImageNet-R, (c) 20-Split-ImageNet-R.

5 Conclusion

This paper explores the vulnerability of prompt-based continual learning methods and their susceptibility to backdoor attacks. We emphasize three critical properties that a backdoor continual framework should possess: transferability to unknown data from other classes, resilience against incremental learning procedures, and the authenticity of the backdoor trigger. Building upon these considerations, we propose a novel continual backdoor framework. We leverage the label mapping functionality of prompting to promote transferability, incorporate a static-dynamic optimization approach to enhance resilience, and employ BCE sigmoid loss to mitigate the adversarial noise problem. Extensive experiments confirm the effectiveness of our backdoor framework against various prompt-based continual learners.

Nonetheless, we acknowledge some limitations in our work. Firstly, competition between the target classes and the remaining classes remains necessary to some extent. Relying solely on BCE to eliminate relative scoring might hurt the performance. Secondly, certain defenses we employed to assess our approach may not be optimal for continual learning scenarios. Thus, regarding future directions, there is potential in exploring other threat models and defenses for backdooring continual learning and extending backdoor attacks to other continual learning approaches.

References

  • [1] D. Abati, J. Tomczak, T. Blankevoort, S. Calderara, R. Cucchiara, and B. E. Bejnordi. Conditional channel gated networks for task-aware continual learning, 2020.
  • [2] H. Ahn, S. Cha, D. Lee, and T. Moon. Uncertainty-based continual learning with adaptive regularization, 2019.
  • [3] H. Ahn, J. Kwak, S. F. Lim, H. Bang, H. Kim, and T. Moon. Ss-il: Separated softmax for incremental learning. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 824–833, 2020.
  • [4] P. Buzzega, M. Boschini, A. Porrello, D. Abati, and S. Calderara. Dark experience for general continual learning: a strong, simple baseline, 2020.
  • [5] X. Cai, H. Xu, S. Xu, Y. Zhang, and X. Yuan. Badprompt: Backdoor attacks on continuous prompts, 2022.
  • [6] H. Cha, J. Lee, and J. Shin. Co2l: Contrastive continual learning, 2021.
  • [7] S. Cha, S. Cho, D. Hwang, S. Hong, M. Lee, and T. Moon. Rebalancing batch normalization for exemplar-based class-incremental learning. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20127–20136, 2022.
  • [8] S. Cha, B. Kim, Y. Yoo, and T. Moon. Ssul: Semantic segmentation with unknown label for exemplar-based class-incremental learning. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, and J. W. Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 10919–10930. Curran Associates, Inc., 2021.
  • [9] A. Chaudhry, M. Rohrbach, M. Elhoseiny, T. Ajanthan, P. K. Dokania, P. H. S. Torr, and M. Ranzato. On tiny episodic memories in continual learning, 2019.
  • [10] A. Chen, Y. Yao, P.-Y. Chen, Y. Zhang, and S. Liu. Understanding and improving visual prompting: A label-mapping perspective. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 19133–19143, June 2023.
  • [11] Y. Dai and S. Li. Chameleon: Adapting to peer images for planting durable backdoors in federated learning, 2023.
  • [12] K. Doan, Y. Lao, and P. Li. Backdoor attack with imperceptible input and latent modification. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, and J. W. Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 18944–18957. Curran Associates, Inc., 2021.
  • [13] K. Doan, Y. Lao, W. Zhao, and P. Li. Lira: Learnable, imperceptible and robust backdoor attacks. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 11946–11956, 2021.
  • [14] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021.
  • [15] S. Farquhar and Y. Gal. A unifying bayesian view of continual learning, 2019.
  • [16] Y. Gao, C. Xu, D. Wang, S. Chen, D. C. Ranasinghe, and S. Nepal. Strip: A defence against trojan attacks on deep neural networks. In 35th Annual Computer Security Applications Conference (ACSAC), 2019.
  • [17] D. Hendrycks, S. Basart, N. Mu, S. Kadavath, F. Wang, E. Dorundo, R. Desai, T. L. Zhu, S. Parajuli, M. Guo, D. X. Song, J. Steinhardt, and J. Gilmer. The many faces of robustness: A critical analysis of out-of-distribution generalization. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 8320–8329, 2020.
  • [18] T. Huynh, D. Nguyen, T. Pham, and A. Tran. Combat: Alternated training for effective clean-label backdoor attacks. Proceedings of the AAAI Conference on Artificial Intelligence, 38(3):2436–2444, Mar. 2024.
  • [19] S. Jung, H. Ahn, S. Cha, and T. Moon. Continual learning with node-importance based adaptive group sparse regularization, 2021.
  • [20] S. Kang, Z. Shi, and X. Zhang. Poisoning generative replay in continual learning to promote forgetting. In A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, and J. Scarlett, editors, Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pages 15769–15785. PMLR, 23–29 Jul 2023.
  • [21] J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, D. Hassabis, C. Clopath, D. Kumaran, and R. Hadsell. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13):3521–3526, mar 2017.
  • [22] A. Krizhevsky. Learning multiple layers of features from tiny images. 2009.
  • [23] Y. Le and X. S. Yang. Tiny imagenet visual recognition challenge. 2015.
  • [24] H. Li and G. Ditzler. Targeted data poisoning attacks against continual learning neural networks. In 2022 International Joint Conference on Neural Networks (IJCNN), pages 1–8, 2022.
  • [25] X. Li, Z. Chen, Y. Zhao, Z. Tong, Y. Zhao, A. Lim, and J. T. Zhou. Pointba: Towards backdoor attacks in 3d point cloud, 2021.
  • [26] X. Li, Y. Zhou, T. Wu, R. Socher, and C. Xiong. Learn to grow: A continual structure learning framework for overcoming catastrophic forgetting, 2019.
  • [27] Y. Li, Y. Bai, Y. Jiang, Y. Yang, S.-T. Xia, and B. Li. Untargeted backdoor watermark: Towards harmless and stealthy dataset copyright protection. In NeurIPS, 2022.
  • [28] Y. Li, T. Li, K. Chen, J. Zhang, S. Liu, W. Wang, T. Zhang, and Y. Liu. Badedit: Backdooring large language models by model editing. In The Twelfth International Conference on Learning Representations, 2024.
  • [29] Y. Li, Y. Li, B. Wu, L. Li, R. He, and S. Lyu. Invisible backdoor attack with sample-specific triggers, 2021.
  • [30] Z. Li, H. Sun, P. Xia, H. Li, B. Xia, Y. Wu, and B. Li. Efficient backdoor attacks for deep neural networks in real-world scenarios. In The Twelfth International Conference on Learning Representations, 2024.
  • [31] C. Liao, H. Zhong, A. Squicciarini, S. Zhu, and D. Miller. Backdoor embedding in convolutional neural network models via invisible perturbation, 2018.
  • [32] Y. Liu, B. Schiele, and Q. Sun. Adaptive aggregation networks for class-incremental learning. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, jun 2021.
  • [33] N. Loo, S. Swaroop, and R. E. Turner. Generalized variational continual learning, 2020.
  • [34] A. Mallya and S. Lazebnik. Packnet: Adding multiple tasks to a single network by iterative pruning, 2018.
  • [35] R. Min, Z. Qin, L. Shen, and M. Cheng. Towards stable backdoor purification through feature shift tuning. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
  • [36] S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard. Universal adversarial perturbations, 2017.
  • [37] A. Nguyen and A. Tran. Wanet – imperceptible warping-based backdoor attack, 2021.
  • [38] C. V. Nguyen, Y. Li, T. D. Bui, and R. E. Turner. Variational continual learning, 2018.
  • [39] Q. Pham, C. Liu, and S. C. H. Hoi. Continual learning, fast and slow, 2023.
  • [40] J. Qiao, Z. Zhang, X. Tan, C. Chen, Y. Qu, Y. Peng, and Y. Xie. Prompt gradient projection for continual learning. In International Conference on Learning Representations, 2024.
  • [41] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. Imagenet large scale visual recognition challenge, 2015.
  • [42] A. Saha, A. Subramanya, and H. Pirsiavash. Hidden trigger backdoor attacks, 2019.
  • [43] J. Serrà, D. Surís, M. Miron, and A. Karatzoglou. Overcoming catastrophic forgetting with hard attention to the task, 2018.
  • [44] Z. Sha, X. He, P. Berrang, M. Humbert, and Y. Zhang. Fine-tuning is all you need to mitigate backdoor attacks, 2022.
  • [45] J. S. Smith, L. Karlinsky, V. Gutta, P. Cascante-Bonilla, D. Kim, A. Arbelle, R. Panda, R. Feris, and Z. Kira. Coda-prompt: Continual decomposed attention-based prompting for rehearsal-free continual learning, 2023.
  • [46] H. Souri, M. Goldblum, L. Fowl, R. Chellappa, and T. Goldstein. Sleeper agent: Scalable hidden trigger backdoors for neural networks trained from scratch. arXiv preprint arXiv:2106.08970, 2021.
  • [47] W. Sun, X. Zhang, H. Lu, Y.-C. Chen, T. Wang, J. Chen, and L. Lin. Backdoor contrastive learning via bi-level trigger optimization. In The Twelfth International Conference on Learning Representations, 2024.
  • [48] A. Turner, D. Tsipras, and A. Madry. Label-consistent backdoor attacks, 2019.
  • [49] M. Umer, G. Dawson, and R. Polikar. Targeted forgetting and false memory formation in continual learners through adversarial backdoor attacks. In 2020 International Joint Conference on Neural Networks (IJCNN), pages 1–8, 2020.
  • [50] M. Umer and R. Polikar. Adversarial targeted forgetting in regularization and generative based continual learning models. 2021 International Joint Conference on Neural Networks (IJCNN), pages 1–8, 2021.
  • [51] M. Umer and R. Polikar. False memory formation in continual learners through imperceptible backdoor trigger. ArXiv, abs/2202.04479, 2022.
  • [52] C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie. The caltech-ucsd birds-200-2011 dataset. Technical Report CNS-TR-2011-001, California Institute of Technology, 2011.
  • [53] B. Wang, Y. Yao, S. Shan, H. Li, B. Viswanath, H. Zheng, and B. Y. Zhao. Neural cleanse: Identifying and mitigating backdoor attacks in neural networks. In 2019 IEEE Symposium on Security and Privacy (SP), pages 707–723, 2019.
  • [54] H. Wang, K. Sreenivasan, S. Rajput, H. Vishwakarma, S. Agarwal, J. yong Sohn, K. Lee, and D. Papailiopoulos. Attack of the tails: Yes, you really can backdoor federated learning, 2020.
  • [55] L. Wang, J. Xie, X. Zhang, M. Huang, H. Su, and J. Zhu. Hierarchical decomposition of prompt-based continual learning: Rethinking obscured sub-optimality. Advances in Neural Information Processing Systems, 2023.
  • [56] Z. Wang, T. Jian, K. Chowdhury, Y. Wang, J. Dy, and S. Ioannidis. Learn-prune-share for lifelong learning, 2020.
  • [57] Z. Wang, Z. Zhang, S. Ebrahimi, R. Sun, H. Zhang, C.-Y. Lee, X. Ren, G. Su, V. Perot, J. Dy, and T. Pfister. Dualprompt: Complementary prompting for rehearsal-free continual learning, 2022.
  • [58] Z. Wang, Z. Zhang, C.-Y. Lee, H. Zhang, R. Sun, X. Ren, G. Su, V. Perot, J. Dy, and T. Pfister. Learning to prompt for continual learning, 2022.
  • [59] C. Xiang, C. R. Qi, and B. Li. Generating 3d adversarial point clouds, 2019.
  • [60] Z. Xiang, D. J. Miller, S. Chen, X. Li, and G. Kesidis. A backdoor attack against 3d point cloud classifiers, 2021.
  • [61] C. Xie, K. Huang, P.-Y. Chen, and B. Li. Dba: Distributed backdoor attacks against federated learning. In International Conference on Learning Representations, 2020.
  • [62] S. Yan, J. Xie, and X. He. Der: Dynamically expandable representation for class incremental learning, 2021.
  • [63] D. Yin, M. Farajtabar, and A. Li. Sola: Continual learning with second-order loss approximation. In Workshop of Advances in Neural Information Processing Systems, 2020.
  • [64] Y. Zeng, M. Pan, H. A. Just, L. Lyu, M. Qiu, and R. Jia. Narcissus: A practical clean-label backdoor attack with limited information, 2022.
  • [65] Z. Zhang, A. Panda, L. Song, Y. Yang, M. W. Mahoney, J. E. Gonzalez, K. Ramchandran, and P. Mittal. Neurotoxin: Durable backdoors in federated learning, 2022.
  • [66] J. Zhou, C. Wei, H. Wang, W. Shen, C. Xie, A. Yuille, and T. Kong. Image BERT pre-training with online tokenizer. In International Conference on Learning Representations, 2022.

In this supplementary material, we first review related work on continual learning, prompt-based continual learning, and backdoor attacks in Appendix A. Next, we summarize our AOP in Appendix B. Implementation details, additional experiments, and visualizations are provided in Appendices C and D, respectively. Finally, we discuss broader impacts in Appendix E.

Appendix A Related Work

Continual learning

Adapting to new knowledge is an innate human capability, yet it poses significant challenges for machine learning models. Continual learning emerges as one approach to bridge this gap between models and humans, encouraging models to continuously acquire new knowledge from new data while retaining what they have previously learned. The regularization/prior approach [21, 38, 15, 33, 19, 2, 63] effectively preserves learned knowledge by controlling the learning of the model’s parameters through a regularization term in the objective function. Architecture-based approaches [26, 34, 43, 56, 62, 1, 32] extend the model’s plasticity by expanding its network to accommodate new knowledge. Rehearsal-based approaches [4, 6, 39, 9] rely on a memory buffer to retain past knowledge. Continual learning primarily focuses on the class-incremental learning (CIL) setting, which is the most challenging and representative setting since the task boundaries are not available during inference. While rehearsal-based approaches achieve state-of-the-art performance [4] in CIL, they violate data privacy requirements as they necessitate the storage of past data.

Prompt-based continual learning

With few learnable parameters and not relying on memory buffers, prompt-based continual learning methods achieve state-of-the-art performance. These methods are especially suitable for scenarios where data privacy is crucial. Specifically, prompt-based approaches leverage the power of pre-trained models, learning only a small number of prompts to guide the model’s performance across different tasks or classes. L2P [58] is the first work to explore prompting in continual learning. It constructs a prompt pool and selects appropriate prompts for each input. Building on L2P, DualPrompt [57] employs prefix-tuning and constructs two types of prompts: task-sharing and task-specific. CODA-Prompt [45] enhances prompt selection with an adaptive attention mechanism. HiDe-Prompt [55] examines the influence of various pretraining paradigms and decomposes the objective into hierarchical components. PGP [40] uses prompt gradient projection to promote updates in orthogonal directions, effectively preventing forgetting.
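The prompt-pool selection described above can be illustrated with a minimal NumPy sketch. It assumes a key-matching rule in which each prompt is paired with a key and the top-K keys by cosine similarity to the query feature are chosen; the function name `select_prompts`, the toy shapes, and the toy values are hypothetical and do not reproduce the implementation of L2P or of any other cited method.

```python
import numpy as np

def select_prompts(query, keys, prompts, top_k):
    """Pick the top_k prompts whose keys are most cosine-similar to the query.

    query:   (d,) query feature, e.g. from a frozen pre-trained encoder (assumed)
    keys:    (n_p, d) one key per prompt
    prompts: (n_p, length, d) the prompt pool
    """
    q = query / np.linalg.norm(query)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    sims = k @ q                        # cosine similarity of every key to the query
    idx = np.argsort(-sims)[:top_k]     # indices of the top_k most similar keys
    return idx, prompts[idx], sims[idx]

# Toy pool: 3 prompts of length 4 in a 2-dimensional feature space (illustrative values).
keys = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
prompts = np.arange(3 * 4 * 2, dtype=float).reshape(3, 4, 2)
query = np.array([0.9, 0.1])

idx, selected, sims = select_prompts(query, keys, prompts, top_k=2)
```

Because the selection depends only on the query feature and the keys, an input that shifts the query toward particular keys also changes which prompts are attached to it, which is the property the attack exploits when it manipulates prompt selection.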

Backdoor attack

A backdoor attack aims to cause a model to misbehave according to an adversary’s target when the input data contains a specific backdoor trigger, while still performing normally on clean input data. Backdoor attacks have been explored in different settings and under various threat models, which specify what the attacker can access. In a black-box setting [42, 46, 27, 47], the attacker has no control over the training process and only has access to the dataset, which they then poison and release to the victim. Another line of work [42, 46, 64, 48] assumes that the attacker cannot flip the labels of the dataset (clean-label). Recently, attackers’ control has been limited to data-constrained scenarios where they only have access to a small proportion of data. For example, [64] employs a surrogate clean model to optimize a clean-label backdoor trigger, while [30] leverages the zero-shot capabilities of the CLIP model to suppress clean features and augment the poisoning features. Additionally, [18] observes that even when a surrogate poisoned model is trained with carefully alternated updates, the optimized backdoor trigger tends to become adversarial noise.

Previous works on backdoor attacks against continual learning have primarily focused on non-targeted attacks, aiming to degrade the model’s performance in general. These studies typically explore task-incremental and domain-incremental settings using various approaches. For instance, [24] describes a white-box attack where the attacker has control over the training model and seeks to force the neural network to forget previously learned knowledge. Other works, such as [49, 50, 51], focus on regularization-based and replay-based learners in domain-incremental and task-incremental learning scenarios, aiming to degrade the performance of the first task. Similarly, [24] and [20] aim to undermine the performance of continual learners. In contrast, our work focuses on targeted backdoor attacks. We aim to manipulate the attacked learner to classify poisoned data from any task into a desired target class while maintaining high accuracy on clean data. Furthermore, our research emphasizes state-of-the-art prompt-based continual learning and tackles the most challenging setting in continual learning, which is class-incremental learning.

Appendix B AOP end-to-end pipeline

In this Appendix, we provide an overview of the key algorithms utilized in AOP. Specifically, Algorithm 1 details the process for prompt tuning, Algorithm 2 outlines the method for trigger optimization, and Algorithm 3 presents the comprehensive end-to-end pipeline of AOP.

Figure 4: Visualizations of (a) the clean image, (b) the backdoor trigger, and (c) the poisoned image.
Algorithm 1: Prompt Tuning
Input: (1) Surrogate model $f$
       (2) Dataset $\mathcal{D}$
       (3) Prompt components $\boldsymbol{P}=\left\{\left(\boldsymbol{k}_{1},\boldsymbol{p}_{1}\right),\left(\boldsymbol{k}_{2},\boldsymbol{p}_{2}\right),\cdots,\left(\boldsymbol{k}_{n_{p}},\boldsymbol{p}_{n_{p}}\right)\right\}$
       (4) Query function $q$
       (5) Cosine similarity $\gamma$
       (6) Top-K selected keys $\mathbf{K}_{\boldsymbol{x}}$
       (7) Number of iterations for prompt tuning $\mathcal{K}$
       (8) Learning rate $\alpha>0$
Output: The optimized prompts $\boldsymbol{P}^{*}$
/* Initialization */
Initialize $\boldsymbol{P}^{0}$ with the input $\boldsymbol{P}$;
$k\leftarrow 0$;
while $k<\mathcal{K}$ do
    /* Update prompts */
    $\boldsymbol{P}^{k+1}\leftarrow\boldsymbol{P}^{k}-\alpha\sum_{(\boldsymbol{x},y)\in\mathcal{D}}\nabla_{\boldsymbol{P}}\mathcal{L}(f(\boldsymbol{x};\boldsymbol{P}),y)-\lambda\sum_{\mathbf{K}_{\boldsymbol{x}}}\gamma\left(q(\boldsymbol{x}),\boldsymbol{k}_{i}\right)$;
    $k\leftarrow k+1$;
end while
return $\boldsymbol{P}^{*}$
Algorithm 2: Trigger Optimization
Input: (1) Surrogate model $f$
       (2) Target class data samples $\mathcal{D}_{m}=\left\{\left(\boldsymbol{x},y\right)\mid y=c_{m}\right\}$
       (3) Prompt components $\boldsymbol{P}$
       (4) Trigger $\boldsymbol{\delta}$
       (5) Criterion $\boldsymbol{\zeta}$
       (6) Allowable set of trigger patterns $\Delta$
       (7) Number of iterations for trigger optimization $\mathcal{I}$
       (8) Learning rate $\eta>0$
Output: The optimized adaptive trigger $\boldsymbol{\delta}^{*}$
/* Initialization */
$\boldsymbol{\delta}_{0}\leftarrow\boldsymbol{\delta}$;
$i\leftarrow 0$;
while $i<\mathcal{I}$ do
    /* Update trigger */
    $\boldsymbol{\delta}_{i+1}\leftarrow\boldsymbol{\delta}_{i}-\eta\sum_{(\boldsymbol{x},c_{m})\in\mathcal{D}_{m}}\nabla_{\boldsymbol{\delta}}\mathcal{L}(f(\boldsymbol{x}+\boldsymbol{\delta};\boldsymbol{P}),c_{m})$;
    /* Constrain trigger to the $\ell_{p}$-norm ball */
    $\boldsymbol{\delta}_{i+1}\leftarrow Proj_{\Delta}(\boldsymbol{\delta}_{i+1})$;
    $i\leftarrow i+1$;
end while
return $\boldsymbol{\delta}^{*}$
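The update-then-project loop of Algorithm 2 can be sketched in NumPy under strong simplifying assumptions: the surrogate model is replaced by a toy linear softmax classifier with hypothetical weights `W` and `b` (so the gradient with respect to the trigger is available in closed form), the criterion is cross-entropy, and the allowable set $\Delta$ is taken to be an $\ell_\infty$ ball of radius `eps`. None of these choices is taken from the paper's implementation; they only illustrate projected gradient descent on a shared trigger.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def target_loss(W, b, X, delta, target):
    """Summed cross-entropy of the triggered inputs X + delta w.r.t. the target class."""
    probs = softmax((X + delta) @ W.T + b)
    return float(-np.log(probs[:, target]).sum())

def optimize_trigger(W, b, X, target, eps, lr, steps):
    """Projected gradient descent on one trigger shared by all samples in X."""
    delta = np.zeros(X.shape[1])
    onehot = np.zeros(W.shape[0])
    onehot[target] = 1.0
    for _ in range(steps):
        probs = softmax((X + delta) @ W.T + b)
        # Gradient of the summed cross-entropy w.r.t. delta for a linear model.
        grad = ((probs - onehot) @ W).sum(axis=0)
        delta = delta - lr * grad
        # Projection onto the l_inf ball of radius eps (assumed form of Delta).
        delta = np.clip(delta, -eps, eps)
    return delta

# Toy data: 8 samples, 5 features, 3 classes; class 2 plays the role of c_m.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))
b = np.zeros(3)
X = rng.normal(size=(8, 5))

delta = optimize_trigger(W, b, X, target=2, eps=0.5, lr=0.005, steps=200)
loss_before = target_loss(W, b, X, np.zeros(5), target=2)
loss_after = target_loss(W, b, X, delta, target=2)
```

In this sketch the clipping step plays the role of $Proj_{\Delta}$; for an $\ell_2$ ball the projection would instead rescale the trigger whenever its norm exceeds the radius.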
Algorithm 3: AOP End-to-end Pipeline
Input: (1) Initial surrogate model $f$
       (2) Prompt pool $\mathbf{P}$
       (3) Target class $c_{m}$
       (4) Target class data samples $\mathcal{D}_{m}$
       (5) Surrogate dataset $\mathcal{D}_{\text{surrogate}}$
       (6) Number of iterations for full optimization $\mathcal{E}$
Output: The optimized adaptive trigger $\boldsymbol{\delta}^{*}$
/* Partition the surrogate dataset into two subsets */
$\mathcal{D}_{\text{surrogate}}=\mathcal{D}_{\text{static}}\cup\mathcal{D}_{\text{dynamic}}$
/* Static surrogate stage */
$\mathbf{P}\leftarrow$ PromptTuning($f$, $\mathcal{D}_{\text{static}}\cup\mathcal{D}_{m}$, $\mathbf{P}$)
/* Static trigger optimization */
Initialize $\boldsymbol{\delta}$;
$\boldsymbol{\delta}\leftarrow$ TriggerUpdate($f$, $\mathcal{D}_{m}$, $\mathbf{P}$, $\zeta$ = CE);
/* Transition stage */
$\mathbf{P}\leftarrow$ PromptTuning($f$, $\mathcal{D}_{\text{dynamic}}$, $\mathbf{P}$, $\boldsymbol{\delta}$)
/* Dynamic stage */
$e\leftarrow 0$;
while $e<\mathcal{E}$ do
    /* Update trigger */
    $\boldsymbol{\delta}_{e+1}\leftarrow$ TriggerUpdate($f$, $\mathcal{D}_{m}$, $\mathbf{P}_{e}$, $\boldsymbol{\delta}_{e}$, $\zeta$ = BCE);
    /* Update malicious prompt */
    $\mathbf{P}_{e+1}\leftarrow$ PromptTuning($f$, $\mathcal{D}_{\text{dynamic}}$, $\mathbf{P}_{e}$);
    $e\leftarrow e+1$;
end while
return $\boldsymbol{\delta}^{*}$
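The control flow of the end-to-end pipeline (static stage, static trigger optimization, transition stage, then alternating dynamic rounds) can be summarized by the following Python skeleton. It is only a sketch of the ordering of steps: `prompt_tuning` and `trigger_update` are caller-supplied placeholders standing in for Algorithms 1 and 2, and their signatures, the function name `aop_pipeline`, and the stub functions used to exercise it are all hypothetical.

```python
def aop_pipeline(prompt_tuning, trigger_update, init_trigger,
                 d_static, d_dynamic, d_target, prompts, num_rounds):
    """Order of operations only; the two callables hide all model details."""
    # Static surrogate stage: tune prompts on the static split plus target data.
    prompts = prompt_tuning(d_static + d_target, prompts, None)
    # Static trigger optimization with the cross-entropy criterion.
    delta = trigger_update(d_target, prompts, init_trigger, "CE")
    # Transition stage: tune prompts on the dynamic split with the trigger present.
    prompts = prompt_tuning(d_dynamic, prompts, delta)
    # Dynamic stage: alternate trigger updates (BCE criterion) and prompt updates.
    for _ in range(num_rounds):
        delta = trigger_update(d_target, prompts, delta, "BCE")
        prompts = prompt_tuning(d_dynamic, prompts, None)
    return delta

# Stub callables that only record the call order (purely illustrative).
log = []

def fake_prompt_tuning(data, prompts, delta):
    log.append(("prompt", len(data), delta is not None))
    return prompts + 1

def fake_trigger_update(data, prompts, delta, criterion):
    log.append(("trigger", criterion))
    return delta + 1

result = aop_pipeline(fake_prompt_tuning, fake_trigger_update, 0,
                      [1, 2], [3], [4], 0, num_rounds=2)
```

Passing the two steps as callables keeps the skeleton independent of any particular learner, which mirrors the fact that the pipeline only needs a surrogate that can be tuned and differentiated through.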

Appendix C Implementation Details

In this section, we provide the implementation details of all experiments.

Victim prompt-based Learners

Our implementations of L2P, DualPrompt, L2P-PGP, and DualPrompt-PGP are based on the source code provided by [40]. The implementations of HiDe and CODA-Prompt are based on the original papers by [55] and [45], respectively. All experiments were conducted on NVIDIA V100 GPUs. For all victim learners, we utilize the Adam optimizer with $\beta_{1}=0.9$ and $\beta_{2}=0.999$.

For the L2P and L2P-PGP methods, we train the victim learner on the 5-Split-CUB200 dataset for 5 epochs per task, using a batch size of 16 and a prompt length of 5. When training on the 5/10/20-Split-ImageNet-R datasets, the number of epochs per task increases to 50, with a prompt length of 20. For DualPrompt and DualPrompt-PGP, training on the 5-Split-CUB200 dataset involves 10 epochs per task, with a prompt length of 5 and a batch size of 24. For the 5/10/20-Split-ImageNet-R datasets, these methods are trained for 50 epochs per task, with a prompt length of 20 and a batch size of 24. The HiDe-Prompt method employs 10 prompts, each with a length of 20, across all Split-ImageNet-R variants, training the main architecture for 50 epochs with a batch size of 24. Lastly, the CODA-Prompt method uses a configuration with 50 prompts, a pool size of 100, and a prompt length of 8.

The training times for the six incremental learners on the 5/10/20-Split-ImageNet-R datasets range from 8 to 10 hours. For the Split-CUB200 dataset, the training times for L2P, L2P-PGP, DualPrompt, and DualPrompt-PGP are 0.5 hours, 1 hour, 1.5 hours, and 2 hours, respectively.

Backdoor framework

Our surrogate learner adopts the same settings as L2P. In the initial stage, training spans 5 epochs. Stage 2 focuses on trigger optimization, utilizing the RAdam optimizer for 100 epochs with a learning rate of $0.01$. Stage 3 follows a training setting akin to stage 1. Subsequently, we initiate the dynamic stage, where the surrogate learner undergoes an update for one epoch after every 20 rounds of trigger optimization. This dynamic stage iterates for 10 rounds during attacks on Split-ImageNet-R and 5 rounds for Split-CUB200. For Split-ImageNet-R, the training times for stages (1) and (3) are both 2 hours, stage (2) takes 0.2 hours, and stage (4) takes 8 hours. For Split-CUB200, the training times for the four stages are 2 hours, 0.1 hours, 2 hours, and 5 hours, respectively.

Appendix D Additional Experiments

D.1 Further discussion on AOP

In this Appendix, we discuss the differences in ASR when using AOP to backdoor prompt-based continual learners. As shown in Table 2, in most experiments, L2P and L2P-PGP achieve the highest ASR, followed by DualPrompt and DualPrompt-PGP.

Firstly, our surrogate prompt uses the same prompt techniques and objectives as L2P, which explains why L2P attains the highest ASR. DualPrompt introduces shared-task prompts, which might affect the ASR when updated with new classes. Additionally, unlike L2P, DualPrompt uses prefix tuning, which could cause the slight decrease in ASR. However, the ASR of DualPrompt remains higher than $96\%$, highlighting the potential for backdoor transfer between different prompt techniques. The two versions of PGP achieve performance similar to the original ones, as PGP focuses only on the update direction of prompts.

Compared to the above four versions, HiDe and CODA-Prompt show lower performance. The lower ASR of HiDe might result from using iBOT-1K as the pre-trained model for HiDe, which differs from the other learners and our surrogate learner. As prompting serves as label mapping, different source datasets might influence the mapping and thus the backdoor performance. Lastly, CODA suffers from the lowest ASR and the highest standard deviation. This is due to CODA’s prompt selection mechanism, which uses an attention mechanism to get the weighted summation of all prompts, differing from the other methods.

D.2 Additional comparison between AOP and baseline

Narcissus [64] also assumes that the attacker only has access to target data. It employs a public dataset as a surrogate dataset and optimizes the trigger using the clean surrogate dataset. Our work is motivated by Narcissus: we extend the use of a surrogate dataset to the prompting context and exploit the label mapping property. Additionally, we employ dynamic stages and adopt BCE to prevent the trigger from becoming adversarial noise.

In Table 1, we compare AOP and Narcissus, showing that Narcissus experiences catastrophic forgetting. To provide further discussion, in Figure 5, we visualize the ASR flow for each task between our AOP and Narcissus. We trained Narcissus using the same dataset and the same number of epochs as in stages (1) and (2) of our AOP. As visualized in Figure 5, although Narcissus initially achieves high performance, it tends to experience catastrophic forgetting over time. Consequently, the performance gap between AOP and Narcissus increases as the training process continues.

Figure 5: Comparison of ASR history for each task during the incremental learning process between AOP and Narcissus, using CODA-Prompt with 10-Split-ImageNet-R dataset for visualization.

Figure 6: Evaluation of AOP against Neural Cleanse. Results are reported at three checkpoints: tasks 1, 5, and 10, when attacking L2P on Split-ImageNet-R.

Figure 7: Comparison of AOP against STRIP. The results are visualized based on the attacked L2P on the 10-Split-ImageNet-R dataset.

D.3 Defenses

In this Appendix, we evaluate the robustness of AOP against several popular defenses, namely Neural Cleanse, STRIP, and FST.

Neural Cleanse

Neural Cleanse [53] is a widely used model defense. Specifically, for each class, Neural Cleanse optimizes a trigger that induces all data to be misclassified to the target class. It then detects backdoor models by checking for abnormally small patterns among the optimized triggers using the Anomaly Index with a flag threshold of 2. We experimented with Neural Cleanse on 10-Split-ImageNet-R using checkpoint models from tasks 1, 5, and 10. AOP successfully passed Neural Cleanse as in Figure 6.
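The Anomaly Index used in the detection step can be illustrated as follows. This is a minimal sketch of a median-absolute-deviation (MAD) outlier test applied to the $\ell_1$ norms of the per-class reverse-engineered triggers, with the usual consistency constant 1.4826 for normally distributed data; the function names and the toy norms are hypothetical, and the trigger reverse-engineering step itself is not shown.

```python
import numpy as np

def anomaly_indices(l1_norms, consistency=1.4826):
    """Absolute deviation from the median, in units of the scaled MAD."""
    norms = np.asarray(l1_norms, dtype=float)
    median = np.median(norms)
    # Note: mad is zero if more than half of the norms coincide; not handled here.
    mad = consistency * np.median(np.abs(norms - median))
    return np.abs(norms - median) / mad

def flagged_classes(l1_norms, threshold=2.0):
    """Classes whose trigger norm is abnormally small (index above the threshold)."""
    norms = np.asarray(l1_norms, dtype=float)
    idx = anomaly_indices(norms)
    mask = (idx > threshold) & (norms < np.median(norms))
    return [int(c) for c in np.where(mask)[0]]

# Toy trigger norms for 7 classes (illustrative values, not experimental results).
suspicious_norms = [100, 98, 103, 101, 20, 99, 102]   # class 4 needs a tiny trigger
benign_norms = [100, 98, 103, 101, 97, 99, 102]

flagged_suspicious = flagged_classes(suspicious_norms)
flagged_benign = flagged_classes(benign_norms)
```

A model passes the test when no class is flagged, that is, when no class admits a reverse-engineered trigger that is much smaller than those of the other classes.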

STRIP

STRIP [16] is a popular test-time defense method. Given the model and a suspicious input, STRIP perturbs the input using a set of clean images from different classes and records the prediction entropy over the perturbed images. STRIP flags images as poisoned if the predictions are consistent, indicated by low entropy. We visualized the entropy of our AOP on ImageNet-R using checkpoint models from tasks 1 to 10 in Figure 7, and observed that our backdoored models exhibited a similar entropy range to benign ones, thereby passing the STRIP test.
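The entropy statistic that STRIP relies on can be sketched as below. The sketch assumes a simple convex blend with weight `alpha` between the suspicious input and each clean image, and it uses a toy softmax "model" (`toy_predict_proba`) in which a large value in one feature dominates the prediction, mimicking a trigger; both are illustrative assumptions rather than the setup used in our experiments.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def toy_predict_proba(batch):
    """Stand-in classifier: class scores proportional to the input features."""
    return softmax(2.0 * batch)

def strip_entropy(predict_proba, x, clean_images, alpha=0.5):
    """Mean Shannon entropy (bits) of predictions on x blended with clean images."""
    blended = alpha * x[None, :] + (1.0 - alpha) * clean_images
    probs = predict_proba(blended)
    entropy = -(probs * np.log2(np.clip(probs, 1e-12, 1.0))).sum(axis=1)
    return float(entropy.mean())

# Three clean "images", one per class, in a 3-feature toy space.
clean_images = np.eye(3)
benign = np.array([1.0, 0.0, 0.0])
triggered = np.array([0.0, 0.0, 10.0])   # a dominant feature plays the trigger

h_benign = strip_entropy(toy_predict_proba, benign, clean_images)
h_triggered = strip_entropy(toy_predict_proba, triggered, clean_images)
```

In the toy case the triggered input keeps its prediction under every blend and therefore yields near-zero entropy, whereas the benign input does not. A backdoor evades STRIP when its poisoned inputs produce entropies in the same range as benign ones.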

FST

We evaluated AOP against a robust fine-tuning-based defense method, FST [35]. FST operates by storing a small amount of clean data to fine-tune the model, reinitializing the classifier weights, and encouraging deviation from the originally compromised weights. We report the performance of FST with respect to different fine-tune data ratios as in the original paper ($2\%$ and $5\%$) and varying weights on the deviation regularizer.

We found that FST was successful in mitigating AOP, confirming its effectiveness in addressing backdoor knowledge. However, we observed that reinitializing the classifier weights results in significant forgetting, causing a considerable drop in accuracy. Thus, FST is impractical because it severely hurts the utility of the continual model while lacking verification of whether an attack exists. Furthermore, it is essential to note that FST conflicts with our data privacy prioritization scenario, as it requires storing data from all tasks.

We hope our findings will inspire the development of strong defense methods compatible with multi-data supplier scenarios while upholding data privacy in continual learning.

Discussion on potential defenses

As observed in Figures 1(b) and 1(b), poisoned samples consistently exhibit queries for specific prompt IDs, while clean samples demonstrate a more balanced distribution in prompt frequency selection. Consequently, potential defenses against AOP may involve monitoring the frequency selections of test samples during inference. A backdoor flag can be raised if biases in prompt selection frequencies are observed in suspected input samples. Furthermore, drawing inspiration from Fine-Pruning techniques [44], which prune inactive neurons when predicting clean images, one could extend this approach to Prompt-Pruning, effectively eliminating inactive prompts.
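A frequency-monitoring check of the kind suggested above could look like the following sketch. It assumes that the defender can log the IDs of the top-K prompts selected for each test sample and flags a batch when a single prompt is selected by more than a fixed fraction of the samples; the function names, the threshold `max_rate`, and the toy ID lists are hypothetical and have not been evaluated.

```python
import numpy as np

def selection_rate(selected_ids, pool_size):
    """Fraction of samples whose top-K selection contains each prompt ID.

    selected_ids: (n_samples, K) integer array; IDs within a row are distinct.
    """
    ids = np.asarray(selected_ids)
    counts = np.bincount(ids.ravel(), minlength=pool_size)
    return counts / ids.shape[0]

def is_suspicious(selected_ids, pool_size, max_rate=0.8):
    """Raise a flag when one prompt is selected by more than max_rate of the samples."""
    return bool(selection_rate(selected_ids, pool_size).max() > max_rate)

# Toy logs with a pool of 10 prompts and K = 2 (illustrative values).
clean_ids = [[0, 3], [1, 4], [2, 5], [6, 7], [8, 9]]     # balanced selection
suspect_ids = [[2, 7], [2, 7], [2, 5], [2, 7]]           # prompt 2 always selected

clean_flag = is_suspicious(clean_ids, pool_size=10)
suspect_flag = is_suspicious(suspect_ids, pool_size=10)
```

A practical detector would need a threshold calibrated on clean data, since naturally skewed selection, for example when all test samples come from one class, could otherwise trigger false alarms.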

Table 5: ACC and ASR of AOP on L2P with 10-Split-ImageNet-R when applying FST as the defense method. Here, $\alpha$ represents the weight of the feature shifting regularization, and $N$ denotes the number of samples saved for finetuning.

| Setting | ACC ($N=600$, $2.5\%$) | ASR ($N=600$, $2.5\%$) | ACC ($N=1200$, $5\%$) | ASR ($N=1200$, $5\%$) |
|---|---|---|---|---|
| $\alpha=2e-5$, # epochs = 10 | 41.78 | 0.00 | 56.87 | 0.00 |
| $\alpha=2e-5$, # epochs = 20 | 38.88 | 0.00 | 53.75 | 0.00 |
| $\alpha=2e-4$, # epochs = 10 | 40.53 | 0.0 | 55.31 | 0.0 |
Table 6: Backdoor performance when varying poison rates on 10-Split-ImageNet-R. $P$ denotes the number of poisoned images during training and $\gamma$ is the corresponding poisoning rate.

| Learner | $P=0$ ($\gamma=0\%$) | $P=2$ ($\gamma=0.01\%$) | $P=5$ ($\gamma=0.02\%$) | $P=25$ ($\gamma=0.1\%$) | $P=100$ ($\gamma=0.5\%$) |
|---|---|---|---|---|---|
| L2P | 0.00 | 13.76 | 91.86 | 99.56 | 99.99 |
| L2P-PGP | 0.00 | 10.08 | 90.77 | 99.36 | 99.94 |

D.4 Sensitivity to poisoning rates

We validate the sensitivity of AOP with respect to varying poisoning rates. We emphasize that this factor is particularly crucial in the context of backdooring CL, where the adversary only has access to the target class data, a small proportion of the overall dataset. Therefore, maintaining backdoor effectiveness with a low poisoning rate is essential. As reported in Table 6, AOP achieves over $90\%$ ASR with a poisoning rate as low as $0.02\%$ (5 poisoned images), while the ASR drops to roughly $10$ to $14\%$ at $0.01\%$ (2 poisoned images). This highlights the efficacy of our method in scenarios with minimal poisoning.

Appendix E Broader Impacts

Our research contributes to the research community and AI systems by exploring the potential for targeted backdoor attacks in continual learning settings. By shedding light on the capabilities of such attacks, we heighten awareness about the backdoor threat, especially in private multi-data supplier scenarios. This heightened awareness encourages the search for protections and defenses against backdoor manipulation, a crucial step toward enhancing the safety and trustworthiness of AI systems.

Nonetheless, it is essential to acknowledge that our findings could inadvertently provide insights for attackers seeking to exploit continual learners with backdoors. However, we believe that strong and efficient defense mechanisms will emerge to safeguard continual learners against such threats. Consequently, we expect the positive impact of our research to outweigh its potential negative repercussions.