License: CC BY-NC-SA 4.0
arXiv:2401.10458v1 [cs.LG] 19 Jan 2024

Contrastive Unlearning: A Contrastive Approach to Machine Unlearning

Hong kyu Lee$^{1}$, Qiuchen Zhang$^{2}$, Carl Yang$^{3}$, Jian Lou$^{4}$, Li Xiong$^{5}$
$^{1,2,3,5}$Emory University
$^{4}$Zhejiang University
{hong.kyu.lee, qiuchen.zhang, j.carlyang}@emory.edu, [email protected], [email protected]
Abstract

Machine unlearning aims to eliminate the influence of a subset of training samples (i.e., unlearning samples) from a trained model. Effectively and efficiently removing the unlearning samples without negatively impacting the overall model performance is still challenging. In this paper, we propose a contrastive unlearning framework, leveraging the concept of representation learning for more effective unlearning. It removes the influence of unlearning samples by contrasting their embeddings against the remaining samples so that they are pushed away from their original classes and pulled toward other classes. By directly optimizing the representation space, it effectively removes the influence of unlearning samples while maintaining the representations learned from the remaining samples. Experiments on a variety of datasets and models on both class unlearning and sample unlearning showed that contrastive unlearning achieves the best unlearning effects and efficiency with the lowest performance loss compared with the state-of-the-art algorithms.

1 Introduction

Machine unlearning Cao and Yang (2015) aims to remove a subset of data (i.e., unlearning samples) from a trained machine learning (ML) model and has received increasing attention due to various privacy regulations. Notably, “the right to be forgotten” from the General Data Protection Regulation (GDPR) gives individuals the right to request that their data be removed from databases, which extends to models trained on such data Mantelero (2024). Since models can remember training data within their parameters Arpit et al. (2017), it is necessary to “unlearn” these data from a trained model. The goals and evaluation metrics for unlearning typically include: 1) unlearning effectiveness, which measures how well the algorithm removes the influence of unlearning samples. This can be assessed by the model’s performance on the unlearning samples (where low accuracy indicates effective unlearning), or by its robustness against membership inference attacks Shokri et al. (2017) using unlearning samples (where a low member prediction rate indicates effective unlearning); 2) model performance on its original tasks, which ensures that the unlearning does not significantly degrade its overall accuracy; and 3) computational efficiency, which assesses the time and resources required for the unlearning.

Current machine unlearning approaches can be categorized into exact unlearning and approximate unlearning. Exact unlearning ensures all influence of the unlearning data is removed as if the data were never part of the training set. Retraining the model from scratch excluding the unlearning samples is a baseline method which can be computationally expensive. SISA is an exact unlearning method based on data partitioning and retraining Bourtoule et al. (2021) which alleviates the computational intensity of complete retraining, but requires training of multiple models (on each partition) and retraining of the models containing the unlearning data, and its partitioning strategy may lead to reduced model performance. Approximate unlearning offers a more feasible alternative and seeks to remove the influence of the unlearning data to a negligible level, typically by updating the model in a way that diminishes the impact of the unlearning data. A subcategory is certified unlearning Gupta et al. (2021); Guo et al. (2020); Neel et al. (2024) which provides a quantifiable approximation guarantee on the removal of the data. Most approximate unlearning methods use the evaluation metrics discussed earlier to empirically evaluate the models.

While many promising approaches have been proposed, existing works have several limitations: 1) they mainly exploit the input and output space and a typical classification loss without explicitly considering the latent representations of the samples; 2) they either focus on unlearning samples or remaining samples alone, or use both but in a way that is ineffective for unlearning, and hence sacrifice either model performance or unlearning effectiveness. For example, NegGrad (Negative Gradient) Golatkar et al. (2020) only uses unlearning samples and attempts to reverse their impact by applying gradient ascent using the classification loss. Finetune Golatkar et al. (2020) only uses remaining samples to iteratively retrain the model and gradually remove the information of unlearning samples, leveraging the catastrophic forgetting effect Goodfellow et al. (2013). SCRUB Kurmanji et al. (2023) uses both unlearning and remaining samples for unlearning, but requires multiple iterations over all remaining samples, which leads to excessive computation with poor unlearning effectiveness.

Our Contributions. To address these deficiencies, we present a novel contrastive approach for machine unlearning. We re-purpose the idea of contrastive learning, a widely used representation learning approach, for more effective unlearning. The main idea is that given an unlearning sample, we contrast it with 1) positive samples (remaining samples from the same class as the unlearning sample) and push their representations apart from each other, and 2) negative samples (remaining samples from classes different from that of the unlearning sample) and pull their representations close to each other. This design rests on two main insights. First, it exploits the representation space of the samples and directly optimizes the geometric properties of the embeddings of unlearning samples, which capture the underlying structures and most important features of the samples being memorized, facilitating more effective unlearning. Second, by contrasting unlearning samples with remaining samples during unlearning and using both positive and negative remaining samples as references for optimizing the embeddings of unlearning samples, it can effectively remove the influence of unlearning samples while keeping the embeddings of the remaining samples intact, with an auxiliary classification loss on the contrasted remaining samples, hence maintaining model accuracy.

Although it takes inspiration from contrastive learning, our contrastive unlearning has novel algorithm designs and yields a new finding: 1) we construct contrasting pairs differently from conventional contrastive learning to serve the unlearning purpose and further design new contrastive unlearning losses for both sample unlearning and single class unlearning tasks; 2) while it appears common to add a classification loss to maintain the performance of the unlearning model, through the new lens of contrastive unlearning we find that the classification loss can help keep the embeddings of the remaining samples in place and reciprocally improve unlearning effectiveness, which we validate empirically and then analyze in depth.

Figure 1: Visualization of representation spaces of contrastive unlearning, gradient ascent, and finetune.

Figure 1 illustrates the intuition of contrastive unlearning in comparison to existing approaches in a normalized representation space. Circles and squares are embeddings of the unlearning samples and remaining samples, respectively. The colors represent different classes. We assume the model has been trained, so the embeddings are clustered by class Das and Chaudhuri (2024). Given an embedding $z_i$ of an unlearning sample, contrastive unlearning pushes $z_i$ away from its own class (positive pairs) and pulls $z_i$ towards the samples of different classes (negative pairs). As a result, the unlearned embedding $z'_i$ is geometrically distant from every class (achieving effective unlearning) while the embeddings of remaining samples stay relatively intact (maintaining model utility). In comparison, NegGrad attempts to reverse the impact of unlearning samples in the input and output space using classification loss. It has some impact pushing $z_i$ away in the representation space but is not very effective as it changes the decision boundary of classes (ineffective unlearning). In addition, it may significantly affect embeddings of remaining samples of the same class (model utility loss). Finetune attempts to retrain the model only using remaining samples. In representation space, this only indirectly pushes the unlearning samples away from the remaining samples (ineffective unlearning) and is susceptible to overfitting to the remaining samples (model utility loss).

We conduct comprehensive experiments on both class unlearning and sample unlearning to demonstrate the effectiveness and versatility of our approach in comparison to state-of-the-art methods. Single class unlearning removes all samples of a class, and sample unlearning removes arbitrarily selected samples. Experimental results on model accuracy show that contrastive unlearning achieves the most effective unlearning (low model accuracy on unlearning samples, comparable to the retrained model) while maintaining model utility (high model accuracy on test samples), with high computation efficiency. In addition, we conduct a membership inference attack (MIA) Shokri et al. (2017) for deeper verification of unlearning. We assume a strong adversary who has full access to the unlearned model, simulating an administrator who conducted unlearning and wants to verify its effectiveness Thudi et al. (2022); Cotogni et al. (2023). Contrastive unlearning has the lowest member prediction rate on unlearning samples compared to all baselines, indicating the most effective unlearning.

In summary, our contributions are as follows.

  1. We propose contrastive unlearning, an unlearning algorithm utilizing the concept of contrastive learning. It directly optimizes the geometric properties of embeddings of unlearning samples by contrasting them with embeddings of the remaining samples in the representation space. This effectively captures and removes the most important features relevant for classification from the embeddings of the unlearning samples (achieving effective unlearning) while keeping the embeddings of remaining samples relatively intact (maintaining model utility). In addition, we design contrastive unlearning losses for both single class unlearning and sample unlearning.

  2. We conduct comprehensive experiments comparing contrastive unlearning with various state-of-the-art methods on two unlearning tasks, single class and sample unlearning, to demonstrate the effectiveness and versatility of our approach. We also conduct a membership inference attack to verify the effectiveness of unlearning. The results show that contrastive unlearning is most effective in unlearning while maintaining model utility with high computation efficiency.

2 Related Works

Machine unlearning was introduced by Cao and Yang (2015). They defined two unlearning goals: completeness requires that an unlearning algorithm reverse the influence of unlearning samples, so that the unlearned model is consistent with a model retrained on all training samples except the unlearning samples; timeliness requires that the unlearning algorithm run faster than retraining. On top of that, the unlearned model should incur low performance loss due to the unlearning.

Exact unlearning ensures the completeness of unlearning. SISA is an exact unlearning framework based on sharding. It splits the dataset into multiple shards and trains a model for each shard. Given an unlearning request, it retrains the models whose shards contain the unlearning samples Bourtoule et al. (2021). ARCANE partitions the dataset by the classes of the samples Yan et al. (2022). These frameworks require partitioned training and still incur expensive retraining, and model performance is highly dependent on the partitioning strategy Koch and Soll (2023).
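The sharding idea described above can be sketched as follows. This is a simplified illustration only: the round-robin assignment and the function names (`assign_shards`, `shards_to_retrain`, `remaining_per_shard`) are our assumptions, and the slicing and aggregation stages of the full SISA framework are omitted.

```python
from collections import defaultdict


def assign_shards(num_samples, num_shards):
    """Hypothetical round-robin assignment: sample index -> shard id."""
    return {i: i % num_shards for i in range(num_samples)}


def shards_to_retrain(shard_of, unlearning_ids):
    """Shards containing at least one unlearning sample.

    Only the models trained on these shards need retraining.
    """
    return sorted({shard_of[i] for i in unlearning_ids})


def remaining_per_shard(shard_of, unlearning_ids):
    """Group the remaining sample indices by shard for retraining."""
    forget = set(unlearning_ids)
    groups = defaultdict(list)
    for idx, shard in shard_of.items():
        if idx not in forget:
            groups[shard].append(idx)
    return dict(groups)


shard_of = assign_shards(num_samples=10, num_shards=3)
affected = shards_to_retrain(shard_of, unlearning_ids=[4, 7])
print("shards to retrain:", affected)
print("remaining data per shard:", remaining_per_shard(shard_of, [4, 7]))
```

The cost of an unlearning request therefore depends on how many shards the unlearning samples touch, which is one reason the partitioning strategy matters.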

Approximate unlearning allows approximate completeness. Certified unlearning provides a mathematical guarantee on the approximation. Guo et al. (2020) propose $(\varepsilon,\delta)$-indistinguishability, a notion similar to differential privacy, achieved with a Newton-type Hessian update. Neel et al. (2024) propose an algorithm based on projected gradient descent on the partitioned dataset with a probabilistic bound. Approximation guarantees are also useful for graph unlearning Wu et al. (2023). Gupta et al. (2021) studied unlearning requests that may be correlated and derived an unlearning guarantee for adaptive unlearning streams. Fisher unlearning uses the Fisher information matrix Golatkar et al. (2020) to identify optimal noise to remove the influence of unlearning samples. A general drawback of certified unlearning algorithms is that they are difficult to scale to neural networks, as convexity of the loss function is often required to satisfy the mathematical guarantee. They are also computationally expensive. Despite some efforts such as LCODEC Mehta et al. (2022) to alleviate the computation cost by selectively generating Hessians, computing the Fisher information and Hessian matrices remains expensive.

Another body of approximate unlearning shows the unlearning effect through empirical evaluations. Besides NegGrad Golatkar et al. (2020) and Finetune Golatkar et al. (2020) discussed earlier, several methods are designed to unlearn an entire class. UNSIR Tarun et al. (2023) conducts noisy gradient updates using the unlearning class. Boundary unlearning Chen et al. (2023) unlearns an entire class by changing decision boundaries. ERM-KTP uses a special neural architecture known as the entanglement-reduced mask Lin et al. (2023). SCRUB Kurmanji et al. (2023) is based on a teacher-student network in which the teacher (the original model) transfers knowledge to the unlearned model in every class except the unlearning class.

Our approach is an approximate unlearning method that works for both sample and class unlearning. We compare it with both types of methods, as well as empirical and certified methods, and demonstrate its superiority through empirical evaluations.

3 Preliminaries and Problem Definition

3.1 Contrastive Learning

Contrastive learning, in particular SimCLR, pulls an embedding of a sample (or an anchor) toward the embedding of its augmented self and pushes it away from embeddings of other samples Chen et al. (2020). SimCLR was originally proposed as a self-supervised learning framework, hence it does not rely on labels. To enhance classification performance, Supervised Contrastive learning (SupCon) was introduced to leverage contrastive learning in a supervised manner Khosla et al. (2020). Instead of contrasting based on augmented data, it contrasts samples based on their classes. Specifically, embeddings of the same class are pulled together, and embeddings of different classes are pushed apart. We utilize a contrastive loss inspired by SupCon for unlearning.
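For reference, a SupCon-style loss can be sketched in plain Python as below. This is a minimal illustration under our own assumptions, not the reference implementation: embeddings are L2-normalized lists of floats, the temperature `tau=0.1` is an arbitrary choice, and the loss is averaged over anchors that have at least one positive.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def supcon_style_loss(embeddings, labels, tau=0.1):
    """Pull same-class embeddings together, push other classes apart."""
    z = [normalize(e) for e in embeddings]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Denominator runs over every other sample in the batch.
        log_denom = math.log(sum(math.exp(dot(z[i], z[j]) / tau) for j in others))
        # Average negative log-probability of the positives.
        total += -sum(dot(z[i], z[p]) / tau - log_denom for p in positives) / len(positives)
        counted += 1
    return total / max(counted, 1)


toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(supcon_style_loss(toy, [0, 0, 1, 1]))  # labels agree with the clusters
print(supcon_style_loss(toy, [0, 1, 0, 1]))  # labels cut across the clusters
```

The loss is small when same-class embeddings already lie close together and large when they do not, which is the behavior that contrastive unlearning later inverts for unlearning samples.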

3.2 Problem Definition

We define a classification model $\mathcal{F}=\mathcal{H}\left(\mathcal{E}_{\theta}\left(\cdot\right)\right)$, where $\mathcal{E}_{\theta}\left(\cdot\right)$ is a neural-network-based encoder parameterized by $\theta$ and $\mathcal{H}\left(\cdot\right)$ is a classification head. $\mathcal{E}_{\theta}$ produces an embedding $z$ given a sample $x$. $\mathcal{H}$ receives $z$ and yields a prediction. Let $\mathcal{F}$ be trained using dataset $\mathcal{D}_{tr}=\{\left(x_{1},y_{1}\right),\cdots,\left(x_{n},y_{n}\right)\}$, where each data point is a tuple $\left(x_{i},y_{i}\right)$ consisting of a feature set $x_{i}$ and a label $y_{i}\in\{0,\cdots,C\}$, where $C$ is the number of classes. We suppose $\mathcal{F}$ was trained with cross-entropy loss. Let $\mathcal{D}_{ts}$ be a test dataset sampled from a distribution analogous to that of $\mathcal{D}_{tr}$, satisfying $\mathcal{D}_{ts}\cap\mathcal{D}_{tr}=\emptyset$.

Let $\mathcal{D}^{u}_{tr}\subseteq\mathcal{D}_{tr}$ be a set of samples to be forgotten (i.e., unlearning samples). The remaining set is $\mathcal{D}^{r}_{tr}=\mathcal{D}_{tr}\setminus\mathcal{D}^{u}_{tr}$. Let a retrained model $\mathcal{F}^{R}$ be trained only with $\mathcal{D}^{r}_{tr}$. An unlearning algorithm $M$ receives $\mathcal{D}^{r}_{tr},\mathcal{D}^{u}_{tr},\theta$ and produces $\theta^{\prime}$. An unlearned model $\mathcal{F}^{\prime}=\mathcal{H}\left(\mathcal{E}_{\theta^{\prime}}\right)$ should resemble $\mathcal{F}^{R}$.

Single Class Unlearning. For single class unlearning, $\mathcal{D}^{u}_{tr}$ consists of all samples of an unlearning class $c$. The test set $\mathcal{D}_{ts}$ can be split into $\mathcal{D}^{u}_{ts}$ and $\mathcal{D}^{r}_{ts}$, where $\mathcal{D}^{u}_{ts}$ includes all test samples of class $c$, and $\mathcal{D}^{r}_{ts}=\mathcal{D}_{ts}\setminus\mathcal{D}^{u}_{ts}$ includes all test samples of the remaining classes. A retrained model $\mathcal{F}^{R}$ will have zero accuracy on $\mathcal{D}^{u}_{tr}$ and $\mathcal{D}^{u}_{ts}$, the training and test samples of class $c$, since it was retrained without class $c$. So, given an accuracy function $\mathrm{Acc}$, the goal of single class unlearning is for the unlearned model $\mathcal{F}^{\prime}$ to achieve accuracy as close to zero as possible on both training and test samples of class $c$ (effective unlearning) and accuracy similar to that of the retrained model $\mathcal{F}^{R}$ on the remaining classes (model performance).

\begin{align}
&\mathrm{Acc}\left(\mathcal{F}^{\prime},\mathcal{D}^{u}_{tr}\right)\approx 0,\quad \mathrm{Acc}\left(\mathcal{F}^{\prime},\mathcal{D}^{u}_{ts}\right)\approx 0, \tag{1}\\
&\mathrm{Acc}\left(\mathcal{F}^{\prime},\mathcal{D}^{r}_{ts}\right)\approx \mathrm{Acc}\left(\mathcal{F}^{R},\mathcal{D}^{r}_{ts}\right). \tag{2}
\end{align}
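The two conditions in Equations (1) and (2) can be checked mechanically once the accuracies are measured. The sketch below is illustrative only: the tolerance `tol` that stands in for "≈" is our assumption (the text does not fix a threshold), and the accuracy values in the usage lines are made-up placeholders rather than reported results.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    if not labels:
        return 0.0
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def class_unlearning_ok(acc_unl_train, acc_unl_test,
                        acc_rem_test, acc_rem_test_retrained, tol=0.02):
    """Check Eq. (1) and Eq. (2) up to a hypothetical tolerance `tol`."""
    # Eq. (1): near-zero accuracy on train and test samples of class c.
    forgotten = acc_unl_train <= tol and acc_unl_test <= tol
    # Eq. (2): accuracy on remaining classes close to the retrained model.
    preserved = abs(acc_rem_test - acc_rem_test_retrained) <= tol
    return forgotten and preserved


# Placeholder accuracies, for illustration only.
print(class_unlearning_ok(0.00, 0.01, 0.93, 0.94))
print(class_unlearning_ok(0.30, 0.00, 0.93, 0.94))
```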

Sample Unlearning. For sample unlearning, the unlearning samples $\mathcal{D}^{u}_{tr}$ can belong to different classes. A retrained model $\mathcal{F}^{R}$ will have similar accuracy on unlearning samples $\mathcal{D}^{u}_{tr}$ and test samples $\mathcal{D}_{ts}$, since the unlearning samples are no longer in its training set. So the goal of sample unlearning is for the unlearned model $\mathcal{F}^{\prime}$ to achieve accuracy similar to that of the retrained model $\mathcal{F}^{R}$ on both unlearning samples (effective unlearning) and test samples (model performance).

\begin{align}
&\mathrm{Acc}\left(\mathcal{F}^{\prime},\mathcal{D}^{u}_{tr}\right)\approx \mathrm{Acc}\left(\mathcal{F}^{R},\mathcal{D}_{ts}\right), \tag{3}\\
&\mathrm{Acc}\left(\mathcal{F}^{\prime},\mathcal{D}_{ts}\right)\approx \mathrm{Acc}\left(\mathcal{F}^{R},\mathcal{D}_{ts}\right). \tag{4}
\end{align}

4 Contrastive Unlearning

The novelty of contrastive unlearning is our perspective on utilizing geometric properties of the latent representation space for unlearning purposes. If a sample $x$ had been used as a training example, information extracted from $x$ by $\mathcal{E}_{\theta}$ would be expressed as geometric properties in the representation space. Specifically, we hypothesize that a trained model generates geometrically similar embeddings for samples of the same class and distant embeddings for samples of different classes, even when the model was not trained with representation learning techniques. This is supported by existing literature, which showed mathematically and empirically that a model optimized with cross-entropy loss produces higher geometric similarity among embeddings of samples of the same class and lower similarity among different classes Das and Chaudhuri (2024); Graf et al. (2021).
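One simple way to probe this hypothesis on a trained encoder is to compare the mean cosine similarity of same-class embedding pairs with that of different-class pairs. The sketch below assumes embeddings are already extracted as lists of floats; the function name `class_similarity_gap` and the toy vectors are ours, chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def class_similarity_gap(embeddings, labels):
    """Mean same-class similarity minus mean different-class similarity.

    Assumes at least one same-class pair and one different-class pair.
    """
    same, diff = [], []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            s = cosine(embeddings[i], embeddings[j])
            if labels[i] == labels[j]:
                same.append(s)
            else:
                diff.append(s)
    return sum(same) / len(same) - sum(diff) / len(diff)


# Toy embeddings standing in for encoder outputs of two classes.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(class_similarity_gap(toy, [0, 0, 1, 1]))
```

A clearly positive gap is what the hypothesis predicts for a model trained with cross-entropy loss.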

From this intuition, we aim to modify the characteristics of the representation space by pushing embeddings of unlearning samples away from those of remaining samples. To achieve this effectively, we contrast each unlearning sample with 1) remaining samples from the same class (positive pairs) and push their representations apart from each other, and 2) remaining samples from different classes (negative pairs) and pull their representations close to each other. As a result, the embeddings of unlearning samples end up in the middle of all the remaining samples. This is related to the existing literature on contrastive learning. However, our approach is fundamentally different: it contrasts pairs of unlearning and remaining samples, while contrastive learning contrasts samples simply by their classes.

Contrastive Unlearning Loss: Sample Unlearning. Contrastive unlearning uses a batched process. In each round, an unlearning batch $X^{u}=\{x^{u}_{1},\cdots,x^{u}_{B}\}$ of size $B$ is sampled from the unlearning data $\mathcal{D}^{u}_{tr}$, and a remaining batch $X^{r}=\{x^{r}_{1},\cdots,x^{r}_{B}\}$ is sampled from the remaining set $\mathcal{D}^{r}_{tr}$. We denote the $i$-th sample of $X^{u}$ by $x_{i}$ and treat it as an anchor. Based on the anchor $x_{i}$, positives and negatives are chosen from $X^{r}$. Positives are $P_{\mathbf{x}}\left(x_{i}\right)=\{x_{j}\,|\,x_{j}\in X^{r},y_{j}=y_{i}\}$, i.e., remaining samples with the same class as $x_{i}$; negatives are $N_{\mathbf{x}}\left(x_{i}\right)=\{x_{j}\,|\,x_{j}\in X^{r},y_{j}\neq y_{i}\}$, i.e., remaining samples with a class different from that of $x_{i}$. Correspondingly, let the embeddings of positives and negatives be $P_{\mathbf{z}}\left(x_{i}\right)=\{z_{j}\,|\,z_{j}=\mathcal{E}_{\theta}\left(x_{j}\right),x_{j}\in P_{\mathbf{x}}\left(x_{i}\right)\}$ and $N_{\mathbf{z}}\left(x_{i}\right)=\{z_{j}\,|\,z_{j}=\mathcal{E}_{\theta}\left(x_{j}\right),x_{j}\in N_{\mathbf{x}}\left(x_{i}\right)\}$. The contrastive unlearning loss aims to minimize the similarity of positive pairs and maximize the similarity of negative pairs (the opposite of contrastive learning).
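The batch construction above can be sketched as follows. This is a hedged illustration: samples are represented by string ids instead of real inputs, and the helper names (`sample_batch`, `positives_and_negatives`) and the toy datasets are our assumptions.

```python
import random


def sample_batch(dataset, batch_size, rng):
    """Draw a batch of (x, y) pairs without replacement."""
    return rng.sample(dataset, batch_size)


def positives_and_negatives(anchor_label, remaining_batch):
    """Split a remaining batch into P_x (same class) and N_x (other classes)."""
    positives = [x for x, y in remaining_batch if y == anchor_label]
    negatives = [x for x, y in remaining_batch if y != anchor_label]
    return positives, negatives


# Toy data: each sample is an id string paired with an integer label.
unlearning_set = [("u0", 0), ("u1", 1)]
remaining_set = [("r0", 0), ("r1", 0), ("r2", 1), ("r3", 2)]

rng = random.Random(0)
unl_batch = sample_batch(unlearning_set, 2, rng)
rem_batch = sample_batch(remaining_set, 4, rng)

for x_i, y_i in unl_batch:
    pos, neg = positives_and_negatives(y_i, rem_batch)
    print(x_i, "positives:", sorted(pos), "negatives:", sorted(neg))
```

Note that positives and negatives are always drawn from the remaining batch, never from the unlearning batch, which is what distinguishes this pairing from class-based contrastive learning.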

$$\mathcal{L}_{UL} = \sum_{x_i \in X^{u}} \frac{-1}{\lvert N_{\mathbf{z}}(x_i) \rvert} \sum_{z_a \in N_{\mathbf{z}}(x_i)} \log \frac{\exp\left(z_i \cdot z_a / \tau\right)}{\sum\limits_{z_p \in P_{\mathbf{z}}(x_i)} \exp\left(z_i \cdot z_p / \tau\right)} \tag{5}$$

where $\tau \in \mathcal{R}^{+}$ is a scalar temperature parameter.
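As an illustration, equation 5 can be written out directly for one unlearning batch and one remaining batch. The sketch below is a minimal, framework-free version and not the authors' implementation: the function name `unlearning_loss`, the list-of-floats embedding format, and the choice to skip anchors that have no positives or no negatives in the batch are all assumptions made for the example.

```python
import math


def dot(u, v):
    """Inner product of two equal-length embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def unlearning_loss(z_u, y_u, z_r, y_r, tau=0.1):
    """Sketch of equation 5 for one unlearning batch (z_u, y_u) and one
    remaining batch (z_r, y_r). Embeddings are plain lists of floats."""
    total = 0.0
    for z_i, y_i in zip(z_u, y_u):
        # Positives: remaining samples sharing the anchor's class.
        positives = [z for z, y in zip(z_r, y_r) if y == y_i]
        # Negatives: remaining samples from any other class.
        negatives = [z for z, y in zip(z_r, y_r) if y != y_i]
        if not positives or not negatives:
            # Assumption: equation 5 is undefined here, so the anchor is skipped.
            continue
        # log of the denominator: sum over positives of exp(z_i . z_p / tau)
        log_denominator = math.log(
            sum(math.exp(dot(z_i, z_p) / tau) for z_p in positives)
        )
        # log(exp(a) / D) = a - log(D), summed over all negatives
        per_anchor = sum(dot(z_i, z_a) / tau - log_denominator for z_a in negatives)
        total += -per_anchor / len(negatives)
    return total
```

With one positive and one negative, the value is lower when the anchor coincides with the negative than when it coincides with the positive, which is the direction equation 5 is meant to encourage.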

Contrastive Unlearning Loss: Single Class Unlearning. For single class unlearning, the unlearning set is $\mathcal{D}^{u}_{tr} = \{(x_i, y_i) \mid y_i = c\}$ and the remaining set is $\mathcal{D}^{r}_{tr} = \{(x_i, y_i) \mid y_i \neq c\}$. This makes the positive set $P_{\mathbf{z}} = \emptyset$, as none of the remaining samples belong to class $c$. In short, there are no positive remaining samples from which to push the unlearning samples away. Thus we change equation 5 as follows.

$$\mathcal{L}_{UL} = \sum_{x_i \in X^{u}} \frac{-1}{\lvert N_{\mathbf{z}}(x_i) \rvert} \sum_{z_a \in N_{\mathbf{z}}(x_i)} \log \frac{\exp\left(z_i \cdot z_a / \tau\right)}{\lvert N_{\mathbf{z}}(x_i) \rvert}. \tag{6}$$

We replaced the previous denominator with $\lvert N_{\mathbf{z}}(x_i) \rvert$. This is because equation 5 relies on both directions, pushing and pulling, to move the unlearning samples. Lacking one of the directions increases the instability of the loss. Since $P_{\mathbf{z}} = \emptyset$, we use $\lvert N_{\mathbf{z}}(x_i) \rvert$ as the denominator to introduce a damping effect against excessively pulling unlearning samples toward negative samples.
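The single class variant in equation 6 can be sketched in the same way. Every remaining sample is a negative in this setting, so no class labels are needed. The function name `single_class_unlearning_loss` and the list-of-floats embedding format are assumptions for illustration only.

```python
import math


def dot(u, v):
    """Inner product of two equal-length embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def single_class_unlearning_loss(z_u, z_r, tau=0.1):
    """Sketch of equation 6: all remaining embeddings z_r act as negatives
    for every unlearning embedding in z_u."""
    num_negatives = len(z_r)
    total = 0.0
    for z_i in z_u:
        # log(exp(z_i . z_a / tau) / |N|) = z_i . z_a / tau - log|N|
        per_anchor = sum(
            dot(z_i, z_a) / tau - math.log(num_negatives) for z_a in z_r
        )
        total += -per_anchor / num_negatives
    return total
```

For example, with one anchor `[1.0, 0.0]`, remaining embeddings `[[1.0, 0.0], [0.0, 1.0]]`, and `tau=1.0`, the two log terms are `1 - log 2` and `0 - log 2`, so the loss is `log 2 - 0.5`.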

Classification Loss of Remaining Samples. A novel challenge of contrastive unlearning is to preserve the embeddings of remaining samples. Optimizing equation 5 not only alters the embedding of the anchor unlearning sample but also reciprocally alters the embeddings of all samples in $P_{\mathbf{x}}$ and $N_{\mathbf{x}}$. All positive samples are slightly pushed away from the anchor, and all negatives are slightly pulled toward it. A similar effect arises in contrastive learning, but there it is not problematic because it reinforces the consolidation of embeddings of the same class, which is a desired effect. For unlearning purposes, however, the embeddings of $X^{r}$ have to be preserved, because: 1) not preserving them directly leads to a loss in model performance, and 2) it also reciprocally affects unlearning effectiveness, as the magnitude of pulling and pushing decreases. In short, the embeddings of $X^{r}$ are also modified as a byproduct of optimization, and it is necessary to restore them. We use the cross-entropy loss to restore the embeddings of $X^{r}$, because it derives the maximum likelihood independently for each sample Shore and Johnson (1981). This ensures that the obtained directions stay very close to the original embeddings no matter how the embeddings of remaining samples are modified. Combined with the unlearning loss, the final loss for our proposed contrastive unlearning is as follows,

$$\mathcal{L} = \lambda_{UL}\mathcal{L}_{UL} + \lambda_{CE}\mathcal{L}_{CE}\left(\mathcal{F}\left(X^{r}\right), Y^{r}\right), \tag{7}$$

where $X^{r}$ and $Y^{r}$ are batched remaining samples and their corresponding labels, and $\lambda_{CE}$ and $\lambda_{UL}$ are hyperparameters that determine the influence of the two loss terms.
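To make the weighting in equation 7 concrete, the following sketch combines an already computed unlearning loss with a cross-entropy term on the remaining batch. It assumes the classifier returns raw logits and that cross-entropy is averaged over the batch. Neither detail is specified in the text, and the function names are hypothetical.

```python
import math


def cross_entropy(logits, labels):
    """Mean cross-entropy of integer labels under softmax(logits).
    Assumption: mean reduction over the batch."""
    total = 0.0
    for row, y in zip(logits, labels):
        # Numerically stable log-sum-exp for the softmax normaliser.
        row_max = max(row)
        log_norm = row_max + math.log(sum(math.exp(v - row_max) for v in row))
        total += log_norm - row[y]
    return total / len(labels)


def total_loss(loss_ul, logits_r, labels_r, lambda_ul=1.0, lambda_ce=1.0):
    """Sketch of equation 7: weighted sum of the unlearning loss and the
    cross-entropy loss on the remaining batch."""
    return lambda_ul * loss_ul + lambda_ce * cross_entropy(logits_r, labels_r)
```

Raising `lambda_ce` relative to `lambda_ul` puts more weight on restoring the remaining samples, while the opposite choice emphasises moving the unlearning samples.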

Algorithm 1 Contrastive Unlearning

Input: $\theta$, $\mathcal{H}(\cdot)$, $\mathcal{E}(\cdot)$, $\mathcal{D}^{r}_{tr}$, $\mathcal{D}^{u}_{tr}$, $\mathcal{D}_{\mathrm{eval}}$
Parameter: $iter$, $\lambda_{CE}$, $\lambda_{UL}$, $\omega$
Output: $\theta'$

while termination condition is not satisfied do
    for each $X^{u} \in \mathcal{D}^{u}_{tr}$ do
        for $1, \cdots, \omega$ do
            Sample $(X^{r}, Y^{r})$ from $\mathcal{D}^{r}_{tr}$
            Determine $P_{\mathbf{z}}(x_i), N_{\mathbf{z}}(x_i)$ $\forall x_i \in X^{u}$
            $\ell_{CE} \leftarrow \mathcal{L}_{CE}\left(\mathcal{H}\left(\mathcal{E}_{\theta}\left(X^{r}\right)\right), Y^{r}\right)$
            $\ell_{UL} \leftarrow \lambda_{UL}\mathcal{L}_{UL}\left(P_{\mathbf{z}}(x_i), N_{\mathbf{z}}(x_i)\right)$ $\forall x_i \in X^{u}$
            $\theta \leftarrow \theta - \eta\nabla\left(\ell_{CE} + \ell_{UL}\right)$
        end for
    end for
    $\theta' \leftarrow \theta$
    Evaluate $\theta'$ on $\mathcal{D}_{\mathrm{eval}}$ to update the termination condition
end while
return $\theta'$

Complete Algorithm. Algorithm 1 gives a step-wise overview of contrastive unlearning. It iterates over all unlearning batches $X^{u}$ in $\mathcal{D}^{u}_{tr}$. For each $X^{u}$, it computes the unlearning loss by sampling a random remaining batch $X^{r}$ for contrasting purposes. For each $X^{u}$, sampling and loss derivation are repeated $\omega$ times. A higher $\omega$ stabilizes the unlearning procedure by contrasting unlearning samples against multiple sets of remaining samples. In our experiments, we set $\omega$ to at most 4 to limit computational overhead, and the algorithm showed stable unlearning performance.
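The control flow of Algorithm 1 can be sketched as a plain loop. This is only a skeleton under stated assumptions. The gradient update (the loss computation and parameter update inside the innermost loop) is delegated to a caller-supplied `step` function and the evaluation on $\mathcal{D}_{\mathrm{eval}}$ to `should_stop`. Both are hypothetical callables, and `max_rounds` is an added safeguard that does not appear in the algorithm.

```python
import random


def contrastive_unlearn(params, unlearn_batches, remaining_data, step, should_stop,
                        omega=4, batch_size=2, max_rounds=100, seed=0):
    """Skeleton of the Algorithm 1 loop.

    step(params, batch_u, batch_r) -> new params   (one gradient update)
    should_stop(params) -> bool                    (termination condition)
    """
    rng = random.Random(seed)
    for _ in range(max_rounds):            # assumption: cap on outer passes
        for batch_u in unlearn_batches:    # every unlearning batch X^u
            for _ in range(omega):         # omega fresh remaining batches per X^u
                batch_r = rng.sample(remaining_data, batch_size)
                params = step(params, batch_u, batch_r)
        if should_stop(params):            # evaluate after each full pass
            break
    return params
```

Each unlearning batch therefore triggers `omega` parameter updates, each against a different random remaining batch, and the termination check runs once per full pass over the unlearning data.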

Termination Condition. The termination condition for the algorithm depends on the unlearning task. We assume a small dataset $\mathcal{D}_{\mathrm{eval}}$ is available for evaluation. The algorithm evaluates $\mathcal{F}'$ on $\mathcal{D}_{\mathrm{eval}}$ and terminates if the unlearning criterion is satisfied. For single class unlearning, $\mathcal{D}_{\mathrm{eval}} = \mathcal{D}^{u}_{ts}$, the test data of the unlearning class. The algorithm terminates when the accuracy of the unlearned model $\mathcal{F}'$ on the unlearning class falls to or below the threshold $1/C$, where $C$ is the total number of classes in the training data and $1/C$ corresponds to the accuracy of a random guess.

$$\mathrm{Acc}\left(\mathcal{F}', \mathcal{D}_{\mathrm{eval}}\right) \leq \frac{1}{C}. \tag{8}$$

For sample unlearning, $\mathcal{D}_{\mathrm{eval}} = \{\mathcal{D}^{u}_{eval}, \mathcal{D}^{ts}_{eval}\}$, where $\mathcal{D}^{u}_{eval} \subseteq \mathcal{D}^{u}_{tr}$ and $\mathcal{D}^{ts}_{eval} \subseteq \mathcal{D}_{ts}$. The algorithm terminates when the accuracy of $\mathcal{F}'$ on the unlearning samples $\mathcal{D}^{u}_{eval}$ drops to or below the accuracy on the test samples $\mathcal{D}^{ts}_{eval}$.

$$\mathrm{Acc}\left(\mathcal{F}', \mathcal{D}^{u}_{eval}\right) \leq \mathrm{Acc}\left(\mathcal{F}', \mathcal{D}^{ts}_{eval}\right). \tag{9}$$

It is undesirable to terminate the algorithm before this condition is satisfied, because that would imply the model still retains information about $\mathcal{D}^{u}_{tr}$. It is also undesirable to keep running the algorithm until the accuracy on $\mathcal{D}^{u}_{tr}$ falls much lower than the accuracy on $\mathcal{D}_{ts}$, because doing so negatively injects information about $\mathcal{D}^{u}_{tr}$ into $\theta'$. This causes $\mathcal{F}'$ to deliberately misclassify $\mathcal{D}^{u}_{ts}$, which is not aligned with the goal of sample unlearning.
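The two stopping rules in equations 8 and 9 reduce to simple comparisons of accuracies. A minimal sketch follows. The helper names are hypothetical, and accuracies are assumed to be fractions in $[0, 1]$.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match their labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def class_unlearning_done(acc_unlearn_class_test, num_classes):
    """Equation 8: stop once accuracy on the unlearning class's test data
    is at or below random-guess accuracy 1/C."""
    return acc_unlearn_class_test <= 1.0 / num_classes


def sample_unlearning_done(acc_unlearn_eval, acc_test_eval):
    """Equation 9: stop once accuracy on the unlearning samples is at or
    below accuracy on held-out test samples."""
    return acc_unlearn_eval <= acc_test_eval
```

For a 10-class problem, `class_unlearning_done(0.08, 10)` is true while `class_unlearning_done(0.35, 10)` is not. For sample unlearning, the rule stops as soon as the unlearning samples are no longer classified better than unseen test data, rather than driving their accuracy further down.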

5 Experiments

5.1 Experiment Setup

Datasets and Models. We use two standard benchmark datasets, CIFAR-10 and SVHN, and use ResNet(RN)-18, 34, 50, and 101 models He et al. (2016) in our experiments. We train each model with each dataset without any data augmentation except normalization. Readers may refer to the appendix for the performance of original models.

Comparison Methods. For class unlearning, we remove all samples belonging to class 5, and for sample unlearning, we remove 500 randomly selected samples. For both class unlearning and sample unlearning tasks, we use Retrain, a model retrained on the training data excluding the unlearning class or samples, as an ideal reference for unlearning completeness and model performance. We include four state-of-the-art methods designed for sample unlearning: 1) Finetune Golatkar et al. (2020) leverages catastrophic forgetting Goodfellow et al. (2013) and iteratively trains the original model using only the remaining samples. 2) Neggrad Golatkar et al. (2020) conducts gradient ascent using unlearning samples. 3) Fisher Golatkar et al. (2020) is a certified unlearning algorithm that uses randomization techniques borrowed from differential privacy and leverages the Fisher information matrix to design optimal noise for noisy gradient updates. 4) LCODEC Mehta et al. (2022) is also a certified unlearning method; it proposes a fast and effective way of obtaining the Hessian by selecting parameters by their importance.

We include four state-of-the-art methods specifically designed for single class unlearning: 1) Boundary Expansion Chen et al. (2023) trains the model using all unlearning samples as a temporary class and then discards the temporary class. 2) Boundary Shrink Chen et al. (2023) is similar to Boundary Expansion but it modifies the decision boundary of unlearning class to prevent unlearning samples from being classified into the unlearning class (unlearning samples are classified as other classes). 3) SCRUB Kurmanji et al. (2023) is based on the teacher-student framework and selectively transfers information from the original model to the unlearned model (all information except that of the unlearning class). 4) UNSIR Tarun et al. (2023) uses an iterative process of impairing and recovering and generates noise that maximizes error in the unlearning class and repairs the classification performance for the other classes.

We note that sample unlearning algorithms may be used for class unlearning. However, the class unlearning baselines we have chosen here have already demonstrated their superiority over the sample unlearning methods, including Finetune, Neggrad, and Fisher, hence we do not include those methods in the class unlearning comparison.
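The two unlearning scenarios described under Comparison Methods, removing every sample of one class or removing a fixed number of randomly chosen samples, correspond to two ways of splitting the training data into unlearning and remaining sets. A minimal sketch, assuming the dataset is a list of `(x, y)` pairs. The function names and the seeding scheme are hypothetical.

```python
import random


def split_class_unlearning(dataset, target_class):
    """Class unlearning: the unlearning set is every sample of target_class."""
    unlearn = [(x, y) for x, y in dataset if y == target_class]
    remain = [(x, y) for x, y in dataset if y != target_class]
    return unlearn, remain


def split_sample_unlearning(dataset, num_unlearn, seed=0):
    """Sample unlearning: the unlearning set is num_unlearn random samples."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(dataset)), num_unlearn))
    unlearn = [dataset[i] for i in sorted(chosen)]
    remain = [pair for i, pair in enumerate(dataset) if i not in chosen]
    return unlearn, remain
```

In the setup above, these would be called with `target_class=5` and `num_unlearn=500` respectively.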

Evaluation Metrics. We evaluate model performance, unlearning effectiveness, and efficiency of the algorithms.

  • Model performance is assessed by accuracy of the unlearned model on the test data of remaining classes $\mathcal{D}^{r}_{ts}$ (class unlearning) and on the test data $\mathcal{D}_{ts}$ (sample unlearning). The accuracy should be similar to the retrained model.

  • Unlearning effectiveness is assessed by accuracy of the unlearned model on the training and test data of the unlearning class, $\mathcal{D}^{u}_{tr}$ and $\mathcal{D}^{u}_{ts}$ (class unlearning), and on the unlearning samples $\mathcal{D}^{u}_{tr}$ (sample unlearning). The lower the accuracy, the more effective the unlearning. We also conduct an MIA for further unlearning verification, described next.

  • Efficiency is measured by the runtime of the unlearning algorithm. A shorter runtime indicates better efficiency.

Unlearning Verification via MIA. We conduct a membership inference attack (MIA) Shokri et al. (2017) to verify sample unlearning. We assume an adversary with full access to the unlearned model and training data, simulating an administrator who conducted unlearning and uses MIA to verify the effectiveness of unlearning Thudi et al. (2022); Cotogni et al. (2023).

To train the attack model, we sample $\mathcal{D}^{M}$ from remaining samples $\mathcal{D}^{r}_{tr}$ (as members) and $\mathcal{D}^{N}$ from test samples $\mathcal{D}_{ts}$ (as non-members). An attack model is trained with both members and non-members using their output from the unlearned model $\{\mathcal{F}'(\mathbf{x}) \mid \mathbf{x} \in \mathcal{D}^{M} \cup \mathcal{D}^{N}\}$ as features and labels as $\{\mathbf{y}_i \mid \mathbf{y}_i = 1\ \forall x_i \in \mathcal{D}^{M},\ \mathbf{y}_i = 0\ \forall x_i \in \mathcal{D}^{N}\}$.
We then test the attack model on the unlearning samples $\mathcal{D}^{u}_{tr}$ and on member samples selected for testing from the remaining samples $\mathcal{D}^{r}_{tr}$. We report the member prediction rate, defined as the number of positive (member) predictions by the MIA divided by the total number of tested samples. It can be read as the false positive rate (FPR) on unlearning samples (treating them as non-members) and the true positive rate (TPR) on members. An effective unlearning algorithm should have a low member prediction rate on unlearning samples and a high member prediction rate on member samples. Our metric is consistent with existing literature Jia et al. (2023), which uses the true negative rate (TNR) on unlearning samples and test non-member samples (treating both as non-members); this essentially measures the opposite of ours, i.e., it counts non-member rather than member predictions. We focus on predicting members because MIA is designed to infer members.
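The construction of the attack training set and the member prediction rate can be sketched as follows. The attack classifier itself is not specified here and is left out. `model` stands for any callable returning the unlearned model's output for one input, and all function names are hypothetical.

```python
def build_attack_set(model, members, non_members):
    """Features are the unlearned model's outputs; labels are 1 for members
    (drawn from remaining training data) and 0 for non-members (test data)."""
    features = [model(x) for x in members] + [model(x) for x in non_members]
    labels = [1] * len(members) + [0] * len(non_members)
    return features, labels


def member_prediction_rate(attack_predictions):
    """Number of 'member' (1) predictions divided by the number of tested samples."""
    return sum(attack_predictions) / len(attack_predictions)
```

Applied to attack predictions on the unlearning samples, `member_prediction_rate` gives the FPR reading described above; applied to predictions on held-out members, it gives the TPR reading.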

5.2 Results on Single Class Unlearning

| Model | Evaluation | Retrain (reference) | Contrastive | Boundary Shrink | Boundary Expansion | SCRUB | UNSIR |
|---|---|---|---|---|---|---|---|
| RN18 | $\mathcal{D}^{r}_{ts}$ | 86.96 | 85.79 | 83.62 | 82.34 | 83.91 | 57.36 |
| RN18 | $\mathcal{D}^{u}_{tr}$ | 0.00 | 0.00 | 4.54 | 0.00 | 35.42 | 0.00 |
| RN18 | $\mathcal{D}^{u}_{ts}$ | 0.00 | 0.00 | 4.62 | 6.51 | 9.30 | 0.00 |
| RN34 | $\mathcal{D}^{r}_{ts}$ | 88.01 | 86.59 | 84.70 | 83.19 | 82.22 | 47.02 |
| RN34 | $\mathcal{D}^{u}_{tr}$ | 0.00 | 0.00 | 2.46 | 0.00 | 3.18 | 0.00 |
| RN34 | $\mathcal{D}^{u}_{ts}$ | 0.00 | 0.00 | 4.60 | 6.81 | 0.80 | 0.00 |
| RN50 | $\mathcal{D}^{r}_{ts}$ | 87.78 | 87.98 | 85.52 | 83.39 | 84.44 | 37.41 |
| RN50 | $\mathcal{D}^{u}_{tr}$ | 0.00 | 0.00 | 2.74 | 0.00 | 7.16 | 0.00 |
| RN50 | $\mathcal{D}^{u}_{ts}$ | 0.00 | 0.00 | 5.90 | 8.22 | 1.51 | 0.00 |
| RN101 | $\mathcal{D}^{r}_{ts}$ | 87.94 | 88.69 | 83.91 | 82.48 | 85.03 | 42.40 |
| RN101 | $\mathcal{D}^{u}_{tr}$ | 0.00 | 0.00 | 4.91 | 0.00 | 13.46 | 0.00 |
| RN101 | $\mathcal{D}^{u}_{ts}$ | 0.00 | 0.00 | 7.25 | 8.50 | 4.55 | 0.00 |

Table 1: Performance evaluation for single class unlearning on CIFAR-10.

| Model | Evaluation | Retrain (reference) | Contrastive | Boundary Shrink | Boundary Expansion | SCRUB | UNSIR |
|---|---|---|---|---|---|---|---|
| RN18 | $\mathcal{D}^{r}_{ts}$ | 95.43 | 93.91 | 94.84 | 93.71 | 93.88 | 90.3 |
| RN18 | $\mathcal{D}^{u}_{tr}$ | 0.00 | 0.00 | 29.79 | 80.25 | 88.67 | 0.00 |
| RN18 | $\mathcal{D}^{u}_{ts}$ | 0.00 | 0.00 | 37.46 | 2.61 | 77.39 | 0.00 |
| RN34 | $\mathcal{D}^{r}_{ts}$ | 95.46 | 94.33 | 95.12 | 94.50 | 94.57 | 85.82 |
| RN34 | $\mathcal{D}^{u}_{tr}$ | 0.00 | 0.00 | 34.69 | 63.92 | 0.96 | 0.00 |
| RN34 | $\mathcal{D}^{u}_{ts}$ | 0.00 | 0.00 | 41.99 | 4.27 | 0.42 | 0.00 |
| RN50 | $\mathcal{D}^{r}_{ts}$ | 95.83 | 94.87 | 95.47 | 95.01 | 93.75 | 70.56 |
| RN50 | $\mathcal{D}^{u}_{tr}$ | 0.00 | 0.00 | 40.01 | 3.92 | 2.68 | 0.00 |
| RN50 | $\mathcal{D}^{u}_{ts}$ | 0.00 | 0.00 | 42.37 | 8.74 | 9.64 | 0.00 |
| RN101 | $\mathcal{D}^{r}_{ts}$ | 96.16 | 94.90 | 95.65 | 95.07 | 94.65 | 83.90 |
| RN101 | $\mathcal{D}^{u}_{tr}$ | 0.00 | 0.00 | 42.77 | 51.53 | 0.00 | 0.00 |
| RN101 | $\mathcal{D}^{u}_{ts}$ | 0.00 | 0.00 | 45.39 | 3.94 | 0.00 | 0.00 |

Table 2: Performance evaluation for single class unlearning on SVHN.

Unlearning Effectiveness and Model Performance. Table 1 reports the accuracy of the unlearned models on $\mathcal{D}^{r}_{ts}$ (test set of the remaining classes), $\mathcal{D}^{u}_{tr}$ (training set of the unlearning class), and $\mathcal{D}^{u}_{ts}$ (test set of the unlearning class) on CIFAR-10 for a randomly selected class, class 5. We experimented with all classes and observed similar performance. The retrain model shows the expected results: stable accuracy on $\mathcal{D}^{r}_{ts}$ (similar to the accuracy of the original models shown in the Appendix) and zero accuracy on both $\mathcal{D}^{u}_{tr}$ and $\mathcal{D}^{u}_{ts}$, since the class has been removed from training. Among all methods, contrastive unlearning is the only one that achieves zero accuracy on the unlearning class, indicating complete unlearning, while preserving accuracy on the remaining classes. UNSIR is the only baseline that achieves zero accuracy on the unlearning class; however, it suffers a significant performance loss. All other methods fail to completely remove the influence of the unlearning class while also showing a performance loss on the remaining classes.

Table 2 reports the accuracy of the unlearned models on the SVHN dataset. It shows a trend similar to that on CIFAR-10. UNSIR performs better on SVHN because the features of SVHN are easier to learn, so the model suffers less utility loss than on CIFAR-10. However, it still suffers a significantly higher utility loss than contrastive unlearning. The other baselines show high accuracy on the unlearning class in many cases, indicating that they failed to remove its influence. Contrastive unlearning consistently removed all influence of the unlearning class with a negligible loss of performance.
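The three evaluation sets used for class unlearning can be sketched as simple filters over the train and test splits. The snippet below is a minimal illustration, not the evaluation code used in the experiments: the `(feature, label)` sample representation, the function names, and the toy model are all assumptions introduced here.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical sample representation: (feature, label). Not from the paper.
Sample = Tuple[int, int]


def accuracy(predict: Callable[[int], int], data: List[Sample]) -> float:
    """Return the percentage of samples whose predicted label matches the true label."""
    if not data:
        return 0.0
    correct = sum(1 for x, y in data if predict(x) == y)
    return 100.0 * correct / len(data)


def class_unlearning_report(
    predict: Callable[[int], int],
    train: List[Sample],
    test: List[Sample],
    unlearn_class: int,
) -> Dict[str, float]:
    """Accuracy on the three evaluation sets used for class unlearning."""
    d_ts_r = [(x, y) for x, y in test if y != unlearn_class]   # test set, remaining classes
    d_tr_u = [(x, y) for x, y in train if y == unlearn_class]  # train set, unlearning class
    d_ts_u = [(x, y) for x, y in test if y == unlearn_class]   # test set, unlearning class
    return {
        "D_ts^r": accuracy(predict, d_ts_r),
        "D_tr^u": accuracy(predict, d_tr_u),
        "D_ts^u": accuracy(predict, d_ts_u),
    }


# Toy data: the feature equals the label, so a perfect classifier is the identity.
toy_train = [(c, c) for c in range(10) for _ in range(3)]
toy_test = [(c, c) for c in range(10)]


def toy_unlearned_model(x: int) -> int:
    """Stand-in for an unlearned model: never predicts class 5, correct otherwise."""
    return x if x != 5 else 0


report = class_unlearning_report(toy_unlearned_model, toy_train, toy_test, unlearn_class=5)
print(report)
```

In this toy setting the stand-in model keeps full accuracy on the remaining classes and zero accuracy on both sets of the unlearning class, which is the pattern the retrain model exhibits in Tables 1 and 2.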

Figure 2: Accuracy on the unlearning class vs. number of batches on $\mathcal{D}^{u}_{tr}$.

| Model | Retrain | Contrastive | Boundary Shrink | Boundary Expansion | SCRUB | UNSIR |
|---|---|---|---|---|---|---|
| RN18 | 1566.36 | 48.90 | 105.22 | 112.87 | 150.40 | 59.98 |
| RN34 | 2072.76 | 75.45 | 181.12 | 139.90 | 240.39 | 90.58 |
| RN50 | 3820.62 | 105.41 | 315.69 | 240.44 | 435.49 | 169.89 |
| RN101 | 7493.79 | 139.94 | 540.21 | 425.77 | 747.65 | 270.38 |

Table 3: Processing time of class unlearning algorithms on CIFAR-10 dataset (in seconds).

Efficiency. Figure 2 shows the progress of the unlearning algorithms in terms of accuracy on the unlearning class $\mathcal{D}^{u}_{tr}$ vs. the number of batches processed in a single epoch. Contrastive unlearning and the baselines are all designed to run their unlearning procedures multiple times for each batch. For this comparison, however, we fixed the hyperparameters of each algorithm so that each batch of $\mathcal{D}^{u}_{tr}$ is processed only once. Reaching zero accuracy sooner indicates a more efficient algorithm, as it needs fewer batches to achieve unlearning. The figure shows that contrastive unlearning reaches zero at approximately the 60th batch, while boundary shrink and boundary expansion still show approximately 10% accuracy after the first epoch. UNSIR shows zero accuracy from the beginning; however, it must first compute the proper level of noise by iterating through $\mathcal{D}^{u}_{tr}$ before running the actual optimization. SCRUB, which is based on knowledge distillation, requires several passes through $\mathcal{D}^{u}_{tr}$ and hence does not show any progress after one epoch. In summary, contrastive unlearning is the most efficient, as it requires only about 60 batches to achieve unlearning.
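The efficiency measure read off Figure 2, the number of batches needed before accuracy on the unlearning class reaches zero, can be expressed as a small helper. This is only a sketch: the function name, the tolerance parameter, and the two curves are made up for illustration and are not the values plotted in the figure.

```python
from typing import Optional, Sequence


def batches_to_zero(acc_per_batch: Sequence[float], tol: float = 0.0) -> Optional[int]:
    """Return the 1-based index of the first batch after which accuracy on the
    unlearning class is at most `tol`, or None if that never happens."""
    for i, acc in enumerate(acc_per_batch, start=1):
        if acc <= tol:
            return i
    return None


# Made-up curves for illustration only; these are NOT the values plotted in Figure 2.
fast_curve = [95.0, 60.0, 20.0, 3.0, 0.0, 0.0]
slow_curve = [95.0, 80.0, 55.0, 30.0, 15.0, 10.0]

print(batches_to_zero(fast_curve))  # index of the first zero-accuracy batch
print(batches_to_zero(slow_curve))  # None: zero accuracy is never reached in this epoch
```

A method whose curve never reaches the threshold within the epoch returns `None`, which corresponds to the baselines that still show residual accuracy after one pass.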

Table 3 shows the elapsed time of each unlearning algorithm. Contrastive unlearning is the fastest method across all models because it requires only a single iteration over the unlearning samples. UNSIR is the second fastest, as it also runs for a single iteration; however, it spends extra time computing adequate noise to perturb the parameters.
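To make the timing comparison concrete, the speedup over retraining and the per-model ranking can be computed directly from the Table 3 values. The numbers below are copied from the table; the helper names `speedup` and `ranking` are introduced here for illustration.

```python
from typing import Dict, List

# Processing times in seconds, copied from Table 3 (class unlearning, CIFAR-10).
TABLE3: Dict[str, Dict[str, float]] = {
    "RN18": {"Retrain": 1566.36, "Contrastive": 48.90, "Boundary Shrink": 105.22,
             "Boundary Expansion": 112.87, "SCRUB": 150.40, "UNSIR": 59.98},
    "RN34": {"Retrain": 2072.76, "Contrastive": 75.45, "Boundary Shrink": 181.12,
             "Boundary Expansion": 139.90, "SCRUB": 240.39, "UNSIR": 90.58},
    "RN50": {"Retrain": 3820.62, "Contrastive": 105.41, "Boundary Shrink": 315.69,
             "Boundary Expansion": 240.44, "SCRUB": 435.49, "UNSIR": 169.89},
    "RN101": {"Retrain": 7493.79, "Contrastive": 139.94, "Boundary Shrink": 540.21,
              "Boundary Expansion": 425.77, "SCRUB": 747.65, "UNSIR": 270.38},
}


def speedup(times: Dict[str, Dict[str, float]], baseline: str,
            method: str = "Contrastive") -> Dict[str, float]:
    """Ratio baseline_time / method_time for every model."""
    return {model: row[baseline] / row[method] for model, row in times.items()}


def ranking(row: Dict[str, float]) -> List[str]:
    """Method names sorted from fastest to slowest."""
    return sorted(row, key=row.get)


for model, s in speedup(TABLE3, "Retrain").items():
    print(f"{model}: contrastive is {s:.1f}x faster than retraining; "
          f"order = {ranking(TABLE3[model])[:2]}")
```

For every model the two fastest entries are contrastive unlearning followed by UNSIR, matching the ordering described above.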

5.3 Results on Sample Unlearning

| Model | Evaluation | Retrain | Contrastive | Finetune | Neggrad | Fisher | LCODEC |
|---|---|---|---|---|---|---|---|
| RN18 | $\mathcal{D}^{r}_{ts}$ | 84.47 | 82.82 | 81.44 | 68.23 | 77.40 | 76.52 |
| | $\mathcal{D}^{u}_{tr}$ | 85.60 | 82.20 | 85.40 | 92.60 | 96.00 | 99.80 |
| RN34 | $\mathcal{D}^{r}_{ts}$ | 85.88 | 82.45 | 83.69 | 62.42 | 75.31 | 80.38 |
| | $\mathcal{D}^{u}_{tr}$ | 85.80 | 82.20 | 88.40 | 89.20 | 93.20 | 99.90 |
| RN50 | $\mathcal{D}^{r}_{ts}$ | 85.44 | 85.06 | 82.85 | 71.69 | 71.44 | 77.49 |
| | $\mathcal{D}^{u}_{tr}$ | 87.00 | 83.00 | 85.60 | 90.80 | 87.00 | 99.80 |
| RN101 | $\mathcal{D}^{r}_{ts}$ | 86.00 | 86.24 | 75.38 | 76.92 | 82.15 | 78.03 |
| | $\mathcal{D}^{u}_{tr}$ | 85.69 | 84.20 | 77.60 | 96.00 | 97.80 | 99.80 |

Table 4: Performance evaluation on sample unlearning on CIFAR-10.

| Model | Evaluation | Retrain | Contrastive | Finetune | Neggrad | Fisher | LCODEC |
|---|---|---|---|---|---|---|---|
| RN18 | $\mathcal{D}^{r}_{ts}$ | 95.08 | 91.16 | 90.77 | 79.96 | 88.06 | 93.89 |
| | $\mathcal{D}^{u}_{tr}$ | 94.40 | 89.60 | 90.20 | 94.40 | 94.80 | 99.80 |
| RN34 | $\mathcal{D}^{r}_{ts}$ | 95.49 | 92.43 | 91.74 | 73.93 | 93.00 | 94.62 |
| | $\mathcal{D}^{u}_{tr}$ | 94.40 | 92.00 | 90.80 | 92.80 | 99.20 | 99.80 |
| RN50 | $\mathcal{D}^{r}_{ts}$ | 95.93 | 93.23 | 91.37 | 89.22 | 91.16 | 93.70 |
| | $\mathcal{D}^{u}_{tr}$ | 95.00 | 93.00 | 89.40 | 96.60 | 96.60 | 99.95 |
| RN101 | $\mathcal{D}^{r}_{ts}$ | 96.01 | 92.26 | 91.46 | 88.65 | 95.77 | 81.65 |
| | $\mathcal{D}^{u}_{tr}$ | 93.80 | 90.40 | 91.00 | 97.60 | 99.80 | 91.60 |

Table 5: Performance evaluation on sample unlearning on SVHN.

Model Performance and Unlearning Effectiveness. Table 4 shows accuracy on $\mathcal{D}^{u}_{tr}$ (unlearning samples) and $\mathcal{D}_{ts}$ (test data, listed as $\mathcal{D}^{r}_{ts}$ in the table) on the CIFAR-10 dataset. The retrain model shows the desired result: its accuracy on $\mathcal{D}^{u}_{tr}$ (unlearning accuracy) and its accuracy on $\mathcal{D}_{ts}$ (test accuracy) are similar. Contrastive unlearning behaves most similarly to the retrain model. Neggrad, Fisher, and LCODEC show higher unlearning accuracy than test accuracy, indicating that these models still retain information that helps classify the unlearning samples. Finetune shows reasonable results, but its performance varies considerably with the model architecture and it incurs significant utility loss in some cases.
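One way to read Table 4 along these lines is to look at the gap between unlearning accuracy and test accuracy for each method: a gap near zero means the model treats the unlearning samples like unseen data. The sketch below computes that gap from the table values. The gap is used here as an illustrative summary statistic, not a metric defined in the paper, and `accuracy_gap` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple

METHODS: List[str] = ["Retrain", "Contrastive", "Finetune", "Neggrad", "Fisher", "LCODEC"]

# Accuracy (%) copied from Table 4: (test accuracy row, unlearning accuracy row) per model.
TABLE4: Dict[str, Tuple[List[float], List[float]]] = {
    "RN18": ([84.47, 82.82, 81.44, 68.23, 77.40, 76.52],
             [85.60, 82.20, 85.40, 92.60, 96.00, 99.80]),
    "RN34": ([85.88, 82.45, 83.69, 62.42, 75.31, 80.38],
             [85.80, 82.20, 88.40, 89.20, 93.20, 99.90]),
    "RN50": ([85.44, 85.06, 82.85, 71.69, 71.44, 77.49],
             [87.00, 83.00, 85.60, 90.80, 87.00, 99.80]),
    "RN101": ([86.00, 86.24, 75.38, 76.92, 82.15, 78.03],
              [85.69, 84.20, 77.60, 96.00, 97.80, 99.80]),
}


def accuracy_gap(model: str) -> Dict[str, float]:
    """Unlearning accuracy minus test accuracy for each method, in percentage points."""
    test_acc, unlearn_acc = TABLE4[model]
    return {m: round(u - t, 2) for m, t, u in zip(METHODS, test_acc, unlearn_acc)}


for model in TABLE4:
    print(model, accuracy_gap(model))
```

Under this reading, Neggrad, Fisher, and LCODEC show gaps of more than 15 percentage points on every architecture, while contrastive unlearning has the smallest absolute gap among the unlearning methods.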

Table 5 presents test and unlearning accuracy on the SVHN dataset. LCODEC and Fisher achieve test accuracy similar to that of the retrain model on some models. However, their unlearning accuracy is very high, often close to 100%, indicating significant residual influence of the unlearning samples. Both Finetune and Neggrad show significant performance loss in test accuracy. Contrastive unlearning more consistently achieves unlearning accuracy similar to that of the retrain model, with a relatively small loss in test accuracy.

| Model | Evaluation | Retrain | Contrastive | Finetune | Neggrad | Fisher | LCODEC |
|---|---|---|---|---|---|---|---|
| RN18 | $\mathcal{D}^{u}_{tr}$ | 61.77 | 60.88 | 65.89 | 80.46 | 86.95 | 92.56 |
| | $\mathcal{D}^{r}_{tr}$ | 96.41 | 91.35 | 86.50 | 82.69 | 88.24 | 93.09 |
| RN34 | $\mathcal{D}^{u}_{tr}$ | 64.92 | 53.59 | 67.73 | 83.20 | 82.21 | 95.49 |
| | $\mathcal{D}^{r}_{tr}$ | 94.82 | 86.98 | 87.15 | 82.35 | 83.96 | 97.51 |
| RN50 | $\mathcal{D}^{u}_{tr}$ | 63.31 | 60.23 | 69.75 | 86.59 | 74.29 | 94.56 |
| | $\mathcal{D}^{r}_{tr}$ | 97.23 | 90.31 | 84.17 | 89.53 | 76.83 | 92.27 |
| RN101 | $\mathcal{D}^{u}_{tr}$ | 62.39 | 60.25 | 55.37 | 92.57 | 84.20 | 94.93 |
| | $\mathcal{D}^{r}_{tr}$ | 95.90 | 86.45 | 59.43 | 91.76 | 85.70 | 95.90 |

Table 6: Member prediction rate of the MIA on $\mathcal{D}^{u}_{tr}$ (unlearning samples) and $\mathcal{D}^{r}_{tr}$ (member-test samples) on the CIFAR-10 dataset.

Unlearning Effectiveness via MIA. Table 6 shows the member prediction rate of the MIA on unlearning samples and member-test samples against each unlearned model. An ideal attack model against the retrain model should have a zero member prediction rate for unlearning samples (since the unlearning samples are non-members) and 100% for member samples. However, the attack model in our experiment shows around 60% for unlearning samples, which is a technical limitation of the attack model. The high rate on member samples does suggest that it has reasonable power in recognizing members. An unlearning algorithm is more effective if it exhibits 1) a lower member prediction rate on unlearning samples, and 2) a larger difference between the member prediction rates on unlearning samples and member samples. For Neggrad, Fisher, and LCODEC, the member prediction rates for member samples and unlearning samples are similar, indicating ineffective unlearning. For Finetune and contrastive unlearning, the member prediction rate for unlearning samples is lower than that for member samples. However, the difference is significantly larger for contrastive unlearning, suggesting stronger discrimination between unlearning samples and member samples and thus more effective unlearning.
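The two criteria above can be illustrated with a short sketch: the member prediction rate is the share of samples the attack model labels as members, and the discrimination gap is the difference between that rate on member samples and on unlearning samples. The toy attack outputs and the helper names are assumptions made for illustration; the RN18 rates are copied from Table 6.

```python
from typing import Dict, Sequence, Tuple


def member_prediction_rate(is_member_pred: Sequence[bool]) -> float:
    """Percentage of samples that the attack model labels as training members."""
    if not is_member_pred:
        return 0.0
    return 100.0 * sum(is_member_pred) / len(is_member_pred)


# Toy attack outputs (made up): True means "predicted member".
pred_on_unlearning = [True, False, False, True, False]
pred_on_members = [True, True, True, True, False]
print(member_prediction_rate(pred_on_unlearning), member_prediction_rate(pred_on_members))

# RN18 rows of Table 6: (rate on unlearning samples, rate on member samples), in percent.
TABLE6_RN18: Dict[str, Tuple[float, float]] = {
    "Retrain": (61.77, 96.41),
    "Contrastive": (60.88, 91.35),
    "Finetune": (65.89, 86.50),
    "Neggrad": (80.46, 82.69),
    "Fisher": (86.95, 88.24),
    "LCODEC": (92.56, 93.09),
}


def discrimination_gap(rates: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Member-sample rate minus unlearning-sample rate, in percentage points."""
    return {m: round(member - unlearn, 2) for m, (unlearn, member) in rates.items()}


print(discrimination_gap(TABLE6_RN18))
```

On the RN18 row, the gap for contrastive unlearning is about 30 points versus about 21 for Finetune, while Neggrad, Fisher, and LCODEC stay below 3 points.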

| Model | Retrain | Contrastive | Finetune | Neggrad | Fisher | LCODEC |
|---|---|---|---|---|---|---|
| RN18 | 41.01 | 2.64 | 18.92 | 4.32 | 75.23 | 37.62 |
| RN34 | 71.01 | 3.51 | 31.51 | 3.15 | 115.51 | 55.50 |
| RN50 | 131.51 | 8.46 | 38.49 | 13.91 | 223.56 | 153.02 |
| RN101 | 215.53 | 12.63 | 101.88 | 23.56 | 407.54 | 495.01 |

Table 7: Processing time of each algorithm conducting sample unlearning on CIFAR-10 dataset (in minutes).

Efficiency. Table 7 shows the runtime of the different algorithms. Contrastive unlearning is the fastest overall. This is due to its fast convergence: it needs fewer than 20 iterations over the entire set of unlearning samples. While Neggrad also iterates only over the unlearning samples, it requires more than 40 iterations to achieve unlearning effects. Finetune, Fisher, and LCODEC need longer runtimes because they must iterate over the remaining samples. Fisher and LCODEC incur excessive computation with larger models because their computational cost is proportional to the number of model parameters and is hardly parallelizable.

6 Conclusion

In this paper, we proposed a novel contrastive approach to machine unlearning. It achieves unlearning by reconfiguring the geometric properties of the embedding space and contrasting unlearning samples against remaining samples. Through extensive experiments, we demonstrated that it outperforms state-of-the-art unlearning algorithms in model performance, unlearning effectiveness, and efficiency. In future work, we will examine the effectiveness of contrastive unlearning on different model architectures and in different unlearning scenarios, such as graph unlearning and correlated sequence unlearning.

References

  • Arpit et al. [2017] Devansh Arpit, Stanislaw Jastrzebski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S. Kanwal, Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio, and Simon Lacoste-Julien. A closer look at memorization in deep networks. In Proceedings of the 34th International Conference on Machine Learning, pages 233–242. PMLR, 2017. ISSN: 2640-3498.
  • Bourtoule et al. [2021] Lucas Bourtoule, Varun Chandrasekaran, Christopher A. Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, and Nicolas Papernot. Machine unlearning. In 2021 IEEE Symposium on Security and Privacy (SP), pages 141–159. IEEE, 2021.
  • Cao and Yang [2015] Yinzhi Cao and Junfeng Yang. Towards making systems forget with machine unlearning. In 2015 IEEE Symposium on Security and Privacy, pages 463–480. IEEE, 2015.
  • Chen et al. [2020] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. In Proceedings of the 37th International Conference on Machine Learning, pages 1597–1607. PMLR, 2020. ISSN: 2640-3498.
  • Chen et al. [2023] Min Chen, Weizhuo Gao, Gaoyang Liu, Kai Peng, and Chen Wang. Boundary unlearning: Rapid forgetting of deep networks via shifting the decision boundary. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7766–7775, 2023.
  • Cotogni et al. [2023] Marco Cotogni, Jacopo Bonato, Luigi Sabetta, Francesco Pelosin, and Alessandro Nicolosi. DUCK: Distance-based Unlearning via Centroid Kinematics, December 2023. arXiv:2312.02052 [cs].
  • Das and Chaudhuri [2024] Rudrajit Das and Subhasis Chaudhuri. On the separability of classes with the cross-entropy loss function, 2024.
  • Golatkar et al. [2020] Aditya Golatkar, Alessandro Achille, and Stefano Soatto. Eternal sunshine of the spotless net: Selective forgetting in deep networks. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9301–9309. IEEE, 2020.
  • Goodfellow et al. [2013] I. Goodfellow, Mehdi Mirza, Xia Da, Aaron C. Courville, and Yoshua Bengio. An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks. CoRR, December 2013.
  • Graf et al. [2021] Florian Graf, Christoph Hofer, Marc Niethammer, and Roland Kwitt. Dissecting supervised contrastive learning. In Proceedings of the 38th International Conference on Machine Learning, pages 3821–3830. PMLR, 2021. ISSN: 2640-3498.
  • Guo et al. [2020] Chuan Guo, Tom Goldstein, Awni Hannun, and Laurens Van Der Maaten. Certified data removal from machine learning models. In Proceedings of the 37th International Conference on Machine Learning, volume 119 of ICML’20, pages 3832–3842. JMLR.org, 2020.
  • Gupta et al. [2021] Varun Gupta, Christopher Jung, Seth Neel, Aaron Roth, Saeed Sharifi-Malvajerdi, and Chris Waites. Adaptive machine unlearning. In Advances in Neural Information Processing Systems, volume 34, pages 16319–16330. Curran Associates, Inc., 2021.
  • He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778. IEEE, 2016.
  • Jia et al. [2023] Jinghan Jia, Jiancheng Liu, Parikshit Ram, Yuguang Yao, Gaowen Liu, Yang Liu, Pranay Sharma, and Sijia Liu. Model sparsity can simplify machine unlearning. In Neural Information Processing Systems, 2023.
  • Khosla et al. [2020] Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. Supervised contrastive learning. In Advances in Neural Information Processing Systems, volume 33, pages 18661–18673. Curran Associates, Inc., 2020.
  • Koch and Soll [2023] Korbinian Koch and Marcus Soll. No matter how you slice it: Machine unlearning with SISA comes at the expense of minority classes. In 2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), pages 622–637, 2023.
  • Kurmanji et al. [2023] Meghdad Kurmanji, Peter Triantafillou, Jamie Hayes, and Eleni Triantafillou. Towards unbounded machine unlearning, 2023.
  • Lin et al. [2023] Shen Lin, Xiaoyu Zhang, Chenyang Chen, Xiaofeng Chen, and Willy Susilo. Erm-ktp: Knowledge-level machine unlearning via knowledge transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20147–20155, 2023.
  • Mantelero [2024] Alessandro Mantelero. The EU proposal for a general data protection regulation and the roots of the ‘right to be forgotten’. Computer Law & Security Review, 29(3):229–235, 2024.
  • Mehta et al. [2022] Ronak Mehta, Sourav Pal, Vikas Singh, and Sathya N Ravi. Deep unlearning via randomized conditionally independent hessians. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10422–10431, 2022.
  • Neel et al. [2024] Seth Neel, Aaron Roth, and Saeed Sharifi-Malvajerdi. Descent-to-delete: Gradient-based methods for machine unlearning. In Proceedings of the 32nd International Conference on Algorithmic Learning Theory, pages 931–962. PMLR, 2024. ISSN: 2640-3498.
  • Shokri et al. [2017] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership Inference Attacks Against Machine Learning Models. In 2017 IEEE Symposium on Security and Privacy (SP), pages 3–18, May 2017. ISSN: 2375-1207.
  • Shore and Johnson [1981] J. Shore and R. Johnson. Properties of cross-entropy minimization. IEEE Transactions on Information Theory, 27(4):472–482, July 1981.
  • Tarun et al. [2023] Ayush K. Tarun, Vikram S. Chundawat, Murari Mandal, and Mohan Kankanhalli. Fast yet effective machine unlearning. IEEE Transactions on Neural Networks and Learning Systems, pages 1–10, 2023.
  • Thudi et al. [2022] Anvith Thudi, Hengrui Jia, Ilia Shumailov, and Nicolas Papernot. On the necessity of auditable algorithmic definitions for machine unlearning. In 31st USENIX Security Symposium (USENIX Security 22), pages 4007–4022, 2022.
  • Wu et al. [2023] Kun Wu, Jie Shen, Yue Ning, Ting Wang, and Wendy Hui Wang. Certified edge unlearning for graph neural networks. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 2606–2617. ACM, 2023.
  • Yan et al. [2022] Haonan Yan, Xiaoguang Li, Ziyao Guo, Hui Li, Fenghua Li, and Xiaodong Lin. Arcane: An efficient architecture for exact machine unlearning. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22, pages 4006–4013, 2022.