Class-Wise Buffer Management for Incremental Object Detection:
An Effective Buffer Training Strategy

Junsu Kim¹ Sumin Hong² Chanwoo Kim¹ Jihyeon Kim¹ Yihalem Yimolal Tiruneh¹
Jeongwan On⁴ Jihyun Song³ Sunhwa Choi³ Seungryul Baek¹

¹UNIST, South Korea ²SeoulTech, South Korea
³LG Electronics, South Korea ⁴Chonnam National University, South Korea

Abstract

Class incremental learning aims to solve a problem that arises when continuously adding unseen class instances to an existing model. This approach has been extensively studied in the context of image classification; however, its applicability to object detection is not yet well established. Existing frameworks using replay methods mainly collect replay data without considering the model being trained and tend to rely on randomness or the number of labels of each sample. Also, despite the effectiveness of replay, it has not yet been optimized for the object detection task. In this paper, we introduce an effective buffer training strategy (eBTS) that creates an optimized replay buffer for object detection. Our approach incorporates guarantee minimum and hierarchical sampling to establish a buffer customized to the trained model. Furthermore, we use circular experience replay training to optimally utilize the accumulated buffer data. Experiments on the MS COCO dataset demonstrate that our eBTS achieves state-of-the-art performance compared to the existing replay schemes.

1 Introduction

Traditional machine learning models tend to forget previously learned patterns when trained on new datasets, a phenomenon called “catastrophic forgetting” [14]. This poses challenges for models operating in dynamic environments. Unlike machines, however, humans can learn new concepts without entirely forgetting pre-existing knowledge. Building on this insight, incremental learning aims to address this issue by training models to assimilate new concepts progressively without retraining on the entire past dataset, effectively preserving knowledge from prior tasks while integrating new tasks.

Figure 1: The final mean average precision ($mAP$, %) and the number of classes that satisfy COCO’s distribution in the 40+40 setup. We use the formula $\text{buffer capacity}\times\frac{\text{Number of samples in }C_{i}}{\text{Total number of samples in }C_{1,\ldots,n}}$ to check the distribution for the previous classes.

Most incremental methods [12, 4, 6, 15, 11, 7, 2] handle the image classification task. These methods can also be applied to object detection; however, because each scene contains a varying set of labels for foreground objects, they are relatively ineffective there. Nevertheless, incremental object detection can play an important role in real-world applications, since it allows a detector to adapt to environments where new object labels constantly appear. For example, when a new product is discovered, the detection system should recognize it while still detecting previous labels. Instead of completely retraining the model every time new labels appear, it helps to update the model incrementally to accommodate the unseen label. This greatly improves flexibility and persistence in real-world applications and saves computing resources. We call this task class incremental object detection (CIOD).

One of the most commonly used methods in CIOD is experience replay (ER) [3, 16, 1, 10, 5, 15]. Random-based ER [3, 16, 4, 15] mitigates the complexity of multiple labels by simply sampling at random from the previous data and building a buffer [15] for integration with the new data. RODEO [1] and Hard [10] suggested replay schemes designed for CIOD, but it is still unclear whether they are the best strategies for preventing forgetting. Given this uncertainty, we see room for improving these learning strategies, specifically the replay strategy.

In this paper, we propose an effective class-wise buffer training strategy, eBTS. Our method consists of two buffer configuration components and a simple but efficient training approach. First, guarantee minimum ensures the inclusion of a minimum quantity of samples for each class, reflecting the class distribution of the prior dataset in the buffer. Second, hierarchical sampling prioritizes samples with a high number of unique labels and low loss when the buffer becomes full. This helps to retain more diverse labels and to tailor the buffer data to the trained model. In terms of the training approach, we propose circular experience replay (CER), which deals with the asymmetry between the data of current and prior tasks. Our training combines original ER training [15, 1, 10] with CER training, which are designed to avoid overfitting and to enhance prior knowledge, respectively. In Fig. 1, our method demonstrates the ability to accurately reflect the prior distribution in the buffer, as well as excellent performance. Our contributions can be summarized as follows:

1) We introduce a buffer management strategy that is easily compatible with CIOD. The buffer manager operates the buffer based on two criteria, a high number of unique labels and low loss, rather than on any single measure (e.g. many labels, randomness, etc.). We experimentally verified that this is a viable measure that reflects the tendency of the trained model.

2) We propose an effective buffer training scheme, i.e. circular training, to overcome the imbalance caused by the limited capacity of the replay buffer and to enhance detection performance on previous classes.

2 Methods

2.1 Overview

Our goal is to continually expand the model's knowledge by incorporating new labels while retaining previous knowledge in class incremental object detection (CIOD). The CIOD setting consists of multiple tasks, each with a predefined number of object classes, denoted as $T_{t}\in\{T_{1},\ldots,T_{N}\}$, where $N$ is the total number of tasks. Each task has its own corresponding dataset $\mathcal{D}_{t}$, which includes a set of input images $\mathbf{X}_{t}$ and corresponding labels $\mathbf{Y}_{t}$:

$$\mathcal{D}_{t}\sim(\{\mathbf{X}_{t}:x_{1},\ldots,x_{n_{t}}\in T_{t}\},\{\mathbf{Y}_{t}:y_{1},\ldots,y_{n_{t}}\in T_{t}\})\tag{1}$$

where $n_{t}$ is the number of samples contained in $T_{t}$, and $t$ is the task index. We also use a buffer $\mathcal{B}$, a memory used for replay that stores sample data (i.e. x, y). The buffer $\mathcal{B}$ consists of data structured as follows:

$$\{I_{i}:(L_{i},U_{i})\}\tag{2}$$

where $I_{i}$ denotes the $i$ -th image, $L_{i}$ is the associated sample loss, and $U_{i}$ signifies the list of unique classes in it.
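
The buffer format of Eq. 2 can be pictured as a mapping from an image id to its loss and unique-label set. The following is a minimal sketch under that assumption; the names `BufferEntry`, `Buffer` and `make_entry`, as well as the numeric values, are illustrative and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class BufferEntry:
    """One buffer record: the (L_i, U_i) pair of Eq. 2 (hypothetical name)."""
    loss: float              # L_i: sample loss under the current model
    labels: FrozenSet[int]   # U_i: unique class indexes in the image


# The buffer B maps an image id I_i to its (L_i, U_i) record.
Buffer = Dict[int, BufferEntry]


def make_entry(loss: float, object_classes: List[int]) -> BufferEntry:
    """Collapse the per-object class indexes into the unique label set U_i."""
    return BufferEntry(loss=loss, labels=frozenset(object_classes))


buffer: Buffer = {}
# Image 17 holds two objects of class 3 and one of class 5 (made-up values).
buffer[17] = make_entry(loss=0.42, object_classes=[3, 3, 5])
```

Storing the labels as a set rather than a list reflects that only the distinct classes in an image matter for the buffer decisions described later.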

Our effective buffer training strategy, eBTS, has three main components: 1) a guarantee minimum process to construct a representative image buffer (Sec. 2.2), 2) hierarchical sampling for effective buffer configuration (Sec. 2.3), and 3) circular training for utilization of the buffer (Sec. 2.4). The overall flow is shown in Fig. 2. We describe each component in more detail in the following subsections.

Algorithm 1 Guarantee Minimum process
Input: $K$, $m$, $\mathcal{D}_{t}$, $\mathcal{B}_{1:t-1}$, $\mathcal{M}_{1:t}$
define: $\mathcal{B}\equiv\{I:(L,U)\}$ // buffer data format
define: $\mathcal{D}_{e}\equiv\{x_{1},\ldots,x_{N_{e}}\}$ // extra dataset format
$\mathcal{D}_{e}=\mathcal{B}_{1:t-1}\cup\mathcal{D}_{t}$ if $t>1$ else $\mathcal{D}_{t}$ // extra dataset
for $d=1,\ldots,N_{e}$ do
    $I_{d},U_{d}\leftarrow$ get_info($d$) // id, unique labels of $d$
    $L_{d}\leftarrow\mathcal{M}_{1:t}(d)$ // loss value of $d$
    if $|\mathcal{B}|<K$ then
        $\mathcal{B}\leftarrow(I_{d},L_{d},U_{d})$
    else
        // pick all labels set (e.g. $U_{1},\ldots,U_{K}$)
        $U_{B}\leftarrow$ get_all_unique_labels_set($\mathcal{B}$)
        // pick the labels of $U_{d}$ that appear fewer than $m$ times
        $\mathcal{U}=\{u\in U_{d}\mid\text{count}(u,U_{B})<m\}$ // Eq. 4
        if $\mathcal{U}=\emptyset$ then
            $\mathcal{R}\leftarrow$ get_samples($\mathcal{B}$)
        else
            $\mathcal{R}\leftarrow$ get_samples_excluding_labels($\mathcal{B},\mathcal{U}$)
        end if
        $\mathcal{B}\leftarrow$ BufferManager($\mathcal{B},\mathcal{R},\mathcal{U},(I_{d},L_{d},U_{d})$)
    end if
end for
Output: $\mathcal{B}$

2.2 Guarantee minimum process

Data imbalance is a common problem in object detection tasks. Specifically, when creating a replay buffer, classes that are already under-represented in the data distribution may become even scarcer, which can degrade detection performance. Therefore, it is important to ensure that each class within the replay buffer has a minimum number of samples. To address this issue, we propose the guarantee minimum (GM) method. This method maintains class-wise diversity in $\mathcal{B}$ and preserves the original data distribution of $\mathcal{D}_{1:t-1}$ by ensuring a minimum of $m$ samples for every class.

Data structure. We generate an extra dataset $\mathcal{D}_{e}$ by combining $\mathcal{D}_{t}$ and $\mathcal{B}_{1:t-1}$, and represent each input sample $d$ in the format of Eq. 2. Given $I_{d}$ and $U_{d}$, we calculate the loss $L_{d}$ with the pre-trained model $\mathcal{M}_{1:t}$. Since we employ a transformer-based detector, we construct the loss function as follows:

$$L_{d}=L_{\text{Bbox}}+L_{\text{GIoU}}+L_{\text{Label}}\tag{3}$$

where $L_{\text{Bbox}}$ and $L_{\text{GIoU}}$ represent the L1 loss and the generalized IoU loss [13] for the bounding box, respectively. Additionally, $L_{\text{Label}}$ [9] is the cross entropy with focal loss for the label. If the buffer $\mathcal{B}$ has not reached its maximum capacity $K$, the new data is added directly. However, once it reaches full capacity, a strategic approach becomes necessary for data replacement.
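
To make the three terms of Eq. 3 concrete, the sketch below evaluates them for a single predicted box that is already matched to one ground-truth box. It is a simplified illustration, not the detector's actual loss code: it assumes corner-format boxes `(x1, y1, x2, y2)` with positive area, per-class sigmoid probabilities, the common focal-loss defaults `alpha=0.25` and `gamma=2.0`, and no loss weights or matching step. All function names are hypothetical.

```python
import math


def l1_loss(pred, target):
    """Sum of absolute coordinate differences between two boxes."""
    return sum(abs(p - t) for p, t in zip(pred, target))


def giou_loss(pred, target):
    """1 - GIoU for two (x1, y1, x2, y2) boxes with positive area."""
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    # Intersection rectangle, clamped to zero when the boxes do not overlap.
    iw = max(0.0, min(pred[2], target[2]) - max(pred[0], target[0]))
    ih = max(0.0, min(pred[3], target[3]) - max(pred[1], target[1]))
    inter = iw * ih
    union = area_p + area_t - inter
    # Area of the smallest box enclosing both inputs.
    enclose = ((max(pred[2], target[2]) - min(pred[0], target[0]))
               * (max(pred[3], target[3]) - min(pred[1], target[1])))
    giou = inter / union - (enclose - union) / enclose
    return 1.0 - giou


def focal_loss(prob, is_positive, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss for one class probability."""
    p_t = prob if is_positive else 1.0 - prob
    p_t = min(max(p_t, eps), 1.0)  # avoid log(0)
    alpha_t = alpha if is_positive else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def sample_loss(pred_box, gt_box, class_probs, gt_class):
    """L_d = L_Bbox + L_GIoU + L_Label for one matched prediction."""
    label = sum(focal_loss(p, c == gt_class) for c, p in enumerate(class_probs))
    return l1_loss(pred_box, gt_box) + giou_loss(pred_box, gt_box) + label
```

A perfectly localized and confidently classified object gives a value close to zero, which is the kind of sample the low-loss criterion favors.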

Guarantee process. To replace buffer samples while keeping class-wise diversity, we first identify the sets of unique labels $U_{B}\sim\{U_{1},\ldots,U_{K}\}$ in $\mathcal{B}$. After that, we introduce the set $\mathcal{U}$ containing under-represented labels (i.e. class indexes) whose count falls below a certain bound $m$:

$$\mathcal{U}=\{u\in U_{d}\mid\text{count}(u,U_{B})<m\}\tag{4}$$

where $u$ is an element of the unique labels $U_{d}$ of the input data, and $m$ represents the minimum guarantee value. Then, we select the set of replacement candidates $\mathcal{R}$, which contains the samples in $\mathcal{B}$ that have no labels from $\mathcal{U}$. If $\mathcal{U}$ is empty (i.e. every class in $U_{d}$ already appears at least $m$ times), we choose all samples in $\mathcal{B}$ as replacement candidates $\mathcal{R}$. This approach ensures that our buffer reflects the overall class distribution, while also covering the rare labels more effectively. Finally, we use the buffer manager, which employs hierarchical sampling (Sec. 2.3), to compare $\mathcal{R}$ with the new sample. We summarize our GM algorithm in Alg. 1.
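
The two selection steps of the guarantee process (Eq. 4 and the choice of $\mathcal{R}$) can be sketched as follows. This is an illustrative reading rather than the authors' implementation: it assumes that count$(u,U_{B})$ is the number of buffer images whose unique-label set contains $u$, and the function names `under_represented` and `replacement_candidates` are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, Set, Tuple

# Buffer format of Eq. 2: image id -> (loss, unique labels).
Buffer = Dict[int, Tuple[float, FrozenSet[int]]]


def under_represented(buffer: Buffer, labels: FrozenSet[int], m: int) -> Set[int]:
    """Eq. 4: labels of the new sample held by fewer than m buffer images."""
    counts = Counter(u for _, unique in buffer.values() for u in unique)
    return {u for u in labels if counts[u] < m}


def replacement_candidates(buffer: Buffer, rare: Set[int]) -> Buffer:
    """Candidates R: samples that contain none of the rare labels.

    When `rare` is empty, every buffer sample qualifies, which matches the
    rule of taking the whole buffer as R.
    """
    return {i: (loss, unique) for i, (loss, unique) in buffer.items()
            if not (unique & rare)}


# Toy buffer with made-up losses and class indexes.
buffer: Buffer = {
    1: (0.2, frozenset({0, 1})),
    2: (0.5, frozenset({0})),
    3: (0.3, frozenset({2})),
}
rare = under_represented(buffer, frozenset({0, 2}), m=2)
candidates = replacement_candidates(buffer, rare)
```

In this toy case class 2 is held by only one image, so image 3 is protected from replacement and only images 1 and 2 remain as candidates.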

Algorithm 2 BufferManager
Input: $\mathcal{B}$, $\mathcal{R}$, $\mathcal{U}$, $(I_{d},L_{d},U_{d})$ // inputs from Alg. 1
define: $\mathcal{B},\mathcal{R}\equiv\{I:(L,U)\}$ // buffer & candidates format
$\mathcal{R}_{\text{min\_U}}\leftarrow$ min_U($\mathcal{R}$) // cond. 1: number of unique labels
$I_{\text{opt}},L_{\text{opt}},U_{\text{opt}}\leftarrow$ highest_L($\mathcal{R}_{\text{min\_U}}$) // cond. 2: loss
if $\mathcal{U}=\emptyset$ then
    if $L_{\text{opt}}>L_{d}$ then
        del $\mathcal{B}[I_{\text{opt}}]$ // delete data in buffer
        $\mathcal{B}\leftarrow(I_{d},L_{d},U_{d})$ // insert new data to buffer
    else
        no change
    end if
else
    del $\mathcal{B}[I_{\text{opt}}]$ // delete data in buffer
    $\mathcal{B}\leftarrow(I_{d},L_{d},U_{d})$ // insert new data to buffer
end if
Output: $\mathcal{B}$

2.3 Hierarchical sampling strategy

In this section, we introduce hierarchical sampling to create a buffer containing representative samples of the prior knowledge through two strategies: a high number of unique labels and low loss. The high number of unique labels strategy [1] is used to diversify the buffer configuration, preserving previously learned labels within a limited capacity. However, when the buffer needs replacement, this strategy alone replaces samples with an equally low number of unique labels at random, without any further condition. Therefore, we add a low-loss criterion for a more refined configuration. In general, a low loss value indicates that the prediction is similar to the actual sample and that the model has been well trained on that particular sample. Thus, among samples with the same number of unique labels, we prioritize data by their loss. We utilize hierarchical sampling (summarized in Alg. 2) to compare the replacement candidates $\mathcal{R}$ with the input sample. We allocate an additional epoch to process all configuration procedures.
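
The replacement rule of Alg. 2 can be sketched in a few lines. This is a hedged illustration under stated assumptions: the function name `buffer_manager` and the tuple layout are hypothetical, ties in loss are broken arbitrarily, and the case of an empty candidate set (not covered by Alg. 2) is assumed to leave the buffer unchanged.

```python
from typing import Dict, FrozenSet, Set, Tuple

Entry = Tuple[float, FrozenSet[int]]   # (loss L, unique labels U)
Buffer = Dict[int, Entry]              # image id I -> (L, U)


def buffer_manager(buffer: Buffer, candidates: Buffer, rare: Set[int],
                   new_id: int, new_entry: Entry) -> Buffer:
    """Return an updated copy of `buffer` after considering the new sample."""
    if not candidates:
        # Assumption of this sketch: with no candidates, nothing is replaced.
        return dict(buffer)
    # Condition 1: keep the candidates with the fewest unique labels.
    fewest = min(len(labels) for _, labels in candidates.values())
    pool = {i: e for i, e in candidates.items() if len(e[1]) == fewest}
    # Condition 2: among those, the highest-loss sample is the eviction target.
    evict_id = max(pool, key=lambda i: pool[i][0])
    # Without rare labels, replace only if the new sample has a lower loss.
    if not rare and pool[evict_id][0] <= new_entry[0]:
        return dict(buffer)
    updated = {i: e for i, e in buffer.items() if i != evict_id}
    updated[new_id] = new_entry
    return updated


# Toy buffer with made-up losses and class indexes.
buffer: Buffer = {
    1: (0.2, frozenset({0, 1})),
    2: (0.5, frozenset({0})),
    3: (0.3, frozenset({2})),
}
# No rare labels: image 2 (fewest labels, highest loss) is replaced.
after = buffer_manager(buffer, buffer, set(), 4, (0.4, frozenset({1})))
```

When the new sample carries an under-represented label, the loss comparison is skipped and the eviction target is always replaced, which is how the minimum guarantee takes priority over the loss ranking.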

Table 1: Incremental results on the COCO validation set using Deformable DETR in various scenarios. $T_{1}$ (40 or 70) represents the previous classes, and $T_{(1+2)}$ (80) denotes testing for all classes. The best result is highlighted in bold.

Scenarios	Method	$T_{1}$ (Old)						$T_{(1+2)}$ (Overall)
		$mAP_{.5:.95}$	$mAP_{.5}$	$mAP_{.75}$	$mAP_{S}$	$mAP_{M}$	$mAP_{L}$	$mAP_{.5:.95}$	$mAP_{.5}$	$mAP_{.75}$	$mAP_{S}$	$mAP_{M}$	$mAP_{L}$
70 + 10	CutMix [16]	0.087	0.207	0.065	0.028	0.098	0.141	0.086	0.206	0.063	0.034	0.097	0.135
	RODEO [1]	0.064	0.109	0.066	0.042	0.097	0.091	0.094	0.151	0.100	0.056	0.127	0.137
	Hard [10]	0.068	0.124	0.067	0.059	0.104	0.075	0.095	0.161	0.098	0.074	0.128	0.120
	Ours w/o CER	0.179	0.288	0.192	0.089	0.209	0.238	0.190	0.304	0.203	0.097	0.218	0.261
	Ours	0.213	0.334	0.231	0.104	0.237	0.295	0.221	0.345	0.240	0.114	0.246	0.308
40 + 40	CutMix [16]	0.131	0.286	0.104	0.058	0.150	0.201	0.135	0.295	0.106	0.051	0.148	0.212
	RODEO [1]	0.095	0.153	0.099	0.073	0.113	0.103	0.233	0.343	0.252	0.130	0.256	0.311
	Hard [10]	0.072	0.131	0.072	0.070	0.107	0.059	0.220	0.332	0.239	0.121	0.250	0.285
	Ours w/o CER	0.168	0.271	0.176	0.099	0.199	0.194	0.270	0.405	0.293	0.144	0.297	0.367
	Ours	0.222	0.356	0.234	0.125	0.255	0.296	0.271	0.419	0.294	0.136	0.296	0.376

2.4 Circular experience replay training

Previous CIOD replay methods [1, 10] used the experience replay (ER) training method, which relies on a large buffer capacity so that the relatively small amount of buffer data is not forgotten. However, this approach wastes significant resources. To address this issue, we propose the circular experience replay (CER) training strategy to make full use of a buffer with limited capacity. First, we separate $\mathcal{D}_{t+1}$ and $\mathcal{B}_{1:t}$ to create distinct training datasets. We then train the model by randomly selecting data from both datasets. $\mathcal{B}_{1:t}$ is reused repeatedly, with uniform probability, until all of $\mathcal{D}_{t+1}$ has been used. To enhance the utilization of previous information, we apply CER training after ER training.
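
One way to picture circular training is a data iterator that walks once through the new data while cycling through the much smaller buffer. The sketch below is only an assumption about how the mixing could be organized: pairing each new-data batch with an equally sized replay batch, the reshuffle at each cycle, and the name `circular_batches` are illustrative choices, not details given in the paper.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def circular_batches(new_data: Sequence, buffer: Sequence, batch_size: int,
                     seed: int = 0) -> Iterator[Tuple[List, List]]:
    """Yield (new batch, replay batch) pairs for one pass over `new_data`.

    The buffer is reshuffled and restarted whenever it runs out, so every
    buffer sample is drawn about equally often during the pass.
    """
    if not buffer:
        raise ValueError("the replay buffer must not be empty")
    rng = random.Random(seed)
    new_order = list(new_data)
    rng.shuffle(new_order)
    replay_order: List = []
    for start in range(0, len(new_order), batch_size):
        new_batch = new_order[start:start + batch_size]
        replay_batch: List = []
        while len(replay_batch) < len(new_batch):
            if not replay_order:  # buffer exhausted: start a new cycle
                replay_order = list(buffer)
                rng.shuffle(replay_order)
            replay_batch.append(replay_order.pop())
        yield new_batch, replay_batch


# Ten new samples and a three-sample buffer (made-up placeholders).
pairs = list(circular_batches(list(range(10)), ["a", "b", "c"], batch_size=2))
```

Because the buffer restarts only after every sample has been drawn, no buffer image is seen more than one time more often than any other within a pass.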

3 Experiments

3.1 Implementation and experiments

eBTS is based on Deformable DETR [18], trained from scratch on COCO [8] for 50 epochs in each task. All experiments are performed using 4 RTX3090 GPUs with a batch size of 3. In the first, two-phase setting, we incrementally train on classes divided as 40+40 and 70+10, and evaluate the model on $T_{1+2}$ and $T_{1}$ to assess the degree of forgetting. In the second, multiple-phase setting, we train on classes divided as 40+20+20. We then test the model on the combined set of added classes. To ensure a fair comparison, we extracted only the replay components from various CIOD methods [16, 1, 10] that use replay and trained them using our baseline. We kept all conditions identical, except for the buffer composition (random [16], high number of unique labels [1], many labels [10]) and the training method (original ER [1, 10], CutMix [17]-based CutMix ER [16]). In all our experiments, we set the buffer capacity to around 1% (1200) of COCO, and the minimum guarantee $m$ to 1% (12) of the buffer capacity.
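
As a small worked example of these settings, the snippet below derives $m$ from the buffer capacity and applies the proportional formula from the Fig. 1 caption to obtain the share of the buffer each class would occupy if the buffer mirrored the dataset. The per-class counts and class names are invented for illustration, and whether "samples" means images or object instances is left open here.

```python
from typing import Dict


def expected_buffer_counts(class_counts: Dict[str, int],
                           capacity: int) -> Dict[str, float]:
    """capacity * (samples in class i) / (total samples over all classes)."""
    total = sum(class_counts.values())
    return {name: capacity * n / total for name, n in class_counts.items()}


capacity = 1200          # buffer capacity used in the experiments
m = capacity // 100      # minimum guarantee: 1% of the capacity, i.e. 12

# Hypothetical per-class sample counts, not taken from COCO.
counts = {"class_a": 6000, "class_b": 3000, "class_c": 1000}
expected = expected_buffer_counts(counts, capacity)
```

With these made-up counts the proportional shares are 720, 360 and 120 buffer slots, all well above the guaranteed minimum of 12; the guarantee matters for classes whose proportional share would otherwise fall below $m$.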

Table 2: Comparison of the proportions of CER used with ER on COCO. The best result is highlighted in bold.

Phase		70+10		40+40
ER epochs	CER epochs	$T_{1}$ AP	$T_{(1+2)}$ AP	$T_{1}$ AP	$T_{(1+2)}$ AP
40	10	0.168	0.183	0.172	0.253
42	8	0.169	0.185	0.192	0.262
44	6	0.188	0.199	0.210	0.271
46	4	0.194	0.208	0.192	0.260
48	2	0.213	0.221	0.222	0.271

3.2 Experimental results

We analyze the two-phase results using the mAP metric on the COCO dataset [8]. Table 1 shows quantitatively that our approach, eBTS, achieves state-of-the-art results on $T_{1}$ and $T_{1+2}$. Furthermore, our method (“Ours w/o CER”) performs well in comparison to previous methods even without the circular training strategy. This indicates the effectiveness of our buffer configuration algorithm, which includes guarantee minimum processing and a hierarchical sampling strategy, in retaining the previous knowledge of $T_{1}$. As shown in Fig. 3, our method (“Ours w/o CER”) also demonstrates good performance after training the last task in the multi-phase setting. To ensure a fair comparison, we exclusively employed ER training for all methods, excluding CER from our complete algorithm and omitting the CutMix training [17] used in CutMix [16]. CutMix demonstrates comparable performance to our approach at task 2, but becomes less effective as the number of classes to be collected increases.

3.3 Ablation

In Table 2, we examine how the specified 50 epochs should be divided between ER and CER. The best performance is achieved with a 48:2 ratio in both the 70+10 and 40+40 setups. Furthermore, Table 1 highlights that CER significantly improves performance in the 70+10 setup, where a larger number of classes need to be retained: the overall mAP increases from 0.190 to 0.221, compared to a smaller improvement from 0.270 to 0.271 in the 40+40 setup.

4 Conclusion

In this paper, we propose an improved replay scheme to overcome the existing constraints in the class incremental object detection task. Our approach, eBTS, effectively manages the replay buffer with the guarantee minimum process and hierarchical sampling. In addition, we use a circular training strategy to address data imbalance. Our method demonstrates better performance in reducing catastrophic forgetting on the COCO dataset compared to existing methods. The ablation study demonstrates the optimal ratios for experience replay and circular experience replay. In future work, we aim to integrate our proposed method with other strategies to deal with the forgetting problem more effectively.

References

[1] Manoj Acharya, Tyler L Hayes, and Christopher Kanan. Rodeo: Replay for online object detection. BMVC, 2020.
[2] Jihwan Bang, Heesu Kim, YoungJoon Yoo, Jung-Woo Ha, and Jonghyun Choi. Rainbow memory: Continual learning with a memory of diverse samples. In CVPR, 2021.
[3] Arslan Chaudhry, Puneet K Dokania, Thalaiyasingam Ajanthan, and Philip HS Torr. Riemannian walk for incremental learning: Understanding forgetting and intransigence. In ECCV, 2018.
[4] Yunhui Guo, Mingrui Liu, Tianbao Yang, and Tajana Rosing. Improved schemes for episodic memory-based lifelong learning. In NIPS, 2020.
[5] Chen He, Ruiping Wang, Shiguang Shan, and Xilin Chen. Exemplar-supported generative reproduction for class incremental learning. In BMVC, 2018.
[6] James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. PNAS, 2017.
[7] Hyunseo Koh, Dahyun Kim, Jung-Woo Ha, and Jonghyun Choi. Online continual learning on class incremental blurry task configuration with anytime inference. arXiv, 2021.
[8] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In ECCV, 2014.
[9] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. In ICCV, 2017.
[10] Xialei Liu, Hao Yang, Avinash Ravichandran, Rahul Bhotika, and Stefano Soatto. Multi-task incremental learning for object detection. arXiv, 2020.
[11] David Lopez-Paz and Marc’Aurelio Ranzato. Gradient episodic memory for continual learning. In NIPS, 2017.
[12] Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H Lampert. icarl: Incremental classifier and representation learning. In CVPR, 2017.
[13] Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, and Silvio Savarese. Generalized intersection over union: A metric and a loss for bounding box regression. In CVPR, 2019.
[14] Anthony Robins. Catastrophic forgetting, rehearsal and pseudorehearsal. Connection Science, 1995.
[15] David Rolnick, Arun Ahuja, Jonathan Schwarz, Timothy Lillicrap, and Gregory Wayne. Experience replay for continual learning. In NIPS, 2019.
[16] Jeng-Lun Shieh, Qazi Mazhar ul Haq, Muhamad Amirul Haq, Said Karam, Peter Chondro, De-Qin Gao, and Shanq-Jang Ruan. Continual learning strategy in one-stage object detection framework based on experience replay for autonomous driving vehicle. Sensors, 2020.
[17] Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, and Youngjoon Yoo. Cutmix: Regularization strategy to train strong classifiers with localizable features. In ICCV, 2019.
[18] Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, and Jifeng Dai. Deformable detr: Deformable transformers for end-to-end object detection. ICLR, 2020.
