Class-Wise Buffer Management for Incremental Object Detection:
An Effective Buffer Training Strategy
Abstract
Class incremental learning aims to solve a problem that arises when continuously adding unseen class instances to an existing model. This approach has been extensively studied in the context of image classification; however, its applicability to object detection is not yet well established. Existing frameworks using replay methods mainly collect replay data without considering the model being trained and tend to rely on randomness or the number of labels in each sample. Moreover, despite its effectiveness, replay has not yet been optimized for the object detection task. In this paper, we introduce an effective buffer training strategy (eBTS) that creates an optimized replay buffer for object detection. Our approach incorporates guarantee minimum and hierarchical sampling to establish a buffer customized to the trained model. Furthermore, we use circular experience replay training to make full use of the accumulated buffer data. Experiments on the MS COCO dataset demonstrate that eBTS achieves state-of-the-art performance compared to existing replay schemes.
1 Introduction
Traditional machine learning models tend to forget previously learned patterns when trained on new datasets, a phenomenon called “catastrophic forgetting” [14]. This poses challenges for models operating in dynamic environments. Humans, unlike machines, can learn new concepts without entirely forgetting pre-existing knowledge. Building on this insight, incremental learning aims to address this issue by training models to assimilate new concepts progressively without retraining on the entire past dataset, effectively preserving knowledge from prior tasks while integrating new tasks.
Most incremental methods [12, 4, 6, 15, 11, 7, 2] handle the image classification task. These methods can also be applied to object detection; however, because each scene contains a varying set of foreground object labels, they are relatively ineffective there. Nevertheless, incremental object detection can play an important role in real-world applications, since it allows a detector to adapt to environments where new object labels are constantly appearing. For example, when a new product is introduced, the detection system should recognize it while still detecting previous labels. Instead of completely retraining the model every time new labels appear, it helps to update the model incrementally to accommodate the unseen labels. This greatly improves flexibility and persistence in real-world applications and saves computing resources. We call this task class incremental object detection (CIOD).
One of the most commonly used methods in CIOD is experience replay (ER) [3, 16, 1, 10, 5, 15]. Random-based ER [3, 16, 4, 15] mitigates the complexity of multiple labels by randomly sampling from the previous data and building a buffer [15] for integration with the new data. RODEO [1] and Hard [10] proposed replay schemes designed for CIOD, but it is still unclear whether they are the best strategies for preventing forgetting. Given this lack of clarity, we consider that there is room for enhancing these learning strategies, specifically the replay strategy.
In this paper, we propose an effective class-wise buffer training strategy, eBTS. Our method consists of two buffer configuration components and a simple but efficient training approach. First, guarantee minimum ensures the inclusion of a minimum quantity of samples for each class, reflecting the class distribution of the prior dataset in the buffer. Second, hierarchical sampling prioritizes samples with a high number of unique labels and low loss when the buffer becomes full. This helps to retain more diverse labels and to tailor the buffer data to the trained model. In terms of the training approach, we propose circular experience replay (CER), which deals with the asymmetry between the current and prior tasks’ data. Our training combines original ER training [15, 1, 10] and CER training, which are designed to avoid overfitting and to enhance prior knowledge. In Fig. 1, our method demonstrates the ability to accurately reflect the prior distribution in the buffer, as well as excellent performance. Our contributions can be summarized as follows:
1) We introduce a buffer management strategy that is easily compatible with CIOD. The buffer manager operates the buffer based on two criteria, a high number of unique labels and low loss, rather than on any single measure (e.g. many labels, randomness, etc.). We experimentally verified that this is a viable measure that reflects the tendency of the trained model.
2) We propose an effective buffer training scheme, i.e. circular training, to overcome the imbalance caused by the limited capacity of the replay buffer and to enhance detection performance on previous classes.
2 Methods
2.1 Overview
Our goal is to continually expand the model's knowledge by incorporating new labels while retaining previous knowledge in class incremental object detection (CIOD). The CIOD setting consists of multiple tasks, each with a predefined number of object classes, denoted as $\{T_1, \dots, T_n\}$, where $n$ is the total number of tasks. Each task $T_t$ has its own corresponding dataset $D_t$, which includes a set of input images $x$ and corresponding labels $y$:
$$D_t = \{(x_i^t, y_i^t)\}_{i=1}^{N_t} \tag{1}$$
where $N_t$ is the number of data contained in $D_t$, and $t$ is the task index. Also, we use the buffer $\mathcal{B}$, which is a memory used for replay to store sample data (i.e. x, y). The buffer $\mathcal{B}$ consists of data structured as follows:
$$\mathcal{B} = \{(x_j, \ell_j, c_j)\}_{j=1}^{|\mathcal{B}|} \tag{2}$$
where $x_j$ denotes the $j$-th image, $\ell_j$ is the associated sample loss, and $c_j$ signifies the list of unique classes in it.
Our effective buffer training strategy, eBTS, has three main components: 1) the guarantee minimum process to construct a representative image buffer (Sec. 2.2), 2) hierarchical sampling for effective buffer configuration (Sec. 2.3), and 3) circular training to make use of the buffer (Sec. 2.4). The overall flow is shown in Fig. 2. We describe each component in more detail in the following subsections.
2.2 Guarantee minimum process
Data imbalance is a common problem in object detection tasks. Specifically, when creating a replay buffer, classes that are already under-represented in the data distribution may become even scarcer, which can degrade detection performance. Therefore, it is important to ensure that each class within the replay buffer has a minimum number of samples. To address this issue, we propose the guarantee minimum (GM) method. This method maintains class-wise diversity in the buffer and preserves the original data distribution by ensuring a minimum number of samples for every class.
Data structure. We generate an extra dataset by combining the task dataset and the buffer, and use each input sample in the form of Eq. 2. Given an image $x$ and its labels $y$, we calculate the sample loss $\ell$ with the pre-trained model. Since we employ a transformer-based detector, we construct the loss function as follows:
$$\ell = \mathcal{L}_{L1} + \mathcal{L}_{GIoU} + \mathcal{L}_{focal} \tag{3}$$
where $\mathcal{L}_{L1}$ and $\mathcal{L}_{GIoU}$ represent the L1 loss and the generalized IoU loss [13] for the bounding box. Additionally, $\mathcal{L}_{focal}$ [9] is the cross entropy with focal loss for the label. If the buffer has not reached its maximum capacity, the new data is directly added. However, once the buffer is full, a strategic approach becomes necessary for data replacement.
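The three loss terms described above can be illustrated with a minimal, pure-Python sketch for a single predicted box and its matched ground-truth box. This is not the authors' implementation: the function names, the unweighted sum, the `(x1, y1, x2, y2)` box format, and the focal-loss defaults (`alpha=0.25`, `gamma=2.0`, taken from the focal loss paper [9]) are assumptions made for illustration, and a real transformer-based detector would also perform set matching before computing these terms.

```python
import math

def l1_loss(pred_box, gt_box):
    """Sum of absolute coordinate differences between two boxes."""
    return sum(abs(p - g) for p, g in zip(pred_box, gt_box))

def giou_loss(pred_box, gt_box):
    """1 - GIoU for two non-degenerate boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred_box
    gx1, gy1, gx2, gy2 = gt_box
    area_p = (px2 - px1) * (py2 - py1)
    area_g = (gx2 - gx1) * (gy2 - gy1)
    # Intersection (zero when the boxes do not overlap).
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = area_p + area_g - inter
    iou = inter / union
    # Smallest axis-aligned box enclosing both boxes.
    enclose = (max(px2, gx2) - min(px1, gx1)) * (max(py2, gy2) - min(py1, gy1))
    giou = iou - (enclose - union) / enclose
    return 1.0 - giou

def focal_loss(p_true, alpha=0.25, gamma=2.0):
    """Focal loss for the probability assigned to the correct class."""
    p_true = min(max(p_true, 1e-12), 1.0)  # avoid log(0)
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)

def sample_loss(pred_box, gt_box, p_true):
    """Unweighted sum of the three terms (the weighting is an assumption)."""
    return l1_loss(pred_box, gt_box) + giou_loss(pred_box, gt_box) + focal_loss(p_true)
```

A perfect prediction (identical boxes, correct class with probability 1) gives a sample loss of zero, which is the sense in which a low loss marks a sample the model already handles well.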
Guarantee process. To replace buffer samples while keeping class-wise diversity, we first identify the sets of unique labels in the buffer $\mathcal{B}$. After that, we introduce the set $\mathcal{U}$ containing under-represented labels (i.e. class indexes) whose count falls below a certain bound $\epsilon$:
$$\mathcal{U} = \{\, c \mid n_c < \epsilon \,\} \tag{4}$$
where $c$ is an element of the unique labels, $n_c$ is the number of buffer samples that contain label $c$, and $\epsilon$ represents the minimum guarantee value. Then, we select the replacement candidate set, which contains the samples in $\mathcal{B}$ that have no labels from $\mathcal{U}$. If $\mathcal{U}$ is empty (i.e. all classes are above $\epsilon$), we choose all samples in $\mathcal{B}$ as replacement candidates. This approach ensures that our buffer reflects the overall class distribution, while also covering the rare labels more effectively. Finally, we use a buffer manager employing hierarchical sampling (Sec. 2.3) to compare the candidates with a new sample. We summarize our GM algorithm in Alg. 1.
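The candidate selection step of the guarantee process can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' Alg. 1: `BufferEntry`, `replacement_candidates`, and `min_guarantee` are hypothetical names, and counting a class once per buffer sample that contains it is an assumed reading of "number of samples for every class".

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class BufferEntry:
    image_id: str         # identifier of the stored image
    loss: float           # sample loss from the trained model
    classes: frozenset    # unique class indexes present in the image

def replacement_candidates(buffer, min_guarantee):
    """Return the buffer entries that may be replaced.

    Entries containing any under-represented class are protected.
    If no class is under-represented, every entry is a candidate.
    """
    # Count, for each class, how many buffer samples contain it.
    counts = Counter(c for entry in buffer for c in entry.classes)
    under_represented = {c for c, n in counts.items() if n < min_guarantee}
    if not under_represented:
        return list(buffer)
    return [e for e in buffer if not (e.classes & under_represented)]

buffer = [
    BufferEntry("a", 0.1, frozenset({1, 2})),
    BufferEntry("b", 0.2, frozenset({1})),
    BufferEntry("c", 0.3, frozenset({1, 3})),
    BufferEntry("d", 0.4, frozenset({1})),
]
candidates = replacement_candidates(buffer, min_guarantee=2)
```

In this toy buffer, classes 2 and 3 each appear in only one sample, so samples "a" and "c" are protected and only "b" and "d" remain as replacement candidates.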
2.3 Hierarchical sampling strategy
In this section, we introduce hierarchical sampling to create a buffer containing representative samples of the prior knowledge through two strategies: a high number of unique labels and low loss. The high number of unique labels strategy [1] is used to diversify the buffer configuration, preserving previously learned labels within a limited capacity. However, when the buffer needs replacement, samples with an equally low number of unique labels are replaced at random, without any further condition. Therefore, we add a low-loss criterion for a more refined configuration. In general, a low loss value indicates that the prediction is close to the ground truth and that the model has been well trained on that particular sample. Thus, we use the loss to prioritize among samples with the same number of unique labels. We utilize hierarchical sampling (summarized in Alg. 2) to compare the replacement candidates with the input sample. We allocate an additional epoch to process all configuration procedures.
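The two-level priority described above can be sketched as a lexicographic comparison: first the number of unique labels, then the loss as a tie-breaker. This is a hedged illustration, not the authors' Alg. 2. The names `priority` and `hierarchical_replace`, the dictionary layout of a sample, and the rule of evicting the single lowest-priority candidate are assumptions.

```python
def priority(entry):
    """Higher tuple means higher priority: more unique labels, then lower loss."""
    return (len(entry["classes"]), -entry["loss"])

def hierarchical_replace(buffer, candidate_idx, new_entry):
    """Replace the weakest candidate if the new sample ranks above it.

    buffer        : list of dicts with "classes" (set) and "loss" (float)
    candidate_idx : indexes into buffer that are allowed to be replaced
    Returns the replaced index, or None when the buffer is left unchanged.
    """
    if not candidate_idx:
        return None
    weakest = min(candidate_idx, key=lambda i: priority(buffer[i]))
    if priority(new_entry) > priority(buffer[weakest]):
        buffer[weakest] = new_entry
        return weakest
    return None

buffer = [
    {"classes": {1, 2}, "loss": 0.5},
    {"classes": {1}, "loss": 0.9},
    {"classes": {3}, "loss": 0.2},
]
# Same label count as the weakest entry, but lower loss: it is swapped in.
first = hierarchical_replace(buffer, [0, 1, 2], {"classes": {4}, "loss": 0.4})
# Same label count and higher loss than every tied entry: it is rejected.
second = hierarchical_replace(buffer, [0, 1, 2], {"classes": {5}, "loss": 1.5})
```

Using a tuple as the sort key keeps the hierarchy explicit: the loss only matters when the unique-label counts are equal.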
Scenarios | Method | AP (Old) | AP50 (Old) | AP75 (Old) | APS (Old) | APM (Old) | APL (Old) | AP (Overall) | AP50 (Overall) | AP75 (Overall) | APS (Overall) | APM (Overall) | APL (Overall)
---|---|---|---|---|---|---|---|---|---|---|---|---|---
70 + 10 | CutMix [16] | 0.087 | 0.207 | 0.065 | 0.028 | 0.098 | 0.141 | 0.086 | 0.206 | 0.063 | 0.034 | 0.097 | 0.135
 | RODEO [1] | 0.064 | 0.109 | 0.066 | 0.042 | 0.097 | 0.091 | 0.094 | 0.151 | 0.100 | 0.056 | 0.127 | 0.137
 | Hard [10] | 0.068 | 0.124 | 0.067 | 0.059 | 0.104 | 0.075 | 0.095 | 0.161 | 0.098 | 0.074 | 0.128 | 0.120
 | Ours w/o CER | 0.179 | 0.288 | 0.192 | 0.089 | 0.209 | 0.238 | 0.190 | 0.304 | 0.203 | 0.097 | 0.218 | 0.261
 | Ours | 0.213 | 0.334 | 0.231 | 0.104 | 0.237 | 0.295 | 0.221 | 0.345 | 0.240 | 0.114 | 0.246 | 0.308
40 + 40 | CutMix [16] | 0.131 | 0.286 | 0.104 | 0.058 | 0.150 | 0.201 | 0.135 | 0.295 | 0.106 | 0.051 | 0.148 | 0.212
 | RODEO [1] | 0.095 | 0.153 | 0.099 | 0.073 | 0.113 | 0.103 | 0.233 | 0.343 | 0.252 | 0.130 | 0.256 | 0.311
 | Hard [10] | 0.072 | 0.131 | 0.072 | 0.070 | 0.107 | 0.059 | 0.220 | 0.332 | 0.239 | 0.121 | 0.250 | 0.285
 | Ours w/o CER | 0.168 | 0.271 | 0.176 | 0.099 | 0.199 | 0.194 | 0.270 | 0.405 | 0.293 | 0.144 | 0.297 | 0.367
 | Ours | 0.222 | 0.356 | 0.234 | 0.125 | 0.255 | 0.296 | 0.271 | 0.419 | 0.294 | 0.136 | 0.296 | 0.376
2.4 Circular experience replay training
Previous CIOD replay methods [1, 10] used the experience replay (ER) training method, which utilizes a large buffer capacity to keep a relatively small amount of buffer data from being forgotten. However, this approach wastes significant resources. To address this issue, we propose the circular experience replay (CER) training strategy to make full use of a buffer with limited capacity. First, we separate the buffer and the current task data to create distinct training datasets. We then train the model by randomly selecting data from both datasets. The buffer data is reused repeatedly, with uniform probability, until all of the current task data has been used. To enhance the utilization of previous information, we apply CER training after ER training.
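One way to read the circular reuse of the buffer is sketched below: every current-task sample is visited once per epoch, and each is paired with a buffer sample drawn from a shuffled pool that is refilled whenever it runs out. This is an illustrative interpretation, not the authors' training loop. The name `circular_replay_pairs`, the one-to-one pairing, and the refill-and-reshuffle rule are assumptions.

```python
import random

def circular_replay_pairs(current_data, buffer_data, seed=0):
    """Yield (current_sample, buffer_sample) pairs for one epoch.

    Current-task data is visited exactly once in random order. Buffer data
    is cycled: the pool is reshuffled and refilled each time it is used up,
    so every buffer sample is reused about equally often.
    """
    if not buffer_data:
        raise ValueError("buffer_data must not be empty")
    rng = random.Random(seed)
    current = list(current_data)
    rng.shuffle(current)
    pool = []
    for sample in current:
        if not pool:
            pool = list(buffer_data)
            rng.shuffle(pool)
        yield sample, pool.pop()

pairs = list(circular_replay_pairs(range(10), ["a", "b", "c"]))
usage = {b: sum(1 for _, used in pairs if used == b) for b in ["a", "b", "c"]}
```

With 10 current samples and 3 buffer samples, each buffer sample is replayed 3 or 4 times in the epoch, which is the balancing effect that the circular scheme is meant to provide when the buffer is much smaller than the current dataset.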
3 Experiments
3.1 Implementation and experiments
eBTS is based on Deformable DETR [18] trained from scratch on COCO [8] for 50 epochs in each task. All experiments are performed using 4 RTX3090 GPUs with a batch size of 3. In the first setting, two-phase, we incrementally train on classes divided as 40+40 and 70+10. We evaluate the model on the old classes and on all classes to assess the degree of forgetting. In the second setting, multiple-phase, we train on classes divided as 40+20+20. Then, we test the model by combining the added classes. To ensure a fair comparison, we extracted only the replay components from various CIOD methods [16, 1, 10] that use replay and trained them using our baseline. We kept all conditions identical, except for the buffer composition (random [16], high number of unique labels [1], many labels [10]) and the training method (original ER [1, 10], CutMix [17] based CutMix ER [16]). In all our experiments, we set the buffer capacity to around 1% (1200) of COCO, and the minimum guarantee value to 1% (12) of the buffer capacity.
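The experimental settings above can be written down as a small configuration sketch. The numbers (80 COCO classes split as 70+10, 40+40, and 40+20+20, a buffer capacity of 1200, and a 1% minimum) come from the text, while the helper `split_classes` and the use of contiguous class-index ranges are assumptions, since the text does not state how classes are ordered within each split.

```python
def split_classes(num_classes, phase_sizes):
    """Split class indexes 0..num_classes-1 into consecutive phases."""
    if sum(phase_sizes) != num_classes:
        raise ValueError("phase sizes must add up to the number of classes")
    phases, start = [], 0
    for size in phase_sizes:
        phases.append(list(range(start, start + size)))
        start += size
    return phases

NUM_COCO_CLASSES = 80
SCENARIOS = {
    "70+10": split_classes(NUM_COCO_CLASSES, [70, 10]),
    "40+40": split_classes(NUM_COCO_CLASSES, [40, 40]),
    "40+20+20": split_classes(NUM_COCO_CLASSES, [40, 20, 20]),
}

BUFFER_CAPACITY = 1200                   # about 1% of COCO, as stated in the text
MIN_GUARANTEE = BUFFER_CAPACITY // 100   # 1% of the buffer capacity
```

Here `MIN_GUARANTEE` evaluates to 12, matching the value quoted in the text.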
ER epochs | CER epochs | AP Old (70+10) | AP Overall (70+10) | AP Old (40+40) | AP Overall (40+40)
---|---|---|---|---|---
40 | 10 | 0.168 | 0.183 | 0.172 | 0.253
42 | 8 | 0.169 | 0.185 | 0.192 | 0.262
44 | 6 | 0.188 | 0.199 | 0.210 | 0.271
46 | 4 | 0.194 | 0.208 | 0.192 | 0.260
48 | 2 | 0.213 | 0.221 | 0.222 | 0.271
3.2 Experimental results
We analyze the two-phase results using the mAP metric on the COCO dataset [8]. In Table 1, we quantitatively show that our approach, eBTS, achieved state-of-the-art results on both the old and overall classes. Furthermore, our method (“Ours w/o CER”) performs well in comparison to previous methods even without the circular training strategy. This indicates the effectiveness of our buffer configuration algorithm, which includes guarantee minimum processing and a hierarchical sampling strategy, in retaining previous knowledge. As shown in Fig. 3, our method (“Ours w/o CER”) also demonstrates good performance after training the last task in the multi-phase setting. To ensure a fair comparison, we exclusively employed ER training for all methods, excluding CER from our complete algorithm and omitting the CutMix training [17] used in CutMix [16]. CutMix demonstrates performance comparable to our approach at task 2, but becomes less effective as the number of classes to be collected increases.
3.3 Ablation
Table 2 shows how performance varies with the split between ER and CER epochs within the fixed budget of 50 epochs. The best performance is achieved with a 48:2 ratio in both the 70+10 and 40+40 settings. Furthermore, Table 1 highlights that CER significantly improves performance in the 70+10 setup, where a larger number of classes need to be retained. The overall mAP increases from 0.190 to 0.221 there, compared to a smaller improvement from 0.270 to 0.271 in the 40+40 setting.
4 Conclusion
In this paper, we propose an improved replay scheme to overcome the existing constraints in the class incremental object detection task. Our approach, eBTS, effectively manages the replay buffer with the guarantee minimum process and hierarchical sampling. In addition, we use a circular training strategy to address data imbalance. Our method reduces catastrophic forgetting on the COCO dataset more effectively than existing methods. The ablation study identifies the best-performing ratio between experience replay and circular experience replay among those tested. In future work, we aim to integrate our proposed method with other strategies to deal with the forgetting problem more effectively.
References
- Acharya et al. [2020] Manoj Acharya, Tyler L Hayes, and Christopher Kanan. Rodeo: Replay for online object detection. In BMVC, 2020.
- Bang et al. [2021] Jihwan Bang, Heesu Kim, YoungJoon Yoo, Jung-Woo Ha, and Jonghyun Choi. Rainbow memory: Continual learning with a memory of diverse samples. In CVPR, 2021.
- Chaudhry et al. [2018] Arslan Chaudhry, Puneet K Dokania, Thalaiyasingam Ajanthan, and Philip HS Torr. Riemannian walk for incremental learning: Understanding forgetting and intransigence. In ECCV, 2018.
- Guo et al. [2020] Yunhui Guo, Mingrui Liu, Tianbao Yang, and Tajana Rosing. Improved schemes for episodic memory-based lifelong learning. In NIPS, 2020.
- He et al. [2018] Chen He, Ruiping Wang, Shiguang Shan, and Xilin Chen. Exemplar-supported generative reproduction for class incremental learning. In BMVC, 2018.
- Kirkpatrick et al. [2017] James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. PNAS, 2017.
- Koh et al. [2021] Hyunseo Koh, Dahyun Kim, Jung-Woo Ha, and Jonghyun Choi. Online continual learning on class incremental blurry task configuration with anytime inference. arXiv, 2021.
- Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In ECCV, 2014.
- Lin et al. [2017] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. In ICCV, 2017.
- Liu et al. [2020] Xialei Liu, Hao Yang, Avinash Ravichandran, Rahul Bhotika, and Stefano Soatto. Multi-task incremental learning for object detection. arXiv, 2020.
- Lopez-Paz and Ranzato [2017] David Lopez-Paz and Marc’Aurelio Ranzato. Gradient episodic memory for continual learning. In NIPS, 2017.
- Rebuffi et al. [2017] Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H Lampert. icarl: Incremental classifier and representation learning. In CVPR, 2017.
- Rezatofighi et al. [2019] Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, and Silvio Savarese. Generalized intersection over union: A metric and a loss for bounding box regression. In CVPR, 2019.
- Robins [1995] Anthony Robins. Catastrophic forgetting, rehearsal and pseudorehearsal. Connection Science, 1995.
- Rolnick et al. [2019] David Rolnick, Arun Ahuja, Jonathan Schwarz, Timothy Lillicrap, and Gregory Wayne. Experience replay for continual learning. In NIPS, 2019.
- Shieh et al. [2020] Jeng-Lun Shieh, Qazi Mazhar ul Haq, Muhamad Amirul Haq, Said Karam, Peter Chondro, De-Qin Gao, and Shanq-Jang Ruan. Continual learning strategy in one-stage object detection framework based on experience replay for autonomous driving vehicle. Sensors, 2020.
- Yun et al. [2019] Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, and Youngjoon Yoo. Cutmix: Regularization strategy to train strong classifiers with localizable features. In ICCV, 2019.
- Zhu et al. [2020] Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, and Jifeng Dai. Deformable detr: Deformable transformers for end-to-end object detection. In ICLR, 2020.