
License: arXiv.org perpetual non-exclusive license
arXiv:2312.09139v1 [cs.CV] 14 Dec 2023

Class-Wise Buffer Management for Incremental Object Detection:
An Effective Buffer Training Strategy

Junsu Kim1 Sumin Hong2 Chanwoo Kim1 Jihyeon Kim1 Yihalem Yimolal Tiruneh1
Jeongwan On4 Jihyun Song3 Sunhwa Choi3 Seungryul Baek1

1UNIST, South Korea  2SeoulTech, South Korea
3LG Electronics, South Korea  4Chonnam National University, South Korea
Abstract

Class incremental learning aims to solve a problem that arises when continuously adding unseen class instances to an existing model. This approach has been extensively studied in the context of image classification; however, its applicability to object detection is not yet well established. Existing frameworks using replay methods mainly collect replay data without considering the model being trained and tend to rely on randomness or the number of labels of each sample. Also, despite the effectiveness of replay, it has not yet been optimized for the object detection task. In this paper, we introduce an effective buffer training strategy (eBTS) that creates an optimized replay buffer for object detection. Our approach incorporates guarantee minimum and hierarchical sampling to establish a buffer customized to the trained model. Furthermore, we use circular experience replay training to optimally utilize the accumulated buffer data. Experiments on the MS COCO dataset demonstrate that our eBTS achieves state-of-the-art performance compared to the existing replay schemes.

1 Introduction

Traditional machine learning models tend to forget previously learned patterns when trained on new datasets, a phenomenon called “catastrophic forgetting” [14]. This poses challenges for models operating in dynamic environments. However, unlike machines, humans can learn new concepts without entirely forgetting pre-existing knowledge. Building on this insight, incremental learning aims to address this issue by training models to assimilate new concepts progressively without retraining on the entire past dataset, effectively preserving knowledge from prior tasks while integrating new tasks.

Figure 1: The final mean average precision (mAP, %) and the number of classes that satisfy COCO’s distribution in the 40+40 setup. We use the formula $\left(\text{buffer capacity}\times\frac{\text{Number of samples in }C_i}{\text{Total number of samples in }C_{1,\ldots,n}}\right)$ to check the distribution for the previous classes.
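
As a rough illustration of the distribution check in the caption of Fig. 1, the sketch below computes the proportional quota (buffer capacity times the class share) for each class and lists the classes whose buffer count reaches it. The function names and the toy counts are hypothetical, and treating "satisfies the distribution" as "buffer count is at least the quota" is an assumption rather than a rule stated in the paper.

```python
from collections import Counter

def class_quota(buffer_capacity, class_counts):
    """Proportional quota per class: capacity * n_i / sum_j n_j."""
    total = sum(class_counts.values())
    return {c: buffer_capacity * n / total for c, n in class_counts.items()}

def classes_satisfying_quota(buffer_class_counts, quota):
    """Assumed criterion: a class satisfies the distribution if its
    count in the buffer is at least its proportional quota."""
    return [c for c, q in quota.items() if buffer_class_counts.get(c, 0) >= q]

# Hypothetical toy numbers, not taken from COCO.
dataset_counts = {"person": 600, "dog": 300, "kite": 100}
quota = class_quota(100, dataset_counts)
buffer_counts = Counter({"person": 70, "dog": 25, "kite": 10})
satisfied = classes_satisfying_quota(buffer_counts, quota)
```

With these toy numbers the quotas are 60, 30 and 10 slots, so only "dog" (25 samples in the buffer) falls short.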

Most incremental methods [12, 4, 6, 15, 11, 7, 2] handle the image classification task. These methods can also be applied to object detection; however, because each scene contains a varying number of foreground object labels, they are relatively ineffective for detection. Nevertheless, incremental object detection can play an important role in real-world applications, since it allows a system to adapt to environments where new object labels are constantly appearing. For example, when a new product is discovered, the detection system should recognize it while still detecting previous labels. Instead of completely retraining the model every time new labels appear, it helps to update the model incrementally to accommodate the unseen labels. This greatly improves flexibility and persistence in real-world applications and saves computing resources. We call this task class incremental object detection (CIOD).

One of the most commonly used methods in CIOD is experience replay (ER) [3, 16, 1, 10, 5, 15]. Random-based ER [3, 16, 4, 15] mitigates the complexity of multiple labels by simply sampling at random from the previous data and building a buffer [15] for integration with the new data. RODEO [1] and Hard [10] proposed replay schemes designed for CIOD, but it is still unclear whether they are the best strategies for preventing forgetting. Given this lack of clarity, we consider that there is room for enhancing these learning strategies, specifically the replay strategy.

In this paper, we propose an effective class-wise buffer training strategy, eBTS. Our method consists of two buffer configuration components and a simple but efficient training approach. First, guarantee minimum ensures the inclusion of a minimum quantity of samples for each class, reflecting the class distribution of the prior dataset in the buffer. Second, hierarchical sampling prioritizes samples with a high number of unique labels and low loss when the buffer becomes full. This helps to retain more diverse labels and to tailor the buffer data to the trained model. In terms of the training approach, we propose circular experience replay (CER), which deals with the asymmetry between the current and prior tasks’ data. It combines original ER training [15, 1, 10] and CER training, which are designed to avoid overfitting and to enhance prior knowledge. In Fig. 1, our method demonstrates the ability to accurately reflect the prior distribution in the buffer, as well as excellent performance. Our contributions can be summarized as follows:

  1) We introduce a buffer management strategy that is easily compatible with CIOD. The buffer manager operates the buffer based on two criteria, a high number of unique labels and low loss, rather than any single measure (e.g. many labels, randomness, etc.). We experimentally verified that this is a viable measure that reflects the tendency of the trained model.

  2) We propose an effective buffer training scheme, i.e. circular training, to overcome the imbalance caused by the limited capacity of the replay buffer and to enhance detection performance on previous classes.

Figure 2: The overall process. During the buffer configuration process, when the buffer is full, we perform the GM process to ensure coverage of all classes. Next, we select a candidate from the buffer to compare with new data based on two conditions. When training new data, we use the CER training strategy following ER training strategy within a specific epoch.

2 Methods

2.1 Overview

Our goal is to continually expand our knowledge by incorporating new labels while retaining previous knowledge in class incremental object detection (CIOD). The setting of CIOD consists of multiple tasks, each with a predefined number of object classes, denoted as $T_t = T_1, \ldots, T_N$, where $N$ is the total number of tasks. Each task has its own corresponding dataset $\mathcal{D}_t$, which includes a set of input images $\mathbf{X}_t$ and corresponding labels $\mathbf{Y}_t$:

$$\mathcal{D}_t \sim (\{\mathbf{X}_t : x_1, \ldots, x_{n_t} \in T_t\}, \{\mathbf{Y}_t : y_1, \ldots, y_{n_t} \in T_t\}) \tag{1}$$

where $n_t$ is the number of data samples contained in $T_t$, and $t$ is the task index. We also use a buffer $\mathcal{B}$, a memory used for replay that stores sample data (i.e. $x$, $y$). $\mathcal{B}$ consists of data structured as follows:

$$\{I_i : (L_i, U_i)\} \tag{2}$$

where $I_i$ denotes the $i$-th image, $L_i$ is the associated sample loss, and $U_i$ signifies the list of unique classes in it.
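
To make the buffer format of Eq. 2 concrete, here is a minimal sketch that stores the buffer as a mapping from an image identifier to a (loss, unique labels) record. The names `BufferEntry`, `Buffer` and `make_entry`, as well as the toy image ids and loss values, are illustrative assumptions and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable

@dataclass(frozen=True)
class BufferEntry:
    loss: float                     # L_i: loss of the sample under the current model
    unique_labels: FrozenSet[int]   # U_i: distinct class indices present in the image

# The buffer B maps an image id I_i to its (L_i, U_i) record.
Buffer = Dict[int, BufferEntry]

def make_entry(loss: float, labels: Iterable[int]) -> BufferEntry:
    """Collapse the per-object labels of an image into its unique label set."""
    return BufferEntry(float(loss), frozenset(labels))

# Hypothetical toy entries: image 17 contains two objects of class 1 and one of class 3.
buffer: Buffer = {}
buffer[17] = make_entry(0.82, [1, 1, 3])
buffer[42] = make_entry(0.31, [3, 5, 5, 7])
```

Storing the unique labels as a set makes both quantities used later cheap to obtain: the number of unique labels is the size of the set, and class coverage follows from counting set membership across entries.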

Our effective buffer training strategy, eBTS, has three main components: 1) a guarantee minimum process to construct a representative image buffer (Sec. 2.2), 2) hierarchical sampling for effective buffer configuration (Sec. 2.3), and 3) circular training for utilization of the buffer (Sec. 2.4). The overall flow is shown in Fig. 2. We describe each component in more detail in the following subsections.

Input: $K$, $m$, $\mathcal{D}_t$, $\mathcal{B}_{1:t-1}$, $\mathcal{M}_{1:t}$
define: $\mathcal{B} \equiv \{I : (L, U)\}$ // buffer data format
define: $\mathcal{D}_e \equiv \{x_1, \ldots, x_{N_e}\}$ // extra dataset format
$\mathcal{D}_e = \mathcal{B}_{1:t-1} \cup \mathcal{D}_t$ if $t > 1$ else $\mathcal{D}_t$ // extra dataset
for $d = 1, \ldots, N_e$ do
    $I_d, U_d \leftarrow$ get_info($d$) // id, unique labels of $d$
    $L_d \leftarrow \mathcal{M}_{1:t}(d)$ // loss value of $d$
    if $|\mathcal{B}| < K$ then
        $\mathcal{B} \leftarrow (I_d, L_d, U_d)$
    else
        // pick all label sets (e.g. $U_1, \ldots, U_K$)
        $U_B \leftarrow$ get_all_unique_labels_set($\mathcal{B}$)
        // pick the labels in $U_d$ that appear fewer than $m$ times
        $\mathcal{U} = \{u \in U_d \mid \text{count}(u, U_B) < m\}$ // Eq. 4
        if $\mathcal{U} = \emptyset$ then
            $\mathcal{R} \leftarrow$ get_samples($\mathcal{B}$)
        else
            $\mathcal{R} \leftarrow$ get_samples_excluding_labels($\mathcal{B}$, $\mathcal{U}$)
        end if
        $\mathcal{B} \leftarrow$ BufferManager($\mathcal{B}$, $\mathcal{R}$, $\mathcal{U}$, $(I_d, L_d, U_d)$)
    end if
end for
Output: $\mathcal{B}$
Algorithm 1 Guarantee Minimum process

2.2 Guarantee minimum process

Data imbalance is a common problem in object detection tasks. Specifically, when creating a replay buffer, classes that are already under-represented in the data distribution may become scarcer, which can degrade detection performance. Therefore, it is important to ensure that each class within the replay buffer has a minimum number of samples. To address this issue, we propose the guarantee minimum (GM) method. This method maintains class-wise diversity in $\mathcal{B}$ and preserves the original data distribution $\mathcal{D}_{1:t-1}$ by ensuring a minimum of $m$ samples for every class.

Data structure. We generate an extra dataset $\mathcal{D}_e$ by combining $\mathcal{D}_t$ and $\mathcal{B}_{1:t-1}$, and represent each input sample $d$ in the format of Eq. 2. Given $I_d$ and $U_d$, we calculate $L_d$ with the pre-trained model $\mathcal{M}_{1:t}$. Since we employ a transformer-based detector, we construct the loss function as follows:

$$L_d = L_{\text{Bbox}} + L_{\text{GIoU}} + L_{\text{Label}} \tag{3}$$

where $L_{\text{Bbox}}$ and $L_{\text{GIoU}}$ represent the L1 loss and the generalized IoU loss [13] for bounding boxes. Additionally, $L_{\text{Label}}$ [9] is cross entropy with focal loss for labels. If the buffer $\mathcal{B}$ has not reached its maximum capacity $K$, the new data is added directly. However, once it reaches full capacity, a replacement strategy becomes necessary.

Guarantee process. To replace buffer samples while maintaining class-wise diversity, we first identify the sets of unique labels $U_B \sim \{U_1, \ldots, U_K\}$ in $\mathcal{B}$. After that, we introduce the set $\mathcal{U}$ containing under-represented labels (i.e. class indexes) that fall below a certain bound $m$:

$$\mathcal{U} = \{u \in U_d \mid \text{count}(u, U_B) < m\} \tag{4}$$

where $u$ is an element of the unique labels $U_d$ of the input data, and $m$ represents the minimum guarantee value. Then, we select the replacement candidate set $\mathcal{R}$, which contains the samples in $\mathcal{B}$ that have no labels from $\mathcal{U}$. If $\mathcal{U}$ is empty (i.e. every class in $U_d$ already appears at least $m$ times), we choose all samples in $\mathcal{B}$ as replacement candidates $\mathcal{R}$. This approach ensures that our buffer reflects the overall class distribution, while also covering the rare labels more effectively. Finally, we use the buffer manager, which employs hierarchical sampling (Sec. 2.3), to compare $\mathcal{R}$ with the new sample. We summarize our GM algorithm in Alg. 1.
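
The candidate selection described above (Eq. 4 followed by the choice of $\mathcal{R}$) can be sketched as follows. This is an illustrative reading rather than the authors' implementation: the buffer is assumed to be a plain dictionary mapping an image id to a (loss, set of unique labels) pair, the function names are hypothetical, and count$(u, U_B)$ is interpreted as the number of buffer samples whose label set contains $u$.

```python
from collections import Counter

def under_represented(new_labels, buffer, m):
    """Eq. 4: labels of the incoming sample that occur in fewer than m buffer samples."""
    counts = Counter(u for _, labels in buffer.values() for u in labels)
    return {u for u in new_labels if counts[u] < m}

def replacement_candidates(buffer, under):
    """Samples that may be evicted: all of them if `under` is empty,
    otherwise only those holding none of the under-represented labels."""
    if not under:
        return dict(buffer)
    return {i: (loss, labels) for i, (loss, labels) in buffer.items()
            if not (labels & under)}

# Hypothetical toy buffer: image id -> (loss, unique labels).
toy_buffer = {
    1: (0.9, {0, 1}),
    2: (0.4, {0}),
    3: (0.7, {2}),
    4: (0.2, {0, 2}),
}
under = under_represented({1, 2}, toy_buffer, m=2)
candidates = replacement_candidates(toy_buffer, under)
```

In this toy case label 1 occurs in only one buffer sample, so it is under-represented for $m = 2$, and sample 1 (the only one carrying label 1) is protected from eviction.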

Input: $\mathcal{B}$, $\mathcal{R}$, $\mathcal{U}$, $I_d, L_d, U_d$ // inputs from Alg. 1
define: $\mathcal{B}, \mathcal{R} \equiv \{I : (L, U)\}$ // buffer & candidates format
$\mathcal{R}_{\text{min\_U}} \leftarrow$ min_U($\mathcal{R}$) // cond. 1: number of unique labels
$I_{\text{opt}}, L_{\text{opt}}, U_{\text{opt}} \leftarrow$ highest_L($\mathcal{R}_{\text{min\_U}}$) // cond. 2: loss
if $\mathcal{U} = \emptyset$ then
    if $L_{\text{opt}} > L_d$ then
        del $\mathcal{B}[I_{\text{opt}}]$ // delete data in buffer
        $\mathcal{B} \leftarrow (I_d, L_d, U_d)$ // insert new data to buffer
    else
        no change
    end if
else
    del $\mathcal{B}[I_{\text{opt}}]$ // delete data in buffer
    $\mathcal{B} \leftarrow (I_d, L_d, U_d)$ // insert new data to buffer
end if
Output: $\mathcal{B}$
Algorithm 2 BufferManager

2.3 Hierarchical sampling strategy

In this section, we introduce hierarchical sampling to create a buffer containing representative samples of the prior knowledge through two strategies: a high number of unique labels and low loss. The high number of unique labels strategy [1] is used to diversify the buffer configuration, preserving previously learned labels within a limited capacity. However, when the buffer needs replacement, samples with an equally low number of unique labels are replaced at random, without specific conditions. Therefore, we add a low-loss criterion for a more refined configuration. In general, a low loss value indicates that the prediction is similar to the actual sample and that the model has been well trained on that particular sample. Thus, among samples with the same number of unique labels, we prioritize data by using the loss. We utilize hierarchical sampling (summarized in Alg. 2) to compare the replacement candidates $\mathcal{R}$ with the input sample. We allocate an additional epoch to process all configuration procedures.
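
The two-level comparison summarized in Alg. 2 can be sketched as below. This is a hedged illustration, not the authors' code: the buffer and candidates are assumed to be dictionaries mapping an image id to a (loss, set of unique labels) pair, the function name `buffer_manager` and the toy values are hypothetical, and returning the buffer unchanged when there are no candidates is an assumption that the paper does not specify.

```python
def buffer_manager(buffer, candidates, under, new_id, new_loss, new_labels):
    """Pick the eviction target hierarchically, then decide whether to swap.

    Condition 1: fewest unique labels among the candidates.
    Condition 2: highest loss among those tied on condition 1.
    """
    if not candidates:
        return dict(buffer)  # assumption: nothing can be evicted, keep the buffer
    fewest = min(len(labels) for _, labels in candidates.values())
    tied = {i: v for i, v in candidates.items() if len(v[1]) == fewest}
    victim = max(tied, key=lambda i: tied[i][0])
    victim_loss = tied[victim][0]

    updated = dict(buffer)
    # With under-represented labels present, the new sample always replaces the
    # victim; otherwise it must have a lower loss than the victim.
    if under or victim_loss > new_loss:
        del updated[victim]
        updated[new_id] = (new_loss, set(new_labels))
    return updated

# Hypothetical toy buffer: image id -> (loss, unique labels).
toy = {1: (0.9, {0, 1}), 2: (0.4, {0}), 3: (0.7, {2}), 4: (0.2, {0, 2})}
swapped = buffer_manager(toy, toy, set(), 5, 0.5, {0, 2})
kept = buffer_manager(toy, toy, set(), 5, 0.8, {0, 2})
subset = {i: toy[i] for i in (2, 3, 4)}
forced = buffer_manager(toy, subset, {1}, 5, 0.95, {1, 2})
```

In the toy example, samples 2 and 3 tie on one unique label and sample 3 has the higher loss, so it is the eviction target. It is replaced when the new loss is 0.5, kept when the new loss is 0.8, and replaced regardless of loss when the new sample carries an under-represented label.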

Table 1: Incremental results for the COCO validation set using Deformable DETR in various scenarios. $T_1$ (40 or 70) represents the previous classes, and $T_{(1+2)}$ (80) denotes testing for all classes. The best result is highlighted in bold.
Scenarios  Method        $T_1$ (Old)                                                       $T_{(1+2)}$ (Overall)
                         mAP.5:.95  mAP.5      mAP.75     mAP_S      mAP_M      mAP_L      mAP.5:.95  mAP.5      mAP.75     mAP_S      mAP_M      mAP_L
70 + 10    CutMix [16]   0.087      0.207      0.065      0.028      0.098      0.141      0.086      0.206      0.063      0.034      0.097      0.135
           RODEO [1]     0.064      0.109      0.066      0.042      0.097      0.091      0.094      0.151      0.100      0.056      0.127      0.137
           Hard [10]     0.068      0.124      0.067      0.059      0.104      0.075      0.095      0.161      0.098      0.074      0.128      0.120
           Ours w/o CER  0.179      0.288      0.192      0.089      0.209      0.238      0.190      0.304      0.203      0.097      0.218      0.261
           Ours          0.213      0.334      0.231      0.104      0.237      0.295      0.221      0.345      0.240      0.114      0.246      0.308
40 + 40    CutMix [16]   0.131      0.286      0.104      0.058      0.150      0.201      0.135      0.295      0.106      0.051      0.148      0.212
           RODEO [1]     0.095      0.153      0.099      0.073      0.113      0.103      0.233      0.343      0.252      0.130      0.256      0.311
           Hard [10]     0.072      0.131      0.072      0.070      0.107      0.059      0.220      0.332      0.239      0.121      0.250      0.285
           Ours w/o CER  0.168      0.271      0.176      0.099      0.199      0.194      0.270      0.405      0.293      0.144      0.297      0.367
           Ours          0.222      0.356      0.234      0.125      0.255      0.296      0.271      0.419      0.294      0.136      0.296      0.376
Figure 3: Multi-phase result.

2.4 Circular experience replay training

Previous CIOD replay methods [1, 10] used the experience replay (ER) training method, which utilized a large buffer capacity to prevent forgetting a relatively small number of buffer data. However, this approach results in significant resource wastage. To address this issue, we propose the circular experience replay (CER) training strategy to make full use of a buffer with limited capacity. First, we separate $\mathcal{D}_{t+1}$ and $\mathcal{B}_{1:t}$ to create distinct training datasets. We then train the model by randomly selecting data from both datasets. $\mathcal{B}_{1:t}$ is repeatedly utilized with uniform probability until all of $\mathcal{D}_{t+1}$ has been used. To enhance the utilization of previous information, we apply CER training following ER training.
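
One possible reading of the circular reuse of the buffer is sketched below: the new-task data is traversed once per epoch, while the much smaller buffer is reshuffled and restarted whenever it runs out, so every buffer sample is used about equally often. The function name `circular_replay_pairs`, the pairing of one buffer sample with each new sample, and the reshuffle-on-exhaustion rule are all assumptions made for illustration; the paper does not spell out these details.

```python
import random
from collections import Counter

def circular_replay_pairs(new_data, buffer_data, seed=0):
    """Yield (new_sample, buffer_sample) pairs for one epoch.

    New data is visited exactly once in random order; the buffer is cycled
    through repeatedly until the new data is exhausted.
    """
    if not buffer_data:
        raise ValueError("buffer_data must not be empty")
    rng = random.Random(seed)
    new_order = list(new_data)
    rng.shuffle(new_order)
    pool = []
    for x in new_order:
        if not pool:                  # buffer pass finished: start another pass
            pool = list(buffer_data)
            rng.shuffle(pool)
        yield x, pool.pop()

# Hypothetical toy data: 10 new samples and a buffer of 3 samples.
pairs = list(circular_replay_pairs(range(10), ["a", "b", "c"]))
usage = Counter(b for _, b in pairs)
```

Because the buffer is consumed in full passes, the usage counts of any two buffer samples differ by at most one within an epoch, which is one simple way to realize the uniform reuse described above.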

3 Experiments

3.1 Implementation and experiments

eBTS is based on Deformable DETR [18] trained from scratch on COCO [8] for 50 epochs in each task. All experiments are performed using 4 RTX 3090 GPUs with a batch size of 3. In the first setting, two-phase, we incrementally train on classes divided as 40+40 and 70+10. We evaluate the model on $T_{1+2}$ and $T_1$ to assess the degree of forgetting. In the second setting, multi-phase, we train on classes divided as 40+20+20. Then, we test the model by combining the added classes. To ensure a fair comparison, we extracted only the replay components from various CIOD methods [16, 1, 10] that use replay and trained them using our baseline. We kept all conditions identical, except for the buffer composition (random [16], high number of unique labels [1], many labels [10]) and the training method (original ER [1, 10], CutMix [17] based CutMix ER [16]). In all our experiments, we set the buffer capacity to around 1% (1200) of COCO, and the minimum guarantee $m$ to 1% (12) of the buffer capacity.

Table 2: Comparison of the appropriate proportions of CER used with ER on COCO. The best result is highlighted in bold.
Phase         70+10                     40+40
ER-CER ratio  $T_1$        $T_{(1+2)}$  $T_1$        $T_{(1+2)}$
ER    CER     AP           AP           AP           AP
40    10      0.168        0.183        0.172        0.253
42    8       0.169        0.185        0.192        0.262
44    6       0.188        0.199        0.210        0.271
46    4       0.194        0.208        0.192        0.260
48    2       0.213        0.221        0.222        0.271

3.2 Experimental results

We analyze the two-phase results using the mAP metric on the COCO dataset [8]. In Table 1, we quantitatively show that our approach, eBTS, achieves state-of-the-art results on both $T_1$ and $T_{1+2}$. Furthermore, our method (“Ours w/o CER”) performs well in comparison to previous methods even without the circular training strategy. This indicates the effectiveness of our buffer configuration algorithm, which includes guarantee minimum processing and a hierarchical sampling strategy, in retaining previous knowledge $T_1$. As shown in Fig. 3, our method (“Ours w/o CER”) also demonstrates good performance after training the last task in the multi-phase setting. To ensure a fair comparison, we exclusively employed ER training for all methods, excluding CER from our complete algorithm and omitting the CutMix training [17] used in CutMix [16]. CutMix demonstrates performance comparable to our approach at task 2, but becomes less effective as the number of classes to be collected increases.

3.3 Ablation

In Table 2, we examine how the 50 training epochs should be split between ER and CER. The best performance is achieved with a 48:2 ratio in both the 70+10 and 40+40 setups. Furthermore, Table 1 highlights that CER significantly improves performance in the 70+10 setup, where a larger number of classes need to be retained. The overall mAP increases from 0.190 to 0.221, compared to a smaller improvement from 0.270 to 0.271 in the 40+40 setup.
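
As a small illustration of what an ER:CER ratio means within the 50-epoch budget, the sketch below builds a per-epoch schedule in which ER epochs come first and CER epochs follow, as described in Sec. 2.4. The function name `epoch_modes` is hypothetical, and placing the CER epochs as one contiguous block at the end is an assumption.

```python
def epoch_modes(er_epochs, cer_epochs):
    """Return the training mode of each epoch: ER first, then CER."""
    return ["ER"] * er_epochs + ["CER"] * cer_epochs

# The 48:2 split reported as best in Table 2.
schedule = epoch_modes(48, 2)
```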

4 Conclusion

In this paper, we propose an improved replay scheme to overcome the existing constraints in the class incremental object detection task. Our approach, eBTS, effectively manages the replay buffer with the guarantee minimum process and hierarchical sampling. In addition, we use a circular training strategy to address data imbalance. Our method demonstrates better performance in reducing catastrophic forgetting on the COCO dataset compared to existing methods. The ablation study demonstrates the optimal ratios for experience replay and circular experience replay. In future work, we aim to integrate our proposed method with other strategies to deal with the forgetting problem more effectively.

References

  [1] Manoj Acharya, Tyler L Hayes, and Christopher Kanan. Rodeo: Replay for online object detection. In BMVC, 2020.
  [2] Jihwan Bang, Heesu Kim, YoungJoon Yoo, Jung-Woo Ha, and Jonghyun Choi. Rainbow memory: Continual learning with a memory of diverse samples. In CVPR, 2021.
  [3] Arslan Chaudhry, Puneet K Dokania, Thalaiyasingam Ajanthan, and Philip HS Torr. Riemannian walk for incremental learning: Understanding forgetting and intransigence. In ECCV, 2018.
  [4] Yunhui Guo, Mingrui Liu, Tianbao Yang, and Tajana Rosing. Improved schemes for episodic memory-based lifelong learning. In NIPS, 2020.
  [5] Chen He, Ruiping Wang, Shiguang Shan, and Xilin Chen. Exemplar-supported generative reproduction for class incremental learning. In BMVC, 2018.
  [6] James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. PNAS, 2017.
  [7] Hyunseo Koh, Dahyun Kim, Jung-Woo Ha, and Jonghyun Choi. Online continual learning on class incremental blurry task configuration with anytime inference. arXiv, 2021.
  [8] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In ECCV, 2014.
  [9] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. In ICCV, 2017.
  [10] Xialei Liu, Hao Yang, Avinash Ravichandran, Rahul Bhotika, and Stefano Soatto. Multi-task incremental learning for object detection. arXiv, 2020.
  [11] David Lopez-Paz and Marc’Aurelio Ranzato. Gradient episodic memory for continual learning. In NIPS, 2017.
  [12] Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H Lampert. icarl: Incremental classifier and representation learning. In CVPR, 2017.
  [13] Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, and Silvio Savarese. Generalized intersection over union: A metric and a loss for bounding box regression. In CVPR, 2019.
  [14] Anthony Robins. Catastrophic forgetting, rehearsal and pseudorehearsal. Connection Science, 1995.
  [15] David Rolnick, Arun Ahuja, Jonathan Schwarz, Timothy Lillicrap, and Gregory Wayne. Experience replay for continual learning. In NIPS, 2019.
  [16] Jeng-Lun Shieh, Qazi Mazhar ul Haq, Muhamad Amirul Haq, Said Karam, Peter Chondro, De-Qin Gao, and Shanq-Jang Ruan. Continual learning strategy in one-stage object detection framework based on experience replay for autonomous driving vehicle. Sensors, 2020.
  [17] Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, and Youngjoon Yoo. Cutmix: Regularization strategy to train strong classifiers with localizable features. In ICCV, 2019.
  [18] Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, and Jifeng Dai. Deformable detr: Deformable transformers for end-to-end object detection. In ICLR, 2020.