-
NIFF: Alleviating Forgetting in Generalized Few-Shot Object Detection via Neural Instance Feature Forging
Authors:
Karim Guirguis,
Johannes Meier,
George Eskandar,
Matthias Kayser,
Bin Yang,
Juergen Beyerer
Abstract:
Privacy and memory are two recurring themes in a broad conversation about the societal impact of AI. These concerns arise from the need for huge amounts of data to train deep neural networks. A promise of Generalized Few-shot Object Detection (G-FSOD), a learning paradigm in AI, is to alleviate the need for collecting abundant training samples of novel classes we wish to detect by leveraging prior knowledge from old classes (i.e., base classes). G-FSOD strives to learn these novel classes while alleviating catastrophic forgetting of the base classes. However, existing approaches assume that the base images are accessible, an assumption that does not hold when sharing and storing data is problematic. In this work, we propose the first data-free knowledge distillation (DFKD) approach for G-FSOD that leverages the statistics of the region of interest (RoI) features from the base model to forge instance-level features without accessing the base images. Our contribution is three-fold: (1) we design a standalone lightweight generator with (2) class-wise heads (3) to generate and replay diverse instance-level base features to the RoI head while finetuning on the novel data. This stands in contrast to standard DFKD approaches in image classification, which invert the entire network to generate base images. Moreover, we make careful design choices in the novel finetuning pipeline to regularize the model. We show that our approach can dramatically reduce the base memory requirements, all while setting a new standard for G-FSOD on the challenging MS-COCO and PASCAL-VOC benchmarks.
Submitted 8 March, 2023;
originally announced March 2023.
-
Towards Discriminative and Transferable One-Stage Few-Shot Object Detectors
Authors:
Karim Guirguis,
Mohamed Abdelsamad,
George Eskandar,
Ahmed Hendawy,
Matthias Kayser,
Bin Yang,
Juergen Beyerer
Abstract:
Recent object detection models require large amounts of annotated data to train on new classes of objects. Few-shot object detection (FSOD) aims to address this problem by learning novel classes given only a few samples. While competitive results have been achieved using two-stage FSOD detectors, one-stage FSODs typically underperform compared to them. We make the observation that the large gap in performance between two-stage and one-stage FSODs is mainly due to the weak discriminability of the latter, which is explained by a small post-fusion receptive field and a small number of foreground samples in the loss function. To address these limitations, we propose the Few-shot RetinaNet (FSRN), which consists of a multi-way support training strategy to augment the number of foreground samples for dense meta-detectors, an early multi-level feature fusion providing a wide receptive field that covers the whole anchor area, and two augmentation techniques on query and source images to enhance transferability. Extensive experiments show that the proposed approach addresses the limitations and boosts both discriminability and transferability. FSRN is almost two times faster than two-stage FSODs while remaining competitive in accuracy, and it outperforms state-of-the-art one-stage meta-detectors as well as some two-stage FSODs on the MS-COCO and PASCAL VOC benchmarks.
Submitted 11 October, 2022;
originally announced October 2022.
-
Explaining Chest X-ray Pathologies in Natural Language
Authors:
Maxime Kayser,
Cornelius Emde,
Oana-Maria Camburu,
Guy Parsons,
Bartlomiej Papiez,
Thomas Lukasiewicz
Abstract:
Most deep learning algorithms lack explanations for their predictions, which limits their deployment in clinical practice. Approaches to improve explainability, especially in medical imaging, have often been shown to convey limited information, be overly reassuring, or lack robustness. In this work, we introduce the task of generating natural language explanations (NLEs) to justify predictions made on medical images. NLEs are human-friendly and comprehensive, and enable the training of intrinsically explainable models. To this end, we introduce MIMIC-NLE, the first large-scale medical imaging dataset with NLEs. It contains over 38,000 NLEs, which explain the presence of various thoracic pathologies and chest X-ray findings. We propose a general approach to solve the task and evaluate several architectures on this dataset, including via clinician assessment.
Submitted 9 July, 2022;
originally announced July 2022.
-
CFA: Constraint-based Finetuning Approach for Generalized Few-Shot Object Detection
Authors:
Karim Guirguis,
Ahmed Hendawy,
George Eskandar,
Mohamed Abdelsamad,
Matthias Kayser,
Juergen Beyerer
Abstract:
Few-shot object detection (FSOD) seeks to detect novel categories with limited data by leveraging prior knowledge from abundant base data. Generalized few-shot object detection (G-FSOD) aims to tackle FSOD without forgetting previously seen base classes and, thus, accounts for a more realistic scenario, where both classes are encountered during test time. While current FSOD methods suffer from catastrophic forgetting, G-FSOD addresses this limitation yet exhibits a performance drop on novel tasks compared to the state-of-the-art FSOD. In this work, we propose a constraint-based finetuning approach (CFA) to alleviate catastrophic forgetting, while achieving competitive results on the novel task without increasing the model capacity. CFA adapts a continual learning method, namely Average Gradient Episodic Memory (A-GEM) to G-FSOD. Specifically, more constraints on the gradient search strategy are imposed from which a new gradient update rule is derived, allowing for better knowledge exchange between base and novel classes. To evaluate our method, we conduct extensive experiments on MS-COCO and PASCAL-VOC datasets. Our method outperforms current FSOD and G-FSOD approaches on the novel task with minor degeneration on the base task. Moreover, CFA is orthogonal to FSOD approaches and operates as a plug-and-play module without increasing the model capacity or inference time.
Submitted 11 April, 2022;
originally announced April 2022.
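
The CFA abstract above builds on Average Gradient Episodic Memory (A-GEM). As a rough illustration only, the following sketch shows the standard A-GEM projection step, not the modified update rule that CFA derives. The function name `agem_project` and the plain-list gradient representation are assumptions made for this example; here `grad` stands for a gradient computed on novel data and `ref_grad` for a reference gradient computed on stored base data.

```python
def agem_project(grad, ref_grad):
    """Standard A-GEM projection (illustrative; not the CFA update rule).

    grad:     flattened gradient on the new (novel-class) batch.
    ref_grad: flattened reference gradient on a batch from memory (base classes).
    """
    # Inner product between the new-task gradient and the reference gradient.
    dot = sum(g * r for g, r in zip(grad, ref_grad))

    # Non-negative inner product: the step does not conflict with the
    # reference direction, so the gradient is used unchanged.
    if dot >= 0:
        return list(grad)

    # Otherwise remove the component of grad that points against ref_grad.
    # dot < 0 implies ref_grad is non-zero, so ref_sq > 0 here.
    ref_sq = sum(r * r for r in ref_grad)
    scale = dot / ref_sq
    return [g - scale * r for g, r in zip(grad, ref_grad)]


# Hypothetical toy gradients: the second component conflicts with the reference.
projected = agem_project([1.0, -2.0], [0.0, 1.0])
```

After the projection, the inner product between `projected` and the reference gradient is zero, so to first order the update no longer increases the loss on the memory batch.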
-
Few-Shot Object Detection in Unseen Domains
Authors:
Karim Guirguis,
George Eskandar,
Matthias Kayser,
Bin Yang,
Juergen Beyerer
Abstract:
Few-shot object detection (FSOD) has thrived in recent years, learning novel object classes with limited data by transferring knowledge gained on abundant base classes. FSOD approaches commonly assume that both the scarcely provided examples of novel classes and test-time data belong to the same domain. However, this assumption does not hold in various industrial and robotics applications, where a model can learn novel classes from a source domain while inferring on classes from a target domain. In this work, we address the task of zero-shot domain adaptation, also known as domain generalization, for FSOD. Specifically, we assume that neither images nor labels of the novel classes in the target domain are available during training. Our approach for solving the domain gap is two-fold. First, we leverage a meta-training paradigm, where we learn the domain shift on the base classes, then transfer the domain knowledge to the novel classes. Second, we propose various data augmentation techniques on the few shots of novel classes to account for all possible domain-specific information. To constrain the network to encode domain-agnostic class-specific representations only, a contrastive loss is proposed to maximize the mutual information between foreground proposals and class embeddings and to reduce the network's bias to the background information from the target domain. Our experiments on the T-LESS, PASCAL-VOC, and ExDark datasets show that the proposed approach succeeds in alleviating the domain gap considerably without utilizing labels or images of novel categories from the target domain.
Submitted 19 September, 2022; v1 submitted 11 April, 2022;
originally announced April 2022.
-
e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks
Authors:
Maxime Kayser,
Oana-Maria Camburu,
Leonard Salewski,
Cornelius Emde,
Virginie Do,
Zeynep Akata,
Thomas Lukasiewicz
Abstract:
Recently, there has been an increasing number of efforts to introduce models capable of generating natural language explanations (NLEs) for their predictions on vision-language (VL) tasks. Such models are appealing, because they can provide human-friendly and comprehensive explanations. However, there is a lack of comparison between existing methods, which is due to a lack of re-usable evaluation frameworks and a scarcity of datasets. In this work, we introduce e-ViL and e-SNLI-VE. e-ViL is a benchmark for explainable vision-language tasks that establishes a unified evaluation framework and provides the first comprehensive comparison of existing approaches that generate NLEs for VL tasks. It spans four models and three datasets and both automatic metrics and human evaluation are used to assess model-generated explanations. e-SNLI-VE is currently the largest existing VL dataset with NLEs (over 430k instances). We also propose a new model that combines UNITER, which learns joint embeddings of images and text, and GPT-2, a pre-trained language model that is well-suited for text generation. It surpasses the previous state of the art by a large margin across all datasets. Code and data are available here: https://github.com/maximek3/e-ViL.
Submitted 18 August, 2021; v1 submitted 8 May, 2021;
originally announced May 2021.
-
Compressing Transformer-Based Semantic Parsing Models using Compositional Code Embeddings
Authors:
Prafull Prakash,
Saurabh Kumar Shashidhar,
Wenlong Zhao,
Subendhu Rongali,
Haidar Khan,
Michael Kayser
Abstract:
The current state-of-the-art task-oriented semantic parsing models use BERT or RoBERTa as pretrained encoders; these models have huge memory footprints. This poses a challenge to their deployment for voice assistants such as Amazon Alexa and Google Assistant on edge devices with limited memory budgets. We propose to learn compositional code embeddings to greatly reduce the sizes of BERT-base and RoBERTa-base. We also apply the technique to DistilBERT, ALBERT-base, and ALBERT-large, three already compressed BERT variants which attain similar state-of-the-art performance on semantic parsing with much smaller model sizes. We observe embedding compression rates of 95.15% to 98.46% and encoder compression rates of 20.47% to 34.22%, while preserving more than 97.5% of semantic parsing performance. We provide the recipe for training and analyze the trade-off between code embedding sizes and downstream performance.
Submitted 10 October, 2020;
originally announced October 2020.
-
Understanding the effects of artifacts on automated polyp detection and incorporating that knowledge via learning without forgetting
Authors:
Maxime Kayser,
Roger D. Soberanis-Mukul,
Anna-Maria Zvereva,
Peter Klare,
Nassir Navab,
Shadi Albarqouni
Abstract:
Survival rates for colorectal cancer are higher when polyps are detected at an early stage and can be removed before they develop into malignant tumors. Automated polyp detection, which is dominated by deep learning based methods, seeks to improve early detection of polyps. However, current efforts rely heavily on the size and quality of the training datasets. The quality of these datasets often suffers from various image artifacts that affect the visibility and hence, the detection rate. In this work, we conducted a systematic analysis to gain a better understanding of how artifacts affect automated polyp detection. We look at how six different artifact classes, and their location in an image, affect the performance of a RetinaNet based polyp detection model. We found that, depending on the artifact class, they can either benefit or harm the polyp detector. For instance, bubbles are often misclassified as polyps, while specular reflections inside of a polyp region can improve detection capabilities. We then investigated different strategies, such as a learning without forgetting framework, to leverage artifact knowledge to improve automated polyp detection. Our results show that such models can mitigate some of the harmful effects of artifacts, but require more work to significantly improve polyp detection capabilities.
Submitted 22 August, 2020; v1 submitted 7 February, 2020;
originally announced February 2020.