-
It's All in the Head: Representation Knowledge Distillation through Classifier Sharing
Authors:
Emanuel Ben-Baruch,
Matan Karklinsky,
Yossi Biton,
Avi Ben-Cohen,
Hussam Lawen,
Nadav Zamir
Abstract:
Representation knowledge distillation aims at transferring rich information from one model to another. Common approaches for representation distillation mainly focus on the direct minimization of distance metrics between the models' embedding vectors. Such direct methods may be limited in transferring high-order dependencies embedded in the representation vectors, or in handling the capacity gap b…
▽ More
Representation knowledge distillation aims at transferring rich information from one model to another. Common approaches for representation distillation mainly focus on the direct minimization of distance metrics between the models' embedding vectors. Such direct methods may be limited in transferring high-order dependencies embedded in the representation vectors, or in handling the capacity gap between the teacher and student models. Moreover, in standard knowledge distillation, the teacher is trained without awareness of the student's characteristics and capacity. In this paper, we explore two mechanisms for enhancing representation distillation using classifier sharing between the teacher and student. We first investigate a simple scheme where the teacher's classifier is connected to the student backbone, acting as an additional classification head. Then, we propose a student-aware mechanism that asks to tailor the teacher model to a student with limited capacity by training the teacher with a temporary student's head. We analyze and compare these two mechanisms and show their effectiveness on various datasets and tasks, including image classification, fine-grained classification, and face verification. In particular, we achieve state-of-the-art results for face verification on the IJB-C dataset for a MobileFaceNet model: TAR@(FAR=1e-5)=93.7\%. Code is available at https://github.com/Alibaba-MIIL/HeadSharingKD.
△ Less
Submitted 5 April, 2022; v1 submitted 18 January, 2022;
originally announced January 2022.
-
ML-Decoder: Scalable and Versatile Classification Head
Authors:
Tal Ridnik,
Gilad Sharir,
Avi Ben-Cohen,
Emanuel Ben-Baruch,
Asaf Noy
Abstract:
In this paper, we introduce ML-Decoder, a new attention-based classification head. ML-Decoder predicts the existence of class labels via queries, and enables better utilization of spatial data compared to global average pooling. By redesigning the decoder architecture, and using a novel group-decoding scheme, ML-Decoder is highly efficient, and can scale well to thousands of classes. Compared to u…
▽ More
In this paper, we introduce ML-Decoder, a new attention-based classification head. ML-Decoder predicts the existence of class labels via queries, and enables better utilization of spatial data compared to global average pooling. By redesigning the decoder architecture, and using a novel group-decoding scheme, ML-Decoder is highly efficient, and can scale well to thousands of classes. Compared to using a larger backbone, ML-Decoder consistently provides a better speed-accuracy trade-off. ML-Decoder is also versatile - it can be used as a drop-in replacement for various classification heads, and generalize to unseen classes when operated with word queries. Novel query augmentations further improve its generalization ability. Using ML-Decoder, we achieve state-of-the-art results on several classification tasks: on MS-COCO multi-label, we reach 91.4% mAP; on NUS-WIDE zero-shot, we reach 31.1% ZSL mAP; and on ImageNet single-label, we reach with vanilla ResNet50 backbone a new top score of 80.7%, without extra data or distillation. Public code is available at: https://github.com/Alibaba-MIIL/ML_Decoder
△ Less
Submitted 31 December, 2021; v1 submitted 25 November, 2021;
originally announced November 2021.
-
Multi-label Classification with Partial Annotations using Class-aware Selective Loss
Authors:
Emanuel Ben-Baruch,
Tal Ridnik,
Itamar Friedman,
Avi Ben-Cohen,
Nadav Zamir,
Asaf Noy,
Lihi Zelnik-Manor
Abstract:
Large-scale multi-label classification datasets are commonly, and perhaps inevitably, partially annotated. That is, only a small subset of labels are annotated per sample. Different methods for handling the missing labels induce different properties on the model and impact its accuracy. In this work, we analyze the partial labeling problem, then propose a solution based on two key ideas. First, un…
▽ More
Large-scale multi-label classification datasets are commonly, and perhaps inevitably, partially annotated. That is, only a small subset of labels are annotated per sample. Different methods for handling the missing labels induce different properties on the model and impact its accuracy. In this work, we analyze the partial labeling problem, then propose a solution based on two key ideas. First, un-annotated labels should be treated selectively according to two probability quantities: the class distribution in the overall dataset and the specific label likelihood for a given data sample. We propose to estimate the class distribution using a dedicated temporary model, and we show its improved efficiency over a naive estimation computed using the dataset's partial annotations. Second, during the training of the target model, we emphasize the contribution of annotated labels over originally un-annotated labels by using a dedicated asymmetric loss. With our novel approach, we achieve state-of-the-art results on OpenImages dataset (e.g. reaching 87.3 mAP on V6). In addition, experiments conducted on LVIS and simulated-COCO demonstrate the effectiveness of our approach. Code is available at https://github.com/Alibaba-MIIL/PartialLabelingCSL.
△ Less
Submitted 21 October, 2021;
originally announced October 2021.
-
Semantic Diversity Learning for Zero-Shot Multi-label Classification
Authors:
Avi Ben-Cohen,
Nadav Zamir,
Emanuel Ben Baruch,
Itamar Friedman,
Lihi Zelnik-Manor
Abstract:
Training a neural network model for recognizing multiple labels associated with an image, including identifying unseen labels, is challenging, especially for images that portray numerous semantically diverse labels. As challenging as this task is, it is an essential task to tackle since it represents many real-world cases, such as image retrieval of natural images. We argue that using a single emb…
▽ More
Training a neural network model for recognizing multiple labels associated with an image, including identifying unseen labels, is challenging, especially for images that portray numerous semantically diverse labels. As challenging as this task is, it is an essential task to tackle since it represents many real-world cases, such as image retrieval of natural images. We argue that using a single embedding vector to represent an image, as commonly practiced, is not sufficient to rank both relevant seen and unseen labels accurately. This study introduces an end-to-end model training for multi-label zero-shot learning that supports semantic diversity of the images and labels. We propose to use an embedding matrix having principal embedding vectors trained using a tailored loss function. In addition, during training, we suggest up-weighting in the loss function image samples presenting higher semantic diversity to encourage the diversity of the embedding matrix. Extensive experiments show that our proposed method improves the zero-shot model's quality in tag-based image retrieval achieving SoTA results on several common datasets (NUS-Wide, COCO, Open Images).
△ Less
Submitted 12 May, 2021;
originally announced May 2021.
-
Compact Network Training for Person ReID
Authors:
Hussam Lawen,
Avi Ben-Cohen,
Matan Protter,
Itamar Friedman,
Lihi Zelnik-Manor
Abstract:
The task of person re-identification (ReID) has attracted growing attention in recent years leading to improved performance, albeit with little focus on real-world applications. Most SotA methods are based on heavy pre-trained models, e.g. ResNet50 (~25M parameters), which makes them less practical and more tedious to explore architecture modifications. In this study, we focus on a small-sized ran…
▽ More
The task of person re-identification (ReID) has attracted growing attention in recent years leading to improved performance, albeit with little focus on real-world applications. Most SotA methods are based on heavy pre-trained models, e.g. ResNet50 (~25M parameters), which makes them less practical and more tedious to explore architecture modifications. In this study, we focus on a small-sized randomly initialized model that enables us to easily introduce architecture and training modifications suitable for person ReID. The outcomes of our study are a compact network and a fitting training regime. We show the robustness of the network by outperforming the SotA on both Market1501 and DukeMTMC. Furthermore, we show the representation power of our ReID network via SotA results on a different task of multi-object tracking.
△ Less
Submitted 9 April, 2020; v1 submitted 15 October, 2019;
originally announced October 2019.
-
The Liver Tumor Segmentation Benchmark (LiTS)
Authors:
Patrick Bilic,
Patrick Christ,
Hongwei Bran Li,
Eugene Vorontsov,
Avi Ben-Cohen,
Georgios Kaissis,
Adi Szeskin,
Colin Jacobs,
Gabriel Efrain Humpire Mamani,
Gabriel Chartrand,
Fabian Lohöfer,
Julian Walter Holch,
Wieland Sommer,
Felix Hofmann,
Alexandre Hostettler,
Naama Lev-Cohain,
Michal Drozdzal,
Michal Marianne Amitai,
Refael Vivantik,
Jacob Sosna,
Ivan Ezhov,
Anjany Sekuboyina,
Fernando Navarro,
Florian Kofler,
Johannes C. Paetzold
, et al. (84 additional authors not shown)
Abstract:
In this work, we report the set-up and results of the Liver Tumor Segmentation Benchmark (LiTS), which was organized in conjunction with the IEEE International Symposium on Biomedical Imaging (ISBI) 2017 and the International Conferences on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2017 and 2018. The image dataset is diverse and contains primary and secondary tumors with…
▽ More
In this work, we report the set-up and results of the Liver Tumor Segmentation Benchmark (LiTS), which was organized in conjunction with the IEEE International Symposium on Biomedical Imaging (ISBI) 2017 and the International Conferences on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2017 and 2018. The image dataset is diverse and contains primary and secondary tumors with varied sizes and appearances with various lesion-to-background levels (hyper-/hypo-dense), created in collaboration with seven hospitals and research institutions. Seventy-five submitted liver and liver tumor segmentation algorithms were trained on a set of 131 computed tomography (CT) volumes and were tested on 70 unseen test images acquired from different patients. We found that not a single algorithm performed best for both liver and liver tumors in the three events. The best liver segmentation algorithm achieved a Dice score of 0.963, whereas, for tumor segmentation, the best algorithms achieved Dices scores of 0.674 (ISBI 2017), 0.702 (MICCAI 2017), and 0.739 (MICCAI 2018). Retrospectively, we performed additional analysis on liver tumor detection and revealed that not all top-performing segmentation algorithms worked well for tumor detection. The best liver tumor detection method achieved a lesion-wise recall of 0.458 (ISBI 2017), 0.515 (MICCAI 2017), and 0.554 (MICCAI 2018), indicating the need for further research. LiTS remains an active benchmark and resource for research, e.g., contributing the liver-related segmentation tasks in \url{http://medicaldecathlon.com/}. In addition, both data and online evaluation are accessible via \url{www.lits-challenge.com}.
△ Less
Submitted 25 November, 2022; v1 submitted 13 January, 2019;
originally announced January 2019.
-
Improving CNN Training using Disentanglement for Liver Lesion Classification in CT
Authors:
Avi Ben-Cohen,
Roey Mechrez,
Noa Yedidia,
Hayit Greenspan
Abstract:
Training data is the key component in designing algorithms for medical image analysis and in many cases it is the main bottleneck in achieving good results. Recent progress in image generation has enabled the training of neural network based solutions using synthetic data. A key factor in the generation of new samples is controlling the important appearance features and potentially being able to g…
▽ More
Training data is the key component in designing algorithms for medical image analysis and in many cases it is the main bottleneck in achieving good results. Recent progress in image generation has enabled the training of neural network based solutions using synthetic data. A key factor in the generation of new samples is controlling the important appearance features and potentially being able to generate a new sample of a specific class with different variants. In this work we suggest the synthesis of new data by mixing the class specified and unspecified representation of different factors in the training data. Our experiments on liver lesion classification in CT show an average improvement of 7.4% in accuracy over the baseline training scheme.
△ Less
Submitted 1 November, 2018;
originally announced November 2018.
-
Improving the Segmentation of Anatomical Structures in Chest Radiographs using U-Net with an ImageNet Pre-trained Encoder
Authors:
Maayan Frid-Adar,
Avi Ben-Cohen,
Rula Amer,
Hayit Greenspan
Abstract:
Accurate segmentation of anatomical structures in chest radiographs is essential for many computer-aided diagnosis tasks. In this paper we investigate the latest fully-convolutional architectures for the task of multi-class segmentation of the lungs field, heart and clavicles in a chest radiograph. In addition, we explore the influence of using different loss functions in the training process of a…
▽ More
Accurate segmentation of anatomical structures in chest radiographs is essential for many computer-aided diagnosis tasks. In this paper we investigate the latest fully-convolutional architectures for the task of multi-class segmentation of the lungs field, heart and clavicles in a chest radiograph. In addition, we explore the influence of using different loss functions in the training process of a neural network for semantic segmentation. We evaluate all models on a common benchmark of 247 X-ray images from the JSRT database and ground-truth segmentation masks from the SCR dataset. Our best performing architecture, is a modified U-Net that benefits from pre-trained encoder weights. This model outperformed the current state-of-the-art methods tested on the same benchmark, with Jaccard overlap scores of 96.1% for lung fields, 90.6% for heart and 85.5% for clavicles.
△ Less
Submitted 4 October, 2018;
originally announced October 2018.
-
Cross-Modality Synthesis from CT to PET using FCN and GAN Networks for Improved Automated Lesion Detection
Authors:
Avi Ben-Cohen,
Eyal Klang,
Stephen P. Raskin,
Shelly Soffer,
Simona Ben-Haim,
Eli Konen,
Michal Marianne Amitai,
Hayit Greenspan
Abstract:
In this work we present a novel system for generation of virtual PET images using CT scans. We combine a fully convolutional network (FCN) with a conditional generative adversarial network (GAN) to generate simulated PET data from given input CT data. The synthesized PET can be used for false-positive reduction in lesion detection solutions. Clinically, such solutions may enable lesion detection a…
▽ More
In this work we present a novel system for generation of virtual PET images using CT scans. We combine a fully convolutional network (FCN) with a conditional generative adversarial network (GAN) to generate simulated PET data from given input CT data. The synthesized PET can be used for false-positive reduction in lesion detection solutions. Clinically, such solutions may enable lesion detection and drug treatment evaluation in a CT-only environment, thus reducing the need for the more expensive and radioactive PET/CT scan. Our dataset includes 60 PET/CT scans from Sheba Medical center. We used 23 scans for training and 37 for testing. Different schemes to achieve the synthesized output were qualitatively compared. Quantitative evaluation was conducted using an existing lesion detection software, combining the synthesized PET as a false positive reduction layer for the detection of malignant lesions in the liver. Current results look promising showing a 28% reduction in the average false positive per case from 2.9 to 2.1. The suggested solution is comprehensive and can be expanded to additional body organs, and different modalities.
△ Less
Submitted 23 July, 2018; v1 submitted 21 February, 2018;
originally announced February 2018.
-
Anatomical Data Augmentation For CNN based Pixel-wise Classification
Authors:
Avi Ben-Cohen,
Eyal Klang,
Michal Marianne Amitai,
Jacob Goldberger,
Hayit Greenspan
Abstract:
In this work we propose a method for anatomical data augmentation that is based on using slices of computed tomography (CT) examinations that are adjacent to labeled slices as another resource of labeled data for training the network. The extended labeled data is used to train a U-net network for a pixel-wise classification into different hepatic lesions and normal liver tissues. Our dataset conta…
▽ More
In this work we propose a method for anatomical data augmentation that is based on using slices of computed tomography (CT) examinations that are adjacent to labeled slices as another resource of labeled data for training the network. The extended labeled data is used to train a U-net network for a pixel-wise classification into different hepatic lesions and normal liver tissues. Our dataset contains CT examinations from 140 patients with 333 CT images annotated by an expert radiologist. We tested our approach and compared it to the conventional training process. Results indicate superiority of our method. Using the anatomical data augmentation we achieved an improvement of 3% in the success rate, 5% in the classification accuracy, and 4% in Dice.
△ Less
Submitted 7 January, 2018;
originally announced January 2018.
-
Virtual PET Images from CT Data Using Deep Convolutional Networks: Initial Results
Authors:
Avi Ben-Cohen,
Eyal Klang,
Stephen P. Raskin,
Michal Marianne Amitai,
Hayit Greenspan
Abstract:
In this work we present a novel system for PET estimation using CT scans. We explore the use of fully convolutional networks (FCN) and conditional generative adversarial networks (GAN) to export PET data from CT data. Our dataset includes 25 pairs of PET and CT scans where 17 were used for training and 8 for testing. The system was tested for detection of malignant tumors in the liver region. Init…
▽ More
In this work we present a novel system for PET estimation using CT scans. We explore the use of fully convolutional networks (FCN) and conditional generative adversarial networks (GAN) to export PET data from CT data. Our dataset includes 25 pairs of PET and CT scans where 17 were used for training and 8 for testing. The system was tested for detection of malignant tumors in the liver region. Initial results look promising showing high detection performance with a TPR of 92.3% and FPR of 0.25 per case. Future work entails expansion of the current system to the entire body using a much larger dataset. Such a system can be used for tumor detection and drug treatment evaluation in a CT-only environment instead of the expansive and radioactive PET-CT scan.
△ Less
Submitted 30 July, 2017;
originally announced July 2017.