Search | arXiv e-print repository

Robust and Resource-Efficient Data-Free Knowledge Distillation by Generative Pseudo Replay

Authors: Kuluhan Binici, Shivam Aggarwal, Nam Trung Pham, Karianto Leman, Tulika Mitra

Abstract: Data-Free Knowledge Distillation (KD) allows knowledge transfer from a trained neural network (teacher) to a more compact one (student) in the absence of original training data. Existing works use a validation set to monitor the accuracy of the student over real data and report the highest performance throughout the entire process. However, validation data may not be available at distillation time… ▽ More Data-Free Knowledge Distillation (KD) allows knowledge transfer from a trained neural network (teacher) to a more compact one (student) in the absence of original training data. Existing works use a validation set to monitor the accuracy of the student over real data and report the highest performance throughout the entire process. However, validation data may not be available at distillation time either, making it infeasible to record the student snapshot that achieved the peak accuracy. Therefore, a practical data-free KD method should be robust and ideally provide monotonically increasing student accuracy during distillation. This is challenging because the student experiences knowledge degradation due to the distribution shift of the synthetic data. A straightforward approach to overcome this issue is to store and rehearse the generated samples periodically, which increases the memory footprint and creates privacy concerns. We propose to model the distribution of the previously observed synthetic samples with a generative network. In particular, we design a Variational Autoencoder (VAE) with a training objective that is customized to learn the synthetic data representations optimally. The student is rehearsed by the generative pseudo replay technique, with samples produced by the VAE. Hence knowledge degradation can be prevented without storing any samples. Experiments on image classification benchmarks show that our method optimizes the expected value of the distilled model accuracy while eliminating the large memory overhead incurred by the sample-storing methods. △ Less

Submitted 21 March, 2022; v1 submitted 9 January, 2022; originally announced January 2022.

Comments: Accepted by the Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22)

arXiv:2108.05698 [pdf, other]

Preventing Catastrophic Forgetting and Distribution Mismatch in Knowledge Distillation via Synthetic Data

Authors: Kuluhan Binici, Nam Trung Pham, Tulika Mitra, Karianto Leman

Abstract: With the increasing popularity of deep learning on edge devices, compressing large neural networks to meet the hardware requirements of resource-constrained devices became a significant research direction. Numerous compression methodologies are currently being used to reduce the memory sizes and energy consumption of neural networks. Knowledge distillation (KD) is among such methodologies and it f… ▽ More With the increasing popularity of deep learning on edge devices, compressing large neural networks to meet the hardware requirements of resource-constrained devices became a significant research direction. Numerous compression methodologies are currently being used to reduce the memory sizes and energy consumption of neural networks. Knowledge distillation (KD) is among such methodologies and it functions by using data samples to transfer the knowledge captured by a large model (teacher) to a smaller one(student). However, due to various reasons, the original training data might not be accessible at the compression stage. Therefore, data-free model compression is an ongoing research problem that has been addressed by various works. In this paper, we point out that catastrophic forgetting is a problem that can potentially be observed in existing data-free distillation methods. Moreover, the sample generation strategies in some of these methods could result in a mismatch between the synthetic and real data distributions. To prevent such problems, we propose a data-free KD framework that maintains a dynamic collection of generated samples over time. Additionally, we add the constraint of matching the real data distribution in sample generation strategies that target maximum information gain. Our experiments demonstrate that we can improve the accuracy of the student models obtained via KD when compared with state-of-the-art approaches on the SVHN, Fashion MNIST and CIFAR100 datasets. △ Less

Submitted 5 November, 2021; v1 submitted 11 August, 2021; originally announced August 2021.

Comments: Accepted by the 2022 Winter Conference on Applications of Computer Vision (WACV 2022)

Journal ref: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2022, pp. 663-671

arXiv:1806.10278 [pdf, other]

Feature-less Stitching of Cylindrical Tunnel

Authors: Ramanpreet Singh Pahwa, Wei Kiat Leong, Shaohui Foong, Karianto Leman, Minh N. Do

Abstract: Traditional image stitching algorithms use transforms such as homography to combine different views of a scene. They usually work well when the scene is planar or when the camera is only rotated, kee** its position static. This severely limits their use in real world scenarios where an unmanned aerial vehicle (UAV) potentially hovers around and flies in an enclosed area while rotating to capture… ▽ More Traditional image stitching algorithms use transforms such as homography to combine different views of a scene. They usually work well when the scene is planar or when the camera is only rotated, kee** its position static. This severely limits their use in real world scenarios where an unmanned aerial vehicle (UAV) potentially hovers around and flies in an enclosed area while rotating to capture a video sequence. We utilize known scene geometry along with recorded camera trajectory to create cylindrical images captured in a given environment such as a tunnel where the camera rotates around its center. The captured images of the inner surface of the given scene are combined to create a composite panoramic image that is textured onto a 3D geometrical object in Unity graphical engine to create an immersive environment for end users. △ Less

Submitted 26 June, 2018; originally announced June 2018.

Comments: 6 pages

arXiv:1505.01589 [pdf, other]

Shadow Optimization from Structured Deep Edge Detection

Authors: Li Shen, Teck Wee Chua, Karianto Leman

Abstract: Local structures of shadow boundaries as well as complex interactions of image regions remain largely unexploited by previous shadow detection approaches. In this paper, we present a novel learning-based framework for shadow region recovery from a single image. We exploit the local structures of shadow edges by using a structured CNN learning framework. We show that using the structured label info… ▽ More Local structures of shadow boundaries as well as complex interactions of image regions remain largely unexploited by previous shadow detection approaches. In this paper, we present a novel learning-based framework for shadow region recovery from a single image. We exploit the local structures of shadow edges by using a structured CNN learning framework. We show that using the structured label information in the classification can improve the local consistency of the results and avoid spurious labelling. We further propose and formulate a shadow/bright measure to model the complex interactions among image regions. The shadow and bright measures of each patch are computed from the shadow edges detected in the image. Using the global interaction constraints on patches, we formulate a least-square optimization problem for shadow recovery that can be solved efficiently. Our shadow recovery method achieves state-of-the-art results on the major shadow benchmark databases collected under various conditions. △ Less

Submitted 8 May, 2015; v1 submitted 7 May, 2015; originally announced May 2015.

Comments: 8 pages. CVPR 2015

Showing 1–4 of 4 results for author: Leman, K