Search | arXiv e-print repository

arXiv:2011.07449 [pdf, other]

doi 10.1007/978-3-030-58529-7_2

Online Ensemble Model Compression using Knowledge Distillation

Authors: Devesh Walawalkar, Zhiqiang Shen, Marios Savvides

Abstract: This paper presents a novel knowledge distillation based model compression framework consisting of a student ensemble. It enables distillation of simultaneously learnt ensemble knowledge onto each of the compressed student models. Each model learns unique representations from the data distribution due to its distinct architecture. This helps the ensemble generalize better by combining every model's knowledge. The distilled students and ensemble teacher are trained simultaneously without requiring any pretrained weights. Moreover, our proposed method can deliver multi-compressed students with single training, which is efficient and flexible for different scenarios. We provide comprehensive experiments using state-of-the-art classification models to validate our framework's effectiveness. Notably, using our framework a 97% compressed ResNet110 student model managed to produce a 10.64% relative accuracy gain over its individual baseline training on the CIFAR100 dataset. Similarly, a 95% compressed DenseNet-BC(k=12) model managed an 8.17% relative accuracy gain.

Submitted 14 November, 2020; originally announced November 2020.
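
The abstract above describes students that are trained together while the knowledge of their ensemble is distilled back onto each of them. A minimal sketch of one way such a per-student loss could look is given here. The abstract does not specify how the ensemble output is formed or how the loss terms are weighted, so the logit averaging, the `temperature` and `alpha` parameters, and all function names are assumptions for illustration, not the authors' method.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional softening temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def ensemble_logits(student_logits):
    """Assumed ensemble teacher: the element-wise mean of all student logits."""
    n = len(student_logits)
    return [sum(column) / n for column in zip(*student_logits)]

def student_loss(logits, teacher_logits, label, temperature=2.0, alpha=0.5):
    """Cross-entropy on the true label plus a softened distillation term."""
    cross_entropy = -math.log(softmax(logits)[label])
    distillation = kl_divergence(
        softmax(teacher_logits, temperature),
        softmax(logits, temperature),
    )
    # The temperature**2 factor is the usual scaling for softened targets.
    return (1 - alpha) * cross_entropy + alpha * temperature ** 2 * distillation

# Hypothetical logits from three students for one sample with true class 0.
students = [[2.0, 0.5, 0.1], [1.5, 1.0, 0.2], [0.3, 2.2, 0.4]]
teacher = ensemble_logits(students)
losses = [student_loss(s, teacher, label=0) for s in students]
```

In this sketch every student is pulled toward the same averaged prediction, so a student that disagrees with the ensemble receives a larger distillation penalty than one that already matches it.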

arXiv:2003.13048 [pdf, other]

Attentive CutMix: An Enhanced Data Augmentation Approach for Deep Learning Based Image Classification

Authors: Devesh Walawalkar, Zhiqiang Shen, Zechun Liu, Marios Savvides

Abstract: Convolutional neural networks (CNN) are capable of learning robust representation with different regularization methods and activations as convolutional layers are spatially correlated. Based on this property, a large variety of regional dropout strategies have been proposed, such as Cutout, DropBlock, CutMix, etc. These methods aim to promote the network to generalize better by partially occluding the discriminative parts of objects. However, all of them perform this operation randomly, without capturing the most important region(s) within an object. In this paper, we propose Attentive CutMix, a naturally enhanced augmentation strategy based on CutMix. In each training iteration, we choose the most descriptive regions based on the intermediate attention maps from a feature extractor, which enables searching for the most discriminative parts in an image. Our proposed method is simple yet effective, easy to implement and can boost the baseline significantly. Extensive experiments on CIFAR-10/100, ImageNet datasets with various CNN architectures (in a unified setting) demonstrate the effectiveness of our proposed method, which consistently outperforms the baseline CutMix and other methods by a significant margin.

Submitted 5 April, 2020; v1 submitted 29 March, 2020; originally announced March 2020.
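
The region selection described in the abstract above, choosing the most descriptive regions from an attention map and mixing them into another image, can be sketched roughly as follows. The abstract gives no implementation details, so the square attention grid, the image size being divisible by the grid size, the label-mixing ratio `lam`, and the function names are all assumptions made for illustration.

```python
def top_k_cells(attention, k):
    """Return (row, col) indices of the k highest-scoring attention cells."""
    cells = [
        (score, r, c)
        for r, row in enumerate(attention)
        for c, score in enumerate(row)
    ]
    cells.sort(key=lambda item: item[0], reverse=True)
    return [(r, c) for _, r, c in cells[:k]]

def attentive_cutmix(source, target, attention, k):
    """Paste the k most attended patches of `source` onto a copy of `target`.

    `source` and `target` are 2D lists of equal size; `attention` is an
    assumed square grid of scores computed for `source` by some feature
    extractor that is not modelled here.
    """
    grid = len(attention)
    cell_h = len(source) // grid
    cell_w = len(source[0]) // grid
    mixed = [row[:] for row in target]
    for r, c in top_k_cells(attention, k):
        for y in range(r * cell_h, (r + 1) * cell_h):
            for x in range(c * cell_w, (c + 1) * cell_w):
                mixed[y][x] = source[y][x]
    # Assumed label weight for the source image: the fraction of pasted cells.
    lam = k / (grid * grid)
    return mixed, lam

# Toy 4x4 single-channel images with a 2x2 attention grid.
source = [[1] * 4 for _ in range(4)]
target = [[0] * 4 for _ in range(4)]
attention = [[0.1, 0.9], [0.2, 0.3]]
mixed, lam = attentive_cutmix(source, target, attention, k=1)
```

The only difference from a random regional strategy in this sketch is `top_k_cells`: the pasted patches are ranked by attention score instead of being sampled at random.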

arXiv:1908.10508 [pdf, other]

doi 10.1002/widm.1353

O-MedAL: Online Active Deep Learning for Medical Image Analysis

Authors: Asim Smailagic, Pedro Costa, Alex Gaudio, Kartik Khandelwal, Mostafa Mirshekari, Jonathon Fagert, Devesh Walawalkar, Susu Xu, Adrian Galdran, Pei Zhang, Aurélio Campilho, Hae Young Noh

Abstract: Active Learning methods create an optimized labeled training set from unlabeled data. We introduce a novel Online Active Deep Learning method for Medical Image Analysis. We extend our MedAL active learning framework to present new results in this paper. Our novel sampling method queries the unlabeled examples that maximize the average distance to all training set examples. Our online method enhances performance of its underlying baseline deep network. These novelties contribute significant performance improvements, including improving the model's underlying deep network accuracy by 6.30%, using only 25% of the labeled dataset to achieve baseline accuracy, reducing backpropagated images during training by as much as 67%, and demonstrating robustness to class imbalance in binary and multi-class tasks.

Submitted 27 July, 2020; v1 submitted 27 August, 2019; originally announced August 2019.

Comments: Code: https://github.com/adgaudio/o-medal ; Accepted and published by Wiley Journal of Pattern Recognition and Knowledge Discovery ; Journal URL: https://doi.org/10.1002/widm.1353

Journal ref: Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 10.4 (2020): e1353

arXiv:1812.09336 [pdf, other]

An Empirical Analysis of Deep Audio-Visual Models for Speech Recognition

Authors: Devesh Walawalkar, Yihui He, Rohit Pillai

Abstract: In this project, we worked on speech recognition, specifically predicting individual words based on both the video frames and audio. Empowered by convolutional neural networks, the recent speech recognition and lip reading models are comparable to human level performance. We re-implemented and made derivations of the state-of-the-art model. Then, we conducted rich experiments including the effectiveness of attention mechanism, more accurate residual network as the backbone with pre-trained weights and the sensitivity of our model with respect to audio input with/without noise.

Submitted 21 December, 2018; originally announced December 2018.

arXiv:1809.09287 [pdf]

MedAL: Deep Active Learning Sampling Method for Medical Image Analysis

Authors: Asim Smailagic, Hae Young Noh, Pedro Costa, Devesh Walawalkar, Kartik Khandelwal, Mostafa Mirshekari, Jonathon Fagert, Adrián Galdrán, Susu Xu

Abstract: Deep learning models have been successfully used in medical image analysis problems but they require a large amount of labeled images to obtain good performance. However, such large labeled datasets are costly to acquire. Active learning techniques can be used to minimize the number of required training labels while maximizing the model's performance. In this work, we propose a novel sampling method that queries the unlabeled examples that maximize the average distance to all training set examples in a learned feature space. We then extend our sampling method to define a better initial training set, without the need for a trained model, by using ORB feature descriptors. We validate MedAL on 3 medical image datasets and show that our method is robust to different dataset properties. MedAL is also efficient, achieving 80% accuracy on the task of Diabetic Retinopathy detection using only 425 labeled images, corresponding to a 32% reduction in the number of required labeled examples compared to the standard uncertainty sampling technique, and a 40% reduction compared to random sampling.

Submitted 28 September, 2018; v1 submitted 24 September, 2018; originally announced September 2018.

Comments: Accepted as conference paper for ICMLA 2018
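
The sampling rule stated in the MedAL abstract above, querying the unlabeled examples that maximize the average distance to all training set examples in a learned feature space, can be sketched as below. The abstract does not name the distance metric or how features are extracted, so the Euclidean distance, the precomputed feature vectors, and the function names are assumptions for illustration only.

```python
import math

def average_distance(candidate, labeled_features):
    """Mean Euclidean distance from one candidate to every labeled example."""
    total = sum(math.dist(candidate, feature) for feature in labeled_features)
    return total / len(labeled_features)

def select_queries(unlabeled_features, labeled_features, n_queries):
    """Return indices of the unlabeled examples farthest, on average,
    from the current labeled training set."""
    ranked = sorted(
        range(len(unlabeled_features)),
        key=lambda i: average_distance(unlabeled_features[i], labeled_features),
        reverse=True,
    )
    return ranked[:n_queries]

# Hypothetical 2D feature vectors standing in for learned embeddings.
labeled = [(0.0, 0.0), (1.0, 0.0)]
unlabeled = [(0.5, 0.0), (5.0, 5.0), (2.0, 0.0)]
queries = select_queries(unlabeled, labeled, n_queries=2)
```

In an active learning loop, the selected indices would be sent for labeling, moved into the training set, and the ranking recomputed on the remaining pool.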

arXiv:1801.01402 [pdf]

A fully automated framework for lung tumour detection, segmentation and analysis

Authors: Devesh Walawalkar

Abstract: Early and correct diagnosis is a very important aspect of cancer treatment. Detection of tumour in Computed Tomography scan is a tedious and tricky task which requires expert knowledge and a lot of human working hours. As small human error is present in any work he does, it is possible that a CT scan could be misdiagnosed causing the patient to become terminal. This paper introduces a novel fully automated framework which helps to detect and segment tumour, if present in a lung CT scan series. It also provides useful analysis of the detected tumour such as its approximate volume, centre location and more. The framework provides a single click solution which analyses all CT images of a single patient series in one go. It helps to reduce the work of manually going through each CT slice and provides quicker and more accurate tumour diagnosis. It makes use of customized image processing and image segmentation methods, to detect and segment the prospective tumour region from the CT scan. It then uses a trained ensemble classifier to correctly classify the segmented region as being tumour or not. Tumour analysis further computed can then be used to determine malignity of the tumour. With an accuracy of 98.14%, the implemented framework can be used in various practical scenarios, capable of eliminating need of any expert pathologist intervention.

Submitted 4 January, 2018; originally announced January 2018.

arXiv:1711.06303 [pdf]

Grammatical facial expression recognition using customized deep neural network architecture

Authors: Devesh Walawalkar

Abstract: This paper proposes to expand the visual understanding capacity of computers by helping it recognize human sign language more efficiently. This is carried out through recognition of facial expressions, which accompany the hand signs used in this language. This paper specially focuses on the popular Brazilian sign language (LIBRAS). While classifying different hand signs into their respective word meanings has already seen much literature dedicated to it, the emotions or intention with which the words are expressed haven't primarily been taken into consideration. As from our normal human experience, words expressed with different emotions or mood can have completely different meanings attached to it. Lending computers the ability of classifying these facial expressions, can help add another level of deep understanding of what the deaf person exactly wants to communicate. The proposed idea is implemented through a deep neural network having a customized architecture. This helps learning specific patterns in individual expressions much better as compared to a generic approach. With an overall accuracy of 98.04%, the implemented deep network performs excellently well and thus is fit to be used in any given practical scenario.

Submitted 16 November, 2017; originally announced November 2017.

Showing 1–7 of 7 results for author: Walawalkar, D