-
Resource Efficient Perception for Vision Systems
Authors:
A V Subramanyam,
Niyati Singal,
Vinay K Verma
Abstract:
Despite the rapid advancement in the field of image recognition, the processing of high-resolution imagery remains a computational challenge. However, this processing is pivotal for extracting detailed object insights in areas ranging from autonomous vehicle navigation to medical imaging analyses. Our study introduces a framework aimed at mitigating these challenges by leveraging memory efficient patch based processing for high resolution images. It incorporates a global context representation alongside local patch information, enabling a comprehensive understanding of the image content. In contrast to traditional training methods which are limited by memory constraints, our method enables training of ultra high resolution images. We demonstrate the effectiveness of our method through superior performance on 7 different benchmarks across classification, object detection, and segmentation. Notably, the proposed method achieves strong performance even on resource-constrained devices like Jetson Nano. Our code is available at https://github.com/Visual-Conception-Group/Localized-Perception-Constrained-Vision-Systems.
Submitted 12 May, 2024;
originally announced May 2024.
-
Convolutional Prompting meets Language Models for Continual Learning
Authors:
Anurag Roy,
Riddhiman Moulick,
Vinay K. Verma,
Saptarshi Ghosh,
Abir Das
Abstract:
Continual Learning (CL) enables machine learning models to learn from continuously shifting new training data in the absence of data from old tasks. Recently, pretrained vision transformers combined with prompt tuning have shown promise for overcoming catastrophic forgetting in CL. These approaches rely on a pool of learnable prompts, which can be inefficient in sharing knowledge across tasks, leading to inferior performance. In addition, the lack of fine-grained layer-specific prompts does not allow these to fully express the strength of the prompts for CL. We address these limitations by proposing ConvPrompt, a novel convolutional prompt creation mechanism that maintains layer-wise shared embeddings, enabling both layer-specific learning and better concept transfer across tasks. The intelligent use of convolution enables us to maintain a low parameter overhead without compromising performance. We further leverage Large Language Models to generate fine-grained text descriptions of each category, which are used to get task similarity and dynamically decide the number of prompts to be learned. Extensive experiments demonstrate the superiority of ConvPrompt, which improves SOTA by ~3% with significantly less parameter overhead. We also perform strong ablation over various modules to disentangle the importance of different components.
Submitted 29 March, 2024;
originally announced March 2024.
-
Efficient Expansion and Gradient Based Task Inference for Replay Free Incremental Learning
Authors:
Soumya Roy,
Vinay K Verma,
Deepak Gupta
Abstract:
This paper proposes a simple but highly efficient expansion-based model for continual learning. The recent feature transformation, masking and factorization-based methods are efficient, but they grow the model only over the global or shared parameter. Therefore, these approaches do not fully utilize the previously learned information because the same task-specific parameter forgets the earlier knowledge. Thus, these approaches show limited transfer learning ability. Moreover, most of these models have constant parameter growth for all tasks, irrespective of the task complexity. Our work proposes a simple filter and channel expansion based method that grows the model over the previous task parameters and not just over the global parameter. Therefore, it fully utilizes all the previously learned information without forgetting, which results in better knowledge transfer. The growth rate in our proposed model is a function of task complexity; therefore, for a simple task, the model has a smaller parameter growth, while for complex tasks, the model requires more parameters to adapt to the current task. Recent expansion based models show promising results for task incremental learning (TIL). However, for class incremental learning (CIL), prediction of task id is a crucial challenge; hence, their results degrade rapidly as the number of tasks increases. In this work, we propose a robust task prediction method that leverages entropy weighted data augmentations and the model's gradient using pseudo labels. We evaluate our model on various datasets and architectures in the TIL, CIL and generative continual learning settings. The proposed approach shows state-of-the-art results in all these settings. Our extensive ablation studies show the efficacy of the proposed components.
Submitted 2 December, 2023;
originally announced December 2023.
-
Meta-Learned Attribute Self-Interaction Network for Continual and Generalized Zero-Shot Learning
Authors:
Vinay K Verma,
Nikhil Mehta,
Kevin J Liang,
Aakansha Mishra,
Lawrence Carin
Abstract:
Zero-shot learning (ZSL) is a promising approach to generalizing a model to categories unseen during training by leveraging class attributes, but challenges remain. Recently, methods using generative models to combat bias towards classes seen during training have pushed state of the art, but these generative models can be slow or computationally expensive to train. Also, these generative models assume that the attribute vector of each unseen class is available a priori at training, which is not always practical. Additionally, while many previous ZSL methods assume a one-time adaptation to unseen classes, in reality, the world is always changing, necessitating a constant adjustment of deployed models. Models unprepared to handle a sequential stream of data are likely to experience catastrophic forgetting. We propose a Meta-learned Attribute self-Interaction Network (MAIN) for continual ZSL. By pairing attribute self-interaction trained using meta-learning with inverse regularization of the attribute encoder, we are able to outperform state-of-the-art results without leveraging the unseen class attributes while also being able to train our models substantially faster (>100x) than expensive generative-based approaches. We demonstrate this with experiments on five standard ZSL datasets (CUB, aPY, AWA1, AWA2, and SUN) in the generalized zero-shot learning and continual (fixed/dynamic) zero-shot learning settings. Extensive ablations and analyses demonstrate the efficacy of various components proposed.
Submitted 2 December, 2023;
originally announced December 2023.
-
VERSE: Virtual-Gradient Aware Streaming Lifelong Learning with Anytime Inference
Authors:
Soumya Banerjee,
Vinay K. Verma,
Avideep Mukherjee,
Deepak Gupta,
Vinay P. Namboodiri,
Piyush Rai
Abstract:
Lifelong learning or continual learning is the problem of training an AI agent continuously while also preventing it from forgetting its previously acquired knowledge. Streaming lifelong learning is a challenging setting of lifelong learning with the goal of continuous learning in a dynamic non-stationary environment without forgetting. We introduce a novel approach to lifelong learning, which is streaming (observes each training example only once), requires a single pass over the data, can learn in a class-incremental manner, and can be evaluated on-the-fly (anytime inference). To accomplish these, we propose a novel virtual gradients based approach for continual representation learning which adapts to each new example while also generalizing well on past data to prevent catastrophic forgetting. Our approach also leverages an exponential-moving-average-based semantic memory to further enhance performance. Experiments on diverse datasets with temporally correlated observations demonstrate our method's efficacy and superior performance over existing methods.
Submitted 19 February, 2024; v1 submitted 15 September, 2023;
originally announced September 2023.
-
Exemplar-Free Continual Transformer with Convolutions
Authors:
Anurag Roy,
Vinay Kumar Verma,
Sravan Voonna,
Kripabandhu Ghosh,
Saptarshi Ghosh,
Abir Das
Abstract:
Continual Learning (CL) involves training a machine learning model in a sequential manner to learn new information while retaining previously learned tasks without the presence of previous training data. Although there has been significant interest in CL, most recent CL approaches in computer vision have focused on convolutional architectures only. However, with the recent success of vision transformers, there is a need to explore their potential for CL. Although there have been some recent CL approaches for vision transformers, they either store training instances of previous tasks or require a task identifier during test time, which can be limiting. This paper proposes a new exemplar-free approach for class/task incremental learning called ConTraCon, which does not require task-id to be explicitly present during inference and avoids the need for storing previous training instances. The proposed approach leverages the transformer architecture and involves re-weighting the key, query, and value weights of the multi-head self-attention layers of a transformer trained on a similar task. The re-weighting is done using convolution, which enables the approach to maintain low parameter requirements per task. Additionally, an image augmentation-based entropic task identification approach is used to predict tasks without requiring task-ids during inference. Experiments on four benchmark datasets demonstrate that the proposed approach outperforms several competitive approaches while requiring fewer parameters.
Submitted 22 August, 2023;
originally announced August 2023.
-
Streaming LifeLong Learning With Any-Time Inference
Authors:
Soumya Banerjee,
Vinay Kumar Verma,
Vinay P. Namboodiri
Abstract:
Despite rapid advancements in lifelong learning (LLL) research, a large body of research mainly focuses on improving the performance in the existing static continual learning (CL) setups. These methods lack the ability to succeed in a rapidly changing dynamic environment, where an AI agent needs to quickly learn new instances in a 'single pass' from non-i.i.d. (also possibly temporally contiguous/coherent) data streams without suffering from catastrophic forgetting. For practical applicability, we propose a novel lifelong learning approach, which is streaming (i.e., a single input sample arrives in each time step), single pass, class-incremental, and can be evaluated at any moment. To address this challenging setup and various evaluation protocols, we propose a Bayesian framework that enables fast parameter updates given a single training example, and enables any-time inference. We additionally propose an implicit regularizer in the form of snap-shot self-distillation, which effectively minimizes the forgetting further. We further propose an effective method that efficiently selects a subset of samples for online memory rehearsal and employs a new replay buffer management scheme that significantly boosts the overall performance. Our empirical evaluations and ablations demonstrate that the proposed method outperforms the prior works by large margins.
Submitted 27 January, 2023;
originally announced January 2023.
-
Pushing the Efficiency Limit Using Structured Sparse Convolutions
Authors:
Vinay Kumar Verma,
Nikhil Mehta,
Shijing Si,
Ricardo Henao,
Lawrence Carin
Abstract:
Weight pruning is among the most popular approaches for compressing deep convolutional neural networks. Recent work suggests that in a randomly initialized deep neural network, there exist sparse subnetworks that achieve performance comparable to the original network. Unfortunately, finding these subnetworks involves iterative stages of training and pruning, which can be computationally expensive. We propose Structured Sparse Convolution (SSC), which leverages the inherent structure in images to reduce the parameters in the convolutional filter. This leads to improved efficiency of convolutional architectures compared to existing methods that perform pruning at initialization. We show that SSC is a generalization of commonly used layers (depthwise, groupwise and pointwise convolution) in "efficient architectures." Extensive experiments on well-known CNN models and datasets show the effectiveness of the proposed method. Architectures based on SSC achieve state-of-the-art performance compared to baselines on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet classification benchmarks.
Submitted 23 October, 2022;
originally announced October 2022.
-
Cr doping-induced ferromagnetism in the spin-glass Cd1-xMnxTe studied by x-ray magnetic circular dichroism
Authors:
V. K. Verma,
S. Sakamoto,
K. Ishikawa,
V. R. Singh,
K. Ishigami,
G. Shibata,
T. Kadono,
T. Koide,
S. Kuroda,
A. Fujimori
Abstract:
The prototypical diluted magnetic semiconductor Cd1-xMnxTe is a spin glass (x<0.6) or an antiferromagnet (x>0.6), but becomes ferromagnetic upon doping with a small amount of Cr atoms substituting for Mn. In order to investigate the origin of the ferromagnetism in Cd1-x-yMnxCryTe, we have studied its element-specific magnetic properties by x-ray absorption spectroscopy (XAS) and x-ray magnetic circular dichroism (XMCD) at the Cr and Mn L2,3 edges. Thin films were grown by molecular beam epitaxy with a fixed Mn content of x = 0.2 and varying Cr content in the range of y = 0 - 0.04. Measured XAS and XMCD spectra indicate that both Cr and Mn atoms are divalent and that the ferromagnetic or superparamagnetic components of Cr and Mn are aligned in the same direction. The magnetization of Mn increases with increasing Cr content. These results can be explained if a ferromagnetic interaction exists between neighboring Mn and Cr ions, although the interaction between Mn atoms is largely antiferromagnetic. We conclude that each ferromagnetic or superparamagnetic cluster consists of several ferromagnetically coupled Cr ions and a much larger number of Mn ions.
Submitted 5 April, 2022;
originally announced April 2022.
-
Tunable and Sensitive Detection of Cortisol using Anisotropic Phosphorene with a Surface Plasmon Resonance Technique: Numerical Investigation
Authors:
Vipin Kumar Verma,
Sarika Pal,
Conrad Rizal,
Yogendra Kumar Prajapati
Abstract:
Tunable and ultrasensitive surface plasmon resonance (SPR) sensors are highly desirable for monitoring stress hormones such as cortisol, a steroid hormone formed in the adrenal glands in the human body. This paper describes the detection of cortisol using a bimetallic SPR sensor based on a highly anisotropic two-dimensional material, i.e., phosphorene. The thicknesses of the bimetal layers, copper (Cu) and nickel (Ni), are optimized to achieve strong SPR excitation. The proposed sensor is rotated in-plane with a rotation angle around the z-axis to exploit the anisotropic behavior of phosphorene. The performance of the sensor is demonstrated in terms of higher sensitivity (347.78 degree/RIU), maximum angular figure of merit (1780.3), and a finer limit of detection of 0.026 ng/ml. Furthermore, a significant penetration depth (203 nm) is achieved for the proposed sensor. The obtained values of the above parameters indicate that the proposed sensor outperforms those previously reported in the literature on cortisol detection using the SPR technique.
Submitted 29 November, 2021;
originally announced November 2021.
-
Class Incremental Online Streaming Learning
Authors:
Soumya Banerjee,
Vinay Kumar Verma,
Toufiq Parag,
Maneesh Singh,
Vinay P. Namboodiri
Abstract:
A wide variety of methods have been developed to enable lifelong learning in conventional deep neural networks. However, to succeed, these methods require a 'batch' of samples to be available and visited multiple times during training. While this works well in a static setting, these methods continue to suffer in a more realistic situation where data arrives in an online streaming manner. We empirically demonstrate that the performance of current approaches degrades if the input is obtained as a stream of data with the following restrictions: (i) each instance comes one at a time and can be seen only once, and (ii) the input data violates the i.i.d. assumption, i.e., there can be a class-based correlation. We propose a novel approach (CIOSL) for class-incremental learning in an online streaming setting to address these challenges. The proposed approach leverages implicit and explicit dual weight regularization and experience replay. The implicit regularization is leveraged via knowledge distillation, while the explicit regularization incorporates a novel approach for parameter regularization by learning the joint distribution of the buffer replay and the current sample. Also, we propose an efficient online memory replay and replacement buffer strategy that significantly boosts the model's performance. Extensive experiments and ablation on challenging datasets show the efficacy of the proposed method.
Submitted 20 October, 2021;
originally announced October 2021.
-
Hypernetworks for Continual Semi-Supervised Learning
Authors:
Dhanajit Brahma,
Vinay Kumar Verma,
Piyush Rai
Abstract:
Learning from data sequentially arriving, possibly in a non i.i.d. way, with changing task distribution over time is called continual learning. Much of the work thus far in continual learning focuses on supervised learning and some recent works on unsupervised learning. In many domains, each task contains a mix of labelled (typically very few) and unlabelled (typically plenty) training examples, which necessitates a semi-supervised learning approach. To address this in a continual learning setting, we propose a framework for semi-supervised continual learning called Meta-Consolidation for Continual Semi-Supervised Learning (MCSSL). Our framework has a hypernetwork that learns the meta-distribution that generates the weights of a semi-supervised auxiliary classifier generative adversarial network (Semi-ACGAN) as the base network. We consolidate the knowledge of sequential tasks in the hypernetwork, and the base network learns the semi-supervised learning task. Further, we present Semi-Split CIFAR-10, a new benchmark for continual semi-supervised learning, obtained by modifying the Split CIFAR-10 dataset, in which the tasks with labelled and unlabelled data arrive sequentially. Our proposed model yields significant improvements in the continual semi-supervised learning setting. We compare the performance of several existing continual learning approaches on the proposed continual semi-supervised learning benchmark of the Semi-Split CIFAR-10 dataset.
Submitted 5 October, 2021;
originally announced October 2021.
-
Knowledge Consolidation based Class Incremental Online Learning with Limited Data
Authors:
Mohammed Asad Karim,
Vinay Kumar Verma,
Pravendra Singh,
Vinay Namboodiri,
Piyush Rai
Abstract:
We propose a novel approach for class incremental online learning in a limited data setting. This problem setting is challenging because of the following constraints: (1) Classes are given incrementally, which necessitates a class incremental learning approach; (2) Data for each class is given in an online fashion, i.e., each training example is seen only once during training; (3) Each class has very few training examples; and (4) We do not use or assume access to any replay/memory to store data from previous classes. Therefore, in this setting, we have to handle the twofold problem of catastrophic forgetting and overfitting. In our approach, we learn robust representations that are generalizable across tasks without suffering from the problems of catastrophic forgetting and overfitting to accommodate future classes with limited samples. Our proposed method leverages the meta-learning framework with knowledge consolidation. The meta-learning framework helps the model learn rapidly when samples appear in an online fashion. Simultaneously, knowledge consolidation helps to learn a representation that is robust against forgetting under online updates, which facilitates future learning. Our approach significantly outperforms other methods on several benchmarks.
Submitted 12 June, 2021;
originally announced June 2021.
-
Efficient Feature Transformations for Discriminative and Generative Continual Learning
Authors:
Vinay Kumar Verma,
Kevin J Liang,
Nikhil Mehta,
Piyush Rai,
Lawrence Carin
Abstract:
As neural networks are increasingly being applied to real-world applications, mechanisms to address distributional shift and sequential task learning without forgetting are critical. Methods incorporating network expansion have shown promise by naturally adding model capacity for learning new tasks while simultaneously avoiding catastrophic forgetting. However, the growth in the number of additional parameters of many of these types of methods can be computationally expensive at larger scales, at times prohibitively so. Instead, we propose a simple task-specific feature map transformation strategy for continual learning, which we call Efficient Feature Transformations (EFTs). These EFTs provide powerful flexibility for learning new tasks, achieved with minimal parameters added to the base architecture. We further propose a feature distance maximization strategy, which significantly improves task prediction in class incremental settings, without needing expensive generative models. We demonstrate the efficacy and efficiency of our method with an extensive set of experiments in discriminative (CIFAR-100 and ImageNet-1K) and generative (LSUN, CUB-200, Cats) sequences of tasks. Even with low single-digit parameter growth rates, EFTs can outperform many other continual learning methods in a wide range of settings.
Submitted 24 March, 2021;
originally announced March 2021.
-
CAM-GAN: Continual Adaptation Modules for Generative Adversarial Networks
Authors:
Sakshi Varshney,
Vinay Kumar Verma,
Srijith P K,
Lawrence Carin,
Piyush Rai
Abstract:
We present a continual learning approach for generative adversarial networks (GANs), by designing and leveraging parameter-efficient feature map transformations. Our approach is based on learning a set of global and task-specific parameters. The global parameters are fixed across tasks whereas the task-specific parameters act as local adapters for each task, and help in efficiently obtaining task-specific feature maps. Moreover, we propose an element-wise addition of residual bias in the transformed feature space, which further helps stabilize GAN training in such settings. Our approach also leverages task similarity information based on the Fisher information matrix. Leveraging this knowledge from previous tasks significantly improves the model performance. In addition, the similarity measure also helps reduce the parameter growth in continual adaptation and helps to learn a compact model. In contrast to the recent approaches for continually-learned GANs, the proposed approach provides a memory-efficient way to perform effective continual data generation. Through extensive experiments on challenging and diverse datasets, we show that the feature-map-transformation approach outperforms state-of-the-art methods for continually-learned GANs, with substantially fewer parameters. The proposed method generates high-quality samples that can also improve the generative-replay-based continual learning for discriminative tasks.
Submitted 30 July, 2021; v1 submitted 6 March, 2021;
originally announced March 2021.
-
Meta-Learned Attribute Self-Gating for Continual Generalized Zero-Shot Learning
Authors:
Vinay Kumar Verma,
Kevin Liang,
Nikhil Mehta,
Lawrence Carin
Abstract:
Zero-shot learning (ZSL) has been shown to be a promising approach to generalizing a model to categories unseen during training by leveraging class attributes, but challenges still remain. Recently, methods using generative models to combat bias towards classes seen during training have pushed the state of the art of ZSL, but these generative models can be slow or computationally expensive to train. Additionally, while many previous ZSL methods assume a one-time adaptation to unseen classes, in reality, the world is always changing, necessitating a constant adjustment for deployed models. Models unprepared to handle a sequential stream of data are likely to experience catastrophic forgetting. We propose a meta-continual zero-shot learning (MCZSL) approach to address both these issues. In particular, by pairing self-gating of attributes and scaled class normalization with meta-learning based training, we are able to outperform state-of-the-art results while being able to train our models substantially faster (>100x) than expensive generative-based approaches. We demonstrate this by performing experiments on five standard ZSL datasets (CUB, aPY, AWA1, AWA2 and SUN) in both generalized zero-shot learning and generalized continual zero-shot learning settings.
Submitted 23 February, 2021;
originally announced February 2021.
-
Towards Zero-Shot Learning with Fewer Seen Class Examples
Authors:
Vinay Kumar Verma,
Ashish Mishra,
Anubha Pandey,
Hema A. Murthy,
Piyush Rai
Abstract:
We present a meta-learning based generative model for zero-shot learning (ZSL) towards a challenging setting when the number of training examples from each \emph{seen} class is very few. This setup contrasts with the conventional ZSL approaches, where training typically assumes the availability of a sufficiently large number of training examples from each of the seen classes. The proposed approach leverages meta-learning to train a deep generative model that integrates variational autoencoder and generative adversarial networks. We propose a novel task distribution where meta-train and meta-validation classes are disjoint to simulate the ZSL behaviour in training. Once trained, the model can generate synthetic examples from seen and unseen classes. Synthesized samples can then be used to train the ZSL framework in a supervised manner. The meta-learner enables our model to generate high-fidelity samples using only a small number of training examples from seen classes. We conduct extensive experiments and ablation studies on four benchmark datasets of ZSL and observe that the proposed model outperforms state-of-the-art approaches by a significant margin when the number of examples per seen class is very small.
Submitted 14 November, 2020;
originally announced November 2020.
-
ZSCRGAN: A GAN-based Expectation Maximization Model for Zero-Shot Retrieval of Images from Textual Descriptions
Authors:
Anurag Roy,
Vinay Kumar Verma,
Kripabandhu Ghosh,
Saptarshi Ghosh
Abstract:
Most existing algorithms for cross-modal Information Retrieval are based on a supervised train-test setup, where a model learns to align the mode of the query (e.g., text) to the mode of the documents (e.g., images) from a given training set. Such a setup assumes that the training set contains an exhaustive representation of all possible classes of queries. In reality, a retrieval model may need to be deployed on previously unseen classes, which implies a zero-shot IR setup. In this paper, we propose a novel GAN-based model for zero-shot text to image retrieval. When given a textual description as the query, our model can retrieve relevant images in a zero-shot setup. The proposed model is trained using an Expectation-Maximization framework. Experiments on multiple benchmark datasets show that our proposed model comfortably outperforms several state-of-the-art zero-shot text to image retrieval models, as well as zero-shot classification and hashing models suitably used for retrieval.
Submitted 23 September, 2020; v1 submitted 23 July, 2020;
originally announced July 2020.
-
Continual Learning using a Bayesian Nonparametric Dictionary of Weight Factors
Authors:
Nikhil Mehta,
Kevin J Liang,
Vinay K Verma,
Lawrence Carin
Abstract:
Naively trained neural networks tend to experience catastrophic forgetting in sequential task settings, where data from previous tasks are unavailable. A number of methods, using various model expansion strategies, have been proposed recently as possible solutions. However, determining how much to expand the model is left to the practitioner, and often a constant schedule is chosen for simplicity, regardless of how complex the incoming task is. Instead, we propose a principled Bayesian nonparametric approach based on the Indian Buffet Process (IBP) prior, letting the data determine how much to expand the model complexity. We pair this with a factorization of the neural network's weight matrices. Such an approach allows the number of factors of each weight matrix to scale with the complexity of the task, while the IBP prior encourages sparse weight factor selection and factor reuse, promoting positive knowledge transfer between tasks. We demonstrate the effectiveness of our method on a number of continual learning benchmarks and analyze how weight factors are allocated and reused throughout the training.
Submitted 27 April, 2021; v1 submitted 21 April, 2020;
originally announced April 2020.
-
Stacked Adversarial Network for Zero-Shot Sketch based Image Retrieval
Authors:
Anubha Pandey,
Ashish Mishra,
Vinay Kumar Verma,
Anurag Mittal,
Hema A. Murthy
Abstract:
Conventional approaches to Sketch-Based Image Retrieval (SBIR) assume that the data of all the classes are available during training. The assumption may not always be practical since the data of a few classes may be unavailable, or the classes may not appear at the time of training. Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) relaxes this constraint and allows the algorithm to handle previously unseen classes at test time. This paper proposes a generative approach for ZS-SBIR that combines a Stacked Adversarial Network (SAN) with a Siamese Network (SN). While the SAN generates high-quality samples, the SN learns a better distance metric than that of the nearest neighbor search. The capability of the generative model to synthesize image features based on the sketch reduces the SBIR problem to that of an image-to-image retrieval problem. We evaluate the efficacy of our proposed approach on the TU-Berlin and Sketchy databases in both the standard ZSL and generalized ZSL settings. The proposed method yields a significant improvement in the standard ZSL setting as well as in the more challenging generalized ZSL (GZSL) setting for SBIR.
Submitted 18 January, 2020;
originally announced January 2020.
-
A "Network Pruning Network" Approach to Deep Model Compression
Authors:
Vinay Kumar Verma,
Pravendra Singh,
Vinay P. Namboodiri,
Piyush Rai
Abstract:
We present a filter pruning approach for deep model compression, using a multitask network. Our approach is based on learning a pruner network to prune a pre-trained target network. The pruner is essentially a multitask deep neural network with binary outputs that help identify the filters from each layer of the original network that do not have any significant contribution to the model and can therefore be pruned. The pruner network has the same architecture as the original network except that it has a multitask/multi-output last layer containing binary-valued outputs (one per filter), which indicate which filters have to be pruned. The pruner's goal is to minimize the number of filters from the original network by assigning zero weights to the corresponding output feature-maps. In contrast to most of the existing methods, instead of relying on iterative pruning, our approach can prune the network (original network) in one go and, moreover, does not require specifying the degree of pruning for each layer (and can learn it instead). The compressed model produced by our approach is generic and does not need any special hardware/software support. Moreover, augmenting with other methods such as knowledge distillation, quantization, and connection pruning can increase the degree of compression for the proposed approach. We show the efficacy of our proposed approach for classification and object detection tasks.
Submitted 15 January, 2020;
originally announced January 2020.
-
A Meta-Learning Framework for Generalized Zero-Shot Learning
Authors:
Vinay Kumar Verma,
Dhanajit Brahma,
Piyush Rai
Abstract:
Learning to classify unseen class samples at test time is popularly referred to as zero-shot learning (ZSL). If test samples can be from training (seen) as well as unseen classes, it is a more challenging problem due to the existence of strong bias towards seen classes. This problem is generally known as \emph{generalized} zero-shot learning (GZSL). Thanks to the recent advances in generative models such as VAEs and GANs, sample synthesis based approaches have gained considerable attention for solving this problem. These approaches are able to handle the problem of class bias by synthesizing unseen class samples. However, these ZSL/GZSL models suffer from the following key limitations: $(i)$ Their training stage learns a class-conditioned generator using only \emph{seen} class data and the training stage does not \emph{explicitly} learn to generate the unseen class samples; $(ii)$ They do not learn a generic optimal parameter which can easily generalize for both seen and unseen class generation; and $(iii)$ If we only have access to very few samples per seen class, these models tend to perform poorly. In this paper, we propose a meta-learning based generative model that naturally handles these limitations. The proposed model is based on integrating model-agnostic meta learning with a Wasserstein GAN (WGAN) to handle $(i)$ and $(iii)$, and uses a novel task distribution to handle $(ii)$. Our proposed model yields significant improvements in the standard ZSL setting as well as in the more challenging GZSL setting. In the ZSL setting, our model yields 4.5\%, 6.0\%, 9.8\%, and 27.9\% relative improvements over the current state-of-the-art on CUB, AWA1, AWA2, and aPY datasets, respectively.
Submitted 10 September, 2019;
originally announced September 2019.
-
Play and Prune: Adaptive Filter Pruning for Deep Model Compression
Authors:
Pravendra Singh,
Vinay Kumar Verma,
Piyush Rai,
Vinay P. Namboodiri
Abstract:
While convolutional neural networks (CNN) have achieved impressive performance on various classification/recognition tasks, they typically consist of a massive number of parameters. This results in significant memory requirement as well as computational overheads. Consequently, there is a growing need for filter-level pruning approaches for compressing CNN based models that not only reduce the total number of parameters but reduce the overall computation as well. We present a new min-max framework for filter-level pruning of CNNs. Our framework, called Play and Prune (PP), jointly prunes and fine-tunes CNN model parameters, with an adaptive pruning rate, while maintaining the model's predictive performance. Our framework consists of two modules: (1) An adaptive filter pruning (AFP) module, which minimizes the number of filters in the model; and (2) A pruning rate controller (PRC) module, which maximizes the accuracy during pruning. Moreover, unlike most previous approaches, our approach allows directly specifying the desired error tolerance instead of pruning level. Our compressed models can be deployed at run-time, without requiring any special libraries or hardware. Our approach reduces the number of parameters of VGG-16 by an impressive factor of 17.5X, and number of FLOPS by 6.43X, with no loss of accuracy, significantly outperforming other state-of-the-art filter pruning methods.
Submitted 11 May, 2019;
originally announced May 2019.
-
Generative Model for Zero-Shot Sketch-Based Image Retrieval
Authors:
Vinay Kumar Verma,
Aakansha Mishra,
Ashish Mishra,
Piyush Rai
Abstract:
We present a probabilistic model for Sketch-Based Image Retrieval (SBIR) where, at retrieval time, we are given sketches from novel classes, that were not present at training time. Existing SBIR methods, most of which rely on learning class-wise correspondences between sketches and images, typically work well only for previously seen sketch classes, and result in poor retrieval performance on novel classes. To address this, we propose a generative model that learns to generate images, conditioned on a given novel class sketch. This enables us to reduce the SBIR problem to a standard image-to-image search problem. Our model is based on an inverse auto-regressive flow based variational autoencoder, with a feedback mechanism to ensure robust image generation. We evaluate our model on two very challenging datasets, Sketchy, and TU Berlin, with novel train-test split. The proposed approach significantly outperforms various baselines on both the datasets.
Submitted 17 April, 2019;
originally announced April 2019.
-
HetConv: Heterogeneous Kernel-Based Convolutions for Deep CNNs
Authors:
Pravendra Singh,
Vinay Kumar Verma,
Piyush Rai,
Vinay P. Namboodiri
Abstract:
We present a novel deep learning architecture in which the convolution operation leverages heterogeneous kernels. The proposed HetConv (Heterogeneous Kernel-Based Convolution) reduces the computation (FLOPs) and the number of parameters as compared to standard convolution operation while still maintaining representational efficiency. To show the effectiveness of our proposed convolution, we present extensive experimental results on the standard convolutional neural network (CNN) architectures such as VGG and ResNet. We find that after replacing the standard convolutional filters in these architectures with our proposed HetConv filters, we achieve 3X to 8X FLOPs based improvement in speed while still maintaining (and sometimes improving) the accuracy. We also compare our proposed convolutions with group/depth wise convolutions and show that it achieves more FLOPs reduction with significantly higher accuracy.
Submitted 25 March, 2019; v1 submitted 11 March, 2019;
originally announced March 2019.
-
Leveraging Filter Correlations for Deep Model Compression
Authors:
Pravendra Singh,
Vinay Kumar Verma,
Piyush Rai,
Vinay P. Namboodiri
Abstract:
We present a filter correlation based model compression approach for deep convolutional neural networks. Our approach iteratively identifies pairs of filters with the largest pairwise correlations and drops one of the filters from each such pair. However, instead of discarding one of the filters from each such pair naïvely, the model is re-optimized to make the filters in these pairs maximally correlated, so that discarding one of the filters from the pair results in minimal information loss. Moreover, after discarding the filters in each round, we further finetune the model to recover from the potential small loss incurred by the compression. We evaluate our proposed approach using a comprehensive set of experiments and ablation studies. Our compression method yields state-of-the-art FLOPs compression rates on various benchmarks, such as LeNet-5, VGG-16, and ResNet-50,56, while still achieving excellent predictive performance for tasks such as object detection on benchmark datasets.
Submitted 15 January, 2020; v1 submitted 26 November, 2018;
originally announced November 2018.
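The selection step described in the abstract above (find the pair of filters with the largest pairwise correlation, then drop one of the pair) can be illustrated with a minimal sketch. It makes several assumptions that the abstract does not state: filters are flattened to plain lists of floats, correlation is measured as absolute Pearson correlation, and the second filter of the pair is the one dropped. The re-optimization and fine-tuning stages are omitted. All function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(u, v):
    """Pearson correlation between two equal-length flattened filters."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a constant filter is treated as uncorrelated (assumption)
    return cov / (norm_u * norm_v)


def most_correlated_pair(filters):
    """Return (i, j, |r|) for the filter pair with the largest absolute correlation."""
    best = None
    for i, j in combinations(range(len(filters)), 2):
        r = abs(pearson(filters[i], filters[j]))
        if best is None or r > best[2]:
            best = (i, j, r)
    return best


def prune_one_round(filters):
    """Drop the second filter of the most correlated pair (illustrative choice)."""
    _, drop, _ = most_correlated_pair(filters)
    return [f for k, f in enumerate(filters) if k != drop]


# Toy layer with three flattened filters; the first two are nearly collinear.
filters = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 1.0, 3.0, 2.0],
]
remaining = prune_one_round(filters)
```

In this toy layer the first two filters are selected as the most correlated pair, so one round leaves two filters.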
-
A Generative Approach to Zero-Shot and Few-Shot Action Recognition
Authors:
Ashish Mishra,
Vinay Kumar Verma,
M Shiva Krishna Reddy,
Arulkumar S,
Piyush Rai,
Anurag Mittal
Abstract:
We present a generative framework for zero-shot action recognition where some of the possible action classes do not occur in the training data. Our approach is based on modeling each action class using a probability distribution whose parameters are functions of the attribute vector representing that action class. In particular, we assume that the distribution parameters for any action class in the visual space can be expressed as a linear combination of a set of basis vectors where the combination weights are given by the attributes of the action class. These basis vectors can be learned solely using labeled data from the known (i.e., previously seen) action classes, and can then be used to predict the parameters of the probability distributions of unseen action classes. We consider two settings: (1) Inductive setting, where we use only the labeled examples of the seen action classes to predict the unseen action class parameters; and (2) Transductive setting which further leverages unlabeled data from the unseen action classes. Our framework also naturally extends to few-shot action recognition where a few labeled examples from unseen classes are available. Our experiments on benchmark datasets (UCF101, HMDB51 and Olympic) show significant performance improvements as compared to various baselines, in both standard zero-shot (disjoint seen and unseen classes) and generalized zero-shot learning settings.
Submitted 27 January, 2018;
originally announced January 2018.
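The abstract above states that the distribution parameters of an action class are a linear combination of basis vectors, weighted by the class attributes. A minimal sketch of the prediction step follows. It assumes that the basis has already been learned from seen classes, that the only parameter modelled is a class mean, and that test samples are assigned to the nearest predicted mean (equivalent to equal-covariance Gaussians). The basis values, attribute vectors, and class names are made up for illustration.

```python
def class_mean(basis, attributes):
    """Predicted mean: mu_c = sum_k a_ck * b_k, with b_k the k-th basis vector."""
    dim = len(basis[0])
    return [sum(a * b[d] for a, b in zip(attributes, basis)) for d in range(dim)]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_class(x, means):
    """Assign x to the class whose predicted mean is closest."""
    return min(means, key=lambda name: squared_distance(x, means[name]))


# Hypothetical basis: one 2-D vector per attribute (3 attributes).
basis = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Hypothetical attribute vectors of two unseen action classes.
unseen_attributes = {"jump": [1.0, 0.0, 1.0], "run": [0.0, 1.0, 0.5]}

unseen_means = {name: class_mean(basis, a) for name, a in unseen_attributes.items()}
prediction = nearest_class([1.8, 1.1], unseen_means)
```

With these made-up values the predicted means are `[2.0, 1.0]` for `jump` and `[0.5, 1.5]` for `run`, so the test point `[1.8, 1.1]` is assigned to `jump`.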
-
Generalized Zero-Shot Learning via Synthesized Examples
Authors:
Vinay Kumar Verma,
Gundeep Arora,
Ashish Mishra,
Piyush Rai
Abstract:
We present a generative framework for generalized zero-shot learning where the training and test classes are not necessarily disjoint. Built upon a variational autoencoder based architecture, consisting of a probabilistic encoder and a probabilistic conditional decoder, our model can generate novel exemplars from seen/unseen classes, given their respective class attributes. These exemplars can subsequently be used to train any off-the-shelf classification model. One of the key aspects of our encoder-decoder architecture is a feedback-driven mechanism in which a discriminator (a multivariate regressor) learns to map the generated exemplars to the corresponding class attribute vectors, leading to an improved generator. Our model's ability to generate and leverage examples from unseen classes to train the classification model naturally helps to mitigate the bias towards predicting seen classes in generalized zero-shot learning settings. Through a comprehensive set of experiments, we show that our model outperforms several state-of-the-art methods, on several benchmark datasets, for both standard as well as generalized zero-shot learning.
Submitted 11 June, 2018; v1 submitted 11 December, 2017;
originally announced December 2017.
-
Zero-Shot Learning via Class-Conditioned Deep Generative Models
Authors:
Wenlin Wang,
Yunchen Pu,
Vinay Kumar Verma,
Kai Fan,
Yizhe Zhang,
Changyou Chen,
Piyush Rai,
Lawrence Carin
Abstract:
We present a deep generative model for learning to predict classes not seen at training time. Unlike most existing methods for this problem, that represent each class as a point (via a semantic embedding), we represent each seen/unseen class using a class-specific latent-space distribution, conditioned on class attributes. We use these latent-space distributions as a prior for a supervised variational autoencoder (VAE), which also facilitates learning highly discriminative feature representations for the inputs. The entire framework is learned end-to-end using only the seen-class training data. The model infers corresponding attributes of a test image by maximizing the VAE lower bound; the inferred attributes may be linked to labels not seen when training. We further extend our model to a (1) semi-supervised/transductive setting by leveraging unlabeled unseen-class data via an unsupervised learning module, and (2) few-shot learning where we also have a small number of labeled inputs from the unseen classes. We compare our model with several state-of-the-art methods through a comprehensive set of experiments on a variety of benchmark data sets.
Submitted 19 November, 2017; v1 submitted 15 November, 2017;
originally announced November 2017.
-
A Simple Exponential Family Framework for Zero-Shot Learning
Authors:
Vinay Kumar Verma,
Piyush Rai
Abstract:
We present a simple generative framework for learning to predict previously unseen classes, based on estimating class-attribute-gated class-conditional distributions. We model each class-conditional distribution as an exponential family distribution and the parameters of the distribution of each seen/unseen class are defined as functions of the respective observed class attributes. These functions can be learned using only the seen class data and can be used to predict the parameters of the class-conditional distribution of each unseen class. Unlike most existing methods for zero-shot learning that represent classes as fixed embeddings in some vector space, our generative model naturally represents each class as a probability distribution. It is simple to implement and also allows leveraging additional unlabeled data from unseen classes to improve the estimates of their class-conditional distributions using transductive/semi-supervised learning. Moreover, it extends seamlessly to few-shot learning by easily updating these distributions when provided with a small number of additional labelled examples from unseen classes. Through a comprehensive set of experiments on several benchmark data sets, we demonstrate the efficacy of our framework.
Submitted 25 January, 2018; v1 submitted 25 July, 2017;
originally announced July 2017.
-
Multi-wavelength view of an M2.2 Solar Flare on 26 November 2000
Authors:
R. Chandra,
V. K. Verma,
S. Rani,
R. A. Maurya
Abstract:
In this paper, we present a study of an M2.2 class solar flare of 26 November 2000 from NOAA AR 9236. The flare was well observed by various ground-based observatories (ARIES, Learmonth Solar Observatory) and space-borne instruments (SOHO, HXRS, GOES) in the time interval from 02:30 UT to 04:00 UT. The flare started with a long arc-shaped outer flare ribbon. Afterwards, the main flare started with two main ribbons. Initially the outer ribbon expanded with an average speed of $\sim$ 20 km s$^{-1}$, and later it showed contraction. The flare was associated with a partial halo coronal mass ejection (CME) with an average speed of 495 km s$^{-1}$. The SOHO/MDI observations show that the active region was in a quadrupolar magnetic configuration. Flux cancellation was observed close to the flare site before the flare onset. Our analysis indicates that the flare was initiated by the magnetic breakout mechanism.
Submitted 20 August, 2016;
originally announced August 2016.
-
Thickness-dependent magnetic properties and strain-induced orbital magnetic moment in SrRuO3 thin films
Authors:
K. Ishigami,
K. Yoshimatsu,
D. Toyota,
M. Takizawa,
T. Yoshida,
G. Shibata,
T. Harano,
Y. Takahashi,
T. Kadono,
V. K. Verma,
V. R. Singh,
Y. Takeda,
T. Okane,
Y. Saitoh,
H. Yamagami,
T. Koide,
M. Oshima,
H. Kumigashira,
A. Fujimori
Abstract:
Thin films of the ferromagnetic metal SrRuO3 (SRO) show a varying easy magnetization axis depending on the epitaxial strain and undergo a metal-to-insulator transition with decreasing film thickness. We have investigated the magnetic properties of SRO thin films with varying thicknesses fabricated on SrTiO3(001) substrates by soft x-ray magnetic circular dichroism (XMCD) at the Ru M2,3 edge. Results have shown that, with decreasing film thickness, the film changes from ferromagnetic to non-magnetic around 3 monolayer thickness, consistent with previous magnetization and magneto-optical Kerr effect measurements. The orbital magnetic moment perpendicular to the film was found to be ~0.1 μB/Ru atom, and remained nearly unchanged with decreasing film thickness while the spin magnetic moment decreased. The mechanism for the formation of the orbital magnetic moment is discussed based on the electronic structure of the compressively strained SRO film.
Submitted 4 July, 2015; v1 submitted 21 May, 2015;
originally announced May 2015.
-
Electronic and magnetic properties of off-stoichiometric Co$_\mathrm{2}$Mn$_β$Si/MgO interfaces studied by x-ray magnetic circular dichroism
Authors:
V. R. Singh,
V. K. Verma,
K. Ishigami,
G. Shibata,
A. Fujimori,
T. Koide,
Y. Miura,
M. Shirai,
T. Ishikawa,
G. f. Li,
M. Yamamoto
Abstract:
We have studied the electronic and magnetic states of Co and Mn atoms at the interface of the Co$_\mathrm{2}$Mn$_β$Si (CMS)/MgO ($β$=0.69, 0.99, 1.15 and 1.29) magnetic tunnel junction (MTJ) by means of x-ray magnetic circular dichroism (XMCD). In particular, the Mn composition ($β$) dependences of the Mn and Co magnetic moments were investigated. The experimental spin magnetic moments of Mn, $m_\mathrm{spin}$(Mn), derived from XMCD weakly decreased with increasing Mn composition $β$ in going from Mn-deficient to Mn-rich CMS films. This behavior was explained by first-principles calculations based on the antisite-based site-specific formula unit (SSFU) composition model, which assumes the formation of only antisite defects, not vacancies, to accommodate off-stoichiometry. Furthermore, the experimental spin magnetic moments of Co, $m_\mathrm{spin}$(Co), also weakly decreased with increasing Mn composition. This behavior was consistently explained by the antisite-based SSFU model, in particular, by the decrease in the concentration of Co$_\mathrm{Mn}$ antisites detrimental to the half-metallicity of CMS with increasing $β$. This finding is consistent with the higher TMR ratios which have been observed for CMS/MgO/CMS MTJs with Mn-rich CMS electrodes.
Submitted 9 April, 2015;
originally announced April 2015.
-
Thickness-dependent ferromagnetic metal to paramagnetic insulator transition in La$_{0.6}$Sr$_{0.4}$MnO$_3$ thin films studied by x-ray magnetic circular dichroism
Authors:
Goro Shibata,
Kohei Yoshimatsu,
Enju Sakai,
Vijay Raj Singh,
Virendra Kumar Verma,
Keisuke Ishigami,
Takayuki Harano,
Toshiharu Kadono,
Yukiharu Takeda,
Tetsuo Okane,
Yuji Saitoh,
Hiroshi Yamagami,
Akihito Sawa,
Hiroshi Kumigashira,
Masaharu Oshima,
Tsuneharu Koide,
Atsushi Fujimori
Abstract:
Metallic transition-metal oxides undergo a metal-to-insulator transition (MIT) as the film thickness decreases across a critical thickness of several monolayers (MLs), but its driving mechanism remains controversial. We have studied the thickness-dependent MIT of the ferromagnetic metal La$_{0.6}$Sr$_{0.4}$MnO$_3$ by x-ray absorption spectroscopy and x-ray magnetic circular dichroism. As the film thickness was decreased across the critical thickness of the MIT (6-8 ML), a gradual decrease of the ferromagnetic signals and a concomitant increase of paramagnetic signals were observed, while the Mn valence abruptly decreased towards Mn$^{3+}$. These observations suggest that the ferromagnetic phase gradually and most likely inhomogeneously turns into the paramagnetic phase and both phases abruptly become insulating at the critical thickness.
Submitted 24 June, 2014; v1 submitted 3 November, 2013;
originally announced November 2013.
-
Orbital magnetic moment and coercivity of SiO$_{2}$-coated FePt nanoparticles studied by x-ray magnetic circular dichroism
Authors:
Y. Takahashi,
T. Kadono,
V. R. Singh,
V. K. Verma,
K. Ishigami,
G. Shibata,
T. Harano,
A. Fujimori,
Y. Takeda,
T. Okane,
Y. Saitoh,
H. Yamagami,
S. Yamamoto,
M. Takano
Abstract:
We have investigated the spin and orbital magnetic moments of Fe in FePt nanoparticles in the $L$1$_{0}$-ordered phase coated with SiO$_{2}$ by x-ray absorption spectroscopy (XAS) and x-ray magnetic circular dichroism (XMCD) measurements at the Fe $L_{\rm 2,3}$ absorption edges. Using XMCD sum rules, we evaluated the ratio of the orbital magnetic moment ($M_{\rm orb}$) to the spin magnetic moment ($M_{\rm spin}$) of Fe to be $M_{\rm orb}/M_{\rm spin}$ = 0.08. This $M_{\rm orb}/M_{\rm spin}$ value is comparable to the value (0.09) obtained for FePt nanoparticles prepared by gas phase condensation, and is larger than the values ($\sim$0.05) obtained for FePt thin films, indicating a high degree of $L$1$_{0}$ order. The hysteretic behavior of the FePt component of the magnetization was measured by XMCD. The magnetic coercivity ($H_{\rm c}$) was found to be as large as 1.8 T at room temperature, $\sim$3 times larger than the thin film value and $\sim$50 times larger than that of the gas phase condensed nanoparticles. The hysteresis curve is well explained by the Stoner-Wohlfarth model for non-interacting single-domain nanoparticles with the $H_{\rm c}$ distributed from 1 T to 5 T.
Submitted 30 October, 2013;
originally announced October 2013.
-
Phase diagram of Ca$_{1-x}$Ce$_x$MnO$_3$ thin films studied by X-ray magnetic circular dichroism
Authors:
T. Harano,
G. Shibata,
K. Yoshimatsu,
K. Ishigami,
V. K. Verma,
Y. Takahashi,
T. Kadono,
T. Yoshida,
A. Fujimori,
T. Koide,
F. -H. Chang,
H. -J. Lin,
D. -J. Huang,
C. -T. Chen,
P. -H. Xiang,
H. Yamada,
A. Sawa
Abstract:
In the perovskite-type Ca$_{1-x}$Ce$_{x}$MnO$_{3}$ (CCMO), one can control the transport and magnetic properties by varying the Ce content. In the case of thin films, the properties can also be controlled by epitaxial strain from the substrate, for example by choosing among YAlO$_{3}$ (YAO), NdAlO$_{3}$ (NAO), and LaSrAlO$_{4}$ (LSAO) substrates. However, one cannot measure the magnetization of thin films on NAO substrates by conventional magnetization measurements because of the strong paramagnetic signals from the Nd$^{3+}$ ions. In order to eliminate the influence of Nd$^{3+}$ and to identify magnetic phases of the CCMO thin films, we have performed element-selective X-ray magnetic circular dichroism (XMCD) measurements of the Mn 2{\it p} core level. By studying the anisotropy of the XMCD intensity, we could unambiguously determine the magnetic phase diagram of the CCMO thin films.
Submitted 23 October, 2013;
originally announced October 2013.
-
Role of doped Ru in coercivity-enhanced La$_{0.6}$Sr$_{0.4}$MnO$_3$ thin film studied by x-ray magnetic circular dichroism
Authors:
T. Harano,
G. Shibata,
K. Ishigami,
Y. Takashashi,
V. K. Verma,
V. R. Singh,
T. Kadono,
A. Fujimori,
Y. Takeda,
T. Okane,
Y. Saitoh,
H. Yamagami,
T. Koide,
H. Yamada,
A. Sawa,
M. Kawasaki,
Y. Tokura,
A. Tanaka
Abstract:
The coercivity of La$_{1-x}$Sr$_x$MnO$_3$ thin films can be enhanced by Ru substitution for Mn. In order to elucidate its mechanism, we performed soft x-ray absorption and magnetic circular dichroism measurements at the Ru M$_{2,3}$ and Mn L$_{2,3}$ edges. We found that the spin directions of Ru and Mn are opposite and that Ru has a finite orbital magnetic moment. Cluster-model analysis indicated that the finite orbital magnetic moment as well as the reduced spin moment of Ru result from local lattice distortion caused by epitaxial strain from the SrTiO$_3$ substrate in the presence of spin-orbit interaction.
Submitted 10 September, 2013;
originally announced September 2013.
-
Enhanced ferromagnetic moment in Co-doped BiFeO3 thin films studied by soft X-ray circular dichroism
Authors:
V. R. Singh,
V. K. Verma,
K. Ishigami,
G. Shibata,
Y. Yamazaki,
A. Fujimori,
Y. Takeda,
T. Okane,
Y. Saitoh,
H. Yamagami,
Y. Nakamura,
M. Azuma,
Y. Shimakawa
Abstract:
BiFeO$_3$ (BFO) shows both ferroelectricity and magnetic ordering at room temperature but its ferromagnetic component, which is due to spin canting, is negligible. Substitution of transition-metal atoms such as Co for Fe is known to enhance the ferromagnetic component in BFO. In order to reveal the origin of such magnetization enhancement, we performed soft x-ray absorption spectroscopy (XAS) and soft x-ray magnetic circular dichroism (XMCD) studies of BiFe$_{1-x}$Co$_x$O$_3$ ({\it x} = 0 to 0.30) (BFCO) thin films grown on LaAlO$_3$(001) substrates. The XAS results indicated that the Fe and Co ions are in the Fe$^{3+}$ and Co$^{3+}$ states. The XMCD results showed that the Fe ions show ferromagnetism while the Co ions are antiferromagnetic at room temperature. The XAS and XMCD measurements also revealed that part of the Fe$^{3+}$ ions are tetrahedrally coordinated by oxygen ions but that the XMCD signals of the octahedrally coordinated Fe$^{3+}$ ions increase with Co content. The results suggest that an impurity phase such as the ferrimagnetic $γ$-Fe$_2$O$_3$ which exists at low Co concentration decreases with increasing Co concentration and that the ferromagnetic component of the Fe$^{3+}$ ion in the octahedral crystal fields increases with Co concentration, probably reflecting the increased canting of the Fe$^{3+}$ ions.
Submitted 25 August, 2013;
originally announced August 2013.
-
Observation of magnetically hard grain boundaries in double-perovskite Sr$_{2}$FeMoO$_{6}$
Authors:
Y. Takahashi,
V. K. Verma,
G. Shibata,
T. Harano,
K. Ishigami,
K. Yoshimatsu,
T. Kadono,
A. Fujimori,
A. Tanaka,
F. -H. Chang,
H. -J. Lin,
D. J. Huang,
C. T. Chen,
B. Pal,
D. D. Sarma
Abstract:
Unusual low temperature magneto-resistance (MR) of ferromagnetic Sr$_{2}$FeMoO$_{6}$ polycrystals has been attributed to magnetically hard grain boundaries which act as spin valves. We detected the different magnetic hysteresis curves for the grains and the grain boundaries of polycrystalline Sr$_{2}$FeMoO$_{6}$ by utilizing the different probing depths of the different detection modes of x-ray absorption spectroscopy (XAS) and x-ray magnetic circular dichroism (XMCD), namely, the total electron yield (TEY) mode (probing depth $\sim$5 nm) and the total fluorescence yield (TFY) mode (probing depth $\sim$100 nm). At 20 K, the magnetic coercivity detected in the TEY mode ($H_{\rm c,TEY}$) was several times larger than that in the TFY mode ($H_{\rm c,TFY}$), indicating harder ferromagnetism of the grain boundaries than that of the grains. At room temperature, the grain boundary magnetism became soft and $H_{\rm c,TEY}$ and $H_{\rm c,TFY}$ were nearly the same. From line-shape analysis of the XAS and XMCD spectra, we found that in the grain boundary region the ferromagnetic component is dominated by Fe$^{2+}$ or well-screened signals while the non-magnetic component is dominated by Fe$^{3+}$ or poorly-screened signals.
Submitted 20 May, 2013;
originally announced May 2013.
-
X-ray absorption spectroscopy and X-ray magnetic circular dichroism studies of transition-metal-co-doped ZnO nano-particles
Authors:
T. Kataoka,
Y. Yamazaki,
V. R. Singh,
Y. Sakamoto,
K. Ishigami,
V. K. Verma,
A. Fujimori,
F. -H. Chang,
H. -J. Lin,
D. J. Huang,
C. T. Chen,
D. Asakura,
T. Koide,
A. Tanaka,
D. Karmakar,
S. K. Mandal,
T. K. Nath,
I. Dagupta
Abstract:
We report on x-ray absorption spectroscopy (XAS) and x-ray magnetic circular dichroism (XMCD) studies of the paramagnetic (Mn,Co)-co-doped ZnO and ferromagnetic (Fe,Co)-co-doped ZnO nano-particles. Both the surface-sensitive total-electron-yield mode and the bulk-sensitive total-fluorescence-yield mode have been employed to extract the valence and spin states of the surface and inner core regions of the nano-particles. XAS spectra reveal that a significant part of the doped Mn and Co atoms are found in the trivalent and tetravalent states, in particular in the surface region, while the majority of Fe atoms are found in the trivalent state both in the inner core region and in the surface region. The XMCD spectra show that the Fe$^{3+}$ ions in the surface region give rise to the ferromagnetism while both the Co and Mn ions in the surface region show only paramagnetic behavior. The transition-metal atoms in the inner core region do not show magnetic signals, meaning that they are antiferromagnetically coupled. The present result, combined with the previous results on transition-metal-doped ZnO nano-particles and nano-wires, suggests that doped holes, probably due to Zn vacancy formation at the surfaces of the nano-particles and nano-wires, rather than doped electrons are involved in the occurrence of ferromagnetism in these systems.
Submitted 18 August, 2012; v1 submitted 15 August, 2012;
originally announced August 2012.
-
Ferromagnetism of cobalt-doped anatase TiO$_2$ studied by bulk- and surface-sensitive soft x-ray magnetic circular dichroism
Authors:
V. R. Singh,
K. Ishigami,
V. K. Verma,
G. Shibata,
Y. Yamazaki,
T. Kataoka,
A. Fujimori,
F. -H. Chang,
D. -J. Huang,
H. -J. Lin,
C. T. Chen,
Y. Yamada,
T. Fukumura,
M. Kawasaki
Abstract:
We have studied magnetism in anatase Ti$_{1-x}$Co$_x$O$_{2-δ}$ ({\it x} = 0.05) thin films with various electron carrier densities, by soft x-ray magnetic circular dichroism (XMCD) measurements at the Co $L_{2,3}$ absorption edges. For electrically conducting samples, the magnetic moment estimated by XMCD was $<$ 0.3 $μ_B$/Co using the surface-sensitive total electron yield (TEY) mode, while it was 0.3-2.4 $μ_B$/Co using the bulk-sensitive total fluorescence yield (TFY) mode. The latter value is in the same range as the saturation magnetization 0.6-2.1 $μ_B$/Co deduced by SQUID measurement. The magnetization and the XMCD intensity increased with carrier density, consistent with the carrier-induced origin of the ferromagnetism.
Submitted 1 June, 2012;
originally announced June 2012.
-
Discrimination of English to other Indian languages (Kannada and Hindi) for OCR system
Authors:
Ankit Kumar,
Tushar Patnaik,
Vivek Kr Verma
Abstract:
India is a multilingual, multi-script country. Every state of India uses two languages: the local state language and English. For example, in Andhra Pradesh, a state in India, a document may contain text words in English and Telugu script. For Optical Character Recognition (OCR) of such a bilingual document, it is necessary to identify the script before feeding the text words to the OCRs of the individual scripts. In this paper, we introduce a simple and efficient technique of script identification for Kannada, English and Hindi text words of a printed document. The proposed approach is based on the horizontal and vertical projection profiles for the discrimination of the three scripts. Feature extraction is based on the horizontal projection profile of each text word. We analysed 700 different words of Kannada, English and Hindi in order to extract the discriminating features and to develop the knowledge base. The proposed system is tested on 100 different document images containing more than 1000 text words of each script, and classification rates of 98.25%, 99.25% and 98.87% are achieved for Kannada, English and Hindi respectively.
Submitted 10 May, 2012;
originally announced May 2012.
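The projection profiles named in the abstract above can be illustrated with a minimal sketch. It assumes a binarized word image given as a list of rows with ink pixels equal to 1; the function names and the toy image are hypothetical, and the abstract does not specify which features the authors actually derive from the profiles.

```python
def horizontal_projection_profile(word_img):
    """Count ink pixels (value 1) in each row of a binary word image."""
    return [sum(row) for row in word_img]

def vertical_projection_profile(word_img):
    """Count ink pixels (value 1) in each column of a binary word image."""
    return [sum(col) for col in zip(*word_img)]

# Hypothetical 4x5 binary "word" used only for illustration.
toy_word = [
    [1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]

h_profile = horizontal_projection_profile(toy_word)
v_profile = vertical_projection_profile(toy_word)
```

A script classifier along these lines would compare features of such profiles (for example, where the row counts peak) across scripts; the decision rules themselves are not given in the abstract.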
-
Phase-field simulations of viscous fingering in shear-thinning fluids
Authors:
Sebastien Nguyen,
Roger Folch,
Vijay K. Verma,
Hervé Henry,
Mathis Plapp
Abstract:
A phase-field model for the Hele-Shaw flow of non-Newtonian fluids is developed. It extends a previous model for Newtonian fluids to a wide range of shear-dependent fluids. The model is applied to perform simulations of viscous fingering in shear-thinning fluids, and it is found to be capable of describing the complete crossover from the Newtonian regime at low shear rate to the strongly shear-thinning regime at high shear rate. The width selection of a single steady-state finger is studied in detail for a 2-plateaux shear-thinning law (Carreau law) in both its weakly and strongly shear-thinning limits, and the results are related to previous analyses. In the strongly shear-thinning regime a rescaling is found for power-law (Ostwald-de-Waehle) fluids that allows for a direct comparison between simulations and experiments without any adjustable parameters, and good agreement is obtained.
Submitted 1 December, 2009;
originally announced December 2009.
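For reference, a commonly used textbook form of the Carreau law mentioned in the abstract above is sketched below. The symbols ($\eta_0$, $\eta_\infty$, $\lambda$, $n$) follow the standard convention and are an assumption here, since the abstract does not give the paper's own parametrization.

```latex
\eta(\dot\gamma) = \eta_\infty + (\eta_0 - \eta_\infty)
  \left[ 1 + (\lambda \dot\gamma)^2 \right]^{(n-1)/2}
```

In this form the two plateaux are $\eta_0$ at low shear rate $\dot\gamma$ and $\eta_\infty$ at high shear rate, and for $n < 1$ the viscosity decreases between them (shear thinning).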