-
GAN-MPC: Training Model Predictive Controllers with Parameterized Cost Functions using Demonstrations from Non-identical Experts
Authors:
Returaj Burnwal,
Anirban Santara,
Nirav P. Bhatt,
Balaraman Ravindran,
Gaurav Aggarwal
Abstract:
Model predictive control (MPC) is a popular approach for trajectory optimization in practical robotics applications. MPC policies can optimize trajectory parameters under kinodynamic and safety constraints and provide guarantees on safety, optimality, generalizability, interpretability, and explainability. However, some behaviors are complex and it is difficult to hand-craft an MPC objective funct…
▽ More
Model predictive control (MPC) is a popular approach for trajectory optimization in practical robotics applications. MPC policies can optimize trajectory parameters under kinodynamic and safety constraints and provide guarantees on safety, optimality, generalizability, interpretability, and explainability. However, some behaviors are complex and it is difficult to hand-craft an MPC objective function. A special class of MPC policies called Learnable-MPC addresses this difficulty using imitation learning from expert demonstrations. However, they require the demonstrator and the imitator agents to be identical which is hard to satisfy in many real world applications of robotics. In this paper, we address the practical problem of training Learnable-MPC policies when the demonstrator and the imitator do not share the same dynamics and their state spaces may have a partial overlap. We propose a novel approach that uses a generative adversarial network (GAN) to minimize the Jensen-Shannon divergence between the state-trajectory distributions of the demonstrator and the imitator. We evaluate our approach on a variety of simulated robotics tasks of DeepMind Control suite and demonstrate the efficacy of our approach at learning the demonstrator's behavior without having to copy their actions.
△ Less
Submitted 7 June, 2023; v1 submitted 30 May, 2023;
originally announced May 2023.
-
A Contextual Bandit Approach for Learning to Plan in Environments with Probabilistic Goal Configurations
Authors:
Sohan Rudra,
Saksham Goel,
Anirban Santara,
Claudio Gentile,
Laurent Perron,
Fei Xia,
Vikas Sindhwani,
Carolina Parada,
Gaurav Aggarwal
Abstract:
Object-goal navigation (Object-nav) entails searching, recognizing and navigating to a target object. Object-nav has been extensively studied by the Embodied-AI community, but most solutions are often restricted to considering static objects (e.g., television, fridge, etc.). We propose a modular framework for object-nav that is able to efficiently search indoor environments for not just static obj…
▽ More
Object-goal navigation (Object-nav) entails searching, recognizing and navigating to a target object. Object-nav has been extensively studied by the Embodied-AI community, but most solutions are often restricted to considering static objects (e.g., television, fridge, etc.). We propose a modular framework for object-nav that is able to efficiently search indoor environments for not just static objects but also movable objects (e.g. fruits, glasses, phones, etc.) that frequently change their positions due to human intervention. Our contextual-bandit agent efficiently explores the environment by showing optimism in the face of uncertainty and learns a model of the likelihood of spotting different objects from each navigable location. The likelihoods are used as rewards in a weighted minimum latency solver to deduce a trajectory for the robot. We evaluate our algorithms in two simulated environments and a real-world setting, to demonstrate high sample efficiency and reliability.
△ Less
Submitted 29 November, 2022;
originally announced November 2022.
-
On Learning to Rank Long Sequences with Contextual Bandits
Authors:
Anirban Santara,
Claudio Gentile,
Gaurav Aggarwal,
Shuai Li
Abstract:
Motivated by problems of learning to rank long item sequences, we introduce a variant of the cascading bandit model that considers flexible length sequences with varying rewards and losses. We formulate two generative models for this problem within the generalized linear setting, and design and analyze upper confidence algorithms for it. Our analysis delivers tight regret bounds which, when specia…
▽ More
Motivated by problems of learning to rank long item sequences, we introduce a variant of the cascading bandit model that considers flexible length sequences with varying rewards and losses. We formulate two generative models for this problem within the generalized linear setting, and design and analyze upper confidence algorithms for it. Our analysis delivers tight regret bounds which, when specialized to vanilla cascading bandits, results in sharper guarantees than previously available in the literature. We evaluate our algorithms on a number of real-world datasets, and show significantly improved empirical performance as compared to known cascading bandit baselines.
△ Less
Submitted 7 June, 2021;
originally announced June 2021.
-
Unlocking Pixels for Reinforcement Learning via Implicit Attention
Authors:
Krzysztof Marcin Choromanski,
Deepali Jain,
Wenhao Yu,
Xingyou Song,
Jack Parker-Holder,
Tingnan Zhang,
Valerii Likhosherstov,
Aldo Pacchiano,
Anirban Santara,
Yunhao Tang,
Jie Tan,
Adrian Weller
Abstract:
There has recently been significant interest in training reinforcement learning (RL) agents in vision-based environments. This poses many challenges, such as high dimensionality and the potential for observational overfitting through spurious correlations. A promising approach to solve both of these problems is an attention bottleneck, which provides a simple and effective framework for learning h…
▽ More
There has recently been significant interest in training reinforcement learning (RL) agents in vision-based environments. This poses many challenges, such as high dimensionality and the potential for observational overfitting through spurious correlations. A promising approach to solve both of these problems is an attention bottleneck, which provides a simple and effective framework for learning high performing policies, even in the presence of distractions. However, due to poor scalability of attention architectures, these methods cannot be applied beyond low resolution visual inputs, using large patches (thus small attention matrices). In this paper we make use of new efficient attention algorithms, recently shown to be highly effective for Transformers, and demonstrate that these techniques can be successfully adopted for the RL setting. This allows our attention-based controllers to scale to larger visual inputs, and facilitate the use of smaller patches, even individual pixels, improving generalization. We show this on a range of tasks from the Distracting Control Suite to vision-based quadruped robots locomotion. We provide rigorous theoretical analysis of the proposed algorithm.
△ Less
Submitted 1 October, 2021; v1 submitted 8 February, 2021;
originally announced February 2021.
-
MADRaS : Multi Agent Driving Simulator
Authors:
Anirban Santara,
Sohan Rudra,
Sree Aditya Buridi,
Meha Kaushik,
Abhishek Naik,
Bharat Kaul,
Balaraman Ravindran
Abstract:
In this work, we present MADRaS, an open-source multi-agent driving simulator for use in the design and evaluation of motion planning algorithms for autonomous driving. MADRaS provides a platform for constructing a wide variety of highway and track driving scenarios where multiple driving agents can train for motion planning tasks using reinforcement learning and other machine learning algorithms.…
▽ More
In this work, we present MADRaS, an open-source multi-agent driving simulator for use in the design and evaluation of motion planning algorithms for autonomous driving. MADRaS provides a platform for constructing a wide variety of highway and track driving scenarios where multiple driving agents can train for motion planning tasks using reinforcement learning and other machine learning algorithms. MADRaS is built on TORCS, an open-source car-racing simulator. TORCS offers a variety of cars with different dynamic properties and driving tracks with different geometries and surface properties. MADRaS inherits these functionalities from TORCS and introduces support for multi-agent training, inter-vehicular communication, noisy observations, stochastic actions, and custom traffic cars whose behaviours can be programmed to simulate challenging traffic conditions encountered in the real world. MADRaS can be used to create driving tasks whose complexities can be tuned along eight axes in well-defined steps. This makes it particularly suited for curriculum and continual learning. MADRaS is lightweight and it provides a convenient OpenAI Gym interface for independent control of each car. Apart from the primitive steering-acceleration-brake control mode of TORCS, MADRaS offers a hierarchical track-position -- speed control that can potentially be used to achieve better generalization. MADRaS uses multiprocessing to run each agent as a parallel process for efficiency and integrates well with popular reinforcement learning libraries like RLLib.
△ Less
Submitted 2 October, 2020;
originally announced October 2020.
-
ExTra: Transfer-guided Exploration
Authors:
Anirban Santara,
Rishabh Madan,
Balaraman Ravindran,
Pabitra Mitra
Abstract:
In this work we present a novel approach for transfer-guided exploration in reinforcement learning that is inspired by the human tendency to leverage experiences from similar encounters in the past while navigating a new task. Given an optimal policy in a related task-environment, we show that its bisimulation distance from the current task-environment gives a lower bound on the optimal advantage…
▽ More
In this work we present a novel approach for transfer-guided exploration in reinforcement learning that is inspired by the human tendency to leverage experiences from similar encounters in the past while navigating a new task. Given an optimal policy in a related task-environment, we show that its bisimulation distance from the current task-environment gives a lower bound on the optimal advantage of state-action pairs in the current task-environment. Transfer-guided Exploration (ExTra) samples actions from a Softmax distribution over these lower bounds. In this way, actions with potentially higher optimum advantage are sampled more frequently. In our experiments on gridworld environments, we demonstrate that given access to an optimal policy in a related task-environment, ExTra can outperform popular domain-specific exploration strategies viz. epsilon greedy, Model-Based Interval Estimation - Exploration Bonus (MBIE-EB), Pursuit and Boltzmann in rate of convergence. We further show that ExTra is robust to choices of source task and shows a graceful degradation of performance as the dissimilarity of the source task increases. We also demonstrate that ExTra, when used alongside traditional exploration algorithms, improves their rate of convergence. Thus it is capable of complementing the efficacy of traditional exploration algorithms.
△ Less
Submitted 27 May, 2020; v1 submitted 27 June, 2019;
originally announced June 2019.
-
PUNCH: Positive UNlabelled Classification based information retrieval in Hyperspectral images
Authors:
Anirban Santara,
Jayeeta Datta,
Sourav Sarkar,
Ankur Garg,
Kirti Padia,
Pabitra Mitra
Abstract:
Hyperspectral images of land-cover captured by airborne or satellite-mounted sensors provide a rich source of information about the chemical composition of the materials present in a given place. This makes hyperspectral imaging an important tool for earth sciences, land-cover studies, and military and strategic applications. However, the scarcity of labeled training examples and spatial variabili…
▽ More
Hyperspectral images of land-cover captured by airborne or satellite-mounted sensors provide a rich source of information about the chemical composition of the materials present in a given place. This makes hyperspectral imaging an important tool for earth sciences, land-cover studies, and military and strategic applications. However, the scarcity of labeled training examples and spatial variability of spectral signature are two of the biggest challenges faced by hyperspectral image classification. In order to address these issues, we aim to develop a framework for material-agnostic information retrieval in hyperspectral images based on Positive-Unlabelled (PU) classification. Given a hyperspectral scene, the user labels some positive samples of a material he/she is looking for and our goal is to retrieve all the remaining instances of the query material in the scene. Additionally, we require the system to work equally well for any material in any scene without the user having to disclose the identity of the query material. This material-agnostic nature of the framework provides it with superior generalization abilities. We explore two alternative approaches to solve the hyperspectral image classification problem within this framework. The first approach is an adaptation of non-negative risk estimation based PU learning for hyperspectral data. The second approach is based on one-versus-all positive-negative classification where the negative class is approximately sampled using a novel spectral-spatial retrieval model. We propose two annotator models - uniform and blob - that represent the labelling patterns of a human annotator. We compare the performances of the proposed algorithms for each annotator model on three benchmark hyperspectral image datasets - Indian Pines, Pavia University and Salinas.
△ Less
Submitted 9 April, 2019;
originally announced April 2019.
-
RAIL: Risk-Averse Imitation Learning
Authors:
Anirban Santara,
Abhishek Naik,
Balaraman Ravindran,
Dipankar Das,
Dheevatsa Mudigere,
Sasikanth Avancha,
Bharat Kaul
Abstract:
Imitation learning algorithms learn viable policies by imitating an expert's behavior when reward signals are not available. Generative Adversarial Imitation Learning (GAIL) is a state-of-the-art algorithm for learning policies when the expert's behavior is available as a fixed set of trajectories. We evaluate in terms of the expert's cost function and observe that the distribution of trajectory-c…
▽ More
Imitation learning algorithms learn viable policies by imitating an expert's behavior when reward signals are not available. Generative Adversarial Imitation Learning (GAIL) is a state-of-the-art algorithm for learning policies when the expert's behavior is available as a fixed set of trajectories. We evaluate in terms of the expert's cost function and observe that the distribution of trajectory-costs is often more heavy-tailed for GAIL-agents than the expert at a number of benchmark continuous-control tasks. Thus, high-cost trajectories, corresponding to tail-end events of catastrophic failure, are more likely to be encountered by the GAIL-agents than the expert. This makes the reliability of GAIL-agents questionable when it comes to deployment in risk-sensitive applications like robotic surgery and autonomous driving. In this work, we aim to minimize the occurrence of tail-end events by minimizing tail risk within the GAIL framework. We quantify tail risk by the Conditional-Value-at-Risk (CVaR) of trajectories and develop the Risk-Averse Imitation Learning (RAIL) algorithm. We observe that the policies learned with RAIL show lower tail-end risk than those of vanilla GAIL. Thus the proposed RAIL algorithm appears as a potent alternative to GAIL for improved reliability in risk-sensitive applications.
△ Less
Submitted 29 November, 2017; v1 submitted 20 July, 2017;
originally announced July 2017.
-
BASS Net: Band-Adaptive Spectral-Spatial Feature Learning Neural Network for Hyperspectral Image Classification
Authors:
Anirban Santara,
Kaustubh Mani,
Pranoot Hatwar,
Ankit Singh,
Ankur Garg,
Kirti Padia,
Pabitra Mitra
Abstract:
Deep learning based landcover classification algorithms have recently been proposed in literature. In hyperspectral images (HSI) they face the challenges of large dimensionality, spatial variability of spectral signatures and scarcity of labeled data. In this article we propose an end-to-end deep learning architecture that extracts band specific spectral-spatial features and performs landcover cla…
▽ More
Deep learning based landcover classification algorithms have recently been proposed in literature. In hyperspectral images (HSI) they face the challenges of large dimensionality, spatial variability of spectral signatures and scarcity of labeled data. In this article we propose an end-to-end deep learning architecture that extracts band specific spectral-spatial features and performs landcover classification. The architecture has fewer independent connection weights and thus requires lesser number of training data. The method is found to outperform the highest reported accuracies on popular hyperspectral image data sets.
△ Less
Submitted 2 December, 2016; v1 submitted 1 December, 2016;
originally announced December 2016.
-
WEPSAM: Weakly Pre-Learnt Saliency Model
Authors:
Avisek Lahiri,
Sourya Roy,
Anirban Santara,
Pabitra Mitra,
Prabir Kumar Biswas
Abstract:
Visual saliency detection tries to mimic human vision psychology which concentrates on sparse, important areas in natural image. Saliency prediction research has been traditionally based on low level features such as contrast, edge, etc. Recent thrust in saliency prediction research is to learn high level semantics using ground truth eye fixation datasets. In this paper we present, WEPSAM : Weakly…
▽ More
Visual saliency detection tries to mimic human vision psychology which concentrates on sparse, important areas in natural image. Saliency prediction research has been traditionally based on low level features such as contrast, edge, etc. Recent thrust in saliency prediction research is to learn high level semantics using ground truth eye fixation datasets. In this paper we present, WEPSAM : Weakly Pre-Learnt Saliency Model as a pioneering effort of using domain specific pre-learing on ImageNet for saliency prediction using a light weight CNN architecture. The paper proposes a two step hierarchical learning, in which the first step is to develop a framework for weakly pre-training on a large scale dataset such as ImageNet which is void of human eye fixation maps. The second step refines the pre-trained model on a limited set of ground truth fixations. Analysis of loss on iSUN and SALICON datasets reveal that pre-trained network converges much faster compared to randomly initialized network. WEPSAM also outperforms some recent state-of-the-art saliency prediction models on the challenging MIT300 dataset.
△ Less
Submitted 3 May, 2016;
originally announced May 2016.
-
Visualization Regularizers for Neural Network based Image Recognition
Authors:
Biswajit Paria,
Vikas Reddy,
Anirban Santara,
Pabitra Mitra
Abstract:
The success of deep neural networks is mostly due their ability to learn meaningful features from the data. Features learned in the hidden layers of deep neural networks trained in computer vision tasks have been shown to be similar to mid-level vision features. We leverage this fact in this work and propose the visualization regularizer for image tasks. The proposed regularization technique enfor…
▽ More
The success of deep neural networks is mostly due their ability to learn meaningful features from the data. Features learned in the hidden layers of deep neural networks trained in computer vision tasks have been shown to be similar to mid-level vision features. We leverage this fact in this work and propose the visualization regularizer for image tasks. The proposed regularization technique enforces smoothness of the features learned by hidden nodes and turns out to be a special case of Tikhonov regularization. We achieve higher classification accuracy as compared to existing regularizers such as the L2 norm regularizer and dropout, on benchmark datasets without changing the training computational complexity.
△ Less
Submitted 3 January, 2017; v1 submitted 10 April, 2016;
originally announced April 2016.
-
Ensemble of Deep Convolutional Neural Networks for Learning to Detect Retinal Vessels in Fundus Images
Authors:
Debapriya Maji,
Anirban Santara,
Pabitra Mitra,
Debdoot Sheet
Abstract:
Vision impairment due to pathological damage of the retina can largely be prevented through periodic screening using fundus color imaging. However the challenge with large scale screening is the inability to exhaustively detect fine blood vessels crucial to disease diagnosis. In this work we present a computational imaging framework using deep and ensemble learning for reliable detection of blood…
▽ More
Vision impairment due to pathological damage of the retina can largely be prevented through periodic screening using fundus color imaging. However the challenge with large scale screening is the inability to exhaustively detect fine blood vessels crucial to disease diagnosis. In this work we present a computational imaging framework using deep and ensemble learning for reliable detection of blood vessels in fundus color images. An ensemble of deep convolutional neural networks is trained to segment vessel and non-vessel areas of a color fundus image. During inference, the responses of the individual ConvNets of the ensemble are averaged to form the final segmentation. In experimental evaluation with the DRIVE database, we achieve the objective of vessel detection with maximum average accuracy of 94.7\% and area under ROC curve of 0.9283.
△ Less
Submitted 15 March, 2016;
originally announced March 2016.
-
Faster learning of deep stacked autoencoders on multi-core systems using synchronized layer-wise pre-training
Authors:
Anirban Santara,
Debapriya Maji,
DP Tejas,
Pabitra Mitra,
Arobinda Gupta
Abstract:
Deep neural networks are capable of modelling highly non-linear functions by capturing different levels of abstraction of data hierarchically. While training deep networks, first the system is initialized near a good optimum by greedy layer-wise unsupervised pre-training. However, with burgeoning data and increasing dimensions of the architecture, the time complexity of this approach becomes enorm…
▽ More
Deep neural networks are capable of modelling highly non-linear functions by capturing different levels of abstraction of data hierarchically. While training deep networks, first the system is initialized near a good optimum by greedy layer-wise unsupervised pre-training. However, with burgeoning data and increasing dimensions of the architecture, the time complexity of this approach becomes enormous. Also, greedy pre-training of the layers often turns detrimental by over-training a layer causing it to lose harmony with the rest of the network. In this paper a synchronized parallel algorithm for pre-training deep networks on multi-core machines has been proposed. Different layers are trained by parallel threads running on different cores with regular synchronization. Thus the pre-training process becomes faster and chances of over-training are reduced. This is experimentally validated using a stacked autoencoder for dimensionality reduction of MNIST handwritten digit database. The proposed algorithm achieved 26\% speed-up compared to greedy layer-wise pre-training for achieving the same reconstruction accuracy substantiating its potential as an alternative.
△ Less
Submitted 9 March, 2016;
originally announced March 2016.