Search | arXiv e-print repository

Unified Auto-Encoding with Masked Diffusion

Authors: Philippe Hansen-Estruch, Sriram Vishwanath, Amy Zhang, Manan Tomar

Abstract: At the core of both successful generative and self-supervised representation learning models there is a reconstruction objective that incorporates some form of image corruption. Diffusion models implement this approach through a scheduled Gaussian corruption process, while masked auto-encoder models do so by masking patches of the image. Despite their different approaches, the underlying similarity in their methodologies suggests a promising avenue for an auto-encoder capable of both de-noising tasks. We propose a unified self-supervised objective, dubbed Unified Masked Diffusion (UMD), that combines patch-based and noise-based corruption techniques within a single auto-encoding framework. Specifically, UMD modifies the diffusion transformer (DiT) training process by introducing an additional noise-free, high masking representation step in the diffusion noising schedule, and utilizes a mixed masked and noised image for subsequent timesteps. By integrating features useful for diffusion modeling and for predicting masked patch tokens, UMD achieves strong performance in downstream generative and representation learning tasks, including linear probing and class-conditional generation. This is achieved without the need for heavy data augmentations, multiple views, or additional encoders. Furthermore, UMD improves over the computational efficiency of prior diffusion based methods in total training time. We release our code at https://github.com/philippe-eecs/small-vision.

Submitted 25 June, 2024; originally announced June 2024.

Comments: 19 Pages, 8 Figures, 3 Tables

ACM Class: I.2.10

arXiv:2405.11181 [pdf, other]

Towards Knowledge-Infused Automated Disease Diagnosis Assistant

Authors: Mohit Tomar, Abhisek Tiwari, Sriparna Saha

Abstract: With the advancement of internet communication and telemedicine, people are increasingly turning to the web for various healthcare activities. With an ever-increasing number of diseases and symptoms, diagnosing patients becomes challenging. In this work, we build a diagnosis assistant to assist doctors, which identifies diseases based on patient-doctor interaction. During diagnosis, doctors utilize both symptomatology knowledge and diagnostic experience to identify diseases accurately and efficiently. Inspired by this, we investigate the role of medical knowledge in disease diagnosis through doctor-patient interaction. We propose a two-channel, knowledge-infused, discourse-aware disease diagnosis model (KI-DDI), where the first channel encodes patient-doctor communication using a transformer-based encoder, while the other creates an embedding of symptom-disease using a graph attention network (GAT). In the next stage, the conversation and knowledge graph embeddings are infused together and fed to a deep neural network for disease identification. Furthermore, we first develop an empathetic conversational medical corpus comprising conversations between patients and doctors, annotated with intent and symptoms information. The proposed model demonstrates a significant improvement over the existing state-of-the-art models, establishing the crucial roles of (a) a doctor's effort for additional symptom extraction (in addition to patient self-report) and (b) infusing medical knowledge in identifying diseases effectively.
Many times, patients also show their medical conditions, which acts as crucial evidence in diagnosis. Therefore, integrating visual sensory information would represent an effective avenue for enhancing the capabilities of diagnostic assistants.

Submitted 18 May, 2024; originally announced May 2024.

arXiv:2405.09999 [pdf, other]

Reward Centering

Authors: Abhishek Naik, Yi Wan, Manan Tomar, Richard S. Sutton

Abstract: We show that discounted methods for solving continuing reinforcement learning problems can perform significantly better if they center their rewards by subtracting out the rewards' empirical average. The improvement is substantial at commonly used discount factors and increases further as the discount factor approaches one. In addition, we show that if a problem's rewards are shifted by a constant, then standard methods perform much worse, whereas methods with reward centering are unaffected. Estimating the average reward is straightforward in the on-policy setting; we propose a slightly more sophisticated method for the off-policy setting. Reward centering is a general idea, so we expect almost every reinforcement-learning algorithm to benefit by the addition of reward centering.

Submitted 16 May, 2024; originally announced May 2024.

Comments: In Proceedings of RLC 2024
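
The centering idea in the abstract above (subtracting the rewards' empirical average) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `center_rewards` and the choice of an incremental sample mean as the average-reward estimate are assumptions made here for illustration, and the off-policy estimator mentioned in the abstract is not covered.

```python
def center_rewards(rewards):
    """Subtract a running sample mean from each reward (illustrative sketch).

    Assumption: the average reward is estimated by the incremental sample
    mean of all rewards seen so far, including the current one.
    """
    avg = 0.0
    centered = []
    for n, r in enumerate(rewards, start=1):
        # Incremental sample-mean update: avg_n = avg_{n-1} + (r_n - avg_{n-1}) / n
        avg += (r - avg) / n
        centered.append(r - avg)
    return centered


# Shifting every reward by a constant leaves the centered rewards unchanged,
# which mirrors the shift-invariance the abstract describes.
original = center_rewards([1.0, 2.0, 3.0])
shifted = center_rewards([101.0, 102.0, 103.0])
```

In a learning algorithm, the centered value would take the place of the raw reward in the update target; how exactly that is done in the paper is not specified in this listing.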

arXiv:2401.06807 [pdf, other]

An EcoSage Assistant: Towards Building A Multimodal Plant Care Dialogue Assistant

Authors: Mohit Tomar, Abhisek Tiwari, Tulika Saha, Prince Jha, Sriparna Saha

Abstract: In recent times, there has been an increasing awareness about imminent environmental challenges, resulting in people showing a stronger dedication to taking care of the environment and nurturing green life. The current $19.6 billion indoor gardening industry, reflective of this growing sentiment, not only signifies a monetary value but also speaks of a profound human desire to reconnect with the natural world. However, several recent surveys cast a revealing light on the fate of plants within our care, with more than half succumbing primarily due to the silent menace of improper care. Thus, the need for accessible expertise capable of assisting and guiding individuals through the intricacies of plant care has become more pressing than ever. In this work, we make the very first attempt at building a plant care assistant, which aims to assist people with plant(-ing) concerns through conversations. We propose a plant care conversational dataset named Plantational, which contains around 1K dialogues between users and plant care experts. Our end-to-end proposed approach is two-fold: (i) We first benchmark the dataset with the help of various large language models (LLMs) and visual language model (VLM) by studying the impact of instruction tuning (zero-shot and few-shot prompting) and fine-tuning techniques on this task; (ii) finally, we build EcoSage, a multi-modal plant care assisting dialogue generation framework, incorporating an adapter-based modality infusion using a gated mechanism.
We performed an extensive examination (both automated and manual evaluation) of the performance exhibited by various LLMs and VLM in the generation of the domain-specific dialogue responses to underscore the respective strengths and weaknesses of these diverse models.

Submitted 10 January, 2024; originally announced January 2024.

arXiv:2309.13041 [pdf, other]

Robotic Offline RL from Internet Videos via Value-Function Pre-Training

Authors: Chethan Bhateja, Derek Guo, Dibya Ghosh, Anikait Singh, Manan Tomar, Quan Vuong, Yevgen Chebotar, Sergey Levine, Aviral Kumar

Abstract: Pre-training on Internet data has proven to be a key ingredient for broad generalization in many modern ML systems. What would it take to enable such capabilities in robotic reinforcement learning (RL)? Offline RL methods, which learn from datasets of robot experience, offer one way to leverage prior data into the robotic learning pipeline. However, these methods have a "type mismatch" with video data (such as Ego4D), the largest prior datasets available for robotics, since video offers observation-only experience without the action or reward annotations needed for RL methods. In this paper, we develop a system for leveraging large-scale human video datasets in robotic offline RL, based entirely on learning value functions via temporal-difference learning. We show that value learning on video datasets learns representations that are more conducive to downstream robotic offline RL than other approaches for learning from video data. Our system, called V-PTR, combines the benefits of pre-training on video data with robotic offline RL approaches that train on diverse robot data, resulting in value functions and policies for manipulation tasks that perform better, act robustly, and generalize broadly. On several manipulation tasks on a real WidowX robot, our framework produces policies that greatly improve over prior methods. Our video and additional details can be found at https://dibyaghosh.com/vptr/

Submitted 22 September, 2023; originally announced September 2023.

Comments: First three authors contributed equally

arXiv:2303.06121 [pdf, other]

Ignorance is Bliss: Robust Control via Information Gating

Authors: Manan Tomar, Riashat Islam, Matthew E. Taylor, Sergey Levine, Philip Bachman

Abstract: Informational parsimony provides a useful inductive bias for learning representations that achieve better generalization by being robust to noise and spurious correlations. We propose \textit{information gating} as a way to learn parsimonious representations that identify the minimal information required for a task. When gating information, we can learn to reveal as little information as possible so that a task remains solvable, or hide as little information as possible so that a task becomes unsolvable. We gate information using a differentiable parameterization of the signal-to-noise ratio, which can be applied to arbitrary values in a network, e.g., erasing pixels at the input layer or activations in some intermediate layer. When gating at the input layer, our models learn which visual cues matter for a given task. When gating intermediate layers, our models learn which activations are needed for subsequent stages of computation. We call our approach \textit{InfoGating}. We apply InfoGating to various objectives such as multi-step forward and inverse dynamics models, Q-learning, and behavior cloning, highlighting how InfoGating can naturally help in discarding information not relevant for control. Results show that learning to identify and use minimal information can improve generalization in downstream tasks. Policies based on InfoGating are considerably more robust to irrelevant visual features, leading to improved pretraining and finetuning of RL models.

Submitted 8 December, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

Comments: NeurIPS 2023

arXiv:2212.13835 [pdf, other]

Representation Learning in Deep RL via Discrete Information Bottleneck

Authors: Riashat Islam, Hongyu Zang, Manan Tomar, Aniket Didolkar, Md Mofijul Islam, Samin Yeasar Arnob, Tariq Iqbal, Xin Li, Anirudh Goyal, Nicolas Heess, Alex Lamb

Abstract: Several self-supervised representation learning methods have been proposed for reinforcement learning (RL) with rich observations. For real-world applications of RL, recovering underlying latent states is crucial, particularly when sensory inputs contain irrelevant and exogenous information. In this work, we study how information bottlenecks can be used to construct latent states efficiently in the presence of task-irrelevant information. We propose architectures that utilize variational and discrete information bottlenecks, coined as RepDIB, to learn structured factorized representations. Exploiting the expressiveness bought by factorized representations, we introduce a simple, yet effective, bottleneck that can be integrated with any existing self-supervised objective for RL. We demonstrate this across several online and offline RL benchmarks, along with a real robot arm task, where we find that compressed representations with RepDIB can lead to strong performance improvements, as the learned bottlenecks help predict only the relevant state while ignoring irrelevant information.

Submitted 30 May, 2023; v1 submitted 28 December, 2022; originally announced December 2022.

Comments: AISTATS 2023

arXiv:2211.00164 [pdf, other]

Agent-Controller Representations: Principled Offline RL with Rich Exogenous Information

Authors: Riashat Islam, Manan Tomar, Alex Lamb, Yonathan Efroni, Hongyu Zang, Aniket Didolkar, Dipendra Misra, Xin Li, Harm van Seijen, Remi Tachet des Combes, John Langford

Abstract: Learning to control an agent from data collected offline in a rich pixel-based visual observation space is vital for real-world applications of reinforcement learning (RL). A major challenge in this setting is the presence of input information that is hard to model and irrelevant to controlling the agent. This problem has been approached by the theoretical RL community through the lens of exogenous information, i.e., any control-irrelevant information contained in observations. For example, a robot navigating in busy streets needs to ignore irrelevant information, such as other people walking in the background, textures of objects, or birds in the sky. In this paper, we focus on the setting with visually detailed exogenous information, and introduce new offline RL benchmarks offering the ability to study this problem. We find that contemporary representation learning techniques can fail on datasets where the noise is a complex and time dependent process, which is prevalent in practical applications. To address these, we propose to use multi-step inverse models, which have seen a great deal of interest in the RL theory community, to learn Agent-Controller Representations for Offline-RL (ACRO). Despite being simple and requiring no reward, we show theoretically and empirically that the representation created by this objective greatly outperforms baselines.

Submitted 13 August, 2023; v1 submitted 31 October, 2022; originally announced November 2022.

Comments: ICML 2023

arXiv:2203.00767 [pdf, ps, other]

On a notion of entropy for reachability properties

Authors: Mahendra Singh Tomar, Majid Zamani

Abstract: In this work, we introduce a notion of reachability entropy to characterize the smallest data rate which is sufficient to enforce a reach-while-stay specification. We also define data rates of coder-controllers that can enforce this specification in finite time. Then, we establish the data-rate theorem which states that the reachability entropy is a tight lower bound of the data rates that allow satisfaction of the reach-while-stay specification. For a system which is related to another system under a feedback refinement relation, we show that the entropy of the former will not be larger than that of the latter. We also provide a procedure to numerically compute an upper bound of the reachability entropy for discrete-time control systems by leveraging their finite abstractions. Finally, we present some examples to demonstrate the effectiveness of the proposed results.

Submitted 20 June, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

arXiv:2111.07775 [pdf, other]

Learning Representations for Pixel-based Control: What Matters and Why?

Authors: Manan Tomar, Utkarsh A. Mishra, Amy Zhang, Matthew E. Taylor

Abstract: Learning representations for pixel-based control has garnered significant attention recently in reinforcement learning. A wide range of methods have been proposed to enable efficient learning, leading to sample complexities similar to those in the full state setting. However, moving beyond carefully curated pixel data sets (centered crop, appropriate lighting, clear background, etc.) remains challenging. In this paper, we adopt a more difficult setting, incorporating background distractors, as a first step towards addressing this challenge. We present a simple baseline approach that can learn meaningful representations with no metric-based learning, no data augmentations, no world-model learning, and no contrastive learning. We then analyze when and why previously proposed methods are likely to fail or reduce to the same performance as the baseline in this harder setting and why we should think carefully about extending such methods beyond the well curated environments. Our results show that finer categorization of benchmarks on the basis of characteristics like density of reward, planning horizon of the problem, presence of task-irrelevant components, etc., is crucial in evaluating algorithms. Based on these observations, we propose different metrics to consider when evaluating an algorithm on benchmark tasks. We hope such a data-centric view can motivate researchers to rethink representation learning when investigating how to best apply RL to real-world tasks.

Submitted 15 November, 2021; originally announced November 2021.

arXiv:2102.09850 [pdf, other]

Model-Invariant State Abstractions for Model-Based Reinforcement Learning

Authors: Manan Tomar, Amy Zhang, Roberto Calandra, Matthew E. Taylor, Joelle Pineau

Abstract: Accuracy and generalization of dynamics models is key to the success of model-based reinforcement learning (MBRL). As the complexity of tasks increases, so does the sample inefficiency of learning accurate dynamics models. However, many complex tasks also exhibit sparsity in the dynamics, i.e., actions have only a local effect on the system dynamics. In this paper, we exploit this property with a causal invariance perspective in the single-task setting, introducing a new type of state abstraction called \textit{model-invariance}. Unlike previous forms of state abstractions, a model-invariance state abstraction leverages causal sparsity over state variables. This allows for compositional generalization to unseen states, something that non-factored forms of state abstractions cannot do. We prove that an optimal policy can be learned over this model-invariance state abstraction and show improved generalization in a simple toy domain. Next, we propose a practical method to approximately learn a model-invariant representation for complex domains and validate our approach by showing improved modelling performance over standard maximum likelihood approaches on challenging tasks, such as the MuJoCo-based Humanoid. Finally, within the MBRL setting we show strong performance gains with respect to sample efficiency across a host of other continuous control tasks.

Submitted 7 June, 2021; v1 submitted 19 February, 2021; originally announced February 2021.

arXiv:2011.02916 [pdf, other]

Numerical over-approximation of invariance entropy via finite abstractions

Authors: Mahendra Singh Tomar, Christoph Kawan, Majid Zamani

Abstract: For a closed-loop control system with a digital channel between the sensor and the controller, the notion of invariance entropy quantifies the smallest average rate of information above which a given compact subset of the state space can be made invariant. There exist different versions of this quantity for deterministic and uncertain systems, which are equivalent in the deterministic case. In this work, we present algorithms for the numerical computation of these two quantities. In particular, given a subset $Q$ of the state set, we first partition it. Then a controller, in the form of a lookup table that assigns a set of control values to each cell of the partition, is computed to enforce invariance of $Q$. After determinizing the controller, a weighted directed graph is constructed. For deterministic systems, the logarithm of the spectral radius of a transition matrix obtained from the graph gives an upper bound of the entropy. For uncertain systems, the maximum mean cycle weight of the graph upper bounds the entropy. With three deterministic examples, for which the exact value of the invariance entropy is known or can be estimated by other means, we demonstrate that the upper bound obtained by our algorithm is of the same order of magnitude as the actual value. Additionally, our algorithm provides a static coder-controller scheme corresponding to the obtained data-rate bound. Finally, we present the computed upper bounds of invariance entropy for an uncertain linear control system as well.

Submitted 17 November, 2021; v1 submitted 3 November, 2020; originally announced November 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:2004.04779
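
The abstract above states that, for deterministic systems, the logarithm of the spectral radius of a transition matrix obtained from the graph gives an upper bound of the entropy. The sketch below illustrates only that last numerical step, under assumptions not taken from the paper: the matrix is a small non-negative, irreducible adjacency matrix supplied by hand, the logarithm is taken base 2, and the helper names `spectral_radius` and `log_spectral_radius_bits` are hypothetical. How the paper actually builds the matrix from the partition and controller is not described in this listing.

```python
import math


def spectral_radius(matrix, iterations=1000):
    """Estimate the spectral radius of a non-negative square matrix.

    Runs power iteration on (A + I): for an irreducible non-negative A the
    shifted matrix is primitive, so the iteration converges, and
    rho(A) = rho(A + I) - 1.
    """
    n = len(matrix)
    shifted = [
        [matrix[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    v = [1.0 / n] * n  # positive start vector with entries summing to 1
    estimate = 1.0
    for _ in range(iterations):
        w = [sum(shifted[i][j] * v[j] for j in range(n)) for i in range(n)]
        estimate = sum(w)  # growth factor in the 1-norm, since sum(v) == 1
        v = [x / estimate for x in w]
    return estimate - 1.0


def log_spectral_radius_bits(matrix):
    """Base-2 logarithm of the spectral radius (base 2 is an assumption)."""
    return math.log2(spectral_radius(matrix))


# Two cells, each able to reach both cells: spectral radius 2, i.e. 1 bit.
full_graph = [[1, 1], [1, 1]]
# A pure 2-cycle: spectral radius 1, i.e. 0 bits.
two_cycle = [[0, 1], [1, 0]]
```

The identity shift is used here only so that power iteration also converges on periodic graphs such as the 2-cycle; a library eigenvalue routine would serve equally well.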

arXiv:2005.09814 [pdf, other]

Mirror Descent Policy Optimization

Authors: Manan Tomar, Lior Shani, Yonathan Efroni, Mohammad Ghavamzadeh

Abstract: Mirror descent (MD), a well-known first-order method in constrained convex optimization, has recently been shown as an important tool to analyze trust-region algorithms in reinforcement learning (RL). However, there remains a considerable gap between such theoretically analyzed algorithms and the ones used in practice. Inspired by this, we propose an efficient RL algorithm, called {\em mirror descent policy optimization} (MDPO). MDPO iteratively updates the policy by {\em approximately} solving a trust-region problem, whose objective function consists of two terms: a linearization of the standard RL objective and a proximity term that restricts two consecutive policies to be close to each other. Each update performs this approximation by taking multiple gradient steps on this objective function. We derive {\em on-policy} and {\em off-policy} variants of MDPO, while emphasizing important design choices motivated by the existing theory of MD in RL. We highlight the connections between on-policy MDPO and two popular trust-region RL algorithms: TRPO and PPO, and show that explicitly enforcing the trust-region constraint is in fact {\em not} a necessity for high performance gains in TRPO. We then show how the popular soft actor-critic (SAC) algorithm can be derived by slight modifications of off-policy MDPO. Overall, MDPO is derived from the MD principles, offers a unified approach to viewing a number of popular RL algorithms, and performs better than or on-par with TRPO, PPO, and SAC in a number of continuous control tasks.
Code is available at \url{https://github.com/manantomar/Mirror-Descent-Policy-Optimization}.

Submitted 7 June, 2021; v1 submitted 19 May, 2020; originally announced May 2020.

arXiv:2004.04779 [pdf, ps, other]

Numerical Estimation of Invariance Entropy for Nonlinear Control Systems

Authors: Mahendra Singh Tomar, Christoph Kawan, Pushpak Jagtap, Majid Zamani

Abstract: For a closed-loop control system with a digital channel between the sensor and the controller, the notion of invariance entropy quantifies the smallest average rate of information transmission above which a given compact subset of the state space can be made invariant. In this work, we present for the first time an algorithm to numerically compute upper bounds of invariance entropy. With three examples, for which the exact value of the invariance entropy is known to us or can be estimated by other means, we demonstrate that the upper bound obtained by our algorithm is of the same order of magnitude as the actual value. Additionally, our algorithm provides a static coder-controller scheme corresponding to the obtained data-rate bound.

Submitted 9 April, 2020; originally announced April 2020.

arXiv:1910.02919 [pdf, other]

Multi-step Greedy Reinforcement Learning Algorithms

Authors: Manan Tomar, Yonathan Efroni, Mohammad Ghavamzadeh

Abstract: Multi-step greedy policies have been extensively used in model-based reinforcement learning (RL), both when a model of the environment is available (e.g., in the game of Go) and when it is learned. In this paper, we explore their benefits in model-free RL, when employed using multi-step dynamic programming algorithms: $κ$-Policy Iteration ($κ$-PI) and $κ$-Value Iteration ($κ$-VI). These methods iteratively compute the next policy ($κ$-PI) and value function ($κ$-VI) by solving a surrogate decision problem with a shaped reward and a smaller discount factor. We derive model-free RL algorithms based on $κ$-PI and $κ$-VI in which the surrogate problem can be solved by any discrete or continuous action RL method, such as DQN and TRPO. We identify the importance of a hyper-parameter that controls the extent to which the surrogate problem is solved and suggest a way to set this parameter. When evaluated on a range of Atari and MuJoCo benchmark tasks, our results indicate that for the right range of $κ$, our algorithms outperform DQN and TRPO. This shows that our multi-step greedy algorithms are general enough to be applied over any existing RL algorithm and can significantly improve its performance.

Submitted 12 July, 2020; v1 submitted 7 October, 2019; originally announced October 2019.

Comments: ICML 2020

arXiv:1905.07193 [pdf, other]

MaMiC: Macro and Micro Curriculum for Robotic Reinforcement Learning

Authors: Manan Tomar, Akhil Sathuluri, Balaraman Ravindran

Abstract: Shaping in humans and animals has been shown to be a powerful tool for learning complex tasks as compared to learning in a randomized fashion. This makes the problem less complex and enables one to solve the easier sub task at hand first. Generating a curriculum for such guided learning involves subjecting the agent to easier goals first, and then gradually increasing their difficulty. This paper takes a similar direction and proposes a dual curriculum scheme for solving robotic manipulation tasks with sparse rewards, called MaMiC. It includes a macro curriculum scheme which divides the task into multiple sub-tasks followed by a micro curriculum scheme which enables the agent to learn between such discovered sub-tasks. We show how combining macro and micro curriculum strategies helps in overcoming major exploratory constraints considered in robot manipulation tasks without having to engineer any complex rewards. We also illustrate the meaning of the individual curricula and how they can be used independently based on the task. The performance of such a dual curriculum scheme is analyzed on the Fetch environments.

Submitted 17 May, 2019; originally announced May 2019.

Comments: To appear in the Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2019). (Extended Abstract)

arXiv:1905.05731 [pdf, other]

Successor Options: An Option Discovery Framework for Reinforcement Learning

Authors: Rahul Ramesh, Manan Tomar, Balaraman Ravindran

Abstract: The options framework in reinforcement learning models the notion of a skill or a temporally extended sequence of actions. The discovery of a reusable set of skills has typically entailed building options that navigate to bottleneck states. This work adopts a complementary approach, where we attempt to discover options that navigate to landmark states. These states are prototypical representatives of well-connected regions and can hence access the associated region with relative ease. In this work, we propose Successor Options, which leverages Successor Representations to build a model of the state space. The intra-option policies are learnt using a novel pseudo-reward and the model scales to high-dimensional spaces easily. Additionally, we also propose an Incremental Successor Options model that iterates between constructing Successor Representations and building options, which is useful when robust Successor Representations cannot be built solely from primitive actions. We demonstrate the efficacy of our approach on a collection of grid-worlds, and on the high-dimensional robotic control environment of Fetch.

Submitted 14 May, 2019; originally announced May 2019.

Comments: To appear in the proceedings of the International Joint Conference on Artificial Intelligence 2019 (IJCAI)
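
Illustrative sketch (not from the listing): the Successor Options abstract above builds on Successor Representations (SR). For a fixed policy with state-transition matrix P and discount γ < 1, the standard tabular SR is M = (I − γP)^(-1), the expected discounted number of visits to each state. The code below computes it by fixed-point iteration on a small hand-written chain. The function name, the example matrix, and the iteration count are assumptions for illustration; the paper's landmark discovery and pseudo-reward are not reproduced here.

```python
from typing import List

Matrix = List[List[float]]


def successor_representation(P: Matrix, gamma: float, iters: int = 500) -> Matrix:
    """Approximate M = (I - gamma * P)^(-1) by iterating M <- I + gamma * P @ M.

    P is a row-stochastic transition matrix under a fixed policy and
    gamma must be in [0, 1) for the iteration to converge.
    """
    n = len(P)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [row[:] for row in identity]
    for _ in range(iters):
        M = [
            [
                identity[i][j] + gamma * sum(P[i][k] * M[k][j] for k in range(n))
                for j in range(n)
            ]
            for i in range(n)
        ]
    return M


# Hypothetical 3-state chain under a random-walk policy (toy example).
P_chain = [
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.5, 0.5],
]
M_chain = successor_representation(P_chain, gamma=0.9)
```

Each row of the resulting matrix sums to 1 / (1 − γ), which is a quick sanity check on the computation.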

arXiv:1706.09581 [pdf]

Optically Controlled Polarization in Highly Oriented Ferroelectric Thin Films

Authors: Hitesh Borkar, M Tomar, Vinay Gupta, Ram S. Katiyar, J. F. Scott, Ashok Kumar

Abstract: The out-of-plane and in-plane polarization of a (Pb0.6Li0.2Bi0.2)(Zr0.2Ti0.8)O3 (PLBZT) thin film has been studied in the dark and under illumination of a weak light source of a comparable bandgap. A highly oriented PLBZT thin film was grown on a LaNiO3 (LNO)/LaAlO3 (LAO) substrate by a pulsed laser deposition system; it shows well-saturated polarization and a significant enhancement of it under illumination of light. We have employed two configurations for polarization characterization: the first deals with out-of-plane polarization with a single capacitor under investigation, whereas the second uses two capacitors connected in series via the bottom electrode. The two configurations were illuminated using different energy sources and their effects were studied. The latter configuration shows a significant change in polarization under illumination of light that may provide an extra degree of freedom for device miniaturization. The polarization was also tested using positive-up & negative-down (PUND) measurements, which confirm robust polarization and its switching under illumination.

Submitted 29 June, 2017; originally announced June 2017.

Comments: Accepted in Materials Research Express

Report number: Article reference: MRX-104334.R3

arXiv:1706.08038 [pdf]

Giant Enhancement in Ferroelectric Polarization under Illumination

Authors: Hitesh Borkar, Vaibhav Rao, M Tomar, Vinay Gupta, J. F. Scott, Ashok Kumar

Abstract: We report optical enhancement in polarization and dielectric constant near room temperature in Pb0.6Li0.2Bi0.2Zr0.2Ti0.8O3 (PLBZT) electro-ceramics; these are doubly substituted members of the most important commercial ferroelectric PbZr0.2Ti0.8O3 (PZT:20/80). Partial (40%) substitution of equal amounts of Li+1 and Bi+3 in PZT:20/80 retains the PZT tetragonal structure with space group P4mm. Under illumination of white light and weak 405-nm near-ultraviolet laser light (30 mW), an unexpectedly large (200-300%) change in polarization and displacement current was observed. Light also changes the dc conduction current density by one to two orders of magnitude with a large switchable open circuit voltage (Voc ~ 2 V) and short circuit current (Jsc ~ 5×10⁻⁸ A). The samples show a photo-current ON/OFF ratio of order 6:1 under illumination of weak light.

Submitted 25 June, 2017; originally announced June 2017.

arXiv:1706.05242 [pdf, ps, other]

Invariance Feedback Entropy of Uncertain Control Systems

Authors: Mahendra Singh Tomar, Matthias Rungger, Majid Zamani

Abstract: We introduce a novel notion of invariance feedback entropy to quantify the state information that is required by any controller that enforces a given subset of the state space to be invariant. We establish a number of elementary properties, e.g. we provide conditions that ensure that the invariance feedback entropy is finite and show for the deterministic case that we recover the well-known notion of entropy for deterministic control systems. We prove the data rate theorem, which shows that the invariance entropy is a tight lower bound of the data rate of any coder-controller that achieves invariance in the closed loop. We analyze uncertain linear control systems and derive a universal lower bound of the invariance feedback entropy. The lower bound depends on the absolute value of the determinant of the system matrix and a ratio involving the volume of the invariant set and the set of uncertainties. Furthermore, we derive a lower bound of the data rate of any static, memoryless coder-controller. Both lower bounds are intimately related and for certain cases it is possible to bound the performance loss due to the restriction to static coder-controllers by $1$ bit/time unit. We provide various examples throughout the paper to illustrate and discuss different definitions and results.

Submitted 5 August, 2019; v1 submitted 16 June, 2017; originally announced June 2017.

MSC Class: Primary 93B52; Secondary 93C10; 93C30; 93C55; 93C57 ACM Class: I.2.8

arXiv:1705.08936 [pdf, ps, other]

Transverse Magnetoresistance of Zn$_{0.9}$Co$_{0.1}$O:Al Thin Films

Authors: R. Martínez-Valdez, H. J. Jiménez-González, L. Angelats-Silva, M. Tomar

Abstract: The transverse magnetoresistance of thin films of the Diluted Magnetic Semiconductor Zn$_{1-x}$Co$_{x}$O:Al on glass was studied for temperatures in the range of 5 to 100 K. Measurements were made on thin films grown by rf magnetron sputtering, with a thickness of approximately 200 nm. ZnO was alloyed with Co to a concentration $x$ of 0.1 and co-doped with a 5.5% wt concentration of Al. The electrical resistivity was measured along the sample surface by the four-point probe method with a magnetic field of up to 4 T applied perpendicular to the surface of the film. The experimental results of the magnetoresistance have been interpreted by means of a semiclassical model that combines a relaxation-time approximation to describe scattering processes in ZnO and a phenomenological approach to the spin-disorder scattering due to the indirect exchange interaction of the magnetic impurities.

Submitted 24 May, 2017; originally announced May 2017.

Comments: 5 pages, 4 figures, 1 table

arXiv:1705.02572 [pdf, ps, other]

Certain Ostrowski type inequalities for generalized s-convex functions

Authors: Muharrem Tomar, Praveen Agarwal, Mohamed Jleli

Abstract: In this paper, we first obtain a generalized integral identity for twice locally differentiable functions. Then, using functions whose second derivatives in absolute value at certain powers are generalized s-convex in the second sense, we obtain some new Ostrowski type inequalities.

Submitted 7 May, 2017; originally announced May 2017.

arXiv:1602.00277 [pdf]

doi 10.1088/0953-8984/28/26/265901

Novel optically active lead-free relaxor ferroelectric (Ba0.6Bi0.2Li0.2)TiO3

Authors: Hitesh Borkar, Vaibhav Rao, Soma Dutta, Arun Barvat, Prabir Pal, M Tomar, Vinay Gupta, J. F. Scott, Ashok Kumar

Abstract: We discovered a near room temperature lead-free relaxor-ferroelectric (Ba0.6Bi0.2Li0.2)TiO3 (BBLT) having an A-site compositionally disordered ABO3 perovskite structure. Microstructure-property relations revealed that the chemical inhomogeneities and development of local polar nano regions (PNRs) are responsible for dielectric dispersion as a function of probe frequencies and temperatures. Rietveld analysis indicates a mixed crystal structure with 80% tetragonal structure (space group P4mm) and 20% orthorhombic structure (space group Amm2), which is confirmed by the high resolution transmission electron diffraction pattern. Dielectric constant and tangent loss dispersion with and without illumination of light obey the nonlinear Vogel-Fulcher relation. It shows slim polarization-hysteresis (P-E) loops and excellent displacement coefficients (d33 ~ 233 pm/V) near room temperature, which gradually diminish near the maximum dielectric dispersion temperature (Tm). The underlying physics for light-sensitive dielectric dispersion was probed by X-ray photoelectron spectroscopy (XPS), which strongly suggests that mixed valence of bismuth ions, especially Bi5+ ions, is responsible for most of the optically active centers. Ultraviolet photoemission measurements showed most of the Ti ions are in 4+ states and sit at the centers of the TiO6 octahedra, which along with asymmetric hybridization between O 2p and Bi 6s orbitals appears to be the main driving force for net polarization. This BBLT material may open a new path for environmentally friendly lead-free relaxor-ferroelectric research.

Submitted 31 January, 2016; originally announced February 2016.

Comments: 23 pages, 5 figures

arXiv:1509.03833 [pdf]

doi 10.1063/1.4931696

Anomalous change in leakage and displacement currents after electrical poling on lead-free ferroelectric ceramics

Authors: Hitesh Borkar, M. Tomar, Vinay Gupta, J. F. Scott, Ashok Kumar

Abstract: We report the polarization, displacement current and leakage current behavior of lead-free ferroelectric NBT-BT electroceramics substituted with the trivalent nonpolar cation Al, with tetragonal phase and P4mm space group symmetry. A decrease of nearly three orders of magnitude in leakage current was observed under electrical poling, which significantly improves microstructure, polarization, and displacement current. Effective poling neutralizes the domain pinning, traps charges at grain boundaries and fills oxygen vacancies with free charge carriers in the matrix, thus yielding saturated macroscopic polarization in contrast to that in unpoled samples. E-poling changes banana-type polarization loops to real ferroelectric loops.

Submitted 13 September, 2015; originally announced September 2015.

Comments: 18 pages, 5 figures

arXiv:1407.7653 [pdf]

Impedance Spectroscopy Study in the vicinity of Ferroelectric Phase Transition

Authors: Hitesh Borkar, M Tomar, Vinay Gupta, Ashok Kumar

Abstract: Impedance spectroscopy (IS) is a versatile tool to study the effect of grains (bulk), grain boundaries and the electrode-electrolyte interface on dielectric and electrical properties of electro-ceramics. This study focuses only on the high frequency (1 kHz to 10 MHz) probe of bulk ferroelectric capacitance near the ferroelectric phase transition temperature (FPTT). The PZTFW single phase and PZTFW-CFO composites, respectively, have been investigated to understand the microstructure-property relation. Keeping in mind the complex microstructure of both systems, low frequency (less than 1 kHz) impedance investigation, which basically deals with grain boundaries and electrode-interfaces, has been ignored. X-ray diffraction (XRD) patterns, microstructures, dielectric spectra, and impedance plots revealed two distinct phases, inbuilt compressive strain, a small shift in dielectric maximum temperature, and two activation energy regions, respectively, in the PZTFW-CFO composite compared to the PZTFW ceramic. Addition of CFO in the PZTFW medium purified the impurity phases present in the PZTFW matrix. The PZTFW-CFO composite shows flat dielectric behavior and a high dielectric constant near FPTT at high frequency, which may be useful for tunable dielectric capacitors. The changes in bulk capacitance, relaxation time and constant phase element parameters have been probed in the proximity of the FPTT regions. Nyquist plots and modulus formalism show a poly-dispersive nature of relaxation, related to the activation energy (Ea) of oxygen vacancies, which are mainly responsible for the bulk capacitive conduction. A spiral kind of modulus spectra observed at elevated temperatures and frequencies (greater than 2 MHz) suggests possible experimental artifacts that have no physical explanation.

Submitted 29 July, 2014; originally announced July 2014.

Comments: 31 pages, 11 figures

Journal ref: Physics Express (2014)

arXiv:1103.1205 [pdf]

A Directional Feature with Energy based Offline Signature Verification Network

Authors: Minal Tomar, Pratibha Singh

Abstract: Signatures are used as a biometric in various systems, and every person's signature is distinct. It is therefore very important to have a computerized signature verification system. In an offline signature verification system, dynamic features are not available, but one can treat a signature as an image and apply image processing techniques to make an effective offline signature verification system. The authors propose an intelligent network that uses both directional features and energy density as inputs to the same network and classifies the signature. A neural network is used as the classifier for this system. The results are compared with both the very basic energy density method and a simple directional feature method of offline signature verification, and the proposed network is found to be very effective compared to these two methods, especially for a small number of training samples, and can be implemented practically.

Submitted 7 March, 2011; originally announced March 2011.

Comments: 10 pages, 6 figures

Journal ref: International Journal on Soft Computing ( IJSC ), Vol.2, No.1, February 2011

Showing 1–26 of 26 results for author: Tomar, M