-
Discovering Object-Centric Generalized Value Functions From Pixels
Authors:
Somjit Nath,
Gopeshh Raaj Subbaraj,
Khimya Khetarpal,
Samira Ebrahimi Kahou
Abstract:
Deep Reinforcement Learning has shown significant progress in extracting useful representations from high-dimensional inputs, albeit using hand-crafted auxiliary tasks and pseudo-rewards. Automatically learning such representations in an object-centric manner geared towards control and fast adaptation remains an open research problem. In this paper, we introduce a method that tries to discover meaningful features from objects, translating them to temporally coherent "question" functions and leveraging the subsequent learned general value functions for control. We compare our approach with state-of-the-art techniques alongside other ablations and show competitive performance in both stationary and non-stationary settings. Finally, we also investigate the discovered general value functions and, through qualitative analysis, show that the learned representations are not only interpretable but also centered around objects that are invariant to changes across tasks, facilitating fast adaptation.
Submitted 27 June, 2023; v1 submitted 26 April, 2023;
originally announced April 2023.
-
Trajectory Control for Differential Drive Mobile Manipulators
Authors:
Harish Karunakaran,
Gopeshh Raaj Subbaraj
Abstract:
Mobile manipulator systems consist of a mobile platform with one or more manipulators and are of great interest in a number of applications such as indoor warehouses, mining, construction, and forestry. We present an approach for computing actuator commands for such systems so that they can follow desired end-effector and platform trajectories without violating the nonholonomic constraints of the system in an indoor warehouse environment. To validate our method, we work with the Fetch robot, which consists of a 7-DOF manipulator with a differential drive mobile base. The major contributions of our project are the dynamics of the system, trajectory planning for the manipulator and the mobile base, a state machine for the pick-and-place task, and the inverse kinematics of the manipulator. Our results indicate that we are able to successfully implement trajectory control on the mobile base and the manipulator of the Fetch robot.
Submitted 31 March, 2023;
originally announced April 2023.
-
Continual Learning In Environments With Polynomial Mixing Times
Authors:
Matthew Riemer,
Sharath Chandra Raparthy,
Ignacio Cases,
Gopeshh Subbaraj,
Maximilian Puelma Touzel,
Irina Rish
Abstract:
The mixing time of the Markov chain induced by a policy limits performance in real-world continual learning scenarios. Yet, the effect of mixing times on learning in continual reinforcement learning (RL) remains underexplored. In this paper, we characterize problems that are of long-term interest to the development of continual RL, which we call scalable MDPs, through the lens of mixing times. In particular, we theoretically establish that scalable MDPs have mixing times that scale polynomially with the size of the problem. We go on to demonstrate that polynomial mixing times present significant difficulties for existing approaches, which suffer from myopic bias and stale bootstrapped estimates. To validate our theory, we study the empirical scaling behavior of mixing times with respect to the number of tasks and task duration for high-performing policies deployed across multiple Atari games. Our analysis demonstrates both that polynomial mixing times do emerge in practice and how their existence may lead to unstable learning behavior, such as catastrophic forgetting, in continual learning settings.
Submitted 13 October, 2022; v1 submitted 13 December, 2021;
originally announced December 2021.