Search | arXiv e-print repository

Learning World Models With Hierarchical Temporal Abstractions: A Probabilistic Perspective

Abstract: Machines that can replicate human intelligence with type 2 reasoning capabilities should be able to reason at multiple levels of spatio-temporal abstractions and scales using internal world models. Devising formalisms to develop such internal world models, which accurately reflect the causal hierarchies inherent in the dynamics of the real world, is a critical research challenge in the domains of… ▽ More Machines that can replicate human intelligence with type 2 reasoning capabilities should be able to reason at multiple levels of spatio-temporal abstractions and scales using internal world models. Devising formalisms to develop such internal world models, which accurately reflect the causal hierarchies inherent in the dynamics of the real world, is a critical research challenge in the domains of artificial intelligence and machine learning. This thesis identifies several limitations with the prevalent use of state space models (SSMs) as internal world models and propose two new probabilistic formalisms namely Hidden-Parameter SSMs and Multi-Time Scale SSMs to address these drawbacks. The structure of graphical models in both formalisms facilitates scalable exact probabilistic inference using belief propagation, as well as end-to-end learning via backpropagation through time. This approach permits the development of scalable, adaptive hierarchical world models capable of representing nonstationary dynamics across multiple temporal abstractions and scales. Moreover, these probabilistic formalisms integrate the concept of uncertainty in world states, thus improving the system's capacity to emulate the stochastic nature of the real world and quantify the confidence in its predictions. The thesis also discuss how these formalisms are in line with related neuroscience literature on Bayesian brain hypothesis and predicitive processing. Our experiments on various real and simulated robots demonstrate that our formalisms can match and in many cases exceed the performance of contemporary transformer variants in making long-range future predictions. We conclude the thesis by reflecting on the limitations of our current models and suggesting directions for future research. △ Less

Submitted 26 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

Comments: Doctoral Dissertation Preprint, Department of Computer Science, Karlsruhe Institute Of Technology, 2024

arXiv:2310.18534 [pdf, other]

Multi Time Scale World Models

Authors: Vaisakh Shaj, Saleh Gholam Zadeh, Ozan Demir, Luiz Ricardo Douat, Gerhard Neumann

Abstract: Intelligent agents use internal world models to reason and make predictions about different courses of their actions at many scales. Devising learning paradigms and architectures that allow machines to learn world models that operate at multiple levels of temporal abstractions while dealing with complex uncertainty predictions is a major technical hurdle. In this work, we propose a probabilistic f… ▽ More Intelligent agents use internal world models to reason and make predictions about different courses of their actions at many scales. Devising learning paradigms and architectures that allow machines to learn world models that operate at multiple levels of temporal abstractions while dealing with complex uncertainty predictions is a major technical hurdle. In this work, we propose a probabilistic formalism to learn multi-time scale world models which we call the Multi Time Scale State Space (MTS3) model. Our model uses a computationally efficient inference scheme on multiple time scales for highly accurate long-horizon predictions and uncertainty estimates over several seconds into the future. Our experiments, which focus on action conditional long horizon future predictions, show that MTS3 outperforms recent methods on several system identification benchmarks including complex simulated and real-world dynamical systems. Code is available at this repository: https://github.com/ALRhub/MTS3. △ Less

Submitted 4 December, 2023; v1 submitted 27 October, 2023; originally announced October 2023.

Comments: Accepted as spotlight at NeurIPS 2023

arXiv:2206.14697 [pdf, other]

Hidden Parameter Recurrent State Space Models For Changing Dynamics Scenarios

Authors: Vaisakh Shaj, Dieter Buchler, Rohit Sonker, Philipp Becker, Gerhard Neumann

Abstract: Recurrent State-space models (RSSMs) are highly expressive models for learning patterns in time series data and system identification. However, these models assume that the dynamics are fixed and unchanging, which is rarely the case in real-world scenarios. Many control applications often exhibit tasks with similar but not identical dynamics which can be modeled as a latent variable. We introduce… ▽ More Recurrent State-space models (RSSMs) are highly expressive models for learning patterns in time series data and system identification. However, these models assume that the dynamics are fixed and unchanging, which is rarely the case in real-world scenarios. Many control applications often exhibit tasks with similar but not identical dynamics which can be modeled as a latent variable. We introduce the Hidden Parameter Recurrent State Space Models (HiP-RSSMs), a framework that parametrizes a family of related dynamical systems with a low-dimensional set of latent factors. We present a simple and effective way of learning and performing inference over this Gaussian graphical model that avoids approximations like variational inference. We show that HiP-RSSMs outperforms RSSMs and competing multi-task models on several challenging robotic benchmarks both on real-world systems and simulations. △ Less

Submitted 12 October, 2023; v1 submitted 29 June, 2022; originally announced June 2022.

Comments: Published at the International Conference on Learning Representations, ICLR 2022

arXiv:2205.13804 [pdf, other]

End-to-End Learning of Hybrid Inverse Dynamics Models for Precise and Compliant Impedance Control

Authors: Moritz Reuss, Niels van Duijkeren, Robert Krug, Philipp Becker, Vaisakh Shaj, Gerhard Neumann

Abstract: It is well-known that inverse dynamics models can improve tracking performance in robot control. These models need to precisely capture the robot dynamics, which consist of well-understood components, e.g., rigid body dynamics, and effects that remain challenging to capture, e.g., stick-slip friction and mechanical flexibilities. Such effects exhibit hysteresis and partial observability, rendering… ▽ More It is well-known that inverse dynamics models can improve tracking performance in robot control. These models need to precisely capture the robot dynamics, which consist of well-understood components, e.g., rigid body dynamics, and effects that remain challenging to capture, e.g., stick-slip friction and mechanical flexibilities. Such effects exhibit hysteresis and partial observability, rendering them, particularly challenging to model. Hence, hybrid models, which combine a physical prior with data-driven approaches are especially well-suited in this setting. We present a novel hybrid model formulation that enables us to identify fully physically consistent inertial parameters of a rigid body dynamics model which is paired with a recurrent neural network architecture, allowing us to capture unmodeled partially observable effects using the network memory. We compare our approach against state-of-the-art inverse dynamics models on a 7 degree of freedom manipulator. Using data sets obtained through an optimal experiment design approach, we study the accuracy of offline torque prediction and generalization capabilities of joint learning methods. In control experiments on the real system, we evaluate the model as a feed-forward term for impedance control and show the feedback gains can be drastically reduced to achieve a given tracking accuracy. △ Less

Submitted 27 May, 2022; originally announced May 2022.

Comments: Accepted for publication at Robotics: Science and System XVIII (RSS), year 2022. Paper length is 13 pages (i.e. 9 pages of technical content, 1 page of the Bibliography/References and 3 pages of Appendix)

arXiv:2010.10201 [pdf, other]

Action-Conditional Recurrent Kalman Networks For Forward and Inverse Dynamics Learning

Authors: Vaisakh Shaj, Philipp Becker, Dieter Buchler, Harit Pandya, Niels van Duijkeren, C. James Taylor, Marc Hanheide, Gerhard Neumann

Abstract: Estimating accurate forward and inverse dynamics models is a crucial component of model-based control for sophisticated robots such as robots driven by hydraulics, artificial muscles, or robots dealing with different contact situations. Analytic models to such processes are often unavailable or inaccurate due to complex hysteresis effects, unmodelled friction and stiction phenomena,and unknown eff… ▽ More Estimating accurate forward and inverse dynamics models is a crucial component of model-based control for sophisticated robots such as robots driven by hydraulics, artificial muscles, or robots dealing with different contact situations. Analytic models to such processes are often unavailable or inaccurate due to complex hysteresis effects, unmodelled friction and stiction phenomena,and unknown effects during contact situations. A promising approach is to obtain spatio-temporal models in a data-driven way using recurrent neural networks, as they can overcome those issues. However, such models often do not meet accuracy demands sufficiently, degenerate in performance for the required high sampling frequencies and cannot provide uncertainty estimates. We adopt a recent probabilistic recurrent neural network architecture, called Re-current Kalman Networks (RKNs), to model learning by conditioning its transition dynamics on the control actions. RKNs outperform standard recurrent networks such as LSTMs on many state estimation tasks. Inspired by Kalman filters, the RKN provides an elegant way to achieve action conditioning within its recurrent cell by leveraging additive interactions between the current latent state and the action variables. We present two architectures, one for forward model learning and one for inverse model learning. Both architectures significantly outperform exist-ing model learning frameworks as well as analytical models in terms of prediction performance on a variety of real robot dynamics models. △ Less

Submitted 5 November, 2020; v1 submitted 20 October, 2020; originally announced October 2020.

Comments: Accepted to Conference On Robot Learning(CoRL), 2020

arXiv:2004.12771 [pdf, other]

Adversarial Fooling Beyond "Flip** the Label"

Authors: Konda Reddy Mopuri, Vaisakh Shaj, R. Venkatesh Babu

Abstract: Recent advancements in CNNs have shown remarkable achievements in various CV/AI applications. Though CNNs show near human or better than human performance in many critical tasks, they are quite vulnerable to adversarial attacks. These attacks are potentially dangerous in real-life deployments. Though there have been many adversarial attacks proposed in recent years, there is no proper way of quant… ▽ More Recent advancements in CNNs have shown remarkable achievements in various CV/AI applications. Though CNNs show near human or better than human performance in many critical tasks, they are quite vulnerable to adversarial attacks. These attacks are potentially dangerous in real-life deployments. Though there have been many adversarial attacks proposed in recent years, there is no proper way of quantifying the effectiveness of these attacks. As of today, mere fooling rate is used for measuring the susceptibility of the models, or the effectiveness of adversarial attacks. Fooling rate just considers label flip** and does not consider the cost of such flip**, for instance, in some deployments, flip** between two species of dogs may not be as severe as confusing a dog category with that of a vehicle. Therefore, the metric to quantify the vulnerability of the models should capture the severity of the flip** as well. In this work we first bring out the drawbacks of the existing evaluation and propose novel metrics to capture various aspects of the fooling. Further, for the first time, we present a comprehensive analysis of several important adversarial attacks over a set of distinct CNN architectures. We believe that the presented analysis brings valuable insights about the current adversarial attacks and the CNN models. △ Less

Submitted 27 April, 2020; originally announced April 2020.

Comments: CVPR-AMLCV-2020

arXiv:1905.08114 [pdf, other]

Zero-Shot Knowledge Distillation in Deep Networks

Authors: Gaurav Kumar Nayak, Konda Reddy Mopuri, Vaisakh Shaj, R. Venkatesh Babu, Anirban Chakraborty

Abstract: Knowledge distillation deals with the problem of training a smaller model (Student) from a high capacity source model (Teacher) so as to retain most of its performance. Existing approaches use either the training data or meta-data extracted from it in order to train the Student. However, accessing the dataset on which the Teacher has been trained may not always be feasible if the dataset is very l… ▽ More Knowledge distillation deals with the problem of training a smaller model (Student) from a high capacity source model (Teacher) so as to retain most of its performance. Existing approaches use either the training data or meta-data extracted from it in order to train the Student. However, accessing the dataset on which the Teacher has been trained may not always be feasible if the dataset is very large or it poses privacy or safety concerns (e.g., bio-metric or medical data). Hence, in this paper, we propose a novel data-free method to train the Student from the Teacher. Without even using any meta-data, we synthesize the Data Impressions from the complex Teacher model and utilize these as surrogates for the original training data samples to transfer its learning to Student via knowledge distillation. We, therefore, dub our method "Zero-Shot Knowledge Distillation" and demonstrate that our framework results in competitive generalization performance as achieved by distillation using the actual training data samples on multiple benchmark datasets. △ Less

Submitted 20 May, 2019; originally announced May 2019.

Comments: Accepted in ICML 2019, codes will be available at https://github.com/vcl-iisc/ZSKD

arXiv:1712.00640 [pdf, other]

Learning Sparse Adversarial Dictionaries For Multi-Class Audio Classification

Authors: Vaisakh Shaj, Puranjoy Bhattacharya

Abstract: Audio events are quite often overlap** in nature, and more prone to noise than visual signals. There has been increasing evidence for the superior performance of representations learned using sparse dictionaries for applications like audio denoising and speech enhancement. This paper concentrates on modifying the traditional reconstructive dictionary learning algorithms, by incorporating a discr… ▽ More Audio events are quite often overlap** in nature, and more prone to noise than visual signals. There has been increasing evidence for the superior performance of representations learned using sparse dictionaries for applications like audio denoising and speech enhancement. This paper concentrates on modifying the traditional reconstructive dictionary learning algorithms, by incorporating a discriminative term into the objective function in order to learn class-specific adversarial dictionaries that are good at representing samples of their own class at the same time poor at representing samples belonging to any other class. We quantitatively demonstrate the effectiveness of our learned dictionaries as a stand-alone solution for both binary as well as multi-class audio classification problems. △ Less

Submitted 2 December, 2017; originally announced December 2017.

Comments: Accepted in Asian Conference of Pattern Recognition (ACPR-2017)

Showing 1–8 of 8 results for author: Shaj, V