-
Model Callers for Transforming Predictive and Generative AI Applications
Authors:
Mukesh Dalal
Abstract:
We introduce a novel software abstraction termed "model caller," acting as an intermediary for AI and ML model calling, advocating its transformative utility beyond existing model-serving frameworks. This abstraction offers multiple advantages: enhanced accuracy and reduced latency in model predictions, superior monitoring and observability of models, more streamlined AI system architectures, simplified AI development and management processes, and improved collaboration and accountability across AI/ML/Data Science, software, data, and operations teams. Model callers are valuable for both creators and users of models within both predictive and generative AI applications. Additionally, we have developed and released a prototype Python library for model callers, accessible for installation via pip or for download from GitHub.
Submitted 17 April, 2024;
originally announced June 2024.
-
Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks
Authors:
Murtaza Dalal,
Tarun Chiruvolu,
Devendra Chaplot,
Ruslan Salakhutdinov
Abstract:
Large Language Models (LLMs) have been shown to be capable of performing high-level planning for long-horizon robotics tasks, yet existing methods require access to a pre-defined skill library (e.g. picking, placing, pulling, pushing, navigating). However, LLM planning does not address how to design or learn those behaviors, which remains challenging particularly in long-horizon settings. Furthermore, for many tasks of interest, the robot needs to be able to adjust its behavior in a fine-grained manner, requiring the agent to be capable of modifying low-level control actions. Can we instead use the internet-scale knowledge from LLMs for high-level policies, guiding reinforcement learning (RL) policies to efficiently solve robotic control tasks online without requiring a pre-determined set of skills? In this paper, we propose Plan-Seq-Learn (PSL): a modular approach that uses motion planning to bridge the gap between abstract language and learned low-level control for solving long-horizon robotics tasks from scratch. We demonstrate that PSL achieves state-of-the-art results on over 25 challenging robotics tasks with up to 10 stages. PSL solves long-horizon tasks from raw visual input spanning four benchmarks at success rates of over 85%, out-performing language-based, classical, and end-to-end approaches. Video results and code at https://mihdalal.github.io/planseqlearn/
Submitted 2 May, 2024;
originally announced May 2024.
-
Reversible complement cyclic codes over finite chain rings
Authors:
Monika Dalal,
Sucheta Dutt,
Ranjeet Sehmi
Abstract:
Let k be an arbitrary element of a finite commutative chain ring R and u be a unit in R. In this work, we present necessary conditions which are sufficient as well for a cyclic code to be a (u,k) reversible complement code over R. Using these conditions, all principally generated cyclic codes over the ring Z_{2}+vZ_{2}+v^{2}Z_{2}, v^{3}=0 of length 4 have been checked to find whether they are (1,1) reversible complement or not.
Submitted 1 August, 2023;
originally announced August 2023.
-
Reversible cyclic codes over finite chain rings
Authors:
Monika Dalal,
Sucheta Dutt,
Ranjeet Sehmi
Abstract:
In this paper, necessary and sufficient conditions for the reversibility of a cyclic code of arbitrary length over a finite commutative chain ring have been derived. MDS reversible cyclic codes having length p^s over a finite chain ring with nilpotency index 2 have been characterized and a few examples of MDS reversible cyclic codes have been presented. Further, it is shown that the torsion codes of a reversible cyclic code over a finite chain ring are reversible. Also, an example of a non-reversible cyclic code for which all its torsion codes are reversible has been presented to show that the converse of this statement is not true. The cardinality and Hamming distance of a cyclic code over a finite commutative chain ring have also been determined.
Submitted 23 July, 2023; v1 submitted 18 July, 2023;
originally announced July 2023.
-
Imitating Task and Motion Planning with Visuomotor Transformers
Authors:
Murtaza Dalal,
Ajay Mandlekar,
Caelan Garrett,
Ankur Handa,
Ruslan Salakhutdinov,
Dieter Fox
Abstract:
Imitation learning is a powerful tool for training robot manipulation policies, allowing them to learn from expert demonstrations without manual programming or trial-and-error. However, common methods of data collection, such as human supervision, scale poorly, as they are time-consuming and labor-intensive. In contrast, Task and Motion Planning (TAMP) can autonomously generate large-scale datasets of diverse demonstrations. In this work, we show that the combination of large-scale datasets generated by TAMP supervisors and flexible Transformer models to fit them is a powerful paradigm for robot manipulation. To that end, we present a novel imitation learning system called OPTIMUS that trains large-scale visuomotor Transformer policies by imitating a TAMP agent. OPTIMUS introduces a pipeline for generating TAMP data that is specifically curated for imitation learning and can be used to train performant transformer-based policies. In this paper, we present a thorough study of the design decisions required to imitate TAMP and demonstrate that OPTIMUS can solve a wide variety of challenging vision-based manipulation tasks with over 70 different objects, ranging from long-horizon pick-and-place tasks, to shelf and articulated object manipulation, achieving 70 to 80% success rates. Video results and code at https://mihdalal.github.io/optimus/
Submitted 17 October, 2023; v1 submitted 25 May, 2023;
originally announced May 2023.
-
MDS and MHDR cyclic codes over finite chain rings
Authors:
Monika Dalal,
Sucheta Dutt,
Ranjeet Sehmi
Abstract:
In this work, a unique set of generators for a cyclic code over a finite chain ring has been established. The minimal spanning set and rank of the code have also been determined. Further, sufficient as well as necessary conditions for a cyclic code to be an MDS code and for a cyclic code to be an MHDR code have been obtained. Some examples of optimal cyclic codes have also been presented.
Submitted 28 March, 2023;
originally announced March 2023.
-
SEAL: Self-supervised Embodied Active Learning using Exploration and 3D Consistency
Authors:
Devendra Singh Chaplot,
Murtaza Dalal,
Saurabh Gupta,
Jitendra Malik,
Ruslan Salakhutdinov
Abstract:
In this paper, we explore how we can build upon the data and models of Internet images and use them to adapt to robot vision without requiring any extra labels. We present a framework called Self-supervised Embodied Active Learning (SEAL). It utilizes perception models trained on internet images to learn an active exploration policy. The observations gathered by this exploration policy are labelled using 3D consistency and used to improve the perception model. We build and utilize 3D semantic maps to learn both action and perception in a completely self-supervised manner. The semantic map is used to compute an intrinsic motivation reward for training the exploration policy and for labelling the agent observations using spatio-temporal 3D consistency and label propagation. We demonstrate that the SEAL framework can be used to close the action-perception loop: it improves object detection and instance segmentation performance of a pretrained perception model by just moving around in training environments and the improved perception model can be used to improve Object Goal Navigation.
Submitted 2 December, 2021;
originally announced December 2021.
-
Accelerating Robotic Reinforcement Learning via Parameterized Action Primitives
Authors:
Murtaza Dalal,
Deepak Pathak,
Ruslan Salakhutdinov
Abstract:
Despite the potential of reinforcement learning (RL) for building general-purpose robotic systems, training RL agents to solve robotics tasks still remains challenging due to the difficulty of exploration in purely continuous action spaces. Addressing this problem is an active area of research with the majority of focus on improving RL methods via better optimization or more efficient exploration. An alternate but important component to consider improving is the interface of the RL algorithm with the robot. In this work, we manually specify a library of robot action primitives (RAPS), parameterized with arguments that are learned by an RL policy. These parameterized primitives are expressive, simple to implement, enable efficient exploration and can be transferred across robots, tasks and environments. We perform a thorough empirical study across challenging tasks in three distinct domains with image input and a sparse terminal reward. We find that our simple change to the action interface substantially improves both the learning efficiency and task performance irrespective of the underlying RL algorithm, significantly outperforming prior methods which learn skills from offline expert data. Code and videos at https://mihdalal.github.io/raps/
Submitted 28 October, 2021;
originally announced October 2021.
-
AWAC: Accelerating Online Reinforcement Learning with Offline Datasets
Authors:
Ashvin Nair,
Abhishek Gupta,
Murtaza Dalal,
Sergey Levine
Abstract:
Reinforcement learning (RL) provides an appealing formalism for learning control policies from experience. However, the classic active formulation of RL necessitates a lengthy active exploration process for each behavior, making it difficult to apply in real-world settings such as robotic control. If we can instead allow RL algorithms to effectively use previously collected data to aid the online learning process, such applications could be made substantially more practical: the prior data would provide a starting point that mitigates challenges due to exploration and sample complexity, while the online training enables the agent to perfect the desired skill. Such prior data could either constitute expert demonstrations or sub-optimal prior data that illustrates potentially useful transitions. While a number of prior methods have either used optimal demonstrations to bootstrap RL, or have used sub-optimal data to train purely offline, it remains exceptionally difficult to train a policy with offline data and actually continue to improve it further with online RL. In this paper we analyze why this problem is so challenging, and propose an algorithm that combines sample efficient dynamic programming with maximum likelihood policy updates, providing a simple and effective framework that is able to leverage large amounts of offline data and then quickly perform online fine-tuning of RL policies. We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience. We demonstrate these benefits on simulated and real-world robotics domains, including dexterous manipulation with a real multi-fingered hand, drawer opening with a robotic arm, and rotating a valve. Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
Submitted 24 April, 2021; v1 submitted 16 June, 2020;
originally announced June 2020.
-
Scalable Multi-Task Imitation Learning with Autonomous Improvement
Authors:
Avi Singh,
Eric Jang,
Alexander Irpan,
Daniel Kappler,
Murtaza Dalal,
Sergey Levine,
Mohi Khansari,
Chelsea Finn
Abstract:
While robot learning has demonstrated promising results for enabling robots to automatically acquire new skills, a critical challenge in deploying learning-based systems is scale: acquiring enough data for the robot to effectively generalize broadly. Imitation learning, in particular, has remained a stable and powerful approach for robot learning, but critically relies on expert operators for data collection. In this work, we target this challenge, aiming to build an imitation learning system that can continuously improve through autonomous data collection, while simultaneously avoiding the explicit use of reinforcement learning, to maintain the stability, simplicity, and scalability of supervised imitation. To accomplish this, we cast the problem of imitation with autonomous improvement into a multi-task setting. We utilize the insight that, in a multi-task setting, a failed attempt at one task might represent a successful attempt at another task. This allows us to leverage the robot's own trials as demonstrations for tasks other than the one that the robot actually attempted. Using an initial dataset of multi-task demonstration data, the robot autonomously collects trials which are only sparsely labeled with a binary indication of whether the trial accomplished any useful task or not. We then embed the trials into a learned latent space of tasks, trained using only the initial demonstration dataset, to draw similarities between various trials, enabling the robot to achieve one-shot generalization to new tasks. In contrast to prior imitation learning approaches, our method can autonomously collect data with sparse supervision for continuous improvement, and in contrast to reinforcement learning algorithms, our method can effectively improve from sparse, task-agnostic reward signals.
Submitted 25 February, 2020;
originally announced March 2020.
-
Autoregressive Models: What Are They Good For?
Authors:
Murtaza Dalal,
Alexander C. Li,
Rohan Taori
Abstract:
Autoregressive (AR) models have become a popular tool for unsupervised learning, achieving state-of-the-art log likelihood estimates. We investigate the use of AR models as density estimators in two settings -- as a learning signal for image translation, and as an outlier detector -- and find that these density estimates are much less reliable than previously thought. We examine the underlying optimization issues from both an empirical and theoretical perspective, and provide a toy example that illustrates the problem. Overwhelmingly, we find that density estimates do not correlate with perceptual quality and are unhelpful for downstream tasks.
Submitted 17 October, 2019;
originally announced October 2019.
-
Skew-Fit: State-Covering Self-Supervised Reinforcement Learning
Authors:
Vitchyr H. Pong,
Murtaza Dalal,
Steven Lin,
Ashvin Nair,
Shikhar Bahl,
Sergey Levine
Abstract:
Autonomous agents that must exhibit flexible and broad capabilities will need to be equipped with large repertoires of skills. Defining each skill with a manually-designed reward function limits this repertoire and imposes a manual engineering burden. Self-supervised agents that set their own goals can automate this process, but designing appropriate goal setting objectives can be difficult, and often involves heuristic design decisions. In this paper, we propose a formal exploration objective for goal-reaching policies that maximizes state coverage. We show that this objective is equivalent to maximizing goal reaching performance together with the entropy of the goal distribution, where goals correspond to full state observations. To instantiate this principle, we present an algorithm called Skew-Fit for learning a maximum-entropy goal distribution. We prove that, under regularity conditions, Skew-Fit converges to a uniform distribution over the set of valid states, even when we do not know this set beforehand. Our experiments show that combining Skew-Fit for learning goal distributions with existing goal-reaching methods outperforms a variety of prior methods on open-sourced visual goal-reaching tasks. Moreover, we demonstrate that Skew-Fit enables a real-world robot to learn to open a door, entirely from scratch, from pixels, and without any manually-designed reward function.
Submitted 4 August, 2020; v1 submitted 8 March, 2019;
originally announced March 2019.
-
Visual Reinforcement Learning with Imagined Goals
Authors:
Ashvin Nair,
Vitchyr Pong,
Murtaza Dalal,
Shikhar Bahl,
Steven Lin,
Sergey Levine
Abstract:
For an autonomous agent to fulfill a wide range of user-specified goals at test time, it must be able to learn broadly applicable and general-purpose skill repertoires. Furthermore, to provide the requisite level of generality, these skills must handle raw sensory input such as images. In this paper, we propose an algorithm that acquires such general-purpose skills by combining unsupervised representation learning and reinforcement learning of goal-conditioned policies. Since the particular goals that might be required at test-time are not known in advance, the agent performs a self-supervised "practice" phase where it imagines goals and attempts to achieve them. We learn a visual representation with three distinct purposes: sampling goals for self-supervised practice, providing a structured transformation of raw sensory inputs, and computing a reward signal for goal reaching. We also propose a retroactive goal relabeling scheme to further improve the sample-efficiency of our method. Our off-policy algorithm is efficient enough to learn policies that operate on raw image observations and goals for a real-world robotic system, and substantially outperforms prior techniques.
Submitted 4 December, 2018; v1 submitted 12 July, 2018;
originally announced July 2018.
-
Composable Deep Reinforcement Learning for Robotic Manipulation
Authors:
Tuomas Haarnoja,
Vitchyr Pong,
Aurick Zhou,
Murtaza Dalal,
Pieter Abbeel,
Sergey Levine
Abstract:
Model-free deep reinforcement learning has been shown to exhibit good performance in domains ranging from video games to simulated robotic manipulation and locomotion. However, model-free methods are known to perform poorly when the interaction time with the environment is limited, as is the case for most real-world robotic tasks. In this paper, we study how maximum entropy policies trained using soft Q-learning can be applied to real-world robotic manipulation. The application of this method to real-world manipulation is facilitated by two important features of soft Q-learning. First, soft Q-learning can learn multimodal exploration strategies by learning policies represented by expressive energy-based models. Second, we show that policies learned with soft Q-learning can be composed to create new policies, and that the optimality of the resulting policy can be bounded in terms of the divergence between the composed policies. This compositionality provides an especially valuable tool for real-world manipulation, where constructing new policies by composing existing skills can provide a large gain in efficiency over training from scratch. Our experimental evaluation demonstrates that soft Q-learning is substantially more sample efficient than prior model-free deep reinforcement learning methods, and that compositionality can be performed for both simulated and real-world tasks.
Submitted 18 March, 2018;
originally announced March 2018.
-
Temporal Difference Models: Model-Free Deep RL for Model-Based Control
Authors:
Vitchyr Pong,
Shixiang Gu,
Murtaza Dalal,
Sergey Levine
Abstract:
Model-free reinforcement learning (RL) is a powerful, general tool for learning complex behaviors. However, its sample complexity is often impractically large for solving challenging real-world problems, even with off-policy algorithms such as Q-learning. A limiting factor in classic model-free RL is that the learning signal consists only of scalar rewards, ignoring much of the rich information contained in state transition tuples. Model-based RL uses this information, by training a predictive model, but often does not achieve the same asymptotic performance as model-free RL due to model bias. We introduce temporal difference models (TDMs), a family of goal-conditioned value functions that can be trained with model-free learning and used for model-based control. TDMs combine the benefits of model-free and model-based RL: they leverage the rich information in state transitions to learn very efficiently, while still attaining asymptotic performance that exceeds that of direct model-based RL methods. Our experimental results show that, on a range of continuous control tasks, TDMs provide a substantial improvement in efficiency compared to state-of-the-art model-based and model-free methods.
Submitted 24 February, 2020; v1 submitted 25 February, 2018;
originally announced February 2018.