-
Learning to Recover from Plan Execution Errors during Robot Manipulation: A Neuro-symbolic Approach
Authors:
Namasivayam Kalithasan,
Arnav Tuli,
Vishal Bindal,
Himanshu Gaurav Singh,
Parag Singla,
Rohan Paul
Abstract:
Automatically detecting and recovering from failures is an important but challenging problem for autonomous robots. Most of the recent work on learning to plan from demonstrations lacks the ability to detect and recover from errors in the absence of an explicit state representation and/or a (sub-) goal check function. We propose an approach (blending learning with symbolic search) for automated er…
▽ More
Automatically detecting and recovering from failures is an important but challenging problem for autonomous robots. Most of the recent work on learning to plan from demonstrations lacks the ability to detect and recover from errors in the absence of an explicit state representation and/or a (sub-) goal check function. We propose an approach (blending learning with symbolic search) for automated error discovery and recovery, without needing annotated data of failures. Central to our approach is a neuro-symbolic state representation, in the form of dense scene graph, structured based on the objects present within the environment. This enables efficient learning of the transition function and a discriminator that not only identifies failures but also localizes them facilitating fast re-planning via computation of heuristic distance function. We also present an anytime version of our algorithm, where instead of recovering to the last correct state, we search for a sub-goal in the original plan minimizing the total distance to the goal given a re-planning budget. Experiments on a physics simulator with a variety of simulated failures show the effectiveness of our approach compared to existing baselines, both in terms of efficiency as well as accuracy of our recovery mechanism.
△ Less
Submitted 29 May, 2024;
originally announced May 2024.
-
Sketch-Plan-Generalize: Continual Few-Shot Learning of Inductively Generalizable Spatial Concepts
Authors:
Namasivayam Kalithasan,
Sachit Sachdeva,
Himanshu Gaurav Singh,
Vishal Bindal,
Arnav Tuli,
Gurarmaan Singh Panjeta,
Divyanshu Aggarwal,
Rohan Paul,
Parag Singla
Abstract:
Our goal is to enable embodied agents to learn inductively generalizable spatial concepts, e.g., learning staircase as an inductive composition of towers of increasing height. Given a human demonstration, we seek a learning architecture that infers a succinct ${program}$ representation that explains the observed instance. Additionally, the approach should generalize inductively to novel structures…
▽ More
Our goal is to enable embodied agents to learn inductively generalizable spatial concepts, e.g., learning staircase as an inductive composition of towers of increasing height. Given a human demonstration, we seek a learning architecture that infers a succinct ${program}$ representation that explains the observed instance. Additionally, the approach should generalize inductively to novel structures of different sizes or complex structures expressed as a hierarchical composition of previously learned concepts. Existing approaches that use code generation capabilities of pre-trained large (visual) language models, as well as purely neural models, show poor generalization to a-priori unseen complex concepts. Our key insight is to factor inductive concept learning as (i) ${\it Sketch:}$ detecting and inferring a coarse signature of a new concept (ii) ${\it Plan:}$ performing MCTS search over grounded action sequences (iii) ${\it Generalize:}$ abstracting out grounded plans as inductive programs. Our pipeline facilitates generalization and modular reuse, enabling continual concept learning. Our approach combines the benefits of the code generation ability of large language models (LLM) along with grounded neural representations, resulting in neuro-symbolic programs that show stronger inductive generalization on the task of constructing complex structures in relation to LLM-only and neural-only approaches. Furthermore, we demonstrate reasoning and planning capabilities with learned concepts for embodied instruction following.
△ Less
Submitted 29 May, 2024; v1 submitted 11 April, 2024;
originally announced April 2024.
-
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
Authors:
Parth Sarthi,
Salman Abdullah,
Aditi Tuli,
Shubh Khanna,
Anna Goldie,
Christopher D. Manning
Abstract:
Retrieval-augmented language models can better adapt to changes in world state and incorporate long-tail knowledge. However, most existing methods retrieve only short contiguous chunks from a retrieval corpus, limiting holistic understanding of the overall document context. We introduce the novel approach of recursively embedding, clustering, and summarizing chunks of text, constructing a tree wit…
▽ More
Retrieval-augmented language models can better adapt to changes in world state and incorporate long-tail knowledge. However, most existing methods retrieve only short contiguous chunks from a retrieval corpus, limiting holistic understanding of the overall document context. We introduce the novel approach of recursively embedding, clustering, and summarizing chunks of text, constructing a tree with differing levels of summarization from the bottom up. At inference time, our RAPTOR model retrieves from this tree, integrating information across lengthy documents at different levels of abstraction. Controlled experiments show that retrieval with recursive summaries offers significant improvements over traditional retrieval-augmented LMs on several tasks. On question-answering tasks that involve complex, multi-step reasoning, we show state-of-the-art results; for example, by coupling RAPTOR retrieval with the use of GPT-4, we can improve the best performance on the QuALITY benchmark by 20% in absolute accuracy.
△ Less
Submitted 31 January, 2024;
originally announced January 2024.
-
Learning Neuro-symbolic Programs for Language Guided Robot Manipulation
Authors:
Namasivayam Kalithasan,
Himanshu Singh,
Vishal Bindal,
Arnav Tuli,
Vishwajeet Agrawal,
Rahul Jain,
Parag Singla,
Rohan Paul
Abstract:
Given a natural language instruction and an input scene, our goal is to train a model to output a manipulation program that can be executed by the robot. Prior approaches for this task possess one of the following limitations: (i) rely on hand-coded symbols for concepts limiting generalization beyond those seen during training [1] (ii) infer action sequences from instructions but require dense sub…
▽ More
Given a natural language instruction and an input scene, our goal is to train a model to output a manipulation program that can be executed by the robot. Prior approaches for this task possess one of the following limitations: (i) rely on hand-coded symbols for concepts limiting generalization beyond those seen during training [1] (ii) infer action sequences from instructions but require dense sub-goal supervision [2] or (iii) lack semantics required for deeper object-centric reasoning inherent in interpreting complex instructions [3]. In contrast, our approach can handle linguistic as well as perceptual variations, end-to-end trainable and requires no intermediate supervision. The proposed model uses symbolic reasoning constructs that operate on a latent neural object-centric representation, allowing for deeper reasoning over the input scene. Central to our approach is a modular structure consisting of a hierarchical instruction parser and an action simulator to learn disentangled action representations. Our experiments on a simulated environment with a 7-DOF manipulator, consisting of instructions with varying number of steps and scenes with different number of objects, demonstrate that our model is robust to such variations and significantly outperforms baselines, particularly in the generalization settings. The code, dataset and experiment videos are available at https://nsrmp.github.io
△ Less
Submitted 10 March, 2023; v1 submitted 12 November, 2022;
originally announced November 2022.
-
Deep Lake: a Lakehouse for Deep Learning
Authors:
Sasun Hambardzumyan,
Abhinav Tuli,
Levon Ghukasyan,
Fariz Rahman,
Hrant Topchyan,
David Isayan,
Mark McQuade,
Mikayel Harutyunyan,
Tatevik Hakobyan,
Ivo Stranic,
Davit Buniatyan
Abstract:
Traditional data lakes provide critical data infrastructure for analytical workloads by enabling time travel, running SQL queries, ingesting data with ACID transactions, and visualizing petabyte-scale datasets on cloud storage. They allow organizations to break down data silos, unlock data-driven decision-making, improve operational efficiency, and reduce costs. However, as deep learning usage inc…
▽ More
Traditional data lakes provide critical data infrastructure for analytical workloads by enabling time travel, running SQL queries, ingesting data with ACID transactions, and visualizing petabyte-scale datasets on cloud storage. They allow organizations to break down data silos, unlock data-driven decision-making, improve operational efficiency, and reduce costs. However, as deep learning usage increases, traditional data lakes are not well-designed for applications such as natural language processing (NLP), audio processing, computer vision, and applications involving non-tabular datasets. This paper presents Deep Lake, an open-source lakehouse for deep learning applications developed at Activeloop. Deep Lake maintains the benefits of a vanilla data lake with one key difference: it stores complex data, such as images, videos, annotations, as well as tabular data, in the form of tensors and rapidly streams the data over the network to (a) Tensor Query Language, (b) in-browser visualization engine, or (c) deep learning frameworks without sacrificing GPU utilization. Datasets stored in Deep Lake can be accessed from PyTorch, TensorFlow, JAX, and integrate with numerous MLOps tools.
△ Less
Submitted 13 December, 2022; v1 submitted 22 September, 2022;
originally announced September 2022.