Search | arXiv e-print repository

Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data

Authors: Cherie Ho, Jiaye Zou, Omar Alama, Sai Mitheran Jagadesh Kumar, Benjamin Chiang, Taneesh Gupta, Chen Wang, Nikhil Keetha, Katia Sycara, Sebastian Scherer

Abstract: Top-down Bird's Eye View (BEV) maps are a popular representation for ground robot navigation due to their richness and flexibility for downstream tasks. While recent methods have shown promise for predicting BEV maps from First-Person View (FPV) images, their generalizability is limited to small regions captured by current autonomous vehicle-based datasets. In this context, we show that a more scalable approach towards generalizable map prediction can be enabled by using two large-scale crowd-sourced mapping platforms, Mapillary for FPV images and OpenStreetMap for BEV semantic maps. We introduce Map It Anywhere (MIA), a data engine that enables seamless curation and modeling of labeled map prediction data from existing open-source map platforms. Using our MIA data engine, we display the ease of automatically collecting a dataset of 1.2 million pairs of FPV images & BEV maps encompassing diverse geographies, landscapes, environmental factors, camera models & capture scenarios. We further train a simple camera model-agnostic model on this data for BEV map prediction. Extensive evaluations using established benchmarks and our dataset show that the data curated by MIA enables effective pretraining for generalizable BEV map prediction, with zero-shot performance far exceeding baselines trained on existing datasets by 35%. Our analysis highlights the promise of using large-scale public maps for developing & testing generalizable BEV perception, paving the way for more robust autonomous navigation.

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2406.01377 [pdf, other]

Multi-Agent Transfer Learning via Temporal Contrastive Learning

Authors: Weihao Zeng, Joseph Campbell, Simon Stepputtis, Katia Sycara

Abstract: This paper introduces a novel transfer learning framework for deep multi-agent reinforcement learning. The approach automatically combines goal-conditioned policies with temporal contrastive learning to discover meaningful sub-goals. The approach involves pre-training a goal-conditioned agent, finetuning it on the target domain, and using contrastive learning to construct a planning graph that guides the agent via sub-goals. Experiments on multi-agent coordination Overcooked tasks demonstrate improved sample efficiency, the ability to solve sparse-reward and long-horizon problems, and enhanced interpretability compared to baselines. The results highlight the effectiveness of integrating goal-conditioned policies with unsupervised temporal abstraction learning for complex multi-agent transfer learning. Compared to state-of-the-art baselines, our method achieves the same or better performances while requiring only 21.7% of the training samples.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 6 pages, 6 figures

Journal ref: 2024 IEEE International Conference on Robotics and Automation (ICRA) 2024

arXiv:2404.04256 [pdf, other]

Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation

Authors: Zifu Wan, Yuhao Wang, Silong Yong, **** Zhang, Simon Stepputtis, Katia Sycara, Yaqi Xie

Abstract: Multi-modal semantic segmentation significantly enhances AI agents' perception and scene understanding, especially under adverse conditions like low-light or overexposed environments. Leveraging additional modalities (X-modality) like thermal and depth alongside traditional RGB provides complementary information, enabling more robust and reliable segmentation. In this work, we introduce Sigma, a Siamese Mamba network for multi-modal semantic segmentation, utilizing the Selective Structured State Space Model, Mamba. Unlike conventional methods that rely on CNNs, with their limited local receptive fields, or Vision Transformers (ViTs), which offer global receptive fields at the cost of quadratic complexity, our model achieves global receptive fields coverage with linear complexity. By employing a Siamese encoder and innovating a Mamba fusion mechanism, we effectively select essential information from different modalities. A decoder is then developed to enhance the channel-wise modeling ability of the model. Our method, Sigma, is rigorously evaluated on both RGB-Thermal and RGB-Depth segmentation tasks, demonstrating its superiority and marking the first successful application of State Space Models (SSMs) in multi-modal perception tasks. Code is available at https://github.com/zifuwan/Sigma.

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2403.18062 [pdf, other]

ShapeGrasp: Zero-Shot Task-Oriented Grasping with Large Language Models through Geometric Decomposition

Authors: Samuel Li, Sarthak Bhagat, Joseph Campbell, Yaqi Xie, Woojun Kim, Katia Sycara, Simon Stepputtis

Abstract: Task-oriented grasping of unfamiliar objects is a necessary skill for robots in dynamic in-home environments. Inspired by the human capability to grasp such objects through intuition about their shape and structure, we present a novel zero-shot task-oriented grasping method leveraging a geometric decomposition of the target object into simple, convex shapes that we represent in a graph structure, including geometric attributes and spatial relationships. Our approach employs minimal essential information - the object's name and the intended task - to facilitate zero-shot task-oriented grasping. We utilize the commonsense reasoning capabilities of large language models to dynamically assign semantic meaning to each decomposed part and subsequently reason over the utility of each part for the intended task. Through extensive experiments on a real-world robotics platform, we demonstrate that our grasping approach's decomposition and reasoning pipeline is capable of selecting the correct part in 92% of the cases and successfully grasping the object in 82% of the tasks we evaluate. Additional videos, experiments, code, and data are available on our project website: https://shapegrasp.github.io/.

Submitted 26 March, 2024; originally announced March 2024.

Comments: 8 pages

arXiv:2403.15974 [pdf, other]

CBGT-Net: A Neuromimetic Architecture for Robust Classification of Streaming Data

Authors: Shreya Sharma, Dana Hughes, Katia Sycara

Abstract: This paper describes CBGT-Net, a neural network model inspired by the cortico-basal ganglia-thalamic (CBGT) circuits found in mammalian brains. Unlike traditional neural network models, which either generate an output for each provided input, or an output after a fixed sequence of inputs, the CBGT-Net learns to produce an output after a sufficient criterion for evidence is achieved from a stream of observed data. For each observation, the CBGT-Net generates a vector that explicitly represents the amount of evidence the observation provides for each potential decision, accumulates the evidence over time, and generates a decision when the accumulated evidence exceeds a pre-defined threshold. We evaluate the proposed model on two image classification tasks, where models need to predict image categories based on a stream of small patches extracted from the image. We show that the CBGT-Net provides improved accuracy and robustness compared to models trained to classify from a single patch, and models leveraging an LSTM layer to classify from a fixed sequence length of patches.

Submitted 23 March, 2024; originally announced March 2024.
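
Note on the CBGT-Net entry above: the accumulate-then-decide rule described in its abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `accumulate_until_threshold`, the plain-list evidence vectors, and the tie-breaking by lowest index are all assumptions made for illustration; in the paper the evidence vectors come from a learned network, which is omitted here.

```python
from typing import Iterable, Optional, Sequence, Tuple


def accumulate_until_threshold(
    evidence_stream: Iterable[Sequence[float]],
    threshold: float,
) -> Tuple[Optional[int], int]:
    """Sum per-observation evidence vectors and decide once a total exceeds the threshold.

    Returns (decision_index, observations_used). decision_index is None if the
    stream ends before any accumulated total exceeds the threshold.
    Hypothetical helper: evidence vectors are assumed to be given, not learned.
    """
    totals = None
    steps = 0
    for evidence in evidence_stream:
        steps += 1
        if totals is None:
            # Lazily size the accumulator from the first evidence vector.
            totals = [0.0] * len(evidence)
        # Accumulate evidence for every candidate decision.
        totals = [t + e for t, e in zip(totals, evidence)]
        # Pick the candidate with the most accumulated evidence so far.
        best = max(range(len(totals)), key=totals.__getitem__)
        if totals[best] > threshold:
            return best, steps
    return None, steps


if __name__ == "__main__":
    stream = [[0.5, 0.1], [0.4, 0.2], [0.6, 0.1]]
    print(accumulate_until_threshold(stream, threshold=1.2))
```

The number of observations consumed depends on the stream itself, which is the contrast the abstract draws with models that classify from a single input or from a fixed-length sequence.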

arXiv:2403.12964 [pdf, other]

Negative Yields Positive: Unified Dual-Path Adapter for Vision-Language Models

Authors: Ce Zhang, Simon Stepputtis, Katia Sycara, Yaqi Xie

Abstract: Recently, large-scale pre-trained Vision-Language Models (VLMs) have demonstrated great potential in learning open-world visual representations, and exhibit remarkable performance across a wide range of downstream tasks through efficient fine-tuning. In this work, we innovatively introduce the concept of dual learning into fine-tuning VLMs, i.e., we not only learn what an image is, but also what an image isn't. Building on this concept, we introduce a novel DualAdapter approach to enable dual-path adaptation of VLMs from both positive and negative perspectives with only limited annotated samples. In the inference stage, our DualAdapter performs unified predictions by simultaneously conducting complementary positive selection and negative exclusion across target classes, thereby enhancing the overall recognition accuracy of VLMs in downstream tasks. Our extensive experimental results across 15 datasets validate that the proposed DualAdapter outperforms existing state-of-the-art methods on both few-shot learning and domain generalization tasks while achieving competitive computational efficiency. Code is available at https://github.com/zhangce01/DualAdapter.

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.12033 [pdf, other]

HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation

Authors: Ce Zhang, Simon Stepputtis, Joseph Campbell, Katia Sycara, Yaqi Xie

Abstract: Being able to understand visual scenes is a precursor for many downstream tasks, including autonomous driving, robotics, and other vision-based approaches. A common approach enabling the ability to reason over visual data is Scene Graph Generation (SGG); however, many existing approaches assume undisturbed vision, i.e., the absence of real-world corruptions such as fog, snow, smoke, as well as non-uniform perturbations like sun glare or water drops. In this work, we propose a novel SGG benchmark containing procedurally generated weather corruptions and other transformations over the Visual Genome dataset. Further, we introduce a corresponding approach, Hierarchical Knowledge Enhanced Robust Scene Graph Generation (HiKER-SGG), providing a strong baseline for scene graph generation under such challenging settings. At its core, HiKER-SGG utilizes a hierarchical knowledge graph in order to refine its predictions from coarse initial estimates to detailed predictions. In our extensive experiments, we show that HiKER-SGG not only demonstrates superior performance on corrupted images in a zero-shot manner, but also outperforms current state-of-the-art methods on uncorrupted SGG tasks. Code is available at https://github.com/zhangce01/HiKER-SGG.

Submitted 18 March, 2024; originally announced March 2024.

Comments: Accepted by CVPR 2024. Project page: https://zhangce01.github.io/HiKER-SGG

arXiv:2402.08772 [pdf, other]

Optimal Task Assignment and Path Planning using Conflict-Based Search with Precedence and Temporal Constraints

Authors: Yu Quan Chong, Jiaoyang Li, Katia Sycara

Abstract: The Multi-Agent Path Finding (MAPF) problem entails finding collision-free paths for a set of agents, guiding them from their start to goal locations. However, MAPF does not account for several practical task-related constraints. For example, agents may need to perform actions at goal locations with specific execution times, adhering to predetermined orders and timeframes. Moreover, goal assignments may not be predefined for agents, and the optimization objective may lack an explicit definition. To incorporate task assignment, path planning, and a user-defined objective into a coherent framework, this paper examines the Task Assignment and Path Finding with Precedence and Temporal Constraints (TAPF-PTC) problem. We augment Conflict-Based Search (CBS) to simultaneously generate task assignments and collision-free paths that adhere to precedence and temporal constraints, maximizing an objective quantified by the return from a user-defined reward function in reinforcement learning (RL). Experimentally, we demonstrate that our algorithm, CBS-TA-PTC, can solve highly challenging bomb-defusing tasks with precedence and temporal constraints efficiently relative to MARL and adapted Target Assignment and Path Finding (TAPF) methods.

Submitted 21 April, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

ACM Class: I.2.11

arXiv:2312.09159 [pdf, other]

WIT-UAS: A Wildland-fire Infrared Thermal Dataset to Detect Crew Assets From Aerial Views

Authors: Andrew Jong, Mukai Yu, Devansh Dhrafani, Siva Kailas, Brady Moon, Katia Sycara, Sebastian Scherer

Abstract: We present the Wildland-fire Infrared Thermal (WIT-UAS) dataset for long-wave infrared sensing of crew and vehicle assets amidst prescribed wildland fire environments. While such a dataset is crucial for safety monitoring in wildland fire applications, to the authors' awareness, no such dataset focusing on assets near fire is publicly available. Presumably, this is due to the barrier to entry of collaborating with fire management personnel. We present two related data subsets: WIT-UAS-ROS consists of full ROS bag files containing sensor and robot data of UAS flight over the fire, and WIT-UAS-Image contains hand-labeled long-wave infrared (LWIR) images extracted from WIT-UAS-ROS. Our dataset is the first to focus on asset detection in a wildland fire environment. We show that thermal detection models trained without fire data frequently detect false positives by classifying fire as people. By adding our dataset to training, we show that the false positive rate is reduced significantly. Yet asset detection in wildland fire environments is still significantly more challenging than detection in urban environments, due to dense obscuring trees, greater heat variation, and overbearing thermal signal of the fire. We publicize this dataset to encourage the community to study more advanced models to tackle this challenging environment. The dataset, code and pretrained models are available at https://github.com/castacks/WIT-UAS-Dataset.

Submitted 14 December, 2023; originally announced December 2023.

Comments: Accepted for publication in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2023

arXiv:2312.08782 [pdf, other]

Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis

Authors: Yafei Hu, Quanting Xie, Vidhi Jain, Jonathan Francis, Jay Patrikar, Nikhil Keetha, Seungchan Kim, Yaqi Xie, Tianyi Zhang, Shibo Zhao, Yu Quan Chong, Chen Wang, Katia Sycara, Matthew Johnson-Roberson, Dhruv Batra, Xiaolong Wang, Sebastian Scherer, Zsolt Kira, Fei Xia, Yonatan Bisk

Abstract: Building general-purpose robots that can operate seamlessly, in any environment, with any object, and utilizing various skills to complete diverse tasks has been a long-standing goal in Artificial Intelligence. Unfortunately, however, most existing robotic systems have been constrained - having been designed for specific tasks, trained on specific datasets, and deployed within specific environments. These systems usually require extensively-labeled data, rely on task-specific models, have numerous generalization issues when deployed in real-world scenarios, and struggle to remain robust to distribution shifts. Motivated by the impressive open-set performance and content generation capabilities of web-scale, large-capacity pre-trained models (i.e., foundation models) in research fields such as Natural Language Processing (NLP) and Computer Vision (CV), we devote this survey to exploring (i) how these existing foundation models from NLP and CV can be applied to the field of robotics, and also exploring (ii) what a robotics-specific foundation model would look like. We begin by providing an overview of what constitutes a conventional robotic system and the fundamental barriers to making it universally applicable. Next, we establish a taxonomy to discuss current work exploring ways to leverage existing foundation models for robotics and develop ones catered to robotics. Finally, we discuss key challenges and promising future directions in using foundation models for enabling general-purpose robotic systems.
We encourage readers to view our living GitHub repository of resources, including papers reviewed in this survey as well as related projects and repositories for developing foundation models for robotics.

Submitted 15 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

arXiv:2312.08397 [pdf, other]

Personalized Decision Supports based on Theory of Mind Modeling and Explainable Reinforcement Learning

Authors: Huao Li, Yao Fan, Keyang Zheng, Michael Lewis, Katia Sycara

Abstract: In this paper, we propose a novel personalized decision support system that combines Theory of Mind (ToM) modeling and explainable Reinforcement Learning (XRL) to provide effective and interpretable interventions. Our method leverages DRL to provide expert action recommendations while incorporating ToM modeling to understand users' mental states and predict their future actions, enabling appropriate timing for intervention. To explain interventions, we use counterfactual explanations based on RL's feature importance and users' ToM model structure. Our proposed system generates accurate and personalized interventions that are easily interpretable by end-users. We demonstrate the effectiveness of our approach through a series of crowd-sourcing experiments in a simulated team decision-making task, where our system outperforms control baselines in terms of task performance. Our proposed approach is agnostic to task environment and RL model structure, therefore has the potential to be generalized to a wide range of applications.

Submitted 12 December, 2023; originally announced December 2023.

Comments: Accepted to IEEE SMC 2023

arXiv:2312.00192 [pdf, other]

Benchmarking and Enhancing Disentanglement in Concept-Residual Models

Authors: Renos Zabounidis, Ini Oguntola, Konghao Zhao, Joseph Campbell, Simon Stepputtis, Katia Sycara

Abstract: Concept bottleneck models (CBMs) are interpretable models that first predict a set of semantically meaningful features, i.e., concepts, from observations that are subsequently used to condition a downstream task. However, the model's performance strongly depends on the engineered features and can severely suffer from incomplete sets of concepts. Prior works have proposed a side channel -- a residual -- that allows for unconstrained information flow to the downstream task, thus improving model performance but simultaneously introducing information leakage, which is undesirable for interpretability. This work proposes three novel approaches to mitigate information leakage by disentangling concepts and residuals, investigating the critical balance between model performance and interpretability. Through extensive empirical analysis on the CUB, OAI, and CIFAR 100 datasets, we assess the performance of each disentanglement method and provide insights into when they work best. Further, we show how each method impacts the ability to intervene over the concepts and their subsequent impact on task performance.

Submitted 30 November, 2023; originally announced December 2023.

arXiv:2311.18062 [pdf, other]

Understanding Your Agent: Leveraging Large Language Models for Behavior Explanation

Authors: Xijia Zhang, Yue Guo, Simon Stepputtis, Katia Sycara, Joseph Campbell

Abstract: Intelligent agents such as robots are increasingly deployed in real-world, safety-critical settings. It is vital that these agents are able to explain the reasoning behind their decisions to human counterparts; however, their behavior is often produced by uninterpretable models such as deep neural networks. We propose an approach to generate natural language explanations for an agent's behavior based only on observations of states and actions, thus making our method independent from the underlying model's representation. For such models, we first learn a behavior representation and subsequently use it to produce plausible explanations with minimal hallucination while affording user interaction with a pre-trained large language model. We evaluate our method in a multi-agent search-and-rescue environment and demonstrate the effectiveness of our explanations for agents executing various behaviors. Through user studies and empirical experiments, we show that our approach generates explanations as helpful as those produced by a human domain expert while enabling beneficial interactions such as clarification and counterfactual queries.

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.05720 [pdf, other]

Long-Horizon Dialogue Understanding for Role Identification in the Game of Avalon with Large Language Models

Authors: Simon Stepputtis, Joseph Campbell, Yaqi Xie, Zhengyang Qi, Wenxin Sharon Zhang, Ruiyi Wang, Sanketh Rangreji, Michael Lewis, Katia Sycara

Abstract: Deception and persuasion play a critical role in long-horizon dialogues between multiple parties, especially when the interests, goals, and motivations of the participants are not aligned. Such complex tasks pose challenges for current Large Language Models (LLM) as deception and persuasion can easily mislead them, especially in long-horizon multi-party dialogues. To this end, we explore the game of Avalon: The Resistance, a social deduction game in which players must determine each other's hidden identities to complete their team's objective. We introduce an online testbed and a dataset containing 20 carefully collected and labeled games among human players that exhibit long-horizon deception in a cooperative-competitive setting. We discuss the capabilities of LLMs to utilize deceptive long-horizon conversations between six human players to determine each player's goal and motivation. Particularly, we discuss the multimodal integration of the chat between the players and the game's state that grounds the conversation, providing further insights into the true player identities. We find that even current state-of-the-art LLMs do not reach human performance, making our dataset a compelling benchmark to investigate the decision-making and language-processing capabilities of LLMs. Our dataset and online testbed can be found at our project website: https://sstepput.github.io/Avalon-NLU/

Submitted 9 November, 2023; originally announced November 2023.

Comments: Accepted to the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP, Findings of the Association for Computational Linguistics)

arXiv:2310.10701 [pdf, other]

doi 10.18653/v1/2023.emnlp-main.13

Theory of Mind for Multi-Agent Collaboration via Large Language Models

Authors: Huao Li, Yu Quan Chong, Simon Stepputtis, Joseph Campbell, Dana Hughes, Michael Lewis, Katia Sycara

Abstract: While Large Language Models (LLMs) have demonstrated impressive accomplishments in both reasoning and planning, their abilities in multi-agent collaborations remain largely unexplored. This study evaluates LLM-based agents in a multi-agent cooperative text game with Theory of Mind (ToM) inference tasks, comparing their performance with Multi-Agent Reinforcement Learning (MARL) and planning-based baselines. We observed evidence of emergent collaborative behaviors and high-order Theory of Mind capabilities among LLM-based agents. Our results reveal limitations in LLM-based agents' planning optimization due to systematic failures in managing long-horizon contexts and hallucination about the task state. We explore the use of explicit belief state representations to mitigate these issues, finding that it enhances task performance and the accuracy of ToM inferences for LLM-based agents.

Submitted 26 June, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

Comments: Accepted to EMNLP 2023 (Main Conference). Code available at https://github.com/romanlee6/multi_LLM_comm

Journal ref: in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Page 180-192, ACL

arXiv:2309.10346 [pdf, other]

Explaining Agent Behavior with Large Language Models

Authors: Xijia Zhang, Yue Guo, Simon Stepputtis, Katia Sycara, Joseph Campbell

Abstract: Intelligent agents such as robots are increasingly deployed in real-world, safety-critical settings. It is vital that these agents are able to explain the reasoning behind their decisions to human counterparts; however, their behavior is often produced by uninterpretable models such as deep neural networks. We propose an approach to generate natural language explanations for an agent's behavior based only on observations of states and actions, agnostic to the underlying model representation. We show how a compact representation of the agent's behavior can be learned and used to produce plausible explanations with minimal hallucination while affording user interaction with a pre-trained large language model. Through user studies and empirical experiments, we show that our approach generates explanations as helpful as those generated by a human domain expert while enabling beneficial interactions such as clarification and counterfactual queries.

Submitted 19 September, 2023; originally announced September 2023.

Comments: Human Multi-Robot Interaction Workshop at IROS 2023

arXiv:2309.05943 [pdf, other]

Knowledge-Guided Short-Context Action Anticipation in Human-Centric Videos

Authors: Sarthak Bhagat, Simon Stepputtis, Joseph Campbell, Katia Sycara

Abstract: This work focuses on anticipating long-term human actions, particularly using short video segments, which can speed up editing workflows through improved suggestions while fostering creativity by suggesting narratives. To this end, we imbue a transformer network with a symbolic knowledge graph for action anticipation in video segments by boosting certain aspects of the transformer's attention mechanism at run-time. Demonstrated on two benchmark datasets, Breakfast and 50Salads, our approach outperforms current state-of-the-art methods for long-term action anticipation using short video context by up to 9%.

Submitted 11 September, 2023; originally announced September 2023.

Comments: ICCV 2023 Workshop on AI for Creative Video Editing and Understanding

arXiv:2307.01158 [pdf, other]

Theory of Mind as Intrinsic Motivation for Multi-Agent Reinforcement Learning

Authors: Ini Oguntola, Joseph Campbell, Simon Stepputtis, Katia Sycara

Abstract: The ability to model the mental states of others is crucial to human social intelligence, and can offer similar benefits to artificial agents with respect to the social dynamics induced in multi-agent settings. We present a method of grounding semantically meaningful, human-interpretable beliefs within policies modeled by deep networks. We then consider the task of 2nd-order belief prediction. We propose that the ability of each agent to predict the beliefs of the other agents can be used as an intrinsic reward signal for multi-agent reinforcement learning. Finally, we present preliminary empirical results in a mixed cooperative-competitive environment.

Submitted 18 July, 2023; v1 submitted 3 July, 2023; originally announced July 2023.

Comments: To appear at ICML 2023 Workshop on Theory of Mind

arXiv:2307.00663 [pdf, other]

Solving Multi-Agent Target Assignment and Path Finding with a Single Constraint Tree

Authors: Yimin Tang, Zhongqiang Ren, Jiaoyang Li, Katia Sycara

Abstract: The Combined Target-Assignment and Path-Finding problem (TAPF) requires simultaneously assigning targets to agents and planning collision-free paths for agents from their start locations to their assigned targets. As a leading approach to address TAPF, Conflict-Based Search with Target Assignment (CBS-TA) leverages both K-best target assignments to create multiple search trees and Conflict-Based Search (CBS) to resolve collisions in each search tree. While being able to find an optimal solution, CBS-TA suffers from poor scalability due to the duplicated collision resolution in multiple trees and the expensive computation of K-best assignments. We therefore develop Incremental Target Assignment CBS (ITA-CBS) to bypass these two computational bottlenecks. ITA-CBS generates only a single search tree and avoids computing K-best assignments by incrementally computing new 1-best assignments during the search. We show that, in theory, ITA-CBS is guaranteed to find an optimal solution and, in practice, is computationally efficient.

Submitted 23 October, 2023; v1 submitted 2 July, 2023; originally announced July 2023.

arXiv:2306.16265 [pdf, other]

Reconfigurable Robot Control Using Flexible Coupling Mechanisms

Authors: Sha Yi, Katia Sycara, Zeynep Temel

Abstract: Reconfigurable robot swarms are capable of connecting with each other to form complex structures. Current mechanical or magnetic connection mechanisms can be complicated to manufacture, consume high power, have a limited load-bearing capacity, or can only form rigid structures. In this paper, we present our low-cost soft anchor design that enables flexible coupling and decoupling between robots. Our asymmetric anchor requires minimal force to be pushed into the opening of another robot while having a strong pulling force so that the connection between robots can be secured. To maintain this flexible coupling mechanism as an assembled structure, we present our Model Predictive Control (MPC) frameworks with polygon constraints to model the geometric relationship between robots. We conducted experiments on the soft anchor to obtain its force profile, which informed the three-bar linkage model of the anchor in the simulations. We show that the proposed mechanism and MPC frameworks enable the robots to couple, decouple, and perform various behaviors in both the simulation environment and hardware platform. Our code is available at https://github.com/ZoomLabCMU/puzzlebot_anchor . Video is available at https://www.youtube.com/watch?v=R3gFplorCJg .

Submitted 28 June, 2023; originally announced June 2023.

arXiv:2306.12314 [pdf, other]

Introspective Action Advising for Interpretable Transfer Learning

Authors: Joseph Campbell, Yue Guo, Fiona Xie, Simon Stepputtis, Katia Sycara

Abstract: Transfer learning can be applied in deep reinforcement learning to accelerate the training of a policy in a target task by transferring knowledge from a policy learned in a related source task. This is commonly achieved by copying pretrained weights from the source policy to the target policy prior to training, under the constraint that they use the same model architecture. However, not only does this require a robust representation learned over a wide distribution of states -- often failing to transfer between specialist models trained over single tasks -- but it is largely uninterpretable and provides little indication of what knowledge is transferred. In this work, we propose an alternative approach to transfer learning between tasks based on action advising, in which a teacher trained in a source task actively guides a student's exploration in a target task. Through introspection, the teacher is capable of identifying when advice is beneficial to the student and should be given, and when it is not. Our approach allows knowledge transfer between policies agnostic of the underlying representations, and we empirically show that this leads to improved convergence rates in Gridworld and Atari environments while providing insight into what knowledge is transferred.

Submitted 21 June, 2023; originally announced June 2023.

Comments: Accepted to CoLLAs 2023

arXiv:2306.09482 [pdf, other]

Sample-Efficient Learning of Novel Visual Concepts

Authors: Sarthak Bhagat, Simon Stepputtis, Joseph Campbell, Katia Sycara

Abstract: Despite the advances made in visual object recognition, state-of-the-art deep learning models struggle to effectively recognize novel objects in a few-shot setting where only a limited number of examples are provided. Unlike humans who excel at such tasks, these models often fail to leverage known relationships between entities in order to draw conclusions about such objects. In this work, we show that incorporating a symbolic knowledge graph into a state-of-the-art recognition model enables a new approach for effective few-shot classification. In our proposed neuro-symbolic architecture and training methodology, the knowledge graph is augmented with additional relationships extracted from a small set of examples, improving its ability to recognize novel objects by considering the presence of interconnected entities. Unlike existing few-shot classifiers, we show that this enables our model to incorporate not only objects but also abstract concepts and affordances. The existence of the knowledge graph also makes this approach amenable to interpretability through analysis of the relationships contained within it. We empirically show that our approach outperforms current state-of-the-art few-shot multi-label classification methods on the COCO dataset and evaluate the addition of abstract concepts and affordances on the Visual Genome dataset.

Submitted 15 June, 2023; originally announced June 2023.

arXiv:2305.15640 [pdf, other]

Characterizing Out-of-Distribution Error via Optimal Transport

Authors: Yuzhe Lu, Yilong Qin, Runtian Zhai, Andrew Shen, Ketong Chen, Zhenlin Wang, Soheil Kolouri, Simon Stepputtis, Joseph Campbell, Katia Sycara

Abstract: Out-of-distribution (OOD) data poses serious challenges in deployed machine learning models, so methods of predicting a model's performance on OOD data without labels are important for machine learning safety. While a number of methods have been proposed by prior work, they often underestimate the actual error, sometimes by a large margin, which greatly impacts their applicability to real tasks. In this work, we identify pseudo-label shift, or the difference between the predicted and true OOD label distributions, as a key indicator of this underestimation. Based on this observation, we introduce a novel method for estimating model performance by leveraging optimal transport theory, Confidence Optimal Transport (COT), and show that it provably provides more robust error estimates in the presence of pseudo-label shift. Additionally, we introduce an empirically-motivated variant of COT, Confidence Optimal Transport with Thresholding (COTT), which applies thresholding to the individual transport costs and further improves the accuracy of COT's error estimates. We evaluate COT and COTT on a variety of standard benchmarks that induce various types of distribution shift -- synthetic, novel subpopulation, and natural -- and show that our approaches significantly outperform existing state-of-the-art methods with up to 3x lower prediction error.

Submitted 27 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: NeurIPS 2023

arXiv:2303.02259 [pdf, other]

Graph-based Simultaneous Coverage and Exploration Planning for Fast Multi-robot Search

Authors: Indraneel Patil, Rachel Zheng, Charvi Gupta, Jaekyung Song, Narendar Sriram, Katia Sycara

Abstract: In large unknown environments, search operations can be much more time-efficient with the use of multi-robot fleets by parallelizing efforts. This means robots must efficiently perform collaborative mapping (exploration) while simultaneously searching an area for victims (coverage). Previous simultaneous mapping and planning techniques treat these problems as separate and do not take advantage of the possibility for a unified approach. We propose a novel exploration-coverage planner which bridges the mapping and search domains by growing sets of random trees rooted upon a pose graph produced through mapping to generate points of interest, or tasks. Furthermore, it is important for the robots to first prioritize high information tasks to locate the greatest number of victims in minimum time by balancing coverage and exploration, which current methods do not address. Towards this goal, we also present a new multi-robot task allocator that formulates a notion of a hierarchical information heuristic for time-critical collaborative search. Our results show that our algorithm produces 20% more coverage efficiency, defined as average covered area per second, compared to the existing state-of-the-art. Our algorithms and the rest of our multi-robot search stack are based on ROS and made open source.

Submitted 3 March, 2023; originally announced March 2023.

Comments: Submitted to IROS 2023 on 1st March

arXiv:2302.14276 [pdf, other]

On the Role of Emergent Communication for Social Learning in Multi-Agent Reinforcement Learning

Authors: Seth Karten, Siva Kailas, Huao Li, Katia Sycara

Abstract: Explicit communication among humans is key to coordinating and learning. Social learning, which uses cues from experts, can greatly benefit from the usage of explicit communication to align heterogeneous policies, reduce sample complexity, and solve partially observable tasks. Emergent communication, a type of explicit communication, studies the creation of an artificial language to encode a high task-utility message directly from data. However, in most cases, emergent communication sends insufficiently compressed messages with little or no information, which also may not be understandable to a third-party listener. This paper proposes an unsupervised method based on the information bottleneck to capture both referential complexity and task-specific utility to adequately explore sparse social communication scenarios in multi-agent reinforcement learning (MARL). We show that our model is able to i) develop a natural-language-inspired lexicon of messages that is independently composed of a set of emergent concepts, which span the observations and intents with minimal bits, ii) develop communication to align the action policies of heterogeneous agents with dissimilar feature models, and iii) learn a communication policy from watching an expert's action policy, which we term 'social shadowing'.

Submitted 27 February, 2023; originally announced February 2023.

Comments: 14 pages, 5 figures

arXiv:2302.12232 [pdf, other]

Concept Learning for Interpretable Multi-Agent Reinforcement Learning

Authors: Renos Zabounidis, Joseph Campbell, Simon Stepputtis, Dana Hughes, Katia Sycara

Abstract: Multi-agent robotic systems are increasingly operating in real-world environments in close proximity to humans, yet are largely controlled by policy models with inscrutable deep neural network representations. We introduce a method for incorporating interpretable concepts from a domain expert into models trained through multi-agent reinforcement learning, by requiring the model to first predict such concepts then utilize them for decision making. This allows an expert to both reason about the resulting concept policy models in terms of these high-level concepts at run-time, as well as intervene and correct mispredictions to improve performance. We show that this yields improved interpretability and training stability, with benefits to policy performance and sample efficiency in a simulated and real-world cooperative-competitive multi-agent game.

Submitted 23 February, 2023; originally announced February 2023.

Comments: Accepted to the 6th Conference on Robot Learning (CoRL 2022), Auckland, New Zealand

arXiv:2302.05018 [pdf, other]

Predicting Out-of-Distribution Error with Confidence Optimal Transport

Authors: Yuzhe Lu, Zhenlin Wang, Runtian Zhai, Soheil Kolouri, Joseph Campbell, Katia Sycara

Abstract: Out-of-distribution (OOD) data poses serious challenges in deployed machine learning models as even subtle changes could incur significant performance drops. Being able to estimate a model's performance on test data is important in practice as it indicates when to trust the model's decisions. We present a simple yet effective method to predict a model's performance on an unknown distribution without any additional annotation. Our approach is rooted in the Optimal Transport theory, viewing test samples' output softmax scores from deep neural networks as empirical samples from an unknown distribution. We show that our method, Confidence Optimal Transport (COT), provides robust estimates of a model's performance on a target domain. Despite its simplicity, our method achieves state-of-the-art results on three benchmark datasets and outperforms existing methods by a large margin.

Submitted 9 February, 2023; originally announced February 2023.

arXiv:2301.03293 [pdf, other]

Distributed Multirobot Control for Non-Cooperative Herding

Authors: Nishant Mohanty, Jaskaran Grover, Changliu Liu, Katia Sycara

Abstract: In this paper, we consider the problem of protecting a high-value area from being breached by sheep agents by crafting motions for dog robots. We use control barrier functions to pose constraints on the dogs' velocities that induce repulsion in the sheep relative to the high-value area. This paper extends the results developed in our prior work on the same topic in three ways. Firstly, we implement and validate our previously developed centralized herding algorithm on many robots. We show herding of up to five sheep agents using three dog robots. Secondly, as an extension to the centralized approach, we develop two distributed herding algorithms, one favoring feasibility and the other favoring optimality. In the first algorithm, we allocate a unique sheep to a unique dog, making that dog responsible for herding its allocated sheep away from the protected zone. We provide a feasibility proof for this approach, along with numerical simulations. In the second algorithm, we develop an iterative distributed reformulation of the centralized algorithm, which inherits the optimality (i.e. budget efficiency) from the centralized approach. Lastly, we conduct real-world experiments of these distributed algorithms and demonstrate herding of up to five sheep agents using five dog robots.

Submitted 5 March, 2023; v1 submitted 9 January, 2023; originally announced January 2023.

arXiv:2212.00115 [pdf, other]

Towards True Lossless Sparse Communication in Multi-Agent Systems

Authors: Seth Karten, Mycal Tucker, Siva Kailas, Katia Sycara

Abstract: Communication enables agents to cooperate to achieve their goals. Learning when to communicate, i.e., sparse (in time) communication, and whom to message is particularly important when bandwidth is limited. Recent work in learning sparse individualized communication, however, suffers from high variance during training, where decreasing communication comes at the cost of decreased reward, particularly in cooperative tasks. We use the information bottleneck to reframe sparsity as a representation learning problem, which we show naturally enables lossless sparse communication at lower budgets than prior art. In this paper, we propose a method for true lossless sparsity in communication via Information Maximizing Gated Sparse Multi-Agent Communication (IMGS-MAC). Our model uses two individualized regularization objectives, an information maximization autoencoder and sparse communication loss, to create informative and sparse communication. We evaluate the learned communication 'language' through direct causal analysis of messages in non-sparse runs to determine the range of lossless sparse budgets, which allow zero-shot sparsity, and the range of sparse budgets that will incur a reward loss, which is minimized by our learned gating function with few-shot sparsity. To demonstrate the efficacy of our results, we experiment in cooperative multi-agent tasks where communication is essential for success. We evaluate our model with both continuous and discrete messages. We focus our analysis on a variety of ablations to show the effect of message representations, including their properties, and lossless performance of our model.

Submitted 30 November, 2022; originally announced December 2022.

Comments: 12 pages, 6 figures

arXiv:2211.07882 [pdf, other]

Explainable Action Advising for Multi-Agent Reinforcement Learning

Authors: Yue Guo, Joseph Campbell, Simon Stepputtis, Ruiyu Li, Dana Hughes, Fei Fang, Katia Sycara

Abstract: Action advising is a knowledge transfer technique for reinforcement learning based on the teacher-student paradigm. An expert teacher provides advice to a student during training in order to improve the student's sample efficiency and policy performance. Such advice is commonly given in the form of state-action pairs. However, this form makes it difficult for the student to reason with the advice and apply it to novel states. We introduce Explainable Action Advising, in which the teacher provides action advice as well as associated explanations indicating why the action was chosen. This allows the student to self-reflect on what it has learned, enabling advice generalization and leading to improved sample efficiency and learning performance - even in environments where the teacher is sub-optimal. We empirically show that our framework is effective in both single-agent and multi-agent scenarios, yielding improved policy returns and convergence rates when compared to state-of-the-art methods.

Submitted 16 June, 2023; v1 submitted 14 November, 2022; originally announced November 2022.

arXiv:2208.12252 [pdf, ps, other]

Control Barrier Functions-based Semi-Definite Programs (CBF-SDPs): Robust Safe Control For Dynamic Systems with Relative Degree Two Safety Indices

Authors: Jaskaran Singh Grover, Changliu Liu, Katia Sycara

Abstract: In this draft article, we consider the problem of achieving safe control of a dynamic system for which the safety index (or, loosely, the control barrier function) has relative degree equal to two. We consider parameter affine nonlinear dynamic systems and assume that the parametric uncertainty is uniform and known a-priori or being updated online through an estimator/parameter adaptation law. Under this uncertainty, the usual CBF-QP safe control approach takes the form of a robust optimization problem. Both the right hand side and left hand side of the inequality constraints depend on the unknown parameter. With the given representation of uncertainty, the CBF-QP safe control ends up being a convex semi-infinite problem. Using two different philosophies, one based on weak duality and another based on the lossless S-procedure, we arrive at identical SDP formulations of this robust CBF-QP problem. Thus we show that the problem of computing safe controls with known parametric uncertainty can be posed as a tractable convex problem and be solved online. (This is work in progress).

Submitted 25 August, 2022; originally announced August 2022.

arXiv:2206.02095 [pdf, other]

ARC - Actor Residual Critic for Adversarial Imitation Learning

Authors: Ankur Deka, Changliu Liu, Katia Sycara

Abstract: Adversarial Imitation Learning (AIL) is a class of popular state-of-the-art Imitation Learning algorithms commonly used in robotics. In AIL, an artificial adversary's misclassification is used as a reward signal that is optimized by any standard Reinforcement Learning (RL) algorithm. Unlike most RL settings, the reward in AIL is differentiable but current model-free RL algorithms do not make use of this property to train a policy. The reward in AIL is also shaped since it comes from an adversary. We leverage the differentiability property of the shaped AIL reward function and formulate a class of Actor Residual Critic (ARC) RL algorithms. ARC algorithms draw a parallel to the standard Actor-Critic (AC) algorithms in RL literature and use a residual critic, $C$ function (instead of the standard $Q$ function) to approximate only the discounted future return (excluding the immediate reward). ARC algorithms have similar convergence properties as the standard AC algorithms with the additional advantage that the gradient through the immediate reward is exact. For the discrete (tabular) case with finite states, actions, and known dynamics, we prove that policy iteration with $C$ function converges to an optimal policy. In the continuous case with function approximation and unknown dynamics, we experimentally show that ARC aided AIL outperforms standard AIL in simulated continuous-control and real robotic manipulation tasks. ARC algorithms are simple to implement and can be incorporated into any existing AIL implementation with an AC algorithm. Video and link to code are available at: https://sites.google.com/view/actor-residual-critic.
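
The relationship between the residual critic and the standard critic in the abstract above can be illustrated with a small tabular sketch. This is not the authors' implementation: it uses a value-iteration-style backup rather than the policy iteration mentioned in the abstract, and the toy MDP (`P`, `R`), the discount `gamma`, and the function names are all hypothetical. It only assumes, per the abstract, that the $C$ function holds the discounted future return excluding the immediate reward, so that $Q(s,a) = r(s,a) + C(s,a)$.

```python
def q_iteration(P, R, gamma=0.9, iters=500):
    """Standard tabular backup: Q(s,a) = R(s,a) + gamma * E[max_a' Q(s',a')]."""
    n_s, n_a = len(R), len(R[0])
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(iters):
        v = [max(Q[s]) for s in range(n_s)]  # greedy state values
        Q = [[R[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(n_s))
              for a in range(n_a)] for s in range(n_s)]
    return Q


def c_iteration(P, R, gamma=0.9, iters=500):
    """Residual backup: C(s,a) = gamma * E[max_a' (R(s',a') + C(s',a'))].

    C excludes the immediate reward R(s,a); it is added back only when
    actions are compared in the next state.
    """
    n_s, n_a = len(R), len(R[0])
    C = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(iters):
        v = [max(R[s][a] + C[s][a] for a in range(n_a)) for s in range(n_s)]
        C = [[gamma * sum(P[s][a][t] * v[t] for t in range(n_s))
              for a in range(n_a)] for s in range(n_s)]
    return C


# Hypothetical 2-state, 2-action MDP: P[s][a][s'] and R[s][a].
P = [[[0.8, 0.2], [0.1, 0.9]],
     [[0.5, 0.5], [0.0, 1.0]]]
R = [[1.0, 0.0],
     [0.0, 2.0]]

Q = q_iteration(P, R)
C = c_iteration(P, R)
greedy_from_q = [max(range(2), key=lambda a: Q[s][a]) for s in range(2)]
greedy_from_c = [max(range(2), key=lambda a: R[s][a] + C[s][a]) for s in range(2)]
```

In this toy setting the two critics agree up to the known reward term, so acting greedily on `R + C` selects the same actions as acting greedily on `Q`.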

Submitted 29 November, 2022; v1 submitted 5 June, 2022; originally announced June 2022.

arXiv:2206.01781 [pdf, other]

The Before, During, and After of Multi-Robot Deadlock

Authors: Jaskaran Grover, Changliu Liu, Katia Sycara

Abstract: Collision avoidance for multirobot systems is a well-studied problem. Recently, control barrier functions (CBFs) have been proposed for synthesizing controllers that guarantee collision avoidance and goal stabilization for multiple robots. However, it has been noted that reactive control synthesis methods (such as CBFs) are prone to deadlock, an equilibrium of system dynamics that causes the robots to stall before reaching their goals. In this paper, we analyze the closed-loop dynamics of robots using CBFs, to characterize controller parameters, initial conditions, and goal locations that invariably lead the system to deadlock. Using tools from duality theory, we derive geometric properties of robot configurations of an $N$-robot system once it is in deadlock and we justify them using the mechanics interpretation of KKT conditions. Our key deductions are that 1) system deadlock is characterized by a force-equilibrium on robots and 2) deadlock occurs to ensure safety when safety is on the brink of being violated. These deductions allow us to interpret deadlock as a subset of the state space, and we show that this set is non-empty and located on the boundary of the safe set. By exploiting these properties, we analyze the number of admissible robot configurations in deadlock and develop a provably-correct decentralized algorithm for deadlock resolution to safely deliver the robots to their goals. This algorithm is validated in simulations as well as experimentally on Khepera-IV robots.

Submitted 3 June, 2022; originally announced June 2022.

Comments: Accepted to International Journal of Robotics Research 2022, WAFR 2020 Special Issue

arXiv:2204.10945 [pdf, other]

Noncooperative Herding With Control Barrier Functions: Theory and Experiments

Authors: Jaskaran Grover, Nishant Mohanty, Wenhao Luo, Changliu Liu, Katia Sycara

Abstract: In this paper, we consider the problem of protecting a high-value unit from inadvertent attack by a group of agents using defending robots. Specifically, we develop a control strategy for the defending agents that we call "dog robots" to prevent a flock of "sheep agents" from breaching a protected zone. We take recourse to control barrier functions to pose this problem and exploit the interaction dynamics between the sheep and dogs to find dogs' velocities that result in the sheep getting repelled from the zone. We solve a QP reactively that incorporates the defending constraints to compute the desired velocities for all dogs. Owing to this, our proposed framework is composable, i.e., it allows for simultaneous inclusion of multiple protected zones in the constraints on dog robots' velocities. We provide a theoretical proof of feasibility of our strategy for the one dog/one sheep case. Additionally, we provide empirical results of two dogs defending the protected zone from up to ten sheep averaged over a hundred simulations and report high success rates. We also demonstrate this algorithm experimentally on non-holonomic robots. Videos of these results are available at https://tinyurl.com/4dj2kjwx.

Submitted 22 April, 2022; originally announced April 2022.

arXiv:2202.13461 [pdf, other]

Configuration Control for Physical Coupling of Heterogeneous Robot Swarms

Authors: Sha Yi, Zeynep Temel, Katia Sycara

Abstract: In this paper, we present a heterogeneous robot swarm system that can physically couple with each other to form functional structures and dynamically decouple to perform individual tasks. The connection between robots can be formed with a passive coupling mechanism, ensuring minimum energy consumption during coupling and decoupling behavior. The heterogeneity of the system enables the robots to perform structural enhancement configurations based on specific environmental requirements. We propose a connection-pair oriented configuration control algorithm to form different assemblies. We show experiments of up to nine robots performing the coupling, gap-crossing, and decoupling behaviors.

Submitted 1 March, 2022; v1 submitted 27 February, 2022; originally announced February 2022.

arXiv:2202.02686 [pdf, other]

doi 10.1109/ICRA48506.2021.9561610

PuzzleBots: Physical Coupling of Robot Swarms

Authors: Sha Yi, Zeynep Temel, Katia Sycara

Abstract: Robot swarms have been shown to improve the ability of individual robots by inter-robot collaboration. In this paper, we present the PuzzleBots - a low-cost robotic swarm system where robots can physically couple with each other to form functional structures with minimum energy consumption while maintaining individual mobility to navigate within the environment. Each robot has knobs and holes along the sides of its body so that the robots can couple by inserting the knobs into the holes. We present the characterization of knob design and the result of gap-crossing behavior with up to nine robots. We show with hardware experiments that the robots are able to couple with each other to cross gaps and decouple to perform individual tasks. We anticipate the PuzzleBots will be useful in unstructured environments as individuals and coupled systems in real-world applications.

Submitted 5 February, 2022; originally announced February 2022.

Journal ref: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 8742-8748

arXiv:2201.12938 [pdf, other]

Probe-Based Interventions for Modifying Agent Behavior

Authors: Mycal Tucker, William Kuhl, Khizer Shahid, Seth Karten, Katia Sycara, Julie Shah

Abstract: Neural nets are powerful function approximators, but the behavior of a given neural net, once trained, cannot be easily modified. We wish, however, for people to be able to influence neural agents' actions despite the agents never training with humans, which we formalize as a human-assisted decision-making problem. Inspired by prior art initially developed for model explainability, we develop a method for updating representations in pre-trained neural nets according to externally-specified properties. In experiments, we show how our method may be used to improve human-agent team performance for a variety of neural networks from image classifiers to agents in multi-agent reinforcement learning settings.

Submitted 26 January, 2022; originally announced January 2022.

arXiv:2201.07452 [pdf, other]

doi 10.1109/TCDS.2023.3236599

Interpretable Learned Emergent Communication for Human-Agent Teams

Authors: Seth Karten, Mycal Tucker, Huao Li, Siva Kailas, Michael Lewis, Katia Sycara

Abstract: Learning interpretable communication is essential for multi-agent and human-agent teams (HATs). In multi-agent reinforcement learning for partially-observable environments, agents may convey information to others via learned communication, allowing the team to complete its task. Inspired by human languages, recent works study discrete (using only a finite set of tokens) and sparse (communicating only at some time-steps) communication. However, the utility of such communication in human-agent team experiments has not yet been investigated. In this work, we analyze the efficacy of sparse-discrete methods for producing emergent communication that enables high agent-only and human-agent team performance. We develop agent-only teams that communicate sparsely via our scheme of Enforcers that sufficiently constrain communication to any budget. Our results show no loss or minimal loss of performance in benchmark environments and tasks. In human-agent teams tested in benchmark environments, where agents have been modeled using the Enforcers, we find that a prototype-based method produces meaningful discrete tokens that enable human partners to learn agent communication faster and better than a one-hot baseline. Additional HAT experiments show that an appropriate sparsity level lowers the cognitive load of humans when communicating with teams of agents and leads to superior team performance.

Submitted 5 January, 2023; v1 submitted 19 January, 2022; originally announced January 2022.

Comments: 12 pages and 12 figures. Accepted for publication at IEEE Transactions on Cognitive and Developmental Systems

arXiv:2110.08963 [pdf, other]

SS-MAIL: Self-Supervised Multi-Agent Imitation Learning

Authors: Akshay Dharmavaram, Tejus Gupta, Jiachen Li, Katia P. Sycara

Abstract: The current landscape of multi-agent expert imitation is broadly dominated by two families of algorithms - Behavioral Cloning (BC) and Adversarial Imitation Learning (AIL). BC approaches suffer from compounding errors, as they ignore the sequential decision-making nature of the trajectory generation problem. Furthermore, they cannot effectively model multi-modal behaviors. While AIL methods solve the issue of compounding errors and multi-modal policy training, they are plagued with instability in their training dynamics. In this work, we address this issue by introducing a novel self-supervised loss that encourages the discriminator to approximate a richer reward function. We employ our method to train a graph-based multi-agent actor-critic architecture that learns a centralized policy, conditioned on a learned latent interaction graph. We show that our method (SS-MAIL) outperforms prior state-of-the-art methods on real-world prediction tasks, as well as on custom-designed synthetic experiments. We prove that SS-MAIL is part of the family of AIL methods by providing a theoretical connection to cost-regularized apprenticeship learning. Moreover, we leverage the self-supervised formulation to introduce a novel teacher forcing-based curriculum (Trajectory Forcing) that improves sample efficiency by progressively increasing the length of the generated trajectory. The SS-MAIL framework improves multi-agent imitation capabilities by stabilizing the policy training, improving the reward shaping capabilities, as well as providing the ability for modeling multi-modal trajectories.

Submitted 17 October, 2021; originally announced October 2021.

Comments: Pre-Print

arXiv:2108.01828 [pdf, other]

Emergent Discrete Communication in Semantic Spaces

Authors: Mycal Tucker, Huao Li, Siddharth Agrawal, Dana Hughes, Katia Sycara, Michael Lewis, Julie Shah

Abstract: Neural agents trained in reinforcement learning settings can learn to communicate among themselves via discrete tokens, accomplishing as a team what agents would be unable to do alone. However, the current standard of using one-hot vectors as discrete communication tokens prevents agents from acquiring more desirable aspects of communication such as zero-shot understanding. Inspired by word embedding techniques from natural language processing, we propose neural agent architectures that enable them to communicate via discrete tokens derived from a learned, continuous space. We show in a decision theoretic framework that our technique optimizes communication over a wide range of scenarios, whereas one-hot tokens are only optimal under restrictive assumptions. In self-play experiments, we validate that our trained agents learn to cluster tokens in semantically-meaningful ways, allowing them to communicate in noisy environments where other techniques fail. Lastly, we demonstrate both that agents using our method can effectively respond to novel human communication and that humans can understand unlabeled emergent agent communication, outperforming the use of one-hot communication.

Submitted 4 November, 2021; v1 submitted 3 August, 2021; originally announced August 2021.

arXiv:2108.00159 [pdf, other]

Learning Embeddings that Capture Spatial Semantics for Indoor Navigation

Authors: Vidhi Jain, Prakhar Agarwal, Shishir Patil, Katia Sycara

Abstract: Incorporating domain-specific priors in search and navigation tasks has shown promising results in improving generalization and sample complexity over end-to-end trained policies. In this work, we study how object embeddings that capture spatial semantic priors can guide search and navigation tasks in a structured environment. We know that humans can search for an object like a book, or a plate in an unseen house, based on the spatial semantics of bigger objects detected. For example, a book is likely to be on a bookshelf or a table, whereas a plate is likely to be in a cupboard or dishwasher. We propose a method to incorporate such spatial semantic awareness in robots by leveraging pre-trained language models and multi-relational knowledge bases as object embeddings. We demonstrate using these object embeddings to search for a query object in an unseen indoor environment. We measure the performance of these embeddings in an indoor simulator (AI2Thor). We further evaluate different pre-trained embeddings on Success Rate (SR) and Success weighted by Path Length (SPL).

Submitted 31 July, 2021; originally announced August 2021.

arXiv:2104.02938 [pdf, other]

Deep Interpretable Models of Theory of Mind

Authors: Ini Oguntola, Dana Hughes, Katia Sycara

Abstract: When developing AI systems that interact with humans, it is essential to design both a system that can understand humans, and a system that humans can understand. Most deep network based agent-modeling approaches are 1) not interpretable and 2) only model external behavior, ignoring internal mental states, which potentially limits their capability for assistance, interventions, discovering false beliefs, etc. To this end, we develop an interpretable modular neural framework for modeling the intentions of other observed entities. We demonstrate the efficacy of our approach with experiments on data from human participants on a search and rescue task in Minecraft, and show that incorporating interpretability can significantly increase predictive performance under the right conditions.

Submitted 12 July, 2021; v1 submitted 7 April, 2021; originally announced April 2021.

Comments: RO-MAN 2021

arXiv:2103.06359 [pdf, other]

Hiding Leader's Identity in Leader-Follower Navigation through Multi-Agent Reinforcement Learning

Authors: Ankur Deka, Wenhao Luo, Huao Li, Michael Lewis, Katia Sycara

Abstract: Leader-follower navigation is a popular class of multi-robot algorithms where a leader robot leads the follower robots in a team. The leader has specialized capabilities or mission-critical information (e.g. goal location) that the followers lack, and this makes the leader crucial for the mission's success. However, this also makes the leader a vulnerability - an external adversary who wishes to sabotage the robot team's mission can simply harm the leader and the whole robot team's mission would be compromised. Since robot motion generated by traditional leader-follower navigation algorithms can reveal the identity of the leader, we propose a defense mechanism of hiding the leader's identity by ensuring the leader moves in a way that behaviorally camouflages it with the followers, making it difficult for an adversary to identify the leader. To achieve this, we combine Multi-Agent Reinforcement Learning, Graph Neural Networks and adversarial training. Our approach enables the multi-robot team to optimize the primary task performance with leader motion similar to follower motion, behaviorally camouflaging it with the followers. Our algorithm outperforms existing work that tries to hide the leader's identity in a multi-robot team by tuning traditional leader-follower control parameters with Classical Genetic Algorithms. We also evaluated human performance in inferring the leader's identity and found that humans had lower accuracy when the robot team used our proposed navigation algorithm.

Submitted 14 September, 2021; v1 submitted 10 March, 2021; originally announced March 2021.

arXiv:2103.04439 [pdf, other]

Adaptive Agent Architecture for Real-time Human-Agent Teaming

Authors: Tianwei Ni, Huao Li, Siddharth Agrawal, Suhas Raja, Fan Jia, Yikang Gui, Dana Hughes, Michael Lewis, Katia Sycara

Abstract: Teamwork is a set of interrelated reasoning, actions and behaviors of team members that facilitate common objectives. Teamwork theory and experiments have resulted in a set of states and processes for team effectiveness in both human-human and agent-agent teams. However, human-agent teaming is less well studied because it is so new and involves asymmetry in policy and intent not present in human teams. To optimize team performance in human-agent teaming, it is critical that agents infer human intent and adapt their policies for smooth coordination. Most literature in human-agent teaming builds agents referencing a learned human model. Though these agents are guaranteed to perform well with the learned model, they lay heavy assumptions on human policy such as optimality and consistency, which is unlikely in many real-world scenarios. In this paper, we propose a novel adaptive agent architecture in a human-model-free setting on a two-player cooperative game, namely Team Space Fortress (TSF). Previous human-human team research has shown complementary policies in the TSF game and diversity in human players' skill, which encourages us to relax the assumptions on human policy. Therefore, we discard learning human models from human data, and instead use an adaptation strategy on a pre-trained library of exemplar policies composed of RL algorithms or rule-based methods with minimal assumptions of human behavior. The adaptation strategy relies on a novel similarity metric to infer human policy and then selects the most complementary policy in our library to maximize the team performance. The adaptive agent architecture can be deployed in real-time and generalize to any off-the-shelf static agents. We conducted human-agent experiments to evaluate the proposed adaptive agent framework, and demonstrated the suboptimality, diversity, and adaptability of human policies in human-agent teams.

Submitted 7 March, 2021; originally announced March 2021.

Comments: The first three authors contributed equally. In AAAI 2021 Workshop on Plan, Activity, and Intent Recognition

arXiv:2012.10008 [pdf, other]

Online Connectivity-aware Dynamic Deployment for Heterogeneous Multi-Robot Systems

Authors: Chendi Lin, Wenhao Luo, Katia Sycara

Abstract: In this paper, we consider the dynamic multi-robot distribution problem where a heterogeneous group of networked robots is tasked to spread out and simultaneously move towards multiple moving task areas while maintaining connectivity. The heterogeneity of the system is characterized by various categories of units and each robot carries different numbers of units per category representing heterogeneous capabilities. Every task area with different importance demands a total number of units contributed by all of the robots within its area. Moreover, we assume the importance and the total number of units requested from each task area are initially unknown. The robots need first to explore, i.e., reach those areas, and then be allocated to the tasks so as to fulfill the requirements. The multi-robot distribution problem is formulated as designing controllers to distribute the robots that maximize the overall task fulfillment while minimizing the traveling costs in the presence of connectivity constraints. We propose a novel connectivity-aware multi-robot redistribution approach that accounts for dynamic task allocation and connectivity maintenance for a heterogeneous robot team. Such an approach could generate sub-optimal robot controllers so that the amount of total unfulfilled requirements of the tasks weighted by their importance is minimized and robots stay connected at all times. Simulation and numerical results are provided to demonstrate the effectiveness of the proposed approaches.

Submitted 28 April, 2021; v1 submitted 17 December, 2020; originally announced December 2020.

Comments: IEEE International Conference on Robotics and Automation (ICRA), 2021 (Oral presentation)

arXiv:2011.07656 [pdf, other]

Predicting Human Strategies in Simulated Search and Rescue Task

Authors: Vidhi Jain, Rohit Jena, Huao Li, Tejus Gupta, Dana Hughes, Michael Lewis, Katia Sycara

Abstract: In a search and rescue scenario, rescuers may have different knowledge of the environment and strategies for exploration. Understanding what is inside a rescuer's mind will enable an observer agent to proactively assist them with critical information that can help them perform their task efficiently. To this end, we propose to build models of the rescuers based on their trajectory observations to predict their strategies. In our efforts to model the rescuer's mind, we begin with a simple simulated search and rescue task in Minecraft with human participants. We formulate neural sequence models to predict the triage strategy and the next location of the rescuer. As the neural networks are data-driven, we design a diverse set of artificial "faux human" agents for training, to test them with limited human rescuer trajectory data. To evaluate the agents, we compare them to an evidence accumulation method that explicitly incorporates all available background knowledge and provides an intended upper bound for the expected performance. Further, we perform experiments where the observer/predictor is human. We show results in terms of prediction accuracy of our computational approaches as compared with that of human observers.

Submitted 19 November, 2020; v1 submitted 15 November, 2020; originally announced November 2020.

Comments: Accepted at NeurIPS 2020; Workshop on Artificial Intelligence for Humanitarian Assistance and Disaster Response (AI+HADR 2020)

arXiv:2011.04904 [pdf, other]

Feasible Region-based Identification Using Duality (Extended Version)

Authors: Jaskaran Grover, Changliu Liu, Katia Sycara

Abstract: We consider the problem of estimating bounds on parameters representing tasks being performed by individual robots in a multirobot system. In our previous work, we derived necessary conditions based on persistency of excitation analysis for the exact identification of these parameters. We concluded that depending on the robot's task, the dynamics of individual robots may fail to satisfy these conditions, thereby preventing exact inference. As an extension to that work, this paper focuses on estimating bounds on task parameters when such conditions are not satisfied. Each robot in the team uses optimization-based controllers for mediating between task satisfaction and collision avoidance. We use KKT conditions of this optimization and SVD of active collision avoidance constraints to derive explicit relations between Lagrange multipliers, robot dynamics, and task parameters. Using these relations, we are able to derive bounds on each robot's task parameters. Through numerical simulations, we show how our proposed region based identification approach generates feasible regions for parameters when a conventional estimator such as a UKF fails. Additionally, empirical evidence shows that this approach generates contracting sets which converge to the true parameters much faster than the rate at which a UKF based estimate converges. Videos of these results are available at https://bit.ly/2JDMgeJ

Submitted 7 November, 2020; originally announced November 2020.

Comments: arXiv admin note: text overlap with arXiv:2009.13817

arXiv:2009.13817 [pdf, ps, other]

Parameter Identification for Multirobot Systems Using Optimization Based Controllers (Extended Version)

Authors: Jaskaran Singh Grover, Changliu Liu, Katia Sycara

Abstract: This paper considers the problem of parameter identification for a multirobot system. We wish to understand when it is feasible for an adversarial observer to reverse-engineer the parameters of tasks being performed by a team of robots by simply observing their positions. We address this question by using the concept of persistency of excitation from system identification. Each robot in the team uses optimization-based controllers for mediating between task satisfaction and collision avoidance. These controllers exhibit an implicit dependence on the task's parameters which poses a hurdle for deriving necessary conditions for parameter identification, since such conditions usually require an explicit relation. We address this bottleneck by using duality theory and SVD of active collision avoidance constraints and derive an explicit relation between each robot's task parameters and its control inputs. This allows us to derive the main necessary conditions for successful identification which agree with our intuition. We demonstrate the importance of these conditions through numerical simulations by using (a) an adaptive observer and (b) an unscented Kalman filter for goal estimation in various geometric settings. These simulations show that under circumstances where parameter inference is supposed to be infeasible per our conditions, both these estimators fail and likewise when it is feasible, both converge to the true parameters. Videos of these results are available at https://bit.ly/3kQYj5J.

Submitted 29 September, 2020; originally announced September 2020.

arXiv:2009.09467 [pdf, other]

Addressing reward bias in Adversarial Imitation Learning with neutral reward functions

Authors: Rohit Jena, Siddharth Agrawal, Katia Sycara

Abstract: Generative Adversarial Imitation Learning suffers from the fundamental problem of reward bias stemming from the choice of reward functions used in the algorithm. Different types of biases also affect different types of environments - which are broadly divided into survival and task-based environments. We provide a theoretical sketch of why existing reward functions would fail in imitation learning scenarios in task based environments with multiple terminal states. We also propose a new reward function for GAIL which outperforms existing GAIL methods on task based environments with single and multiple terminal states and effectively overcomes both survival and termination bias.

Submitted 20 September, 2020; originally announced September 2020.

arXiv:2008.07698 [pdf, other]

Learning Complex Multi-Agent Policies in Presence of an Adversary

Authors: Siddharth Ghiya, Katia Sycara

Abstract: In recent years, there has been some outstanding work on applying deep reinforcement learning to multi-agent settings. Often in such multi-agent scenarios, adversaries can be present. We address the requirements of such a setting by implementing a graph-based multi-agent deep reinforcement learning algorithm. In this work, we consider the scenario of multi-agent deception in which multiple agents need to learn to cooperate and communicate in order to deceive an adversary. We have employed a two-stage learning process to get the cooperating agents to learn such deceptive behaviors. Our experiments show that our approach allows us to employ curriculum learning to increase the number of cooperating agents in the environment and enables a team of agents to learn complex behaviors to successfully deceive an adversary. Keywords: Multi-agent system, Graph neural network, Reinforcement learning

Submitted 7 October, 2020; v1 submitted 17 August, 2020; originally announced August 2020.

Showing 1–50 of 73 results for author: Sycara, K