-
PARSAC: Fast, Human-quality Floorplanning for Modern SoCs with Complex Design Constraints
Authors:
Hesham Mostafa,
Uday Mallappa,
Mikhail Galkin,
Mariano Phielipp,
Somdeb Majumdar
Abstract:
The floorplanning of Systems-on-a-Chip (SoCs) and of chip sub-systems is a crucial step in the physical design flow as it determines the optimal shapes and locations of the blocks that make up the system. Simulated Annealing (SA) has been the method of choice for tackling classical floorplanning problems where the objective is to minimize wire-length and the total placement area. The goal in indus…
▽ More
The floorplanning of Systems-on-a-Chip (SoCs) and of chip sub-systems is a crucial step in the physical design flow as it determines the optimal shapes and locations of the blocks that make up the system. Simulated Annealing (SA) has been the method of choice for tackling classical floorplanning problems where the objective is to minimize wire-length and the total placement area. The goal in industry-relevant floorplanning problems, however, is not only to minimize area and wire-length, but to do that while respecting hard placement constraints that specify the general area and/or the specific locations for the placement of some blocks. We show that simply incorporating these constraints into the SA objective function leads to sub-optimal, and often illegal, solutions. We propose the Constraints-Aware Simulated Annealing (CA-SA) method and show that it strongly outperforms vanilla SA in floorplanning problems with hard placement constraints. We developed a new floorplanning tool on top of CA-SA: PARSAC (Parallel Simulated Annealing with Constraints). PARSAC is an efficient, easy-to-use, and massively parallel floorplanner. Unlike current SA-based or learning-based floorplanning tools that cannot effectively incorporate hard placement-constraints, PARSAC can quickly construct the Pareto-optimal legal solutions front for constrained floorplanning problems. PARSAC also outperforms traditional SA on legacy floorplanning benchmarks. PARSAC is available as an open-source repository for researchers to replicate and build on our result.
△ Less
Submitted 8 May, 2024;
originally announced May 2024.
-
FloorSet -- a VLSI Floorplanning Dataset with Design Constraints of Real-World SoCs
Authors:
Uday Mallappa,
Hesham Mostafa,
Mikhail Galkin,
Mariano Phielipp,
Somdeb Majumdar
Abstract:
Floorplanning for systems-on-a-chip (SoCs) and its sub-systems is a crucial and non-trivial step of the physical design flow. It represents a difficult combinatorial optimization problem. A typical large scale SoC with 120 partitions generates a search-space of nearly 10E250. As novel machine learning (ML) approaches emerge to tackle such problems, there is a growing need for a modern benchmark th…
▽ More
Floorplanning for systems-on-a-chip (SoCs) and its sub-systems is a crucial and non-trivial step of the physical design flow. It represents a difficult combinatorial optimization problem. A typical large scale SoC with 120 partitions generates a search-space of nearly 10E250. As novel machine learning (ML) approaches emerge to tackle such problems, there is a growing need for a modern benchmark that comprises a large training dataset and performance metrics that better reflect real-world constraints and objectives compared to existing benchmarks. To address this need, we present FloorSet -- two comprehensive datasets of synthetic fixed-outline floorplan layouts that reflect the distribution of real SoCs. Each dataset has 1M training samples and 100 test samples where each sample is a synthetic floor-plan. FloorSet-Prime comprises fully-abutted rectilinear partitions and near-optimal wire-length. A simplified dataset that reflects early design phases, FloorSet-Lite comprises rectangular partitions, with under 5 percent white-space and near-optimal wire-length. Both datasets define hard constraints seen in modern design flows such as shape constraints, edge-affinity, grou** constraints, and pre-placement constraints. FloorSet is intended to spur fundamental research on large-scale constrained optimization problems. Crucially, FloorSet alleviates the core issue of reproducibility in modern ML driven solutions to such problems. FloorSet is available as an open-source repository for the research community.
△ Less
Submitted 27 June, 2024; v1 submitted 8 May, 2024;
originally announced May 2024.
-
MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning
Authors:
Rafael Rafailov,
Kyle Hatch,
Victor Kolev,
John D. Martin,
Mariano Phielipp,
Chelsea Finn
Abstract:
We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations in the context of realistic robot tasks. Recent offline model-free approaches successfully use online fine-tuning to either improve the performance of the agent over the data collection policy or adapt to novel tasks. At the same time, model-based RL algorithms have ach…
▽ More
We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations in the context of realistic robot tasks. Recent offline model-free approaches successfully use online fine-tuning to either improve the performance of the agent over the data collection policy or adapt to novel tasks. At the same time, model-based RL algorithms have achieved significant progress in sample efficiency and the complexity of the tasks they can solve, yet remain under-utilized in the fine-tuning setting. In this work, we argue that existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains due to issues with distribution shifts, off-dynamics data, and non-stationary rewards. We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization, while preventing model exploitation by controlling epistemic uncertainty. We find that our approach successfully solves tasks from the MetaWorld benchmark, as well as the Franka Kitchen robot manipulation environment completely from images. To the best of our knowledge, MOTO is the first method to solve this environment from pixels.
△ Less
Submitted 6 January, 2024;
originally announced January 2024.
-
Searching for High-Value Molecules Using Reinforcement Learning and Transformers
Authors:
Raj Ghugare,
Santiago Miret,
Adriana Hugessen,
Mariano Phielipp,
Glen Berseth
Abstract:
Reinforcement learning (RL) over text representations can be effective for finding high-value policies that can search over graphs. However, RL requires careful structuring of the search space and algorithm design to be effective in this challenge. Through extensive experiments, we explore how different design choices for text grammar and algorithmic choices for training can affect an RL policy's…
▽ More
Reinforcement learning (RL) over text representations can be effective for finding high-value policies that can search over graphs. However, RL requires careful structuring of the search space and algorithm design to be effective in this challenge. Through extensive experiments, we explore how different design choices for text grammar and algorithmic choices for training can affect an RL policy's ability to generate molecules with desired properties. We arrive at a new RL-based molecular design algorithm (ChemRLformer) and perform a thorough analysis using 25 molecule design tasks, including computationally complex protein docking simulations. From this analysis, we discover unique insights in this problem space and show that ChemRLformer achieves state-of-the-art performance while being more straightforward than prior work by demystifying which design choices are actually helpful for text-based molecule design.
△ Less
Submitted 4 October, 2023;
originally announced October 2023.
-
Learning Sparse Control Tasks from Pixels by Latent Nearest-Neighbor-Guided Explorations
Authors:
Ruihan Zhao,
Ufuk Topcu,
Sandeep Chinchali,
Mariano Phielipp
Abstract:
Recent progress in deep reinforcement learning (RL) and computer vision enables artificial agents to solve complex tasks, including locomotion, manipulation and video games from high-dimensional pixel observations. However, domain specific reward functions are often engineered to provide sufficient learning signals, requiring expert knowledge. While it is possible to train vision-based RL agents u…
▽ More
Recent progress in deep reinforcement learning (RL) and computer vision enables artificial agents to solve complex tasks, including locomotion, manipulation and video games from high-dimensional pixel observations. However, domain specific reward functions are often engineered to provide sufficient learning signals, requiring expert knowledge. While it is possible to train vision-based RL agents using only sparse rewards, additional challenges in exploration arise. We present a novel and efficient method to solve sparse-reward robot manipulation tasks from only image observations by utilizing a few demonstrations. First, we learn an embedded neural dynamics model from demonstration transitions and further fine-tune it with the replay buffer. Next, we reward the agents for staying close to the demonstrated trajectories using a distance metric defined in the embedding space. Finally, we use an off-policy, model-free vision RL algorithm to update the control policies. Our method achieves state-of-the-art sample efficiency in simulation and enables efficient training of a real Franka Emika Panda manipulator.
△ Less
Submitted 27 February, 2023;
originally announced February 2023.
-
Modularity through Attention: Efficient Training and Transfer of Language-Conditioned Policies for Robot Manipulation
Authors:
Yifan Zhou,
Shubham Sonawani,
Mariano Phielipp,
Simon Stepputtis,
Heni Ben Amor
Abstract:
Language-conditioned policies allow robots to interpret and execute human instructions. Learning such policies requires a substantial investment with regards to time and compute resources. Still, the resulting controllers are highly device-specific and cannot easily be transferred to a robot with different morphology, capability, appearance or dynamics. In this paper, we propose a sample-efficient…
▽ More
Language-conditioned policies allow robots to interpret and execute human instructions. Learning such policies requires a substantial investment with regards to time and compute resources. Still, the resulting controllers are highly device-specific and cannot easily be transferred to a robot with different morphology, capability, appearance or dynamics. In this paper, we propose a sample-efficient approach for training language-conditioned manipulation policies that allows for rapid transfer across different types of robots. By introducing a novel method, namely Hierarchical Modularity, and adopting supervised attention across multiple sub-modules, we bridge the divide between modular and end-to-end learning and enable the reuse of functional building blocks. In both simulated and real world robot manipulation experiments, we demonstrate that our method outperforms the current state-of-the-art methods and can transfer policies across 4 different robots in a sample-efficient manner. Finally, we show that the functionality of learned sub-modules is maintained beyond the training process and can be used to introspect the robot decision-making process. Code is available at https://github.com/ir-lab/ModAttn.
△ Less
Submitted 8 December, 2022;
originally announced December 2022.
-
Group SELFIES: A Robust Fragment-Based Molecular String Representation
Authors:
Austin Cheng,
Andy Cai,
Santiago Miret,
Gustavo Malkomes,
Mariano Phielipp,
Alán Aspuru-Guzik
Abstract:
We introduce Group SELFIES, a molecular string representation that leverages group tokens to represent functional groups or entire substructures while maintaining chemical robustness guarantees. Molecular string representations, such as SMILES and SELFIES, serve as the basis for molecular generation and optimization in chemical language models, deep generative models, and evolutionary methods. Whi…
▽ More
We introduce Group SELFIES, a molecular string representation that leverages group tokens to represent functional groups or entire substructures while maintaining chemical robustness guarantees. Molecular string representations, such as SMILES and SELFIES, serve as the basis for molecular generation and optimization in chemical language models, deep generative models, and evolutionary methods. While SMILES and SELFIES leverage atomic representations, Group SELFIES builds on top of the chemical robustness guarantees of SELFIES by enabling group tokens, thereby creating additional flexibility to the representation. Moreover, the group tokens in Group SELFIES can take advantage of inductive biases of molecular fragments that capture meaningful chemical motifs. The advantages of capturing chemical motifs and flexibility are demonstrated in our experiments, which show that Group SELFIES improves distribution learning of common molecular datasets. Further experiments also show that random sampling of Group SELFIES strings improves the quality of generated molecules compared to regular SELFIES strings. Our open-source implementation of Group SELFIES is available online, which we hope will aid future research in molecular generation and optimization.
△ Less
Submitted 23 November, 2022;
originally announced November 2022.
-
AnyMorph: Learning Transferable Polices By Inferring Agent Morphology
Authors:
Brandon Trabucco,
Mariano Phielipp,
Glen Berseth
Abstract:
The prototypical approach to reinforcement learning involves training policies tailored to a particular agent from scratch for every new morphology. Recent work aims to eliminate the re-training of policies by investigating whether a morphology-agnostic policy, trained on a diverse set of agents with similar task objectives, can be transferred to new agents with unseen morphologies without re-trai…
▽ More
The prototypical approach to reinforcement learning involves training policies tailored to a particular agent from scratch for every new morphology. Recent work aims to eliminate the re-training of policies by investigating whether a morphology-agnostic policy, trained on a diverse set of agents with similar task objectives, can be transferred to new agents with unseen morphologies without re-training. This is a challenging problem that required previous approaches to use hand-designed descriptions of the new agent's morphology. Instead of hand-designing this description, we propose a data-driven method that learns a representation of morphology directly from the reinforcement learning objective. Ours is the first reinforcement learning algorithm that can train a policy to generalize to new agent morphologies without requiring a description of the agent's morphology in advance. We evaluate our approach on the standard benchmark for agent-agnostic control, and improve over the current state of the art in zero-shot generalization to new agents. Importantly, our method attains good performance without an explicit description of morphology.
△ Less
Submitted 17 June, 2022;
originally announced June 2022.
-
Offline Policy Comparison with Confidence: Benchmarks and Baselines
Authors:
Anurag Koul,
Mariano Phielipp,
Alan Fern
Abstract:
Decision makers often wish to use offline historical data to compare sequential-action policies at various world states. Importantly, computational tools should produce confidence values for such offline policy comparison (OPC) to account for statistical variance and limited data coverage. Nevertheless, there is little work that directly evaluates the quality of confidence values for OPC. In this…
▽ More
Decision makers often wish to use offline historical data to compare sequential-action policies at various world states. Importantly, computational tools should produce confidence values for such offline policy comparison (OPC) to account for statistical variance and limited data coverage. Nevertheless, there is little work that directly evaluates the quality of confidence values for OPC. In this work, we address this issue by creating benchmarks for OPC with Confidence (OPCC), derived by adding sets of policy comparison queries to datasets from offline reinforcement learning. In addition, we present an empirical evaluation of the risk versus coverage trade-off for a class of model-based baselines. In particular, the baselines learn ensembles of dynamics models, which are used in various ways to produce simulations for answering queries with confidence values. While our results suggest advantages for certain baseline variations, there appears to be significant room for improvement in future work.
△ Less
Submitted 22 May, 2022;
originally announced May 2022.
-
Pretraining Graph Neural Networks for few-shot Analog Circuit Modeling and Design
Authors:
Kourosh Hakhamaneshi,
Marcel Nassar,
Mariano Phielipp,
Pieter Abbeel,
Vladimir Stojanović
Abstract:
Being able to predict the performance of circuits without running expensive simulations is a desired capability that can catalyze automated design. In this paper, we present a supervised pretraining approach to learn circuit representations that can be adapted to new circuit topologies or unseen prediction tasks. We hypothesize that if we train a neural network (NN) that can predict the output DC…
▽ More
Being able to predict the performance of circuits without running expensive simulations is a desired capability that can catalyze automated design. In this paper, we present a supervised pretraining approach to learn circuit representations that can be adapted to new circuit topologies or unseen prediction tasks. We hypothesize that if we train a neural network (NN) that can predict the output DC voltages of a wide range of circuit instances it will be forced to learn generalizable knowledge about the role of each circuit element and how they interact with each other. The dataset for this supervised learning objective can be easily collected at scale since the required DC simulation to get ground truth labels is relatively cheap. This representation would then be helpful for few-shot generalization to unseen circuit metrics that require more time consuming simulations for obtaining the ground-truth labels. To cope with the variable topological structure of different circuits we describe each circuit as a graph and use graph neural networks (GNNs) to learn node embeddings. We show that pretraining GNNs on prediction of output node voltages can encourage learning representations that can be adapted to new unseen topologies or prediction of new circuit level properties with up to 10x more sample efficiency compared to a randomly initialized model. We further show that we can improve sample efficiency of prior SoTA model-based optimization methods by 2x (almost as good as using an oracle model) via fintuning pretrained GNNs as the feature extractor of the learned models.
△ Less
Submitted 1 April, 2022; v1 submitted 29 March, 2022;
originally announced March 2022.
-
DNS: Determinantal Point Process Based Neural Network Sampler for Ensemble Reinforcement Learning
Authors:
Hassam Sheikh,
Kizza Frisbee,
Mariano Phielipp
Abstract:
Application of ensemble of neural networks is becoming an imminent tool for advancing the state-of-the-art in deep reinforcement learning algorithms. However, training these large numbers of neural networks in the ensemble has an exceedingly high computation cost which may become a hindrance in training large-scale systems. In this paper, we propose DNS: a Determinantal Point Process based Neural…
▽ More
Application of ensemble of neural networks is becoming an imminent tool for advancing the state-of-the-art in deep reinforcement learning algorithms. However, training these large numbers of neural networks in the ensemble has an exceedingly high computation cost which may become a hindrance in training large-scale systems. In this paper, we propose DNS: a Determinantal Point Process based Neural Network Sampler that specifically uses k-dpp to sample a subset of neural networks for backpropagation at every training step thus significantly reducing the training time and computation cost. We integrated DNS in REDQ for continuous control tasks and evaluated on MuJoCo environments. Our experiments show that DNS augmented REDQ outperforms baseline REDQ in terms of average cumulative reward and achieves this using less than 50% computation when measured in FLOPS.
△ Less
Submitted 17 May, 2022; v1 submitted 31 January, 2022;
originally announced January 2022.
-
Minimizing Communication while Maximizing Performance in Multi-Agent Reinforcement Learning
Authors:
Varun Kumar Vijay,
Hassam Sheikh,
Somdeb Majumdar,
Mariano Phielipp
Abstract:
Inter-agent communication can significantly increase performance in multi-agent tasks that require co-ordination to achieve a shared goal. Prior work has shown that it is possible to learn inter-agent communication protocols using multi-agent reinforcement learning and message-passing network architectures. However, these models use an unconstrained broadcast communication model, in which an agent…
▽ More
Inter-agent communication can significantly increase performance in multi-agent tasks that require co-ordination to achieve a shared goal. Prior work has shown that it is possible to learn inter-agent communication protocols using multi-agent reinforcement learning and message-passing network architectures. However, these models use an unconstrained broadcast communication model, in which an agent communicates with all other agents at every step, even when the task does not require it. In real-world applications, where communication may be limited by system constraints like bandwidth, power and network capacity, one might need to reduce the number of messages that are sent. In this work, we explore a simple method of minimizing communication while maximizing performance in multi-task learning: simultaneously optimizing a task-specific objective and a communication penalty. We show that the objectives can be optimized using Reinforce and the Gumbel-Softmax reparameterization. We introduce two techniques to stabilize training: 50% training and message forwarding. Training with the communication penalty on only 50% of the episodes prevents our models from turning off their outgoing messages. Second, repeating messages received previously helps models retain information, and further improves performance. With these techniques, we show that we can reduce communication by 75% with no loss of performance.
△ Less
Submitted 8 December, 2021; v1 submitted 15 June, 2021;
originally announced June 2021.
-
Neuroevolution-Enhanced Multi-Objective Optimization for Mixed-Precision Quantization
Authors:
Santiago Miret,
Vui Seng Chua,
Mattias Marder,
Mariano Phielipp,
Nilesh Jain,
Somdeb Majumdar
Abstract:
Mixed-precision quantization is a powerful tool to enable memory and compute savings of neural network workloads by deploying different sets of bit-width precisions on separate compute operations. In this work, we present a flexible and scalable framework for automated mixed-precision quantization that concurrently optimizes task performance, memory compression, and compute savings through multi-o…
▽ More
Mixed-precision quantization is a powerful tool to enable memory and compute savings of neural network workloads by deploying different sets of bit-width precisions on separate compute operations. In this work, we present a flexible and scalable framework for automated mixed-precision quantization that concurrently optimizes task performance, memory compression, and compute savings through multi-objective evolutionary computing. Our framework centers on Neuroevolution-Enhanced Multi-Objective Optimization (NEMO), a novel search method, which combines established search methods with the representational power of neural networks. Within NEMO, the population is divided into structurally distinct sub-populations, or species, which jointly create the Pareto frontier of solutions for the multi-objective problem. At each generation, species perform separate mutation and crossover operations, and are re-sized in proportion to the goodness of their contribution to the Pareto frontier. In our experiments, we define a graph-based representation to describe the underlying workload, enabling us to deploy graph neural networks trained by NEMO via neuroevolution, to find Pareto optimal configurations for MobileNet-V2, ResNet50 and ResNeXt-101-32x8d. Compared to the state-of-the-art, we achieve competitive results on memory compression and superior results for compute compression. Further analysis reveals that the graph representation and the species-based approach employed by NEMO are critical to finding optimal solutions.
△ Less
Submitted 1 April, 2022; v1 submitted 14 June, 2021;
originally announced June 2021.
-
Instance based Generalization in Reinforcement Learning
Authors:
Martin Bertran,
Natalia Martinez,
Mariano Phielipp,
Guillermo Sapiro
Abstract:
Agents trained via deep reinforcement learning (RL) routinely fail to generalize to unseen environments, even when these share the same underlying dynamics as the training levels. Understanding the generalization properties of RL is one of the challenges of modern machine learning. Towards this goal, we analyze policy learning in the context of Partially Observable Markov Decision Processes (POMDP…
▽ More
Agents trained via deep reinforcement learning (RL) routinely fail to generalize to unseen environments, even when these share the same underlying dynamics as the training levels. Understanding the generalization properties of RL is one of the challenges of modern machine learning. Towards this goal, we analyze policy learning in the context of Partially Observable Markov Decision Processes (POMDPs) and formalize the dynamics of training levels as instances. We prove that, independently of the exploration strategy, reusing instances introduces significant changes on the effective Markov dynamics the agent observes during training. Maximizing expected rewards impacts the learned belief state of the agent by inducing undesired instance specific speedrunning policies instead of generalizeable ones, which are suboptimal on the training set. We provide generalization bounds to the value gap in train and test environments based on the number of training instances, and use insights based on these to improve performance on unseen levels. We propose training a shared belief representation over an ensemble of specialized policies, from which we compute a consensus policy that is used for data collection, disallowing instance specific exploitation. We experimentally validate our theory, observations, and the proposed computational solution over the CoinRun benchmark.
△ Less
Submitted 2 November, 2020;
originally announced November 2020.
-
Language-Conditioned Imitation Learning for Robot Manipulation Tasks
Authors:
Simon Stepputtis,
Joseph Campbell,
Mariano Phielipp,
Stefan Lee,
Chitta Baral,
Heni Ben Amor
Abstract:
Imitation learning is a popular approach for teaching motor skills to robots. However, most approaches focus on extracting policy parameters from execution traces alone (i.e., motion trajectories and perceptual data). No adequate communication channel exists between the human expert and the robot to describe critical aspects of the task, such as the properties of the target object or the intended…
▽ More
Imitation learning is a popular approach for teaching motor skills to robots. However, most approaches focus on extracting policy parameters from execution traces alone (i.e., motion trajectories and perceptual data). No adequate communication channel exists between the human expert and the robot to describe critical aspects of the task, such as the properties of the target object or the intended shape of the motion. Motivated by insights into the human teaching process, we introduce a method for incorporating unstructured natural language into imitation learning. At training time, the expert can provide demonstrations along with verbal descriptions in order to describe the underlying intent (e.g., "go to the large green bowl"). The training process then interrelates these two modalities to encode the correlations between language, perception, and motion. The resulting language-conditioned visuomotor policies can be conditioned at runtime on new human commands and instructions, which allows for more fine-grained control over the trained policies while also reducing situational ambiguity. We demonstrate in a set of simulation experiments how our approach can learn language-conditioned manipulation policies for a seven-degree-of-freedom robot arm and compare the results to a variety of alternative methods.
△ Less
Submitted 22 October, 2020;
originally announced October 2020.
-
Motion2Vec: Semi-Supervised Representation Learning from Surgical Videos
Authors:
Ajay Kumar Tanwani,
Pierre Sermanet,
Andy Yan,
Raghav Anand,
Mariano Phielipp,
Ken Goldberg
Abstract:
Learning meaningful visual representations in an embedding space can facilitate generalization in downstream tasks such as action segmentation and imitation. In this paper, we learn a motion-centric representation of surgical video demonstrations by grou** them into action segments/sub-goals/options in a semi-supervised manner. We present Motion2Vec, an algorithm that learns a deep embedding fea…
▽ More
Learning meaningful visual representations in an embedding space can facilitate generalization in downstream tasks such as action segmentation and imitation. In this paper, we learn a motion-centric representation of surgical video demonstrations by grou** them into action segments/sub-goals/options in a semi-supervised manner. We present Motion2Vec, an algorithm that learns a deep embedding feature space from video observations by minimizing a metric learning loss in a Siamese network: images from the same action segment are pulled together while pushed away from randomly sampled images of other segments, while respecting the temporal ordering of the images. The embeddings are iteratively segmented with a recurrent neural network for a given parametrization of the embedding space after pre-training the Siamese network. We only use a small set of labeled video segments to semantically align the embedding space and assign pseudo-labels to the remaining unlabeled data by inference on the learned model parameters. We demonstrate the use of this representation to imitate surgical suturing motions from publicly available videos of the JIGSAWS dataset. Results give 85.5 % segmentation accuracy on average suggesting performance improvement over several state-of-the-art baselines, while kinematic pose imitation gives 0.94 centimeter error in position per observation on the test set. Videos, code and data are available at https://sites.google.com/view/motion2vec
△ Less
Submitted 31 May, 2020;
originally announced June 2020.
-
Clone Swarms: Learning to Predict and Control Multi-Robot Systems by Imitation
Authors:
Siyu Zhou,
Mariano Phielipp,
Jorge A. Sefair,
Sara I. Walker,
Heni Ben Amor
Abstract:
In this paper, we propose SwarmNet -- a neural network architecture that can learn to predict and imitate the behavior of an observed swarm of agents in a centralized manner. Tested on artificially generated swarm motion data, the network achieves high levels of prediction accuracy and imitation authenticity. We compare our model to previous approaches for modelling interaction systems and show ho…
▽ More
In this paper, we propose SwarmNet -- a neural network architecture that can learn to predict and imitate the behavior of an observed swarm of agents in a centralized manner. Tested on artificially generated swarm motion data, the network achieves high levels of prediction accuracy and imitation authenticity. We compare our model to previous approaches for modelling interaction systems and show how modifying components of other models gradually approaches the performance of ours. Finally, we also discuss an extension of SwarmNet that can deal with nondeterministic, noisy, and uncertain environments, as often found in robotics applications.
△ Less
Submitted 2 November, 2020; v1 submitted 5 December, 2019;
originally announced December 2019.
-
Imitation Learning of Robot Policies by Combining Language, Vision and Demonstration
Authors:
Simon Stepputtis,
Joseph Campbell,
Mariano Phielipp,
Chitta Baral,
Heni Ben Amor
Abstract:
In this work we propose a novel end-to-end imitation learning approach which combines natural language, vision, and motion information to produce an abstract representation of a task, which in turn is used to synthesize specific motion controllers at run-time. This multimodal approach enables generalization to a wide variety of environmental conditions and allows an end-user to direct a robot poli…
▽ More
In this work we propose a novel end-to-end imitation learning approach which combines natural language, vision, and motion information to produce an abstract representation of a task, which in turn is used to synthesize specific motion controllers at run-time. This multimodal approach enables generalization to a wide variety of environmental conditions and allows an end-user to direct a robot policy through verbal communication. We empirically validate our approach with an extensive set of simulations and show that it achieves a high task success rate over a variety of conditions while remaining amenable to probabilistic interpretability.
△ Less
Submitted 26 November, 2019;
originally announced November 2019.
-
On Training Flexible Robots using Deep Reinforcement Learning
Authors:
Zach Dwiel,
Madhavun Candadai,
Mariano Phielipp
Abstract:
The use of robotics in controlled environments has flourished over the last several decades and training robots to perform tasks using control strategies developed from dynamical models of their hardware have proven very effective. However, in many real-world settings, the uncertainties of the environment, the safety requirements and generalized capabilities that are expected of robots make rigid…
▽ More
The use of robotics in controlled environments has flourished over the last several decades and training robots to perform tasks using control strategies developed from dynamical models of their hardware have proven very effective. However, in many real-world settings, the uncertainties of the environment, the safety requirements and generalized capabilities that are expected of robots make rigid industrial robots unsuitable. This created great research interest into develo** control strategies for flexible robot hardware for which building dynamical models are challenging. In this paper, inspired by the success of deep reinforcement learning (DRL) in other areas, we systematically study the efficacy of policy search methods using DRL in training flexible robots. Our results indicate that DRL is successfully able to learn efficient and robust policies for complex tasks at various degrees of flexibility. We also note that DRL using Deep Deterministic Policy Gradients can be sensitive to the choice of sensors and adding more informative sensors does not necessarily make the task easier to learn.
△ Less
Submitted 13 July, 2019; v1 submitted 29 June, 2019;
originally announced July 2019.
-
Goal-conditioned Imitation Learning
Authors:
Yiming Ding,
Carlos Florensa,
Mariano Phielipp,
Pieter Abbeel
Abstract:
Designing rewards for Reinforcement Learning (RL) is challenging because it needs to convey the desired task, be efficient to optimize, and be easy to compute. The latter is particularly problematic when applying RL to robotics, where detecting whether the desired configuration is reached might require considerable supervision and instrumentation. Furthermore, we are often interested in being able…
▽ More
Designing rewards for Reinforcement Learning (RL) is challenging because it needs to convey the desired task, be efficient to optimize, and be easy to compute. The latter is particularly problematic when applying RL to robotics, where detecting whether the desired configuration is reached might require considerable supervision and instrumentation. Furthermore, we are often interested in being able to reach a wide range of configurations, hence setting up a different reward every time might be unpractical. Methods like Hindsight Experience Replay (HER) have recently shown promise to learn policies able to reach many goals, without the need of a reward. Unfortunately, without tricks like resetting to points along the trajectory, HER might require many samples to discover how to reach certain areas of the state-space. In this work we investigate different approaches to incorporate demonstrations to drastically speed up the convergence to a policy able to reach any goal, also surpassing the performance of an agent trained with other Imitation Learning algorithms. Furthermore, we show our method can also be used when the available expert trajectories do not contain the actions, which can leverage kinesthetic or third person demonstration. The code is available at https://sites.google.com/view/goalconditioned-il/.
△ Less
Submitted 27 May, 2020; v1 submitted 13 June, 2019;
originally announced June 2019.
-
Hierarchical Policy Learning is Sensitive to Goal Space Design
Authors:
Zach Dwiel,
Madhavun Candadai,
Mariano Phielipp,
Arjun K. Bansal
Abstract:
Hierarchy in reinforcement learning agents allows for control at multiple time scales yielding improved sample efficiency, the ability to deal with long time horizons and transferability of sub-policies to tasks outside the training distribution. It is often implemented as a master policy providing goals to a sub-policy. Ideally, we would like the goal-spaces to be learned, however, properties of…
▽ More
Hierarchy in reinforcement learning agents allows for control at multiple time scales yielding improved sample efficiency, the ability to deal with long time horizons and transferability of sub-policies to tasks outside the training distribution. It is often implemented as a master policy providing goals to a sub-policy. Ideally, we would like the goal-spaces to be learned, however, properties of optimal goal spaces still remain unknown and consequently there is no method yet to learn optimal goal spaces. Motivated by this, we systematically analyze how various modifications to the ground-truth goal-space affect learning in hierarchical models with the aim of identifying important properties of optimal goal spaces. Our results show that, while rotation of ground-truth goal spaces and noise had no effect, having additional unnecessary factors significantly impaired learning in hierarchical models.
△ Less
Submitted 25 June, 2019; v1 submitted 4 May, 2019;
originally announced May 2019.