Search | arXiv e-print repository

Speeding up 6-DoF Grasp Sampling with Quality-Diversity

Authors: Johann Huber, François Hélénon, Mathilde Kappel, Elie Chelly, Mahdi Khoramshahi, Faïz Ben Amar, Stéphane Doncieux

Abstract: Recent advances in AI have led to significant results in robotic learning, including natural language-conditioned planning and efficient optimization of controllers using generative models. However, the interaction data remains the bottleneck for generalization. Getting data for gras** is a critical challenge, as this skill is required to complete many manipulation tasks. Quality-Diversity (QD)… ▽ More Recent advances in AI have led to significant results in robotic learning, including natural language-conditioned planning and efficient optimization of controllers using generative models. However, the interaction data remains the bottleneck for generalization. Getting data for gras** is a critical challenge, as this skill is required to complete many manipulation tasks. Quality-Diversity (QD) algorithms optimize a set of solutions to get diverse, high-performing solutions to a given problem. This paper investigates how QD can be combined with priors to speed up the generation of diverse grasps poses in simulation compared to standard 6-DoF grasp sampling schemes. Experiments conducted on 4 grippers with 2-to-5 fingers on standard objects show that QD outperforms commonly used methods by a large margin. Further experiments show that QD optimization automatically finds some efficient priors that are usually hard coded. The deployment of generated grasps on a 2-finger gripper and an Allegro hand shows that the diversity produced maintains sim-to-real transferability. We believe these results to be a significant step toward the generation of large datasets that can lead to robust and generalizing robotic gras** policies. △ Less

Submitted 10 March, 2024; originally announced March 2024.

Comments: 7 pages, 8 figures. Preprint version

arXiv:2311.00344 [pdf, other]

A Definition of Open-Ended Learning Problems for Goal-Conditioned Agents

Authors: Olivier Sigaud, Gianluca Baldassarre, Cedric Colas, Stephane Doncieux, Richard Duro, Pierre-Yves Oudeyer, Nicolas Perrin-Gilbert, Vieri Giuliano Santucci

Abstract: A lot of recent machine learning research papers have ``open-ended learning'' in their title. But very few of them attempt to define what they mean when using the term. Even worse, when looking more closely there seems to be no consensus on what distinguishes open-ended learning from related concepts such as continual learning, lifelong learning or autotelic learning. In this paper, we contribute… ▽ More A lot of recent machine learning research papers have ``open-ended learning'' in their title. But very few of them attempt to define what they mean when using the term. Even worse, when looking more closely there seems to be no consensus on what distinguishes open-ended learning from related concepts such as continual learning, lifelong learning or autotelic learning. In this paper, we contribute to fixing this situation. After illustrating the genealogy of the concept and more recent perspectives about what it truly means, we outline that open-ended learning is generally conceived as a composite notion encompassing a set of diverse properties. In contrast with previous approaches, we propose to isolate a key elementary property of open-ended processes, which is to produce elements from time to time (e.g., observations, options, reward functions, and goals), over an infinite horizon, that are considered novel from an observer's perspective. From there, we build the notion of open-ended learning problems and focus in particular on the subset of open-ended goal-conditioned reinforcement learning problems in which agents can learn a growing repertoire of goal-driven skills. Finally, we highlight the work that remains to be performed to fill the gap between our elementary definition and the more involved notions of open-ended learning that developmental AI researchers may have in mind. △ Less

Submitted 7 June, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

arXiv:2310.04517 [pdf, other]

Domain Randomization for Sim2real Transfer of Automatically Generated Gras** Datasets

Authors: Johann Huber, François Hélénon, Hippolyte Watrelot, Faiz Ben Amar, Stéphane Doncieux

Abstract: Robotic gras** refers to making a robotic system pick an object by applying forces and torques on its surface. Many recent studies use data-driven approaches to address gras**, but the sparse reward nature of this task made the learning process challenging to bootstrap. To avoid constraining the operational space, an increasing number of works propose gras** datasets to learn from. But most… ▽ More Robotic gras** refers to making a robotic system pick an object by applying forces and torques on its surface. Many recent studies use data-driven approaches to address gras**, but the sparse reward nature of this task made the learning process challenging to bootstrap. To avoid constraining the operational space, an increasing number of works propose gras** datasets to learn from. But most of them are limited to simulations. The present paper investigates how automatically generated grasps can be exploited in the real world. More than 7000 reach-and-grasp trajectories have been generated with Quality-Diversity (QD) methods on 3 different arms and grippers, including parallel fingers and a dexterous hand, and tested in the real world. Conducted analysis on the collected measure shows correlations between several Domain Randomization-based quality criteria and sim-to-real transferability. Key challenges regarding the reality gap for gras** have been identified, stressing matters on which researchers on gras** should focus in the future. A QD approach has finally been proposed for making grasps more robust to domain randomization, resulting in a transfer ratio of 84% on the Franka Research 3 arm. △ Less

Submitted 6 October, 2023; originally announced October 2023.

Comments: 6 pages, 7 figures, draft version

arXiv:2310.04349 [pdf, other]

Toward a Plug-and-Play Vision-Based Gras** Module for Robotics

Authors: François Hélénon, Johann Huber, Faïz Ben Amar, Stéphane Doncieux

Abstract: Despite recent advancements in AI for robotics, gras** remains a partially solved challenge, hindered by the lack of benchmarks and reproducibility constraints. This paper introduces a vision-based gras** framework that can easily be transferred across multiple manipulators. Leveraging Quality-Diversity (QD) algorithms, the framework generates diverse repertoires of open-loop gras** trajecto… ▽ More Despite recent advancements in AI for robotics, gras** remains a partially solved challenge, hindered by the lack of benchmarks and reproducibility constraints. This paper introduces a vision-based gras** framework that can easily be transferred across multiple manipulators. Leveraging Quality-Diversity (QD) algorithms, the framework generates diverse repertoires of open-loop gras** trajectories, enhancing adaptability while maintaining a diversity of grasps. This framework addresses two main issues: the lack of an off-the-shelf vision module for detecting object pose and the generalization of QD trajectories to the whole robot operational space. The proposed solution combines multiple vision modules for 6DoF object detection and tracking while rigidly transforming QD-generated trajectories into the object frame. Experiments on a Franka Research 3 arm and a UR5 arm with a SIH Schunk hand demonstrate comparable performance when the real scene aligns with the simulation used for grasp generation. This work represents a significant stride toward building a reliable vision-based gras** module transferable to new platforms, while being adaptable to diverse scenarios without further training iterations. △ Less

Submitted 12 March, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

Comments: 6 pages, 9 figures

arXiv:2308.13278 [pdf, other]

Integrating LLMs and Decision Transformers for Language Grounded Generative Quality-Diversity

Authors: Achkan Salehi, Stephane Doncieux

Abstract: Quality-Diversity is a branch of stochastic optimization that is often applied to problems from the Reinforcement Learning and control domains in order to construct repertoires of well-performing policies/skills that exhibit diversity with respect to a behavior space. Such archives are usually composed of a finite number of reactive agents which are each associated to a unique behavior descriptor,… ▽ More Quality-Diversity is a branch of stochastic optimization that is often applied to problems from the Reinforcement Learning and control domains in order to construct repertoires of well-performing policies/skills that exhibit diversity with respect to a behavior space. Such archives are usually composed of a finite number of reactive agents which are each associated to a unique behavior descriptor, and instantiating behavior descriptors outside of that coarsely discretized space is not straight-forward. While a few recent works suggest solutions to that issue, the trajectory that is generated is not easily customizable beyond the specification of a target behavior descriptor. We propose to jointly solve those problems in environments where semantic information about static scene elements is available by leveraging a Large Language Model to augment the repertoire with natural language descriptions of trajectories, and training a policy conditioned on those descriptions. Thus, our method allows a user to not only specify an arbitrary target behavior descriptor, but also provide the model with a high-level textual prompt to shape the generated trajectory. We also propose an LLM-based approach to evaluating the performance of such generative agents. Furthermore, we develop a benchmark based on simulated robot navigation in a 2d maze that we use for experimental validation. △ Less

Submitted 25 August, 2023; originally announced August 2023.

Comments: 16 pages, 9 figures, 2 tables

arXiv:2308.05483 [pdf, other]

Quality Diversity under Sparse Reward and Sparse Interaction: Application to Gras** in Robotics

Authors: J. Huber, F. Hélénon, M. Coninx, F. Ben Amar, S. Doncieux

Abstract: Quality-Diversity (QD) methods are algorithms that aim to generate a set of diverse and high-performing solutions to a given problem. Originally developed for evolutionary robotics, most QD studies are conducted on a limited set of domains - mainly applied to locomotion, where the fitness and the behavior signal are dense. Gras** is a crucial task for manipulation in robotics. Despite the effort… ▽ More Quality-Diversity (QD) methods are algorithms that aim to generate a set of diverse and high-performing solutions to a given problem. Originally developed for evolutionary robotics, most QD studies are conducted on a limited set of domains - mainly applied to locomotion, where the fitness and the behavior signal are dense. Gras** is a crucial task for manipulation in robotics. Despite the efforts of many research communities, this task is yet to be solved. Gras** cumulates unprecedented challenges in QD literature: it suffers from reward sparsity, behavioral sparsity, and behavior space misalignment. The present work studies how QD can address gras**. Experiments have been conducted on 15 different methods on 10 gras** domains, corresponding to 2 different robot-gripper setups and 5 standard objects. An evaluation framework that distinguishes the evaluation of an algorithm from its internal components has also been proposed for a fair comparison. The obtained results show that MAP-Elites variants that select successful solutions in priority outperform all the compared methods on the studied metrics by a large margin. We also found experimental evidence that sparse interaction can lead to deceptive novelty. To our knowledge, the ability to efficiently produce examples of gras** trajectories demonstrated in this work has no precedent in the literature. △ Less

Submitted 31 October, 2023; v1 submitted 10 August, 2023; originally announced August 2023.

Comments: 37 pages, 17 figures. Draft version

arXiv:2303.01563 [pdf, other]

Data-efficient, Explainable and Safe Box Manipulation: Illustrating the Advantages of Physical Priors in Model-Predictive Control

Authors: Achkan Salehi, Stephane Doncieux

Abstract: Model-based RL/control have gained significant traction in robotics. Yet, these approaches often remain data-inefficient and lack the explainability of hand-engineered solutions. This makes them difficult to debug/integrate in safety-critical settings. However, in many systems, prior knowledge of environment kinematics/dynamics is available. Incorporating such priors can help address the aforement… ▽ More Model-based RL/control have gained significant traction in robotics. Yet, these approaches often remain data-inefficient and lack the explainability of hand-engineered solutions. This makes them difficult to debug/integrate in safety-critical settings. However, in many systems, prior knowledge of environment kinematics/dynamics is available. Incorporating such priors can help address the aforementioned problems by reducing problem complexity and the need for exploration, while also facilitating the expression of the decisions taken by the agent in terms of physically meaningful entities. Our aim with this paper is to illustrate and support this point of view via a case-study. We model a payload manipulation problem based on a real robotic system, and show that leveraging prior knowledge about the dynamics of the environment in an MPC framework can lead to improvements in explainability, safety and data-efficiency, leading to satisfying generalization properties with less data. △ Less

Submitted 28 March, 2024; v1 submitted 2 March, 2023; originally announced March 2023.

Comments: accepted for publication by l4dc 2024, 12 pages (with references), 4 figures, 2 tables

arXiv:2210.11801 [pdf, other]

Random Actions vs Random Policies: Bootstrap** Model-Based Direct Policy Search

Authors: Elias Hanna, Alex Coninx, Stéphane Doncieux

Abstract: This paper studies the impact of the initial data gathering method on the subsequent learning of a dynamics model. Dynamics models approximate the true transition function of a given task, in order to perform policy search directly on the model rather than on the costly real system. This study aims to determine how to bootstrap a model as efficiently as possible, by comparing initialization method… ▽ More This paper studies the impact of the initial data gathering method on the subsequent learning of a dynamics model. Dynamics models approximate the true transition function of a given task, in order to perform policy search directly on the model rather than on the costly real system. This study aims to determine how to bootstrap a model as efficiently as possible, by comparing initialization methods employed in two different policy search frameworks in the literature. The study focuses on the model performance under the episode-based framework of Evolutionary methods using probabilistic ensembles. Experimental results show that various task-dependant factors can be detrimental to each method, suggesting to explore hybrid approaches. △ Less

Submitted 21 October, 2022; originally announced October 2022.

Comments: ICML 2022 Workshop Adaptive Experimental Design and Active Learning in the Real World

arXiv:2210.07887 [pdf, other]

E2R: a Hierarchical-Learning inspired Novelty-Search method to generate diverse repertoires of gras** trajectories

Authors: Johann Huber, Oumar Sane, Alex Coninx, Faiz Ben Amar, Stephane Doncieux

Abstract: Robotics gras** refers to the task of making a robotic system pick an object by applying forces and torques on its surface. Despite the recent advances in data-driven approaches, gras** remains an unsolved problem. Most of the works on this task are relying on priors and heavy constraints to avoid the exploration problem. Novelty Search (NS) refers to evolutionary algorithms that replace selec… ▽ More Robotics gras** refers to the task of making a robotic system pick an object by applying forces and torques on its surface. Despite the recent advances in data-driven approaches, gras** remains an unsolved problem. Most of the works on this task are relying on priors and heavy constraints to avoid the exploration problem. Novelty Search (NS) refers to evolutionary algorithms that replace selection of best performing individuals with selection of the most novel ones. Such methods have already shown promising results on hard exploration problems. In this work, we introduce a new NS-based method that can generate large datasets of gras** trajectories in a platform-agnostic manner. Inspired by the hierarchical learning paradigm, our method decouples approach and prehension to make the behavioral space smoother. Experiments conducted on 3 different robot-gripper setups and on several standard objects shows that our method outperforms state-of-the-art for generating diverse repertoire of gras** trajectories, getting a higher successful run ratio, as well as a better diversity for both approach and prehension. Some of the generated solutions have been successfully deployed on a real robot, showing the exploitability of the obtained repertoires. △ Less

Submitted 17 April, 2024; v1 submitted 14 October, 2022; originally announced October 2022.

Comments: 7 pages, 6 figures. Preprint version

arXiv:2207.12062 [pdf, other]

doi 10.1109/TRO.2023.3326350

Adaptive Asynchronous Control Using Meta-learned Neural Ordinary Differential Equations

Authors: Achkan Salehi, Steffen Rühl, Stephane Doncieux

Abstract: Model-based Reinforcement Learning and Control have demonstrated great potential in various sequential decision making problem domains, including in robotics settings. However, real-world robotics systems often present challenges that limit the applicability of those methods. In particular, we note two problems that jointly happen in many industrial systems: 1) Irregular/asynchronous observations… ▽ More Model-based Reinforcement Learning and Control have demonstrated great potential in various sequential decision making problem domains, including in robotics settings. However, real-world robotics systems often present challenges that limit the applicability of those methods. In particular, we note two problems that jointly happen in many industrial systems: 1) Irregular/asynchronous observations and actions and 2) Dramatic changes in environment dynamics from an episode to another (e.g. varying payload inertial properties). We propose a general framework that overcomes those difficulties by meta-learning adaptive dynamics models for continuous-time prediction and control. The proposed approach is task-agnostic and can be adapted to new tasks in a straight-forward manner. We present evaluations in two different robot simulations and on a real industrial robot. △ Less

Submitted 23 October, 2023; v1 submitted 25 July, 2022; originally announced July 2022.

Comments: 16 double column pages, 14 figures, 3 tables

Journal ref: IEEE Transactions on Robotics; Print ISSN: 1552-3098; Online ISSN: 1941-0468

arXiv:2205.08189 [pdf, other]

Automatic Acquisition of a Repertoire of Diverse Gras** Trajectories through Behavior Sha** and Novelty Search

Authors: Aurélien Morel, Yakumo Kunimoto, Alex Coninx, Stéphane Doncieux

Abstract: Gras** a particular object may require a dedicated gras** movement that may also be specific to the robot end-effector. No generic and autonomous method does exist to generate these movements without making hypotheses on the robot or on the object. Learning methods could help to autonomously discover relevant gras** movements, but they face an important issue: gras** movements are so rare… ▽ More Gras** a particular object may require a dedicated gras** movement that may also be specific to the robot end-effector. No generic and autonomous method does exist to generate these movements without making hypotheses on the robot or on the object. Learning methods could help to autonomously discover relevant gras** movements, but they face an important issue: gras** movements are so rare that a learning method based on exploration has little chance to ever observe an interesting movement, thus creating a bootstrap issue. We introduce an approach to generate diverse gras** movements in order to solve this problem. The movements are generated in simulation, for particular object positions. We test it on several simulated robots: Baxter, Pepper and a Kuka Iiwa arm. Although we show that generated movements actually work on a real Baxter robot, the aim is to use this method to create a large dataset to bootstrap deep learning methods. △ Less

Submitted 17 May, 2022; originally announced May 2022.

Comments: 7 pages, 9 figures, accepted at ICRA 2022. Annex video available at https://youtu.be/bqqQepJAOKQ

arXiv:2205.03207 [pdf, other]

Towards QD-suite: develo** a set of benchmarks for Quality-Diversity algorithms

Authors: Achkan Salehi, Stephane Doncieux

Abstract: While the field of Quality-Diversity (QD) has grown into a distinct branch of stochastic optimization, a few problems, in particular locomotion and navigation tasks, have become de facto standards. Are such benchmarks sufficient? Are they representative of the key challenges faced by QD algorithms? Do they provide the ability to focus on one particular challenge by properly disentangling it from o… ▽ More While the field of Quality-Diversity (QD) has grown into a distinct branch of stochastic optimization, a few problems, in particular locomotion and navigation tasks, have become de facto standards. Are such benchmarks sufficient? Are they representative of the key challenges faced by QD algorithms? Do they provide the ability to focus on one particular challenge by properly disentangling it from others? Do they have much predictive power in terms of scalability and generalization? Existing benchmarks are not standardized, and there is currently no MNIST equivalent for QD. Inspired by recent works on Reinforcement Learning benchmarks, we argue that the identification of challenges faced by QD methods and the development of targeted, challenging, scalable but affordable benchmarks is an important step. As an initial effort, we identify three problems that are challenging in sparse reward settings, and propose associated benchmarks: (1) Behavior metric bias, which can result from the use of metrics that do not match the structure of the behavior space. (2) Behavioral Plateaus, with varying characteristics, such that esca** them would require adaptive QD algorithms and (3) Evolvability Traps, where small variations in genotype result in large behavioral changes. The environments that we propose satisfy the properties listed above. △ Less

Submitted 6 May, 2022; originally announced May 2022.

Comments: 6 pages, 8 figures, Written for and presented at the GECCO 22 QD-benchmarking workshop (https://quality-diversity.github.io/workshop)

arXiv:2205.03162 [pdf, other]

Geodesics, Non-linearities and the Archive of Novelty Search

Authors: Achkan Salehi, Alexandre Coninx, Stephane Doncieux

Abstract: The Novelty Search (NS) algorithm was proposed more than a decade ago. However, the mechanisms behind its empirical success are still not well formalized/understood. This short note focuses on the effects of the archive on exploration. Experimental evidence from a few application domains suggests that archive-based NS performs in general better than when Novelty is solely computed with respect to… ▽ More The Novelty Search (NS) algorithm was proposed more than a decade ago. However, the mechanisms behind its empirical success are still not well formalized/understood. This short note focuses on the effects of the archive on exploration. Experimental evidence from a few application domains suggests that archive-based NS performs in general better than when Novelty is solely computed with respect to the population. An argument that is often encountered in the literature is that the archive prevents exploration from backtracking or cycling, i.e. from revisiting previously encountered areas in the behavior space. We argue that this is not a complete or accurate explanation as backtracking - beside often being desirable - can actually be enabled by the archive. Through low-dimensional/analytical examples, we show that a key effect of the archive is that it counterbalances the exploration biases that result, among other factors, from the use of inadequate behavior metrics and the non-linearities of the behavior map**. Our observations seem to hint that attributing a more active role to the archive in sampling can be beneficial. △ Less

Submitted 6 May, 2022; originally announced May 2022.

Comments: 4 pages, 3 figures

Journal ref: GECCO 22 Companion, July 9-13, 2022, Boston, MA, USA

arXiv:2111.01919 [pdf, other]

Discovering and Exploiting Sparse Rewards in a Learned Behavior Space

Authors: Giuseppe Paolo, Miranda Coninx, Alban Laflaquière, Stephane Doncieux

Abstract: Learning optimal policies in sparse rewards settings is difficult as the learning agent has little to no feedback on the quality of its actions. In these situations, a good strategy is to focus on exploration, hopefully leading to the discovery of a reward signal to improve on. A learning algorithm capable of dealing with this kind of settings has to be able to (1) explore possible agent behaviors… ▽ More Learning optimal policies in sparse rewards settings is difficult as the learning agent has little to no feedback on the quality of its actions. In these situations, a good strategy is to focus on exploration, hopefully leading to the discovery of a reward signal to improve on. A learning algorithm capable of dealing with this kind of settings has to be able to (1) explore possible agent behaviors and (2) exploit any possible discovered reward. Efficient exploration algorithms have been proposed that require to define a behavior space, that associates to an agent its resulting behavior in a space that is known to be worth exploring. The need to define this space is a limitation of these algorithms. In this work, we introduce STAX, an algorithm designed to learn a behavior space on-the-fly and to explore it while efficiently optimizing any reward discovered. It does so by separating the exploration and learning of the behavior space from the exploitation of the reward through an alternating two-steps process. In the first step, STAX builds a repertoire of diverse policies while learning a low-dimensional representation of the high-dimensional observations generated during the policies evaluation. In the exploitation step, emitters are used to optimize the performance of the discovered rewarding solutions. Experiments conducted on three different sparse reward environments show that STAX performs comparably to existing baselines while requiring much less prior information about the task as it autonomously builds the behavior space. △ Less

Submitted 26 September, 2023; v1 submitted 2 November, 2021; originally announced November 2021.

Comments: 25 pages. Published by the Evolutionary Computation Journal, MIT Press

arXiv:2109.13596 [pdf]

doi 10.3389/frobt.2022.762051

Exploratory State Representation Learning

Authors: Astrid Merckling, Nicolas Perrin-Gilbert, Alex Coninx, Stéphane Doncieux

Abstract: Not having access to compact and meaningful representations is known to significantly increase the complexity of reinforcement learning (RL). For this reason, it can be useful to perform state representation learning (SRL) before tackling RL tasks. However, obtaining a good state representation can only be done if a large diversity of transitions is observed, which can require a difficult explorat… ▽ More Not having access to compact and meaningful representations is known to significantly increase the complexity of reinforcement learning (RL). For this reason, it can be useful to perform state representation learning (SRL) before tackling RL tasks. However, obtaining a good state representation can only be done if a large diversity of transitions is observed, which can require a difficult exploration, especially if the environment is initially reward-free. To solve the problems of exploration and SRL in parallel, we propose a new approach called XSRL (eXploratory State Representation Learning). On one hand, it jointly learns compact state representations and a state transition estimator which is used to remove unexploitable information from the representations. On the other hand, it continuously trains an inverse model, and adds to the prediction error of this model a $k$-step learning progress bonus to form the maximization objective of a discovery policy. This results in a policy that seeks complex transitions from which the trained models can effectively learn. Our experimental results show that the approach leads to efficient exploration in challenging environments with image observations, and to state representations that significantly accelerate learning in RL tasks. △ Less

Submitted 15 February, 2022; v1 submitted 28 September, 2021; originally announced September 2021.

Journal ref: Frontiers in Robotics and AI, 14 February 2022

arXiv:2109.06826 [pdf, other]

doi 10.1109/LRA.2022.3148438

Few-shot Quality-Diversity Optimization

Authors: Achkan Salehi, Alexandre Coninx, Stephane Doncieux

Abstract: In the past few years, a considerable amount of research has been dedicated to the exploitation of previous learning experiences and the design of Few-shot and Meta Learning approaches, in problem domains ranging from Computer Vision to Reinforcement Learning based control. A notable exception, where to the best of our knowledge, little to no effort has been made in this direction is Quality-Diver… ▽ More In the past few years, a considerable amount of research has been dedicated to the exploitation of previous learning experiences and the design of Few-shot and Meta Learning approaches, in problem domains ranging from Computer Vision to Reinforcement Learning based control. A notable exception, where to the best of our knowledge, little to no effort has been made in this direction is Quality-Diversity (QD) optimization. QD methods have been shown to be effective tools in dealing with deceptive minima and sparse rewards in Reinforcement Learning. However, they remain costly due to their reliance on inherently sample inefficient evolutionary processes. We show that, given examples from a task distribution, information about the paths taken by optimization in parameter space can be leveraged to build a prior population, which when used to initialize QD methods in unseen environments, allows for few-shot adaptation. Our proposed method does not require backpropagation. It is simple to implement and scale, and furthermore, it is agnostic to the underlying models that are being trained. Experiments carried in both sparse and dense reward settings using robotic manipulation and navigation benchmarks show that it considerably reduces the number of generations that are required for QD optimization in these environments. △ Less

Submitted 18 January, 2024; v1 submitted 14 September, 2021; originally announced September 2021.

Comments: Accepted for publication in the IEEE Robotics and Automation Letters (RA-L) journal

Journal ref: IEEE Robotics and Automation Letters 7.2 (2022): 4424-4431

arXiv:2104.04768 [pdf, other]

Selection-Expansion: A Unifying Framework for Motion-Planning and Diversity Search Algorithms

Authors: Alexandre Chenu, Nicolas Perrin-Gilbert, Stéphane Doncieux, Olivier Sigaud

Abstract: Reinforcement learning agents need a reward signal to learn successful policies. When this signal is sparse or the corresponding gradient is deceptive, such agents need a dedicated mechanism to efficiently explore their search space without relying on the reward. Looking for a large diversity of behaviors or using Motion Planning (MP) algorithms are two options in this context. In this paper, we b… ▽ More Reinforcement learning agents need a reward signal to learn successful policies. When this signal is sparse or the corresponding gradient is deceptive, such agents need a dedicated mechanism to efficiently explore their search space without relying on the reward. Looking for a large diversity of behaviors or using Motion Planning (MP) algorithms are two options in this context. In this paper, we build on the common roots between these two options to investigate the properties of two diversity search algorithms, the Novelty Search and the Goal Exploration Process algorithms. These algorithms look for diversity in an outcome space or behavioral space which is generally hand-designed to represent what matters for a given task. The relation to MP algorithms reveals that the smoothness, or lack of smoothness of the map** between the policy parameter space and the outcome space plays a key role in the search efficiency. In particular, we show empirically that, if the map** is smooth enough, i.e. if two close policies in the parameter space lead to similar outcomes, then diversity algorithms tend to inherit exploration properties of MP algorithms. By contrast, if it is not, diversity algorithms lose these properties and their performance strongly depends on specific heuristics, notably filtering mechanisms that discard some of the explored policies. △ Less

Submitted 10 April, 2021; originally announced April 2021.

arXiv:2104.03936 [pdf, other]

doi 10.1145/3449639.3459303

BR-NS: an Archive-less Approach to Novelty Search

Authors: Achkan Salehi, Alexandre Coninx, Stephane Doncieux

Abstract: As open-ended learning based on divergent search algorithms such as Novelty Search (NS) draws more and more attention from the research community, it is natural to expect that its application to increasingly complex real-world problems will require the exploration to operate in higher dimensional Behavior Spaces which will not necessarily be Euclidean. Novelty Search traditionally relies on k-near… ▽ More As open-ended learning based on divergent search algorithms such as Novelty Search (NS) draws more and more attention from the research community, it is natural to expect that its application to increasingly complex real-world problems will require the exploration to operate in higher dimensional Behavior Spaces which will not necessarily be Euclidean. Novelty Search traditionally relies on k-nearest neighbours search and an archive of previously visited behavior descriptors which are assumed to live in a Euclidean space. This is problematic because of a number of issues. On one hand, Euclidean distance and Nearest-neighbour search are known to behave differently and become less meaningful in high dimensional spaces. On the other hand, the archive has to be bounded since, memory considerations aside, the computational complexity of finding nearest neighbours in that archive grows linearithmically with its size. A sub-optimal bound can result in "cycling" in the behavior space, which inhibits the progress of the exploration. Furthermore, the performance of NS depends on a number of algorithmic choices and hyperparameters, such as the strategies to add or remove elements to the archive and the number of neighbours to use in k-nn search. In this paper, we discuss an alternative approach to novelty estimation, dubbed Behavior Recognition based Novelty Search (BR-NS), which does not require an archive, makes no assumption on the metrics that can be defined in the behavior space and does not rely on nearest neighbours search. We conduct experiments to gain insight into its feasibility and dynamics as well as potential advantages over archive-based NS in terms of time complexity. △ Less

Submitted 8 April, 2021; originally announced April 2021.

Comments: Author version of the paper accepted at GECCO 21

arXiv:2102.03140 [pdf, other]

doi 10.1145/3449639.3459314

Sparse Reward Exploration via Novelty Search and Emitters

Authors: Giuseppe Paolo, Alexandre Coninx, Stephane Doncieux, Alban Laflaquière

Abstract: Reward-based optimization algorithms require both exploration, to find rewards, and exploitation, to maximize performance. The need for efficient exploration is even more significant in sparse reward settings, in which performance feedback is given sparingly, thus rendering it unsuitable for guiding the search process. In this work, we introduce the SparsE Reward Exploration via Novelty and Emitte… ▽ More Reward-based optimization algorithms require both exploration, to find rewards, and exploitation, to maximize performance. The need for efficient exploration is even more significant in sparse reward settings, in which performance feedback is given sparingly, thus rendering it unsuitable for guiding the search process. In this work, we introduce the SparsE Reward Exploration via Novelty and Emitters (SERENE) algorithm, capable of efficiently exploring a search space, as well as optimizing rewards found in potentially disparate areas. Contrary to existing emitters-based approaches, SERENE separates the search space exploration and reward exploitation into two alternating processes. The first process performs exploration through Novelty Search, a divergent search algorithm. The second one exploits discovered reward areas through emitters, i.e. local instances of population-based optimization algorithms. A meta-scheduler allocates a global computational budget by alternating between the two processes, ensuring the discovery and efficient exploitation of disjoint reward areas. SERENE returns both a collection of diverse solutions covering the search space and a collection of high-performing solutions for each distinct reward area. We evaluate SERENE on various sparse reward environments and show it compares favorably to existing baselines. △ Less

Submitted 16 April, 2021; v1 submitted 5 February, 2021; originally announced February 2021.

Comments: In 2021 Genetic and Evolutionary Computation Conference (GECCO 21), July, 2021, Lille, France. ACM, New York, NY, USA, 11 pages

arXiv:2005.06224 [pdf, other]

Novelty Search makes Evolvability Inevitable

Authors: Stephane Doncieux, Giuseppe Paolo, Alban Laflaquière, Alexandre Coninx

Abstract: Evolvability is an important feature that impacts the ability of evolutionary processes to find interesting novel solutions and to deal with changing conditions of the problem to solve. The estimation of evolvability is not straightforward and is generally too expensive to be directly used as selective pressure in the evolutionary process. Indirectly promoting evolvability as a side effect of othe… ▽ More Evolvability is an important feature that impacts the ability of evolutionary processes to find interesting novel solutions and to deal with changing conditions of the problem to solve. The estimation of evolvability is not straightforward and is generally too expensive to be directly used as selective pressure in the evolutionary process. Indirectly promoting evolvability as a side effect of other easier and faster to compute selection pressures would thus be advantageous. In an unbounded behavior space, it has already been shown that evolvable individuals naturally appear and tend to be selected as they are more likely to invade empty behavior niches. Evolvability is thus a natural byproduct of the search in this context. However, practical agents and environments often impose limits on the reach-able behavior space. How do these boundaries impact evolvability? In this context, can evolvability still be promoted without explicitly rewarding it? We show that Novelty Search implicitly creates a pressure for high evolvability even in bounded behavior spaces, and explore the reasons for such a behavior. More precisely we show that, throughout the search, the dynamic evaluation of novelty rewards individuals which are very mobile in the behavior space, which in turn promotes evolvability. △ Less

Submitted 13 May, 2020; originally announced May 2020.

arXiv:2005.06223 [pdf, other]

DREAM Architecture: a Developmental Approach to Open-Ended Learning in Robotics

Authors: Stephane Doncieux, Nicolas Bredeche, Léni Le Goff, Benoît Girard, Alexandre Coninx, Olivier Sigaud, Mehdi Khamassi, Natalia Díaz-Rodríguez, David Filliat, Timothy Hospedales, A. Eiben, Richard Duro

Abstract: Robots are still limited to controlled conditions, that the robot designer knows with enough details to endow the robot with the appropriate models or behaviors. Learning algorithms add some flexibility with the ability to discover the appropriate behavior given either some demonstrations or a reward to guide its exploration with a reinforcement learning algorithm. Reinforcement learning algorithm… ▽ More Robots are still limited to controlled conditions, that the robot designer knows with enough details to endow the robot with the appropriate models or behaviors. Learning algorithms add some flexibility with the ability to discover the appropriate behavior given either some demonstrations or a reward to guide its exploration with a reinforcement learning algorithm. Reinforcement learning algorithms rely on the definition of state and action spaces that define reachable behaviors. Their adaptation capability critically depends on the representations of these spaces: small and discrete spaces result in fast learning while large and continuous spaces are challenging and either require a long training period or prevent the robot from converging to an appropriate behavior. Beside the operational cycle of policy execution and the learning cycle, which works at a slower time scale to acquire new policies, we introduce the redescription cycle, a third cycle working at an even slower time scale to generate or adapt the required representations to the robot, its environment and the task. We introduce the challenges raised by this cycle and we present DREAM (Deferred Restructuring of Experience in Autonomous Machines), a developmental cognitive architecture to bootstrap this redescription process stage by stage, build new state representations with appropriate motivations, and transfer the acquired knowledge across domains or tasks or even across robots. We describe results obtained so far with this approach and end up with a discussion of the questions it raises in Neuroscience. △ Less

Submitted 13 May, 2020; originally announced May 2020.

arXiv:1910.01738 [pdf, other]

doi 10.1007/978-3-030-64580-9_26

State Representation Learning from Demonstration

Authors: Astrid Merckling, Alexandre Coninx, Loic Cressot, Stéphane Doncieux, Nicolas Perrin-Gilbert

Abstract: Robots could learn their own state and world representation from perception and experience without supervision. This desirable goal is the main focus of our field of interest, state representation learning (SRL). Indeed, a compact representation of such a state is beneficial to help robots grasp onto their environment for interacting. The properties of this representation have a strong impact on t… ▽ More Robots could learn their own state and world representation from perception and experience without supervision. This desirable goal is the main focus of our field of interest, state representation learning (SRL). Indeed, a compact representation of such a state is beneficial to help robots grasp onto their environment for interacting. The properties of this representation have a strong impact on the adaptive capability of the agent. In this article we present an approach based on imitation learning. The idea is to train several policies that share the same representation to reproduce various demonstrations. To do so, we use a multi-head neural network with a shared state representation feeding a task-specific agent. If the demonstrations are diverse, the trained representation will eventually contain the information necessary for all tasks, while discarding irrelevant information. As such, it will potentially become a compact state representation useful for new tasks. We call this approach SRLfD (State Representation Learning from Demonstration). Our experiments confirm that when a controller takes SRLfD-based representations as input, it can achieve better performance than with other representation strategies and promote more efficient reinforcement learning (RL) than with an end-to-end RL strategy. △ Less

Submitted 26 September, 2021; v1 submitted 15 September, 2019; originally announced October 2019.

Comments: Published as a conference paper at LOD 2020

arXiv:1909.05508 [pdf]

Unsupervised Learning and Exploration of Reachable Outcome Space

Authors: Giuseppe Paolo, Alban Laflaquière, Alexandre Coninx, Stephane Doncieux

Abstract: Performing Reinforcement Learning in sparse rewards settings, with very little prior knowledge, is a challenging problem since there is no signal to properly guide the learning process. In such situations, a good search strategy is fundamental. At the same time, not having to adapt the algorithm to every single problem is very desirable. Here we introduce TAXONS, a Task Agnostic eXploration of Out… ▽ More Performing Reinforcement Learning in sparse rewards settings, with very little prior knowledge, is a challenging problem since there is no signal to properly guide the learning process. In such situations, a good search strategy is fundamental. At the same time, not having to adapt the algorithm to every single problem is very desirable. Here we introduce TAXONS, a Task Agnostic eXploration of Outcome spaces through Novelty and Surprise algorithm. Based on a population-based divergent-search approach, it learns a set of diverse policies directly from high-dimensional observations, without any task-specific information. TAXONS builds a repertoire of policies while training an autoencoder on the high-dimensional observation of the final state of the system to build a low-dimensional outcome space. The learned outcome space, combined with the reconstruction error, is used to drive the search for new policies. Results show that TAXONS can find a diverse set of controllers, covering a good part of the ground-truth outcome space, while having no information about such space. △ Less

Submitted 4 May, 2020; v1 submitted 12 September, 2019; originally announced September 2019.

Comments: Published at IEEE International Conference on Robotics and Automation (ICRA) 2020

arXiv:1903.04413 [pdf, other]

Building an Affordances Map with Interactive Perception

Authors: Leni K. Le Goff, Oussama Yaakoubi, Alexandre Coninx, Stephane Doncieux

Abstract: Robots need to understand their environment to perform their task. If it is possible to pre-program a visual scene analysis process in closed environments, robots operating in an open environment would benefit from the ability to learn it through their interaction with their environment. This ability furthermore opens the way to the acquisition of affordances maps in which the action capabilities… ▽ More Robots need to understand their environment to perform their task. If it is possible to pre-program a visual scene analysis process in closed environments, robots operating in an open environment would benefit from the ability to learn it through their interaction with their environment. This ability furthermore opens the way to the acquisition of affordances maps in which the action capabilities of the robot structure its visual scene understanding. We propose an approach to build such affordances maps by relying on an interactive perception approach and an online classification. In the proposed formalization of affordances, actions and effects are related to visual features, not objects, and they can be combined. We have tested the approach on three action primitives and on a real PR2 robot. △ Less

Submitted 11 March, 2019; originally announced March 2019.

Comments: 14 pages, 15 figures

arXiv:1901.10968 [pdf, other]

Bootstrap** Robotic Ecological Perception from a Limited Set of Hypotheses Through Interactive Perception

Authors: Léni K. Le Goff, Ghanim Mukhtar, Alexandre Coninx, Stéphane Doncieux

Abstract: To solve its task, a robot needs to have the ability to interpret its perceptions. In vision, this interpretation is particularly difficult and relies on the understanding of the structure of the scene, at least to the extent of its task and sensorimotor abilities. A robot with the ability to build and adapt this interpretation process according to its own tasks and capabilities would push away th… ▽ More To solve its task, a robot needs to have the ability to interpret its perceptions. In vision, this interpretation is particularly difficult and relies on the understanding of the structure of the scene, at least to the extent of its task and sensorimotor abilities. A robot with the ability to build and adapt this interpretation process according to its own tasks and capabilities would push away the limits of what robots can achieve in a non controlled environment. A solution is to provide the robot with processes to build such representations that are not specific to an environment or a situation. A lot of works focus on objects segmentation, recognition and manipulation. Defining an object solely on the basis of its visual appearance is challenging given the wide range of possible objects and environments. Therefore, current works make simplifying assumptions about the structure of a scene. Such assumptions reduce the adaptivity of the object extraction process to the environments in which the assumption holds. To limit such assumptions, we introduce an exploration method aimed at identifying moveable elements in a scene without considering the concept of object. By using the interactive perception framework, we aim at bootstrap** the acquisition process of a representation of the environment with a minimum of context specific assumptions. The robotic system builds a perceptual map called relevance map which indicates the moveable parts of the current scene. A classifier is trained online to predict the category of each region (moveable or non-moveable). It is also used to select a region with which to interact, with the goal of minimizing the uncertainty of the classification. A specific classifier is introduced to fit these needs: the collaborative mixture models classifier. The method is tested on a set of scenarios of increasing complexity, using both simulations and a PR2 robot. △ Less

Submitted 30 January, 2019; originally announced January 2019.

Comments: 21 pages, 21 figures

arXiv:1901.00811 [pdf, other]

doi 10.1016/j.robot.2020.103710

From exploration to control: learning object manipulation skills through novelty search and local adaptation

Authors: Seungsu Kim, Alexandre Coninx, Stephane Doncieux

Abstract: Programming a robot to deal with open-ended tasks remains a challenge, in particular if the robot has to manipulate objects. Launching, gras**, pushing or any other object interaction can be simulated but the corresponding models are not reversible and the robot behavior thus cannot be directly deduced. These behaviors are hard to learn without a demonstration as the search space is large and th… ▽ More Programming a robot to deal with open-ended tasks remains a challenge, in particular if the robot has to manipulate objects. Launching, gras**, pushing or any other object interaction can be simulated but the corresponding models are not reversible and the robot behavior thus cannot be directly deduced. These behaviors are hard to learn without a demonstration as the search space is large and the reward sparse. We propose a method to autonomously generate a diverse repertoire of simple object interaction behaviors in simulation. Our goal is to bootstrap a robot learning and development process with limited information about what the robot has to achieve and how. This repertoire can be exploited to solve different tasks in reality thanks to a proposed adaptation method or could be used as a training set for data-hungry algorithms. The proposed approach relies on the definition of a goal space and generates a repertoire of trajectories to reach attainable goals, thus allowing the robot to control this goal space. The repertoire is built with an off-the-shelf simulation thanks to a quality diversity algorithm. The result is a set of solutions tested in simulation only. It may result in two different problems: (1) as the repertoire is discrete and finite, it may not contain the trajectory to deal with a given situation or (2) some trajectories may lead to a behavior in reality that differs from simulation because of a reality gap. We propose an approach to deal with both issues by using a local linearization of the map** between the motion parameters and the observed effects. Furthermore, we present an approach to update the existing solutions repertoire with the tests done on the real robot. The approach has been validated on two different experiments on the Baxter robot: a ball launching and a joystick manipulation tasks. △ Less

Submitted 16 November, 2020; v1 submitted 3 January, 2019; originally announced January 2019.

Comments: 30 pages, 18 figures, accepted for publication in Robotics and Autonomous Systems

arXiv:1811.02945 [pdf]

doi 10.1109/ICDL-EpiRob44920.2019

Behavioural Repertoire via Generative Adversarial Policy Networks

Authors: Marija Jegorova, Stéphane Doncieux, Timothy Hospedales

Abstract: Learning algorithms are enabling robots to solve increasingly challenging real-world tasks. These approaches often rely on demonstrations and reproduce the behavior shown. Unexpected changes in the environment may require using different behaviors to achieve the same effect, for instance to reach and grasp an object in changing clutter. An emerging paradigm addressing this robustness issue is to l… ▽ More Learning algorithms are enabling robots to solve increasingly challenging real-world tasks. These approaches often rely on demonstrations and reproduce the behavior shown. Unexpected changes in the environment may require using different behaviors to achieve the same effect, for instance to reach and grasp an object in changing clutter. An emerging paradigm addressing this robustness issue is to learn a diverse set of successful behaviors for a given task, from which a robot can select the most suitable policy when faced with a new environment. In this paper, we explore a novel realization of this vision by learning a generative model over policies. Rather than learning a single policy, or a small fixed repertoire, our generative model for policies compactly encodes an unbounded number of policies and allows novel controller variants to be sampled. Leveraging our generative policy network, a robot can sample novel behaviors until it finds one that works for a new environment. We demonstrate this idea with an application of robust ball-throwing in the presence of obstacles. We show that this approach achieves a greater diversity of behaviors than an existing evolutionary approach, while maintaining good efficacy of sampled behaviors, allowing a Baxter robot to hit targets more often when ball throwing in the presence of obstacles. △ Less

Submitted 18 February, 2020; v1 submitted 7 November, 2018; originally announced November 2018.

Comments: In Proceedings of 2019 Joint IEEE 9th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), pages 320 - 326

Journal ref: 2019 Joint IEEE 9th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob)

arXiv:1803.03453 [pdf, other]

The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities

Authors: Joel Lehman, Jeff Clune, Dusan Misevic, Christoph Adami, Lee Altenberg, Julie Beaulieu, Peter J. Bentley, Samuel Bernard, Guillaume Beslon, David M. Bryson, Patryk Chrabaszcz, Nick Cheney, Antoine Cully, Stephane Doncieux, Fred C. Dyer, Kai Olav Ellefsen, Robert Feldt, Stephan Fischer, Stephanie Forrest, Antoine Frénoy, Christian Gagné, Leni Le Goff, Laura M. Grabowski, Babak Hodjat, Frank Hutter , et al. (28 additional authors not shown)

Abstract: Biological evolution provides a creative fount of complex and subtle adaptations, often surprising the scientists who discover them. However, because evolution is an algorithmic process that transcends the substrate in which it occurs, evolution's creativity is not limited to nature. Indeed, many researchers in the field of digital evolution have observed their evolving algorithms and organisms su… ▽ More Biological evolution provides a creative fount of complex and subtle adaptations, often surprising the scientists who discover them. However, because evolution is an algorithmic process that transcends the substrate in which it occurs, evolution's creativity is not limited to nature. Indeed, many researchers in the field of digital evolution have observed their evolving algorithms and organisms subverting their intentions, exposing unrecognized bugs in their code, producing unexpected adaptations, or exhibiting outcomes uncannily convergent with ones in nature. Such stories routinely reveal creativity by evolution in these digital worlds, but they rarely fit into the standard scientific narrative. Instead they are often treated as mere obstacles to be overcome, rather than results that warrant study in their own right. The stories themselves are traded among researchers through oral tradition, but that mode of information transmission is inefficient and prone to error and outright loss. Moreover, the fact that these stories tend to be shared only among practitioners means that many natural scientists do not realize how interesting and lifelike digital organisms are and how natural their evolution can be. To our knowledge, no collection of such anecdotes has been published before. This paper is the crowd-sourced product of researchers in the fields of artificial life and evolutionary computation who have provided first-hand accounts of such cases. It thus serves as a written, fact-checked collection of scientifically important and even entertaining stories. In doing so we also present here substantial evidence that the existence and importance of evolutionary surprises extends beyond the natural world, and may indeed be a universal property of all complex evolving systems. △ Less

Submitted 21 November, 2019; v1 submitted 9 March, 2018; originally announced March 2018.

arXiv:1507.06877 [pdf, other]

Multi-objective analysis of computational models

Authors: Stéphane Doncieux, Jean Liénard, Benoît Girard, Mohamed Hamdaoui, Joël Chaskalovic

Abstract: Computational models are of increasing complexity and their behavior may in particular emerge from the interaction of different parts. Studying such models becomes then more and more difficult and there is a need for methods and tools supporting this process. Multi-objective evolutionary algorithms generate a set of trade-off solutions instead of a single optimal solution. The availability of a se… ▽ More Computational models are of increasing complexity and their behavior may in particular emerge from the interaction of different parts. Studying such models becomes then more and more difficult and there is a need for methods and tools supporting this process. Multi-objective evolutionary algorithms generate a set of trade-off solutions instead of a single optimal solution. The availability of a set of solutions that have the specificity to be optimal relative to carefully chosen objectives allows to perform data mining in order to better understand model features and regularities. We review the corresponding work, propose a unifying framework, and highlight its potential use. Typical questions that such a methodology allows to address are the following: what are the most critical parameters of the model? What are the relations between the parameters and the objectives? What are the typical behaviors of the model? Two examples are provided to illustrate the capabilities of the methodology. The features of a flap**-wing robot are thus evaluated to find out its speed-energy relation, together with the criticality of its parameters. A neurocomputational model of the Basal Ganglia brain nuclei is then considered and its most salient features according to this methodology are presented and discussed. △ Less

Submitted 24 July, 2015; originally announced July 2015.

arXiv:1307.1870 [pdf, other]

Crossing the Reality Gap: a Short Introduction to the Transferability Approach

Authors: Jean-Baptiste Mouret, Sylvain Koos, Stéphane Doncieux

Abstract: In robotics, gradient-free optimization algorithms (e.g. evolutionary algorithms) are often used only in simulation because they require the evaluation of many candidate solutions. Nevertheless, solutions obtained in simulation often do not work well on the real device. The transferability approach aims at crossing this gap between simulation and reality by \emph{making the optimization algorithm… ▽ More In robotics, gradient-free optimization algorithms (e.g. evolutionary algorithms) are often used only in simulation because they require the evaluation of many candidate solutions. Nevertheless, solutions obtained in simulation often do not work well on the real device. The transferability approach aims at crossing this gap between simulation and reality by \emph{making the optimization algorithm aware of the limits of the simulation}. In the present paper, we first describe the transferability function, that maps solution descriptors to a score representing how well a simulator matches the reality. We then show that this function can be learned using a regression algorithm and a few experiments with the real devices. Our results are supported by an extensive study of the reality gap for a simple quadruped robot whose control parameters are optimized. In particular, we mapped the whole search space in reality and in simulation to understand the differences between the fitness landscapes. △ Less

Submitted 7 July, 2013; originally announced July 2013.

Showing 1–30 of 30 results for author: Doncieux, S