Search | arXiv e-print repository

Reliably Re-Acting to Partner's Actions with the Social Intrinsic Motivation of Transfer Empowerment

Authors: Tessa van der Heiden, Herke van Hoof, Efstratios Gavves, Christoph Salge

Abstract: We consider multi-agent reinforcement learning (MARL) for cooperative communication and coordination tasks. MARL agents can be brittle because they can overfit their training partners' policies. This overfitting can produce agents that adopt policies that act under the expectation that other agents will act in a certain way rather than react to their actions. Our objective is to bias the learning process towards finding reactive strategies towards other agents' behaviors. Our method, transfer empowerment, measures the potential influence between agents' actions. Results from three simulated cooperation scenarios support our hypothesis that transfer empowerment improves MARL performance. We discuss how transfer empowerment could be a useful principle to guide multi-agent coordination by ensuring reactiveness to one's partner.

Submitted 7 March, 2022; originally announced March 2022.

Comments: arXiv admin note: text overlap with arXiv:2012.08255

arXiv:2012.08255

Robust Multi-Agent Reinforcement Learning with Social Empowerment for Coordination and Communication

Authors: T. van der Heiden, C. Salge, E. Gavves, H. van Hoof

Abstract: We consider the problem of robust multi-agent reinforcement learning (MARL) for cooperative communication and coordination tasks. MARL agents, mainly those trained in a centralized way, can be brittle because they can adopt policies that act under the expectation that other agents will act a certain way rather than react to their actions. Our objective is to bias the learning process towards finding strategies that remain reactive towards others' behavior. Social empowerment measures the potential influence between agents' actions. We propose it as an additional reward term, so agents better adapt to other agents' actions. We show that the proposed method results in obtaining higher rewards faster and a higher success rate in three cooperative communication and coordination tasks.

Submitted 15 December, 2020; originally announced December 2020.

arXiv:2003.08158

Social Navigation with Human Empowerment driven Deep Reinforcement Learning

Authors: Tessa van der Heiden, Florian Mirus, Herke van Hoof

Abstract: Mobile robot navigation has seen extensive research in the last decades. The aspect of collaboration with robots and humans sharing workspaces will become increasingly important in the future. Therefore, the next generation of mobile robots needs to be socially-compliant to be accepted by their human collaborators. However, a formal definition of compliance is not straightforward. On the other hand, empowerment has been used by artificial agents to learn complicated and generalized actions and also has been shown to be a good model for biological behaviors. In this paper, we go beyond the approach of classical reinforcement learning (RL) and provide our agent with intrinsic motivation using empowerment. In contrast to self-empowerment, a robot employing our approach strives for the empowerment of people in its environment, so they are not disturbed by the robot's presence and motion. In our experiments, we show that our approach has a positive influence on humans, as it minimizes its distance to humans and thus decreases human travel time while moving efficiently towards its own goal. An interactive user-study shows that our method is considered more social than other state-of-the-art approaches by the participants.

Submitted 5 August, 2020; v1 submitted 18 March, 2020; originally announced March 2020.

arXiv:1910.06673

SafeCritic: Collision-Aware Trajectory Prediction

Authors: Tessa van der Heiden, Naveen Shankar Nagaraja, Christian Weiss, Efstratios Gavves

Abstract: Navigating complex urban environments safely is a key to realize fully autonomous systems. Predicting future locations of vulnerable road users, such as pedestrians and cyclists, thus, has received a lot of attention in the recent years. While previous works have addressed modeling interactions with the static (obstacles) and dynamic (humans) environment agents, we address an important gap in trajectory prediction. We propose SafeCritic, a model that synergizes generative adversarial networks for generating multiple "real" trajectories with reinforcement learning to generate "safe" trajectories. The Discriminator evaluates the generated candidates on whether they are consistent with the observed inputs. The Critic network is environmentally aware to prune trajectories that are in collision or are in violation with the environment. The auto-encoding loss stabilizes training and prevents mode-collapse. We demonstrate results on two large scale data sets with a considerable improvement over state-of-the-art. We also show that the Critic is able to classify the safety of trajectories.

Submitted 15 October, 2019; originally announced October 2019.

Comments: To Appear as workshop paper for the British Machine Vision Conference (BMVC) 2019

Showing 1–4 of 4 results for author: van der Heiden, T