Showing 1–50 of 50 results for author: Oliehoek, F A

Searching in archive cs.
  1. arXiv:2405.19024  [pdf, other]

    cs.LG cs.AI cs.GT cs.MA

    Inverse Concave-Utility Reinforcement Learning is Inverse Game Theory

    Authors: Mustafa Mert Çelikok, Frans A. Oliehoek, Jan-Willem van de Meent

    Abstract: We consider inverse reinforcement learning problems with concave utilities. Concave Utility Reinforcement Learning (CURL) is a generalisation of the standard RL objective, which employs a concave function of the state occupancy measure, rather than a linear function. CURL has garnered recent attention for its ability to represent instances of many important applications including the standard RL s…

    Submitted 29 May, 2024; originally announced May 2024.
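    A minimal sketch, assuming a finite state space: the standard RL objective is the linear function <r, d> of the state occupancy measure d, while CURL replaces it with a concave function of d. Shannon entropy of d is used below only as one example of a concave utility; the function names, reward vector and occupancy measures are hypothetical.

```python
import math

def linear_objective(d, r):
    """Standard RL objective: expected reward, linear in the occupancy measure d."""
    return sum(di * ri for di, ri in zip(d, r))

def entropy_objective(d):
    """Example concave utility (an assumption): Shannon entropy of d."""
    return sum(-di * math.log(di) for di in d if di > 0)

def check_concavity(f, d1, d2, lam):
    """True if f(lam*d1 + (1-lam)*d2) >= lam*f(d1) + (1-lam)*f(d2) for this one mixture."""
    mix = [lam * a + (1 - lam) * b for a, b in zip(d1, d2)]
    return f(mix) >= lam * f(d1) + (1 - lam) * f(d2) - 1e-12

# Two hypothetical occupancy measures over three states (each sums to 1).
d1 = [1.0, 0.0, 0.0]
d2 = [0.0, 0.5, 0.5]
r = [1.0, 0.0, 2.0]

print(linear_objective(d1, r), linear_objective(d2, r))  # 1.0 1.0
print(entropy_objective(d1), entropy_objective(d2))
print(check_concavity(entropy_objective, d1, d2, 0.5))
```

    Both measures earn the same linear reward, but the entropy utility prefers the more spread-out d2, which a linear objective cannot express.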

  2. arXiv:2403.02227  [pdf, other]

    cs.GT cs.AI cs.MA

    Policy Space Response Oracles: A Survey

    Authors: Ariyan Bighashdel, Yongzhao Wang, Stephen McAleer, Rahul Savani, Frans A. Oliehoek

    Abstract: Game theory provides a mathematical way to study the interaction between multiple decision makers. However, classical game-theoretic analysis is limited in scalability due to the large number of strategies, precluding direct application to more complex scenarios. This survey provides a comprehensive overview of a framework for large games, known as Policy Space Response Oracles (PSRO), which holds…

    Submitted 27 May, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: Ariyan Bighashdel and Yongzhao Wang contributed equally

    Journal ref: The 33rd International Joint Conference on Artificial Intelligence, 2024

  3. arXiv:2402.12034  [pdf, other]

    stat.ML cs.LG

    When Do Off-Policy and On-Policy Policy Gradient Methods Align?

    Authors: Davide Mambelli, Stephan Bongers, Onno Zoeter, Matthijs T. J. Spaan, Frans A. Oliehoek

    Abstract: Policy gradient methods are widely adopted reinforcement learning algorithms for tasks with continuous action spaces. These methods succeeded in many application domains; however, because of their notorious sample inefficiency, their use remains limited to problems where fast and accurate simulations are available. A common way to improve sample efficiency is to modify their objective function to b…

    Submitted 19 February, 2024; originally announced February 2024.

  4. arXiv:2311.11288  [pdf, other]

    cs.AI

    What Lies beyond the Pareto Front? A Survey on Decision-Support Methods for Multi-Objective Optimization

    Authors: Zuzanna Osika, Jazmin Zatarain Salazar, Diederik M. Roijers, Frans A. Oliehoek, Pradeep K. Murukannaiah

    Abstract: We present a review that unifies decision-support methods for exploring the solutions produced by multi-objective optimization (MOO) algorithms. As MOO is applied to solve diverse problems, approaches for analyzing the trade-offs offered by MOO algorithms are scattered across fields. We provide an overview of the advances on this topic, including methods for visualization, mining the solution set,…

    Submitted 19 November, 2023; originally announced November 2023.

    Comments: IJCAI 2023 Conference Paper, Survey Track

  5. arXiv:2306.02419  [pdf, other]

    cs.LG

    Bad Habits: Policy Confounding and Out-of-Trajectory Generalization in RL

    Authors: Miguel Suau, Matthijs T. J. Spaan, Frans A. Oliehoek

    Abstract: Reinforcement learning agents tend to develop habits that are effective only under specific policies. Following an initial exploration phase where agents try out different actions, they eventually converge onto a particular policy. As this occurs, the distribution over state-action trajectories becomes narrower, leading agents to repeatedly experience the same transitions. This repetitive exposure…

    Submitted 24 June, 2024; v1 submitted 4 June, 2023; originally announced June 2023.

  6. arXiv:2306.00840  [pdf, other]

    cs.LG cs.AI

    What model does MuZero learn?

    Authors: Jinke He, Thomas M. Moerland, Frans A. Oliehoek

    Abstract: Model-based reinforcement learning has drawn considerable interest in recent years, given its promise to improve sample efficiency. Moreover, when using deep-learned models, it is potentially possible to learn compact models from complex sensor data. However, the effectiveness of these learned models, particularly their capacity to plan, i.e., to improve the current policy, remains unclear. In thi…

    Submitted 18 October, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

  7. arXiv:2305.18071  [pdf, ps, other]

    cs.AI cs.GT cs.MA

    Towards a Unifying Model of Rationality in Multiagent Systems

    Authors: Robert Loftin, Mustafa Mert Çelikok, Frans A. Oliehoek

    Abstract: Multiagent systems deployed in the real world need to cooperate with other agents (including humans) nearly as effectively as these agents cooperate with one another. To design such AI, and provide guarantees of its effectiveness, we need to clearly specify what types of agents our AI must be able to cooperate with. In this work we propose a generic model of socially intelligent agents, which are…

    Submitted 29 May, 2023; originally announced May 2023.

    Comments: 5 pages, to appear in the OptLearnMAS Workshop at AAMAS 2023

    ACM Class: I.2.6

  8. arXiv:2302.13844  [pdf, other]

    cs.AI cs.MA

    Safe Multi-agent Learning via Trapping Regions

    Authors: Aleksander Czechowski, Frans A. Oliehoek

    Abstract: One of the main challenges of multi-agent learning lies in establishing convergence of the algorithms, as, in general, a collection of individual, self-serving agents is not guaranteed to converge with their joint policy, when learning concurrently. This is in stark contrast to most single-agent environments, and sets a prohibitive barrier for deployment in practical applications, as it induces un…

    Submitted 16 May, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

  9. arXiv:2302.03438  [pdf, other]

    cs.LG cs.AI cs.MA

    Uncoupled Learning of Differential Stackelberg Equilibria with Commitments

    Authors: Robert Loftin, Mustafa Mert Çelikok, Herke van Hoof, Samuel Kaski, Frans A. Oliehoek

    Abstract: In multi-agent problems requiring a high degree of cooperation, success often depends on the ability of the agents to adapt to each other's behavior. A natural solution concept in such settings is the Stackelberg equilibrium, in which the "leader" agent selects the strategy that maximizes its own payoff given that the "follower" agent will choose their best response to this strategy. Recent wo…

    Submitted 13 June, 2024; v1 submitted 7 February, 2023; originally announced February 2023.

    Comments: International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS) 2024
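    A minimal sketch of the Stackelberg solution concept defined in the abstract above, restricted (as an assumption) to pure strategies in a small two-player matrix game, with follower ties broken in the leader's favour. It does not implement the paper's uncoupled learning of differential Stackelberg equilibria; the function names and payoff matrices are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]

def follower_best_response(F: Matrix, a: int, L: Matrix) -> int:
    """Follower's best reply to leader action a.

    Ties are broken in the leader's favour (a 'strong Stackelberg' assumption).
    """
    best_value = max(F[a])
    candidates = [b for b, v in enumerate(F[a]) if v == best_value]
    return max(candidates, key=lambda b: L[a][b])

def pure_stackelberg(L: Matrix, F: Matrix) -> Tuple[int, int, float]:
    """Leader commits to the row maximising its payoff, anticipating the follower's best reply."""
    best = None
    for a in range(len(L)):
        b = follower_best_response(F, a, L)
        if best is None or L[a][b] > best[2]:
            best = (a, b, L[a][b])
    return best

# Hypothetical 2x2 game: rows are leader actions, columns are follower actions.
L = [[2, 4], [1, 3]]  # leader payoffs
F = [[1, 0], [0, 1]]  # follower payoffs

print(pure_stackelberg(L, F))  # (1, 1, 3)
```

    In this game the leader's first row strictly dominates, so without commitment the leader earns 2; committing to the second row induces the follower's second column and raises the leader's payoff to 3.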

  10. arXiv:2208.14407  [pdf, other]

    cs.LG

    An Analysis of Model-Based Reinforcement Learning From Abstracted Observations

    Authors: Rolf A. N. Starre, Marco Loog, Elena Congeduti, Frans A. Oliehoek

    Abstract: Many methods for Model-based Reinforcement learning (MBRL) in Markov decision processes (MDPs) provide guarantees for both the accuracy of the model they can deliver and the learning efficiency. At the same time, state abstraction techniques allow for a reduction of the size of an MDP while maintaining a bounded loss with respect to the original problem. Therefore, it may come as a surprise that n…

    Submitted 15 November, 2023; v1 submitted 30 August, 2022; originally announced August 2022.

    Comments: 36 pages, 2 figures, published in Transactions on Machine Learning Research (TMLR) 2023

  11. arXiv:2207.00288  [pdf, other]

    cs.LG cs.MA

    Distributed Influence-Augmented Local Simulators for Parallel MARL in Large Networked Systems

    Authors: Miguel Suau, Jinke He, Mustafa Mert Çelikok, Matthijs T. J. Spaan, Frans A. Oliehoek

    Abstract: Due to its high sample complexity, simulation is, as of today, critical for the successful application of reinforcement learning. Many real-world problems, however, exhibit overly complex dynamics, which makes their full-scale simulation computationally slow. In this paper, we show how to decompose large networked systems of many agents into multiple local components such that we can build separat…

    Submitted 1 March, 2024; v1 submitted 1 July, 2022; originally announced July 2022.

  12. arXiv:2206.10614  [pdf, ps, other]

    cs.GT cs.AI cs.LG cs.MA

    On the Impossibility of Learning to Cooperate with Adaptive Partner Strategies in Repeated Games

    Authors: Robert Loftin, Frans A. Oliehoek

    Abstract: Learning to cooperate with other agents is challenging when those agents also possess the ability to adapt to our own behavior. Practical and theoretical approaches to learning in cooperative settings typically assume that other agents' behaviors are stationary, or else make very specific assumptions about other agents' learning processes. The goal of this work is to understand whether we can reli…

    Submitted 25 November, 2022; v1 submitted 20 June, 2022; originally announced June 2022.

    Comments: 9 pages, to be published in The Proceedings of the 39th International Conference on Machine Learning, 2022

    ACM Class: I.2.6

  13. arXiv:2204.01160  [pdf, other]

    cs.AI cs.LG cs.MA

    Best-Response Bayesian Reinforcement Learning with Bayes-adaptive POMDPs for Centaurs

    Authors: Mustafa Mert Çelikok, Frans A. Oliehoek, Samuel Kaski

    Abstract: Centaurs are half-human, half-AI decision-makers where the AI's goal is to complement the human. To do so, the AI must be able to recognize the goals and constraints of the human and have the means to help them. We present a novel formulation of the interaction between the human and the AI as a sequential game where the agents are modelled using Bayesian best-response models. We show that in this…

    Submitted 3 April, 2022; originally announced April 2022.

    Comments: This paper is presented in part at the International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS) 2022

  14. arXiv:2202.08884  [pdf, other]

    cs.LG cs.AI

    BADDr: Bayes-Adaptive Deep Dropout RL for POMDPs

    Authors: Sammie Katt, Hai Nguyen, Frans A. Oliehoek, Christopher Amato

    Abstract: While reinforcement learning (RL) has made great advances in scalability, exploration and partial observability are still active research topics. In contrast, Bayesian RL (BRL) provides a principled answer to both state estimation and the exploration-exploitation trade-off, but struggles to scale. To tackle this challenge, BRL frameworks with various prior assumptions have been proposed, with vari…

    Submitted 17 February, 2022; originally announced February 2022.

  15. arXiv:2202.01534  [pdf, other]

    cs.LG

    Influence-Augmented Local Simulators: A Scalable Solution for Fast Deep RL in Large Networked Systems

    Authors: Miguel Suau, Jinke He, Matthijs T. J. Spaan, Frans A. Oliehoek

    Abstract: Learning effective policies for real-world problems is still an open challenge for the field of reinforcement learning (RL). The main limitation is the amount of data needed and the pace at which that data can be obtained. In this paper, we study how to build lightweight simulators of complicated systems that can run sufficiently fast for deep RL to be applicable. We focus on domains where agen…

    Submitted 3 February, 2022; originally announced February 2022.

  16. arXiv:2201.11404  [pdf, other]

    cs.AI

    Online Planning in POMDPs with Self-Improving Simulators

    Authors: Jinke He, Miguel Suau, Hendrik Baier, Michael Kaisers, Frans A. Oliehoek

    Abstract: How can we plan efficiently in a large and complex environment when the time budget is limited? Given the original simulator of the environment, which may be computationally very demanding, we propose to learn online an approximate but much faster simulator that improves over time. To plan reliably and efficiently while the approximate simulator is learning, we develop a method that adaptively dec…

    Submitted 12 December, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

    Comments: presented at IJCAI 2022

  17. arXiv:2201.00012  [pdf, other]

    cs.LG cs.AI

    MORAL: Aligning AI with Human Norms through Multi-Objective Reinforced Active Learning

    Authors: Markus Peschl, Arkady Zgonnikov, Frans A. Oliehoek, Luciano C. Siebert

    Abstract: Inferring reward functions from demonstrations and pairwise preferences are auspicious approaches for aligning Reinforcement Learning (RL) agents with human intentions. However, state-of-the-art methods typically focus on learning a single reward model, thus rendering it difficult to trade off different reward functions from multiple experts. We propose Multi-Objective Reinforced Active Learning (…

    Submitted 30 December, 2021; originally announced January 2022.

  18. arXiv:2110.04495  [pdf, other]

    cs.LG cs.MA

    Multi-Agent MDP Homomorphic Networks

    Authors: Elise van der Pol, Herke van Hoof, Frans A. Oliehoek, Max Welling

    Abstract: This paper introduces Multi-Agent MDP Homomorphic Networks, a class of networks that allows distributed execution using only local information, yet is able to share experience between global symmetries in the joint state-action space of cooperative multi-agent systems. In cooperative multi-agent systems, complex symmetries arise between different configurations of the agents and their local observ…

    Submitted 29 April, 2022; v1 submitted 9 October, 2021; originally announced October 2021.

    Comments: Camera ready version

  19. Difference Rewards Policy Gradients

    Authors: Jacopo Castellini, Sam Devlin, Frans A. Oliehoek, Rahul Savani

    Abstract: Policy gradient methods have become one of the most popular classes of algorithms for multi-agent reinforcement learning. A key challenge, however, that is not addressed by many of these methods is multi-agent credit assignment: assessing an agent's contribution to the overall performance, which is crucial for learning good policies. We propose a novel algorithm called Dr.Reinforce that explicitly…

    Submitted 9 November, 2023; v1 submitted 21 December, 2020; originally announced December 2020.

    Comments: This work has been accepted as an Extended Abstract in Proc. of the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2021), U. Endriss, A. Nowé, F. Dignum, A. Lomuscio (eds.), May 3-7 2021, Online

    ACM Class: I.2.6; I.2.11

    Journal ref: Neural Comput & Applic (2022)

  20. arXiv:2011.07665  [pdf, other]

    cs.LG

    Analog Circuit Design with Dyna-Style Reinforcement Learning

    Authors: Wook Lee, Frans A. Oliehoek

    Abstract: In this work, we present a learning-based approach to analog circuit design, where the goal is to optimize circuit performance subject to certain design constraints. One of the aspects that makes this problem challenging to optimize is that measuring the performance of candidate configurations with simulation can be computationally expensive, particularly in the post-layout design. Additionally,…

    Submitted 15 November, 2020; originally announced November 2020.

    Comments: NeurIPS 2020 Workshop on Machine Learning for Engineering Modeling, Simulation and Design

  21. arXiv:2011.01788  [pdf, other]

    cs.AI

    Loss Bounds for Approximate Influence-Based Abstraction

    Authors: Elena Congeduti, Alexander Mey, Frans A. Oliehoek

    Abstract: Sequential decision making techniques hold great promise to improve the performance of many real-world systems, but computational complexity hampers their principled application. Influence-based abstraction aims to gain leverage by modeling local subproblems together with the 'influence' that the rest of the system exerts on them. While computing exact representations of such influence might be in…

    Submitted 23 February, 2021; v1 submitted 3 November, 2020; originally announced November 2020.

    Comments: 13 pages, 9 figures

  22. arXiv:2010.11835  [pdf, other]

    cs.AI cs.LG cs.MA

    Multi-agent active perception with prediction rewards

    Authors: Mikko Lauri, Frans A. Oliehoek

    Abstract: Multi-agent active perception is a task where a team of agents cooperatively gathers observations to compute a joint estimate of a hidden variable. The task is decentralized and the joint estimate can only be computed after the task ends by fusing observations of all agents. The objective is to maximize the accuracy of the estimate. The accuracy is quantified by a centralized prediction reward det…

    Submitted 22 October, 2020; originally announced October 2020.

    Comments: 34th Conference on Neural Information Processing Systems (NeurIPS 2020)

  23. arXiv:2010.11038  [pdf, other]

    cs.AI cs.LG

    Influence-Augmented Online Planning for Complex Environments

    Authors: Jinke He, Miguel Suau, Frans A. Oliehoek

    Abstract: How can we plan efficiently in real time to control an agent in a complex environment that may involve many other agents? While existing sample-based planners have enjoyed empirical success in large POMDPs, their performance heavily relies on a fast simulator. However, real-world scenarios are complex in nature and their simulators are often computationally demanding, which severely limits the per…

    Submitted 9 June, 2021; v1 submitted 21 October, 2020; originally announced October 2020.

    Comments: NeurIPS 2020; results have been updated after fixing minor bugs in the code

  24. arXiv:2010.03024  [pdf, other]

    cs.CV

    Real-Time Resource Allocation for Tracking Systems

    Authors: Yash Satsangi, Shimon Whiteson, Frans A. Oliehoek, Henri Bouma

    Abstract: Automated tracking is key to many computer vision applications. However, many tracking systems struggle to perform in real-time due to the high computational cost of detecting people, especially in ultra high resolution images. We propose a new algorithm called PartiMax that greatly reduces this cost by applying the person detector only to the relevant parts of the image. PartiMax exploits…

    Submitted 21 September, 2020; originally announced October 2020.

    Comments: http://auai.org/uai2017/proceedings/papers/130.pdf

    Journal ref: UAI 2017

  25. Exploiting Submodular Value Functions For Scaling Up Active Perception

    Authors: Yash Satsangi, Shimon Whiteson, Frans A. Oliehoek, Matthijs T. J. Spaan

    Abstract: In active perception tasks, an agent aims to select sensory actions that reduce its uncertainty about one or more hidden variables. While partially observable Markov decision processes (POMDPs) provide a natural model for such problems, reward functions that directly penalize uncertainty in the agent's belief can remove the piecewise-linear and convex property of the value function required by mos…

    Submitted 21 September, 2020; originally announced September 2020.

    Journal ref: Autonomous Robots 42, 2018. Original article available via Springer journal open access: https://link.springer.com/article/10.1007/s10514-017-9666-5

  26. arXiv:2006.16908  [pdf, other]

    cs.LG stat.ML

    MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning

    Authors: Elise van der Pol, Daniel E. Worrall, Herke van Hoof, Frans A. Oliehoek, Max Welling

    Abstract: This paper introduces MDP homomorphic networks for deep reinforcement learning. MDP homomorphic networks are neural networks that are equivariant under symmetries in the joint state-action space of an MDP. Current approaches to deep reinforcement learning do not usually exploit knowledge about such structure. By building this prior knowledge into policy and value networks using an equivariance con…

    Submitted 20 January, 2021; v1 submitted 30 June, 2020; originally announced June 2020.

  27. arXiv:2005.07308  [pdf, other]

    cs.LG stat.ML

    Sensor Data for Human Activity Recognition: Feature Representation and Benchmarking

    Authors: Flávia Alves, Martin Gairing, Frans A. Oliehoek, Thanh-Toan Do

    Abstract: The field of Human Activity Recognition (HAR) focuses on obtaining and analysing data captured from monitoring devices (e.g. sensors). There is a wide range of applications within the field; for instance, assisted living, security surveillance, and intelligent transportation. In HAR, the development of Activity Recognition models is dependent upon the data captured by these devices and the methods…

    Submitted 14 May, 2020; originally announced May 2020.

    Comments: 28 pages, 15 figures

  28. arXiv:2004.12678  [pdf, other]

    cs.LG stat.ML

    Diversity in Action: General-Sum Multi-Agent Continuous Inverse Optimal Control

    Authors: Christian Muench, Frans A. Oliehoek, Dariu M. Gavrila

    Abstract: Traffic scenarios are inherently interactive. Multiple decision-makers predict the actions of others and choose strategies that maximize their rewards. We view these interactions from the perspective of game theory which introduces various challenges. Humans are not entirely rational, their rewards need to be inferred from real-world data, and any prediction algorithm needs to be real-time capable…

    Submitted 27 April, 2020; originally announced April 2020.

    Comments: 16 pages, 6 figures

  29. arXiv:2004.00048  [pdf, other]

    cs.AI

    Mimicking Evolution with Reinforcement Learning

    Authors: João P. Abrantes, Arnaldo J. Abrantes, Frans A. Oliehoek

    Abstract: Evolution gave rise to human and animal intelligence here on Earth. We argue that the path to developing artificial human-like intelligence will pass through mimicking the evolutionary process in a nature-like simulation. In Nature, there are two processes driving the development of the brain: evolution and learning. Evolution acts slowly, across generations, and amongst other things, it defines w…

    Submitted 6 May, 2020; v1 submitted 31 March, 2020; originally announced April 2020.

    Comments: 18 pages, 7 figures

    ACM Class: I.2.0

  30. Decentralized MCTS via Learned Teammate Models

    Authors: Aleksander Czechowski, Frans A. Oliehoek

    Abstract: Decentralized online planning can be an attractive paradigm for cooperative multi-agent systems, due to improved scalability and robustness. A key difficulty of such an approach lies in making accurate predictions about the decisions of other agents. In this paper, we present a trainable online decentralized planning algorithm based on decentralized Monte Carlo Tree Search, combined with models of te…

    Submitted 10 November, 2020; v1 submitted 19 March, 2020; originally announced March 2020.

    Comments: Sole copyright holder is IJCAI, all rights reserved. Published version available online: https://doi.org/10.24963/ijcai.2020/12

    Journal ref: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, pages 81--88, 2020

  31. arXiv:2002.11963  [pdf, other]

    cs.LG stat.ML

    Plannable Approximations to MDP Homomorphisms: Equivariance under Actions

    Authors: Elise van der Pol, Thomas Kipf, Frans A. Oliehoek, Max Welling

    Abstract: This work exploits action equivariance for representation learning in reinforcement learning. Equivariance under actions states that transitions in the input space are mirrored by equivalent transitions in latent space, while the map and transition functions should also commute. We introduce a contrastive loss function that enforces action equivariance on the learned representations. We prove that…

    Submitted 27 February, 2020; originally announced February 2020.

    Comments: To appear in Proceedings of the International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2020)
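    One way to write the commutation condition stated in the abstract, in notation assumed here rather than taken from the paper: Z maps input states to latent states, T is the transition function in the input space (taken to be deterministic for simplicity), and \bar{T} is the latent transition function. Applying an action and then mapping to latent space should give the same result as mapping first and then applying the latent transition.

```latex
Z\big(T(s, a)\big) = \bar{T}\big(Z(s), a\big) \qquad \forall\, s \in \mathcal{S},\ a \in \mathcal{A}
```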

  32. arXiv:1911.07643  [pdf, other]

    cs.LG stat.ML

    Influence-aware Memory Architectures for Deep Reinforcement Learning

    Authors: Miguel Suau, Jinke He, Elena Congeduti, Rolf A. N. Starre, Aleksander Czechowski, Frans A. Oliehoek

    Abstract: Due to its perceptual limitations, an agent may have too little information about the state of the environment to act optimally. In such cases, it is important to keep track of the observation history to uncover hidden state. Recent deep reinforcement learning methods use recurrent neural networks (RNN) to memorize past observations. However, these models are expensive to train and have convergenc…

    Submitted 17 February, 2021; v1 submitted 18 November, 2019; originally announced November 2019.

  33. A Sufficient Statistic for Influence in Structured Multiagent Environments

    Authors: Frans A. Oliehoek, Stefan Witwicki, Leslie P. Kaelbling

    Abstract: Making decisions in complex environments is a key challenge in artificial intelligence (AI). Situations involving multiple decision makers are particularly complex, leading to computational intractability of principled solution methods. A body of work in AI has tried to mitigate this problem by trying to distill interaction to its essence: how does the policy of one agent influence another agent?…

    Submitted 1 March, 2021; v1 submitted 22 July, 2019; originally announced July 2019.

    Journal ref: Journal of Artificial Intelligence Research, pp. 789-870, AI Access Foundation, Inc., February 2021

  34. Analysing Factorizations of Action-Value Networks for Cooperative Multi-Agent Reinforcement Learning

    Authors: Jacopo Castellini, Frans A. Oliehoek, Rahul Savani, Shimon Whiteson

    Abstract: Recent years have seen the application of deep reinforcement learning techniques to cooperative multi-agent systems, with great empirical success. However, given the lack of theoretical insight, it remains unclear what the employed neural networks are learning, or how we should enhance their learning power to address the problems on which they fail. In this work, we empirically investigate the lea…

    Submitted 9 November, 2023; v1 submitted 20 February, 2019; originally announced February 2019.

    Comments: This work has been accepted as an Extended Abstract in Proc. of the 18th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2019), N. Agmon, M. E. Taylor, E. Elkind, M. Veloso (eds.), May 2019, Montreal, Canada

    ACM Class: I.2.6; I.2.11

    Journal ref: Auton Agent Multi-Agent Syst 35, 25 (2021)

  35. arXiv:1811.03516  [pdf, other]

    cs.LG stat.ML

    Learning from Demonstration in the Wild

    Authors: Feryal Behbahani, Kyriacos Shiarlis, Xi Chen, Vitaly Kurin, Sudhanshu Kasewa, Ciprian Stirbu, João Gomes, Supratik Paul, Frans A. Oliehoek, João Messias, Shimon Whiteson

    Abstract: Learning from demonstration (LfD) is useful in settings where hand-coding behaviour or a reward function is impractical. It has succeeded in a wide range of problems but typically relies on manually generated demonstrations or specially deployed sensors and has not generally been able to leverage the copious demonstrations available in the wild: those that capture behaviours that were occurring an…

    Submitted 25 March, 2019; v1 submitted 8 November, 2018; originally announced November 2018.

    Comments: Accepted to the IEEE International Conference on Robotics and Automation (ICRA) 2019; extended version with appendix

  36. arXiv:1806.07268  [pdf, other]

    cs.LG cs.GT stat.ML

    Beyond Local Nash Equilibria for Adversarial Networks

    Authors: Frans A. Oliehoek, Rahul Savani, Jose Gallego, Elise van der Pol, Roderich Groß

    Abstract: Save for some special cases, current training methods for Generative Adversarial Networks (GANs) are at best guaranteed to converge to a 'local Nash equilibrium' (LNE). Such LNEs, however, can be arbitrarily far from an actual Nash equilibrium (NE), which implies that there are no guarantees on the quality of the found generator or classifier. This paper proposes to model GANs explicitly as finite…

    Submitted 26 July, 2018; v1 submitted 18 June, 2018; originally announced June 2018.

    Comments: Supersedes arXiv:1712.00679; v2 includes Fictitious GAN in the related work and refers to Danskin (1981)

    Journal ref: Published in Benelearn/BANIC 2018

  37. arXiv:1806.05631  [pdf, other]

    cs.AI cs.LG

    Learning in POMDPs with Monte Carlo Tree Search

    Authors: Sammie Katt, Frans A. Oliehoek, Christopher Amato

    Abstract: The POMDP is a powerful framework for reasoning under outcome and information uncertainty, but constructing an accurate POMDP model is difficult. Bayes-Adaptive Partially Observable Markov Decision Processes (BA-POMDPs) extend POMDPs to allow the model to be learned during execution. BA-POMDPs are a Bayesian RL approach that, in principle, allows for an optimal trade-off between exploitation and e…

    Submitted 14 June, 2018; originally announced June 2018.

    Journal ref: Proceedings of the 34th International Conference on Machine Learning, PMLR 70:1819-1827, 2017

  38. arXiv:1712.00679  [pdf, other]

    stat.ML cs.GT cs.LG

    GANGs: Generative Adversarial Network Games

    Authors: Frans A. Oliehoek, Rahul Savani, Jose Gallego-Posada, Elise van der Pol, Edwin D. de Jong, Roderich Gross

    Abstract: Generative Adversarial Networks (GAN) have become one of the most successful frameworks for unsupervised generative modeling. As GANs are difficult to train, much research has focused on this. However, very little of this research has directly exploited game-theoretic techniques. We introduce Generative Adversarial Network Games (GANGs), which explicitly model a finite zero-sum game between a gener…

    Submitted 17 December, 2017; v1 submitted 2 December, 2017; originally announced December 2017.

    Comments: 9 pages, 5 figures

  39. arXiv:1704.06549  [pdf, other]

    cs.CY

    LiftUpp: Support to develop learner performance

    Authors: Frans A. Oliehoek, Rahul Savani, Elliot Adderton, Xia Cui, David Jackson, Phil Jimmieson, John Christopher Jones, Keith Kennedy, Ben Mason, Adam Plumbley, Luke Dawson

    Abstract: Various motivations exist to move away from the simple assessment of knowledge towards the more complex assessment and development of competence. However, to accommodate such a change, high demands are put on the supporting e-infrastructure in terms of intelligently collecting and analysing data. In this paper, we discuss these challenges and how they are being addressed by LiftUpp, a system that…

    Submitted 21 April, 2017; originally announced April 2017.

    Comments: Short 4-page version to appear at AIED 2017

  40. arXiv:1606.06888  [pdf, ps, other]

    cs.AI cs.GT

    Structure in the Value Function of Two-Player Zero-Sum Games of Incomplete Information

    Authors: Auke J. Wiggers, Frans A. Oliehoek, Diederik M. Roijers

    Abstract: Zero-sum stochastic games provide a rich model for competitive decision making. However, under general forms of state uncertainty as considered in the Partially Observable Stochastic Game (POSG), such decision making problems are still not very well understood. This paper makes a contribution to the theory of zero-sum POSGs by characterizing structure in their value function. In particular, we int…

    Submitted 22 June, 2016; originally announced June 2016.

  41. arXiv:1602.07860  [pdf, other]

    cs.AI cs.LG stat.ML

    Probably Approximately Correct Greedy Maximization with Efficient Bounds on Information Gain for Sensor Selection

    Authors: Yash Satsangi, Shimon Whiteson, Frans A. Oliehoek

    Abstract: Submodular function maximization finds application in a variety of real-world decision-making problems. However, most existing methods, based on greedy maximization, assume it is computationally feasible to evaluate F, the function being maximized. Unfortunately, in many realistic settings F is too expensive to evaluate exactly even once. We present probably approximately correct greedy maximizati…

    Submitted 10 August, 2020; v1 submitted 25 February, 2016; originally announced February 2016.

  42. arXiv:1511.09147

    cs.AI

    Scaling POMDPs For Selecting Sellers in E-markets-Extended Version

    Authors: Athirai A. Irissappane, Frans A. Oliehoek, Jie Zhang

    Abstract: In multiagent e-marketplaces, buying agents need to select good sellers by querying other buyers (called advisors). Partially Observable Markov Decision Processes (POMDPs) have been shown to be an effective framework for optimally selecting sellers by selectively querying advisors. However, current solution methods do not scale to hundreds or even tens of agents operating in the e-market. In this paper…

    Submitted 9 December, 2015; v1 submitted 29 November, 2015; originally announced November 2015.

  43. arXiv:1511.09080

    cs.AI cs.MA

    Exploiting Anonymity in Approximate Linear Programming: Scaling to Large Multiagent MDPs (Extended Version)

    Authors: Philipp Robbel, Frans A. Oliehoek, Mykel J. Kochenderfer

    Abstract: Many exact and approximate solution methods for Markov Decision Processes (MDPs) attempt to exploit structure in the problem and are based on factorization of the value function. Especially multiagent settings, however, are known to suffer from an exponential increase in value component sizes as interactions become denser, meaning that approximation architectures are restricted in the problem size…

    Submitted 20 February, 2016; v1 submitted 29 November, 2015; originally announced November 2015.

    Comments: Extended version of AAAI 2016 paper

  44. arXiv:1511.09047

    cs.AI cs.MA

    Solving Transition-Independent Multi-agent MDPs with Sparse Interactions (Extended version)

    Authors: Joris Scharpff, Diederik M. Roijers, Frans A. Oliehoek, Matthijs T. J. Spaan, Mathijs M. de Weerdt

    Abstract: In cooperative multi-agent sequential decision making under uncertainty, agents must coordinate to find an optimal joint policy that maximises joint value. Typical algorithms exploit additive structure in the value function, but in the fully-observable multi-agent MDP setting (MMDP) such structure is not present. We propose a new optimal solver for transition-independent MMDPs, in which agents can…

    Submitted 11 February, 2016; v1 submitted 29 November, 2015; originally announced November 2015.

    Comments: This article is an extended version of the paper that was published under the same title in the Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI16), held in Phoenix, Arizona USA on February 12-17, 2016

  45. arXiv:1502.05443

    cs.AI eess.SY

    Influence-Optimistic Local Values for Multiagent Planning --- Extended Version

    Authors: Frans A. Oliehoek, Matthijs T. J. Spaan, Stefan Witwicki

    Abstract: Recent years have seen the development of methods for multiagent planning under uncertainty that scale to tens or even hundreds of agents. However, most of these methods either make restrictive assumptions on the problem domain, or provide approximate solutions without any guarantees on quality. Methods in the former category typically build on heuristic search using upper bounds on the value func…

    Submitted 20 July, 2015; v1 submitted 18 February, 2015; originally announced February 2015.

    Comments: Long version of IJCAI 2015 paper (and extended abstract at AAMAS 2015)

    ACM Class: I.2.11

  46. arXiv:1404.1140

    cs.AI cs.LG

    Scalable Planning and Learning for Multiagent POMDPs: Extended Version

    Authors: Christopher Amato, Frans A. Oliehoek

    Abstract: Online, sample-based planning algorithms for POMDPs have shown great promise in scaling to problems with large state spaces, but they become intractable for large action and observation spaces. This is particularly problematic in multiagent POMDPs, where the action and observation spaces grow exponentially with the number of agents. To combat this intractability, we propose a novel scalable approac…

    Submitted 19 December, 2014; v1 submitted 3 April, 2014; originally announced April 2014.

  47. Incremental Clustering and Expansion for Faster Optimal Planning in Dec-POMDPs

    Authors: Frans Adriaan Oliehoek, Matthijs T. J. Spaan, Christopher Amato, Shimon Whiteson

    Abstract: This article presents the state-of-the-art in optimal solution methods for decentralized partially observable Markov decision processes (Dec-POMDPs), which are general models for collaborative multiagent planning under uncertainty. Building off the generalized multiagent A* (GMAA*) algorithm, which reduces the problem to a tree of one-shot collaborative Bayesian games (CBGs), we describe several a…

    Submitted 3 February, 2014; originally announced February 2014.

    Journal ref: Journal of Artificial Intelligence Research, Volume 46, pages 449-509, 2013

  48. arXiv:1210.4886

    cs.GT cs.AI

    Exploiting Structure in Cooperative Bayesian Games

    Authors: Frans A. Oliehoek, Shimon Whiteson, Matthijs T. J. Spaan

    Abstract: Cooperative Bayesian games (BGs) can model decision-making problems for teams of agents under imperfect information, but require space and computation time that is exponential in the number of agents. While agent independence has been used to mitigate these problems in perfect information settings, we propose a novel approach for BGs based on the observation that BGs additionally possess a differe…

    Submitted 16 October, 2012; originally announced October 2012.

    Comments: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI2012)

    Report number: UAI-P-2012-PG-654-665

  49. Optimal and Approximate Q-value Functions for Decentralized POMDPs

    Authors: Frans A. Oliehoek, Matthijs T. J. Spaan, Nikos Vlassis

    Abstract: Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extrac…

    Submitted 31 October, 2011; originally announced November 2011.

    Journal ref: Journal of Artificial Intelligence Research, Volume 32, pages 289-353, 2008

  50. arXiv:1108.0404

    cs.AI cs.GT

    Exploiting Agent and Type Independence in Collaborative Graphical Bayesian Games

    Authors: Frans A. Oliehoek, Shimon Whiteson, Matthijs T. J. Spaan

    Abstract: Efficient collaborative decision making is an important challenge for multiagent systems. Finding optimal joint actions is especially challenging when each agent has only imperfect information about the state of its environment. Such problems can be modeled as collaborative Bayesian games in which each agent receives private information in the form of its type. However, representing and solving su…

    Submitted 25 April, 2014; v1 submitted 1 August, 2011; originally announced August 2011.