Showing 1–50 of 58 results for author: Hofmann, K

Searching in archive cs.
  1. arXiv:2402.03575

    cs.AI cs.HC

    Toward Human-AI Alignment in Large-Scale Multi-Player Games

    Authors: Sugandha Sharma, Guy Davidson, Khimya Khetarpal, Anssi Kanervisto, Udit Arora, Katja Hofmann, Ida Momennejad

    Abstract: Achieving human-AI alignment in complex multi-agent games is crucial for creating trustworthy AI agents that enhance gameplay. We propose a method to evaluate this alignment using an interpretable task-sets framework, focusing on high-level behavioral tasks instead of low-level policies. Our approach has three components. First, we analyze extensive human gameplay data from Xbox's Bleeding Edge (1…

    Submitted 18 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  2. arXiv:2401.01855

    cs.LG

    Transformer Neural Autoregressive Flows

    Authors: Massimiliano Patacchiola, Aliaksandra Shysheya, Katja Hofmann, Richard E. Turner

    Abstract: Density estimation, a central problem in machine learning, can be performed using Normalizing Flows (NFs). NFs comprise a sequence of invertible transformations that turn a complex target distribution into a simple one by exploiting the change of variables theorem. Neural Autoregressive Flows (NAFs) and Block Neural Autoregressive Flows (B-NAFs) are arguably the most performant members of the NF…

    Submitted 3 January, 2024; originally announced January 2024.

  3. arXiv:2306.13554

    cs.LG cs.AI

    Comparing the Efficacy of Fine-Tuning and Meta-Learning for Few-Shot Policy Imitation

    Authors: Massimiliano Patacchiola, Mingfei Sun, Katja Hofmann, Richard E. Turner

    Abstract: In this paper we explore few-shot imitation learning for control problems, which involves learning to imitate a target policy by accessing a limited set of offline rollouts. This setting has been relatively under-explored despite its relevance to robotics and control applications. State-of-the-art methods developed to tackle few-shot imitation rely on meta-learning, which is expensive to train as…

    Submitted 23 June, 2023; originally announced June 2023.

  4. arXiv:2305.16147

    cs.LG cs.AI stat.ML

    Learning Safety Constraints from Demonstrations with Unknown Rewards

    Authors: David Lindner, Xin Chen, Sebastian Tschiatschek, Katja Hofmann, Andreas Krause

    Abstract: We propose Convex Constraint Learning for Reinforcement Learning (CoCoRL), a novel approach for inferring shared constraints in a Constrained Markov Decision Process (CMDP) from a set of safe demonstrations with possibly different reward functions. While previous work is limited to demonstrations with known rewards or fully known environment dynamics, CoCoRL can learn constraints from demonstratio…

    Submitted 1 March, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: Presented at the International Conference on Artificial Intelligence and Statistics (AISTATS) 2024

  5. A Checklist to Publish Collections as Data in GLAM Institutions

    Authors: Gustavo Candela, Nele Gabriëls, Sally Chambers, Thuy-An Pham, Sarah Ames, Neil Fitzgerald, Katrine Hofmann, Victor Harbo, Abigail Potter, Meghan Ferriter, Eileen Manchester, Alba Irollo, Ellen Van Keer, Mahendra Mahey, Olga Holownia, Milena Dobreva

    Abstract: Large-scale digitization in Galleries, Libraries, Archives and Museums (GLAM) created the conditions for providing access to collections as data. It opened new opportunities to explore, use and reuse digital collections. Strong proponents of collections as data are the Innovation Labs which provided numerous examples of publishing datasets under open licenses in order to reuse digital content in n…

    Submitted 13 November, 2023; v1 submitted 5 April, 2023; originally announced April 2023.

    Comments: This is an original manuscript of an article published by Emerald Publishing Limited in Global Knowledge, Memory and Communication on 9 November 2023, available online: https://doi.org/10.1108/GKMC-06-2023-0195

  6. arXiv:2303.02160

    cs.HC cs.LG cs.RO

    Navigates Like Me: Understanding How People Evaluate Human-Like AI in Video Games

    Authors: Stephanie Milani, Arthur Juliani, Ida Momennejad, Raluca Georgescu, Jaroslaw Rzepecki, Alison Shaw, Gavin Costello, Fei Fang, Sam Devlin, Katja Hofmann

    Abstract: We aim to understand how people assess human likeness in navigation produced by people and artificially intelligent (AI) agents in a video game. To this end, we propose a novel AI agent with the goal of generating more human-like behavior. We collect hundreds of crowd-sourced assessments comparing the human-likeness of navigation behavior generated by our agent and baseline AI agents with human-ge…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: 18 pages; accepted at CHI 2023

  7. arXiv:2302.07985

    cs.LG cs.AI

    Trust-Region-Free Policy Optimization for Stochastic Policies

    Authors: Mingfei Sun, Benjamin Ellis, Anuj Mahajan, Sam Devlin, Katja Hofmann, Shimon Whiteson

    Abstract: Trust Region Policy Optimization (TRPO) is an iterative method that simultaneously maximizes a surrogate objective and enforces a trust region constraint over consecutive policies in each iteration. The combination of the surrogate objective maximization and the trust region enforcement has been shown to be crucial to guarantee a monotonic policy improvement. However, solving a trust-region-constr…

    Submitted 15 February, 2023; originally announced February 2023.

    Comments: RLDM 2022

  8. arXiv:2301.10677

    cs.AI cs.LG stat.ML

    Imitating Human Behaviour with Diffusion Models

    Authors: Tim Pearce, Tabish Rashid, Anssi Kanervisto, Dave Bignell, Mingfei Sun, Raluca Georgescu, Sergio Valcarcel Macua, Shan Zheng Tan, Ida Momennejad, Katja Hofmann, Sam Devlin

    Abstract: Diffusion models have emerged as powerful generative models in the text-to-image domain. This paper studies their application as observation-to-action models for imitating human behaviour in sequential environments. Human behaviour is stochastic and multimodal, with structured correlations between action dimensions. Meanwhile, standard modelling choices in behaviour cloning are limited in their ex…

    Submitted 3 March, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

    Comments: Published in ICLR 2023

    Journal ref: ICLR 2023

  9. arXiv:2211.10869

    cs.LG

    UniMASK: Unified Inference in Sequential Decision Problems

    Authors: Micah Carroll, Orr Paradise, Jessy Lin, Raluca Georgescu, Mingfei Sun, David Bignell, Stephanie Milani, Katja Hofmann, Matthew Hausknecht, Anca Dragan, Sam Devlin

    Abstract: Randomly masking and predicting word tokens has been a successful approach in pre-training language models for a variety of downstream tasks. In this work, we observe that the same idea also applies naturally to sequential decision-making, where many well-studied tasks like behavior cloning, offline reinforcement learning, inverse dynamics, and waypoint conditioning correspond to different sequenc…

    Submitted 19 November, 2022; originally announced November 2022.

    Comments: NeurIPS 2022 (Oral). A prior version was published at an ICML Workshop, available at arXiv:2204.13326

  10. arXiv:2206.09843

    cs.CV cs.LG

    Contextual Squeeze-and-Excitation for Efficient Few-Shot Image Classification

    Authors: Massimiliano Patacchiola, John Bronskill, Aliaksandra Shysheya, Katja Hofmann, Sebastian Nowozin, Richard E. Turner

    Abstract: Recent years have seen a growth in user-centric applications that require effective knowledge transfer across tasks in the low-data regime. An example is personalization, where a pretrained system is adapted by learning on small amounts of labeled data belonging to a specific user. This setting requires high accuracy under low computational complexity, therefore the Pareto frontier of accuracy vs.…

    Submitted 11 January, 2023; v1 submitted 20 June, 2022; originally announced June 2022.

    Comments: Advances in Neural Information Processing Systems (NeurIPS 2022)

  11. arXiv:2206.05255

    cs.LG cs.AI stat.ML

    Interactively Learning Preference Constraints in Linear Bandits

    Authors: David Lindner, Sebastian Tschiatschek, Katja Hofmann, Andreas Krause

    Abstract: We study sequential decision-making with known rewards and unknown constraints, motivated by situations where the constraints represent expensive-to-evaluate human preferences, such as safe and comfortable driving behavior. We formalize the challenge of interactively learning about these constraints as a novel linear bandit problem which we call constrained linear best-arm identification. To solve…

    Submitted 10 June, 2022; originally announced June 2022.

    Comments: Accepted to International Conference on Machine Learning (ICML), 2022

  12. arXiv:2205.02388

    cs.CL cs.AI

    Interactive Grounded Language Understanding in a Collaborative Environment: IGLU 2021

    Authors: Julia Kiseleva, Ziming Li, Mohammad Aliannejadi, Shrestha Mohanty, Maartje ter Hoeve, Mikhail Burtsev, Alexey Skrynnik, Artem Zholus, Aleksandr Panov, Kavya Srinet, Arthur Szlam, Yuxuan Sun, Marc-Alexandre Côté, Katja Hofmann, Ahmed Awadallah, Linar Abdrazakov, Igor Churin, Putra Manggala, Kata Naszadi, Michiel van der Meer, Taewoon Kim

    Abstract: Human intelligence has the remarkable ability to quickly adapt to new tasks and environments. Starting from a very young age, humans acquire new skills and learn how to solve new tasks either by imitating the behavior of others or by following provided natural language instructions. To facilitate research in this direction, we propose IGLU: Interactive Grounded Language Understanding in a Co…

    Submitted 27 May, 2022; v1 submitted 4 May, 2022; originally announced May 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2110.06536

    Journal ref: Proceedings of Machine Learning Research NeurIPS 2021 Competition and Demonstration Track

  13. arXiv:2204.13326

    cs.LG

    Towards Flexible Inference in Sequential Decision Problems via Bidirectional Transformers

    Authors: Micah Carroll, Jessy Lin, Orr Paradise, Raluca Georgescu, Mingfei Sun, David Bignell, Stephanie Milani, Katja Hofmann, Matthew Hausknecht, Anca Dragan, Sam Devlin

    Abstract: Randomly masking and predicting word tokens has been a successful approach in pre-training language models for a variety of downstream tasks. In this work, we observe that the same idea also applies naturally to sequential decision making, where many well-studied tasks like behavior cloning, offline RL, inverse dynamics, and waypoint conditioning correspond to different sequence maskings over a se…

    Submitted 9 December, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

    Comments: Superseded by arXiv:2211.10869

  14. arXiv:2202.00082

    cs.LG

    Trust Region Bounds for Decentralized PPO Under Non-stationarity

    Authors: Mingfei Sun, Sam Devlin, Jacob Beck, Katja Hofmann, Shimon Whiteson

    Abstract: We present trust region bounds for optimizing decentralized policies in cooperative Multi-Agent Reinforcement Learning (MARL), which hold even when the transition dynamics are non-stationary. This new analysis provides a theoretical understanding of the strong performance of two recent actor-critic methods for MARL, which both rely on independent ratios, i.e., computing probability ratios separat…

    Submitted 15 February, 2023; v1 submitted 31 January, 2022; originally announced February 2022.

    Comments: AAMAS 2023

  15. arXiv:2202.00079

    cs.LG cs.AI

    You May Not Need Ratio Clipping in PPO

    Authors: Mingfei Sun, Vitaly Kurin, Guoqing Liu, Sam Devlin, Tao Qin, Katja Hofmann, Shimon Whiteson

    Abstract: Proximal Policy Optimization (PPO) methods learn a policy by iteratively performing multiple mini-batch optimization epochs of a surrogate objective with one set of sampled data. Ratio clipping PPO is a popular variant that clips the probability ratios between the target policy and the policy used to collect samples. Ratio clipping yields a pessimistic estimate of the original surrogate objective,…

    Submitted 31 January, 2022; originally announced February 2022.

  16. arXiv:2112.06054

    cs.LG

    Deterministic and Discriminative Imitation (D2-Imitation): Revisiting Adversarial Imitation for Sample Efficiency

    Authors: Mingfei Sun, Sam Devlin, Katja Hofmann, Shimon Whiteson

    Abstract: Sample efficiency is crucial for imitation learning methods to be applicable in real-world applications. Many studies improve sample efficiency by extending adversarial imitation to be off-policy regardless of the fact that these off-policy extensions could either change the original objective or involve complicated optimization. We revisit the foundation of adversarial imitation and propose an of…

    Submitted 13 April, 2022; v1 submitted 11 December, 2021; originally announced December 2021.

    Comments: AAAI 2022

  17. arXiv:2110.06536

    cs.AI

    NeurIPS 2021 Competition IGLU: Interactive Grounded Language Understanding in a Collaborative Environment

    Authors: Julia Kiseleva, Ziming Li, Mohammad Aliannejadi, Shrestha Mohanty, Maartje ter Hoeve, Mikhail Burtsev, Alexey Skrynnik, Artem Zholus, Aleksandr Panov, Kavya Srinet, Arthur Szlam, Yuxuan Sun, Katja Hofmann, Michel Galley, Ahmed Awadallah

    Abstract: Human intelligence has the remarkable ability to adapt to new tasks and environments quickly. Starting from a very young age, humans acquire new skills and learn how to solve new tasks either by imitating the behavior of others or by following provided natural language instructions. To facilitate research in this direction, we propose IGLU: Interactive Grounded Language Understanding in a Collabor…

    Submitted 14 October, 2021; v1 submitted 13 October, 2021; originally announced October 2021.

  18. arXiv:2107.14698

    cs.LG cs.AI cs.MA

    Strategically Efficient Exploration in Competitive Multi-agent Reinforcement Learning

    Authors: Robert Loftin, Aadirupa Saha, Sam Devlin, Katja Hofmann

    Abstract: High sample complexity remains a barrier to the application of reinforcement learning (RL), particularly in multi-agent systems. A large body of work has demonstrated that exploration mechanisms based on the principle of optimism under uncertainty can significantly improve the sample efficiency of RL in single agent tasks. This work seeks to understand the role of optimistic exploration in non-coo…

    Submitted 30 July, 2021; originally announced July 2021.

    Comments: To Appear in Uncertainty in Artificial Intelligence (UAI) 2021. 10 figures, 14 pages

    MSC Class: 68T05; ACM Class: I.2.6

  19. arXiv:2107.01105

    stat.ML cs.LG

    Memory Efficient Meta-Learning with Large Images

    Authors: John Bronskill, Daniela Massiceti, Massimiliano Patacchiola, Katja Hofmann, Sebastian Nowozin, Richard E. Turner

    Abstract: Meta learning approaches to few-shot classification are computationally efficient at test time, requiring just a few optimization steps or single forward pass to learn a new task, but they remain highly memory-intensive to train. This limitation arises because a task's entire support set, which can contain up to 1000 images, must be processed before an optimization step can be taken. Harnessing th…

    Submitted 26 October, 2021; v1 submitted 2 July, 2021; originally announced July 2021.

    Journal ref: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

  20. arXiv:2107.00956

    cs.LG cs.AI cs.CL

    SocialAI: Benchmarking Socio-Cognitive Abilities in Deep Reinforcement Learning Agents

    Authors: Grgur Kovač, Rémy Portelas, Katja Hofmann, Pierre-Yves Oudeyer

    Abstract: Building embodied autonomous agents capable of participating in social interactions with humans is one of the main challenges in AI. Within the Deep Reinforcement Learning (DRL) field, this objective motivated multiple works on embodied language use. However, current approaches focus on language as a communication tool in very simplified and non-diverse social situations: the "naturalness" of lang…

    Submitted 1 September, 2021; v1 submitted 2 July, 2021; originally announced July 2021.

    Comments: under review. This paper extends and generalizes work in arXiv:2104.13207

  21. arXiv:2106.08858

    cs.AI cs.CL cs.LG

    Grounding Spatio-Temporal Language with Transformers

    Authors: Tristan Karch, Laetitia Teodorescu, Katja Hofmann, Clément Moulin-Frier, Pierre-Yves Oudeyer

    Abstract: Language is an interface to the outside world. In order for embodied agents to use it, language must be grounded in other, sensorimotor modalities. While there is an extended literature studying how machines can learn grounded language, the topic of how to learn spatio-temporal linguistic concepts is still largely uncharted. To make progress in this direction, we here introduce a novel spatio-temp…

    Submitted 11 October, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

    Comments: Contains main article and supplementaries

    Journal ref: NeurIPS 2021

  22. arXiv:2106.03155

    cs.LG cs.AI

    SoftDICE for Imitation Learning: Rethinking Off-policy Distribution Matching

    Authors: Mingfei Sun, Anuj Mahajan, Katja Hofmann, Shimon Whiteson

    Abstract: We present SoftDICE, which achieves state-of-the-art performance for imitation learning. SoftDICE fixes several key problems in ValueDICE, an off-policy distribution matching approach for sample-efficient imitation learning. Specifically, the objective of ValueDICE contains logarithms and exponentials of expectations, for which the mini-batch gradient estimate is always biased. Second, ValueDICE r…

    Submitted 6 June, 2021; originally announced June 2021.

  23. arXiv:2105.09637

    cs.AI cs.LG

    Navigation Turing Test (NTT): Learning to Evaluate Human-Like Navigation

    Authors: Sam Devlin, Raluca Georgescu, Ida Momennejad, Jaroslaw Rzepecki, Evelyn Zuniga, Gavin Costello, Guy Leroy, Ali Shaw, Katja Hofmann

    Abstract: A key challenge on the path to developing agents that learn complex human-like behavior is the need to quickly and accurately quantify human-likeness. While human assessments of such behavior can be highly accurate, speed and scalability are limited. We address these limitations through a novel automated Navigation Turing Test (ANTT) that learns to predict human judgments of human-likeness. We dem…

    Submitted 28 July, 2021; v1 submitted 20 May, 2021; originally announced May 2021.

    Comments: All data collected throughout this study, plus the code to reproduce our analysis and ANTT are available at https://github.com/microsoft/NTT

    Journal ref: Proceedings of the 38th International Conference on Machine Learning (ICML), 139:2644-2653, 2021

  24. arXiv:2105.08187

    cs.MA

    Learning to Win, Lose and Cooperate through Reward Signal Evolution

    Authors: Rafal Muszynski, Katja Hofmann, Jun Wang

    Abstract: Solving a reinforcement learning problem typically involves correctly prespecifying the reward signal from which the algorithm learns. Here, we approach the problem of reward signal design by using an evolutionary approach to perform a search on the space of all possible reward signals. We introduce a general framework for optimizing $N$ goals given $n$ reward signals. Through experiments we demon…

    Submitted 17 May, 2021; originally announced May 2021.

    Comments: 7 pages, 4 figures

  25. arXiv:2104.13207

    cs.LG cs.AI

    SocialAI 0.1: Towards a Benchmark to Stimulate Research on Socio-Cognitive Abilities in Deep Reinforcement Learning Agents

    Authors: Grgur Kovač, Rémy Portelas, Katja Hofmann, Pierre-Yves Oudeyer

    Abstract: Building embodied autonomous agents capable of participating in social interactions with humans is one of the main challenges in AI. This problem motivated many research directions on embodied language use. Current approaches focus on language as a communication tool in very simplified and non-diverse social situations: the "naturalness" of language is reduced to the concept of high vocabulary siz…

    Submitted 27 April, 2021; originally announced April 2021.

    Comments: Accepted at NAACL ViGIL Workshop 2021

  26. ORBIT: A Real-World Few-Shot Dataset for Teachable Object Recognition

    Authors: Daniela Massiceti, Luisa Zintgraf, John Bronskill, Lida Theodorou, Matthew Tobias Harris, Edward Cutrell, Cecily Morrison, Katja Hofmann, Simone Stumpf

    Abstract: Object recognition has made great advances in the last decade, but predominately still relies on many high-quality training examples per object category. In contrast, learning new objects from only a few examples could enable many impactful applications from robotics to user personalization. Most few-shot learning research, however, has been driven by benchmark datasets that lack the high variatio…

    Submitted 8 October, 2021; v1 submitted 8 April, 2021; originally announced April 2021.

    Comments: IEEE/CVF International Conference on Computer Vision (ICCV), 2021

  27. arXiv:2103.09815

    cs.LG

    TeachMyAgent: a Benchmark for Automatic Curriculum Learning in Deep RL

    Authors: Clément Romac, Rémy Portelas, Katja Hofmann, Pierre-Yves Oudeyer

    Abstract: Training autonomous agents able to generalize to multiple tasks is a key target of Deep Reinforcement Learning (DRL) research. In parallel to improving DRL algorithms themselves, Automatic Curriculum Learning (ACL) studies how teacher algorithms can train DRL agents more efficiently by adapting task selection to their evolving abilities. While multiple standard benchmarks exist to compare DRL agents…

    Submitted 9 June, 2021; v1 submitted 17 March, 2021; originally announced March 2021.

  28. arXiv:2101.05507

    cs.LG cs.AI cs.HC cs.MA

    Evaluating the Robustness of Collaborative Agents

    Authors: Paul Knott, Micah Carroll, Sam Devlin, Kamil Ciosek, Katja Hofmann, A. D. Dragan, Rohin Shah

    Abstract: In order for agents trained by deep reinforcement learning to work alongside humans in realistic settings, we will need to ensure that the agents are robust. Since the real world is very diverse, and human behavior often changes in response to agent deployment, the agent will likely encounter novel situations that have never been seen during training. This results in an evaluation challenge…

    Submitted 14 January, 2021; originally announced January 2021.

  29. arXiv:2101.03864

    cs.LG cs.MA

    Deep Interactive Bayesian Reinforcement Learning via Meta-Learning

    Authors: Luisa Zintgraf, Sam Devlin, Kamil Ciosek, Shimon Whiteson, Katja Hofmann

    Abstract: Agents that interact with other agents often do not know a priori what the other agents' strategies are, but have to maximise their own online return while interacting with and learning about others. The optimal adaptive behaviour under uncertainty over the other agents' strategies w.r.t. some prior can in principle be computed using the Interactive Bayesian Reinforcement Learning framework. Unfor…

    Submitted 15 April, 2022; v1 submitted 11 January, 2021; originally announced January 2021.

    Comments: Published as an extended abstract at AAMAS 2021

  30. arXiv:2011.08463

    cs.LG cs.AI

    Meta Automatic Curriculum Learning

    Authors: Rémy Portelas, Clément Romac, Katja Hofmann, Pierre-Yves Oudeyer

    Abstract: A major challenge in the Deep RL (DRL) community is to train agents able to generalize their control policy over situations never seen in training. Training on diverse tasks has been identified as a key ingredient for good generalization, which pushed researchers towards using rich procedural task generation systems controlled through complex continuous parameter spaces. In such complex task space…

    Submitted 1 September, 2021; v1 submitted 16 November, 2020; originally announced November 2020.

    Comments: This paper extends and generalizes work in arXiv:2004.03168

  31. arXiv:2010.01062

    cs.LG cs.AI stat.ML

    Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning

    Authors: Luisa Zintgraf, Leo Feng, Cong Lu, Maximilian Igl, Kristian Hartikainen, Katja Hofmann, Shimon Whiteson

    Abstract: To rapidly learn a new task, it is often essential for agents to explore efficiently -- especially when performance matters from the first timestep. One way to learn such behaviour is via meta-learning. Many existing methods however rely on dense rewards for meta-training, and can fail catastrophically if the rewards are sparse. Without a suitable reward signal, the need for exploration during met…

    Submitted 9 June, 2021; v1 submitted 2 October, 2020; originally announced October 2020.

    Comments: Published at the International Conference on Machine Learning (ICML) 2021

  32. arXiv:2009.00541

    cs.AI

    "It's Unwieldy and It Takes a Lot of Time." Challenges and Opportunities for Creating Agents in Commercial Games

    Authors: Mikhail Jacob, Sam Devlin, Katja Hofmann

    Abstract: Game agents such as opponents, non-player characters, and teammates are central to player experiences in many modern games. As the landscape of AI techniques used in the games industry evolves to adopt machine learning (ML) more widely, it is vital that the research community learn from the best practices cultivated within the industry over decades creating agents. However, although commercial gam…

    Submitted 1 September, 2020; originally announced September 2020.

    Comments: 7 pages, 3 figures, to be published in the 16th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE-20)

  33. arXiv:2006.08718

    cs.LG cs.RO stat.ML

    Analytic Manifold Learning: Unifying and Evaluating Representations for Continuous Control

    Authors: Rika Antonova, Maksim Maydanskiy, Danica Kragic, Sam Devlin, Katja Hofmann

    Abstract: We address the problem of learning reusable state representations from streaming high-dimensional observations. This is important for areas like Reinforcement Learning (RL), which yields non-stationary data distributions during training. We make two key contributions. First, we propose an evaluation suite that measures alignment between latent and true low-dimensional states. We benchmark several…

    Submitted 6 October, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

    Comments: Added Section 4: "Imposing AML Relations During Transfer"; expanded description of experiments in Section 5: "Evaluating AML and Latent Space Transfer"

  34. arXiv:2005.06041

    cs.LG stat.ML

    Guaranteeing Reproducibility in Deep Learning Competitions

    Authors: Brandon Houghton, Stephanie Milani, Nicholay Topin, William Guss, Katja Hofmann, Diego Perez-Liebana, Manuela Veloso, Ruslan Salakhutdinov

    Abstract: To encourage the development of methods with reproducible and robust training behavior, we propose a challenge paradigm where competitors are evaluated directly on the performance of their learning procedures rather than pre-trained agents. Since competition organizers re-train proposed methods in a controlled setting they can guarantee reproducibility, and -- by retraining submissions using a hel…

    Submitted 12 May, 2020; originally announced May 2020.

    Comments: Accepted as a poster presentation to the 2019 NeurIPS Challenges in Machine Learning workshop (CiML)

  35. arXiv:2004.04546

    cs.LG cs.CV stat.ML

    SpatialSim: Recognizing Spatial Configurations of Objects with Graph Neural Networks

    Authors: Laetitia Teodorescu, Katja Hofmann, Pierre-Yves Oudeyer

    Abstract: Recognizing precise geometrical configurations of groups of objects is a key capability of human spatial cognition, yet little studied in the deep learning literature so far. In particular, a fundamental problem is how a machine can learn and compare classes of geometric spatial configurations that are invariant to the point of view of an external observer. In this paper we make two key contributi…

    Submitted 16 July, 2020; v1 submitted 9 April, 2020; originally announced April 2020.

  36. arXiv:2004.03168

    cs.LG cs.AI stat.ML

    Trying AGAIN instead of Trying Longer: Prior Learning for Automatic Curriculum Learning

    Authors: Rémy Portelas, Katja Hofmann, Pierre-Yves Oudeyer

    Abstract: A major challenge in the Deep RL (DRL) community is to train agents able to generalize over unseen situations, which is often approached by training them on a diversity of tasks (or environments). A powerful method to foster diversity is to procedurally generate tasks by sampling their parameters from a multi-dimensional distribution, enabling in particular to propose a different task for each tra…

    Submitted 7 April, 2020; originally announced April 2020.

    Comments: Accepted to the ICLR 2020 workshop Beyond tabula rasa in RL (BeTR-RL)

  37. arXiv:2003.04664

    cs.LG cs.AI stat.ML

    Automatic Curriculum Learning For Deep RL: A Short Survey

    Authors: Rémy Portelas, Cédric Colas, Lilian Weng, Katja Hofmann, Pierre-Yves Oudeyer

    Abstract: Automatic Curriculum Learning (ACL) has become a cornerstone of recent successes in Deep Reinforcement Learning (DRL). These methods shape the learning trajectories of agents by challenging them with tasks adapted to their capacities. In recent years, they have been used to improve sample efficiency and asymptotic performance, to organize exploration, to encourage generalization or to solve sparse…

    Submitted 28 May, 2020; v1 submitted 10 March, 2020; originally announced March 2020.

    Comments: Accepted at IJCAI2020

  38. arXiv:1910.12911

    cs.LG cs.AI stat.ML

    Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck

    Authors: Maximilian Igl, Kamil Ciosek, Yingzhen Li, Sebastian Tschiatschek, Cheng Zhang, Sam Devlin, Katja Hofmann

    Abstract: The ability for policies to generalize to new environments is key to the broad application of RL agents. A promising approach to prevent an agent's policy from overfitting to a limited set of training environments is to apply regularization techniques originally developed for supervised learning. However, there are stark differences between supervised learning and RL. We discuss those differences…

    Submitted 28 October, 2019; originally announced October 2019.

    Comments: Published at NeurIPS 2019

  39. arXiv:1910.12807

    stat.ML cs.LG

    Better Exploration with Optimistic Actor-Critic

    Authors: Kamil Ciosek, Quan Vuong, Robert Loftin, Katja Hofmann

    Abstract: Actor-critic methods, a type of model-free Reinforcement Learning, have been successfully applied to challenging tasks in continuous control, often achieving state-of-the-art performance. However, wide-scale adoption of these methods in real-world domains is made difficult by their poor sample efficiency. We address this problem both theoretically and empirically. On the theoretical side, we ident…

    Submitted 28 October, 2019; originally announced October 2019.

    Comments: 20 pages (including supplement)

    Journal ref: NeurIPS 2019

  40. arXiv:1910.09349  [pdf, other

    stat.ML cs.LG

    Variational Integrator Networks for Physically Structured Embeddings

    Authors: Steindor Saemundsson, Alexander Terenin, Katja Hofmann, Marc Peter Deisenroth

    Abstract: Learning workable representations of dynamical systems is becoming an increasingly important problem in a number of application areas. By leveraging recent work connecting deep neural networks to systems of differential equations, we propose variational integrator networks, a class of neural network architectures designed to preserve the geometric structure of physical systems. This class o…

    Submitted 2 March, 2020; v1 submitted 21 October, 2019; originally announced October 2019.

    Journal ref: Artificial Intelligence and Statistics, 2020

  41. arXiv:1910.08348  [pdf, other

    cs.LG stat.ML

    VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning

    Authors: Luisa Zintgraf, Kyriacos Shiarlis, Maximilian Igl, Sebastian Schulze, Yarin Gal, Katja Hofmann, Shimon Whiteson

    Abstract: Trading off exploration and exploitation in an unknown environment is key to maximising expected return during learning. A Bayes-optimal policy, which does so optimally, conditions its actions not only on the environment state but on the agent's uncertainty about the environment. Computing a Bayes-optimal policy is however intractable for all but the smallest tasks. In this paper, we introduce var…

    Submitted 27 February, 2020; v1 submitted 18 October, 2019; originally announced October 2019.

    Comments: Published at ICLR 2020

  42. arXiv:1910.07224  [pdf, other

    cs.LG cs.RO stat.ML

    Teacher algorithms for curriculum learning of Deep RL in continuously parameterized environments

    Authors: Rémy Portelas, Cédric Colas, Katja Hofmann, Pierre-Yves Oudeyer

    Abstract: We consider the problem of how a teacher algorithm can enable an unknown Deep Reinforcement Learning (DRL) student to become good at a skill over a wide range of diverse environments. To do so, we study how a teacher algorithm can learn to generate a learning curriculum, whereby it sequentially samples parameters controlling a stochastic procedural generation of environments. Because it does not i…

    Submitted 16 October, 2019; originally announced October 2019.

    Comments: Accepted at CoRL 2019

  43. arXiv:1910.03094  [pdf, other

    cs.LG cs.AI cs.GT cs.MA stat.ML

    Combining No-regret and Q-learning

    Authors: Ian A. Kash, Michael Sullins, Katja Hofmann

    Abstract: Counterfactual Regret Minimization (CFR) has found success in settings like poker which have both terminal states and perfect recall. We seek to understand how to relax these requirements. As a first step, we introduce a simple algorithm, local no-regret learning (LONR), which uses a Q-learning-like update rule to allow learning without terminal states or perfect recall. We prove its convergence f…

    Submitted 13 January, 2022; v1 submitted 7 October, 2019; originally announced October 2019.

    Comments: Presented as conference paper at AAMAS 2020

  44. arXiv:1906.01609  [pdf, ps, other

    cs.LG cs.GT

    Near-Optimal Online Egalitarian learning in General Sum Repeated Matrix Games

    Authors: Aristide Tossou, Christos Dimitrakakis, Jaroslaw Rzepecki, Katja Hofmann

    Abstract: We study two-player general sum repeated finite games where the rewards of each player are generated from an unknown distribution. Our aim is to find the egalitarian bargaining solution (EBS) for the repeated game, which can lead to much higher rewards than the maximin value of both players. Our most important contribution is the derivation of an algorithm that achieves simultaneously, for both pl…

    Submitted 4 June, 2019; originally announced June 2019.

  45. arXiv:1904.10079  [pdf, other

    cs.LG cs.AI stat.ML

    The MineRL 2019 Competition on Sample Efficient Reinforcement Learning using Human Priors

    Authors: William H. Guss, Cayden Codel, Katja Hofmann, Brandon Houghton, Noboru Kuno, Stephanie Milani, Sharada Mohanty, Diego Perez Liebana, Ruslan Salakhutdinov, Nicholay Topin, Manuela Veloso, Phillip Wang

    Abstract: Though deep reinforcement learning has led to breakthroughs in many difficult domains, these successes have required an ever-increasing number of samples. As state-of-the-art reinforcement learning (RL) systems require an exponentially increasing number of samples, their development is restricted to a continually shrinking segment of the AI community. Likewise, many of these systems cannot be appl…

    Submitted 19 January, 2021; v1 submitted 22 April, 2019; originally announced April 2019.

    Comments: Accepted at NeurIPS 2019, 28 pages

  46. arXiv:1901.08129  [pdf, ps, other

    cs.AI

    The Multi-Agent Reinforcement Learning in MalmÖ (MARLÖ) Competition

    Authors: Diego Perez-Liebana, Katja Hofmann, Sharada Prasanna Mohanty, Noburu Kuno, Andre Kramer, Sam Devlin, Raluca D. Gaina, Daniel Ionita

    Abstract: Learning in multi-agent scenarios is a fruitful research direction, but current approaches still show scalability problems in multiple games with general reward settings and different opponent types. The Multi-Agent Reinforcement Learning in MalmÖ (MARLÖ) competition is a new challenge that proposes research in this domain using multiple 3D games. The goal of this contest is to foster research in…

    Submitted 23 January, 2019; originally announced January 2019.

    Comments: 2 pages plus references

    Journal ref: Challenges in Machine Learning (NIPS Workshop), 2018

  47. arXiv:1810.06530  [pdf, other

    cs.LG stat.ML

    Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning

    Authors: David Janz, Jiri Hron, Przemysław Mazur, Katja Hofmann, José Miguel Hernández-Lobato, Sebastian Tschiatschek

    Abstract: Posterior sampling for reinforcement learning (PSRL) is an effective method for balancing exploration and exploitation in reinforcement learning. Randomised value functions (RVF) can be viewed as a promising approach to scaling PSRL. However, we show that most contemporary algorithms combining RVF with neural network function approximation do not possess the properties which make PSRL effective, a…

    Submitted 3 December, 2019; v1 submitted 15 October, 2018; originally announced October 2018.

    Comments: Camera ready version, NeurIPS 2019

  48. arXiv:1810.03642  [pdf, other

    cs.LG stat.ML

    Fast Context Adaptation via Meta-Learning

    Authors: Luisa M Zintgraf, Kyriacos Shiarlis, Vitaly Kurin, Katja Hofmann, Shimon Whiteson

    Abstract: We propose CAVIA for meta-learning, a simple extension to MAML that is less prone to meta-overfitting, easier to parallelise, and more interpretable. CAVIA partitions the model parameters into two parts: context parameters that serve as additional input to the model and are adapted on individual tasks, and shared parameters that are meta-trained and shared across tasks. At test time, only the cont…

    Submitted 10 June, 2019; v1 submitted 8 October, 2018; originally announced October 2018.

    Comments: Published at the International Conference on Machine Learning (ICML) 2019

  49. arXiv:1805.11711  [pdf, other

    cs.LG cs.AI stat.ML

    Depth and nonlinearity induce implicit exploration for RL

    Authors: Justas Dauparas, Ryota Tomioka, Katja Hofmann

    Abstract: The question of how to explore, i.e., take actions with uncertain outcomes to learn about possible future rewards, is a key question in reinforcement learning (RL). Here, we show a surprising result: Q-learning with nonlinear Q-function and no explicit exploration (i.e., a purely greedy policy) can learn several standard benchmark tasks, including mountain car, equally well as, or bet…

    Submitted 29 May, 2018; originally announced May 2018.

  50. arXiv:1805.09281  [pdf, other

    stat.ML cs.LG

    Variational Inference for Data-Efficient Model Learning in POMDPs

    Authors: Sebastian Tschiatschek, Kai Arulkumaran, Jan Stühmer, Katja Hofmann

    Abstract: Partially observable Markov decision processes (POMDPs) are a powerful abstraction for tasks that require decision making under uncertainty, and capture a wide range of real world tasks. Today, effective planning approaches exist that generate effective strategies given black-box models of a POMDP task. Yet, an open question is how to acquire accurate models for complex domains. In this paper we p…

    Submitted 23 May, 2018; originally announced May 2018.