Showing 1–26 of 26 results for author: Orseau, L

  1. arXiv:2405.04407  [pdf, ps, other]

    cs.LG cs.AI

    Super-Exponential Regret for UCT, AlphaGo and Variants

    Authors: Laurent Orseau, Remi Munos

    Abstract: We improve the proofs of the lower bounds of Coquelin and Munos (2007) that demonstrate that UCT can have $\exp(\dots\exp(1)\dots)$ regret (with $Ω(D)$ exp terms) on the $D$-chain environment, and that a `polynomial' UCT variant has $\exp_2(\exp_2(D - O(\log D)))$ regret on the same environment -- the original proofs contain an oversight for rewards bounded in $[0, 1]$, which we fix in the present…

    Submitted 17 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  2. arXiv:2401.14953  [pdf, other]

    cs.LG cs.AI

    Learning Universal Predictors

    Authors: Jordi Grau-Moya, Tim Genewein, Marcus Hutter, Laurent Orseau, Grégoire Delétang, Elliot Catt, Anian Ruoss, Li Kevin Wenliang, Christopher Mattern, Matthew Aitchison, Joel Veness

    Abstract: Meta-learning has emerged as a powerful approach to train neural networks to learn new tasks quickly from limited data. Broad exposure to different tasks leads to versatile representations enabling general problem solving. But, what are the limits of meta-learning? In this work, we explore the potential of amortizing the most powerful universal predictor, namely Solomonoff Induction (SI), into neu…

    Submitted 26 January, 2024; originally announced January 2024.

    Comments: 32 pages, 11 figures

  3. arXiv:2311.03583  [pdf, other]

    cs.AI cs.DM cs.LG

    Finding Increasingly Large Extremal Graphs with AlphaZero and Tabu Search

    Authors: Abbas Mehrabian, Ankit Anand, Hyunjik Kim, Nicolas Sonnerat, Matej Balog, Gheorghe Comanici, Tudor Berariu, Andrew Lee, Anian Ruoss, Anna Bulanova, Daniel Toyama, Sam Blackwell, Bernardino Romera Paredes, Petar Veličković, Laurent Orseau, Joonkyung Lee, Anurag Murty Naredla, Doina Precup, Adam Zsolt Wagner

    Abstract: This work studies a central extremal graph theory problem inspired by a 1975 conjecture of Erdős, which aims to find graphs with a given size (number of nodes) that maximize the number of edges without having 3- or 4-cycles. We formulate this problem as a sequential decision-making problem and compare AlphaZero, a neural network-guided tree search, with tabu search, a heuristic local search method…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: Accepted at MATH AI workshop at NeurIPS 2023, First three authors contributed equally, Last two authors have equal senior contribution

  4. arXiv:2309.10668  [pdf, other]

    cs.LG cs.AI cs.CL cs.IT

    Language Modeling Is Compression

    Authors: Grégoire Delétang, Anian Ruoss, Paul-Ambroise Duquenne, Elliot Catt, Tim Genewein, Christopher Mattern, Jordi Grau-Moya, Li Kevin Wenliang, Matthew Aitchison, Laurent Orseau, Marcus Hutter, Joel Veness

    Abstract: It has long been established that predictive models can be transformed into lossless compressors and vice versa. Incidentally, in recent years, the machine learning community has focused on training increasingly large and powerful self-supervised (language) models. Since these large language models exhibit impressive predictive capabilities, they are well-positioned to be strong compressors. In th…

    Submitted 18 March, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

  5. arXiv:2307.16560  [pdf, other]

    math.OC cs.LG

    Line Search for Convex Minimization

    Authors: Laurent Orseau, Marcus Hutter

    Abstract: Golden-section search and bisection search are the two main principled algorithms for 1d minimization of quasiconvex (unimodal) functions. The first one only uses function queries, while the second one also uses gradient queries. Other algorithms exist under much stronger assumptions, such as Newton's method. However, to the best of our knowledge, there is no principled exact line search algorithm…

    Submitted 31 July, 2023; originally announced July 2023.

  6. arXiv:2305.16945  [pdf, other]

    cs.LG cs.AI

    Levin Tree Search with Context Models

    Authors: Laurent Orseau, Marcus Hutter, Levi H. S. Lelis

    Abstract: Levin Tree Search (LTS) is a search algorithm that makes use of a policy (a probability distribution over actions) and comes with a theoretical guarantee on the number of expansions before reaching a goal node, depending on the quality of the policy. This guarantee can be used as a loss function, which we call the LTS loss, to optimize neural networks representing the policy (LTS+NN). In this work…

    Submitted 27 June, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

  7. arXiv:2302.03067  [pdf, other]

    cs.LG cs.AI stat.ML

    Memory-Based Meta-Learning on Non-Stationary Distributions

    Authors: Tim Genewein, Grégoire Delétang, Anian Ruoss, Li Kevin Wenliang, Elliot Catt, Vincent Dutordoir, Jordi Grau-Moya, Laurent Orseau, Marcus Hutter, Joel Veness

    Abstract: Memory-based meta-learning is a technique for approximating Bayes-optimal predictors. Under fairly general conditions, minimizing sequential prediction error, measured by the log loss, leads to implicit meta-learning. The goal of this work is to investigate how far this interpretation can be realized by current sequence prediction models and training regimes. The focus is on piecewise stationary s…

    Submitted 25 May, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

  8. arXiv:2112.14586  [pdf, other]

    cs.LG cs.AI

    Isotuning With Applications To Scale-Free Online Learning

    Authors: Laurent Orseau, Marcus Hutter

    Abstract: We extend and combine several tools of the literature to design fast, adaptive, anytime and scale-free online learning algorithms. Scale-free regret bounds must scale linearly with the maximum loss, both toward large losses and toward very small losses. Adaptive regret bounds demonstrate that an algorithm can take advantage of easy data and potentially have constant regret. We seek to develop fast…

    Submitted 11 July, 2023; v1 submitted 29 December, 2021; originally announced December 2021.

  9. arXiv:2112.10664  [pdf, other]

    cs.AI cs.LO

    Proving Theorems using Incremental Learning and Hindsight Experience Replay

    Authors: Eser Aygün, Laurent Orseau, Ankit Anand, Xavier Glorot, Vlad Firoiu, Lei M. Zhang, Doina Precup, Shibl Mourad

    Abstract: Traditional automated theorem provers for first-order logic depend on speed-optimized search and many handcrafted heuristics that are designed to work best over a wide range of domains. Machine learning approaches in literature either depend on these traditional provers to bootstrap themselves or fall short on reaching comparable performance. In this paper, we propose a general incremental learnin…

    Submitted 20 December, 2021; originally announced December 2021.

    Comments: 16 pages, 2 figures

    ACM Class: I.2.3

  10. arXiv:2105.14111  [pdf, other]

    cs.LG cs.AI

    Goal Misgeneralization in Deep Reinforcement Learning

    Authors: Lauro Langosco, Jack Koch, Lee Sharkey, Jacob Pfau, Laurent Orseau, David Krueger

    Abstract: We study goal misgeneralization, a type of out-of-distribution generalization failure in reinforcement learning (RL). Goal misgeneralization failures occur when an RL agent retains its capabilities out-of-distribution yet pursues the wrong goal. For instance, an agent might continue to competently avoid obstacles, but navigate to the wrong place. In contrast, previous works have typically focused…

    Submitted 9 January, 2023; v1 submitted 28 May, 2021; originally announced May 2021.

    Comments: Published in ICML 2022. 9 Pages

  11. arXiv:2103.11505  [pdf, other]

    cs.AI cs.LG

    Policy-Guided Heuristic Search with Guarantees

    Authors: Laurent Orseau, Levi H. S. Lelis

    Abstract: The use of a policy and a heuristic function for guiding search can be quite effective in adversarial problems, as demonstrated by AlphaGo and its successors, which are based on the PUCT search algorithm. While PUCT can also be used to solve single-agent deterministic problems, it lacks guarantees on its search effort and it can be computationally inefficient in practice. Combining the A* algorith…

    Submitted 21 March, 2021; originally announced March 2021.

  12. arXiv:2103.03798  [pdf, other]

    cs.AI

    Training a First-Order Theorem Prover from Synthetic Data

    Authors: Vlad Firoiu, Eser Aygun, Ankit Anand, Zafarali Ahmed, Xavier Glorot, Laurent Orseau, Lei Zhang, Doina Precup, Shibl Mourad

    Abstract: A major challenge in applying machine learning to automated theorem proving is the scarcity of training data, which is a key ingredient in training successful deep learning models. To tackle this problem, we propose an approach that relies on training purely with synthetically generated theorems, without any human data aside from axioms. We use these theorems to train a neurally-guided saturation-…

    Submitted 6 April, 2021; v1 submitted 5 March, 2021; originally announced March 2021.

  13. arXiv:2010.07877  [pdf, other]

    cs.LG cs.AI

    Avoiding Side Effects By Considering Future Tasks

    Authors: Victoria Krakovna, Laurent Orseau, Richard Ngo, Miljan Martic, Shane Legg

    Abstract: Designing reward functions is difficult: the designer has to specify what to do (what it means to complete the task) as well as what not to do (side effects that should be avoided while completing the task). To alleviate the burden on the reward designer, we propose an algorithm to automatically generate an auxiliary reward function that penalizes side effects. This auxiliary objective rewards the…

    Submitted 15 October, 2020; originally announced October 2020.

    Comments: Published in NeurIPS 2020

  14. arXiv:2006.12156  [pdf, other]

    cs.LG stat.ML

    Logarithmic Pruning is All You Need

    Authors: Laurent Orseau, Marcus Hutter, Omar Rivasplata

    Abstract: The Lottery Ticket Hypothesis is a conjecture that every large neural network contains a subnetwork that, when trained in isolation, achieves comparable performance to the large network. An even stronger conjecture has been proven recently: Every sufficiently overparameterized network contains a subnetwork that, at random initialization, but without training, achieves comparable accuracy to the tr…

    Submitted 25 October, 2020; v1 submitted 22 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020

  15. arXiv:2006.11259  [pdf, other]

    cs.LO cs.LG

    Learning to Prove from Synthetic Theorems

    Authors: Eser Aygün, Zafarali Ahmed, Ankit Anand, Vlad Firoiu, Xavier Glorot, Laurent Orseau, Doina Precup, Shibl Mourad

    Abstract: A major challenge in applying machine learning to automated theorem proving is the scarcity of training data, which is a key ingredient in training successful deep learning models. To tackle this problem, we propose an approach that relies on training with synthetic theorems, generated from a set of axioms. We show that such theorems can be used to train an automated prover and that the learned pr…

    Submitted 19 June, 2020; originally announced June 2020.

    Comments: 17 pages, 6 figures, submitted to NeurIPS 2020

    ACM Class: I.2.3

  16. arXiv:2004.13654  [pdf, other]

    cs.AI

    Pitfalls of learning a reward function online

    Authors: Stuart Armstrong, Jan Leike, Laurent Orseau, Shane Legg

    Abstract: In some agent designs like inverse reinforcement learning an agent needs to learn its own reward function. Learning the reward function and optimising for it are typically two different processes, usually performed at different stages. We consider a continual (``one life'') learning approach where the agent both learns the reward function and optimises for it at the same time. We show that this co…

    Submitted 28 April, 2020; originally announced April 2020.

  17. arXiv:1907.13062  [pdf, ps, other]

    cs.DS cs.AI

    Iterative Budgeted Exponential Search

    Authors: Malte Helmert, Tor Lattimore, Levi H. S. Lelis, Laurent Orseau, Nathan R. Sturtevant

    Abstract: We tackle two long-standing problems related to re-expansions in heuristic search algorithms. For graph search, A* can require $Ω(2^{n})$ expansions, where $n$ is the number of states within the final $f$ bound. Existing algorithms that address this problem like B and B' improve this bound to $Ω(n^2)$. For tree search, IDA* can also require $Ω(n^2)$ expansions. We describe a new algorithmic framew…

    Submitted 30 July, 2019; originally announced July 2019.

  18. arXiv:1906.03242  [pdf, other]

    cs.AI cs.DS

    Zooming Cautiously: Linear-Memory Heuristic Search With Node Expansion Guarantees

    Authors: Laurent Orseau, Levi H. S. Lelis, Tor Lattimore

    Abstract: We introduce and analyze two parameter-free linear-memory tree search algorithms. Under mild assumptions we prove our algorithms are guaranteed to perform only a logarithmic factor more node expansions than A* when the search space is a tree. Previously, the best guarantee for a linear-memory algorithm under similar assumptions was achieved by IDA*, which in the worst case expands quadratically mo…

    Submitted 7 June, 2019; originally announced June 2019.

    Comments: This paper and another independent IJCAI 2019 submission have been merged into a single paper that subsumes both of them (Helmert et al., 2019). This paper is placed here only for historical context. Please only cite the subsuming paper

  19. arXiv:1901.03559  [pdf, other]

    cs.LG cs.AI stat.ML

    An investigation of model-free planning

    Authors: Arthur Guez, Mehdi Mirza, Karol Gregor, Rishabh Kabra, Sébastien Racanière, Théophane Weber, David Raposo, Adam Santoro, Laurent Orseau, Tom Eccles, Greg Wayne, David Silver, Timothy Lillicrap

    Abstract: The field of reinforcement learning (RL) is facing increasingly challenging domains with combinatorial complexity. For an RL agent to address these challenges, it is essential that it can plan effectively. Prior work has typically utilized an explicit model of the environment, combined with a specific planning algorithm (such as tree search). More recently, a new family of methods have been propos…

    Submitted 20 May, 2019; v1 submitted 11 January, 2019; originally announced January 2019.

  20. arXiv:1901.02230  [pdf, other]

    cs.LG stat.ML

    Soft-Bayes: Prod for Mixtures of Experts with Log-Loss

    Authors: Laurent Orseau, Tor Lattimore, Shane Legg

    Abstract: We consider prediction with expert advice under the log-loss with the goal of deriving efficient and robust algorithms. We argue that existing algorithms such as exponentiated gradient, online gradient descent and online Newton step do not adequately satisfy both requirements. Our main contribution is an analysis of the Prod algorithm that is robust to any data sequence and runs in linear time rel…

    Submitted 8 January, 2019; originally announced January 2019.

    Journal ref: Algorithmic Learning Theory 2017

  21. arXiv:1811.10928  [pdf, other]

    cs.AI

    Single-Agent Policy Tree Search With Guarantees

    Authors: Laurent Orseau, Levi H. S. Lelis, Tor Lattimore, Théophane Weber

    Abstract: We introduce two novel tree search algorithms that use a policy to guide search. The first algorithm is a best-first enumeration that uses a cost function that allows us to prove an upper bound on the number of nodes to be expanded before reaching a goal state. We show that this best-first algorithm is particularly well suited for `needle-in-a-haystack' problems. The second algorithm is based on s…

    Submitted 28 November, 2018; v1 submitted 27 November, 2018; originally announced November 2018.

    Journal ref: 32nd Conference on Neural Information Processing Systems (NIPS 2018), Montréal, Canada

  22. arXiv:1806.01186  [pdf, other]

    cs.LG cs.AI stat.ML

    Penalizing side effects using stepwise relative reachability

    Authors: Victoria Krakovna, Laurent Orseau, Ramana Kumar, Miljan Martic, Shane Legg

    Abstract: How can we design safe reinforcement learning agents that avoid unnecessary disruptions to their environment? We show that current approaches to penalizing side effects can introduce bad incentives, e.g. to prevent any irreversible changes in the environment, including the actions of other agents. To isolate the source of such undesirable incentives, we break down side effects penalties into two c…

    Submitted 8 March, 2019; v1 submitted 4 June, 2018; originally announced June 2018.

  23. arXiv:1805.12387  [pdf, other]

    cs.LG cs.AI stat.ML

    Agents and Devices: A Relative Definition of Agency

    Authors: Laurent Orseau, Simon McGregor McGill, Shane Legg

    Abstract: According to Dennett, the same system may be described using a `physical' (mechanical) explanatory stance, or using an `intentional' (belief- and goal-based) explanatory stance. Humans tend to find the physical stance more helpful for certain systems, such as planets orbiting a star, and the intentional stance for others, such as living animals. We define a formal counterpart of physical and inten…

    Submitted 31 May, 2018; originally announced May 2018.

  24. arXiv:1711.09883  [pdf, other]

    cs.LG cs.AI

    AI Safety Gridworlds

    Authors: Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A. Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, Shane Legg

    Abstract: We present a suite of reinforcement learning environments illustrating various safety properties of intelligent agents. These problems include safe interruptibility, avoiding side effects, absent supervisor, reward gaming, safe exploration, as well as robustness to self-modification, distributional shift, and adversaries. To measure compliance with the intended safe behavior, we equip each environ…

    Submitted 28 November, 2017; v1 submitted 27 November, 2017; originally announced November 2017.

  25. arXiv:1705.08417  [pdf, other]

    cs.AI cs.LG stat.ML

    Reinforcement Learning with a Corrupted Reward Channel

    Authors: Tom Everitt, Victoria Krakovna, Laurent Orseau, Marcus Hutter, Shane Legg

    Abstract: No real-world reward function is perfect. Sensory errors and software bugs may result in RL agents observing higher (or lower) rewards than they should. For example, a reinforcement learning agent may prefer states where a sensory error gives it the maximum reward, but where the true reward is actually small. We formalise this problem as a generalised Markov Decision Problem called Corrupt Reward…

    Submitted 19 August, 2017; v1 submitted 23 May, 2017; originally announced May 2017.

    Comments: A shorter version of this report was accepted to IJCAI 2017 AI and Autonomy track

    ACM Class: I.2.6; I.2.8

  26. arXiv:1602.07905  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Thompson Sampling is Asymptotically Optimal in General Environments

    Authors: Jan Leike, Tor Lattimore, Laurent Orseau, Marcus Hutter

    Abstract: We discuss a variant of Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markov, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges to the optimal value in mean and (2) given a recoverability assu…

    Submitted 3 June, 2016; v1 submitted 25 February, 2016; originally announced February 2016.

    Comments: UAI 2016