
Showing 1–4 of 4 results for author: Rauber, P

  1. arXiv:2305.00477  [pdf, other]

    cs.LG cs.AI

    Posterior Sampling for Deep Reinforcement Learning

    Authors: Remo Sasso, Michelangelo Conserva, Paulo Rauber

    Abstract: Despite remarkable successes, deep reinforcement learning algorithms remain sample inefficient: they require an enormous amount of trial and error to find good policies. Model-based algorithms promise sample efficiency by building an environment model that can be used for planning. Posterior Sampling for Reinforcement Learning is such a model-based algorithm that has attracted significant interest…

    Submitted 17 May, 2023; v1 submitted 30 April, 2023; originally announced May 2023.

    MSC Class: 68T07; ACM Class: I.2.m

  2. arXiv:2210.13075  [pdf, other]

    cs.LG

    Hardness in Markov Decision Processes: Theory and Practice

    Authors: Michelangelo Conserva, Paulo Rauber

    Abstract: Meticulously analysing the empirical strengths and weaknesses of reinforcement learning methods in hard (challenging) environments is essential to inspire innovations and assess progress in the field. In tabular reinforcement learning, there is no well-established standard selection of environments to conduct such analysis, which is partially due to the lack of a widespread understanding of the ri…

    Submitted 24 October, 2022; originally announced October 2022.

  3. Recurrent Neural-Linear Posterior Sampling for Nonstationary Contextual Bandits

    Authors: Aditya Ramesh, Paulo Rauber, Michelangelo Conserva, Jürgen Schmidhuber

    Abstract: An agent in a nonstationary contextual bandit problem should balance between exploration and the exploitation of (periodic or structured) patterns present in its previous experiences. Handcrafting an appropriate historical context is an attractive alternative to transform a nonstationary problem into a stationary problem that can be solved efficiently. However, even a carefully designed historical…

    Submitted 3 November, 2023; v1 submitted 9 July, 2020; originally announced July 2020.

    Journal ref: Neural Computation. 2022 Oct 7;34(11):2232-72

  4. arXiv:1711.06006  [pdf, other]

    cs.LG cs.AI cs.NE cs.RO

    Hindsight policy gradients

    Authors: Paulo Rauber, Avinash Ummadisingu, Filipe Mutz, Juergen Schmidhuber

    Abstract: A reinforcement learning agent that needs to pursue different goals across episodes requires a goal-conditional policy. In addition to their potential to generalize desirable behavior to unseen goals, such policies may also enable higher-level planning based on subgoals. In sparse-reward environments, the capacity to exploit information about the degree to which an arbitrary goal has been achieved…

    Submitted 20 February, 2019; v1 submitted 16 November, 2017; originally announced November 2017.

    Comments: Accepted to ICLR 2019