Showing 1–5 of 5 results for author: Blier, L

  1. arXiv:2106.08863  [pdf, other]

    cs.LG

    Unbiased Methods for Multi-Goal Reinforcement Learning

    Authors: Léonard Blier, Yann Ollivier

    Abstract: In multi-goal reinforcement learning (RL) settings, the reward for each goal is sparse, and located in a small neighborhood of the goal. In large dimension, the probability of reaching a reward vanishes and the agent receives little learning signal. Methods such as Hindsight Experience Replay (HER) tackle this issue by also learning from realized but unplanned-for goals. But HER is known to introd…

    Submitted 16 June, 2021; originally announced June 2021.

    Comments: 9 pages

  2. arXiv:2101.07123  [pdf, other]

    cs.LG

    Learning Successor States and Goal-Dependent Values: A Mathematical Viewpoint

    Authors: Léonard Blier, Corentin Tallec, Yann Ollivier

    Abstract: In reinforcement learning, temporal difference-based algorithms can be sample-inefficient: for instance, with sparse rewards, no learning occurs until a reward is observed. This can be remedied by learning richer objects, such as a model of the environment, or successor states. Successor states model the expected future state occupancy from any given state for a given policy and are related to goa…

    Submitted 18 January, 2021; originally announced January 2021.

  3. arXiv:1901.09732  [pdf, other]

    cs.LG stat.ML

    Making Deep Q-learning methods robust to time discretization

    Authors: Corentin Tallec, Léonard Blier, Yann Ollivier

    Abstract: Despite remarkable successes, Deep Reinforcement Learning (DRL) is not robust to hyperparameterization, implementation details, or small environment changes (Henderson et al. 2017, Zhang et al. 2018). Overcoming such sensitivity is key to making DRL applicable to real world problems. In this paper, we identify sensitivity to time discretization in near continuous-time environments as a critical fa…

    Submitted 29 January, 2019; v1 submitted 28 January, 2019; originally announced January 2019.

  4. arXiv:1810.01322  [pdf, other]

    cs.LG cs.NE stat.ML

    Learning with Random Learning Rates

    Authors: Léonard Blier, Pierre Wolinski, Yann Ollivier

    Abstract: Hyperparameter tuning is a bothersome step in the training of deep learning models. One of the most sensitive hyperparameters is the learning rate of the gradient descent. We present the 'All Learning Rates At Once' (Alrao) optimization method for neural networks: each unit or feature in the network gets its own learning rate sampled from a random distribution spanning several orders of magnitude.…

    Submitted 29 January, 2019; v1 submitted 2 October, 2018; originally announced October 2018.

    Comments: 20 pages, 8 figures, code available on GitHub

  5. arXiv:1802.07044  [pdf, ps, other]

    cs.LG

    The Description Length of Deep Learning Models

    Authors: Léonard Blier, Yann Ollivier

    Abstract: Solomonoff's general theory of inference and the Minimum Description Length principle formalize Occam's razor, and hold that a good model of data is a model that is good at losslessly compressing the data, including the cost of describing the model itself. Deep neural networks might seem to go against this principle given the large number of parameters to be encoded. We demonstrate experimentall…

    Submitted 1 November, 2018; v1 submitted 20 February, 2018; originally announced February 2018.

    Comments: NIPS 2018