
Showing 1–6 of 6 results for author: Plisnier, H

  1. arXiv:2301.12820  [pdf, other]

    cs.AI

    Transferring Multiple Policies to Hotstart Reinforcement Learning in an Air Compressor Management Problem

    Authors: Hélène Plisnier, Denis Steckelmacher, Jeroen Willems, Bruno Depraetere, Ann Nowé

    Abstract: Many instances of similar or almost-identical industrial machines or tools are often deployed at once, or in quick succession. For instance, a particular model of air compressor may be installed at hundreds of customers. Because these tools perform distinct but highly similar tasks, it is interesting to be able to quickly produce a high-quality controller for machine $N+1$ given the controllers al…

    Submitted 30 January, 2023; originally announced January 2023.

    Comments: Preliminary version, experimental details still to be made more precise

  2. arXiv:1907.07958  [pdf, other]

    cs.AI cs.RO

    Transfer Learning Across Simulated Robots With Different Sensors

    Authors: Hélène Plisnier, Denis Steckelmacher, Diederik Roijers, Ann Nowé

    Abstract: For a robot to learn a good policy, it often requires expensive equipment (such as sophisticated sensors) and a prepared training environment conducive to learning. However, it is seldom possible to perfectly equip robots for economic reasons, nor to guarantee ideal learning conditions, when deployed in real-life environments. A solution would be to prepare the robot in the lab environment, when a…

    Submitted 18 July, 2019; originally announced July 2019.

  3. arXiv:1903.04193  [pdf, other]

    cs.LG cs.AI

    Sample-Efficient Model-Free Reinforcement Learning with Off-Policy Critics

    Authors: Denis Steckelmacher, Hélène Plisnier, Diederik M. Roijers, Ann Nowé

    Abstract: Value-based reinforcement-learning algorithms provide state-of-the-art results in model-free discrete-action settings, and tend to outperform actor-critic algorithms. We argue that actor-critic algorithms are limited by their need for an on-policy critic. We propose Bootstrapped Dual Policy Iteration (BDPI), a novel model-free reinforcement-learning algorithm for continuous states and discrete act…

    Submitted 12 June, 2019; v1 submitted 11 March, 2019; originally announced March 2019.

    Comments: Accepted at the European Conference on Machine Learning 2019 (ECML)

  4. arXiv:1902.02556  [pdf, other]

    cs.AI

    The Actor-Advisor: Policy Gradient With Off-Policy Advice

    Authors: Hélène Plisnier, Denis Steckelmacher, Diederik M. Roijers, Ann Nowé

    Abstract: Actor-critic algorithms learn an explicit policy (actor), and an accompanying value function (critic). The actor performs actions in the environment, while the critic evaluates the actor's current policy. However, despite their stability and promising convergence properties, current actor-critic algorithms do not outperform critic-only ones in practice. We believe that the fact that the critic lea…

    Submitted 7 February, 2019; originally announced February 2019.

  5. arXiv:1808.04096  [pdf, other]

    cs.LG cs.AI stat.ML

    Directed Policy Gradient for Safe Reinforcement Learning with Human Advice

    Authors: Hélène Plisnier, Denis Steckelmacher, Tim Brys, Diederik M. Roijers, Ann Nowé

    Abstract: Many currently deployed Reinforcement Learning agents work in an environment shared with humans, be they co-workers, users or clients. It is desirable that these agents adjust to people's preferences, learn faster thanks to their help, and act safely around them. We argue that most current approaches that learn from human feedback are unsafe: rewarding or punishing the agent a-posteriori cannot im…

    Submitted 13 August, 2018; originally announced August 2018.

    Comments: Accepted at the European Workshop on Reinforcement Learning 2018 (EWRL14)

  6. arXiv:1708.06551  [pdf, other]

    cs.AI cs.LG

    Reinforcement Learning in POMDPs with Memoryless Options and Option-Observation Initiation Sets

    Authors: Denis Steckelmacher, Diederik M. Roijers, Anna Harutyunyan, Peter Vrancx, Hélène Plisnier, Ann Nowé

    Abstract: Many real-world reinforcement learning problems have a hierarchical nature, and often exhibit some degree of partial observability. While hierarchy and partial observability are usually tackled separately (for instance by combining recurrent neural networks and options), we show that addressing both problems simultaneously is simpler and more efficient in many cases. More specifically, we make the…

    Submitted 12 September, 2017; v1 submitted 22 August, 2017; originally announced August 2017.