Showing 1–3 of 3 results for author: Vernade, C

Searching in archive math.
  1. arXiv:2405.18100  [pdf, other]

    cs.LG math.OC

    A Pontryagin Perspective on Reinforcement Learning

    Authors: Onno Eberhard, Claire Vernade, Michael Muehlebach

    Abstract: Reinforcement learning has traditionally focused on learning state-dependent policies to solve optimal control problems in a closed-loop fashion. In this work, we introduce the paradigm of open-loop reinforcement learning where a fixed action sequence is learned instead. We present three new algorithms: one robust model-based method and two sample-efficient model-free methods. Rather than basing o…

    Submitted 28 May, 2024; originally announced May 2024.

  2. arXiv:2310.20266  [pdf, other]

    cs.AI math.OC math.PR

    Beyond Average Return in Markov Decision Processes

    Authors: Alexandre Marthe, Aurélien Garivier, Claire Vernade

    Abstract: What are the functionals of the reward that can be computed and optimized exactly in Markov Decision Processes? In the finite-horizon, undiscounted setting, Dynamic Programming (DP) can only handle these operations efficiently for certain classes of statistics. We summarize the characterization of these classes for policy evaluation, and give a new answer for the planning problem. Interestingly, we…

    Submitted 19 February, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023, Dec 2023, New Orleans, United States

  3. arXiv:1606.02448  [pdf, other]

    cs.LG math.ST

    Multiple-Play Bandits in the Position-Based Model

    Authors: Paul Lagrée, Claire Vernade, Olivier Cappé

    Abstract: Sequentially learning to place items in multi-position displays or lists is a task that can be cast into the multiple-play semi-bandit setting. However, a major concern in this context is when the system cannot decide whether the user feedback for each item is actually exploitable. Indeed, much of the content may have been simply ignored by the user. The present work proposes to exploit available…

    Submitted 8 June, 2016; originally announced June 2016.